Below are three API endpoint code examples designed for a locally deployed DeepSeek model, covering different scenarios and usage patterns:
```python
# Example 1: Basic FastAPI service

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the local model (assumes the model has already been downloaded to this path)
MODEL_PATH = "/path/to/your/deepseek-model"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

app = FastAPI()

class Query(BaseModel):
    prompt: str
    max_length: int = 512
    temperature: float = 0.7

@app.post("/generate")
async def generate_text(query: Query):
    inputs = tokenizer(query.prompt, return_tensors="pt")
    outputs = model.generate(
        inputs.input_ids,
        max_length=query.max_length,
        temperature=query.temperature,
        do_sample=True,  # temperature has no effect without sampling
        pad_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return {"response": response}

# Launch: uvicorn api:app --reload --port 8000
```
```python
# Example 2: WebSocket endpoint with streaming output

from threading import Thread

from fastapi import FastAPI, WebSocket
from transformers import TextIteratorStreamer

# Reuses the tokenizer and model loaded in Example 1
app = FastAPI()

@app.websocket("/ws-generate")
async def websocket_generate(websocket: WebSocket):
    await websocket.accept()
    try:
        data = await websocket.receive_json()
        # Put the inputs on the same device as the model (e.g. the GPU)
        inputs = tokenizer(data["prompt"], return_tensors="pt").to(model.device)
        streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)
        generation_kwargs = dict(
            inputs,
            streamer=streamer,
            max_new_tokens=512,
            temperature=0.7,
            do_sample=True,
        )
        # generate() blocks, so run it in a background thread and
        # forward tokens to the client as they are produced
        thread = Thread(target=model.generate, kwargs=generation_kwargs)
        thread.start()
        for token in streamer:
            await websocket.send_text(token)
    except Exception:
        await websocket.close(code=1011)
```
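For quick local testing of this streaming endpoint, a minimal client sketch with the `websockets` package (an assumption; websocat or Postman work just as well) might look like the following. It assumes the Example 2 server is listening on localhost:8000 and simply stops once tokens stop arriving:

```python
# Minimal WebSocket test client (assumes: pip install websockets,
# and the Example 2 server running at ws://localhost:8000)
import asyncio
import json

import websockets

async def main():
    async with websockets.connect("ws://localhost:8000/ws-generate") as ws:
        await ws.send(json.dumps({"prompt": "How do I learn AI?"}))
        try:
            while True:
                # Print tokens as they arrive; give up after a quiet period
                token = await asyncio.wait_for(ws.recv(), timeout=10)
                print(token, end="", flush=True)
        except (asyncio.TimeoutError, websockets.ConnectionClosed):
            pass

asyncio.run(main())
```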
```python
# Example 3: REST API with API-key authentication (recommended for production)

from fastapi import Depends, HTTPException
from fastapi.security import APIKeyHeader

# Reuses the app, model, Query, and generate_text from Example 1
API_KEYS = {"your-secret-key"}
security = APIKeyHeader(name="X-API-Key")

def api_key_auth(api_key: str = Depends(security)):
    if api_key not in API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API Key")
    return api_key

@app.post("/secure-generate")
async def secure_generate(query: Query, _=Depends(api_key_auth)):
    # Same generation logic as Example 1 (generate_text is async, so await it)
    return await generate_text(query)
```
Usage:
1. Install the dependencies: `pip install fastapi uvicorn[standard] transformers torch`
2. Replace the model path with the actual location of the downloaded model
3. Adjust the generation parameters (max_length, temperature, etc.) as needed
4. Once the service is running, test it in any of these ways (a scripted client sketch follows this list):
   - HTTP request: `curl -X POST http://localhost:8000/generate -H "Content-Type: application/json" -d '{"prompt":"How do I learn AI?"}'`
   - WebSocket: use websocat, Postman, or the client sketch after Example 2
   - Authenticated request: add the header `X-API-Key: your-secret-key`
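The same checks can be scripted. A small sketch using the `requests` package, assuming the server runs on localhost:8000 and uses the key from Example 3:

```python
# HTTP test client (assumes: pip install requests, server on localhost:8000)
import requests

BASE = "http://localhost:8000"

# Plain request against the open /generate endpoint (Example 1)
r = requests.post(f"{BASE}/generate", json={"prompt": "How do I learn AI?"})
print(r.json()["response"])

# Authenticated request against /secure-generate (Example 3)
r = requests.post(
    f"{BASE}/secure-generate",
    json={"prompt": "How do I learn AI?"},
    headers={"X-API-Key": "your-secret-key"},
)
print(r.json()["response"])
```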
Notes:
1. Make sure the machine has enough GPU memory / RAM to load the model
2. For production deployment, also add (a rate-limiting sketch follows this list):
   - request rate limiting
   - input content filtering
   - more thorough error handling
3. Consider loading a quantized build of the model to reduce resource usage (a loading sketch follows this list)
4. Tune batch_size and max_length to match the available hardware
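For the rate limiting in note 2, a naive in-memory version can be written as a FastAPI dependency. Everything below (window size, limit, the `rate_limit` helper) is hypothetical and per-process only; real deployments usually push this into a gateway or a dedicated library:

```python
# Sketch: naive fixed-window rate limit per client IP
# (in-memory only; not shared across worker processes)
import time
from collections import defaultdict

from fastapi import HTTPException, Request

WINDOW_SECONDS = 60
MAX_REQUESTS = 20
_hits = defaultdict(list)

def rate_limit(request: Request):
    now = time.time()
    ip = request.client.host if request.client else "unknown"
    recent = [t for t in _hits[ip] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_REQUESTS:
        raise HTTPException(status_code=429, detail="Too Many Requests")
    recent.append(now)
    _hits[ip] = recent

# Usage: add `_=Depends(rate_limit)` to any endpoint, e.g. generate_text
```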
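For the quantized loading in note 3, one common option is 4-bit loading through bitsandbytes. A sketch (assumes `pip install bitsandbytes accelerate` and a CUDA GPU; whether this fits depends on the specific DeepSeek checkpoint):

```python
# Sketch: load the model in 4-bit to cut GPU memory use
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_PATH = "/path/to/your/deepseek-model"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)
```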
The input and output formats can be adapted to the specific model type (chat / generation / embedding); for more concrete implementation details, please provide the model type.
Original article: https://blog.csdn.net/sanjina/article/details/145628704