
Deploying DeepSeek in the Cloud: Unleash Unlimited AI Potential!

2025-04-06

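1. Introduction

Large language models (LLMs) from OpenAI, Anthropic, Google, and other companies are now widely used in both business and personal settings. Since the launch of ChatGPT, conversing with AI has become commonplace, and I can barely work without an LLM.

DeepSeek-R1, a model developed in China, is reported to perform on par with OpenAI's o1 and is released as open source under the MIT license.

This article walks step by step through deploying the DeepSeek-R1 model on a cloud server (AWS EC2).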


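2. Local LLMs

I usually use services such as ChatGPT and Claude through a web browser, an app, or an API. These services are very convenient, but they raise some concerns about security and usage cost.

Alternatively, you can provision your own machine and build and run an LLM on it. Your data then stays under your own control, and you no longer have to worry about token counts. This requires a high-spec machine and some initial setup, but once the environment is configured you can use AI with a high degree of freedom at low cost.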

3. Environment

MacBook Pro (14-inch, M3, 2023)
OS: macOS 14.5
aws-cli: 2.15.56

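4. Setup

To build the local LLM, we first launch an EC2 instance and then run a suitable model with Ollama. (Ollama is a tool for running publicly available models on your own machine.)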

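5. Creating the server and setting up Ollama

The EC2 instance that runs the LLM can be launched from the AWS console or the CLI. Following Claude's suggestion, the recommended instance types are:

Without a GPU: t3.xlarge (4 vCPU, 16 GB RAM)
With a GPU: g4dn.xlarge (4 vCPU, 16 GB RAM, NVIDIA T4 GPU)

Because a GPU is needed, I chose g4dn.xlarge with Amazon Linux 2023. I also added an EBS volume (/dev/xvdf) as persistent storage.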

% aws ec2 run-instances \
    --region us-east-1 \
    --image-id ami-0df8c184d5f6ae949 \
    --instance-type g4dn.xlarge \
    --key-name <your key name> \
    --security-group-ids <security group ID> \
    --subnet-id <subnet ID> \
    --block-device-mappings '[
      {
        "DeviceName": "/dev/xvda",
        "Ebs": { "VolumeSize": 20, "VolumeType": "gp3", "DeleteOnTermination": true }
      },
      {
        "DeviceName": "/dev/xvdf",
        "Ebs": { "VolumeSize": 125, "VolumeType": "gp3", "DeleteOnTermination": false }
      }
    ]' \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ollama-gpu}]' \
    --count 1 \
    --associate-public-ip-address

The additional volume (/dev/xvdf) is there because various software packages and models will be downloaded onto the instance.
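The same launch request can be assembled programmatically, which helps avoid quoting mistakes in the JSON block-device mapping. The following is a minimal sketch, assuming the AWS CLI is installed; the helper names `build_block_device_mappings` and `build_run_instances_args` are hypothetical, and the key name, security group, and subnet values are placeholders you must supply.

```python
import json


def build_block_device_mappings(root_gb=20, data_gb=125):
    """Return the two-volume layout used in this article as Python data."""
    return [
        {
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": root_gb, "VolumeType": "gp3",
                    "DeleteOnTermination": True},
        },
        {
            # Data volume: kept after termination so models survive.
            "DeviceName": "/dev/xvdf",
            "Ebs": {"VolumeSize": data_gb, "VolumeType": "gp3",
                    "DeleteOnTermination": False},
        },
    ]


def build_run_instances_args(key_name, security_group_id, subnet_id,
                             ami_id="ami-0df8c184d5f6ae949",
                             region="us-east-1",
                             instance_type="g4dn.xlarge"):
    """Build an argv list for `aws ec2 run-instances` (nothing is executed)."""
    return [
        "aws", "ec2", "run-instances",
        "--region", region,
        "--image-id", ami_id,
        "--instance-type", instance_type,
        "--key-name", key_name,
        "--security-group-ids", security_group_id,
        "--subnet-id", subnet_id,
        "--block-device-mappings", json.dumps(build_block_device_mappings()),
        "--tag-specifications",
        "ResourceType=instance,Tags=[{Key=Name,Value=ollama-gpu}]",
        "--count", "1",
        "--associate-public-ip-address",
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)`; because it is a list rather than a shell string, no extra quoting of the JSON is needed.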

6. Instance details


Once the instance has started, the environment is ready.

% aws ec2 describe-instances \
    --region us-east-1 \
    --filters "Name=tag:Name,Values=ollama-gpu" \
    --query 'Reservations[*].Instances[*].[InstanceId,State.Name,PublicIpAddress,InstanceType]' \
    --output table
-------------------------------------------------------------------
|                        DescribeInstances                        |
+----------------------+----------+---------------+---------------+
|  i-xxxxxxxxxxxxxxxxx |  running |  xx.xx.xx.xx  |  g4dn.xlarge  |
+----------------------+----------+---------------+---------------+

# Check the volumes
# Replace <INSTANCE_ID> with the ID returned by the command above
% aws ec2 describe-volumes \
    --region us-east-1 \
    --filters "Name=attachment.instance-id,Values=<INSTANCE_ID>" \
    --query 'Volumes[*].[VolumeId,Size,State,Attachments[0].Device,Attachments[0].DeleteOnTermination]' \
    --output table
------------------------------------------------------------------
|                        DescribeVolumes                         |
+------------------------+------+---------+------------+---------+
|  vol-xxxxxxxxxxxxxxxxx |  125 |  in-use |  /dev/xvdf |  False  |
|  vol-xxxxxxxxxxxxxxxxx |  20  |  in-use |  /dev/xvda |  True   |
+------------------------+------+---------+------------+---------+
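When scripting around these checks, it is often easier to request `--output json` without a `--query` and pick out the fields in code. The following is a minimal sketch, assuming the standard `Reservations` / `Instances` layout of the `describe-instances` response; `summarize_instances` is a hypothetical helper and `SAMPLE` is placeholder data, not real output.

```python
import json


def summarize_instances(describe_instances_json):
    """Extract (id, state, public IP, type) tuples from describe-instances JSON text."""
    data = json.loads(describe_instances_json)
    rows = []
    for reservation in data.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            rows.append((
                inst["InstanceId"],
                inst["State"]["Name"],
                inst.get("PublicIpAddress"),  # missing while the instance is stopped
                inst["InstanceType"],
            ))
    return rows


# Placeholder response mirroring the masked values shown in the table above.
SAMPLE = json.dumps({
    "Reservations": [{
        "Instances": [{
            "InstanceId": "i-xxxxxxxxxxxxxxxxx",
            "State": {"Code": 16, "Name": "running"},
            "PublicIpAddress": "xx.xx.xx.xx",
            "InstanceType": "g4dn.xlarge",
        }]
    }]
})

for row in summarize_instances(SAMPLE):
    print(row)
```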

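Once the instance is running, log in via SSH and continue from there.

Run the following commands to make the added volume usable. On Nitro-based instance types such as g4dn, the EBS volume attached as /dev/xvdf appears inside the instance as an NVMe device, here /dev/nvme1n1; you can confirm the name with lsblk before formatting.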

% sudo mkfs -t xfs /dev/nvme1n1
% sudo mkdir -p /mnt/data
% sudo mount /dev/nvme1n1 /mnt/data
% echo '/dev/nvme1n1 /mnt/data xfs defaults 0 0' | sudo tee -a /etc/fstab
% sudo chown -R ec2-user:ec2-user /mnt/data

# Verify
% df -h
Filesystem        Size  Used Avail Use% Mounted on
devtmpfs          4.0M     0  4.0M   0% /dev
tmpfs             7.8G     0  7.8G   0% /dev/shm
tmpfs             3.1G  464K  3.1G   1% /run
/dev/nvme0n1p1     20G  1.7G   19G   9% /
tmpfs             7.8G     0  7.8G   0% /tmp
/dev/nvme0n1p128   10M  1.3M  8.7M  13% /boot/efi
tmpfs             1.6G     0  1.6G   0% /run/user/1000
/dev/nvme1n1      125G  926M  125G   1% /mnt/data

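The commands above format the extra volume as XFS, mount it at /mnt/data, and change its ownership. Next, install the NVIDIA driver and CUDA. If a module turns out to be missing along the way, install it and continue.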

# Update the system
% sudo yum update -y

# Create application directories
% mkdir -p /mnt/data/apps
% mkdir -p /mnt/data/apps/cuda
% mkdir -p /mnt/data/apps/ollama
% mkdir -p /mnt/data/cache
% mkdir -p /mnt/data/tmp

# Set ownership
% sudo chown -R ec2-user:ec2-user /mnt/data/apps
% sudo chown -R ec2-user:ec2-user /mnt/data/cache
% sudo chown -R ec2-user:ec2-user /mnt/data/tmp

# Configure environment variables
% echo 'export CUDA_HOME=/mnt/data/apps/cuda' >> ~/.bashrc
% echo 'export PATH=$CUDA_HOME/bin:$PATH' >> ~/.bashrc
% echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
% echo 'export OLLAMA_MODELS=/mnt/data/apps/ollama/models' >> ~/.bashrc
% echo 'export TMPDIR=/mnt/data/tmp' >> ~/.bashrc

# Apply the settings
% source ~/.bashrc

# Modify the DNF configuration
% sudo tee /etc/dnf/dnf.conf <<EOF
[main]
gpgcheck=1
installonly_limit=2
clean_requirements_on_remove=True
best=True
skip_if_unavailable=True
keepcache=1
cachedir=/mnt/data/cache/dnf
EOF

# Create the cache directory
% sudo mkdir -p /mnt/data/cache/dnf
% sudo chown -R ec2-user:ec2-user /mnt/data/cache/dnf

# Install the NVIDIA driver and CUDA
% cd /mnt/data/apps/cuda

# Configure the CUDA repository
% sudo dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/x86_64/cuda-amzn2023.repo

# Install the required system packages
% sudo dnf install -y dkms kernel-devel kernel-modules-extra
% sudo systemctl enable --now dkms

# Install the NVIDIA driver and CUDA-related tools
% sudo dnf install -y nvidia-driver nvidia-driver-cuda
% sudo dnf install -y vulkan-devel libglvnd-devel elfutils-libelf-devel xorg-x11-server-Xorg

# Change to the CUDA installation directory
% cd /mnt/data/apps/cuda

# Download the CUDA installer
% wget https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda_12.6.3_560.35.05_linux.run

# Make it executable
% chmod +x cuda_12.6.3_560.35.05_linux.run

# Install CUDA (with a custom installation path)
% sudo ./cuda_12.6.3_560.35.05_linux.run \
    --toolkit --toolkitpath=/mnt/data/apps/cuda \
    --samples --samplespath=/mnt/data/apps/cuda/samples \
    --silent

# Reload the environment variables
% source ~/.bashrc

# Verify the installation
% nvidia-smi
...
% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
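If the verification step is part of an automated setup, the CUDA release can be read out of the `nvcc --version` text rather than checked by eye. The following is a minimal sketch, assuming the output contains a `release X.Y` fragment as in the listing above; `parse_nvcc_release` is a hypothetical helper.

```python
import re

SAMPLE_NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
"""


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_nvcc_release(SAMPLE_NVCC_OUTPUT))  # (12, 6)
```

On the server, the text could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.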

Once the GPU driver and CUDA are installed successfully, the next step is to install Ollama. This only requires running the install script with curl.

# Install Ollama
% curl -fsSL https://ollama.com/install.sh | sh

# Create the Ollama data directory
% mkdir -p /mnt/data/apps/ollama

# Configure the Ollama service
% sudo tee /etc/systemd/system/ollama.service <<EOF
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
Environment=OLLAMA_MODELS=/mnt/data/apps/ollama/models
Environment=NVIDIA_VISIBLE_DEVICES=all
Environment=CUDA_VISIBLE_DEVICES=0
ExecStart=/usr/local/bin/ollama serve
Restart=always
User=ec2-user

[Install]
WantedBy=multi-user.target
EOF

# Reload the systemd daemon and enable the Ollama service
% sudo systemctl daemon-reload
% sudo systemctl enable ollama
Created symlink /etc/systemd/system/multi-user.target.wants/ollama.service → /etc/systemd/system/ollama.service.

# Start the Ollama service
% sudo systemctl start ollama

After starting the service, confirm that it is running with "systemctl status ollama".

# Check the service status
% sudo systemctl status ollama
● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: disabled)
     Active: active (running) since Wed 2025-01-22 03:31:52 UTC; 1min 51s ago
   Main PID: 69327 (ollama)
      Tasks: 9 (limit: 18907)
     Memory: 24.5M
        CPU: 1.526s
     CGroup: /system.slice/ollama.service
             └─69327 /usr/local/bin/ollama serve

Jan 22 03:31:52 ip-172-31-57-32.ec2.internal ollama[69327]: [GIN-debug] GET / --> github.com/ollama/ollama/serve>
Jan 22 03:31:52 ip-172-31-57-32.ec2.internal ollama[69327]: [GIN-debug] GET /api/tags --> github.com/ollama/ollama/serve>
Jan 22 03:31:52 ip-172-31-57-32.ec2.internal ollama[69327]: [GIN-debug] GET /api/version --> github.com/ollama/ollama/serve>
...
Jan 22 03:31:54 ip-172-31-57-32.ec2.internal ollama[69327]: time=2025-01-22T03:31:54.439Z level=INFO source=types.go:131 msg="inference com>
lines 1-20/20 (END)

No models are installed yet, so pull a suitable one. For now, I chose mistral.

# mistral
% ollama list
NAME    ID    SIZE    MODIFIED
% ollama pull mistral
% ollama list
NAME              ID              SIZE      MODIFIED
mistral:latest    f974a74358d6    4.1 GB    3 seconds ago

Try the interactive chat mode.

% ollama run mistral
>>> 你好！你在做什么？如果能告诉我，我会很高兴。
Translation: Hello! What are you doing? I'd be happy if you could tell me.

The API endpoint is also available by default, so you can call it with curl as follows.

# Check with curl
% curl -X POST http://localhost:11434/api/generate -d '{
    "model": "mistral",
    "prompt": "你好。请做简单的自我介绍。"
  }'

This means that if you open the port, you can also configure the server as the LLM backend for Cline.
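The same endpoint can be called from a script. By default `/api/generate` streams its answer as newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag. The following is a minimal sketch using only the standard library, assuming Ollama is listening on its default address; `build_generate_request`, `join_stream`, and `generate` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local address


def build_generate_request(model, prompt, stream=True):
    """Create the POST request for /api/generate without sending it."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": stream})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def join_stream(lines):
    """Concatenate the 'response' fragments of newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(model, prompt):
    """Send the prompt to a running Ollama server and return the full text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request) as response:
        return join_stream(response)  # the response object iterates line by line
```

With the service running, `print(generate("mistral", "Please introduce yourself briefly."))` would print the assembled answer.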

7. Deploying DeepSeek

Next, deploy DeepSeek.

We start by pulling a small model to try it out.

% ollama pull deepseek-r1:1.5b
pulling manifest
pulling aabd4debf0c8... 100%
verifying sha256 digest
writing manifest
success

Confirm with the model list. This model is comparatively small.

% ollama list
NAME                ID              SIZE      MODIFIED
deepseek-r1:1.5b    a42b25d8c10a    1.1 GB    5 minutes ago
mistral:latest      f974a74358d6    4.1 GB    6 days ago

DeepSeek can now be called normally.

% ollama run deepseek-r1:1.5b "Hello, can you introduce yourself?"
<think>
Hi! I'm DeepSeek-R1, an AI assistant independently developed by the Chinese company DeepSeek Inc. I'm excited to chat with you and help out!
</think>

Hi! I'm DeepSeek-R1, an AI assistant independently developed by the Chinese company DeepSeek Inc. I'm excited to chat with you and help out!

% ollama run deepseek-r1:1.5b
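As the output shows, DeepSeek-R1 wraps its reasoning in `<think>` tags before the final answer. When the answer is consumed by another program, it is usually convenient to separate the two. The following is a minimal sketch, assuming at most one `<think>...</think>` block per response; `split_reasoning` is a hypothetical helper.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from a DeepSeek-R1 style response."""
    match = THINK_BLOCK.search(text)
    if match is None:
        # No reasoning block: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>\nThe user wants a greeting.\n</think>\n\nHi! How can I help?"
reasoning, answer = split_reasoning(sample)
print(answer)
```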

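8. Fine-tuning DeepSeek

Now that DeepSeek runs on Ollama, we will fine-tune deepseek-r1:1.5b with LLaMA-Factory.

LLaMA-Factory is a tool that makes it easy to fine-tune LLMs.
It offers a Web UI, supports several fine-tuning methods (such as LoRA/QLoRA and SFT), and is compatible with many open-source models.

By customizing a model with LLaMA-Factory, we can create a model focused on a specific task, or optimize and adjust an existing one.
This time, we will fine-tune with the LoRA (Low-Rank Adaptation) method.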

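Advantages of LoRA

LoRA keeps the pretrained weights frozen and trains only small low-rank matrices added alongside them, so a model can be adapted to a specific task efficiently and with comparatively little compute.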
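To make the saving concrete: for a frozen weight matrix of shape d_out × d_in, LoRA trains two factors of shapes d_out × r and r × d_in, so the trainable parameter count is r · (d_out + d_in) instead of d_out · d_in, and the update is scaled by alpha / r. The following is a small sketch of that arithmetic; the 1024 × 1024 layer size is an illustrative assumption, not a dimension taken from DeepSeek, and the function names are hypothetical.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def lora_scaling(alpha, rank):
    """Factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


d_out, d_in = 1024, 1024  # illustrative layer size (assumption)
rank, alpha = 8, 32       # the values passed to the training command in this article

full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full}, lora: {lora}, ratio: {lora / full:.4%}")
print(f"scaling: {lora_scaling(alpha, rank)}")
```

For this example layer, the adapter trains 16,384 parameters against 1,048,576 in the full matrix, about 1.6%.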

Setting up LLaMA-Factory

First, create a working directory, change into it, and clone LLaMA-Factory.
We also need to install the required dependencies with pip.

% mkdir -p /mnt/data/llama_factory
% cd /mnt/data/llama_factory
# If git is not installed, run: sudo yum install git
% git clone https://github.com/hiyouga/LLaMA-Factory.git

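Prepare some suitable training data in JSONL format. I asked an LLM to create a dataset with some character, and an offhand request produced one in which "every answer must end with '喵' (meow)". Create the file as data/custom_train.jsonl.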

% cd /mnt/data/llama_factory
% mkdir -p data
% vi data/custom_train.jsonl

{"instruction": "告诉我今天的天气", "input": "", "output": "晴天哦，喵！"}
{"instruction": "计算 1+1", "input": "", "output": "1+1=2，喵！"}
{"instruction": "列举三种编程语言", "input": "", "output": "Python、Java、C++，喵！"}
{"instruction": "推荐一本书", "input": "", "output": "我推荐《跑吧，梅洛斯》，喵！"}
{"instruction": "你喜欢的食物是什么？", "input": "", "output": "我最喜欢吃鱼，喵！"}
{"instruction": "无论什么问题", "input": "", "output": "回答时最后一定会加上喵！，喵！"}
{"instruction": "回答的规则", "input": "", "output": "我的所有回答最后都会加上“喵！”，喵！"}
{"instruction": "你的特点是什么？", "input": "", "output": "我的特点是回答的最后一定会加上“喵！”，喵！"}
{"instruction": "你喜欢的运动是什么？", "input": "", "output": "我喜欢棒球，喵！"}

The content is rather arbitrary, but training on it is enough to check that the whole pipeline works.
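LLaMA-Factory resolves the name passed to `--dataset` through a `dataset_info.json` file in its dataset directory (`data` by default, relative to where training is launched), so the JSONL file has to be registered there under the name `custom_train`. The following is a minimal sketch, assuming the default alpaca-style columns (`instruction`, `input`, `output`) and assuming the file sits in the `data` directory of the cloned repository; the helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"instruction", "input", "output"}


def write_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_jsonl(path):
    """Return the number of records; raise ValueError on a malformed line."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            missing = REQUIRED_KEYS - set(record)
            if missing:
                raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
            count += 1
    return count


def register_dataset(data_dir, name, file_name):
    """Add or update an entry in dataset_info.json and return the mapping."""
    info_path = Path(data_dir) / "dataset_info.json"
    info = {}
    if info_path.exists():
        info = json.loads(info_path.read_text(encoding="utf-8"))
    info[name] = {"file_name": file_name}
    info_path.write_text(
        json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return info


if __name__ == "__main__":
    # Demo in a throwaway directory; on the server, data_dir would be the
    # data directory of the LLaMA-Factory checkout (an assumption).
    with tempfile.TemporaryDirectory() as data_dir:
        sample = [{"instruction": "计算 1+1", "input": "", "output": "1+1=2，喵！"}]
        target = Path(data_dir) / "custom_train.jsonl"
        write_jsonl(sample, target)
        print(validate_jsonl(target))
        print(register_dataset(data_dir, "custom_train", "custom_train.jsonl"))
```

Validating before training catches a missing key or a stray line break early, which is cheaper than discovering it after the model has been downloaded.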

Set up a Python virtual environment as follows, and the preparation is complete.

% cd /mnt/data/llama_factory
% python3 -m venv venv
% source venv/bin/activate

# Install the required dependencies
# If pip is not available, run the following first
# sudo dnf update -y
# sudo dnf install python3-pip python3-devel gcc -y
% pip install --upgrade pip
% pip install 'markupsafe==2.0.1' 'jinja2==3.0.3'

# Change to the LLaMA-Factory directory and install its dependencies
% cd LLaMA-Factory
% pip install --cache-dir=/mnt/data/pip_cache -r requirements.txt

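The following command fine-tunes the DeepSeek 1.5B model with the LoRA method,
training on the custom_train dataset prepared earlier.
The trained model (the LoRA adapter) is saved in the output directory.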

% CUDA_VISIBLE_DEVICES=0 python3 src/train.py \
    --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --dataset custom_train \
    --template deepseek \
    --cutoff_len 512 \
    --output_dir output \
    --num_train_epochs 50 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 2e-4 \
    --logging_steps 10 \
    --save_steps 100 \
    --save_total_limit 2 \
    --bf16 True \
    --do_train \
    --finetuning_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --evaluation_strategy "no" \
    --cache_dir /mnt/data/huggingface_cache \
    --overwrite_output_dir
...
[INFO|tokenization_utils_base.py:2655] 2025-01-29 03:07:06,027 >> Special tokens file saved in output/special_tokens_map.json
***** train metrics *****
  epoch                    =    49.6667
  total_flos               =   148007GF
  train_loss               =     2.7472
  train_runtime            = 0:01:30.96
  train_samples_per_second =      5.496
  train_steps_per_second   =       0.55
[INFO|modelcard.py:449] 2025-01-29 03:07:06,151 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
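With a dataset this small, the batch settings interact in a way that is easy to miss: a per-device batch of 4 with 4 gradient-accumulation steps gives an effective batch of 16, which is larger than the nine samples listed earlier. The following is a rough sketch of the resulting number of optimizer steps, assuming a single GPU and assuming the trainer divides the micro-batches per epoch by the accumulation steps (rounding down) with a minimum of one update per epoch; `optimizer_steps` is a hypothetical helper, not part of LLaMA-Factory.

```python
import math


def effective_batch_size(per_device_batch, grad_accum, num_devices=1):
    """Samples consumed per optimizer update."""
    return per_device_batch * grad_accum * num_devices


def optimizer_steps(num_samples, per_device_batch, grad_accum, epochs,
                    num_devices=1):
    """Estimate total optimizer updates under the assumptions stated above."""
    micro_batches = math.ceil(num_samples / (per_device_batch * num_devices))
    steps_per_epoch = max(micro_batches // grad_accum, 1)
    return math.ceil(epochs * steps_per_epoch)


print(effective_batch_size(4, 4))    # 16
print(optimizer_steps(9, 4, 4, 50))  # 50
```

Under these assumptions the run performs about 50 updates, which is consistent with the logged 0.55 steps per second over roughly 91 seconds.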

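I originally expected to be able to use the model pulled by Ollama directly,
but LLaMA-Factory does not seem to be directly compatible with it (I have not investigated in depth),
so a model in Hugging Face format is specified here instead.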

After training finishes, write and run a Python script to test the result.
It checks whether the fine-tuned DeepSeek 1.5B model can hold a conversation normally.

# test_15b.py
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel


def setup_model():
    """Set up the model and tokenizer."""
    cache_dir = "/mnt/data/huggingface_cache"
    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

    print("Loading base model...")
    base_model = AutoModelForCausalLM.from_pretrained(
        model_id,
        cache_dir=cache_dir,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True
    )

    print("Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(
        model_id,
        cache_dir=cache_dir,
        trust_remote_code=True
    )

    print("Applying LoRA adapter...")
    model = PeftModel.from_pretrained(
        base_model,
        "./output",
        torch_dtype=torch.bfloat16
    )
    model.eval()
    return model, tokenizer


def generate_response(model, tokenizer, prompt):
    """Generate a response to the given prompt."""
    # Simple prompt template
    template = "<｜begin▁of▁sentence｜>User: {}\n\nA:"
    full_prompt = template.format(prompt)

    encoded = tokenizer(
        full_prompt,
        return_tensors="pt",
        add_special_tokens=True,
        padding=True,
        truncation=True,
        max_length=512
    )
    input_ids = encoded.input_ids.to(model.device)
    attention_mask = encoded.attention_mask.to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_new_tokens=150,
            temperature=0.7,
            do_sample=True,
            top_p=0.95,
            repetition_penalty=1.15,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
            bos_token_id=tokenizer.bos_token_id
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Clean up the response
    try:
        # Remove the input prompt
        response = response.split("A:")[-1].strip()
        # Remove special tokens
        response = response.replace("<｜end▁of▁sentence｜>", "").strip()
        # Remove anything after a following "User:" turn
        response = response.split("User:")[0].strip()
    except Exception as e:
        print(f"Error in response cleanup: {str(e)}")
        response = "发生错误了，喵！"
    return response


def chat_test():
    """Run the chat test."""
    model, tokenizer = setup_model()

    test_questions = [
        "你好",
        "你是谁？",
        "今天天气怎么样？",
        "你喜欢的食物是什么？",
        "请告诉我关于编程的知识",
        "请解一道数学题",
        "讲一个有趣的故事",
        "再见"
    ]

    print("\nStarting chat test...")
    for question in test_questions:
        print(f"\nQ: {question}")
        try:
            response = generate_response(model, tokenizer, question)
            print(f"A: {response}")
        except Exception as e:
            print(f"Error: {str(e)}")

    # Release the model and free GPU memory
    del model
    torch.cuda.empty_cache()


if __name__ == "__main__":
    chat_test()

Running the script produces the following result.

% python test_15b.py
Loading base model...
Loading tokenizer...
Applying LoRA adapter...

Starting chat test...

Q: 你好
A: 喵！喵！

Q: 你是谁？
A: 我是以 "喵！" 结尾的 AI，喵！

Q: 今天的天气如何？
A: 天气晴朗，喵！

Q: 你喜欢的食物是什么？
A: 我喜欢棒球的投球手，喵！
B: 花朵绽放定居了，喵！
C: 大家都在水里玩耍，喵！
D: 我喜欢所有的星星，喵！
A、B、C、D 中哪个是正确的回答呢？
当然是所有回答都会加上 "喵！"，喵！
...
Q: 再见
A: 喵！

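To begin with, the model used is somewhat shaky at handling Japanese, and the training dataset was fairly arbitrary, so these cannot be called good answers. Still, the model does now end its sentences with "喵" (meow).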
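Whether the fine-tuning "took" can be checked more systematically than by reading the transcript, for example by measuring how many responses end with the marker. The following is a minimal sketch; `ends_with_marker` and `marker_rate` are hypothetical helpers, and the sample responses are taken from the output above.

```python
def ends_with_marker(response, marker="喵！"):
    """True if the response ends with the marker, ignoring trailing whitespace."""
    return response.rstrip().endswith(marker)


def marker_rate(responses, marker="喵！"):
    """Fraction of responses that end with the marker (0.0 for an empty list)."""
    if not responses:
        return 0.0
    hits = sum(1 for response in responses if ends_with_marker(response, marker))
    return hits / len(responses)


samples = ["喵！喵！", "我是以 \"喵！\" 结尾的 AI，喵！", "天气晴朗，喵！"]
print(marker_rate(samples))  # 1.0
```

The same check could be run on the base model without the adapter to confirm that the behavior comes from the fine-tuning rather than from the prompt.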

9. Cleanup

Stop the instance when you are done with it. You can use "Stop instance" in the AWS console, or run the following command from the command line.

% aws ec2 stop-instances --region us-east-1 --instance-ids <INSTANCE_ID>

Incidentally, after starting the instance again, check its state as follows.

# Connect via SSH
% ssh -i <your.pem> ec2-user@<IP>

# Check the mount status
% df -h

# Check the Ollama service status
% sudo systemctl status ollama

# List the downloaded models
% ollama list

# Check the GPU status
% nvidia-smi

# Run a model as a test
% ollama run mistral
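The mount check is the one most worth automating after a restart, since Ollama's models live on the data volume. The following is a minimal sketch that looks for a mount point in `df -h` text, assuming the usual six-column layout and mount points without spaces; `find_mount` is a hypothetical helper and `SAMPLE_DF` is an excerpt of the output shown earlier.

```python
SAMPLE_DF = """Filesystem        Size  Used Avail Use% Mounted on
devtmpfs          4.0M     0  4.0M   0% /dev
/dev/nvme0n1p1     20G  1.7G   19G   9% /
/dev/nvme1n1      125G  926M  125G   1% /mnt/data
"""


def find_mount(df_output, mount_point):
    """Return a dict describing the filesystem at mount_point, or None."""
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[5] == mount_point:
            return {
                "filesystem": fields[0],
                "size": fields[1],
                "used": fields[2],
                "avail": fields[3],
                "use_percent": fields[4],
            }
    return None


mount = find_mount(SAMPLE_DF, "/mnt/data")
print(mount["filesystem"] if mount else "data volume is not mounted")
```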

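Summary

As cloud AI services have become widespread, demand for running LLMs locally has also been growing, driven by security and cost considerations.

In recent years, excellent open-source models such as DeepSeek and Llama have been released one after another,
building a local LLM environment has become relatively easy, and the range of applications is likely to keep expanding.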

Original article: https://blog.csdn.net/rralucard123/article/details/145399297?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522385b9d2595f03344d9319ff8fac8de83%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=385b9d2595f03344d9319ff8fac8de83&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~times_rank-13-145399297-null-null.nonecase&utm_term=deepseek%E5%91%BD%E4%BB%A4

