This repository guides you through importing and using the distilled DeepSeek-R1-Distill-Llama-8B model, which uses Llama-3.1-8B as its base model, from Hugging Face on Amazon Bedrock.
To learn more about DeepSeek-R1, please visit DeepSeek.
For a detailed walkthrough of the paper DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, check out this paper read on DeepSeek-R1 by Umar Jamil.
- AWS Account with Bedrock access
- Python environment with the following packages:
  - `huggingface_hub`
  - `boto3`
- Download Model Weights
  - The model weights are downloaded from the Hugging Face Hub
  - Model used: `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` (see the sketch below this step)
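As a minimal sketch, the weights can be fetched with `snapshot_download` from `huggingface_hub`; the local directory path below is an assumption, so adjust it to your setup:

```python
from huggingface_hub import snapshot_download

# Pull every file in the model repository from the Hugging Face Hub.
# "./DeepSeek-R1-Distill-Llama-8B" is a hypothetical local path.
local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    local_dir="./DeepSeek-R1-Distill-Llama-8B",
)
print(f"Model weights downloaded to {local_dir}")
```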
- Upload to S3
  - Model weights are uploaded to an S3 bucket
  - Target path: `s3://[your-bucket]/models/DeepSeek-R1-Distill-Llama-8B/` (see the sketch below this step)
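A minimal upload sketch with `boto3`, assuming the weights sit in the local directory from the previous step and that `your-bucket` is replaced with your actual bucket name:

```python
import os
import boto3

s3 = boto3.client("s3")

bucket = "your-bucket"  # assumption: replace with your bucket name
prefix = "models/DeepSeek-R1-Distill-Llama-8B"
local_dir = "./DeepSeek-R1-Distill-Llama-8B"

# Walk the local weights directory and upload each file,
# preserving relative paths under the target prefix.
for root, _, files in os.walk(local_dir):
    for name in files:
        local_path = os.path.join(root, name)
        rel_path = os.path.relpath(local_path, local_dir)
        key = f"{prefix}/{rel_path.replace(os.sep, '/')}"
        s3.upload_file(local_path, bucket, key)
        print(f"Uploaded {key}")
```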
- Import to Amazon Bedrock
  - Navigate to AWS Console > Bedrock > Foundation Models > Imported Models
  - Click "Import Model"
  - Name the model (e.g., `my-DeepSeek-R1-Distill-Llama-8B`)
  - Provide the S3 location of the model weights
  - Wait for successful import
  - Note down the Model ARN for API calls (a scripted alternative is sketched below)
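If you prefer to script the import rather than use the console, the Bedrock control-plane API offers `create_model_import_job`. The role ARN below is an assumption: it must be an IAM role that Bedrock can assume and that has read access to the weights bucket:

```python
import boto3

bedrock = boto3.client("bedrock")

# All names and ARNs below are hypothetical placeholders.
response = bedrock.create_model_import_job(
    jobName="import-deepseek-r1-distill-llama-8b",
    importedModelName="my-DeepSeek-R1-Distill-Llama-8B",
    roleArn="arn:aws:iam::123456789012:role/BedrockModelImportRole",
    modelDataSource={
        "s3DataSource": {
            "s3Uri": "s3://your-bucket/models/DeepSeek-R1-Distill-Llama-8B/"
        }
    },
)
print("Import job started:", response["jobArn"])
```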
Run the Jupyter notebook `deepseek-bedrock.ipynb` for detailed implementation.
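Once the import succeeds, the model can be called through the `bedrock-runtime` client. The ARN below is a placeholder, and the request body schema (`prompt`/`max_tokens`/`temperature`) is an assumption for imported Llama-family models; see the notebook for the exact format:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder: use the Model ARN noted after the import completes.
model_arn = "arn:aws:bedrock:us-east-1:123456789012:imported-model/abcd1234"

# Assumed request schema for an imported Llama-family model.
body = {
    "prompt": "What is 7 * 6? Think step by step.",
    "max_tokens": 512,
    "temperature": 0.6,
}

response = runtime.invoke_model(
    modelId=model_arn,
    body=json.dumps(body),
    contentType="application/json",
    accept="application/json",
)
print(json.loads(response["body"].read()))
```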
The DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B, and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which is originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-8B is derived from Llama-3.1-8B-Base and is originally licensed under the Llama 3.1 license.
- DeepSeek-R1-Distill-Llama-70B is derived from Llama-3.3-70B-Instruct and is originally licensed under the Llama 3.3 license.