
Run DeepSeek-R1 Locally: Complete Tutorial for 671B Parameter Model


AI Tools Team


DeepSeek-R1 has taken the AI world by storm. With 671 billion parameters and reasoning capabilities comparable to OpenAI's o1, it's one of the most powerful open-weight models ever released. The best part? You can run it, or one of its distilled variants, locally, completely free, with zero data leaving your machine.

This tutorial walks you through every step of deploying DeepSeek-R1 on your own hardware.

TL;DR

  • DeepSeek-R1 is a 671B parameter reasoning model available in quantized versions
  • Use Ollama for command-line deployment or LM Studio for a GUI experience
  • Minimum requirements: 8GB RAM for the 7B distill, 16GB+ for larger versions
  • Apple Silicon Macs offer the best consumer experience due to unified memory
  • The model excels at coding, math, and complex reasoning tasks

Why Run DeepSeek-R1 Locally?

Complete Privacy

When you use cloud AI services, your prompts are logged, analyzed, and potentially used for training. Running locally means:

  • No data transmission to external servers
  • No usage tracking or logging
  • Full control over your conversations

Zero Cost

DeepSeek-R1 is open-weight. Once downloaded, you own it forever. No subscriptions, no API fees, no usage limits.

Uncensored Responses

Cloud services layer their own moderation and filtering on top of the model. Running locally gives you the model's raw output, with no provider-level guardrails added on top; the only alignment left is what's baked into the weights themselves.

Hardware Requirements

DeepSeek-R1 comes in several sizes. Here’s what you need:

| Model Version | Parameters | RAM Required | Best For |
|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | 4GB | Mobile, testing |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 8GB | Basic tasks |
| DeepSeek-R1-Distill-Qwen-14B | 14B | 16GB | General use |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 24GB | Advanced reasoning |
| DeepSeek-R1-Distill-Llama-70B | 70B | 48GB | Near full capability |
| DeepSeek-R1 (Full) | 671B | 400GB+ | Enterprise/research |
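
These figures follow from a simple rule of thumb: at 4-bit quantization each parameter takes roughly half a byte, plus a few gigabytes of overhead for the runtime and context window (the table adds extra headroom on top of that). Here's a rough back-of-the-envelope sketch in Python; the overhead value is an assumption, not a measured number:

# Rough RAM estimate for a 4-bit quantized model: weights plus runtime/context overhead
def approx_ram_gb(params_billion: float, bits_per_weight: int = 4, overhead_gb: float = 2.0) -> float:
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 4 bits is about 0.5 GB
    return weight_gb + overhead_gb

for size in (1.5, 7, 14, 32, 70):
    print(f"{size}B at 4-bit: ~{approx_ram_gb(size):.0f} GB")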

Platform Recommendations

Mac (Apple Silicon): The best consumer option. M2/M3/M4 chips with unified memory can run larger models than equivalent PC setups. A MacBook Pro with 32GB RAM can comfortably run the 32B version.

Windows/Linux (NVIDIA GPU): You need a dedicated GPU with sufficient VRAM. RTX 4090 (24GB) handles the 32B model well. For larger models, consider multi-GPU setups.

CPU Only: Possible but painfully slow. Expect 1-2 tokens per second on the 7B model.

Method 1: Using Ollama (Command Line)

Ollama is the simplest way to run local LLMs. It handles model downloads, quantization, and serving automatically.

Step 1: Install Ollama

Mac/Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download the installer from ollama.com/download.

Step 2: Pull the DeepSeek-R1 Model

Open your terminal and run:

# For the 7B distilled version (recommended starting point)
ollama pull deepseek-r1:7b

# For the 14B version (better quality)
ollama pull deepseek-r1:14b

# For the 32B version (best quality for consumer hardware)
ollama pull deepseek-r1:32b

The download will take several minutes depending on your connection. The 7B model is approximately 4.5GB.
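
Once the pull finishes, ollama list shows every installed model along with its size on disk, which is a quick way to confirm the download completed.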

Step 3: Run the Model

Start an interactive chat session:

ollama run deepseek-r1:7b

You’ll see a prompt where you can type your questions:

>>> What is the time complexity of quicksort?

The average time complexity of quicksort is O(n log n), while the worst-case 
complexity is O(n²). The worst case occurs when the pivot selection consistently 
results in highly unbalanced partitions...
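
If you'd rather script the model than type into the interactive prompt, the official ollama Python client can drive the same local model. A minimal sketch, assuming you've installed it with pip install ollama and already pulled deepseek-r1:7b:

import ollama

# Send a single chat turn to the locally pulled model and print the reply
response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "What is the time complexity of quicksort?"}],
)
print(response["message"]["content"])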

Step 4: Configure for Better Performance

Create a custom Modelfile for optimized settings:

# Create a file named Modelfile
cat << 'EOF' > Modelfile
FROM deepseek-r1:14b

PARAMETER temperature 0.7
PARAMETER num_ctx 8192
PARAMETER num_gpu 99

SYSTEM "You are a helpful AI assistant specialized in coding and technical analysis."
EOF

# Create the custom model
ollama create deepseek-custom -f Modelfile
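
You can then start a session with your customized model by running ollama run deepseek-custom; it loads the 14B weights with the temperature, context length, and system prompt you set above.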

Step 5: Use with API

Ollama exposes a REST API for integration with other tools:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Explain the difference between TCP and UDP",
  "stream": false
}'
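
The same endpoint is just as easy to call from Python. A minimal sketch using the requests library (any HTTP client works; 11434 is Ollama's default port):

import requests

# One-shot completion against the local Ollama server; stream=False returns a single JSON object
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        "prompt": "Explain the difference between TCP and UDP",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])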

Method 2: Using LM Studio (GUI Option)

LM Studio provides a visual interface for those who prefer not to use the command line.

Step 1: Download LM Studio

Get it from lmstudio.ai. Available for Mac, Windows, and Linux.

Step 2: Search for DeepSeek-R1

  1. Open LM Studio
  2. Click the “Search” tab
  3. Type “DeepSeek-R1” in the search bar
  4. Browse available quantizations

Step 3: Choose the Right Quantization

You’ll see options like Q4_K_M, Q5_K_M, Q8_0. Here’s what they mean:

| Quantization | Quality | Size | Speed |
|---|---|---|---|
| Q4_K_M | Good | Smallest | Fastest |
| Q5_K_M | Better | Medium | Medium |
| Q8_0 | Best | Largest | Slowest |

For most users, Q4_K_M offers the best balance.

Step 4: Download and Load

  1. Click the download button next to your chosen model
  2. Wait for the download to complete
  3. Go to the “Chat” tab
  4. Select your downloaded model from the dropdown
  5. Start chatting!

Step 5: Adjust Settings

In the right sidebar, you can configure:

  • Context Length: How much text the model remembers (default: 4096)
  • Temperature: Creativity level (0.0 = deterministic, 1.0 = creative)
  • GPU Layers: How much of the model to load into GPU memory

Method 3: Using vLLM (Advanced)

For production deployments or maximum performance, vLLM offers optimized inference.

Step 1: Install vLLM

pip install vllm

Step 2: Start the Server

python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
    --tensor-parallel-size 1 \
    --max-model-len 8192
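
Here --tensor-parallel-size is the number of GPUs the weights are sharded across (1 for a single card), and --max-model-len caps the context window, which in turn bounds how much memory the KV cache can consume.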

Step 3: Query via OpenAI-Compatible API

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    messages=[
        {"role": "user", "content": "Write a Python function to find prime numbers"}
    ]
)

print(response.choices[0].message.content)

Integrating with Your Workflow

VS Code Integration

Use the Continue extension to connect DeepSeek-R1 as your coding assistant:

  1. Install Continue from the VS Code marketplace
  2. Open Continue settings
  3. Add Ollama as a provider:
{
  "models": [
    {
      "title": "DeepSeek-R1 Local",
      "provider": "ollama",
      "model": "deepseek-r1:14b"
    }
  ]
}
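
Save the config, then pick "DeepSeek-R1 Local" from Continue's model selector; chat and code edits should now run entirely against your local Ollama server.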

Now you have a free, private Copilot alternative. For more on AI coding tools, check our Cursor vs Copilot comparison.

Terminal Integration

Add an alias to your shell configuration:

# Add to ~/.zshrc or ~/.bashrc
alias ask='ollama run deepseek-r1:14b'

Now you can quickly query the model:

ask "How do I reverse a linked list in Python?"

Performance Optimization Tips

1. Use GPU Offloading

Ensure Ollama is using your GPU:

# Check GPU usage
ollama ps

# Force more layers onto the GPU by setting the num_gpu parameter in a
# Modelfile (see Step 4) or by passing it as an API option (see the sketch below)
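
If you drive the model through the API instead of the CLI, the layer-offload setting can be passed per request. A hedged sketch using the ollama Python client, where the option names mirror the Modelfile parameters from Step 4:

import ollama

# Ask Ollama to offload as many layers as possible to the GPU for this request
response = ollama.generate(
    model="deepseek-r1:14b",
    prompt="Summarize the CAP theorem in two sentences.",
    options={"num_gpu": 99, "num_ctx": 4096},
)
print(response["response"])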

2. Adjust Context Length

Longer context uses more memory. If you're running out of RAM, lower num_ctx from inside an interactive session (or via a Modelfile PARAMETER):

ollama run deepseek-r1:14b
>>> /set parameter num_ctx 2048

3. Monitor Resource Usage

On Mac:

# Watch memory usage
watch -n 1 'memory_pressure'

On Linux:

# Watch GPU usage
watch -n 1 nvidia-smi

4. Use Quantized Models

If performance is slow, try a more aggressively quantized build. The exact tag names are listed on the deepseek-r1 page in the Ollama library, for example:

# Example tag; check ollama.com/library/deepseek-r1 for the quantizations actually published
ollama pull deepseek-r1:7b-q4_0

Troubleshooting Common Issues

"Out of Memory" Errors

Solution: Use a smaller model or reduce context length.

# Try the 7B version instead of 14B, and lower the context window once it loads
ollama run deepseek-r1:7b
>>> /set parameter num_ctx 2048

Slow Generation Speed

Causes:

  • Model too large for available RAM
  • GPU not being utilized
  • Running on CPU only

Solutions:

  1. Check GPU is detected: ollama ps
  2. Use a smaller quantization
  3. Close other memory-intensive applications

Model Not Found

Solution: Pull the model first:

ollama pull deepseek-r1:7b

Garbled or Repetitive Output

Solution: Lower the temperature and raise the repetition penalty from inside the session:

ollama run deepseek-r1:14b
>>> /set parameter temperature 0.7
>>> /set parameter repeat_penalty 1.1

DeepSeek-R1 vs Other Local Models

How does DeepSeek-R1 compare to alternatives?

| Model | Reasoning | Coding | Speed | Size |
|---|---|---|---|---|
| DeepSeek-R1 (14B) | Excellent | Excellent | Medium | 8GB |
| Llama 3.1 (8B) | Good | Good | Fast | 4GB |
| Mistral (7B) | Good | Medium | Fast | 4GB |
| Qwen 2.5 (14B) | Very Good | Very Good | Medium | 8GB |

DeepSeek-R1 excels at complex reasoning and coding tasks. For lightweight general chat, Llama 3.1 will be faster. For a deeper comparison, see our DeepSeek vs ChatGPT coding benchmark.

What Can You Build?

With DeepSeek-R1 running locally, you can:

  1. Private Code Assistant: Review and generate code without sending proprietary code to the cloud
  2. Document Analyzer: Process sensitive documents locally
  3. Research Assistant: Analyze papers and data privately
  4. Learning Tool: Practice coding with instant feedback

For more ideas on local AI setups, check our guide to running local LLMs.

Conclusion

Running DeepSeek-R1 locally is easier than ever. With tools like Ollama and LM Studio, you can have a distilled version of a frontier reasoning model running on your laptop in minutes.

The key steps:

  1. Choose your method: Ollama for CLI, LM Studio for GUI
  2. Select the right model size: Start with 7B, upgrade as needed
  3. Optimize for your hardware: Adjust context length and quantization
  4. Integrate into your workflow: VS Code, terminal aliases, or API

Privacy, zero cost, and full control. That’s the promise of local AI, and DeepSeek-R1 delivers.


Ready to explore more AI tools? Check out our AI Agents Guide for 2026 to see how local models can power autonomous assistants.

#deepseek #local-llm #ollama #lm-studio #privacy #tutorial

About AI Tools Team

The official editorial team of AI Tools.