DeepSeek (671B) running on a cluster of 8 Mac Mini Pros with 64GB RAM each

We just got the biggest open-source model running on Apple Silicon.

Without further ado, here are the results running DeepSeek V3 (671B) on an 8 x M4 Pro 64GB Mac Mini cluster (512GB total memory):

| Model | Time-To-First-Token (TTFT), seconds | Tokens-Per-Second (TPS) |
|---|---|---|
| DeepSeek V3 671B (4-bit) | 2.91 | 5.37 |
| Llama 3.1 405B (4-bit) | 29.71 | 0.88 |
| Llama 3.3 70B (4-bit) | 3.14 | 3.89 |
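A quick sanity check on why the cluster needs 512GB in the first place: at 4-bit quantization, each parameter takes half a byte. A back-of-envelope sketch (weights only; it ignores activations, KV cache, and runtime overhead, so the real footprint is somewhat higher):

```python
def quantized_weight_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9

deepseek = quantized_weight_gb(671e9, 4)  # ~335 GB
cluster_gb = 8 * 64                       # 8 Mac Minis with 64 GB each

print(f"DeepSeek V3 4-bit weights: ~{deepseek:.0f} GB of {cluster_gb} GB available")
```

So the 4-bit weights alone take roughly 335GB, which is why this model needs the full cluster rather than a single 64GB machine.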

Wait, DeepSeek has 671B parameters and runs faster than Llama 70B?

Yes!

Let me explain…
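The short version: DeepSeek V3 is a Mixture-of-Experts model, so each token only passes through a small subset of the 671B parameters (~37B active per token, per the DeepSeek V3 technical report), while a dense model like Llama 70B touches every parameter for every token. A hedged sketch of that arithmetic, using ~2 FLOPs per active parameter per decoded token as a rough proxy (this ignores memory bandwidth, expert routing, and inter-Mac communication, all of which matter in practice):

```python
# Per-token decode cost scales with *active* parameters, not total parameters.
# MoE active-parameter figure (~37B) is from the DeepSeek V3 technical report;
# dense models activate all of their parameters on every token.
models = {
    "DeepSeek V3 671B (MoE)": {"total_b": 671e9, "active_b": 37e9},
    "Llama 3.3 70B (dense)":  {"total_b": 70e9,  "active_b": 70e9},
    "Llama 3.1 405B (dense)": {"total_b": 405e9, "active_b": 405e9},
}

# Rough proxy: ~2 FLOPs per active parameter per decoded token.
flops_proxy = {name: 2 * m["active_b"] for name, m in models.items()}

for name, f in flops_proxy.items():
    print(f"{name}: ~{f / 1e9:.0f} GFLOPs per token")
```

By this crude measure, DeepSeek V3 does about half the per-token compute of Llama 70B despite being nearly ten times larger, which is consistent with the TPS numbers in the table above.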