We just got the biggest open-source model running on Apple Silicon.
Without further ado, here are the results of running DeepSeek V3 (671B) on an 8 x M4 Pro 64GB Mac Mini cluster (512GB total memory):
| Model | Time-To-First-Token (TTFT), seconds | Tokens-Per-Second (TPS) |
|---|---|---|
| DeepSeek V3 671B (4-bit) | 2.91 | 5.37 |
| Llama 3.1 405B (4-bit) | 29.71 | 0.88 |
| Llama 3.3 70B (4-bit) | 3.14 | 3.89 |

Wait, DeepSeek has 671B parameters and runs faster than Llama 70B?
Yes!
Let me explain…