How to: Run DeepSeek on Mac, Windows, and Linux!

This is a very quick guide; you just need to:

  • Download LM Studio: https://lmstudio.ai/
  • Click on search
  • Type DeepSeek, then select the one you want* and click download
  • It should then show in your models

*When deciding which one to choose, look at the params (parameter count): the bigger the model, the beefier the machine you’ll need. I downloaded the 7B model and it runs fine on my Mac.

  • You can then start a new chat and select DeepSeek at the top.

That’s it :023:
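
If you’d rather drive the model from code than from the chat window, LM Studio can also serve whatever model you’ve loaded over a local OpenAI-compatible API (you start the server from within LM Studio; port 1234 is its default). Here’s a minimal sketch using only the Python standard library; the port and the model identifier are assumptions, so check what your LM Studio instance actually reports:

```python
import json
import urllib.request

# LM Studio can serve the loaded model over an OpenAI-compatible API.
# Port 1234 is its default; the model name below is a placeholder, so
# list the real identifiers with GET http://localhost:1234/v1/models.
URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "deepseek-r1-distill-qwen-7b",  # hypothetical id; check /v1/models
    "messages": [
        {"role": "user", "content": "Explain what you can do in one paragraph."}
    ],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Standard OpenAI response shape: choices[0].message.content
print(body["choices"][0]["message"]["content"])
```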


I was able to run the 32B model on a 4090. It was pretty responsive. I would love to get my hands on a bunch of Mac Mini M4 Pros to compare the 4090 with a cluster of them.


Nice! I managed to run the 32B model too, albeit more slowly. I haven’t really noticed much difference in the quality of the output though, what about you?

Like you, I’d be very interested in seeing how well the Mac Mini cluster works!

I’ll also be curious to see how well the new Mac Studios perform when they come out this year - I expect Apple might be testing them already!!

I haven’t tried the smaller models, so I can’t really compare them with the 32B. I also tried the 70B model and it was slightly better than the 32B, but it was too slow for me to run for anything serious. (I suppose a 64GB Mac Mini M4 Max should be able to run the 70B model.)
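
A rough sanity check on that (a sketch, under the common assumption that a ~4-bit quantized model needs about half a byte per parameter, plus ~20% overhead for the KV cache and runtime):

```python
# Back-of-envelope estimate for a 70B model at ~4-bit quantization.
params = 70e9                            # 70B parameters
weights_bytes = params * 0.5             # ~0.5 bytes per parameter at ~4-bit
total_gib = weights_bytes * 1.2 / 2**30  # +20% for KV cache and runtime
print(f"~{total_gib:.0f} GiB")           # ~39 GiB, so 64GB of unified memory is plausible
```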


Try the 7B model too if you can, Paul - I’d be curious how the three of them differ on your 4090 in terms of speed/results…

I’ve never run an open-source AI model on my local machine. Which model will run smoothly on my M1 MacBook Air without any issues?


Try the 7B model :023:


Thank you! :slight_smile:


@AstonJ

Prompt: how can I derive a grammar for expressions like 2+3, 4-5 and similar?

Timings and memory usage for all the models that fit on my 4090 GPU:

deepseek-r1:1.5b

total duration:       3.234323324s
load duration:        9.759262ms
prompt eval count:    24 token(s)
prompt eval duration: 11ms
prompt eval rate:     2181.82 tokens/s
eval count:           1050 token(s)
eval duration:        3.212s
eval rate:            326.90 tokens/s
memory usage:         1973MiB / 24564MiB

deepseek-r1:7b

total duration:       7.626323748s
load duration:        9.493629ms
prompt eval count:    24 token(s)
prompt eval duration: 17ms
prompt eval rate:     1411.76 tokens/s
eval count:           1113 token(s)
eval duration:        7.598s
eval rate:            146.49 tokens/s
memory usage:         5625MiB / 24564MiB

deepseek-r1:8b

total duration:       15.457397942s
load duration:        10.024962ms
prompt eval count:    24 token(s)
prompt eval duration: 92ms
prompt eval rate:     260.87 tokens/s
eval count:           2131 token(s)
eval duration:        15.354s
eval rate:            138.79 tokens/s
memory usage:         6507MiB / 24564MiB

deepseek-r1:14b

total duration:       14.432098376s
load duration:        9.027491ms
prompt eval count:    24 token(s)
prompt eval duration: 104ms
prompt eval rate:     230.77 tokens/s
eval count:           1135 token(s)
eval duration:        14.317s
eval rate:            79.28 tokens/s
memory usage:         10941MiB / 24564MiB

deepseek-r1:32b

total duration:       55.242498456s
load duration:        9.446347ms
prompt eval count:    25 token(s)
prompt eval duration: 124ms
prompt eval rate:     201.61 tokens/s
eval count:           2173 token(s)
eval duration:        55.108s
eval rate:            39.43 tokens/s
memory usage:         21853MiB / 24564MiB

I don’t have a benchmark to compare against, or even a way to measure the accuracy of the answers, which is why I’ve only included the timings and memory usage; at least it gives you an idea of the resources each model uses.

As mentioned, the 70B model doesn’t fit on my GPU, so I didn’t measure it.
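
In case anyone wants to reproduce these numbers: the stats above look like `ollama run --verbose` output. If that’s what you’re using, the same fields are also available from Ollama’s REST API (default port 11434; the duration fields come back in nanoseconds). A minimal Python sketch; adjust the model tags to whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

# Ollama's generate endpoint; every *_duration field in the response
# is in nanoseconds. Assumes Ollama is running on its default port.
URL = "http://localhost:11434/api/generate"
PROMPT = "how can I derive a grammar for expressions like 2+3, 4-5 and similar?"
NS = 1e9  # nanoseconds per second

def bench(model: str) -> None:
    payload = {"model": model, "prompt": PROMPT, "stream": False}
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        r = json.load(resp)

    print(model)
    print(f"  total duration: {r['total_duration'] / NS:.3f}s")
    print(f"  load duration:  {r['load_duration'] / NS:.3f}s")
    print(f"  eval count:     {r['eval_count']} token(s)")
    print(f"  eval rate:      {r['eval_count'] / (r['eval_duration'] / NS):.2f} tokens/s")

# The same tags as above; run `ollama list` to see what you have locally.
for tag in ["deepseek-r1:1.5b", "deepseek-r1:7b", "deepseek-r1:8b",
            "deepseek-r1:14b", "deepseek-r1:32b"]:
    bench(tag)
```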


This is just what I was looking for. Thank you!


This is quite interesting, I will definitely try this on my Linux machine.
