Running Ollama without a GPU

tl;dr: You can run Ollama on an older device, but the responses will be slow and/or low quality.

I have successfully run Ollama on a new MacBook with an M2 chip and on a mid-range gaming PC, but I wanted to experiment with an older computer. My personal laptop is a 2017 Lenovo Yoga running Ubuntu with no dedicated graphics card. Here are the relevant specs:

  • CPU: Intel Core i5-7200U @ 2.50GHz
  • RAM: 4 GB
  • Storage: 128 GB SSD

Following the setup instructions for Linux, I got Ollama installed without issue, but the installer printed the following warning:

WARNING: No NVIDIA GPU detected. Ollama will run in CPU-only mode.

This was foreshadowing for everything to follow.
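
For anyone following along, the Linux setup I used boils down to the one-line script from the official instructions (assuming it hasn't changed since I ran it):

# downloads and installs Ollama, and sets up a systemd service on most distros
curl -fsSL https://ollama.com/install.sh | sh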

I decided to run mistral and sent the model a prompt from the terminal. The response streamed in at about one character every four seconds, and my computer was obviously struggling with the task. I held on for around three minutes, then gave up when the response was still less than one sentence.
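
If you want to reproduce this, running a model is a single command; the first run also downloads the weights, and the default Mistral download is several gigabytes, which is roughly all the RAM this laptop has. The prompt below is just an example:

# downloads the model on first use, then opens an interactive prompt
ollama run mistral

# or pass a prompt directly and stream the response to the terminal
ollama run mistral "Explain what a mutex is in one short paragraph."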

That seemed conclusive enough, but I decided to try a smaller model. I ran tinyllama and got responses streaming at a normal rate. However, the quality of the responses was terrible. I haven't used this model on a more powerful machine, so I can't say how much of that is the model itself versus my hardware, but the output was bad enough that I wouldn't use it for anything on this device.
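
Swapping models is just a different name on the same command, and ollama list shows how large each downloaded model is, which is a rough proxy for whether it will fit in RAM:

# tinyllama is one of the smallest models Ollama offers
ollama run tinyllama

# list downloaded models with their sizes on disk
ollama list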

These results were disappointing, but the reality is that running LLMs is resource intensive. So if you want to try running LLMs locally with Ollama, make sure you have a modern CPU or a dedicated GPU.
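
If you're not sure whether Ollama found a GPU on your machine, the startup logs make it obvious. On a standard Linux install (which runs Ollama as a systemd service named ollama) something like this will surface the same warning I saw, and newer versions can also report CPU vs GPU usage per loaded model:

# check the server's startup log for GPU detection messages
journalctl -u ollama --no-pager | grep -i gpu

# on newer Ollama versions, shows whether loaded models run on CPU or GPU
ollama ps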