Best Local AI Models for Low-End Hardware
Most guides to running local AI assume you have a powerful GPU and 32GB of RAM. But the majority of people don't. If you have an older laptop, a budget PC, or just 8GB of RAM, you can still run a capable local AI model — you just need to know which ones to pick. This guide covers the best options for low-end hardware, what performance to realistically expect, and how to get started for free.
What counts as low-end hardware?
For the purposes of this guide, low-end means:
- 8GB RAM or less
- No dedicated GPU, or a GPU with less than 4GB VRAM
- A CPU-only setup (integrated graphics only)
- An older laptop from 2018–2022
If this sounds like your machine, you can still run local AI — you just need smaller, more efficient models. The good news: smaller models have improved dramatically in the last two years and are genuinely useful for most everyday tasks.
Why RAM matters more than GPU for beginners
When you run a local AI model without a GPU, the model loads into your system RAM instead of GPU VRAM. This means:
- 8GB RAM: Can run models up to around 3B parameters (tight — close other apps first)
- 16GB RAM: Comfortable for 7B–8B models — the sweet spot for quality vs performance
- 32GB RAM: Can run 13B–14B models with good speed
CPU-only inference is slower than GPU inference — expect 3–8 tokens per second on a modern CPU, which means responses take a few seconds to complete. For most tasks this is perfectly usable. For real-time conversation it can feel slow but workable.
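The RAM tiers above follow from simple arithmetic: at the common Q4 quantization, a model needs roughly half a gigabyte of RAM per billion parameters, plus overhead for context and the runtime. A rough sketch (the 0.5GB-per-billion and 1.5GB overhead figures are approximations, not exact requirements):

```shell
# Rough RAM estimate for a Q4-quantized model:
# ~0.5GB per billion parameters, plus ~1.5GB of
# context/runtime overhead (both figures approximate).
estimate_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.5 + 1.5 }'
}

estimate_gb 3   # ~3.0GB: fits in 8GB RAM with headroom
estimate_gb 8   # ~5.5GB: comfortable on 16GB
estimate_gb 14  # ~8.5GB: wants 16GB minimum, happier on 32GB
```

Treat the output as a sanity check, not a guarantee: actual usage varies with quantization level and context length.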
Recommended models by hardware tier
8GB RAM, no GPU (or integrated graphics only)
Recommended: llama3.2:3b
- Size: ~2GB download
- Speed: 4–8 tokens/sec on modern CPU
- Quality: Good for summarisation, Q&A, classification tasks
- Best for: Auto-Sort in AI Chat Importer, simple chat, quick questions
Alternative: phi3:mini (Microsoft Phi-3 Mini)
- Size: ~2.3GB download
- Particularly strong at reasoning tasks relative to its size
- Good fallback if llama3.2:3b feels too slow
To download, open a terminal and run:
ollama pull llama3.2:3b
16GB RAM, no GPU or weak GPU (2–4GB VRAM)
Recommended: llama3.1:8b
- Size: ~4.7GB download
- Speed: 6–12 tokens/sec on CPU; faster with even a modest GPU
- Quality: Noticeably better than 3B models — handles complex instructions well
- Best for: Auto-Sort with better accuracy, longer documents, more nuanced tasks
Alternative: mistral:7b
- Similar size and speed to llama3.1:8b
- Strong general-purpose model, good instruction following
To download:
ollama pull llama3.1:8b
16GB+ RAM with a GPU (4GB+ VRAM)
Recommended: llama3.1:8b (GPU-accelerated)
- With GPU acceleration, the same 8B model runs at 20–40 tokens/sec — much more responsive
- Ollama detects your GPU automatically and uses it if available
If you have 8GB+ VRAM: llama3.1:8b fits entirely on GPU for maximum speed. If you also have 32GB of system RAM, consider stepping up to a 13B–14B model for even better quality.
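To confirm whether Ollama actually picked up your GPU, check where a loaded model is running. The PROCESSOR column of `ollama ps` reports the CPU/GPU split:

```shell
# Load a model, then check how Ollama placed it.
ollama run llama3.1:8b "Say hello" > /dev/null

# The PROCESSOR column shows the split: "100% GPU",
# "100% CPU", or a mix like "41%/59% CPU/GPU" when the
# model is partially offloaded.
ollama ps
```

If you expected GPU and see "100% CPU", your GPU drivers may not be installed or the VRAM may be too small for any offload.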
Tips for getting the best performance on low-end hardware
- Close other apps before running. Local AI models need all the RAM they can get. Close your browser, Slack, and anything else you don't need.
- Use smaller context windows. Longer conversations use more RAM. Keep sessions focused.
- Let it warm up. The first response after loading a model is always slower — subsequent responses are faster.
- Don't use quantization levels below Q4. Very aggressive compression (Q2, Q3) makes models faster but noticeably less accurate. Q4 is the right balance for most use cases.
- Try CPU offloading. If you have a small GPU (2–4GB VRAM), Ollama can split the model between GPU and RAM automatically — you get some GPU acceleration without needing the full model to fit.
Using local models with AI Chat Importer
AI Chat Importer's Auto-Sort feature uses a local Ollama model to automatically classify your conversations into folders. It works on low-end hardware — you just need to pick the right model.
Recommended for low-end machines: llama3.2:3b — fast enough to process batches of conversations in a reasonable time, and accurate enough for folder classification.
If you have 16GB RAM: Use llama3.1:8b for noticeably better classification accuracy — this is the recommended default.
Auto-Sort sends conversations to Ollama in batches, so even on a slow CPU the process runs in the background without blocking the app. A batch of 50 conversations takes around 5–10 minutes on a CPU-only machine with the 3B model — slow but fully functional.
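That 5–10 minute figure is just throughput arithmetic. A back-of-envelope sketch (the per-conversation token count and the tokens/sec rates are illustrative assumptions, not measurements):

```shell
# Minutes to classify a batch: n conversations, each needing
# ~t tokens of model output, at r tokens/sec on CPU.
batch_minutes() {
  awk -v n="$1" -v t="$2" -v r="$3" 'BEGIN { printf "%.0f\n", n * t / (r * 60) }'
}

batch_minutes 50 40 6   # ~6 minutes, within the 5-10 minute range
batch_minutes 50 40 3   # ~11 minutes on a slower CPU
```

Prompt-processing time adds a little on top, but for short classification outputs the generation rate dominates.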
See the Ollama setup guide for full installation and configuration instructions.