Best Local AI Models for Low-End Hardware
Most guides to running local AI assume you have a powerful GPU and 32GB of RAM. But the majority of people don't. If you have an older laptop, a budget PC, or just 8GB of RAM, you can still run a capable local AI model — you just need to know which ones to pick. This guide covers the best options for low-end hardware, what performance to realistically expect, and how to get started for free.
What counts as low-end hardware?
For the purposes of this guide, low-end means:
- 8GB RAM or less
- No dedicated GPU, or a GPU with less than 4GB VRAM
- A CPU-only setup (integrated graphics only)
- An older laptop from 2018–2022
If this sounds like your machine, you can still run local AI — you just need smaller, more efficient models. The good news: smaller models have improved dramatically in the last two years and are genuinely useful for most everyday tasks.
Why RAM matters more than GPU for beginners
When you run a local AI model without a GPU, the model loads into your system RAM instead of GPU VRAM. This means:
- 8GB RAM: Can run models up to around 3B parameters (tight — close other apps first)
- 16GB RAM: Comfortable for 7B–8B models — the sweet spot for quality vs performance
- 32GB RAM: Can run 13B–14B models with good speed
CPU-only inference is slower than GPU inference — expect 3–8 tokens per second on a modern CPU, which means responses take a few seconds to complete. For most tasks this is perfectly usable. For real-time conversation it can feel slow but workable.
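The RAM tiers above follow from simple arithmetic: at the common Q4 quantization, a model needs roughly half a gigabyte of RAM per billion parameters, plus overhead for context and the runtime. A rough sketch (the 0.5GB-per-billion and 1.5GB overhead figures are approximations, not exact requirements):

```shell
# Rough RAM estimate for a Q4-quantized model:
# ~0.5GB per billion parameters, plus ~1.5GB of
# context/runtime overhead (both figures approximate).
estimate_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.5 + 1.5 }'
}

estimate_gb 3   # ~3.0GB: fits in 8GB RAM with headroom
estimate_gb 8   # ~5.5GB: comfortable on 16GB
estimate_gb 14  # ~8.5GB: wants 16GB minimum, happier on 32GB
```

Treat the output as a sanity check, not a guarantee: actual usage varies with quantization level and context length.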
Recommended models by hardware tier
8GB RAM, no GPU (or integrated graphics only)
Recommended: llama3.2:3b
- Size: ~2GB download
- Speed: 4–8 tokens/sec on modern CPU
- Quality: Good for summarisation, Q&A, classification tasks
- Best for: Auto-Sort in AI Chat Importer, simple chat, quick questions
Alternative: phi3:mini (Microsoft Phi-3 Mini)
- Size: ~2.3GB download
- Particularly strong at reasoning tasks relative to its size
- Good fallback if llama3.2:3b feels too slow
To download, open a terminal and run:
ollama pull llama3.2:3b
16GB RAM, no GPU or weak GPU (2–4GB VRAM)
Recommended: llama3.1:8b
- Size: ~4.7GB download
- Speed: 6–12 tokens/sec on CPU; faster with even a modest GPU
- Quality: Noticeably better than 3B models — handles complex instructions well
- Best for: Auto-Sort with better accuracy, longer documents, more nuanced tasks
Alternative: mistral:7b
- Similar size and speed to llama3.1:8b
- Strong general-purpose model, good instruction following
To download:
ollama pull llama3.1:8b
16GB+ RAM with a GPU (4GB+ VRAM)
Recommended: llama3.1:8b (GPU-accelerated)
- With GPU acceleration, the same 8B model runs at 20–40 tokens/sec — much more responsive
- Ollama detects your GPU automatically and uses it if available
If you have 8GB+ VRAM: llama3.1:8b fits entirely on GPU for maximum speed. If you also have 32GB of system RAM, consider stepping up to a 13B–14B model for even better quality.
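To confirm whether Ollama actually picked up your GPU, check where a loaded model is running. The PROCESSOR column of `ollama ps` reports the CPU/GPU split:

```shell
# Load a model, then check how Ollama placed it.
ollama run llama3.1:8b "Say hello" > /dev/null

# The PROCESSOR column shows the split: "100% GPU",
# "100% CPU", or a mix like "41%/59% CPU/GPU" when the
# model is partially offloaded.
ollama ps
```

If you expected GPU and see "100% CPU", your GPU drivers may not be installed or the VRAM may be too small for any offload.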
Tips for getting the best performance on low-end hardware
- Close other apps before running. Local AI models need all the RAM they can get. Close your browser, Slack, and anything else you don't need.
- Use smaller context windows. Longer conversations use more RAM. Keep sessions focused.
- Let it warm up. The first response after loading a model is always slower — subsequent responses are faster.
- Don't use quantization levels below Q4. Very aggressive compression (Q2, Q3) makes models faster but noticeably less accurate. Q4 is the right balance for most use cases.
- Try CPU offloading. If you have a small GPU (2–4GB VRAM), Ollama can split the model between GPU and RAM automatically — you get some GPU acceleration without needing the full model to fit.
Using local models with AI Chat Importer
AI Chat Importer's Auto-Sort feature uses a local Ollama model to automatically classify your conversations into folders. It works on low-end hardware — you just need to pick the right model.
Recommended for low-end machines: llama3.2:3b — fast enough to process batches of conversations in a reasonable time, and accurate enough for folder classification.
If you have 16GB RAM: Use llama3.1:8b for noticeably better classification accuracy — this is the recommended default.
Auto-Sort sends conversations to Ollama in batches, so even on a slow CPU the process runs in the background without blocking the app. A batch of 50 conversations takes around 5–10 minutes on a CPU-only machine with the 3B model — slow but fully functional.
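That 5–10 minute figure is just throughput arithmetic. A back-of-envelope sketch (the per-conversation token count and the tokens/sec rates are illustrative assumptions, not measurements):

```shell
# Minutes to classify a batch: n conversations, each needing
# ~t tokens of model output, at r tokens/sec on CPU.
batch_minutes() {
  awk -v n="$1" -v t="$2" -v r="$3" 'BEGIN { printf "%.0f\n", n * t / (r * 60) }'
}

batch_minutes 50 40 6   # ~6 minutes, within the 5-10 minute range
batch_minutes 50 40 3   # ~11 minutes on a slower CPU
```

Prompt-processing time adds a little on top, but for short classification outputs the generation rate dominates.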
See the Ollama setup guide for full installation and configuration instructions.