
Best Local AI Models for Low-End Hardware

Most guides to running local AI assume you have a powerful GPU and 32GB of RAM. But the majority of people don't. If you have an older laptop, a budget PC, or just 8GB of RAM, you can still run a capable local AI model — you just need to know which ones to pick. This guide covers the best options for low-end hardware, what performance to realistically expect, and how to get started for free.


What counts as low-end hardware?

For the purposes of this guide, low-end means:

  • 8GB RAM or less
  • No dedicated GPU, or a GPU with less than 4GB VRAM
  • A CPU-only setup (integrated graphics only)
  • An older laptop from 2018–2022

If this sounds like your machine, you can still run local AI — you just need smaller, more efficient models. The good news: smaller models have improved dramatically in the last two years and are genuinely useful for most everyday tasks.

Why RAM matters more than GPU for beginners

When you run a local AI model without a GPU, the model loads into your system RAM instead of GPU VRAM. This means:

  • 8GB RAM: Can run models up to around 3B parameters (tight — close other apps first)
  • 16GB RAM: Comfortable for 7B–8B models — the sweet spot for quality vs performance
  • 32GB RAM: Can run 13B–14B models with good speed

CPU-only inference is slower than GPU inference — expect 3–8 tokens per second on a modern CPU, which means a full paragraph-length response streams in over tens of seconds rather than appearing instantly. For background tasks like summarisation this is perfectly usable. For real-time conversation it can feel slow but workable.
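To get a feel for what those numbers mean in practice, here is a back-of-envelope estimate. The 250-token reply length and 5 tokens/sec rate are illustrative assumptions in the middle of the range above, not benchmarks:

```shell
# Rough wait-time estimate for a single CPU-only reply.
# Assumed numbers: a ~250-token answer at 5 tokens/sec.
TOKENS=250
RATE=5
echo "Estimated wait: $((TOKENS / RATE)) seconds"
# Prints: Estimated wait: 50 seconds
```

Halve the token count (shorter answers) or double the rate (faster CPU) and the wait drops proportionally — which is why short, focused prompts feel much snappier on low-end hardware.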

Recommended models by hardware tier

Tier 1

8GB RAM, no GPU (or integrated graphics only)

Recommended: llama3.2:3b

  • Size: ~2GB download
  • Speed: 4–8 tokens/sec on modern CPU
  • Quality: Good for summarisation, Q&A, classification tasks
  • Best for: Auto-Sort in AI Chat Importer, simple chat, quick questions

Alternative: phi3:mini (Microsoft Phi-3 Mini)

  • Size: ~2.3GB download
  • Particularly strong at reasoning tasks relative to its size
  • Worth trying if llama3.2:3b's answers miss the mark on reasoning-heavy tasks

To download, open a terminal and run:

ollama pull llama3.2:3b
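Once the download finishes, you can sanity-check the model from the same terminal. The prompt text here is just an example — any short question works:

```shell
# Quick smoke test: ask the freshly downloaded model a one-off question.
ollama run llama3.2:3b "In one sentence, why do smaller models suit 8GB machines?"
```

If you get a coherent answer within a minute or so, your machine can handle this tier.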

Tier 2

16GB RAM, no GPU or weak GPU (2–4GB VRAM)

Recommended: llama3.1:8b

  • Size: ~4.7GB download
  • Speed: 6–12 tokens/sec on CPU; faster with even a modest GPU
  • Quality: Noticeably better than 3B models — handles complex instructions well
  • Best for: Auto-Sort with better accuracy, longer documents, more nuanced tasks

Alternative: mistral:7b

  • Similar size and speed to llama3.1:8b
  • Strong general-purpose model, good instruction following

To download:

ollama pull llama3.1:8b

Tier 3

16GB+ RAM with a GPU (4GB+ VRAM)

Recommended: llama3.1:8b (GPU-accelerated)

  • With GPU acceleration, the same 8B model runs at 20–40 tokens/sec — much more responsive
  • Ollama detects your GPU automatically and uses it if available

If you have 8GB+ VRAM: llama3.1:8b fits entirely on GPU for maximum speed, and you have headroom to try larger 13B–14B models for even better quality.
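To confirm Ollama is actually using your GPU, check where the loaded model ended up. This is a quick sketch — in recent Ollama versions, `ollama ps` lists loaded models and shows whether they run on CPU, GPU, or a split of both:

```shell
# Load the model, then inspect how it was placed.
ollama run llama3.1:8b "hello" >/dev/null
ollama ps
# The PROCESSOR column shows e.g. "100% GPU", "100% CPU",
# or a split such as "48%/52% CPU/GPU" for partial offload.
```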

Tips for getting the best performance on low-end hardware

  • Close other apps before running. Local AI models need all the RAM they can get. Close your browser, Slack, and anything else you don't need.
  • Use smaller context windows. Longer conversations use more RAM. Keep sessions focused.
  • Let it warm up. The first response after loading a model is always slower — subsequent responses are faster.
  • Don't use quantization levels below Q4. Very aggressive compression (Q2, Q3) makes models faster but noticeably less accurate. Q4 is the right balance for most use cases.
  • Try CPU offloading. If you have a small GPU (2–4GB VRAM), Ollama can split the model between GPU and RAM automatically — you get some GPU acceleration without needing the full model to fit.
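The context-window tip above can be applied per request. If you call Ollama's local HTTP API directly, the `num_ctx` option caps how much conversation the model keeps in memory — a smaller value means a smaller KV cache and less RAM used. A sketch, assuming Ollama is running on its default port (11434):

```shell
# Request a completion with a reduced 2048-token context window.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "In one sentence, what is quantization?",
  "stream": false,
  "options": { "num_ctx": 2048 }
}'
```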

Using local models with AI Chat Importer

AI Chat Importer's Auto-Sort feature uses a local Ollama model to automatically classify your conversations into folders. It works on low-end hardware — you just need to pick the right model.

Recommended for low-end machines: llama3.2:3b — fast enough to process batches of conversations in a reasonable time, and accurate enough for folder classification.

If you have 16GB RAM: Use llama3.1:8b for noticeably better classification accuracy — this is the recommended default.

Auto-Sort sends conversations to Ollama in batches, so even on a slow CPU the process runs in the background without blocking the app. A batch of 50 conversations takes around 5–10 minutes on a CPU-only machine with the 3B model — slow but fully functional.
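That batch estimate is easy to sanity-check yourself. The ~8 seconds per conversation below is an assumption near the middle of the range quoted above, not a measured figure:

```shell
# Estimate total batch time on a CPU-only machine.
CONVOS=50
SECS_EACH=8   # assumed per-conversation time with the 3B model
TOTAL=$((CONVOS * SECS_EACH))
echo "~$((TOTAL / 60)) minutes for $CONVOS conversations"
# Prints: ~6 minutes for 50 conversations
```

Scale CONVOS up to your own archive size to decide whether to run Auto-Sort over a coffee break or overnight.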

See the Ollama setup guide for full installation and configuration instructions.

Ready to try Auto-Sort on your machine?