Privacy & Security Guide

Local and Private AI Models for Developers

Sometimes code genuinely cannot leave your machine — NDAs, air-gapped environments, regulated data, or simply a preference for zero network dependency. This guide covers the local AI model tooling available today, which models are worth running, and when to use local versus cloud.

Last reviewed: May 26 2026


When Local Makes Sense

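Local AI models make sense when one or more of these apply: an NDA or contract forbids sending code to third parties, you work in an air-gapped environment, the data is regulated, or you want zero network dependency.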

If none of these apply, a cloud model is almost always the better choice. Local models are meaningfully less capable than frontier models, and the gap has remained large even as local models have improved.


The Main Tools

Ollama — Developer-first CLI

Ollama is the easiest way to run local models. It downloads pre-quantized model builds and serves them through a simple CLI. It also exposes an OpenAI-compatible HTTP API under /v1, so tools built for the OpenAI API work with Ollama after a one-line base URL change.

# Install on Linux (on macOS, install the app from https://ollama.com/download or run: brew install ollama)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run a model
ollama pull codellama:13b
ollama run codellama:13b "Explain this function: ..."

# Start the API server manually if it is not already running (listens on http://localhost:11434)
ollama serve

The OpenAI-compatible endpoint means you can point the official OpenAI SDK at Ollama and use local models from existing code:

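// Assumes the server is running and the model was fetched with: ollama pull qwen2.5-coder:7b
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',  // required by the library, not used by Ollama
});

const response = await client.chat.completions.create({
  model: 'qwen2.5-coder:7b',
  messages: [{ role: 'user', content: 'Review this code...' }],
});

// Print the assistant's reply
console.log(response.choices[0].message.content);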
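
Because only the base URL, API key, and model name differ between a local and a hosted endpoint, the choice can live in configuration rather than in code. The sketch below is one hypothetical way to do that. The environment variable names (LLM_BACKEND, LLM_BASE_URL, LLM_API_KEY, LLM_MODEL) and the helper resolveClientConfig are illustrative assumptions, not part of Ollama or the OpenAI SDK.

```typescript
// Hypothetical helper: pick a local Ollama server or a hosted
// OpenAI-compatible API from environment variables.
// All variable names below are assumptions made for this sketch.

interface ClientConfig {
  baseURL: string;
  apiKey: string;
  model: string;
}

type Env = Record<string, string | undefined>;

function resolveClientConfig(env: Env): ClientConfig {
  if (env.LLM_BACKEND === 'local') {
    return {
      baseURL: env.LLM_BASE_URL ?? 'http://localhost:11434/v1',
      apiKey: 'ollama', // placeholder; Ollama does not check it
      model: env.LLM_MODEL ?? 'qwen2.5-coder:7b',
    };
  }

  // Hosted path: refuse to guess credentials or a model name.
  const apiKey = env.LLM_API_KEY;
  const model = env.LLM_MODEL;
  if (!apiKey || !model) {
    throw new Error('LLM_API_KEY and LLM_MODEL must be set unless LLM_BACKEND is "local"');
  }
  return {
    baseURL: env.LLM_BASE_URL ?? 'https://api.openai.com/v1',
    apiKey,
    model,
  };
}
```

In application code, the baseURL and apiKey fields would be passed to new OpenAI(...) and the model field to chat.completions.create. Throwing on a missing key in the hosted path is a deliberate choice here, so that a misconfigured machine fails loudly instead of silently sending code to a cloud endpoint.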

LM Studio — GUI with model browser

LM Studio provides a desktop GUI for browsing, downloading, and running models. It's a good choice if you want to compare models quickly or prefer not to work in a terminal. It also exposes a local HTTP server with an OpenAI-compatible API.

Jan — Privacy-first, offline-first

Jan is open-source and explicitly designed for fully offline use. It has a clean UI, a model hub, and works on Mac, Windows, and Linux. Good choice for teams that want a user-facing local AI tool with no cloud dependency at all.


Which Models to Run

Local model quality depends heavily on your hardware. A rule of thumb: you need roughly 6 GB of VRAM to run a 7B-parameter model comfortably, 12–16 GB for 13B, and 24+ GB for larger models. CPU inference works but is significantly slower.
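
The rule of thumb can be approximated with simple arithmetic: memory for the weights is roughly parameter count times bits per weight divided by 8, plus some headroom. The sketch below assumes a flat 20% overhead for the KV cache and runtime buffers. That figure and the function name estimateVramGb are assumptions for illustration, not measurements.

```typescript
// Rough VRAM estimate for a quantized model.
// Assumptions (not from the guide): weights dominate memory use, and a
// flat 20% overhead stands in for the KV cache and runtime buffers.
function estimateVramGb(
  paramsBillions: number,
  bitsPerWeight: number,
  overheadFraction: number = 0.2,
): number {
  // 1e9 parameters at (bits / 8) bytes each is (bits / 8) GB per billion.
  const weightsGb = (paramsBillions * bitsPerWeight) / 8;
  return weightsGb * (1 + overheadFraction);
}

const sizes = [
  { name: '7B', params: 7 },
  { name: '13B', params: 13 },
];

for (const { name, params } of sizes) {
  const q4 = estimateVramGb(params, 4).toFixed(1);
  const q8 = estimateVramGb(params, 8).toFixed(1);
  console.log(`${name}: ~${q4} GB at 4-bit, ~${q8} GB at 8-bit`);
}
```

Under these assumptions a 7B model lands between about 4 and 8 GB and a 13B model between about 8 and 16 GB, depending on quantization. Real usage also grows with context length, so treat the output as a lower bound rather than a guarantee.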

For Code Tasks

For General Reasoning and Chat

Quality Gap Is Real

Even the best local models lag behind Claude Sonnet on complex multi-step reasoning, long-context tasks, and subtle code review. For tasks where quality matters most, local models are a tradeoff, not a replacement.


Connecting VS Code to a Local Model

Most AI coding extensions that support custom API endpoints can point at an Ollama server. The pattern is the same across tools: set the base URL to http://localhost:11434/v1 and the model to whatever you're running.
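
For a tool or script with no SDK, the same pattern reduces to one POST against the chat completions path under that base URL. The sketch below only builds the request. The helper name buildChatRequest and its return shape are hypothetical, and it assumes the standard OpenAI chat completions payload, which Ollama's compatible endpoint accepts.

```typescript
// Hypothetical helper: build an OpenAI-style chat completion request
// for any OpenAI-compatible server, such as a local Ollama instance.

interface ChatRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

function buildChatRequest(baseUrl: string, model: string, prompt: string): ChatRequest {
  const root = baseUrl.replace(/\/+$/, ''); // tolerate a trailing slash
  return {
    url: `${root}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer ollama', // placeholder; Ollama does not check it
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
        stream: false, // ask for a single JSON response instead of a stream
      }),
    },
  };
}

const request = buildChatRequest(
  'http://localhost:11434/v1',
  'qwen2.5-coder:7b',
  'Explain this function: ...',
);
console.log(request.url);
```

Passing request.url and request.init to fetch sends the call. With a server running, the reply text is expected at choices[0].message.content in the JSON response, the same shape the SDK example reads.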

With Continue

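Continue is an open-source VS Code extension built for local model support. Edit ~/.continue/config.json (newer Continue releases may read ~/.continue/config.yaml instead, so check the docs for your version):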

{
  "models": [
    {
      "title": "Qwen2.5 Coder (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder",
    "provider": "ollama",
    "model": "deepseek-coder:6.7b-base"
  }
}

With Cursor

Cursor supports custom OpenAI-compatible endpoints under Settings → Models → Add model. Set the base URL to your Ollama server and the model name to whatever you've pulled. Cursor's full context-aware features work best with its own models, but local models work for basic chat and inline edits.


Handling Slow Inference

Local model inference is slower than API calls, especially on CPU. A few approaches for managing this:


Private Cloud as a Middle Option

If your constraint is "no third-party SaaS" but you have cloud infrastructure, private deployment is another path:

Private cloud gives you frontier model quality with infrastructure-level control — at higher infrastructure cost than API calls, but often lower total cost than local hardware at scale.


Local Model Decision Guide
