llama.cpp

by Open Source (Georgi Gerganov)

Overview

C/C++ LLM inference engine

Best for: Efficient local LLM inference; model quantization; embedded AI; research; foundation for other tools

Pricing

Free

Ideal for

ML engineersC/C++ developersembedded systems developersresearchersperformance optimizers

Works with

Outputs

How to use llama.cpp on phones and tablets.

Completely free and open-source (MIT)

Limits: None (hardware-limited)

When to upgrade: N/A (fully free)

Type: local

Offline: Yes

API: Yes

Languages: Multilingual (depends on model)

Integrations: Foundation for Ollama, LM Studio, GPT4All; Python bindings; server mode (OpenAI-compatible)

by Open Source

OpenAI API-compatible local inference

FreeAdvancedLocal & Open-Source AIWeb

Free: Completely free and open-source (MIT)