Overview
Ultra-fast LLM inference on custom hardware
Best for: Ultra-fast LLM inference (LPU chip); real-time chat; voice AI; low-latency applications; prototyping
At a glance
Pricing
- Difficulty
- Beginner-friendly
- Time to productivity
- 30 min
- Privacy
- High
- Learning curve
- Easy
Ideal for
Key capabilities
Works with
- Text
- code
Outputs
- OpenAI-compatible API responses
- streaming (ultra-fast)
Mobile access
How to use Groq on phones and tablets.
- Mobile web: Works in a mobile browser (responsive or dedicated mobile site).
Free Tier
Free: generous free tier with rate limits; multiple models available; no credit card required
Limits: Free: 30 req/min (varies by model); paid: higher limits; Developer: $0; Enterprise: custom
When to upgrade: Higher rate limits; enterprise SLA; dedicated capacity; more models; production guarantees
Technical Details
Alternatives
GPT models, embeddings, whisper, TTS via API
Free: $5 initial credits (new accounts); rate-limited; GPT-4o-mini free tier
Claude models via API
Free: $5 initial credits; rate-limited; all Claude models accessible
Fast inference for open-source models
Free: $5 free credits for new accounts; pay-per-use after
Fast, cheap inference for open models
Free: $1 credit for new accounts; generous rate limits on free models