You Already Use APIs
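When you type a message in Claude and get a response, here's what actually happens: your message is sent over the internet to Anthropic's servers, a model processes it there, and the response is sent back to your screen.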
That's an API in action. The chat interface you see is just the front end — the pretty part. The actual AI work happens on a remote server, and the API is the bridge between the two.
This same pattern applies to every AI tool you use. ChatGPT, Gemini, Midjourney, AI features in Notion or Canva — they all work this way. A user interface sends a request to an AI model running on someone else's servers, and the response comes back.
An API is a way for one piece of software to talk to another. Your browser talks to Anthropic's servers. A mobile app talks to OpenAI's servers. A website talks to a translation service. The conversation follows a specific format — "I'm sending you this, please send me back that" — and both sides agree on the format in advance.
Why This Matters
Understanding APIs matters for a practical reason: the AI doesn't live inside the app. It lives on someone else's server, and your app just talks to it.
This has real consequences:
- You need an internet connection. If you're offline, the API call can't reach the server. Most AI features stop working without internet.
- There's a cost. Every API call costs money to run. When you use Claude.ai on a free or paid plan, that cost is absorbed by Anthropic or bundled into your subscription. When developers build apps that use the API directly, they pay for each request based on the tokens it uses.
- There's a delay. Your message travels to a server, gets processed, and the response travels back. That's why AI responses take a few seconds: part of the wait is network travel, and much of it is the model generating the response on the server.
- Your data leaves your device. When you send a message to an AI API, that text travels to an external server. This is why companies care about AI data policies — sensitive information is leaving the building.
- The AI can be updated without you knowing. The model on the server can be changed, improved, or replaced. You might notice responses getting better (or different) over time — that's a model update on the server side.
What an API Call Looks Like
When a developer builds an app that uses AI, the API call is a structured message. It's not a chat conversation — it's a precise request with a precise format.
Here's a simplified version of what happens when an app asks Claude to summarize something:
    To: api.anthropic.com
    Method: POST

    {
      "model": "claude-sonnet-4-20250514",
      "messages": [
        {
          "role": "user",
          "content": "Summarize this in 2 sentences: [article text here]"
        }
      ]
    }
The request specifies which model to use, what role the message has (user, in this case), and the actual content. It also includes an API key (not shown) that identifies who's making the request and handles billing.
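As a rough sketch, here is how a developer might assemble that request in Python using only the standard library. The endpoint path `/v1/messages`, the `x-api-key` and `anthropic-version` headers, and the `max_tokens` field are taken from Anthropic's public API documentation rather than from the simplified example above. The function name `build_request` and the placeholder key are illustrative. The code builds the request but does not send it.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key: str, article_text: str) -> urllib.request.Request:
    """Assemble (but do not send) a summarization request."""
    body = {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 200,  # cap on response length; the real API expects this field
        "messages": [
            {
                "role": "user",
                "content": f"Summarize this in 2 sentences: {article_text}",
            }
        ],
    }
    headers = {
        "x-api-key": api_key,  # identifies the caller and ties the call to billing
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_request("YOUR_API_KEY", "[article text here]")
print(request.get_method(), request.full_url)
```

With a real key, passing the result to `urllib.request.urlopen(request)` would send it. Most developers use the provider's official client library instead, which wraps these same steps.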
The response that comes back uses the same kind of structured format:

    {
      "content": [
        {
          "type": "text",
          "text": "The article discusses how remote work has changed team communication patterns. Most teams now rely on asynchronous tools rather than real-time meetings."
        }
      ],
      "usage": {
        "input_tokens": 1247,
        "output_tokens": 38
      }
    }
The response includes the AI's answer and a token count — which is how billing works. More text in (your prompt) and more text out (the response) means higher cost.
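An app then has to pull the answer and the token counts out of that response. The following is a minimal sketch that assumes the response has exactly the shape shown above. The helper name `read_response` is made up for illustration.

```python
import json

# Sample response body, copied from the example above.
raw_response = """
{
  "content": [
    {
      "type": "text",
      "text": "The article discusses how remote work has changed team communication patterns. Most teams now rely on asynchronous tools rather than real-time meetings."
    }
  ],
  "usage": {"input_tokens": 1247, "output_tokens": 38}
}
"""


def read_response(raw: str) -> tuple[str, int, int]:
    """Return (answer text, input tokens, output tokens) from a response body."""
    data = json.loads(raw)
    # Join every text block, since "content" is a list rather than one string.
    answer = "".join(
        block["text"] for block in data["content"] if block["type"] == "text"
    )
    usage = data["usage"]
    return answer, usage["input_tokens"], usage["output_tokens"]


answer, tokens_in, tokens_out = read_response(raw_response)
print(answer)
print(f"{tokens_in} tokens in, {tokens_out} tokens out")
```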
If you're not a developer, you don't need to write these requests yourself. But knowing this structure explains a lot about how AI tools work — and why they cost what they cost.
Tokens and Pricing
AI APIs charge by "tokens" — roughly, pieces of words. A token is about 4 characters or three-quarters of a word. The sentence "How does photosynthesis work?" is about 7 tokens.
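The four-characters-per-token rule of thumb can be written as a one-line estimate. This is only a heuristic sketch. Real tokenizers differ between models and will give somewhat different counts, and the function name `estimate_tokens` is illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return round(len(text) / 4)


# 29 characters / 4 = 7.25, which rounds to 7
print(estimate_tokens("How does photosynthesis work?"))
```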
Pricing works in two directions:
- Input tokens — What you send to the AI. Your prompt, your context, any code or documents you paste. The more context you provide, the more input tokens you use.
- Output tokens — What the AI sends back. A short answer costs less than a long one. This is why some tools limit response length.
This is why AI subscriptions exist. When you pay $20/month for Claude Pro, you're essentially pre-paying for a pool of API calls. Anthropic handles the per-token billing on their end so you don't have to think about it.
For developers building apps, the economics matter directly. An app that sends large documents to AI for summarization uses a lot of input tokens. An app that generates long responses uses a lot of output tokens. Designing prompts efficiently — getting the same quality result with fewer tokens — is a real cost optimization skill.
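To make the arithmetic concrete, here is a small cost estimate. The two prices are hypothetical placeholders chosen for illustration, not any provider's actual rates. The token counts are the ones from the summarization response shown earlier.

```python
# Hypothetical prices in dollars per million tokens. Check the provider's
# current price list before relying on any number here.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one API call under the assumed prices."""
    input_cost = input_tokens * INPUT_PRICE_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MILLION / 1_000_000
    return input_cost + output_cost


# Token counts from the summarization example: 1247 in, 38 out.
cost = estimate_cost(1247, 38)
print(f"${cost:.6f}")
```

Under these assumed prices a single call costs well under a cent, which is why the economics only become visible at scale: the same call made a million times would cost a few thousand dollars.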
Different AI APIs for Different Tasks
Not every AI API does the same thing. Different providers offer different capabilities:
- Text generation — Claude (Anthropic), GPT (OpenAI), Gemini (Google). You send text, you get text back. Powers chatbots, summarizers, code generators, and writing tools.
- Image generation — DALL-E (OpenAI), Stable Diffusion (Stability AI). You send a text description, you get an image back.
- Speech to text — Whisper (OpenAI). You send audio, you get a text transcription back.
- Text to speech — ElevenLabs, OpenAI TTS. You send text, you get audio back.
- Embeddings — A more technical API that converts text into numbers. Used for search, recommendation systems, and finding similar content.
Many modern apps combine multiple APIs. A meeting notes app might use speech-to-text to transcribe the audio, then a text generation API to summarize the transcription, then an embedding API to make the notes searchable. Three API calls, three different capabilities, one user-facing feature.
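The meeting notes example can be sketched as a chain of three steps. The stage functions below are stand-ins that return canned values instead of calling real services, and every name here is hypothetical. In a real app each one would wrap a call to a different API.

```python
from typing import Callable


def process_meeting(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], str],
    embed: Callable[[str], list[float]],
) -> dict:
    """Run one recording through three stages, each backed by its own API."""
    transcript = transcribe(audio)   # API call 1: speech to text
    summary = summarize(transcript)  # API call 2: text generation
    vector = embed(summary)          # API call 3: embeddings, for search
    return {"transcript": transcript, "summary": summary, "vector": vector}


# Stand-in stages so the sketch runs without any network access.
def fake_transcribe(audio: bytes) -> str:
    return "We agreed to ship the report on Friday."


def fake_summarize(text: str) -> str:
    return "Report ships Friday."


def fake_embed(text: str) -> list[float]:
    # Not a real embedding, just two numbers derived from the text.
    return [float(len(text)), float(text.count(" "))]


notes = process_meeting(b"raw audio bytes", fake_transcribe, fake_summarize, fake_embed)
print(notes["summary"])
```

Passing the stages in as functions keeps each capability separate, so one provider can be swapped for another without touching the rest of the pipeline.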
What This Means for Things You Build
If you're building with AI — whether as a developer or a vibe coder — understanding APIs changes how you think about what's possible:
- You can add AI to anything. A website, a spreadsheet workflow, a Slack bot, an email automation. If the tool can make an HTTP request, it can talk to an AI API.
- You don't need your own AI model. The models are already built and running on someone else's servers. You just need to send the right request.
- Context matters for cost and quality. Sending your entire codebase as context can produce better results but costs more tokens. Sending a focused snippet is cheaper and often just as effective. This trade-off shows up in every AI-powered tool.
- Reliability is not guaranteed. APIs can be slow, return errors, or go down temporarily. Any app that depends on an AI API needs to handle the case where the API doesn't respond. This is why AI features sometimes show "something went wrong" messages.
- You can switch providers. If you're using OpenAI's API and want to try Claude, the structure is similar. The exact format differs, but the concept is the same: send a message, get a response. This is why many AI tools let you choose which model to use.
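One common way to handle the reliability point above is to retry a failed call a few times, waiting longer between each attempt. The sketch below is one possible approach, not a prescribed one. The names `call_with_retries` and `flaky_api_call` are illustrative, and real code would usually retry only on errors known to be temporary rather than on every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on failure with waits that double each time."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the app show its error message
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Demonstration with a stand-in call that fails twice before succeeding.
outcomes = iter([ConnectionError("timeout"), ConnectionError("timeout"), "ok"])


def flaky_api_call() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waits: list[float] = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_api_call, sleep=waits.append)
print(result, waits)
```

If every attempt fails, the last error is raised to the caller, which is the point where an app would show its "something went wrong" message.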
Every AI feature you've ever used is an API call: your input goes to a remote server, a model processes it, and the result comes back. The chat interface, the mobile app, the browser extension — they're all just different front ends for the same pattern. Understanding this one concept demystifies the entire AI tool landscape.