Chapter 03 · API Design · medium · 17 min read

The API Surface

Good API design is taste expressed in URLs, payloads, and status codes. You already know what bad feels like from the consumer side.

Study window · Weeks 09–12

You've consumed thousands of APIs. You know the pain of one that returns different shapes for the same thing, or makes you call it five times to render one screen. This chapter is about being on the other side of that and not inflicting the same pain on the next developer.

REST, done right

REST isn't a protocol. It's a set of conventions for using HTTP consistently. The core ideas:

  • Resources are nouns, named by URLs: /users, /orders/42.
  • Verbs are HTTP methods: GET reads, POST creates, PUT replaces, PATCH partially updates, DELETE removes.
  • GET is safe (no side effects) and idempotent (calling it many times has the same effect as calling it once). PUT and DELETE are idempotent but not safe. POST and PATCH are neither by default.

A clean set of routes for one resource looks like this:

GET     /users             list users
POST    /users             create a user
GET     /users/42          fetch one
PATCH   /users/42          partial update
DELETE  /users/42          remove
GET     /users/42/orders   orders belonging to user 42

The same resource, different methods, predictable behaviour.

Keep your error responses consistent too. The application/problem+json standard (RFC 9457, which replaced RFC 7807) gives you one shape for every error: a type URI the client can branch on, a title, a status, and a human-readable detail. One shape everywhere beats a different error body per endpoint.
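As a rough illustration of that single error shape, here is a minimal TypeScript sketch. The field names follow the problem-details format; the `problem` helper and the example.com type URI are hypothetical and only there for illustration.

```typescript
// Fields defined by the problem-details format (application/problem+json).
interface ProblemDetails {
  type: string;      // URI identifying the error class; clients branch on this
  title: string;     // short summary of the error class
  status: number;    // HTTP status code, repeated in the body
  detail?: string;   // human-readable explanation of this occurrence
  instance?: string; // URI identifying this specific occurrence
}

// Hypothetical helper: builds status, headers, and body for any error response.
function problem(status: number, type: string, title: string, detail?: string) {
  const body: ProblemDetails = { type, title, status };
  if (detail !== undefined) body.detail = detail;
  return {
    status,
    headers: { "Content-Type": "application/problem+json" },
    body: JSON.stringify(body),
  };
}

// Example: every endpoint returns errors through the same helper.
const notFound = problem(
  404,
  "https://api.example.com/problems/user-not-found", // assumed URI scheme
  "User not found",
  "No user with id 42.",
);
```

Routing every error through one helper like this is what keeps the shape identical across endpoints.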

On versioning, the most common option is putting the version in the URL (/v1/users). It's not clever, and that's the point. The real discipline isn't the strategy, it's the promise: once a version is public, you add fields but never remove them. Deprecate, announce a date, then retire.

REST vs GraphQL vs tRPC vs gRPC

REST

The default for public APIs. Wide tooling, cacheable at the HTTP layer, easy to poke at with curl. The downside is that clients often need several requests to assemble related data, or get more than they asked for.

GraphQL

Clients ask for exactly the fields they want from one endpoint. Solves over- and under-fetching when many clients have very different needs. The cost is server-side complexity: query-depth limits, the N+1 problem, per-field authorization, and harder caching.

tRPC

TypeScript only. Your server defines procedures with types, and your client calls them like local functions with full type safety and no code generation. Excellent for a monorepo where one team owns both ends. Not for public APIs, since the type safety only reaches TypeScript clients that can import your router's types.

gRPC

A binary protocol over HTTP/2 using Protobuf schemas. Fast and schema-enforced, the standard for service-to-service traffic at scale. Browsers can't speak it directly without extra tooling.

How to pick

REST for public and most internal APIs. tRPC for a TypeScript monorepo with one owning team. GraphQL when many client teams have very different data needs and you can afford the operational overhead. gRPC for service-to-service in a high-performance backend.

Authentication, from the protocol level

Authentication answers "who are you." Two approaches dominate.

Sessions. The server stores a session (user ID, expiry) and hands the client an opaque ID, usually in a cookie. The client sends it back each request; the server looks it up. Easy to revoke (delete the row), but the server has to store and look up every session.
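A minimal sketch of the session approach, assuming an in-memory Map stands in for the session table and that the function names (`createSession`, `lookupSession`, `revokeSession`) are hypothetical. A real deployment would keep sessions in a database or Redis so every server instance sees them.

```typescript
import { randomBytes } from "node:crypto";

interface Session {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

// Stand-in for a sessions table; an assumption made for this sketch.
const sessionStore = new Map<string, Session>();

// Create a session and return the opaque ID that would go in the cookie.
function createSession(userId: string, ttlMs: number, now: number = Date.now()): string {
  const id = randomBytes(32).toString("hex"); // unguessable, carries no meaning
  sessionStore.set(id, { userId, expiresAt: now + ttlMs });
  return id;
}

// Look the ID up on each request; expired sessions are treated as missing.
function lookupSession(id: string, now: number = Date.now()): Session | undefined {
  const session = sessionStore.get(id);
  if (!session) return undefined;
  if (session.expiresAt <= now) {
    sessionStore.delete(id);
    return undefined;
  }
  return session;
}

// Revocation is just deleting the row.
function revokeSession(id: string): void {
  sessionStore.delete(id);
}
```

The lookup on every request is the cost; the one-line revocation is the benefit.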

JWT. A self-contained token: header, payload, signature. The payload holds user info, the signature proves the server issued it. The server verifies it without storing anything. That's the strength and the weakness, because a JWT can't be revoked before it expires unless you build a blocklist, which removes the statelessness. The usual pattern is a short-lived access token (15 minutes) plus a long-lived refresh token you can rotate and revoke.
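To make the header, payload, and signature structure concrete, here is a hand-rolled HS256 sketch using Node's built-in crypto module. It is for illustration only: the function names are hypothetical, it handles just the `exp` claim, and production code should use a vetted library rather than something like this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const toBase64Url = (text: string): string => Buffer.from(text).toString("base64url");

// HMAC-SHA256 over "header.payload", encoded as base64url.
function hmacSha256(data: string, secret: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

function signToken(payload: Record<string, unknown>, secret: string): string {
  const header = toBase64Url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = toBase64Url(JSON.stringify(payload));
  return `${header}.${body}.${hmacSha256(`${header}.${body}`, secret)}`;
}

// Returns the payload if the signature is valid and the token has not expired.
function verifyToken(
  token: string,
  secret: string,
  nowSeconds: number,
): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  // The algorithm is fixed by the server; the token's own header is never consulted.
  const expected = Buffer.from(hmacSha256(`${header}.${body}`, secret));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  if (typeof payload.exp !== "number" || payload.exp <= nowSeconds) return null;
  return payload;
}
```

Note that nothing is stored server-side: verification needs only the secret and the clock, which is also why there is no place to record a revocation.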

JWT footguns

Never store JWTs in localStorage, since any XSS can read them. Use HttpOnly cookies. Never allow the none algorithm, and never trust the algorithm named in the token's own header. Use a vetted library like jose.

For "log in with Google," you want OAuth 2.0, hardened along the lines of the OAuth 2.1 draft, with OIDC on top for identity. The flow most apps use is Authorization Code with PKCE. Don't build it from scratch. Use a library or a provider (this very site uses Clerk).

Authorization, beyond if-statements

Authorization answers "what are you allowed to do." Three models, increasing in power:

  • RBAC (role-based). Users have roles (admin, editor, viewer), roles have permissions. Simple and fine for many apps. It struggles with "user X can edit document Y because they own it," which is per-resource.
  • ABAC (attribute-based). Decisions weigh attributes of the user, the resource, and the environment. Powerful, but the policy grows fast.
  • ReBAC (relationship-based). Decisions follow a graph: this user is in this team, that team owns this project, that project holds this document. Google's Zanzibar paper popularised it.

Where to put the policy

For most apps, start with consistent helpers in code, like can(user, 'edit', doc), never ad-hoc if-statements scattered around. Add a final safety net in the database with Postgres Row-Level Security so a missed check in code can't leak data. Reach for a policy engine (OPA, Oso) once the logic spreads across many files.
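One way the `can(user, 'edit', doc)` helper might look, as a sketch. The roles come from the RBAC example above; the specific rules (admins can do anything, everyone can view, owners control their own documents, editors can edit) are assumptions chosen for illustration, not a recommended policy.

```typescript
type Role = "admin" | "editor" | "viewer";
type Action = "view" | "edit" | "delete";

interface User {
  id: string;
  role: Role;
}

interface Doc {
  id: string;
  ownerId: string;
}

// Single entry point for authorization decisions (hypothetical policy).
function can(user: User, action: Action, doc: Doc): boolean {
  if (user.role === "admin") return true;       // role-based rule
  if (action === "view") return true;           // every role may view in this sketch
  if (doc.ownerId === user.id) return true;     // per-resource rule: owners edit and delete
  return user.role === "editor" && action === "edit";
}
```

Because every handler calls the same function, changing a rule is a one-place edit, and the ownership check covers the per-resource case that plain roles struggle with.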

Pagination, properly

Offset / limit

GET /posts?offset=100&limit=20. Easy, and you can jump to any page. But OFFSET 10000 still makes the database compute and throw away 10,000 rows, and if rows are inserted while you page you'll see duplicates or skips.

Cursor / keyset

GET /posts?after=…&limit=20. The client passes a cursor pointing at "where I left off," and the server returns rows after it. Fast (no skipping) and stable (new rows don't shift your page). The trade-off: no jumping to an arbitrary page, and you must sort by a stable, unique key.

Return something like { data: [...], nextCursor: "...", hasMore: true }. Skip the total count unless the client truly needs it, because counting is often slower than paging.
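A small sketch of keyset pagination producing that response shape, assuming an in-memory array in place of a table and a numeric `id` as the stable, unique sort key. The `listPosts` name and the plain-string cursor are illustrative choices; real cursors are often encoded so clients treat them as opaque.

```typescript
interface Post {
  id: number;
  title: string;
}

interface Page<T> {
  data: T[];
  nextCursor: string | null;
  hasMore: boolean;
}

function listPosts(posts: Post[], limit: number, after?: string): Page<Post> {
  const afterId = after === undefined ? -Infinity : Number(after);
  // SQL equivalent: WHERE id > :afterId ORDER BY id LIMIT :limit + 1
  const rows = posts
    .filter((post) => post.id > afterId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit + 1); // fetch one extra row to learn whether more exist
  const hasMore = rows.length > limit;
  const data = rows.slice(0, limit);
  const last = data[data.length - 1];
  const nextCursor = hasMore && last !== undefined ? String(last.id) : null;
  return { data, nextCursor, hasMore };
}
```

Fetching `limit + 1` rows answers `hasMore` without a separate count query.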

Idempotency

An operation is idempotent if doing it twice has the same effect as doing it once. GET is naturally idempotent. POST is not, unless you make it so.

Why it matters: networks fail. A client posts an order, the server processes it, the response is lost, the client retries. Without idempotency, you just charged the customer twice.

The standard fix is an idempotency key: the client generates a unique ID per operation and sends it in a header. The server stores the result under that key and replays it on a retry instead of running the operation again. Here's the whole idea in a few lines.

Make a retry safe with an idempotency key
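A minimal sketch of the idea, assuming an in-memory Map as the result store and a fake `chargeCard` function standing in for the real payment call. All names here are hypothetical.

```typescript
interface ChargeResult {
  chargeId: string;
  amount: number;
}

// Stand-ins for a durable result store and a payment provider (assumptions).
const storedResults = new Map<string, ChargeResult>();
const ledger: ChargeResult[] = [];

// Pretend side effect: each call really does charge the customer.
function chargeCard(amount: number): ChargeResult {
  const result = { chargeId: `ch_${ledger.length + 1}`, amount };
  ledger.push(result);
  return result;
}

// The key would arrive in a request header such as Idempotency-Key.
function handleCharge(idempotencyKey: string, amount: number): ChargeResult {
  const previous = storedResults.get(idempotencyKey);
  if (previous) return previous; // retry: replay the stored result, charge nothing
  const result = chargeCard(amount);
  storedResults.set(idempotencyKey, result);
  return result;
}
```

This sketch is single-threaded. With concurrent requests, two retries could both miss the lookup, so a real store would need an atomic "insert if absent" (for example, a unique constraint on the key).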

You'll almost never get exactly-once delivery over a network. What you get is at-least-once delivery plus idempotent handlers, which together behave like exactly-once processing in practice.

Rate limiting

A single user with a script can hammer your API. A token bucket is the standard defence: each user gets a bucket that refills over time, a request costs a token, and an empty bucket means a 429. For multiple server instances the counter has to live somewhere central, which is where Redis comes in. Here's the bucket itself.

A token-bucket rate limiter
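A sketch of a single in-process bucket, assuming the caller passes in the current time so the behaviour is easy to follow and test. The class and method names are hypothetical, and a multi-instance deployment would keep this state in a central store instead.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full
    this.lastRefill = now;
  }

  // Returns true if the request may proceed, false if it should get a 429.
  tryConsume(now: number = Date.now()): boolean {
    // Refill lazily based on elapsed time, never beyond capacity.
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

In use, a server would keep one bucket per user (for example in a Map keyed by user ID) and respond with 429 whenever `tryConsume` returns false.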

Real-time: WebSockets, SSE, long polling

Three transports for pushing data to a client:

Server-Sent Events (SSE)

The server pushes events down one long-lived HTTP response. One-way (server to client), built into the browser via EventSource, and it reconnects on its own. A great fit for streaming updates, AI tokens, and dashboards.
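To show what travels down that long-lived response, here is a sketch of formatting one event in the text/event-stream wire format. The `formatSseEvent` helper and the `ServerEvent` type are hypothetical names; the `id`, `event`, and `data` fields are part of the format itself.

```typescript
interface ServerEvent {
  data: string;
  event?: string; // event name the client can listen for
  id?: string;    // lets a reconnecting client say where it left off
}

// Serialise one event; the response itself is sent with
// Content-Type: text/event-stream and kept open.
function formatSseEvent({ data, event, id }: ServerEvent): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  // Multi-line payloads become one "data:" line per line of text.
  for (const line of data.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n"; // a blank line terminates the event
}
```

On reconnect, the browser's EventSource sends the last `id` it saw in a Last-Event-ID header, which is what makes catching up possible.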

WebSockets

A two-way persistent connection. Reach for it when the client needs to push frequently too: chat, collaborative editing, multiplayer.

Long polling (hold a request open until there's something to send) still works through every proxy, but it's high-overhead and mostly a fallback now.

Default to SSE

Use SSE unless you genuinely need the client pushing to the server constantly. It's simpler, plays well with CDNs and load balancers, and reconnects automatically.

The easy demo isn't the hard part. The hard parts are presence (who's online, usually a heartbeat with a TTL in Redis), reconnection (clients drop and need to catch up on what they missed), and fanout (one event reaching many subscribers, which needs a broker like Redis or NATS once you're on more than one process).
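The presence part can be sketched as a heartbeat with a TTL. This version keeps timestamps in an in-memory Map purely for illustration; the class name and the 30-second TTL in the usage note are assumptions, and a multi-process setup would store the same data in something like Redis with key expiry.

```typescript
class PresenceTracker {
  private readonly lastSeen = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Called whenever a client pings; refreshes that user's TTL.
  heartbeat(userId: string, now: number = Date.now()): void {
    this.lastSeen.set(userId, now);
  }

  // Anyone whose last heartbeat is older than the TTL counts as offline.
  onlineUsers(now: number = Date.now()): string[] {
    const online: string[] = [];
    for (const [userId, seenAt] of this.lastSeen) {
      if (now - seenAt < this.ttlMs) online.push(userId);
      else this.lastSeen.delete(userId); // expire stale entries
    }
    return online;
  }
}
```

With `new PresenceTracker(30000)` and clients pinging every ten seconds or so, a dropped connection shows up as offline within one TTL without any explicit disconnect message.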

Background jobs and queues

Anything that takes more than a couple hundred milliseconds shouldn't happen in the request handler. Sending email, generating a PDF, calling a slow third party: all of it goes to a queue. The handler validates the input, writes a job, and returns. A worker picks it up later.

Request → Validate → Enqueue job → Return 202 → Worker runs it

The request returns fast; the slow work happens out of band.
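The flow above can be sketched with an in-memory array as the queue. Everything here is a stand-in: the `handleSendEmail` handler, the `EmailJob` shape, and the `sentLog` array that replaces actually sending mail are all hypothetical.

```typescript
interface EmailJob {
  id: number;
  to: string;
  subject: string;
}

// Stand-ins for a real queue and a real mail provider (assumptions).
const jobQueue: EmailJob[] = [];
const sentLog: string[] = [];
let nextJobId = 1;

// Request handler: validate, enqueue, return immediately.
function handleSendEmail(body: { to?: string; subject?: string }): { status: number; jobId?: number } {
  if (!body.to || !body.to.includes("@") || !body.subject) return { status: 400 };
  const job: EmailJob = { id: nextJobId++, to: body.to, subject: body.subject };
  jobQueue.push(job);
  return { status: 202, jobId: job.id }; // 202 Accepted: work is queued, not done
}

// Worker: runs separately and processes one job per call.
function runWorkerOnce(): boolean {
  const job = jobQueue.shift();
  if (!job) return false;
  sentLog.push(`${job.to}: ${job.subject}`); // the slow work would happen here
  return true;
}
```

The handler never touches the slow operation, so its latency stays flat no matter how long sending takes.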

In Node, BullMQ (Redis-backed) is the common default. Inngest and Temporal offer durable workflows that can retry and even sleep for days.

Jobs fail, so always retry, but carefully: exponential backoff (wait 1s, 2s, 4s) so you don't hammer a struggling service, jitter (a little randomness) so a crowd of retries doesn't all hit at once, and a max attempts cap after which the job goes to a dead-letter queue for a human to look at. And because queues deliver at-least-once, every job handler must be idempotent, the same lesson as the API.
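Those three rules fit in one small function. This is a sketch under stated assumptions: the 1-second base, 60-second cap, 20% jitter, and default of 5 attempts are illustrative values, and `decideRetry` is a hypothetical name rather than any queue library's API.

```typescript
type RetryDecision =
  | { kind: "retry"; delayMs: number }
  | { kind: "dead-letter" };

// failedAttempts counts how many times the job has failed so far (1-based).
function decideRetry(
  failedAttempts: number,
  maxAttempts: number = 5,
  random: () => number = Math.random,
): RetryDecision {
  // Max attempts cap: give up and park the job for a human to inspect.
  if (failedAttempts >= maxAttempts) return { kind: "dead-letter" };

  const baseMs = 1000;
  const capMs = 60_000;
  // Exponential backoff: 1s, 2s, 4s, ... up to the cap.
  const exponential = Math.min(capMs, baseMs * 2 ** (failedAttempts - 1));
  // Jitter: add up to 20% so simultaneous failures spread out.
  const jitter = random() * 0.2 * exponential;
  return { kind: "retry", delayMs: Math.round(exponential + jitter) };
}
```

Passing `random` as a parameter keeps the function deterministic in tests while defaulting to real randomness in production.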

Now go deep

Three deep dives expand the trickiest parts of this chapter: authentication from the protocol up (sessions, JWT, OAuth), idempotency and exactly-once that actually works, and pagination that survives real data.

Test yourself

Questions · say the answer out loud before you open it. If you can't, the chapter isn't done.

Q: Should creating a resource use PUT or POST?

POST when the server assigns the ID (POST /users returns {id: 42}). PUT when the client supplies the ID (PUT /users/42) and the operation is idempotent, meaning PUTting the same body twice leaves the same state. PUT also replaces an existing resource wholesale.

Q: Building an internal API for a TypeScript-only team. Pick a flavour.

tRPC. One language, one repo, end-to-end type safety with no code generation, and procedure calls that feel local. Being TypeScript-only is a non-issue here. REST would work but adds friction without paying for it; GraphQL adds complexity the use case doesn't justify.

Q: Client reports duplicate charges. Logs show the same request arrived twice. Where did the design fail?

Missing idempotency. The endpoint should require an Idempotency-Key header, store the result under it for a day, and replay that result on retry. Without it, retries from clients, load balancers, or a double-click duplicate the operation.

Q: JWT or sessions for a banking app?

Sessions, or short-lived access tokens backed by a session store. The non-negotiable property is immediate revocation: when a user logs out or you spot compromise, access must stop now. Pure JWTs can't do that without a blocklist that removes their stateless advantage. For banking, revocation wins over scaling.

Q: Why does OFFSET 100000 kill performance?

The database still computes and discards those 100,000 rows before returning your page; it can't skip them. Offset cost grows with the offset, while cursor pagination stays roughly constant wherever you are in the list, provided the sort key is indexed.

Q: One event must fan out to 50,000 subscribed users in real time. Architecture?

WebSockets or SSE at the edge, backed by a pub/sub broker. The originating service publishes the event to a topic. Multiple connection servers (behind a load balancer) subscribe for the users they hold, and push to their clients. Redis pub/sub works at small scale, NATS or Kafka at larger scale. Handle backpressure per client so a slow one doesn't drag the rest.

Q: Difference between authentication and authorization?

Authentication is "who are you," verifying identity. Authorization is "what are you allowed to do," checking permission. You can be authenticated but not authorized (403). Confusing the two produces security bugs.

Q: Why must every background job handler be idempotent?

Because queues deliver at-least-once. Network failures, a worker crashing between doing the work and acknowledging it, retries on transient errors: all can run the same job twice. Idempotent handlers absorb that. Non-idempotent ones eventually double-charge or double-send.

Q: Where should authorization checks go in a multi-tenant app?

Layered. Authentication and basic role checks at the edge in middleware. Domain checks in business logic via a shared can() helper. A final net in the database with Postgres RLS so a missed code check can't leak data. Each layer should stand on its own; the layering is defence in depth.

Q: Rate-limit an API to 100 requests per user per minute. How?

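Redis with an atomic increment and a TTL, which gives you a fixed-window counter: key like rl:user:42:minute, increment per request, set a 60s expiry on the first one. Over 100 returns 429. For multiple instances the counter must be central, which is exactly what Redis gives you. A fixed window can let a burst straddle two windows, so switch to a token bucket if that matters.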
