You shipped your app. It lives on one server: a computer in a data centre running your code, talking to a database. A few hundred people use it and everything is fast. Life is good.
Then more people show up. The site gets slow. Requests start timing out. Something has to change.
This is the moment most frontend developers feel out of their depth, because "scaling" sounds like a topic with its own textbooks. It mostly isn't. It's a handful of moves you make in order, each one to fix a specific pain the last one couldn't. Let's walk the whole path.
Start with one server
Here's where everyone begins.
A request comes in, your server runs some code, reads or writes the database, sends a response back. One machine does all of it. Don't be embarrassed by this setup. Plenty of real businesses run on it for years. You only move off it when it actually hurts.
The pain, when it comes, looks like this: the CPU sits at 100%, memory fills up, responses crawl. The server is doing more work than it can keep up with.
Move one: get a bigger server
The first fix is the laziest, and that's a compliment. If the machine is maxed out, rent a bigger machine. More CPU cores, more memory. Move your app onto it. Done.
This is vertical scaling, also called scaling up: same number of servers, each one beefier.
Try the boring fix first
Vertical scaling needs almost zero new code. No new concepts, no new failure modes. For a huge number of apps, renting a bigger box buys you another year of growth, and that year is cheap compared to the engineering time the fancier options cost. Reach for the simple lever before the clever one.
So why isn't this the end of the article? Two reasons.
- There's a ceiling. You can buy a bigger machine, then a bigger one, but eventually you hit the largest box your provider sells. You can't scale up forever.
- It's one machine. If that single server crashes, restarts, or its data centre loses power, your whole app is down. One machine means one thing that can take everything offline.
That second point is the real motivator. Even if a giant server could handle your traffic, betting your entire business on one computer never failing is a bad bet.
Move two: add a second server
So instead of one big server, you run two smaller ones, each with a full copy of your app. If one dies, the other keeps serving. If both are busy, they share the work.
This is horizontal scaling, or scaling out: more servers, each one ordinary.
Decision: Scale out (more servers), not just up (a bigger server), once you care about staying online.
Scaling up is simpler but caps out and leaves you with a single point of failure. Scaling out is more work to set up, but it has no hard ceiling (add a tenth server, a hundredth) and it survives a server dying. Most systems do a bit of both: reasonably sized servers, several of them.
But adding a second server immediately creates a new question. The browser only knows one address. When someone visits your site, which of the two servers answers? You can't ask users to pick.
Move three: put a load balancer in front
You add one more box whose only job is to stand at the front door and hand each incoming request to one of your servers. That box is a load balancer.
Now the browser talks to the load balancer's address, and the load balancer spreads requests across your servers. The simplest rule is round robin: first request to A, next to B, next to C, then back to A. Smarter rules exist, like sending the next request to whichever server has the fewest open connections, but round robin already gets you most of the benefit.
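Here is a minimal TypeScript sketch of how a balancer might choose a server under each rule. The `Backend` shape, the server names, and the connection counts are made up for illustration. A real balancer would update `openConnections` itself as requests start and finish; here the counts are set by hand.

```typescript
// Hypothetical shape for illustration: a server plus the number of requests
// it is currently handling.
interface Backend {
  name: string;
  openConnections: number;
}

// Round robin: hand out servers in a fixed rotation A, B, C, A, B, C, ...
class RoundRobin {
  private readonly backends: Backend[];
  private next = 0;

  constructor(backends: Backend[]) {
    if (backends.length === 0) throw new Error("need at least one backend");
    this.backends = backends;
  }

  pick(): Backend {
    const chosen = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length; // wrap back to the first
    return chosen;
  }
}

// Least connections: pick whichever server currently has the fewest open
// connections. Ties go to the earliest server in the list.
function leastConnections(backends: Backend[]): Backend {
  if (backends.length === 0) throw new Error("need at least one backend");
  return backends.reduce((best, b) =>
    b.openConnections < best.openConnections ? b : best,
  );
}

const servers: Backend[] = [
  { name: "A", openConnections: 5 },
  { name: "B", openConnections: 1 },
  { name: "C", openConnections: 3 },
];

const rr = new RoundRobin(servers);
const rotation = [rr.pick(), rr.pick(), rr.pick(), rr.pick()].map((b) => b.name);
console.log(rotation.join(" ")); // A B C A
console.log(leastConnections(servers).name); // B
```

Round robin needs no information about the servers at all, which is why it's the usual default. Least connections needs a live count per server, but it copes better when some requests take much longer than others.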
The load balancer also quietly does a second job: health checks. It pings each server every few seconds. If server B stops answering, the balancer stops sending it traffic and routes everyone to A and C until B recovers. Your users never notice a server died. That's the payoff for all this work.
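Health checks can be layered on top of round robin. This is an assumption-heavy toy: `HealthCheckedPool` is a hypothetical name, and the `probe` function is passed in so the example runs without a network. In a real setup the probe might be an HTTP request to a health endpoint on each server, run on a timer every few seconds.

```typescript
// A probe answers "is this server alive?". Injected so no real network is needed.
type Probe = (server: string) => Promise<boolean>;

class HealthCheckedPool {
  private readonly servers: string[];
  private readonly healthy = new Map<string, boolean>();
  private next = 0;

  constructor(servers: string[]) {
    this.servers = servers;
    // Start optimistic: every server counts as healthy until a check fails.
    for (const s of servers) this.healthy.set(s, true);
  }

  // Run one round of checks in parallel. A probe that throws counts as a failure.
  async checkAll(probe: Probe): Promise<void> {
    await Promise.all(
      this.servers.map(async (s) => {
        let ok = false;
        try {
          ok = await probe(s);
        } catch {
          ok = false;
        }
        this.healthy.set(s, ok);
      }),
    );
  }

  // Round robin, but skip any server that failed its most recent check.
  pick(): string {
    for (let i = 0; i < this.servers.length; i++) {
      const s = this.servers[(this.next + i) % this.servers.length];
      if (this.healthy.get(s)) {
        this.next = (this.next + i + 1) % this.servers.length;
        return s;
      }
    }
    throw new Error("no healthy servers");
  }
}

async function demo() {
  const pool = new HealthCheckedPool(["A", "B", "C"]);
  // Pretend server B has stopped answering.
  await pool.checkAll(async (s) => s !== "B");
  console.log([pool.pick(), pool.pick(), pool.pick(), pool.pick()].join(" ")); // A C A C
}

demo();
```

A failed probe only takes a server out of rotation until a later check succeeds, which is the "route everyone to A and C until B recovers" behaviour described above.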
The catch nobody warns you about: state
Here's the trap, and it's the most important idea in this whole piece.
Say a user logs in. Their request happens to land on server A, and server A remembers "this person is logged in" by storing it in its own memory. Their next request gets sent by the load balancer to server B. But server B has never heard of this user. Suddenly they're logged out, for no reason they can see.
The problem: server A kept something in its own memory that the other servers don't know about. That remembered thing is called state, and the moment you have more than one server, state stored on a single server breaks.
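The bug is easy to reproduce in a few lines. This sketch simulates two servers in one process, each keeping its logged-in users in its own `Map`. The class name and the session-id format are hypothetical, and the ids are predictable purely for readability; real session ids must be random.

```typescript
class StatefulServer {
  // This Map lives in one server's memory. No other server can see it.
  private readonly sessions = new Map<string, string>(); // sessionId -> userId
  readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  login(userId: string): string {
    // Predictable placeholder id for readability; real ids must be random.
    const sessionId = `${this.name}-${userId}-session`;
    this.sessions.set(sessionId, userId);
    return sessionId; // would normally go back to the browser as a cookie
  }

  whoAmI(sessionId: string): string {
    return this.sessions.get(sessionId) ?? "not logged in";
  }
}

const a = new StatefulServer("A");
const b = new StatefulServer("B");

const cookie = a.login("ada"); // the login request happened to land on A
console.log(a.whoAmI(cookie)); // ada
console.log(b.whoAmI(cookie)); // not logged in: B never saw the login
```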
Stateful server (breaks)
The server keeps login sessions, uploaded files, and counters in its own memory. Works with one server. With several, each one knows different things, so the same user gets different answers depending on which server they hit. Refreshing the page logs them out at random.
Stateless server (works)
The server keeps nothing important in its own memory between requests. Login sessions live in a shared store every server can read (a database, or a fast store like Redis). Uploaded files go to shared storage. Now every server gives the same answer, so it doesn't matter which one you hit.
The fix is a rule: keep your servers stateless. Anything that needs to be remembered between requests gets pushed out to a shared place all the servers can reach. The server itself becomes a forgetful worker. It does the job in front of it, remembers nothing, and any of them is as good as any other.
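Here is the same login flow rewritten so the servers are stateless. It's a sketch under stated assumptions: `SessionStore` and `SharedMemoryStore` are hypothetical names, and the shared `Map` stands in for a real database or Redis instance that every server would reach over the network. The methods are `async` because a real store lookup is a network call.

```typescript
// Anything that can remember sessions for all servers at once.
interface SessionStore {
  get(sessionId: string): Promise<string | undefined>;
  set(sessionId: string, userId: string): Promise<void>;
}

// Stand-in for a database or Redis: one Map that every "server" shares.
class SharedMemoryStore implements SessionStore {
  private readonly data = new Map<string, string>();

  async get(sessionId: string): Promise<string | undefined> {
    return this.data.get(sessionId);
  }

  async set(sessionId: string, userId: string): Promise<void> {
    this.data.set(sessionId, userId);
  }
}

class StatelessServer {
  readonly name: string;
  private readonly store: SessionStore; // the server's only "memory" is a reference to the store

  constructor(name: string, store: SessionStore) {
    this.name = name;
    this.store = store;
  }

  async login(userId: string): Promise<string> {
    const sessionId = `session-${userId}`; // placeholder; real ids must be random
    await this.store.set(sessionId, userId);
    return sessionId;
  }

  async whoAmI(sessionId: string): Promise<string> {
    return (await this.store.get(sessionId)) ?? "not logged in";
  }
}

async function demo() {
  const shared = new SharedMemoryStore();
  const a = new StatelessServer("A", shared);
  const b = new StatelessServer("B", shared);
  const cookie = await a.login("ada"); // login lands on A
  console.log(await b.whoAmI(cookie)); // ada: B reads the same store
}

demo();
```

The servers now hold nothing except a reference to the store, so you can add, remove, or restart any of them without logging anyone out.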
Two common ways to hold that shared session state: put it in a store every server reads (a database, or a fast in-memory store like Redis), or push it to the client itself as a signed token (a JWT) the browser sends on every request, so the server can verify it without looking anything up. Either works. The point is the same: the truth about who's logged in does not live in one server's head.
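For the token approach, here is a minimal sketch of the idea behind a signed token, using Node's built-in `crypto` module. It is deliberately not a real JWT: a JWT also carries a header and standard claims such as an expiry time, and in practice you would use a vetted JWT library rather than hand-rolling one. The token format and the `SECRET` value are placeholders for illustration. The point it shows: every server shares the secret, so any of them can verify a token without looking anything up. Note that the payload is signed, not encrypted, so anyone holding the token can read it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Placeholder: a real deployment would give every server the same long,
// random secret from configuration. Never hard-code it like this.
const SECRET = "replace-with-a-long-random-secret";

// HMAC-SHA256 signature of the token body, base64url-encoded.
function sign(body: string): string {
  return createHmac("sha256", SECRET).update(body).digest("base64url");
}

// Simplified token format (an assumption, not the JWT spec):
//   base64url(JSON payload) + "." + signature
function issueToken(userId: string): string {
  const body = Buffer.from(JSON.stringify({ sub: userId })).toString("base64url");
  return `${body}.${sign(body)}`;
}

// Returns the user id if the signature checks out, otherwise null.
// No database or shared store is consulted: the secret is enough.
function verifyToken(token: string): string | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [body, signature] = parts;
  const expected = Buffer.from(sign(body));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  return typeof payload.sub === "string" ? payload.sub : null;
}

const token = issueToken("ada");
console.log(verifyToken(token)); // ada

// Swapping in a different user but reusing ada's signature fails verification.
const forgedBody = Buffer.from(JSON.stringify({ sub: "mallory" })).toString("base64url");
console.log(verifyToken(`${forgedBody}.${token.split(".")[1]}`)); // null
```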
This rule is why horizontal scaling works at all. Interchangeable servers can only be interchangeable if none of them is secretly holding something the others need.
Sticky sessions: the tempting shortcut
There's an easier-looking way out, and it's worth knowing because people reach for it. You can tell the load balancer: "once a user lands on server A, always send that user back to A." Now server A's memory of them stays correct, because they never go anywhere else. This is a sticky session.
It works, but it quietly gives back most of what you came for.
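Here is a sketch of the sticky rule and of how it fails. `StickyBalancer` is a hypothetical toy that remembers which server each user was first sent to. Real balancers typically implement stickiness with a cookie or by hashing the client's address, but the effect is the same.

```typescript
class StickyBalancer {
  private readonly assignments = new Map<string, string>(); // userId -> server
  private servers: string[];
  private next = 0;

  constructor(servers: string[]) {
    this.servers = [...servers];
  }

  route(userId: string): string {
    const pinned = this.assignments.get(userId);
    // Sticky rule: keep sending the user to the same server while it exists.
    if (pinned !== undefined && this.servers.includes(pinned)) return pinned;
    if (this.servers.length === 0) throw new Error("no servers left");
    // First visit, or the pinned server is gone: pick round robin and remember it.
    const chosen = this.servers[this.next % this.servers.length];
    this.next++;
    this.assignments.set(userId, chosen);
    return chosen;
  }

  // Simulates a server crashing or being taken down for maintenance.
  remove(server: string): void {
    this.servers = this.servers.filter((s) => s !== server);
  }
}

const lb = new StickyBalancer(["A", "B"]);
console.log(lb.route("ada"), lb.route("ada"), lb.route("ada")); // A A A
lb.remove("A"); // server A crashes
console.log(lb.route("ada")); // B, a server that never saw ada's session
```

Once A is gone, the user is quietly moved to B, and anything that lived only in A's memory is lost along with A.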
Why sticky sessions bite you later
If a user is stuck to server A and A crashes, everything A was keeping for that user is gone — they're logged out, their half-finished work vanishes. Load also spreads unevenly: a server that collected the "heavy" users stays hot while others idle. And you can't freely take a server down for an update without disrupting whoever's stuck to it. Sticky sessions are a patch. Stateless servers with a shared store are the real fix.
Reach for stickiness only when going fully stateless is genuinely hard for some legacy reason, and treat it as temporary.
Where this leaves you
Notice the shape of what just happened. Every move was forced by a concrete problem:
- One server gets overloaded: rent a bigger one. Cheap, simple, and it buys real time. (Vertical scaling.)
- A bigger server still has a ceiling and can still die: run several ordinary servers instead, so work spreads and one failure isn't fatal. (Horizontal scaling.)
- The browser doesn't know which server to talk to: put a load balancer in front to spread requests and route around dead servers.
- Users get logged out at random across servers: keep servers stateless. Push sessions, files, and counters to a shared store every server can read.
That's the core of scaling a web app's compute layer. You'll layer more on top later (caching, queues, splitting the database — each its own explainer), but they're all variations on the same instinct you just built: find the thing that breaks, add the smallest piece that fixes it, and keep your servers boring and replaceable.
The one idea to take away
A scalable system isn't a clever system. It's a system where any one machine can disappear without anyone noticing. You get there by keeping every server stateless and putting a load balancer in front, so the machines become a crowd of interchangeable workers instead of a few irreplaceable heroes.
Test yourself
Say each answer out loud before you read it. If you can't, the chapter isn't done.
Q: What's the difference between vertical and horizontal scaling, and when would you pick each?
Vertical scaling means using a bigger, more powerful server. Horizontal scaling means using more servers of ordinary size. Start with vertical because it needs almost no code changes and buys real time. Move to horizontal when you hit the ceiling of the biggest machine you can rent, or when you can't afford the downtime of having a single server that can fail. Most real systems do both: a few reasonably sized servers.
Q: Why do you need a load balancer the moment you have more than one server?
Because the browser only knows one address, and something has to decide which server answers each request. The load balancer sits at the front, spreads requests across the servers (round robin is the simplest rule), and runs health checks so it can stop sending traffic to a server that has died. Without it, you'd have no single entry point and no way to route around failures.
Q: What does it mean for a server to be 'stateless', and why does it matter?
A stateless server keeps nothing important in its own memory between requests. Anything that must be remembered (login sessions, uploaded files, counters) is pushed to a shared store all servers can read. It matters because with multiple servers, anything one server keeps privately is invisible to the others, so the same user gets inconsistent answers depending on which server they hit. Statelessness is what makes servers interchangeable.
Q: A user reports they get logged out every few clicks after you added a second server. What happened?
Their login session is being stored in one server's memory. The load balancer sends their requests to different servers, and any server that didn't handle the login doesn't know they're logged in. The fix is to make the session not depend on one server: store it in a shared place (a database or a fast store like Redis) every server can read, or carry it in a signed token (a JWT) the browser sends on each request.
Q: What are sticky sessions, and why are they considered a patch rather than a real fix?
Sticky sessions tell the load balancer to always send a given user back to the same server, so that server's in-memory state stays correct. It's a patch because it reintroduces the problems you scaled out to avoid: if that server dies the user loses everything, load spreads unevenly, and you can't freely take servers down for maintenance. The real fix is stateless servers backed by a shared store.