In the last lesson every query hit one table. But real questions cross tables: "who placed this order?", "what did Aarav buy?". The data is deliberately split (a customer's name lives once in customers, and an order just stores that customer's id), and JOIN is how you stitch it back together.
Why the data is split
Look at the shop. An order doesn't store the customer's name and city; it stores a customer_id that points at a row in customers. This is on purpose: if the name lived in every order, changing it would mean updating every order. So each fact lives once, and we link by id. The cost of that tidiness is that answering "who placed order 101?" needs two tables joined.
INNER JOIN: match rows across tables
A join pairs each row in one table with matching rows in another, based on a condition you give (the join condition), almost always "this table's foreign key equals that table's id."
Press Run to execute your query.
Read the JOIN ... ON ... line as "bring in customers, matching where the customer's id equals the order's customer_id." For every order, you now also have its customer's columns. That's an inner join: only rows that have a match on both sides appear.
Table aliases keep joins readable
Once you join several tables, writing the full name each time gets noisy. You can give each table a short alias right after its name (FROM orders o) and use it everywhere (o.customer_id). The next query leans on this.
Joining more than two tables
To answer "what did Aarav actually buy?" you have to walk a chain: a customer has orders, an order has items, and each item points at a product. That's four tables joined in sequence, each JOIN adding the next link.
Press Run to execute your query.
Each JOIN line connects one more table by its key. The query reads like the path through your data: customer → orders → order_items → products. This chaining is the everyday bread and butter of SQL, and it's how a single query can answer a question that touches your whole schema.
LEFT JOIN: keep rows that have no match
An inner join drops rows with no match. Sometimes that's wrong. "List every customer and their orders" should still include a customer who hasn't ordered yet, showing them with no order. That's a LEFT JOIN: keep every row from the left table, and fill in NULL where the right table has no match.
Press Run to execute your query.
Notice the customers with an empty (∅, meaning NULL) order id: they're in the result precisely because LEFT JOIN keeps them. Swap LEFT JOIN back to JOIN and run it again, and those customers disappear. That difference is the whole point.
Inner vs left, the one-line rule
Use INNER JOIN (just JOIN) when you only want rows that have a match on both sides. Use LEFT JOIN when the left table's rows must all appear even if the right side has nothing. "Customers who have ordered" is an inner join; "all customers, with their orders if any" is a left join.
The classic LEFT JOIN trap: filtering in WHERE
This one bites everyone once. You write a LEFT JOIN to keep all customers, then add WHERE orders.total > 500 to filter the orders. Suddenly the customers with no orders vanish, and your LEFT JOIN quietly behaves like an inner join. The reason: an unmatched customer has NULL for every order column, and NULL > 500 is unknown, so WHERE drops the row. If a condition on the right table should be part of the matching rule (not a filter that discards left rows), put it in the ON clause instead: LEFT JOIN orders ON orders.customer_id = customers.id AND orders.total > 500. Conditions in ON shape what counts as a match; conditions in WHERE throw away whole result rows.
A note on what a join really costs
A join isn't free: the database has to find the matching rows. When the column you're joining on is indexed (and primary keys and foreign keys usually are), it's fast, jumping straight to matches. Without a useful index, a join on large tables can get slow, which is exactly the kind of thing the query-plan deep dive teaches you to spot. For now, just know: join on keys, keep keys indexed, and joins stay quick.
Next: you can now combine tables. The next step is summarising them, counting orders, averaging prices, totalling revenue, with GROUP BY.
Test yourself
Questions· say the answer out loud before you open it. If you can't, the chapter isn't done.
QWhy is the data split across tables instead of stored together?+
So each fact lives in exactly one place. A customer's name sits once in customers; orders just reference that customer by id. If the name were copied into every order, changing it would mean updating every order. The trade-off is that combining the data needs a JOIN.
QWhat does the ON clause in a JOIN do?+
It's the join condition: it tells the database how to match rows across the two tables, almost always 'this table's foreign key equals that table's primary key' (e.g. ON customers.id = orders.customer_id). Each row is paired with the rows that satisfy that condition.
QWhat's the difference between INNER JOIN and LEFT JOIN?+
INNER JOIN (just JOIN) returns only rows that have a match on both sides. LEFT JOIN returns every row from the left table and fills in NULL where the right table has no match, so rows with nothing on the right are kept. 'All customers with their orders if any' needs a LEFT JOIN.
QHow do you answer a question that spans four tables, like 'what did this customer buy'?+
Chain joins: start at customers, JOIN orders on the customer id, JOIN order_items on the order id, JOIN products on the product id. Each JOIN adds the next link in the path through your schema, so one query walks customer → orders → items → products.
QWhy does a LEFT JOIN sometimes act like an inner join?+
Because you filtered a right-table column in the WHERE clause. Unmatched left rows have NULL for the right table's columns, and a condition like orders.total > 500 is unknown for NULL, so WHERE discards those rows, undoing the LEFT JOIN. Put such conditions in the ON clause instead, where they affect matching rather than dropping left rows.
QWhat makes a join fast or slow?+
Whether the joined column is indexed. Primary and foreign keys usually are, so the database jumps straight to matching rows and the join is fast. Joining large tables on an unindexed column forces it to do far more work, which is when joins get slow.
Comments
Loading comments…