Beyond Postgres, MySQL, SQLite, MongoDB, and Redis is a long tail of specialised databases. You won't reach for most of them on a typical app, and that's fine. The goal here is recognition: know what each is for, so when a real need appears you know the name to look up. Each one trades away generality to be excellent at a single job.
Built for extreme scale and writes
Cassandra (and its faster, API-compatible cousin ScyllaDB) is a wide-column store built to spread across hundreds of machines and absorb enormous write volumes with no single point of failure. You give up flexible querying and strict consistency to get that. Reach for it at genuine internet scale (think massive sensor or event streams) when a single relational node can't keep up. For almost everyone, that day never comes.
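To make "spread across hundreds of machines" concrete, here is a minimal sketch of hash partitioning with replication. It is a deliberate simplification, not Cassandra's actual token ring: the node names, the replication factor, and the `replicas_for` helper are all made up for illustration.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICATION_FACTOR = 3  # assumed: each row is stored on three nodes


def replicas_for(partition_key):
    """Hash the partition key to pick a starting node, then take the next nodes as replicas."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


# Every write for this sensor goes to the same three nodes, and any of them can accept it.
owners = replicas_for("sensor-1138")
```

Because the key alone decides where data lives, writes need no central coordinator. It also hints at the trade-off: a query that doesn't start from the partition key has no single place to look.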
DynamoDB is Amazon's fully-managed key-value/document database. Its appeal is operational: it scales automatically and you never manage a server. Its catch is that you must design your data around your access patterns up front, because flexible queries aren't really an option. Great for AWS-native apps with well-known access patterns and spiky scale.
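"Design your data around your access patterns" is easier to see in code. The sketch below is an in-memory stand-in, not the DynamoDB API: `TinyTable`, the `USER#` and `ORDER#` key formats, and the sample items are all hypothetical. It only shows the shape of a partition-key plus sort-key lookup.

```python
from collections import defaultdict


class TinyTable:
    """Toy partition-key + sort-key table (illustrative only, not a real client)."""

    def __init__(self):
        # partition key -> {sort key: item}
        self._partitions = defaultdict(dict)

    def put(self, pk, sk, item):
        self._partitions[pk][sk] = item

    def query(self, pk, sk_prefix=""):
        """Return items in one partition whose sort key starts with sk_prefix, in sort-key order."""
        rows = self._partitions.get(pk, {})
        return [rows[sk] for sk in sorted(rows) if sk.startswith(sk_prefix)]


table = TinyTable()
table.put("USER#42", "PROFILE", {"name": "Alice"})
table.put("USER#42", "ORDER#2024-01-03", {"total": 30})
table.put("USER#42", "ORDER#2024-02-10", {"total": 12})
table.put("USER#7", "ORDER#2024-01-09", {"total": 99})

# The access pattern "all orders for user 42, oldest first" is baked into the keys.
orders_for_42 = table.query("USER#42", "ORDER#")
```

Notice what is missing: there is no cheap way to ask "all orders over 50 across every user." If you need that question answered, you have to plan a key layout for it in advance.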
Built for search
Elasticsearch (and the open fork OpenSearch) is a search engine: throw text at it and it answers "which documents best match these words," with ranking, typo tolerance, and faceting. It's also heavily used for exploring logs. You run it alongside your main database, feeding it a copy of the data you want searchable. Lighter modern alternatives like Meilisearch and Typesense cover many app-search needs with far less operational weight. Remember, though, that Postgres full-text search is often enough to skip this entirely.
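The core idea behind a search engine is an inverted index: a map from each word to the documents that contain it, plus a score for ranking. Here is a minimal sketch using a simple TF-IDF sum. The documents and the `search` helper are made up for illustration, and real engines add far more (typo tolerance, stemming, faceting) that this leaves out.

```python
import math
from collections import Counter, defaultdict

docs = {
    1: "postgres full text search is often enough",
    2: "elasticsearch is a search engine for text and logs",
    3: "redis is a cache",
}

# Inverted index: term -> {doc_id: how many times the term appears in that document}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, count in Counter(text.split()).items():
        index[term][doc_id] = count


def search(query):
    """Rank documents by a simple TF-IDF sum over the query terms."""
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that appear in fewer documents count for more.
        idf = math.log(1 + len(docs) / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]


results = search("search engine")  # document 2 matches both words, so it ranks first
```

The lookup never scans document text at query time; it only reads the postings for the words in the query, which is why this structure stays fast as the collection grows.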
Built for analytics
ClickHouse is a columnar database built for analytics: scanning and aggregating over billions of rows in a blink. It stores data by column rather than by row, which makes "sum this metric across a billion records" extraordinarily fast, at the cost of being poor for the single-row reads and writes a normal app does. It's for dashboards and analytics, not your app's primary store.
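The row-versus-column difference is just a question of which values sit next to each other. A toy sketch with made-up order data, using plain Python structures to stand in for on-disk layout:

```python
# Row layout: each record is stored together, ideal for "fetch order 2".
rows = [
    {"order_id": 1, "country": "DE", "amount": 30},
    {"order_id": 2, "country": "US", "amount": 12},
    {"order_id": 3, "country": "DE", "amount": 99},
]

# Column layout: each field is stored together, ideal for "sum all amounts".
columns = {
    "order_id": [1, 2, 3],
    "country": ["DE", "US", "DE"],
    "amount": [30, 12, 99],
}

# Aggregating over rows has to visit every whole record to reach one field.
total_from_rows = sum(row["amount"] for row in rows)

# Aggregating over columns reads one list and ignores the rest.
total_from_columns = sum(columns["amount"])

# A single-record read is the opposite: one lookup in the row layout,
# but one lookup per column in the columnar layout.
order_2_from_columns = {name: values[1] for name, values in columns.items()}
```

Both layouts hold the same data and give the same answers. What changes is how much has to be read to get there, and at a billion rows that difference dominates.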
DuckDB is the delightful local counterpart: think "SQLite for analytics." A single-file, embeddable columnar database you can run on your laptop to crunch large data files (CSV, Parquet) with SQL, no server. Wonderful for data analysis and exploration.
Built for time-stamped data
Time-series databases like InfluxDB, TimescaleDB, and (increasingly) ClickHouse specialise in data that arrives stamped with a time and is mostly queried by time ranges: server metrics, IoT sensor readings, financial ticks. Because consecutive values usually change only a little, these systems compress aggressively (often 10x or more) and answer "average CPU over the last hour" far faster than a general database. TimescaleDB is the easy on-ramp here: it's a Postgres extension, so it's the same SQL and the same database you already run.
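Why do small changes between consecutive values compress so well? One common building block is delta encoding: store the first value, then only the differences. The sketch below shows the idea with hypothetical CPU readings. It is not any particular database's format, and real systems layer further encodings on top.

```python
def delta_encode(values):
    """Keep the first value, then store only the change from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original series with a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Hypothetical CPU readings in hundredths of a percent, one per second.
cpu_percent = [4150, 4152, 4151, 4155, 4155, 4153]
encoded = delta_encode(cpu_percent)  # [4150, 2, -1, 4, 0, -2]
```

After the first entry, every number is tiny, so it fits in far fewer bits than the raw reading. Decoding is a single pass, which keeps range queries cheap.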
Built for relationships
Neo4j (and Amazon Neptune) is a graph database: data is nodes and the edges between them, and queries traverse those connections. When the relationships are the product (social graphs, recommendations, fraud rings, "shortest path between two people"), graph queries stay fast where relational joins at depth would crawl. You query with a graph language like Cypher rather than SQL.
// Neo4j Cypher: friends-of-friends of Alice, two hops out
MATCH (a:Person {name: 'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof)
WHERE fof <> a  // a mutual friendship would otherwise lead straight back to Alice
RETURN DISTINCT fof.name;
Built for AI similarity
Vector databases store embeddings (lists of numbers representing meaning) and find the closest ones to a query, which powers AI retrieval (RAG). The dedicated names are Pinecone, Weaviate, Qdrant, and Milvus, but for most apps pgvector (a Postgres extension) is enough, so you add vectors to the database you already run instead of adopting a new one.
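"Find the closest ones" usually means ranking by a similarity measure such as cosine similarity. Here is a brute-force sketch with made-up three-dimensional embeddings; the texts, vectors, and `nearest` helper are hypothetical, and real embeddings have hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; in practice a model produces these from the text.
embeddings = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "forgot my login": [0.8, 0.3, 0.1],
    "quarterly revenue report": [0.0, 0.2, 0.9],
}


def nearest(query_vector, k=2):
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(
        embeddings,
        key=lambda text: cosine_similarity(query_vector, embeddings[text]),
        reverse=True,
    )
    return ranked[:k]


top = nearest([0.9, 0.1, 0.05])
```

This compares the query against every stored vector, which is fine for thousands of rows. Vector indexes exist to avoid that full scan at larger sizes, typically by accepting approximate answers.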
How to actually choose
The decision, compressed
Start with Postgres for your primary data. Add Redis when you need a cache. Reach for a specialised database only when its one job genuinely shows up: Cassandra/DynamoDB for extreme scale, Elasticsearch for serious search (after outgrowing Postgres full-text), ClickHouse/DuckDB for analytics over huge data, a time-series DB for metrics and sensor streams (and note TimescaleDB is just Postgres), Neo4j for deep graph traversal, a vector DB for AI (after outgrowing pgvector). The senior instinct isn't collecting databases; it's running as few as possible and adding one only when a real, present need forces it.
That's the landscape. You now know why databases exist, the families they come in, how to actually write SQL against a real one, and the popular names with the job each is built for. From here, the full-stack data layer chapter and the database deep dives take the relational story all the way down to how Postgres stores bytes on disk.
Test yourself
Questions: say the answer out loud before you open it. If you can't, the chapter isn't done.
Q: When would you reach for Cassandra or DynamoDB over Postgres?
At genuine internet scale with very high write volumes that a single relational node can't absorb, and where you can design around known access patterns and tolerate weaker query flexibility and consistency. Cassandra for self-managed massive write throughput, DynamoDB for hands-off managed scaling in AWS. For most apps that scale never arrives.
Q: What is Elasticsearch for, and what's the lighter-weight alternative path?
Full-text search and log exploration: ranked 'which documents match these words' with typo tolerance and faceting, run alongside your main database. Lighter options like Meilisearch or Typesense cover app search with less operational weight, and Postgres's built-in full-text search is often enough to skip a search engine entirely.
Q: Why is ClickHouse fast for analytics but bad as an app's main database?
It's columnar: it stores data by column, so aggregating one metric across billions of rows is extremely fast. But that layout is poor for the single-row reads and writes a normal app does constantly. It's for dashboards and analytics, used alongside (not instead of) your transactional database.
Q: Do you need a dedicated vector database for AI features?
Usually not at first. pgvector adds vector similarity search to the Postgres you already run, which covers most apps' RAG needs. Move to a dedicated vector database (Pinecone, Weaviate, Qdrant, Milvus) only when scale or specialised features demand it, not by default.