How to choose a database for your next project

Michael Warner

Feb 13, 2025

Maybe you’re building a web app, logging sensor data, integrating one system with another, or creating a multiplayer video game - whatever your next tech project is, you’re here because you need to store data. An Excel spreadsheet will not be able to communicate with your app, so where do you store the data so that you can use it?

There’s some good news and some bad news. The good news: your project isn’t the first project to store data. Some of the brightest minds in the software industry have spent their careers coming up with ways to store as much data as you need for your project and serve it to millions of people.

The bad news is: there are many, many choices. Layers of choices. Not only are there several different types of databases, there are competing options for each category. These aren’t just logos to pick from, either. Determining the type of database that you need is the first step, but which factors go into deciding on MySQL or Postgres? MongoDB or Firebase? InfluxDB or Timescale?

In this article, we’ll take a look at the various types of databases, how to decide on where to host them, and when you will need to consider scalability. Let’s dive in.

Databases aren’t only storing data

Just about every organization under the sun uses spreadsheets in one way or another. My first IT job involved overseeing a treasure trove of documents that other departments spent countless hours in, organizing financial projections, inventory, and performance metrics among many other things. Creating and maintaining these spreadsheets was a vital task for several roles in these departments. It was what they spent most of their work week doing.

This gets the job done in many cases. Everyone in the organization can be trained on how to access a shared drive on the network, open the documents that they need to open, and go about their work. Sometimes we just need to store some data in a spreadsheet. Keep it simple.

Eventually, though, a line is crossed where you need to do something more than only store data. Spreadsheets are great for small data sets, but they don’t scale to millions of rows, they lack robust security, and they can’t seamlessly integrate with modern apps or multiple systems. Databases were created to solve these problems: they’re centralized, secure, and can handle large amounts of data and simultaneous requests.

Types of databases

Decades ago, the idea of relational data was solidified as one of the first ways to programmatically store and access data in a centralized system. Relational data at its core is a programmatic spreadsheet. There’s a “database”, which is like a “file”, and “tables” within the database which are like “sheets”. There are also “schemas”, which are like blueprints for the types of data that can be stored in a table. For example, an integer column in a table can only store whole numbers - it will reject text like “hello”.

We call it “relational” because some columns in one table might relate to columns in another table. For example, this blog article belongs to an author, so this post (with a title, body, and publish date in its schema) is related to the author (with a name and a link to a profile picture in its schema).
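To make the post-and-author relationship concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and example URL are assumptions made up for illustration; SQLite is used only because it needs no setup, and the same idea applies to any relational database.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces relations when asked

# Hypothetical schemas: each post points at exactly one author.
conn.executescript("""
CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    profile_picture_url TEXT
);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    published_on TEXT NOT NULL
);
""")

conn.execute(
    "INSERT INTO authors (id, name, profile_picture_url) VALUES (?, ?, ?)",
    (1, "Michael Warner", "https://example.com/michael.png"),
)
conn.execute(
    "INSERT INTO posts (author_id, title, body, published_on) VALUES (?, ?, ?, ?)",
    (1, "How to choose a database for your next project", "...", "2025-02-13"),
)

# The relation lets one query combine both tables.
row = conn.execute("""
    SELECT posts.title, authors.name
    FROM posts
    JOIN authors ON authors.id = posts.author_id
""").fetchone()
print(row)

# A post that points at a missing author is rejected by the database itself.
try:
    conn.execute(
        "INSERT INTO posts (author_id, title, body, published_on) VALUES (?, ?, ?, ?)",
        (99, "Orphaned post", "...", "2025-02-14"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The second insert fails because no author with id 99 exists, which is the kind of guardrail a spreadsheet does not give you.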

Relational databases worked well as the standard for a long time, and they are still the most common type of database today, but there’s a problem: we found out many years ago that not all data can or should be stored in tables with columns and rows.

To skip to the point, let’s break our options down in a list with the main purpose for their creation:

  • Relational Databases: Highly structured and normalized data.
  • Document Databases: Large amounts of flexible data.
  • Key-Value Stores: Simplicity and speed for pre-calculated values.
  • Time-Series Databases: Highly optimized for timestamped data.
  • Graph Databases: Relationship-centric data with complex queries.
  • Columnar Databases: Analytical processing, data warehousing.
  • Vector Databases: Emerging for AI/ML, focused on similarity searches.
  • Search Databases: Full-text search.
Choosing the database type by use case

    Nice, we have a simple bullet list of categories. If you’re a reader with some knowledge in programming, chances are you’ve heard of a few of these terms before. Now how can we be sure about picking the best option from the list for your project?

    First, it’s important to understand that these are all tools. There isn’t really a “team relational” or “team time-series” - depending on your project, you may use several of these database types together. At IOTEA, we’re using a mix of relational, document, key-value, and columnar databases to power our entire platform. We’re also building channels that heavily utilize time-series data since these are a natural fit for sensor data that we want to capture on some of our IoT projects.

    Before we lose the plot (the puns will only get worse from here), understand that choosing the right database depends on the structure of your data, query patterns, and scalability needs. Let’s start translating your use case into our first decision: choosing a category.

    Category       | Common use cases
    Relational     | Financial transactions, ERP systems, customer relationship management (CRM), inventory management.
    Document       | Content management systems, e-commerce product catalogs, user profiles, logs.
    Key-value      | Caching, session management, real-time analytics, feature flags.
    Time-series    | IoT telemetry, stock market analysis, sensor data, observability metrics.
    Graph          | Social networks, fraud detection, recommendation engines, knowledge graphs.
    Vector         | AI/ML applications, recommendation engines, semantic search, image/audio recognition.
    Columnar       | Big data analytics, data warehousing, log analysis, ad targeting.
    Search engines | Log monitoring, security analytics, enterprise search, DevOps observability.
    This table lacks nuance and is intended to give a high-level overview to get started. Definitely read the rest of the article.

    One last technical detail: each database handles transactions differently. ACID (Atomicity, Consistency, Isolation, and Durability) compliance is often pointed to as a gold standard for ensuring reliability, but those guarantees can cost some speed. If you need strict transactional guarantees, look at relational databases and the NoSQL engines that support ACID transactions. If raw performance is more important, then consider alternatives.
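Atomicity is the easiest of the four ACID properties to see in code. Below is a minimal sketch, again with Python's built-in sqlite3 module; the accounts table and the transfer function are hypothetical names invented for this example. Two updates run in one transaction, so either both are applied or neither is.

```python
import sqlite3

def transfer(conn, from_id, to_id, amount):
    """Move money between two accounts as a single all-or-nothing transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, to_id),
        )
        # If this debit violates the CHECK constraint, the credit above is undone too.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, from_id),
        )

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)  # succeeds: balances become 70 and 80

try:
    transfer(conn, 1, 2, 500)  # would overdraw account 1, so nothing is applied
except sqlite3.IntegrityError:
    pass

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [(1, 70), (2, 80)]
```

Without the transaction, the failed transfer would have left account 2 with money that never left account 1.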

    Managed and self-hosted databases

    I mentioned earlier that databases are tools, and you might use several tools to get the job done. These tools are not simple hammers and screwdrivers, though. Each type of database comes with specialized knowledge on how to use it - but keep reading because there is good news coming.

    There are several quality options for each database type, but you also need to understand the limitations and assumptions of your choice. Then, you have to know how to set it up and - most importantly - secure it. This process can be a headache, so many cloud services, including our own platform, offer solutions that make it easier. These are called managed databases.

    Traditionally, managed databases did not exist. Self-hosting was the only option, meaning you would have to learn how to set up the database on a server and perform maintenance like version upgrades. You would also need to know how to manage a server.

    Self-hosted deployments can be done on a bare-metal server, a virtual machine, or a container. If you understood the terms in that sentence without looking at the appendix, you can probably comfortably self-host your database and you likely skipped this section altogether. If you didn’t skip this section, you’re probably deciding on whether or not managed databases are right for your project.

    Are managed databases right for your project?

    Above are the logos for some of the popular managed database services in early 2025. From left to right, you’ll see Neon (relational), Pinecone (vector), TimescaleDB (relational/time-series), Redis Cloud (key-value), BigQuery (columnar), and MongoDB Atlas (document). These are only some of the choices, and they all exist so that you do not have to worry nearly as much about the maintenance and security of your database. Many managed database services run on a usage-based pricing model, so you simply set it up, start storing data, and you pay for what you use like a utility. As long as you pay the utility bill, the lights stay on.

    While there are many benefits to managed databases, they may not be the right choice for your project. If you are part of a skilled team that has experience with servers and databases, you can likely handle this work internally for less - especially if you have servers available to self-host it on.

    There may also be compliance requirements such as government regulations, HIPAA, GDPR, PCI, or SOC 2. Managed database services often reserve advanced compliance offerings for enterprise customers, so you could be paying a lot of money when it might make more sense financially to self-host.

    Below is a decision tree and a table to help answer the second step in deciding our choice of database: managed or self-hosted.

    Consideration         | Favor self-hosting       | Favor managed DB     | Favor BaaS
    Cost Predictability   | ❌ High upfront          | ✅ Pay-as-you-go     | ✅ Low initial cost
    Operational Overhead  | ❌ High                  | ✅ Lower             | ✅ Lowest
    Latency & Residency   | ✅ Full control          | ✅ Regional options  | ➖ May have restrictions
    Scalability           | ✅ If well-designed      | ✅ Automatic scaling | ✅ Best for startups
    Vendor Lock-in        | ✅ No lock-in            | ➖ Some lock-in      | ➖ Can be high
    Security & Compliance | ✅ Full control          | ✅ Compliant services | ❌ Least customization
    Time-to-market        | ❌ Slowest               | ✅ Faster            | ✅ Fastest
    Disaster Recovery     | ❌ Requires manual setup | ✅ Built-in options  | ✅ Handled by provider

    What about backend-as-a-service (BaaS) platforms?

    It would be hypocritical for me to not mention these since IOTEA is a BaaS platform. With these platforms, you get a variety of features (including a managed database) built into one single application. Firebase was one of the first services of this kind, with authentication, a managed database, edge functions, and messaging wrapped neatly into one dashboard.

    Others have popped up in the last several years. Supabase, AWS Amplify, Appwrite, and Nhost (to name a few) are all very popular options for building apps quickly. A managed database is at the core of each of these platforms, usually with an easy-to-use library that lets web and mobile developers access the data quickly. The idea is not just to host your data easily, but to put it to work quickly.

    If you are building a web or mobile app that can utilize the additional features on top of the managed database, this is an option that can quickly accelerate your project so that you can build, launch, and ship faster. There’s a reason why BaaS and workflow automation tools like Zapier or Make are very successful ideas: they lower the barrier to entry and reduce the amount of time it takes to deliver solutions.

    Understanding options

    Whether you’ve decided on self-hosting, a managed database, or a backend-as-a-service (BaaS) platform, you’re ultimately choosing a specific underlying database technology. There are dozens of mainstream choices and many underground ones, and each has certain benefits and implications. I’ve made another diagram to quickly display all of the core database technologies grouped by the primary category of database they belong to.

    There will be some options missing from this image, especially as this article ages. I apologize if you worked on an incredible piece of technology and it’s not in this image. It’s a great starting point, though!

    Multi-model databases

    Before looking at options, it’s important to know that the lines between database types are becoming blurred as the years go on and each of the teams working on these database engines becomes smarter. For example, MySQL added a JSON column type and Postgres added JSON and JSONB column types as a response to MongoDB rising in popularity in the early 2010s. These columns add efficient document database capabilities to these relational databases, even if they are not fully optimized for all use cases that MongoDB excels at.

    Postgres also has extensions like TimescaleDB available so that you can use a single database for relational, document, key-value, and time-series data. This is not only convenient, it’s powerful since you can build single queries that join relational data with time-series data instead of querying separate systems. Databases that support multiple types are referred to as multi-model databases.
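Here is a rough sketch of what "a single query that joins relational data with time-series data" looks like. It uses Python's built-in sqlite3 module as a stand-in for Postgres with TimescaleDB, and the devices and readings tables are hypothetical. SQLite's strftime does the hourly grouping here; TimescaleDB offers a time_bucket function for this kind of grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Relational data: one row per device.
CREATE TABLE devices (
    id INTEGER PRIMARY KEY,
    location TEXT NOT NULL
);
-- Time-series data: many timestamped rows per device.
CREATE TABLE readings (
    device_id INTEGER NOT NULL REFERENCES devices(id),
    recorded_at TEXT NOT NULL,
    temperature REAL NOT NULL
);
""")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [(1, "greenhouse"), (2, "warehouse")],
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2025-02-13 09:05:00", 20.0),
        (1, "2025-02-13 09:35:00", 22.0),
        (1, "2025-02-13 10:10:00", 25.0),
        (2, "2025-02-13 09:20:00", 15.0),
    ],
)

# One query: join the device metadata, bucket readings by hour, average each bucket.
rows = conn.execute("""
    SELECT d.location,
           strftime('%Y-%m-%d %H:00', r.recorded_at) AS hour,
           AVG(r.temperature) AS avg_temperature
    FROM readings AS r
    JOIN devices AS d ON d.id = r.device_id
    GROUP BY d.location, hour
    ORDER BY d.location, hour
""").fetchall()

for row in rows:
    print(row)
```

With two separate systems, the same result would take two queries plus application code to stitch them together.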

    Choosing an option

    I could (and will eventually) go on and on about nuances and use cases, but this article is getting long. You want to pick an option and run with it because you have a brilliant idea that needs to ship ASAP. Here’s a high-level breakdown of some of the pros and cons of each option, starting with relational databases.

    Keep in mind that these options are going to be less important if you’ve decided that managed databases are for your team and project. Some of the options will be fully managed options, but I did my best to keep the following lists to unique database technologies.
  • Oracle
    • Use this for large enterprise or Fortune 500 company projects
    • Use this if you need 24/7 support for your millions of users
    • Do not use this if you don’t have large amounts of money
  • MySQL
    • Use this for small projects to large corporate projects with many users
    • Do not use this if your team has more experience with Postgres
    • Open-source
  • Postgres
    • Use this for small projects to large corporate projects with many users
    • Do not use this if your team has more experience with MySQL
    • Open-source
  • Microsoft SQL Server
    • Use this if you want to invest in the Microsoft (Azure) ecosystem
    • Do not use this if you don’t have money
    • Do not use this if you aren’t storing data on-prem
    • Do not use this if you want to avoid vendor lock-in
  • MariaDB
    • Use this if you like MySQL
    • Use this if MySQL ever decides to go closed-source
    • Don’t use this if you already chose Postgres
    • Open-source, community-led fork of MySQL
  • SQLite
    • Use this if you need a relational database with hardware constraints (such as embedded devices or smartphones)
    • Use this if you just want a simple relational database, darnit
    • Open-source
  • CockroachDB
    • Use this for distributed databases that can scale globally
    • Source-available (its license is no longer an open-source one, so check the terms before committing)
Next, let’s take a look at document databases:

  • MongoDB
    • Use this as the default option for a document database
    • Use this if you plan on self-hosting
    • Source-available under the SSPL (not an OSI-approved open-source license)
  • Couchbase
    • Use this if you need more performance than MongoDB in certain use cases (more on use cases later)
  • AWS DynamoDB
    • Use this if you want to invest in the AWS ecosystem
    • Use this if you want a fully managed option
  • Azure Cosmos DB
    • Use this if you want to invest in the Microsoft (Azure) ecosystem
    • Use this if you want a fully managed option
…and key-value stores:

  • Redis
    • Use this as the default option for a key-value store
    • Use this if you need the fastest possible option
    • Use this if you want to easily send pub/sub messages
    • Open-source
  • AWS DynamoDB
    • Use this if you want to invest in the AWS ecosystem
    • Use this if you want a fully managed option
    • Use this if you want to react to data changes through DynamoDB Streams
  • Azure Cosmos DB
    • Use this if you want to invest in the Microsoft (Azure) ecosystem
    • Use this if you want a fully managed option
  • A hashmap in your codebase
    • Use this if you know how to write code
    • Use this if you don’t need a centralized location to store data
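The "hashmap in your codebase" option can be as small as the sketch below: a Python dictionary with an expiry time per key, which covers the classic caching use case when a single process is all you need. The TTLCache class and its method names are made up for this example and do not come from any library.

```python
import time

class TTLCache:
    """A tiny in-process key-value store whose entries expire after a fixed time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._store = {}      # key -> (expires_at, value)

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and behave as if it was never stored.
            del self._store[key]
            return default
        return value

# Usage with a fake clock so the example runs instantly.
now = [0.0]
cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
cache.set("session:42", {"user": "michael"})
fresh = cache.get("session:42")      # still within 10 seconds
now[0] = 11.0                        # pretend 11 seconds have passed
expired = cache.get("session:42")    # entry has expired
print(fresh, expired)
```

The moment a second server or process needs the same values, this stops working and a centralized store such as Redis becomes the better fit.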
…time-series:

  • InfluxDB
    • Use this as the default, battle-tested option for time-series data
  • TimescaleDB
    • Use this if you are already (or planning on) using Postgres
    • Use this if you need to join time-series data with relational data
  • KDB
    • Use this if your project is in finance or trading
  • Graphite
    • Use this if your team prefers it
  • QuestDB
    • Use this if you want raw performance
  • Prometheus
    • Use this for short-term storage, not long-term historical analysis
…graph:

  • Neo4j
    • Use this as the default, battle-tested option for graph data
  • ArangoDB
    • Use this if you like the idea of using a single multi-model database
  • Azure Cosmos DB
    • Use this if you want to invest in the Microsoft (Azure) ecosystem
    • Use this if you want a fully managed option
…columnar (these are for nuanced use cases and require additional research):

  • Apache Cassandra
  • ClickHouse
  • Vertica (OpenText)
  • Databricks
  • Snowflake
  • Apache Druid
Finally, we have vector databases:

    These are very different from the other databases. Instead of querying directly for specific (groups of) data, they use Approximate Nearest Neighbor (ANN) searches. Experiment before committing.
  • Pinecone
  • Weaviate
  • Qdrant
  • Elasticsearch (vector search)
    • Use this if you’re already using Elasticsearch as a search engine
  • Couchbase (vector search)
    • Use this if you’re already using Couchbase
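To show what a similarity search is doing, here is a minimal sketch in plain Python. It performs an exact nearest-neighbor search by comparing the query against every stored vector, which is the result that Approximate Nearest Neighbor (ANN) indexes try to approximate much faster on large data sets. The three-dimensional vectors and their labels are invented for illustration; real embeddings usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, items, k=2):
    """Exact search: score every item, then keep the k most similar."""
    ranked = sorted(
        items.items(),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical embeddings: similar concepts get similar vectors.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}

top_two = nearest([1.0, 0.0, 0.0], embeddings, k=2)
print(top_two)  # ['cat', 'kitten']
```

Notice that the query never asks for "cat" by name. It asks for whatever is closest, which is why results from a vector database should be treated as ranked suggestions, not exact matches.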
Gotcha! I didn’t forget search engines:

    These are also very different from other databases. Typically these are used alongside another database, like a relational database, and simply off-load heavy full-text search operations at scale.
  • Elasticsearch
    • Use this as the default, battle-tested search engine
  • Splunk
    • Use this if you want plug-n-play log analytics
  • Algolia
    • Use this if you want a fully-managed solution and want to focus on code
  • OpenSearch
    • Use this if you want an open-source alternative to Elasticsearch
  • Meilisearch
    • Use this if you want an open-source alternative to Elasticsearch
  • Typesense
    • Use this if you want an open-source alternative to Algolia
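The core data structure behind full-text search is the inverted index: a map from each word to the documents that contain it. The sketch below is a simplified illustration in plain Python with made-up log lines. It is not how any particular engine is implemented, and it leaves out ranking, stemming, and typo tolerance.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

# Hypothetical log lines keyed by id.
documents = {
    1: "Disk usage is high on db-01",
    2: "User login failed: high error rate",
    3: "Disk replaced on db-02",
}
index = build_index(documents)

print(search(index, "disk"))       # {1, 3}
print(search(index, "disk high"))  # {1}
```

Looking up a word is a dictionary access instead of a scan over every document, which is why search engines can stay fast while a relational LIKE query over the same text slows down.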
Performance… and other factors

    Hot take alert. The smartest database engineers have spent countless hours working on performance over decades, and errant bloggers have always followed closely behind with countless write-ups, bar charts, and code repositories comparing the various options. This is often a trap.

    The problem with performance is that it’s often reductionistic. While it may be fair to say that MongoDB is going to be more performant (and potentially more scalable) at handling JSON-like document data in more use cases than a multi-model database, that does not mean that it's the most effective option for your project. This is especially true if you don’t plan on storing more than tens of millions of records in a single table or collection.

    To make an analogy, it’s like dating only based on looks. Of course you will date who you’re attracted to - but there are without a doubt many other factors in your approach and how you date, let alone the choice to stay in a relationship.

    For databases, performance metrics are similar to looks because they can be a very compelling first impression. If I look at a simple bar chart in a blog post that shows me a bigger bar, the first thought is: “that’s the superior technology”. Over the course of my career, I’ve seen all of these factors play a role in the selection process:

  • Features for your use case (query language, multi-model)
  • Team familiarity, experience, and preference
  • Budget
  • (Ease of) Scalability
    • Consider the self-host, managed, and BaaS options
  • Community support & popularity
  • Ecosystem (Microsoft, AWS, your codebase languages)
    • More mature databases will come with more tools, libraries, and hosting providers
  • Backup and replication strategies
  • How transactions are handled (ACID compliance)
  • Performance
tl;dr: Don’t get caught in the trap of performance metrics. Go with the option that fits your use case, team, and scale of data. Performance matters for database engineers, but your project isn’t launching if you’re spending time worrying about benchmarks.

    Choosing your database

    You’re now armed with fundamental knowledge on database selection. The landscape is easier to navigate with what you now know. I realize, though, that I haven’t just said “pick MySQL”. I have failed to give you a direct, opinionated answer to your original question: how am I going to store my data, gosh darnit?

    This is because there are nuances in specific use cases. Plus, there is a ton of information out there comparing self-hosted Postgres to AWS RDS, or InfluxDB to TimescaleDB, or Firebase to Supabase. This information can help guide you to a final answer. We’re also not sponsored by any of the amazing technologies that we’ve talked about in this article, so we can’t just point you to someone who offered us money.

    One problem remains… what if it doesn’t work out? How do you quickly pivot? That’s where our platform shines. Not only can you quickly spin up many of these databases, and not only are they fully managed to help you and your team save time, you can put your data to work immediately with our channels. Heck, you can even use them simultaneously for redundancy or to speed along the process.

    If you build a prototype with one database and it doesn’t work out, no problem: simply swap in a different database. Our platform abstracts so much of the specialized knowledge needed to run, manage, and build working software that stores and retrieves data.

    Regardless of whether or not you want to choose our managed databases to build with, you can connect any accessible database to our platform. Bring your own MongoDB instance, or connect your TimescaleDB Cloud instance; the point of our platform is to quickly connect, deploy, and scale without any hassle or specialized knowledge.

    If you’re interested, head over to https://iotea.com to learn more and sign up for our upcoming beta launch.

    Appendix

  • Centralization: Keeping data in one location, such as a server (cluster). Opposite of distributed.
  • Distributed: Storing data across multiple locations, such as edge devices (phone, web browser, IoT device) or services (API) in a network.
  • Scalability: How the system grows as the amount of data and traffic increase. Can be vertical (faster servers) or horizontal (more servers in a cluster).
  • Security: Ensuring data stays unreadable to those who do not have access to see it. User permissions and encryption are two primary ways to secure data.
  • Relational databases: Store data in tables with rows and columns, similar to a spreadsheet. They use SQL as a query language and utilize schemas to define relations and structure. Sometimes referred to as “RDBMS”.
  • Document databases: Store data in JSON-like documents of key-value pairs. They do not enforce strict schemas like relational databases, making them more flexible and adaptable.
  • Key-value stores: Store data in simple key-value pairs. Extremely fast because of the simple storage mechanism. Typically used as a central cache in a distributed service network.
  • Vector databases: A new type of database that is optimized for AI/ML, recommendation engines, and semantic search. Instead of matching exact values, it stores data as numeric vectors (embeddings) and uses an Approximate Nearest Neighbor (ANN) algorithm to return the entries that are most similar to the query.
  • Graph databases: Store data with entities and relationships. Ideal for complex, interconnected data such as a social media network with friends, posts, reactions, and comments.
  • Search engine: Store massive amounts of text data to parse, search, and analyze.
  • ACID Compliance: Guarantees that database transactions are reliable and follow Atomicity, Consistency, Isolation, and Durability. Standard in relational databases and offered by some NoSQL engines.
  • Bare-metal server: A physical machine with an operating system. Software is installed directly onto the machine, like your personal computer.
  • Virtual machine: An isolated environment with its own operating system, created by specialized software (a hypervisor) that divides a physical server's hardware between environments that can be virtually created, turned off, and destroyed. Resources like CPU cores, memory, and disk space are typically allocated up front.
  • Container: Similar to virtual machines, but they use a container runtime instead of a hypervisor and share the host's operating system kernel. Container runtimes can dynamically allocate resources instead of assigning them statically, allowing for easier scalability and management than virtual machines.
  • Query Language: The syntax used to store and retrieve data from your database.
  • SQL: Structured Query Language, originally used in relational databases but also used in other databases to offer a familiar query language. Databases may have specific or unique functions or additions to the standard SQL syntax.
  • NoSQL: Not-only-SQL. MongoDB was one of the early popular NoSQL databases, as it stored data in a way that did not enforce a strict schema or tabular relations. Some NoSQL databases, such as ClickHouse, utilize a variation of SQL syntax to maintain familiarity while storing data in an entirely different way.
