I was reading a post recently about Red Hat removing MongoDB support from Satellite (and yes, some folks say it is because of the license changes). It made me think about how often over the last few years I’ve seen post after angry post about how terrible MongoDB is and how no one should ever use it. However, in all this time, MongoDB has become a much more mature product. So what happened? Did all of the hate truly come from mistakes made in the early implementation/marketing of MongoDB? Or is the problem that people are blaming MongoDB for their own lack of effort in evaluating whether it was a good fit?
If you’re finding yourself at this point foaming at the mouth because it appears that I’m defending MongoDB, please jump to the end of this post and read my disclaimer.
I’ve been working with software for more years at this point than I like to admit, but even then I’ve only experienced a tiny fraction of the trends that have buffeted our industry. I’ve witnessed the rise of 4GL, AOP, Agile, SOA, Web 2.0, AJAX, blockchain… the list never ends. Every year there are new trends that pop up in the world of software engineering. Some fizzle quickly, while others fundamentally change the way software development is performed.
With any new innovation that starts to gain traction, you’re going to see a general excitement appear around it as people start to pile on board, or see the buzz being generated by others and decide that they too want to get in on the action. This process was codified by Gartner with its hype cycle, which (while controversial) is a decent approximation of what happens with technologies that are eventually found to be valuable.
But every once in a while, a new innovation appears (or in this case reappears) that is driven by one particular implementation of that innovation. In the case of the NoSQL movement, it was driven heavily by the appearance, and meteoric rise, of MongoDB. MongoDB didn’t start the movement; it was the data challenges at large internet companies that really drove the return to non-relational databases. Projects like Google’s Bigtable and Facebook’s Cassandra kicked it off, but MongoDB was the most visible and accessible open source NoSQL database available to most developers.
Aside: You might be thinking right now that I’m conflating document oriented databases with columnar databases, key/value stores, or any of the numerous other datastore types that fall under the generic NoSQL banner. And you are correct. But this conflation was rampant at the time. Everyone was jumping into the NoSQL craze, and they knew that they absolutely needed NoSQL, but they didn’t really understand the different technologies involved. To many people, MongoDB was NoSQL.
And developers pounced on it. The idea of a schema-less database that used JSON-like documents, could run across multiple servers easily, and magically scaled to meet any challenge was quite alluring. Around 2014 or so, it seemed like everywhere you looked someone was implementing MongoDB in a place where just a year earlier a relational database like MySQL, Postgres, or SQL Server would have been used. When asked why Mongo was being used, people gave responses ranging from the banal “it’s web scale” to the more thoughtful “my data is very loosely structured and fits well into a schema-less database”.
It is important to remember that MongoDB, and document oriented databases in general, solve a number of problems people had with traditional relational databases:
Strict Schema – With a relational database, if your data model was dynamically shaped, you were forced to either create a bunch of random “miscellaneous” data columns, shove the data into a blob, or use an entity-attribute-value (EAV) setup… all of which had significant downsides.
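To make the EAV downside concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `products` and `product_attributes` tables and their attribute names are hypothetical. It shows two common pain points: every value lands in a single text column, and reading one logical record back means pivoting rows into columns by hand.

```python
import sqlite3

# In-memory database; the products/product_attributes tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- EAV: every "dynamic" field becomes a row; all values share one TEXT column.
    CREATE TABLE product_attributes (
        product_id INTEGER REFERENCES products(id),
        attribute  TEXT NOT NULL,
        value      TEXT
    );
""")
conn.execute("INSERT INTO products VALUES (1, 'Desk lamp')")
conn.executemany(
    "INSERT INTO product_attributes VALUES (?, ?, ?)",
    [(1, "color", "black"), (1, "wattage", "60"), (1, "dimmable", "true")],
)

# Reassembling one logical record means pivoting rows back into columns by hand.
row = conn.execute("""
    SELECT p.name,
           MAX(CASE WHEN a.attribute = 'color'   THEN a.value END) AS color,
           MAX(CASE WHEN a.attribute = 'wattage' THEN a.value END) AS wattage
    FROM products p
    JOIN product_attributes a ON a.product_id = p.id
    WHERE p.id = 1
    GROUP BY p.id
""").fetchone()
print(row)  # ('Desk lamp', 'black', '60'): wattage comes back as text, not a number
```

Every new attribute you want to read means another `CASE` branch, and type checks, constraints, and indexes on individual attributes are largely lost.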
Difficult Scalability – With a relational database, data that was too large to fit easily on one server was hard to spread out. MongoDB had built-in mechanisms for this: sharding to partition data across multiple machines, and replica sets to keep redundant copies of it.
Difficult Schema Modifications – No migrations! With a relational database, changing the structure of the database can be a huge challenge (especially once your data gets really big). MongoDB promised to make this dramatically simpler. And it made it soooo easy to get started; you could just keep updating your schema and move really quickly.
Write Performance – MongoDB’s performance was good, especially when configured in certain ways. Its early out-of-the-box write configuration, in which drivers didn’t wait for the server to acknowledge writes, was one of the big things it was criticized for, but it allowed MongoDB to put up some impressive performance numbers.
The potential benefits MongoDB provided were huge, especially for people facing certain classes of problems. Reading the list above without context, or experience, would lead you to believe that it truly was a game-changer when it came to database systems. The only problem was that the benefits listed above came with a number of caveats, some of which I’ve listed below.
To be fair, no one at 10gen/MongoDB Inc. would claim the items below aren’t true; they are just tradeoffs.
Loss of transactions – Transactions are a core feature of many relational databases (no, not all, but most). A transaction lets you perform multiple operations atomically, so you can ensure that your data stays consistent. Sure, with a NoSQL database you can get atomic operations within a single document, or you can use tactics like two-phase commits to get transaction-like semantics. But the point is that you have to do this work yourself… and it can be challenging and labor intensive to get right. Often you don’t realize how much you’re giving up here until you start seeing data in your database get into invalid states because you couldn’t guarantee the atomicity of operations. Note: As many people have let me know, MongoDB 4.0 introduced multi-document transactions in 2018, but they come with a number of limitations. So as this post suggests, please evaluate whether they will work for your needs.
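A small sketch of what losing multi-document atomicity looks like in practice, assuming a simple money transfer between two records. The document side is a plain Python dict standing in for two separately updated documents (it is not MongoDB’s API), and `SimulatedCrash` is a hypothetical exception representing a failure between the two writes. The relational side uses Python’s built-in `sqlite3`, where the transaction rolls back the partial update.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Hypothetical stand-in for a process dying between two writes."""


def transfer_documents(accounts, src, dst, amount, crash=False):
    # Two independent single-record updates: each is atomic on its own,
    # but nothing ties them together.
    accounts[src]["balance"] -= amount
    if crash:
        raise SimulatedCrash()
    accounts[dst]["balance"] += amount


def transfer_sql(conn, src, dst, amount, crash=False):
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if crash:
            raise SimulatedCrash()
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


# Document-style: the crash leaves money missing from the system.
docs = {"alice": {"balance": 100}, "bob": {"balance": 0}}
try:
    transfer_documents(docs, "alice", "bob", 40, crash=True)
except SimulatedCrash:
    pass
print(docs)  # alice has 60, bob still has 0

# Relational: the same crash rolls back the first UPDATE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
try:
    transfer_sql(conn, "alice", "bob", 40, crash=True)
except SimulatedCrash:
    pass
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [('alice', 100), ('bob', 0)]
```

The dict version can be made safe (two-phase commits, or MongoDB 4.0+ transactions within their limits), but that safety is work you take on explicitly rather than something the database gives you by default.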
Loss of relational integrity (foreign keys) – If your data has relationships, then something has to enforce them. Almost all data has some kind of relationships, and if your database doesn’t enforce them, then your application is going to have to. Having the database enforce these relationships can offload a lot of work from your application, and therefore from your engineers.
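As an illustration, assuming a hypothetical `customers`/`orders` schema, SQLite (via Python’s `sqlite3`) can reject an order that points at a customer that doesn’t exist. Without that enforcement, the equivalent check moves into application code; `insert_order` below is a hypothetical helper over plain dicts, not a real driver API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.commit()

try:
    conn.execute("INSERT INTO orders VALUES (10, 999)")  # there is no customer 999
    db_rejected_orphan = False
except sqlite3.IntegrityError:
    db_rejected_orphan = True
print(db_rejected_orphan)  # True


# Without database enforcement, every write path in the app needs a check like
# this (hypothetical helper; plain dicts stand in for collections).
def insert_order(customers, orders, order):
    if order["customer_id"] not in customers:
        raise ValueError(f"unknown customer {order['customer_id']}")
    orders[order["id"]] = order


customers_docs = {1: {"name": "Ada"}}
orders_docs = {}
insert_order(customers_docs, orders_docs, {"id": 10, "customer_id": 1})  # accepted
```

The application-side check is also racy under concurrent writes: another process could delete the customer between the check and the insert, which is exactly the kind of gap a database-enforced constraint closes.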
Lack of ability to enforce data structure – Strong schemas might be a pain in the ass sometimes, but if you leverage them appropriately, they are a powerful mechanism for ensuring that your data is well structured and in the shape you expect. Document databases like MongoDB allow an incredible amount of flexibility around the schema, but that flexibility shifts the responsibility for keeping the data clean onto the maintainer. If you don’t put in that effort, you end up putting a lot of code into your application to account for data that might not be in the shape you expect. As we often like to say at Simple Thread… your app is going to be rewritten one day, but your data will live forever. Note: MongoDB supports schema validation, which is useful, but it doesn’t provide the same guarantees that you get in a relational database. Primarily, adding or modifying schema validation doesn’t affect any existing data in the collection; it is up to you to make sure you’re updating your data to match the new schema. So whether or not this is sufficient for your needs is up to you to determine.
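The schema-validation caveat in the note above can be modeled with a toy, hypothetical in-memory collection (`ToyCollection` is not a MongoDB class). It only captures the behavior described here: a validator is checked when documents are written, so documents stored before the validator was added are never re-examined. In MongoDB itself, validation is configured through collection options such as a `$jsonSchema` validator along with `validationLevel` and `validationAction`; this sketch models inserts only.

```python
from typing import Callable, List, Optional


class ToyCollection:
    """Hypothetical in-memory stand-in for a document collection.

    The validator runs on writes only; documents stored before it was
    added are never re-checked.
    """

    def __init__(self) -> None:
        self.docs: List[dict] = []
        self.validator: Optional[Callable[[dict], bool]] = None

    def insert(self, doc: dict) -> None:
        if self.validator is not None and not self.validator(doc):
            raise ValueError(f"document failed validation: {doc}")
        self.docs.append(doc)

    def set_validator(self, validator: Callable[[dict], bool]) -> None:
        # Existing documents are left untouched, valid or not.
        self.validator = validator


users = ToyCollection()
users.insert({"name": "Ada"})  # written before any rules existed
users.set_validator(lambda d: isinstance(d.get("email"), str))
users.insert({"name": "Grace", "email": "grace@example.com"})

# The old document still lacks "email", so readers must handle both shapes.
missing = [d["name"] for d in users.docs if "email" not in d]
print(missing)  # ['Ada']
```

This is where the “lot of code in your application” tends to come from: every reader of `users` has to cope with documents that predate the rule until someone backfills them.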
Custom query language/Loss of tooling ecosystem – SQL was an absolute revolution when it came out, and nothing has displaced it since then. SQL is an incredibly powerful language, but one that can also be challenging. Having to query a database using a custom query language composed of JSON snippets would be considered a big step backwards by folks experienced with SQL. There is also a whole world of tools that interoperate with SQL databases, everything from IDEs to reporting tools. Moving to a database that doesn’t support SQL means you can’t use most of these tools, or you have to find a way to get your data into a SQL database so that these tools can be used, and this can be harder than you think.
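For a feel of the difference, here is the same question (total paid amount per customer) asked in SQL against an in-memory SQLite table, and written as a MongoDB-style aggregation pipeline. The table, field names, and data are hypothetical. Only the SQL is executed; the pipeline is just built as a Python list, which a driver such as PyMongo would pass to `collection.aggregate(pipeline)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ada", "paid", 30), ("ada", "paid", 20), ("bob", "paid", 5), ("bob", "void", 99)],
)

# SQL: total paid per customer.
sql_totals = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer
    ORDER BY customer
""").fetchall()
print(sql_totals)  # [('ada', 50), ('bob', 5)]

# Roughly the same question as an aggregation pipeline: a list of JSON-like
# stages. It is only constructed here, not run against a server.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"_id": 1}},
]
```

Neither form is hard once you know it, but the SQL version is the one that IDEs, BI tools, and most of your team already understand.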
Many developers who reached for MongoDB didn’t deeply understand the tradeoffs they were making, and they often dove in head-first by using it as the primary datastore for their applications. This meant that it was often incredibly costly to go back on this decision.
What could have been done differently?
Not everyone jumped in head-first and slammed into the bottom of the deep end. But enough did that there will be projects for years to come removing MongoDB from places where it just didn’t fit. If these organizations had taken a bit of time to think methodically about the technology choices they were making, it is likely that many of them wouldn’t have made the decisions they did.
So how do you decide what technology makes sense for your use case? There have been a few attempts at creating systematic frameworks for evaluating technologies such as “A Framework for Technology Introduction in Software Organizations” and “A Framework for Evaluating Software Technologies”, but I don’t think it needs to be that complicated.
Many technologies can be reasonably evaluated by asking just two main questions, but the challenge is finding individuals who can responsibly answer them, dedicate time to answering them, and answer them without bias.
Question 1: What problems am I trying to solve?
If you’re not facing some kind of problem, you don’t need a new tool. Full stop. Don’t look for solutions and then back into problems. If a new technology doesn’t solve your problem significantly better than your existing technology, then your decision is made. If you’re considering using this technology because you’ve seen others using it, it might be useful to think about what problems they are facing, and ask yourself if you’re facing the same problems. It is often easy to reach for a technology because you see another company using it; the difficulty is in determining whether or not you’re facing the same challenges.
Question 2: What am I giving up?
This is definitely the harder of the two questions to answer, because you have to dig in and have a good understanding of both the old technology and the new technology. Sometimes you can’t really understand a new technology until you’ve built something with it, or have access to someone who has spent significant time with the technology.
If you don’t have either, then consider the smallest investment you can make to determine whether this tool is valuable. And if you make that investment, how hard would it be to undo the decision?
Humans Always Messing Things Up
One thing you’ll have to keep in mind is that you’re going to be fighting human nature when you’re trying to answer these questions as objectively as possible. There are a number of cognitive biases that must be overcome in order to effectively evaluate a technology; here are just a few:
Bandwagon effect – Everyone knows this, and yet it is still hard to fight against. Just make sure that you’re choosing a technology because it solves real needs for you, not because the cool kids are doing it.
Mere newness bias – Many software developers tend to undervalue technologies they have worked with for a long time, and overvalue the benefits of a new technology. This isn’t specific to software engineers; everyone has the tendency to do this.
Feature-positive effect – We tend to see what is present, and overlook what isn’t there. This can wreak havoc when working in concert with the “Mere newness bias”, since not only are you inherently putting more value on the new technology, but you’re also overlooking the gaps in the new tech.
Looking at things objectively is a challenge, but understanding the biases that may affect you will help you make more rational decisions.
When a new innovation appears (or reappears), we need to be very careful in answering two questions:
- Does this tool solve a real problem for us?
- Do we thoroughly understand the tradeoffs?
If you can’t confidently answer those two questions, take a few steps back and reevaluate.
So was MongoDB ever the right choice? Yes, of course it was, but like most things in engineering, it depends. Many teams that answered those two questions found value, and continue to find value, in MongoDB. For those who didn’t, hopefully they learned a valuable, not-too-painful lesson about navigating the hype cycle.
I want to clarify that I neither love nor hate MongoDB. I simply haven’t run into many problems that I thought it would be the best fit for. I know that 10gen/MongoDB Inc. didn’t do themselves any favors early on by setting unsafe defaults and promoting MongoDB everywhere (especially at hackathons) as the be-all and end-all solution for every data need. Yes, these were probably bad decisions, but I think they back up the point I’m making here, because these were issues that could be uncovered very quickly with even a cursory evaluation of the technology.