Disclaimer: While I’ve experimented with MongoDB, I haven’t deployed it to production. These are my honest reservations that keep me from deploying a production app in Mongo. I welcome comments from those who have spent time in the trenches.
There are features I really enjoy about MongoDB. I love the hierarchical data structure. I love making a single, fast call to get a chunk of JSON that’s stored natively in the shape I need. And I enjoy rapidly prototyping an idea by slapping JSON into Mongo until the ideal schema becomes more apparent.
But there’s a long list of concerns that keep me from choosing it for production.
Schema = Protection
In an RDBMS, the schema dictates data types, nullable fields, maximum lengths, foreign key constraints, etc. This clarifies requirements. It simplifies application code. It also means when someone inevitably sends me a dataset to import, the DB won’t accept malformed data. In contrast, schemaless databases don’t protect you from bad data by default. Sure, your application can (and should) enforce rules. When working with MongoDB, the schema can be enforced on the application side via Mongoose. However, there’s a gaping hole in this approach: developers can still insert and update data directly in the database, bypassing the application entirely. Honestly, have you ever seen a DB where all data was entered and manipulated solely through the application? Ad-hoc data manipulation will happen, so the database should protect itself.
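To make the "database protects itself" idea concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module as a stand-in for any RDBMS. The customers and orders tables, their columns, and the try_insert helper are hypothetical names invented for illustration. The point is that the rejections come from the database, so they apply equally to an application, an import script, or a developer typing SQL by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL CHECK (length(email) <= 254)
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
""")

# Commit the one valid customer so later rollbacks cannot undo it.
with conn:
    conn.execute("INSERT INTO customers (id, email) VALUES (1, 'ann@example.com')")


def try_insert(sql, params):
    """Return None on success, or the database's rejection message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


rejections = [
    # Missing required value
    try_insert("INSERT INTO customers (id, email) VALUES (?, ?)", (2, None)),
    # Value outside the allowed range
    try_insert("INSERT INTO orders (customer_id, quantity) VALUES (?, ?)", (1, 0)),
    # Reference to a customer that does not exist
    try_insert("INSERT INTO orders (customer_id, quantity) VALUES (?, ?)", (99, 1)),
]
accepted = try_insert("INSERT INTO orders (customer_id, quantity) VALUES (?, ?)", (1, 3))

for message in rejections:
    print("rejected:", message)
```

Only the last insert is stored. The three malformed rows never reach the table, no matter which client sent them.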
[Image: Look guys, the data fits perfectly.]
A strategically designed database won’t allow bad data in. Well-designed schemas work like a gated check-in. You’re forced to normalize and clean up your data before import, which is the most logical time to do so. Yes, I recognize MongoDB’s design assumes the lack of a schema is a selling point, but in my 15+ years of application development, I’ve yet to build an app where the structure of the data didn’t matter. A lot. I agree with Sarah Mei:
Schema flexibility sounds like a great idea, but the only time it’s actually useful is when the structure of your data has no value. If you have an implicit schema — meaning, if there are things you are expecting in that JSON — then MongoDB is the wrong choice.
Complex Ad-hoc Queries
Nearly every app I’ve built eventually requires ad-hoc queries to pull together data for reporting and complex UIs. Sure, Mongo supports querying, but your choice of document structure greatly impacts query performance and feasibility. Mongo is fastest when treated like a simple key/value store for retrieving hierarchical data. Thus, selecting the ideal document structure up front is critical to ensure you can efficiently access the data you need. And as you’ll see below, the right document structure for today often becomes the wrong structure for tomorrow.
Hierarchical Documents Assume Static Requirements
In a traditional RDBMS, your schema models relationships that are highly unlikely to change. The inherent nature of the data guides you toward a logical normalization path that will support ad-hoc queries and avoid repeating data. In a schemaless DB like MongoDB, selecting a logical document structure requires considering, up front, all the ways your data might be used. Making the right call for the long term isn’t easy.
Ironically, Mongo’s schemaless hierarchical document structure seems best suited for static data retrieval requirements. This means the document structure you initially choose can become a real millstone later. What options do I have when the document structure I selected no longer supports the queries I need? This is exactly what happened to Sarah Mei when building an app for Diaspora. Sure, relational databases require you to make a decision about your data structure up-front as well, but the explicit, normalized, and relational nature of that structure makes it more flexible and versatile. Flexibility is key when requirements change. And they always do.
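As a rough illustration of a document shape outliving its original requirement, the sketch below uses plain Python dicts as a stand-in for stored documents (this is not the MongoDB API) and SQLite for the normalized version. The user, post, and comment data is entirely hypothetical. The documents are shaped for "show a user's posts", and then a new question arrives: how many comments has each author written?

```python
import sqlite3
from collections import Counter

# Hypothetical documents shaped for one access pattern: "show a user's posts".
user_docs = [
    {"name": "ann", "posts": [
        {"title": "Hello", "comments": [{"author": "bob"}, {"author": "cy"}]},
    ]},
    {"name": "bob", "posts": [
        {"title": "Schemas", "comments": [{"author": "ann"}, {"author": "cy"}]},
        {"title": "Joins", "comments": [{"author": "cy"}]},
    ]},
]

# New requirement: comment counts per author. The nested shape forces a walk
# through every user and every post just to reach the comments.
nested_counts = Counter(
    comment["author"]
    for user in user_docs
    for post in user["posts"]
    for comment in post["comments"]
)

# The same data normalized: comments are first-class rows, so the new
# question is a single GROUP BY with no restructuring.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id    INTEGER PRIMARY KEY,
        owner TEXT NOT NULL,
        title TEXT NOT NULL
    );
    CREATE TABLE comments (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        author  TEXT NOT NULL
    );
""")
for user in user_docs:
    for post in user["posts"]:
        cur = conn.execute(
            "INSERT INTO posts (owner, title) VALUES (?, ?)",
            (user["name"], post["title"]),
        )
        conn.executemany(
            "INSERT INTO comments (post_id, author) VALUES (?, ?)",
            [(cur.lastrowid, c["author"]) for c in post["comments"]],
        )

relational_counts = dict(
    conn.execute("SELECT author, COUNT(*) FROM comments GROUP BY author")
)
print(relational_counts)
```

Both approaches produce the same counts on this toy data. The difference is that the nested version has to know the full path from user to comment and touch every document, while the relational version asks the comments table directly.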
You Need Two DBs Anyway
For reasons outlined above, you’ll likely find you need two DBs: one for transactions, and another for analytics. Maintaining and syncing two DBs in radically different technologies isn’t trivial. Even Mongo reps acknowledge the common need for separate OLTP and OLAP DBs when working with MongoDB. But if I need two databases, why not simply lean on two relational DBs that are optimized for these two very different roles? Two optimized RDBMSs are likely to offer comparable performance, superior data integrity, and simpler maintenance, since the DBA doesn’t have to learn the intricacies of managing and syncing two radically different DB systems.
Inconsistent Data = Bugs
I’ve never written an app where an inconsistent data structure was considered a feature. Yes, inconsistent data structures are unavoidable in some cases, but when at all possible, I prefer setting explicit requirements and expectations for my data structures. Why? This mindset drives out holes in requirements. Oh, a user can have multiple addresses? A vehicle’s model year may be null or a decimal? Whoa, these things have a big impact and ripple through a system all the way up to the UI design. Without a schema to force these questions, schemaless DBs let such edge cases hide in your data. Edge cases should be addressed as early as possible, and an explicit schema enforces this critical step. The later you find out about edge cases in your data, the greater the impact on your application (and your timelines).
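Here is a small sketch of how the model-year edge cases above tend to surface late. The vehicle documents are hypothetical, and plain Python dicts stand in for whatever a schemaless store would hand back. The naive report assumes every document carries an integer year and fails only when it meets one that doesn't. The audit function finds the same problems deliberately.

```python
# Hypothetical documents, as a schemaless store might return them.
vehicle_docs = [
    {"vin": "A1", "model_year": 2014},
    {"vin": "B2", "model_year": None},    # null year
    {"vin": "C3", "model_year": 2015.5},  # decimal year
    {"vin": "D4"},                        # field missing entirely
    {"vin": "E5", "model_year": "2016"},  # string instead of a number
]


def naive_newest_year(docs):
    """Reporting code that assumes every document has an integer model_year."""
    return max(doc["model_year"] for doc in docs)


def find_schema_violations(docs):
    """Return (vin, problem) pairs for documents that break the implicit schema."""
    problems = []
    for doc in docs:
        if "model_year" not in doc:
            problems.append((doc["vin"], "model_year is missing"))
        elif type(doc["model_year"]) is not int:
            kind = type(doc["model_year"]).__name__
            problems.append((doc["vin"], f"model_year is {kind}, expected int"))
    return problems


# The naive report blows up at runtime, long after the bad data was stored.
try:
    naive_newest_year(vehicle_docs)
    naive_error = None
except (TypeError, KeyError) as exc:
    naive_error = type(exc).__name__

violations = find_schema_violations(vehicle_docs)
for vin, problem in violations:
    print(vin, problem)
```

A `model_year INTEGER NOT NULL` column would have raised each of these questions at design or import time instead of in a report.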
[Image: If this little guy is in my hotel bed, I’d like to know before booking.]
Explicit > Implicit
Even if you embrace the wild west mentality that you don’t need a schema, you still have one. It’s just implicit instead of explicit. And that’s a problem because clean code is about writing code for humans. Explicit schemas convey expectations in a clear, standardized, and centralized manner that humans can easily understand.
The argument here parallels the trade-offs between static and dynamic typing. Dynamic types can help you move faster in the short term, but you lose the ability to lean on the compiler, and the decisions you make about types become implicit instead of explicit. Thus, in a dynamically typed language, your co-workers have to read either comments or tests to determine data types and interfaces. In a statically typed language, the interface and expectations are explicit. Bottom line, you have to convey your assumptions at some point, and a schema is a consistent and logical point to do so. Admittedly, you can define a MongoDB schema in your application code via Mongoose, but doing so is optional and, as discussed above, fails to adequately protect the data from direct manipulation.
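For a sense of what "explicit" buys a reader, here is a sketch of an application-side schema written as a Python dataclass. The Address type and parse_address helper are hypothetical. Python does not enforce type hints at runtime, so the checks are written out by hand. Like a Mongoose schema, this guards only the code paths that call it, not direct writes to the database.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Address:
    """The expectations live in one place that a co-worker can read."""
    street: str
    city: str
    postal_code: str
    unit: Optional[str] = None  # the only field that may be absent


def parse_address(raw: dict) -> Address:
    """Build an Address, rejecting unknown keys, missing fields, and wrong types."""
    unknown = set(raw) - {f.name for f in fields(Address)}
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    address = Address(**raw)  # raises TypeError if a required field is missing
    for f in fields(Address):
        value = getattr(address, f.name)
        if value is None and f.name == "unit":
            continue
        if not isinstance(value, str):
            raise TypeError(f"{f.name} must be a string, got {type(value).__name__}")
    return address


home = parse_address(
    {"street": "1 Main St", "city": "Springfield", "postal_code": "12345"}
)
print(home)
```

Compare this with passing a bare dict around: the dict works right up until someone misspells a key or stores the postal code as a number, and nothing in the code tells the next reader which keys were expected.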
These are my core reservations, but I’d love to be corrected in the comments. If data integrity truly doesn’t matter to you, then Mongo’s hierarchical data structure, schemaless nature, and high performance may make it a great fit for you. But until I build an app where the data structure truly doesn’t matter, I’m sticking with a traditional RDBMS.