Detecting Dangerous MongoDB Queries
Written by Neil Ongkingco
July 10, 2024
📈 Querying MongoDB at scale
Here at Apollo, we use MongoDB as our main database store and Mongoid on Ruby as our interface to the DB. The larger collections that are routinely used by our user-facing features can range from tens of millions to tens of billions of documents, so having good performance is paramount both for good response times and maintaining the stability of the database. While we have a comprehensive test suite and good code review practice, the occasional code change will still get through which would have unexpected performance effects on Mongo.
When working with MongoDB at scale, you’ve likely explored several features to fine-tune query performance. One such tool is using hint
to specify a specific index instead of letting Mongo choose an index when running a query. When done correctly, using hint
can lead to significant gains, particularly when you know the characteristics of your data in advance.
🚨How the dangers creep in
When done correctly, using hint
can lead to significant gains, particularly when you know the characteristics of your data in advance. Choosing the wrong hint
can lead to dramatic decreases in performance, making it a double-edged sword when optimizing queries. This becomes more likely to happen the more complex queries get, and the more often the code and database indexes change.
On more than one occasion a wrongly chosen hint
was the culprit for degraded query performance.
In most cases the mistake was a simple one:
company_by_url = Company.hint(url: 1).where(url: some_url).first
# some code...
company_by_uid = Company.hint(url: 1).where(uid: some_uid).first
In this example, there are 2 queries to the Mongo collection Company
,
- the first one finds companies where
url = some_url
, - the second one finds companies where
uid = some_uid
.
Both have hints, but the problem is that the second one uses the wrong index. While the first query is a lightning-fast index search, the second one is a full collection scan, because while we do have an index on uid
, the hint
forced Mongo to use the wrong index because calling .hint
overrides Mongo’s query planner if there is another matching index. This led to a full collection scan when it couldn’t find the uid
field in the index and caused enough load on the database that affected the site performance as a whole.
Because even such a simple mistake can affect the performance of the database we had to do something to handle it, preferably by detecting it in a pre-production environment.
👮🏼 Enter Static Analysis
One of the solutions we settled on was to perform static analysis on the Ruby code. We run it in the CI/CD workflow to detect such anti-patterns and catch them before they go into production. One of the anti-patterns that we searched for is calls to query methods (where, find_by
, find_or_create_by
) that do not have a field parameter that matches the first field of the index in the hint.
Company.hint(url: 1).where(url: some_url).first # OK
Company.hint(url: 1).where(name: some_name, url: some_url).first #OK
Company.hint(url: 1, created_at: 1).where(created_at: 1.day.ago) #Not OK, does not match first field of hint
Company.hint(url: 1, created_at: 1).where(uid: some_uid) #Not OK, does not match any field in the hint
We used the Whitequark Ruby parser to process our Ruby code. It has pretty good support and compatibility with Ruby language features for various versions (it’s the parser used by Rubocop, the most popular out-of-the-box linter for Ruby). Soon enough we had an analyzer that checks the abstract syntax tree generated by the parser for these patterns and flags them in an rspec
unit test.
# Sample code
Contact.hint(group_id: 1, random: 1).find_by(group_custom_fields: nil) # should be flagged, wrong hint
Contact.hint(group_id: 1, random: 1).find_by(_id: '424242') # should be flagged, does not use first field of hint
# Sample rspec output
Models(s) are called with index or hint violations:
Model: Contact count: 2
violation_reason: Does not use first field of any valid index, location: spec/mongo_query_static_analyzer_test_data.rb:21:32, file: spec/mongo_query_static_analyzer_test_data.rb, model: Contact, @method: find_by, args: [:group_custom_fields], hint_args: [:group_id, :random]
violation_reason: Does not use first field of hint location: spec/mongo_query_static_analyzer_test_data.rb:22:51, file: spec/mongo_query_static_analyzer_test_data.rb, model: Contact, @method: find_by, args: [:id], hint_args: [:group_id, :random]
The output shows that we not only check that the parameters of the query method calls match the first field of the hint
, but also check it against the actual index specifications of the Mongoid model objects to check if the fields match the first field of ANY index for the model. This catches many dangerous queries during the CI/CD run before anything gets into production.
💡In a nutshell
That said, this is just one of many layers we have to catch these errors. While having limitations, static analysis using tools like Whitequark is still pretty powerful.
Along with code review and additional checks in production to deal with long-running queries, static analysis gives us another tool to avoid problematic code changes that is low cost compared to infrastructure-level solutions.
Such deep technical challenges at scale are in plenty here at Apollo. With the team addressing these being small, chances are that you will find yourself sucked into a vortex of complex problems regularly.
And we are looking for smart engineers like you to join our "fully-remote, globally distributed" team. Click here to apply!