Solving ID type confusion in Mongoid

Written by Marcin Natanek

August 9, 2024

Share on social

Apollo’s Context

Apollo.io backend codebase is fully based on Ruby on Rails and contained within a single GitHub Monorepo. This allows us to centralize quality control, and database communication.

We also use multiple MongoDB clusters with collections containing billions of documents. It’s crucial for developer productivity to have a common interface for DB interactions from our Rails backend, so we use Mongoid ODM (Object Document Mapper) framework.

See the bottom of this post to learn more about Apollo!

The ID Type Problem

Mongoid models represent document IDs as BSON::ObjectId class, however many interfaces allow providing document ID as String. Example with our Contact model:

irb(main):001> c = Contact.new
=> #<Contact _id: 669a5a7b3a22434f738a2847>
irb(main):002> c.id
=> BSON::ObjectId('669a5a7b3a22434f738a2847')
irb(main):003> c.save! # persist to MongoDB to allow .find to locate the object.
=> true
irb(main):004> Contact.find(c.id) # Note: allows finding by BSON::ObjectId
=> #<Contact _id: 669a5a7b3a22434f738a2847>
irb(main):005> Contact.find(c.id.to_s) # Note: allows finding by String type!
=> #<Contact _id: 669a5a7b3a22434f738a2847>

While this type inconsistency is great for ease of use, especially in the Interactive Ruby console (IRB), it leads to common confusion in our backend code. Examples include:

# these values represent the same idea in different formats
irb(main):006> c.id == c.id.to_s 
=> false

# common mistake in our codebase: is an ID present in an array of IDs?
irb(main):007> ['669a5a7b3a22434f738a2847'].include? c.id
=> false
irb(main):007> ['669a5a7b3a22434f738a2847'].include? c.id.to_s
=> true

# another common issue: Hash indexed by Mongoid ID
irb(main):008> hash_indexed_by_id = {"669a5a7b3a22434f738a2847"=>"some value"}
=> {"669a5a7b3a22434f738a2847"=>"some value"}
irb(main):009> hash_indexed_by_id[c.id] # wrong access
=> nil
irb(main):009> hash_indexed_by_id[c.id.to_s] # correct access
=> "some value"

I often found myself questioning every method’s ID parameter. Which type should I use? Does the method contain a type conversion logic?

This usually required extra time to be spent on static code analysis, writing extra unit tests just for these cases. Sometimes there were so many type conversions happening, I needed a full session with the (awesome) Ruby Mine debugger to identify the root cause.

Solutions

Standardisation

Traditionally at Apollo we preferred to use String type for ID comparisons. Unfortunately, not all teams adopt this standard uniformly, and it’s very common to make these mistakes while using interfaces provided by other teams.

A side effect of this approach is 870 id.to_s repetitions and 312 id_array.map(&:to_s) repetitions in our codebase. Additionally, we cast String IDs back to ObjectId 290 times.

$ grep -r -o -w '\\bid.to_s\\b' app lib | wc -l 
870
$ grep -r -o -w 'map(&:to_s)' app lib | wc -l
312
$ grep -r -o -w 'BSON::ObjectId('  app lib | wc -l
290

Even with the adopted standards, the tricky edge cases still appeared every so often. Clearly, we need a better solution.

Static typing

Inspired by our React-based frontend codebase that makes heavy usage of TypeScript to address exactly this type of issues, we started working on adopting Sorbet for static type validation.

When a Ruby file is annotated with # typed: true sigil, we can rely on Sorbet to verify types provided to specific methods, assuming they provide an interface annotation:

sig { params(entity_id: BSON::ObjectId, fast: Boolean).void }
def update_entities(entity_id, fast: false)
   ...
end

With this interface, Sorbet can easily identify cases, where we provide String to entity_id parameter:

update_entities("66b4adfb3a2243709552afc9")
> Expected `BSON::ObjectId` but found `String("66b4adfb3a2243709552afc9")` for argument `entity_id`

While static typing is a great and strategic solution to the ID type confusion problem, it has a big caveat. It requires Sorbet type definitions to be enabled in all 3 places:

Method signature (provided above)
The caller method
The code that produces the ID value (as a return type). This can be mitigated with a tactical T.let(id, String)

All in all, Apollo started adopting Sorbet types only recently, and to this day we rarely have type checks enabled in all 3 cases. In order to fully rely on static typing to resolve this issue, we would have to introduce # typed: strict sigil everywhere, which would requires weeks, if not months of typing the existing codebase.

Additionally, full adoption requires all engineers to be onboard. Every person has their own opinion in the dynamic vs static typing debate, and we have to be mindful that Ruby developers typically prefer dynamic languages.

Stay tuned for our future publications, where we plan to describe our approach to gradually rolling out static typing in a large organization of opinionated developers!

Duck typing 🦆

With the static types adoption only just growing in the company, we decided to embrace the dynamic nature of Ruby to resolve this specific class of errors by overriding equality comparison and hash generation methods.

After all, if the value walks like an ID and quacks like an ID, then it must be a Mongo document ID and it’s safe to treat it as such!

This solution completely prevents this entire class of bugs from happening and does not require large coordination efforts from all engineers.

Here is how I implemented it:

# initializers/mongoid.rb

class ::BSON::ObjectId
  # Alias avoids overshadowing original implemenataion, so we can refer to it
  alias_method :original_eq, :== 
  
  # This equality 
  def ==(other)
    if other.is_a?(String)
      other == self.to_s
    else
      original_eq other
    end
  end

  alias_method :original_eql, :eql?

  def eql?(other)
    if other.is_a?(String)
      other == self.to_s
    else
      original_eql(other)
    end
  end

  # This allows accessing Hashes by either String or ObjectID
  def hash
    self.to_s.hash
  end
end

# See above
class ::String
  alias_method :original_eql, :eql?

	# the equality should work in both directions.
	# Note: eq and eql are not the same thing in Ruby. eql ignores type conversions.
  def eql?(other)
    if other.is_a?(BSON::ObjectId)
      other.eql? self
    else
      original_eql(other)
    end
  end
end

Since this initializer executes for each environment, including our test framework, we can immediately benefit from this solution everywhere. This includes our async queue processors utilizing the same Monorepo code.

Let’s retry running the problematic examples above with the initialization code above:

irb(main):006> c.id == c.id.to_s 
=> true

irb(main):007> ['669a5a7b3a22434f738a2847'].include? c.id
=> true
irb(main):007> ['669a5a7b3a22434f738a2847'].include? c.id.to_s
=> true

irb(main):008> hash_indexed_by_id = {"669a5a7b3a22434f738a2847"=>"some value"}
=> {"669a5a7b3a22434f738a2847"=>"some value"}
irb(main):009> hash_indexed_by_id[c.id]
=> "some value" 
irb(main):009> hash_indexed_by_id[c.id.to_s]
=> "some value"

Great! Now the value type is an irrelevant implementation details, and we can safely assume an ID is treated like an ID regardless of the type!

Compatibility with Sorbet

To ensure Sorbet understands that we can now use Strings and ObjectIds interchangeably, we defined a type alias that is now preferred for method annotations:

MongoId = T.type_alias { T.any(String, BSON::ObjectId) }

sig { params(entity_id: MongoId, fast: Boolean).void }
def update_entities(entity_id, fast: false)
   ...
end

With this addition, we can now provide any of the values to the method while passing all the static checks!

Confirmation

After raising a Pull Request on GitHub with the initializer change some Unit Tests started failing in our GitHub Actions pipelines. Upon closer inspection, I discovered that:

There were unit tests that started failing, because they asserted on this bug actually existing in our codebase!
There were multiple unit test assertions failing, since the unit test setup logic contained ID type mistakes as well. This meant we had false coverage that did not actually test anything!

Overall, I was very happy to see and fix these issues, as they further reassured me that the automated solution with Duck Typing was the right approach to solve this problem.

Conclusion

In large-scale software development it’s very important to proactively seek out minor improvements that affect a large group of developers. At first glance this issue is easily fixable and needs to be only encountered once by an engineer before they become aware of it.

However, we need to consider the fact that this cost is multiplied by several hundred engineers. In addition, Apollo is growing very fast and our hiring process focuses more on Software Engineering fundamentals than prior Ruby on Rails experience. Therefore, all new joiners have one more language quirk to be aware of.

Furthermore, these bugs might become very obscure and if they make it to production, they are very costly to reproduce and fix.

I am really glad to arrive at a solution that did not require any communication overhead or coordination with other engineers, as it allowed the entire organization to benefit from the fix immediately.

Author: Marcin Natanek

We are hiring!

undefined

Apollo.io is a fully remote company committed to grow in Poland. We are planning to double our headcount in 2024 and would love to see you in the picture above! See this LinkedIn post to learn more about our offsite in Warsaw.

Priority open roles:

Engineering Manager - Hot position!
Senior Backend Engineer - Solve scalability challenges, learn Ruby on the job.
Senior Frontend Engineer - React, Redux, TypeScript.