Excluding fields in the mongodb/mongodb library

I am using the mongodb/mongodb library for a project of mine.  The API is fairly different from the old PECL library, and it came with some unexpected baggage.

My understanding is that the library is intended to replace the existing PECL library, and that it tries to strike a balance between core functionality and a more agile, for lack of a better term, release cycle.  What that means is that the basic MongoDB functionality is written in C, while the forward-facing API is written in PHP.

In and of itself this is not an issue… except for the one thing I was trying to do.  One of the practices I’ve heard about Mongo is to get Mongo to do as much as it can, but not to worry too much about complicated joins and such as you would in SQL.  In other words, don’t shy away from bringing data into the application to do some processing.

That was the practice I followed, which worked fine until my data size started to increase.  Then I ran into an issue where 50% of my response time was being eaten up by BSONDocument and BSONArray unserialization calls.  The cause was a sub-document inside each document that could be quite large.

“Hmm,” I thought to myself.  “I’m sure glad I didn’t see this a month from now.”  I did some research and waffled a little between using MapReduce or using the newish aggregation features.  I opted for the aggregation feature, which is so incredibly powerful.  I see myself misusing this quite often.

But after changing the code to make it work with aggregation I still had the problem of the unserialization.  The documents are returned as a whole.  In the core functionality you can exclude individual fields by doing something like this:

db.inventory.find(
   { type: 'food', _id: 3 },
   { "classification.category": 0}
)

The second parameter is a projection; setting classification.category to 0 tells Mongo to omit that field from the result set.  But I couldn’t find a way to mimic this behavior in the new library.  So I opened an issue on GitHub asking how I might do this.  (When you run into a problem, contact the library maintainers or post on Stack Overflow.  Don’t just leave it.)
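As an aside, the effect of that exclusion is easy to picture outside of Mongo.  Below is a rough sketch in plain JavaScript (the same language as the shell example) of roughly what the server does with { "classification.category": 0 }.  The excludeField helper and the sample document are made up for illustration; they are not part of MongoDB or the library, and the real projection rules cover more cases than this.

```javascript
// Hypothetical helper: mimics an exclusion projection such as
// { "classification.category": 0 } on a plain object.
// This is NOT a MongoDB API; the real work happens on the server.
function excludeField(doc, path) {
  // Scalars and null have nothing to exclude.
  if (doc === null || typeof doc !== 'object') return doc;

  // A dotted path reaches into each element of an array of sub-documents.
  if (Array.isArray(doc)) {
    return doc.map((item) => excludeField(item, path));
  }

  const [head, ...rest] = path.split('.');
  const copy = { ...doc }; // shallow copy so the input is left untouched
  if (!(head in copy)) return copy;

  if (rest.length === 0) {
    delete copy[head]; // last path segment: drop the field
  } else {
    copy[head] = excludeField(copy[head], rest.join('.'));
  }
  return copy;
}

// Made-up sample document standing in for a stored inventory record.
const stored = {
  _id: 3,
  type: 'food',
  item: 'apple',
  classification: { dept: 'grocery', category: 'fruit' },
};

const returned = excludeField(stored, 'classification.category');
console.log(JSON.stringify(returned));
```

Everything except the named field comes back, which is the point: with a projection, a large sub-document never has to travel to the application or be unserialized there.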

Jeremy Mikola responded with the solution.  There is a second argument you can pass to find(), an array of options, and one of those options is a projection.  It is documented, but I don’t think the documentation makes it immediately clear what it does.  Or, at least, the description isn’t very SEO friendly (Google-Based-Development FTW!).

How does that look?

Like this:

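// Filter: only this customer's runs of the given test.
$find = [
    'customer_id' => $this->getId(),
    'invoked_test' => $testName
];
// Options: newest first, at most ten results, and exclude the events field.
$options = [
    'sort' => ['created_at' => -1],
    'limit' => 10,
    'projection' => ['events' => 0]
];

$currentResults = $tests->find($find, $options);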

BOOM! 7.5 seconds off the response wall clock time.
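For the aggregation route mentioned earlier, the same exclusion can presumably be expressed as a $project stage at the end of the pipeline.  Here is a sketch in shell-style JavaScript, assuming a server version that allows exclusions in $project (3.4 or later, as far as I know) and assuming the collection is called tests.  The buildPipeline helper and the placeholder values are hypothetical.

```javascript
// Hypothetical helper: builds an aggregation pipeline equivalent to the
// find() call above. customerId and testName stand in for the PHP values.
function buildPipeline(customerId, testName) {
  return [
    // Same filter as the $find array.
    { $match: { customer_id: customerId, invoked_test: testName } },
    // Newest first, ten results, as in the options array.
    { $sort: { created_at: -1 } },
    { $limit: 10 },
    // Drop the events field before the documents leave the server.
    { $project: { events: 0 } },
  ];
}

// Placeholder values, for illustration only.
const pipeline = buildPipeline('some-customer-id', 'some-test-name');
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell this would be run as (collection name assumed):
//   db.tests.aggregate(pipeline)
```

Putting $project last keeps the filter, sort, and limit identical to the find() version, so only the ten returned documents are trimmed.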
