As we’ve added hundreds of interesting online attributes, we’ve been hitting some performance bottlenecks when processing larger batch datasets. Thankfully this hasn’t been an issue for customers, and it doesn’t affect our realtime APIs, but it’s still frustrating. I had a spare day, so it felt like time for a performance boost.
Here’s the executive summary:
- If you use threads, upgrade to Ruby 1.9.x
- Index, index, index
- Don’t forget that Rails sits on a database
To start, I spun up a deliberately small, cut-down test server, set up a reasonably complex API, and used the great tools at http://blitz.io to rush the API with hundreds of concurrent requests.
That spike at the start was a concern, even though it is on a small server.
All the CPU usage was in Rack/Passenger, so I dusted off the profiler. Thread contention was getting in the way. We need threads because we integrate with so many third-party APIs. We were still on REE (Ruby Enterprise Edition) for its memory management; however, it uses green threads, so it was time to bite the bullet and (1) update to Ruby 1.9.x.
That helped a fair amount, but we were still getting the timeouts.
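To illustrate why threads matter for a workload that fans out to many third-party APIs, here is a minimal sketch. The fetch_attribute method and the source names are hypothetical stand-ins (the "API call" is only a sleep), not our actual integration code. It shows independent blocking calls overlapping, so the total wait is close to the slowest call rather than the sum of all of them.

```ruby
require "benchmark"

# Hypothetical stand-in for a third-party API call.
# It only sleeps to simulate network latency.
def fetch_attribute(source, latency)
  sleep(latency)
  "#{source}:ok"
end

# Hypothetical sources and their simulated latencies in seconds.
SOURCES = { "social" => 0.1, "geo" => 0.1, "email" => 0.1 }

# Start one thread per source, then collect each thread's return value.
# Thread#value joins the thread and returns the result of its block.
def fetch_all(sources)
  threads = sources.map do |name, latency|
    Thread.new { fetch_attribute(name, latency) }
  end
  threads.map(&:value)
end

results = nil
elapsed = Benchmark.realtime { results = fetch_all(SOURCES) }

puts results.inspect
puts format("elapsed: %.2fs (sequential would be about %.2fs)",
            elapsed, SOURCES.values.inject(:+))
```

Note that this toy only demonstrates overlapping waits. Real contention of the kind the profiler showed depends on the interpreter's threading model and on what the threads actually do, so it is worth profiling on your own stack.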
So we re-ran the profiler and noticed a strange amount of time being spent in ActiveRecord associations, in one particular ActiveRecord query, and in a separate MongoDB query. This led to a few things …
2. We didn’t dig into why, but mymodel.relatedmodel.create :param => X was causing some painful slowness in the association code. It wasn’t that important to keep the syntactic sugar; switching to Relatedmodel.create :mymodel_id => mymodel.id, :param => X saved a bunch of time.
3. We added a couple of indexes on the ActiveRecord side, which helped a bit. MongoDB indexes were working a charm, but there was one particular group of 3 independent indexes that were always used in conjunction, and the mongo profiler was revealing nscanned of >10000 for some queries. Creating a combined index helped a lot. These are another couple of examples that remind us that, while ORMs are nice, you can never forget there’s a database sitting under all of this.
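To make the combined-index point concrete, here is a toy model in plain Ruby. It is not how MongoDB stores or chooses indexes, and the fields a, b and c are hypothetical. It only illustrates the idea: when a query filters on three fields but can be served by just one single-field index, every document matching that one field has to be scanned and filtered, whereas a combined index on all three fields goes straight to the matches.

```ruby
# Toy model only: hypothetical documents with three indexed fields.
Doc = Struct.new(:a, :b, :c)

# 20 * 20 * 25 = 10,000 documents, one per (a, b, c) combination.
docs = (0...20).to_a.product((0...20).to_a, (0...25).to_a).map do |a, b, c|
  Doc.new(a, b, c)
end

# A single-field "index": value of a => documents with that value.
index_a = docs.group_by(&:a)

# A combined "index": [a, b, c] => documents with exactly those values.
index_abc = docs.group_by { |d| [d.a, d.b, d.c] }

# Use only the index on a, then filter the candidates on b and c.
def query_single(index_a, a, b, c)
  candidates = index_a.fetch(a, [])
  matches = candidates.select { |d| d.b == b && d.c == c }
  { :scanned => candidates.size, :returned => matches.size }
end

# Use the combined index: only matching documents are touched.
def query_compound(index_abc, a, b, c)
  matches = index_abc.fetch([a, b, c], [])
  { :scanned => matches.size, :returned => matches.size }
end

single = query_single(index_a, 3, 7, 11)
compound = query_compound(index_abc, 3, 7, 11)

puts "single-field index: #{single.inspect}"
puts "combined index:     #{compound.inspect}"
```

The gap between scanned and returned in the first result is the same kind of signal a high nscanned value gives in the mongo profiler: the query is reading far more documents than it hands back.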
And no timeouts until about 150 concurrent hits.
Performance was already plenty in our production system (we automatically scale horizontally as needed), but these changes improved things by about 2-3x.
That’s enough for today. We’ll share some more details on performance benchmarks in the coming weeks.
Any other thoughts from the community? Please email me (mhookey at demystdata dot com).