How We Made Our Maps 10x Faster
Using Datadog to optimize ActiveRecord performance
TL;DR
Our maps were really slow. We made them pretty fast by:
- Defining and measuring technical metrics that capture user pain
- Profiling before optimizing (Datadog is awesome)
- Installing alerts to protect against regressions (Datadog is awesome)
Context
In 2006, Redfin was the first company to put homes on a map⁰. This was even before Google Maps was a performant platform. They launched with Microsoft Virtual Earth before migrating to Google Maps in 2008¹. This was a legendary achievement in proptech, massively empowering homebuyers to shop on their own. Ever since, high quality map browsing has been table stakes for any real estate shopping tool.
Going into Q2, the maps on Opendoor were having performance issues. When a user viewed homes in Phoenix, it could take 15+ seconds before they saw anything on the map.
Defining the problem: What’s slow?
Before we could start optimizing map speed, we needed to figure out why it felt slow. After some profiling with Chrome DevTools, it became apparent that a call to our REST API to fetch the property listing data for the map was slow. Digging further, Datadog showed that the bulk of that time was spent in ActiveModelSerializers and in fetching from the DB. We then constructed a metric that captured the user pain we wanted to address: p90 time to fetch property data for the map. We chose the 90th percentile to capture the predominant user experience while excluding outliers.
Metric: p90 time to fetch properties for the map
Baseline: 10.54s
Goal: <1s
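For context, here is a minimal sketch of how a timing metric like this can be emitted with the dogstatsd-ruby gem so Datadog can compute the p90. The metric name, namespace, tag, and helper are illustrative rather than our production instrumentation:

```ruby
# Gemfile: gem "dogstatsd-ruby"
require "datadog/statsd"

# Hypothetical namespace and metric name; the address assumes a local
# Datadog agent listening on the default DogStatsD port.
STATSD = Datadog::Statsd.new("localhost", 8125, namespace: "listings_api")

def fetch_properties_for_map(region_id)
  # STATSD.time reports the block's wall-clock duration, which lets Datadog
  # graph p50/p90/p99 for this operation and alert on it later.
  STATSD.time("map.fetch_properties", tags: ["region:#{region_id}"]) do
    # ... query CloudSearch / MongoDB and serialize the results ...
  end
end
```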
Solution
The optimization from a 10.54s p90 to under 1s happened in five parts over about a month:
- Slim down serializers | 6s p90
- Cache most common queries | 3s p90 for common queries
- Minimize and optimize DB queries | 4s p90
- Split serial queries | 2.5s p90
- Skip mongo for > 120 properties | 750ms p90
1: Slim down serializers | 6s p90
We use Rails active_model_serializers to describe the fields that the REST API will return. A common problem in Rails apps is the same serializer being used for multiple endpoints and bloating to contain fields needed for multiple use-cases. This causes over-fetching of data both from our APIs and from the backing DB (we store listings in MongoDB). Creating specialized serializers just for the map component solved the over-fetching at the API layer, and we also updated our Mongoid queries to use projections so MongoDB returned only the data necessary to fulfill the request. Together, these changes got us nearly a 2x speedup.
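As an illustration of both changes, here is roughly what a map-only serializer and a projected Mongoid query look like. The model, field names, and serializer are hypothetical, not our actual schema:

```ruby
# Hypothetical map-only serializer: just the fields the map pins need,
# instead of the kitchen-sink serializer shared with other endpoints.
class MapListingSerializer < ActiveModel::Serializer
  attributes :id, :latitude, :longitude, :list_price, :status
end

# Mongoid's `only` adds a projection, so MongoDB returns these fields
# instead of the full listing document.
listings = Listing.where(region_id: region_id)
                  .only(:id, :latitude, :longitude, :list_price, :status)

ActiveModelSerializers::SerializableResource.new(
  listings, each_serializer: MapListingSerializer
).as_json
```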
2: Fix caching on naked map queries | 3s p90 for common queries
While fixing the DB problem, we noticed that we had smart logic in place to cache the serialized list of properties when the front-end requested an unfiltered list of properties for a region. This is the most common type of query because it's the default state when a user lands on a map. Unfortunately, a bug had been introduced that turned off this caching behavior in all cases. We corrected the behavior and added integration tests around the endpoint to ensure the cache is populated and then hit on subsequent requests. Alerts on latency spikes and cache hit rate will flag caching failures in production so we can investigate (more on that below).
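The caching itself is ordinary read-through caching; a sketch along these lines, with an illustrative cache key, TTL, and helper names:

```ruby
# Read-through cache for the "naked" (unfiltered) map query. The key and
# TTL are made up; `fetch_properties_for_map` and `serialize_map_properties`
# stand in for the real query and serialization code.
def cached_map_properties(region_id)
  Rails.cache.fetch("map_properties/v1/#{region_id}", expires_in: 5.minutes) do
    serialize_map_properties(fetch_properties_for_map(region_id))
  end
end
```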
3: Minimize and optimize DB queries | 4s p90
Datadog showed a lot of bouncing back and forth between our serializers and Mongo. It turned out that one of the serializer fields had introduced an N+1 query by fetching an associated document for each property as the list was serialized. We fixed this by using Mongoid's includes() to eager-load the association when querying for properties from the database. Optimizing queries so that all data was fetched in bulk before anything was serialized won us a few more seconds.
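In Mongoid terms, the before/after looks roughly like this; the `agent` association is a hypothetical example of a per-listing lookup:

```ruby
# Before (hypothetical): the serializer reaches back into Mongo for every
# listing it serializes -- an N+1 across the whole page of results.
def agent_name
  object.agent.name # one extra query per serialized listing
end

# After: eager-load the association once for the whole batch, before any
# serialization starts. Mongoid's includes() batches the lookups.
listings = Listing.where(region_id: region_id).includes(:agent)
```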
4: Parallelize queries | 2.5s p90
Both the properties on the map and the properties in the list were fetched by the same query. This was a design decision made to keep the logic simple and minimize the payload size if a property appears in both the list and the map. Profiling revealed that payload size wasn’t really an issue, but each list required its own CloudSearch and database query. Splitting these into two separate API calls done in parallel from the front-end was an easy win and didn’t require any encounters with multi-threading in Rails.
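On the back end, that split amounts to two lean endpoints the front-end can request simultaneously (for example with Promise.all) instead of one combined call. The controllers and helpers below are illustrative:

```ruby
# Hypothetical endpoints: one payload for map pins, one for the list view.
# Each runs its own CloudSearch/DB query, and the browser fires both
# requests at the same time.
class MapPointsController < ApplicationController
  def index
    render json: serialize_map_points(search_properties(map_params))
  end
end

class ListPropertiesController < ApplicationController
  def index
    render json: serialize_list_items(search_properties(list_params))
  end
end
```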
5: Skip mongo for > 120 properties | 750ms p90
Our final win came from avoiding the database entirely on some requests. Normally we fetch property listings from CloudSearch then hydrate them with a few extra fields from MongoDB. Instead, we switched the front-end to only expect latitude/longitude point locations and display clusters if there are more than 120 properties to show. This means that when point clusters are showing we don’t have to touch MongoDB, and in the cases we do, it’s a small number of properties and relatively quick.
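The branching looks roughly like this. The 120 threshold is real; the helper names and payload shape are illustrative:

```ruby
CLUSTER_THRESHOLD = 120

def map_payload(region_id)
  # CloudSearch already returns id + lat/lng for every matching listing,
  # so assume `cloudsearch_hits` yields an array of hashes with those keys.
  hits = cloudsearch_hits(region_id)

  if hits.size > CLUSTER_THRESHOLD
    # Too many pins to show individually: return bare points and let the
    # front-end render clusters. No MongoDB round-trip at all.
    { clusters: true, points: hits.map { |h| h.slice(:id, :lat, :lng) } }
  else
    # Few enough to show individually: hydrate this small batch with the
    # extra fields that only live in MongoDB.
    { clusters: false, properties: hydrate_from_mongo(hits) }
  end
end
```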
Alerting
The final step was ensuring we have a great way to catch regressions. While automated tests are useful to catch bugs at the code level, alerting is important because subtle config changes can break distributed systems (like the cache) in ways automated testing in a staging environment may not catch. We now get notified by Datadog when there’s an anomalous latency change and we have useful dashboards with associated metrics.
Impact
Baseline: 10.54s
Goal: <1s
Result: 713ms 🥳
As a result, we’re seeing great performance from our home shopping tools and have seen even faster adoption of Opendoor-Backed Offers, an offering that backs our buyers’ home offers with all cash to give them a competitive edge in today’s hot market. Performance work on web and mobile is a top priority for me and my team. We will continue investing resources into our home shopping tools with the goal of delivering the best experience possible to our customers.
If this type of work sounds interesting, Opendoor is hiring! Head to our careers page to learn more.