Analyzing Wikipedia's Search Performance

Nicholas Ray

I took this photo while backpacking in the Wind River mountain range in Wyoming.

Wikipedia's desktop site is getting a major facelift for the first time in over a decade as part of the "Desktop Improvements Project", and that new experience will feature a search component that my team and I built. As part of a case study for adopting a modern JavaScript framework in MediaWiki (the software that powers Wikipedia), we developed the new search component using the Vue.js framework. In addition to replacing the old "legacy search" component that was the status quo on the desktop site, one of our goals for this project was to improve the search experience by including more contextual information, such as article descriptions and thumbnails, in the search suggestion results.

The legacy search component. The suggestions only include the titles of articles. Helpful contextual info like thumbnails and descriptions are not included.

Since users across the world visit Wikipedia using a wide range of internet connection speeds and device specs, we put a premium on performance and generally try to avoid loading anything but the most essential scripts and styles on the critical rendering path. To stay off the critical rendering path, the legacy search component loads most of its resources (JS, CSS, i18n messages) "lazily": it defers downloading them until the user first focuses the input. Once loaded, the search component shows suggestions based on what the user has typed in the input. This kind of lazy loading strategy is commonly seen on mobile sites (including my site) that defer loading images until they are near the viewport. For the new search component, we chose to follow suit and load it when the user first focuses the input. However, we knew that, regardless of the loading strategy, comparing the performance of the legacy search component with that of the new Vue search component would be important.
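
To make the "load on first focus" idea concrete, here is a minimal sketch in TypeScript. It is not the actual MediaWiki code: the helper `once` and the stand-in loader are hypothetical names, and the real loader would be asynchronous and fetch JS, CSS, and i18n messages. The point is only that the expensive work runs a single time, no matter how often the input is focused.

```typescript
// Hypothetical helper: wraps a function so that it runs at most once and
// every later call returns the result of the first call.
function once<T>(fn: () => T): () => T {
  let called = false;
  let result: T | undefined;
  return () => {
    if (!called) {
      called = true;
      result = fn();
    }
    return result as T;
  };
}

// Stand-in for downloading and executing the search modules. In a real
// page this would return a promise; a plain object keeps the sketch small.
let loadCount = 0;
const loadSearchModules = once(() => {
  loadCount += 1;
  return { ready: true };
});

// In a browser this would be wired to the search input, for example:
//   input.addEventListener("focus", () => loadSearchModules(), { once: true });
// Here two focus events are simulated by calling the wrapper twice.
const firstFocusResult = loadSearchModules();
const secondFocusResult = loadSearchModules();
```

After both simulated focus events, `loadCount` is 1 and both calls return the same object, which is the behavior a lazy loader needs.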

The new Vue search component

Implementing synthetic performance tests

During the course of this project, one of the tickets I worked on was to implement instrumentation that would help us compare the performance of legacy search with that of Vue search and also detect performance regressions in both implementations.

We collect performance metrics at Wikipedia in two ways. One is to collect metrics from real users (commonly referred to as "real user monitoring"). The other, and the one this ticket covers, is to use synthetic performance tests, which are automated scripts that run in a controlled environment and simulate user behavior. For this ticket, the script would collect metrics related to loading the search experience and the list of suggestions that appear after typing characters into the input.

T251544: The ticket I worked on to implement synthetic performance tests for both Vue and legacy search.

Metric definitions

Although there are multiple ways to measure the performance of the search experience, I was interested in measuring several specific aspects:

Metric 1: Search Load Start To Load End

The time it takes to lazy load and execute the search resources (e.g. JS, CSS, i18n messages) when the input is focused. At Wikipedia, we package the related resources together in JS and refer to that package as a "module". Multiple modules are required for both the legacy and Vue search experience.

The steps involved in the 'Search load start to load end' metric

Metric 2: Search Load Start To First Render

The time it takes from the start of lazy loading the search modules to the point at which the first set of search suggestions is shown to the user.

The steps involved in the 'Search load start to first render' metric

Metric 3: Search Query To Render

The time it takes to fetch and show each set of search suggestions using three different queries (“a”, “ab”, “abc”). The graphic below depicts the set of steps for the query "a".

The steps involved in the 'Search query to render' metric

All the steps that happen in the synthetic test

A single synthetic test collects one "Search Load Start To Load End" metric, one "Search Load Start To First Render" metric, and three "Search Query To Render" metrics as depicted below.

All the metrics combined

Disclaimer: In practice, the synthetic tests were a little more complex than depicted above. After typing the character "a" in the input and waiting for the suggestions to appear, the test would type "bc", wait for the suggestions, and then backspace and wait for the suggestions to appear for the query "ab". However, the net effect was that the queries "a", "ab", "abc" were still tested and, for simplicity, that is what is depicted.
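
As a rough illustration of how the five metrics relate to each other, the sketch below derives them from a set of timestamps for one run. This is an assumption-laden example, not the actual test script: the interface names, field names, and timestamp values are all made up for illustration.

```typescript
// Timestamps in milliseconds for the events of one synthetic run.
// All names here are hypothetical and the values used below are invented.
interface SearchRunMarks {
  loadStart: number; // input focused, lazy load begins
  loadEnd: number;   // search modules loaded and executed
  queries: { query: string; start: number; render: number }[];
}

interface SearchRunMetrics {
  loadStartToLoadEnd: number;
  loadStartToFirstRender: number;
  queryToRender: Record<string, number>;
}

function deriveMetrics(marks: SearchRunMarks): SearchRunMetrics {
  if (marks.queries.length === 0) {
    throw new Error("at least one query is required");
  }
  // One "Search Query To Render" value per query.
  const queryToRender: Record<string, number> = {};
  for (const q of marks.queries) {
    queryToRender[q.query] = q.render - q.start;
  }
  // The first render is the earliest moment any suggestions were shown.
  const firstRender = Math.min(...marks.queries.map((q) => q.render));
  return {
    loadStartToLoadEnd: marks.loadEnd - marks.loadStart,
    loadStartToFirstRender: firstRender - marks.loadStart,
    queryToRender,
  };
}

const exampleMetrics = deriveMetrics({
  loadStart: 1000,
  loadEnd: 1300,
  queries: [
    { query: "a", start: 1310, render: 1400 },
    { query: "ab", start: 2000, render: 2080 },
    { query: "abc", start: 3000, render: 3070 },
  ],
});
```

With these invented timestamps, the run yields 300 ms for "Search Load Start To Load End", 400 ms for "Search Load Start To First Render", and 90, 80, and 70 ms for the three "Search Query To Render" values.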

Synthetic Test Results

Towards the end of March 2021, I took a screenshot of the dashboard containing the synthetic test results that had been collected to that point. The results showed our Vue search implementation performing worse in each metric by a factor of ~1.7. 😱

A screenshot of the synthetic test dashboard taken toward the end of March 2021. The results show Vue search slower by ~1.7x in each metric.

Important factors to consider

Before getting too discouraged by these results, it's important to keep the following things in mind:

  • Vue search and legacy search use different APIs (see "Query to Render" metric investigation for more on this). Additionally, the API that Vue search uses needs to fetch thumbnails and descriptions, while the legacy search API does not have this requirement.
  • The synthetic tests are scripts and therefore exhibit artificial behavior. They are not real users. Testing real users may yield different results.
  • The tests run with an empty browser cache. Of course, most users will have caching enabled, so the differences in these results represent the worst-case scenario.
  • The biggest differences we see in the results are several hundred milliseconds.
  • The tests run from AWS instances in Northern Virginia and test French Wikipedia using the Barack Obama page.
  • The internet connection speed is intentionally throttled to simulate poor internet connections.
  • Each synthetic test performs 5 runs every ~4 hours and the median for each metric is used.
  • Only three queries are tested (“a”, “ab”, “abc”). Other queries in the wild may deviate from these results.
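
Since each reported value is the median of 5 runs, a single slow run has little effect on it. The sketch below shows the idea; the `median` helper and the five sample values are illustrative assumptions, not data from the dashboard.

```typescript
// Returns the median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("median of an empty list is undefined");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Five made-up run times in ms, one of which is an outlier.
const sampleRunsMs = [310, 295, 1200, 301, 288];
const sampleMedianMs = median(sampleRunsMs);
const sampleMeanMs = sampleRunsMs.reduce((sum, v) => sum + v, 0) / sampleRunsMs.length;
```

For these invented values the median is 301 ms while the mean is pulled up to 478.8 ms by the one slow run, which is why the median is the more stable summary for a small number of runs.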

On the surface, these results are disappointing but perhaps not surprising given the points above. However, I was unsatisfied with their ambiguity. While these graphs show Vue search performing worse in each metric, they don't convey which steps within each metric are worse. I wanted to know more, so I dug deeper to find out where this time was spent.

Let's look at the timelines

Fortunately, our dashboard allows one to download and view the actual performance timelines that were recorded during each test. Given this ability, I imported each timeline into my Chrome browser and investigated the last 7 days of performance timelines that the synthetic tests had collected.

Search Load Start To Load End Investigation

Graph of 'search load start to load end' metric

Search Load Start To Load End: Network Request

As part of this metric, three steps could potentially be slow. Given that the "JS to load search executes" step is relatively fast, let's first focus on the "Network Request" step.

Search load start to load end with network request focused

| Test | Search Load Start To Load End | Network Request |
| --- | --- | --- |
| legacy-2021-03-27 9:40:00 | 274.04 ms | 112.89 ms |
| vue-2021-03-27 10:10:00 | 565.24 ms | 217.86 ms |
| legacy-2021-03-26 00:10:00 | 283.21 ms | 113.53 ms |
| vue-2021-03-26 00:30:00 | 521.03 ms | 218.06 ms |
| legacy-2021-03-24 22:50:00 | 285.71 ms | 113.76 ms |
| vue-2021-03-24 23:10:00 | 526.49 ms | 217.87 ms |
| legacy-2021-03-24 01:30:00 | 306.10 ms | 113.10 ms |
| vue-2021-03-24 01:50:00 | 564.24 ms | 217.99 ms |
| legacy-2021-03-22 23:50:00 | 294.11 ms | 113.25 ms |
| vue-2021-03-23 00:10:00 | 535.96 ms | 217.67 ms |
| legacy-2021-03-21 22:30:00 | 287.76 ms | 112.85 ms |
| vue-2021-03-21 23:00:00 | 515.66 ms | 217.73 ms |
| legacy-2021-03-21 01:30:00 | 273.39 ms | 112.92 ms |
| vue-2021-03-21 01:50:00 | 500.94 ms | 217.92 ms |

As can be seen from the table above, the network request to load the Vue search modules takes ~1.9x as long to complete as the request for the legacy search modules. When I examined legacy and Vue's respective network requests, I found significant differences in the payload size of the response containing their relevant modules. The 69.6 kB payload size to lazy load Vue search dwarfs the 6.9 kB payload size to load legacy search. Legacy search depends on jQuery, but since that library is loaded when the page loads on Wikipedia, it does not influence this step. The Vue.js library, however, is loaded during the lazy load of Vue search. This helps explain the performance differences observed in the table.

The size of the payload containing the Vue search modules is ~10x the size of the payload that contains the legacy search modules.

Search Load Start To Load End: Modules initialize

Next, we'll look at the "Modules initialize" step where the JS that was downloaded as part of the "Network Request" step gets executed.

Search load start to load end with module initialization focused

| Test | Search Load Start To Load End | Modules Initialize |
| --- | --- | --- |
| legacy-2021-03-27 9:40:00 | 274.04 ms | 7.48 ms |
| vue-2021-03-27 10:10:00 | 565.24 ms | 209.61 ms |
| legacy-2021-03-26 00:10:00 | 283.21 ms | 7.48 ms |
| vue-2021-03-26 00:30:00 | 521.03 ms | 204.38 ms |
| legacy-2021-03-24 22:50:00 | 285.71 ms | 7.29 ms |
| vue-2021-03-24 23:10:00 | 526.49 ms | 207.45 ms |
| legacy-2021-03-24 01:30:00 | 306.10 ms | 10.95 ms |
| vue-2021-03-24 01:50:00 | 564.24 ms | 187.19 ms |
| legacy-2021-03-22 23:50:00 | 294.11 ms | 10.62 ms |
| vue-2021-03-23 00:10:00 | 535.96 ms | 211.23 ms |
| legacy-2021-03-21 22:30:00 | 287.76 ms | 13.19 ms |
| vue-2021-03-21 23:00:00 | 515.66 ms | 211.72 ms |
| legacy-2021-03-21 01:30:00 | 273.39 ms | 9.04 ms |
| vue-2021-03-21 01:50:00 | 500.94 ms | 191.82 ms |

Notice from the table above that the Vue search modules take roughly 20x as long to initialize as the legacy search modules. The relatively large execution time for Vue search is somewhat intuitive given that it uses a reactive framework with a larger footprint than legacy search.

Disclaimer: The timeline bars I observed for this data were fairly ambiguous for Vue search. The data above reflects my best guess as to what was happening, but there is a possibility that it is not accurate.
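
The ratios quoted for the "Network Request" and "Modules initialize" steps can be reproduced from the values in the two tables above. The sketch below uses the mean of the seven runs per implementation; that choice of summary, like the `mean` helper and variable names, is my own assumption for illustration.

```typescript
function mean(values: number[]): number {
  if (values.length === 0) {
    throw new Error("mean of an empty list is undefined");
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Values in ms, copied from the two tables above (seven runs each).
const legacyNetworkMs = [112.89, 113.53, 113.76, 113.10, 113.25, 112.85, 112.92];
const vueNetworkMs = [217.86, 218.06, 217.87, 217.99, 217.67, 217.73, 217.92];
const legacyInitMs = [7.48, 7.48, 7.29, 10.95, 10.62, 13.19, 9.04];
const vueInitMs = [209.61, 204.38, 207.45, 187.19, 211.23, 211.72, 191.82];

// How many times longer Vue search takes for each step.
const networkRatio = mean(vueNetworkMs) / mean(legacyNetworkMs); // about 1.9
const initRatio = mean(vueInitMs) / mean(legacyInitMs);          // about 21.5

// Payload sizes in kB, as reported for the lazy-load responses.
const payloadRatio = 69.6 / 6.9; // about 10.1
```

Note that the payload is about 10x larger while the network request takes only about 1.9x as long, so request time does not scale linearly with payload size in these runs.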

Search Query To Render Investigation

Graph of 'search query to render' metric

Search Query To Render: Network Request

Search query to render with network request focused

Since there are three "Search Query To Render" metrics collected during a test (for the queries "a", "ab", "abc"), let's look at the time it took for each network request.

| Test | Network Request #1 | Network Request #2 | Network Request #3 |
| --- | --- | --- | --- |
| legacy-2021-03-27 9:40:00 | 34.78 ms | 33.53 ms | 33.67 ms |
| vue-2021-03-27 10:10:00 | 34.74 ms | 37.97 ms | 105.01 ms |
| legacy-2021-03-26 00:10:00 | 34.68 ms | 33.70 ms | 33.55 ms |
| vue-2021-03-26 00:30:00 | 34.77 ms | 104.25 ms | 108.77 ms |
| legacy-2021-03-24 22:50:00 | 34.82 ms | 33.59 ms | 33.59 ms |
| vue-2021-03-24 23:10:00 | 34.99 ms | 93.08 ms | 114.89 ms |
| legacy-2021-03-24 01:30:00 | 35.31 ms | 33.61 ms | 33.48 ms |
| vue-2021-03-24 01:50:00 | 34.46 ms | 100.22 ms | 72.55 ms |
| legacy-2021-03-22 23:50:00 | 34.40 ms | 33.51 ms | 33.57 ms |
| vue-2021-03-23 00:10:00 | 34.58 ms | 110.39 ms | 138.15 ms |
| legacy-2021-03-21 22:30:00 | 34.90 ms | 34.05 ms | 33.46 ms |
| vue-2021-03-21 23:00:00 | 34.83 ms | 105.89 ms | 91.70 ms |
| legacy-2021-03-21 01:30:00 | 34.98 ms | 33.92 ms | 33.47 ms |
| vue-2021-03-21 01:50:00 | 34.49 ms | 34.28 ms | 109.56 ms |

The table shows that the time it takes the legacy search ajax request to complete is more consistent than that of Vue search. The Vue search network request is more variable and sometimes takes 3x-4x as long as the legacy search request.

These results were a little peculiar to me. Vue search uses a different API endpoint (our REST API) than legacy search (which uses our Action API), and Vue search was the default search experience for anonymous users (the majority of Wikipedia's userbase) on French Wikipedia. Since both APIs take advantage of our edge frontend caching infrastructure, I would have expected the cache for Vue search to be more primed than the cache for legacy search, and its request times to be more consistent as a result. Unfortunately, the timelines I downloaded do not indicate whether a response came from the edge. I hypothesize that the < 50 ms requests in this table are coming from the edge cache, but oddly enough there is a higher percentage of legacy requests in this category.

Another factor that might be playing a role here is that the Vue search API has to fetch more data than legacy search. Remember that Vue search needs to fetch article titles, thumbnails, and descriptions, whereas legacy search only needs to fetch article titles. The extra work that the Vue search API must do to produce a response might help explain why the Vue search request can sometimes take longer than the legacy search request. The instances where the request time is the same might be explained by the response coming from the edge cache.
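
The share of requests that fall under the 50 ms threshold can be counted directly from the table above. The sketch below does only that counting; the `shareBelow` helper is a hypothetical name, and a request being under 50 ms does not by itself prove that it was served from the edge cache.

```typescript
// Request times in ms for network requests #1 to #3, copied row by row
// from the table above (seven runs per implementation).
const legacyRequestsMs = [
  34.78, 33.53, 33.67, 34.68, 33.70, 33.55, 34.82, 33.59, 33.59,
  35.31, 33.61, 33.48, 34.40, 33.51, 33.57, 34.90, 34.05, 33.46,
  34.98, 33.92, 33.47,
];
const vueRequestsMs = [
  34.74, 37.97, 105.01, 34.77, 104.25, 108.77, 34.99, 93.08, 114.89,
  34.46, 100.22, 72.55, 34.58, 110.39, 138.15, 34.83, 105.89, 91.70,
  34.49, 34.28, 109.56,
];

// Fraction of values strictly below the threshold (0 for an empty list).
function shareBelow(values: number[], thresholdMs: number): number {
  if (values.length === 0) {
    return 0;
  }
  return values.filter((v) => v < thresholdMs).length / values.length;
}

const legacyFastShare = shareBelow(legacyRequestsMs, 50); // 21 of 21 requests
const vueFastShare = shareBelow(vueRequestsMs, 50);       // 9 of 21 requests
```

In these seven runs every legacy request is under 50 ms, while only 9 of the 21 Vue requests are, and 7 of those 9 are for the first query "a".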

Search Query To Render: JS mutates DOM with suggestions

Search query to render with js mutating DOM step focused

| Test | JS Execution #1 | JS Execution #2 | JS Execution #3 |
| --- | --- | --- | --- |
| legacy-2021-03-27 9:40:00 | 47.74 ms | 38.05 ms | 37.92 ms |
| vue-2021-03-27 10:10:00 | 15.51 ms | 19.12 ms | 12.99 ms |
| legacy-2021-03-26 00:10:00 | 46.19 ms | 36.29 ms | 37.38 ms |
| vue-2021-03-26 00:30:00 | 19.79 ms | 14.03 ms | 13.80 ms |
| legacy-2021-03-24 22:50:00 | 49.19 ms | 38.37 ms | 40.14 ms |
| vue-2021-03-24 23:10:00 | 17.08 ms | 12.89 ms | 12.86 ms |
| legacy-2021-03-24 01:30:00 | 49.17 ms | 37.93 ms | 36.21 ms |
| vue-2021-03-24 01:50:00 | 15.87 ms | 14.29 ms | 14.30 ms |
| legacy-2021-03-22 23:50:00 | 52.74 ms | 37.04 ms | 36.18 ms |
| vue-2021-03-23 00:10:00 | 16.32 ms | 15.53 ms | 17.00 ms |
| legacy-2021-03-21 22:30:00 | 43.56 ms | 36.89 ms | 34.47 ms |
| vue-2021-03-21 23:00:00 | 19.49 ms | 13.16 ms | 16.15 ms |
| legacy-2021-03-21 01:30:00 | 52.11 ms | 38.61 ms | 36.62 ms |
| vue-2021-03-21 01:50:00 | 17.76 ms | 13.95 ms | 13.41 ms |

The table shows that legacy search's JS execution time (after the network request) is often 2x-4x that of Vue search. Out of all the steps I analyzed, this was one of the most interesting to me. Why would Vue search, which uses Vue.js and has the overhead of a virtual DOM, be faster than legacy search, which has no such overhead and uses imperative jQuery DOM manipulations?

To answer this question, I investigated a performance timeline from legacy search and found that legacy search had a series of purple "recalc style" and "layout" bars. One pair of these bars is referred to as a forced synchronous layout and can be a potential source of inefficiency. When those bars repeat in succession, as depicted below, the pattern is commonly referred to as "layout thrashing" and is an even bigger bottleneck. Taking this into account, it became much easier to see why Vue search made up some ground and performed better than legacy search for this step.

Legacy search timeline showing multiple forced synchronous layouts. This is also known as layout thrashing.
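
To show why repeated forced synchronous layouts add up, here is a toy model in TypeScript. It is a deliberate simplification and not how legacy search or any browser is actually implemented: `FakeLayoutEngine` is a hypothetical class in which a style write marks layout as dirty and a geometry read while dirty counts as one forced layout.

```typescript
// Toy model of a layout engine. A style write invalidates layout, and a
// geometry read while layout is invalid forces a synchronous layout.
class FakeLayoutEngine {
  forcedLayouts = 0;
  private dirty = false;

  writeStyle(): void {
    this.dirty = true;
  }

  readGeometry(): number {
    if (this.dirty) {
      this.forcedLayouts += 1; // forced synchronous layout
      this.dirty = false;
    }
    return 0; // the measured value is irrelevant for this sketch
  }
}

// Interleaved writes and reads: each read follows a write, so each
// iteration forces a layout. This is the thrashing pattern.
function interleaved(engine: FakeLayoutEngine, items: number): void {
  for (let i = 0; i < items; i++) {
    engine.writeStyle();
    engine.readGeometry();
  }
}

// Batched: do all reads first, then all writes. No read ever sees
// invalid layout, so no synchronous layout is forced.
function batched(engine: FakeLayoutEngine, items: number): void {
  for (let i = 0; i < items; i++) engine.readGeometry();
  for (let i = 0; i < items; i++) engine.writeStyle();
}

const thrashingEngine = new FakeLayoutEngine();
interleaved(thrashingEngine, 10);

const batchedEngine = new FakeLayoutEngine();
batched(batchedEngine, 10);
```

In this model the interleaved loop forces 10 layouts for 10 items, while the batched version forces none and leaves a single layout for the browser to perform at its normal time.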

Search Load Start To First Render Investigation

Graph of 'search load start to first render' metric

Search Load Start To First Render: Is there more?

Load start to first render steps with a gap between the modules loading step and the initiation of the ajax request

Since the "Search load start to first render" metric is a superset of the "search load start to load end" metric and the first "search query to render" metric, we'd expect that all the factors that led to Vue performing worse in the previous metrics would play a role in this metric. This is true, but it turns out that there is an additional factor to consider between the "modules initialize" step and the "JS initiates request for 'a'" step.

Load start to first render gap

Both search implementations use a debouncer to prevent excessive network requests when the user types rapidly (or holds down a key). However, if the legacy search implementation detects characters in the input during the module initialization step, it fires a network request immediately, whereas the Vue search implementation debounces (delays) the request in the same scenario. While this should be easily correctable, it certainly plays a role in this metric, as Vue search was using a 200 ms debounce time when these results were recorded.
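
The difference between the two behaviors can be sketched as follows. This is not the code of either search implementation: `debounce`, `callNow`, and `onModulesInitialized` are hypothetical names, and the sketch only assumes the 200 ms wait mentioned above.

```typescript
interface Debounced<A extends unknown[]> {
  (...args: A): void;          // schedule a call after a quiet period
  callNow(...args: A): void;   // skip the wait and call immediately
  cancel(): void;              // drop any pending call
}

function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): Debounced<A> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const cancel = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
  };
  const debounced = (...args: A): void => {
    cancel(); // restart the quiet period on every call
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
  const callNow = (...args: A): void => {
    cancel();
    fn(...args);
  };
  return Object.assign(debounced, { callNow, cancel });
}

const sentQueries: string[] = [];
const requestSuggestions = debounce((query: string) => {
  sentQueries.push(query);
}, 200);

// If the user already typed while the modules were loading, send the
// first request right away instead of waiting for the debounce timer.
function onModulesInitialized(inputValue: string): void {
  if (inputValue !== "") {
    requestSuggestions.callNow(inputValue);
  }
}

onModulesInitialized("a");   // sent immediately
requestSuggestions("ab");    // would be sent after 200 ms without typing
const sentBeforeTimerFires = [...sentQueries];
requestSuggestions.cancel(); // tidy up the pending timer in this demo
```

Calling `callNow` on initialization removes the debounce delay from the first request while keeping later keystrokes debounced as before.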

Conclusion

From my analysis, I believe the following to be true at the time of this writing:

  • Vue search’s JS modules are slower to load than legacy search’s modules. This is likely because of a 10x increase (6.9 kB vs 69.6 kB) in payload size.
  • Vue search’s JS modules are slower to initialize than legacy search’s modules.
  • Vue search's JS handling of each ajax response is faster than legacy search. This is likely due to legacy search’s inefficient layout thrashing.
  • All of the metrics collected are influenced by network requests including whether they have been cached by the edge (e.g. Varnish).
  • The legacy search API usually has a faster network response for the queries “ab” and “abc”, but both have similar speeds for the query “a”. This might be related to the fact that the Vue search API must do more work than the legacy search API to produce a response, since it needs to include thumbnails and descriptions.
  • Vue search is influenced by debounce time during the "Search Load Start To First Render" metric. Legacy search fires the request immediately if the input already contains characters when the search modules initialize. We should be able to easily improve Vue search in this regard.

During this analysis, I looked at many performance timelines and data. You can take a closer look at the same data I observed by importing the timelines I saved in a repo. Let me know if you find anything interesting!