Real users slow applications down

Anybody who’s worked with databases knows this: as they grow they can slow down. It’s the curse of the customer base. Successful applications gather users and other transactional data.

In this article I’m using MySQL database queries based on WordPress’s database design. Still, the ideas in this article apply to every application built on any relational database management system.

Unavoidable slowdowns

Part of this slowdown is unavoidable: even when looking up a row takes O(log(n)) resources, when n is ten million it costs more than when n is one thousand.

There’s another unavoidable cause for slowdowns: Summarizing a large users table, for example with

SELECT COUNT(*) user_count FROM wp_users WHERE deleted = 0

takes longer for millions of users than it does for dozens of users. It’s part of handling a growing application.

Predictable slowdowns

When designing new applications, we write our database queries to get the job done. In the early days of an application it’s vital to spend scarce time on creating features to delight our prospective users. But, as the application gains users and the database grows, we inevitably discover which queries are efficient and which are not. It’s really hard to guess which which queries are inefficient in early days when the tables are small. So successful applications’ databases need routine examination as they grow.

Many early-days queries are simple stuff like this query to find the most recent ten post IDs from a particular user.

SELECT wp_posts.ID 
  FROM wp_posts
  JOIN wp_users ON wp_posts.post_author = wp_users.ID
 WHERE wp_users.email = 'mickey@disney.com'
   AND wp_users.deleted = 0
 ORDER BY wp_posts.post_date
 LIMIT 10

With proper indexing this kind of query is fast. But, any real application gathers more complex queries. And, it’s hard to guess how people will actually use our application in practice. Some features will get heavy use, and others won’t.

Hot queries

We need to revisit database performance, looking for hot queries. We can do this when users start to complain about slowdowns, or regularly as part of routine maintenance.

Hot queries are those which take a lot of time each time they run, or queries that run so often they take a lot of time in aggregate. In MySQL you can find them in the slow query log. In WordPress you can find your hot queries using John Blackbourn’s Query Monitor plugin, or the monitor feature in our Index WP MySQL for Speed plugin. Other DBMSs have their own ways of finding hot queries. Hot queries often get hotter when an application is under heavy use. So, it’s helpful to use your hot-query finding tools at your application’s peak times.

Once you know which queries are hot, you can run them with your DBMS’s EXPLAIN or EXPLAIN PLAN feature. This gets your DBMS to tell you which indexes, if any, it uses to satisfy the hot query in question. Once you know how the database satisfies a hot query, you can try to add an index or two to make it faster to satisfy. In extreme cases it may be necessary to refactor the query itself. (Explaining how to design new indexes and refactor queries is not the subject of this post.)

In summary, hot queries are entirely predictable. What’s unpredictable? Which queries in your application will turn out to be hot.

Unpredictable slowdowns

People routinely add feature to long-lived applications. (WordPress is a great example.) In WordPress’s case, the designers put tables in the database to allow the creation of plugins and themes. That design is successful: the ecosystem of third-party plugins and themes is vast.

Metadata

The core data feature for extensibility in WordPress is metadata. Posts, users, and terms can all be adorned with arbitrary metadata. Do you want to keep track of users’ birthdays? Simply put a record for each user in the wp_usermeta table for each user. Give it the meta_key of “birthday” and the meta_value of the actual birthday. Then, the application can retrieve the birthday using a query like this:

SELECT wp_users.display_name, wp_users.email,
      COALESCE (birthday.meta_value, 'unknown') AS birthday
  FROM wp_users
  LEFT JOIN wp_usermeta birthday
            ON birthday.user_id = wp_users.ID
           AND birthday.meta_key = 'birthday'
 WHERE wp_users.email = 'mickey@disney.com';

The alternative to using this metadata is to add a column to the users table to hold the birthday. In a (non-WordPress) medical application, that would be the right thing to do: dates of birth are a part of personal identity in medical records. But the WordPress designers did not try to build the core tables to hold everything. Instead they relied on metadata.

The same approach works for posts and terms. As a WordPress site grows and changes, it gathers lots of metadata.

Complex metadata

For a more elaborate example, a site operator can assign capabilities to users. These capabilities are things like “administrator”, “author”, or “subscriber”. And, themes and plugins (notably BuddyPress) add new capabilities. WordPress stores each user’s capabilities in wp_usermeta records with meta_key values of “wp_capabilities”. And, here’s where the real world gets in the way and things get ugly: A single meta_value can hold more than one capability. In fact, the meta_value for capabilities contains a serialized php array. Here is a serialized example.

a:2:{s:13:"administrator";b:1;s:14:"backup_admin";b:1;}

When WordPress loads the metadata it uses php’s unserialize() function to convert this to

array (
    'administrator' => true,
    'backup_admin'  => true
)

That’s fine when some part of the application wants to know “is this user a “backup_admin?”. But it’s absurdly hard and slow to search for all backup admins. You need something like this query that filters using the notoriously slow unanchored LIKE '%value%' search term.

SELECT wp_users.display_name, wp_users.email
  FROM wp_users
  JOIN wp_usermeta wp_capabilities
            ON wp_capabilities.user_id = wp_users.ID
           AND wp_capabilities.meta_key = 'wp_capabilities'
 WHERE wp_capabilities.meta_value = LIKE '%"backup_admin"%'

It gets worse. Look at this query. It counts all users with various capabilities.

SELECT COUNT(NULLIF(`meta_value` LIKE '%"administrator"%', false)), 
              COUNT(NULLIF(`meta_value` LIKE '%"editor"%', false)), 
              COUNT(NULLIF(`meta_value` LIKE '%"author"%', false)), 
              COUNT(NULLIF(`meta_value` LIKE '%"contributor"%', false)), 
              COUNT(NULLIF(`meta_value` LIKE '%"subscriber"%', false)), 
              COUNT(NULLIF(`meta_value` = 'a:0:{}', false)), 
              COUNT(*) 
FROM wp_usermeta 
INNER JOIN wp_users ON user_id = ID
WHERE meta_key = 'wp_capabilities'

It’s not a problem when a site has about a hundred users. But, when a site grows to a million users a query like this takes half a minute to run. There’s no index that can make the LIKE '%value%' search term faster. (In postgreSQL a trigram index will help, but MySQL doesn’t have those.)

Nevertheless, this sort of feature extension is good. It makes the application able to do things not imagined when it was originally developed.

Summary

The point is, tables grow and need new indexes. People add useful features to applications in ways that makes them inefficient. This happens in all successful applications. Wise application designers and maintainers are not surprised by this. They learn from their users, add indexes as needed, and sometimes alter their applications to work around unavoidable slowness.

Leave a Comment