How I improved a query for our plugin by several orders of magnitude

This is going to be super developer heavy. 99% of the article is going to be about SQL, so don’t really expect PHP or javascript things.

Background

We’ve been getting support tickets about the admin area being rather slow at times. We’ve recently released a new, 2.0 version of Subscriptions, and one of the reasons we did the rewrite is to make it more scalable than the previous version. To hear that large stores were struggling with it was rather discomforting, and to a certain extent, annoying to me, as a developer. Annoying, because we haven’t achieved what we wanted.

So we went digging.

Research

We have a server that we use only for staging purposes. It’s pretty amazing, fairly easy to deal with. WP-CLI is on there, plus there’s a GUI for fast staging site setup. We have New Relic on it too. The whole thing is Jason’s doing. It’s awesome!

Jason spun up a large site with 24,977 subscriptions and 25,718 orders. That’s roughly one order (parent order) for every subscription, and a few renewal orders.

I tried recreating the problem the shop owners experienced: sorting the subscriptions by last renewal date.

This is what New Relic gave us:

long query new relic

If you can’t see the picture: that one ran for 90 seconds. That’s a minute and a half. Considering some hosts have some sort of guard on their shared hosting plans so no one busy customer can take down an entire server, this does actually make it very hard to use. WP Engine is one host where I know they terminate any script that runs for longer than 60 seconds1.

Optimize!

SELECT SQL_CALC_FOUND_ROWS  
  wp_posts.*,
  wp_posts.ID AS original_id,
  wp_posts.post_parent AS original_parent,
  CASE (SELECT COUNT(*) FROM wp_postmeta WHERE meta_key = '_subscription_renewal' AND meta_value = original_id)
    WHEN 0 THEN
      CASE (SELECT COUNT(*) FROM wp_posts WHERE ID = original_parent)
        WHEN 0 THEN 0
        ELSE (SELECT post_date_gmt FROM wp_posts WHERE ID = original_parent)
      END
    ELSE (SELECT p.post_date_gmt
      FROM wp_postmeta pm
      LEFT JOIN wp_posts p ON p.ID = pm.post_id
      WHERE pm.meta_key = '_subscription_renewal'
      AND meta_value = original_id
      ORDER BY p.post_date_gmt DESC
      LIMIT 1)
  END AS last_payment
FROM wp_posts  
WHERE 1=1  
AND wp_posts.post_type = 'shop_subscription'  
AND ((  
  wp_posts.post_status = 'wc-active'
  OR wp_posts.post_status = 'wc-pending'
  OR wp_posts.post_status = 'wc-cancelled'
  OR wp_posts.post_status = 'wc-on-hold'
  OR wp_posts.post_status = 'wc-pending-cancel'
  OR wp_posts.post_status = 'wc-expired'
  OR wp_posts.post_status = 'wc-switched'))
ORDER BY CAST(last_payment AS DATETIME) ASC;  

I tried to break it up for readability. This is the result of a filter that’s hooked into posts_clauses when we’re sorting by that column in the admin list table view of the custom post type. The main gist of this query is this:

  • we want to add another column of data: last_payment
  • that data should come from the following:
    • if there are renewal orders for that subscription (has the _subscription_renewal meta), then grab the highest post_date_gmt value of that
    • if there aren’t, check if it has a parent order’s date (the post that’s the parent of this one)
    • if there’s none, be 0
  • finally, order by this value

This query takes between 40.2 and 41.1 seconds on my local copy of the data, ran as a raw query against the database using Sequel Pro. None of the WordPress hooks / filters are touched to slow it down.

We can surely do better than this. So here’s the end solution:

SELECT SQL_CALC_FOUND_ROWS  
  p.*,
  COALESCE(lp.last_payment, o.post_date_gmt, 0) as last_payment
FROM wp_posts p  
LEFT JOIN  
  (SELECT
    MAX( p.post_date_gmt ) as last_payment,
    pm.meta_value
  FROM
    wp_postmeta pm
  LEFT JOIN
    wp_posts p
  ON
    p.ID = pm.post_id
  WHERE
    pm.meta_key = '_subscription_renewal'
  GROUP BY pm.meta_value ) lp
ON p.ID = lp.meta_value  
LEFT JOIN wp_posts o on p.post_parent = o.ID  
WHERE 1=1  
AND p.post_type = 'shop_subscription'  
AND ((  
  p.post_status = 'wc-active' OR
  p.post_status = 'wc-pending' OR
  p.post_status = 'wc-cancelled' OR
  p.post_status = 'wc-on-hold' OR
  p.post_status = 'wc-pending-cancel' OR
  p.post_status = 'wc-expired' OR
  p.post_status = 'wc-switched'))
ORDER BY CAST(last_payment AS DATETIME) ASC;  

This one runs in 2.72 seconds on the same dataset, returning the exact same data. That’s 15x the speed, 1/15th the time, or 1,500% improvement. I’ll show you how I checked whether it returns the same data later.

Changes

Ah, the meaty part of the article. The whys of the entire decisions.

Move away from the CASE in SQL

Using CASE i WHEN a THEN b in SQL is like using if (x) { a } else { b } in PHP. Can be useful, but when there’s a LOT of data to sift through, the engine will just waste cycles.

Because essentially we’re just using fallbacks for the data, we can get all of them anyways. So for every returned row of subscriptions, we can also return the date of its parent (or null, if parent ID is 0, or is not found), and also the date of its latest renewal order (or null, if there aren’t any).

There’s a wonderful function in MySQL called COALESCE, which basically says

Returns the first non-NULL value in the list, or NULL if there are no non-NULL values.

Bam. Feed it the latest renewal date, the date of the parent, 0, and we’re done. That’s what you see above:

COALESCE(lp.last_payment, o.post_date_gmt, 0) as testing  

But that means we have to actually get the data...

Joins are awesome

Especially when they are straightforward. At its core, we need to construct two more tables that we can then tack onto the default wp_posts one with the correct connections.

1. The latest from the renewal orders

Because we need a table, we need to grab data for all the renewal orders for all subscriptions in a way that the subscriptions would have one row. GROUP BY will achieve that, and because we can use aggregate functions there, we can select the latest of the post_date_gmts that belong to orders where the _subscription_renewal meta is the same. Or, in SQL:

SELECT  
  MAX( p.post_date_gmt ) as last_payment,
  pm.meta_value
FROM  
  wp_postmeta pm
LEFT JOIN  
  wp_posts p
ON  
  p.ID = pm.post_id
WHERE  
  pm.meta_key = '_subscription_renewal'
GROUP BY pm.meta_value  

That returns data that looks like this:

Since the meta values are IDs of subscriptions, we can use that for the joins. We’re using left joins here, because if data is missing in this table, I want the dates to be NULL instead of removing rows from the original query by using RIGHT JOIN2.

The parent order’s date

This time we’re joining the wp_posts table to the wp_posts table. It sounds stupid, but it’s better than doing 24,977 subqueries. In SQL, it’s this bit:

LEFT JOIN wp_posts o on p.post_parent = o.ID  

I’ve previously declared the original, first wp_posts table to be called p, hence the p.post_parent.

Making sure it’s the same data

Basically I’ll tack on my new data to the old data and then see whether the new data and the old data are different for each row.

This is the resulting (massive) SQL:

SELECT SQL_CALC_FOUND_ROWS  /* wp_posts.*, */  
  wp_posts.post_date_gmt,
  wp_posts.ID AS original_id,
  wp_posts.post_parent AS original_parent,
  lp.last_payment as max_renewal_date,
  o.post_date_gmt as parent_date,
  CASE (SELECT COUNT(*) FROM wp_postmeta WHERE meta_key = '_subscription_renewal' AND meta_value = original_id)
    WHEN 0 THEN
      CASE (SELECT COUNT(*) FROM wp_posts WHERE ID = original_parent)
        WHEN 0 THEN 0
        ELSE (SELECT post_date_gmt FROM wp_posts WHERE ID = original_parent)
      END
    ELSE (SELECT p.post_date_gmt
      FROM wp_postmeta pm
      LEFT JOIN wp_posts p ON p.ID = pm.post_id
      WHERE pm.meta_key = '_subscription_renewal'
      AND meta_value = original_id
      ORDER BY p.post_date_gmt DESC
      LIMIT 1)
  END AS last_payment
  coalesce(lp.last_payment, o.post_date_gmt, 0) as testing
FROM wp_posts  
LEFT JOIN wp_posts o on wp_posts.post_parent = o.ID  
LEFT JOIN  
  (SELECT
    MAX( p.post_date_gmt ) as last_payment,
    pm.meta_value
  FROM
    wp_postmeta pm
  LEFT JOIN
    wp_posts p
  ON
    p.ID = pm.post_id
  WHERE
    pm.meta_key = '_subscription_renewal'
  GROUP BY pm.meta_value ) lp ON wp_posts.ID = lp.meta_value
WHERE 1=1  
AND wp_posts.post_type = 'shop_subscription'  
AND ((  
  wp_posts.post_status = 'wc-active' OR
  wp_posts.post_status = 'wc-pending' OR
  wp_posts.post_status = 'wc-cancelled' OR
  wp_posts.post_status = 'wc-on-hold' OR
  wp_posts.post_status = 'wc-pending-cancel' OR
  wp_posts.post_status = 'wc-expired' OR
  wp_posts.post_status = 'wc-switched'))
HAVING testing != last_payment  
ORDER BY CAST(last_payment AS DATETIME) ASC;  

This massive query ran in 41 seconds (so negligible time added by the joins and extra bits).

It returned exactly 0 rows. If I change the last HAVING clause from HAVING testing != last_payment to HAVING testing = last_payment, it runs for 38.7 seconds, and returns all 24,977 rows.

To massage it into WordPress

Because the new query has a bunch of joins and new selects, it seems to be tricky. It was actually super easy. The posts_clauses provides a super handy way to glue things on it. The $pieces array (its first argument) looks something like this:

[where] => [...]
[groupby] => 
[join] => 
[orderby] => wp_posts.post_date ASC
[distinct] => 
[fields] => wp_posts.*
[limits] => LIMIT 0, 20

All I’d need to do is put the necessary things into this. Internally WordPress uses the result of the filter in this way to construct the query:

$this->request = $old_request = "SELECT $found_rows $distinct $fields FROM $wpdb->posts $join WHERE 1=1 $where $groupby $orderby $limits";

Oh well, I guess it’s just a copy-paste from Sequel Pro then. :) The entire filter ended up being this (with some of the error checking et al removed:

function subs_posts_clauses( $pieces, $query ) {  
    global $wpdb;

    if ( ! is_admin() || ! isset( $query->query['post_type'] ) || 'shop_subscription' !== $query->query['post_type'] ) {
        return $pieces;
    }

    $pieces['fields'] .= ', COALESCE(lp.last_payment, o.post_date_gmt, 0) as last_payment';

    $pieces['join'] .= "LEFT JOIN
            (SELECT
                MAX( p.post_date_gmt ) as last_payment,
                pm.meta_value
            FROM {$wpdb->postmeta} pm
            LEFT JOIN {$wpdb->posts} p ON p.ID = pm.post_id
            WHERE pm.meta_key = '_subscription_renewal'
            GROUP BY pm.meta_value) lp
        ON wp_posts.ID = lp.meta_value
        LEFT JOIN {$wpdb->posts} o on {$wpdb->posts}.post_parent = o.ID";

    $order = strtoupper( $query->query['order'] );

    $pieces['orderby'] = "CAST(last_payment AS DATETIME) {$order}";

    return $pieces;
}

And we’re done with this one. Except not really!

MORE Renewal orders!

I tried this solution on a later snapshot of the same database as above. The original query, the one that ran in about 41 seconds, didn’t finish in 25 minutes, so I had to abort it. I still don't know how fast it would have been.

I ran the second, “optimised” query on the same database: 7 minutes. If the 15x ratio holds true, the first one would have finished in just under 2 hours. Not ideal...

The problem was that we were joining tables to the result of a select:

LEFT JOIN  
  (SELECT
    MAX( p.post_date_gmt ) as last_payment,
    pm.meta_value
  FROM
    wp_postmeta pm
  LEFT JOIN
    wp_posts p
  ON
    p.ID = pm.post_id
  WHERE
    pm.meta_key = '_subscription_renewal'
  GROUP BY pm.meta_value ) lp
ON p.ID = lp.meta_value  

Now joins are fast! Provided they have indexes. Note that the above does not have indexes. The reason my first iteration was so much faster was because the above only returned ~750 rows. For every row (all 24,977), checking every 750 rows takes time, but not that long... hence the around 4 seconds.

If however there are 14k results in the above, we get to a point where the database needs to scan all 14k rows in that result set for every row in the outer query. That's 24,977 * 14,000. About the population of the United States (350 million). That’s going to take a while.

What’s the solution here?

On one hand, we can go back to the drawing board and rethink where the data is stored. Most other schedules (end date, start date, end of trial, etc...) are stored on the post meta table, which is indexed and keyed, so we’re good, but because this one depends on a lot of things, it’s not easy. We can do it, because when we create a new renewal order we can just update a meta on a subscription, but that won’t help existing sites that have a large dataset with the information missing. Either we need to populate the post meta once by providing yet another upgrade script, which we’d like to avoid, or we need to rethink the query.

Query, rethought

So the problem is that we have a fairly straightforward result set consisting of a bunch of INTs and DATETIMEs, but it’s not indexed. What if we could just CREATE TEMPORARY TABLE, index that, and run the query on the now indexed result set?

A bit of hackery was needed, but the new, shiny query for the same data on the same dataset looks like this:

DROP TEMPORARY TABLE IF EXISTS wp_tmp_lastpayment;

CREATE TEMPORARY TABLE wp_tmp_lastpayment (id INT, INDEX USING BTREE (id), last_payment DATETIME) AS SELECT pm.meta_value as id, MAX( p.post_date_gmt ) as last_payment FROM wp_postmeta pm LEFT JOIN wp_posts p ON p.ID = pm.post_id WHERE pm.meta_key = '_subscription_renewal' GROUP BY pm.meta_value;

SELECT SQL_CALC_FOUND_ROWS  
  wp_posts.*,
  COALESCE(lp.last_payment, o.post_date_gmt, 0) as last_payment
FROM wp_posts  
LEFT JOIN wp_tmp_lastpayment lp  
ON wp_posts.ID = lp.id  
LEFT JOIN wp_posts o on wp_posts.post_parent = o.ID  
WHERE 1=1  
AND wp_posts.post_type = 'shop_subscription'  
AND ((  
  wp_posts.post_status = 'wc-active' OR
  wp_posts.post_status = 'wc-pending' OR
  wp_posts.post_status = 'wc-cancelled' OR
  wp_posts.post_status = 'wc-on-hold' OR
  wp_posts.post_status = 'wc-pending-cancel' OR
  wp_posts.post_status = 'wc-expired' OR
  wp_posts.post_status = 'wc-switched'))
ORDER BY CAST(last_payment AS DATETIME) ASC;  

It’s a set of 3 queries. Guess how long all of them run? You don’t have to: somewhere between 700ms and 950ms. That’s from 7 minutes. That’s from 2 hours.

That’s a 420x improvement on the optimised query, which is a 15x improvement on the original one. That comes to a total of 6,300x, or 630,000% improvement. Not too bad.

The resulting function to filter the posts_clauses array becomes:

public function posts_clauses( $pieces, $query ) {  
    global $wpdb;

    if ( ! is_admin() || ! isset( $query->query['post_type'] ) || 'shop_subscription' !== $query->query['post_type'] ) {
        return $pieces;
    }

    // Let's create a temporary table, drop the previous one, because otherwise this query is hella slow
    $wpdb->query( "DROP TEMPORARY TABLE IF EXISTS {$wpdb->prefix}tmp_lastpayment" );

    $wpdb->query( "CREATE TEMPORARY TABLE {$wpdb->prefix}tmp_lastpayment (id INT, INDEX USING BTREE (id), last_payment DATETIME) AS SELECT pm.meta_value as id, MAX( p.post_date_gmt ) as last_payment FROM {$wpdb->postmeta} pm LEFT JOIN {$wpdb->posts} p ON p.ID = pm.post_id WHERE pm.meta_key = '_subscription_renewal' GROUP BY pm.meta_value" );
    // Magic ends here

    $pieces['fields'] .= ', COALESCE(lp.last_payment, o.post_date_gmt, 0) as last_payment';

    $pieces['join'] .= "LEFT JOIN {$wpdb->prefix}tmp_lastpayment lp
        ON {$wpdb->posts}.ID = lp.id
        LEFT JOIN {$wpdb->posts} o on {$wpdb->posts}.post_parent = o.ID";

    $order = strtoupper( $query->query['order'] );

    $pieces['orderby'] = "CAST(last_payment AS DATETIME) {$order}";

    return $pieces;
}

Caveats

Of course the story would be too awesome if it didn’t have “but...”s. The downside is that the database user needs to have the following privileges, otherwise MySQL will just return an error:

  • CREATE TEMPORARY TABLE
  • DROP TABLE
  • INDEX

This can have security implications however, and we’re very much at the mercy of the hosting companies with this one3. To solve that we can either disable this way of sorting, or disable sorting altogether on large sites.

And now we’re really done :).

Comments, questions are welcome.

  1. Technically if a process runs on two consecutive process checks that are 60 seconds apart. Theoretically if the process starts 1 second after a check, and stops 59 seconds after the next check, it can run for 118 seconds without being terminated.

  2. See this article about the differences of JOIN, LEFT JOIN, RIGHT JOIN, OUTER JOIN: http://www.sitepoint.com/understanding-sql-joins-mysql-database/

  3. See http://codex.wordpress.org/HardeningWordPress#DatabaseSecurity