What happens on wp_insert_post()?

This question came up in the Advanced WordPress Facebook group: when converting non-WP data to WP data, should we use direct database inserts, or go through the APIs?

On one hand, using the WP API ensures that data will be sanitized and checked throughout the process. On the other hand, it might make the process long-running because there are a lot of database queries involved, and if the initial import is a fairly large chunk of data, the servers might just choke.

I was curious to know what exactly happens on wp_insert_post().

Here's the entire code for reference: WP 3.9.2, wp-includes/post.php, line 2909, and here's the WordPress Codex page about wp_insert_post.

Assumptions

We are creating a new post. The data we have available:

$postarr = array(
    'post_title' => 'Some Title',
    'post_content' => 'Some Content'
);

With that, let's hurl the data at wp_insert_post and count the database queries we encounter along the way.

The process

First, wp_insert_post gets the current user by calling get_current_user_id. That calls wp_get_current_user, which calls get_currentuserinfo() (which populates the $current_user global and makes sure it's an instance of WP_User). That might call wp_set_current_user, which instantiates a new WP_User, which in turn might fire off a number of database calls:

  • (1) $wpdb->get_blog_prefix in WP_User::init to get the capability keys
  • (2) then get_user_meta to get the capabilities stored under that key, in WP_User::_init_caps, which then calls WP_User::get_role_caps, which creates a new WP_Roles
  • (3) $wpdb->get_blog_prefix in WP_Roles::_init to get the role key
  • (4) get_option with the role key, which calls a $wpdb->get_row (unless it's in the cache) after calling wp_load_alloptions
  • (5) wp_load_alloptions fires off a request at the database that gets all the options that are set to autoload, unless they're already in the cache (an instance of WP_Object_Cache, which by default keeps its data in a PHP array for the duration of the request).
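Steps (4) and (5) are why the cache matters so much. Here's a rough Python model of the pattern (not WordPress's code, which is PHP): the first get_option loads every autoloaded option in a single query, later lookups of autoloaded options cost nothing, and a non-autoloaded option costs one query until it's cached. The option table contents and autoload flags below are assumptions for illustration, and the real get_option also consults a "notoptions" cache and the object cache, which this skips.

```python
from typing import Dict, Optional, Tuple


class OptionsModel:
    """Toy model of get_option() sitting on an autoload cache (not WP code)."""

    def __init__(self, table: Dict[str, Tuple[str, bool]]) -> None:
        self.table = table  # option_name -> (option_value, autoload?)
        self.alloptions: Optional[Dict[str, str]] = None  # filled on first use
        self.single: Dict[str, Optional[str]] = {}  # non-autoloaded lookups
        self.queries = 0  # simulated database round trips

    def _load_alloptions(self) -> Dict[str, str]:
        if self.alloptions is None:
            self.queries += 1  # one SELECT for every autoloaded option
            self.alloptions = {
                name: value
                for name, (value, autoload) in self.table.items()
                if autoload
            }
        return self.alloptions

    def get_option(self, name: str, default: Optional[str] = None) -> Optional[str]:
        alloptions = self._load_alloptions()
        if name in alloptions:
            return alloptions[name]
        if name not in self.single:
            self.queries += 1  # one SELECT for this single row
            row = self.table.get(name)
            self.single[name] = row[0] if row else None
        value = self.single[name]
        return default if value is None else value


# Hypothetical contents of the options table for this example.
opts = OptionsModel({
    "default_category": ("1", True),
    "wp_user_roles": ("serialized-roles", True),
    "rarely_used": ("x", False),
})
opts.get_option("wp_user_roles")     # loads alloptions: 1 query
opts.get_option("default_category")  # served from alloptions: still 1
opts.get_option("rarely_used")       # not autoloaded: 2 queries
opts.get_option("rarely_used")       # now cached: still 2
print(opts.queries)  # 2
```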

At this point we have our current user, and we're only on line 3 of wp_insert_post.

It then populates the defaults, parses our data, sanitizes it, and extract()s the array into variables, all in plain PHP. We're now on line 22 of the function, or line 2932 in the file.
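The defaults merge is a plain array merge where the caller's keys win. A Python sketch of that step, assuming wp_parse_args behaves like array_merge($defaults, $args) for array input; DEFAULTS is an illustrative subset, not the full list from post.php:

```python
from typing import Any, Dict

# Illustrative subset of the defaults wp_insert_post() merges in; see
# wp-includes/post.php for the real list.
DEFAULTS: Dict[str, Any] = {
    "post_status": "draft",
    "post_type": "post",
    "post_parent": 0,
    "menu_order": 0,
    "post_password": "",
    "post_excerpt": "",
    "guid": "",
}


def parse_args(args: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Model of wp_parse_args() for array input: the caller's keys win."""
    return {**defaults, **args}


postarr = parse_args({"post_title": "Some Title", "post_content": "Some Content"}, DEFAULTS)
print(postarr["post_status"])  # draft
print("ID" in postarr)         # False: extract() will not create $ID
```

The missing "ID" key is exactly what makes the next check treat this as a new post.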

We're creating a new post, so $ID is empty (our initial array had no 'ID' key, so PHP's extract didn't create an $ID variable), and we jump to line 2947.

Line 2969: if the title, content and excerpt are all empty, the function bails out here because we're trying to insert an empty post: it returns a WP_Error if $wp_error is true, and 0 otherwise. We supplied a title and content, so we carry on.
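A simplified Python model of that check, based on my reading of WP 3.9: WPErrorModel is a hypothetical stand-in for WP_Error, and the sketch ignores the post-type-support checks, attachments, updates and the wp_insert_post_empty_content filter.

```python
from typing import Dict, Optional, Union


class WPErrorModel:
    """Hypothetical stand-in for WP_Error: an error code plus a message."""

    def __init__(self, code: str, message: str) -> None:
        self.code = code
        self.message = message


def empty_post_check(
    postarr: Dict[str, str], wp_error: bool = False
) -> Optional[Union[int, WPErrorModel]]:
    """Return None to carry on, or what wp_insert_post() would bail out with."""
    fields = ("post_title", "post_content", "post_excerpt")
    if any(postarr.get(field) for field in fields):
        return None  # at least one field has content: keep going
    if wp_error:
        return WPErrorModel("empty_content", "Content, title, and excerpt are empty.")
    return 0  # the default: a plain 0 instead of an error object


print(empty_post_check({"post_title": "Some Title", "post_content": "Some Content"}))  # None
print(empty_post_check({}))  # 0
```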

The next few lines mostly set defaults, up to line 2989.

  • (6) get_option('default_category') on line 2989. It may or may not hit the database, depending on whether the value is already in the cache. The first time it runs, it won't be.

Line 3011 calls get_post_field, but since the post doesn't exist yet, it goes a few functions deep and returns false. This stretch is all PHP, and does a lot of sanitizing, accent removal and so on.
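To give a feel for what that sanitizing does to a title, here's a rough Python approximation of remove_accents plus sanitize_title_with_dashes. It is not WordPress's implementation: accents are stripped via Unicode NFKD decomposition, and everything outside a-z and 0-9 becomes a hyphen (WordPress, for example, keeps underscores and handles many more special cases).

```python
import re
import unicodedata


def rough_slug(title: str) -> str:
    """Approximate remove_accents() + sanitize_title_with_dashes()."""
    # Split accented letters into base letter + combining mark, drop the marks.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    return slug.strip("-")


print(rough_slug("Some Title"))               # some-title
print(rough_slug("Crème Brûlée à la carte"))  # creme-brulee-a-la-carte
```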

The next bunch of lines are all to do with datetimes, all PHP.

  • (7) 3062 has another get_option, which may or may not be cached.
  • (8) 3065 has another get_option, which may or may not be cached.

Line 3100 returns $post_name as-is. It's already sanitized at this point, and because we're creating a new post without specifying post_status, the status defaults to draft, so a unique slug doesn't matter yet.
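That early return is, as far as I can tell, the draft short-circuit in wp_unique_post_slug. A simplified Python sketch of the idea; `taken` is a hypothetical set of slugs already in use, and the sketch ignores post type, parent, reserved slugs and the length limit:

```python
from typing import Set

UNSLUGGED_STATUSES = ("draft", "pending", "auto-draft")


def unique_post_slug(slug: str, post_status: str, taken: Set[str]) -> str:
    """Simplified sketch of the idea behind wp_unique_post_slug()."""
    if post_status in UNSLUGGED_STATUSES:
        return slug  # drafts don't need a unique slug yet
    if slug not in taken:
        return slug
    suffix = 2  # the first duplicate becomes some-title-2
    while f"{slug}-{suffix}" in taken:
        suffix += 1  # each probe is another SELECT in the real thing
    return f"{slug}-{suffix}"


print(unique_post_slug("some-title", "draft", {"some-title"}))    # some-title
print(unique_post_slug("some-title", "publish", {"some-title"}))  # some-title-2
```

For a draft this costs no queries at all; for a published post with a popular title, every collision is another round trip.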

3117 checks whether we're updating (we're not), so let's skip to 3134.

And finally, 3143 inserts the post. We're 234 lines deep into the function.

  • (9) $wpdb->insert finally happens with all our sanitized, default data.
  • (10) after inserting the post, we run an update on the record we just inserted to make sure post_name is set too. Our original data array had no post_name, so it has to be created from post_title.
  • (11) if our post type has category as a taxonomy, line 3161 calls wp_set_post_categories. That calls get_post_type, which calls get_post, which either returns the global $post or calls WP_Post::get_instance, which is a database read unless the post is already cached.
  • (12) wp_set_post_categories will also call get_post_status, which also calls get_post.
  • (13) there's also a get_option call, which may or may not be returned from cache.
  • (14) wp_set_post_terms is called, which calls wp_set_object_terms, which fires SELECT queries to check that the term exists and whether it's already attached to the post.
  • (15) there's a $wpdb->insert to the term relationships table between the default category and the post
  • (16) there are a number of database queries relating to term counting. _update_post_term_count is one, and for each term it will fire off two SELECT queries. This is to ensure that when you are looking at the taxonomy pages, and it says Uncategorized has 14 posts, that 14 is accurate.
  • (17) there's an INSERT INTO near the end of wp_set_object_terms that deals with term orders, which may or may not fire.
  • 12 to 17 would be repeated for tags: that path also ends up calling wp_set_post_terms, so the process is identical. Since we didn't pass any tags, it's skipped here.
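The shape of steps (14) to (16), check before inserting the relationship and then recount rather than increment, can be modelled in a few lines of Python. This is a toy, not WordPress code: the term ID and post IDs are hypothetical, and the real recount also filters by post status, which this leaves out.

```python
from typing import Dict, List, Set, Tuple


class TermsModel:
    """Toy model of term relationships plus recounting (not WordPress code)."""

    def __init__(self) -> None:
        self.relationships: Set[Tuple[int, int]] = set()  # (object_id, term_id)
        self.counts: Dict[int, int] = {}

    def set_object_terms(self, object_id: int, term_ids: List[int]) -> None:
        for term_id in term_ids:
            # Check first (the SELECT), insert only if the pair is new (the INSERT).
            if (object_id, term_id) not in self.relationships:
                self.relationships.add((object_id, term_id))
        self._update_term_count(term_ids)

    def _update_term_count(self, term_ids: List[int]) -> None:
        # Recount from the relationships instead of doing count += 1, so the
        # stored number stays right even if the same term is set twice.
        for term_id in term_ids:
            self.counts[term_id] = sum(
                1 for _, t in self.relationships if t == term_id
            )


UNCATEGORIZED = 1  # hypothetical term ID for the default category
terms = TermsModel()
terms.set_object_terms(101, [UNCATEGORIZED])
terms.set_object_terms(102, [UNCATEGORIZED])
terms.set_object_terms(102, [UNCATEGORIZED])  # repeating it changes nothing
print(terms.counts[UNCATEGORIZED])  # 2
```

Recounting is more expensive than incrementing, but it's what keeps "Uncategorized has 14 posts" honest.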

Then we get the guid. It's empty and we're not updating, therefore:

  • (18) $wpdb->update on line 3177 to update the guid (which is usually the permalink)
  • (19) after deleting the post's cache, we call get_post again. It may or may not use the cache; probably not, since we've just cleared it, so that's a full SELECT.
  • (20) the function _transition_post_status is hooked into the transition_post_status hook, which is fired by calling the wp_transition_post_status function on line 3199. It's a simple $wpdb->update on the row that we've just inserted.
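The write pattern in steps (9), (10), (18) and (19) can be reproduced with Python's built-in sqlite3, counting statements with a trace callback. The table is a hypothetical, heavily trimmed stand-in for wp_posts, and the example.com guid is made up. The point is the ordering: the guid is built from the new ID, so it can only be written after the INSERT. In WordPress itself you'd count the real queries by defining SAVEQUERIES as true (which makes $wpdb record them in $wpdb->queries) or by calling get_num_queries().

```python
import sqlite3

# Hypothetical, heavily trimmed stand-in for wp_posts (not the real schema).
# isolation_level=None means autocommit, so no implicit BEGIN gets logged.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE posts (ID INTEGER PRIMARY KEY, post_title TEXT, "
    "post_name TEXT, post_status TEXT, guid TEXT)"
)

statements = []
conn.set_trace_callback(statements.append)  # record every statement run

# (9) the INSERT itself; the ID only exists after this
cur = conn.execute(
    "INSERT INTO posts (post_title, post_name, post_status, guid) "
    "VALUES (?, '', 'draft', '')",
    ("Some Title",),
)
post_id = cur.lastrowid
# (10) fill in post_name on the row we just inserted
conn.execute("UPDATE posts SET post_name = ? WHERE ID = ?", ("some-title", post_id))
# (18) the guid is built from the new ID, so it needs a second write
conn.execute(
    "UPDATE posts SET guid = ? WHERE ID = ?",
    (f"http://example.com/?p={post_id}", post_id),
)
# (19) read the row back after the cache was cleared
row = conn.execute(
    "SELECT post_name, guid FROM posts WHERE ID = ?", (post_id,)
).fetchone()
conn.set_trace_callback(None)

data_queries = [
    s for s in statements if s.split()[0].upper() in ("INSERT", "UPDATE", "SELECT")
]
print(len(data_queries), row)  # 4 ('some-title', 'http://example.com/?p=1')
```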

At this point it fires off a bunch of other hooks, but nothing interesting happens in core (any plugin hooked in there can, of course, add queries of its own).

If there were no cache at all, a simple wp_insert_post would run 20 database queries. With a warm cache, I'd guess it's closer to 5, depending on what happened before we called the function.

Updating an existing post instead of inserting a new one, or supplying more data, can also change the number of queries in either direction.

The downside of going through the API is that it can be quite taxing on the server. The upside is that the data will make sense: WordPress will not let you add bad, inconsistent data.

Note: get_post will return stuff from the cache, if it's available.
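That cache-aside behaviour (check the cache, query on a miss, store the result) is what turns steps (11), (12) and (19) into "may or may not" queries. A Python model, not WordPress code: `db` is a hypothetical stand-in for the posts table, and clean_post_cache here only drops the one cached entry.

```python
from typing import Callable, Dict, Optional


class PostCacheModel:
    """Sketch of the cache-aside pattern behind WP_Post::get_instance()."""

    def __init__(self, fetch_row: Callable[[int], Optional[dict]]) -> None:
        self.fetch_row = fetch_row
        self.cache: Dict[int, dict] = {}
        self.queries = 0

    def get_post(self, post_id: int) -> Optional[dict]:
        if post_id in self.cache:
            return self.cache[post_id]  # cache hit: no query
        self.queries += 1  # cache miss: one SELECT on the posts table
        row = self.fetch_row(post_id)
        if row is not None:
            self.cache[post_id] = row
        return row

    def clean_post_cache(self, post_id: int) -> None:
        self.cache.pop(post_id, None)


db = {1: {"ID": 1, "post_title": "Some Title"}}
posts = PostCacheModel(db.get)
posts.get_post(1)          # miss: 1 query
posts.get_post(1)          # hit: still 1
posts.clean_post_cache(1)  # what happens just before step (19)
posts.get_post(1)          # miss again: 2 queries
print(posts.queries)  # 2
```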