Improving WordPress core search

WordPress’s core search capability is very simple and very primitive. That can be frustrating. We’re spoiled by Google, Bing, and the other search engines, which have had decades to get really good. What makes them good is a lot of information beyond the content of any given site, plus thousands of programmer-years of work.

The core WP_Query search runs (on a site without any search plugin) when you request https://example.com/?s=searchterm . The search box on a typical site does exactly that. WP_Query turns the request into SQL with a clause like

WHERE wp_posts.post_content LIKE '%searchterm%'
   OR wp_posts.post_title   LIKE '%searchterm%'

That is, no other way to put it, as stupid as it is slow as a way to search for meaningful text in a CMS. The leading % wildcard means the database can’t use an index, so it has to scan every post. But it’s cheap, easy to test, and doesn’t require big controlled vocabularies of synonyms, spelling corrections, and other search machinery that needs lots of contextual knowledge beyond the data in the CMS. It’s self-contained. That is good for most of us WordPress site owners.
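To see what that kind of matching does in practice, here is a minimal sketch using Python's built-in sqlite3 module in place of MariaDB / MySQL. The table name posts, the sample rows, and the function core_style_search are made up for illustration; only the shape of the WHERE clause comes from the fragment above.

```python
import sqlite3

def core_style_search(conn, term):
    """Return titles of posts whose title or content contains `term` anywhere."""
    # Note: % and _ inside `term` are not escaped in this sketch.
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT post_title FROM posts "
        "WHERE post_content LIKE ? OR post_title LIKE ? "
        "ORDER BY id",
        (pattern, pattern),
    )
    return [title for (title,) in rows]

# Hypothetical sample data, not a real WordPress schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, post_title TEXT, post_content TEXT)"
)
conn.executemany(
    "INSERT INTO posts (post_title, post_content) VALUES (?, ?)",
    [
        ("Cat care", "Feeding and grooming."),
        ("String tricks", "How to concatenate two strings."),
        ("Dog care", "Walks and training."),
    ],
)

print(core_style_search(conn, "cat"))  # ['Cat care', 'String tricks']
```

Searching for cat also finds the post about how to concatenate strings. That is the whole deal with substring search: it finds everything, whether or not it means anything.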

Core search does have one really good feature, especially if you use languages other than plain old ASCII English. That otherwise slow LIKE '%searchterm%' stuff in MariaDB / MySQL applies the collation of the columns it searches (normally your database’s default). Recent DBMS versions have case- and accent-insensitive collations, so you get that for free in core search if your site uses such a collation.
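As a rough illustration of what case- and accent-insensitive matching buys you, here is a small Python sketch. It is only an approximation: real database collations follow more detailed rules, and the function names fold and contains_folded are invented for this example.

```python
import unicodedata

def fold(text):
    """Strip accents and case, roughly like an accent- and case-insensitive collation."""
    # Decompose accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()

def contains_folded(haystack, needle):
    """Substring test that ignores case and accents."""
    return fold(needle) in fold(haystack)

print(contains_folded("Crème brûlée recipe", "creme brulee"))  # True
print(contains_folded("Crème brûlée recipe", "BRÛLÉE"))        # True
```

With a suitable collation the database does this comparison for you, so a visitor who types creme brulee still finds the post.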

One could use MariaDB / MySQL FULLTEXT search. But it has its own problems, like difficult-to-control stopword lists. And because by default it matches whole words, it will fail to find things that core search does find, such as a fragment from the middle of a word.

External search systems like Elasticsearch or Algolia offer some of that synonym and spelling-correction feature set. But they add server ops complexity and cost.

Mikko Saari’s Relevanssi plugin builds tables of words from your posts and pages. It builds both a word table and a reverse-word table, so it can find, for example, the word Relevanssi if you search for either Relev or nssi. It’s pretty good. It’s more or less perfect for a WooCommerce store I run for selling traditional folk music. And it doesn’t depend on external data.
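The word table / reverse-word table idea can be sketched in a few lines of Python. This is my own toy version under simple assumptions, not Relevanssi's actual schema or code; build_tables, search_words, and the sample posts are all hypothetical.

```python
import re
from collections import defaultdict

def build_tables(posts):
    """Map each word, and each word spelled backwards, to the ids of posts containing it."""
    words = defaultdict(set)
    reversed_words = defaultdict(set)
    for post_id, text in posts.items():
        for word in re.findall(r"\w+", text.lower()):
            words[word].add(post_id)
            reversed_words[word[::-1]].add(post_id)
    return words, reversed_words

def search_words(term, words, reversed_words):
    """Find posts with a word that starts with `term` or ends with `term`."""
    term = term.lower()
    hits = set()
    # Prefix match against the forward table.
    for word, ids in words.items():
        if word.startswith(term):
            hits |= ids
    # A suffix match becomes a prefix match against the reversed table.
    backwards = term[::-1]
    for word, ids in reversed_words.items():
        if word.startswith(backwards):
            hits |= ids
    return sorted(hits)

# Hypothetical sample posts.
posts = {1: "Relevanssi is a search plugin", 2: "Core search is simple"}
words, reversed_words = build_tables(posts)

print(search_words("Relev", words, reversed_words))  # [1]
print(search_words("nssi", words, reversed_words))   # [1]
print(search_words("levan", words, reversed_words))  # []
```

The point of the reversed table is that both lookups are prefix matches, the kind a database index can handle, where a leading-wildcard LIKE can't. The price is that a fragment from the middle of a word, like levan, goes unfound in this sketch.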

I put together a plugin to help search for WooCommerce orders. WooCommerce’s built-in search uses LIKE '%searchterm%' and so is astonishingly slow on sites with many orders. (WooCommerce order search doesn’t use the core WP_Query code paths.) My order search shamelessly copies a feature called trigram indexing, available in PostgreSQL through its pg_trgm extension. LIKE-based searches and trigram searches both have the advantage that they find oddball search terms like, I dunno, postcodes or fragments of PayPal transaction ids, so they’re totally appropriate for searching for orders.
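Here is a bare-bones Python sketch of how a trigram index can answer the same question as LIKE '%searchterm%' without reading every row. It is a simplification: it isn't the code from my plugin, and PostgreSQL's implementation differs in detail. The names trigrams, build_index, search_trigrams, and the sample orders are all made up.

```python
from collections import defaultdict

def trigrams(text):
    """Return the set of all three-character substrings of `text`, lowercased."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(orders):
    """Map each trigram to the ids of the orders whose text contains it."""
    index = defaultdict(set)
    for order_id, text in orders.items():
        for gram in trigrams(text):
            index[gram].add(order_id)
    return index

def search_trigrams(term, index, orders):
    """Return ids of orders containing `term` as a substring."""
    grams = trigrams(term)
    if grams:
        # Only orders containing every trigram of the term can possibly match.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Terms shorter than three characters have no trigrams: scan everything.
        candidates = set(orders)
    # Confirm each candidate, since sharing trigrams does not guarantee a match.
    term = term.lower()
    return sorted(oid for oid in candidates if term in orders[oid].lower())

# Hypothetical order text: customer name, postcode, transaction id.
orders = {
    101: "Jane Doe SW1A 1AA 5XY12345AB678901C",
    102: "John Roe EC1A 1BB 9ZZ98765CD432109D",
}
index = build_index(orders)

print(search_trigrams("w1a", index, orders))    # [101]
print(search_trigrams("98765", index, orders))  # [102]
```

The index narrows the search to a handful of candidate orders, and the final substring check weeds out false positives. Fragments from the middle of a postcode or transaction id work just as well as whole words.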

I’m fooling around with a plugin, still unfinished, that uses trigram search to replace WordPress’s core search. I still have no idea whether it will be better than Relevanssi. And releasing a plugin to the repo is a big enough hassle that I won’t do it unless it is at least as good as Mikko Saari’s work.
