Enhancing Search in MediaWiki Using the CirrusSearch Extension

Why CirrusSearch matters for a MediaWiki site

If you’ve ever typed a phrase into Special:Search and gotten a wall of unrelated hits, you know the frustration. MediaWiki’s built‑in search leans on the database’s full‑text index (MySQL/MariaDB on most installs). That is fine for a handful of pages, but it falls flat once your wiki grows beyond a few thousand articles. That’s where CirrusSearch steps in – it replaces the database search with a full‑blown Elasticsearch (or OpenSearch) backend, turning “search‑i‑guess‑it‑might‑be‑here” into “boom, here’s the exact page you need.”

Getting the basics right: install and enable

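First things first, you need the extension files. CirrusSearch depends on the Elastica extension, so download both in the versions that match your MediaWiki release (the extension pages offer per‑release downloads). Drop them into your extensions/ folder, run composer install --no-dev inside extensions/Elastica to pull in its PHP client library, and then add the usual boilerplate to LocalSettings.php. Nothing fancy, but watch out for those tiny typos – they’ll bite you later.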

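// Load the extensions (CirrusSearch depends on Elastica for its Elasticsearch client)
wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );

// Tell CirrusSearch where Elasticsearch/OpenSearch lives: hostnames, not URLs
$wgCirrusSearchServers = [ 'localhost' ]; // default port 9200; use [ 'host' => ..., 'port' => ... ] entries otherwise

// Route Special:Search through CirrusSearch (switch this on once the index is built)
$wgSearchType = 'CirrusSearch';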

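Run php maintenance/update.php as you would after adding any extension, but don’t expect it to build the search index. That’s the job of CirrusSearch’s own maintenance scripts, and the one‑time bootstrap order is spelled out in the extension’s README.

The indices are named after your wiki ID (the database name, unless you set $wgCirrusSearchIndexBaseName), with separate content and general indices. Once they exist, every page edit is queued for indexing through the job queue. The first full index can take a while – think coffee‑break length for a modest wiki, half‑day for a giant one.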
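Put together, the bootstrap order from the CirrusSearch README looks roughly like the LocalSettings.php fragment below, with the shell steps written as comments. Treat it as a checklist to verify against the README of your release branch, since script names and flags have shifted over time. Note that the $wgSearchType line from the block above belongs at the very end of the sequence.

```php
// Step 0: while bootstrapping, stop MediaWiki from writing to an index that doesn't exist yet
$wgDisableSearchUpdate = true;

// Then, from the wiki root (check each step against your branch's README):
//   1. php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php
//        creates the indices and their mappings
//   2. remove the $wgDisableSearchUpdate line above so edits start flowing
//   3. php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip
//        indexes page content
//   4. php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse
//        fills in link counts
//   5. only now send searches to CirrusSearch:
// $wgSearchType = 'CirrusSearch';
```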

Peeking under the hood: what Cirrus actually does

  • Full‑text search – every word tokenized, lower‑cased, stemmed (so “running” matches “run”).
  • Prefix and wildcard support – type “cat*” and get “cat”, “caterpillar”, “catastrophe”.
  • Namespace filters – limit results to Talk:, Template:, or your custom namespace without extra code.
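  • Boosting and weighting – matches in titles, redirects, headings and the opening text count for more than matches deep in the body, and the weights are configurable.
  • Geo‑search – with the GeoData extension storing page coordinates, the nearcoord: and neartitle: keywords find pages “within 10 km of Berlin”.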
  • Suggestions & “Did you mean?” – fuzzy matching that feels almost magical.

All that power comes from Elasticsearch’s inverted index. In plain English, each word points back to the pages that contain it, so a lookup is basically a dictionary hit – lightning fast.
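To make the “dictionary hit” idea concrete, here is a toy inverted index in PHP. It’s a sketch, not how Elasticsearch stores postings: buildInvertedIndex() and searchAll() are hypothetical helpers, tokenization is a crude lower‑case‑and‑split, and there is no stemming or scoring.

```php
<?php
// Toy inverted index: each token maps to the IDs of the pages that contain it.
// Real analyzers also stem ("running" -> "run"); this sketch only lower-cases
// ASCII text and splits on anything that isn't a letter or digit.

/**
 * @param array<int, string> $pages page ID => page text
 * @return array<string, list<int>> token => sorted page IDs (the "posting list")
 */
function buildInvertedIndex(array $pages): array
{
    $index = [];
    foreach ($pages as $id => $text) {
        $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($tokens) as $token) {
            $index[$token][] = $id;
        }
    }
    foreach ($index as &$ids) {
        sort($ids);
    }
    unset($ids); // break the reference left over from the loop
    return $index;
}

/** Pages containing every query token: intersect the posting lists. */
function searchAll(array $index, string $query): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    $result = null;
    foreach ($tokens as $token) {
        $ids = $index[$token] ?? [];
        $result = $result === null ? $ids : array_values(array_intersect($result, $ids));
    }
    return $result ?? [];
}

$pages = [
    1 => 'Installing CirrusSearch on MediaWiki',
    2 => 'Elasticsearch heap settings',
    3 => 'CirrusSearch heap and shard tuning',
];
$index = buildInvertedIndex($pages);
echo implode(',', searchAll($index, 'cirrussearch heap')), PHP_EOL; // 3
```

A multi‑word query just intersects the posting lists of its words, which is why lookups stay cheap as the wiki grows; Elasticsearch layers compression, scoring and much more on top of the same idea.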

Configuring the index: a few knobs worth turning

Out of the box, CirrusSearch works, but you’ll probably want to tweak a few settings for your own traffic pattern. Below are the most common options, each with a short rationale.

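// Raise shard counts only for large wikis (e.g., > 200k pages); the setting is
// an array keyed by index type, not a single number
$wgCirrusSearchShardCount = [ 'content' => 5, 'general' => 5, 'archive' => 1, 'titlesuggest' => 1 ];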

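// Capping concurrent searches (to protect a small ES node on a low-end VPS) is
// handled through PoolCounter, via $wgPoolCounterConf['CirrusSearch-Search'],
// rather than by a dedicated CirrusSearch setting

// Enable "search-as-you-type" suggestions (the tiny dropdown) from the completion
// suggester; build its index with extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php
$wgCirrusSearchUseCompletionSuggester = 'yes';

// "More like this" (helpful for related-article widgets) needs no switch: the
// morelike: search keyword works out of the box, and RelatedArticles builds on it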

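These settings can be sprinkled anywhere in LocalSettings.php. MediaWiki rereads that file on every request, so there’s nothing to restart (unless an opcode cache is set not to notice file changes). Shard count is the exception that needs real work: Elasticsearch can’t change the number of shards of an existing index, so CirrusSearch has to build a fresh one, e.g. with php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now.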
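Whether you need more than one shard depends more on index size than on page count. Elastic’s sizing guidance aims for shards of roughly 10–50 GB, and that rule of thumb fits in a few lines. suggestShardCount() is a hypothetical helper, not something CirrusSearch provides; the size would come from the store.size column of curl 'localhost:9200/_cat/indices?v'.

```php
<?php
// Rough shard sizing from Elastic's "roughly 10-50 GB per shard" rule of thumb.
// Hypothetical helper for planning only; CirrusSearch does not compute this.
function suggestShardCount(float $indexSizeGb, float $targetShardGb = 30.0): int
{
    if ($targetShardGb <= 0) {
        throw new InvalidArgumentException('Target shard size must be positive');
    }
    // At least one shard; otherwise enough shards to keep each near the target size.
    return max(1, (int) ceil($indexSizeGb / $targetShardGb));
}

echo suggestShardCount(4.0), PHP_EOL;   // 1
echo suggestShardCount(120.0), PHP_EOL; // 4
```

Most wikis land on 1, which is why a single shard per index is a sensible starting point.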

Tuning relevance: it’s not “set‑and‑forget”

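Relevance ranking is the heartbeat of any search UI. CirrusSearch doesn’t take a free‑form boost map; its main levers are per‑field weights ($wgCirrusSearchWeights) and template boosts ($wgCirrusSearchBoostTemplates). Because boosts key off templates rather than categories, the usual way to make pages in [[Category:Important]] float to the top is to have them transclude a template (here a hypothetical Template:Important) that also adds the category.

// Give a 2× boost to pages that transclude Template:Important (hypothetical template)
$wgCirrusSearchBoostTemplates = [
    'Template:Important' => 2.0,
];

// Titles already outweigh body text; nudge them about 1.5× further.
// Assumes the default title weight of 20 and that extension registration merges
// this partial array with the other defaults; verify both in your extension.json.
$wgCirrusSearchWeights = [
    'title' => 30,
];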

Boosts like these are applied at query time, so there’s nothing to reindex or flush: save LocalSettings.php and watch the order shift on the next search. It’s a bit of trial‑and‑error; I usually start with a small boost (1.2–1.5) and adjust after looking at real user queries.
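Why start small? A multiplicative boost mostly reorders close calls. The toy ranking below multiplies made‑up base scores by per‑page boosts; rankWithBoosts() and the scores are hypothetical, and real CirrusSearch scoring mixes in many more signals.

```php
<?php
// Toy model of a multiplicative boost: final score = base score × boost.
/**
 * @param array<string, float> $scores title => base relevance score
 * @param array<string, float> $boosts title => multiplier (missing means 1.0)
 * @return list<string> titles, best first
 */
function rankWithBoosts(array $scores, array $boosts): array
{
    $final = [];
    foreach ($scores as $title => $score) {
        $final[$title] = $score * ($boosts[$title] ?? 1.0);
    }
    arsort($final); // highest score first, keys (titles) preserved
    return array_keys($final);
}

$scores = ['Setup guide' => 10.0, 'FAQ' => 9.0, 'Changelog' => 5.0];

// A gentle 1.2× boost lifts FAQ (9.0 -> about 10.8) past its close competitor...
echo implode(', ', rankWithBoosts($scores, ['FAQ' => 1.2])), PHP_EOL;
// ...but the same boost can't rescue a weak match (5.0 -> about 6.0).
echo implode(', ', rankWithBoosts($scores, ['Changelog' => 1.2])), PHP_EOL;
```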

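Another subtle lever is the on‑wiki message MediaWiki:Cirrussearch-boost-templates. Administrators can list templates there, one per line in the form Template:FeatureBox|150%, to give a gentle push to pages that include them without touching LocalSettings.php. That’s handy for product documentation where a {{FeatureBox}} template signals a key article.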

Handling accents and language quirks

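Depending on the database collation, the classic search can treat “café” and “cafe” as different words. CirrusSearch folds accents during analysis for many languages, which is good news for multilingual sites. If you need to keep them separate for a reason (maybe you’re cataloguing coffees), the switch to look at is $wgCirrusSearchUseIcuFolding, which controls ICU‑based folding when the analysis-icu plugin is installed. Some language analyzers fold certain characters on their own, and any analysis change only takes effect after you rebuild the index.

$wgCirrusSearchUseIcuFolding = 'no'; // disable ICU folding, then rebuild the index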

Most of the time you’ll leave it on – the search feels more natural, especially when users type on mobile keyboards that drop accents automatically.
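To see why folding lets “cafe” find “café”, here is a toy model: the same folding function runs on indexed words and on the query, so both sides meet in the middle. foldToken() and tokensMatch() are hypothetical, and the character map is deliberately tiny; a real analyzer uses ICU or ASCII folding filters instead.

```php
<?php
// Toy accent folding applied to both the indexed token and the query token.
function foldToken(string $token): string
{
    // Deliberately tiny, hypothetical mapping; real analyzers cover all of Unicode.
    static $map = [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a', 'Á' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e', 'É' => 'e',
        'í' => 'i', 'ó' => 'o', 'ö' => 'o', 'ú' => 'u', 'ü' => 'u', 'ç' => 'c',
    ];
    // Fold accented characters first, then lower-case the remaining ASCII.
    return strtolower(strtr($token, $map));
}

function tokensMatch(string $queryToken, string $indexedToken, bool $fold = true): bool
{
    if (!$fold) {
        return $queryToken === $indexedToken; // accents stay distinct
    }
    return foldToken($queryToken) === foldToken($indexedToken);
}

var_dump(tokensMatch('cafe', 'café'));        // bool(true)
var_dump(tokensMatch('cafe', 'café', false)); // bool(false)
```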

Common pitfalls and how to dodge them

Even a seasoned sysadmin can stumble over a few gotchas, so here’s a quick “watch‑out” list.

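  • Memory pressure – Elasticsearch can be a memory hog. Releases before 7.11 default to a 1 GB JVM heap (newer ones size it automatically), and a busy wiki usually wants 2–4 GB. Set it through the ES_JAVA_OPTS environment variable (ES_JAVA_OPTS="-Xms2g -Xmx2g") or a file in config/jvm.options.d/, not in elasticsearch.yml; keep -Xms and -Xmx equal and leave at least half the machine’s RAM for the OS file cache. Too low and searches start failing with CirrusSearch’s “We could not complete your search due to a temporary problem” message.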
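  • Cluster health – Check it with curl 'localhost:9200/_cluster/health?pretty' and look for "status" : "green". Yellow means some replica shards are unassigned, which on a single node usually means an index wants replicas it has nowhere to put (CirrusSearch’s default, $wgCirrusSearchReplicas = '0-2', auto‑expands to zero there). Red means a primary shard is missing and searches against that index will fail.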
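  • Re‑indexing after mapping changes – A new field or analysis setting won’t magically appear in the existing index. Update the index configuration with php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php (adding --reindexAndRemoveOk --indexIdentifier now when the change requires a rebuild), then repopulate with ForceSearchIndex.php if the new field needs data from the pages.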
  • Search after a massive purge – Deleting a lot of pages can leave “ghost” hits until the queued deletion jobs (cirrusSearchDeletePages) run. Process the queue with php maintenance/runJobs.php or simply let the background jobs catch up.
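  • Version mismatch – CirrusSearch has no 7.x line of its own: its release branches follow MediaWiki’s (REL1_39, REL1_43 and so on), and each supports the Elasticsearch/OpenSearch versions listed in its README. Mixing major versions leads to cryptic errors about mappings.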
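If you check health from a script or cron job, a small parser saves eyeballing JSON. This is a sketch assuming PHP 8: describeClusterHealth() is a hypothetical helper, while status and unassigned_shards are real fields of the _cluster/health response.

```php
<?php
// Turn the JSON from GET _cluster/health into a one-line verdict.
function describeClusterHealth(string $json): string
{
    $health = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $status = $health['status'] ?? 'unknown';
    $unassigned = (int) ($health['unassigned_shards'] ?? 0);

    return match ($status) {
        'green' => 'OK: all shards assigned',
        'yellow' => "Degraded: {$unassigned} replica shard(s) unassigned",
        'red' => "Broken: {$unassigned} shard(s) unassigned, including at least one primary",
        default => "Unknown status '{$status}'",
    };
}

// Hypothetical sample response from a single-node cluster
$sample = '{"cluster_name":"wiki","status":"yellow","number_of_nodes":1,"unassigned_shards":2}';
echo describeClusterHealth($sample), PHP_EOL; // Degraded: 2 replica shard(s) unassigned
```

Against a live node you could pass it file_get_contents('http://localhost:9200/_cluster/health'), provided allow_url_fopen is enabled.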

My own experience taught me to keep an eye on the ES logs – they’ll whisper hints before the UI throws a “We could not complete your search” banner.

Advanced tricks – a quick sampler

Below are a couple of snippets that seasoned wikis love, just to give you a taste of what’s possible.

1. “Search within a date range”

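Assuming you store a creation date on pages (for example as a Semantic MediaWiki property, [[Created::2024-09-01]]) and get it into the search index as a date field, the filter that surfaces only recent updates looks like this: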

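// Sketch only: the Elasticsearch filter you want added to a search.
// CirrusSearch has no setting that takes raw query DSL like this, and
// 'page_created' is a placeholder for whatever date field you index.
$recentOnlyFilter = [
    'bool' => [
        'filter' => [
            [ 'range' => [ 'page_created' => [ 'gte' => 'now-30d/d' ] ] ],
        ],
    ],
];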

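Wiring it up is the custom part. The profile= parameter of Special:Search selects a namespace profile, not a query profile, so you’d expose the filter through your own code, for instance a search keyword registered via the CirrusSearchAddQueryFeatures hook. A search for “foo” plus that keyword would then return only the last month’s articles.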
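Elasticsearch date math is easy to misread, so here is the window 'now-30d/d' selects, computed by hand in PHP. In a gte bound, the /d rounds down to midnight (UTC unless the query sets a time_zone). recentCutoff() is a hypothetical helper for sanity‑checking the window, not part of CirrusSearch.

```php
<?php
// What 'now-30d/d' resolves to in a 'gte' bound: subtract 30 days from now,
// then round down to the start of that day, in UTC.
function recentCutoff(DateTimeImmutable $now, int $days = 30): DateTimeImmutable
{
    return $now
        ->setTimezone(new DateTimeZone('UTC'))
        ->modify("-{$days} days")
        ->setTime(0, 0);
}

$now = new DateTimeImmutable('2024-09-15 13:45:00', new DateTimeZone('UTC'));
echo recentCutoff($now)->format('Y-m-d H:i:s'), PHP_EOL; // 2024-08-16 00:00:00
```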

2. “Boost based on page view stats”

If you have pageview data (for example via the PageViewInfo extension), you can feed those numbers into a custom scoring script. It’s a bit more involved, but the gist looks like this:

{
  "function_score": {
    "functions": [
      {
        "script_score": {
          "script": {
            "source": "doc['pageviews'].size() == 0 ? 0 : doc['pageviews'].value * params.boost_factor",
            "params": { "boost_factor": 0.001 }
          }
        }
      }
    ],
    "boost_mode": "sum"
  }
}

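In CirrusSearch, custom scoring like this goes through rescore profiles ($wgCirrusSearchRescoreProfiles together with $wgCirrusSearchRescoreFunctionScoreChains) rather than a dedicated hook; check the profile format for your release. The script also assumes the pageview counts are already indexed as a numeric pageviews field, which CirrusSearch won’t do on its own. The result: hot articles get a subtle nudge to the top, making the search feel “in‑the‑now”.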
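How the script’s value combines with the text‑match score matters. In an Elasticsearch function_score query the default boost_mode is multiply, so a page with zero views ends up with a zero score however relevant it is, while sum adds a small bonus instead. The toy below models just that combination step; combineScore() is hypothetical, and the 0.001 factor is the one from the snippet above.

```php
<?php
// Toy model of function_score's boost_mode: how the query score and the
// script's value (pageviews × boost_factor) are combined. Not CirrusSearch code.
function combineScore(float $queryScore, int $pageviews, string $boostMode, float $boostFactor = 0.001): float
{
    $scriptValue = $pageviews * $boostFactor;
    return match ($boostMode) {
        'multiply' => $queryScore * $scriptValue, // Elasticsearch's default boost_mode
        'sum' => $queryScore + $scriptValue,
        default => throw new InvalidArgumentException("Unsupported boost_mode: {$boostMode}"),
    };
}

// A relevant page (query score 2.0) that nobody has viewed yet:
echo combineScore(2.0, 0, 'multiply'), PHP_EOL; // 0
echo combineScore(2.0, 0, 'sum'), PHP_EOL;      // 2
// The same page with 1,500 views gets a modest lift under 'sum':
echo combineScore(2.0, 1500, 'sum'), PHP_EOL;   // 3.5
```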

Wrapping up

CirrusSearch isn’t a silver bullet, but it’s the closest thing to a “Google‑for‑your‑wiki” that the MediaWiki ecosystem currently offers. By swapping the native search for Elasticsearch, you gain speed, relevance tuning, and fancy features like geo‑search that would otherwise require a whole separate stack.

To recap the essentials:

  1. Install Elastica and CirrusSearch, and point $wgCirrusSearchServers at a healthy ES/OpenSearch node.
  2. Run the initial re‑index, then monitor _cluster/health and heap usage.
  3. Adjust field weights and boost rules to match your site’s content strategy.
  4. Keep an eye on memory and version compatibility – they’re the usual suspects behind “temporary problem” messages.
  5. Experiment with advanced queries (date ranges, custom scripts) once the basics feel solid.

In practice, you’ll find that a well‑tuned CirrusSearch feels almost invisible: users type a phrase and are handed the exact page they were hunting for, sometimes even before they finish the word. That’s the sort of smooth experience that keeps contributors coming back and makes a wiki feel less like a static archive and more like a living knowledge base.

So, whether you’re shepherding a small hobbyist wiki or a massive corporate documentation portal, give CirrusSearch the space it deserves in your architecture. The payoff – faster, smarter search – is usually worth the few extra minutes of setup and the occasional tweak down the line.

Subscribe to MediaWiki Tips and Tricks

Don’t miss out on the latest articles. Sign up now to get access to the library of members-only articles.
jamie@example.com
Subscribe