Advanced Search with the CirrusSearch Extension in MediaWiki

What is CirrusSearch?

If you’ve ever tried to hunt for a specific phrase on a wiki with a gazillion pages, you know the built‑in search can feel like looking for a needle in a haystack. CirrusSearch changes that story. It hooks MediaWiki up to Elasticsearch (soon OpenSearch) and turns the search engine into something that actually understands relevance, proximity, and fuzzy matching.

Why bother with “Advanced Search”?

There’s an AdvancedSearch extension that sits on top of Special:Search. It adds a form where you can pick namespaces, require or exclude words, match an exact phrase, or filter by template or file type. The form is a front end for CirrusSearch: each field maps to a search keyword that ordinary users would otherwise have to type by hand, so you need CirrusSearch installed for it to do anything useful. Think of it as the difference between a basic screwdriver and a multi‑bit power driver: both turn screws, but the latter does it faster, cleaner, and with fewer mistakes.

Getting CirrusSearch up and running

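First things first: you need a working MediaWiki installation, with the Elastica and CirrusSearch extensions downloaded into the extensions/ directory (CirrusSearch depends on Elastica). Then:

# In a shell, from the MediaWiki root: install Elastica's PHP dependencies
composer install --no-dev --working-dir=extensions/Elastica

// In LocalSettings.php
wfLoadExtension( 'Elastica' ); // must be loaded for CirrusSearch to work
wfLoadExtension( 'CirrusSearch' );
$wgSearchType = 'CirrusSearch';

// Minimal Elasticsearch config
$wgCirrusSearchServers = [ [ 'host' => 'localhost', 'port' => 9200 ] ];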

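That gets the engine talking, but the index starts out empty. Run UpdateSearchIndexConfig.php and then ForceSearchIndex.php from extensions/CirrusSearch/maintenance to create and populate it. In practice you’ll want to tweak a few more settings – index names, shard counts, perhaps a connection timeout – but the snippet above is the minimum.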
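
Before blaming MediaWiki for an empty result page, it helps to confirm that the search cluster answers at all. Here is a minimal Python sketch that fetches the cluster's root document and summarizes it. The address http://localhost:9200 mirrors the config above, and both function names are made up for this example.

```python
import json
from urllib.request import urlopen


def summarize_cluster_info(body: str) -> str:
    """Turn the JSON document returned by GET / into a one-line summary."""
    info = json.loads(body)
    version = info.get("version", {}).get("number", "unknown")
    return f"{info.get('cluster_name', 'unknown')} (version {version})"


def check_search_backend(base_url: str = "http://localhost:9200",
                         timeout: float = 5.0) -> str:
    """Fetch the cluster's root document; raises URLError if it is unreachable."""
    with urlopen(base_url + "/", timeout=timeout) as response:
        return summarize_cluster_info(response.read().decode("utf-8"))
```

Calling check_search_backend() on the wiki server should return the cluster name and version. A connection error at this point means the problem is on the Elasticsearch side, not in LocalSettings.php.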

Installing the AdvancedSearch front‑end

Grab the extension, slap it into extensions/AdvancedSearch, and add a single line to LocalSettings.php:

wfLoadExtension( 'AdvancedSearch' );

Now Special:Search shows a collapsible “Advanced options” panel. That panel merely passes query arguments to the back‑end; CirrusSearch reads them, interprets them, and does the heavy lifting.

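Key parameters you’ll see

  • nsN=1: one flag per namespace to search, for example ns0=1&ns12=1 (the Action API takes a pipe‑separated srnamespace list instead).
  • profile: a namespace preset such as default, images, all, or advanced.
  • prefix: restrict matches to pages whose title starts with the given string.
  • insource:/regex/: not a URL parameter but a keyword typed into the query itself; it runs a regular expression over the wikitext (expensive, use with care).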
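
To see how these arguments end up in a link, here is a small Python sketch that assembles a Special:Search URL. It assumes the URL form Special:Search itself uses, where each namespace to search is passed as its own nsN=1 flag. The function name and the example wiki address are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, namespaces=(0,), profile=None, prefix=None):
    """Assemble a Special:Search URL from a query and optional filters."""
    params = [("search", query)]
    if profile is not None:
        params.append(("profile", profile))
    # One flag per namespace, e.g. ns0=1&ns12=1
    for ns_id in sorted(set(namespaces)):
        params.append((f"ns{ns_id}", "1"))
    if prefix:
        params.append(("prefix", prefix))
    return f"{base_url}?{urlencode(params)}"
```

For example, build_search_url("https://wiki.example.com/w/index.php", "climate change", namespaces=[0, 12]) gives a link that searches the main and Help namespaces only.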

Beyond the UI – raw CirrusSearch query syntax

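Whenever you submit a search, CirrusSearch translates the request into Elasticsearch’s query DSL. Appending cirrusDumpQuery to a search URL shows you the JSON it generates. As far as I know, MediaWiki has no API module that accepts raw DSL, but if you’re comfortable with JSON and have direct access to the cluster, you can craft your own queries and send them to the index’s _search endpoint. For example, to find pages that contain the exact phrase “climate change” but not the word “denial”, you could POST the following: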

{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "text": "climate change" } }
      ],
      "must_not": [
        { "match": { "text": "denial" } }
      ]
    }
  },
  "highlight": {
    "fields": { "text": {} }
  }
}

It’s a mouthful, but the power is undeniable. You can filter by page creation date, boost certain domains, or even limit results to a specific language.
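
If you’d rather generate that JSON than type it, here is a Python sketch that builds the same bool query and, optionally, posts it straight to a cluster’s _search endpoint. The helper names are made up for this example, and the URL you pass to run_query (something like http://localhost:9200/<index>/_search) depends on your own index name.

```python
import json
from urllib.request import Request, urlopen


def phrase_without_word(phrase, excluded_word, field="text"):
    """Build a bool query: must contain the phrase, must not contain the word."""
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase": {field: phrase}}],
                "must_not": [{"match": {field: excluded_word}}],
            }
        },
        "highlight": {"fields": {field: {}}},
    }


def run_query(search_url, body, timeout=10.0):
    """POST the query to an Elasticsearch _search URL and return the parsed reply."""
    request = Request(
        search_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

phrase_without_word("climate change", "denial") reproduces the JSON shown above. Keeping the builder separate from the HTTP call makes it easy to print and inspect a query before running it.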

Practical examples you can copy‑paste

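1. Find all pages in the “Help” namespace that were edited after 2022‑01‑01

// Assumes a CirrusSearch version that supports the lasteditdate: keyword;
// check Help:CirrusSearch on your wiki before relying on it.
$params = [
    'action' => 'query',
    'list' => 'search',
    'srsearch' => 'lasteditdate:>2022-01-01',
    'srnamespace' => 12, // Help namespace
];
$api = new \MediaWiki\Api\ApiMain( new \MediaWiki\Request\FauxRequest( $params ) );
$api->execute();
$results = $api->getResult()->getResultData( [ 'query', 'search' ] );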
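
The same search can also be run from outside MediaWiki, over HTTP, through api.php. Below is a Python sketch using the Action API’s list=search module. The function names and the wiki address are hypothetical, and the lasteditdate: keyword in the usage note is an assumption about your CirrusSearch version, so swap in whatever query your wiki supports.

```python
from urllib.parse import urlencode


def build_api_search_url(api_url, query, namespaces, limit=10):
    """Build an api.php URL for list=search limited to the given namespaces."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        # The Action API expects namespace IDs joined with "|"
        "srnamespace": "|".join(str(ns) for ns in namespaces),
        "srlimit": str(limit),
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def titles_from_response(response):
    """Pull page titles out of a decoded list=search JSON response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]
```

For the Help namespace example, build_api_search_url("https://wiki.example.com/w/api.php", "lasteditdate:>2022-01-01", [12]) yields the request URL. Fetch it with any HTTP client and pass the decoded JSON to titles_from_response.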

2. Use fuzzy matching for a misspelled name

Append ~2 after the term to allow two edits (insert, delete, substitute). The UI doesn’t expose this directly, but you can add it to the search field yourself:

Jon~2

That pulls up “John”, “Jonas”, “Joon”: whatever is within two Levenshtein steps.
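
If “two Levenshtein steps” feels abstract, this short Python sketch computes the edit distance by hand, so you can check which names would fall inside the ~2 window. It is a plain textbook implementation for illustration, not the search engine’s own code. Elasticsearch also counts a swap of two adjacent letters as a single edit by default, which this version does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes, and substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def within_edits(term, candidates, max_edits=2):
    """Keep the candidates that are at most max_edits away from term (case-insensitive)."""
    return [c for c in candidates
            if levenshtein(term.lower(), c.lower()) <= max_edits]
```

within_edits("Jon", ["John", "Jonas", "Joon", "Jonathan"]) keeps the first three names and drops “Jonathan”, which is five edits away.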

3. Exclude all talk pages from results

Talk namespaces are the odd‑numbered ones (1, 3, 5, and so on), so select only the even‑numbered namespaces. Special:Search takes one nsN=1 flag per namespace, so in URL form it looks like:

https://wiki.example.com/w/index.php?search=foo&ns0=1&ns2=1&ns4=1&ns6=1&ns8=1&ns10=1&ns12=1&ns14=1&ns100=1
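
Typing that list by hand gets old quickly, so here is a Python sketch that derives it. It relies on MediaWiki’s numbering convention, where every talk namespace has an odd ID and sits right after its even‑numbered subject namespace, and it uses the nsN=1 URL flags that Special:Search understands. The function names are made up for this example.

```python
from urllib.parse import urlencode


def non_talk_namespaces(namespace_ids):
    """Keep searchable subject namespaces: non-negative, even IDs only."""
    return sorted(ns for ns in set(namespace_ids) if ns >= 0 and ns % 2 == 0)


def search_url_without_talk(base_url, query, namespace_ids):
    """Build a Special:Search URL that covers everything except talk pages."""
    params = [("search", query)]
    params += [(f"ns{ns}", "1") for ns in non_talk_namespaces(namespace_ids)]
    return f"{base_url}?{urlencode(params)}"
```

Pass in the full list of namespace IDs your wiki defines, including custom ones such as 100 and 101, and the talk half is filtered out for you.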

Performance considerations

CirrusSearch is fast, but it’s only as good as the underlying Elasticsearch cluster. A few points to remember:

  • Shard sizing: Too many shards for a small index cause unnecessary overhead. Start with a single shard per node and monitor.
  • Refresh interval: The default 1‑second refresh can be aggressive for a heavily edited wiki. Raising it to 5s can reduce index write load.
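  • Memory: Elasticsearch likes RAM. Give the JVM heap no more than half the machine’s memory, and stay below roughly 30 GB so the JVM keeps using compressed object pointers (“compressed oops”). The rest of the RAM is not wasted: the operating system uses it to cache index files.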
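
Two of these rules of thumb reduce to simple arithmetic, sketched below in Python. The heap helper applies the common guidance of at most half the RAM, capped near 30 GB, and the second helper builds the JSON body that Elasticsearch’s index settings API (PUT /<index>/_settings) accepts for the refresh interval. Both function names are hypothetical, and CirrusSearch has its own refresh‑interval configuration, so check the extension’s documentation before changing the value directly on the cluster.

```python
def recommended_heap_gb(total_ram_gb: float, ceiling_gb: float = 30.0) -> float:
    """Half the machine's RAM, but never above the compressed-oops ceiling."""
    return min(total_ram_gb / 2, ceiling_gb)


def refresh_interval_settings(seconds: int) -> dict:
    """Body for an index settings update that changes the refresh interval."""
    return {"index": {"refresh_interval": f"{seconds}s"}}
```

A 16 GB machine gets an 8 GB heap, while a 128 GB machine stops at 30 GB instead of 64 GB.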

And a friendly reminder: after massive imports or a bulk edit spree, run ForceSearchIndex.php from extensions/CirrusSearch/maintenance to catch up.

Common pitfalls and how to avoid them

1. “My searches are returning nothing!” – Often this means the index is out of sync. CirrusSearch pushes index updates through the job queue, so make sure jobs are actually being processed, or re‑run the ForceSearchIndex.php maintenance script.

2. “Fuzzy search is too permissive.” – A bare ~ allows up to two edits; tighten it by writing ~1. In the JSON DSL the fuzziness parameter sets the edit distance, while fuzzy_max_expansions only caps how many variant terms a query_string query will try.

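3. “The advanced form shows namespaces I don’t want.” – The preset groups offered by the form are configured in LocalSettings.php through $wgAdvancedSearchNamespacePresets; check the extension page for the format your version expects.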

Future directions – OpenSearch migration

MediaWiki’s developers have announced a shift from Elasticsearch to OpenSearch. From a user’s perspective, nothing dramatic changes – the API stays the same, the UI stays the same. Under the hood you’ll get a more community‑driven backend, regular security patches, and better compatibility with AWS‑hosted services. Keep an eye on the extension page for migration guides.

Wrapping up

Advanced search in MediaWiki isn’t a luxury; it’s a necessity when you’ve got a knowledge base that rivals an encyclopedia. By pairing the AdvancedSearch UI with the raw power of CirrusSearch, you give editors and readers alike a tool that feels both familiar and surprisingly precise. Install the extensions, tweak a few settings, and you’ll notice the difference before the first search even finishes loading.

Remember: the real magic lies not in the fancy UI but in the underlying query DSL. If you’re comfortable with JSON, go ahead and experiment – the surface is slick, but the engine is a beast you can tame. And when it finally runs smoothly, you’ll wonder how you ever lived without it.
