How to Configure MediaWiki to Use Elasticsearch for Faster Search

MediaWiki ships with a simple, database‑based search engine that works well for small wikis. As a wiki grows – in page count, traffic, or content complexity – the native search can become a bottleneck. Elasticsearch, a Lucene‑based distributed search engine, offers near‑real‑time full‑text indexing, powerful relevance scoring, and horizontal scalability. By wiring MediaWiki to Elasticsearch you can dramatically speed up search queries and unlock advanced features such as phrase suggestions, fuzzy matching, and faceted search.

Why Elasticsearch?

  • Speed: Queries run against an inverted index stored in RAM/SSD, bypassing heavy MySQL joins.
  • Scalability: Add nodes to the cluster without changing MediaWiki code.
  • Rich query language: Supports boolean logic, regex, proximity, and custom analyzers.
  • Extensibility: Works with MediaWiki extensions like CirrusSearch and Semantic MediaWiki to provide structured search.

Prerequisites

  1. MediaWiki version: The CirrusSearch extension supports MediaWiki 1.39+ with Elasticsearch 7.10.2 (or 6.8.23+ using a compatibility layer). MediaWiki 1.44+ also works with OpenSearch 1.3, but Elasticsearch remains the more widely documented backend.
  2. PHP: PHP 7.4+ with the cURL extension enabled. (The Elastica library used by CirrusSearch nominally requires only PHP 5.4+, but recent MediaWiki releases themselves require PHP 7.4+.)
  3. Java: Elasticsearch runs on the JVM; install OpenJDK 11 (or later) on the host.
  4. System resources: Allocate at least 2 GB of heap (Xms and Xmx equal) for a modest wiki; larger installations may need 8 GB+ per node.
  5. Network: Elasticsearch must be reachable from the web server. If it runs on a separate host, use firewall rules that permit only the MediaWiki server to connect on ports 9200 (HTTP) and 9300 (transport), as in the example below.
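
For example, on Ubuntu you can use ufw to restrict access (a sketch; replace 192.0.2.10 with your MediaWiki server's actual address):

# Allow only the MediaWiki host to reach Elasticsearch
sudo ufw allow from 192.0.2.10 to any port 9200 proto tcp
sudo ufw allow from 192.0.2.10 to any port 9300 proto tcp
# Reject everything else on those ports
sudo ufw deny 9200/tcp
sudo ufw deny 9300/tcp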

Step 1 – Install Elasticsearch

Download the official Elasticsearch package that matches the required version (7.10.2 for MediaWiki 1.39+). The simplest method on Ubuntu/Debian is:

# Install Java (OpenJDK 11)
sudo apt-get update
sudo apt-get install -y openjdk-11-jdk
# Download and install the .deb package
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-amd64.deb
sudo dpkg -i elasticsearch-7.10.2-amd64.deb
# Enable and start the service
sudo systemctl enable --now elasticsearch.service
# Verify the node is responding (jq is optional, for pretty-printing)
curl -s http://127.0.0.1:9200 | jq .

If you prefer containers, the official Docker image works as well:

docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  elasticsearch:7.10.2

Make sure the node is reachable from the MediaWiki host. Use discovery.type=single-node only for testing; a production multi‑node cluster requires a proper elasticsearch.yml with seed hosts and security settings, along the lines of the sketch below.
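
A minimal multi‑node sketch of /etc/elasticsearch/elasticsearch.yml might look like this (cluster, node, and host names are placeholders for your own topology):

cluster.name: mywiki-search
node.name: es1
network.host: 0.0.0.0
discovery.seed_hosts: ["es1.example.com", "es2.example.com"]
cluster.initial_master_nodes: ["es1", "es2"]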

Step 2 – Install MediaWiki Extensions

Two extensions are required:

  1. Elastica – a PHP client library that abstracts the HTTP API of Elasticsearch.
  2. CirrusSearch – the MediaWiki search backend that translates MediaWiki search syntax into Elasticsearch queries.

Both extensions are hosted in MediaWiki’s Gerrit repository. The recommended installation method is via git so you can keep the code in sync with future updates; be sure to check out the branch that matches your MediaWiki release rather than master (see the note after the commands).

cd /path/to/your/mediawiki/extensions
# Clone Elastica
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica
# Clone CirrusSearch
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch
# Install PHP dependencies via Composer (requires composer installed)
cd Elastica && composer install --no-dev && cd ..
cd CirrusSearch && composer install --no-dev && cd ..
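
The master branches of both repositories track current MediaWiki development, so check out the release branch that matches your wiki before running Composer. REL1_39 below is only an example; substitute the branch for your MediaWiki version:

cd Elastica && git checkout REL1_39 && cd ..
cd CirrusSearch && git checkout REL1_39 && cd ..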

After the files are in place, enable the extensions in LocalSettings.php:

wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );
// Optional: disable live search updates while we bootstrap the index
$wgDisableSearchUpdate = true;

Step 3 – Configure the Connection

Tell MediaWiki where the Elasticsearch cluster lives. By default CirrusSearch assumes a local node on localhost:9200. If your node runs elsewhere, set $wgCirrusSearchServers:

$wgCirrusSearchServers = [
    [ 'host' => 'es1.example.com', 'port' => 9200, 'scheme' => 'http' ],
    // Add more hosts for a cluster
    [ 'host' => 'es2.example.com', 'port' => 9200, 'scheme' => 'http' ],
];
// Optional: give the index a stable base name (useful if your MySQL DB name contains capitals)
$wgCirrusSearchIndexBaseName = 'mywiki';
// Choose the search type for MediaWiki core
$wgSearchType = 'CirrusSearch';

For clusters that require authentication (e.g., managed OpenSearch services), also provide credentials:

$wgCirrusSearchServers = [
    [
        'host' => 'search.example.com',
        'port' => 443,
        'scheme' => 'https',
        'user' => 'elastic_user',
        'pass' => 'very_secret_password',
    ],
];
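
Before continuing, confirm that the web server can actually reach the cluster with those credentials. A quick check from the MediaWiki host, using the placeholder values above:

curl -s -u elastic_user:very_secret_password \
    'https://search.example.com/_cluster/health?pretty'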

Step 4 – Bootstrap the Search Index

With the extensions installed and the connection configured, you must build the Elasticsearch index from the existing wiki content. This is a two‑phase operation: first create the index mapping, then populate it with page text and link data.

  1. Generate the index configuration:
php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver

This script creates the indices (named {dbname}_content and {dbname}_general by default) and stores the mappings that define fields such as title, text, heading, and file_text.

  2. Populate the index: The ForceSearchIndex.php script parses each wiki page and sends the resulting document to Elasticsearch. For a fresh wiki you can run it in two passes – one that indexes page content without link counts, and a second that adds link information.
# First pass – content only (skip link counting)
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php \
    --skipLinks --indexOnSkip
# Second pass – link data (skip parsing, use already‑rendered text)
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php \
    --skipParse

Both commands may take a while on large wikis. You can speed up the process by adding --queue and --maxJobs flags; this splits the work into background jobs handled by MediaWiki’s job queue. A typical production configuration uses Redis as the job queue backend to avoid “unserialize()” errors that can appear with the default DB‑based queue.
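
A queued rebuild might look like the following sketch; the --queue flag hands the work to MediaWiki’s job queue, and runJobs.php drains it (tune --maxjobs to your hardware):

# Queue the indexing work as jobs instead of indexing inline
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php \
    --skipLinks --indexOnSkip --queue
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php \
    --skipParse --queue
# Drain the queue (or let your regular job runners handle it)
php maintenance/runJobs.php --maxjobs 10000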

Step 5 – Enable Live Updates

After the initial index is built, re‑enable automatic updates so that future page edits are reflected in Elasticsearch:

// Re-enable live search updates
$wgDisableSearchUpdate = false;

Alternatively, just delete the $wgDisableSearchUpdate = true; line from LocalSettings.php. MediaWiki will now push changes to Elasticsearch via the job queue. Verify that new edits appear in search results within a few seconds.
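
One quick way to verify the pipeline: edit any page, wait a few seconds for the job queue to run, then query Elasticsearch directly for a phrase from your edit. The index name below assumes the 'mywiki' base name configured earlier:

curl -s 'http://127.0.0.1:9200/mywiki_content/_search?q=your+new+phrase&pretty'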

Step 6 – Optional Tuning & Advanced Features

The default configuration works well for most wikis, but you may want to fine‑tune performance or enable extra search capabilities.

Memory & JVM Settings

Edit /etc/elasticsearch/jvm.options and set identical heap values:

-Xms4g
-Xmx4g

Restart Elasticsearch afterwards:

sudo systemctl restart elasticsearch

Regex and Deepcat Queries

CirrusSearch can run regular‑expression searches and deep category queries if the search‑extra plugin is installed on the Elasticsearch node:

/usr/share/elasticsearch/bin/elasticsearch-plugin install \
    org.wikimedia.search:extra:7.10.2-wmf12

Then enable it in MediaWiki:

$wgCirrusSearchWikimediaExtraPlugin['regex'] = [
    'build', 'use', 'max_inspect' => 10000,
];
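
Once the plugin is active, users can type regular-expression queries into the normal search box with the insource keyword; appending i makes the match case-insensitive:

insource:/ba[rz]/
insource:/elastic ?search/i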

Weighting and Boosting

The $wgCirrusSearchWeights array lets you prioritize certain fields. For example, to give page titles a higher relevance:

$wgCirrusSearchWeights = [
    'title' => 20,
    'heading' => 5,
    'text' => 5,
    // If you have the PdfHandler extension, boost the extracted file text
    'file_text' => 25,
];

Namespace Weighting

If you want the search to favor content namespaces (e.g., articles) over talk pages, use $wgCirrusSearchNamespaceWeights:

$wgCirrusSearchNamespaceWeights = [
    NS_MAIN => 1.0,
    NS_TALK => 0.1,
    NS_CATEGORY => 0.3,
];

Pool Counter for Concurrency Control

On high‑traffic wikis, limit the number of simultaneous Elasticsearch queries to protect the cluster:

wfLoadExtension( 'PoolCounter' );
$wgPoolCounterConf['CirrusSearch-Search'] = [
    'class' => MediaWiki\Extension\PoolCounter\Client::class,
    'timeout' => 30,   // seconds to wait for a free worker slot
    'workers' => 25,
    'maxqueue' => 50,
];
$wgPoolCounterConf['CirrusSearch-ExpensiveFullText'] = [
    'class' => MediaWiki\Extension\PoolCounter\Client::class,
    'timeout' => 60,
    'workers' => 10,
    'maxqueue' => 10,
];

Monitoring and Health Checks

Elasticsearch exposes a /_cluster/health endpoint. A simple cron job can restart the service when the node stops responding or the cluster goes red. (A single‑node setup normally reports yellow, because replica shards cannot be assigned, so don’t treat anything short of green as a failure.)

#!/bin/bash
# Restart Elasticsearch if it is unreachable or the cluster status is red
health=$(curl -s --max-time 10 http://127.0.0.1:9200/_cluster/health)
if [ -z "$health" ] || echo "$health" | grep -q '"status":"red"'; then
    systemctl restart elasticsearch
    echo "$(date) – Elasticsearch restarted" >> /var/log/elasticwatch.log
fi

Step 7 – Verifying the Setup

After everything is configured, perform a few sanity checks:

  1. Search a known phrase via Special:Search – results should appear instantly.
  2. Inspect the Elasticsearch index directly:
curl -s http://127.0.0.1:9200/_cat/indices?v

You should see indices named {dbname}_content and {dbname}_general (or your $wgCirrusSearchIndexBaseName) with non‑zero document counts.

  3. Check the MediaWiki debug log (if $wgDebugLogGroups['CirrusSearch'] is set) for any connection errors.
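
If that log group is not set up yet, a one‑line addition to LocalSettings.php enables it (the path is an example; it must be writable by the web server):

$wgDebugLogGroups['CirrusSearch'] = '/var/log/mediawiki/cirrussearch.log';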

Step 8 – Upgrading and Re‑indexing

When you upgrade MediaWiki or CirrusSearch, the index mapping may change. The extension provides a clear upgrade path:

  • Configuration‑only changes: Run UpdateSearchIndexConfig.php without --startOver – it updates the mapping in place.
  • Mapping changes that require a full rebuild: Use --startOver to drop the old index and recreate it, then repopulate with ForceSearchIndex.php.

For very large wikis you can rebuild the index on a separate node and then switch the alias to the new index, minimizing downtime. The $wgCirrusSearchIndexBaseName setting and the --indexIdentifier option of UpdateSearchIndexConfig.php control how the concrete indices and their aliases are named.
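
As a sketch, an in‑place rebuild that keeps the old index serving queries until the new one is ready looks like this (the flags are described in the CirrusSearch README):

# Build a fresh index alongside the old one, copy the documents over,
# then swap the alias and remove the old index
php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php \
    --reindexAndRemoveOk --indexIdentifier now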

Conclusion

Integrating Elasticsearch via the Elastica and CirrusSearch extensions transforms MediaWiki’s search from a slow, database‑bound operation into a lightning‑fast, feature‑rich experience. The steps outlined above—installing Elasticsearch, adding the two extensions, configuring the connection, building the index, and enabling live updates—are sufficient to get most wikis up and running. From there, you can fine‑tune heap sizes, enable regex queries, adjust field weighting, and add concurrency controls to match your traffic profile. With a properly sized Elasticsearch cluster, search latency drops from seconds to milliseconds, and users enjoy more relevant results, auto‑completion, and “did you mean” suggestions that keep large wikis discoverable.

Happy indexing!
