How to Configure MediaWiki to Use Elasticsearch for Faster Search
MediaWiki ships with a simple, database‑based search engine that works well for small wikis. As a wiki grows – in page count, traffic, or content complexity – the native search can become a bottleneck. Elasticsearch, a Lucene‑based distributed search engine, offers near‑real‑time full‑text indexing, powerful relevance scoring, and horizontal scalability. By wiring MediaWiki to Elasticsearch you can dramatically speed up search queries and unlock advanced features such as phrase suggestions, fuzzy matching, and faceted search.
Why Elasticsearch?
- Speed: Queries run against an inverted index stored in RAM/SSD, bypassing heavy MySQL joins.
- Scalability: Add nodes to the cluster without changing MediaWiki code.
- Rich query language: Supports boolean logic, regex, proximity, and custom analyzers.
- Extensibility: Works with MediaWiki extensions like CirrusSearch and Semantic MediaWiki to provide structured search.
Prerequisites
- MediaWiki version: The CirrusSearch extension supports MediaWiki 1.39+ with Elasticsearch 7.10.2 (or 6.8.23+ using a compatibility layer). MediaWiki 1.44+ also works with OpenSearch 1.3, but Elasticsearch remains the most widely documented.
- PHP: PHP 7.4+ with the cURL extension compiled. The Elastica library (used by CirrusSearch) requires PHP 5.4+, but recent MediaWiki releases expect PHP 7.4+.
- Java: Elasticsearch runs on the JVM; install OpenJDK 11 (or later) on the host.
- System resources: Allocate at least 2 GB of heap (Xms and Xmx set equal) for a modest wiki; larger installations may need 8 GB+ per node.
- Network: Elasticsearch must be reachable from the web server. If it runs on a separate host, use a firewall rule that permits only the MediaWiki server to connect on ports 9200 (HTTP) and 9300 (transport); a sketch follows this list.
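For example, on a host using ufw, rules like the following restrict access. This is a minimal sketch – 10.0.0.5 is a placeholder for your MediaWiki server's address, and the allow rules must be added before the deny rules:
# Allow only the MediaWiki server (placeholder IP) to reach Elasticsearch
sudo ufw allow from 10.0.0.5 to any port 9200 proto tcp
sudo ufw allow from 10.0.0.5 to any port 9300 proto tcp
# Block everyone else on both ports
sudo ufw deny 9200/tcp
sudo ufw deny 9300/tcp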
Step 1 – Install Elasticsearch
Download the official Elasticsearch package that matches the required version (7.10.2 for MediaWiki 1.39+). The simplest method on Ubuntu/Debian is:
# Install Java (OpenJDK 11)
sudo apt-get update
sudo apt-get install -y openjdk-11-jdk
# Download and install the .deb package
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-amd64.deb
sudo dpkg -i elasticsearch-7.10.2-amd64.deb
# Enable and start the service
sudo systemctl enable --now elasticsearch.service
# Verify the node is responding
curl -s http://127.0.0.1:9200 | jq .
If you prefer containers, the official Docker image works as well:
docker run -d --name elasticsearch \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
elasticsearch:7.10.2
Make sure the node is reachable from the MediaWiki host. For production clusters, configure discovery.type=single-node only for testing; a multi‑node cluster requires a proper elasticsearch.yml with unicast hosts and security settings.
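As a rough sketch, a minimal two‑node elasticsearch.yml might look like this (hostnames and node names are placeholders; a real production cluster also needs TLS and authentication configured):
# /etc/elasticsearch/elasticsearch.yml – placeholder values throughout
cluster.name: mediawiki-search
node.name: es1
network.host: 0.0.0.0
discovery.seed_hosts: ["es1.example.com", "es2.example.com"]
cluster.initial_master_nodes: ["es1", "es2"]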
Step 2 – Install MediaWiki Extensions
Two extensions are required:
- Elastica – a PHP client library that abstracts the HTTP API of Elasticsearch.
- CirrusSearch – the MediaWiki search backend that translates MediaWiki search syntax into Elasticsearch queries.
Both extensions are hosted in MediaWiki’s Gerrit repository. The recommended installation method is via git so you can keep the code in sync with future updates.
cd /path/to/your/mediawiki/extensions
# Clone Elastica
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica
# Clone CirrusSearch
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch
# Install PHP dependencies via Composer (requires composer installed)
cd Elastica && composer install --no-dev && cd ..
cd CirrusSearch && composer install --no-dev && cd ..
After the files are in place, enable the extensions in LocalSettings.php:
wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );
// Optional: disable live search updates while we bootstrap the index
$wgDisableSearchUpdate = true;
Step 3 – Configure the Connection
Tell MediaWiki where the Elasticsearch cluster lives. By default CirrusSearch assumes a local node on localhost:9200. If your node runs elsewhere, set $wgCirrusSearchServers:
$wgCirrusSearchServers = [
[ 'host' => 'es1.example.com', 'port' => 9200, 'scheme' => 'http' ],
// Add more hosts for a cluster
[ 'host' => 'es2.example.com', 'port' => 9200, 'scheme' => 'http' ],
];
// Optional: give the index a stable base name (useful if your MySQL DB name contains capitals)
$wgCirrusSearchIndexBaseName = 'mywiki';
// Choose the search type for MediaWiki core
$wgSearchType = 'CirrusSearch';
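Before moving on, it is worth confirming that the configured nodes answer from the web server itself. A quick check, using the placeholder hostname from the snippet above:
# Run from the MediaWiki host; expect JSON with the cluster status
curl -s http://es1.example.com:9200/_cluster/health?pretty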
For clusters that require authentication (e.g., managed OpenSearch services), also provide credentials:
$wgCirrusSearchServers = [
[
'host' => 'search.example.com',
'port' => 443,
'scheme' => 'https',
'user' => 'elastic_user',
'pass' => 'very_secret_password',
],
];
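Authenticated access can be tested the same way; this sketch reuses the placeholder credentials from the snippet above:
curl -s -u elastic_user:very_secret_password \
  https://search.example.com/_cluster/health?pretty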
Step 4 – Bootstrap the Search Index
With the extensions installed and the connection configured, you must build the Elasticsearch index from the existing wiki content. This is a two‑phase operation: first create the index mapping, then populate it with page text and link data.
- Generate the index configuration:
php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver
This script creates the indexes (named {dbname}_content and {dbname}_general by default) and stores the mappings that define fields such as title, text, heading, and file_text.
- Populate the index: The ForceSearchIndex.php script parses each wiki page and sends the resulting document to Elasticsearch. For a fresh wiki you can run it in two passes – one that indexes page content without link counts, and a second that adds link information.
# First pass – content only (skip link counting)
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php \
--skipLinks --indexOnSkip
# Second pass – link data (skip parsing, use already‑rendered text)
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php \
--skipParse
Both commands may take a while on large wikis. You can speed up the process by adding --queue and --maxJobs flags; this splits the work into background jobs handled by MediaWiki’s job queue. A typical production configuration uses Redis as the job queue backend to avoid “unserialize()” errors that can appear with the default DB‑based queue.
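If you opt for Redis, a minimal LocalSettings.php sketch looks like this – it assumes a Redis server on localhost and the PHP redis extension installed:
$wgJobTypeConf['default'] = [
    'class' => 'JobQueueRedis',
    'redisServer' => '127.0.0.1:6379',
    // Connection options; add a 'password' key if your Redis requires auth
    'redisConfig' => [],
    // Seconds before an abandoned job becomes eligible for another runner
    'claimTTL' => 3600,
];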
Step 5 – Enable Live Updates
After the initial index is built, re‑enable automatic updates so that future page edits are reflected in Elasticsearch:
// Re‑enable live search updates
$wgDisableSearchUpdate = false;
Alternatively, simply delete the temporary line from LocalSettings.php. MediaWiki will now push changes to Elasticsearch via the job queue. Verify that new edits appear in search results within a few seconds.
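To confirm that edits are flowing through the queue, you can list outstanding jobs and process them by hand; look for CirrusSearch job types such as cirrusSearchLinksUpdate:
# Show pending jobs grouped by type
php maintenance/showJobs.php --group
# Run the queue immediately instead of waiting for web requests or cron
php maintenance/runJobs.php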
Step 6 – Optional Tuning & Advanced Features
The default configuration works well for most wikis, but you may want to fine‑tune performance or enable extra search capabilities.
Memory & JVM Settings
Edit /etc/elasticsearch/jvm.options and set identical heap values:
-Xms4g
-Xmx4g
Restart Elasticsearch afterwards:
sudo systemctl restart elasticsearch
Regex and Deepcat Queries
CirrusSearch can run regular‑expression searches and deep category queries if the search‑extra plugin is installed on the Elasticsearch node:
/usr/share/elasticsearch/bin/elasticsearch-plugin install \
org.wikimedia.search:extra:7.10.2-wmf12
Then enable it in MediaWiki:
$wgCirrusSearchWikimediaExtraPlugin['regex'] = [
'build', 'use', 'max_inspect' => 10000,
];
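Once enabled, editors can run regular‑expression queries from the standard search box with CirrusSearch's insource: syntax, for example:
insource:/Elastic[Ss]earch/
Regex queries are expensive; the max_inspect setting above caps how many pages a single query may scan.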
Weighting and Boosting
The $wgCirrusSearchWeights array lets you prioritize certain fields. For example, to give page titles a higher relevance:
$wgCirrusSearchWeights = [
'title' => 20,
'heading' => 5,
'text' => 5,
// If you have the PdfHandler extension, boost the extracted file text
'file_text' => 25,
];
Namespace Weighting
If you want the search to favor content namespaces (e.g., articles) over talk pages, use $wgCirrusSearchNamespaceWeights:
$wgCirrusSearchNamespaceWeights = [
NS_MAIN => 1.0,
NS_TALK => 0.1,
NS_CATEGORY => 0.3,
];
Pool Counter for Concurrency Control
On high‑traffic wikis, limit the number of simultaneous Elasticsearch queries to protect the cluster:
wfLoadExtension( 'PoolCounter' );
$wgPoolCounterConf['CirrusSearch-Search'] = [
'class' => 'MediaWiki\PoolCounter\PoolCounterClient',
'workers' => 25,
'maxqueue' => 50,
];
$wgPoolCounterConf['CirrusSearch-ExpensiveFullText'] = [
'class' => 'MediaWiki\PoolCounter\PoolCounterClient',
'workers' => 10,
'maxqueue' => 10,
];
Monitoring and Health Checks
Elasticsearch exposes a /_cluster/health endpoint. A simple cron job can restart the service when the cluster reports a red status. Note that a single‑node cluster normally sits at yellow (replica shards cannot be allocated), so testing for green would trigger needless restarts:
#!/bin/bash
# Restart only on "red" or no response; "yellow" is normal for a single node
STATUS=$(curl -s --max-time 10 http://127.0.0.1:9200/_cluster/health | grep -o '"status":"[a-z]*"')
if [ -z "$STATUS" ] || [ "$STATUS" = '"status":"red"' ]; then
  systemctl restart elasticsearch
  echo "$(date) – Elasticsearch restarted" >> /var/log/elasticwatch.log
fi
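Save the script somewhere like /usr/local/bin/elasticwatch.sh (the path is arbitrary), make it executable with chmod +x, and schedule it, for example via /etc/cron.d:
# /etc/cron.d/elasticwatch – run the health check every five minutes as root
*/5 * * * * root /usr/local/bin/elasticwatch.sh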
Step 7 – Verifying the Setup
After everything is configured, perform a few sanity checks:
- Search a known phrase via Special:Search – results should appear instantly.
- Inspect the Elasticsearch index directly:
curl -s http://127.0.0.1:9200/_cat/indices?v
You should see indexes named {dbname}_content and {dbname}_general with non‑zero document counts (a direct query sketch follows this list).
- Check the MediaWiki debug log (if $wgDebugLogGroups['CirrusSearch'] is set) for any connection errors.
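For a deeper check, you can query the index over HTTP. A sketch assuming the mywiki base name from Step 3:
# Full‑text search against the content index; expect at least one hit
curl -s 'http://127.0.0.1:9200/mywiki_content/_search?q=text:elasticsearch&size=1' | jq '.hits.total'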
Step 8 – Upgrading and Re‑indexing
When you upgrade MediaWiki or CirrusSearch, the index mapping may change. The extension provides a clear upgrade path:
- Configuration‑only changes: Run UpdateSearchIndexConfig.php without --startOver – it updates the mapping in place.
- Mapping changes that require a full rebuild: Use --startOver to drop the old index and recreate it, then repopulate with ForceSearchIndex.php.
For very large wikis you can rebuild the index on a separate node and then switch the alias to the new index, minimizing downtime. The $wgCirrusSearchIndexBaseName and $wgCirrusSearchIndexIdentifier settings control the alias name.
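CirrusSearch can drive this kind of in‑place rebuild itself. A commonly documented invocation (verify the flags against your CirrusSearch version) builds a fresh index behind the alias and removes the old one once the switch succeeds:
php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php \
    --reindexAndRemoveOk --indexIdentifier now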
Conclusion
Integrating Elasticsearch via the Elastica and CirrusSearch extensions transforms MediaWiki’s search from a slow, database‑bound operation into a lightning‑fast, feature‑rich experience. The steps outlined above—installing Elasticsearch, adding the two extensions, configuring the connection, building the index, and enabling live updates—are sufficient to get most wikis up and running. From there, you can fine‑tune heap sizes, enable regex queries, adjust field weighting, and add concurrency controls to match your traffic profile. With a properly sized Elasticsearch cluster, search latency drops from seconds to milliseconds, and users enjoy more relevant results, auto‑completion, and “did you mean” suggestions that keep large wikis discoverable.
Happy indexing!