Optimizing MediaWiki Performance through Custom Extension Development
Why a Custom Extension Can Be the Secret Sauce
When you first spin up MediaWiki on a modest server, everything feels snappy. Add a handful of pages, sprinkle a few templates, and—boom—your latency starts creeping up like a lazy cat. Sure, you could crank up MySQL buffers or slap an opcode cache on the side, but there’s a more surgical option: a well‑crafted extension that does the heavy lifting where the core simply can’t.
Getting a Feel for the Bottleneck
Before you write a single line of PHP, take a quick look at what’s actually slowing you down. MediaWiki’s performance manual mentions three usual suspects:
- Database round-trips that could be batched (see the sketch after this list).
- Parser work that is repeated identically on every request.
- PHP execution that isn’t cached.
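To make the first bullet concrete, “batched” usually means replacing a loop of per-page lookups with one select that takes an IN list. A minimal sketch, assuming you already have an array of page IDs in $pageIds:

$dbr = wfGetDB( DB_REPLICA );
$res = $dbr->select(
	'page',
	[ 'page_id', 'page_title' ],
	[ 'page_id' => $pageIds ], // an array value turns into a single IN (...) clause
	__METHOD__
);
foreach ( $res as $row ) {
	// one round-trip for the whole batch instead of count( $pageIds ) separate queries
}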
If you’re feeling fancy, grab PHPUnit with composer require --dev phpunit/phpunit and run php vendor/bin/phpunit --filter ParserPerformanceTest. In practice, MediaWiki’s built-in profiler (configured through $wgProfiler) is enough to dump per-request timings you can read right in the page output. Look for “ParserCache::get” spikes—those are the low-hanging fruit.
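Turning the profiler on is a two-line change in LocalSettings.php. A minimal sketch, assuming the Tideways/XHProf PHP extension is installed (ProfilerXhprof is the class name in current MediaWiki releases):

// LocalSettings.php
$wgProfiler['class'] = ProfilerXhprof::class; // needs the tideways_xhprof (or xhprof) PHP extension
$wgProfiler['output'] = [ 'text' ];           // dumps the call summary into the page output on request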
Designing the Extension: Keep It Small, Keep It Fast
It’s tempting to bundle every tweak into one monolithic plugin, but that’s a recipe for “it works… until it doesn’t”. The best‑practice page says: one hook, one responsibility. Below is a skeleton that shows the pattern.
class MyExtension {
	public static function onParserAfterParse( Parser $parser, &$text ) {
		$cache = ObjectCache::getLocalClusterInstance();
		$key = $cache->makeKey( 'myextension', md5( $text ) );
		$cached = $cache->get( $key );
		if ( $cached !== false ) {
			$text = $cached;
			return true; // skip heavy work
		}
		// … do the expensive thing here …
		$result = heavyComputation( $text ); // stand-in for your extension’s real work
		$cache->set( $key, $result, 3600 );
		$text = $result;
		return true;
	}
}
$GLOBALS['wgHooks']['ParserAfterParse'][] = 'MyExtension::onParserAfterParse';
Notice the use of ObjectCache::getLocalClusterInstance()—that returns whatever main object cache the wiki is configured to use ($wgMainCacheType). If your wiki already runs memcached, you get a free boost. If not, you still get whichever backend is configured there (APCu, the database, and so on), which is still better than recomputing the regex each time.
Profiling the Hook Itself
Once the hook is in place, you don’t just assume it helped. Fire up the built-in profiler again, this time with forceprofile=1 appended to the URL. You’ll see something like:
ParserAfterParse: 0.0045s (cached 0.0002s)
If the “cached” branch dominates, you’re golden. If not, maybe the cache key is too granular—try hashing only the part that actually changes.
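For instance, if the expensive result really depends on the page rather than on every byte of wikitext, keying on the title stops a whitespace tweak from invalidating the entry. A hedged sketch (the 'related' key component is just an illustrative name):

// Key on the page, not on a hash of the full text
$title = $parser->getTitle();
$key = $cache->makeKey( 'myextension', 'related', $title->getPrefixedDBkey() );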
When to Reach for a Dedicated Table
Sometimes the data you want to cache isn’t a string but a structured set—think “related pages” computed from link graphs. Storing those in the default object cache can bloat memory. A custom table with a primary-key lookup lets you pull exactly the rows you need in a single query.
CREATE TABLE /*_*/myextension_cache (
ce_key VARBINARY(255) NOT NULL,
ce_data BLOB NOT NULL,
ce_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (ce_key)
) ENGINE=InnoDB;
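To have update.php create that table, the usual route is the LoadExtensionSchemaUpdates hook. A minimal sketch, assuming the CREATE TABLE above is saved as sql/myextension_cache.sql inside the extension:

public static function onLoadExtensionSchemaUpdates( DatabaseUpdater $updater ) {
	// update.php creates the table if it does not exist yet
	$updater->addExtensionTable(
		'myextension_cache',
		__DIR__ . '/sql/myextension_cache.sql'
	);
	return true;
}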
Then, inside the parser hook, the read path looks like this:
$dbr = wfGetDB( DB_REPLICA );
$row = $dbr->selectRow(
	'myextension_cache',
	[ 'ce_data' ],
	[ 'ce_key' => $key ],
	__METHOD__
);
if ( $row ) {
	$related = unserialize( $row->ce_data ); // unserialize, use it, and bail out
	return true;
}
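The write side is the mirror image. A hedged sketch using IDatabase::upsert(), so two racing requests simply overwrite the same row instead of colliding (DB_PRIMARY is the modern constant; older releases call it DB_MASTER):

$dbw = wfGetDB( DB_PRIMARY );
$dbw->upsert(
	'myextension_cache',
	[ 'ce_key' => $key, 'ce_data' => serialize( $related ) ],
	[ [ 'ce_key' ] ],
	[ 'ce_data' => serialize( $related ) ],
	__METHOD__
);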
It looks like a lot of boilerplate, but once you have the table you can reuse it for other extensions, and the pay-off shows up across the board.
Thread‑Safety and Concurrency
Here’s a subtle trap: two concurrent requests could both miss the cache and start the expensive work. The cache object’s add() method (which only succeeds if the key does not already exist) can act like a tiny lock. Example:
$lockKey = $cache->makeKey( 'myextension', 'lock', md5( $text ) );
if ( $cache->add( $lockKey, 1, 30 ) ) {
	// we own the lock – do the work
	$result = heavyComputation( $text );
	$cache->set( $key, $result, 3600 );
	$cache->delete( $lockKey );
} else {
	// another process is working – wait a tick
	usleep( 50000 );
	// then try the fetch again
	$result = $cache->get( $key );
}
That tiny bit of extra code prevents a thundering‑herd scenario that could otherwise swamp a small VPS during a traffic spike.
Balancing Memory vs. CPU
One of the most common misconceptions is “cache everything, memory will handle it”. Not true. A rule of thumb I use (maybe too loosely) is: if the cached payload is larger than 64 KB, weigh the memory cost against the CPU cost of recomputing it. In a high‑traffic wiki, a 100 KB cache entry that lives for an hour can be a memory hog. In those cases, consider storing a compressed version:
$compressed = gzcompress( $result );
$cache->set( $key, $compressed, 3600 );
$stored = $cache->get( $key );
$decompressed = $stored !== false ? gzuncompress( $stored ) : false;
Yes, you add a few CPU cycles, but the net win is often a smaller RAM footprint, which translates to lower swap usage—something that can kill a wiki during a flash‑crowd.
Testing in the Real World
Automated tests are nice, but they don’t capture the jitter of a live site. I usually spin up a cheap DigitalOcean droplet, point it at a copy of the production dump, and run ab -n 500 -c 20 http://example.com/wiki/Main_Page. Compare the “Time per request” before and after the extension is enabled. If you see a dip of 10‑15 ms on average, you’ve done something right.
Potential Pitfalls
- Hard-coded paths. Using __DIR__ works locally, but on a shared hosting environment you might need $IP instead.
- Over-reliance on global variables. The $wg globals are handy, yet passing config through the extension’s extension.json makes upgrades smoother (see the sketch after this list).
- Neglecting error handling. If the cache server goes down, your hook should gracefully fall back to the original algorithm instead of throwing a fatal.
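A minimal sketch of that extension.json, with the hooks from earlier wired up; the name, version, and config key are illustrative, and with this file in place the $GLOBALS['wgHooks'] line from the skeleton is no longer needed:

{
	"name": "MyExtension",
	"version": "0.1.0",
	"manifest_version": 2,
	"AutoloadClasses": {
		"MyExtension": "includes/MyExtension.php"
	},
	"Hooks": {
		"ParserAfterParse": "MyExtension::onParserAfterParse",
		"LoadExtensionSchemaUpdates": "MyExtension::onLoadExtensionSchemaUpdates"
	},
	"config": {
		"MyExtensionCacheTTL": {
			"value": 3600,
			"description": "Seconds to keep cached results"
		}
	}
}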
Putting It All Together: A Mini‑Roadmap
- Identify a repeatable, expensive operation (regex, DB join, API call).
- Wrap it in a hook that first checks an ObjectCache key.
- If the key is missing, acquire a short-lived lock, compute, store, release.
- Measure with the profiler and ab to verify the improvement.
- Iterate: tighten the cache key, compress the payload, or move to a custom table if needed.
That’s it. No need for a full‑blown CDN or a cluster of Varnish nodes if you’re just looking to shave a few milliseconds off each page view. A little bit of custom PHP, a pinch of caching, and a dash of profiling can move the needle enough to keep your community happy.
Final Thoughts (or not…)
Honestly, I still sometimes forget to clear the cache after a schema change—my own “oops” moment that taught me to add a maintenance/refreshCache.php call in the deployment script. It’s those tiny, human slips that remind us why we need to write extensions that fail gracefully. If you’re already using composer for MediaWiki extensions, just add your new plugin to composer.json, run composer update, and you’re set.
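For what it’s worth, refreshCache.php isn’t a core maintenance script, so here is a hedged sketch of what a purpose-built one might look like for the myextension_cache table from earlier (names are illustrative):

<?php
// extensions/MyExtension/maintenance/refreshCache.php (illustrative, not shipped with core)
require_once __DIR__ . '/../../../maintenance/Maintenance.php';

class RefreshMyExtensionCache extends Maintenance {
	public function execute() {
		// DB_PRIMARY on modern MediaWiki; older releases use DB_MASTER
		$dbw = $this->getDB( DB_PRIMARY );
		$dbw->delete( 'myextension_cache', '*', __METHOD__ );
		$this->output( "myextension_cache cleared.\n" );
	}
}

$maintClass = RefreshMyExtensionCache::class;
require_once RUN_MAINTENANCE_IF_MAIN;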
In the end, performance is a marathon, not a sprint. One well‑placed hook can buy you hours of smooth operation before you have to think about scaling out to a load balancer. And that, dear reader, is why custom extension development remains a cornerstone of MediaWiki optimization.