Automating MediaWiki Maintenance Tasks with Cron Jobs and Scripts
Why automate MediaWiki upkeep?
Running a wiki is like keeping a garden alive – you water the plants, pull the weeds, and occasionally trim the hedges. If maintenance only happens when you happen to remember it – say, between weekend binge‑watch sessions – you’ll end up scrambling to run updateSpecialPages.php or rebuildData.php by hand, with stale caches, missed notifications, and a frustrated community to show for it. Cron jobs are the quiet gardeners that tend the soil while you’re sipping coffee, and custom scripts let you tailor the chores to the quirks of your particular installation.
Getting the basics right
MediaWiki ships with a maintenance folder full of PHP scripts that do everything from clearing job queues to rebuilding the search index. The official manual calls them “maintenance scripts” and advises running them from the command line, e.g.:
```shell
php /path/to/mediawiki/maintenance/runJobs.php --quiet --maxjobs=200 --maxtime=300
```

On a fresh install you’ll probably run this command every ten minutes; otherwise the job queue (the background worker that processes link‑table updates, email notifications, and other deferred work) will fill up faster than a viral meme.
Pick a place to store your custom code
- Base it under `maintenance/` so you can call `php maintenance/your‑script.php` without extra `cd` gymnastics.
- Make sure the file is world‑readable but not writable by the web server – a simple `chmod 644` does the trick.
- If you need secret credentials (API keys, DB passwords), keep them in `LocalSettings.php` and have your script `require_once "$IP/LocalSettings.php";` – that way you don’t duplicate secrets.
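A quick way to sanity‑check the placement and permission advice above is to script it; everything below is illustrative (a throwaway `/tmp` directory stands in for your real MediaWiki root):

```shell
#!/bin/bash
# Illustrative only: /tmp/wiki-demo stands in for your MediaWiki root.
WIKI=${WIKI:-/tmp/wiki-demo}
mkdir -p "$WIKI/maintenance"

# Drop in a placeholder custom script (your real code goes here).
cat > "$WIKI/maintenance/your-script.php" <<'EOF'
<?php
// Custom maintenance code lives here.
EOF

# World-readable, writable only by the owner -- not by the web server.
chmod 644 "$WIKI/maintenance/your-script.php"

# Verify the mode actually stuck.
stat -c '%a' "$WIKI/maintenance/your-script.php"   # prints: 644
```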
Typical tasks that deserve a schedule
Below is a grab‑bag of the most common housekeeping chores. Not all wikis need every single one; think of it as a menu you can mix‑and‑match.
1. Run the job queue
The backbone of MediaWiki. A ten‑minute cadence keeps things snappy.
```
*/10 * * * * php /var/www/wiki/maintenance/runJobs.php --quiet --maxjobs=200 --maxtime=300 --memory-limit=128M
```

2. Update special pages & indexes
If you’ve enabled `$wgMiserMode`, MediaWiki won’t automatically refresh special pages. A daily run at a low‑traffic hour is enough.
```
45 7 * * * php /var/www/wiki/maintenance/updateSpecialPages.php --quiet
```

3. Semantic MediaWiki (SMW) rebuilds
SMW ships with a handful of scripts. The most resource‑hungry is rebuildData.php – you’ll want a nightly window for that.
```
15 5 * * * php /var/www/wiki/extensions/SemanticMediaWiki/maintenance/rebuildData.php --quiet --shallow-update
```

Other SMW scripts you might schedule:
- `disposeOutdatedEntities.php` – prunes stale annotations.
- `rebuildPropertyStatistics.php` – refreshes property usage statistics.
- `rebuildConceptCache.php` – keeps concept queries fast.
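If you schedule several of these, stagger the start times so they aren’t fighting over the database; the nightly block below is a sketch with illustrative times, not a recommendation:

```
# Nightly SMW housekeeping, staggered to avoid overlapping DB load
15 4 * * * php /var/www/wiki/extensions/SemanticMediaWiki/maintenance/disposeOutdatedEntities.php --quiet
45 4 * * * php /var/www/wiki/extensions/SemanticMediaWiki/maintenance/rebuildPropertyStatistics.php --quiet
15 5 * * * php /var/www/wiki/extensions/SemanticMediaWiki/maintenance/rebuildData.php --quiet --shallow-update
```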
4. Database dumps and backups
Even if you have a nightly snapshot at the infrastructure level, a plain‑text XML dump is handy for migrations.
```
30 2 * * * php /var/www/wiki/maintenance/dumpBackup.php --full --output=file:/backups/wiki-$(date +\%F).xml
```

5. Purge stale caches
If you run memcached or Redis behind the scenes, pruning the parser cache weekly keeps lingering keys from bloating memory. MediaWiki’s bundled `purgeParserCache.php` takes an `--age` in seconds – here, thirty days:

```
0 4 * * 0 php /var/www/wiki/maintenance/purgeParserCache.php --age=2592000 --quiet
```

Crafting a robust cron entry
At first glance a crontab line looks like a cryptic puzzle. A quick cheat‑sheet can save you endless head‑scratching:
| Field | Allowed values | Common shorthand |
|---|---|---|
| Minute | 0‑59 | * (every minute), */10 (every 10 minutes) |
| Hour | 0‑23 | *, 2‑4 (2 am to 4 am) |
| Day of month | 1‑31 | * |
| Month | 1‑12 | * |
| Day of week | 0‑6 (Sun‑Sat) | 0 (Sunday), 1‑5 (weekdays) |
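The `*/10` shorthand means “every value divisible by 10”; a one‑liner mimicking cron’s expansion of the minute field makes that concrete:

```shell
# List the minutes a */10 schedule fires on, as cron would expand it.
seq 0 59 | awk '$1 % 10 == 0' | tr '\n' ' '; echo
# prints: 0 10 20 30 40 50
```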
When you edit the crontab (crontab -e), prepend a comment line to remind yourself what the job does. Comments are ignored by cron but priceless for future you.
```
# Refresh MediaWiki job queue every ten minutes
*/10 * * * * php /var/www/wiki/maintenance/runJobs.php --quiet --maxjobs=200 --maxtime=300
```

Running scripts on modern Kubernetes‑centric deployments
If you’ve moved your wiki to a container orchestrator, you don’t edit /etc/crontab on the host. Instead you launch a short‑lived pod – Wikimedia’s production clusters do this with the `mwscript` wrapper (and its newer `mwscript-k8s` sibling) – and let the platform schedule it.
```shell
# Example using kubectl to run a maintenance script once
kubectl run --rm -i --tty maintenance-job \
  --image=mediawiki:latest \
  --restart=Never \
  -- php maintenance/runJobs.php --quiet
```

The advantage? The pod can inherit the same configuration, file‑system layout, and resource limits as the live wiki, as long as it runs the same image and mounts. You also get log retention for free – Kubernetes keeps the pod’s logs around until the pod is garbage‑collected, subject to the cluster’s log‑rotation settings.
To make it truly automated, define a CronJob resource:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mediawiki-runjobs
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: runjobs
              image: mediawiki:latest
              command: ["php", "/var/www/html/maintenance/runJobs.php", "--quiet", "--maxjobs=200", "--maxtime=300"]
          restartPolicy: Never
```

This YAML snippet will spin up a fresh pod every ten minutes, run the job queue, and then vanish. No lingering processes, no stale PID files.
Handling output and errors
By default, cron mails both STDOUT and STDERR to the crontab’s owner – useful on a personal server but noisy on a production box. Redirect to a log file instead, and rotate it with logrotate:
```
# Append output to /var/log/mediawiki/cron.log and rotate weekly
*/10 * * * * php /var/www/wiki/maintenance/runJobs.php --quiet --maxjobs=200 --maxtime=300 >> /var/log/mediawiki/cron.log 2>&1
```

Bear in mind that cron only mails output it actually sees – once everything is redirected to a file, a failing script no longer announces itself. If you’d rather surface failures in syslog or a monitoring system, add a tiny wrapper:
```bash
#!/bin/bash
# /usr/local/bin/runjobs-wrapper.sh
php /var/www/wiki/maintenance/runJobs.php "$@"
RET=$?
if [ $RET -ne 0 ]; then
    logger -t mediawiki "runJobs.php failed with code $RET"
fi
exit $RET
```

Now the crontab line simply calls the wrapper, and your syslog picks up any errors.
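The pattern generalises to any command, and you can try it without touching the wiki; this self‑contained sketch swaps `logger` for a plain `echo` so the behaviour is visible in a terminal:

```shell
#!/bin/bash
# Generic form of the wrapper: run a command, report any non-zero exit.
run_wrapped() {
    "$@"
    local ret=$?
    if [ "$ret" -ne 0 ]; then
        # A production wrapper would call: logger -t mediawiki "..."
        echo "command failed with code $ret"
    fi
    return "$ret"
}

run_wrapped true  && echo "ok"               # prints: ok
run_wrapped false || echo "caught failure"   # prints: command failed with code 1
                                             #         caught failure
```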
Tips for a smoother experience
- Test locally first. Run the script from the command line as the same user that cron will use (usually `www-data` or `mediawiki`).
- Mind the PHP memory limit. Some maintenance jobs, especially `rebuildData.php`, can chew a lot of RAM. Adding `--memory-limit=256M` prevents the process from being killed.
- Locking. If more than one server might trigger the same job, add a simple lock file check:
```bash
#!/bin/bash
# /usr/local/bin/lockrun.sh
# flock takes the lock atomically and releases it when the script exits,
# so there is no race between the check and the lock, and a crash can
# never leave a stale lock file behind.
LOCKFILE=/var/run/maintenance.lock
exec 9>"$LOCKFILE"
if ! flock -n 9; then
    echo "Another instance is running."
    exit 1
fi
"$@"
```

Wrap any heavy script with `lockrun.sh php /path/to/script.php …` and you’ll avoid duplicate runs.
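To convince yourself the locking works, hold a lock and try to take it again from a child process. This demo relies on util‑linux `flock`, which grabs the lock atomically (unlike a plain existence check on a lock file, which has a small race window):

```shell
#!/bin/bash
# Hold an exclusive lock on fd 9 for the lifetime of this shell.
LOCK=/tmp/lockrun-demo.lock
exec 9>"$LOCK"
flock -n 9 && echo "first lock acquired"

# A second attempt, via the flock(1) utility in a child process,
# fails immediately because the lock above is still held.
if ! flock -n "$LOCK" -c 'echo "this never prints"'; then
    echo "second attempt refused"
fi
# prints:
#   first lock acquired
#   second attempt refused
```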
When things go sideways
Even the best‑planned cron job can flare up. A quick checklist can save you from chasing ghosts:
- Check the log. Look at the file you redirected output to, or the syslog entry if you used `logger`.
- Confirm the PHP path. Some distros ship multiple PHP binaries; the cron environment might be using `/usr/bin/php7.4` while you tested with `/usr/local/bin/php`.
- Validate permissions. The cron user needs read access to the MediaWiki codebase and write access to any output directories.
- Watch for DB locks. Scripts that alter tables (e.g. `rebuildData.php`) can deadlock if another background job is writing at the same moment.
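The first two checklist items are easy to script. The helper below is a sketch – the list of binaries to probe is an assumption, so adjust it to your distro:

```shell
#!/bin/bash
# Report where each candidate PHP binary lives, so you can compare
# cron's idea of the PATH with the shell you tested in.
check_php_paths() {
    local bin path
    for bin in "$@"; do
        if path=$(command -v "$bin" 2>/dev/null); then
            echo "$bin -> $path"
        else
            echo "$bin -> not found"
        fi
    done
}

check_php_paths php php7.4 php8.1
```

Run it once from your login shell and once from a cron job; any difference between the two outputs is your culprit.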
If you’re on Kubernetes, `kubectl logs job/mediawiki-runjobs-xxxx` shows the pod’s console output. Combine that with `kubectl get events` to see resource‑quota issues.
Wrapping up
Automation isn’t a silver bullet, but it’s a reliable ally that keeps a MediaWiki instance humming while you focus on content, community, and maybe a bit of sleep. By picking the right maintenance scripts, scheduling them with cron or CronJob, and handling output sensibly, you turn a potentially chaotic “run‑it‑manually‑when‑you‑remember” routine into a predictable rhythm.