Automating MediaWiki Maintenance Tasks with Cron Jobs and Scripts

Why automate MediaWiki upkeep?

Running a wiki is like keeping a garden alive – you water the plants, pull the weeds, and occasionally trim the hedges. If you only remember to run updateSpecialPages.php or rebuildData.php when a free weekend happens to roll around, you’ll end up with stale caches, missed notifications, and a frustrated community. Cron jobs are the quiet gardeners that tend the soil while you’re sipping coffee, and custom scripts let you tailor the chores to the quirks of your particular installation.

Getting the basics right

MediaWiki ships with a maintenance/ directory full of PHP scripts that do everything from clearing job queues to rebuilding the search index. The official manual calls them “maintenance scripts” and advises running them via the command line, e.g.:

php /path/to/mediawiki/maintenance/runJobs.php --quiet --maxjobs=200 --maxtime=300

On a fresh install you’ll probably want to run this command every ten minutes; otherwise the job queue (the background worker that processes link-table updates, email notifications, cache purges, and the like) will fill up faster than a viral meme.
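On quieter wikis a fixed cadence can waste cycles. A small wrapper – a sketch with an illustrative install path, assuming showJobs.php’s default output of a bare pending-job count – can skip the run entirely when the queue is empty:

```shell
#!/bin/bash
# Hypothetical gate: only run the queue when jobs are actually pending.
# Assumes showJobs.php (a core maintenance script) prints the pending
# count as a bare number, which is its default output.
run_queue_if_pending() {
  local wiki="$1" pending
  pending="$(php "$wiki/maintenance/showJobs.php" 2>/dev/null)" || pending=0
  if [ "${pending:-0}" -gt 0 ] 2>/dev/null; then
    php "$wiki/maintenance/runJobs.php" --quiet --maxjobs=200 --maxtime=300
  fi
}

# Example invocation (path is illustrative):
run_queue_if_pending /var/www/wiki
```

Point the crontab entry at this wrapper instead of invoking runJobs.php directly.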

Pick a place to store your custom code

  • Base it under maintenance/ so you can call php maintenance/your‑script.php without extra cd gymnastics.
  • Make sure the file is world‑readable but not writable by the web server – a simple chmod 644 does the trick.
  • If you need secret credentials (API keys, DB passwords) keep them in LocalSettings.php and have your script require_once "$IP/LocalSettings.php"; – that way you don’t duplicate secrets.
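To go with the chmod 644 advice above, here is a small sanity-check helper – the function name and the myCleanup.php path are purely illustrative, and it relies on GNU stat – that warns when a script’s mode drifts:

```shell
#!/bin/bash
# Illustrative guard: warn if a maintenance script is not mode 644
# (readable by everyone, writable only by its owner). Uses GNU stat.
check_mode() {
  local file="$1" perms
  perms=$(stat -c '%a' "$file" 2>/dev/null) || return 1
  if [ "$perms" != "644" ]; then
    echo "warning: $file has mode $perms, expected 644" >&2
    return 1
  fi
}

# Example (myCleanup.php is a made-up script name):
check_mode /var/www/wiki/maintenance/myCleanup.php || true
```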

Typical tasks that deserve a schedule

Below is a grab‑bag of the most common housekeeping chores. Not all wikis need every single one; think of it as a menu you can mix‑and‑match.

1. Run the job queue

The backbone of MediaWiki. A ten‑minute cadence keeps things snappy.

# Keep the queue moving every ten minutes
*/10 * * * * php /var/www/wiki/maintenance/runJobs.php --quiet --maxjobs=200 --maxtime=300 --memory-limit=128M

2. Update special pages & indexes

If you’ve enabled $wgMiserMode, MediaWiki won’t automatically refresh special pages. A daily run at low‑traffic hours is enough.

# Refresh special pages daily at 07:45
45 7 * * * php /var/www/wiki/maintenance/updateSpecialPages.php --quiet

3. Semantic MediaWiki (SMW) rebuilds

SMW ships with a handful of scripts. The most resource‑hungry is rebuildData.php – you’ll want a nightly window for that.

# Nightly SMW data rebuild at 05:15
15 5 * * * php /var/www/wiki/extensions/SemanticMediaWiki/maintenance/rebuildData.php --quiet --shallow-update

Other SMW scripts you might schedule:

  • disposeOutdatedEntities.php – prunes stale annotations.
  • rebuildPropertyStatistics.php – recomputes usage counts for every property.
  • rebuildConceptCache.php – keeps concept queries fast.

4. Database dumps and backups

Even if you have a nightly snapshot at the infrastructure level, a plain‑text XML dump is handy for migrations. Bear in mind it captures page content and history only – not images, user accounts, or extension tables.

# Full XML dump at 02:30 (note the escaped % – cron treats a bare % specially)
30 2 * * * php /var/www/wiki/maintenance/dumpBackup.php --full --output=file:/backups/wiki-$(date +\%F).xml
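Dated dumps accumulate quickly. A small retention helper – hypothetical, assuming the wiki-YYYY-MM-DD.xml naming above and space-free filenames – keeps only the newest N:

```shell
#!/bin/bash
# Hypothetical retention helper: keep only the newest $2 XML dumps in $1.
# Assumes dump filenames contain no spaces (wiki-YYYY-MM-DD.xml).
prune_dumps() {
  local dir="$1" keep="$2"
  # List dumps newest-first, skip the first $keep, delete the rest.
  ls -1t "$dir"/wiki-*.xml 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm --
}

# Example: keep the 14 most recent nightly dumps
prune_dumps /backups 14
```

Schedule it shortly after the dump itself, e.g. at 03:00.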

5. Purge stale caches

If your parser cache lives in the database, old entries can pile up indefinitely. Core doesn’t ship a catch‑all purgeCache.php, but purgeParserCache.php prunes entries older than a given age in seconds (memcached and Redis evict old keys on their own and rarely need a scheduled flush):

# Sundays at 4 am: drop parser cache entries older than 30 days
0 4 * * 0 php /var/www/wiki/maintenance/purgeParserCache.php --age=2592000 --quiet

Crafting a robust cron entry

At first glance a crontab line looks like a cryptic puzzle. A quick cheat‑sheet can save you endless head‑scratching:

Field          Allowed values     Common shorthand
Minute         0‑59               * (every minute), */10 (every 10 minutes)
Hour           0‑23               *, 2‑4 (2 am to 4 am)
Day of month   1‑31               *
Month          1‑12               *
Day of week    0‑6 (Sun‑Sat)      0 (Sunday), 1‑5 (weekdays)

When you edit the crontab (crontab -e), prepend a comment line to remind yourself what the job does. Comments are ignored by cron but priceless for future you.

# Refresh MediaWiki job queue every ten minutes
*/10 * * * * php /var/www/wiki/maintenance/runJobs.php --quiet --maxjobs=200 --maxtime=300

Running scripts on modern Kubernetes‑centric deployments

If you’ve moved your wiki to a container orchestrator, you don’t edit /etc/crontab on the host. Instead you launch a short‑lived pod that runs the maintenance script and let the platform schedule it. (Wikimedia’s production clusters do this with their mwscript and newer mwscript‑k8s tooling; those helpers are part of Wikimedia’s infrastructure, not of a stock MediaWiki install.)

# Example using kubectl to run a maintenance script once
kubectl run --rm -i --tty maintenance-job \
  --image=mediawiki:latest \
  --restart=Never \
  -- php maintenance/runJobs.php --quiet

The advantage? The pod inherits the same configuration, file system layout, and resource limits as the live wiki. Logs stick around as long as the finished pod does; for a CronJob, successfulJobsHistoryLimit (default 3) and failedJobsHistoryLimit (default 1) control how many completed runs – and thus their logs – are retained.

To make it truly automated, define a CronJob resource:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: mediawiki-runjobs
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: runjobs
            image: mediawiki:latest
            command: ["php", "/var/www/html/maintenance/runJobs.php", "--quiet", "--maxjobs=200", "--maxtime=300"]
          restartPolicy: Never

This YAML snippet will spin up a fresh pod every ten minutes, run the job queue, and then vanish. No lingering processes, no stale PID files. If a run might outlast its ten‑minute slot, add concurrencyPolicy: Forbid under spec: so pods never overlap.

Handling output and errors

By default, cron mails both STDOUT and STDERR to the crontab’s owner – useful on a personal server but noisy on a production box. Redirect the output to a log file instead, and rotate it with logrotate:

# Append output to /var/log/mediawiki/cron.log and rotate weekly
*/10 * * * * php /var/www/wiki/maintenance/runJobs.php --quiet --maxjobs=200 --maxtime=300 >> /var/log/mediawiki/cron.log 2>&1
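The matching logrotate policy might look like this – a sketch to drop into /etc/logrotate.d/mediawiki, with retention numbers you should adjust to taste:

```
/var/log/mediawiki/cron.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
}
```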

Bear in mind that cron only sends mail when a command produces output – once everything is redirected to a file, a failing script fails silently. To surface failures in syslog or a monitoring system, add a tiny wrapper:

#!/bin/bash
# /usr/local/bin/runjobs-wrapper.sh
php /var/www/wiki/maintenance/runJobs.php "$@"
RET=$?
if [ $RET -ne 0 ]; then
  logger -t mediawiki "runJobs.php failed with code $RET"
fi
exit $RET

Now the crontab line simply calls the wrapper, and your syslog picks up any errors.
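Assuming the wrapper lives at /usr/local/bin/runjobs-wrapper.sh as above, the crontab entry shrinks to:

```
# Job queue via the wrapper; failures land in syslog
*/10 * * * * /usr/local/bin/runjobs-wrapper.sh --quiet --maxjobs=200 --maxtime=300
```

The flags pass straight through to runJobs.php via the wrapper’s "$@".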

Tips for a smoother experience

  • Test locally first. Run the script from the command line with the same user that cron will use (usually www-data or mediawiki).
  • Mind the PHP memory limit. Some maintenance jobs, especially rebuildData.php, can chew a lot of RAM. Passing --memory-limit=256M (or --memory-limit=max) raises the limit so the script isn’t cut short by a PHP out‑of‑memory fatal.
  • Locking. If the same job might fire twice – a run that outlasts its cron slot, or two hosts sharing a filesystem – guard it with flock(1) rather than a hand‑rolled lock file (the test‑and‑touch pattern has a race and leaks the lock if the script crashes):
#!/bin/bash
# /usr/local/bin/lockrun.sh – run a command under an exclusive lock
LOCKFILE=/var/run/maintenance.lock
exec 9>"$LOCKFILE"
if ! flock -n 9; then
  echo "Another instance is running." >&2
  exit 1
fi
# The lock is released automatically when this process exits,
# even on a crash – no stale lock files to clean up.
"$@"

Wrap any heavy script with lockrun.sh php /path/to/script.php … and you’ll avoid duplicate runs.
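For instance, the nightly SMW rebuild from earlier, guarded against overlap:

```
# Nightly SMW data rebuild, at most one instance at a time
15 5 * * * /usr/local/bin/lockrun.sh php /var/www/wiki/extensions/SemanticMediaWiki/maintenance/rebuildData.php --quiet --shallow-update
```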

When things go sideways

Even the best‑planned cron job can flare up. A quick checklist can save you from chasing ghosts:

  1. Check the log. Look at the file you redirected output to, or the syslog entry if you used logger.
  2. Confirm PHP path. Some distros ship multiple PHP binaries; the cron environment might be using /usr/bin/php7.4 while you tested with /usr/local/bin/php.
  3. Validate permissions. The cron user needs read access to the MediaWiki codebase and write access to any output directories.
  4. Watch for DB locks. Scripts that write heavily (e.g. rebuildData.php) can deadlock if another background job hits the same tables at the same moment.

If you’re on Kubernetes, kubectl logs job/mediawiki-runjobs-xxxx shows the pod’s console output. Combine that with kubectl get events to see resource‑quota issues.

Wrapping up

Automation isn’t a silver bullet, but it’s a reliable ally that keeps a MediaWiki instance humming while you focus on content, community, and maybe a bit of sleep. By picking the right maintenance scripts, scheduling them with cron or CronJob, and handling output sensibly, you turn a potentially chaotic “run‑it‑manually‑when‑you‑remember” routine into a predictable rhythm.
