Leveraging MediaWiki API for Automated Content Management

Why the MediaWiki Action API Matters for Automation

If you’ve ever tried to keep a knowledge base tidy by hand, you know the feeling: pages multiply like weeds, categories drift out of sync, and a single typo can ripple through dozens of references. The MediaWiki Action API is the quiet backstage hand that lets you pull the plug on that manual grind. By treating the wiki as a REST‑ish service, you can script page creation, bulk edits, user‑role adjustments, and even category clean‑ups without ever opening the web UI.

It’s not a brand‑new fad – Wikipedia’s bots have been using it for years – but for private wikis the API is often an after‑thought. That’s a mistake. When you stitch the API into your deployment pipeline, you get reproducible content, audit trails, and the sort of flexibility that makes a wiki feel less like a static repository and more like a living, programmable knowledge engine.

Getting Your Hands on the Endpoint

The heart of the service lives at https://yourwiki.example.com/api.php. All requests are GET or POST, and they expect a set of query parameters that tell MediaWiki what you want to do. A typical “list pages in a category” call looks like this:

curl "https://yourwiki.example.com/api.php?action=query&list=categorymembers&cmtitle=Category:Guidelines&format=json"

Notice the format=json flag – you can also ask for XML, but JSON tends to play nicer with modern scripting languages. If you’re tinkering in Python, the requests library makes it a one‑liner:

import requests

resp = requests.get(
    "https://yourwiki.example.com/api.php",
    params={"action": "query", "list": "categorymembers", "cmtitle": "Category:Guidelines", "format": "json"},
)
data = resp.json()

That snippet is enough to fetch an array of page IDs, titles, and timestamps – perfect fodder for a nightly audit script.

Authentication: Token Dance, Not a Nightmare

Most API calls that modify state require a login token, then an edit token. It sounds like a bureaucratic tango, but it’s actually a couple of HTTP rounds. Here’s the minimal flow using curl:

# 1. Get a login token (save the session cookies -- later steps need them)
TOKEN=$(curl -s -c cookies.txt "https://yourwiki.example.com/api.php?action=query&meta=tokens&type=login&format=json" |
       jq -r '.query.tokens.logintoken')

# 2. Log in (replace USER and PASS; --data-urlencode protects the
#    trailing "+\" that MediaWiki tokens carry)
curl -s -b cookies.txt -c cookies.txt \
     --data-urlencode "action=login" --data-urlencode "lgname=USER" \
     --data-urlencode "lgpassword=PASS" --data-urlencode "lgtoken=$TOKEN" \
     --data-urlencode "format=json" \
     "https://yourwiki.example.com/api.php"

# 3. Grab an edit (CSRF) token using the logged-in session
EDITTOKEN=$(curl -s -b cookies.txt "https://yourwiki.example.com/api.php?action=query&meta=tokens&format=json" |
            jq -r '.query.tokens.csrftoken')

# 4. Perform an edit
curl -s -b cookies.txt \
     --data-urlencode "action=edit" --data-urlencode "title=Help:Automation" \
     --data-urlencode "text=Automated content added $(date)" \
     --data-urlencode "token=$EDITTOKEN" --data-urlencode "format=json" \
     "https://yourwiki.example.com/api.php"

In practice you’ll wrap that in a function, cache the edit token for a few minutes, and handle errors gracefully. The point is: once you’ve nailed the token dance, the rest is just data shuffling.
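A minimal token-cache helper might look like the sketch below. This is my own illustration, not part of any MediaWiki client library: the TokenCache class and its names are hypothetical, and the fetch callable is whatever function you wrap around the meta=tokens request shown above.

```python
import time


class TokenCache:
    """Fetch a CSRF token lazily and reuse it until a TTL expires.

    `fetch` is any zero-argument callable that returns a fresh token,
    e.g. a function wrapping the meta=tokens request shown above.
    """

    def __init__(self, fetch, ttl=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at > self._ttl:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

    def invalidate(self):
        # Call this after a "badtoken" error so the next get() refetches.
        self._token = None
```

Injecting the clock keeps the TTL logic testable, and invalidate() gives you a hook for the "badtoken" error MediaWiki returns when a session expires mid-run.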

Batch Editing – The Real Power Move

Suppose you need to prepend a disclaimer to every page in Category:Drafts. Doing that by hand would be a slog, but with the API you can loop over the members and fire off edits in rapid succession. Below is a Python example that respects rate limits (many MediaWiki installations throttle clients that attempt more than a few writes per second from the same IP).

import time, requests, json

API = "https://yourwiki.example.com/api.php"
SESSION = requests.Session()

def get_token(action):
    r = SESSION.get(API, params={"action":"query","meta":"tokens","type":action,"format":"json"})
    return r.json()["query"]["tokens"][f"{action}token"]

login_token = get_token("login")
# Assume credentials are set in env or similar
SESSION.post(API, data={"action":"login","lgname":"bot","lgpassword":"SECRET","lgtoken":login_token,"format":"json"})

edit_token = get_token("csrf")

def members_of(cat):
    cont = {}
    while True:
        params = {"action":"query","list":"categorymembers","cmtitle":cat,"cmlimit":"500","format":"json"}
        params.update(cont)
        r = SESSION.get(API, params=params).json()
        yield from r["query"]["categorymembers"]
        if "continue" not in r: break
        cont = r["continue"]

for page in members_of("Category:Drafts"):
    # fetch current content
    r = SESSION.get(API, params={"action":"query","prop":"revisions","rvprop":"content","rvslots":"main","titles":page["title"],"format":"json"}).json()
    content = next(iter(r["query"]["pages"].values()))["revisions"][0]["slots"]["main"]["*"]
    new_content = "{{Disclaimer}}\n" + content
    SESSION.post(API, data={"action":"edit","title":page["title"],"text":new_content,"token":edit_token,"format":"json"})
    time.sleep(0.3)  # be gentle with the server

That script does three things that often trip newcomers up:

  • Continuation handling: the API splits large result sets; the continue token keeps you moving.
  • Revision fetching: you need the page’s current wikitext to prepend to; otherwise you’d silently overwrite the whole page.
  • Polite pacing: a short sleep avoids temporary bans.

Automation Use‑Cases Worth Considering

Content Staging Pipelines

Many organizations treat their wiki like a software repo – code lives in git, documentation lives in MediaWiki. By pairing the API with a CI tool (GitHub Actions, GitLab CI), you can push markdown from a repo, convert it to wikitext on the fly (using pandoc), and then fire an edit request. The result? Every commit triggers a live update, and you retain a history both in git and in the wiki’s revision log.
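The glue for such a pipeline is small. Here is a sketch of the conversion step in Python; the path-to-title convention and the function names are my own assumptions, and the pandoc invocation requires pandoc on PATH (in CI you would install it in a setup step):

```python
import subprocess
from pathlib import Path


def wiki_title_for(md_path):
    """Map a repo path like docs/getting-started.md to a wiki title.

    Hypothetical convention: strip the extension, replace hyphens with
    spaces, and capitalize the first letter (MediaWiki capitalizes the
    first letter of most titles anyway).
    """
    stem = Path(md_path).stem.replace("-", " ")
    return stem[:1].upper() + stem[1:]


def markdown_to_wikitext(md_source):
    """Shell out to pandoc to convert Markdown to MediaWiki markup."""
    result = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", "mediawiki"],
        input=md_source, capture_output=True, text=True, check=True,
    )
    return result.stdout

# The CI job would then POST the converted text with action=edit,
# using the same login/token flow shown earlier.
```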

Cross‑System Sync

Imagine an internal ticketing system that stores resolution steps. When a ticket closes, a webhook fires, hits a tiny Flask endpoint, and that endpoint calls the MediaWiki API to either create or update a “Known Issues” page. No one has to copy‑paste a solution; the wiki stays current automatically.
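A dependency-free sketch of that endpoint, using only the standard library (the article’s Flask version would look nearly identical): note that the ticket field names (id, title, resolution) are assumptions about the ticketing system’s webhook payload, so adapt them to yours.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def ticket_to_wikitext(ticket):
    """Turn a closed-ticket payload into a wiki section.

    The field names here are hypothetical -- match them to whatever
    your ticketing system actually sends.
    """
    return (
        f"== Ticket {ticket['id']}: {ticket['title']} ==\n"
        f"{ticket['resolution']}\n"
        f"[[Category:Known Issues]]\n"
    )


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        ticket = json.loads(self.rfile.read(length))
        wikitext = ticket_to_wikitext(ticket)
        # Here you would run the MediaWiki edit flow shown earlier,
        # e.g. SESSION.post(API, data={"action": "edit", ...}).
        self.send_response(204)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), WebhookHandler).serve_forever()
```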

Permission Audits

MediaWiki’s list=allusers endpoint (with auprop=groups) can dump every user along with their group memberships. Combine that with a nightly script that checks against your corporate LDAP, and you’ll have a report that flags stray admin accounts before they become a security hole.
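The audit itself reduces to a set difference. A sketch, reusing the session pattern from earlier (the stray_admins helper and the API URL are mine; only list=allusers, augroup, auprop, and aulimit are real API parameters):

```python
import requests

API = "https://yourwiki.example.com/api.php"  # hypothetical wiki


def wiki_users_with_group(session, group):
    """Yield usernames in a given group via list=allusers."""
    params = {"action": "query", "list": "allusers", "augroup": group,
              "auprop": "groups", "aulimit": "500", "format": "json"}
    while True:
        r = session.get(API, params=params).json()
        for user in r["query"]["allusers"]:
            yield user["name"]
        if "continue" not in r:
            break
        params.update(r["continue"])


def stray_admins(wiki_admins, ldap_admins):
    """Accounts with wiki sysop rights that LDAP doesn't know about."""
    return sorted(set(wiki_admins) - set(ldap_admins))
```

In the nightly job you would feed `wiki_users_with_group(session, "sysop")` and your LDAP export into stray_admins and mail the result.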

Handling Edge Cases – A Bit of Grit

When you start playing with bots, MediaWiki will politely remind you that “the page you tried to edit was changed in the meantime.” That’s the dreaded edit conflict. The API returns a code=editconflict error, and the response includes the latest revision ID. A resilient script should fetch the fresh content, re‑apply its transformation, and retry the edit.
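The retry loop can be written once and reused. The sketch below keeps it independent of any HTTP client by taking two callables you supply (both names are my own): do_edit(text) performs the edit and returns the parsed API response, and refresh() re-reads the page and returns the text the next attempt should submit, i.e. your transformation re-applied to the latest revision.

```python
def edit_with_retry(do_edit, refresh, max_attempts=3):
    """Retry an edit when MediaWiki reports code=editconflict."""
    text = refresh()
    for _ in range(max_attempts):
        resp = do_edit(text)
        error = resp.get("error", {})
        if error.get("code") != "editconflict":
            return resp
        text = refresh()  # someone edited meanwhile: rebase and retry
    raise RuntimeError("gave up after repeated edit conflicts")
```

In practice you would also pass basetimestamp and starttimestamp with action=edit so the server can detect conflicts reliably rather than last-write-wins.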

Another quirk: some extensions, like ParserFunctions or Semantic MediaWiki, inject hidden markup. If you blindly append text, you might break a template. The trick is to test on a sandbox page first, or use the summary parameter to note “automated prepend – check templates.”

Performance Tips – Keep the Server Happy

  1. Batch requests where possible. The action=edit endpoint only handles one page at a time, but for reads you can ask for multiple titles via titles=Page1|Page2|Page3.
  2. Cache tokens. A CSRF token stays valid for the length of your session, so fetching a fresh one on every loop iteration is wasteful; refetch only after a badtoken error.
  3. Use maxlag. Adding maxlag=5 tells MediaWiki to reject the request with a maxlag error whenever database replication lag exceeds five seconds; your client should catch it, wait, and retry – a polite “wait a sec”.
  4. Compress responses. Send Accept-Encoding: gzip (most HTTP libraries do this automatically) to reduce bandwidth.
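Tips 1 and 3 combine naturally: batch your titles into pipe-joined groups and attach maxlag to every read. A small sketch (the helper names are mine; the 50-title limit is the usual cap for normal accounts, while bots with apihighlimits may send 500):

```python
def chunked(titles, size=50):
    """Split titles into batches MediaWiki will accept in one request."""
    for i in range(0, len(titles), size):
        yield titles[i:i + size]


def batched_params(batch):
    """Build query parameters for one multi-title read with maxlag."""
    return {
        "action": "query", "prop": "info",
        "titles": "|".join(batch),
        "maxlag": "5", "format": "json",
    }

# for batch in chunked(all_titles):
#     r = SESSION.get(API, params=batched_params(batch)).json()
```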

Putting It All Together – A Mini‑Project Sketch

Below is a compact “one‑file” script that could sit in a cron job. It checks a CSV file of upcoming events, creates a page for each if it doesn’t exist, and tags them with an “Upcoming” category.

import csv, requests, time

API = "https://yourwiki.example.com/api.php"
S = requests.Session()

def get_token(t):
    r = S.get(API, params={"action":"query","meta":"tokens","type":t,"format":"json"})
    return r.json()["query"]["tokens"][f"{t}token"]

def login():
    lt = get_token("login")
    S.post(API, data={"action":"login","lgname":"bot","lgpassword":"SECRET","lgtoken":lt,"format":"json"})

def page_exists(title):
    r = S.get(API, params={"action":"query","titles":title,"format":"json"}).json()
    page = next(iter(r["query"]["pages"].values()))
    return "missing" not in page  # missing pages carry a "missing" flag

def create_page(title, wikitext, token):
    S.post(API, data={"action":"edit","title":title,"text":wikitext,"token":token,"format":"json"})

login()
csrf = get_token("csrf")

with open("events.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        title = f"Event:{row['Name']}"
        if not page_exists(title):
            content = f"= {row['Name']} =\n{{{{EventInfo|date={row['Date']}|location={row['Location']}}}}}\n[[Category:Upcoming]]"
            create_page(title, content, csrf)
            time.sleep(0.2)

This toy example illustrates the typical flow: login → token → read → conditionally write. In a real deployment you’d add error handling, logging, and perhaps a back‑off strategy if the wiki signals “maxlag”.

Final Thoughts

Leveraging MediaWiki’s API isn’t about turning the wiki into a code‑only zone; it’s about giving you a lever to keep content tidy, consistent, and in step with the rest of your tech stack. The learning curve is modest – a few curl calls, some token juggling, and you’re already scripting at a level that would make any human editor sigh in relief.

So whether you’re grooming a product knowledge base, syncing tickets, or just tired of hunting down stray categories, the Action API is the Swiss‑army knife that lets you automate without sacrificing the collaborative spirit that makes wikis valuable. And remember: a little friction (tokens, rate limits) is just the system’s way of saying “I’ve got your back, but don’t go wild”. Embrace it, script responsibly, and let the wiki work for you.
