Mastering MediaWiki API for Automated Content Management

Why automate with MediaWiki’s API?

Picture this: you’ve just finished a massive data dump from a legacy system, and you need to push thousands of rows into a wiki. Manually opening “Edit” for each page? That’s the kind of nightmare that keeps developers up at night, sipping cold coffee while the cursor blinks. The MediaWiki API, however, is like a backstage pass – it lets you skip the queue at the door and get straight to the action.

Since the 1.35 LTS release, the action API has become more consistent, and the core REST API introduced alongside it adds a modern touch. So whether you’re a hobbyist bot‑author or a full‑scale content‑management team, mastering the API is the ticket to turning repetitive edits into a smooth, automated workflow.

Getting your hands dirty: the first request

All right, roll up your sleeves. The most basic thing you can do is a GET to the action=query module. That’ll fetch a page’s raw wikitext, its last revision ID, or even a list of pages that match a certain prefix.


import requests

URL = "https://www.example.org/w/api.php"
params = {
    "action": "query",
    "prop": "revisions",      # we want revision data for the page ...
    "titles": "Sandbox",
    "rvprop": "content",      # ... specifically the revision's content (wikitext)
    "rvslots": "main",        # modern wikis store content in slots; "main" holds the wikitext
    "format": "json"
}
r = requests.get(URL, params=params)
print(r.json()["query"]["pages"])

Never underestimate the power of that tiny snippet – you just pulled the entire content of “Sandbox” into a Python dict. From there the sky’s the limit.
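
For reference, here’s roughly how you’d dig the actual wikitext string out of that nested response – the slots structure comes from the rvslots=main parameter above, and pages are keyed by page ID:


data = r.json()["query"]["pages"]
page = next(iter(data.values()))                       # pages are keyed by page ID
wikitext = page["revisions"][0]["slots"]["main"]["*"]  # the raw wikitext string
print(wikitext[:200])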

Tokens, security, and the dreaded CSRF

Before you start blasting action=edit calls, you need a token. Think of the token as a “digital handshake” that proves you’re not a rogue script trying to hijack the wiki. The flow looks like this:

  1. Log in (or use a bot password).
  2. Request a csrf token via action=query&meta=tokens.
  3. Include that token in every edit request.

Here’s a quick PHP example that logs in with a bot password and grabs the token:


$api = "https://www.example.org/w/api.php";
$cookies = tempnam(sys_get_temp_dir(), "mwbot");

// Minimal helper: POST to the API and keep the session cookies between calls.
function api_post($api, $cookies, array $params) {
    $ch = curl_init($api);
    curl_setopt_array($ch, [
        CURLOPT_POSTFIELDS     => http_build_query($params),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEJAR      => $cookies,
        CURLOPT_COOKIEFILE     => $cookies,
    ]);
    $result = json_decode(curl_exec($ch), true);
    curl_close($ch);
    return $result;
}

// Step 1: fetch a login token, then log in with the bot password.
$loginToken = api_post($api, $cookies, ["action" => "query", "meta" => "tokens",
    "type" => "login", "format" => "json"])["query"]["tokens"]["logintoken"];
$login = api_post($api, $cookies, ["action" => "login", "lgname" => "MyBot",
    "lgpassword" => "BotPassword123", "lgtoken" => $loginToken, "format" => "json"]);

// Step 2: once logged in, request the CSRF token that edits require.
$token = $login["login"]["result"] === "Success"
    ? api_post($api, $cookies, ["action" => "query", "meta" => "tokens",
        "type" => "csrf", "format" => "json"])["query"]["tokens"]["csrftoken"]
    : null;

Okay, that snippet is still rough around the edges – there’s no error handling and the bot password is hard‑coded – but it shows the flow: log in once, grab the CSRF token, store it, and reuse it for every edit.

Batch actions: the power of generator and continue

Want to edit a whole category of pages? Use list=categorymembers (or a generator) to pull every page in the category, then loop over them with individual edit POSTs. Combine that with continue handling so you can walk past the per‑request limit of 500 results (5,000 for bots with apihighlimits).


def batch_edit(category, new_text):
    session = requests.Session()
    # Step 1: get a token
    token = session.get(
        URL,
        params={"action":"query","meta":"tokens","type":"csrf","format":"json"}
    ).json()["query"]["tokens"]["csrftoken"]
    
    # Step 2: iterate over pages in the category
    cont = {}
    while True:
        resp = session.get(URL, params={
            "action":"query",
            "list":"categorymembers",
            "cmtitle":f"Category:{category}",
            "cmlimit":"max",
            **cont,
            "format":"json"
        }).json()
        for page in resp["query"]["categorymembers"]:
            edit_resp = session.post(URL, data={
                "action":"edit",
                "title":page["title"],
                "text":new_text,
                "token":token,
                "format":"json"
            })
            print(f"Edited {page['title']}: {edit_resp.json()}")
        if "continue" not in resp:
            break
        cont = resp["continue"]

That function will walk through all members of a category, replace each page wholesale with new_text, and keep going until the API stops sending a continue block. The continue dance is essential – without it you’d silently process only the first batch of results.

Rate limits, polite bots, and the “User‑Agent” etiquette

MediaWiki installations often enforce a request‑per‑second ceiling. If you’re hammering a wiki at 1000 req/s you’ll hit a 429 Too Many Requests. The fix? Throttle your script, and set a recognisable User-Agent header. Something like:


headers = {
    "User-Agent": "MyWikiBot/2.0 (https://mydomain.org/bot-info; contact@mydomain.org)"
}
session.get(URL, params=params, headers=headers)

Most wikis follow something close to Wikimedia’s bot etiquette and rate‑limit guidelines. Adding a contact email isn’t just polite – it can save you from being blocked outright when something goes sideways.
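
If you want something a little more systematic, here’s a minimal sketch of a polite request helper – the one‑second delay and retry counts are arbitrary choices for illustration, not any wiki’s official policy:


import time
import requests

URL = "https://www.example.org/w/api.php"
session = requests.Session()
session.headers.update({
    "User-Agent": "MyWikiBot/2.0 (https://mydomain.org/bot-info; contact@mydomain.org)"
})

def polite_get(params, delay=1.0, max_retries=5):
    """GET with a fixed pause between calls and a backoff when the wiki says 429."""
    for attempt in range(max_retries):
        resp = session.get(URL, params=params)
        if resp.status_code != 429:
            time.sleep(delay)          # stay well under the request ceiling
            return resp.json()
        # Too many requests: honour Retry-After if present, otherwise back off exponentially
        time.sleep(int(resp.headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError("Still rate-limited after several retries")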

A quick Python script for “Create‑or‑Update”

One of the most common patterns is “if the page exists, edit it; otherwise, create it”. The API makes this painless because the same edit endpoint handles both cases; you just have to watch the basetimestamp and starttimestamp parameters. Here’s a compact script that does exactly that:


def upsert_page(title, content):
    sess = requests.Session()
    token = sess.get(
        URL,
        params={"action":"query","meta":"tokens","type":"csrf","format":"json"}
    ).json()["query"]["tokens"]["csrftoken"]
    
    # Grab the latest revision's timestamp, if the page exists
    rev_resp = sess.get(URL, params={
        "action":"query",
        "prop":"revisions",
        "rvprop":"timestamp",
        "titles":title,
        "format":"json"
    }).json()
    pages = rev_resp["query"]["pages"]
    pageid = next(iter(pages))
    revisions = pages[pageid].get("revisions")
    
    edit_data = {
        "action":"edit",
        "title":title,
        "text":content,
        "token":token,
        "format":"json"
    }
    if revisions:
        # Only when updating an existing page: basetimestamp is the timestamp of
        # the revision our edit is based on, so the API can spot conflicting edits.
        edit_data["basetimestamp"] = revisions[0]["timestamp"]
    resp = sess.post(URL, data=edit_data)
    print(resp.json())

Notice the tiny “if revisions” guard – it adds a basetimestamp only when you’re truly updating an existing page. That little nuance lets the API detect edit conflicts instead of silently overwriting other changes when multiple bots are working side‑by‑side.

Beyond the classic API: RESTBase and the new /v1 endpoints

MediaWiki core has shipped its own REST API under rest.php since 1.35 (the write endpoints arrived slightly later), and Wikimedia‑hosted wikis additionally expose the RESTBase‑backed /api/rest_v1/ endpoints. Instead of the old action=query style, you can use paths like /v1/page/{title} for reads and PUT /v1/page/{title} for writes. The biggest win? If you’ve set up OAuth 2.0 you can skip the cookie‑and‑token dance and just send an Authorization: Bearer … header (with cookie‑based auth the write endpoints still expect a CSRF token in the request body).

Example with curl to fetch a page’s HTML from the core REST API:


curl -H "Accept: application/json" \
    "https://www.example.org/api/rest_v1/page/html/Help:Contents"

And to update the content – a sketch assuming OAuth 2.0, with $OAUTH_TOKEN standing in for your bearer token and 12345 for the page’s current revision ID (the latest field is how the API detects conflicts):


curl -X PUT "https://www.example.org/w/rest.php/v1/page/Title" \
    -H "Authorization: Bearer $OAUTH_TOKEN" \
    -H "Content-Type: application/json" \
    --data '{"source": "New wikitext here", "comment": "Automated update", "latest": {"id": 12345}}'

Switching to the REST endpoints can simplify client code, especially if you already speak JSON APIs in other parts of your stack.
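
If the rest of your tooling is Python, the same read looks something like this – a sketch against the core REST API, where the source and latest fields are what current MediaWiki versions return for a page object:


import requests

REST = "https://www.example.org/w/rest.php/v1"
headers = {"User-Agent": "MyWikiBot/2.0 (https://mydomain.org/bot-info; contact@mydomain.org)"}

page = requests.get(f"{REST}/page/Help:Contents", headers=headers).json()
print(page["latest"]["id"])     # ID of the latest revision
print(page["source"][:200])     # first 200 characters of the wikitext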

Real‑world tips from the field

  • Don’t ignore edit conflicts. Even if you send basetimestamp, it’s wise to catch the editconflict error, refetch the page, and retry with the latest revision – see the sketch after this list.
  • Log every request. A simple CSV with timestamp, endpoint, response code, and any error message becomes invaluable when you need to audit bot activity.
  • Cache tokens. A CSRF token stays valid for the life of your session, so requesting a fresh one on every iteration just adds latency.
  • Test on a sandbox. The public https://test.wikipedia.org wiki (or https://test.wikidata.org for Wikidata work) is perfect for trying out bulk edits before you point at production.
  • Watch the maxlag parameter. Adding maxlag=5 tells the API to refuse your request whenever the replica databases are lagging more than five seconds behind the master. When that happens it returns a maxlag error (with a Retry-After header) – back off for a few seconds and try again.
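
Here’s a minimal sketch of what the maxlag and edit‑conflict advice might look like in code – the sleep times and retry count are arbitrary, and URL, session, and token are assumed to come from the earlier snippets:


import time

def safe_edit(session, title, text, token, retries=3):
    """Edit with maxlag set, backing off on replica lag and retrying on edit conflicts."""
    resp = {}
    for attempt in range(retries):
        resp = session.post(URL, data={
            "action": "edit",
            "title": title,
            "text": text,
            "token": token,
            "maxlag": "5",          # refuse the request if replicas lag more than 5 s
            "format": "json",
        }).json()
        code = resp.get("error", {}).get("code")
        if code == "maxlag":
            time.sleep(5 * (attempt + 1))   # replicas are behind: wait and retry
        elif code == "editconflict":
            # someone edited in between: refetch the page, rebuild `text` from the
            # new revision, then retry (the rebuild step is left out of this sketch)
            time.sleep(1)
        else:
            return resp
    return resp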

Putting it all together – a mini‑pipeline

Imagine you have a CSV of product IDs and descriptions that need to land on a wiki. The pipeline would look like this:

  1. Read the CSV with pandas (or even the plain csv module).
  2. For each row, construct the page title (e.g., Product:12345).
  3. Use the upsert_page function above to create or update the page.
  4. Log success or error to a separate file.
  5. After the batch, send a summary email to the content team.

All of that can be wrapped in a while True loop that checks the CSV for new rows every hour – a tiny “cron‑style” daemon that keeps your wiki in sync with the source database without any human fingers touching the edit box.
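
A minimal sketch of that loop, reusing the upsert_page function from earlier – the products.csv file name and its id / description columns are placeholders for whatever your export actually produces, and the logging and summary email are left out:


import csv
import time

def sync_products(csv_path="products.csv"):
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            title = f"Product:{row['id']}"
            try:
                upsert_page(title, row["description"])
                print(f"OK    {title}")
            except Exception as exc:           # log the failure and keep going
                print(f"ERROR {title}: {exc}")

if __name__ == "__main__":
    while True:                # the tiny "cron-style" daemon described above
        sync_products()
        time.sleep(3600)       # check the CSV again in an hour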

Final thoughts

Automating MediaWiki content isn’t just about learning a set of HTTP parameters; it’s about embracing the mindset of “treat the wiki as a data store”. When you think of pages as records, revisions as versioned rows, and the API as a CRUD interface, the whole process becomes as familiar as working with any other RESTful service.

Sure, there are quirks – token gymnastics, continuation loops, occasional 503s when the cluster is under load. But those are just the growing pains of a platform that powers everything from Wikipedia to corporate knowledge bases. With a dash of patience, a sprinkle of logging, and a good dose of respectful bot behaviour, you’ll find that the MediaWiki API can turn a mountain of manual edits into a quiet, humming workflow.
