Mastering MediaWiki API for Automated Content Management
Why automate with MediaWiki’s API?
Picture this: you’ve just finished a massive data dump from a legacy system, and you need to push thousands of rows into a wiki. Manually opening “Edit” for each page? That’s the kind of nightmare that keeps developers up at night, sipping cold coffee while the cursor blinks. The MediaWiki API, however, is like a backstage pass – it lets you skip the front‑row audience and get straight to the action.
Since the 1.35 LTS release, the action API has become more consistent, and the newer REST endpoints add a modern touch. So whether you’re a hobbyist bot‑author or a full‑scale content‑management team, mastering the API is the ticket to turning repetitive edits into a smooth, automated workflow.
Getting your hands dirty: the first request
All right, roll up your sleeves. The most basic thing you can do is a GET to the action=query module. That’ll fetch a page’s raw wikitext, its last revision ID, or even a list of pages that match a certain prefix.
import requests

URL = "https://www.example.org/w/api.php"
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Sandbox",
    "rvprop": "content",
    "rvslots": "main",   # avoids the deprecation warning on modern MediaWiki (1.32+)
    "format": "json"
}
r = requests.get(URL, params=params)
print(r.json()["query"]["pages"])
Never underestimate the power of that tiny snippet – you just pulled the entire content of “Sandbox” into a Python dict. From there the sky’s the limit.
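Remember the remark about listing pages that match a certain prefix? That’s the allpages module – a minimal sketch, reusing the same URL:
# List up to 50 main-namespace pages whose titles start with "Sandbox"
prefix_params = {
    "action": "query",
    "list": "allpages",
    "apprefix": "Sandbox",
    "apnamespace": 0,
    "aplimit": 50,
    "format": "json"
}
resp = requests.get(URL, params=prefix_params)
for page in resp.json()["query"]["allpages"]:
    print(page["title"])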
Tokens, security, and the dreaded CSRF
Before you start blasting action=edit calls, you need a token. Think of the token as a “digital handshake” that proves you’re not a rogue script trying to hijack the wiki. The flow looks like this:
- Log in (or use a bot password).
- Request a csrftoken via action=query&meta=tokens.
- Include that token in every edit request.
Here’s a quick PHP example that logs in with a bot password and grabs the token:
$api = "https://www.example.org/w/api.php";
$login = [
"action" => "login",
"lgname" => "MyBot",
"lgpassword" => "BotPassword123",
"format" => "json"
];
$ch = curl_init($api);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($login));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$loginResponse = json_decode(curl_exec($ch), true);
$token = $loginResponse['login']['result'] === 'Success'
? $loginResponse['login']['token']
: null;
Okay, that snippet is still a bit rough – newer MediaWiki versions will answer with NeedToken until you also send an lgtoken (fetched via meta=tokens&type=login) along with the login POST. Yet it showcases the idea: get the token, store it, use it.
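If you’d rather stay in Python, the full flow with requests looks roughly like this – a sketch that reuses the URL constant from the first snippet and the same placeholder credentials:
def get_csrf_token(username, password):
    sess = requests.Session()
    # Step 1: fetch a login token (action=login insists on one these days)
    login_token = sess.get(URL, params={
        "action": "query", "meta": "tokens", "type": "login", "format": "json"
    }).json()["query"]["tokens"]["logintoken"]
    # Step 2: log in with a bot password created at Special:BotPasswords
    login = sess.post(URL, data={
        "action": "login",
        "lgname": username,
        "lgpassword": password,
        "lgtoken": login_token,
        "format": "json"
    }).json()
    if login["login"]["result"] != "Success":
        raise RuntimeError(f"Login failed: {login['login']}")
    # Step 3: the session now carries the login cookies, so ask for the CSRF token
    csrf = sess.get(URL, params={
        "action": "query", "meta": "tokens", "type": "csrf", "format": "json"
    }).json()["query"]["tokens"]["csrftoken"]
    return sess, csrf

session, token = get_csrf_token("MyBot", "BotPassword123")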
Batch actions: the power of generator and continue
Want to edit a whole category of pages? Use list=categorymembers (or a generator) to pull every page in the category and then loop over them with edit POSTs. Combine it with continue so you can walk past the 500‑results‑per‑request limit.
def batch_edit(category, new_text):
    session = requests.Session()
    # Step 1: get a CSRF token (assumes the session is already logged in – see above)
    token = session.get(
        URL,
        params={"action": "query", "meta": "tokens", "type": "csrf", "format": "json"}
    ).json()["query"]["tokens"]["csrftoken"]
    # Step 2: iterate over pages in the category
    cont = {}
    while True:
        resp = session.get(URL, params={
            "action": "query",
            "list": "categorymembers",
            "cmtitle": f"Category:{category}",
            "cmlimit": "max",
            **cont,
            "format": "json"
        }).json()
        for page in resp["query"]["categorymembers"]:
            edit_resp = session.post(URL, data={
                "action": "edit",
                "title": page["title"],
                "text": new_text,
                "token": token,
                "format": "json"
            })
            print(f"Edited {page['title']}: {edit_resp.json()}")
        if "continue" not in resp:
            break
        cont = resp["continue"]
That function will walk through all members of a category, replace the whole page with new_text, and keep going until the API says “that’s it”. The continue dance is essential – without it you’d only ever process the first batch of results.
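Calling it is then a one‑liner – here with a hypothetical category name:
# Hypothetical category – swap in one that actually exists on your wiki.
batch_edit("Pages to archive", "{{archived}}")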
Rate limits, polite bots, and the “User‑Agent” etiquette
MediaWiki installations often enforce a request‑per‑second ceiling. If you’re hammering a wiki at 1000 req/s you’ll hit a 429 Too Many Requests. The fix? Throttle your script, and set a recognisable User-Agent header. Something like:
headers = {
    "User-Agent": "MyWikiBot/2.0 (https://mydomain.org/bot-info; contact@mydomain.org)"
}
session.get(URL, params=params, headers=headers)
Most wikis respect Wikimedia’s rate‑limit policy. Adding a contact email isn’t just polite – it can save you from being blocked when something goes sideways.
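What does “throttle” mean in practice? A rough sketch that reuses the headers dict above – the one‑second pause and the five‑second fallback are arbitrary choices, not official limits:
import time

def polite_get(session, url, **kwargs):
    """GET with a crude throttle and a retry on 429 responses."""
    while True:
        resp = session.get(url, headers=headers, **kwargs)
        if resp.status_code != 429:
            time.sleep(1)   # pace ourselves at roughly one request per second
            return resp
        # Honour Retry-After if the server sent it, otherwise wait five seconds
        time.sleep(int(resp.headers.get("Retry-After", 5)))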
A quick Python script for “Create‑or‑Update”
One of the most common patterns is “if the page exists, edit it; otherwise, create it”. The API makes this painless because the same edit endpoint works for both; you just have to watch the basetimestamp and starttimestamp parameters. Here’s a compact script that does exactly that:
def upsert_page(title, content):
    sess = requests.Session()
    token = sess.get(
        URL,
        params={"action": "query", "meta": "tokens", "type": "csrf", "format": "json"}
    ).json()["query"]["tokens"]["csrftoken"]
    # Grab the latest revision's timestamp if the page exists
    rev_resp = sess.get(URL, params={
        "action": "query",
        "prop": "revisions",
        "rvprop": "timestamp",
        "titles": title,
        "format": "json"
    }).json()
    pages = rev_resp["query"]["pages"]
    page = pages[next(iter(pages))]
    cur_rev = page.get("revisions")      # absent when the page doesn't exist yet
    edit_data = {
        "action": "edit",
        "title": title,
        "text": content,
        "token": token,
        "format": "json"
    }
    if cur_rev:
        # Only when updating: the base revision's timestamp lets the API spot conflicts
        edit_data["basetimestamp"] = cur_rev[0]["timestamp"]
    resp = sess.post(URL, data=edit_data)
    result = resp.json()
    print(result)
    return result
Notice the tiny “if cur_rev” guard – it adds a basetimestamp only when you’re truly updating. That little nuance lets the API detect edit conflicts when multiple bots are working side‑by‑side.
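A quick usage check goes a long way: action=edit reports “Success” inside the edit key of its response, so you can tell at a glance whether to log the page or retry it.
result = upsert_page("Sandbox", "== Hello ==\nUpdated by MyWikiBot.")
if result.get("edit", {}).get("result") != "Success":
    print("Something went wrong:", result)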
Beyond the classic API: RESTBase and the new /v1 endpoints
MediaWiki 1.35 introduced a built‑in REST API (served from rest.php), and Wikimedia’s own wikis additionally expose the RESTBase service under /api/rest_v1/. Instead of the old action=query style, you can use endpoints like /v1/page/{title} for reads and a PUT to the same path for writes. The biggest win? No separate token query – if you’ve set up OAuth 2.0, you just include an Authorization: Bearer … header.
Example with curl to fetch a page’s HTML:
curl -H "Accept: application/json" \
"https://www.example.org/api/rest_v1/page/html/Help:Contents"
And to update a page through the core REST API (authentication is required – here sketched with an OAuth 2.0 bearer token and a placeholder revision ID in latest so the server can detect conflicts):
curl -X PUT \
     -H "Authorization: Bearer $OAUTH_TOKEN" \
     -H "Content-Type: application/json" \
     --data '{"source": "New wikitext here", "comment": "Automated update", "latest": {"id": 12345}}' \
     "https://www.example.org/w/rest.php/v1/page/Title"
Switching to the REST endpoints can simplify client code, especially if you already speak JSON APIs in other parts of your stack.
Real‑world tips from the field
- Don’t ignore edit conflicts. Even if you use basetimestamp, it’s wise to catch the editconflict error and retry with the latest revision (a rough retry sketch follows this list).
- Log every request. A simple CSV with timestamp, endpoint, response code, and any error message becomes invaluable when you need to audit bot activity.
- Cache tokens. Tokens are valid for a while (usually a few hours). Requesting a new token on every iteration just adds latency.
- Test on a sandbox. The official https://test.wikidata.org instance is perfect for trying out bulk edits before you point at production.
- Watch the maxlag parameter. Adding maxlag=5 tells the API to refuse the request whenever the database replicas are lagging more than five seconds behind the master. If that happens, the API returns a maxlag error – wait a few seconds and retry.
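Here’s a rough sketch of the first and last tips in code – an edit wrapper that sends maxlag and retries a few times (the retry count and the five‑second pause are arbitrary choices):
import time

def safe_edit(session, edit_data, retries=3):
    # edit_data: the usual action=edit parameters, token included
    for _ in range(retries):
        resp = session.post(URL, data={**edit_data, "maxlag": 5}).json()
        code = resp.get("error", {}).get("code")
        if code == "maxlag":
            time.sleep(5)      # replicas are lagging – back off, then retry
            continue
        if code == "editconflict":
            # Someone edited in between. Here we simply drop basetimestamp and
            # overwrite on the next attempt; a careful bot would re-read and merge.
            edit_data.pop("basetimestamp", None)
            continue
        return resp
    raise RuntimeError(f"Giving up after {retries} attempts: {resp}")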
Putting it all together – a mini‑pipeline
Imagine you have a CSV of product IDs and descriptions that need to land on a wiki. The pipeline would look like this:
- Read the CSV with pandas (or even the plain csv module).
- For each row, construct the page title (e.g., Product:12345).
- Use the upsert_page function above to create or update the page.
- Log success or error to a separate file.
- After the batch, send a summary email to the content team.
All of that can be wrapped in a while True loop that checks the CSV for new rows every hour – a tiny “cron‑style” daemon that keeps your wiki in sync with the source database without any human fingers touching the edit box.
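A minimal sketch of that pipeline, assuming a hypothetical products.csv with id and description columns and reusing the upsert_page function from earlier:
import csv
import time

def sync_products(csv_path="products.csv"):
    # One row per product; "id" and "description" are hypothetical column names.
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open("sync_log.csv", "a", encoding="utf-8") as log:
        for row in csv.DictReader(src):
            title = f"Product:{row['id']}"
            result = upsert_page(title, row["description"])
            status = result.get("edit", {}).get("result", "error")
            log.write(f"{time.strftime('%Y-%m-%dT%H:%M:%S')},{title},{status}\n")

while True:             # the tiny "cron-style" daemon described above
    sync_products()
    time.sleep(3600)    # look for new rows once an hour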
Final thoughts
Automating MediaWiki content isn’t just about learning a set of HTTP parameters; it’s about embracing the mindset of “treat the wiki as a data store”. When you think of pages as records, revisions as versioned rows, and the API as a CRUD interface, the whole process becomes as familiar as working with any other RESTful service.
Sure, there are quirks – token gymnastics, continuation loops, occasional 503s when the cluster is under load. But those are just the growing pains of a platform that powers everything from Wikipedia to corporate knowledge bases. With a dash of patience, a sprinkle of logging, and a good dose of respectful bot behaviour, you’ll find that the MediaWiki API can turn a mountain of manual edits into a quiet, humming workflow.