Mastering MediaWiki's API: Building Custom Applications

Why the MediaWiki Action API matters

Picture this: you’re strolling through a knowledge base, looking for a specific paragraph, and you wish you could pull that snippet straight into your own dashboard. MediaWiki’s Action API hands you that power, all without scraping HTML. It’s the glue between the wiki engine and any external system that needs to read, write, or even automate routine chores.

Core principles you should keep in mind

First off, the API lives at https://yourwiki.org/w/api.php. Every request is HTTP‑based, and you can ask for JSON, XML, or even PHP‑serialized output by setting &format=json (or xml, php) – JSON is the recommended choice on modern wikis. The action parameter decides what you’re after – most folks start with query, but edit and login are just as common.

  • GET vs POST – Retrieval (search, fetch page) is safe to do via GET; write‑like operations demand POST.
  • Tokens – Any write call needs a CSRF token, fetched with action=query&meta=tokens&type=csrf.
  • Namespaces – MediaWiki organizes content into numbered namespaces; remember 0 is the main (article) namespace, 1 is Talk, and 6 is File.
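
To make the shape of a request concrete, here’s a minimal sketch in Python with the requests library – it just asks English Wikipedia’s endpoint for general site info, exercising the action/format plumbing described above:

import requests

# Every Action API call is an HTTP request against api.php
url = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",    # what we want to do
    "meta": "siteinfo",   # module: general information about the wiki
    "format": "json",     # ask for JSON output
}
resp = requests.get(url, params=params)
print(resp.json()["query"]["general"]["sitename"])  # e.g. "Wikipedia"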

Getting content – a quick example

Suppose you want the wikitext of “Main Page”. A minimal request looks like this:

curl "https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main_Page&rvprop=content&format=json"

The JSON payload returns a pages object keyed by page ID. Grab the "*" field under revisions – that’s the raw wikitext.

Parsing the response in Python

import requests

url = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Main_Page",
    "rvprop": "content",
    "format": "json"
}
resp = requests.get(url, params=params)
data = resp.json()
# The response keys pages by page ID, so grab the first (only) entry
page = next(iter(data["query"]["pages"].values()))
# Legacy output exposes the raw wikitext under "*"; newer MediaWiki prefers
# rvslots=main, which nests it under revisions[0]["slots"]["main"]
wikitext = page["revisions"][0]["*"]
print(wikitext[:200])  # show first 200 chars

That snippet is about as simple as it gets – you’re already pulling live wiki data into a script.

Editing pages programmatically

Now, let’s get messy. Editing a page via the API is a two‑step dance: fetch a token, then POST the changes. Here’s the token fetch in PHP:

$api = "https://yourwiki.org/w/api.php";
// Tokens are bound to your session – send your login cookies with this
// request, otherwise you get back the anonymous token ("+\").
$tokenResp = file_get_contents($api . "?action=query&meta=tokens&type=csrf&format=json");
$token = json_decode($tokenResp, true)['query']['tokens']['csrftoken'];

Once you have $token, you can send a POST request. Notice the title, the section parameter (0 targets just the lead section; leave it out to replace the whole page, or pass new to append a section), and the text you’re committing.

$postData = http_build_query([
    'action'   => 'edit',
    'title'    => 'User:DemoBot/TestPage',
    'section'  => '0',
    'text'     => "This is a test edit made at " . date('c'),
    'token'    => $token,
    'format'   => 'json'
]);

$options = [
    'http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => $postData,
    ]
];
$context = stream_context_create($options);
$result = file_get_contents($api, false, $context);
echo $result;

Watch for the "result":"Success" flag – if you see "badtoken", the token no longer matches your session. Tokens are bound to the login session rather than to a timer, so fetch a fresh one over the same cookies and retry.
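
If you’d rather stay in Python, the same two‑step dance is a few lines with requests.Session, which carries cookies between calls for you. This is a sketch that assumes the session is already logged in (see the next section); the page title is just a placeholder:

import requests

api = "https://yourwiki.org/w/api.php"
session = requests.Session()  # keeps session cookies across calls

# Step 1: fetch a CSRF token (bound to the session)
token = session.get(api, params={
    "action": "query", "meta": "tokens", "type": "csrf", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# Step 2: POST the edit along with the token
result = session.post(api, data={
    "action": "edit",
    "title": "User:DemoBot/TestPage",   # placeholder page
    "text": "This is a test edit made via the Action API",
    "token": token,
    "format": "json",
}).json()
print(result.get("edit", {}).get("result"))  # "Success" if all went well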

Authentication – login and OAuth

MediaWiki offers a few approaches: cookie‑based login (ideally with a bot password created at Special:BotPasswords), action=clientlogin for interactive apps, and OAuth. For quick scripts, cookie login works fine – just note that since MediaWiki 1.27 it takes two calls, because action=login wants a login token first:

curl -c cookies.txt \
     "https://yourwiki.org/w/api.php?action=query&meta=tokens&type=login&format=json"

curl -b cookies.txt -c cookies.txt \
     --data-urlencode "lgtoken=TOKEN_FROM_STEP_ONE" \
     -d "action=login&lgname=MyBot&lgpassword=SecretPass&format=json" \
     "https://yourwiki.org/w/api.php"

Now every subsequent request can reuse cookies.txt via -b cookies.txt. For production‑grade bots, you’ll want OAuth – the API hands you a request token that you exchange for an access token, then you sign each request. The docs on API:OAuth walk through the handshake.
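
For Python scripts, the same cookie login looks like this with requests.Session – a sketch where MyBot and SecretPass stand in for a bot password you’d create at Special:BotPasswords:

import requests

api = "https://yourwiki.org/w/api.php"
session = requests.Session()

# Step 1: fetch a login token (this also sets the session cookie)
login_token = session.get(api, params={
    "action": "query", "meta": "tokens", "type": "login", "format": "json",
}).json()["query"]["tokens"]["logintoken"]

# Step 2: POST the credentials together with the token
login = session.post(api, data={
    "action": "login",
    "lgname": "MyBot",           # placeholder bot username
    "lgpassword": "SecretPass",  # placeholder bot password
    "lgtoken": login_token,
    "format": "json",
}).json()
print(login["login"]["result"])  # "Success" once the login sticks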

Handling pagination and limits

When you query large lists – say, “all pages in the main namespace” – the API caps results (default 10, max 500 for regular users, 5000 for accounts with the apihighlimits right, which most bots have). Use continue to loop:

params = {
    "action": "query",
    "list": "allpages",
    "apnamespace": "0",
    "aplimit": "max",
    "format": "json"
}
while True:
    resp = requests.get(url, params=params).json()
    for page in resp["query"]["allpages"]:
        print(page["title"])
    if "continue" not in resp:
        break
    params.update(resp["continue"])

This pattern works for categorymembers, search, and others. Just remember to merge the continue fields back into your request parameters.

Integrating with external data sources

Say you maintain a separate product catalog in MySQL and you want each product page on the wiki to reflect the latest price. A typical workflow:

  1. Pull the product list from MySQL (via PDO or an ORM).
  2. For each product, query the wiki to see if a page exists (use action=query&titles=Product:ID).
  3. If missing, create it with action=edit using the CSRF token.
  4. If present, compare the price parameter embedded in the page’s {{Infobox}} (or wherever your template keeps it) and update only when it differs.

All of that can live inside a cron job, running nightly. The nice thing is that you can reuse the same CSRF token for the whole batch of edits – just stay inside your wiki’s rate limits, which vary per wiki and user group (action=query&meta=userinfo&uiprop=ratelimits shows yours).
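
Here’s roughly what that nightly job could look like in Python. Treat it as a sketch: the products table, the Product: title scheme, and the {{Infobox product}} layout are all assumptions standing in for your own schema:

import requests
import pymysql  # any MySQL driver will do; this one is an assumption

api = "https://yourwiki.org/w/api.php"
session = requests.Session()  # assumed to be logged in already

# One CSRF token covers the whole batch of edits
csrf = session.get(api, params={
    "action": "query", "meta": "tokens", "type": "csrf", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

db = pymysql.connect(host="localhost", user="catalog",
                     password="change-me", database="shop")
with db.cursor() as cur:
    cur.execute("SELECT id, name, price FROM products")  # hypothetical table
    for product_id, name, price in cur.fetchall():
        title = f"Product:{product_id}"
        # Missing pages come back flagged with "missing"
        page = next(iter(session.get(api, params={
            "action": "query", "titles": title, "format": "json",
        }).json()["query"]["pages"].values()))
        text = f"{{{{Infobox product|name={name}|price={price}}}}}"
        if "missing" in page:
            # Create the page from scratch
            session.post(api, data={"action": "edit", "title": title,
                                    "text": text, "token": csrf,
                                    "format": "json"})
        # else: fetch the wikitext, compare the price parameter,
        # and only edit when it actually changed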

Dealing with errors and rate‑limit quirks

When something goes sideways, MediaWiki returns an "error" object. Typical culprits:

  • maxlag – replica lag exceeded the maxlag= threshold you sent with the request; the error reports the current lag, and the standard advice is to back off for a few seconds.
  • badtoken – token expired; fetch a fresh one and retry.
  • blocked – the IP or user account is blocked; you’ll need to talk to the wiki admins.

A robust script catches these, sleeps a random short interval (to avoid hammering the server), and then tries again. Here’s a tiny Python wrapper:

import time, random, requests

def api_call(params, data=None):
    # Note: the maxlag branch only fires if you send maxlag=5 (or similar)
    # with your requests, opting in to the server's lag protection.
    while True:
        r = requests.post(url, params=params, data=data) if data else requests.get(url, params=params)
        j = r.json()
        if 'error' not in j:
            return j
        err = j['error']['code']
        if err == 'maxlag':
            # The error reports the current replica lag in seconds
            wait = float(j['error']['lag']) + random.uniform(0.5, 1.5)
            time.sleep(wait)
        elif err == 'badtoken':
            # Refresh token logic would go here
            raise RuntimeError('Token invalid – need new token')
        else:
            raise RuntimeError(f"API error: {err}")

The loop feels a bit clunky, but it mirrors how humans troubleshoot – you keep trying, pause, then retry.

Building a tiny client library

If you’re planning multiple projects, wrap the basics into a class. Below is a stripped‑down PHP example that handles token caching and simple GET/POST calls:

class WikiClient {
    private $api;
    private $token = null;
    private $cookieJar = [];

    public function __construct($apiUrl) {
        $this->api = $apiUrl;
    }

    private function request($params, $post = false) {
        $headers = "Cookie: " . implode('; ', $this->cookieJar) . "\r\n";
        if ($post) {
            // POST bodies need the form-encoded content type
            $headers .= "Content-Type: application/x-www-form-urlencoded\r\n";
        }
        $opts = [
            'http' => [
                'method'  => $post ? 'POST' : 'GET',
                'header'  => $headers,
                'content' => $post ? http_build_query($params) : '',
            ]
        ];
        $ctx = stream_context_create($opts);
        $url = $this->api . ($post ? '' : '?' . http_build_query($params));
        $response = file_get_contents($url, false, $ctx);
        // Parse Set-Cookie headers for future calls
        // ... (omitted for brevity)
        return json_decode($response, true);
    }

    public function getToken() {
        if ($this->token) return $this->token;
        $res = $this->request(['action'=>'query','meta'=>'tokens','type'=>'csrf','format'=>'json']);
        $this->token = $res['query']['tokens']['csrftoken'];
        return $this->token;
    }

    public function editPage($title, $text) {
        $token = $this->getToken();
        $params = [
            'action' => 'edit',
            'title'  => $title,
            'text'   => $text,
            'token'  => $token,
            'format' => 'json'
        ];
        return $this->request($params, true);
    }
}

Now any script can just do $client->editPage('User:DemoBot/Test', $newText); and be done with it.

Real‑world use cases you might encounter

  • Search widgets – Front‑end components that query action=opensearch for autocomplete suggestions.
  • Content sync – Companies mirror internal policy docs onto a wiki, using the API to keep both sides in lockstep.
  • Analytics dashboards – Pull edit statistics (list=recentchanges) and feed them into Grafana or PowerBI.
  • Chatbots – Slack bots that fetch article extracts on demand, using prop=extracts with explaintext.
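
That last pattern is easy to try yourself – a minimal sketch of the call a chatbot would make, using the TextExtracts parameters mentioned above:

import requests

url = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "prop": "extracts",     # provided by the TextExtracts extension
    "exintro": "1",         # only the lead section
    "explaintext": "1",     # strip the HTML
    "titles": "Python (programming language)",
    "format": "json",
}
page = next(iter(requests.get(url, params=params)
                 .json()["query"]["pages"].values()))
print(page["extract"][:200])  # first 200 characters of plain text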

Tips for staying sane while hacking the API

– Keep an eye on the official documentation (API:Main page on mediawiki.org). It’s surprisingly thorough for newcomers.
– Test with format=json and a pretty‑print tool (e.g., jq) – raw JSON is easier to sniff than a wall of XML.
– When you hit a wall, try the interactive API sandbox (Special:ApiSandbox) on MediaWiki.org; it builds the request for you.
– Document your token lifecycle – a forgotten expiry can cause mysterious “badtoken” errors that feel like the API is haunted.

Wrapping things up

Mastering MediaWiki’s API isn’t about memorizing every module; it’s about internalizing the pattern: ask for what you need, respect the token dance, handle pagination, and be polite to the server. Once you’ve got those basics, building custom applications – from simple search widgets to full‑blown edit bots – becomes a matter of stitching together the right calls.

So, whether you’re pulling data for a research portal or syncing product info from a legacy database, the Action API is the bridge. Treat it like any other web service: test, cache, retry, and you’ll find the wiki bending to your will rather than the other way around.
