Mastering MediaWiki API for Custom Applications

Why the MediaWiki API Matters for Your Project

Ever stared at a wiki page and thought, “There’s got to be a better way to pull this data into my own app?” You’re not alone. The MediaWiki Action API is the hidden doorway that lets you treat a wiki like any other data source – whether you need to fetch article summaries, push new content, or sync user information. Mastering it means you can build tools that speak directly to Wikipedia, Wikidata, or your own private instance without scraping the UI.

Getting a Grip on the Basics

First off, the API typically lives at https://yourwiki.org/w/api.php; the exact path depends on how the wiki is installed. It understands both GET and POST, but most read‑only queries are fine with a simple URL. The response format can be json, xml, or even php, though JSON is the de facto choice for modern apps.

Typical query skeleton:

https://yourwiki.org/w/api.php?action=query&list=search&srsearch=MediaWiki&format=json

That one line will return a JSON object with a list of pages matching “MediaWiki”. Not magic, just a well‑documented endpoint.
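
As a rough sketch, the same URL can be assembled with Python's standard library instead of being typed by hand, which takes care of escaping spaces and special characters. The helper name build_api_url is an assumption made for illustration; it is not part of MediaWiki.

```python
from urllib.parse import urlencode


def build_api_url(endpoint, **params):
    """Return the endpoint plus URL-encoded query parameters.

    The format parameter defaults to json when the caller leaves it out.
    """
    params.setdefault("format", "json")
    return endpoint + "?" + urlencode(params)


url = build_api_url(
    "https://yourwiki.org/w/api.php",
    action="query",
    list="search",
    srsearch="Open source",
)
print(url)
```

Every keyword argument becomes one query parameter, so switching modules is a matter of changing the arguments rather than editing a string.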

Key Parameters at a Glance

  • action – what you want the API to do (query, parse, edit, ...).
  • format – json, xml, or php; json is the recommended choice (older releases also offered yaml, which has since been removed).
  • list / prop / meta – the “module” you’re tapping into.
  • continue – pagination token for large result sets.
  • token – required for write operations; see the token dance later.

Authentication – The Token Tango

If you’re only reading public pages, you can skip auth. But any edit, delete, or user‑centric action demands a login session and a CSRF token. The sequence is a little clunky, which is why many developers write a tiny wrapper to hide the steps.

Step 1: Get a Login Token

import requests

S = requests.Session()
login_token_r = S.get(
    'https://yourwiki.org/w/api.php',
    params={'action':'query','meta':'tokens','type':'login','format':'json'}
)
login_token = login_token_r.json()['query']['tokens']['logintoken']

Step 2: Log In

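# POST the credentials together with the login token.
# On current MediaWiki versions action=login is meant for bot passwords
# created at Special:BotPasswords (user name of the form "User@BotName").
login_r = S.post(
    'https://yourwiki.org/w/api.php',
    data={
        'action':'login',
        'lgname':'YourBotUser',
        'lgpassword':'SecretPass',
        'lgtoken':login_token,
        'format':'json'
    }
)
print(login_r.json()['login']['result'])  # "Success" if the login worked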

Step 3: Grab a CSRF Token

Once you’ve got a cookie for the session, fetch the edit token:

csrf_r = S.get(
    'https://yourwiki.org/w/api.php',
    params={'action':'query','meta':'tokens','format':'json'}
)
csrf_token = csrf_r.json()['query']['tokens']['csrftoken']

Now you’re ready to edit, move, or delete pages – just remember to include token=csrf_token in the POST body. Rollback is the exception: it needs its own token, requested with type=rollback.
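
The three steps can be folded into the kind of tiny wrapper mentioned earlier. The following is a minimal sketch, not a finished client: the function name login_and_get_csrf is hypothetical, and the session argument is assumed to be a requests.Session or any object with compatible get() and post() methods.

```python
def login_and_get_csrf(session, endpoint, username, password):
    """Run the login-token, login, CSRF-token sequence and return the CSRF token.

    `session` must keep cookies between calls (requests.Session does).
    """
    # Step 1: ask for a login token.
    r = session.get(endpoint, params={
        "action": "query", "meta": "tokens", "type": "login", "format": "json"})
    login_token = r.json()["query"]["tokens"]["logintoken"]

    # Step 2: log in with the credentials and that token.
    r = session.post(endpoint, data={
        "action": "login", "lgname": username, "lgpassword": password,
        "lgtoken": login_token, "format": "json"})
    result = r.json()["login"]["result"]
    if result != "Success":
        raise RuntimeError(f"login failed: {result}")

    # Step 3: fetch the CSRF token for the now-authenticated session.
    r = session.get(endpoint, params={
        "action": "query", "meta": "tokens", "format": "json"})
    return r.json()["query"]["tokens"]["csrftoken"]
```

With the requests library installed, a call would look like login_and_get_csrf(requests.Session(), 'https://yourwiki.org/w/api.php', 'YourBotUser', 'SecretPass'). Passing the session in, rather than creating it inside the function, keeps the cookies available for the write calls that follow.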

Common Read Operations – From Search to Full Text

Let’s walk through a few “real‑world” scenarios you might encounter.

1. Full‑Text Search

To search page text for a phrase:

https://yourwiki.org/w/api.php?action=query&list=search&srsearch=Open%20source&utf8=&format=json

The search list returns title, snippet, and pageid. Handy for autocomplete widgets.
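
For an autocomplete widget you usually want plain text rather than the highlighted HTML that comes back in each snippet. Here is a small sketch of that clean-up step; the function name and the sample response are made up for illustration, with the sample shaped like the fields described above.

```python
import html
import re


def simplify_search_results(response):
    """Turn a list=search response into (pageid, title, plain snippet) tuples."""
    rows = []
    for hit in response.get("query", {}).get("search", []):
        # Snippets contain <span class="searchmatch"> markup and HTML entities.
        without_tags = re.sub(r"<[^>]+>", "", hit.get("snippet", ""))
        rows.append((hit["pageid"], hit["title"], html.unescape(without_tags)))
    return rows


# Hand-written sample, not real API output.
sample = {"query": {"search": [{
    "ns": 0,
    "title": "Open source",
    "pageid": 42,
    "snippet": 'An <span class="searchmatch">open</span> model &amp; licence',
}]}}
print(simplify_search_results(sample))
```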

2. Page Content (wikitext) Extraction

If you need the raw markup:

https://yourwiki.org/w/api.php?action=parse&page=Main_Page&prop=wikitext&format=json
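
Once the response is decoded, the markup sits a couple of levels down. The sketch below assumes the default JSON layout, where the text is wrapped in a "*" key, and also accepts the bare string returned when formatversion=2 is requested; the function name is hypothetical.

```python
def wikitext_from_parse(response):
    """Return the wikitext string from an action=parse&prop=wikitext response."""
    if "error" in response:
        # The API reports problems (for example a missing page) in an error object.
        raise ValueError(response["error"].get("info", "API error"))
    wikitext = response["parse"]["wikitext"]
    # Default layout: {"*": "..."}; formatversion=2: a plain string.
    return wikitext["*"] if isinstance(wikitext, dict) else wikitext


# Hand-written samples in both layouts.
legacy = {"parse": {"title": "Main Page", "wikitext": {"*": "'''Welcome'''"}}}
modern = {"parse": {"title": "Main Page", "wikitext": "'''Welcome'''"}}
print(wikitext_from_parse(legacy) == wikitext_from_parse(modern))
```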

3. Structured Data via prop=parsetree

Parsing the wikitext into a tree gives you programmatic access to headings, templates, and template arguments. Request it with action=parse&page=Main_Page&prop=parsetree&format=json. The tree comes back as an XML string embedded in the JSON response – not the prettiest, but useful when you need to rewrite templates on the fly.
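
To give a feel for working with the parse tree, here is a hedged sketch that lists template names and headings from the tree's XML form using the standard library. The sample string is hand-written and assumes the usual element names (root, template, title, part, h); check the output of your own wiki before relying on them.

```python
import xml.etree.ElementTree as ET


def summarize_parsetree(xml_text):
    """Return (template names, heading texts) found in a parsetree XML string."""
    root = ET.fromstring(xml_text)
    templates = [
        t.findtext("title", default="").strip() for t in root.iter("template")
    ]
    # Heading nodes keep their "==" markers, so strip them off.
    headings = ["".join(h.itertext()).strip("= \n") for h in root.iter("h")]
    return templates, headings


# Hand-written sample, assumed to mirror the structure of real output.
sample = (
    "<root><template><title>Infobox</title>"
    "<part><name>name</name><equals>=</equals><value>Demo</value></part>"
    '</template>\n<h level="2" i="1">== History ==</h>\nSome text.</root>'
)
print(summarize_parsetree(sample))
```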

Write Operations – Editing Without the UI

Editing is the part that scares most newbies. The API requires three things: a CSRF token, the page identifier (title or pageid), and the new content. Here’s a concise example that adds a line to an existing article.

<?php
require 'vendor/autoload.php'; // loads Guzzle, installed via Composer
$endpoint = 'https://yourwiki.org/w/api.php';
$client = new \GuzzleHttp\Client(['cookies' => true]);

// 1) login token
$res = $client->get($endpoint, ['query'=>['action'=>'query','meta'=>'tokens','type'=>'login','format'=>'json']]);
$loginToken = json_decode($res->getBody(), true)['query']['tokens']['logintoken'];

// 2) login
$client->post($endpoint, ['form_params'=>[
    'action'=>'login','lgname'=>'BotUser','lgpassword'=>'Secret','lgtoken'=>$loginToken,'format'=>'json'
]]);

// 3) CSRF token
$res = $client->get($endpoint, ['query'=>['action'=>'query','meta'=>'tokens','format'=>'json']]);
$csrf = json_decode($res->getBody(), true)['query']['tokens']['csrftoken'];

// 4) fetch current content
$res = $client->get($endpoint, ['query'=>['action'=>'query','prop'=>'revisions','rvprop'=>'content','titles'=>'Demo_Page','format'=>'json']]);
$page = json_decode($res->getBody(), true)['query']['pages'];
$pageId = key($page);
$current = $page[$pageId]['revisions'][0]['*'];

// 5) edit
$new = $current."\n== New Section ==\nAdded via API.";
$client->post($endpoint, ['form_params'=>[
    'action'=>'edit','title'=>'Demo_Page','text'=>$new,
    'summary'=>'Add new section via API','token'=>$csrf,'format'=>'json'
]]);
?>

Notice the extra step to pull the existing content first – text= replaces the whole page, so you don’t want to clobber what’s already there unless you really mean to. The API does not insist on an edit summary, but always include a summary= parameter so other editors can see what your bot changed and why.
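
The same edit request can be prepared and checked in Python. This is a sketch under stated assumptions: both helper names are invented for illustration, the summary check is this helper's own policy rather than an API rule, and the response handling assumes the usual shape of a successful edit (an edit object with result set to Success) or a top-level error object.

```python
def build_edit_params(title, text, token, summary):
    """Assemble the POST body for action=edit, refusing an empty summary."""
    if not summary.strip():
        raise ValueError("this helper requires an edit summary")
    # The token is placed after the text, as the API documentation advises.
    return {
        "action": "edit",
        "title": title,
        "text": text,
        "summary": summary,
        "token": token,
        "format": "json",
    }


def edit_succeeded(response):
    """Return True on success; raise RuntimeError when the API reports an error."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"{err.get('code')}: {err.get('info')}")
    return response.get("edit", {}).get("result") == "Success"
```

The dictionary from build_edit_params can be passed as the data argument of a POST on an authenticated session, and the decoded JSON reply handed to edit_succeeded.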

Handling Pagination and Continuation

Large wikis can return thousands of results; the API won’t dump everything in one go. Instead, it uses a continue token. The pattern looks like:

import requests

S = requests.Session()
cont = {}  # "continue" is a reserved word in Python, so use another name
while True:
    params = {
        'action':'query',
        'list':'categorymembers',
        'cmtitle':'Category:Physics',
        'cmlimit':'500',
        'format':'json',
        **cont  # empty on the first request, then the server's continue values
    }
    resp = S.get('https://yourwiki.org/w/api.php', params=params).json()
    for page in resp['query']['categorymembers']:
        print(page['title'])
    if 'continue' not in resp:
        break
    cont = resp['continue']

This loop pulls every member of a category, one chunk at a time, until the server stops sending a continue field.
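
Because the continuation pattern is the same for every list module, it can be factored into a reusable generator. The sketch below is one way to do that; iterate_query and its fetch argument are hypothetical names, where fetch is any callable that sends the parameters and returns the decoded JSON.

```python
def iterate_query(fetch, base_params):
    """Yield each response batch, following 'continue' until it disappears."""
    params = dict(base_params)
    while True:
        response = fetch(params)
        yield response
        if "continue" not in response:
            return
        # Merge the server's continuation values into the original request.
        params = {**base_params, **response["continue"]}
```

With a requests session S, fetch could be lambda p: S.get('https://yourwiki.org/w/api.php', params=p).json(). Injecting the callable keeps the paging logic free of networking code, which makes it easy to exercise with canned responses.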

Best Practices – Keep It Clean

  • Rate‑limit yourself. Most wikis enforce per‑user rate limits. Add a time.sleep(0.2) or similar pause between requests in batch jobs.
  • Cache the CSRF token. It is cheap to fetch, but you’ll hit the server less if you reuse one token for the duration of a session.
  • Respect user consent. If you’re pulling private user data, make sure you have explicit permission – the API respects the same ACLs as the web UI.
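  • Validate input. Never paste user‑generated search terms into a URL by hand; encode them with urllib.parse.quote_plus, or let your HTTP library encode the parameters for you.
  • Use the maxlag parameter on write calls. With maxlag=5 the server refuses the request while database replication lag exceeds five seconds, so your bot backs off when the wiki is busy.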
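
The pacing and maxlag advice can be combined into one small retry helper. This is a sketch, not a drop-in client: call_with_backoff and do_request are hypothetical names, and the delays are arbitrary starting values. It assumes the API signals a busy server with the error codes maxlag or ratelimited.

```python
import time


def call_with_backoff(do_request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call do_request() until it stops returning a maxlag/ratelimited error.

    do_request must return the decoded JSON response as a dict.
    """
    for attempt in range(max_attempts):
        response = do_request()
        code = response.get("error", {}).get("code")
        if code not in ("maxlag", "ratelimited"):
            return response
        # Exponential backoff: base_delay, then twice that, and so on.
        sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"server still busy after {max_attempts} attempts")
```

The sleep argument exists so the waiting can be replaced in tests; in normal use the default time.sleep is what you want.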

Putting It All Together – A Mini‑App Sketch

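Imagine you want a tiny Flask service that returns the introductory section of any Wikipedia article asked for via a /summary?title=... endpoint. Below is a sketch that shows how the API can be the backbone of a custom app; it relies on prop=extracts, which is provided by the TextExtracts extension that Wikipedia has installed.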

from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

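WIKI_API = 'https://en.wikipedia.org/w/api.php'
# Wikimedia asks API clients to send a descriptive User-Agent; this value is a placeholder.
HEADERS = {'User-Agent': 'SummaryDemo/0.1 (you@example.com)'}

@app.route('/summary')
def summary():
    title = request.args.get('title')
    if not title:
        return jsonify(error='title missing'), 400
    params = {
        'action':'query',
        'prop':'extracts',
        'exintro':1,       # only the text before the first heading
        'explaintext':1,   # plain text instead of HTML
        'titles':title,
        'format':'json'
    }
    resp = requests.get(WIKI_API, params=params, headers=HEADERS, timeout=10).json()
    # A failed request has no 'query' key, so fall back to an empty page.
    pages = resp.get('query', {}).get('pages', {})
    page = next(iter(pages.values()), {})
    if 'extract' in page:
        return jsonify(title=page['title'], summary=page['extract'])
    return jsonify(error='not found'), 404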

if __name__=='__main__':
    app.run(debug=True)

This snippet is only ~30 lines, yet it demonstrates a typical workflow: accept input, query the API, massage the JSON, and return a clean response.
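
The "massage the JSON" step is the part most worth isolating, because it can then be checked without Flask or a network connection. A possible sketch follows; the function name first_extract is hypothetical, and it assumes the default response layout, in which pages is a dictionary keyed by page ID and a nonexistent title carries a missing key.

```python
def first_extract(response):
    """Return (title, extract) for the first page, or None if it is unavailable."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "missing" in page or "extract" not in page:
            return None
        return page["title"], page["extract"]
    return None  # no pages at all, for example after an API error


# Hand-written samples, not real API output.
found = {"query": {"pages": {"123": {"pageid": 123, "title": "Physics",
                                     "extract": "Physics is a natural science."}}}}
absent = {"query": {"pages": {"-1": {"title": "Nope", "missing": ""}}}}
print(first_extract(found))
print(first_extract(absent))
```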

Common Pitfalls and How to Dodge Them

Even seasoned developers stumble on a few quirks.

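  • Missing utf8=1 parameter. With format=json, non‑ASCII characters are escaped as \uXXXX sequences unless you pass utf8=1 (or request formatversion=2). A JSON parser decodes both forms identically, but the raw output is far easier to read with the flag set.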
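  • Token rejected as “badtoken”. CSRF tokens are not single‑use: one token stays valid for the whole login session. If an edit fails with badtoken, the session has usually expired or the cookies were lost, so log in again and fetch a fresh token before retrying.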
  • Unexpected redirects. When the title you edit is itself a redirect, action=edit changes the redirect page by default, not the page it points to. Decide whether that is what you want or whether you should edit the target directly.
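  • Namespace numbers. Parameters such as srnamespace and cmnamespace take numeric namespace IDs (0 for articles, 2 for user pages, etc.), while titles carry the namespace as a prefix (User:Example). Mixing the two up can lead to empty results or “page not found” errors even though the title looks correct.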
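
The token pitfall in particular lends itself to a small guard. The sketch below retries an edit once with a freshly fetched token when the API answers with the badtoken error code; the function and both callables are hypothetical names, with get_token returning a new CSRF token and post_edit sending the edit and returning the decoded JSON.

```python
def post_with_token_refresh(get_token, post_edit):
    """Send an edit; on a badtoken error fetch a new token and retry once."""
    response = post_edit(get_token())
    if response.get("error", {}).get("code") == "badtoken":
        # The first token was rejected, so ask for another and try again.
        response = post_edit(get_token())
    return response
```

Retrying only once is deliberate: if a second fresh token is also rejected, the problem is more likely a lost login session than a stale token, and looping would only add load.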

Where to Dig Deeper

The official docs are surprisingly thorough. Good places to continue your journey include:

  • API:Tutorial – step‑by‑step introductions.
  • API:Action_API – full module reference.
  • GitHub’s mediawiki‑api‑demos – ready‑made code in several languages.
  • Developer Portal tutorials for Python, JavaScript, and PHP – they show practical use‑cases like generating article drafts.

Final Thoughts

If you’ve ever muttered “there’s got to be a better way” while copying tables from a wiki into a spreadsheet, you now have a roadmap. The MediaWiki Action API isn’t just a curiosity; it’s a fully‑featured service that lets you read, write, and automate just about anything the UI can do – often faster and at scale. By mastering the token dance, handling pagination with poise, and respecting the API’s rate limits, you’ll be able to weave wikis into any custom application, whether it’s a data‑driven dashboard, a bot that curates content, or a backend that keeps documentation in sync with code. The learning curve is real, but the payoff? A truly programmable knowledge base at your fingertips.
