Mastering MediaWiki API for Custom Applications
Why the MediaWiki API Matters for Your Project
Ever stared at a wiki page and thought, “There’s got to be a better way to pull this data into my own app?” You’re not alone. The MediaWiki Action API lets you treat a wiki like any other data source, whether you need to fetch article summaries, push new content, or sync user information. Mastering it means you can build tools that talk directly to Wikipedia, Wikidata, or your own private instance without scraping the UI.
Getting a Grip on the Basics
First off, on a typical install the API lives at https://yourwiki.org/w/api.php (the exact path depends on how the wiki is configured). It understands both GET and POST: read‑only queries work fine as a plain GET URL, while write actions must be sent as POST. The response format can be json, xml or even php, though JSON is the de‑facto choice for modern apps.
Typical query skeleton:
    https://yourwiki.org/w/api.php?action=query&list=search&srsearch=MediaWiki&format=json
That one line returns a JSON object with a list of pages matching “MediaWiki”. Not magic, just a well‑documented endpoint.
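The same URL can be assembled in code instead of by hand, which takes care of encoding. A minimal sketch using only the standard library; `build_api_url` is a hypothetical helper name, and the endpoint is the placeholder used above.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://yourwiki.org/w/api.php"  # placeholder endpoint from the text


def build_api_url(endpoint, **params):
    """Return a full Action API URL with properly encoded query parameters."""
    # Default to JSON unless the caller asks for another format.
    params.setdefault("format", "json")
    return f"{endpoint}?{urlencode(params)}"


url = build_api_url(API_ENDPOINT, action="query", list="search", srsearch="MediaWiki")
print(url)
```

Passing the search term as a keyword argument means spaces and non‑ASCII characters are escaped for you.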
Key Parameters at a Glance
- action – what you want the API to do (query, parse, edit, ...).
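- format – json, xml, or php. Append fm (for example jsonfm) to get a pretty‑printed HTML view that is handy for debugging; the old yaml format has been removed from current releases.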
- list / prop / meta – the “module” you’re tapping into.
- continue – pagination token for large result sets.
- token – required for write operations; see the token dance later.
Authentication – The Token Tango
If you’re only reading public pages, you can skip auth. Any write action (edit, delete, move) requires a CSRF token, and on most wikis a logged‑in session as well. The sequence is a little clunky, which is why many developers write a tiny wrapper to hide the steps.
Step 1: Get a Login Token
    import requests

    # A Session keeps the cookies that tie the tokens to your login.
    S = requests.Session()
    login_token_r = S.get(
        'https://yourwiki.org/w/api.php',
        params={'action': 'query', 'meta': 'tokens', 'type': 'login', 'format': 'json'}
    )
    login_token = login_token_r.json()['query']['tokens']['logintoken']
Step 2: Log In
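    # action=login is meant for bot passwords (created at Special:BotPasswords);
    # the username then has the form 'MainAccount@BotName'.
    login_r = S.post(
        'https://yourwiki.org/w/api.php',
        data={
            'action': 'login',
            'lgname': 'YourBotUser',
            'lgpassword': 'SecretPass',
            'lgtoken': login_token,
            'format': 'json'
        }
    )
    print(login_r.json())  # on success the 'login' object has result 'Success'
Step 3: Grab a CSRF Token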
Once you’ve got a cookie for the session, fetch the edit token:
    csrf_r = S.get(
        'https://yourwiki.org/w/api.php',
        params={'action': 'query', 'meta': 'tokens', 'format': 'json'}
    )
    csrf_token = csrf_r.json()['query']['tokens']['csrftoken']
Now you’re ready to edit, move, or delete pages. Just remember to include token=csrf_token in the POST body. A few modules, such as rollback, use their own token type instead.
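The three steps can be folded into the kind of tiny wrapper mentioned earlier. This is only a sketch: `WikiClient` and its method names are hypothetical, and it assumes it is handed a `requests.Session`-style object that keeps cookies between calls.

```python
class WikiClient:
    """Thin wrapper around a requests-style session (hypothetical helper)."""

    def __init__(self, session, endpoint):
        self.session = session    # e.g. requests.Session(); must keep cookies
        self.endpoint = endpoint
        self._csrf = None

    def _token(self, token_type):
        # meta=tokens returns the token under "<type>token", e.g. "logintoken".
        r = self.session.get(self.endpoint, params={
            "action": "query", "meta": "tokens",
            "type": token_type, "format": "json",
        })
        return r.json()["query"]["tokens"][token_type + "token"]

    def login(self, username, password):
        # Steps 1 and 2: fetch a login token, then post the credentials.
        r = self.session.post(self.endpoint, data={
            "action": "login", "lgname": username, "lgpassword": password,
            "lgtoken": self._token("login"), "format": "json",
        })
        result = r.json()["login"]["result"]
        if result != "Success":
            raise RuntimeError(f"login failed: {result}")

    def csrf_token(self):
        # Step 3: fetch once and reuse for the lifetime of the session.
        if self._csrf is None:
            self._csrf = self._token("csrf")
        return self._csrf
```

With requests installed, usage would look like `client = WikiClient(requests.Session(), 'https://yourwiki.org/w/api.php')`, followed by `client.login(...)` and `client.csrf_token()`.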
Common Read Operations – From Search to Full Text
Let’s walk through a few “real‑world” scenarios you might encounter.
1. Full‑Text Search
    https://yourwiki.org/w/api.php?action=query&list=search&srsearch=Open%20source&utf8=&format=json
Each hit in the search list includes title, snippet, and pageid. Handy for autocomplete widgets.
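Search snippets come back as HTML with highlighted matches, so a widget usually wants them cleaned up first. A small sketch of that post‑processing; `summarize_hits` is a hypothetical helper, and the sample response is hand‑written to mirror the shape described above rather than captured from a real wiki.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")


def summarize_hits(response):
    """Turn a list=search response into (title, plain-text snippet) pairs."""
    hits = response.get("query", {}).get("search", [])
    results = []
    for hit in hits:
        # Strip the highlight markup, then decode entities such as &amp;.
        text = html.unescape(_TAG.sub("", hit.get("snippet", "")))
        results.append((hit["title"], text))
    return results


# Hand-written sample shaped like a search response (values are made up).
sample = {"query": {"search": [
    {"ns": 0, "title": "Open source", "pageid": 1,
     "snippet": '<span class="searchmatch">Open</span> '
                '<span class="searchmatch">source</span> software &amp; licences'},
]}}
print(summarize_hits(sample))
```

The regex is deliberately crude; it is enough for the simple highlight spans in snippets but is not a general HTML sanitizer.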
2. Page Content (wikitext) Extraction
If you need the raw markup:
    https://yourwiki.org/w/api.php?action=parse&page=Main_Page&prop=wikitext&format=json
3. Structured Data via prop=parsetree
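Requesting prop=parsetree from action=parse turns the wikitext into a tree you can walk programmatically. The tree is delivered as an XML string inside the JSON response, and it exposes templates, their arguments, and headings as nodes (ordinary links stay as plain text). Not the prettiest output, but useful when you need to rewrite templates on the fly.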
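As an illustration, the parse tree can be walked with the standard library's XML parser. This sketch assumes the tree uses template, title, part, name and value elements, with positional arguments marked by an index attribute; the sample string is hand‑written, and `list_templates` is a hypothetical helper. In the default JSON format the XML string is typically found under parse, then parsetree, then "*".

```python
import xml.etree.ElementTree as ET


def list_templates(parsetree_xml):
    """Return a list of (template name, {argument: value}) pairs."""
    root = ET.fromstring(parsetree_xml)
    found = []
    for tpl in root.iter("template"):
        name = "".join(tpl.find("title").itertext()).strip()
        args = {}
        for part in tpl.findall("part"):
            key_el, val_el = part.find("name"), part.find("value")
            # Positional arguments carry their position in an "index" attribute.
            key = key_el.get("index") or "".join(key_el.itertext()).strip()
            args[key] = "".join(val_el.itertext()).strip()
        found.append((name, args))
    return found


# Hand-written sample tree for {{Infobox|name=Ada|first}} followed by text.
sample = (
    "<root><template><title>Infobox</title>"
    "<part><name>name</name>=<value>Ada</value></part>"
    '<part><name index="1"/><value>first</value></part>'
    "</template> Some text</root>"
)
print(list_templates(sample))
```

Nested templates are flattened to their text content here, which is fine for listing but not for rewriting.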
Write Operations – Editing Without the UI
Editing is the part that scares most newbies. The API requires three things: a CSRF token, the page identifier (title or pageid), and the new content. Here’s a concise example that adds a line to an existing article.
    <?php
    require 'vendor/autoload.php'; // Guzzle installed via Composer

    $endpoint = 'https://yourwiki.org/w/api.php';
    // The cookie jar carries the login session between requests.
    $client = new \GuzzleHttp\Client(['cookies' => true]);

    // 1) login token
    $res = $client->get($endpoint, ['query'=>['action'=>'query','meta'=>'tokens','type'=>'login','format'=>'json']]);
    $loginToken = json_decode((string) $res->getBody(), true)['query']['tokens']['logintoken'];

    // 2) login
    $client->post($endpoint, ['form_params'=>[
        'action'=>'login','lgname'=>'BotUser','lgpassword'=>'Secret','lgtoken'=>$loginToken,'format'=>'json'
    ]]);

    // 3) CSRF token
    $res = $client->get($endpoint, ['query'=>['action'=>'query','meta'=>'tokens','format'=>'json']]);
    $csrf = json_decode((string) $res->getBody(), true)['query']['tokens']['csrftoken'];

    // 4) fetch current content ('rvprop' => 'content' must be a key/value pair)
    $res = $client->get($endpoint, ['query'=>['action'=>'query','prop'=>'revisions','rvprop'=>'content','titles'=>'Demo_Page','format'=>'json']]);
    $page = json_decode((string) $res->getBody(), true)['query']['pages'];
    $pageId = key($page);
    $current = $page[$pageId]['revisions'][0]['*'];

    // 5) edit, with a summary describing the change
    $new = $current."\n== New Section ==\nAdded via API.";
    $client->post($endpoint, ['form_params'=>[
        'action'=>'edit','title'=>'Demo_Page','text'=>$new,'summary'=>'Add section via API','token'=>$csrf,'format'=>'json'
    ]]);
    ?>
Notice the extra step to pull the existing content first: you don’t want to clobber what’s already there unless you really mean to. The API does not insist on an edit summary, but an unexplained bot edit is easy to mistake for spam, so always include a summary= parameter.
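For a plain append there is an alternative to fetching and rewriting the whole page: the edit module's appendtext parameter adds text to the end of the page on the server side. A sketch of the request body in Python; `build_append_edit` and `edit_succeeded` are hypothetical helper names, and the summary text is made up.

```python
def build_append_edit(title, text, csrf_token, summary):
    """Build the POST body for appending text to a page via action=edit."""
    return {
        "action": "edit",
        "title": title,
        "appendtext": text,    # appended server-side; no need to fetch first
        "summary": summary,
        "nocreate": 1,         # fail instead of creating a missing page
        "format": "json",
        "token": csrf_token,   # conventionally sent as the last parameter
    }


def edit_succeeded(response):
    """True when the API reports a successful edit."""
    return response.get("edit", {}).get("result") == "Success"


body = build_append_edit(
    "Demo_Page", "\n== New Section ==\nAdded via API.",
    "example-token+\\", "Add section via API",
)
print(sorted(body))
```

The body would be sent with something like `S.post(endpoint, data=body)`, and the decoded JSON reply passed to `edit_succeeded`.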
Handling Pagination and Continuation
Large wikis can return thousands of results; the API won’t dump everything in one go. Instead, it uses a continue token. The pattern looks like:
    import requests

    S = requests.Session()
    cont = {}  # 'continue' is a reserved word in Python, so use another name
    while True:
        params = {
            'action': 'query',
            'list': 'categorymembers',
            'cmtitle': 'Category:Physics',
            'cmlimit': '500',
            'format': 'json',
            **cont
        }
        resp = S.get('https://yourwiki.org/w/api.php', params=params).json()
        for page in resp['query']['categorymembers']:
            print(page['title'])
        if 'continue' not in resp:
            break
        cont = resp['continue']
This loop pulls every member of a category, one chunk at a time, until the server stops sending a continue field.
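The same pattern can be wrapped in a generator so it works for any list or prop module, not just categorymembers. A sketch under the assumption that `session` behaves like a `requests.Session`; `query_all` is a hypothetical name.

```python
def query_all(session, endpoint, params):
    """Yield each raw response batch, following "continue" until it disappears."""
    request = dict(params, action="query", format="json")
    cont = {}
    while True:
        batch = session.get(endpoint, params={**request, **cont}).json()
        if "error" in batch:
            raise RuntimeError(batch["error"].get("info", "API error"))
        yield batch
        if "continue" not in batch:
            return
        # Send back every key of the continue object, not just one of them.
        cont = batch["continue"]
```

A caller would loop with `for batch in query_all(S, endpoint, {'list': 'categorymembers', 'cmtitle': 'Category:Physics'})` and read `batch['query']['categorymembers']` from each batch.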
Best Practices – Keep It Clean
- Rate‑limit yourself. Wikis can throttle or block clients that send too many requests. Make requests one at a time and add a sleep(0.2) or similar pause in batch jobs.
- Cache the CSRF token. It stays valid for the session, so fetch it once per script run rather than before every write.
- Respect user consent. If you’re pulling private user data, make sure you have explicit permission – the API respects the same ACLs as the web UI.
- Validate input. When you build URLs by hand, percent‑encode user‑supplied search terms with urllib.parse.quote_plus or similar; HTTP libraries such as requests do this for you when you pass a params dict.
- Use the maxlag parameter on write calls. With maxlag=5, for example, the server refuses the request while database replication lag exceeds five seconds, so your bot backs off when the wiki is under load.
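The maxlag advice can be sketched as a small retry loop. This assumes a lagged server answers with an error object whose code is maxlag, and that `session` behaves like a `requests.Session`; `post_with_maxlag`, the retry count, and the fixed pause are illustrative choices, not prescribed values.

```python
import time


def post_with_maxlag(session, endpoint, data, maxlag=5, retries=3,
                     pause=5.0, sleep=time.sleep):
    """POST with maxlag set, pausing and retrying while the servers are lagged."""
    body = {"maxlag": maxlag, **data}  # keeps a trailing token parameter last
    for attempt in range(retries + 1):
        result = session.post(endpoint, data=body).json()
        if result.get("error", {}).get("code") != "maxlag":
            return result
        if attempt < retries:
            sleep(pause)  # fixed pause; a real client might honour Retry-After
    raise RuntimeError("server still lagged after retries")
```

Injecting `sleep` as a parameter keeps the function easy to test without actually waiting.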
Putting It All Together – A Mini‑App Sketch
Imagine you want a tiny Flask service that returns the lead section (the text before the first heading) of any Wikipedia article asked for via a /summary?title=... endpoint. Below is a sketch that shows how the API can be the backbone of a custom app.
    from flask import Flask, request, jsonify
    import requests

    app = Flask(__name__)
    WIKI_API = 'https://en.wikipedia.org/w/api.php'
    # Wikimedia asks API clients to identify themselves; this value is a placeholder.
    HEADERS = {'User-Agent': 'SummaryDemo/0.1 (you@example.org)'}

    @app.route('/summary')
    def summary():
        title = request.args.get('title')
        if not title:
            return jsonify(error='title missing'), 400
        params = {
            'action': 'query',
            'prop': 'extracts',   # provided by the TextExtracts extension
            'exintro': 1,         # only the text before the first heading
            'explaintext': 1,     # plain text instead of HTML
            'titles': title,
            'format': 'json'
        }
        resp = requests.get(WIKI_API, params=params, headers=HEADERS, timeout=10).json()
        page = next(iter(resp['query']['pages'].values()))
        if 'extract' in page:
            return jsonify(title=page['title'], summary=page['extract'])
        return jsonify(error='not found'), 404

    if __name__ == '__main__':
        app.run(debug=True)
This snippet is under 30 lines, yet it demonstrates a typical workflow: accept input, query the API, massage the JSON, and return a clean response.
Common Pitfalls and How to Dodge Them
Even seasoned developers stumble on a few quirks.
- Missing utf8=1 parameter. Without it, format=json escapes non‑ASCII characters as \uXXXX sequences. That is still valid JSON, but it is harder to read when you inspect raw responses; formatversion=2 outputs UTF‑8 by default.
- badtoken errors. A CSRF token is tied to your session and can be reused for many edits, but it stops working when the session expires or the cookies are lost. If a write fails with badtoken, fetch a fresh token before retrying.
- Unexpected redirects. action=edit changes the redirect page itself unless you pass the redirect parameter, and action=query only follows redirects when you add redirects=1. Decide whether you mean the redirect or its target.
- Namespace numbers. Parameters such as srnamespace and cmnamespace take numeric namespace IDs (0 for articles, 2 for user pages, etc.), and searches cover only namespace 0 by default. Forgetting this can make a page seem missing even though the title looks correct.
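The token pitfall lends itself to a small guard. A minimal sketch, assuming `post` is a callable that sends a request body and returns the decoded JSON reply, and `fetch_token` is a callable that returns a fresh CSRF token; both names, and `post_with_fresh_token` itself, are hypothetical.

```python
def post_with_fresh_token(post, fetch_token, data):
    """Send a write request; on "badtoken", fetch a new CSRF token and retry once."""
    result = post({**data, "token": fetch_token()})
    if result.get("error", {}).get("code") == "badtoken":
        # The session probably expired; a second failure is returned to the caller.
        result = post({**data, "token": fetch_token()})
    return result
```

Retrying only once keeps a genuinely broken session from looping forever.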
Where to Dig Deeper
The official docs are surprisingly thorough. Good places to continue your journey include:
- API:Tutorial – step‑by‑step introductions.
- API:Action_API – full module reference.
- GitHub’s mediawiki‑api‑demos – ready‑made code in several languages.
- Developer Portal tutorials for Python, JavaScript, and PHP – they show practical use‑cases like generating article drafts.
Final Thoughts
If you’ve ever muttered “there’s got to be a better way” while copying tables from a wiki into a spreadsheet, you now have a roadmap. The MediaWiki Action API isn’t just a curiosity; it’s a fully‑featured service that lets you read, write, and automate just about anything the UI can do – often faster and at scale. By mastering the token dance, handling pagination with poise, and respecting the API’s rate limits, you’ll be able to weave wikis into any custom application, whether it’s a data‑driven dashboard, a bot that curates content, or a backend that keeps documentation in sync with code. The learning curve is real, but the payoff? A truly programmable knowledge base at your fingertips.