Automating Bulk Edits in MediaWiki Using the REST API
When you need to apply the same change to dozens or thousands of pages – for example adding a template, fixing a recurring typo, or inserting a tracking tag – doing it manually is impractical. MediaWiki provides a modern REST API that can be scripted to perform edits programmatically. This guide walks through the essential steps: authenticating, retrieving the latest revision, building the edit payload, handling conflicts, and respecting rate limits. All examples use only the core REST API, so they work on any MediaWiki installation running version 1.35 or later.
Why the REST API?
- Stateless HTTP – each request contains everything the server needs (method, JSON body, authentication header).
- Consistent JSON schema – responses are easy to parse in any language.
- Versioned endpoints – /v1/ guarantees backward compatibility for the lifetime of your script.
- Extension‑friendly – extensions can expose additional endpoints without breaking existing code.
Although the classic Action API (api.php) can also edit pages, the REST API’s PUT /core/v1/{project}/{language}/page/{title} endpoint is the most straightforward for bulk operations.
Prerequisites
- A MediaWiki installation with REST API enabled (MediaWiki 1.35+).
- An OAuth consumer or a bot password that grants the edit right. The token is sent as a Bearer token in the Authorization header.
- A programming environment that can send HTTP requests – Python, PHP, JavaScript, or even curl from a shell script.
- A list of pages to edit. The list can be generated via the list=allpages query of the Action API, stored in a CSV file, or produced by a custom query.
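Generating the page list can itself be scripted. The following sketch walks the Action API's list=allpages query and follows its continuation token; the api.php URL and User-Agent string are placeholders, and the injectable fetch parameter is a design choice of this sketch (not part of the API) so the pagination logic can be tested offline:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"  # adjust to your wiki's api.php
HEADERS = {"User-Agent": "MyBulkBot/1.0 (https://example.org/bot; user@example.org)"}

def all_pages(fetch=None, prefix=None, limit=500):
    """Yield every page title, following the Action API's continuation token.

    fetch(params) -> parsed JSON dict; defaults to a real HTTP GET.
    """
    if fetch is None:
        fetch = lambda params: requests.get(API, params=params, headers=HEADERS).json()
    params = {"action": "query", "list": "allpages",
              "aplimit": limit, "format": "json"}
    if prefix:
        params["apprefix"] = prefix
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        # Merge the continuation parameters into the next request
        params = {**params, **data["continue"]}
```

Write the yielded titles to a file (one per line, or as the first column of a TSV) and feed that to the edit loop below.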
Authentication and the User‑Agent Header
The REST API requires two headers for every request:
User-Agent: MyBulkBot/1.0 (https://example.org/bot; user@example.org)
Authorization: Bearer <access-token>
MediaWiki enforces a User-Agent policy to help administrators identify automated traffic. Include a contact URL or e-mail address so you can be reached if your script generates too many requests.
Fetching the Latest Revision
To edit a page you must provide the latest.id field – the revision identifier of the version you are editing. The easiest way is to call the GET /core/v1/{project}/{language}/page/{title}/source endpoint. The response contains a latest.id value you can reuse.
curl -s -H "User-Agent: MyBulkBot/1.0" \
     -H "Authorization: Bearer $TOKEN" \
     https://api.wikimedia.org/core/v1/wikipedia/en/page/Example_Page/source | \
     jq '.latest.id'
In a script you would store that ID for each page before constructing the edit request.
Constructing the Edit Payload
The JSON body of a PUT request contains three required fields:
- source – the new page content (wikitext by default).
- comment – an edit summary for the revision history.
- latest.id – the revision you are basing the edit on.
Optionally you can set content_model if you are editing CSS, JavaScript, JSON, or plain text.
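Concretely, a payload for a page with the json content model might be built like this sketch; the revision ID 123456 is a placeholder for the value fetched from the /source endpoint:

```python
import json

# Hypothetical payload for editing a JSON-model page. Omit content_model
# entirely for ordinary wikitext pages.
payload = {
    "source": json.dumps({"maintenance_mode": False}, indent=2),
    "comment": "Toggle maintenance flag via bulk script",
    "latest": {"id": 123456},      # placeholder revision ID
    "content_model": "json",       # or "css", "javascript", "text"
}
body = json.dumps(payload)         # serialized body for the PUT request
```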
Python example
import requests, json
BASE = "https://api.wikimedia.org/core/v1/wikipedia/en"
TOKEN = "YOUR_ACCESS_TOKEN"
HEADERS = {
    "User-Agent": "MyBulkBot/1.0 (https://example.org/bot; user@example.org)",
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

def get_latest_rev(title):
    url = f"{BASE}/page/{title}/source"
    r = requests.get(url, headers=HEADERS)
    r.raise_for_status()
    return r.json()["latest"]["id"]

def edit_page(title, new_wikitext, summary):
    rev_id = get_latest_rev(title)
    payload = {
        "source": new_wikitext,
        "comment": summary,
        "latest": {"id": rev_id},
    }
    url = f"{BASE}/page/{title}"
    resp = requests.put(url, headers=HEADERS, data=json.dumps(payload))
    if resp.status_code == 200:
        print(f"✅ {title} updated")
    elif resp.status_code == 409:
        print(f"⚠️ {title} edit conflict – skipping")
    else:
        print(f"❌ {title} error {resp.status_code}: {resp.text}")

# Example bulk loop
pages = ["Template:Example", "Page:OldName", "Category:Legacy"]
for p in pages:
    edit_page(p, "{{NewTemplate}}\n{{{1}}}", "Add NewTemplate via bulk script")
Shell‑script with curl
#!/usr/bin/env bash
TOKEN="YOUR_ACCESS_TOKEN"
UA="MyBulkBot/1.0 (https://example.org/bot; user@example.org)"
BASE="https://api.wikimedia.org/core/v1/wikipedia/en"
edit_page() {
  local title="$1"
  local newcontent="$2"
  local summary="$3"

  # Get latest revision ID
  rev=$(curl -s -H "User-Agent: $UA" \
        -H "Authorization: Bearer $TOKEN" \
        "$BASE/page/$title/source" | jq -r '.latest.id')

  # Build JSON payload
  payload=$(jq -n --arg src "$newcontent" \
        --arg com "$summary" \
        --argjson rev "$rev" \
        '{source:$src, comment:$com, latest:{id:$rev}}')

  # Send PUT request
  resp=$(curl -s -o /dev/null -w "%{http_code}" -X PUT "$BASE/page/$title" \
        -H "User-Agent: $UA" \
        -H "Authorization: Bearer $TOKEN" \
        -H "Content-Type: application/json" \
        --data "$payload")

  if [[ $resp == 200 ]]; then
    echo "✅ $title updated"
  elif [[ $resp == 409 ]]; then
    echo "⚠️ $title conflict – skipped"
  else
    echo "❌ $title error $resp"
  fi
}

# Example usage – read titles from a file
while IFS=$'\t' read -r title newtext summary; do
  edit_page "$title" "$newtext" "$summary"
  # Respect polite rate limit (1 request per second)
  sleep 1
done < pages_to_edit.tsv
Handling Edit Conflicts
The REST API automatically merges simple conflicts. If the server cannot resolve the conflict it returns 409 Conflict with a diff description. In a bulk script you usually want to:
- Log the conflict for later manual review.
- Optionally fetch the latest source again and retry with a smarter merge (e.g., prepend the new snippet instead of overwriting the whole page).
Because the latest.id is tied to a specific revision, the most reliable strategy is to keep the edit payload small – for instance, only prepend a template or a tag – so the chance of a conflict is minimal.
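A simple retry might look like the following Python sketch: on a 409, re-fetch the source and re-apply a transformation function. BASE and HEADERS match the earlier Python example, transform is a hypothetical pure function you supply, and the session parameter is a design choice of this sketch so the retry logic can be tested without a live wiki:

```python
import json
import time

import requests

BASE = "https://api.wikimedia.org/core/v1/wikipedia/en"
HEADERS = {"User-Agent": "MyBulkBot/1.0 (https://example.org/bot; user@example.org)"}

def edit_with_retry(title, transform, summary, session=requests, retries=1):
    """Apply transform(old_wikitext) -> new_wikitext, retrying on 409."""
    for attempt in range(retries + 1):
        src = session.get(f"{BASE}/page/{title}/source", headers=HEADERS).json()
        payload = {
            "source": transform(src["source"]),
            "comment": summary,
            "latest": {"id": src["latest"]["id"]},
        }
        r = session.put(f"{BASE}/page/{title}", headers=HEADERS,
                        data=json.dumps(payload))
        if r.status_code != 409:
            return r      # success, or a non-conflict error: stop here
        time.sleep(1)     # brief pause, then retry on the fresh revision
    return r
```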
Rate‑Limiting and Courtesy
Wikimedia projects enforce a request‑rate policy to protect the infrastructure. The documentation recommends:
- One request per second for anonymous bots.
- Two to three requests per second for authenticated bots that have been whitelisted.
- Respect the Retry-After header if the server returns 429 Too Many Requests.
Implement a simple sleep between iterations, or use a token bucket algorithm for higher throughput. Always include a descriptive User-Agent so administrators can identify your script.
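A token bucket is easy to sketch in Python; the rate and burst values below are illustrative, not Wikimedia policy:

```python
import time

class TokenBucket:
    """Minimal token bucket: allows short bursts, enforces a mean rate."""

    def __init__(self, rate=1.0, burst=3):
        self.rate = rate            # tokens (requests) added per second
        self.capacity = burst       # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def wait(self):
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)
```

Call bucket.wait() immediately before each HTTP request; the loop then self-throttles without manual sleep calls.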
Putting It All Together – A Minimal Bulk Bot
The following Python script demonstrates a complete bulk‑edit workflow:
#!/usr/bin/env python3
"""Bulk edit bot using MediaWiki REST API.
Reads a TSV file: title\tnew_wikitext\tedit_summary
"""
import csv, json, sys, time, requests
BASE = "https://api.wikimedia.org/core/v1/wikipedia/en"
TOKEN = "YOUR_ACCESS_TOKEN"
HEADERS = {
    "User-Agent": "BulkEditBot/1.2 (https://example.org/bot; bot@example.org)",
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

def get_latest(title):
    r = requests.get(f"{BASE}/page/{title}/source", headers=HEADERS)
    r.raise_for_status()
    return r.json()["latest"]["id"]

def edit(title, source, comment):
    rev = get_latest(title)
    payload = {"source": source, "comment": comment, "latest": {"id": rev}}
    r = requests.put(f"{BASE}/page/{title}", headers=HEADERS, data=json.dumps(payload))
    if r.status_code == 200:
        print(f"✅ {title}")
    elif r.status_code == 409:
        print(f"⚠️ Conflict on {title}")
    else:
        print(f"❌ {title} – {r.status_code}: {r.text}")

if len(sys.argv) != 2:
    print("Usage: bulk_edit.py pages.tsv")
    sys.exit(1)

with open(sys.argv[1], newline='', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        if len(row) != 3:
            continue
        title, new_wikitext, summary = row
        edit(title, new_wikitext, summary)
        time.sleep(1)  # polite rate limit
Save your list of pages as pages.tsv, run the script, and watch the console output for success, conflict, or error messages.
Beyond Simple Edits – Advanced Use Cases
- Section-only edits – fetch the page source, replace a specific == Section == block, and submit the whole page. The REST API does not have a dedicated section parameter, so you must manipulate the wikitext yourself.
- Conditional edits – use the latest.id check to ensure the page has not changed since you fetched it. If a 409 response is received, fetch the new source, apply your transformation again, and retry.
- Batch creation of stub pages – omit the latest.id field entirely. The API will create the page if it does not exist, returning a 201 Created response.
- Multi-wiki bots – the {project} path segment lets the same script target wikipedia, wiktionary, commons, etc., by looping over project names.
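For section-only edits, the wikitext manipulation might look like this sketch. replace_section is a hypothetical helper, and the regex deliberately handles only level-2 headings; real pages use varied heading levels and whitespace, so test it against your own content first:

```python
import re

def replace_section(wikitext, heading, new_body):
    """Replace the body of a level-2 '== heading ==' section.

    The body extends up to the next level-2 heading or end of text.
    Returns the wikitext unchanged if the heading is not found.
    """
    pattern = re.compile(
        rf"(^==\s*{re.escape(heading)}\s*==\s*\n)(.*?)(?=^==[^=]|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    return pattern.sub(lambda m: m.group(1) + new_body + "\n", wikitext)
```

Feed the result to the same PUT request as a whole-page edit; from the API's point of view there is nothing section-specific about it.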
Testing in a Sandbox
Before running on a production wiki, test against a sandbox page or a private test wiki. The sandbox page (e.g., Wikipedia:Sandbox) is ideal because it is intended for experimental edits. Verify that:
- The latest.id you retrieve matches the revision you edit.
- Your edit summary appears correctly in the revision history.
- Conflicts are handled as expected.
Once the script works on a sandbox, you can safely scale up to the full list of pages.
Common Pitfalls
- Missing latest.id – the API will reject the request with 400 Bad Request. Always fetch the revision first.
- Wrong content model – if you edit a CSS page but send content_model: "wikitext", the server returns 400 Bad content model. Match the model to the page type.
- Improper User-Agent – generic agents like curl/7.68.0 may be blocked. Use a descriptive string per the User-Agent policy.
- Rate-limit errors – a 429 response means you are sending requests too quickly. Back off and respect the Retry-After header.
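Honoring Retry-After can be wrapped in a small backoff loop. This sketch takes a zero-argument send callable rather than hard-coding an HTTP client (a design choice of the sketch, not part of any API), so it works with any request you want to retry:

```python
import time

def with_backoff(send, max_tries=3):
    """Call send() until it returns something other than 429,
    sleeping for the server-suggested Retry-After between attempts."""
    for _ in range(max_tries):
        r = send()
        if r.status_code != 429:
            return r
        # Fall back to a conservative 5-second wait if the header is absent
        time.sleep(float(r.headers.get("Retry-After", 5)))
    return r
```

In the scripts above, send could be, for example, lambda: requests.put(url, headers=HEADERS, data=body).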
Conclusion
Bulk editing with the MediaWiki REST API is a reliable, language‑agnostic way to automate repetitive changes across a wiki. The key steps are:
- Obtain an OAuth or bot-password token.
- Fetch the latest revision ID for each target page.
- Build a JSON payload containing the new wikitext, a concise edit summary, and the latest.id.
- Send a PUT request to /core/v1/{project}/{language}/page/{title}.
- Handle 409 conflicts, respect rate limits, and log results.
With a few dozen lines of code you can safely update thousands of pages, keep the edit history clean, and free yourself from tedious manual copy‑and‑paste work. The same pattern works for page creation, CSS/JavaScript updates, and even multi‑project bots, making the REST API a versatile tool for any MediaWiki automation effort.