Automating Bulk Edits in MediaWiki Using the REST API

When you need to apply the same change to dozens or thousands of pages – for example adding a template, fixing a recurring typo, or inserting a tracking tag – doing it manually is impractical. MediaWiki provides a modern REST API that can be scripted to perform edits programmatically. This guide walks through the essential steps: authenticating, retrieving the latest revision, building the edit payload, handling conflicts, and respecting rate limits. All examples use only the core REST API, which ships with MediaWiki 1.35 and later. The URLs shown target Wikimedia's API gateway (api.wikimedia.org); on a self-hosted wiki the same routes are served from the wiki's own rest.php entry point, for example /w/rest.php/v1/page/{title}.

Why the REST API?

  • Stateless HTTP – each request contains everything the server needs (method, JSON body, authentication header).
  • Consistent JSON schema – responses are easy to parse in any language.
  • Versioned endpoints – the /v1/ prefix pins your script to one API version, so breaking changes are expected to arrive under a new version prefix rather than in place.
  • Extension‑friendly – extensions can expose additional endpoints without breaking existing code.

Although the classic Action API (api.php) can also edit pages, the REST API’s PUT /core/v1/{project}/{language}/page/{title} endpoint is the most straightforward for bulk operations.

Prerequisites

  1. A MediaWiki installation with REST API enabled (MediaWiki 1.35+).
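  2. An OAuth 2.0 access token for an account that holds the edit right. The token is sent as a Bearer token in the Authorization header. Bot passwords, by contrast, log in through the Action API and yield session cookies rather than a Bearer token.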
  3. A programming environment that can send HTTP requests – Python, PHP, JavaScript, or even curl from a shell script.
  4. A list of pages to edit. The list can be generated via the list=allpages action of the Action API, stored in a CSV file, or produced by a custom query.
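
As a rough sketch of the last prerequisite, the generator below collects titles from list=allpages and follows the continuation tokens the Action API returns. The function name iter_all_pages and the fetch callable are assumptions made for illustration: fetch is expected to take a dict of query parameters, call api.php however you prefer (for example with requests.get(api_url, params=params).json()), and return the decoded JSON.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_all_pages(
    fetch: Callable[[Dict[str, str]], dict],
    namespace: int = 0,
    prefix: Optional[str] = None,
) -> Iterator[str]:
    """Yield page titles from list=allpages, following continuation."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": str(namespace),
        "aplimit": "max",
        "format": "json",
    }
    if prefix:
        params["apprefix"] = prefix
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        cont = data.get("continue")
        if not cont:
            break
        # Merge the continuation tokens into the next request.
        params = {**params, **cont}
```

Passing the HTTP call in as a parameter keeps the paging logic separate from authentication and makes it easy to dry-run against canned responses before touching a live wiki.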

Authentication and the User‑Agent Header

The REST API requires two headers for every request:

User-Agent: MyBulkBot/1.0 (https://example.org/bot; user@example.org)
Authorization: Bearer <access‑token>

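Wikimedia sites enforce a User-Agent policy to help administrators identify automated traffic, and other wikis often follow the same convention. Include a contact URL or e‑mail address so you can be reached if your script generates too many requests. The Authorization header is what ties each edit to your bot account.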
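
One way to keep the two headers consistent across a script is to build them in a single place. The helper below is a minimal sketch; the name build_headers and its parameters are illustrative, not part of any MediaWiki library.

```python
from typing import Dict, Optional


def build_headers(
    bot_name: str,
    version: str,
    contact: str,
    token: Optional[str] = None,
) -> Dict[str, str]:
    """Return the User-Agent (and, if a token is given, Authorization) headers."""
    if not contact.strip():
        # A User-Agent without contact details defeats its purpose.
        raise ValueError("contact URL or e-mail address is required")
    headers = {"User-Agent": f"{bot_name}/{version} ({contact})"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

In practice the token would come from the environment or a secrets store rather than the source file, for instance build_headers("MyBulkBot", "1.0", "https://example.org/bot; user@example.org", os.environ.get("MW_TOKEN")), where MW_TOKEN is a hypothetical variable name.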

Fetching the Latest Revision

To edit a page you must provide the latest.id field – the revision identifier of the version you are editing. The easiest way is to call the GET /core/v1/{project}/{language}/page/{title} endpoint, which returns the page object including its wikitext source. The response contains a latest.id value you can reuse.

curl -s -H "User-Agent: MyBulkBot/1.0" \
     -H "Authorization: Bearer $TOKEN" \
     https://api.wikimedia.org/core/v1/wikipedia/en/page/Example_Page | \
     jq '.latest.id'

In a script you would store that ID for each page before constructing the edit request.

Constructing the Edit Payload

The JSON body of a PUT request that updates an existing page contains three fields:

  • source – the new page content (wikitext by default).
  • comment – an edit summary for the revision history.
  • latest.id – the revision you are basing the edit on.

Optionally you can set content_model if you are editing CSS, JavaScript, JSON, or plain text.
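
The fields above map directly onto a small dictionary. The following sketch assembles it; build_payload is a hypothetical helper name, and making latest_id optional is an assumption that anticipates the page-creation case, where no base revision exists.

```python
import json
from typing import Optional


def build_payload(
    source: str,
    comment: str,
    latest_id: Optional[int] = None,
    content_model: Optional[str] = None,
) -> dict:
    """Assemble the JSON body for PUT /page/{title}."""
    if not comment.strip():
        raise ValueError("an edit summary is required")
    payload = {"source": source, "comment": comment}
    if latest_id is not None:
        # Base revision for an update; leave it out when creating a page.
        payload["latest"] = {"id": latest_id}
    if content_model is not None:
        payload["content_model"] = content_model
    return payload


if __name__ == "__main__":
    # Print the body as it would be sent on the wire.
    print(json.dumps(build_payload("{{NewTemplate}}", "Add NewTemplate", 12345), indent=2))
```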

Python example

import requests
from urllib.parse import quote

BASE = "https://api.wikimedia.org/core/v1/wikipedia/en"
TOKEN = "YOUR_ACCESS_TOKEN"
HEADERS = {
    "User-Agent": "MyBulkBot/1.0 (https://example.org/bot; user@example.org)",
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

def page_url(title):
    # Percent-encode the title so spaces and slashes are safe in the URL path
    return f"{BASE}/page/{quote(title, safe='')}"

def get_latest_rev(title):
    # The page object carries the wikitext plus the latest revision ID
    r = requests.get(page_url(title), headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()["latest"]["id"]

def edit_page(title, new_wikitext, summary):
    rev_id = get_latest_rev(title)
    payload = {
        "source": new_wikitext,
        "comment": summary,
        "latest": {"id": rev_id},
    }
    # json= serializes the payload for us
    resp = requests.put(page_url(title), headers=HEADERS, json=payload, timeout=30)
    if resp.status_code in (200, 201):
        print(f"✅ {title} updated")
    elif resp.status_code == 409:
        print(f"⚠️ {title} edit conflict – skipping")
    else:
        print(f"❌ {title} error {resp.status_code}: {resp.text}")
# Example bulk loop
pages = ["Template:Example", "Page:OldName", "Category:Legacy"]
for p in pages:
    edit_page(p, "{{NewTemplate}}\n{{{1}}}", "Add NewTemplate via bulk script")

Shell‑script with curl

#!/usr/bin/env bash
TOKEN="YOUR_ACCESS_TOKEN"
UA="MyBulkBot/1.0 (https://example.org/bot; user@example.org)"
BASE="https://api.wikimedia.org/core/v1/wikipedia/en"

edit_page() {
    local title="$1"
    local newcontent="$2"
    local summary="$3"
    local encoded rev payload resp
    # Percent-encode the title so spaces and slashes are safe in the URL path
    encoded=$(jq -rn --arg t "$title" '$t | @uri')
    # Get latest revision ID from the page object (empty if the lookup fails)
    rev=$(curl -s -H "User-Agent: $UA" \
               -H "Authorization: Bearer $TOKEN" \
               "$BASE/page/$encoded" | jq -r '.latest.id // empty')
    if [[ -z $rev ]]; then
        echo "❌ $title could not read latest revision"
        return 1
    fi
    # Build JSON payload
    payload=$(jq -n --arg src "$newcontent" \
                    --arg com "$summary" \
                    --argjson rev "$rev" \
                    '{source:$src, comment:$com, latest:{id:$rev}}')
    # Send PUT request
    resp=$(curl -s -o /dev/null -w "%{http_code}" -X PUT "$BASE/page/$encoded" \
        -H "User-Agent: $UA" \
        -H "Authorization: Bearer $TOKEN" \
        -H "Content-Type: application/json" \
        --data "$payload")
    if [[ $resp == 200 || $resp == 201 ]]; then
        echo "✅ $title updated"
    elif [[ $resp == 409 ]]; then
        echo "⚠️ $title conflict – skipped"
    else
        echo "❌ $title error $resp"
    fi
}

# Example usage – read titles from a file
while IFS=$'\t' read -r title newtext summary; do
    edit_page "$title" "$newtext" "$summary"
    # Respect polite rate limit (1 request per second)
    sleep 1
done < pages_to_edit.tsv

Handling Edit Conflicts

The REST API automatically merges simple conflicts. If the server cannot resolve the conflict it returns 409 Conflict with a diff description. In a bulk script you usually want to:

  • Log the conflict for later manual review.
  • Optionally fetch the latest source again and retry with a smarter merge (e.g., prepend the new snippet instead of overwriting the whole page).

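Because latest.id is tied to a specific revision, a conflict can only arise from edits made between your fetch and your PUT. The most reliable strategy is to keep that window short (fetch, transform, and save each page in one step rather than fetching everything up front) and to keep the change itself small, for instance only prepending a template or a tag, so that a retry or an automatic merge is more likely to succeed.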
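
The refetch-and-retry option can be sketched as a small loop. Everything here is illustrative: edit_with_retry, fetch, and save are hypothetical names, with fetch assumed to return a (revision ID, wikitext) pair and save assumed to return the HTTP status code of the PUT.

```python
from typing import Callable, Tuple


def edit_with_retry(
    title: str,
    transform: Callable[[str], str],
    fetch: Callable[[str], Tuple[int, str]],
    save: Callable[[str, str, int], int],
    max_attempts: int = 3,
) -> str:
    """Return "saved", "unchanged", or "conflict" for one page."""
    for _ in range(max_attempts):
        # Always start from the current revision, never a cached copy.
        rev_id, source = fetch(title)
        new_source = transform(source)
        if new_source == source:
            return "unchanged"
        status = save(title, new_source, rev_id)
        if status in (200, 201):
            return "saved"
        if status != 409:
            raise RuntimeError(f"{title}: unexpected HTTP status {status}")
        # 409: someone else edited in between; loop to refetch and reapply.
    return "conflict"
```

Expressing the change as a transform function, rather than as a fixed replacement text, is what makes the retry safe: the change is reapplied to whatever the page looks like now. A page that still reports "conflict" after the last attempt belongs in the manual-review log.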

Rate‑Limiting and Courtesy

Wikimedia projects enforce a request‑rate policy to protect the infrastructure. The documentation recommends:

  • One request per second for anonymous bots.
  • Two to three requests per second for authenticated bots that have been whitelisted.
  • Respect the Retry‑After header if the server returns 429 Too Many Requests.

Implement a simple sleep between iterations, or use a token bucket algorithm for higher throughput. Always include a descriptive User-Agent so administrators can identify your script.
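
For readers who want more than a fixed sleep, here is a minimal token bucket together with a helper that reads Retry-After. Both are sketches under stated assumptions: the class and function names are invented for this example, the clock and sleep parameters exist only so the limiter can be tested without waiting, and the helper handles the numeric form of Retry-After only, falling back to a default for anything else.

```python
import time
from typing import Callable, Mapping


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self) -> None:
        """Block until one request may be sent, then consume a token."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1.0:
            # Wait exactly long enough for one full token to accrue.
            wait = (1.0 - self.tokens) / self.rate
            self.sleep(wait)
            self.last = now + wait
            self.tokens = 1.0
        self.tokens -= 1.0


def retry_after_seconds(headers: Mapping[str, str], default: float = 5.0) -> float:
    """Seconds to wait after a 429, based on a numeric Retry-After header."""
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default
```

A bulk loop would call bucket.acquire() before every request and, on a 429, sleep for retry_after_seconds(response.headers) before trying the same page again.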

Putting It All Together – A Minimal Bulk Bot

The following Python script demonstrates a complete bulk‑edit workflow:

#!/usr/bin/env python3
"""Bulk edit bot using MediaWiki REST API.
   Reads a TSV file: title\tnew_wikitext\tedit_summary
"""
import csv, sys, time
from urllib.parse import quote

import requests

BASE = "https://api.wikimedia.org/core/v1/wikipedia/en"
TOKEN = "YOUR_ACCESS_TOKEN"
HEADERS = {
    "User-Agent": "BulkEditBot/1.2 (https://example.org/bot; bot@example.org)",
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

def page_url(title):
    # Percent-encode the title so spaces and slashes are safe in the URL path
    return f"{BASE}/page/{quote(title, safe='')}"

def get_latest(title):
    # The page object includes the latest revision ID
    r = requests.get(page_url(title), headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()["latest"]["id"]

def edit(title, source, comment):
    rev = get_latest(title)
    payload = {"source": source, "comment": comment, "latest": {"id": rev}}
    r = requests.put(page_url(title), headers=HEADERS, json=payload, timeout=30)
    if r.status_code in (200, 201):
        print(f"✅ {title}")
    elif r.status_code == 409:
        print(f"⚠️ Conflict on {title}")
    else:
        print(f"❌ {title} – {r.status_code}: {r.text}")

if len(sys.argv) != 2:
    print("Usage: bulk_edit.py pages.tsv")
    sys.exit(1)

with open(sys.argv[1], newline='', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        if len(row) != 3:
            continue
        title, new_wikitext, summary = row
        edit(title, new_wikitext, summary)
        time.sleep(1)  # polite rate limit

Save your list of pages as pages.tsv, run the script, and watch the console output for success, conflict, or error messages.

Beyond Simple Edits – Advanced Use Cases

  • Section‑only edits – fetch the page source, replace a specific == Section == block, and submit the whole page. The REST API does not have a dedicated section parameter, so you must manipulate the wikitext yourself.
  • Conditional edits – use the latest.id check to ensure the page has not changed since you fetched it. If a 409 response is received, you can fetch the new source, apply your transformation again, and retry.
  • Batch creation of stub pages – omit the latest.id field entirely. The API will create the page if it does not exist, returning a 201 Created response.
  • Multi‑wiki bots – the {project} path segment lets the same script target wikipedia, wiktionary, commons, etc., by looping over project names.
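
To illustrate the section-only case from the list above, the sketch below swaps the body of one level-2 section and leaves the rest of the page untouched. It is a regex-based approximation, not a wikitext parser: replace_section is a hypothetical helper, it matches only == Heading == lines, and it would be fooled by heading-like text inside <nowiki> or comment blocks.

```python
import re


def replace_section(wikitext: str, heading: str, new_body: str) -> str:
    """Replace the body of the level-2 section named `heading`."""
    pattern = re.compile(
        r"^(==[ \t]*" + re.escape(heading) + r"[ \t]*==[ \t]*\n)"  # heading line
        r"(.*?)"                                                    # old body
        r"(?=^==[^=]|\Z)",                                          # next level-2 heading or end
        re.MULTILINE | re.DOTALL,
    )
    body = new_body.strip("\n") + "\n"

    def swap(match: "re.Match[str]") -> str:
        # Keep a blank line before the next heading, but not at end of page.
        trailer = "" if match.end() == len(wikitext) else "\n"
        return match.group(1) + body + trailer

    new_text, count = pattern.subn(swap, wikitext, count=1)
    if count == 0:
        raise ValueError(f"section not found: {heading}")
    return new_text
```

Subsections (=== Details ===) are treated as part of the section body, so they are replaced along with it. The result is then submitted as the full page source in the usual PUT request.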

Testing in a Sandbox

Before running on a production wiki, test against a sandbox page or a private test wiki. The sandbox page (e.g., Wikipedia:Sandbox) is ideal because it is intended for experimental edits. Verify that:

  1. The latest.id you retrieve matches the revision you edit.
  2. Your edit summary appears correctly in the revision history.
  3. Conflicts are handled as expected.

Once the script works on a sandbox, you can safely scale up to the full list of pages.

Common Pitfalls

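  • Missing latest.id – without it the API treats the request as page creation, so an update to a page that already exists is rejected with a client error instead of being saved. Always fetch the revision first, and omit the field only when you intend to create a new page.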
  • Wrong content model – if you edit a CSS page but send content_model: "wikitext", the server returns 400 Bad content model. Match the model to the page type.
  • Improper User‑Agent – generic agents like curl/7.68.0 may be blocked. Use a descriptive string per the User‑Agent policy.
  • Rate‑limit errors – a 429 response means you are sending requests too quickly. Back off and respect the Retry‑After header.

Conclusion

Bulk editing with the MediaWiki REST API is a reliable, language‑agnostic way to automate repetitive changes across a wiki. The key steps are:

  1. Obtain an OAuth 2.0 access token for an account that can edit.
  2. Fetch the latest revision ID for each target page.
  3. Build a JSON payload containing the new wikitext, a concise edit summary, and the latest.id.
  4. Send a PUT request to /core/v1/{project}/{language}/page/{title}.
  5. Handle 409 conflicts, respect rate limits, and log results.

With a few dozen lines of code you can safely update thousands of pages, keep the edit history clean, and free yourself from tedious manual copy‑and‑paste work. The same pattern works for page creation, CSS/JavaScript updates, and even multi‑project bots, making the REST API a versatile tool for any MediaWiki automation effort.
