Using the MediaWiki API to Automate Bulk Edits

Why automate bulk edits?

Large wikis often need repetitive changes – adding a template, fixing a typo across hundreds of pages, or updating category membership after a policy shift. Doing this manually is error‑prone and time‑consuming. The MediaWiki Action API gives programmatic access to read and write operations, allowing a script or bot to apply the same change to many pages in a controlled, repeatable way.

Core concepts

  • Authentication – Bots must log in with a bot password (see API:Login) and keep the session cookies.
  • CSRF token – Every write request needs a csrf token obtained from API:Tokens.
  • Page list – Use query modules (list=allpages, generator=categorymembers, etc.) to build the set of pages you will edit.
  • Rate limits & edit conflicts – Respect maxlag, send basetimestamp and starttimestamp with each edit, and handle editconflict errors gracefully.

Step‑by‑step workflow

Log in and store cookies.

import time
import requests

S = requests.Session()
login_token = S.get('https://example.org/w/api.php', params={
    'action':'query','meta':'tokens','type':'login','format':'json'}).json()['query']['tokens']['logintoken']
login = S.post('https://example.org/w/api.php', data={
    'action':'login','lgname':'BotUser','lgpassword':'BotPass','lgtoken':login_token,'format':'json'}).json()
# Stop early if the login was rejected; the later steps would otherwise run logged out
if login['login']['result'] != 'Success':
    raise SystemExit('Login failed: ' + str(login))

Fetch a CSRF token for the edit session.

csrf = S.get('https://example.org/w/api.php', params={
    'action':'query','meta':'tokens','format':'json'}).json()['query']['tokens']['csrftoken']

Generate the target list. Example: all pages in Category:Outdated.

pages = []
cmcontinue = None
while True:
    params = {
        'action':'query','list':'categorymembers','cmtitle':'Category:Outdated',
        'cmlimit':'500','format':'json'
    }
    # Resume from where the previous batch stopped
    if cmcontinue:
        params['cmcontinue'] = cmcontinue
    resp = S.get('https://example.org/w/api.php', params=params).json()
    pages.extend([p['title'] for p in resp['query']['categorymembers']])
    if 'continue' not in resp:
        break
    cmcontinue = resp['continue']['cmcontinue']

Iterate and edit. Typical edit: prepend a notice.

for title in pages:
    # 1. Get current revision, its timestamp, and the server time (helps avoid conflicts)
    rev = S.get('https://example.org/w/api.php', params={
        'action':'query','prop':'revisions','titles':title,
        'rvprop':'content|timestamp','curtimestamp':'1',
        'formatversion':'2','format':'json'}).json()
    page = rev['query']['pages'][0]
    old_text = page['revisions'][0]['content']
    base_ts = page['revisions'][0]['timestamp']
    # 2. Build new content (prepend a template)
    new_text = "{{Outdated notice}}\n" + old_text
    # 3. Send the edit request (token goes last)
    edit_params = {
        'action':'edit','title':title,'text':new_text,'summary':'Add outdated notice',
        'basetimestamp':base_ts,'starttimestamp':rev['curtimestamp'],
        'format':'json','token':csrf}
    edit_resp = S.post('https://example.org/w/api.php', data=edit_params).json()
    if edit_resp.get('edit',{}).get('result') != 'Success':
        print('Failed on', title, edit_resp)
    else:
        print('Edited', title)
    # optional: short sleep to stay under rate limits
    time.sleep(0.2)

Using the edit module efficiently

The API:Edit module accepts several parameters that make bulk operations smoother:

  • appendtext / prependtext – avoid sending the whole page when you only need to add content.
  • bot – mark the edit as a bot edit (requires the bot right).
  • minor – optionally flag the edit as minor.
  • maxlag – tell the server you are willing to wait if replication lag is high.
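
As a rough sketch of how these parameters combine, the helper below builds the POST payload for a prepend-only edit. The function name build_prepend_params, its defaults, and the edit summary are illustrative assumptions; only the parameter names (prependtext, bot, minor, maxlag, token) come from the edit module itself. Boolean API parameters count as set whenever they are present, so the helper leaves them out entirely when they are not wanted.

```python
def build_prepend_params(title, notice, token, *, bot=True, minor=False, maxlag=5):
    """Build an action=edit payload that prepends text instead of resending the page.

    Hypothetical helper for illustration; pass the result as `data=` to a POST.
    """
    params = {
        'action': 'edit',
        'title': title,
        'prependtext': notice,          # only the added text travels over the wire
        'summary': 'Add outdated notice',
        'maxlag': str(maxlag),
        'format': 'json',
    }
    # A boolean parameter is true as soon as it is present, whatever its value,
    # so include it only when the flag is wanted.
    if bot:
        params['bot'] = '1'
    if minor:
        params['minor'] = '1'
    # The API documentation recommends sending the token after the text, ideally last.
    params['token'] = token
    return params
```

With a logged-in requests session, the payload would be sent with something like S.post(api_url, data=build_prepend_params(title, '{{Outdated notice}}\n', csrf)).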

Client‑side regex bulk edit – the MassEditRegex extension

If you have control over the wiki, the MassEditRegex extension provides a graphical “Special:MassEditRegex” page. It runs the regular‑expression replacement on the client side, so the PHP execution timeout is avoided. The workflow is:

  1. Grant the masseditregex right to a user group (often sysop).
  2. Navigate to Special:MassEditRegex.
  3. Enter a page source selector (category, prefix, etc.) and the search/replace regex.
  4. Run – the extension will apply the change to every matching page, recording each edit as a bot edit.

While convenient, MassEditRegex is limited to simple regexes and cannot handle complex conditional logic that a full script can.
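
To illustrate the kind of conditional logic a script allows, here is a minimal sketch of a transformation function that decides per page whether to edit at all. The function name transform and the two skip rules (redirects, pages that already carry the notice) are assumptions chosen for the example, and the redirect check only recognises the English #REDIRECT keyword.

```python
import re

NOTICE = '{{Outdated notice}}'

def transform(text):
    """Return the new wikitext, or None when the page should be left untouched."""
    # Rule 1: never touch redirects (English keyword only in this sketch).
    if re.match(r'\s*#REDIRECT', text, flags=re.IGNORECASE):
        return None
    # Rule 2: skip pages that already include the notice, with or without parameters.
    if re.search(r'\{\{\s*Outdated notice\s*(\||\}\})', text, flags=re.IGNORECASE):
        return None
    return NOTICE + '\n' + text
```

A bulk-edit loop can call transform(old_text) and simply move on to the next title when it returns None.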

Handling edit conflicts

When many pages are edited in a short window, two processes may try to edit the same page. The API returns editconflict. A robust script should:

  • Fetch the latest revision timestamp before each edit (as shown in the example).
  • Retry the edit a few times with the new basetimestamp and starttimestamp.
  • Log failures for manual review.
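
The three points above can be sketched as a small retry wrapper. This is one possible shape under stated assumptions, not a definitive implementation: edit_with_retry and its fetch/save callables are hypothetical names, where fetch() re-reads the page and returns (text, basetimestamp, starttimestamp) and save() posts the edit and returns the decoded API response.

```python
import time

def edit_with_retry(fetch, save, transform, max_attempts=3, delay=1.0):
    """Re-fetch and retry an edit when the API reports an edit conflict.

    fetch() -> (text, base_ts, start_ts)
    save(new_text, base_ts, start_ts) -> decoded API response (dict)
    Returns the last API response so the caller can log it.
    """
    resp = {}
    for attempt in range(1, max_attempts + 1):
        # Always start from the latest revision and fresh timestamps.
        text, base_ts, start_ts = fetch()
        resp = save(transform(text), base_ts, start_ts)
        if resp.get('edit', {}).get('result') == 'Success':
            return resp
        if resp.get('error', {}).get('code') != 'editconflict':
            return resp  # a different failure: do not retry blindly
        if attempt < max_attempts:
            time.sleep(delay * attempt)  # wait a little longer each time
    return resp
```

If the returned response is still not a success after the last attempt, the caller should write it to the failure log for manual review.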

CAPTCHA and protected pages

Protected pages, and wikis that enable the ConfirmEdit extension, can block automated edits. Options:

  • Give the bot account the required rights (e.g., editprotected).
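  • If a CAPTCHA is required, the edit fails and the API response describes the challenge (an ID plus a question or image URL). The script must present the challenge to a human, or use an OCR service (not recommended for production), and then resubmit the edit with the captchaid and captchaword parameters.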
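
A minimal sketch of the detection and resubmission step follows. It assumes that ConfirmEdit reports the challenge as a captcha object inside the edit result, carrying an id field, which is my understanding of the usual response shape; check a real response from your wiki before relying on it. Both function names are hypothetical.

```python
def captcha_challenge(edit_resp):
    """Return the captcha object from an edit response, or None if none was issued.

    Assumes the challenge sits at edit_resp['edit']['captcha'].
    """
    return edit_resp.get('edit', {}).get('captcha')

def with_captcha_answer(edit_params, challenge, answer):
    """Copy the original edit payload and add the solved challenge."""
    retry = dict(edit_params)
    token = retry.pop('token', None)   # re-add last so the token stays at the end
    retry['captchaid'] = challenge['id']
    retry['captchaword'] = answer
    if token is not None:
        retry['token'] = token
    return retry
```

The answer itself has to come from a person; the script only carries it back to the API in the resubmitted request.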

Rate limiting and polite automation

MediaWiki honours the maxlag parameter when you send it and can enforce per‑user and per‑IP edit rate limits. Good practice:

  • Include maxlag=5 in every request.
  • Throttle requests (e.g., 5 edits per second) with time.sleep().
  • Monitor the Retry-After header when you receive a 429 response.
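
One way to act on these signals is a small function that decides how long to pause before retrying. This is a sketch under assumptions: wait_seconds is a hypothetical name, the 5-second default is arbitrary, and the code treats either an HTTP 429 status or an API error with code maxlag as a request to back off.

```python
def wait_seconds(status_code, headers, body, default=5.0):
    """Return how many seconds to pause before retrying; 0.0 means carry on.

    status_code: HTTP status of the response
    headers:     response headers (any mapping with .get)
    body:        decoded JSON body of the API response
    """
    is_maxlag = body.get('error', {}).get('code') == 'maxlag'
    if status_code != 429 and not is_maxlag:
        return 0.0
    try:
        # Retry-After is normally a number of seconds.
        return max(float(headers.get('Retry-After', default)), 0.0)
    except ValueError:
        # It may also be an HTTP date; this sketch just falls back to the default.
        return float(default)
```

In a loop this would be used as pause = wait_seconds(r.status_code, r.headers, data), followed by time.sleep(pause) and a retry of the same request whenever pause is greater than zero.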

Alternative approaches

  • REST API – MediaWiki’s Core REST API (see Core REST API – edit page) lets you send JSON payloads. It is useful for cross‑project bots and supports the same token flow.
  • Pywikibot – a mature Python library that wraps the Action API, handles login, token management, and pagination out of the box. It also ships with a replace script that works like MassEditRegex but runs locally.
  • MediaWiki JS (mw.Api) – for client‑side bots that run inside a logged‑in user’s browser. Use new mw.Api().edit(), which obtains the csrf token for you, or postWithToken('csrf', params) for other write actions.

Putting it all together – a minimal bulk‑edit script

#!/usr/bin/env python3
import time, requests

API = 'https://example.org/w/api.php'
USERNAME = 'BotUser'
PASSWORD = 'BotPass'

session = requests.Session()
# 1. login (abort if the wiki rejects the credentials)
login_token = session.get(API, params={'action':'query','meta':'tokens','type':'login','format':'json'}).json()['query']['tokens']['logintoken']
login = session.post(API, data={'action':'login','lgname':USERNAME,'lgpassword':PASSWORD,'lgtoken':login_token,'format':'json'}).json()
if login['login']['result'] != 'Success':
    raise SystemExit('Login failed: ' + str(login))
# 2. csrf token
csrf = session.get(API, params={'action':'query','meta':'tokens','format':'json'}).json()['query']['tokens']['csrftoken']
# 3. pages to edit – all pages with prefix "Template:Old"
pages = []
apcontinue = None
while True:
    params = {'action':'query','list':'allpages','apnamespace':10,'apprefix':'Old','aplimit':'500','format':'json'}
    if apcontinue:
        params['apcontinue'] = apcontinue
    resp = session.get(API, params=params).json()
    pages.extend([p['title'] for p in resp['query']['allpages']])
    if 'continue' not in resp:
        break
    apcontinue = resp['continue']['apcontinue']
# 4. bulk edit – replace "OldTemplate" with "NewTemplate"
for title in pages:
    # fetch current content plus the timestamps used for conflict detection
    rev = session.get(API, params={'action':'query','prop':'revisions','titles':title,'rvprop':'content|timestamp','curtimestamp':'1','formatversion':'2','format':'json'}).json()
    page = rev['query']['pages'][0]
    old = page['revisions'][0]['content']
    ts = page['revisions'][0]['timestamp']
    new = old.replace('OldTemplate', 'NewTemplate')
    if new == old:
        continue  # nothing to change on this page
    edit = session.post(API, data={
        'action':'edit','title':title,'text':new,'summary':'Rename OldTemplate → NewTemplate',
        'basetimestamp':ts,'starttimestamp':rev['curtimestamp'],
        'bot':'1','format':'json','token':csrf}).json()
    if edit.get('edit',{}).get('result') == 'Success':
        print('✔', title)
    else:
        print('✘', title, edit)
    time.sleep(0.3)

This script demonstrates the full cycle – login, token handling, page enumeration, conflict‑aware edit, and throttling. Adapt the selector, transformation, and summary to your own bulk‑edit task.

Best‑practice checklist

  • Use a dedicated bot account with bot and any required editprotected rights.
  • Store credentials securely (e.g., Special:BotPasswords).
  • Always fetch a fresh CSRF token before the edit batch.
  • Include basetimestamp and starttimestamp to minimise edit conflicts.
  • Respect maxlag and add a short delay between requests.
  • Log every response – successes, conflicts, and API errors – for auditability.
  • Test on a sandbox wiki before running on production.
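
For the logging item, one lightweight option is a JSON Lines audit file with one record per API response. The sketch below is only an illustration: classify, log_response, the three outcome labels, and the record layout are all assumptions made for the example, not something the API prescribes.

```python
import json
import time

def classify(resp):
    """Reduce an edit response to 'success', 'conflict', or 'error'."""
    if resp.get('edit', {}).get('result') == 'Success':
        return 'success'
    if resp.get('error', {}).get('code') == 'editconflict':
        return 'conflict'
    return 'error'

def log_response(log_file, title, resp):
    """Append one JSON line describing the outcome for `title`."""
    record = {
        'time': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
        'title': title,
        'outcome': classify(resp),
        'response': resp,   # keep the raw response for later review
    }
    log_file.write(json.dumps(record, ensure_ascii=False) + '\n')
    return record
```

Opening the file once with open('bulk-edit.log', 'a', encoding='utf-8') and calling log_response after every edit leaves a record that can be filtered by outcome afterwards.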

Conclusion

Automating bulk edits with the MediaWiki API is straightforward once you understand the authentication flow, token handling, and pagination. Whether you write a tiny Python script, leverage Pywikibot, or use the MassEditRegex extension, the API gives you the same low‑level control. By following the checklist above you can safely apply large‑scale changes while staying within the wiki’s rate limits and preserving edit history integrity.
