Using the MediaWiki API to Automate Bulk Edits
Why automate bulk edits?
Large wikis often need repetitive changes – adding a template, fixing a typo across hundreds of pages, or updating category membership after a policy shift. Doing this manually is error‑prone and time‑consuming. The MediaWiki Action API gives programmatic access to read and write operations, allowing a script or bot to apply the same change to many pages in a controlled, repeatable way.
Core concepts
- Authentication – Bots must log in with a bot password (see API:Login) and keep the session cookies.
- CSRF token – Every write request needs a csrftoken obtained from API:Tokens.
- Page list – Use query modules (list=allpages, generator=categorymembers, etc.) to build the set of pages you will edit.
- Rate limits & edit conflicts – Respect maxlag, use basetimestamp/starttimestamp, and handle editconflict errors gracefully.
Step‑by‑step workflow
1. Log in and store cookies.
    import time
    import requests
    S = requests.Session()
    login_token = S.get('https://example.org/w/api.php', params={
        'action':'query','meta':'tokens','type':'login','format':'json'}).json()['query']['tokens']['logintoken']
    login = S.post('https://example.org/w/api.php', data={
        'action':'login','lgname':'BotUser','lgpassword':'BotPass','lgtoken':login_token,'format':'json'}).json()
    # Stop early if the credentials were rejected
    if login['login']['result'] != 'Success':
        raise SystemExit('Login failed: %s' % login['login'])
2. Fetch a CSRF token for the edit session.
    csrf = S.get('https://example.org/w/api.php', params={
        'action':'query','meta':'tokens','format':'json'}).json()['query']['tokens']['csrftoken']
3. Generate the target list. Example: all pages in Category:Outdated.
    pages = []
    cmcontinue = None
    while True:
        params = {
            'action':'query','list':'categorymembers','cmtitle':'Category:Outdated',
            'cmlimit':'500','format':'json'
        }
        # Resume from where the previous batch stopped
        if cmcontinue:
            params['cmcontinue'] = cmcontinue
        resp = S.get('https://example.org/w/api.php', params=params).json()
        pages.extend([p['title'] for p in resp['query']['categorymembers']])
        if 'continue' not in resp:
            break
        cmcontinue = resp['continue']['cmcontinue']
4. Iterate and edit. Typical edit: prepend a notice.
    for title in pages:
        # 1. Get current revision and timestamp (helps avoid conflicts);
        #    curtimestamp=1 makes the server report its current time
        rev = S.get('https://example.org/w/api.php', params={
            'action':'query','prop':'revisions','titles':title,
            'rvprop':'content|timestamp','curtimestamp':'1',
            'formatversion':'2','format':'json'}).json()
        page = rev['query']['pages'][0]
        old_text = page['revisions'][0]['content']
        base_ts = page['revisions'][0]['timestamp']
        # 2. Build new content (prepend a template)
        new_text = "{{Outdated notice}}\n" + old_text
        # 3. Send the edit request
        edit_params = {
            'action':'edit','title':title,'text':new_text,'summary':'Add outdated notice',
            'basetimestamp':base_ts,'starttimestamp':rev['curtimestamp'],
            'token':csrf,'format':'json'}
        edit_resp = S.post('https://example.org/w/api.php', data=edit_params).json()
        if edit_resp.get('edit',{}).get('result') != 'Success':
            print('Failed on', title, edit_resp)
        else:
            print('Edited', title)
        # optional: short sleep to stay under rate limits
        time.sleep(0.2)
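The continuation loop in the page-list step can be factored into a reusable generator. The following is a minimal sketch, not the only way to do it: the helper names `query_all` and `category_titles` and the caller-supplied `fetch` callable are assumptions made for this example. It relies on the Action API returning a `continue` object whose keys are merged into the next request.

```python
from typing import Callable, Dict, Iterator, List


def query_all(fetch: Callable[[Dict], Dict], params: Dict) -> Iterator[Dict]:
    """Yield every response batch of a query, following `continue` blocks.

    `fetch` is a hypothetical callable that sends `params` to the API
    and returns the decoded JSON response as a dict.
    """
    request = dict(params)
    while True:
        resp = fetch(request)
        yield resp
        if 'continue' not in resp:
            break
        # Merge all continuation keys (cmcontinue, apcontinue, continue, ...)
        request = {**params, **resp['continue']}


def category_titles(fetch: Callable[[Dict], Dict], category: str) -> List[str]:
    """Collect the titles of all members of one category."""
    params = {'action': 'query', 'list': 'categorymembers',
              'cmtitle': category, 'cmlimit': '500', 'format': 'json'}
    titles = []
    for batch in query_all(fetch, params):
        titles.extend(p['title'] for p in batch['query']['categorymembers'])
    return titles
```

With the session `S` from the login step, `fetch` could be `lambda p: S.get('https://example.org/w/api.php', params=p).json()`. Passing the transport in as a callable also makes the loop easy to exercise against canned responses before touching a live wiki.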
Using the edit module efficiently
The API:Edit module accepts several parameters that make bulk operations smoother:
- appendtext/prependtext – avoid sending the whole page when you only need to add content.
- bot – mark the edit as a bot edit (requires the bot right).
- minor – optionally flag the edit as minor.
- maxlag – accepted on any API request; when replication lag exceeds the given number of seconds, the server rejects the request with a maxlag error so the client can wait and retry.
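As an illustration of these parameters, the sketch below builds the POST fields for an edit that only prepends a notice. The function name `build_prepend_edit`, its arguments, and the placeholder token are assumptions made for this example; the field names are the ones listed above.

```python
def build_prepend_edit(title, notice, token, summary, maxlag=5):
    """Return the POST fields for an edit that only prepends `notice`."""
    return {
        'action': 'edit',
        'title': title,
        'prependtext': notice + '\n',  # placed before the existing page text
        'summary': summary,
        'bot': '1',      # boolean flags count as set whenever they are present
        'minor': '1',
        'maxlag': str(maxlag),
        'token': token,
        'format': 'json',
    }


# 'TOKEN+\\' is a placeholder, not a real CSRF token
fields = build_prepend_edit('Sandbox', '{{Outdated notice}}', 'TOKEN+\\',
                            'Add outdated notice')
print(sorted(fields))
```

Because no `text` field is sent, the script does not need to download the page first; the resulting dictionary would be passed as `data=` to the session's `post` call.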
Client‑side regex bulk edit – the MassEditRegex extension
If you have control over the wiki, the MassEditRegex extension provides a graphical “Special:MassEditRegex” page. It runs the regular‑expression replacement on the client side, so the PHP execution timeout is avoided. The workflow is:
- Grant the masseditregex right to a user group (often sysop).
- Navigate to Special:MassEditRegex.
- Enter a page source selector (category, prefix, etc.) and the search/replace regex.
- Run – the extension will apply the change to every matching page, recording each edit as a bot edit.
While convenient, MassEditRegex is limited to simple regexes and cannot handle complex conditional logic that a full script can.
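To illustrate the kind of conditional logic that is easier to express in a script, the sketch below applies a regex replacement only when a page has not already been migrated and is not a redirect. The function name `transform`, the template names, and the skip rules are hypothetical; only Python's standard `re` module is used.

```python
import re

# Matches {{OldTemplate}} or {{OldTemplate|args}}, capturing any arguments
OLD = re.compile(r'\{\{\s*OldTemplate\s*(\|[^{}]*)?\}\}')


def transform(text):
    """Return the new wikitext, or None when the page should be left alone."""
    if '{{NewTemplate' in text:
        return None                      # already migrated: skip
    if text.lstrip().upper().startswith('#REDIRECT'):
        return None                      # never rewrite redirects
    new_text, count = OLD.subn(
        lambda m: '{{NewTemplate%s}}' % (m.group(1) or ''), text)
    return new_text if count else None   # None also means "no match"
```

Returning `None` for "leave alone" lets the calling loop skip the edit request entirely instead of saving an unchanged page.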
Handling edit conflicts
When many pages are edited in a short window, two processes may try to edit the same page. If the page changed after the revision named in basetimestamp, the API returns an editconflict error. A robust script should:
- Fetch the latest revision timestamp before each edit (as shown in the example).
- On a conflict, re-fetch the page, reapply the change, and retry a few times with the new basetimestamp and starttimestamp.
- Log failures for manual review.
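The retry steps above can be sketched as a small loop. This is an illustration under stated assumptions: `edit_with_retry`, `fetch_page`, and `send_edit` are hypothetical names, and the two callables stand in for the query and edit requests shown earlier.

```python
def edit_with_retry(fetch_page, send_edit, transform, title, attempts=3):
    """Re-fetch, re-apply `transform`, and resend when an edit conflict occurs.

    `fetch_page(title)` -> (text, base_ts, start_ts) and
    `send_edit(title, text, base_ts, start_ts)` -> decoded JSON response
    are hypothetical callables wrapping the HTTP requests.
    """
    for attempt in range(1, attempts + 1):
        text, base_ts, start_ts = fetch_page(title)
        resp = send_edit(title, transform(text), base_ts, start_ts)
        if resp.get('edit', {}).get('result') == 'Success':
            return True
        if resp.get('error', {}).get('code') != 'editconflict':
            print('Giving up on', title, resp)   # not a conflict: do not retry
            return False
        print('Conflict on', title, 'attempt', attempt)
    return False   # still conflicting after all attempts: log for review
```

Re-running the transformation on the freshly fetched text, rather than resending the old result, is what keeps the other editor's change intact.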
CAPTCHA and protected pages
Protected pages, and wikis that enable the ConfirmEdit extension, will block automated edits. Options:
- Give the bot account the required rights (e.g., editprotected for protected pages, skipcaptcha for ConfirmEdit).
- If a CAPTCHA is required, the edit fails and the response carries a captcha object describing the challenge; the script must present it to a human and resubmit the edit with the captchaid and captchaword parameters, or use an OCR service (not recommended for production).
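A script can separate these cases by inspecting the decoded response before deciding what to do next. The sketch below is a rough classifier, assuming that a CAPTCHA challenge appears as a `captcha` key inside the `edit` object and that protection failures use the `protectedpage` or `cascadeprotected` error codes; the function name and the outcome labels are hypothetical, and the list of codes is not exhaustive.

```python
def classify_edit_response(resp):
    """Map a decoded action=edit response to a short outcome label."""
    edit = resp.get('edit', {})
    if edit.get('result') == 'Success':
        return 'ok'
    if 'captcha' in edit:
        return 'captcha'          # needs a human answer before resubmitting
    code = resp.get('error', {}).get('code', '')
    if code in ('protectedpage', 'cascadeprotected'):
        return 'protected'        # the account lacks the required right
    return 'error:' + code if code else 'unknown'
```

Pages classified as `captcha` or `protected` are usually better queued for manual review than retried automatically.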
Rate limiting and polite automation
MediaWiki applies rate limits to edits (per user or per IP address) and honours the maxlag parameter. Good practice:
- Include maxlag=5 in every request.
- Throttle requests (e.g., at most 5 edits per second) with time.sleep().
- Monitor the Retry-After header when you receive a 429 response.
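One possible way to combine throttling with the Retry-After header is sketched below. The names `retry_delay` and `Throttle`, the 5-second fallback, and the injectable `clock` and `sleep` arguments are assumptions made for this example; only the standard library is used.

```python
import time


def retry_delay(status_code, headers, default=5.0):
    """Seconds to wait before retrying, or 0.0 if no wait is requested."""
    value = headers.get('Retry-After')
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            return default        # header held an HTTP date or junk: fall back
    return default if status_code == 429 else 0.0


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock, self.sleep = clock, sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

A loop would call `throttle.wait()` before each edit and, after a response `r` from `requests`, sleep for `retry_delay(r.status_code, r.headers)` before repeating the request when that value is non-zero.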
Alternative approaches
- REST API – MediaWiki’s Core REST API (see Core REST API – edit page) lets you send JSON payloads. It is useful for cross‑project bots and supports the same token flow.
- Pywikibot – a mature Python library that wraps the Action API, handles login, token management, and pagination out of the box. It also ships with a replace script that works like MassEditRegex but runs locally.
- MediaWiki JS (mw.Api) – for client‑side bots that run inside a logged‑in user’s browser. Use mw.Api().edit() with a csrf token.
Putting it all together – a minimal bulk‑edit script
    #!/usr/bin/env python3
    import time, requests
    API = 'https://example.org/w/api.php'
    USERNAME = 'BotUser'
    PASSWORD = 'BotPass'
    session = requests.Session()
    # 1. login
    login_token = session.get(API, params={'action':'query','meta':'tokens','type':'login','format':'json'}).json()['query']['tokens']['logintoken']
    session.post(API, data={'action':'login','lgname':USERNAME,'lgpassword':PASSWORD,'lgtoken':login_token,'format':'json'})
    # 2. csrf token
    csrf = session.get(API, params={'action':'query','meta':'tokens','format':'json'}).json()['query']['tokens']['csrftoken']
    # 3. pages to edit – all pages with prefix "Template:Old"
    pages = []
    apcontinue = None
    while True:
        params = {'action':'query','list':'allpages','apnamespace':10,'apprefix':'Old','aplimit':'500','format':'json'}
        if apcontinue:
            params['apcontinue'] = apcontinue
        resp = session.get(API, params=params).json()
        pages.extend([p['title'] for p in resp['query']['allpages']])
        if 'continue' not in resp:
            break
        apcontinue = resp['continue']['apcontinue']
    # 4. bulk edit – replace "OldTemplate" with "NewTemplate" in each page
    for title in pages:
        # fetch current content and its revision timestamp
        rev = session.get(API, params={'action':'query','prop':'revisions','titles':title,'rvprop':'content|timestamp','formatversion':'2','format':'json'}).json()
        page = rev['query']['pages'][0]
        old = page['revisions'][0]['content']
        ts = page['revisions'][0]['timestamp']
        new = old.replace('OldTemplate', 'NewTemplate')
        if new == old:
            continue  # nothing to change, so skip the edit request
        edit = session.post(API, data={
            'action':'edit','title':title,'text':new,'summary':'Rename OldTemplate → NewTemplate',
            'basetimestamp':ts,'token':csrf,'bot':True,'format':'json'}).json()
        if edit.get('edit',{}).get('result') == 'Success':
            print('✔', title)
        else:
            print('✘', title, edit)
        time.sleep(0.3)
This script demonstrates the full cycle – login, token handling, page enumeration, conflict‑aware edit, and throttling. Adapt the selector, transformation, and summary to your own bulk‑edit task.
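Before pointing a script like this at real pages, it can help to preview what each edit would change. The following dry-run sketch uses Python's standard `difflib`; the function name `preview` and the sample page title and text are made up for illustration.

```python
import difflib


def preview(title, old, new):
    """Return a unified diff of the proposed change (empty string if none)."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=title + ' (current)', tofile=title + ' (proposed)',
        lineterm='')
    return '\n'.join(diff)


# Hypothetical sample page, used only to show the output format
print(preview('Template:OldBox',
              'Intro\n{{OldTemplate}}\nEnd',
              'Intro\n{{NewTemplate}}\nEnd'))
```

In the bulk-edit loop, printing `preview(title, old, new)` in place of the POST request gives a reviewable record of the whole batch without saving anything.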
Best‑practice checklist
- Use a dedicated bot account with the bot right and, where required, editprotected.
- Create restricted credentials with Special:BotPasswords and store them outside the script (for example, in environment variables).
- Always fetch a fresh CSRF token before the edit batch.
- Include basetimestamp and starttimestamp so that concurrent changes surface as edit conflicts instead of being overwritten.
- Respect maxlag and add a short delay between requests.
- Log every response – successes, conflicts, and API errors – for auditability.
- Test on a sandbox wiki before running on production.
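For the logging item, one lightweight option is to append a JSON line per request. The sketch below assumes a hypothetical `log_response` helper and record layout, and uses an in-memory stream as a stand-in for a log file opened in append mode.

```python
import io
import json
import time


def log_response(stream, title, resp):
    """Append one JSON line describing the outcome of an edit request."""
    record = {
        'time': time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
        'title': title,
        'result': resp.get('edit', {}).get('result'),
        'error': resp.get('error', {}).get('code'),
    }
    stream.write(json.dumps(record, ensure_ascii=False) + '\n')
    return record


log = io.StringIO()   # stand-in for open('bulk-edit.log', 'a', encoding='utf-8')
log_response(log, 'Page A', {'edit': {'result': 'Success'}})
log_response(log, 'Page B', {'error': {'code': 'editconflict'}})
```

One record per line keeps the log easy to filter afterwards, for example to list every page that ended in a conflict.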
Conclusion
Automating bulk edits with the MediaWiki API is straightforward once you understand the authentication flow, token handling, and pagination. Whether you write a tiny Python script, leverage Pywikibot, or use the MassEditRegex extension, the same precautions apply. By following the checklist above you can safely apply large-scale changes while staying within the wiki’s rate limits and preserving edit history integrity.