Manual:Pywikibot/Cookbook/Working with your watchlist
Among the scripts there is a watchlist.py which deals with the watchlist of the bot. This does not sound too exciting. But if you have several thousand pages on your own watchlist, handling it by bot may sound appealing. To do this you have to run the bot with your own username rather than that of the bot. Either override it in user-config.py, or save the commands in a script and run it with
python pwb.py -user:MyUserAccount myscript
Loading your watchlist
The first tool we need is site.watched_pages(). This is a page generator, so you may process the pages with a for loop or turn it into a list. When you edit your watchlist on Wikipedia, talk pages do not appear. This is not the case for site.watched_pages()! It lists content pages as well as talk pages, which doubles the number. You may want to cut them.
Print the number of watched pages:
print(len(list(site.watched_pages())))
This may take a while as it goes over all the pages. For me it is 18235. Hmm, it's odd, in both senses. :-) How can a doubled number be odd? This reveals a technical error: decades ago I watched [[WP:AZ]], which was a redirect to a project page, but technically a page in the article namespace back then, with the talk page [[Vita:WP:AZ]]. Meanwhile WP: was turned into an alias for the Wikipedia namespace and got a new talk page, while the old one remained stuck and abandoned, causing this oddity.
If you don't have such a problem, then
watchlist = [page for page in site.watched_pages() if not page.isTalkPage()]
print(len(watchlist))
will do both tasks: turn the generator into a real list and throw away the talk pages, as dealing with them separately makes no sense. Note that site.watched_pages() takes a total=n argument if you want only the first n pages, but we want to process the whole watchlist now. You should get half of the previous number if everything goes well. For me it raises an error because of the stuck talk page mentioned above. (See phab:T331005.) I show it because it is rare and interesting:
pywikibot.exceptions.InvalidTitleError: The (non-)talk page of 'Vita:WP:AZ' is a valid title in another namespace.
So I have to write a loop for the same purpose:
watchlist = []
for page in site.watched_pages():
    try:
        if not page.isTalkPage():
            watchlist.append(page)
    except pywikibot.exceptions.InvalidTitleError:
        pass
print(len(watchlist))
Well, the previous one looked nicer. The number I get is 9109, which is not exactly (n-1)/2 of the previous one, but I won't bother with that for now.
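The loop above can be wrapped in a small helper that also remembers which pages were skipped, which may help to explain a mismatch in the numbers. This is only a sketch: split_watchlist, FakePage and FakeTitleError are hypothetical names invented for the illustration, not part of Pywikibot. With a real watchlist you would pass site.watched_pages() and pywikibot.exceptions.InvalidTitleError instead of the stand-ins.

```python
class FakeTitleError(Exception):
    """Stand-in for pywikibot.exceptions.InvalidTitleError (hypothetical)."""


class FakePage:
    """Minimal stand-in for a Page object, for demonstration only."""

    def __init__(self, title, talk=False, broken=False):
        self._title = title
        self._talk = talk
        self._broken = broken

    def title(self):
        return self._title

    def isTalkPage(self):
        # A "broken" page imitates the stuck talk page described above
        if self._broken:
            raise FakeTitleError(self._title)
        return self._talk


def split_watchlist(pages, error=Exception):
    """Return (content_pages, skipped); skipped holds pages that raised error."""
    content, skipped = [], []
    for page in pages:
        try:
            if not page.isTalkPage():
                content.append(page)
        except error:
            skipped.append(page)
    return content, skipped


sample = [
    FakePage('Budapest'),
    FakePage('Vita:Budapest', talk=True),
    FakePage('Vita:WP:AZ', broken=True),
]
content, skipped = split_watchlist(sample, error=FakeTitleError)
print([p.title() for p in content])
print([p.title() for p in skipped])
```

Collecting the skipped pages instead of silently passing over them shows which titles cause the trouble.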
UPDATE: Beginning with version 8.1.0, site.watched_pages() has a new with_talkpage parameter which defaults to True for backward compatibility. So
watchlist = site.watched_pages(with_talkpage=False)
yields the same pages as the list comprehension above. Note that it returns a generator; wrap it in list() if you need a real list that you can walk through several times.
Basic statistics
One way or another, we finally have a list of our watched pages. The first task is to create some statistics. I wonder how many pages are on my list by namespace. I wish I had the data in sqlite, but I don't. So a possible solution:
# Create a sorted list of unique namespace numbers in your watchlist
ns_numbers = sorted({page.namespace().id for page in watchlist})
# Create a dictionary of them with a default 0 value
stat = dict.fromkeys(ns_numbers, 0)
for page in watchlist:
    stat[page.namespace().id] += 1
print(stat)
{0: 1871, 2: 4298, 4: 1803, 6: 98, 8: 96, 10: 391, 12: 3, 14: 519, 90: 15, 100: 10, 828: 5}
There is another way if we borrow the technique from BasePage.contributors() (discussed in the Useful methods section). We just generate the namespace numbers and create a collections.Counter() object from them:
from collections import Counter
stat = Counter(p.namespace().id for p in watchlist)
print(stat)
Counter({2: 4298, 0: 1871, 4: 1803, 14: 519, 10: 391, 6: 98, 8: 96, 90: 15, 100: 10, 828: 5, 12: 3})
Counter is a subclass of dict, so it may be used as a dict. The difference from the previous result is that the printed form of a Counter lists the items in decreasing order of count.
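To make the two orderings explicit, here is a small sketch that uses only the standard library. The ns_ids list is made-up sample data standing in for the namespace numbers of a watchlist.

```python
from collections import Counter

# Hypothetical sample: namespace numbers of a tiny watchlist
ns_ids = [2, 0, 2, 4, 2, 0, 14]
stat = Counter(ns_ids)

# most_common() returns (namespace, count) pairs, highest count first
by_count = stat.most_common()
# sorted() on the items orders them by namespace number instead
by_namespace = sorted(stat.items())

print(by_count)
print(by_namespace)
# A missing namespace simply counts as 0 instead of raising KeyError
print(stat[12])
```

Use most_common() when you want the largest namespaces first, and sorted(stat.items()) when you want the same order as in the dict.fromkeys() solution.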
Selecting anon pages and unwatching by a pattern
The above statistics show that almost half of my watchlist consists of user pages because I patrol recent changes, welcome people and warn them if necessary. And it is often necessary. Now I focus on anons:
from pywikibot.tools import is_ip_address
anons = [page for page in watchlist
         if page.namespace().id == 2 and is_ip_address(page.title(with_ns=False))]
I could use the User instance's own method to determine whether they are anons without importing anything, but for that I would have to convert the pages to Users:
anons = [page for page in watchlist
         if page.namespace().id == 2
         and pywikibot.User(page).isAnonymous()]
Anyway, len(anons) shows that there are over 2000 of them.
IPv4 addresses starting with 195.199 belong to schools in Hungary. Earlier most of them were static, but nowadays they are dynamic and may belong to a different school every time, so there is no point in keeping them. For unwatching I will use Page.watch():
for page in anons:
    if page.title(with_ns=False).startswith('195.199'):
        print(page.watch(unwatch=True))
With print() in the last line I will also see a True for each successful unwatching. Without it the loop just unwatches. This loop will be slower than the previous one. On a repeated run it will write these Trues again because the watchlist is cached. To avoid this and refresh the watchlist, use site.watched_pages(force=True), which will always reload it.
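If the range is more complicated than a simple prefix, the standard ipaddress module may be an alternative to startswith(). The sketch below assumes that the school range is the whole 195.199.0.0/16 block; in_school_range and SCHOOL_NET are names made up for this illustration.

```python
import ipaddress

# Assumption: the school range is the whole 195.199.0.0/16 block
SCHOOL_NET = ipaddress.ip_network('195.199.0.0/16')


def in_school_range(title):
    """Return True if title is an IPv4 address inside SCHOOL_NET."""
    try:
        address = ipaddress.ip_address(title)
    except ValueError:
        return False  # not an IP address at all, e.g. a registered user name
    # An IPv6 address is never contained in an IPv4 network
    return address in SCHOOL_NET


for title in ['195.199.12.3', '195.19.9.1', 'ExampleUser', '2001:db8::1']:
    print(title, in_school_range(title))
```

In the unwatching loop the condition would then become in_school_range(page.title(with_ns=False)).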
Watching and unwatching a list of pages
So far we have dealt with pages one by one using page.watch(), which is a method of the Page object. But if we look into the code, we may discover that this method uses a method of APISite, namely site.watch().
Even more exciting, this method can handle complete lists at once, and even better, the list items may be strings. This means you don't have to create Page objects from them, just provide the titles. Furthermore, it supports other iterables such as generators, so page generators may be used directly.
To watch a lot of pages if you have the titles, just do this:
titles = [] # Write titles here or create a list by any method
site.watch(titles)
To unwatch a lot of pages if you already have Page objects:
pages = [] # Somehow create a list of pages
site.watch(pages, unwatch=True)
For use of page generators see the second example under #Red pages.
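If the titles come from pasted text or a file, they usually need some cleaning before they are passed to site.watch(). The following is a sketch with a hypothetical helper, clean_titles, which is not part of Pywikibot; it strips whitespace, drops empty lines and removes duplicates while keeping the order.

```python
def clean_titles(raw_text):
    """Split pasted text into unique, non-empty titles, keeping their order."""
    lines = (line.strip() for line in raw_text.splitlines())
    # dict.fromkeys() removes duplicates but preserves the first occurrences
    return list(dict.fromkeys(line for line in lines if line))


raw = """
Budapest
  Debrecen

Budapest
"""
titles = clean_titles(raw)
print(titles)
# With a logged-in site object you would continue with: site.watch(titles)
```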
Further ideas for existing pages
With Pywikibot you may easily watch or unwatch any number of pages if you can create a list or generator for them. Let your brain storm! Some patterns:
- Pages in a category
- Subpages of a page
- Pages following a title pattern
- Pages got from logs
- Pages created by a user
- Pages from a list page
- Pages often visited by a returning vandal whose known socks are in a category
- Pages based on Wikidata queries
API:Watch shows that the MediaWiki API offers further parameters such as expiry and built-in page generators. At the time of writing this article Pywikibot does not support them yet. Please hold on.
Red pages
Non-existing pages differ from existing ones in that we have to know their exact titles in advance to watch them.
Watch the yearly death articles in English Wikipedia for the next decade so that you see when they are created:
for i in range(2023, 2033):
    pywikibot.Page(site, f'Deaths in {i}').watch()
hu:Wikipédia:Érdekességek has "Did you know?" subpages in batches of two hundred. It has other subpages too, and you want to watch all these tables up to 10000, half of which are blue and half red. So follow the name pattern:
prefix = 'Wikipédia:Érdekességek/'
for i in range(1, 10000, 200):
    pywikibot.Page(site, f'{prefix}{i}–{i + 199}').watch()
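Before watching fifty pages it may be worth checking that the pattern produces the titles you expect. The sketch below generates the titles only, without touching the wiki; subpage_titles is a hypothetical helper written for this illustration.

```python
def subpage_titles(prefix, start, stop, step):
    """Yield titles like '<prefix>1–200', '<prefix>201–400', ..."""
    for i in range(start, stop, step):
        yield f'{prefix}{i}–{i + step - 1}'


titles = list(subpage_titles('Wikipédia:Érdekességek/', 1, 10000, 200))
print(len(titles))
print(titles[0])
print(titles[-1])
```

Since site.watch() accepts titles as strings, a list created this way could be handed over to it in one go instead of calling watch() on each page.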
While English Wikipedia tends to list only existing articles, in other Wikipedias list articles are meant to show all the relevant titles, whether blue or red. So the example is from Hungarian Wikipedia. Let's suppose you are interested in the history of the Umayyad rulers. hu:Omajjád uralkodók listája lists them, but the articles of the Córdoba branch are not yet written. You want to watch all of them and know when a new article is created. You notice that portals are also linked from the page, but you want to watch only the articles, so you use a wrapper generator to filter the links.
from pywikibot.pagegenerators import (
    LinkedPageGenerator,
    NamespaceFilterPageGenerator,
)
basepage = pywikibot.Page(site, 'Omajjád uralkodók listája')
site.watch(NamespaceFilterPageGenerator(LinkedPageGenerator(basepage), 0))
The list of ancient Greek rulers differs from the previous one: many year numbers are linked which are not to be watched. You exclude them by title pattern.
basepage = pywikibot.Page(site, 'Ókori görög uralkodók listája')
pages = [page for page in
         NamespaceFilterPageGenerator(LinkedPageGenerator(basepage), 0)
         if not page.title().startswith('Kr. e')]
site.watch(pages)
Or just watch the red pages in the list:
basepage = pywikibot.Page(site, 'Ókori görög uralkodók listája')
pages = [page for page in LinkedPageGenerator(basepage) if not page.exists()]
site.watch(pages)
In the first two examples we used standalone pages in a loop, then a page generator, then lists. They all work.
Follow your bot
You create or edit several pages by bot, and you want to watch them. Especially when you create new pages, these won't be watched by anybody but your bot, so they are exposed to vandalism without anyone noticing. That makes watching important. On the other hand, you may want to follow the useful edits by humans so that you can improve your bot.
Create the bot script with a page generator and conditional execution as described in the Beginning and ending section. Both the original bot and the watcher script will use the same generator. In this example you create the year articles from 2024 on for ten years.
Sample procedural mybot.py:
import pywikibot
site = pywikibot.Site()
# Any auxiliary functions here that are necessary for a more complicated generator
def gen():
    for i in range(2024, 2034):
        yield pywikibot.Page(site, str(i))
def process():
    for page in gen():
        page.text = 'Your text comes here'
        page.save('Bot: creating the page')
if __name__ == '__main__':
    process()
Of course, you don't want to overwrite existing pages, but we skipped that part here in order to show the main thing.
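One possible way to add the missing check is sketched here. It relies only on the exists(), text and save() members that Page objects offer; create_if_missing and the FakePage stand-in are hypothetical names used so that the example can run without a wiki.

```python
class FakePage:
    """Minimal stand-in for a Page object, for demonstration only."""

    def __init__(self, title, exists=False):
        self._title = title
        self._exists = exists
        self.text = ''
        self.saved = False

    def exists(self):
        return self._exists

    def save(self, summary):
        # A real Page would write to the wiki here
        self.saved = True
        self._exists = True


def create_if_missing(page, text, summary):
    """Save text only when the page does not exist yet; return True if saved."""
    if page.exists():
        return False
    page.text = text
    page.save(summary)
    return True


new_page = FakePage('2024')
old_page = FakePage('2025', exists=True)
print(create_if_missing(new_page, 'Your text comes here', 'Bot: creating the page'))
print(create_if_missing(old_page, 'Your text comes here', 'Bot: creating the page'))
```

In process() you would call such a helper with the real pages from gen() instead of setting text and saving unconditionally.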
Now create a watcher.py:
import pywikibot
import mybot
site = pywikibot.Site()
site.watch(mybot.gen())  # Note the parentheses
import mybot is preferable to from mybot import gen if you also need the auxiliary functions or variables that gen() relies on. Now run this script with
python pwb.py -user:MyUserAccount watcher
In an object-oriented example mybot.py would have a CreatorBot class, and you could invoke it like
import mybot
bot = mybot.CreatorBot()
site.watch(bot.gen()) # Note the parentheses
Summary
- For walking your watchlist use the site.watched_pages() generator function. Don't forget to use the -user global parameter if the user-config.py contains your bot's name.
- For watching and unwatching a single page use page.watch() and page.watch(unwatch=True).
- For watching and unwatching several pages at once, given as a list of titles, a list of page objects or a page generator, use site.watch(<iterable>).