User:BDavis (WMF)/Notes/Finding Files To Purge
Trying to help with bugzilla:54647.
Some unspecified network error caused HTCP purge messages to be lost for an extended period of time. Once the network issue was resolved, we needed to purge a lot of potentially stale pages from the Varnish cache layer.
Figuring out what to do
- Reedy knew that the purgeList.php maintenance script could be used to send the purges (a possible invocation is sketched after this list)
- Nik knew where to find a query that would grab all pages changed in a date range
- I volunteered to try and bring the two together.
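For the record, the purge step itself could look something like this. This is only a sketch, assuming purgeList.php accepts one fully-qualified URL per line on stdin (enwiki-misses.txt is the file produced in the sections below):
# Hedged example: uses the mwscript pattern seen later on this page; verify
# that purgeList.php really does read URLs from stdin before relying on it.
mwscript purgeList.php enwiki < enwiki-misses.txt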
Finding the pages
Nik's SQL query came in the form of PHP code from CirrusSearch. I took that and put it into phpsh to get the generated SQL.
$minUpdate = new MWTimestamp( '2013-09-22T00:00:00Z' );
$maxUpdate = new MWTimestamp( '2013-09-26T00:00:00Z' );
$dbr = wfGetDB( DB_SLAVE );
$minId = $dbr->addQuotes( '-1' );
$minUpdate = $dbr->addQuotes( $dbr->timestamp( $minUpdate ) );
$maxUpdate = $dbr->addQuotes( $dbr->timestamp( $maxUpdate ) );
$res = $dbr->selectSQLText(
    array( 'page', 'revision' ),
    array_merge(
        array( 'page_touched', 'page_counter', 'page_restrictions' ),
        Revision::selectPageFields(),
        Revision::selectFields()
    ),
    'page_id = rev_page' .
        ' AND rev_id = page_latest' .
        " AND ( ( $minUpdate = rev_timestamp AND $minId < page_id ) OR $minUpdate < rev_timestamp )" .
        " AND rev_timestamp <= $maxUpdate",
    // Note that redirects are allowed here so we can pick up redirects made during search downtime
    'INTERACTIVE',
    array( 'ORDER BY' => 'rev_timestamp, rev_page' )
);
The resulting SQL was:
SELECT page_touched, page_counter, page_restrictions, page_namespace, page_title,
    page_id, page_latest, page_is_redirect, page_len, rev_id, rev_page, rev_text_id,
    rev_timestamp, rev_comment, rev_user_text, rev_user, rev_minor_edit, rev_deleted,
    rev_len, rev_parent_id, rev_sha1, rev_content_format, rev_content_model
FROM `page`, `revision`
WHERE page_id = rev_page
    AND rev_id = page_latest
    AND ( ( '20130922000000' = rev_timestamp AND '-1' < page_id )
        OR '20130922000000' < rev_timestamp )
    AND rev_timestamp <= '20130926000000'
ORDER BY rev_timestamp, rev_page
This was TMI, but it led me to formulate a simpler version that was "good enough for now". I ran that on Tool Labs against the enwiki and dewiki mirrors:
ssh tools-login.wmflabs.org
become bd808-test
mysql --defaults-file="${HOME}/replica.my.cnf" -h enwiki.labsdb enwiki_p -e "SELECT page_title FROM page, revision WHERE page_id = rev_page AND rev_id = page_latest AND page_namespace = 0 AND ( ( '20130922000000' = rev_timestamp AND '-1' < page_id ) OR '20130922000000' < rev_timestamp ) AND rev_timestamp <= '20130926000000' ;" > enwiki-misses.txt
mysql --defaults-file="${HOME}/replica.my.cnf" -h dewiki.labsdb dewiki_p -e "SELECT page_title FROM page, revision WHERE page_id = rev_page AND rev_id = page_latest AND page_namespace = 0 AND ( ( '20130922000000' = rev_timestamp AND '-1' < page_id ) OR '20130922000000' < rev_timestamp ) AND rev_timestamp <= '20130926000000' ;" > dewiki-misses.txt
Massaging the data
Now that I had dumps of page titles, I needed to turn them into full URLs for Reedy to use for the purges.
First, URL-encode the titles:
perl -pi.raw -e 'chomp();s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;print "\n"' enwiki-misses.txt dewiki-misses.txt
Then turn them into full URLs and make sure there aren't any dupes:
perl -pi.enc -e 's@^@http://de.wikipedia.org/@' dewiki-misses.txt
mv dewiki-misses.txt dewiki-misses.txt.url
sort -u dewiki-misses.txt.url > dewiki-misses.txt
perl -pi.enc -e 's@^@http://en.wikipedia.org/@' enwiki-misses.txt
mv enwiki-misses.txt enwiki-misses.txt.url
sort -u enwiki-misses.txt.url > enwiki-misses.txt
Now What?
After a lot of futzing around I scp'd the files to a web-accessible location where Reedy picked them up and applied them.
Here's the start of a script that may help with this the next time it happens.
#!/usr/bin/env bash
# Find all wiki pages changed between a given start and end date.
# TODO:
# - Add command line flags
# - Add namespace support

LANG=${1:-en}
WIKI=${LANG}wiki
START_TS=20130922000000
END_TS=20130926000000

# NOTE: only gets changes to namespace 0 (Main)
mysql --defaults-file="${HOME}/replica.my.cnf" -h ${WIKI}.labsdb \
  ${WIKI}_p --skip-column-names -e "
  SELECT page_title
  FROM page, revision
  WHERE page_id = rev_page
  AND rev_id = page_latest
  AND page_namespace = 0
  AND ( ( '${START_TS}' = rev_timestamp AND '-1' < page_id )
    OR '${START_TS}' < rev_timestamp )
  AND rev_timestamp <= '${END_TS}'
  ;" |
perl -e 'while (<STDIN>) {
  next if (/^\s+$/);
  chomp;
  s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
  print $_, "\n";
}' |
perl -p -e "s@^@http://${LANG}.wikipedia.org/wiki/@" |
sort -u > ${WIKI}-changed.txt

# vim:sw=2 ts=2 sts=2 et:
Maintenance Scripts
purgeChangedPages
Tim suggested that I go ahead and make a real maintenance script for this. See gerrit:86883 for progress on the new purgeChangedPages script.
Once the maintenance script is approved, somebody (maybe even me) will use it to send purges to esams for all wikis.
Testing
Tested in beta to see if it really does what it is supposed to do.
ssh deployment-bastion.pmtpa.wmflabs
- Check that script runs at all:
mwscript purgeChangedPages.php enwiki --starttime 20130922000000 --endtime 20130926000000 -v --dry-run
- manually examined the generated list of changed pages
- Check that PURGE messages are received by Varnish:
mwscript purgeChangedPages.php enwiki --starttime 20130922000000 --endtime 20130926000000 -v
- watched PURGE requests come in on deployment-cache-text1.pmtpa.wmflabs with
varnishncsa -n frontend|grep PURGE
- Check that using the --htcp-dest flag does something useful (a rough stand-in for htcpsnoop is sketched after this list):
- started htcpsnoop to listen for HTCP packets at 127.0.0.1:31337 on deployment-bastion.pmtpa.wmflabs
mwscript purgeChangedPages.php enwiki --starttime 20130922000000 --endtime 20130926000000 -v --htcp-dest=127.0.0.1:31337
- watched HTCP arrive on python daemon
- verified that no PURGE requests came in on deployment-cache-text1.pmtpa.wmflabs at the same time
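For completeness, htcpsnoop is just a small UDP listener; something along these lines (a rough stand-in, not the actual Python daemon) is enough to confirm that packets are arriving:
# Rough stand-in for the htcpsnoop daemon: listen on UDP 31337 and dump
# whatever arrives (HTCP payloads are binary, hence the hexdump).
# Flag syntax varies by netcat flavor; with BSD netcat use: nc -u -l 31337
nc -u -l -p 31337 | hexdump -C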
Production purge
- Done Find out when script will land on terbium: Thursday 2013-10-10
- Done terbium:/usr/local/apache/common/php-1.22wmf21/maintenance/purgeChangedPages.php
- Done terbium:/usr/local/apache/common/php-1.22wmf20/maintenance/purgeChangedPages.php
- Done Get IP address of hooft to target HTCP packets
- 91.198.174.113:4827
- Ori said that he verified that UDP packets can traverse to there from terbium
- Done Schedule deployment window with Greg
- Done Announce intentions to ops-l
- Done Get updated patch with rate limit approved
- Done Get updated patch with rate limit pushed to terbium
- Done Wait... for... deployment... window...
- Done
ssh terbium
- Done Purge all the things!
while read wiki; do
  echo "== ${wiki} =="
  LOG="purge-${wiki}-$(date +%Y%m%dT%H%M).log"
  date >${LOG}
  mwscript purgeChangedPages.php $wiki \
    --starttime 20130922000000 \
    --endtime 20130926000000 \
    --htcp-dest 91.198.174.113:4827 \
    --sleep-per-batch 500 \
    --verbose | tee -a ${LOG}
done </usr/local/apache/common/all.dblist
Purge post-mortem
- Started at 2013-10-16T20:01Z
- Brandon provided Ops support
- Brandon discovered that traffic was not arriving at esams as expected
- Changed HTCP traffic target to 91.198.174.106 (nescio)
- Still couldn't get packets into esams
- Mark pointed out that terbium has no route to external addresses, at about the same time that Brandon figured that out
- Mark suggested directing traffic to the eqiad UDP relay host dobson @ 208.80.152.173
- Discussed the rate of sending packets and the possibility of overflow
- Decided to reduce batch size from 100 to 10 and adjust delay to 50ms
- Restarted at 2013-10-16T21:08Z
- Brandon became concerned by IOWAIT on cp3xxx
- Stopped run in the middle of commonswiki
- Brandon investigated historical IOWAIT and decided it was a daily trend
- Decided that we should add an additional delay just in case
- I changed the runner script to make it possible to skip the wikis that we had already purged
- Restarted at 2013-10-16T21:27Z with 100ms delay and skipping wikis before commonswiki
- Process continued to run with no further interruption until completion at 2013-10-17T02:43Z
- Approximately 1.6M purges were sent
Final driver script used:
#!/usr/bin/env bash
set -o nounset
set -o errexit
LOGDIR="./run-$(date +%Y%m%dT%H%M)"
mkdir ${LOGDIR}
#hooft: RELAY=91.198.174.113
# dobson
RELAY=208.80.152.173
START_AT=commonswiki
while read wiki; do
if [[ ${wiki} < ${START_AT} ]]; then
echo "> SKIPPING ${wiki}"
continue
fi
echo "== ${wiki} =="
LOG="${LOGDIR}/${wiki}.log"
date >${LOG}
mwscript purgeChangedPages.php $wiki \
--starttime 20130922000000 \
--endtime 20130926000000 \
--htcp-dest ${RELAY}:4827 \
--batch-size 10 \
--sleep-per-batch 100 \
--verbose |
tee -a ${LOG}
date >>${LOG}
done </usr/local/apache/common/all.dblist
purgeChangedFiles
Faidon and Aaron volunteered me to create a similar script for Files. After discussion with Aaron, I think the plan is to rename the existing purgeDeletedFiles maintenance script to purgeChangedFiles and add some new command line switches (a possible invocation is sketched after the list):
- --type <created|modified|deleted|all>
- Types of file changes to send purges for
- --htcp-dest <IP:port>
- HTCP announcement destination (IP:port)
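A possible invocation, once the script exists, might look like the following. This is purely hypothetical: only --type and --htcp-dest are planned above, and the --starttime/--endtime flags are assumed to carry over from purgeChangedPages.
# Hypothetical invocation of the not-yet-written purgeChangedFiles script.
# --starttime/--endtime are assumed to behave as they do in purgeChangedPages.
mwscript purgeChangedFiles.php commonswiki \
  --starttime 20130922000000 \
  --endtime 20130926000000 \
  --type modified \
  --htcp-dest 208.80.152.173:4827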
The --type flag will control which log messages are examined:
- created
- upload/upload, import/upload, import/interwiki
- modified
- upload/overwrite, upload/revert, move/move, move/move_redir
- deleted
- delete/delete, delete/revision, suppress/delete, suppress/revision
The move/* logs will need special handling to purge not only the original page but also the move destination.
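To get a feel for where those titles would come from, here is a rough sketch of the kind of logging-table query involved, written against the Tool Labs replicas in the same style as the page queries above. The column names are those of the standard MediaWiki logging table; the real script will do this in PHP, and move/* events would additionally need the destination title extracted from log_params.
# Sketch only: File: pages (namespace 6) touched by "modified"-type log
# events in the purge window, using the same replica setup as above.
mysql --defaults-file="${HOME}/replica.my.cnf" -h enwiki.labsdb enwiki_p \
  --skip-column-names -e "
  SELECT log_title
  FROM logging
  WHERE log_namespace = 6
  AND ( ( log_type = 'upload' AND log_action IN ('overwrite', 'revert') )
    OR ( log_type = 'move' AND log_action IN ('move', 'move_redir') ) )
  AND log_timestamp BETWEEN '20130922000000' AND '20130926000000'
  ;" | sort -u > enwiki-changed-files.txt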