Analytics/Wikimetrics/FAQ
This is the frequently asked questions page for Wikimetrics.
What is the project code?
The WMF has many different projects; see the site matrix for a complete overview. To construct the code:
- For Wikipedia it's just the language code (for example en)
- For Commons it's commons
- For chapters it's ....
- For mediawiki it's ....
- For wikibooks it's ....
- For wikidata it's ....
- For wikimania it's ....
- For wikinews it's ....
- For wikiquote it's ....
- For wikisource it's ....
- For wikispecies it's ....
- For wikiversity it's ....
- For wikivoyage it's ....
- For wiktionary it's ....
Where is the source code?
https://phabricator.wikimedia.org/diffusion/ANWM/ and https://github.com/wikimedia/analytics-wikimetrics
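If you want a local checkout, either repository can be cloned; for example, from GitHub:
git clone https://github.com/wikimedia/analytics-wikimetrics.git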
Where is the data coming from?
Wikimetrics uses the copy of the WMF databases at Wikimedia Cloud Services.
How do I add a new feature?
See Analytics/Wikimetrics/Adding_New_Features.
Developers: Troubleshooting
On vagrant: tests work fine when I run all of them but fail when I run just one. What is going on?
nosetests does not properly execute tests that are two levels deep from the main tests directory. Make sure your test is located only one level deep; for example, the following would be executed properly:
/vagrant/wikimetrics/tests/some-directory/your-test.py
But the following would not:
/vagrant/wikimetrics/tests/some-directory/some-deeper-directory/your-test.py
This is a bug in nosetests; it is similar, although not identical, to this one: https://code.google.com/p/python-nose/issues/detail?id=342
It looks like it was fixed in some Python 2.7.* release; we are running 2.7.3 in vagrant, and things seem to work in 2.7.5.
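To run a single test, you can point nosetests at the test file directly (using the placeholder path from above; your real file name will differ):
cd /vagrant/wikimetrics
nosetests tests/some-directory/your-test.py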
Tests just hang or fail due to queue issues, what do I do?
If a test hangs, there is likely some issue with the queue. Logging in Celery needs work on our side, but the easy remedy in your dev environment is to make Celery log to a file in /tmp/ instead of stdout.
Uncomment the following line in tests/__init__.py:
celery_out = open("/tmp/logCelery.txt", "w")
Tail the log and you should be able to see any errors that the queue is throwing.
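For example, in another terminal:
tail -f /tmp/logCelery.txt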
You can also write to the queue log while tests are running:
f = open('/tmp/logCelery.txt', 'a')
f.write(str(some_variable))  # some_variable is whatever you want to inspect
f.close()
When I run tests, errors show up on the console, but at the end it says "OK"
This is fine. Logging and nose get along like pirates and the English, so sometimes they fight and talk funny at each other. But all's well that ends well.
Pro tip: Don't run tests as root
Just sayin'...
How do I generate test data?
Go to:
http://localhost:5000/demo/create/fake-wiki-users/100
This will create 100 fake users in the 'wiki' database. If you did this on a fresh database, those users' ids should be 1, 2, 3, ... 100. If not, you can find the ids you just created by issuing 'select * from user' in your 'wiki' database. Now you need to create a cohort from those users:
http://localhost:5000/cohorts/upload
Use the textarea and type in one user id per line, like:
1
2
3
Pick a project and upload
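If you prefer the command line for the first step (creating the fake users and checking their ids), something like this should work, assuming the app is running locally on port 5000 and you can reach the local MySQL server without extra credentials:
curl http://localhost:5000/demo/create/fake-wiki-users/100
mysql -e 'select * from user' wiki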
Restore from backup
Wikimetrics is backed up once an hour and once a day.
Code
The relevant changeset describing how backups are organized is this one: [1]
The relevant part of root's crontab looks like:
root@wikimetrics-staging1:~# crontab -l
<snip>
0 * * * * /data/project/wikimetrics/backup/hourly_script -o /data/project/wikimetrics/backup/hourly -f /var/lib/wikimetrics/public -d wikimetrics -r /a/redis/wikimetrics1-6379.rdb
# Puppet Name: daily wikimetrics backup
30 22 * * * /data/project/wikimetrics/backup/daily_script -i /data/project/wikimetrics/backup/hourly -o /data/project/wikimetrics/backup/daily -k 10
Procedure
We back up three things:
- Wikimetrics database
- Redis
- Public reports stored in the '/var/lib/wikimetrics/public' directory
Note that the dashboards that (in the future) will pull data from wikimetrics will do so from the '/var/lib/wikimetrics/public' directory. This directory stores data files generated daily and, while in theory all that data can be regenerated, regeneration might take a long time. We want to be sure that we lose at most one day of data.
In order to restore the backup:
- Get snapshot file from '/data/project/wikimetrics/backup/daily'
- Untar to a location that you own, like ~/backup
tar -xvzf <snapshot file>
You should see the database dump file, the redis rdb file, and the public directory.
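For example (the exact snapshot file name will vary):
mkdir -p ~/backup
cd ~/backup
tar -xvzf <snapshot file>
ls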
- Stop redis, wikimetrics, and the queue on the machine where the restore is going to happen:
sudo stop wikimetrics-queue
sudo stop wikimetrics-scheduler
/etc/init.d/apache2 stop
/etc/init.d/redis-server stop
- Restore database:
mysql wikimetrics < database.sql
- Restore redis:
cp ~/backup/a/redis/wikimetrics1-6379.rdb /a/redis/wikimetrics1-6379.rdb
- Restore public reports:
Move the current public dir aside, then copy the backup into place:
mv /var/lib/wikimetrics/public /var/lib/wikimetrics_old_public/
cp -r ~/backup/var/lib/wikimetrics/public /var/lib/wikimetrics/public
- Restart queue, scheduler, apache and redis
- Make sure the backup looks good; if so, remove /var/lib/wikimetrics_old_public/