Phlogiston/Running
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
The Servers in the Cloud
Server | ssh location | URL
---|---|---
Development | phlogiston-2.eqiad.wmflabs | http://phlogiston-dev.wmflabs.org
Production | phlogiston-3.eqiad.wmflabs | http://phlogiston.wmflabs.org
In normal operation, Phlogiston runs automatically every day on both the production and development servers. A cron job (belonging to the phlogiston user) runs shortly after the time the Phabricator dump is usually made available. The normal sequence of operation is:
- Download the new dump.
- Load the dump into the database.
- For each specified scope:
  - Reconstruct data. Normally this runs incrementally from the last date processed, so it only processes one day of data.
  - Regenerate the report completely.
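For reference, the daily run is a single invocation of batch_phlog.bash in incremental mode (the full crontab entry is shown under Automation below). A single-scope sketch, assuming -l true is what triggers the dump download and load, as the manual sections suggest:

# Daily-style run for one scope: download and load the latest dump,
# reconstruct incrementally, and regenerate the report.
# your_scope_prefix is a placeholder for a real scope code such as ve.
$ bash ~/phlogiston/batch_phlog.bash -m incremental -l true -s your_scope_prefix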
Access
- You must already have a wikitech VPS shell account (formerly known as Labs, not Tool Labs). See Getting Started.
- Your account must be set up to access Phlogiston.
  - Admin note: the user must be in the group project-phlogiston; it is not clear whether this is set at the server or at the Labs level.
- Your phlogiston shell account on phlogiston-2 must be in the project-phlogiston group.
- The OpenStack page for the project is https://tools.wmflabs.org/openstack-browser/project/phlogiston
- A typical ssh command, given the standard wikitech ssh configuration, is:
ssh phlogiston-2.eqiad.wmflabs
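Reaching .wmflabs hosts from outside requires going through a wikitech bastion. A minimal ~/.ssh/config sketch, assuming primary.bastion.wmflabs.org as the bastion host and your_user as your wikitech shell user name (check Getting Started for the current values):

# ~/.ssh/config sketch; the bastion host name and user are assumptions.
Host *.wmflabs
    User your_user
    ProxyJump primary.bastion.wmflabs.org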
Adding a new report
- Create the configuration files.
  - Configuration files are stored in the github repository https://github.com/wikimedia/phlogiston
  - Configuration files for a given your_scope_prefix include:
    your_scope_prefix_recategorization.csv
    your_scope_prefix_scope.py
- Test the new configuration on the dev site.
  - In the shared mission_control tmux session, run:
    bash ~/phlogiston/batch_phlog.bash -m rerecon -s your_scope_prefix
- If the new report looks good, add it to the cron job on the development and production servers.
- Add links to the new report in the appropriate files in the html directory of https://github.com/wikimedia/phlogiston and copy those files to the live html folders on the servers.
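The copy step is a plain file transfer. A sketch, assuming the live html folder is /var/www/html on each server (the actual path is not documented here and may differ):

# Deploy updated report pages; the destination path is an assumption.
$ scp html/index.html phlogiston-2.eqiad.wmflabs:/var/www/html/
$ scp html/index.html phlogiston-3.eqiad.wmflabs:/var/www/html/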
Manual Control
The current practice is that only the Phlogiston developer has access to production, and that the Phlogiston developer is the primary user of the development server. One use case for shared development is supported: users of reports may reconfigure their reports and then re-run them on the development server to see results immediately instead of waiting a day.
Phlogiston has no run control or locking; multiple Phlogiston runs at the same time will produce bad results. We therefore have a manual convention on the development server that all Phlogiston runs should happen in a shared tmux console session, by convention called mission_control, to prevent two conflicting runs from happening at once. This convention is not followed on production, which should not have multiple users running Phlogiston.
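A convenient way to honor the convention is to attach to the shared session if it exists and create it otherwise, which condenses the steps listed below into one line:

# Attach to mission_control if it exists; otherwise create it.
$ tmux attach -t mission_control || tmux new -s mission_control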
To re-run a report on the development server
- Change the configuration files to make the desired changes, and commit to github.
- Log in to phlogiston-2.
- Change to the phlogiston user: sudo su - phlogiston
  - Your phlogiston shell account on phlogiston-2 must be in the project-phlogiston group.
- Join the shared console: tmux a -t mission_control
  - If this fails with a message about "no sessions", then there is not already a mission_control session; create it with tmux new -s mission_control
- Re-run Phlogiston:
    cd ~/phlogiston
    bash batch_phlog.bash -m reports -l false -s your_scope_prefix
  - Replace your_scope_prefix with the code for your scope, for example and for Android, or ve for VisualEditor. This code is determined when the files for the scope report are originally created.
  - This will automatically update files from git and then rerun the report. It will not reprocess any of the data.
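Putting the steps together, a complete re-run session on the development server looks roughly like this (your_scope_prefix is a placeholder):

$ ssh phlogiston-2.eqiad.wmflabs
$ sudo su - phlogiston
$ tmux attach -t mission_control || tmux new -s mission_control
$ cd ~/phlogiston
$ bash batch_phlog.bash -m reports -l false -s your_scope_prefix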
To create a new scope
- Create new configuration files and add them to github.
- Log in to phlogiston-2.
- Change to the phlogiston user: sudo su - phlogiston
- Join the shared console: tmux a -t mission_control
  - If this fails with a message about "no sessions", then there is not already a mission_control session; create it with tmux new -s mission_control
- In the ~/phlogiston directory, get the new files from github: git pull
- Build the new scope data reconstruction and report:
    ./batch_phlog.bash -m rerecon -l false -s your_scope_prefix
  - Rerecon will generate or regenerate the scope reconstruction completely, and generate the report, but will not download and load fresh dump data.
- After the report is complete, verify it in a browser. If it looks good, add the scope to the phlogiston crontab command on both development and production. It may also be helpful to add a link to the report in the file html/index.html and deploy that file to development and production.
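A quick way to confirm the new report is being served is a HEAD request against the dev site. The per-scope URL layout here is an assumption; adjust it to however the report is actually linked:

# -I fetches headers only; the per-scope path is an assumption.
$ curl -I http://phlogiston-dev.wmflabs.org/your_scope_prefix/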
Debugging
To test whether the dump is fresh:
$ openssl s_client -connect dumps.wikimedia.org:443
[...]
HEAD /other/misc/phabricator_public.dump HTTP/1.1
host: dumps.wikimedia.org
followed by two carriage returns to end the request. The Last-Modified header in the response (if the server sends one) shows when the dump file was last updated.
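A curl equivalent avoids typing the HTTP request by hand:

# -s silences progress output; -I sends a HEAD request.
$ curl -sI https://dumps.wikimedia.org/other/misc/phabricator_public.dump | grep -i last-modified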
Automation
Phlogiston is run on both servers automatically every day with the crontab entry:
# m h dom mon dow command
15 4 * * * bash ~/phlogiston/batch_phlog.bash -m incremental -l true -s ana -s and -s col -s cot -s discir -s dismap -s dis -s diswik -s fr -s ja -s ios -s lan -s phl -s red -s rel -s tpg -s ve >>~/phlog.log 2>&1
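To add or remove a scope from the schedule, edit this entry as the phlogiston user on each server:

$ sudo su - phlogiston
$ crontab -e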
Data integrity and idempotency
The data dump includes all historical data from Phabricator, so only the most current dump is required for operation. Each data dump load provides Phlogiston with complete information, and the dump does not need to be reloaded until a new one is available. Loading the dump is independent of any specific scope.
Reconstruction and reporting are partitioned by scope. Changes to one scope will not affect any other.
An incremental reconstruction will operate from the most recent date available in the already-processed data, so if it is run a second time on the same day, it will not corrupt data. A complete reconstruction will begin by wiping all data for that scope.
A report will wipe the existing report on the website prior to generating a new report, so it is possible to end up with a broken report if the new report fails.
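In other words, incremental mode's safety comes from resuming at the last processed date rather than appending blindly. Sketched in shell pseudocode; the database, table, and column names below are illustrative assumptions, not Phlogiston's actual schema:

# Illustrative sketch only; names are assumptions.
last_date=$(psql phlogiston -tAc \
  "SELECT max(date) FROM task_history WHERE scope = 'your_scope_prefix'")
# Incremental: process only days after $last_date. Running this twice on
# the same day finds the same $last_date and does no duplicate work.
# Complete reconstruction (rerecon) instead deletes all rows for the scope
# and rebuilds from the beginning of the dump history.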