Phlogiston/Installation

Install Prerequisites

Operating System

These instructions assume installation of Phlogiston on a Debian GNU/Linux Stretch (9.5) system.

In Labs, to enable the larger hard drive, go to https://horizon.wikimedia.org/project/puppet/ and activate the Puppet class "profile::labs::lvm::srv".

Once PostgreSQL is installed (see below), move its working directories to the new volume.
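The move itself is not spelled out here. As a rough sketch only, the following prints (and does not run) one possible sequence of commands. It assumes PostgreSQL 9.6 as packaged for Debian Stretch, the default Debian data directory and configuration path, and a new volume mounted at /srv. The /srv/postgresql target and the print_move_plan helper are hypothetical choices, not part of Phlogiston. Check every path against the actual system before running anything.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints a plan for relocating the PostgreSQL data directory.
# ASSUMPTIONS (not from the original instructions): PostgreSQL 9.6 on Debian
# Stretch, default package paths, and a new volume mounted at /srv.
PG_VERSION="${PG_VERSION:-9.6}"
OLD_DIR="/var/lib/postgresql/${PG_VERSION}/main"
NEW_PARENT="${NEW_PARENT:-/srv/postgresql/${PG_VERSION}}"   # hypothetical target
CONF="/etc/postgresql/${PG_VERSION}/main/postgresql.conf"

print_move_plan() {
    # cp -a keeps ownership and permissions, which PostgreSQL insists on.
    cat <<EOF
sudo service postgresql stop
sudo mkdir -p ${NEW_PARENT}
sudo cp -a ${OLD_DIR} ${NEW_PARENT}/
sudo chown postgres:postgres ${NEW_PARENT}
sudo sed -i "s|^data_directory = .*|data_directory = '${NEW_PARENT}/main'|" ${CONF}
sudo service postgresql start
EOF
}

print_move_plan
```

Leaving the old directory in place until the server has started cleanly from the new location keeps a fallback available.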

System-wide Installation

Nginx

sudo apt install nginx

PostgreSQL

sudo apt install postgresql postgresql-contrib

Python Modules

sudo apt install python3-venv

R

Add the CRAN repository to get the newest version of R (adapted from DigitalOcean's instructions):

sudo apt install software-properties-common

sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'

sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/debian stretch-cran35/'

sudo apt update

sudo apt install r-base

sudo apt install build-essential

Set up Accounts

A shell account for Phlogiston

This account is used to run Phlogiston, store data, and publish output for the web server. By convention it is called phlogiston. Create it and apply whatever login rules, SSH configuration, and security measures are appropriate.
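How the account is created is left to local policy. One common Debian approach, shown here as a sketch that only prints the command it would suggest, is adduser with password login disabled, so that access goes through sudo su or SSH keys. The account_plan helper is a hypothetical name used for illustration.

```shell
#!/usr/bin/env bash
# Print the command that would create the account, or note that it exists.
# Nothing is changed on the system; this is a dry run by design.
account_plan() {
    local user="$1"
    if id -u "${user}" >/dev/null 2>&1; then
        echo "# account ${user} already exists; nothing to do"
    else
        # --disabled-password: no password login; --gecos '': skip the prompts
        echo "sudo adduser --disabled-password --gecos '' ${user}"
    fi
}

account_plan phlogiston
```

On Debian, adduser also creates a group with the same name as the user, which is the phlogiston group referred to later on this page.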

Set up Python

As user phlogiston:

python3 -m venv phlog_env

source phlog_env/bin/activate

pip install python-dateutil psycopg2-binary pytz jinja2
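To confirm the modules landed in the virtual environment, a quick import check can help. This is a minimal sketch, assuming it is run with phlog_env activated; check_python_modules is a hypothetical helper, and the import names differ from two of the package names (python-dateutil imports as dateutil, psycopg2-binary as psycopg2).

```shell
#!/usr/bin/env bash
# Try to import each named Python module and report the result.
# Returns non-zero if at least one module cannot be imported.
check_python_modules() {
    local missing=0 mod
    for mod in "$@"; do
        if python3 -c "import ${mod}" 2>/dev/null; then
            echo "ok      ${mod}"
        else
            echo "MISSING ${mod}"
            missing=1
        fi
    done
    return "${missing}"
}

# Import names for the packages installed above.
check_python_modules dateutil psycopg2 pytz jinja2 \
    || echo "Re-run pip install inside phlog_env for the modules marked MISSING."
```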

Set up R

As phlogiston, type R to enter the R command line. In R:

install.packages('RColorBrewer', dep=TRUE)

install.packages('ggplot2', dep=TRUE)

install.packages('ggthemes', dep=TRUE)

install.packages('argparse', dep=TRUE)

install.packages('reshape', dep=TRUE)

install.packages('fivethirtyeight', dep=TRUE)

If prompted where to install, install locally.

A Postgres account for Phlogiston

The local account must have access to a PostgreSQL database for data storage and reporting. As root:

sudo su - postgres

createuser -s phlogiston

createdb -O phlogiston phlogiston

Superuser access is required because load_tables.sql installs the intarray PostgreSQL extension. It also lets the script create or reset its own data tables. Because a superuser role has full control of the database server, avoid doing this on a shared server.

Access to Phlogiston directories for PostgreSQL

The Phlogiston scripts run some commands on the PostgreSQL server. The server runs as the postgres user, so that user needs access to the Phlogiston directories through the phlogiston group.

sudo usermod -a -G phlogiston postgres

sudo service postgresql restart
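The group change can be checked before moving on. A minimal sketch, using only the standard id command; user_in_group is a hypothetical helper name. Note that id reads the account database, so it reflects the usermod change immediately, whereas the running server only picks up the new group after the restart above.

```shell
#!/usr/bin/env bash
# Succeed if user $1 is listed as a member of group $2.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

if user_in_group postgres phlogiston; then
    echo "postgres is in the phlogiston group"
else
    echo "postgres is NOT in the phlogiston group (or an account does not exist yet)"
fi
```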

Install Phlogiston

Get the Phlogiston code by cloning it from GitHub. As the phlogiston user:

sudo su - phlogiston

git clone https://github.com/wikimedia/phlogiston.git

exit

Set up web publishing of results

Configure Nginx

Configure Nginx to publish from the phlogiston html output directory. Create the following file as /etc/nginx/sites-available/phlogiston:

server {
        server_name localhost;
        listen 80 default_server;
        listen [::]:80 default_server ipv6only=on;
        root /home/phlogiston/html;
        index index.html index.htm;
        ssi on;
        location / {
                autoindex on;
                # First attempt to serve request as file, then
                # as directory, then fall back to displaying a 404.
                try_files $uri $uri/ =404;
                # Uncomment to enable naxsi on this location
                # include /etc/nginx/naxsi.rules
        }
}

Then run these commands to make Nginx use it:

sudo rm /etc/nginx/sites-enabled/default

sudo ln -s /etc/nginx/sites-available/phlogiston /etc/nginx/sites-enabled/

sudo service nginx restart
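A mistake in the configuration file will keep Nginx from coming back up after a restart. As an optional safeguard, a small wrapper can run nginx -t (Nginx's built-in configuration test) first and restart only when the test passes. This is a sketch; reload_if_valid is a hypothetical helper name, not part of Nginx or Phlogiston.

```shell
#!/usr/bin/env bash
# Restart Nginx only if its configuration test succeeds.
# Any arguments are used as a command prefix for gaining privileges,
# for example: reload_if_valid sudo
reload_if_valid() {
    if "$@" nginx -t; then
        "$@" service nginx restart
    else
        echo "nginx configuration test failed; not restarting" >&2
        return 1
    fi
}
```

Typical use would be reload_if_valid sudo in place of the plain restart command.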

Set up the reports home page

mkdir /home/phlogiston/html

cp /home/phlogiston/phlogiston/html/index.html /home/phlogiston/html/

cp /home/phlogiston/phlogiston/html/style.css /home/phlogiston/html/

Then edit index.html to reflect the scopes being reported.

First run

sudo su - phlogiston

cd phlogiston

createdb phlogiston

python3 phlogiston.py --initialize

bash batch_phlog.bash -m reconstruct -l true -s xxx

where xxx is a correctly configured scope. Validate the results.

Automate daily reports

Run a complete reconstruction for all scopes:

bash ~/phlogiston/batch_phlog.bash -m reconstruct -l true -s xxx -s yyy

where xxx and yyy are reporting scopes. Append as many -s zzz as needed.

Create a cron job for the phlogiston user, of the form

15 4 * * * bash ~/phlogiston/batch_phlog.bash -m incremental -l true -s xxx -s yyy >>~/phlog.log 2>&1

This runs daily at 4:15 am UTC. Set this to be right after the dump is generated from Phabricator, and to run as often as the dump is updated. (Phlogiston can take hours to run, so anything more than daily may not be practical without optimization.)

The file ~/phlog.log can be inspected for status. In particular, grep Done ~/phlog.log will show one line per scope per reconstruction and/or report.
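With many scopes, typing the cron line by hand gets error-prone. The sketch below builds the entry from a list of scope names and counts the Done lines in a log. All three function names are hypothetical helpers; the command line they emit is the one given on this page, and the only thing assumed about the log format is that completed steps contain the word Done.

```shell
#!/usr/bin/env bash
# Build the repeated "-s scope" arguments for batch_phlog.bash.
scope_args() {
    local scope out=""
    for scope in "$@"; do
        out+=" -s ${scope}"
    done
    echo "${out# }"   # drop the leading space
}

# Print a crontab entry for an incremental run.
# $1 is the five-field cron schedule; the remaining arguments are scopes.
cron_entry() {
    local schedule="$1"
    shift
    echo "${schedule} bash ~/phlogiston/batch_phlog.bash -m incremental -l true $(scope_args "$@") >>~/phlog.log 2>&1"
}

# Count the lines containing "Done" in a log file.
count_done() {
    grep -c 'Done' "$1"
}

cron_entry "15 4 * * *" xxx yyy
```

The printed line can be pasted into crontab -e for the phlogiston user, and count_done ~/phlog.log gives a quick figure to compare against the number of scopes.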

How to use with Phabricator instances other than the Wikimedia Foundation's

Untested:

1) Set up a dump script on the Phabricator instance, modeled on the one Wikimedia uses, so that it generates dumps in the same format.

2) Customize batch_phlog.bash to point to the new dump file.