Content translation/analytics/Daily stats
This page describes how to collect daily Content Translation statistics.
Unless otherwise noted, "<TIMESTAMP>" is eight digits in the form yyyymmdd. For example, 20160425 is April 25, 2016.
The scripts can be found in the bash/ directory in the limn-language-data Gerrit repo.
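You can compute the timestamps instead of typing them. Here is a minimal sketch. It assumes GNU `date` (standard on Linux) and assumes that days are counted in UTC. `ts_yesterday` and `ts_next_day` are hypothetical helper names, not scripts from the repository.

```shell
# Hypothetical helpers, not part of limn-language-data.
# Assumptions: GNU date is available, and days are counted in UTC.

# Print yesterday's date as yyyymmdd.
ts_yesterday() {
    date -u -d yesterday +%Y%m%d
}

# Print the day after a yyyymmdd timestamp, e.g. 20160425 -> 20160426.
# GNU date accepts the "YYYYMMDD +N day" form directly and handles
# month and year boundaries.
ts_next_day() {
    date -u -d "$1 +1 day" +%Y%m%d
}
```

For example, `ts_next_day 20160430` prints `20160501`.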
crontab configuration
The scripts that collect the deletion and suggestions stats run every day from cron. To set this up in your account, run
crontab -e
and add the line
00 01 * * * ~/cxscriptsg/scheduled_daily_stats.sh
This runs the script every day at 01:00 in the server's time zone.
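You can also add the same entry without opening an editor, which is convenient when setting up a new account. This is a sketch. `add_cron_line` is a hypothetical helper, and it assumes a `crontab` that reads the new table from standard input when given `-`, as Vixie cron and cronie do. The helper appends the line only when an identical line is not already present, so running it twice does not schedule the job twice.

```shell
# Hypothetical helper: copy a crontab from stdin to stdout, appending the
# given entry unless an identical line is already there.
add_cron_line() {
    cron_entry=$1
    cron_found=0
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r cron_line || [ -n "$cron_line" ]; do
        printf '%s\n' "$cron_line"
        if [ "$cron_line" = "$cron_entry" ]; then
            cron_found=1
        fi
    done
    [ "$cron_found" -eq 1 ] || printf '%s\n' "$cron_entry"
}

# Usage (assumes "crontab -" installs the table read from stdin):
#   crontab -l 2>/dev/null \
#     | add_cron_line '00 01 * * * ~/cxscriptsg/scheduled_daily_stats.sh' \
#     | crontab -
```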
Yandex
- Go to https://translate.yandex.com/developers/stat
- Copy the number from the top chart to the Characters column in the Yandex stats spreadsheet.
- Copy the number from the bottom chart to the Request column in the Yandex stats spreadsheet.
Daily suggestions accepted
- Run:
ssh terbium
cat events_<TIMESTAMP>.txt
- In the query results table, copy only the numbers from the "count(event_targetLanguage)" column. (In the OS X Terminal this is easy: select with the mouse while holding the Alt (Option) key.)
- Paste them into a text editor.
- Select them in the text editor and copy them. (This intermediate step is needed because copying directly from the terminal into the spreadsheet does not work properly.)
- Open the "Suggestions Accepted Daily" tab in the "Suggestions Enablement" spreadsheet.
- Find the column for today's date (NB: that is one day after <TIMESTAMP>).
- Click row 3 and paste.
- Scroll down and check that the list is not too long: the last number must line up with the last language. If it does not:
- Delete the pasted numbers (the Undo action usually accomplishes that).
- Find the missing language by comparing the query result with the languages in the spreadsheet.
- Add a row for that language.
- Go back to step "Click row 3 and paste".
- The number in the "Per day" row is the number of accepted suggestions per day.
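You can also script the column selection and the alignment check. The sketch below makes two assumptions. First, it assumes that events_<TIMESTAMP>.txt contains the mysql client's boxed table output, with rows such as `| es | 345 |`. Second, it assumes you keep the spreadsheet's language codes, in spreadsheet order, in a plain text file. Both that file (langs.txt) and the function name `counts_in_sheet_order` are hypothetical. The function prints one count per spreadsheet language, using 0 when a language does not appear in the query result. It also reports on standard error any language that is missing from the spreadsheet.

```shell
# Hypothetical helper. Assumes $2 is boxed mysql output and $1 lists the
# spreadsheet's language codes, one per line, in row order.
# Usage: counts_in_sheet_order langs.txt events_20160425.txt
counts_in_sheet_order() {
    awk -F'|' '
        # First file: the spreadsheet language codes.
        FILENAME == ARGV[1] {
            code = $0
            gsub(/[ \t\r]/, "", code)
            if (code != "") { order[++n] = code; known[code] = 1 }
            next
        }
        # Second file: data rows look like "| es | 345 |"; border lines
        # have a single field and fail the NF test.
        NF >= 4 {
            lang = $2; count = $3
            gsub(/[ \t]/, "", lang); gsub(/[ \t]/, "", count)
            if (count !~ /^[0-9]+$/) next    # skips the header row
            counts[lang] = count
            if (!(lang in known))
                print "not in spreadsheet: " lang > "/dev/stderr"
        }
        # One number per spreadsheet row, ready to paste from row 3 down.
        END {
            for (i = 1; i <= n; i++)
                print ((order[i] in counts) ? counts[order[i]] : 0)
        }
    ' "$1" "$2"
}
```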
NB: The events_<TIMESTAMP>.txt file is generated by the script events_scheduled.sh, which runs automatically for the previous day. To get the numbers up to a particular day, run:
./rprod.sh
- Then run the following query (it takes about three minutes):
SELECT
event_targetLanguage,
count(event_targetLanguage)
FROM
log.ContentTranslationCTA_11616099 cta
WHERE
cta.event_cta LIKE 'suggestions%'
AND cta.timestamp < 20160425000000 -- <TIMESTAMP>000000: events from that day onward are not counted
AND cta.event_action = 'accept'
GROUP BY
event_targetLanguage;
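You can generate the query for a given day instead of editing the timestamp by hand. Here is a sketch using a hypothetical `accepted_suggestions_sql` function. It prints the query above with the cutoff set to `<TIMESTAMP>000000`. You can paste the output into the database session. Piping it into ./rprod.sh instead assumes that the script reads SQL from standard input, which is not stated here.

```shell
# Hypothetical helper: print the accepted-suggestions query, counting
# events before midnight at the start of the given yyyymmdd day.
accepted_suggestions_sql() {
    case $1 in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "usage: accepted_suggestions_sql yyyymmdd" >&2; return 1 ;;
    esac
    cat <<EOF
SELECT
    event_targetLanguage,
    count(event_targetLanguage)
FROM
    log.ContentTranslationCTA_11616099 cta
WHERE
    cta.event_cta LIKE 'suggestions%'
    AND cta.timestamp < ${1}000000
    AND cta.event_action = 'accept'
GROUP BY
    event_targetLanguage;
EOF
}

# Usage: accepted_suggestions_sql 20160425 > accepted_20160425.sql
```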
Save, restore, publish errors (s/r/p)
- Make the terminal window wide: the row of numbers can be long, and you don't want it to be clipped.
- Run:
ssh terbium
cd cxscriptsg
- For yesterday only:
./srp.py
- NB 1: It is usually fast at the beginning and slower toward the end. The whole run takes up to two minutes.
- NB 2: "What was published" is shown for the day a week ago because this number can keep changing for several days (e.g., when the translator actually publishes a few days later).
- For a particular day:
./sorted_save_events.py <TIMESTAMP>
./sorted_restore_events.py <TIMESTAMP>
./sorted_publish_events.py <TIMESTAMP>
./what_was_published.py <TIMESTAMP>
- NB: The last step can take up to two minutes.
- For a range of dates within one month (seq counts plain integers, so it does not know where a month ends):
for day in $(seq 20160401 20160430); do ./sorted_restore_events.py $day; done
- (And likewise for the other scripts.)
- Carefully copy the row of numbers and paste it into a text editor.
- Select the numbers in the text editor and copy them. (This intermediate step is needed because copying directly from the terminal into the spreadsheet does not work properly.)
- Open the s/r/p spreadsheet and paste the numbers from the text editor in the appropriate column at the appropriate date.
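The seq loop above counts plain integers. A range that crosses a month boundary (for example, 20160425 to 20160505) would therefore produce invalid dates such as 20160431. Here is a sketch of a calendar-aware alternative. It assumes GNU `date`, and `date_range` is a hypothetical helper.

```shell
# Hypothetical helper: print every day from $1 to $2 (inclusive) as
# yyyymmdd, one per line, following the real calendar. Requires GNU date.
date_range() {
    dr_day=$1
    # yyyymmdd values sort the same way numerically and chronologically.
    while [ "$dr_day" -le "$2" ]; do
        echo "$dr_day"
        dr_day=$(date -u -d "$dr_day +1 day" +%Y%m%d)
    done
}

# Usage on terbium, from ~/cxscriptsg:
#   for day in $(date_range 20160425 20160505); do
#       ./sorted_restore_events.py "$day"
#   done
```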
Deletion stats
- Run
ssh terbium
- For a particular day in the recent past:
cat deletion_<TIMESTAMP>.txt
- For older reports, see the schedule-archive directory. However, it is better to rerun the script, because the data may have changed since the report was generated.
- For a particular day:
cd cxscriptsg
./count_deletion.sh <TIMESTAMP>
- Without a <TIMESTAMP> argument, the default is yesterday.
- For a range of dates:
cd cxscriptsg
./count_deletion_range.sh <TIMESTAMP> <TIMESTAMP>
- NB 1: This script is naive and only works when both dates are in the same month.
- NB 2: The default range is from the beginning of the current month until yesterday.
- Copy the numbers to the deletion stats spreadsheet.
- NB: The data changes almost every day, so the numbers are approximate; they are less likely to change after two weeks. It is a good idea to check not only yesterday, but also the day a week ago.
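Because count_deletion_range.sh only handles dates within one month, you have to split a longer range into per-month pieces. Here is a sketch that assumes GNU `date`. `split_by_month` is a hypothetical helper that prints one "start end" pair per month, and you can pass each pair to the script.

```shell
# Hypothetical helper: split the yyyymmdd range $1..$2 into pieces that
# each stay within one calendar month, printing "start end" per line.
# Requires GNU date.
split_by_month() {
    sm_start=$1
    while [ "$sm_start" -le "$2" ]; do
        # Last day of the month: first of the month, +1 month, -1 day.
        sm_month_end=$(date -u -d "$(date -u -d "$sm_start" +%Y-%m-01) +1 month -1 day" +%Y%m%d)
        if [ "$sm_month_end" -lt "$2" ]; then
            sm_end=$sm_month_end
        else
            sm_end=$2
        fi
        echo "$sm_start $sm_end"
        sm_start=$(date -u -d "$sm_end +1 day" +%Y%m%d)
    done
}

# Usage on terbium, from ~/cxscriptsg:
#   split_by_month 20160425 20160510 | while read -r from to; do
#       ./count_deletion_range.sh "$from" "$to" </dev/null
#   done
```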
Adding a language
- Add a new column in alphabetical order and put the language code there.
- Make sure to fill in the totals rows at the bottom by copying them from an adjacent language's column.
Adding a new month tab
- Create a new spreadsheet tab and make it the first one, because it is the one you will need most often from now on.
- Copy the data from the previous month.
- Select all the cells with deletion numbers (but not the totals cells) and delete their contents.
- Select the whole tab and remove the background color.
- Select the Sundays' rows and make their background light gray.
- In the weekly total column, write the SUM formula for the week (for the first week this will usually include a few days from the previous month).
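To find which rows to shade, you can list the Sundays of a month from the command line. Here is a sketch that assumes GNU `date`. `sundays_in_month` is a hypothetical helper that takes the month as yyyymm.

```shell
# Hypothetical helper: list the Sundays of month $1 (yyyymm) as yyyymmdd.
# Requires GNU date.
sundays_in_month() {
    su_day=$(date -u -d "${1}01" +%Y%m%d) || return 1
    # Walk day by day until the month changes.
    while [ "$(date -u -d "$su_day" +%Y%m)" = "$1" ]; do
        # %u is the ISO day of the week: 1 = Monday ... 7 = Sunday.
        if [ "$(date -u -d "$su_day" +%u)" -eq 7 ]; then
            echo "$su_day"
        fi
        su_day=$(date -u -d "$su_day +1 day" +%Y%m%d)
    done
}
```

For example, `sundays_in_month 201605` prints 20160501, 20160508, 20160515, 20160522 and 20160529, one per line.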
Monthly report stats
- Run
ssh terbium
cd cxscriptsg
./monthly_stats.sh
- That's pretty much it! Just be sure to get the deletion number from the deletion spreadsheet.