Analytics/EventLogging
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
EventLogging
Sensibly capture MediaWiki usage data to help improve products.
Background
EventLogging is an extension to MediaWiki. A useful guide is available at Extension:EventLogging/Guide.
Backlog (Draft)
Draft features/stories as of 2014-05-21. This is an attempt to start articulating the work that needs to be done from a product development perspective.
Title | Description |
---|---|
Monitoring system fires alert when event volume is high | 5 points; tasked on etherpad. https://bugzilla.wikimedia.org/show_bug.cgi?id=65482 |
Product manager specifies sampling rate for his EL schema | https://bugzilla.wikimedia.org/show_bug.cgi?id=65500 |
Product manager specifies schema ownership | We need to know who owns a schema so we can send alerts to them if the volume exceeds what the database can handle. |
Automated process handles old data | This is a large task that needs to be better defined and then broken down. Some features related to this are: |
User has old data for ServerSideAccountCreation | Scrub or aggregate it so it remains available beyond 90 days. It is used by others. |
User has old data for NavigationTiming | Scrub or aggregate it so it remains available beyond 90 days. It is used by others. |
Product manager extends persistence of events | Suppose we're two months into a data collection job and the researcher realizes he needs the data for 180 days. Provide a mechanism to extend the persistence of a set of events. At the very least, provide a mechanism to aggregate or anonymize the data so the researcher can keep it for a longer time period. |
User suppresses EventLogging for his actions | Define a mechanism for a user to opt out of the EventLogging process. |
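The old-data rows in the table above can be pictured as follows: events older than the retention window are collapsed into per-day counts, and only those counts are kept. This is a minimal sketch under stated assumptions, not how EventLogging does it. The event dictionaries, the `schema` and `timestamp` keys, and the `split_by_retention` helper are all hypothetical; only the 90-day window comes from the table.

```python
from collections import Counter
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # window taken from the backlog table


def split_by_retention(events, now, retention=RETENTION):
    """Return (recent_events, daily_counts_of_expired_events).

    `events` is an iterable of dicts with the hypothetical keys
    'schema' (str) and 'timestamp' (datetime). Expired events are not
    returned individually; only a (schema, date) -> count aggregate
    survives for them.
    """
    cutoff = now - retention
    recent = []
    expired_counts = Counter()
    for event in events:
        if event["timestamp"] >= cutoff:
            recent.append(event)
        else:
            key = (event["schema"], event["timestamp"].date())
            expired_counts[key] += 1
    return recent, expired_counts
```

Extending persistence for one data collection job would then amount to passing a longer `retention` (for example `timedelta(days=180)`) for that schema's events, again assuming this hypothetical helper.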
Transition Plan
EventLogging is a widely used library in the Foundation. The Analytics team and Ori have discussed the details of the Analytics team taking over responsibility for this Extension. This document is that proposal.
Administration
- Formalize agreement with Ori, Ops
- Talk to RobLa/Platform
- Figure out what ask to make of Ori in terms of regular commitment
- Discuss this document
- Send out support email
- Target handover start: 4/24/16 (needs agreement from Analytics, Ops, Platform teams)
Schema Support
Probably the most common EventLogging support task is schema review. We'd like to make this a rotating responsibility among the users.
- Create EventLogging review group in Gerrit
- Ask people for consent before adding them
- Announce and request a social convention of adding people to the review group once they've successfully instrumented something
Data Validation/Support
We'd also like users to take responsibility for their own data generated by EventLogging. The Analytics team isn't staffed to follow up on invalid data from a single schema, but we will invest in automated tools and notifications.
- Announce that generating invalid data is a software bug and that you are expected to fix it promptly.
- Invite people to subscribe to eventlogging-alert
- Provide information about notification and debugging tools
Development Support
- Bugs reported in Bugzilla should be acknowledged and resolved.
- Automatic purging or anonymizing of data, in line with the Privacy Policy, needs to be implemented.
Development/Operations Tasks
- Create a Graphite script that shows valid and invalid events for each schema, thereby satisfying the requirement that EventLogging be, in principle, self-service
- Add alert for number of events
- A daily report should go out reporting the number of valid and invalid events logged, broken down by schema.
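The report and alert tasks above could be sketched roughly as below. This is an illustration only, not an existing EventLogging tool: the `(schema, is_valid)` input shape, the function names, and the `max_events` threshold are assumptions made for the example.

```python
from collections import defaultdict


def daily_report(events):
    """Tally valid and invalid events per schema.

    `events` is an iterable of (schema_name, is_valid) pairs; this
    input shape is an assumption made for illustration.
    """
    counts = defaultdict(lambda: {"valid": 0, "invalid": 0})
    for schema, is_valid in events:
        counts[schema]["valid" if is_valid else "invalid"] += 1
    return dict(counts)


def schemas_over_limit(report, max_events):
    """Return, sorted by name, the schemas whose total exceeds `max_events`."""
    return sorted(
        schema
        for schema, tally in report.items()
        if tally["valid"] + tally["invalid"] > max_events
    )


def format_report(report):
    """Render the tallies as one plain-text line per schema."""
    return "\n".join(
        f"{schema}: {report[schema]['valid']} valid, "
        f"{report[schema]['invalid']} invalid"
        for schema in sorted(report)
    )
```

The schemas returned by `schemas_over_limit` are the ones whose owners would be notified, which is why the backlog asks for schema ownership to be recorded.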
Operations
- Operational support by analytics team: Event_logging/OperationalSupport
- Data recovery plan - Ori thinks we shouldn't hack the EventCapsule or validation model again
- Dario thinks we will have some high-priority data recovery needs related to DB outages
- Create and respond to alerts
- Once a month, the backup process (vanadium -> stat1001 -> tridge) should get a quick check to ensure that it is functioning.
- Once every six months, a drill should be conducted to test system failover and recovery procedures.
- Sean Pringle is supporting db replication
- Failover for Vanadium