
Analytics/Archive/Meetings/2012-12-15

From mediawiki.org

Analytics Quarterly Review

- Hangout: https://plus.google.com/hangouts/_/322a23cbc2dc33491a7826c439c2aa12335141f3?authuser=0&hl=en
- Deck: https://docs.google.com/a/wikimedia.org/presentation/d/1EutD_z6Koyv71JY8qM1AovTxJDGSWGkE9bikXHji0k4/edit#slide=id.p
- You Are Here: http://etherpad.wmflabs.org/pad/p/analytics-quarterly-review

Folks present: Erik M, Patrick, Dario, David, Diederik, Howie, Robla, Tomasz, Jessie, Gayle, Ori, CT, Terry, Asher, Erik Z, Andrew Otto, Dan


  1. Links

- Much more information is available at https://www.mediawiki.org/wiki/Analytics

Kraken Dataflow Diagram: http://upload.wikimedia.org/wikipedia/mediawiki/3/38/Kraken_flow_diagram.png

Project: https://www.mediawiki.org/wiki/Analytics/Kraken

CDH4 — the world's leading Apache Hadoop Distribution. http://www.cloudera.com/content/cloudera/en/products/cdh.html

Hue is a general-purpose web interface built for the Hadoop ecosystem. Use Hue if you want to easily run and schedule Pig and Hive jobs. Navigate to http://hue.analytics.wikimedia.org/. You'll need a Hue login account. Otto should have created one for you and given you a password if you also asked him for a shell account earlier. (This will soon be hooked into LDAP, and you will be able to use your usual WMF password.) http://www.mediawiki.org/wiki/Analytics/Kraken/Access#Hue

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html


  1. Questions

Patrick: What is the status of Kraken as a prototype? Coming to it in the Kraken section (slide 12) ✔

Wikistats: the traffic scripts (aka squid scripts) have now been improved by a contractor, and the dumps scripts are stable. All scripts are in git.

On Storm: Nathan Marz (http://www.linkedin.com/pub/nathan-marz/3/820/6a, https://github.com/nathanmarz) provided useful feedback during the research phase, which encouraged us to examine Storm (https://github.com/nathanmarz/storm, http://storm-project.net/) as a solution for the ETL/Stream Processing phase.