Wikimedia Discovery/Meetings/Search retrospective 2016-07-06
Format
This retrospective was conducted using the "Five finger retrospective" format.
Action Items from last month
- Chris: post the link to his "what technical collaboration team does" presentation
- Done
- Chris: Chris and Erik should talk about implications of interwiki search indices
- Done
- Trey & Deb: Chris needs to be aware of the ? at the end of queries
- Done, and material has been posted on several village pumps
- Erik: Figure out a plan to reliably monitor github (David and Guillaume have started to watch it)
- Done, decided that volume is low enough that watching projects and getting emails from github is sufficient
- Got Discovery members admin rights on github projects
What happened since the last retro (June 1)
- new elasticsearch servers
- Change in product ownership for Q1: Dan -> Deb
- Wikimania and chatting with community and others about what the Discovery team is doing with search
- Job offer extended and accepted for new analyst, starting around the end of July \o/ ^_^ +1
- Dan's vacation / Deb did a great job filling in
Thumb: Thumbs up--something that went well
- Chris, Deb, and Trey chatting about question-mark handling +1+1
- Fixes & improvements & relaunch of the TextCat A/B Test
- Elasticsearch servers are SOO easy to install (if you don't count the required cluster restarts)+1
- Deb did a great job filling in while Dan was out +1
- phan is proving a useful addition to CI testing+1+1
- phan is a static php code analyzer - https://github.com/etsy/phan
- This is a Discovery initiative; would like to spread to other groups over time
Index finger: The ONE thing you want people to know (about how this team has functioned over the last month)
- somehow we partially own the production logging infrastructure (by being elasticsearch "experts") +1 (Guillaume gets quite a few questions on logstash, where I have no idea...)
- was this "somehow ownership" transferred from Bryan Davis's "somehow ownership" of it previously? :-)
- Questions for the future: Who will be responsible for new hardware? Should we become the official owners?
Middle finger: Something that did not go well
- Issue that affects the elasticsearch cluster (being discussed here: https://github.com/elastic/elasticsearch/issues/19187 )
- Generates a ton of logs; fills the disk
- Might be fallout from upgrading the clusters
- Maintaining the swift repo plugin is hard because we don't use it (https://github.com/wikimedia/search-repository-swift )
- David has spent days trying to fix it when broken
- We should look for a new maintainer—maybe add a disclaimer in the README
- Initial run of the TextCat A/B Test +1 (alas)
- A closer analysis showed that the data we were collecting were unreliable ("visit pages" were completely wrong)
- Contributing factor: No automated tests for the logging code
- Contributing factor: No front-end engineer, so not expert in browser-specific issues
- Contributing factor: More than 20 ways to perform a search; complex code
- Related factor: We already knew there was a mismatch in counts—this forced us to diagnose and fix it
- Maps has a tendency to absorb a lot of my (Guillaume's) time. Prioritization needs to happen between different sub teams. Not sure how to make that happen.
- If you need more of Guillaume, let him know, and he can try to reallocate his time.
- Could shift more coding to developers, and leave Guillaume to review/finalize
- If you see something is stuck, let him know. If he doesn't hear anything, he'll assume things are ok.
- cindy (automated tests) has started acting up again, after the last round of fixes had it working well for a month or so+1
- Mysterious errors; very common on local vagrant instance
- Is integration testing worthwhile, from a cost/benefit basis?
- "I can't live without Cindy now" +1
- Some cindy errors are now being caught by phan
- Other teams run tests as part of jenkins; we don't, partly because of the elastic dependency
Ring finger: Something about relationships--within the team, between teams, other
- working with mobile team to implement geo features
- seems to be going well so far
- Thanks to Erik for answering all of Deb's 'newbie' search and overall team work questions
- Weekly video chats with David, Erik, and Trey have been both productive and good "almost" face time
- Thanks to Guillaume for answering "newbie" questions about web requests and caching (great help during the whole thing with Legal) - It was fun, I learned a lot as well!
- Markus Krötzsch (Technical University of Dresden) is about to run research on WDQS usage
- http://korrekt.org/
- Stas will get help from Mikhail to anonymize query data before we hand it over
- https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries
- Had very interesting talk with Fabian Suchanek from YAGO (Yet Another Great Ontology) http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
- Potential future collaborations
- Guillaume working hard on getting integrated in the Ops team. Thanks for your support in that.
- Seems to be going well
- I am trying to spend more time on "real" Ops stuff (clinic duty, looking at mediawiki servers, ...)
Pinky: A little thing that would be easy to overlook
- The "wrong keyboard" analysis turned up a *lot* of "Latin Russian" in ruwiki. There's a lot there (maybe 1% of queries) that could be improved.
- Amir Aharoni (from Language Team in Editing) mentioned a gadget on the Hebrew Wikipedia that attempts to automatically correct for "Latin Hebrew" -> https://he.wikipedia.org/wiki/%D7%9E%D7%93%D7%99%D7%94_%D7%95%D7%99%D7%A7%D7%99:Gadget-Dwim.js
- To test it out, go to https://he.wikimedia.org and type "trnhev" (without the quotes) into the search box; that's "Latin Hebrew" for America. You'll see it corrects what you've written into America in Hebrew!
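For anyone unfamiliar with the "wrong keyboard" problem, here is a minimal sketch of the idea in Python. It is not the Dwim gadget's code and not what the analysis used; the function name `latin_to_russian` and the choice of the standard Russian ЙЦУКЕН layout are assumptions made for illustration. It remaps each character typed on a US QWERTY layout to the character at the same key position on the Russian layout.

```python
# Hypothetical sketch: remap text typed on a US QWERTY layout to the
# characters at the same key positions on the standard Russian layout.
# Only lowercase keys are covered; everything else is left unchanged.
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
JCUKEN = "йцукенгшщзхъфывапролджэячсмитьбю"

# str.maketrans requires both strings to have the same length.
LATIN_TO_RUSSIAN = str.maketrans(QWERTY, JCUKEN)


def latin_to_russian(query: str) -> str:
    """Return the query as it would read if typed with the Russian layout active."""
    return query.translate(LATIN_TO_RUSSIAN)


if __name__ == "__main__":
    # "ghbdtn" is what you get when typing "привет" with the wrong layout.
    print(latin_to_russian("ghbdtn"))
```

A real fix would also need to decide when to apply the remapping (for example, only when the original query gets few or no results), which this sketch does not attempt.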
Action items
- David: Look into getting out of maintaining the swift plugin
- Deb: Look at prioritising/defining the "Latin Russian/Latin Hebrew" problem - https://phabricator.wikimedia.org/T138958
- Resolved: put in the "This Quarter" column on the Discovery Search backlog
- Kevin: Send reminder one day before next retro... except not to Guillaume? ;-) [He prefers to respond "in the moment"]