Wikimedia Discovery/Meetings/Search retrospective 2016-08-19
Format
This retrospective was conducted using the "Five finger retrospective" format: https://www.mediawiki.org/wiki/Team_Practices_Group/Five_finger_retrospective
Action Items from last month
- Chris: post the link to his "what technical collaboration team does" presentation
- Done
- Chris: Chris and Erik should talk about implications of interwiki search indices
- Done
- Trey & Deb: Chris needs to be aware of the ? at the end of queries
- Done, and material has been posted on several village pumps
- https://meta.wikimedia.org/wiki/Discovery/Handling_question_marks_in_search_queries & talk page
- Erik: Figure out a plan to reliably monitor github (David and Guillaume have started to watch it)
- Done, decided that volume is low enough that watching projects and getting emails from github is sufficient
- Got Discovery members admin rights on github projects
What happened since the last retro (July 6)
- Deployed "? stripping" in queries
- Set up RelForge cluster
- Deployed textcat
- Elasticsearch upgrade to 2.3.4 in progress
- Discovery got a new analyst, who will help a lot going forward, especially if we start building that probabilistic bot classifier thingy :P
- Deployed logstash/kibana upgrade
- Completed refactoring of the search fields
- WDQS servers for codfw approved and on their way
- Searcher class in cirrus is now < 1000 lines +1+1 :D
- Did research on the top 100-ish unsuccessful queries and decided not to go further with it due to lack of interesting data
- Analysis of ascii folding and stemming
Thumb: Thumbs up--something that went well
- We seem to have proven to many people's satisfaction that zero-results queries are not a good place to mine articles and redirects.
- Trey's "?" blog post!
- Blog post on textcat deployment +1
- Addressed some technical debt and made code look saner +1
- Trey's analysis of providing a list of search queries and the communication of the results
- RelForge cluster has come into existence! And David has been able to index lots of data (enwiki & frwiki in two different ways!)
Index finger: The ONE thing you want people to know (about how this team has functioned over the last month)
- (Things have generally been running smoothly. It doesn't feel like any ONE thing stands out.) +1+1
Middle finger: Something that did not go well
- elasticsearch not stable
- 2 major issues in the last month and a half. One is a mystery; one we understand and have some ideas for fixes
- logstash upgrade delayed multiple times due to lack of preparation / thoroughness
- seemingly not enough time in the day +1+1
- Just a lot going on; everything takes time. Last couple weeks have been atypically busy.
- KH: Try not to put in extra hours, generally. Time-sensitive occasional things are understandable.
- internet connections have been a bit weird / dropping at inconvinient times ( I spellz gud ) +1 (hangouts have been dodgy)
- cindy (automated browser testing) was acting up again, and we're still not sure why and don't have a final fix
- Recurring item. Do we want to think about shifting testing to unit test level? Or to php level?
- We really do need this to work; feel much more comfortable merging when automated tests are working
- Devs should work w/PO to make sure some time gets allocated to work on this
Ring finger: Something about relationships--within the team, between teams, other
- Good communication about "search across projects and across languages"
- Trey says: working with Deb & Chris to get blog posts out about developments has been great. Thanks to Deb for driving the process! +1 (yay, thanks!)
- Doing some good work with Graphs team to make visualizations easier (e.g. integrating w/WDQS)
- Guillaume still split between multiple sub teams, no one is complaining... (I feel ya!) +1 - he's doing great!
- Seems to be improving since the last retrospective
- We have multiple sub-teams that are fairly independent
Pinky: A little thing that would be easy to overlook (or was overlooked)
- Elasticsearch garden is not cultivated as much as it should be (T109089) - for example: the multiple alerts sent when the cluster is failing have been a problem for a fairly long time, and we had that spam again today
- Similar issue with maps: Small issues that are not critical; only get attention when they break. Could do better with that.
- There are a lot of little issues, so it makes sense to prioritize them
- Do you (GL) have knowledge/support to be able to prioritize your work?
- GL: Would be interested in participating in a planning session
- Some work ongoing with a new recommendation system that may need some help from cirrus developers (https://phabricator.wikimedia.org/T143197 )
- Offline article recommendation system (similar to "more like")
- Some help needed to catch obvious problems with bm25
- DC: Have created a place to test enwiki on BM25. (http://en-wp-bm25-relforge.wmflabs.org/wiki/Special:Search )
- Stas working on upcoming lectures / demos
- Internal talk about SPARQL and WDQS, mostly technical, partly aimed at analysis folks
- Wiki conference San Diego: Less technical audience
Action items
- Kevin: Invite GL to search planning meeting(s)
- Done
- David: Will send email to private list requesting BM25 testing; later to public list (and Discovery weekly status)
- Done
- Erik: work w/PO to make sure some time gets allocated to work on cindy problems