Architecture meetings/RFC review 2014-10-01
Oct 02 07:02:55 <TimStarling> #startmeeting
Oct 02 07:02:56 <wm-labs-meetbot> TimStarling: Error: A meeting name is required, e.g., '#startmeeting Marketing Committee'
Oct 02 07:03:02 * rfarrand has quit (Quit: Computer has gone to sleep.)
Oct 02 07:03:12 <TimStarling> #topic API roadmap | https://meta.wikimedia.org/wiki/IRC_office_hours | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE). | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
Oct 02 07:03:21 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap
Oct 02 07:03:47 * lbenedix has quit (Quit: Leaving.)
Oct 02 07:04:07 * DarTar has quit (Quit: DarTar)
Oct 02 07:04:31 <TimStarling> do we have anomie and yurikR?
Oct 02 07:04:33 * rdaiccherlb has quit (Quit: Computer has gone to sleep.)
Oct 02 07:04:38 <anomie> TimStarling: I'm here
Oct 02 07:04:49 <yurikR> yep
Oct 02 07:05:02 * ryasmeen is now known as ryasmeen|Away
Oct 02 07:05:38 * J-Mo (~jtmorgan@199.231.242.26) has joined #wikimedia-office
Oct 02 07:05:42 * rfarrand (~rfarrand@198.73.209.5) has joined #wikimedia-office
Oct 02 07:06:09 * DangSunM|cloud has quit (Ping timeout: 260 seconds)
Oct 02 07:06:09 * Revi has quit (Ping timeout: 260 seconds)
Oct 02 07:06:57 <TimStarling> can you tell us what has been done on this API work since the architecture summit?
Oct 02 07:07:03 * alantz (~Anna@tan4.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:07:56 <anomie> I've started working on the stuff in the document. I added Gerrit links to each item as patches got submitted, and moved a few things to a "completed" section.
Oct 02 07:08:44 <anomie> Since we're taking things slow as far as deprecation, some have their patches merged but need an analysis of whether people have actually changed their code.
Oct 02 07:09:00 * TrevorParscal is now known as TrevorP|Away
Oct 02 07:09:39 * Guest24138 (sid13042@gateway/web/irccloud.com/x-sygihooeqvhmvnnv) has joined #wikimedia-office
Oct 02 07:10:11 <TimStarling> you mean like token handling?
Oct 02 07:10:23 * rdaiccherlb (~rdaiccher@wikimedia/rdicerb-wmf) has joined #wikimedia-office
Oct 02 07:10:36 <anomie> Yes
Oct 02 07:10:43 * James_F|Away is now known as James_F
Oct 02 07:10:58 * Revi (sid12940@wikimedia/Hym411) has joined #wikimedia-office
Oct 02 07:11:11 * ryasmeen|Away is now known as ryasmeen
Oct 02 07:11:26 * alantz has quit (Ping timeout: 258 seconds)
Oct 02 07:12:11 * alantz (~Anna@tan1.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:12:53 <TimStarling> it looks like you need code review on some changes
Oct 02 07:12:59 <anomie> Yes, I do
Oct 02 07:15:02 <TimStarling> is there anything else you need?
Oct 02 07:15:12 * alantz has quit (Client Quit)
Oct 02 07:16:39 <anomie> Not really.
Oct 02 07:16:47 <anomie> I'm still not too fond of the decision to go with format=json2 for https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap#Changes_to_JSON_output_format, but I still agree with the points you made at Wikimania that clean breaking is better than random mystery breaking.
Oct 02 07:16:47 <TimStarling> I have a set of API feature requests from Tomasz that he sent in June
Oct 02 07:17:09 <anomie> I would like to see those
Oct 02 07:17:16 <TimStarling> I'll forward
Oct 02 07:17:39 * alantz (~Anna@tan1.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:17:51 <TimStarling> the main one that is relevant is a request for "chain queries"
Oct 02 07:18:13 <TimStarling> "The fever queries we have to send the better it gets for our users batteries."
Oct 02 07:19:00 <TimStarling> so I suppose we are talking about doing multiple actions in a single POST request
Oct 02 07:19:19 * InezK_away is now known as InezK
Oct 02 07:19:38 <anomie> We already have generators for a common instance in action=query. Details on what other "chains" he's thinking of would be useful.
Oct 02 07:19:58 <TimStarling> yeah, he didn't give details, but I assume he knows about generators already
Oct 02 07:20:20 * DarTar (~DarTar@wikimedia/DarTar) has joined #wikimedia-office
Oct 02 07:21:23 * wisdom is now known as alpha
Oct 02 07:22:24 <gwicke> don't forget that SPDY / HTTP2 is around the corner
Oct 02 07:22:37 * alantz has quit (Quit: Computer has gone to sleep.)
Oct 02 07:23:34 <TimStarling> well, if we are just talking about doing several unconnected API queries in a row, that could be done with pipelining, if the client supported that
Oct 02 07:23:42 <gwicke> which eliminates some of the issues that generators are designed to address
Oct 02 07:23:59 <TimStarling> but what if you are taking some data from one query and using it in the next query?
Oct 02 07:24:00 <anomie> gwicke: No it doesn't.
Oct 02 07:24:03 * DanielK_WMDE (~daniel@wikipedia/duesentrieb) has joined #wikimedia-office
Oct 02 07:24:06 * alantz (~Anna@tan3.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:24:17 <TimStarling> then it could be arbitrarily complicated
Oct 02 07:24:22 <gwicke> TimStarling: right, that is the bit that isn't addressed
Oct 02 07:25:07 <gwicke> security is another relevant aspect to consider
Oct 02 07:25:15 <gwicke> DOS in particular
Oct 02 07:25:51 * alantz has quit (Client Quit)
Oct 02 07:25:52 * Jyothis has quit (Remote host closed the connection)
Oct 02 07:26:23 * Jyothis (~Jyothis@wikipedia/Jyothis) has joined #wikimedia-office
Oct 02 07:26:25 <TimStarling> you mean DOS by means of an expensive query batch?
Oct 02 07:26:26 <gwicke> we shouldn't provide entry points that allow somebody to take down the API cluster by visiting some static web page with their cell phone
Oct 02 07:27:03 <gwicke> there is a security bug with an example page
Oct 02 07:28:10 * TrevorP|Away is now known as TrevorParscal
Oct 02 07:28:18 <gwicke> #62615
Oct 02 07:28:49 <yurikR> i would actually prefer to keep queries separate too
Oct 02 07:28:53 * parent5446 (parent5446@mediawiki/parent5446) has joined #wikimedia-office
Oct 02 07:29:08 <yurikR> if you want to chain requests, lets rely on http-level protocol
Oct 02 07:29:11 * alantz (~Anna@tan4.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:29:13 * DarTar has quit (Quit: DarTar)
Oct 02 07:29:36 <TimStarling> but splitting it up implies duplicated overhead
Oct 02 07:29:48 <yurikR> if some data is needed for consequent request, we either create specific api that understands that (e.g. - generators for query and other)
Oct 02 07:29:57 <cscott> or rely on gzip compression to take care of it
Oct 02 07:30:16 * alantz has quit (Client Quit)
Oct 02 07:30:30 <yurikR> well, the overhead will be negligent if they reuse the same connection, plus caching might make it much more efficient
Oct 02 07:30:47 <yurikR> with combining done on the api level, caching is totally busted
Oct 02 07:30:52 * Jyothis has quit (Ping timeout: 240 seconds)
Oct 02 07:30:56 * aharoni has quit (Remote host closed the connection)
Oct 02 07:30:59 <TimStarling> I mean in varnish, apache and HHVM
Oct 02 07:31:03 <gwicke> I don't think anybody is proposing to get rid of generators or chaining in general altogether -- it's just that we should be careful about what we use them for, and keep in mind how HTTP/2 affects the trade-offs
Oct 02 07:31:15 <TimStarling> there is per-request overhead at each level
Oct 02 07:31:22 <TimStarling> especially in HHVM/MW
Oct 02 07:31:43 <JetLaggedPanda> re: Tomasz's requests, I think the problem there is action=mobileformat, which apps use (this was the reason for asking about pipelining, IIRC)
Oct 02 07:31:44 <TimStarling> also in MySQL
Oct 02 07:31:48 <JetLaggedPanda> and that doesn't support generators or anything
Oct 02 07:31:58 <JetLaggedPanda> so over time slowly things have been tacked on to it
Oct 02 07:32:16 * anomie sees no action=mobileformat on enwiki
Oct 02 07:32:18 <TimStarling> there's a big difference in MySQL CPU usage between doing a single query that gets information about 100 pages, and doing 100 queries, one for each page
Oct 02 07:32:28 <JetLaggedPanda> anomie: gah, action=mobileview
Oct 02 07:32:29 * flyingclimber has quit (Remote host closed the connection)
Oct 02 07:32:48 <gwicke> TimStarling: the same is not necessarily true if each of those pages is stored on a different node
Oct 02 07:33:03 * mhurd has quit (Quit: mhurd)
Oct 02 07:33:05 <yurikR> JetLaggedPanda, there was a big change a while ago that allowed any module to use generators
Oct 02 07:33:15 <anomie> JetLaggedPanda: I'd have to look at what exactly action=mobileview is doing, but offhand it sounds like it needs any unique bits rolled into core. Much like a lot of MobileFrontend.
Oct 02 07:33:22 <yurikR> so now the mobileview simply needs to be updated to use generators
Oct 02 07:33:25 <JetLaggedPanda> anomie: i agree, yeah
Oct 02 07:33:29 <TimStarling> we're not going to split storage across hundreds of nodes
Oct 02 07:33:54 <JetLaggedPanda> yurikR: yeah, that would be good too, although perhaps it needs general query prop= as well
Oct 02 07:34:01 <gwicke> perhaps not hundreds, but we already use dozens
Oct 02 07:34:25 <JetLaggedPanda> yurikR: *also*, perhaps this could be solved by simply making mobileview html a prop= for action=query, but I guess that'll have caching implications
Oct 02 07:34:28 <TimStarling> I don't think so
Oct 02 07:34:46 <yurikR> JetLaggedPanda, yes, i think it should have been done that way :)
Oct 02 07:35:13 <gwicke> TimStarling: I agree with your general point, it's just that it might not be an eternal truth to the same degree it's right now
Oct 02 07:35:14 * tfinc (~tfinc@wikipedia/Tfinc) has joined #wikimedia-office
Oct 02 07:36:42 * alantz (~Anna@tan3.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:37:58 <TimStarling> #info implementation by anomie is proceeding, some changes just need code review and merge
Oct 02 07:37:59 <gwicke> there are for example wins in making more API requests static by storing or caching them; combined with different cost structures in HTTP/2 some applications might actually perform better when they do a few parallel requests vs. hitting a custom, uncached entry point
Oct 02 07:38:41 <TimStarling> #info Tomasz requested a "chain query" feature, but we need specific requirements
Oct 02 07:39:06 <gwicke> I see it more as a gradual shift
Oct 02 07:39:49 <TimStarling> you can't cache API responses
Oct 02 07:40:33 <gwicke> it's not technically impossible
Oct 02 07:40:40 <TimStarling> maybe you could if it were REST, but it is too difficult to invalidate the multiple URL variants enabled by the action API
Oct 02 07:40:44 <anomie> Some API response can be cached, mostly action=query. We already emit cache-control headers indicating what MediaWiki thinks about cacheability.
Oct 02 07:41:04 <anomie> True, people might have stale caches then.
Oct 02 07:41:12 <TimStarling> the client requests cache-control headers
Oct 02 07:41:37 <TimStarling> the client is explicitly requesting a stale cache since there is no way to update those caches once they are generated
Oct 02 07:41:40 * James_F is now known as James_F|Away
Oct 02 07:42:25 * gwicke nods
Oct 02 07:42:29 * bearND has quit (Remote host closed the connection)
Oct 02 07:42:44 <TimStarling> maybe we could normalize requests in varnish...
Oct 02 07:42:46 <DanielK_WMDE> there's a lot of stuff on that rfc page. perhaps it would be good to split it to ease discussion.
Oct 02 07:42:48 <anomie> The major opportunity for caching is revision content, for which gwicke is already working on a REST API specifically intended for heavy caching.
Oct 02 07:43:12 <DanielK_WMDE> the way things are structured now, i'm afraid some high profile discussions may drown out talk about some finer points
Oct 02 07:43:28 <gwicke> I think it might be worth looking for other resources that could potentially be cacheable with the right URL structure
Oct 02 07:43:50 <gwicke> and have the right granularity / access pattern for this to make sense
Oct 02 07:43:59 <TimStarling> even with normalization, you still have things like rvprop
Oct 02 07:44:25 * bearND (~bearnd@guest-tan1.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:44:27 <TimStarling> with REST, you just send all the data, but with api.php, each application will request a different rvprop
Oct 02 07:44:41 * TrevorParscal is now known as TrevorP|Away
Oct 02 07:45:29 <TimStarling> so even in that simple case, you multiply the cache space requirement by several
Oct 02 07:46:13 * alantz has quit (Quit: Computer has gone to sleep.)
Oct 02 07:46:20 <gwicke> yeah, it only makes sense if the number of variants is more limited
Oct 02 07:46:38 <gwicke> which is something we could try to move towards for newer modules
Oct 02 07:46:55 <gwicke> where the trade-offs make sense
Oct 02 07:47:02 <TimStarling> for purging, imagine if you had to send an HTCP purge request for each rvprop combination
Oct 02 07:47:13 * alantz (~Anna@tan4.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:48:00 <anomie> DanielK_WMDE: There's basically no discussion happening there at the moment, so I doubt anything is being drowned out. Although at some point (not now) I'd still like to hear your thoughts on what makes things like ApiResult::setIndexedTagName hard for you to use (without getting into redesigning the whole thing around a forest of objects, that was discussed enough at Wikimania IMO).
Oct 02 07:48:06 <gwicke> returning more props by default would probably not make a big difference in request size, and could still result in a faster response if the response is cached in exchange
Oct 02 07:49:13 * alantz has quit (Read error: Connection reset by peer)
Oct 02 07:49:21 * Guest24138 has quit (Changing host)
Oct 02 07:49:21 * Guest24138 (sid13042@wikimedia/DangSunM) has joined #wikimedia-office
Oct 02 07:49:31 * alantz (~Anna@tan1.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:49:36 * Guest24138 is now known as DangSunM|cloud
Oct 02 07:49:39 <gwicke> there are some entry points where the choices could perhaps be reduced a bit without major ill effects
Oct 02 07:50:22 <TimStarling> #info gwicke suggests we consider a gradual shift towards greater edge caching coupled with the use of SPDY, as a replacement for batches embedded in single queries (incl. generators)
Oct 02 07:50:53 <gwicke> that's overstating it quite a bit
Oct 02 07:51:24 <TimStarling> the meetbot command is unprivileged, you can do your own #info if you like
Oct 02 07:52:57 * alantz has quit (Read error: Connection reset by peer)
Oct 02 07:52:59 <gwicke> #info s/as a replacement for batches/as a replacement for *some* batches and expensive generators/
Oct 02 07:53:21 * alantz (~Anna@tan3.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:53:47 * PPena has quit (Quit: Computer has gone to sleep.)
Oct 02 07:54:05 <TimStarling> should we mark this RFC as approved?
Oct 02 07:54:14 <gwicke> I also think that we could do some of the assembly and orchestration in an intermediate layer
Oct 02 07:54:18 * JetLaggedPanda is now known as YuviPanda|zzz
Oct 02 07:54:51 <gwicke> netflix for example has been doing something like that: http://techblog.netflix.com/2012/07/embracing-differences-inside-netflix.html
Oct 02 07:54:52 * awight has quit (Remote host closed the connection)
Oct 02 07:55:11 * Krenair thinks we should
Oct 02 07:55:15 <TimStarling> I think the reason RFCs don't get approved is that we worry that by marking an RFC approved, we are approving every little aspect
Oct 02 07:55:59 <anomie> It's fine with me to mark it as approved; I've been treating it that way for a while now.
Oct 02 07:56:22 <anomie> The only drawback might be that it might discourage further discussion and further things for my "TODO" list.
Oct 02 07:56:25 * mhurd (~anonymous@tan2.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:56:28 * moizsyed (~moizsyed@tan1.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:57:13 * kaity|away is now known as kaity
Oct 02 07:57:15 <TimStarling> yeah, maybe it makes sense for something this complex to be a living document
Oct 02 07:57:44 <anomie> We could move the living document portion of it out of the RFC, although I'm not sure what would be left in the RFC then.
Oct 02 07:57:45 <DanielK_WMDE> ...or factor out some parts that can be considered agreed on and treated as a "plan".
Oct 02 07:57:50 <yurikR> gwicke and I just spoke about caching a bit, and it seems ideally we should somehow cache certain requests, and devise a well established way to flush them when they become obsolete
Oct 02 07:57:54 * jhobs has quit (Ping timeout: 246 seconds)
Oct 02 07:57:59 <TimStarling> it suggests a status flow "in draft" -> "archived complete" for big RFCs
Oct 02 07:58:17 * jhobs (~jhobson@tan2.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 07:58:25 <yurikR> this caching won't apply to every api request, but we really ought to move in that direction
Oct 02 07:58:33 <DanielK_WMDE> what does "archived complete" mean?
Oct 02 07:58:43 <DanielK_WMDE> "we are done talking"?
Oct 02 07:59:00 <TimStarling> it means it will be listed at https://www.mediawiki.org/wiki/Requests_for_comment/Archive#Implemented
Oct 02 07:59:15 <TimStarling> yes, which means we are done talking
Oct 02 07:59:19 * alantz has quit (Quit: Computer has gone to sleep.)
Oct 02 07:59:23 * Jeff_Green (~jgreen@wikipedia/jgreen) has left #wikimedia-office
Oct 02 08:00:01 <TimStarling> we presumably won't discuss archived RFCs in public IRC meetings or architecture committee meetings
Oct 02 08:00:33 * alantz (~Anna@tan1.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 08:00:43 <TimStarling> for the API roadmap, the work could theoretically be eternal
Oct 02 08:00:54 <DanielK_WMDE> yea, makes sense
Oct 02 08:00:58 * ori (~ori@wikipedia/ori-livneh) has joined #wikimedia-office
Oct 02 08:01:00 <TimStarling> but I prefer to see RFCs as change requests that can be approved and completed
Oct 02 08:01:08 * zz_MissGayle (~gyoung@ec2-50-112-50-28.us-west-2.compute.amazonaws.com) has joined #wikimedia-office
Oct 02 08:01:10 * zz_MissGayle is now known as MissGayle
Oct 02 08:01:14 <DanielK_WMDE> in such a case, the goal of the rfc is not to implement a feature, but to agree on a general plan
Oct 02 08:01:31 * MissGayle has quit (Changing host)
Oct 02 08:01:31 * MissGayle (~gyoung@wikimedia/gyoung) has joined #wikimedia-office
Oct 02 08:01:38 * PPena (~PPena@tan2.corp.wikimedia.org) has joined #wikimedia-office
Oct 02 08:01:45 <TimStarling> maybe the RFC should be called "API roadmap 1"
Oct 02 08:01:51 <anomie> DanielK_WMDE: I think that's a good summary of RFCs in general.
Oct 02 08:01:54 <TimStarling> which can be marked approved
Oct 02 08:02:20 <TimStarling> then while that is being implemented, an "API roadmap 2" RFC can be the parking lot for design of the next batch of features
Oct 02 08:03:09 <TimStarling> then we can schedule a meeting to discuss "API roadmap 2" and we will know that that means we are looking forward not back
Oct 02 08:03:22 <AaronS> heh
Oct 02 08:03:41 <TimStarling> you know it is nice when people don't have to read so much
Oct 02 08:03:51 * James_F|Away is now known as James_F
Oct 02 08:03:53 * mhurd has quit (Quit: mhurd)
Oct 02 08:04:14 <TimStarling> Daniel complained about the RFC being big already, but it has a lot of complete stuff mixed with plans for the near future, plus a few plans for the somewhat more distant future
Oct 02 08:04:34 <DanielK_WMDE> anomie: that's my understanding too, but the final status is currently called "implemented". That's a lot more than "agreed on a plan".
Oct 02 08:05:08 <TimStarling> we have "accepted" also
Oct 02 08:05:14 * Ltrlg has quit (Quit: Leaving.)
Oct 02 08:05:18 <anomie> So, to summarize: RFC is approved, the "living document" aspect should be abstracted out into a project page of some sort (I'll do that), and when we have enough of a backlog of non-trivial changes we'll make a new RFC (I'll probably do that too when the time comes).
Oct 02 08:05:27 * parent5446 (parent5446@mediawiki/parent5446) has left #wikimedia-office ("wikimedia-office")
Oct 02 08:05:52 <TimStarling> yeah, makes sense I think
Oct 02 08:06:24 * moizsyed has quit (Remote host closed the connection)
Oct 02 08:07:02 * kaity is now known as kaity|away
Oct 02 08:07:05 * alantz has quit (Quit: Computer has gone to sleep.)
Oct 02 08:07:06 * kaity|away is now known as kaity
Oct 02 08:07:23 <TimStarling> #action anomie to abstract the "living document" aspect of the RFC out to a project page
Oct 02 08:07:36 <TimStarling> ok, anything else before I end the meeting?
Oct 02 08:07:53 * kristenlans has quit (Quit: kristenlans)
Oct 02 08:07:54 <anomie> Not from me, I was about to leave the meeting anyway
Oct 02 08:08:07 * bearND has quit (Remote host closed the connection)
Oct 02 08:08:26 <TimStarling> #endmeeting
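
Note on the rvprop discussion in the log: TimStarling's point that "each application will request a different rvprop", so cached variants and HTCP purges multiply per combination, can be made concrete with a small count. The Python below is only a rough sketch under stated assumptions. The five rvprop values are a subset picked for illustration, `example.org` is a placeholder host, and the helper names are hypothetical, not part of MediaWiki or Varnish. It assumes requests are first normalized by sorting the pipe-separated values, in the spirit of the "normalize requests in varnish" idea.

```python
from itertools import combinations
from urllib.parse import urlencode

# Assumption: a small illustrative subset of rvprop values, not the full list.
RVPROPS = ["ids", "timestamp", "user", "comment", "content"]


def normalize_rvprop(value: str) -> str:
    """Sort and de-duplicate a pipe-separated rvprop value (hypothetical helper)."""
    return "|".join(sorted(set(value.split("|"))))


def cache_variants(props):
    """Return every non-empty rvprop combination in normalized (sorted) form."""
    return [
        "|".join(sorted(combo))
        for size in range(1, len(props) + 1)
        for combo in combinations(props, size)
    ]


def purge_urls(title, props, base="https://example.org/w/api.php"):
    """List the URLs that would each need a purge for one page (placeholder host)."""
    return [
        base + "?" + urlencode({
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": variant,
        })
        for variant in cache_variants(props)
    ]


if __name__ == "__main__":
    # Normalization collapses reordered or repeated values into one cache key.
    print(normalize_rvprop("user|ids|user"))
    # Even after normalization, the variant count grows as 2**n - 1.
    print(len(cache_variants(RVPROPS)))
    print(len(purge_urls("Main Page", RVPROPS)))
```

Even with only five values and perfect normalization, one page has 2^5 - 1 = 31 cacheable variants, each of which would need its own purge when the page changes. This is the multiplication of cache space and purge traffic raised in the meeting, and it is why limiting the number of variants was suggested for newer modules.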