Architecture meetings/RFC review 2013-12-18

Wednesday, December 18, 2013 at 10:00 PM UTC at #wikimedia-meetbot.

Requests for Comment to review

Summary and logs

Meeting summary

wikimedia-meetbot Meeting

Meeting started by drdee at 22:03:29 UTC (full logs).

1. https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-18 (bd808, 22:05:57)
2. http://etherpad.wikimedia.org/p/RFC%20review (bd808, 22:06:37)
Localisation format RFC (drdee, 22:09:13)
1. https://www.mediawiki.org/wiki/Requests_for_comment/Localisation_format (bd808, 22:09:25)
2. ACTION: RoanKattouw to remove groups (TimStarling, 22:24:58)
3. ACTION: RoanKattouw to look at the number of stat() calls and consider optimisations (TimStarling, 22:34:34)
PHP web service interface (TimStarling, 22:37:39)
1. https://www.mediawiki.org/wiki/Requests_for_comment/PHP_web_service_interface (TimStarling, 22:37:46)
2. https://www.mediawiki.org/wiki/Requests_for_comment/Services_and_narrow_interfaces (gwicke, 22:42:40)
3. ACTION: AaronSchulz to propose an API (TimStarling, 22:51:13)
4. ACTION: AaronSchulz to survey existing HTTP client libraries for ideas and potential bundling (TimStarling, 22:59:12)

Meeting ended at 23:15:26 UTC (full logs).

Action items

RoanKattouw to remove groups
RoanKattouw to look at the number of stat() calls and consider optimisations
AaronSchulz to propose an API
AaronSchulz to survey existing HTTP client libraries for ideas and potential bundling

Action items, by person

AaronSchulz
1. AaronSchulz to propose an API
2. AaronSchulz to survey existing HTTP client libraries for ideas and potential bundling
RoanKattouw
1. RoanKattouw to remove groups
2. RoanKattouw to look at the number of stat() calls and consider optimisations

People present (lines said)

RoanKattouw (83)
gwicke (71)
TimStarling (62)
parent5446 (34)
siebrand (20)
ori-l (17)
James_F (14)
AaronSchulz (10)
Nikerabbit (9)
drdee (9)
bd808 (8)
MaxSem (6)
robla (5)
Nemo_bis (3)
meetbot-wm (3)

Generated by MeetBot 0.1.4.

Full log

22:03:29 <drdee> #startmeeting
22:03:29 <meetbot-wm> Meeting started Wed Dec 18 22:03:29 2013 UTC. The chair is drdee. Information about MeetBot at https://bugzilla.wikimedia.org/46377.
22:03:29 <meetbot-wm> Useful Commands: #action #agreed #help #info #idea #link #topic.
22:04:50 <siebrand> It appears there is no architect present.
22:05:01 <drdee> yup, let's wait a couple of more minutes
22:05:06 <Nemo_bis> engineers are usually happy about that
22:05:57 <bd808> #link https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-18
22:06:10 <robla> I sms'ed Tim just now
22:06:28 <drdee> thanks robla
22:06:37 <bd808> #link http://etherpad.wikimedia.org/p/RFC%20review
22:07:36 <TimStarling> hi, sorry about that
22:07:49 <drdee> #chair TimStarling
22:07:49 <meetbot-wm> Current chairs: TimStarling drdee
22:07:55 <drdee> hi Tim
22:08:01 <drdee> are we good to go?
22:08:30 <TimStarling> yes
22:08:33 <drdee> first RFC Localisation format?
22:09:09 <TimStarling> ok
22:09:13 <drdee> #topic Localisation format RFC
22:09:23 <TimStarling> I did write a few comments about this one on the talk page
22:09:25 <bd808> #link https://www.mediawiki.org/wiki/Requests_for_comment/Localisation_format
22:09:37 <parent5446> OK, so how exactly are groups being done here. There was a brief mention of message prefixing in Discussion
22:09:54 <parent5446> If messages are separated into groups, how does core know which messages are where?
22:10:06 <siebrand> RoanKattouw: Can you comment?
22:10:18 <RoanKattouw> Yeah so the message groups are mostly for future application
22:10:32 <RoanKattouw> In the WIP implementation that I wrote, they are ignored
22:10:41 <siebrand> Groups are not relevant for the PHP implementation. All messages are still in the server side localisation cache.
22:10:55 <RoanKattouw> That is to say, you can specify multiple directories with JSON files in them, and you're required to name each of them as a group
22:11:04 <gwicke> James_F: we are still drafting those, so better to wait until Jan
22:11:08 <James_F> gwicke: OK.
22:11:09 <RoanKattouw> But the PHP message loader doesn't actually care that there are multiple directories or what their names are
22:11:16 <RoanKattouw> It just visits all of them and extracts all the messages
22:11:51 <RoanKattouw> In the future, I think that message grouping could be useful as a replacement for messages arrays in ResourceLoader definitions, or at least for us to identify which messages are needed in the frontend
22:12:43 <TimStarling> groups are less flexible than message lists
22:12:50 <RoanKattouw> That's true
22:12:58 <RoanKattouw> I'm not convinced that we'll use them for this purpose yet
22:13:03 <TimStarling> say, if someone needs one group plus one message from another group
22:13:16 <TimStarling> they might be inclined to grab both groups
22:13:23 <RoanKattouw> I think it came up while discussing a potential future RfC for changing how we do client-side localization (moving to jquery.i18n perhaps)
22:13:25 <RoanKattouw> Yeah, that's a valid concern
22:13:40 <RoanKattouw> I personally am fine with dropping the group names and just making it a flat array
22:13:47 <MaxSem> wouldn't per-language i18n files make things slower on wikis without manual cache rebuild?
22:13:48 <RoanKattouw> That wouldn't even break the code I wrote
22:13:57 <RoanKattouw> MaxSem: Why would they?
22:14:10 <MaxSem> more stats?
22:14:21 <RoanKattouw> On the topic of group names for another second, does anyone object to dropping the group names?
22:14:31 <RoanKattouw> Asking for the opinions of the RfC co-authors in particular
22:14:34 <Nikerabbit> I thought one point of the message groups was to allow automatic prefixing... but was that already moved out of the RfC?
22:14:34 <James_F> RoanKattouw: Were we planning to use them for the follow-up RfCs?
22:14:37 <MaxSem> admittedly, I'm not very knowledgeable in LocalisationCache
22:14:41 <RoanKattouw> Because if no one is particularly attached to them, let's just kill them
22:14:46 <parent5446> I agree that the best idea for now would be to drop group names.
22:14:47 <James_F> RoanKattouw: The automatic prefixing in particular, but other things too.
22:14:47 <siebrand> I think the groups also remove some maintenance burden on the developers. I wouldn't drop it, as adding it back later may prove to be a lot of work.
22:14:48 <TimStarling> MaxSem: no, it's a good point
22:15:02 <RoanKattouw> James_F: Yeah but I'm not convinced they're particularly useful. Autoprefixing could be useful, though, yes
22:15:21 <James_F> Adding groups later could be a real pain, as siebrand says.
22:15:24 <RoanKattouw> siebrand makes a good point, it's easy to remove them later but hard to add them later
22:15:27 <parent5446> If you add groups now but don't do something like auto-prefixing, adding in auto-prefixing later will be a lot harder than adding in groups later.
22:15:51 <siebrand> What would auto prefixing accomplish?
22:16:00 <siebrand> And auto prefixing what exactly?
22:16:06 <Nikerabbit> of message keys
22:16:09 <Nikerabbit> not that much I think
22:16:14 <TimStarling> MaxSem: there won't be a huge number of extra stats, just the length of the fallback sequence
22:16:15 <RoanKattouw> Auto prefixing of message keys so you could share the same i18n file between different applications
22:16:22 <TimStarling> it'll be like loading core messages
22:16:26 <RoanKattouw> But I don't see much benefit in that
22:16:40 <RoanKattouw> (Let's hold the stat discussion for just one minute)
22:17:00 <RoanKattouw> You could just as well prefix the messages with the name of your application/extension
22:17:01 <James_F> RoanKattouw: It's so the author of an extension doesn't need to be prescient about what other extensions may be called, I thought.
22:17:11 <RoanKattouw> And that would presumably be unique enough in any context you integrated it into
22:17:15 <James_F> RoanKattouw: And the same messages be used in MW and non-MW context easily.
22:17:39 <parent5446> If you implement groups now and do *not* include a method of telling the software which messages are where, adding in a method to do that later will be very difficult.
22:17:50 <parent5446> Auto-prefixing is one of those methods.
22:18:30 <RoanKattouw> To be clear, auto-prefixing is not primarily intended as a way to tell the software which messages are where
22:18:32 <siebrand> I don't like auto prefixing. namespaces or domains: possibly..
22:18:36 <RoanKattouw> Although it could be used that way to optimize loading
22:18:45 <parent5446> Then there is also the future possibility that, eventually (even if out of scope for this current RFC), the CDB cache will be split based on groups.
22:18:46 <RoanKattouw> Anyway
22:18:49 <siebrand> auto prefixing is a pain, because for example special page names, etc. do not participate in the prefixes.
22:18:53 <siebrand> So that'll be a pain to implement properly.
22:19:01 <RoanKattouw> It's clear that there is no consensus whatsoever for auto-prefixing
22:19:09 <siebrand> That'll significantly delay the implementation of the RfC as it is phrased now.
22:19:19 <RoanKattouw> We haven't discussed it properly and it's out of scope of this RfC
22:19:26 <siebrand> +1
22:19:33 <RoanKattouw> So let's keep things in scope
22:19:33 <James_F> So…
22:19:41 <RoanKattouw> Should we have group names or should we not have them?
22:20:06 <RoanKattouw> I argue that until we have a use for them, we should not have them now, and perhaps have them later. They will be optional for b/c, and using purely numeric group names will be forbidden
22:20:11 <parent5446> Like I said before, unless there is a comprehensive plan on exactly what to do with groups other than just as a means of organizing messages, they should not be added.
22:20:13 <RoanKattouw> That way we can distinguish between flat arrays and named groups
22:20:43 <RoanKattouw> If any group-related name-mangling is to happen, that needs to be introduced at the same time as the grouping system, otherwise b/c will be a massive pain
22:20:53 <siebrand> RoanKattouw: Can we have both with little effort?
22:21:01 <RoanKattouw> Both of what?
22:21:29 <siebrand> RoanKattouw: support for group names, or a flat array.
22:21:45 <RoanKattouw> Yes, that's what I'm saying
22:21:55 <RoanKattouw> We should support both in both directions
22:22:02 <siebrand> RoanKattouw: Okay, I must have missed some text.
22:22:12 <RoanKattouw> My current WIP implementation supports both groups and flat arrays because it completely ignores the array keys
22:22:32 <RoanKattouw> Any future implementation of groups should be tolerant of flat arrays without group names (hence the ban on numbers as group names)
22:23:20 <siebrand> is may result in reduced capabilities to not have group names, but that would be future functionality that will not harm existing code.
22:23:31 <siebrand> s/is may/It may/
22:23:48 <Nikerabbit> I think a concrete example here would make this clearer
22:24:09 <Nemo_bis> like the one bd808 added?
22:24:23 <siebrand> ULS currently implements its own API class to serve a JSON file through RL.
22:24:24 <RoanKattouw> Does anyone object to dropping groups (both the subject in this meeting, and from the RfC) at this point?
22:24:28 <TimStarling> ok, can the implementation omit group names for now, since it's obvious that that's the only solution parent5446 wants, and everyone else seems to be content with it?
22:24:41 <TimStarling> then we can move on to the next issue
22:24:44 <James_F> Sure.
22:24:45 <siebrand> In the future, we see this file being served by making a generic request.
22:24:47 <RoanKattouw> Yeah let's discuss it later
22:24:58 <TimStarling> #action RoanKattouw to remove groups
22:25:05 <RoanKattouw> siebrand: I think that should work completely different anyway. But that's a different discussion for a different day and a different RfC
22:25:09 <RoanKattouw> *differently
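
Editorial note: the compatibility rule discussed above (a loader that ignores array keys, plus a ban on purely numeric group names so that flat arrays and named groups stay distinguishable) can be sketched as follows. This is a minimal illustration in Python rather than MediaWiki's PHP; the function names and the shape of the configuration are assumptions made for the example and do not come from the WIP implementation.

```python
def classify_message_dirs(message_dirs):
    """Split a message-directory config into (flat, groups).

    `message_dirs` mimics a PHP array: integer keys stand for unnamed
    (flat) entries, string keys for named groups. Hypothetical helper,
    not MediaWiki code.
    """
    flat, groups = [], {}
    for key, path in message_dirs.items():
        if isinstance(key, int):
            flat.append(path)
        elif key.isdigit():
            # A purely numeric name could not be told apart from a flat
            # entry in a PHP array, so it is rejected.
            raise ValueError(f"purely numeric group name not allowed: {key!r}")
        else:
            groups[key] = path
    return flat, groups


def all_message_dirs(message_dirs):
    """What a group-agnostic loader does: ignore keys, visit every directory."""
    return list(message_dirs.values())
```

A loader written like `all_message_dirs` keeps working whether or not group names are present, which is why dropping them now would not break it.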
22:25:16 <RoanKattouw> OK, so MaxSem said something about stats
22:25:36 <RoanKattouw> There was a concern that splitting languages into separate files would harm performance for wikis without pre-built caches
22:25:39 <TimStarling> yeah, for non-english page views, you would expect this feature to roughly double the number of stats
22:26:01 <TimStarling> since say fr.json and en.json will both have to be checked for freshness
22:26:03 <RoanKattouw> I haven't tested fallbacks with my code yet
22:26:15 <RoanKattouw> But my code doesn't exhibit this behavior
22:26:18 <MaxSem> also, cache rebuilds would be slower but that's not critical
22:26:25 <RoanKattouw> I'm also not clear on how MessageCache handles fallbacks
22:26:35 <Nikerabbit> Why would they be slower?
22:26:45 <RoanKattouw> They wouldn't necessarily be slower overall
22:26:48 <parent5446> If this would turn out to be an issue (not sure if it is), we could always do what jQuery i18n allows: having all languages in one file as a fallback.
22:26:57 <RoanKattouw> What would slow them down is the need to open more files (both fr.json and en.json for French)
22:27:10 <RoanKattouw> However, the amount of data it has to read in is still 100x less for an extension
22:27:30 <RoanKattouw> Because ExtensionName.i18n.php contains the messages for all 200+ languages and you can't selectively read from it
22:27:48 <parent5446> RoanKattouw: MessageCache doesn't really handle fallbacks at the moment. This only really affects LocalisationCache.
22:27:57 <MaxSem> this is access speed vs. latency. I'm all for making SSDs a requirement for MW:)
22:27:59 <RoanKattouw> Right, sorry I meant to say LocalisationCache
22:28:07 <parent5446> Ah sorry.
22:28:12 <RoanKattouw> My bad
22:28:24 <RoanKattouw> I'm not entirely unconfused as to how the i18n system in core works :)
22:28:28 <TimStarling> well, the case you have to think about is NFS
22:28:47 <TimStarling> since a lot of shared web hosting is apparently done over NFS or some equivalent slow network storage
22:29:07 <RoanKattouw> I see now that I have made a mistake in my implementation and that fallbacks will most likely be broken
22:29:19 <Nikerabbit> In that case one would hope they do manual localisation cache rebuilds
22:29:42 <James_F> RoanKattouw: That's why it's WIP. :-)
22:29:44 * RoanKattouw -1s his own code
22:29:53 <TimStarling> Nikerabbit: you mean someone technically competent who also uses shared hosting instead of a VPS?
22:29:53 <MaxSem> Nikerabbit, if they only had shell access...;)
22:30:04 <TimStarling> I'm not sure such people exist...
22:30:11 <RoanKattouw> Right, so we'd roughly double stat()s for them
22:30:38 <RoanKattouw> Which hurts the freshness checks
22:30:52 <Nikerabbit> TimStarling: I'm confident some of them would be able to read and follow a documentation that states it can make MediaWiki faster
22:31:04 <RoanKattouw> I wonder if we can get the mtime of the directory instead of the individual files? I'm not quite sure what the semantics of that are
22:31:39 <TimStarling> what if we batch the checks, by storing a timestamp and only checking once every, say, 1 minute?
22:31:39 <bd808> `$stat = stat('/path/to/directory');`
22:31:49 <TimStarling> RoanKattouw: no, you can't
22:32:02 <TimStarling> the mtime of the directory is only updated when a file is created or removed
22:32:07 <RoanKattouw> Blegh
22:32:09 <James_F> Helpful.
22:32:09 <RoanKattouw> Thanks UNIX
22:32:28 <Nikerabbit> do we have any idea how big issue the stat calls can be?
22:32:36 <RoanKattouw> TimStarling: That sounds like a reasonable idea. We can probably work some magic in a custom CacheDependency subclass
22:32:50 <RoanKattouw> Nikerabbit: Not until we try it on a slow NFS setup? :)
22:33:01 <Nemo_bis> Uh! At last a use for gluster
22:33:02 <RoanKattouw> More seriously, we should compare stat() calls before and after
22:33:05 <parent5446> We could implement a manual stat(), i.e., have a file in the directory called mtime.txt or something. Every time the cache file is changed, update that file. Then it will act as a pseudo-mtime for all files in the directory.
22:33:05 <RoanKattouw> hahahaha
22:33:12 <parent5446> Not the cleanest solution but a possibility.
22:33:30 <RoanKattouw> parent5446: There's no need for that, CacheDependency will let us do nicer things
22:33:48 <parent5446> Ah OK. Didn't think about CacheDependency.
22:33:50 <RoanKattouw> Anyway
22:34:04 <Nikerabbit> I'm worried that we are spending a lot of effort on fine-tuning stat calls while other parts of the code have bigger effect...
22:34:19 <RoanKattouw> Are we agreed that I'll look at the number of stat() calls and maybe write a CacheDependency subclass if we need it?
22:34:34 <TimStarling> #action RoanKattouw to look at the number of stat() calls and consider optimisations
22:34:37 <parent5446> Yep
22:34:38 <RoanKattouw> Niklas is right, we don't even know if this is an issue or if it'll be eclipsed by something else
22:35:06 <RoanKattouw> Although this is the one thing that happens on every request (freshness check) so that's not a great thing to slow down
22:35:10 <RoanKattouw> Alright
22:35:14 <RoanKattouw> What else
22:35:20 <parent5446> "While handling JSON may be slower than using PHP (we have no benchmarks on this)"
22:35:30 <parent5446> Can we get benchmarks on this?
22:35:41 <siebrand> There's a stat on every request for every message group?
22:35:42 <ori-l> json_encode is often faster than serialize, I've found
22:35:43 <gwicke> RoanKattouw: you can always concatenate those json files, store an offset index and check for updates every <n> accesses so that those stats are amortized
22:36:03 <gwicke> not much fun, but doable..
22:36:04 <James_F> We now have VE messages in parallel in i18n.php and *.json, so theoretically.
22:36:07 <TimStarling> should we go to the next RFC?
22:36:21 <TimStarling> we've just about got enough time to fit another one in
22:36:22 <James_F> Is getting benchmarks needed?
22:36:34 <ori-l> no, IMO.
22:36:39 <RoanKattouw> Not really IMO
22:36:43 <James_F> OK, saves another action item.
22:36:45 <RoanKattouw> It may make sense to measure the entire process
22:36:48 <parent5446> The one thing I'm concerned is that using json_encode will cause much more memory usage.
22:36:57 <parent5446> Since the entire file is loaded into memory before being parsed.
22:36:58 <RoanKattouw> But we don't care terribly about the recache operation, since it writes to a cache
22:37:06 <RoanKattouw> And so it's done infrequently
22:37:08 <James_F> In that case, move to the next RfC.
22:37:12 <RoanKattouw> IMO the freshness check is more important
22:37:13 <ori-l> yeah, let's do another one
22:37:16 <bd808> parent5446: Is that not the case with a php file?
22:37:19 <drdee> next RFC PHP web service interface ?
22:37:22 * ori-l was late to the party and wants in on some RFC action
22:37:27 <RoanKattouw> Yeah if no one else has questions about this one, let's move on
22:37:32 <parent5446> bd808: No, it's read and parsed incrementally.
22:37:36 <parent5446> And yeah let's just move on.
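
Editorial note: TimStarling's suggestion above (store a timestamp and re-check file freshness only once every minute or so) might look roughly like the sketch below. It is written in Python for brevity and is not MediaWiki's CacheDependency interface; the class name, the injectable `clock` and `stat` parameters, and the 60-second default are all assumptions for illustration. In a real deployment the recorded mtimes and the last-check time would have to live in the cache itself so that they survive across requests.

```python
import os
import time


class BatchedFreshnessCheck:
    """Re-stat source files at most once per `interval` seconds.

    Hypothetical sketch of the batching idea; not MediaWiki code.
    """

    def __init__(self, paths, interval=60.0, clock=time.monotonic, stat=os.stat):
        self.paths = list(paths)
        self.interval = interval
        self.clock = clock
        self.stat = stat
        self.last_check = None
        self.mtimes = None  # mtimes recorded when the cache was built

    def _snapshot(self):
        # One stat() per source file, e.g. fr.json and en.json for French.
        return {p: self.stat(p).st_mtime for p in self.paths}

    def mark_built(self):
        """Record file mtimes right after the cache has been rebuilt."""
        self.mtimes = self._snapshot()
        self.last_check = self.clock()

    def is_expired(self):
        if self.mtimes is None:
            return True  # nothing cached yet
        now = self.clock()
        if now - self.last_check < self.interval:
            return False  # inside the window: skip all stat() calls
        self.last_check = now
        return self._snapshot() != self.mtimes
```

The trade-off is that an edited message file can go unnoticed for up to `interval` seconds, in exchange for most requests issuing no stat() calls at all.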
22:37:39 <TimStarling> #topic PHP web service interface
22:37:45 <RoanKattouw> Link to RFC?
22:37:46 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/PHP_web_service_interface
22:37:47 <siebrand> Thanks for the comments and discussion, everyone.
22:37:48 <RoanKattouw> I haven't seen this one
22:38:01 <gwicke> that's still at an early draft stage
22:38:04 <parent5446> So I mentioned this on the discussion page, but this is literally Guzzle.
22:38:21 <parent5446> Oh if it's too early to discuss we can leave it for later.
22:38:31 <gwicke> Aaron and me have been discussing implementation options
22:38:39 <TimStarling> so this is coming out of the cloudfiles work?
22:39:00 <gwicke> partly, that's what Aaron is working on
22:39:17 <gwicke> my motivation is making it easy to work with web services from PHP
22:39:18 <ori-l> I think the name is misleading. You're proposing a generic, library-like load-balancing function that takes a collection of URLs as input, right?
22:39:44 <RoanKattouw> I think there is a lot of context missing from this RfC
22:39:58 <RoanKattouw> Does this replace the routing engine in MW?
22:40:02 <gwicke> ori-l, it works on paths; the storage backends map those to URIs
22:40:10 <parent5446> For the record, Guzzle has a really nice tool where it keeps an array of various web services, all with different configurations.
22:40:13 <RoanKattouw> Would, say, /wiki be one of the paths that would be matched?
22:40:15 <gwicke> it can also support full URIs, but that is not the main motivation
22:40:19 <TimStarling> is AaronSchulz actually online?
22:40:51 <ori-l> gwicke: so an expanded ArrayUtils::consistentHashSort, right?
22:41:01 <gwicke> parent5446: there are nice features in guzzle, it just does not seem to be certain that the implementations would work for us
22:41:06 * AaronSchulz is around, yes
22:41:19 <gwicke> that is an implementation question though
22:41:27 <gwicke> the RFC is more about the API than the implementation
22:41:54 <ori-l> what existing services would we port to use the API?
22:42:00 <TimStarling> do we have an immediate second application, or would it just be an abstraction of cloudfiles stuff?
22:42:01 <gwicke> ori-l: how load balancing is implemented depends on the backend handler
22:42:05 <ori-l> it'd be helpful to have a list; that way we can identify commonalities
22:42:29 <gwicke> TimStarling: my motivation is the storage service and related service apis
22:42:40 <gwicke> https://www.mediawiki.org/wiki/Requests_for_comment/Services_and_narrow_interfaces
22:42:47 <gwicke> to be discussed later
22:43:43 <ori-l> there is functionality in the objectcache classes for doing this that I have wanted to use in the past (I can't remember what for, frustratingly) and that I found to be too tightly coupled to objectcache specifically
22:43:44 <gwicke> there are already a bunch of web services that we are using including swift, parsoid, the math service and the pdf renderer
22:43:59 <gwicke> there will be more, so making it easy to work with them might be a good idea
22:44:06 <TimStarling> ok, well that is what I want to see on the RFC, I think
22:44:09 <TimStarling> a list of subclasses
22:44:32 <gwicke> you mean a list of services to abstract over?
22:44:53 <gwicke> the API is intended to be open-ended regarding handlers
22:45:01 <TimStarling> well, the RFC has
22:45:04 <ori-l> existing classes that would be ported to use this API and projected classes that would use it
22:45:05 <TimStarling> // General Rashomon storage service for all remaining buckets
22:45:06 <TimStarling> $wgStoreBackends['/'] = new RashomonBackend ( array (
22:45:28 <TimStarling> are you saying RashomonBackend is not a subclass of something in this RFC? it is its own thing?
22:45:45 <gwicke> that is an implementation detail to be figured out
22:45:55 <gwicke> IMO we won't need subclassing there
22:46:01 <gwicke> implementing an interface would be enough
22:46:02 <parent5446> Sorry, but I have no idea what Rashomon is.
22:46:17 <TimStarling> presumably a storage service
22:46:31 <gwicke> parent5446: https://www.mediawiki.org/wiki/Requests_for_comment/Storage_service
22:46:49 <gwicke> it is the revision storage service we wrote for HTML storage
22:46:54 <TimStarling> if you need more applications, maybe you could include EhcacheBagOStuff?
22:47:22 <gwicke> anything that speaks HTTP basically, and is worth making more convenient to work with
22:47:56 * gwicke looks up EhcacheBagOStuff
22:47:58 <TimStarling> I'm just worried that if the only immediate application is swift, it will end up looking swift-like, and everything that uses it in the future will have to fit into a swift-like API
22:48:22 <TimStarling> unless the API is planned to be solely following HTTP?
22:49:03 <gwicke> it is supposed to be a very convenient and parallel way to do HTTP
22:49:30 <TimStarling> ok, so who is going to write the API, because that is going to be an action item
22:49:41 <ori-l> tying it to HTTP seems a bit odd
22:49:45 <ori-l> what do you gain by that?
22:49:51 <gwicke> the idea is to put effort into the design of the HTTP APIs so that they don't need much extra wrapping apart from some convenience like auth, Content-MD5 etc
22:49:52 <TimStarling> ori-l: that's just what it is
22:49:54 <ori-l> rather, what would something more generic not be able to provide?
22:49:56 <TimStarling> an HTTP client
22:50:10 <TimStarling> more generic things can be built on top of it
22:50:20 * AaronSchulz doesn't want things to be too generic
22:50:28 <ori-l> hmmm
22:50:29 <gwicke> the paths can map to non-http stuff too
22:50:32 <TimStarling> like MaxSem's key/value store
22:50:32 <AaronSchulz> you end up not able to assume anything, or something very complex
22:50:35 <gwicke> but the abstraction is still HTTP-like
22:50:46 <gwicke> paths, headers and HTTP verbs
22:50:48 <ori-l> so the thought is that this would steer people toward designing restful services with nice APIs?
22:51:09 <ori-l> i like that, but it makes it especially important that the API be intuitive, easy, and well-documented
22:51:13 <TimStarling> #action AaronSchulz to propose an API
22:51:17 <gwicke> that too
22:51:53 <gwicke> ori-l: see the problem statement
22:52:12 <gwicke> so any comments on the API that is proposed in the RFC?
22:52:47 <TimStarling> that's half an API
22:53:05 <TimStarling> not even half
22:53:24 <bd808> I don't see an API so much as a list of lists
22:54:01 <gwicke> you might be able to extrapolate past the things that are not spelled out explicitly
22:54:14 <TimStarling> it's a bit tricky since I'm only reading this for the first time in this meeting
22:54:35 <gwicke> *nod*
22:54:45 <TimStarling> but I don't get what $wgStoreBackends is and what a generic store is for
22:54:48 <parent5446> It's definitely going to be a bit restricting having requests represented as arrays rather than proper objects with properties.
22:55:02 <TimStarling> if it's an HTTP client, it shouldn't need configuration, it should be configured by its constructor
22:55:31 <gwicke> TimStarling, it is a service client that is close to HTTP
22:55:33 <ori-l> I think the idea is that $wgStoreBackends is Rashomon-specific, but the class (not included in the RFC) that transforms the array of URLs into an API is what is being proposed
22:55:55 <bd808> The internal implementation could use lists but it should have a builder interface of some sort I would think.
22:56:05 <gwicke> but the configured services are additionally load-balanced and can have some more service-specific behavior
22:56:10 <bd808> s/lists/arrays/
22:56:14 <gwicke> they might not actually speak http to the backend for example
22:56:22 <gwicke> dispatching is based on paths
22:56:31 <parent5446> So it's HTTP-oriented but not necessarily HTTP-backed.
22:56:48 <TimStarling> ok, well I'm sure an API with class names and methods in classes will make this clearer
22:56:49 <gwicke> the default storage service is closely related to the bucket idea in the storage service RFC
22:57:25 <gwicke> TimStarling: are you mainly interested in the API backends should implement?
22:57:30 <TimStarling> AaronSchulz: have you surveyed existing HTTP client libraries in PHP?
22:57:44 <parent5446> AKA, Guzzle
22:57:51 <parent5446> ;)
22:58:20 <gwicke> the store API is fairly thin so far
22:58:33 <gwicke> a run method and maybe some convenience wrappers for get/post etc
22:58:39 <gwicke> and arrays as input
22:58:56 <AaronSchulz> I don't think we need anything too complex so far
22:59:12 <TimStarling> #action AaronSchulz to survey existing HTTP client libraries for ideas and potential bundling
22:59:15 * AaronSchulz remembers encountering guzzle in some AWS code...it's a bit complex
22:59:18 <gwicke> the interesting stuff would happen in the handlers selected by path prefix
22:59:44 <AaronSchulz> unless a good portion of its features were useful and non-trivial I wouldn't bother
22:59:54 <parent5446> Guzzle really isn't complex at all. In fact, if you're working with an actual REST API, most of the configuration can be done in JSON.
22:59:57 * AaronSchulz doubts that the auth stuff would be adequate
23:00:10 <parent5446> It also has OAuth support and other auth.
23:00:13 * AaronSchulz is talking about source code
23:01:09 <robla> one alternative: https://github.com/kriswallsmith/Buzz
23:01:28 <gwicke> the basic functionality is already pretty much there in the curl_multi clients we have been using
23:01:37 <robla> (I know nothing about it other than quick search for Guzzle alternatives)
23:01:41 <gwicke> afaik much of the missing stuff is auth
23:02:02 * AaronSchulz snickers at https://github.com/kriswallsmith/Buzz/blob/master/lib/Buzz/Client/MultiCurl.php
23:02:28 <parent5446> I don't know about Guzzle's source code, but as a library it is definitely feature-ful and easy to use. There are very few things it does not support that we would need (except for, of course, non-HTTP backends).
23:02:57 <gwicke> parent5446, it might or might not be useful as a backend- we'll see
23:03:31 <gwicke> IMO that should not matter at this point
23:03:42 <parent5446> But it's not a backend. It has an API to use. If you're thinking of covering up Guzzle with *another* API then it's pointless.
23:04:26 <gwicke> if you don't see any value in the abstractions mentioned in the RFC, then yes
23:04:52 <TimStarling> alright, any other action items for this RFC?
23:05:12 <TimStarling> we're out of time now
23:05:54 <gwicke> the hope was to get some feedback on the backend abstractions in the RFC
23:06:11 <AaronSchulz> gwicke: well you could build it over guzzle, though that would be pretty heavy
23:06:15 <gwicke> we can just go ahead and implement parts of it though
23:06:39 <gwicke> AaronSchulz: yup
23:07:05 <TimStarling> I don't understand the bit about storage backends
23:07:58 <gwicke> the backends do whatever is needed to convert a path into an URI and actual request data (in case it is actually HTTP)
23:07:58 <robla> generally speaking, I would hope we either take off-the-shelf components, or play to win (i.e. create a library that many others would want to use)
23:08:17 <gwicke> so they can massage headers, handle auth, encode query strings etc
23:08:29 <parent5446> ^this. The reason I'm pushing Guzzle so much is because it would require little, if any, new code inside of MediaWiki itself.
23:08:31 <gwicke> do load balancing and fail-over
23:08:50 <gwicke> or use a non-HTTP transport
23:09:03 <gwicke> if that can be integrated into the curl event loop
23:09:28 <TimStarling> and how does this relate to the idea of a generic curl_multi client?
23:09:53 <TimStarling> would it be a subclass or just be integrated?
23:09:55 <gwicke> can you be more specific?
23:10:10 <gwicke> they are backend handlers that just implement an interface
23:10:21 <gwicke> for auth they might want to cache some state
23:10:25 <robla> my fear whenever someone talks about doing a "lightweight" abstraction is that it gets heavyweight pretty quickly, and no effort is put into making it any sort of standard so bitrot sets in
23:10:27 <gwicke> so using an object seems to be reasonable
23:11:32 <TimStarling> would you have a class with an event loop, which holds a collection of backends to delegate certain logic to?
23:11:58 <gwicke> that would have a method to massage new requests and responses and possibly some to handle errors and maybe retries
23:12:40 <gwicke> the requests are all handled in the curl_multi event loop
23:12:51 <gwicke> trivially so if everything is actually handled by curl itself
23:13:19 <gwicke> by doing additional calls out of the curl_multi loop in case we also use other transports that have a similar 'do some work' interface
23:13:52 <gwicke> in the simple http case, all it does is complete the request with auth etc and pass it to curl
23:14:06 <gwicke> on the way back maybe do some error handling and retry
23:14:14 <gwicke> that's pretty much it
23:15:17 <TimStarling> ok, well if you could write that on the RFC, and ideally propose an interface, that would be helpful
23:15:26 <TimStarling> #endmeeting
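
Editorial note: the dispatch model gwicke describes in the log (requests passed as arrays, a thin `run` method, and backend handlers selected by path prefix, with `/` as the catch-all) could be sketched as below. This is an illustration in Python under stated assumptions, not the API proposed in the RFC: `PrefixRouter`, `make_echo_backend`, and the request dictionary layout are hypothetical names invented for the example, and a real handler would add auth, pick a host, and issue the request through something like a curl_multi loop.

```python
def _matches(prefix, path):
    """True if `path` equals `prefix` or lies below it on a segment boundary."""
    base = prefix.rstrip("/")
    return path == base or path.startswith(base + "/")


class PrefixRouter:
    """Map path prefixes to backend handlers; the longest matching prefix wins."""

    def __init__(self):
        self._backends = {}

    def register(self, prefix, handler):
        self._backends[prefix] = handler

    def resolve(self, path):
        matches = [p for p in self._backends if _matches(p, path)]
        if not matches:
            raise LookupError(f"no backend for {path!r}")
        return self._backends[max(matches, key=len)]

    def run(self, request):
        """Dispatch a request dict carrying at least 'method' and 'path'."""
        handler = self.resolve(request["path"])
        return handler(request)


def make_echo_backend(name):
    # Stand-in for a real backend; it only reports which handler was chosen.
    def handler(request):
        return {"backend": name, "method": request["method"], "path": request["path"]}
    return handler
```

Registering a handler under `/` plays the role of the catch-all entry quoted from the RFC, while more specific prefixes take precedence for the paths below them.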