Architecture meetings/RFC review 2013-12-18
Wednesday, December 18, 2013 at 10:00 PM UTC in #wikimedia-meetbot.
Requests for Comment to review
Propose your own RFCs:
- Requests for comment/Localisation format
- Requests for comment/PHP web service interface
- Requests for comment/Json Config pages in wiki
Summary and logs
Meeting summary
- wikimedia-meetbot Meeting
Meeting started by drdee at 22:03:29 UTC (full logs).
- https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-18 (bd808, 22:05:57)
- http://etherpad.wikimedia.org/p/RFC%20review (bd808, 22:06:37)
- Localisation format RFC (drdee, 22:09:13)
- https://www.mediawiki.org/wiki/Requests_for_comment/Localisation_format (bd808, 22:09:25)
- ACTION: RoanKattouw to remove groups (TimStarling, 22:24:58)
- ACTION: RoanKattouw to look at the number of stat() calls and consider optimisations (TimStarling, 22:34:34)
- PHP web service interface (TimStarling, 22:37:39)
- https://www.mediawiki.org/wiki/Requests_for_comment/PHP_web_service_interface (TimStarling, 22:37:46)
- https://www.mediawiki.org/wiki/Requests_for_comment/Services_and_narrow_interfaces (gwicke, 22:42:40)
- ACTION: AaronSchulz to propose an API (TimStarling, 22:51:13)
- ACTION: AaronSchulz to survey existing HTTP client libraries for ideas and potential bundling (TimStarling, 22:59:12)
Meeting ended at 23:15:26 UTC (full logs).
Action items
- RoanKattouw to remove groups
- RoanKattouw to look at the number of stat() calls and consider optimisations
- AaronSchulz to propose an API
- AaronSchulz to survey existing HTTP client libraries for ideas and potential bundling
Action items, by person
- AaronSchulz
- AaronSchulz to propose an API
- AaronSchulz to survey existing HTTP client libraries for ideas and potential bundling
- RoanKattouw
- RoanKattouw to remove groups
- RoanKattouw to look at the number of stat() calls and consider optimisations
People present (lines said)
- RoanKattouw (83)
- gwicke (71)
- TimStarling (62)
- parent5446 (34)
- siebrand (20)
- ori-l (17)
- James_F (14)
- AaronSchulz (10)
- Nikerabbit (9)
- drdee (9)
- bd808 (8)
- MaxSem (6)
- robla (5)
- Nemo_bis (3)
- meetbot-wm (3)
Generated by MeetBot 0.1.4.
Full log
22:03:29 <drdee> #startmeeting
22:03:29 <meetbot-wm> Meeting started Wed Dec 18 22:03:29 2013 UTC. The chair is drdee. Information about MeetBot at https://bugzilla.wikimedia.org/46377.
22:03:29 <meetbot-wm> Useful Commands: #action #agreed #help #info #idea #link #topic.
22:04:50 <siebrand> It appears there is no architect present.
22:05:01 <drdee> yup, let's wait a couple of more minutes
22:05:06 <Nemo_bis> engineers are usually happy about that
22:05:57 <bd808> #link https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-18
22:06:10 <robla> I sms'ed TIm just now
22:06:28 <drdee> thanks robla
22:06:37 <bd808> #link http://etherpad.wikimedia.org/p/RFC%20review
22:07:36 <TimStarling> hi, sorry about that
22:07:49 <drdee> #chair TimStarling
22:07:49 <meetbot-wm> Current chairs: TimStarling drdee
22:07:55 <drdee> hi Tim
22:08:01 <drdee> are we good to go?
22:08:30 <TimStarling> yes
22:08:33 <drdee> first RFC Localisation format?
22:09:09 <TimStarling> ok
22:09:13 <drdee> #topic Localisation format RFC
22:09:23 <TimStarling> I did write a few comments about this one on the talk page
22:09:25 <bd808> #link https://www.mediawiki.org/wiki/Requests_for_comment/Localisation_format
22:09:37 <parent5446> OK, so how exactly are groups being done here. There was a brief mention of message prefixing in Discussion
22:09:54 <parent5446> If messages are separated into groups, how does core know which messages are where?
22:10:06 <siebrand> RoanKattouw: Can you comment?
22:10:18 <RoanKattouw> Yeah so the message groups are mostly for future application
22:10:32 <RoanKattouw> In the WIP implementation that I wrote, they are ignored
22:10:41 <siebrand> Groups are not relevant for the PHP implementation. All messages are still in the server side localisation cache.
22:10:55 <RoanKattouw> That is to say, you can specify multiple directories with JSON files in them, and you're required to name each of them as a group
22:11:04 <gwicke> James_F: we are still drafting those, so better to wait until Jan
22:11:08 <James_F> gwicke: OK.
22:11:09 <RoanKattouw> But the PHP message loader doesn't actually care that there are multiple directories or what their names are
22:11:16 <RoanKattouw> It just visits all of them and extracts all the messages
22:11:51 <RoanKattouw> In the future, I think that message grouping could be useful as a replacement for messages arrays in ResourceLoader definitions, or at least for us to identify which messages are needed in the frontend
22:12:43 <TimStarling> groups are less flexible than message lists
22:12:50 <RoanKattouw> That's true
22:12:58 <RoanKattouw> I'm not convinced that we'll use them for this purpose yet
22:13:03 <TimStarling> say, if someone needs one group plus one message from another group
22:13:16 <TimStarling> they might be inclined to grab both groups
22:13:23 <RoanKattouw> I think it came up while discussing a potential future RfC for changing how we do client-side localization (moving to jquery.i18n perhaps)
22:13:25 <RoanKattouw> Yeah, that's a valid concern
22:13:40 <RoanKattouw> I personally am fine with dropping the group names and just making it a flat array
22:13:47 <MaxSem> woudn't per-language i18n files make things slower on wikis without manual cache rebuild?
22:13:48 <RoanKattouw> That wouldn't even break the code I wrote
22:13:57 <RoanKattouw> MaxSem: Why would they?
22:14:10 <MaxSem> more stats?
22:14:21 <RoanKattouw> On the topic of group names for another second, does anyone object to dropping the group names?
22:14:31 <RoanKattouw> Asking for the opinions of the RfC co-authors in particular
22:14:34 <Nikerabbit> I thought one point of the message groups was to allow automatic prefixing... but was that already moved out of the RfC?
22:14:34 <James_F> RoanKattouw: We're we planning to use them for the follow-up RfCs?
22:14:37 <MaxSem> admittedly, I'm not very knowledgeable in LocalisationCache
22:14:41 <RoanKattouw> Because if no one is particularly attached to them, let's just kill them
22:14:46 <parent5446> I agree that the best idea for now would be to drop group names.
22:14:47 <James_F> RoanKattouw: The automatic prefixing in particular, but other things too.
22:14:47 <siebrand> I think the groups also remove some maintenance burden on the developers. I wouldn't drop it, as adding it back later may prove to be a lot of work.
22:14:48 <TimStarling> MaxSem: no, it's a good point
22:15:02 <RoanKattouw> James_F: Yeah but I'm not convinced they're particularly useful. Autoprefixing could be useful, though, yes
22:15:21 <James_F> Adding groups later could be a real pain, as siebrand says.
22:15:24 <RoanKattouw> siebrand makes a good point, it's easy to remove them later but hard to add them later
22:15:27 <parent5446> If you add groups now but don't do something like auto-prefixing, adding in auto-prefixing later will be a lot harder than adding in groups later.
22:15:51 <siebrand> What would auto prefixing accomplish?
22:16:00 <siebrand> And auto prefixing what exactly?
22:16:06 <Nikerabbit> of message keys
22:16:09 <Nikerabbit> not that much I think
22:16:14 <TimStarling> MaxSem: there won't be a huge number of extra stats, just the length of the fallback sequence
22:16:15 <RoanKattouw> Auto prefixing of message keys so you could share the same i18n file between different applications
22:16:22 <TimStarling> it'll be like loading core messages
22:16:26 <RoanKattouw> But I don't see much benefit in that
22:16:40 <RoanKattouw> (Let's hold the stat discussion for just one minute)
22:17:00 <RoanKattouw> You could just as well prefix the messages with the name of your application/extension
22:17:01 <James_F> RoanKattouw: It's so the author of an extension doesn't need to be prescient about what other extensions may be called, I thought.
22:17:11 <RoanKattouw> And that would presumably be unique enough in any context you integrated it into
22:17:15 <James_F> RoanKattouw: And the same messages be used in MW and non-MW context easily.
22:17:39 <parent5446> If you implement groups now and do *not* include a method of telling the software which messages are where, adding in a method to do that later will be very difficult.
22:17:50 <parent5446> Auto-prefixing is one of those methods.
22:18:30 <RoanKattouw> To be clear, auto-prefixing is not primarily intended as a way to tell the software which messages are where
22:18:32 <siebrand> I don't like auto prexing. namespaces or domains: possibly..
22:18:36 <RoanKattouw> Although it could be used that way to optimize loading
22:18:45 <parent5446> Then there is also the future possibility that, eventually, (even if out of scope for this current RFC), that the CDB cache will be split based on groups.
22:18:46 <RoanKattouw> Anyway
22:18:49 <siebrand> auto prefixing is a pain, because for example special page names, etc. do not participate in the prefixes.
22:18:53 <siebrand> So that'll be a pain to implement properly.
22:19:01 <RoanKattouw> It's clear that there is no consensus whatsoever for auto-prefixing
22:19:09 <siebrand> That'll significantly delay the implementation of the RfC as it is phrased now.
22:19:19 <RoanKattouw> We haven't discussed it properly and it's out of scope of this RfC
22:19:26 <siebrand> +1
22:19:33 <RoanKattouw> So let's keep things in scope
22:19:33 <James_F> So…
22:19:41 <RoanKattouw> Should we have group names or should we not have them?
22:20:06 <RoanKattouw> I argue that until we have a use for them, we should not have them now, and perhaps have them later. They will be optional for b/c, and using purely numeric group names will be forbidden
22:20:11 <parent5446> Like I said before, unless there is a comprehensive plan on exactly what to do with groups other than just as a means of organizing messages, they should not be added.
22:20:13 <RoanKattouw> That way we can distinguish between flat arrays and named groups
22:20:43 <RoanKattouw> If any group-related name-mangling is to happen, that needs to be introduced at the same time as the grouping system, otherwise b/c will be a massive pain
22:20:53 <siebrand> RoanKattouw: Can we have both with little effort?
22:21:01 <RoanKattouw> Both of what?
22:21:29 <siebrand> RoanKattouw: support for group names, or a flat array.
22:21:45 <RoanKattouw> Yes, that's what I'm saying
22:21:55 <RoanKattouw> We should support both in both directions
22:22:02 <siebrand> RoanKattouw: Okay, I must have missed some text.
22:22:12 <RoanKattouw> My current WIP implementation supports both groups and flat arrays because it completely ignores the array keys
22:22:32 <RoanKattouw> Any future implementation of groups should be tolerant of flat arrays without group names (hence the ban on numbers as group names)
22:23:20 <siebrand> is may result in reduced capabilities to not have group names, but that would be future functionality that will not harm existing code.
22:23:31 <siebrand> s/is may/It may/
22:23:48 <Nikerabbit> I think a concerete example here would make this clearer
22:24:09 <Nemo_bis> like the one bd808 added?
22:24:23 <siebrand> ULS currently implements it's own API class to serve a JSON file through RL.
22:24:24 <RoanKattouw> Does anyone object to dropping groups (both the subject in this meeting, and from the RfC) at this point?
22:24:28 <TimStarling> ok, can the implementation omit group names for now, since it's obvious that that's the only solution parent5446 wants, and everyone else seems to be content with it?
22:24:41 <TimStarling> then we can move on to the next issue
22:24:44 <James_F> Sure.
22:24:45 <siebrand> In the future, we see this file being served by making a generic request.
22:24:47 <RoanKattouw> Yeah let's discuss it later
22:24:58 <TimStarling> #action RoanKattouw to remove groups
22:25:05 <RoanKattouw> siebrand: I think that should work completely different anyway. But that's a different discussion for a different day and a different RfC
22:25:09 <RoanKattouw> *differently
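(Editorial note on the group discussion above.) RoanKattouw's point that banning purely numeric group names keeps flat arrays and named groups distinguishable can be illustrated with a small sketch. It is written in Python rather than MediaWiki's PHP, and the function names and directory paths are hypothetical, not taken from the WIP patch. A Python list stands in for a PHP array with implicit numeric keys.

```python
def normalise_message_dirs(dirs):
    """Split a PHP-style array of message directories into
    (ungrouped, groups). Hypothetical helper, for illustration only."""
    ungrouped = []
    groups = {}
    # A list models a flat PHP array, whose keys are implicitly 0, 1, 2, ...
    if isinstance(dirs, (list, tuple)):
        items = enumerate(dirs)
    else:
        items = dirs.items()
    for key, path in items:
        # Numeric keys can only come from a flat array, because purely
        # numeric group names are forbidden.
        if isinstance(key, int) or str(key).isdigit():
            ungrouped.append(path)
        else:
            groups[key] = path
    return ungrouped, groups


def all_message_dirs(dirs):
    """Mimic a loader that ignores the keys and visits every directory."""
    ungrouped, groups = normalise_message_dirs(dirs)
    return ungrouped + list(groups.values())
```

A loader that ignores the keys, as the WIP implementation was described as doing, only needs `all_message_dirs`; a later group-aware loader could use the split result without breaking configurations written as flat arrays.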
22:25:16 <RoanKattouw> OK, so MaxSem said something about stats
22:25:36 <RoanKattouw> There was a concern that splitting languages into separate files would harm performance for wikis without pre-built caches
22:25:39 <TimStarling> yeah, for non-english page views, you would expect this feature to roughly double the number of stats
22:26:01 <TimStarling> since say fr.json and en.json will both have to be checked for freshness
22:26:03 <RoanKattouw> I haven't tested fallbacks with my code yet
22:26:15 <RoanKattouw> But my code doesn't exhibit this behavior
22:26:18 <MaxSem> also, cache rebuilds would be slower but that's not critical
22:26:25 <RoanKattouw> I'm also not clear on how MessageCache handles fallbacks
22:26:35 <Nikerabbit> Why would they be slower?
22:26:45 <RoanKattouw> They wouldn't necessarily be slower overall
22:26:48 <parent5446> If this would turn out to be an issue (not sure if it is), we could always do what jQuery i18n allows: having all languages in one file as a fallback.
22:26:57 <RoanKattouw> What would slow them down is the need to open more files (both fr.json and en.json for French)
22:27:10 <RoanKattouw> However, the amount of data it has to read in is still 100x less for an extension
22:27:30 <RoanKattouw> Because ExtensionName.i18n.php contains the messages for all 200+ languages and you can't selectively read from it
22:27:48 <parent5446> RoanKattouw: MessageCache doesn't really handle fallbacks at the moment. This only really affects LocalisationCache.
22:27:57 <MaxSem> this is access speed vs. latency. I'm all for making SSDs a requirement for MW:)
22:27:59 <RoanKattouw> Right, sorry I meant to say LocalisationCache
22:28:07 <parent5446> Ah sorry.
22:28:12 <RoanKattouw> My bad
22:28:24 <RoanKattouw> I'm not entirely unconfused as to how the i18n system in core works :)
22:28:28 <TimStarling> well, the case you have to think about is NFS
22:28:47 <TimStarling> since a lot of shared web hosting is apparently done over NFS or some equivalent slow network storage
22:29:07 <RoanKattouw> I see now that I have made a mistake in my implementation and that fallbacks will most likely be broken
22:29:19 <Nikerabbit> In that case one would hope they do manual localisation cache rebuilds
22:29:42 <James_F> RoanKattouw: That's why it's WIP. :-)
22:29:44 * RoanKattouw -1s his own code
22:29:53 <TimStarling> Nikerabbit: you mean someone technically competent who also uses shared hosting instead of a VPS?
22:29:53 <MaxSem> Nikerabbit, if they only had shell access...;)
22:30:04 <TimStarling> I'm not sure such people exist...
22:30:11 <RoanKattouw> Right, so we'd roughly double stat()s for them
22:30:38 <RoanKattouw> Which hurts the freshness checks
22:30:52 <Nikerabbit> TimStarling: I'm confident some of them would able to read and follow a documentation that states it can make MediaWiki faster
22:31:04 <RoanKattouw> I wonder if we can get the mtime of the directory instead of the individual files? I'm not quite sure what the semantics of that are
22:31:39 <TimStarling> what if we batch the checks, by storing a timestamp and only checking once every, say, 1 minute?
22:31:39 <bd808> `$stat = stat('\path\to\directory');`
22:31:49 <TimStarling> RoanKattouw: no, you can't
22:32:02 <TimStarling> the mtime of the directory is only updated when a file is created or removed
22:32:07 <RoanKattouw> Blegh
22:32:09 <James_F> Helpful.
22:32:09 <RoanKattouw> Thanks UNIX
22:32:28 <Nikerabbit> do we have any idea how big issue the stat calls can be?
22:32:36 <RoanKattouw> TimStarling: That sounds like a reasonable idea. We can probably work some magic in a custom CacheDependency subclass
22:32:50 <RoanKattouw> Nikerabbit: Not until we try it on a slow NFS setup? :)
22:33:01 <Nemo_bis> Uh! At last a use for gluster
22:33:02 <RoanKattouw> More seriously, we should compare stat() calls before and after
22:33:05 <parent5446> We could implement a manual stat(), i.e., have a file in the directory called mtime.txt or something. Every time the cache file is changed, update that file. Then it will act as a pseudo-mtime for all files in the directory.
22:33:05 <RoanKattouw> hahahaha
22:33:12 <parent5446> Not the cleanest solution but a possibility.
22:33:30 <RoanKattouw> parent5446: There's no need for that, CacheDependency will let us do nicer things
22:33:48 <parent5446> Ah OK. Didn't think about CacheDependency.
22:33:50 <RoanKattouw> Anyway
22:34:04 <Nikerabbit> I'm worried that we are spending a lot of effort on fine-tuning stat calls why other parts of the code have bigger effect...
22:34:19 <RoanKattouw> Are we agreed that I'll look at the number of stat() calls and maybe write a CacheDependency subclass if we need it?
22:34:34 <TimStarling> #action RoanKattouw to look at the number of stat() calls and consider optimisations
22:34:37 <parent5446> Yep
22:34:38 <RoanKattouw> Niklas is right, we don't even know if this is an issue or if it'll be eclipsed by something else
22:35:06 <RoanKattouw> Although this is the one thing that happens on every request (freshness check) so that's not a great thing to slow down
22:35:10 <RoanKattouw> Alright
22:35:14 <RoanKattouw> What else
22:35:20 <parent5446> "While handling JSON may be slower than using PHP (we have no benchmarks on this)"
22:35:30 <parent5446> Can we get benchmarks on this?
22:35:41 <siebrand> There's a stat on every request for every message group?
22:35:42 <ori-l> json_encode is often faster than serialize, I've found
22:35:43 <gwicke> RoanKattouw: you can always concatenate those json files, store an offset index and check for updates every <n> accesses so that those stats are amortized
22:36:03 <gwicke> not much fun, but doable..
22:36:04 <James_F> We now have VE messages in parallel in i18n.php and *.json, so theoretically.
22:36:07 <TimStarling> should we go to the next RFC?
22:36:21 <TimStarling> we've just about got enough time to fit another one in
22:36:22 <James_F> Is getting benchmarks needed?
22:36:34 <ori-l> no, IMO.
22:36:39 <RoanKattouw> Not really IMO
22:36:43 <James_F> OK, saves another action item.
22:36:45 <RoanKattouw> It may make sense to measure the entire process
22:36:48 <parent5446> The one thing I'm concerned is that using json_encode will cause much more memory usage.
22:36:57 <parent5446> Since the entire file is loaded into memory before being parsed.
22:36:58 <RoanKattouw> But we don't care terribly about the recache operation, since it writes to a cache
22:37:06 <RoanKattouw> And so it's done infrequently
22:37:08 <James_F> In that case, move to the next RfC.
22:37:12 <RoanKattouw> IMO the freshness check is more important
22:37:13 <ori-l> yeah, let's do another one
22:37:16 <bd808> parent5446: Is that not the case with a php file?
22:37:19 <drdee> next RFC PHP web service interface ?
22:37:22 * ori-l was late to the party and wants in on some RFC action
22:37:27 <RoanKattouw> Yeah if no one else has questions about this one, let's move on
22:37:32 <parent5446> bd808: No, it's read and parsed incrementally.
22:37:36 <parent5446> And yeah let's just move on.
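(Editorial note on the stat() discussion above.) TimStarling's suggestion of batching the freshness checks, by storing a timestamp and only re-checking about once a minute, can be sketched as follows. This is a Python illustration under stated assumptions, not MediaWiki's CacheDependency API: the class name, the `interval` parameter and the injectable `clock` and `mtime` callables are all hypothetical. It checks each file individually because, as noted in the discussion, a directory's mtime only changes when files are created or removed.

```python
import os
import time


class ThrottledFreshnessCheck:
    """Re-stat the source files at most once per `interval` seconds.
    Hypothetical sketch; in MediaWiki the last-checked timestamp would
    have to be stored somewhere shared between requests."""

    def __init__(self, paths, interval=60.0, clock=time.time,
                 mtime=os.path.getmtime):
        self.paths = list(paths)
        self.interval = interval
        self.clock = clock      # injectable so the sketch can be tested
        self.mtime = mtime      # one stat() per call in the default case
        self.last_checked = None
        self.last_result = False

    def is_fresh(self, cache_built_at):
        now = self.clock()
        recently_checked = (
            self.last_checked is not None
            and now - self.last_checked < self.interval
        )
        if recently_checked:
            # Reuse the previous answer without touching the filesystem.
            return self.last_result
        self.last_checked = now
        # The cache is fresh if no source file is newer than the cache.
        self.last_result = all(
            self.mtime(path) <= cache_built_at for path in self.paths
        )
        return self.last_result
```

The trade-off is that an edited file (say `fr.json` or `en.json`) may go unnoticed for up to `interval` seconds, in exchange for skipping the per-request stat() calls that were the concern for NFS-backed shared hosting.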
22:37:39 <TimStarling> #topic PHP web service interface
22:37:45 <RoanKattouw> Link to RFC?
22:37:46 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/PHP_web_service_interface
22:37:47 <siebrand> Thanks for the comments and discussion, everyone.
22:37:48 <RoanKattouw> I haven't seen this one
22:38:01 <gwicke> that's still at an early draft stage
22:38:04 <parent5446> So I mentioned this on the discussion page, but this is literally Guzzle.
22:38:21 <parent5446> Oh if it's too early to discuss we can leave it for later.
22:38:31 <gwicke> Aaron and me have been discussing implementation options
22:38:39 <TimStarling> so this is coming out of the cloudfiles work?
22:39:00 <gwicke> partly, that's what Aaron is working on
22:39:17 <gwicke> my motivation is making it easy to work with web services from PHP
22:39:18 <ori-l> I think the name is misleading. You're proposing a generic, library-like load-balancing function that takes a collection of URLs as input, right?
22:39:44 <RoanKattouw> I think there is a lot of context missing from this RfC
22:39:58 <RoanKattouw> Does this replace the routing engine in MW?
22:40:02 <gwicke> ori-l, it works on paths; the storage backends map those to URIs
22:40:10 <parent5446> For the record, Guzzle has a really nice tool where it keeps an array of various web services, all with different configurations.
22:40:13 <RoanKattouw> Would, say, /wiki be one of the paths that would be matched?
22:40:15 <gwicke> it can also support full URIs, but that is not the main motivation
22:40:19 <TimStarling> is AaronSchulz actually online?
22:40:51 <ori-l> gwicke: so an expanded ArrayUtils::consistentHashSort, right?
22:41:01 <gwicke> parent5446: there are nice features in guzzle, it just does not seem to be certain that the implementations would work for us
22:41:06 * AaronSchulz is around, yes
22:41:19 <gwicke> that is an implementation question though
22:41:27 <gwicke> the RFC is more about the API than the implementation
22:41:54 <ori-l> what existing services would we port to use the API?
22:42:00 <TimStarling> do we have an immediate second application, or would it just be an abstraction of cloudfiles stuff?
22:42:01 <gwicke> ori-l: how load balancing is implemented depends on the backend handler
22:42:05 <ori-l> it'd be helpful to have a list; that way we can identify commonalities
22:42:29 <gwicke> TimStarling: my motivation is the storage service and related service apis
22:42:40 <gwicke> https://www.mediawiki.org/wiki/Requests_for_comment/Services_and_narrow_interfaces
22:42:47 <gwicke> to be discussed later
22:43:43 <ori-l> there is functionality in the objectcache classes for doing this that I have wanted to use in the past (I can't remember what for, frustratingly) and that I found to be too tightly coupled to objectcache specifically
22:43:44 <gwicke> there are already a bunch of web services that we are using including swift, parsoid, the math service and the pdf renderer
22:43:59 <gwicke> there will be more, so making is easy to work with them might be a good idea
22:44:06 <TimStarling> ok, well that is what I want to see on the RFC, I think
22:44:09 <TimStarling> a list of subclasses
22:44:32 <gwicke> you mean a list of services to abstract over?
22:44:53 <gwicke> the API is intended to be open-ended regarding handlers
22:45:01 <TimStarling> well, the RFC has
22:45:04 <ori-l> existing classes that would be ported to use this API and projected classes that would use it
22:45:05 <TimStarling> / General Rashomon storage service for all remaining buckets
22:45:06 <TimStarling> $wgStoreBackends['/'] = new RashomonBackend ( array (
22:45:28 <TimStarling> are you saying RashomonBackend is not a subclass of something in this RFC? it is its own thing?
22:45:45 <gwicke> that is an implementation detail to be figured out
22:45:55 <gwicke> IMO we won't need subclassing there
22:46:01 <gwicke> implementing an interface would be enough
22:46:02 <parent5446> Sorry, but I have no idea what Rashomon is.
22:46:17 <TimStarling> presumably a storage service
22:46:31 <gwicke> parent5446: https://www.mediawiki.org/wiki/Requests_for_comment/Storage_service
22:46:49 <gwicke> it is the revision storage service we wrote for HTML storage
22:46:54 <TimStarling> if you need more applications, maybe you could include EhcacheBagOStuff?
22:47:22 <gwicke> anything that speaks HTTP basically, and is worth making more convenient to work with
22:47:56 * gwicke looks up EhcacheBagOStuff
22:47:58 <TimStarling> I'm just worried that if the only immediate application is swift, it will end up looking swift-like, and everything that uses it in the future will have to fit into a swift-like API
22:48:22 <TimStarling> unless the API is planned to be solely following HTTP?
22:49:03 <gwicke> it is supposed to be a very convenient and parallel way to do HTTP
22:49:30 <TimStarling> ok, so who is going to write the API, because that is going to be an action item
22:49:41 <ori-l> tying it to HTTP seems a bit odd
22:49:45 <ori-l> what do you gain by that?
22:49:51 <gwicke> the idea is to put effort into the design of the HTTP APIs so that they don't need much extra wrapping apart from some convenience like auth, Content-MD5 etc
22:49:52 <TimStarling> ori-l: that's just what it is
22:49:54 <ori-l> rather, what would something more generic not be able to provide?
22:49:56 <TimStarling> an HTTP client
22:50:10 <TimStarling> more generic things can be built on top of it
22:50:20 * AaronSchulz doesn't want things to be too generic
22:50:28 <ori-l> hmmm
22:50:29 <gwicke> the paths can map to non-http stuff too
22:50:32 <TimStarling> like MaxSem's key/value store
22:50:32 <AaronSchulz> you end up not able to assume anything, or something very complex
22:50:35 <gwicke> but the abstraction is still HTTP-like
22:50:46 <gwicke> paths, headers and HTTP verbs
22:50:48 <ori-l> so the thought is that this would steer people toward designing restful services with nice APIs?
22:51:09 <ori-l> i like that, but it makes it especially important that the API be intuitive, easy, and well-documented
22:51:13 <TimStarling> #action AaronSchulz to propose an API
22:51:17 <gwicke> that too
22:51:53 <gwicke> ori-l: see the problem statement
22:52:12 <gwicke> so any comments on the API that is proposed in the RFC?
22:52:47 <TimStarling> that's half an API
22:53:05 <TimStarling> not even half
22:53:24 <bd808> I don't see an API so much as a list of lists
22:54:01 <gwicke> you might be able to extrapolate past the things that are not spelled out explicitly
22:54:14 <TimStarling> it's a bit tricky since I'm only reading this for the first time in this meeting
22:54:35 <gwicke> *nod*
22:54:45 <TimStarling> but I don't get what $wgStoreBackends is and what a generic store is for
22:54:48 <parent5446> It's definitely going to be a bit restricting having requests represented as arrays rather than proper objects with properties.
22:55:02 <TimStarling> if it's an HTTP client, it shouldn't need configuration, it should be configured by its constructor
22:55:31 <gwicke> TimStarling, it is a service client that is close to HTTP
22:55:33 <ori-l> I think the idea is that $wgStoreBackends is Rashomon-specific, but the class (not included in the RFC) that transforms the array of URLs into an API is what is being proposed
22:55:55 <bd808> The internal implementation could use lists but it should have a builder interface of some sort I would think.
22:56:05 <gwicke> but the configured services are additionally load-balanced and can have some more service-specific behavior
22:56:10 <bd808> s/lists/arrays/
22:56:14 <gwicke> they might not actually speak http to the backend for example
22:56:22 <gwicke> dispatching is based on paths
22:56:31 <parent5446> So it's HTTP-oriented but not necessarily HTTP-backed.
22:56:48 <TimStarling> ok, well I'm sure an API with class names and methods in classes will make this clearer
22:56:49 <gwicke> the default storage service is closely related to the bucket idea in the storage service RFC
22:57:25 <gwicke> TimStarling: are you mainly interested in the API backends should implement?
22:57:30 <TimStarling> AaronSchulz: have you surveyed existing HTTP client libraries in PHP?
22:57:44 <parent5446> AKA, Guzzle
22:57:51 <parent5446> ;)
22:58:20 <gwicke> the store API is fairly thin so far
22:58:33 <gwicke> a run method and maybe some convenience wrappers for get/post etc
22:58:39 <gwicke> and arrays as input
22:58:56 <AaronSchulz> I don't think we anything too complex so far
22:59:12 <TimStarling> #action AaronSchulz to survey existing HTTP client libraries for ideas and potential bundling
22:59:15 * AaronSchulz remembers encountering guzzle in some AWS code...it's bit complex
22:59:18 <gwicke> the interesting stuff would happen in the handlers selected by path prefix
22:59:44 <AaronSchulz> unless a good portion of it's features were useful and non-trivial I wouldn't bother
22:59:54 <parent5446> Guzzle really isn't complex at all. In fact, if you're working with an actual REST API, most of the configuration can be done in JSON.
22:59:57 * AaronSchulz doubts that the auth stuff would be adaquate
23:00:10 <parent5446> It also has OAuth support and other auth.
23:00:13 * AaronSchulz is talking about source code
23:01:09 <robla> one alternative: https://github.com/kriswallsmith/Buzz
23:01:28 <gwicke> the basic functionality is already pretty much there in the curl_multi clients we have been using
23:01:37 <robla> (I know nothing about it other than quick search for Guzzle alternatives)
23:01:41 <gwicke> afaik much of the missing stuff is auth
23:02:02 * AaronSchulz snickers at https://github.com/kriswallsmith/Buzz/blob/master/lib/Buzz/Client/MultiCurl.php
23:02:28 <parent5446> I don't know about Guzzle's source code, but as a library it is definitely feature-ful and easy to use. There are very few things it does not support that we would need (except for, of course, non-HTTP backends).
23:02:57 <gwicke> parent5446, it might or might not be useful as a backend- we'll see
23:03:31 <gwicke> IMO that should not matter at this point
23:03:42 <parent5446> But it's not a backend. It has an API to use. If you're thinking of covering up Guzzle with *another* API then it's pointless.
23:04:26 <gwicke> if you don't see any value in the abstractions mentioned in the RFC, then yes
23:04:52 <TimStarling> alright, any other action items for this RFC?
23:05:12 <TimStarling> we're out of time now
23:05:54 <gwicke> the hope was to get some feedback on the backend abstractions in the RFC
23:06:11 <AaronSchulz> gwicke: well you could build it over guzzle, though that would be pretty heavy
23:06:15 <gwicke> we can just go ahead and implement parts of it though
23:06:39 <gwicke> AaronSchulz: yup
23:07:05 <TimStarling> I don't understand the bit about storage backends
23:07:58 <gwicke> the backends do whatever is needed to convert a path into an URI and actual request data (in case it is actually HTTP)
23:07:58 <robla> generally speaking, I would hope we either take off-the-shelf components, or play to win (i.e. create a library that many others would want to use)
23:08:17 <gwicke> so they can massage headers, handle auth, encode query strings etc
23:08:29 <parent5446> ^this. The reason I'm pushing Guzzle so much is because it would require little, if any, new code inside of MediaWiki itself.
23:08:31 <gwicke> do load balancing and fail-over
23:08:50 <gwicke> or use a non-HTTP transport
23:09:03 <gwicke> if that can be integrated into the curl event loop
23:09:28 <TimStarling> and how does this relate to the idea of a generic curl_multi client?
23:09:53 <TimStarling> would it a subclass or just be integrated?
23:09:55 <gwicke> can you be more specific?
23:10:10 <gwicke> they are backend handlers that just implement an interface
23:10:21 <gwicke> for auth they might want to cache some state
23:10:25 <robla> my fear whenever someone talks about doing a "lightweight" abstraction is that it gets heavyweight pretty quickly, and no effort is put into making it any sort of standard so bitrot sets in
23:10:27 <gwicke> so using an object seems to be reasonable
23:11:32 <TimStarling> would you have a class with an event loop, which holds a collection of backends to delegate certain logic to?
23:11:58 <gwicke> that would have a method to massage new requests and responses and possibly some to handle errors and maybe retries
23:12:40 <gwicke> the requests are all handled in the curl_multi event loop
23:12:51 <gwicke> trivially so if everything is actually handled by curl itself
23:13:19 <gwicke> by doing additional calls out of the curl_multi loop in case we also use other transports that have a similar 'do some work' interface
23:13:52 <gwicke> in the simple http case, all it does is complete the request with auth etc and pass it to curl
23:14:06 <gwicke> on the way back maybe do some error handling and retry
23:14:14 <gwicke> that's pretty much it
23:15:17 <TimStarling> ok, well if you could write that on the RFC, and ideally propose an interface, that would be helpful
23:15:26 <TimStarling> #endmeeting
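(Editorial note on the web service discussion above.) gwicke described a thin store whose requests are arrays and whose handlers are selected by path prefix, with each backend turning a path into a URI and massaging headers. A minimal Python sketch of that dispatch idea follows. It is not the API from the RFC: the class names, the `prepare` and `resolve` methods and the example hosts are assumptions made for illustration, and no HTTP request is actually sent.

```python
class PrefixBackend:
    """Hypothetical backend: rewrites a path-based request into a full
    URI and adds default headers (for example auth)."""

    def __init__(self, base_uri, headers=None):
        self.base_uri = base_uri.rstrip("/")
        self.headers = dict(headers or {})

    def prepare(self, request, prefix):
        # Strip the matched prefix and append the rest to the base URI.
        rest = request["path"][len(prefix):].lstrip("/")
        prepared = dict(request)
        prepared["uri"] = self.base_uri + "/" + rest
        # Request headers override the backend's defaults.
        prepared["headers"] = {**self.headers, **request.get("headers", {})}
        return prepared


class Store:
    """Hypothetical dispatcher: picks the backend with the longest
    matching path prefix, so '/' acts as a catch-all."""

    def __init__(self, backends):
        self.backends = sorted(
            backends.items(), key=lambda item: len(item[0]), reverse=True
        )

    def resolve(self, request):
        for prefix, backend in self.backends:
            if request["path"].startswith(prefix):
                return backend.prepare(request, prefix)
        raise LookupError("no backend for " + request["path"])
```

In this sketch a registry such as `Store({"/": PrefixBackend("http://storage.example/v1"), "/math/": PrefixBackend("http://math.example")})` plays the role that `$wgStoreBackends` appears to play in the RFC excerpt quoted in the log; load balancing, fail-over and the curl_multi event loop mentioned in the discussion are left out.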