Jump to content

User:Greg (WMF)/Sandbox

From mediawiki.org

This is an updated version of a previous code-review tool analysis done in 2012 (lead by Brion Vibber, Chad Horohoe, and David Schoonover).

META TODO:

[edit]
  1. Address FIXMEs ("QUESTION:"s and "???"s)
  2. Summarize previous commentary
  3. Get new owners for options
  4. Rewrite/update the cases for and cases against sections

Criteria by which to judge a code review tool

[edit]

The following criteria are what we will judge the system by. Note, it is almost certainly the case that no system currently fully meets these criteria, so every proposal should include a credible plan to meet the missing criteria.

MUSTs

[edit]

The following are required features of any alternative plan:

  • It must not be ugly
  • It must be reasonably intuitive and user-friendly
  • It must support pre-commit review well
  • It must support post-commit review reasonably
  • It must have easy and clear diffs between commits
  • it must support git-only interaction
  • It must make source code easily discoverable
  • It must be fast and responsive
  • It must be extensible
  • It must support automated actions from Wikimedia-approved bots
  • It must have robust ACLs (grant/deny permissions on repository level and branch level)
  • It must support private repositories (ops private)
  • It must support automatic merge on approve
  • It must have command line access
  • It must have a license the Wikimedia Foundation can agree to (or we must have a plan to negotiate it)
    • FIXME QUESTION: "License" as in TOS or Cost or both?
  • it must allow developers to contribute under pseudonyms
  • We must have a plan for on-site-only, private repositories in some cases
  • We must have a plan for automatic repository mirroring

SHOULDs

[edit]

The following are strongly suggested features:

  • It should be beautiful
  • It should be completely intuitive and user-friendly (user interface should be translatable!)
  • It should support pre-commit and post-commit review workflows equally (un-opinionated / does not impose a particular workflow)
  • It should support pull requests
    • FIXME QUESTION: Is this about "allows users to have personal copies of repos to work in" or "can review branches (sets of commits) as one review"?
  • It should enable deep integration with Phabricator (NB: the previous evaluation had "deep integration with bugzilla)
  • It should have LDAP integration
  • It should be open source
  • it should allow self-hosting options (to allow Wikimedia's TOS/Privacy Policy to be enforced)
  • It should be under active development, and not rely too heavily on Wikimedia involvement
  • It should have built-in SSH support
  • It should offer self-serve account creation

Table of requirements

[edit]
MUSTs Gerrit GitHub Phabricator GitLab
It must not be ugly Yes Yes Yes
It must be reasonably intuitive and user-friendly No Yes Yes
It must support pre-commit review well Yes Yes Yes [1]
It must support post-commit review reasonably No No Yes [2]
It must have easy and clear diffs between commits No Yes Yes
It must make source code easily discoverable No Yes Yes
it must support git-only interaction Yes Yes No

(upstream task)

it must be fast and responsive Yes Yes Yes
It must be extensible Yes Yes Yes
it must support automated actions from Wikimedia-approved bots Yes Yes Yes
It must have robust ACLs (grant/deny permissions on repository level and branch level) Yes Yes FIXME
It must support private repositories (ops private) Yes No FIXME
It must support automatic merge on approve Yes Yes No

(upstream task)

It must have command line access Yes Yes Yes
It must have a license the Wikimedia Foundation can agree to (or we must have a plan to negotiate it) Yes FIXME Yes
it must allow developers to contribute under pseudonyms Yes No Yes
We must have a plan for on-site-only, private repositories in some cases Yes No Yes
We must have a plan for automatic repository mirroring Yes Yes Yes
SHOULDs
It should be beautiful No Yes Yes
It should be completely intuitive and user-friendly (user interface should be translatable!) No Yes Yes
It should support pre-commit and post-commit review workflows equally (un-opinionated / does not impose a particular workflow) No No Yes
It should support pull requests No Yes FIXME
It should enable deep integration with Phabricator (NB: the previous evaluation had "deep integration with bugzilla") No

(only bots)

No

(only bots)

Yes
It should have LDAP integration Yes Yes Yes
It should be open source Yes No Yes
it should allow self-hosting options (to allow Wikimedia's TOS/Privacy Policy to be enforced) Yes No Yes
It should be under active development, and not rely too heavily on Wikimedia involvement Yes Yes Yes
It should have built-in SSH support Yes Yes Yes
It should offer self-serve account creation Yes

(2-step process involving wikitech)

Yes Yes

Gerrit

[edit]

The case for Gerrit

[edit]
Primary editor: Chad Horohoe. Others should feel free to contribute.
  • LDAP integration. LDAP is a requirement for tie-ins with Labs and other developer/operations projects. Gerrit supports this out of the box with minimal configuration.
  • Robust ACL. Gerrit has the ability to grant/deny permissions on the repository level and even down to the branch level. This is important because different teams have different workflows for their branches.
    • Additionally, Labs requires a private repository for puppet.
  • Automatic merge on approve, unlike Phabricator that only works with patches (which then must be applied to the repository).
  • Open source, under active development. Using free and open source software is very important to our community, and Gerrit is licensed under the Apache License. Additionally, the Gerrit project has a very active development community that is highly responsive to feedback, questions, and contributions. Google develops Gerrit to support the Android Open Source Project (AOSP), but there are many other contributors, including Qt and OpenStack.
  • Quick release cycle. 2.2.x, 2.3.x and 2.4.x have all been released in the last year (with 2.3 and 2.4 both in the last several months). The branching of 2.5 is currently being discussed.
    • I pushed the Gerrit repository to GitHub as a way of getting some stats about commit activity. The data does indeed show that Gerrit is under active development, with the number of commits per month increasing slightly but steadily over the course of the past year. Ori.livneh (talk) 23:00, 13 July 2012 (UTC)
  • Handles 'pre-commit' style review. One of the major motivations for switching from Subversion was for the ability to review changes before they landed in trunk/master. This has stabilized our codebase and enabled the rolling release cycle to WMF wikis (currently every 2 weeks, working towards every week).
  • Command line access. It's possible to interface with Gerrit from the command line, which makes scripting Gerrit actions relatively easy, and provides a less hideous alternative to the Gerrit UI.
  • SSH handled by gerrit From a security perspective, it's nice that Gerrit doesn't use the system SSH, which means users don't have shell access to the gerrit host, and we don't need hacks like sillyshell.
    You could simply use a second ssh instance for that. Platonides (talk) 20:44, 13 July 2012 (UTC)
    • A second ssh daemon would still do system authentication, and would require something like sillyshell. It's nice to be able to separate system authentication from daemon authentication. It means we don't need to have a production system that allows authentication against Labs credentials. --Ryan lane (talk) 23:34, 29 July 2012 (UTC)
  • Abandoning Gerrit would mean changing a system users just got used to a little bit, and introduce another one that will have its own issues, and scare away beginners who are overwhelmed already by the number of tools and UIs they are supposed to use for labs. Mutante (talk) 09:15, 12 July 2012 (UTC)
    Part of the migration pains were due to the move from svn to git, which is more complex, so it's not an entirely fair comparison. Besides, those who are already used to Gerrit will have little trouble understanding a more intuitive system, and the future developers will have an easier learning curve to climb. --Waldir (talk) 17:01, 20 July 2012 (UTC)
  • Git, Gerrit and Bugzilla are all supported by eclipse.
    • This hints that some of the criticism stated is not be 100% relevant.
    • Every time there is a major change in the critical toolsets (bugzilla/git/gerrit/jenkins/labs) this is a major setback in time -- I need to keep abreast of changes in the Java ecosystem (maven/lucene/solr/haddoop/open relavance/carrot/uima/apertium/gate/etc...)
  • Gerrit has a large community behind it, including some organizations we work fairly closely with.

The case against Gerrit

[edit]
Primary editor: David Schoonover. Others should feel free to contribute.
Gerrit flotsam
Gerrit flotsam -- which we will be able to delete in v2.5
  • Gerrit has one key architectural assumption: It uses git commits and git repository to store proposed changes. While this is a relatively simple and nice idea, it has a far reaching consequences. Everything that is proposed (at various stages of readyness) is already committed to the repository. Some other code review approaches (like Linus' email workflow or Mercurial's patch queue) treat changes for review as freely floating patches, not commits (in git this means snapshots of directory trees), thus focusing on differences. Therefore it should be argued that Gerrit does not support pre-commit review, as suggested above; it's post-commit and before eventual merge to the deployment branch. One unit of review, a changeset is just one commit (patchsets are just different incarnations, or revisions, of a single commit). This assumption forces a one commit at a time workflow on developers and forces the use of git commit --amend as the only way to update patches. As a consequence, it is impractical to work on local branches with multiple commits to be merged - this is one of the reasons to use git at all (convenience single commit branches created by git review do not count). Even if multiple dependent commits are pushed to the repository, a need to update one of them breaks a whole chain of dependencies between them, since the original commit is left in the void and the subsequent changes depend on that change in git terms. Commands like git cherry-pick (without -n), git merge (other than --ff-only) cannot be used by unpriviledged developers; this is a consequence of the workflow imposed by the single-commit-based submission. Therefore integration with GitHub (as wished by some) will be almost impossible to implement - at least until Gerrit breaks away with the one change - one commit paradigm.
  • Gerrit is ugly, intimidating, and inhumane. More than anything else, we need people to actually use the system we choose: success is measured in activity, in contributions. If the tool is always in the way, or hard to figure out, or unpleasant to interact with, people will give up rather than use it. Gerrit is all of those things (with a healthy dollop of "doesn't actually work" thrown in at irregular-but-frequent intervals). I would never choose to subject my coworkers to such a tool, to say nothing for people freely volunteering their time and energy. Usability should be the first and most important criterion above all others, and for that, Gerrit has committed a cardinal sin. Dsc (talk)
    • Though this is a very impassioned critique, it lacks any actionable issues. Can you please expand this with exactly how the usability sucks? If we don't know your issues with usability, we may end up with the exact same issues you hate in a new system.--Ryan lane (talk) 18:58, 25 July 2012 (UTC)
  • Gerrit accounts must be requested. This slows down all contributions, and is particularly egregious in that it slows down onboarding of new WMF staff. Weeks have been lost waiting for all necessary rights to be given to new staff.
    • Is this an inherent issue or is that something we can reasonably change? What limitations exactly prevent us from allowing open registration? --brion (talk) 17:47, 11 July 2012 (UTC)
    • Alternately, we should be able to create a guest account with password guest for anonymous comments
  • Gerrit does not have an official API.[3] Existing tools work by reverse-engineering Gerrit's internal APIs[4], which are largely undocumented.
    • Gerrit certainly has the SSH based API. Also, 2.6 has introduced an official REST API.
  • It is not possible to alter the sort order in which changes are listed.[5] Changes are listed by last update date, so the changes that have been neglected the longest are also the hardest to see. A feature request for parametrizing the sort order was accepted in 2010, but no progress has been made as of July 2012.[6].
    • I think this is a rather serious issue. Not being able to sort columns in Gerrit coupled with the fact that recently added/commented on patchsets jump to the top of the list makes it easy for older/unreviewed commits to languish in the backlog. I've made it practice when doing code review to start from the bottom of the stack to help combat this, but it's a total PITA and it is not likely common practice. Arthur Richards (talk) 22:01, 13 July 2012 (UTC)
  • Gerrit's developers want to move away from using a relational database as a data store. As a result, they are reluctant to invest time in improving Gerrit's query interface.[7] Such a change would entail a costly migration for us.
    • They want to move away from a relational database to lucene. Searching should improve greatly with lucene.--Ryan lane (talk) 18:38, 26 July 2012 (UTC)
  • Gerrit's codebase is written in Java and Prolog(!)[8] and uses Google Web Toolkit. Few (if any) engineers at the foundation are proficient with this stack. Large swaths of code in the Gerrit codebase are undocumented, making it hard to understand or modify.
    • I dont see this as a problem necessarily unless we expect to modify the codebase. As for Prolog and Java, I have coded in both languages (it has been a while using Prolog, but it is fairly easy to pick up if ever this became necessary). Ssastry (talk) 15:58, 11 July 2012 (UTC)
      • I disagree, its a very very minor point to be sure, but things are better when most people can read/edit the code, since its easier to fix issues when they arise, and customize to our needs. (I also know prolog, and like that language, but it is an obscure language). Bawolff (talk) 19:48, 11 July 2012 (UTC)
  • The ability to add a repository requires administrator privileges on Gerrit. However, administrator privileges also grant unlimited access to gsql, a raw SQL interface to the Gerrit database.[9] As a result, administrator rights cannot be extended beyond a narrow group of people (currently five).[10] This creates a frustrating bottleneck for developers, who must compete for the attention of the overburdened administrator group for routine Git requests.
    • Creating a gerrit project does not require admin access, it requires Project Creation access. Which was granted to a larger group of users. [11] Additionally, there is currently discussions going on to implement a "request queue" type of system so users can say "I want a repo called foo" and the approver can just come along and press a button.
  • Gerrit is slow.[12][13][14] The "Groups" listing takes almost a minute to render.
    • "Gerrit is slow" is partially due to some mistakes on our part (putting the database and app server in different datacenters). This is being worked on and will be resolved soon. The groups listing page is annoyingly slow, but I'm working on profiling the problems here and submitting a fix upstream for it.
  • There is no way to delete projects. A request for this feature was opened in 2009 and has been starred by over a hundred people.[15]
    • In Gerrit 2.5, there will be a new plugin that makes this possible. [16]
  • As of 19:59, 10 July 2012 (UTC) There are 164 issues on the Gerrit bug tracker that are status:accepted and have no owner.[17] Of those, two are set as "critical" and twelve are set as "major".[18]
    • I don't neccesarily see that as a bad thing. Have you seen our bug database? Better they track issues that they don't have resources to or otherwise aren't working on now, then to just simply sweep them under the rug. Bawolff (talk) 19:48, 11 July 2012 (UTC)
  • Post-merge review is impossible. Comments just get lost and bugs are not effective.
    • Lost how? Do they not save, or are there error messages? What do you mean "bugs are not effective"? --brion (talk) 17:49, 11 July 2012 (UTC)
      • I think that may be the same thing I was going to post. Once a change is merged, it's sacred. You can't mark it -2 even if it's a php error. You can provide a comment, but nothing marks it as requiring attention (how many mails reminding fixmes were sent by hexmode?).
      With a "magic MW VCS" it would be "unmerged", and leave pending for a new merge after it gets fixed. As git doesn't support it (without producing merge conflicts to git users), the most sensible approach would probably be to "freeze" master until a patchset which solves it gets merged. Or even without such strict ruling, if it keeped a track of broken ranges that would be useful (obviously show a big warning when master or a branch is inside such range...).
      Platonides (talk) 20:36, 13 July 2012 (UTC)
    • post-merge workflows are almost impossible; self-review is a hacky and tedious workaround, see the recent discussion about the operations/puppet repository and the percentage of self-reviews.
  • It is difficult to find or track projects unless someone walks you through it. For example, there is only one very small, hidden link to view the list of projects. This makes the state of any particular branch pretty opaque and potentially creates an issue for managers or others who want to have a window into our code.
  • There is little to no ability for non-technical staff or volunteers to meaningfully contribute (e.g. to processes like code review or simply commenting). Compare to GitHub or even Bugzilla, where FOSS projects see a large amount of comments or other contributions from designers, product managers, or users whether they have familiarity with git or commit access.
    • Can you articulate more clearly what the issue is here? Anyone with an account can comment or even -1/+1 commits. And we have a very large and complex codebase (MediaWiki + extensions + ops stuff + tools and misc), no matter where we host it, which of course makes it harder for people to jump in and comment. Does GitHub offer specific tools that would help engage non-technical folks, or are you just saying "Gerrit's UI sucks" one more time?.--Eloquence (talk) 17:41, 11 July 2012 (UTC)
      • I suspect the main problem comes down to 'accounts must be requested' up above -- you can't make comments until you've been placed in the system, whereas github, bugzilla etc allow open registration. --brion (talk) 17:46, 11 July 2012 (UTC)
        • It's not just having to request an account. Unlike tools such as GitHub, Gitorious, or Bugzilla, it is really really hard for anyone who's not a developer to find the relevant projects, relevant commits, and then contribute comments that are not strictly part of the code review process. For example, I had to go ask RobLa in IRC about how to find a simple list of projects. In GitHub, a repository is not just easier to navigate through the GUI (AKA the issue Erik is talking about) but is very tightly integrated with tools that anyone can use to track what is happening in a project (i.e. watching users and repos) and then contact someone when they have a comment. I find it was actually better when we were on SVN and the only Web interface was the comment-less ViewVC, because it forced conversation either on to Bugzilla or on-wiki. One of the largest weaknesses about the MediaWiki development community is the fact that we have historically not done much to include designers, product people, and users in the collaboration on code. This is how we ended up with the MediaWIki we have today, which everyone agrees has deep issues with usability. We need to strike at the heart of this problem by choosing tools that work for everyone who has a stake in the project, not just people committing or reviewing code from a purely technical standpoint. Steven Walling (WMF) • talk 20:38, 17 July 2012 (UTC)
          • A more general note: from my interaction with the Gerrit community, it is usually embraced by the enterprise deployments. It seems that tighter control on access, permissions and workflow appeals to those kinds of users. For this reason it might be also more suitable for Wikimedia operations than to the development community at large.  « Saper // talk »  23:46, 7 August 2012 (UTC)
  • UI sucks profoundly: too wide, clumsy, no visual comparison for image changes (ol' good CodeReview did this!), no "view this file" link, just "download as zip" (WTF????). Almost no flexibility when it comes to searching for anything other than the most recent activity overall.
    • This was a configuration issue, we've had image diffs available for a couple of weeks now.
  • Operaphobic
  • Per the UI sucks, It's not desinged to encurage volenteer development, its geered towards a single project, single team proccess for deployable code. Look at the main "landing page" ... there is little concept of what projects are represented, no per project pages with descriptions, no agragate views of which projects are more active than others, no person icons, no sense who is working on what, too focused on individual commits rather then the final change state represented by a pull request for a particualr feature or refactor. There is a false sense of linear pipeline rather than a distributed collection of interconected activity. There is no big "fork me" button! ( there is no per-project interface views, so not clear where you would put that ). Aside from getting deployable code the tools for feature develop should also encurage expeirmentations and be fun. Github is good at doing this, makes coding and sharing "fun". Gerrit is not fun. --Mdale (talk) 23:28, 11 July 2012 (UTC)
  • Sometimes, development requires a number of commits before a feature/bug/change/refactoring can be fully worked out. So, code review requires a review of a set of commits rather than individual commits. I may not be fully aware of Gerrit's abilities, but a commit-based review model gets in the way of such review flows. Alternatively, it prevents developers from committing something and amending commits till all work on that phase is complete. This may also encourage work in smaller reviewable chunks, so, I am not going to make this a strong case against gerrit. I prefer frequent and regular commits to let others work on it or take over partially done work -- and the in-progress commits may not necessarily be reviewable or need to be reviewed. Ssastry (talk) 03:32, 12 July 2012 (UTC)
    • This is not a hypothetical situation. I am right now reworking some code and I dont see a way to get everything reviewed at once. I have to either rebase all commits to squash them into one giant commit (which is totally undesirable), or I have to submit all commits as a chain of dependent commits, and hope that I dont have to amend any of the earlier commits as part of a review, or work out an arrangement with the reviewer to accept my commits en masse and that I'll made any fixes as part of a new set of commits which could be reviewed. Am I missing some Gerrit feature which will make this less of an issue? There was an email recently on wikitext-l where Mark Bergsma was asked to squash his commits (http://osdir.com/ml/general/2012-07/msg20847.html) -- I personally think this is bad practice. -- Ssastry (talk) 21:54, 13 July 2012 (UTC)
    • Related comment: dependent commits are problematic because amending an earlier commit breaks the commit sequence and I have to either amend all later commits or break up the commits into independent branches by rebasing (which I used recently when I forgot to use topic branches). -- Ssastry (talk) 21:54, 13 July 2012 (UTC)
  • Can't search for file names / changes that touch a file. We no longer have a database table that enumerates every file modified in a change, which makes it impossible to query for the changes that touch a file. [1] Mutante (talk) 09:12, 12 July 2012 (UTC)
  • It's related to Ssastry's comment a bit above this one, but simplified and generalized: Gerrit suggests using lots of amends. The more natural Git way of improving on an idea is adding another commit in the branch. That's how Github works. It allows to see how the idea developed. As far as I understand it, amends are supposed to be used only for fixing commit messages or embarrassingly mistaken commits, but Gerrit takes amends to the extreme. --Amir E. Aharoni (talk) 11:15, 12 July 2012 (UTC)
  • In some cases Gerrit shows unrelated files as files that were changed in this commit. I never quite understood which cases are these exactly, but it has something to do with rebasing. --Amir E. Aharoni (talk) 11:15, 12 July 2012 (UTC)
    • This has to do with rebasing between changes, and work is being done in master right now to resolve this issue.
  • A changeset can't be applied to several branches. If the fix landed on master has to be committed to REL1_19, REL1_20, wmf/1.20wmf7 you need to submit other three independent changesets. Expected: you could request a merge to several branches on one commit, and perform the merge on that same page (assuming it merges fine, which it probably does). Platonides (talk) 20:42, 13 July 2012 (UTC)
  • UI: I wanted to look up one of my older reviews on gerrit so I could paste a link here for documenting the dependent commits amend problem, but my dashboard only shows me the most recent X actions and I cannot go back in history. -- Ssastry (talk) 21:54, 13 July 2012 (UTC)
  • LDAP support in Gerrit is kind of afterthought. HTTP, SSH and other interfaces are highly decoupled inside of Gerrit and there is hardly a common, application-wide, notion of the user and user authentication. That's why SSH supports only keys stored in some SSH-specific store; that's why issue 1124: Use LDAP for ssh keys is not easy to solve.
  • Baroque permissions. It is difficult to find out why something is not working and which permissions are exactly needed to achieve something without detailed analysis of global, group and per-project permissions.
  • Gerrit defeats git capability to support off-line and detached work. While, with some effort (git fetch remote refs/changes/*:refs/changes/*) it is possible to download refs for all changes pending in gerrit, it is impossible to review and comment on the code off-line and send review results upstream. So, no off-line work. Also difficult to work from command-line only (only some operations are possible via SSH). See also Git/Conversion#Rationale.
  • Confusing naming of git refs: You have to use refs/for/master (or refs/publish/master is supposed to take it over in future - TBC) to submit a change (git-review seems to be born to simplify this for end user). Then the patchset gets named refs/changes/14/3714/1 but you can still push changes to refs/changs/3714. Looks like the need to hash refs/changes directory and ability to clone the Gerrit-specific ref tree (git fetch remote refs/changes/*:refs/changes/*) forced Gerrit developers to expose that confusing naming to the end users. Pity that 14/3714/1 label gets exposed in gitweb... Looks like a case number in some Kafkesque trial.
  • Gerrit produces a messy git history, impossible to read e.g. with gitk or bisect, due to the significant amount of (useless) merges.
  • Gerrit's email notifications are close to useless. There's a lot of repetition between mails, little context about the changes and no way to actually get the unified diff. Extending it to do it (something that could be a three-line shell script with git) seems non-trivial.
    • Diffs will be available to e-mail in v2.5 ^demon (talk) 21:28, 25 July 2012 (UTC)
  • There's no single page diff or side-by-side. There's a "diff all" which spawns a new tab for every file touched in a commit. This makes it difficult to review changes that spawn multiple files without going back and forth between your browser's tabs.
    • This has been fixed by guys at Qt, and is pending integration with upstream Gerrit. No current ETA, however. ^demon (talk) 21:28, 25 July 2012 (UTC)
  • Gerrit's urls are not very human-readable and their syntax is not intuitive ("r", "#", "q", ",n,z", ";a=", etc.)
  • Gerrit's project/repository listing is in the admin panel. While the blokes from the project claim it can handle over 9,000 repos, I am not terribly excited about scrolling through that list, or explaining it to new users.

GitHub

[edit]

How GitHub would be configured for Wikimedia use

[edit]

Deploy and development repositories:

  • Maintain 2 repositories: one "deploy" repository that has very few admins, and another "development" repository that has a lot more trusted developers.
    • Why not simply use "deploy" and "development" branches? No one should be committing directly to the main repository anyway -- access should be mediated via pull request from user repos. This is the point (and power) of the "fork". Dsc (talk)
  • Code gets into the deploy repository only via pull requests from the development repository -- this can only be done when "trusted" developers/admins do necessary code reviews and issue pull requests to the deploy repository.
  • Periodic downstream rebasing/merging from the deploy repo into the development repo, if necessary (to account for any direct commits into the deploy repo).

To ensure bad code does not get into the deploy repository, development repository might have to be managed as follows:

  • Topic branches are used for code that requires review.
    • There's no need for topic branches if all merges are pull requests from user repositories. Dsc (talk)
  • Commits on the master branch that are not reviewed, and cannot be part of a pull request are moved to a branch.
  • Problematic commits into the development repo either get fixed or reverted.
    • (I know I'm a broken record, but...) There's no need for reverts if all merges are pull requests from user repositories. Dsc (talk)
  • Protocols required for who can issue pull requests into the deploy repository.
    • There's no need for a protocol on issuing a pull request -- anyone should be able to do that. Since each pull request opens an issue, we simply close the bad ones with an explanation of why it's not being merged. No need for a new workflow, and we always have an archive of both successful and unsuccessful requests. Dsc (talk)

Code gets into the development repository in 2 ways:

  • trusted developers (staff, volunteers, aliens) get commit rights and can directly push code.
    • I'm of the opinion that no one should commit code directly to the repository. Trusted contributors should be able to moderate pull requests and accept them for merge, but without going through that process, there's essentially no review. To me, the pull request approval process is the review process.
  • anyone else in the world (who dont have commit access to the development repo) forks the development repo and issues a pull request into the development repo.
  • pull requests are used to do code review.

Reviewing development repo commits:

  • Via github comments -- drawback is that there is no way to track reviews. Given current review requirements via Gerrit, unlikely to be an acceptable model.
  • Via pull requests on topic branches -> master of the deployment repository.
  • Via an external standalone review tool like barkeep.

The case for GitHub

[edit]
  • Better usability all around
  • Anyone can participate in development -- creating user accounts, forking, commenting, submitting issues, etc. are all easy and open to anyone and removes WMF from the loop as far as these areas are concerned.
  • Commit rights determine who gets to push code to the repo.
  • Open source projects on GitHub are accessible to thousands of the most skilled, professional Web developers around,[19] making it the best location to recruit new participants without spending donor funds on outreach
  • Already includes outstanding materials for introducing developers to git,[20][21] providing better documentation than we currently have
  • Strong cross-platform support (at the desktop and browser level)
  • Allows access using SVN[22] - which means people used to either SVN or git can both contribute without extra learning curve.

The case against Github

[edit]

Drawbacks:

  • Reviews are done through pull-request (a bit like Gerrit merge requests). However contrary to Gerrit, those with the rights to merge to master are also able to push to master directly (bypassing review).
    • This is not really a drawback, in Gerrit it is also possible to practically push to master by self-reviewing. This is a matter of setting rules, it is not an issue of trust or technical requirements. GitHub doesn't force people with merge access to push, it just means they are technically able to. It's no different from Gerrit in practice. The difference is that anyone (whether someone with merge access, a known volunteer or a brand new contributor) can submit a pull request.
  • Depending on review requirements, could require a standalone review tool like barkeep to track reviews.
  • Depends on developers acting trustworthily and following protocols (ex: use topic branches, use pull requests to merge into master, etc.).
  • Requires MW contributors to get an account at a third party site which is time consuming
    • (discussion) Really? How is that time consuming compared to having to manually request an LDAP account and waiting for someone to review and create it. At GitHub you create an account in under a minute and off you go without needing dedicated user rights. Krinkle (talk) 05:08, 18 July 2012 (UTC)
      • I for one cannot read the github terms of service in under a minute, let alone also understand them. 81.10.140.176 12:58, 18 July 2012 (UTC)
    • It would also decouple getting a developer account from getting a Labs account, which makes the barrier to entry to Labs higher.
  • Unnecessarily bringing in a further foreign authority is against hacker ethics, as hacker ethics call for decentralization.
  • Github's terms of service contain items that allow to disconnect users at GitHub's sole discretion: “If your bandwidth usage significantly exceeds the average bandwidth usage (as determined solely by GitHub) of other GitHub customers, we reserve the right to immediately disable your account or throttle your file hosting until you can reduce your bandwidth consumption.”[23]
  • github.com's web ui itself is not free software nor open source. We want to use free tools to develop free software, so use alternatives like gitlabs hosted on our own infrastructure. (independent from the gerrit evaluation). There is an Enterprise product to have github on your own servers. but we should not pay if there are free alternatives. Mutante (talk) 09:37, 12 July 2012 (UTC)
    • While I understand and appreciate the impetus behind this, 'free tools to develop free software' principle also means all mediawiki developers would have to stop using macs, ibooks, iphones, ipads during development. That means, Wikipedia could not be supported on iphones. No using Amazon EC2 either. I am being facetious, but only to make a point. You have to draw the line somewhere. To me, more pertinent is the ownership of code, data, and interactions on the site. There is something to be the said for the collaborative and social nature of Github. -- Ssastry (talk) 12:16, 12 July 2012 (UTC)
      • It's one thing to allow people to work with/for us using non-free tools (e.g. many Wikimedia Foundation employees prefer using some non-free stuff because they feel more productive and happier using those tools). It's another thing to make use of non-free software (and agreement to a corporate EULA) a requirement to work with us. The latter is much dicier, especially for volunteers. -- RobLa-WMF (talk) 18:54, 18 July 2012 (UTC)
    • We can't use github enterprise. Totally ignoring the high cost, it's terms disallow public repos, which is a non-starter.--Ryan lane (talk) 18:55, 25 July 2012 (UTC)
  • No true pseudonyms on github. Github at least knows your real name, even if it might not show it (TOS: “You must provide your legal full name, [...]”)
    • This is a fairly major point. We actively encourage users to use pseudonyms if they want to. In some countries (Japan for instance), people don't use their real names online, and prefer to be totally anonymous if possible. Enforcing a real-name policy ensures that we will alienate a number of potential developers.--Ryan lane (talk) 18:55, 25 July 2012 (UTC)
  • Operations would very likely not move their repos to github, as we can't trust our repos to a third party for security reasons. If we used github we'd split our workflow between two places, and that's a bad idea.--Ryan lane (talk) 18:55, 25 July 2012 (UTC)

Phabricator

[edit]

The case for Phabricator

[edit]
  • FOSS and PHP-based.
  • Active developer community: ~1200 commits/year from 40 authors (Phabricator) compared to 500 commits/year from 50 authors (Gerrit) (outdated)
  • Has an official API, unlike Gerrit
  • Good command line support through Arcanist
  • Includes project activity feeds that normal humans can use
  • Search works very well, unlike Gerrit
  • Supports open registration with a variety of methods (LDAP and MW OAuth)
  • Pre-commit and post-commit review workflows

The case against Phabricator

[edit]
  • The permissions system doesn't seem to be very fine-grained, e.g. not designed with large public projects with varying levels of access involved (volunteers, staff, etc.)

GitLab

[edit]

The case for GitLab

[edit]
  • Well maintained (active, monthly release schedule)
  • Inspired by GitHub, looks a lot like it
  • Self-hosted, not an external service
  • Easy to use & pretty
  • Manages repositories (using a custom-written system) and users; not just a code review tool
  • Sane repository listing
  • Supports LDAP authentication
  • Granular permission scheme
  • Pre-commit and post-commit review workflows
  • Has an external API
  • Somewhere in the middle between Gerrit & GitHub

The case against GitLab

[edit]
  • Impossible to find anything with the user interface.
  • No way to insert inline comments like with Gerrit, which is far supperior in usability btw!.
  • All change diffs in a single page, which is completely dumb. Now way to view the changes file by file.
  • Comments are published immediately which is extremely counter productive. With Gerrit you review the code, place your comments, review your comments and publish them when you're done. Much better.
  • No way to amend changelists once it's pushed. Creates ridiculous situations where you basically have to create new CLs to address code review comments!... rather than amending the existing change list.
  • All in all, a very very poor code review tool!... Gerrit is far more usable albeit a little harder to setup.
  • [No longer true] "does not allow branch-level access control" [2] (note: doesn't this conflict with the "Granular permission scheme" item above?)
  • [No longer true] "All projects are system level, you can’t create project under your account. As a developer, you can only create branches in the repository."[24] (note: how is Gerrit different?)

Other points to consider

[edit]
  • Written in Ruby on Rails
  • Uses a "pull/merge request" workflow (that some people love and others hate)
  • Includes an issue tracker and a... wiki; probably easy to disable, at least in the HTML from showing
    • The bug tracker might be an advantage: it could replace Bugzilla, people seem to want a replacement for that as well and a separate account would no longer be necessary
      • [no longer true][25] "Issues are not related to commits; in order to fix an issue, you need to submit a merge request, and then close that issue."[24]
  • [No longer true] All projects are system level; users can't create projects under their account. Developers can only create branches in the repository.[24]
  • [No longer true] "All projects are private; the project owner needs to assign other developers to his/her project."[24]

Notes

[edit]
  1. Reviews vs Audit in Phabricator
  2. Reviews vs Audit in Phabricator
  3. https://groups.google.com/forum/?fromgroups#!topic/repo-discuss/RC4xNQVCMRs
  4. https://github.com/bcwaldon/python-gerrit/blob/master/README
  5. http://gerrit.googlecode.com/svn-history/r3021/documentation/2.1.4/user-search.html
  6. http://code.google.com/p/gerrit/issues/detail?id=536
  7. http://code.google.com/p/gerrit/issues/detail?id=536
  8. https://gerrit.googlesource.com/gerrit/+/master/gerrit-server/src/main/prolog/gerrit_common.pl
  9. http://gerrit.googlecode.com/svn/documentation/2.1.2/cmd-gsql.html
  10. https://gerrit.wikimedia.org/r/#/admin/groups/1,members
  11. https://gerrit.wikimedia.org/r/#/admin/groups/119,members
  12. http://code.google.com/p/gerrit/issues/detail?id=950&sort=priority&colspec=ID%20Type%20Stars%20Milestone%20Status%20Priority%20Owner%20Summary
  13. http://code.google.com/p/gerrit/issues/detail?id=954&sort=priority&colspec=ID%20Type%20Stars%20Milestone%20Status%20Priority%20Owner%20Summary
  14. http://code.google.com/p/gerrit/issues/detail?id=1370&sort=priority&colspec=ID%20Type%20Stars%20Milestone%20Status%20Priority%20Owner%20Summary
  15. http://code.google.com/p/gerrit/issues/detail?id=349
  16. https://gerrit-review.googlesource.com/#/admin/projects/plugins/delete-project
  17. http://code.google.com/p/gerrit/issues/list?can=2&q=status%3Aaccepted+-has%3Aowner&sort=owner&colspec=ID+Type+Stars+Milestone+Status+Priority+Owner+Summary&cells=tiles
  18. http://code.google.com/p/gerrit/issues/list?can=2&q=status%3Aaccepted+-has%3Aowner&sort=owner&colspec=ID+Type+Stars+Milestone+Status+Priority+Owner+Summary&cells=tiles
  19. https://github.com/popular/watched
  20. https://github.com/training/free
  21. http://learn.github.com/p/intro.html
  22. https://github.com/blog/966-improved-subversion-client-support
  23. https://help.github.com/terms-of-service
  24. 24.0 24.1 24.2 24.3 http://blogs.perficient.com/multi-shoring/blog/2012/05/02/opinionate-review-on-gitlab/
  25. Issues, Merge Requests and Integrations in GitLab