Topic on Talk:GitLab/2020 consultation

Notes from Product Analytics

MPopov (WMF) (talkcontribs)

Hello, I'm writing on behalf of Product Analytics. From our discussion:

  • THE biggest differentiator from Gerrit for us is GitHub's ability to render Jupyter notebooks (example); GitLab can do this and we just want to make sure that this feature is enabled (and maybe coupled with an internally-hosted nbviewer service for the actual rendering).
  • We frequently need to read and search code, and Gerrit has extremely poor support for this. Many of us use GitHub to search the mirrored repositories.
  • We have generally chosen to use GitHub for our code/analysis repositories since we find it much easier to use, and creating repositories is much easier (since we can do it ourselves without requesting).
  • Conversations on Gerrit can be difficult to navigate since comments are tied to specific patchsets, so there may be an active discussion happening about something in patchset 3 while the patch is already on patchset 9. If CR in GitLab is similar to GitHub (in terms of how comments/conversations happen & are displayed), that is nice.
  • In the past we've used GitHub Pages for sharing reports. For example, when generating an HTML document from the R Markdown source document where the analysis is done, it's easy to enable GH Pages to have the "rendered" version of the report available via URL (example); GitLab appears to also have this feature and we'd like it available if possible.

From my own perspective, as author & maintainer of several R packages the team uses in our workflows, GitLab's support for CI for R packages (more info) is very appealing. There have been efforts made in the past (T153856), but modern CI tools (especially with the availability of the r-base Docker image) will make it possible for us to have proper CI (which I have on my personal R packages on GitHub).

Tgr (WMF) (talkcontribs)

Since Gerrit 3 we have a Comment Threads tab which is fairly similar to how conversations are displayed in GitHub.


The consultation page says "In addition [to issue tracking] we would turn off repository wikis, GitLab Pages, and other features overlapping with currently provided tooling." (which I find a bit confusing: sure, we have a - probably superior - existing alternative for issue tracking and wikis, but what's the currently provided tooling for GitLab Pages-like functionality? people.wikimedia.org is only available to a few people and using Toolforge for this purpose would have a ridiculous level of overhead. Doc page generation via CI, maybe? It's not quite the same thing - you can use Pages to generate a webpage from your repo code, but also in a number of other ways. And in any case, doc generation via CI seems even more arcane and complex to set up than Toolforge.)

BBearnes (WMF) (talkcontribs)

Re: Pages:

which I find a bit confusing: sure, we have a - probably superior - existing alternative for issue tracking and wikis, but what's the currently provided tooling for GitLab Pages-like functionality? people.wikimedia.org is only available to a few people and using Toolforge for this purpose would have a ridiculous level of overhead

FWIW, I don't think we in the consultation WG have analyzed that particular aspect of things deeply. If there's a strongly felt use case for a Pages-like feature, then I think that's probably a reasonable discussion to have. We've called out wikis and issue tracking explicitly to prevent fragmentation in those domains, and I don't have a strong feeling as to whether Pages presents a similar risk. Would be curious what others think.

Neil Shah-Quinn (WMF) (talkcontribs)

@MPopov (WMF) said above:

We have generally chosen to use GitHub for our code/analysis repositories since we find it much easier to use, and creating repositories is much easier (since we can do it ourselves without requesting).

To expand on that, it's not just that we have the rights to create GitHub repositories in the wikimedia and wikimedia-research organizations. It's also that we can create repositories under personal GitHub accounts and later move them effortlessly to the main organization.

For example, I originally created wmfdata-python to streamline my personal analysis workflows, so I naturally stored it in my personal GitHub namespace. Over time, others on my team and, later, researchers on other teams started using it too. Eventually, we decided we should move it to a more official location. With GitHub's move repo feature, it literally took 1 minute to accomplish this, and the automatic redirection (for both web and Git access) makes it completely seamless for users.

From what I understand, GitLab has these exact same abilities natively. Some comments here have pointed out that it would be theoretically possible to create user namespaces in Gerrit, which would be an improvement on the current situation, but as @BBearnes (WMF) said it would be "fighting the design of the system" and wouldn't be nearly as good as the GitLab/GitHub model.

Neil Shah-Quinn (WMF) (talkcontribs)

Also let me emphasize another point that Mikhail made:

THE biggest differentiator from Gerrit for us is GitHub's ability to render Jupyter notebooks (example); GitLab can do this and we just want to make sure that this feature is enabled (and maybe coupled with an internally-hosted nbviewer service for the actual rendering).

Jupyter notebooks have nearly become the common format for data science (for example, GitHub's State of the Octoverse report says that their use on GitHub has grown more than 100% in each of the last three years).

Gerrit can only display Jupyter notebooks as long JSON blobs, but GitLab can show them in their rich, rendered format. This is a hugely important feature for us; if we switch to GitLab, we can start using it to host our analysis code, but if we stick with Gerrit, we will have no choice but to continue the fractured status quo ("production"/"library" code on Gerrit, analysis code on GitHub).
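To illustrate the "long JSON blobs" point, here is a rough Python sketch of what a viewer without notebook support ends up displaying. The notebook contents are made up for illustration (not taken from any of our repositories), and `cell_sources` is a hypothetical helper, not part of Jupyter or GitLab; the only assumption is the usual nbformat 4 layout, where an .ipynb file is a JSON object with a "cells" list.

```python
import json

# Hypothetical, minimal notebook following the nbformat 4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Daily pageviews\n"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": ["views = [3, 5, 8]\n", "sum(views)"],
        },
    ],
}

# This serialized text is what a plain-text code viewer shows for the file.
raw_text = json.dumps(notebook, indent=1)


def cell_sources(raw, cell_type):
    """Return the joined source text of every cell of the given type."""
    parsed = json.loads(raw)
    return [
        "".join(cell["source"])
        for cell in parsed["cells"]
        if cell["cell_type"] == cell_type
    ]


if __name__ == "__main__":
    print(raw_text)                        # the raw JSON a reader has to wade through
    print(cell_sources(raw_text, "code"))  # the code a rendered view would surface
```

Even for a two-cell notebook the raw text is mostly structural noise around a couple of lines of actual analysis, and real notebooks also embed outputs (tables, base64-encoded plots) in the same file, which is why a rendered view matters so much for reading and review.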
