Build a tool for analysing our contributions on GitHub #9

Open
mhauru opened this issue Sep 13, 2023 · 11 comments

@mhauru
Collaborator

mhauru commented Sep 13, 2023

I would be interested in understanding REG's contributions to the OS ecosystem. We had a conversation about this with TPS people in July 23: Malvika, Arielle, Anne. Jim attended as well, as the REG TPS contact. They would likewise be interested in analysing Turing's contributions. I said we would work on a tool to pull some data from GitHub that could be useful for all parties.

The exact functionality isn't entirely clear to me yet, but the questions I would like answers to are things like:

  • Counting all the people in the Turing GitHub org,
    • How much do we contribute to OS repos outside the org?
    • Which ones?
    • What form do those contributions take?
    • Are they at all aligned with the projects we feel we should be helping? Are there notable omissions?
  • Among all the repos we "own" (are under the Turing org or otherwise clearly Turing projects)
    • Do we have many external users and contributors for them? Which ones do, which ones don't?
    • Do we maintain our repos well, especially the ones that do have external users?
    • Are we welcoming to outside contributions? Do we have things like a license, contribution guidelines, do we review and merge PRs?
    • Do people build on top of things we build?
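As a rough illustration of how two of the questions above could be answered from data, here is a minimal sketch. It assumes the inputs are shaped like GitHub REST API responses (a contributors list with `login` fields, and a community profile with a `files` mapping); the function names and the choice of which files count as "welcoming" are hypothetical, not part of any existing tool.

```python
from typing import Any

# Assumption: these keys appear under "files" in a community-profile-style
# response, with a null value when the file is absent from the repo.
WELCOME_FILES = ("license", "contributing", "code_of_conduct", "readme")


def external_contributors(
    contributors: list[dict[str, Any]], org_members: set[str]
) -> list[str]:
    """Return logins of contributors who are not members of the org."""
    return [c["login"] for c in contributors if c["login"] not in org_members]


def missing_welcome_files(profile: dict[str, Any]) -> list[str]:
    """Return the names of community files the repo does not appear to have."""
    files = profile.get("files") or {}
    return [name for name in WELCOME_FILES if not files.get(name)]
```

Keeping these as pure functions over already-fetched data means the analysis can be tested without network access, whichever tool ends up doing the fetching.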

Aoife was interested in the past in how we maintain our repos in particular. She started writing a tool for analysing this, which I think we could build on.

@crangelsmith

This feels like an important and impactful project for the Turing, where REG could contribute meaningfully. However, people could feel that they are being monitored. It would be good to propose to TPS to take the project through an ethics approval process.

@JimMadge
Member

Some points about this:

  • Reporting on open-source impact and outputs is important for TPS. They are likely to produce reporting on this whether we help or not. However, I expect the reporting would be better if we can help build tooling.
  • The tools we build here could have much wider impact. Better understanding of our open-source work could help recognise and incentivise working in the way we want.

@mastoffel

The tool could have a feature for recognising dead repos and sending e-mails to the maintainers, eventually leading to archiving etc. of the repo. This could help to keep the Turing org clean. Also links to #11.
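A minimal sketch of the "dead repo" detection step, assuming repo records shaped like the GitHub REST API's repository objects (`name`, `archived`, and a `pushed_at` timestamp such as `2023-09-13T12:00:00Z`). The function name and the one-year idle threshold are hypothetical choices for illustration; the e-mailing step is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def find_stale_repos(
    repos: list[dict[str, Any]],
    now: datetime,
    max_idle: timedelta = timedelta(days=365),  # assumed threshold
) -> list[str]:
    """Return names of non-archived repos with no push within max_idle."""
    stale = []
    for repo in repos:
        # Already-archived repos need no further action.
        if repo.get("archived"):
            continue
        pushed_at = repo.get("pushed_at")
        if pushed_at is None:
            continue  # no push information, so make no claim either way
        last_push = datetime.strptime(pushed_at, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        if now - last_push > max_idle:
            stale.append(repo["name"])
    return stale
```

Last push date is only a proxy: a finished but still-used repo would also be flagged, so a human check before contacting maintainers would probably still be needed.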

@mhauru
Collaborator Author

mhauru commented Oct 4, 2023

Once we get to this we should

  • Check whether the code that @AoifeHughes already wrote is a good starting point for us
  • Decide whether we should build this on top of/as an addition to whatwhat
  • Look into what tools already exist for this purpose in the wider world

@mhauru mhauru moved this from Planning to Todo in hut23-open-source-sa Oct 4, 2023
@mhauru mhauru mentioned this issue Oct 4, 2023
@AoifeHughes

For ref: https://github.com/alan-turing-institute/Hut23/issues/1458

@rwood-97

rwood-97 commented Oct 5, 2023

There is a desire for this as part of the new building sustainably scholarly communities project, particularly re. contributions to our repos from externals. The goal would be to use this data as a measure of success in terms of 'building a community'.

@mhauru
Collaborator Author

mhauru commented Oct 5, 2023

I just had a chat with @yongrenjie about whether he thinks this should be built on top of whatwhat or as an extension of whatwhat. I would summarise his comments as (Jon please correct/add):

  • It's OCaml. That's great if you like it or want to learn it, but it's not the most welcoming language if we want contributing to be easy for everyone.
  • There isn't all that much overlap between what the two tools do, so the shared code would be limited. whatwhat starts with the assumption that we want to look at all the issues in a project board, and most of the code that interfaces with GitHub is somewhat bespoke with this in mind. The only more general part is the one that pulls in basic user information, which we could probably reuse. In other words, the head start we would get from what whatwhat has already built would be quite small.

@AoifeHughes

I’ve been meaning to pick up a bit of dev on the tool which Markus has already mentioned, for another (EDI) reason. If someone did want to collab and get it modified to do what the OS SA wants then I’d be happy to do that 😊

@mhauru
Collaborator Author

mhauru commented Oct 6, 2023

@AoifeHughes, the conclusion from the OS SA meeting this week was that this is a high priority for us (see emoji voting above), but right now we all have our hands full with things with deadlines. I do hope, maybe even expect, one or another of us to get to this Soon (TM).

@mhauru mhauru moved this from Todo to In Progress in hut23-open-source-sa Jan 10, 2024
@mhauru mhauru added the "private" label (Issues that might need to be Turing-only or REG-only visible) Apr 17, 2024
@mhauru
Collaborator Author

mhauru commented Apr 17, 2024

Work is ongoing on a tool to fetch data here: https://github.com/alan-turing-institute/github-analyser
and on analyses using that tool here: https://github.com/alan-turing-institute/github-analysis

@mhauru mhauru removed the "private" label (Issues that might need to be Turing-only or REG-only visible) Apr 30, 2024
@llewelld

llewelld commented Sep 9, 2024

At RSECon24 there was a poster on the topic of "Mining RSE repository timelines on GitHub: How long will it live, and who will notice?" which reminded me of this task and generated what look to me like some really interesting results, e.g. "if you want stars, publish papers" and "if you want contributors, respond to their issues/PRs". These might sound obvious but I found it fascinating that this was shown in the data.

Anyway, mentioning it in case there's scope for sharing ideas. Kara — one of the authors — from EPCC was at RSECon and very open to talking about the approach.

Projects
Status: In Progress