Pull Request #60
Comments
Thanks very much - I'll try to get you comments tonight.
@gvwilson Here's the info we can get from GitHub about pull requests (see here: http://ghtorrent.org/relational.html):
Details on code reviews data:
We pulled all of the code reviews from the ~10k projects we sampled. Around 0.5% of the projects had at least one code review, which was a little lower than expected. Our hypothesis is that the data only records comments made directly on code in a PR, not general comments on the pull request (even though we might want to count those as code reviews too). These 'general' comments appear to be stored in a separate table called issue_comments, along with all the comments on issues (for more info, see the 4th paragraph in the Challenges and Limitations section here: http://www.ghtorrent.org/files/ghtorrent-data.pdf). The next step is to pull in these 'general' comments on PRs as well.
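As a rough sketch of that next step, here is how the two comment sources might be combined. The table and column names (`pull_request_comments`, `issue_comments`, `issues.pull_request_id`, `pull_requests.base_repo_id`) are my assumptions based on my reading of the GHTorrent schema page and should be checked against the real dump; the in-memory SQLite database and its rows are made-up toy data, not our sample.

```python
import sqlite3


def build_toy_db():
    """Tiny in-memory stand-in for the GHTorrent tables (assumed schema, toy rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY);
        CREATE TABLE pull_requests (id INTEGER PRIMARY KEY, base_repo_id INTEGER);
        CREATE TABLE pull_request_comments (pull_request_id INTEGER);
        CREATE TABLE issues (id INTEGER PRIMARY KEY, repo_id INTEGER,
                             pull_request_id INTEGER);
        CREATE TABLE issue_comments (issue_id INTEGER);

        INSERT INTO projects VALUES (1), (2), (3), (4);
        INSERT INTO pull_requests VALUES (10, 1), (11, 2), (12, 3);
        -- Project 1 has an inline code comment on its PR.
        INSERT INTO pull_request_comments VALUES (10);
        -- Issue 100 backs PR 11 (project 2); issue 101 is a plain issue.
        INSERT INTO issues VALUES (100, 2, 11), (101, 4, NULL);
        INSERT INTO issue_comments VALUES (100), (101);
    """)
    return conn


def fraction_with_review(conn, include_general_comments):
    """Fraction of projects with at least one review comment on a PR."""
    # Inline comments on code in a PR.
    inline = """
        SELECT pr.base_repo_id AS repo_id
        FROM pull_request_comments c
        JOIN pull_requests pr ON pr.id = c.pull_request_id
    """
    # General comments, keeping only issues that are actually pull requests.
    general = """
        SELECT i.repo_id AS repo_id
        FROM issue_comments ic
        JOIN issues i ON i.id = ic.issue_id
        WHERE i.pull_request_id IS NOT NULL
    """
    source = inline + (" UNION " + general if include_general_comments else "")
    reviewed = conn.execute(
        f"SELECT COUNT(DISTINCT repo_id) FROM ({source})"
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
    return reviewed / total


if __name__ == "__main__":
    conn = build_toy_db()
    print("inline only:", fraction_with_review(conn, False))
    print("inline + general:", fraction_with_review(conn, True))
```

The filter on `pull_request_id IS NOT NULL` is what keeps ordinary issue discussion out of the count, since issue_comments mixes both kinds of comments.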
Details on pull request data:
We pulled all of the pull requests from the ~10k projects we sampled. Around 6.6% of the projects had at least one PR.
Below we have plotted each cluster (where a cluster is a group of 'similar' GitHub projects based on Graph2Vec embeddings) by the % of the cluster that is just a single chain (no branching or merging) against the % of projects in the cluster with at least one pull request. We expected these two variables to be negatively correlated, which is (kind of) what we see.
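To put a number on "kind of", one option is to compute the two per-cluster percentages and their Pearson correlation. This is only a sketch: the record layout `(cluster, is_single_chain, has_pull_request)` and the example rows are hypothetical, not our actual data or results.

```python
from collections import defaultdict
from math import sqrt


def cluster_percentages(projects):
    """Map cluster id -> (% single-chain projects, % projects with >= 1 PR).

    `projects` is an iterable of (cluster, is_single_chain, has_pull_request)
    tuples; this layout is a hypothetical one chosen for the sketch.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [total, single_chain, with_pr]
    for cluster, is_chain, has_pr in projects:
        counts[cluster][0] += 1
        counts[cluster][1] += int(is_chain)
        counts[cluster][2] += int(has_pr)
    return {
        cluster: (100.0 * chain / total, 100.0 * with_pr / total)
        for cluster, (total, chain, with_pr) in counts.items()
    }


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Made-up rows purely to exercise the functions.
    toy = [
        (0, True, False), (0, True, False), (0, True, False), (0, False, True),
        (1, True, False), (1, False, True),
        (2, False, True), (2, False, True), (2, False, True), (2, True, False),
    ]
    stats = cluster_percentages(toy)
    pct_chain = [stats[c][0] for c in sorted(stats)]
    pct_pr = [stats[c][1] for c in sorted(stats)]
    print("r =", pearson(pct_chain, pct_pr))
```

A value of r near -1 would back up the expected negative relationship; with only a handful of clusters, a rank correlation might be a safer choice.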