This repository contains a number of utilities that I've written (and rewritten) over the years to help with GitHub Classroom. Most recently (Oct 2019), I revamped the code to launch multiple, concurrent requests to the GitHub servers whenever possible. These tools now run significantly faster.
I typically have an administrative repository, private to just the instructors, where I keep grades, slides, and other materials related to running my classroom. I'll copy these utilities to that directory and configure them to operate on the student repositories for just that class.
- If you haven't already done this, you'll first need to `pip3 install iso8601 pandas matplotlib requests aiohttp` for necessary libraries. (Everything here requires Python3 and is tested with Python 3.7.2. Earlier versions of Python3 might or might not work.)
- Copy all the `.py` files here into your class administrative repository.
- Get a GitHub token with all the "Repo" privileges. You do this on the GitHub website (instructions).
- Edit the `github_config.py` file. In this file you can save values that every tool here will use. These parameters can be specified on the command-line for every tool here, but it's nice to save them so you're not typing them over and over again.
  - `default_github_organization`: Your organization's name (e.g., for https://github.com/RiceComp215-Fall2018, the organization name is `RiceComp215-Fall2018`).
  - `default_github_token`: Your API token goes here.
  - `default_prefix`: When you're cloning and otherwise working with a specific assignment for your students, you can specify this here.
  - `default_grader_list`: Used by github_graders, see below
  - `default_grader_ignore_list`: Used by github_graders, see below
  - `default_timezone`: Used by github_completion_times, see below
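As a rough illustration of the settings listed above, a filled-in `github_config.py` might look something like the following. The variable names come from the list; every value is a placeholder, and the exact types each tool expects (lists of strings, an IANA timezone name) are assumptions here, so check them against the copy of the file in this repository.

```python
# github_config.py: illustrative values only; replace them with your own.

# Organization name: the last path component of https://github.com/<organization>
default_github_organization = "RiceComp215-Fall2018"

# API token with the "Repo" privileges (placeholder, not a real token)
default_github_token = "YOUR_GITHUB_API_TOKEN"

# Prefix shared by the repos for the assignment you're currently working on
default_prefix = "comp215-week06"

# GitHub IDs of your graders (hypothetical names)
default_grader_list = ["grader-alice", "grader-bob"]

# Other GitHub IDs to skip when assigning grading, e.g., the professors (hypothetical)
default_grader_ignore_list = ["prof-example"]

# Timezone used when rendering times (assumed here to be an IANA zone name)
default_timezone = "America/Chicago"
```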
All of these tools use a common library to interact with GitHub that tries to avoid rescanning student repositories unless something has changed. These scans can take a while to run and also burn through your available GitHub API request limit, so it's important to cache the results. (You'll see a multi-megabyte JSON file written out as a dot-file in the current directory.)
The cache uses the ETag headers generated by GitHub to try to avoid repeated downloads of identical lists of all the student repositories. GitHub's implementation of ETag seems to be unreliable, so you'll want to manually delete the cache if you know that your students have created new repositories. You do this by removing the cache file (which has a name like `.github-classroom-utils.RiceComp427-Spring2019.json`). This forces a rescan of the students' repositories the next time you run one of the tools here.
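To make the ETag idea concrete, here is a minimal sketch of a conditional fetch. It is not the code these tools use: `fetch_with_etag` and the injected `get` callable are hypothetical names, and the cache is a plain dictionary rather than the JSON dot-file. The only part taken from HTTP itself is the `If-None-Match` request header and the `304 Not Modified` reply.

```python
from typing import Callable, Dict, Optional, Tuple

# (status code, ETag header or None, parsed body or None)
Response = Tuple[int, Optional[str], Optional[list]]


def fetch_with_etag(url: str,
                    cache: Dict[str, dict],
                    get: Callable[[str, Dict[str, str]], Response]) -> Optional[list]:
    """Return the body for url, reusing the cached copy on HTTP 304."""
    headers: Dict[str, str] = {}
    entry = cache.get(url)
    if entry is not None:
        # Ask the server to reply 304 Not Modified if the ETag still matches.
        headers["If-None-Match"] = entry["etag"]

    status, etag, body = get(url, headers)
    if status == 304 and entry is not None:
        return entry["body"]  # nothing changed: reuse the cached body
    if status == 200:
        if etag is not None:
            cache[url] = {"etag": etag, "body": body}
        return body
    raise RuntimeError(f"unexpected HTTP status {status} for {url}")
```

Passing the HTTP call in as `get` keeps the sketch independent of any particular HTTP library. Emptying the dictionary plays the same role as deleting the cache file: the next call sends no `If-None-Match` header and gets a full response.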
Tool usage. Each tool below lets you run it with a `--help` argument, which will summarize the command-line arguments.
You often want to get a local copy of every repo beginning with a common prefix, e.g., `comp215-week06` for the week6 projects. Run `python3 github_clone_all.py --prefix comp215-week06 --out codedump-week06` and it will create the directory `codedump-week06` and will check out all of the matching repos into the desired directory.
An optional flag, `--safe`, creates repositories that do not have your API token embedded in them. This means that remote Git actions that require the token might not work, but the repos are safer to share. By default, the API token is embedded in the cloned Git repos.
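To show what "embedded" means in practice, here is a small sketch of the difference between the two kinds of remote URL. Putting the token before `github.com` is one common way to authenticate an HTTPS Git remote; whether `github_clone_all.py` builds exactly this URL is an assumption, and `clone_url` is a hypothetical helper, not part of these tools.

```python
from typing import Optional


def clone_url(organization: str, repo: str, token: Optional[str] = None) -> str:
    """Build an HTTPS clone URL, optionally with an API token embedded."""
    if token:
        # The token becomes part of the remote URL stored in .git/config,
        # which is why such clones should not be shared.
        return f"https://{token}@github.com/{organization}/{repo}.git"
    # "Safe" form: no credentials are stored in the clone.
    return f"https://github.com/{organization}/{repo}.git"
```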
If you keep running these tools, you'll eventually hit the wall with GitHub's rate limits. This tool tells you how many requests you have left and when the timer will reset.
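For reference, GitHub's REST API reports this information at its `/rate_limit` endpoint, with the remaining count and the reset time (in seconds since the Unix epoch) under `resources.core`. The sketch below only interprets such a payload; `summarize_rate_limit` is a hypothetical helper and the sample numbers are made up.

```python
from datetime import datetime, timezone


def summarize_rate_limit(payload: dict) -> str:
    """Describe the core API quota from a /rate_limit response body."""
    core = payload["resources"]["core"]
    # "reset" is expressed in seconds since the Unix epoch, in UTC.
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return (f"{core['remaining']} of {core['limit']} requests left; "
            f"resets at {reset_at.isoformat()}")


# Made-up sample payload, trimmed to the fields used above.
sample = {"resources": {"core": {"limit": 5000, "remaining": 4321, "reset": 1572566400}}}
print(summarize_rate_limit(sample))
```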
If you checked the wrong box when setting up GitHub Classroom, and all your students' repositories are public when you meant them to be private, you can go back into GitHub Classroom's settings and make sure that future cloned repositories will be private. But what about the existing ones? This tool will tell GitHub to make all the matching repositories private. (I've needed this twice in as many years, so I figure others might need this as well.)
You tell the students that they're required to have a partner, or to have a minimum group size. How do you detect the missing students? This script does all that. It scans all the projects, extracts the teams, and lets you know about any project with fewer than the minimum number of members. Likewise, if there are students in your database who aren't attached to any GitHub projects, you get their information as well.
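The two checks described above boil down to a size filter and a set difference. Here is a minimal sketch, assuming the team membership has already been fetched into a dictionary; `check_teams` and its argument shapes are hypothetical, not the script's actual interface.

```python
from typing import Dict, List, Set, Tuple


def check_teams(teams: Dict[str, List[str]],
                roster: Set[str],
                min_size: int = 2) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Find undersized projects and students attached to no project.

    teams maps a repo name to the GitHub IDs of its members;
    roster is the set of GitHub IDs of every enrolled student.
    """
    undersized = {repo: members for repo, members in teams.items()
                  if len(members) < min_size}
    attached = {member for members in teams.values() for member in members}
    unattached = roster - attached
    return undersized, unattached
```

For example, with `teams = {"proj-a": ["alice", "bob"], "proj-b": ["carol"]}` and a roster that also includes `"dave"`, the undersized result contains only `proj-b`, and the unattached set contains only `dave`.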
If you're using GitHub Classroom, one of the things you may need to do is assign student submissions to graders. This project does this as a random mapping, printing a document that you might share with your graders on Piazza or whatever forum, with grader names and student project hyperlinks.
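One simple way to produce such a random mapping is to shuffle the repos and deal them out round-robin, so every grader ends up with a nearly equal share. This is a sketch of that idea only; whether `github_graders.py` balances the load this way is an assumption, and `assign_graders` is a hypothetical name.

```python
import random
from typing import Dict, List, Optional


def assign_graders(repos: List[str],
                   graders: List[str],
                   seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Randomly assign each repo to exactly one grader, as evenly as possible."""
    rng = random.Random(seed)  # pass a seed to make the assignment repeatable
    shuffled = list(repos)     # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    assignment: Dict[str, List[str]] = {grader: [] for grader in graders}
    for index, repo in enumerate(shuffled):
        # Deal the shuffled repos out like cards, one grader at a time.
        assignment[graders[index % len(graders)]].append(repo)
    return assignment
```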
First, create a list of GitHub IDs that correspond to your graders and place that in the `default_grader_list` in `github_config.py`. This tells the tool who your graders are, and it also means that any repos they might have cloned for their own benefit will be ignored. If you want to ignore any other names, such as the professors, you can add them to `default_grader_ignore_list`.
Our graders need to know how to go from GitHub identifiers to our internal NetIDs, emails, and so forth. The tool will read in a CSV file with all this specified (by default, `student-data.csv`). To the extent anything is standard in the CSV universe, the first row should be a list of strings giving the names of each column. We use a `GitHubID` column for GitHub user ids, and then `Name` for their printable name, `Email` for their full university email address, and `NetID` for their university unique "network" identifier (which is often, but not always, their email address).
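As an illustration of the expected layout, the following sketch reads such a file with the standard library's `csv.DictReader` and indexes the rows by GitHub ID. The column names are the ones described above; `load_students` and the sample row are hypothetical, and an in-memory string stands in for `student-data.csv`.

```python
import csv
import io


def load_students(csv_file) -> dict:
    """Map each GitHub ID to its row (a dict keyed by the header names)."""
    reader = csv.DictReader(csv_file)  # the first row supplies the column names
    return {row["GitHubID"]: row for row in reader}


# Made-up sample data in place of a real student-data.csv.
sample = io.StringIO(
    "GitHubID,Name,Email,NetID\n"
    "octocat,Mona Lisa,mona@example.edu,ml42\n"
)
students = load_students(sample)
print(students["octocat"]["NetID"])
```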
Typical usage: `python3 github_graders.py --prefix comp215-week06` will print out everything you need, assuming your assignment repos are named `comp215-week06` with the students' names afterward.
A new feature, `--teams`, is useful when students are working as teams with GitHub Classroom. This will use GitHub's APIs to identify the names of each student associated with each repo and will adjust its printed output appropriately. We also have our students write their team information into their `README.md` files, which is a helpful backstop in case the GitHub metadata is incorrect.
Another new feature, `--ignore`, lets you specify a substring of a repo name to ignore when assigning grading. We tell our graders, when they want to check out a repo to play with it, to add the word `STAFF` in the name. This helps us skip those so they don't get assigned to be graded.
The output of this tool is in Markdown format, which Piazza has recently added. Select the Markdown button before cutting-and-pasting. We post this on Piazza, visible only to the graders, and we ask the graders to edit the post to mark the students as "done" when they're done with their grading session. (This helps us see which graders haven't finished their work and, if necessary, assign other graders to pick up the slack.)
This program uses the GitHub "Events" API to print all of the push times for each commit. This might be useful if you have a student who you suspect of falsifying commit times around a deadline and you need to document what happened. By default, you get a LaTeX "table" with a "tabular" inside. If the table is too large, you've got two useful flags, `--longtable` and `--tiny`. The former uses the LaTeX longtable package for multi-page tables. The latter uses a smaller font.
Let's say you want to get the commit times for a series of repos with names like `assignment3-student1` and `assignment3-student2`. You run `python3 github_event_times.py assignment3-student1 assignment3-student2` and it will print a table with the commit IDs (7-digit prefix, same as reported on GitHub's list of commits), the commit string, and the time at which that commit was pushed to GitHub, converted to your local timezone (from the UTC times reported by GitHub).
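The per-row work described above is small: shorten the commit ID and shift a UTC timestamp into local time. Here is a minimal sketch using only the standard library. `describe_push` is a hypothetical helper, the sample commit is made up, and a fixed UTC-5 offset stands in for a real local timezone.

```python
from datetime import datetime, timedelta, timezone


def describe_push(sha: str, message: str, pushed_at_utc: str,
                  local_tz: timezone) -> str:
    """Format one row: 7-character commit ID, local push time, commit string."""
    # Timestamps are assumed to be UTC strings such as "2019-08-30T23:15:00Z".
    utc_time = datetime.strptime(pushed_at_utc, "%Y-%m-%dT%H:%M:%SZ")
    local_time = utc_time.replace(tzinfo=timezone.utc).astimezone(local_tz)
    return f"{sha[:7]}  {local_time:%Y-%m-%d %H:%M}  {message}"


# Fixed offset used purely for illustration (UTC-5).
local = timezone(timedelta(hours=-5))
row = describe_push("0123456789abcdef0123456789abcdef01234567",
                    "Fix tests", "2019-08-30T23:15:00Z", local)
print(row)
```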
Note that GitHub only retains the underlying event data for a small amount of time, maybe three months. If you see something unusual, capture this output while it's still available.
This reads all the available CI data for every commit in every repo and produces a plot over time of how many students have passed all the tests and gotten a green checkmark. Here's an example from my own students, showing work in progress toward a deadline on 2019-09-01; you can see roughly 100 of 170 students have completed the work on the evening of 2019-08-30.
The timezone used to render the chart is set from the `default_timezone` setting in `github_config.py`.
This will read a specified JSON file (`--file` parameter) and invite the users specified in the file to a GitHub organization (either the default in `github_config.py` or one given through the `--org` parameter), under the role associated with each user in the file.
The JSON file should define a dictionary (mapping) of `{role: [username,...]}` where `role` is the designated role of the usernames in the associated list. `role` is restricted to `member` or `admin` (owner). Multiple roles can be specified in the dictionary, but do not specify the same username for multiple roles, because which role they will actually be assigned is ill-defined.
The role to which each username is being invited will be printed as well as the response from GitHub showing the status of that request.
The resultant printout is in the format of a series of JSON dictionaries.
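Since listing a username under two roles leads to an ill-defined result, it can be worth checking the file before sending any invitations. The following is a sketch of such a check, not something the tool is known to do; `check_invites` is a hypothetical helper and the usernames are invented.

```python
import json
from collections import defaultdict

ALLOWED_ROLES = {"member", "admin"}


def check_invites(text: str) -> dict:
    """Parse a {role: [username, ...]} mapping and reject ambiguous input."""
    invites = json.loads(text)
    roles_by_user = defaultdict(list)
    for role, usernames in invites.items():
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {role!r}")
        for username in usernames:
            roles_by_user[username].append(role)
    # A username listed under more than one role has no well-defined outcome.
    duplicates = sorted(user for user, roles in roles_by_user.items()
                        if len(roles) > 1)
    if duplicates:
        raise ValueError(f"usernames listed under several roles: {duplicates}")
    return invites


# Example file contents (hypothetical usernames).
example = '{"member": ["student-a", "student-b"], "admin": ["prof-example"]}'
print(check_invites(example))
```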
This will print out information about the teams in a GitHub organization (either the default in `github_config.py` or through the `--org` parameter).
Note: The team name "slug" is included in the printout.
The resultant printout is in a JSON dictionary format.
This will print out the members of a specified team (`--team` parameter) in a GitHub organization (either the default in `github_config.py` or through the `--org` parameter).
Note: The team name must be the team name slug. This is the part of the GitHub URL that specifies the team when browsing to the team on the web. Typically, the slug is the team name with spaces replaced with dashes and no capital letters.
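The "typical" rule in the note can be written as a one-line helper, shown here only as a first guess. `guess_team_slug` is a hypothetical function; GitHub's real slug generation may treat punctuation and other characters differently, so confirm the result against the team's URL or the slug printed by the team-listing tool.

```python
def guess_team_slug(team_name: str) -> str:
    """Approximate a team slug: lowercase, with runs of spaces turned into dashes."""
    return "-".join(team_name.lower().split())


print(guess_team_slug("Section A Graders"))
```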
The resultant printout is in a JSON dictionary format.
This will read a specified JSON file (`--file` parameter) and, for a specified team (`--team` parameter) in a GitHub organization (either the default in `github_config.py` or through the `--org` parameter), invite the users specified in the file to the team associated with each user in the file.
Note: The team name must be the team name slug. This is the part of the GitHub URL that specifies the team when browsing to the team on the web. Typically, the slug is the team name with spaces replaced with dashes and no capital letters.
The JSON file should define a dictionary (mapping) of `{team: [username,...]}` where `team` is the team to which the associated list of usernames will be invited. Multiple teams can be specified in the dictionary, and any username can be invited to multiple teams.
The team to which each username is being invited will be printed as well as the response from GitHub showing the status of that request.
The resultant printout is in the format of a series of JSON dictionaries.
- I did a talk at SIGCSE 2019 about an earlier version of these tools.
- For dealing with Travis-CI, check out travis-activate. (Travis-CI normally "activates" immediately when a new repository is created, but at least once I've seen this fail. This script was something I originally ran from a `cron` job to force all repos to "activate" with Travis, years ago, when Travis didn't know how to automatically activate new repos.)
- https://classroom.github.com/assistant - GitHub's web-based tool for bulk repo downloads (open source)
- https://github.com/dwalkes/github-classroom-scripts - knows how to set up pull requests
- https://github.com/ccannon94/github-classroom-utilties - knows how to clone assignments, add files, and to set things up for a run of MOSS
- https://github.com/osteele/multiclone - another repo cloning tool, written in Golang
- https://github.com/konzy/mass_clone - there are many forks of this, some with additional features
- (8/27/2021 by S. Wong) Added new functions to invite users to organizations and teams as well as to print out the teams in an organization and the members of a given team.