Provide acknowledgments to GitHub editors of each code #6

Open
phfaist opened this issue May 11, 2022 · 1 comment

phfaist commented May 11, 2022

See issue eczoo_data#218.

The script that makes these changes should be added to this repository (e.g., via a pull request to claim the UnitaryHACK bounty) under the folder

tools/contributors_via_git/

This way, we can experiment with the script, run additional tests to make sure contributors are captured as intended, and make any further tweaks that are needed.

Thanks to all UnitaryHACK-ers who want to contribute!

phfaist self-assigned this Jun 2, 2022

phfaist commented Jun 9, 2022

Here is a potential blueprint for the logic of such a script (in pseudo-Python). The strategy is to go through all of the repository's commits in chronological order and record the contributors to each code (identified by its code_id). Identifying codes by code_id helps with files that were moved around in the git tree, since git does not, by default, show a file's history past the point where it was renamed.

```python
def get_is_change_significant( ... ):
    ... # Logic to detect whether a change was substantial (i.e., whether its
        # author should be listed as a contributor) goes here. A "substantial
        # contribution" basically means more than fixing a few typos.
        # This function's parameters should include at least the YML file name
        # and the commit id, so that we can call "git diff --word-diff=porcelain"

# dictionary of code_id -> list of author info dictionaries
codes_contributors_information = {}

for commit_object in (traverse through all commits of the repo in chronological order):

    # get author information associated with that commit_object
    author_information = {
      'githubusername': ...,
      'name': ...,
    }

    for yml_file in (all YML code file changes in commit_object):
        code_id = (read the code_id field in the YAML file)

        is_change_significant = get_is_change_significant( ... )
        if is_change_significant:

            # add this author to the list of contributors to that code
            if code_id not in codes_contributors_information:
                codes_contributors_information[code_id] = []
            codes_contributors_information[code_id].append( author_information )

# update the codes tree.
for code_id, list_of_contributors in codes_contributors_information.items():

    # fetch the YML file associated with the code ID (available via
    # ecczoogen, the generator code)
    code_yml_file = zoo.get_code(code_id).source_info_filename

    ... # append the relevant information to the data in the code YAML file
```
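To make the traversal step of the blueprint concrete, here is a minimal sketch in Python that shells out to git. It assumes the script runs against a local clone of the data repository and that each code YAML file has a top-level `code_id:` line (read with a regex here as a simplification; a real script would use a YAML parser). `COMMIT_MARKER`, `CommitInfo` and all function names are hypothetical. Since git itself only records an author name and email, mapping those to GitHub usernames is left out. The significance check is passed in as a callback so it can be developed and tested separately.

```python
import re
import subprocess
from dataclasses import dataclass, field

# Hypothetical marker printed before each commit's header line, so commit
# headers can be told apart from the changed-file lines that follow them.
COMMIT_MARKER = "__COMMIT__"

# Simplification: assumes a top-level "code_id: ..." line in each code YAML
# file; a real script would rather parse the file with a YAML library.
CODE_ID_RE = re.compile(r"^code_id:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)


@dataclass
class CommitInfo:
    sha: str
    author_name: str
    author_email: str
    changed_files: list = field(default_factory=list)


def read_git_log(repo_path: str) -> str:
    """All commits, oldest first, each followed by the files it changed."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse", "--name-only",
         f"--format={COMMIT_MARKER}%H%x09%an%x09%ae"],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_git_log(log_output: str) -> list:
    """Turn the output of read_git_log into a list of CommitInfo objects."""
    commits = []
    for line in log_output.splitlines():
        if line.startswith(COMMIT_MARKER):
            sha, name, email = line[len(COMMIT_MARKER):].split("\t", 2)
            commits.append(CommitInfo(sha, name, email))
        elif line.strip() and commits:
            commits[-1].changed_files.append(line.strip())
    return commits


def code_id_from_yaml_text(text: str):
    match = CODE_ID_RE.search(text)
    return match.group(1) if match else None


def code_id_at_commit(repo_path: str, sha: str, path: str):
    """Read code_id from the file as it was in commit `sha` (None if absent)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "show", f"{sha}:{path}"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:  # e.g. the file was deleted by this commit
        return None
    return code_id_from_yaml_text(result.stdout)


def collect_contributors(commits, get_code_id, is_significant):
    """Map code_id -> authors, in order of their first significant change."""
    contributors = {}
    for commit in commits:
        author = {"name": commit.author_name, "email": commit.author_email}
        for path in commit.changed_files:
            if not path.endswith((".yml", ".yaml")):
                continue
            code_id = get_code_id(commit.sha, path)
            if code_id is None or not is_significant(commit.sha, path):
                continue
            authors = contributors.setdefault(code_id, [])
            if author not in authors:  # list each author only once per code
                authors.append(author)
    return contributors
```

For example, `collect_contributors(parse_git_log(read_git_log(".")), lambda sha, p: code_id_at_commit(".", sha, p), lambda sha, p: True)` would list every author who ever touched each code; plugging in a real significance check narrows this down. Unlike the blueprint, this sketch lists each author at most once per code.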

Notes:

  • The site generator script / ecczoogen package has code that can load the whole codes tree, and we can easily find a code YAML file by its code_id with zoo.get_code(code_id). See this line.
  • To determine whether a change is significant, I think we need to parse the output of git diff --word-diff=porcelain and count the number of changed words. We will probably have to test and tweak this to get good results.
  • The above logic doesn't account for a code's code_id changing over its history. We will either have to handle these cases manually or find some other fix. The most reasonable approach seems to be to list all encountered code_ids that don't exist in the current tree; we can then re-run the script with a hard-coded mapping from old code_ids to new code_ids, making sure the script records such changes directly under the new code_id.
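For the significance check, here is a sketch of counting word changes in `--word-diff=porcelain` output. In that format, each line inside a hunk starts with `+` (added run), `-` (removed run), or a space (unchanged run), and a lone `~` marks a newline; the `---`/`+++` file headers come before the first `@@` hunk header, so they are skipped. The threshold of 20 changed words and all function names are hypothetical placeholders that would need tuning on the actual repository; merge commits are not treated specially.

```python
import subprocess

# Hypothetical cutoff: a change touching more words than this counts as
# "significant". The value would have to be tuned by testing on the real repo.
SIGNIFICANT_WORD_THRESHOLD = 20


def word_diff_for_commit(repo_path: str, sha: str, path: str) -> str:
    """Word diff of one file as changed by one commit.

    Uses `git show` rather than `git diff <sha>^ <sha>`, which would fail
    for the very first commit (it has no parent).
    """
    return subprocess.run(
        ["git", "-C", repo_path, "show", "--format=", "--word-diff=porcelain",
         sha, "--", path],
        check=True, capture_output=True, text=True,
    ).stdout


def count_changed_words(porcelain_diff: str) -> tuple:
    """Count (added, removed) words in `--word-diff=porcelain` output."""
    added = removed = 0
    in_hunk = False
    for line in porcelain_diff.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # new file header: skip the ---/+++ lines after it
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += len(line[1:].split())
        elif in_hunk and line.startswith("-"):
            removed += len(line[1:].split())
        # unchanged runs (leading space) and "~" newline markers are ignored
    return added, removed


def is_change_significant(porcelain_diff: str,
                          threshold: int = SIGNIFICANT_WORD_THRESHOLD) -> bool:
    added, removed = count_changed_words(porcelain_diff)
    return added + removed > threshold
```

With this, a typo fix such as replacing "teh" by "the" counts as one added and one removed word and is not significant, while rewriting a paragraph easily exceeds the threshold.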
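For the code_id renames, one possible sketch: a hypothetical `old_to_new` dictionary is applied to every code_id read during the traversal, so that changes are recorded directly under the new code_id, and chains of renames (old → intermediate → current) are followed. Any code_id from the history that still doesn't resolve to a code in the current tree is listed for manual inspection. The function names are hypothetical, and how the list of current code_ids is obtained (e.g., from the loaded zoo) is left open.

```python
def canonical_code_id(code_id: str, old_to_new: dict) -> str:
    """Follow a (possibly chained) old -> new mapping to the current code_id."""
    seen = set()
    while code_id in old_to_new:
        if code_id in seen:
            raise ValueError(f"cycle in code_id mapping at {code_id!r}")
        seen.add(code_id)
        code_id = old_to_new[code_id]
    return code_id


def find_unknown_code_ids(seen_code_ids, current_code_ids, old_to_new) -> list:
    """code_ids from the history that neither exist now nor map to a current one."""
    current = set(current_code_ids)
    return sorted(
        {c for c in seen_code_ids
         if canonical_code_id(c, old_to_new) not in current}
    )
```

A first run with an empty mapping lists every historical code_id missing from the current tree; after filling in `old_to_new` by hand, a second run should leave only the cases that genuinely need manual attention.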
