Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add copyright check #9

Merged
merged 24 commits into from
Jan 29, 2024
Merged

Conversation

KyleFromNVIDIA
Copy link
Contributor

Fixes: #2

@KyleFromNVIDIA KyleFromNVIDIA force-pushed the copyright branch 12 times, most recently from be30991 to 3ac9bef Compare January 22, 2024 18:46
@KyleFromNVIDIA KyleFromNVIDIA marked this pull request as ready for review January 22, 2024 20:23
@KyleFromNVIDIA KyleFromNVIDIA requested a review from a team as a code owner January 22, 2024 20:23
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Initial pass -- these comments may be outdated by the latest changes.)

.pre-commit-hooks.yaml Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Show resolved Hide resolved
.pre-commit-hooks.yaml Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Outdated Show resolved Hide resolved
@bdice
Copy link
Contributor

bdice commented Jan 22, 2024

At this point, the code looks good and I just need to spend time testing before I approve.

@bdice
Copy link
Contributor

bdice commented Jan 22, 2024

This failed for me when I tested this hook on cudf. I got a UnicodeDecodeError because I forgot to exclude binary files (*.parquet, *.orc, etc.). Should we patch to ignore binary files, and possibly warn in those cases?

diff --git a/src/rapids_pre_commit_hooks/lint.py b/src/rapids_pre_commit_hooks/lint.py
index edb1f93..4f8ea9c 100644
--- a/src/rapids_pre_commit_hooks/lint.py
+++ b/src/rapids_pre_commit_hooks/lint.py
@@ -210,7 +210,10 @@ class ExecutionContext(contextlib.AbstractContextManager):
 
         for file in self.args.files:
             with open(file) as f:
-                content = f.read()
+                try:
+                    content = f.read()
+                except UnicodeDecodeError:
+                    pass
 
             linter = Linter(file, content)
             for check in self.checks:

@KyleFromNVIDIA
Copy link
Contributor Author

KyleFromNVIDIA commented Jan 22, 2024

Should we patch to ignore binary files, and possibly warn in those cases?

I'll open a separate PR for that.

Edit: #10

@bdice
Copy link
Contributor

bdice commented Jan 23, 2024

I tried testing this hook on cuspatial. That repo doesn't have any automated enforcement of copyrights today, so many of its files are outdated. I tried editing some files, and the copyright script appeared to work as intended.

However, I am not sure what options I can provide to get this repo "up to date." There are a lot of files that were modified in past PRs in 2023 and 2024 but those changes aren't reflected in the copyright dates because this repo had no copyright year header enforcement. It seems like this hook might provide eventual consistency after every file is modified, but I want to batch-process the existing files and get them caught up so that the copyrights on branch-24.04 reflect the dates that the files were last modified in the git history. Is there a way to do this? Is that an option we can provide?

Testing PR here: rapidsai/cuspatial#1328

@harrism
Copy link
Member

harrism commented Jan 23, 2024

Is it really essential to correct cuSpatial? Seems to me "eventual consistency" is good enough.

@bdice
Copy link
Contributor

bdice commented Jan 23, 2024

Is it really essential to correct cuSpatial? Seems to me "eventual consistency" is good enough.

I would accept eventual consistency if a bulk update is impossible, but I want to make sure I'm understanding the scope and capabilities of this tool correctly. I think it would be reasonable for this kind of bulk update to be in scope, too, and it shouldn't be very complex if my understanding is correct.

@KyleFromNVIDIA
Copy link
Contributor Author

KyleFromNVIDIA commented Jan 23, 2024

I want to batch-process the existing files and get them caught up so that the copyrights on branch-24.04 reflect the dates that the files were last modified in the git history. Is there a way to do this? Is that an option we can provide?

Right now the script does "update all files with the current year that have been modified since the merge-base with the target branch." The batch use case is "update all files with the year they were last modified at any point in their Git history."

The script does not do this in its current state. It is definitely possible, and I'd be happy to do it, but it will require a fair amount of rework. We will have to trace the history of each file, including through copies and renames. Let's get this first iteration merged first, and then we can do that as a follow-up.

Edit: I added this feature to this PR. You can try it with the --batch argument. We may be able to merge these two modes later on, but for now I'm keeping them separate.

@KyleFromNVIDIA KyleFromNVIDIA marked this pull request as draft January 23, 2024 14:34
@KyleFromNVIDIA KyleFromNVIDIA marked this pull request as ready for review January 23, 2024 14:56
@KyleFromNVIDIA
Copy link
Contributor Author

KyleFromNVIDIA commented Jan 23, 2024

I just ran --batch mode on all of cudf (2798 files, 37945 commits.) It completed in a reasonable-ish amount of time (18 minutes (98,000 file-commits per second)), and AFAICT it didn't make any mistakes.

@KyleFromNVIDIA
Copy link
Contributor Author

KyleFromNVIDIA commented Jan 23, 2024

cuspatial (570 files, 2006 commits) completed in 13 seconds (88,000 file-commits per second.)

.pre-commit-hooks.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few final comments, but overall I think this is really close. I approve. Feel free to iron out any details and ask for more feedback if you want it, but feel free to merge when you're ready and we can start to migrate RAPIDS to using this hook.

.pre-commit-hooks.yaml Outdated Show resolved Hide resolved
.pre-commit-hooks.yaml Outdated Show resolved Hide resolved
.pre-commit-hooks.yaml Outdated Show resolved Hide resolved
src/rapids_pre_commit_hooks/copyright.py Show resolved Hide resolved
@KyleFromNVIDIA KyleFromNVIDIA merged commit 342015c into rapidsai:main Jan 29, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add copyright hook
3 participants