Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent python style, linting and formatting for scripts folder #245

Open
maltekuehl opened this issue Dec 18, 2021 · 3 comments · May be fixed by #251
Open

Consistent python style, linting and formatting for scripts folder #245

maltekuehl opened this issue Dec 18, 2021 · 3 comments · May be fixed by #251
Labels
enhancement New feature or request scope: scripts Related to scripts unfinished

Comments

@maltekuehl
Copy link

Currently the python scripts in the ./scripts/ folder do not seem to strictly follow any linting, formatting, typing or style guide, docstrings are often missing and no configuration files to ensure code style consistency across IDEs seem to be present.

To enable consistent collaboration, it might perhaps be of interest to setup:

@maltekuehl maltekuehl added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed needs triage Pending maintainers' attention labels Dec 18, 2021
@emmahodcroft emmahodcroft removed the needs triage Pending maintainers' attention label Dec 20, 2021
@emmahodcroft
Copy link
Collaborator

Hi @maltekuehl thank you for this suggestion! I'm afraid I'm not very strongly versed in linting. I generally code with Visual Studio Code, if there's something specific you'd recommend for helping to do this through that. Help would be welcomed to address this, as for me it is likely low on the priority list with the very limited resources we have! :)

@maltekuehl
Copy link
Author

Thanks for the comment, @emmahodcroft! With this issue I just wanted to check if there's interest in this or if anyone's already working on it.
I originally took a look at the scripts folder in the context of #243 as I wanted to see what it would take to integrate the JHU/OWID prevalence data in the JSON data. Since I didn't want to randomly mess around with the code without a proper structure or introduce unwanted linting/formatting, I thought creating this issue might be a good first step :)

Now if you say that help would be welcomed, I might take a look at this and implement some basic linting and formatting over the Christmas days if I manage to have some time off and hopefully this will facilitate future additions to the python code.

@ivan-aksamentov
Copy link
Collaborator

ivan-aksamentov commented Dec 20, 2021

Hi @maltekuehl,

Thanks for the suggestion. As you probably know, due to huge workload, academic researchers are not typically willing to spend time on learning and using the best practices, even for tools they use routinely, because there are more pressing activities they need to pursue. As an engineer, I'd be happy if the code was formatted and linted, but the most important thing is to not make obstacles for Emma to produce the results quickly, because they are so much needed during pandemic. In other words, linting is okay, as long as things gets done. Docs are probably is a complete waste of time, because this is not a library, the code changes very fast, and nobody ever reads it except Emma. Tests will be almost impossible, because Emma will not write them for the new code and the old ones will get obsolete the moment they are written.

There's been attempts previously, like this:
#131

But important things to note is that

  • the checks should not be too noisy, i.e. we cannot force Emma's text editor to become permanently decorated with thousands of red and yellow squiggles. This means that the checks will need to be run separately from the development process. And there will likely be need to make these checks somewhat relaxed in the beginning.
  • the checks should be optional and to not prevent from running and releasing anything
  • I can imagine a periodic process where an engineer comes, runs lints and applies a few fixes/refactorings. Not too many at a time to ensure it does not break science
  • Python code runs offline and only the output JSON results of these runs (which we call "web data") are used in the web app (which is a static React app hosted on S3 and does not have any programmable back-end server). In other words, this code does not run in production, so it does not make sense to block the build and deployment like it is done in the linked PR.
  • The Python code is hard to run, because it requires large amounts of private data (which cannot be redistributed due to GISAID terms of service). That is, validating the result of refactoring is near impossible for a 3rdparty contributor.
  • It might be more useful to perform validation of the JSON results instead
  • Formatting will be super nice, but Emma will need to remember to use it.
  • Pre-commit hook will probably end in a disaster. Emma runs Windows and is not on very good terms with git :)
  • Not Python, but related: The web app is linted on build, on push and on deploy. There isn't much to test, because the web app is somewhat dummy - it mostly renders markdown and some plots from JSON. But it wouldn't hurt. Emma does not deal with web app and I will likely not have time to work on this.

Happy to hear your thoughts and to consider contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request scope: scripts Related to scripts unfinished
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants