Create a "cookie-cutter" repository so folks have a template to start from #2

Open
2 of 3 tasks
cesar-rocha opened this issue Oct 15, 2021 · 15 comments

@cesar-rocha
Contributor

cesar-rocha commented Oct 15, 2021

  • Create a template repository for data analysis.
  • Write an MD file describing the template and its use.
  • Include a Binder badge and instructions for how to set one up.
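
For the Binder badge task, here is a minimal sketch of how the badge Markdown could be generated for a README. It assumes the repository is public on GitHub and uses the standard mybinder.org launch URL pattern. The function name `binder_badge` and the `my-org`/`my-analysis` names are placeholders for illustration, not part of this project.

```python
def binder_badge(org: str, repo: str, ref: str = "HEAD") -> str:
    """Return Markdown for a Binder launch badge pointing at a GitHub repository.

    org, repo, and ref (branch, tag, or commit) are supplied by the caller;
    the values used in the usage note are hypothetical placeholders.
    """
    launch_url = f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({launch_url})"
```

Calling `print(binder_badge("my-org", "my-analysis"))` gives a line that can be pasted near the top of README.md. Binder builds the environment from a file such as environment.yml in the repository, so that file would need to list the packages the notebooks use.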
@biavillasboas biavillasboas added the documentation Improvements or additions to documentation label Oct 15, 2021
@wenegrat

Can we leverage @jbusecke's Cookie Cutter Science template?
https://github.com/jbusecke/cookiecutter-science-project

@biavillasboas
Contributor

We (myself, @jtomfarrar, and @cesar-rocha) all use and love @jbusecke's Cookie Cutter. When we created this issue, we were trying to find a way to get science team members who don't usually use Python and git to engage and feel comfortable sharing their code. @jtomfarrar suggested having a "minimalist" template for folks to follow (without requiring a Python installation, CI, unit testing, etc.).

Maybe, when writing the CONTRIBUTING.MD, we could recommend that "experienced" users use the cookiecutter-science-project and that novices follow the minimalist template? What do you think, @wenegrat?

@wenegrat

Understood. Sounds like a good plan.

@jtomfarrar
Contributor

Bia wrote to me:

Hi Tom,
thanks for starting this discussion.

(1) No need to apologize; this is actually my fault for not yet having completed a document with guidelines for how to contribute. I'll work on that asap. For now, I'll just explain how you would do it for the particular point that you're discussing.

When you log in to your account, on the top bar click on "issues" and then "assigned". That should take you to a page that shows all the issues that you have been assigned (there is also a "mentioned" tab). The "cookie-cutter" issue will show up, as you can see below:
[Image: Screen Shot 2021-10-24 at 1.29.19 PM.png]

If you click on that issue, you will be taken to this page. At the bottom of the page, there will be a space to write a comment. You can write there (it accepts markdown format, attachments, tagging people, etc.), and then click "comment". Folks that you tagged will get a notification and can reply to you there.

[Image: Screen Shot 2021-10-24 at 1.39.22 PM.png]

When/if we have a chance, I would be happy to walk you through a complete workflow of open issue -> discussion -> pull request -> close issue. Cesar and I worked like this last week on the "Maps" repository.

(2) Here are some of my thoughts regarding using the cookie-cutter package.

I love the fact that we have this tool available -- it makes my life a lot easier. That said, for someone who doesn't use Python and/or is not familiar with GitHub, jbusecke's cookie-cutter could be very intimidating and even prohibitive in some cases. As much as I want to advocate for open source software, I personally believe that best practices come first, and that applies to all languages/tools. Maybe if we think of this as being also a "GitHub pilot", we could prioritize having clear instructions on how to name repositories, structure directories, name files, document code, add a README with instructions for how to use the code, etc. I feel like, especially at this stage, where we're doing near-real-time analysis, writing code fast to make plots that need to be updated as the data comes along, it will be hard to make things reproducible. So, yes, I think that unit testing/CI is too much.

Now, back to your question regarding the template itself. I like what you're using. I follow something similar for my projects. Perhaps, for S-MODE, something like this would make sense:

project_name/
├── data
│ ├── subfolder
│ ├── subfolder
├── figs
├── src
└── notebooks
.gitignore
environment.yml
README.md
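
For folks who would rather not create these folders by hand, here is a minimal sketch of a script that builds the layout above. It assumes that .gitignore, environment.yml, and README.md sit at the top level of the project, and the two data subfolder names (`subfolder_a`, `subfolder_b`) and the function name `create_skeleton` are placeholders, not anything prescribed in this thread.

```python
from pathlib import Path

# Layout taken from the tree above; the two data subfolder names are placeholders.
DIRECTORIES = ["data/subfolder_a", "data/subfolder_b", "figs", "src", "notebooks"]
# Assumption: these three files live at the top level of the project.
FILES = [".gitignore", "environment.yml", "README.md"]


def create_skeleton(root):
    """Create the directories and empty top-level files under root.

    Existing directories and files are left untouched. Returns root as a Path.
    """
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for filename in FILES:
        (root / filename).touch(exist_ok=True)
    return root
```

Running `create_skeleton("project_name")` from any working directory produces the empty structure, which can then be filled with MATLAB or Python code alike.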

What we need to make clear is that one shouldn't "clone" the template repository, or even download the .zip from GitHub, unless they make sure that they remove the .git folder by running
$ rm -rf .git
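
An alternative to deleting .git after the fact is to copy the template while skipping that folder. The sketch below assumes a local copy of the template already exists; `copy_template` is a hypothetical helper name, and the destination directory must not exist yet.

```python
import shutil
from pathlib import Path


def copy_template(template_dir, new_project_dir):
    """Copy a local template directory to a new location without its .git folder.

    The pattern ".git" matches only entries named exactly ".git", so files
    such as .gitignore are still copied. new_project_dir must not exist yet.
    """
    shutil.copytree(
        template_dir,
        new_project_dir,
        ignore=shutil.ignore_patterns(".git"),
    )
    return Path(new_project_dir)
```

After the copy, running `git init` inside the new directory starts a fresh history that is independent of the template's.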

Anyway, this email is already far too long. I'm happy to continue discussing and working on this in the next couple of days.

But before I go, I'll just say that I think your problem with CI is coming from the .github/workflows folder. That is the place that controls "github actions". If you look there, you will find "release.yaml" and "test.yaml". These are the files that "trigger" the tests. At the top of "test.yaml" for example, you will see

name: Tests
on: [push, pull_request]

This means that every time you push or accept a pull request, this script will run the tests (thus, the error that you're getting). You could change these lines to "pull_request" only, or you could probably delete the .github folder from your repository with "git rm -r .github", commit the change, and push. Let me know if you have any questions about that.
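
For the first option (keeping CI but running it on pull requests only), the change is a one-line edit that can also be scripted. This is a minimal sketch that assumes the trigger is written in the single-line form quoted above; it would not handle a multi-line `on:` mapping. The function name `restrict_to_pull_requests` is hypothetical.

```python
import re


def restrict_to_pull_requests(workflow_text: str) -> str:
    """Replace the first top-level `on:` line with `on: [pull_request]`.

    Assumes the single-line trigger form, e.g. `on: [push, pull_request]`.
    All other lines of the workflow file are returned unchanged.
    """
    return re.sub(r"(?m)^on:.*$", "on: [pull_request]", workflow_text, count=1)
```

One could read .github/workflows/test.yaml as text, pass it through this function, and write the result back before committing. Plain text substitution is used here on purpose so that comments and formatting in the workflow file are preserved.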

B.

@jtomfarrar
Contributor

Thank you, @biavillasboas . Your suggestion about deleting the .github folder solved the problem I was having with unwanted Travis CI when using my adapted jbusecke cookiecutter template.

About the template, it seems we could do the following:
(1) Specify a recommended directory structure for new repositories, as part of 'best practices' for contributing usable code.
(2) Provide a cookiecutter template that might be convenient for those who would like to use it.

I like your recommended directory structure fine. It looks like something I read about. I was wondering what the purpose of the ./src directory is-- in most cases, is it just Python scripts/functions? If so, why separate that from ./notebooks?

@jbusecke

Hi folks, just wanted to chime in here

First of all, thank you for using the cookiecutter template. I realize that there are still many issues with it, and I would very much like to see it adopted (and changed!!!) to serve a wider audience. I think that some of the feedback here could be used to improve the existing template rather than making new ones? But I might also not completely grasp the scope and timeline of what you are trying to do here.

Some inline responses:

> could be very intimidating and even prohibitive in some cases.

I see where this is coming from. My goal with this template was actually to avoid this while still providing advanced functionality if they are needed and wanted.

> As much as I want to advocate for open source software, I personally believe that best practices come first, and that applies to all languages/tools.

I would like to find out more about how to actually interpret this. The idea behind the cookiecutter template, from my side, was to provide both the template and documentation (currently only the readme) together, to enable beginners to just get started but avoid time-intensive refactoring. Ideally, for every decision or optional feature (code linting, etc.), documentation would be provided in the guides.
My initial design idea was to make all of the 'advanced' stuff purely optional, so that folks can just use this template, but don't have to redo stuff once they actually want these features. I think that this holds true for everything but the CI? Perhaps we can modify my template to serve this purpose?

I also want to mention that there were some discussions over at Project Pythia (ProjectPythia/pythia-foundations#63) that might be interesting, and could benefit from some input from this community?

@jtomfarrar I think you can also deactivate all actions like this... wondering about Travis though. The newer versions of the template run on GitHub Actions.

@biavillasboas
Contributor

Thanks for joining the discussion, @jbusecke! We're all pretty new to this and would greatly benefit from your feedback/experience.

To give you some context, the S-MODE pilot experiment just started last week, and we're trying to figure out a way to facilitate collaboration among Science Team members. This includes some folks who program in MATLAB and have little (or no) experience with git/GitHub.

> My goal with this template was actually to avoid this while still providing advanced functionality if they are needed and wanted.

I think your template does meet this goal (I use it all the time, btw)! I guess we were trying to come up with the simplest possible solution that would provide a structure for folks who don't have Python and just want to "dump" their code on the SMODE GitHub.

Now, trying to clarify this:

> As much as I want to advocate for open source software, I personally believe that best practices come first, and that applies to all languages/tools

Here again, I was coming from a Python vs. MATLAB point of view. I guess I was trying to say that I see value in prioritizing things from the Project organization section of Wilson et al., 2017 before trying to get folks to use Python, version control, unit testing, etc.

Thinking about it again, yes, maybe all we need is a version of the cookiecutter-science-project that doesn't include CI by default, and that has a folder for code that is not notebooks. Folks could still use the package to create the structure of the repo and then populate it with MATLAB code if they wish. Does this make any sense?

@jtomfarrar
Contributor

Yes, @jbusecke, thanks for joining our discussion! I think I was mistaken in saying Travis CI-- I just meant CI. I am too much of a novice to use it, but I do love the template you have made and the good description of how to use it. I think your solution would work, but I just deleted the ".github" directory, which of course works (but eliminates the possibility of doing CI, which is probably something for me to try later-- I just learned Python as a pandemic hobby, so I am pretty happy to be doing things at a basic level).

I totally see where @biavillasboas was coming from, and I agree with her: we have a diverse team with different backgrounds and comfort levels, and we don't want to make barriers for people who would prefer to just be told a good, conventional way to manually organize their repo directory structure.

@biavillasboas , I have read Wilson et al., 2017, but I felt it was a little more complicated than it needs to be for one of my science projects. I felt like "data", "code" and "plots" would be enough for the core project structure in a data analysis science project. Maybe I am missing the point of having separate notebook and src directories? (I feel like it must be for more complicated projects than one of the papers I write.)

@biavillasboas
Contributor

I agree with you, @jtomfarrar, that there is no need to replicate the Wilson et al., 2017, structure. I brought that reference up just to clarify what I meant by prioritizing organizational/format best practices. On a paper/science project, I see reasons for having a notebooks directory and a src directory. You might want to illustrate something in notebooks but have a bunch of functions and modules that live outside the notebook and that users/readers might not care about. This of course varies from project to project, and there is a lot of personal preference too! For now, I suggest that I go ahead and create a template repo following @jtomfarrar's structure, along with instructions for how to use it. We can see how it goes and revisit this discussion if needed. How does that sound?

@jtomfarrar
Contributor

@biavillasboas , it sounds fine to me. I certainly don't want to dictate the structure and would be happy to go with something else.

@jbusecke

jbusecke commented Dec 6, 2021

Hey everyone. Thank you for these detailed explanations. I did not realize this was so time sensitive and then totally dropped the ball on this. I think that these are all very valid points, and maybe we could have a chat after the project is finished to distill some of the lessons learned and maybe improve the cookiecutter for future projects? Even if we don't make any changes to my template, I would really appreciate some more perspective on the participants' experiences.

@jtomfarrar
Contributor

No problem! It isn't too time sensitive. We have another campaign in a year, and people will probably start doing code and analysis in the next few months.

I made some edits to my fork of @jbusecke's template, trying to keep CI but have it default to "on pull_request". Does it make sense to have the template have this structure?

project_name/
├── data
│ ├── subfolder
│ ├── subfolder
├── plots
└── code
.gitignore
environment.yml
README.md
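
If this structure becomes the recommendation, a small check could tell contributors what their repository is missing. The following is only a sketch: it assumes the three files sit at the top level of the project, and `missing_entries` is a hypothetical helper, not part of either template.

```python
from pathlib import Path

# Names taken from the tree above.
EXPECTED_DIRS = ["data", "plots", "code"]
# Assumption: these files live at the top level of the project.
EXPECTED_FILES = [".gitignore", "environment.yml", "README.md"]


def missing_entries(root):
    """Return the expected directories and files that are absent under root.

    Directories are reported with a trailing slash, in the order listed above.
    """
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list means the repository has every expected entry; the check says nothing about what is inside the folders.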

@biavillasboas
Contributor

Hi @jtomfarrar, does this look like what you had in mind? Any comments/suggestions?

https://github.com/NASA-SMODE/TemplateRepo

@jtomfarrar
Contributor

Hi @biavillasboas, yes it does! Thank you! I like how you explained relative paths, gitignore, etc. Maybe we could also make a cookiecutter version? I think I already have a version (forked from @jbusecke's) that is pretty close.

@jtomfarrar
Contributor

Let's also add the data directory (data/) to the .gitignore file. I will try to do that now...
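
A minimal sketch of how that addition could be scripted so it is safe to run more than once. It assumes the pattern to ignore is `data/` (forward slash, which is how .gitignore patterns name a directory); `ensure_ignored` is a hypothetical helper name.

```python
from pathlib import Path


def ensure_ignored(gitignore_path, pattern="data/"):
    """Append pattern to the .gitignore file unless that exact line is already there.

    Creates the file if it does not exist. Returns True if the file was changed.
    """
    path = Path(gitignore_path)
    text = path.read_text() if path.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False
    # Make sure the new pattern starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + pattern + "\n")
    return True
```

Note that ignoring a directory only affects untracked files: data files that were already committed stay tracked until they are removed from the index, for example with `git rm -r --cached data`.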
