Should provenance be recorded in the code files? #3

Open
jwblin opened this issue Feb 16, 2013 · 3 comments

@jwblin
Member

jwblin commented Feb 16, 2013

Jonathan H. wrote: "As for the provenance/version of the files, I would vote against including this information in the files themselves or a python docstring. I think that a better place for information like this belongs in the version control system (git) or a separate text file." I was wondering if Jonathan (and others who agree) could share why you feel this way. If provenance changes often I can see why this would be good, but for code that doesn't get changed very often, wouldn't it be a net plus to include where the code came from in the file?

@david-ian-brown
Contributor

There are several reasons I think it is important to have the origin of each routine documented in the code:

  1. One of the innovations of Python was giving the documentation of functions first-class status by providing a simple syntax for self-documentation inside the function body. Modern data formats used in the AOS community, such as NetCDF and HDF, also provide self-documentation; in fact, this feature has become absolutely essential for ensuring consistency in, and keeping track of, the contents of big data sets. Detached documentation is not always available and is not always updated consistently with the code: even though we are establishing good documentation guidelines, we cannot guarantee they will always be followed.
  2. Credit: the creators of the code and the institutions that support them should be acknowledged in a visible way.
  3. Support issues: we, the builders of this Python wrapper library, cannot be expected to know enough about the algorithms used to help people with problems they might encounter, including possible bugs in the original code. At most we should be responsible for ensuring that the wrapped version of the code returns the same values into the Python environment that the raw code would produce in a Fortran program. Of course, we intend to use only code that has been deemed reliable, due to a long history of use or some other criterion. But as we all know, errors can sometimes be found even in code long thought to be reliable. People need to be able to figure out who is responsible for the code, even if the creators of the code have a blanket no-support policy.
  4. Political considerations: atmospheric science, and climate science in particular, now has a political dimension. This makes it even more important to know the provenance of the code used for calculations supporting conclusions that may be controversial. The easier it is to find this information, the better.

That said, I can see that it may not be necessary to put this information into each function if the information is the same for a whole set of functions. If the text would be the same for all AWIPS functions, for instance, then perhaps it could be documented at the sub-module level only: i.e., in aos.awips.
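As a rough illustration of stating the provenance once for a whole set of functions, here is a minimal sketch. It assumes a small decorator that appends one shared note to each function's docstring, so the text still shows up in `help()`. The names `with_provenance`, `AWIPS_PROVENANCE`, and `example_routine` are hypothetical and not part of the library, and the provenance text is a placeholder.

```python
# Placeholder text; the real origin of the wrapped code would go here.
AWIPS_PROVENANCE = "Provenance: <origin of the wrapped AWIPS code goes here>"


def with_provenance(note):
    """Return a decorator that appends `note` to a function's docstring."""
    def decorate(func):
        doc = func.__doc__ or ""  # docstring may be missing
        func.__doc__ = doc.rstrip() + "\n\n" + note + "\n"
        return func
    return decorate


@with_provenance(AWIPS_PROVENANCE)
def example_routine(x):
    """Return x unchanged; stands in for a wrapped routine."""
    return x
```

With something like this, the provenance text lives in one place per sub-module, yet every function's docstring still carries it.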

@david-ian-brown
Contributor

I added a comment giving my reasons for thinking this important under issue number 3.


@jjhelmus
Contributor

I think including a one-line comment in the Python docstring stating where the original code came from may be appropriate. My initial comment was more about keeping bulky version information in the files. That information is available from the git commit history, which is much easier to keep up to date and much harder to accidentally forget to update.

I also think keeping a listing of where the various files originally came from, for licensing, etc., might be prudent. This could be included at the top of each file, but that may cause complications and extra work when comparing or updating these files against new versions of the originals.
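For the separate-listing option, a minimal sketch of what such a file and a reader for it might look like. The `path: origin` layout, the file paths, and the `parse_provenance` helper are all assumptions made for illustration, not an agreed format.

```python
def parse_provenance(text):
    """Parse 'path: origin' lines into a dict, skipping blanks and # comments."""
    listing = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so the origin text may contain colons.
        path, _, origin = line.partition(":")
        listing[path.strip()] = origin.strip()
    return listing


# Hypothetical listing contents; entries are placeholders.
EXAMPLE = """\
# file: where it originally came from
src/example_a.f: upstream package A, version unknown
src/example_b.f: upstream package B, version unknown
"""

origins = parse_provenance(EXAMPLE)
```

Keeping the listing outside the source files means the wrapped originals stay byte-identical to upstream, which should make later comparisons simpler.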
