Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@jhkennedy's review for JOSS #2

Closed
jhkennedy opened this issue Dec 29, 2022 · 4 comments
Closed

@jhkennedy's review for JOSS #2

jhkennedy opened this issue Dec 29, 2022 · 4 comments

Comments

@jhkennedy
Copy link
Contributor

jhkennedy commented Dec 29, 2022

I am collecting my thoughts/suggestions for openjournals/joss-reviews#4982 here. Since this is a review of a collection of software and services, I may open more targeted discussions/issues/PR in the associated repos.

Note: This might be more appropriate as a discussion, but this repository does not have discussions enabled. Feel free to move it to a discussion if you like.


Summary

Overall, I am really impressed by Sliderule and all the work that has gone into it. It clearly fulfills a very valuable need in the community and does so exceedingly well. Coming in from the outside, I found the services, documentation, and code very approachable and could get up and running pretty quickly with only a few minor hitches. There are some areas of the manuscript itself that could be improved, however, so I am going to recommend accepting this manuscript with minor revisions.

Detailed Review

Here I'll go through my review checklist and add discussion for any items I haven't already checked off -- if it's checked, I didn't feel the need to (further) discuss anything.

General: Contribution and authorship

Overall, the authorship looks appropriate, but I am curious why Eric Lidwa is not included in the author list? It appears he had made substantial contributions even before this review was started in August (!) and has since become quite prolific on the project (e.g., #2 contributor for sliderule and #4 contributor for sliderule-python)

Documentation: Installation Instructions

sliderule

Following the README it was pretty straightforward to get to a successful build and install, but from there I never really got anything working. sliderule scripts/selftests/test_runner.lua would work and pass, but sliderule scripts/apps/server.lua <config.json> appears to need a config.json and I have no idea what that should be or look like.

Since there are docker instructions right in this repo (but no Dockerfile?) it'd be nice to provide a container for released versions a user/developer could pull down and run.

Other very minor thoughts:

  • make help is super great! You should mention it in the README
  • The Docker sections could probably just link out to the docker install docs instead of maintaining a copy yourself
  • Most everything worked just the same for Ubuntu 22.04
  • In the prerequisites section, I really had no idea what packages were available and why/when I would want them. I basically ignored the package dependency step entirely. It might be worth having an overview of the packages in a packages/README.md

Documentation: Example usage

Viola Demo

This is fantastic overall. My only minor quibble is the controls require scrolling away from the map, but there's a lot of blank space to the right of the controls. If possible, it'd be nice to have those as a row along the plot instead of stacked in a column.

Sliderule-docs

In the Getting started section of the user guide, I'd prefer "Examples" to be before "Project Map" so that when I click the "next" button at the bottom of the page, the examples appear after the basic usage.

In the Making Your First Request user tutorial:

  • I was (also) confused by the netrc/auth warnings throw; maybe leaving verbose=False for this one would be better. I also think the error messages could be improved:
    • for WARNING:sliderule.sliderule:Failed to retrieve username and password from netrc file: 'ps.slideruleearth.io', I expect a file path to come after the colon as written -- adding ... from netrc file for machine: would be more clear
    • for ERROR:sliderule.sliderule:Unable to authenticate user jhkennedy to https://ps.slideruleearth.io/api/org_token/, I went down the path of hitting https://ps.slideruleearth.io/api/org_token/, determining the public org is sliderule, and finally getting the 403 Forbidden "detail": "jhkennedy is NOT a member of sliderule" message. Unfortunately, this "detail" is not passed to the user as part of the warning.
  • I am surprised this basic tutorial expects a development install? The other tutorials just expect you to have the sliderule client installed; I would expect the same here.

Sliderule-python

+1 to: SlideRuleEarth/sliderule#190

Also, verbose=True is in the README here as well, but I'd prefer it not to be (see above).

Documentation: Community Guidelines

It'd be nice to include your Contributing Guidelines in your .github repo so that they are easily found and reported by GitHub. e.g.: https://github.com/ICESat2-SlideRule/sliderule-python/community

Outside contributions to sliderule-python are likely hampered by: SlideRuleEarth/sliderule-python#121

Software paper: Quality of writing

Overall very good! I have some minor suggestions that I think are best left here in this format.

I would like to see the version numbers stated for sliderule, sliderule-python and sliderule-docs -- this will help readers in the future understand what version was reviewed and help me as a reviewer separate out current development work from the latest release. Notably, since version numbers for all three seemed to be synced, I've been mostly looking at v1.5.8 and subsequent development.

I found the transition to the "ICESat-2 packages" a bit abrupt and without the necessary context.

It might be worth noting this API is currently a synchronous service (the user connection stays open and results stream in) as opposed to an async service (the user submits a request and comes back later for data) such as ASF HyP3 -- synchronous is a better user experience as long as the time to results is "short". Likewise, are there associated performance goals with the service? And are async data requests expected in the future or excluded from this service?


Other things

I noticed sliderule is undergoing a transition from http://icesat2sliderule.org/ to https://slideruleearth.io. There are a few broken links around because of it.

It might be worth looking through all the hits for the old .org in the GitHub org:
https://github.com/search?q=org%3AICESat2-SlideRule%20sliderule.org&type=code

@jhkennedy
Copy link
Contributor Author

And tagging @dshean @jpswinski to ensure they see this.

@jpswinski
Copy link
Member

Thank you for the thorough review of our repos! I appreciate that you took the time to dig as deep as you did and provide the feedback that you did.

Documentation: Installation Instructions

I made the recommended changes to the README documentation in the sliderule repo at cd7780b, and 04760a8. There are also some changes related to handling endianness. When updating the readme per the feedback you provided, I realized that the configuration support for selecting endianness was not great, so I cleaned that up.

  • I added a note on running make help to the README
  • I removed the instructions for installing docker from the README.
  • Ubuntu 22.04 is a bit of a conundrum for us at the moment and we are currently not supporting it. Everything will build and run fine, but the performance takes a 50% to 100% hit when running large processing runs. We lost a week of time trying to figure it out (mostly because I incorrectly thought it had to do with an AWS SDK update), but have not had the time to revisit and get to the bottom of it.
  • I added a packages/README.md and plugins/README.md

Documentation: Example usage

  • Voila page layout - we fully agree. The initial scope of the demo was to have a quick way to run ipywidgets without needing to run a jupyter notebook. We considered several options for interaction, including Panel, Cesium, and Voila. As the scope has grown, we’ve concluded that voila is not the right solution, so we are in the process of designing (and then implementing) a more sophisticated web-based user interface that will really be a software tool in its own right.
  • Getting Started - as suggested, I moved examples up to be right after the getting started page.
  • Making Your First Request - I changed the netrc warning message as suggested, and now also only issue the warning when attempting to access a private cluster (so for the public sliderule cluster, the warning will not be issued). I changed the tutorial to specify verbose=False as suggested.
    • Regarding authentication - we are currently in the process of migrating our entire system to a new architecture. Our old architecture had no authentication anywhere. In our new architecture, we have a concept of private clusters and public clusters. We have a public cluster running all the time at https://sliderule.slideruleearth.io/, and by default everything uses this public cluster. When using a public cluster, there is no need to authenticate (though as developers we are the only ones that are members of the public cluster which does give us access to some control APIs when we do authenticate). For private clusters, authentication allows access to the cluster. They are reached at https://{org}.slideruleearth.io, where {org} is the name of the organization associated with the cluster. Very little of this is reflected in our documentation, and we are only just now beginning to support our first groups interested in private clusters.
  • Developer Install Required - I was thinking using conda env create -f environment.yml was the safest option for new users because it creates a new environment and installs everything needed for any examples after the tutorial.
    README.md - I changed verbose to False as suggested

Documentation: Community Guidelines

  • Contributor Guidelines - I created issue Add contributor related files to repo sliderule#170 to add a CONTRIBUTING file to our repos. One question I have is if you have any recommendations on how to handle the fact that we have multiple repos. Do you have a copy of the file in each repo? Do you point to the website in each file, or have the full text? I’ve found managing the multiple SlideRule repos on GitHub to be challenging (they definitely started out as a Repo-First system and are only slowly moving towards an Org-First approach). I’d be glad for any lessons learned you have on this.
  • ICESat2-SlideRule/sliderule-python#121 - see response in comments of issue

@dshean
Copy link
Member

dshean commented Jan 10, 2023

Thanks again @jhkennedy. Here are additional responses regarding the paper:

Overall, the authorship looks appropriate, but I am curious why Eric Lidwa is not included in the author list? It appears he had made substantial contributions even before this review was started in August (!) and has since become quite prolific on the project (e.g., ICESat2-SlideRule/sliderule-python#2 contributor for sliderule and ICESat2-SlideRule/sliderule-python#4 contributor for sliderule-python)

This paper was initially prepared to summarize the initial ~1.5 years of the SlideRule project, before Eric joined the team for the ongoing ICESat-2 mission extension project. But considering the time that has passed since initial submission and Eric’s subsequent contributions, we added him to the coauthor list. It is worth noting that coauthor @cugarteblair has also made significant contributions to the SlideRule provisioning system development (https://github.com/ICESat2-SlideRule/sliderule-ps-server/graphs/contributors and https://github.com/ICESat2-SlideRule/sliderule-ps-web/graphs/contributors), which was not covered in this paper.
We had several internal discussions about the effort required vs. impact of an academic paper, most valuable content to include, and author order. We have all made significant contributions to the project over the years and it is challenging to rank or capture these contributions in a snapshot author order for an atypical journal like JOSS, even with systems like CRediT. The reality is that academic publications are not a priority for the SlideRule development team, but they are essential for the SlideRule science/management team, and the latter led the JOSS paper effort.

Software paper: Quality of writing: I would like to see the version numbers stated for sliderule, sliderule-python and sliderule-docs -- this will help readers in the future understand what version was reviewed and help me as a reviewer separate out current development work from the latest release. Notably, since version numbers for all three seemed to be synced, I've been mostly looking at v1.5.8 and subsequent development.

Good suggestion to document version number for paper and reviews. We added version number to the Sliderule project section:
“As of version 1.5.8, the SlideRule project includes the following repositories:”

I found the transition to the "ICESat-2 packages" a bit abrupt and without the necessary context.

Agreed. This was included as part of the “state of the field” section. We modified the section head to “Currently available ICESat-2 processing packages” and added some introductory sentences “Several existing projects offer software and/or APIs to process, analyze and visualize ICESat-2 data products. We briefly summarize these efforts to provide context and justification for the ICESat-2 SlideRule project.”

It might be worth noting this API is currently a synchronous service (the user connection stays open and results stream in) as opposed to an async service (the user submits a request and comes back later for data) such as ASF HyP3 -- synchronous is a better user experience as long as the time to results is "short".

Good point! We added the following text to the “Sliderule server framework” section:

“The SlideRule service was designed for synchronous processing - the client connection remains open after a request is submitted, and results are streamed back to the engaged user in near-real-time. This model is preferable over asynchronous processing, where requests are queued and users are notified to retrieve results at a later time.”

Likewise, are there associated performance goals with the service? And are async data requests expected in the future or excluded from this service?

We don’t currently have specific performance goals, and the actual time delay depends on the size of the request, number of ATL03 granules and the configuration of the processing cluster. In general, we aim to deliver results as fast as possible, so the user stays engaged and any delay does not interrupt the interactive testing and development process.

@jhkennedy
Copy link
Contributor Author

Overal, I'm happy with this feedback! I'm going to close this issue and continue some of the related discussions in the appropriate issues ( SlideRuleEarth/sliderule-python#121, SlideRuleEarth/sliderule#170 )


Ubuntu 22.04 is a bit of a conundrum for us at the moment and we are currently not supporting it. Everything will build and run fine, but the performance takes a 50% to 100% hit when running large processing runs. We lost a week of time trying to figure it out (mostly because I incorrectly thought it had to do with an AWS SDK update), but have not had the time to revisit and get to the bottom of it.

I feel that! We've had lots of similar "fun" when upgrading OSs and critical software dependencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants