diff --git a/Gemfile b/Gemfile index e77a8bb..93eaed7 100644 --- a/Gemfile +++ b/Gemfile @@ -1,11 +1,11 @@ source "https://rubygems.org" -gem "github-pages", '228', group: :jekyll_plugins +gem "github-pages", '232', group: :jekyll_plugins # enable tzinfo-data for local build # gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw] gem 'jekyll-paginate', '1.1.0' -gem 'faraday', '2.7.4' -gem 'faraday-retry', '2.0.0' +gem 'faraday', '2.10.1' +gem 'faraday-retry', '2.2.1' gem 'webrick', '1.8.1' diff --git a/_data/menus/program.yml b/_data/menus/program.yml index a3b84f2..ee8dc97 100644 --- a/_data/menus/program.yml +++ b/_data/menus/program.yml @@ -10,6 +10,10 @@ link: program/papers/ - name: Posters link: program/posters/ + - name: Rapid Access Microtalks + link: program/ram/ + - name: Sessions + link: program/sessions/ - name: Talks link: program/talks/ - name: Tutorials diff --git a/_data/navigation.yml b/_data/navigation.yml index bf8ea70..770c2f2 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -15,6 +15,10 @@ link: program/papers/ - name: Posters link: program/posters/ + - name: Rapid Access Microtalks + link: program/ram/ + - name: Sessions + link: program/sessions/ - name: Talks link: program/talks/ - name: Tutorials diff --git a/pages/program/abstracts/bofs.md b/pages/program/abstracts/bofs.md index 286b66e..1280111 100644 --- a/pages/program/abstracts/bofs.md +++ b/pages/program/abstracts/bofs.md @@ -8,6 +8,7 @@ menubar_toc: true set_last_modified: true --- + ## Teaching Research Software Engineering _Julia Damerow, Jeffrey C. Carver, Jason Yalim_ @@ -52,6 +53,7 @@ engineering? --- + ## Mapping Open Source Science _Jonathan Starr_ @@ -68,6 +70,7 @@ ecosystem. --- + ## Exploring the Potential Impact of Advancements in Artificial Intelligence on the RSE Profession _David Luet_ @@ -97,6 +100,7 @@ trained on your open-source code published under a GPL license? 
Before the confe --- + ## Brainstorming Strategies for Cultivating Successful and Collaborative RSE Teams _Abbey Roelofs and Kristina Riemer_ @@ -126,6 +130,7 @@ cultivating successful RSE teams. --- + ## Navigating the Remote Landscape: Working Effectively with Stakeholders _Troy Comi_ @@ -165,6 +170,7 @@ their stakeholders are local or globally distributed. --- + ## Better Scientific Software Fellowship Community _Elsa Gonsiorowski, Erik Palmer and Mary Ann Leung_ @@ -189,6 +195,7 @@ serve to amplify the connections between our communities. --- + ## Sharing lessons learned on the challenges of fielding research software proof-of-concepts / prototypes in Department of Defense (DoD) and other Government environments _Daniel Strassler_ @@ -204,6 +211,7 @@ environments and how to mitigate them. --- + ## RSEs in domain-specific ecosystems _Julia Damerow, Rebecca Sutton Koeser, Laure Thompson and Jeri E. Wierenga_ diff --git a/pages/program/abstracts/tutorials.md b/pages/program/abstracts/tutorials.md index 8656ed2..506750d 100644 --- a/pages/program/abstracts/tutorials.md +++ b/pages/program/abstracts/tutorials.md @@ -11,6 +11,7 @@ set_last_modified: true Tutorials will be online only and conducted virtually in the weeks prior to the conference. + ## Globus Compute: Managed Compute Across the Computing Continuum _Kyle Chard and Yadu Babuji_ @@ -30,6 +31,7 @@ where allocations are available, etc. ------ + ## Rapid Prototyping for a Usable React-based Web Application with STRUDEL _Rajshree Deshmukh, Cody O'Donnell, Lavanya Ramakrishnan_ @@ -46,6 +48,7 @@ scientific projects and target a broad range of people involved in web applicati ------ + ## Leveraging Django Views and Permissions at Object-Level: The gist of an envisioned solution for managing agricultural datasets _Diego Menéndez and Danying Shao_ @@ -65,6 +68,7 @@ as well as a conceptual class diagram and other artifacts. 
------ + ## Research Data Automation with Globus Flows and Globus Compute _Lee Liming and Steve Turoscy_ @@ -78,6 +82,7 @@ construct data processing pipelines that are reliably managed and executed by Gl ------ + ## GitHub Actions for Scientific Data Workflows _Valentina Staneva_ @@ -93,6 +98,7 @@ integrate Github Actions in their own work. ------ + ## Overview of GIS Open Source Software Ecosystem and Theory _Dennis Milechin_ diff --git a/pages/program/abstracts/tutorials.tbd b/pages/program/abstracts/tutorials.tbd deleted file mode 100644 index 5441237..0000000 --- a/pages/program/abstracts/tutorials.tbd +++ /dev/null @@ -1,214 +0,0 @@ ---- -layout: page -title: Tutorials -description: -menubar: program -permalink: program/tutorials/ -menubar_toc: true -set_last_modified: true ---- - -Tutorials will be online only and conducted virtually in the weeks prior to the conference, Monday, October 2nd to Friday, October 13th. - -## GitHub Actions for Scientific Data Workflows - -_Valentina Staneva, eScience Institute, University of Washington_ - -[Register Now: October 2, 12-1:30 PM CT](https://mit.zoom.us/meeting/register/tJwrceGprTovG9JwpE7LrHOcPy4UvTvwpD0B) - -In this tutorial we will introduce GitHub Actions as a tool for lightweight -automation of scientific data workflows. GitHub Actions have become a key -tool of the software development lifecycle, however, many scientific -programmers who are not involved in software deployment may not be familiar -with their functionalities and/or do not know how they can be applied within -their data pipeline. Through a sequence of examples, we will demonstrate some -of GitHub Actions' applications to automating data processing tasks, such as -scheduled deployment of algorithms to streaming data, updating visualizations -based on new data, model versioning and performance benchmarking. For the -demonstration we will access a public hydrophone stream and compute and -visualize statistics of sound patterns. 
The goal is that participants will -leave with their own ideas on how to integrate GitHub Actions in their own work. - -**Prerequisites**: GitHub account, basic familiarity with git, GitHub, and -version control, programming in a scripting language such as Python/R - -**Audience**: scientific programmers interested in automating components of -their workflows through existing tools for software continuous -integration/deployment. - - ------- - -## Introduction to Spatial Data Processing - -_Nick Santos, University of California, Merced_ - -[Register Now: October 5, 12-3:30 PM CT](https://mit.zoom.us/meeting/register/tJMrceusrj8rGdVlgnmbUd2GSJ63R_rg81Ys) - -This tutorial provides an introduction to processing spatial data. -The goal is that participants leave the workshop with an -understanding of and ability to use spatial data types, coordinate systems, -and basic data processing with spatial joins and zonal statistics. These -processing methods allow for a wide variety of data manipulation and -aggregation. Participants will also learn to visualize data and results -in basic maps. Though introductory, the tutorial is designed to teach a -skill that is important for many areas of research, but which may be new to -some RSEs. - -The tutorial will use Google Colab or Jupyter notebooks and be primarily -hands-on, with short lectures only to describe core concepts with graphics. -The workshop will use Python with the geopandas and rasterstats libraries but -will emphasize concepts and topics that can be applied in any language or -computing environment that supports spatial data. Participants will only need -access to a web browser. - -**Prerequisites**: Required background knowledge includes: - -1. Being comfortable with tabular data and the concepts of table joins (e.g. with SQL, data frames, etc) -2. The ability to read Python code. Participants will modify and run code snippets, but won’t need to write Python code without using an example as a base. 
- -**Audience**: research software engineers who are already familiar with tabular and/or image data but do not yet -have experience with the characteristics and requirements of spatial information. - ------- - -## Publish your software in conda-forge - -_Dave Clements and Valerio Maggio, Anaconda_ - -[Register Now: October 10, 12-3:30 PM CT](https://mit.zoom.us/meeting/register/tJwlf-2oqjspGNLhUV4b4kpZmdgTX_F1M4mt) - -[Conda](https://github.com/conda/conda/blob/main/README.md) is a **widely used** -(30M+ users) **multi-platform** (Linux, macOS, Windows, ...) and **language agnostic** -packaging and runtime environment management ecosystem. This workshop will be a -worked, hands-on tutorial demonstrating how to publish your open source software -packages in [conda-forge](https://conda-forge.org/). - -**conda-forge hosts over 20,000 packages and serves over 3 billion package downloads -per year.** It is the largest community managed conda channel in existence and it is an -excellent platform for making your software easy for others to install and integrate with -other open source tools. - -In this tutorial we will: - -- 0:20 - Introduce software packaging concepts and challenges -- 0:20 - Introduce the conda ecosystem -- 1:30 - Walk through how to prepare a sample software package for publishing in conda-forge from scratch - - 0:10 - introduce example package and it's dependencies - - 0:15 - adding tests - - 0:50 - defining your package recipe in a meta.yaml file. - - 0:15 - Building your package with conda-build -- 0:20 - How to submit your package to conda-forge and shepherd it through to publishing -0 0:30 - How to port packages that are already in PyPA/pip (Python) or CRAN (R) to conda using Grayskull - -At the end of the tutorial participants will have a basic understanding of software packaging, -how conda implements it, and how to prepare and publish your packages in the conda -ecosystem. 
- -**Prerequisites**: Participants will need either a Linux or macOS laptop, or a Windows laptop with WSL. Laptops -will need a web browser, shell access, a text editor program, and git and/or a GitHub client -already installed. Participants should have experience with the command line, a text editor, and -GitHub. No prior package creation knowledge is assumed. - -**Audience**: software engineers with some experience incorporating -software dependencies in their work. - - -_Dave Clements is an open source community manager at Anaconda, and has been involved in -training and teaching throughout his career. Most recently, he led training efforts at the Galaxy -Project for over ten years. Before that he had a similar role at the GMOD Project, was adjunct -faculty at the University of Oregon, taught courses while in graduate school, and developed and -presented training to programmers and end users at a fortune 500 company._ - ------- - -## Software Quality Practices for Reproducibility - -_Reed Milewicz and Miranda Mundt, Sandia National Laboratories_ - -[Register Now: October 12, 12-3:30 PM CT](https://mit.zoom.us/meeting/register/tJYvceqqrjgvHtDtkfO0Lv3_kentmeEK6Rxi) - -In this tutorial, participants will learn about evidence-based software -engineering strategies for addressing reproducibility across the software -lifecycle. The tutorial will center around three interrelated topics: - -1. Setting software quality priorities around reproducibility -2. Tailored software development practices that facilitate reproducibility -3. 
Software process improvement techniques for incrementally introducing those practices into teams' workflows - -Participants will receive instruction on managing software quality priorities -with regards to reproducibility, hear insights from real-world teams on -practices that facilitate reproducibility, and finally will learn how to take -concrete steps toward improving those practices within their respective projects -based on the Productivity and Sustainability Improvement Planning (PSIP) -toolkit which the presenters previously helped develop and pioneer for use with -teams in the Exascale Computing Project. - -The course content represents a living curriculum based on the organizers' -ongoing research with real-world teams into software quality practices for -reproducibility. Organizers will solicit feedback on how to improve upon or -add to the tutorial. The outcome of this session will be concrete steps that -teams can take to improve their development practices with respect to -reproducibility, and participants will learn some of the skills needed to -approach their teams and precipitate process improvement. - -**Audience**: research software engineers and other professionals responsible -for supporting, developing, and maintaining the development and use of -scientific and engineering software systems and workflows. This includes -students and researchers as well as the core production practitioners. - -The course is relevant both for people looking to learn more about best -practices in engineering reproducible software and for those hoping to promote -those best practices within their respective institutions. 
- ------- - -## Using Globus Platform Services in Research Software Applications - -_Lee Liming, Steve Turoscy, and Vas Vasiliadis, University of Chicago_ - -[Register Now: October 4, 12-1:30 PM CT](https://mit.zoom.us/meeting/register/tJMpdeivqDwvH92RbUu2BwekBp3dTMqf48Pp) - -Research applications increasingly need to leverage, and often orchestrate, -diverse systems: campus data storage and computing systems, national data and -computing centers, and data-generating instruments such as gene sequencers, -Cryo-EM microscopes, CT scanners, and sensor networks. Unless these interactions -are automated, this is time-consuming and wasteful of scarce human resources on -research teams. - -The Globus platform enables research applications developed by research teams -to leverage data and compute services across many tiers of service—from -personal computers and local storage to national supercomputing centers—with -minimal deployment and maintenance burden. Globus is operated by the -University of Chicago and is used by nearly all R1 universities, national labs, -and supercomputing centers in the United States, as well as many smaller institutions. - -In this tutorial, we’ll begin by introducing the Globus platform-as-a-service, -including how to register an application and how to access Globus APIs using -our Python SDK. We will present examples of how the various Globus services, -interfaces, and tools may be used to develop research applications. We will walk -participants through authentication and access control with Globus’s Auth and -Groups APIs; making data findable and accessible using Globus guest collections, -data transfer API, and indexed Search API; and automating research with -Globus’s Flows and Compute APIs. Participants will use Jupyter notebooks to -experiment with these capabilities, and they will also become familiar with -the Globus web application. 
- -This tutorial is hands-on, utilizing Jupyter notebooks on our cloud-hosted -JupyterHub system, accessible via web browser. - -The presenters lead projects—sponsored by NSF, NIH, and DOE—that are building -research applications leveraging the Globus platform. Employed by the -University of Chicago and Argonne National Laboratory, we have decades of -combined experience with both building research applications and teaching -collaborators how to build them. - -**Prerequisites**: Tutorial participants should have beginner to intermediate -familiarity with Python. - -**Audience**: research software engineers supporting teams whose work needs to be -scaled up (either to solve larger problems or to achieve a faster rate of -smaller problems) using university research computing resources, national-scale -systems (NSF, DOE, NASA), or cloud systems. - ------- diff --git a/pages/program/abstracts/workshops.md b/pages/program/abstracts/workshops.md index 7853701..f6ac56e 100644 --- a/pages/program/abstracts/workshops.md +++ b/pages/program/abstracts/workshops.md @@ -7,6 +7,7 @@ permalink: program/workshops/ set_last_modified: true --- + ## Research Data Automation with Globus Flows and Globus Compute _[Lee Liming](http://www.uchicago.edu/) and [Steve Turoscy](https://www.globus.org)_ @@ -20,6 +21,7 @@ enable RSEs to construct data processing pipelines, managed and executed by Glob --- + ## Establishing RSE Programs - From early stage formalization to mature models _[Ian Cosden](https://researchcomputing.princeton.edu/services/research-software-engineering), [Sandra Gesing](https://www.sdsc.edu/) and [Adam Rubens](https://rsenyc.org/)_ @@ -44,6 +46,7 @@ and the evolution towards robust group models. --- + ## Emerging as a Team Leader through Cultural Challenges _[Elaine M. Raybourn](https://www.sandia.gov/-emraybo/), Angela Herring and Ryan Shaw_ @@ -68,6 +71,7 @@ skill building in intercultural and interpersonal communication, and inclusion. 
--- + ## Special Workshop: Community discussion: teachingRSE project _Jan Philipp Thiele_ diff --git a/pages/program/program.md b/pages/program/program.md index acc2c3d..db2e2b4 100644 --- a/pages/program/program.md +++ b/pages/program/program.md @@ -7,295 +7,1126 @@ permalink: program/ set_last_modified: true --- -## Accepted Submissions - -- [Birds of a Feather]({{ site.baseurl }}/program/bofs/) -- [Notebooks]({{ site.baseurl }}/program/notebooks/) -- [Papers]({{ site.baseurl }}/program/papers/) -- [Talks]({{ site.baseurl }}/program/talks/) -- [Tutorials]({{ site.baseurl }}/program/tutorials/) -- [Workshops]({{ site.baseurl }}/program/workshops/) - -## Program Timetable - - - All times listed in Mountain Time (MT)

-
+
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
Oct 1 to Oct 11th - Online
TBATBA
Tuesday, Oct 15th
8:00 AMBreakfast - / Registration
8:45 AMWelcome (Lauren Milechin, Miranda Mundt, Sandra Gesing, Ian Cosden)
9:00 AMKeynote: TBA
10:00 AMBreak
10:30 AMTechnical Session
12:00 PMLunch -
+ + + + + + + + + - - - - - - - - - - - - - - - + + + - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - + + + - - - + + + + + - - - + + + - - - + + + + + - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - + + + - - - + + + + + - - - + - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - diff --git a/pages/program/ram.md b/pages/program/ram.md new file mode 100644 index 0000000..0d5bfca --- /dev/null +++ b/pages/program/ram.md @@ -0,0 +1,28 @@ +--- +layout: page +title: Rapid Access Microtalks +description: +menubar: program +permalink: program/ram/ +menubar_toc: true +set_last_modified: true +--- + +The "Rapid Access Microtalks" (RAM) session is a dynamic and interactive opportunity for conference +attendees to share their ideas in a fast-paced format. This session encourages spontaneity +and creativity, offering participants a platform to present a 5-minute rapid-fire talk on +any topic of their choice. Whether you have a groundbreaking idea, a unique perspective, +or want to regale an abject project failure, this is your chance to shine! + +### How It Works + +- **Idea Submission:** From the start of the conference until Wednesday lunch, all attendees are invited to propose their microtalks. Simply head to Ballroom A and write down your idea on the designated poster board. Be sure to include the title of your talk, your name, and your email address. + +- **Voting:** After lunch on Wednesday, the voting phase begins. Attendees can vote for their favorite proposed topics using the provided stickers. Each attendee is given **three** (3) votes to distribute as they see fit. Voting will continue through the beginning of the conference dinner on Wednesday. 
+ +- **Selection:** The votes will be tallied during the conference dinner, and the selected presenters will be notified by the end of the evening. + +- **Microtalk Presentations:** The chosen speakers will present their 5-minute talks on Thursday morning from 10:30 AM to 12:00 PM in Ballroom A. This session promises to be an exciting and fast-paced showcase of diverse ideas and insights. + +Don’t miss this chance to contribute to the conference in a fun and engaging way! Submit your idea, +cast your votes, and join us for a morning of rapid access microtalks. diff --git a/pages/program/sessions.md b/pages/program/sessions.md new file mode 100644 index 0000000..f35bf7b --- /dev/null +++ b/pages/program/sessions.md @@ -0,0 +1,2254 @@ +--- +layout: page +title: Sessions +description: +menubar: program +permalink: program/sessions/ +menubar_toc: true +set_last_modified: true +--- + + +## Session 1A: RSE Pedagogy + +
+ +
+
+
+ +
+
+ +
+
+

As both hardware and software become more prevalent in research computing, the user base of + these systems has broadened considerably. Novice users from many backgrounds and at many stages + of their careers are looking to make effective use of these resources, while retaining focus on + their domain work.

+

Formal curricula for students of those domains may not have room for computational training. + Non-student researchers face challenges making time to acquire the relevant expertise. Moreover, + the parallelism of HPC systems adds complexity to an already demanding software development + task. Consequently, large subsets of researchers have access to HPC resources without the + technical skills to use them effectively.

+

These challenges are familiar to the Research Software Engineering community.

+

HPC Carpentry provides training solutions complementary to the research software engineering + role, supporting effective use of novel, shared computing resources.

+

HPC Carpentry workshops are modeled after those of The Carpentries and take place over one or + two days, providing a hands-on mode of instruction, where learners type along with instructors + to acquire the basic skills necessary to get started on HPC systems. Learners are not expected + to come away as experts, but instead with the "muscle memory" of how basic operations work on + HPC systems, with a mental model of the shared HPC system and its resources, and with enough + vocabulary to make self-directed training more accessible and effective.

+

This talk will describe the current state of the HPC Carpentry project, our strategic + development plan for the workshops, current challenges, and the lesson content that we develop, + teach, host, and cite.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

To tackle the problem of sustainably training and developing a workforce, SDSC has experimented + over the past decade with various strategies to shape a seasonal internship program that has met + and exceeded its original goal of research software developer workforce training. Using modern + agile frameworks, a novel summer training program, and minimal resources, SDSC has supported + over 200 interns over the past four years who have learned about and supported research software + development. Come hear the internship program founders, Ryan Nakashima and Jenny Nguyen, share + both unsuccessful and successful strategies used to build the SDSC software development + internship program and connect with them for follow-up discussions.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Student opportunities are important for diversifying RSE and getting students hooked on research + software engineering. Engaging with undergraduate and graduate students interested in scientific + computing is both rewarding and beneficial to the RSE community, given there is not yet a clear + academic curriculum nor career path to becoming a research software engineer. This talk will + cover the macro aspects of RSE internships through the lens of SIParCS, a successful, long + running internship program, and deep-dive into the micro aspects of working with student RSEs by + sharing experiences at DART, an open source project at the intersection of science and software. +

+

We’ll give a lightning overview of the 17 years of SIParCS, the Summer Internships in Parallel + Computational Science at the NSF National Center for Atmospheric Research, including background, + history and motivations. SIParCS provides opportunities for undergraduate and graduate students + to gain hands-on experience in computational science, particularly focusing on high-performance + computing, scientific computing, and data analysis. The program’s goal is to develop and + diversify the next generation of computational scientists and engineers by offering holistic + mentorship, professional development, and the chance to work on cutting-edge projects alongside + experienced researchers.

+

DART, the Data Assimilation Testbed, has been fortunate to have various interns through SIParCS + as well as part-time student RSE employees working year-round. We'll share our specific + experiences, challenges and triumphs, working with student RSEs: what worked, what didn’t, and + how summer internship RSEs differ from year-round part-time student RSEs. Everyone is different: + what motivates and incentivizes people varies from person to person and can change over time. + People’s time has value, and we want people spending that time on the most interesting and impactful + thing they can be working on. Working with students requires a balance between getting quality + work from them, and the students finding benefit in this work and progressing their career. + We'll conclude with thoughts on future student interactions and possible community + collaborations.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Research software engineers and research data curators face similar challenges in their efforts to + support truly reproducible science. This talk anticipates a future in which the research software + engineering and research data curation communities identify ways to align their respective efforts + in + promoting best practices. We present a project to develop specialized training for curating research + software as one such opportunity.

+

There is great interest on the part of both scientific communities and funding agencies to see that + science research is reproducible, and that the products of research, both data and code, are + “Findable, + Accessible, Interoperable, and Reusable” (FAIR) now and into the future. To enhance + reproducibility + and FAIRness, funding agencies typically require that grant applicants file so-called Data + Management + Plans (DMPs); journals increasingly require that authors deposit code and data in a certified + repository + and link those artifacts to their publications; and community and institutional repositories work to + ensure quality of deposits by employing curators to examine, approve, and sometimes validate and + even + improve deposited code and data. The curation step is critical in maintaining viable research data + lifecycles, and requires that curation workflows be implemented and that curation staff be funded + and + trained.

+

The Data Curation Network (DCN) is a membership organization of institutional and non-profit data + repositories whose vision is to advance open research by making data more ethical, reusable, and + understandable. Its mission is to empower researchers to publish high quality data in an ethical and + FAIR way, collaboratively advance the art and science of data curation by creating, adopting, and + openly + sharing best practices, and supporting thoughtful, innovative, and inclusive data curation training + and + professional development opportunities.

+

Last year the DCN, with project leadership from Duke University, obtained funding from the Institute + of + Museum and Library Services (IMLS) to create course curricula to train new curators in addressing + curation of data types that require specialized knowledge and often warrant specific types of + treatment + and analysis (IMLS Award no. RE-252343-OLS-22). These specialized data types include geospatial + data, + scientific images, simulations and models, and, last but not least, code. The presenters are members + of + the cohort that developed specialized training for curating code. In our talk, we will give an + overview + of the topics covered in the pilot workshop, which include introductory topics such as dependency + management, licensing, and documentation, as well as more advanced topics such as containerization + and + nondeterminism. We will also describe how these topics apply to research software developers wishing + to + create more reproducible and sustainable code.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

In comparison to traditional software engineers, research software engineers (RSEs) often come to + software engineering from scientific domains and may lack formal training. As the field continues to + develop, direct educational pathways and formal training are likely to expand. Questions about best + practices for training students and early career RSEs must be answered to ensure new RSEs are able + to + contribute high quality code. How should we train students with minimal experience to work on + real-world + projects? How do we bridge the gap between classroom learning and the expectations of writing + reproducible code? What assumptions can be made about what students can (and should) learn + themselves, + and what do they need to be explicitly taught? The University of Chicago’s Data Science Institute + has + been able to wrestle with these questions over the past three years by engaging with students via + its + experiential, project-based, Data Science Clinic course.

+

The Data Science Clinic is a useful setting for asking these questions and testing related + hypotheses. + The clinic works with 3 cohorts of students each year and typically has more than 50 students per + cohort. This provides a great environment for iterating on best practices. Preparing students who + are + interested in research software engineering careers is similar to training early-career RSEs who are + coming from backgrounds with limited computer science education. Students in the clinic come from + diverse backgrounds, with both master’s and undergraduate students well-represented, but most have + one + university computer science course. This level of formal computer science background is similar to + many + RSEs coming from non-computer science backgrounds. Additionally, code reproducibility and code + quality + are often lower priorities on both student projects and RSE projects.

+

The Data Science Clinic has led to the important conclusion that relying on assumptions about + student + background knowledge leads to negative outcomes. When using experiential learning or project-based + classes, it's easy to have a biased view of student understanding since only the most confident and + engaged students are likely to volunteer to participate. These advanced students can shoulder most + of + the load of a project and make quite a bit of progress, while allowing the students who would gain + the + most from direct instruction to coast undetected. In reality, many students lack robust mental + models of + computer operation, familiarity with essential terms and concepts, and an appreciation for software + engineering best practices. Overcoming these challenges requires significant investment from + experienced + mentors.

+

The purpose of this talk is to share these lessons and conclusions, discuss why these problems are + so + difficult, and to consider next steps.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Building on discussions first started at the German RSE conference in 2023 (de-RSE23), a recent + pre-print, Foundational Competencies and Responsibilities of a Research Software Engineer, + identifies a set of core competencies required for RSEng and describes possible pathways of + development and specialisation for RSEs. It is the first output of a group with broad interests + in the teaching and learning of RSEng skills.

+

With continuing growth in RSE communities around the world, and sustained global demand for + RSEng skills, US-RSE24 presents an opportunity to align international efforts towards

+

* training the next generation of RSEs + * providing high-quality professional development opportunities to those already following the + career path + * empowering RSE Leaders to further advocate for the Research Software Engineering needs and + expertise in their teams, institutions, and communities.

+

Therefore, we want to give an overview of what the group has been working on so far, discuss the + aims of our future work, and invite members of the international RSE community to contribute and + provide feedback. We particularly encourage members of regional groups focused on RSEng training + and skills to attend and share their perspectives. +

+
+
+
+ +
+ +--- + + +## Session 1B: AI, ML, and Automation + +
+
+
+
+ +
+
+ +
+
+

In the field of research software engineering, Large Language Models (LLMs) have emerged as + powerful tools for enhancing coding practices. This presentation, "Leveraging LLMs for Effective + Coding," delves into the practical applications of LLMs in automating and improving various + aspects of the development process. By providing some empirical evidence from our firsthand + experience using LLMs for software development, we explore how LLMs can significantly + augment a developer's toolkit, making what are often time-consuming tasks more efficient and + reliable.

+

Automated test generation, for instance, not only speeds up the testing process but also + ensures a more comprehensive coverage, leading to robust software products. Similarly, + leveraging LLMs for code review can preemptively identify potential issues, optimizing code + quality before it reaches human reviewers. Furthermore, the ability of LLMs to generate and + update documentation in tandem with code changes addresses one of the most common + challenges in software development, maintaining accurate and helpful documentation. + Beyond these key areas, the presentation also touches upon additional use cases where LLMs + can make a significant impact, including debugging assistance, code implementation, and code + refactoring, among others. We discuss effective ways for integrating LLMs into the development + workflow, emphasizing the importance of clear communication, context provision, and iterative + refinement. Ethical considerations, particularly in addressing potential biases and ensuring + responsible use, are also explored.

+

Join us as we navigate the practicalities of incorporating LLMs into coding practices, aiming to + inspire developers to harness these AI tools for more efficient, high-quality software + development. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

In the rapidly evolving field of Artificial Intelligence and Machine Learning (AI/ML), the journey + from innovative research to scalable, robust deployment is fraught with challenges. This + presentation delves into the critical lessons learned from our experiences in navigating this + complex transition, offering insights that are vital for researchers and practitioners alike. + The initial phase of any AI/ML project is marked by excitement and potential. However, we + quickly learned the importance of grounding this enthusiasm with practical considerations, + particularly the early and thorough definition of metrics and benchmarks. This foundational step, + often overlooked, became the cornerstone of our project's success, enabling us to evaluate research + solutions effectively and pivot our strategies as needed.

+

Another significant hurdle we encountered was the transition from the exploratory and often + chaotic environment of Jupyter notebooks to the structured realm of development-ready code. + The ability of our research engineers to write modular and reproducible Python code was + instrumental in bridging the gap between research findings and development, highlighting the + necessity of coding best practices in the research phase.

+

Deployment presented its own set of challenges, notably the infamous "It works on my + computer" syndrome. Our solution was a strategic embrace of Docker images, which not only + streamlined our deployment process but also ensured consistency and reliability across different + environments. This approach, coupled with a focus on cache optimization, significantly reduced + deployment headaches.

+

Perhaps the most profound lesson was the value of rapid prototyping. Moving swiftly from + concept to an end-to-end solution, even if imperfect, provided multiple benefits. It accelerated + our learning about the problem space, facilitated iterations based on real-world feedback, and + improved stakeholder engagement by providing a tangible product for demonstration. This + approach also forced us to make critical decisions about technology investments and workflow + design, laying a foundation for continuous improvement.

+

This talk aims to share these insights and more, exploring strategies that can help bridge the + often daunting gap between AI/ML research and impactful deployment. Join us to learn how to + navigate this transition effectively, ensuring that your projects are not only innovative but also + ready for the real world. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

The demand for efficient and innovative tools in research environments is ever-increasing in the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML). This paper explores the implementation of retrieval-augmented generation (RAG) to enhance the contextual accuracy and applicability of large language models (LLMs) to meet the diverse needs of researchers. By integrating RAG, we address various tasks such as synthesizing extensive questionnaire data, efficiently searching through document collections, and extracting detailed information from multiple sources. Our implementation leverages open-source libraries, a centralized repository of pre-trained models, and high-performance computing resources to provide researchers with robust, private, and scalable solutions.
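
As a rough illustration of the retrieval step in RAG, the sketch below ranks a few documents against a question and assembles a prompt from the best matches. It is a minimal sketch under stated assumptions, not the implementation described in this paper: the bag-of-words similarity stands in for a real embedding model, and `tokenize`, `retrieve`, `build_prompt`, and the sample `DOCS` are hypothetical names and data introduced only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts (assumed stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Prepend the retrieved passages to the question before calling an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample data, invented for illustration only.
DOCS = [
    "Survey respondents asked for more GPU hours on the cluster.",
    "The library catalog lists 1,200 digitized field notebooks.",
    "Respondents rated storage quotas as the second most common complaint.",
]

if __name__ == "__main__":
    print(build_prompt("What did survey respondents say about GPU hours?", DOCS))
```

In a production setting the similarity function would typically be replaced by vector search over model-generated embeddings, and the resulting prompt would be passed to a locally hosted LLM.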

+
+
+
+ +
+
+
+ +
+
+ +
+
+

In this work we aim to partially answer the question, “Just how many research software projects are out there?” by searching for open source GitHub projects affiliated with research universities in the United States. We explore this through keyword searches on GitHub itself and by scraping university websites for links to GitHub repositories. We then filter these results by using a large language model to classify GitHub repositories as research software engineering projects or not, finding over 35,000 RSE repositories. We report our results by university. We then analyze these repositories against metrics of popularity, such as stars and repository forks, and find just under 14,000 RSE repositories meet our minimum criteria for projects which have a community. Based on the time since a developer last pushed a change to an RSE repository with a community, we further posit that 3,300 RSE repositories with communities and a link to a research university are at risk of dying, and thus may benefit from sustainability support. Finally, across all RSE projects linked to a research university, we empirically find the top repository languages are Python, C++, and Jupyter Notebook.
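
The two-stage filter described above (a popularity threshold, then a staleness check on the last push) could be sketched as follows. This is only an illustration: the abstract does not state the actual criteria, so `MIN_STARS`, `MIN_FORKS`, `STALE_AFTER`, and the `Repo` record are all assumed values and names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed thresholds for illustration; the study's real criteria are not given here.
MIN_STARS = 5
MIN_FORKS = 2
STALE_AFTER = timedelta(days=365)


@dataclass
class Repo:
    """Hypothetical record holding the metadata used by the filter."""
    name: str
    stars: int
    forks: int
    pushed_at: datetime


def has_community(repo: Repo) -> bool:
    """A repository 'has a community' if it clears both popularity thresholds."""
    return repo.stars >= MIN_STARS and repo.forks >= MIN_FORKS


def at_risk(repo: Repo, now: datetime) -> bool:
    """A community repository is 'at risk' if nobody has pushed to it recently."""
    return has_community(repo) and (now - repo.pushed_at) > STALE_AFTER


if __name__ == "__main__":
    now = datetime(2024, 10, 1, tzinfo=timezone.utc)
    repos = [
        Repo("active-lab-tool", 40, 12, datetime(2024, 9, 1, tzinfo=timezone.utc)),
        Repo("dormant-solver", 25, 6, datetime(2021, 3, 15, tzinfo=timezone.utc)),
        Repo("personal-scripts", 0, 0, datetime(2020, 1, 1, tzinfo=timezone.utc)),
    ]
    for repo in repos:
        print(repo.name, has_community(repo), at_risk(repo, now))
```

Keeping the two predicates separate mirrors the abstract's reporting, where the community count and the at-risk count are given as distinct figures.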

+
+
+
+ +
+
+
+ +
+
+ +
+
+

AutoRA – the Automated Research Assistant – is a growing collection of Python packages for + running fully automated psychological experiments online. It allows the user to automate the + specification of experimental conditions, data collection, and theory derivation, cycling back + to specifying new experimental conditions.

+

One primary goal of the PI was to allow unaffiliated developers to contribute new methods for + generating experimental conditions, and new regression techniques for theory derivation. But + taking the naïve monorepo approach would be too costly. It would require either 1) testing all of the + contributions for every change to the code, which is expensive because some of the + contributions train neural networks as part of their execution and require hours to run; or 2) + configuring the CI so that only relevant parts of the code are tested for each + pull request, which would mean a high maintenance burden. Furthermore, since this work is about + applying ML to experiments run on people, it’s vital that every submission be ethically vetted + before it can be part of the official release.

+

Thus, one primary goal of our work was to allow for decentralized extensibility, so that + contributors unaffiliated with the core team could easily innovate and share new functionality + without leading to high centralized maintenance and testing costs. Another was to ensure that + contributions could be vetted and included easily.

+

We’ll present how RSEs helped to establish a common interface based around a simple functional + paradigm, with namespace packages spread across multiple repositories so that each contributor + could be owner of and responsible for their own work, and how their contributions are integrated + into the main package. We’ll also look at how we enable contributors to start their work quickly + using templates.

+
+
+
+ +
+ +--- + + +## Session 1C: Insights on Research Software Practices and Principles + +
+ +
+
+
+ +
+
+ +
+
+

At the National Center for Supercomputing Applications (NCSA), our team of research software UIX + (User + Interface and User Experience) designers is dedicated to enhancing academic research applications + through innovative design thinking and user-centric methodologies. Since our expansion in 2021, we + have + successfully collaborated on over 30 applications across diverse scientific domains, underscoring + the + growing demand for design as part of the research software development process.

+

Our presentation will delve into the principles of design and design thinking, highlighting the + distinction between UI and UX design (and why both are important), and describe our role as user + advocates. We will outline our comprehensive design process, which includes discovery, ideation, and + implementation phases, and the highly iterative and user-engaged form this process takes. We will + give + an overview of design workflow tasks such as user research, wireframing, rapid prototyping, + high-fidelity design production and usability testing, all facilitated by tools like Figma for + collaboration with stakeholders, streamlined handoff and communication with developers.

+

We will also showcase an example project to illustrate our project lifecycle, from initial + requirements + gathering to design audits and continuous process improvement. Our collaborative, cross-functional + teams, which integrate designers, developers, and research scientists, are pivotal in producing + high-quality, sustainable software. By prioritizing user experience, we ensure that our applications + not + only meet the technical needs of scientists and researchers but also provide delightful and + efficient + user interactions, fostering faster onboarding and greater adoption. Join us to explore how + thoughtful + design and interdisciplinary collaboration can lead to more effective and impactful research + software + solutions.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

This talk will look at some containers that are actively used in research computing. It will try to + examine how easy they are to dissect and understand as software engineering artifacts. The talk will + aim + to provoke everyone to think about what good practices and guidance the RSE community might put + forward + around the use and role of containers.

+

In many ways containers provide an elegant solution to ensuring reproducibility and portability of + codes. Each layer in a container has a unique hash that ensures the full stack of a container is + defined + unambiguously. Containers can carry with them a deep set of software dependencies that help simplify + the + challenge of making a portable code. Container repositories and publication services make containers + findable and easily cloned for shared use. These are all unambiguously valuable features. However, + it is + not uncommon to come across containers in active use that contain incredibly expansive layers, so + that + they in effect encompass entire operating system distributions. Often in such containers it is not + clear + what is key to an application and what is more of an expedient, included to enhance short-term + productivity.

+

In this talk I will dissect a few large containers and examine what structure is or isn't present + and + how their formulation sits with regard to traditional software engineering practices. In particular + the + talk will look through a lens that channels Edsger Dijkstra's motivation for promoting structured + programming. Dijkstra argued persuasively that programs should not only be functional but they + should + also strive to be comprehensible and digestible to "a slow-witted human being with a very small + head". + It is interesting to look at some containerized software through this lens. For the RSE community + especially some very effective and expedient practices in container publication and software + distribution can appear to be in tension with other software engineering design ideas around + modularity + and composability.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Since the term research software engineer (RSE) was coined over a decade ago, the field has enjoyed + rapid + growth with the establishment of RSE groups at labs and universities, professional societies, and + conferences and workshops. Today, RSEs worldwide make impactful contributions to science and + engineering + through excellence in software, but we believe the best is yet to come. RSEs represent an emerging + profession, one that continues to develop its identity, values, and practices (Sims 2022). There is + a + growing body of literature around who RSEs are and the future of the field, with many works written + by + RSEs themselves. Concurrently, it is also important to consider how RSEs relate to other professions + within research organizations. RSEs regularly interact with staff from a diverse range of + backgrounds, + including domain researchers and engineers, computing facility and IT professionals, data + scientists, + technical librarians, and managers and HR specialists. When we examine this organizational context, + we + are led to ask many important questions. How can non-RSE allies best support RSEs? How can we create + a + supportive ecosystem in which RSEs will thrive? How do we integrate RSEng with allied professions to + achieve mutual success? In this talk, we consider the case of RSEs and software engineering + researchers + (SERs). Both SE academics and practitioners have a common interest in improving the quality of + software + and its production (Stol and Fitzgerald 2018). While CSE software development has historically + received + little attention from mainstream software engineering, RSEs have been successful in building bridges + between the two worlds. We believe the SE research community should work more closely with RSEs and + serve their needs. 
Based on our experiences, which include three of the authors participating in a + recent Dagstuhl workshop on this topic, we discuss (1) SE-related needs that RSEs report having, (2) + what SE researchers can do to address those needs, and (3) how to foster productive relationships + between RSEs and SERs.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Research software is critical for scientific advancement and, like all software, is susceptible to + being targeted by malicious actors and misuse alike, meaning that security is an important + quality of research software. Implementing and evaluating security is a complex and + ever-evolving process. However, poor research software security could result in the sabotage of + data, hardware, or research findings. Proper security implementation requires security + knowledge and expertise that many research software stakeholders do not have, resulting in + more burden placed on the limited bandwidth of security resources and personnel. Therefore, it + is important to identify methods of improving research software security without + increasing demand for limited security resources.

+

To improve the security of research software, we propose introducing security concepts, such as + threat modeling, to RSEs so they can be involved in ongoing security efforts. At its root, threat + modeling is the process of creating a model of a system or piece of software that is used to + theorize both potential attacks and countermeasures to prevent them. Threat modeling is a + low-cost, effective way to supplement security efforts, improve security posture, and create + cleaner software architecture. While difficult to automate, threat modeling has a host of tools + available to make it easier to perform with less required security expertise compared to other + security activities. RSEs are prime candidates for threat modeling because of their expertise in + both the research domain and in software engineering.

+

To establish a baseline for how RSEs view security, we replicated a security culture survey + originally focused on open-source software. This survey contains questions along six + dimensions: Attitude, Behavior, Competency, Governance, Subjective Norms, and + Communication. In aggregate, these six dimensions describe the security culture of the RSE + community. In addition to measuring the current security culture, we exposed participants to + three vignettes depicting security events. In the summary of these vignettes, we explained how + threat modeling could have been used to prevent or diagnose malicious or accidental damages + before they occurred.

+

We recruited 96 US and German RSEs for the survey. Our initial results show a generally + positive security culture in the RSE community. Respondents perceived all cultural dimensions + positively, except for Governance, which represents security expertise, policies, and + implementation. The respondents also responded positively to threat modeling. They saw the + value of threat modeling and thought it would fit nicely into their existing development + processes. Respondents also indicated they would need additional training to effectively threat + model and were interested in receiving this training.

+

We are using the data from the survey and vignettes to create resources that educate RSEs on + threat modeling practices that can be incorporated into their existing development processes. + We will use this talk to 1) present our findings to the US-RSE community, 2) gather feedback on + the security resources we are developing for RSEs, and 3) promote dialogue about involving + RSEs in security efforts.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Johns Hopkins Applied Physics Laboratory (APL) is the U.S.’s largest university-affiliated + research center, home to over 9000 staff dedicated to making “Critical Contributions to Critical + Challenges” for our various federal agencies. APL’s Space Exploration Sector alone has designed, + built, and operated over 70 spacecraft missions; developed hundreds of specialized instruments + for yet more missions; and collectively visited every planet in the solar system. Within the + sector’s Space Science Research branch resides one of the largest RSE organizations we are + currently aware of: our very own Space Analysis & Applications group – a team of 60+ research + software engineers that directly support our missions and the scientific research enabled by + them. Our talk will explore the history, functions, and operation of this group as a means to + examine a mature RSE organization and to share our insights and experience with those US RSE + colleagues developing and managing their own. Individual topics covered will include + organizational structure, team composition, funding sources, work discovery, intake, and some + brief visuals or demonstrations of the group’s software products and the research we have + enabled.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Historically, US research software has predominantly been utilized within the country by domestic + researchers. However, recent years have seen a surge in international collaboration, with + research software playing a pivotal role. International users can represent a substantial user + base for some research software, and foreign engineers have huge potential to contribute + significantly to the US software community. As a result, it is crucial for RSEs and researchers + to recognize the importance of software internationalization and localization, and to acquire + the methodologies necessary for their effective implementation. This talk will offer guidance on + designing, developing, and testing internationalized research software, ensuring that it meets + the needs of a global audience in the future.

+

This talk will be structured from broad concepts to specific skills (i.e., Internationalization + → Localization → Translation) to present software design principles that can prepare + research software for global impact in the future.

+

The first section (~3 mins) will cover globalization/internationalization, focusing on the + product design perspective. This part will cover the concept of internationalization, some + regulations for product owners to keep in mind, product design principles, and potential costs, + with examples for demonstration. The audience will learn the importance of including + internationalization considerations at the proposal drafting stage, rather than leaving it as a + task during the development stage.

+

The second section (~4 mins) will transition to localization. The presenter will discuss the + meaning of localization and how the lack of localization can hinder the global promotion of US + research software. Examples will illustrate the steps of designing, developing, and testing + software localization. At the end of this section, the presenter will provide a checklist for + RSEs and researchers to refer to in future localization processes.

+

The third section (~4 mins) will focus on translation, a crucial component of localization. The + presenter will introduce software design and development principles for adding translation + capabilities, followed by a discussion of common translation tools that RSEs can use for popular + frontend and backend frameworks. The section will conclude with a focus on utilizing AI tools to + enhance translation quality.
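
One common way to add translation capability on a Python backend is the standard library's `gettext` module. The sketch below is not taken from the talk; it only illustrates the usual pattern of wrapping user-facing strings in a lookup function. The `load_translator` helper, the `app` domain, and the `locales` directory are assumptions made for this example, and with `fallback=True` the original string is returned whenever no compiled catalog is found.

```python
import gettext


def load_translator(language: str, domain: str = "app", localedir: str = "locales"):
    """Return a gettext lookup function for the requested language.

    The domain and directory names are hypothetical. With fallback=True,
    a missing catalog yields a NullTranslations object, which returns
    the untranslated message unchanged instead of raising an error.
    """
    translation = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return translation.gettext


if __name__ == "__main__":
    # By convention the lookup function is bound to the name "_".
    _ = load_translator("de")
    # Prints the German text if locales/de/LC_MESSAGES/app.mo exists,
    # otherwise the original English string.
    print(_("Run simulation"))
```

Wrapping strings this way from the start keeps the cost of later localization low, since translators only need to supply message catalogs rather than edit source code.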

+

Overall, this talk aims to inspire the research software community to rethink software from an + international perspective and empower them with the knowledge to promote U.S. research worldwide + in the future. +

+
+
+
+ +
+ +--- + + +## Session 2A: Software Sustainability and Legacy Code + +
+ +
+
+
+ +
+
+ +
+
+

Reading computer program code and documentation written by others is, we are told, one of the + best ways to learn the art of writing readable, intelligible and maintainable code and + documentation. This talk introduces the concept of software resurrection as a tool for learning + from program code and documentation that are remote in time (e.g. 20 years old) and space (e.g. + unfamiliar algorithms and tools). The software resurrection exercise requires a motivated + learner to compile and test a historical software release version of a well maintained and + widely adopted open source software on a modern hardware and software platform. The learner + develops fixes for the issues encountered during compilation and testing of the software on a + modern platform that could not have been foreseen at the time of its release. The exercise + concludes by writing a critique which provides an opportunity to critically reflect on the + experience of maintaining the historical software. An illustrative example of this exercise + pursued on a version of the SQLite database engine released 20 years ago shows that software + engineering principles (or, programming pearls) emerge during the reflective learning cycle of + the software resurrection exercise.

+

The concept of software resurrection is similar to the "Learning by doing" methodology which is + based on the experiential learning theory. Engaging with program code and documentation that are + remote in time or space helps learners actively explore the experience of software maintenance. + These experiences reveal the factors that contribute to readability, intelligibility and + maintainability of program code and documentation.

+

Prerequisites + This talk is aimed at students, researchers and professionals who develop, support and maintain + computer software. The talk includes an illustrative example based on software written in the + C programming language and therefore a basic understanding of the C programming language will be + useful. Since the concept of software resurrection applies in general to the field of software + engineering, the attendees will still be able to understand the key ideas even if they do not + have a background in the C programming language.

+

Expected Outcomes + The attendees will learn about a novel method for teaching and learning software engineering + principles by engaging with existing software code and documentation. The concepts described in + this talk will allow the attendees to view the impact of existing software development and + documentation practices from the perspective of a software maintainer.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

At Sandia National Laboratories, computational modeling and simulation is ubiquitous + across the labs’ diverse missions. Computational models (that is, digital + representations of physical systems and/or phenomena and their behaviors) are + regularly developed and provide empirical justification to critical mission decisions; this + spans workflows, scripts, and notebooks that drive simulations as well as the complex + software stacks underneath them. As the number and variety of models continues to + grow, however, our limited ability to maintain and govern them becomes a bottleneck to + further productivity improvements. They are created in a highly manual process, may be + created in duplicate, lost because of personnel changes, or deteriorate over time due + to ever-changing computing environments.

+

Researchers and engineering analysts often lack the time, resources, and/or skills to + build sustainable models and to make them discoverable. RSEs and allied professionals + can play an important role in encouraging the adoption of better practices, but to effect + enduring change, we must go even further: to realize a culture of sharing, collaboration, + and reusability around modeling, we need software and organizational infrastructures + that can support that culture.

+

For these reasons, we are building the Engineering Common Model Framework + (ECMF), a platform for computational model sustainment at Sandia. ECMF will enable + the automated evaluation of models over time and ensure that models created at + Sandia are discoverable and ready to be revisited, extended, and reused. We have + demonstrated the capability of virtually air-gapped automated execution in a + containerized environment and have a prototype, user-friendly frontend where users + can submit models, schedule model executions, and monitor model status. In our talk, + we will discuss our current and planned capabilities, review our lessons learned, and + discuss the role of RSEs in the present and future of software and data stewardship. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Fortran still occupies a significant fraction of the workloads at scientific computing centers, and + many + projects are still under active development by research software engineers (RSEs). In this talk I + will + describe how the National Energy Research Scientific Computing Center (NERSC) provides a holistic + support structure for our users, and especially RSEs, that take advantage of Fortran.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

In the realm of scientific software development, adherence to best practices is often advocated. However, implementing these can be challenging due to differing opinions. Certain aspects, such as software licenses and naming conventions, are typically left to the discretion of the development team. Our team has established a set of preferred practices, informed by, but not limited to, widely accepted best practices. These preferred practices are derived from our understanding of the specific contexts and user needs we cater to. To facilitate the dissemination of these practices among our team and foster standardization with collaborating domain scientists, we have created a project template for Python projects. This template serves as a platform for discussing the implementation of various decisions. This paper will succinctly delineate the components that constitute an effective project template and elucidate the advantages of consolidating preferred practices in such a manner.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

The sustainability of scientific software is crucial for advancing research. In the complex world of + scientific software development, it is essential to understand the diverse factors that influence + sustainability. From the health of the software community to the robustness of engineering + practices, each element plays a pivotal role in the long-term viability of a project. This talk, + presented by the Center for Open-Source Research Software Stewardship and Advancement + (CORSA), focuses on the diverse definitions of sustainability within the scientific software + community, its attributes, and the metrics used to measure and enhance it.

+

The Center for Open-Source Research Software Stewardship and Advancement (CORSA), a new community of + practice, aims to address the long-term sustainability of scientific and + research software by fostering collaboration among stakeholders, facilitating partnerships with + open-source foundations, and educating the community regarding approaches to the + stewardship and advancement of open-source software. CORSA is part of a larger initiative + funded by the U.S. Department of Energy (DOE) called the Next Generation Scientific Software + Technologies (NGSST) project, which includes stakeholders from a broad cross-section of the + scientific computing and research software community.

+

In this talk, we will provide a brief history of the NGSST project and the objectives of DOE to + create sustainability pathways for open-source scientific software. We will then discuss the key + issues that CORSA plans to address to facilitate scientific software's long-term stewardship. + These include the development of metrics and metric models that help projects assess and + understand their position in the landscape of sustainability efforts. The talk will draw on + information gathered from previous CORSA workshops and existing literature and research into + this topic, including the types of sustainability metrics identified as crucial by the community. In + particular, we will explore:

+

● Definitions of Sustainability: Understand the various ways the community defines + sustainability in the context of scientific software. + ● Attributes of Sustainability: Identify the key attributes that the community values, such + as community health, engineering practices, and funding stability. + ● Metrics for Measuring Sustainability: Discuss the different metrics and models that + help projects assess their sustainability, including how these metrics are developed and + applied. + ● Capturing and Using Metrics: Explore methods for capturing these metrics and + practical strategies for using them to improve sustainability.

+

Our goal is to create a community of practice where we can collaborate to curate, share, and + disseminate information and guidance that will strengthen and sustain the research and + scientific software community in the long term.

+
+
+
+ +
+ +--- + + +## Session 2B: Unique Stories in Research Software Experience + +
+ +
+
+
+ +
+
+ +
+
+

Leading a collaborative data science or research software engineering (RSE) team in an + academic environment can have many challenges including institutional infrastructure, funding, + and technical expertise. Even in the most challenging environment, however, leading such a + team with inclusive practices can be rewarding for the leader, the team members, and + collaborators. We describe nine leadership and management practices that are especially + relevant to the dynamics of such teams and an academic environment: ensuring people get + credit, making tacit knowledge explicit, establishing clear performance review processes, + championing career development, empowering team members to work autonomously, learning + from diverse experiences, supporting team members in navigating power dynamics, having + difficult conversations, and developing foundational management skills. Active engagement in + these areas will help those who lead data science or RSE groups – whether faculty or staff, + regardless of title – create and support inclusive teams. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

The University Corporation for Atmospheric Research (UCAR) Software Engineering Assembly (SEA) + was formed in 2005 to provide an informal meeting space, instructional content including + tutorial series and seminars, and an evolving compilation of best practices for those staff and + collaborators at the organization interested in software engineering. Over time, the SEA + membership grew, events were regularly conducted, and in 2012 a yearly conference was + established with the focus being scientific software engineering.

+

Communities of practice like the SEA benefit from motivated members actively cultivating the + organization and adding some formal structure and legitimacy. Unfortunately, staff turnover and + budget (and thus time) constraints led to a gradual atrophying of SEA activity. While our yearly + conference - eventually titled the Improving Scientific Software Conference - remained a robust + fixture throughout, other offerings tapered to a nadir during the COVID pandemic. Soon after, + the longtime chair of the SEA left the organization, and it appeared that the SEA might sunset + entirely.

+

As the SEA was shrinking, the US-RSE became a growing presence at National Labs. When a + new committee did eventually take over SEA governance, this presented an opportunity to align + our Assembly with the principles and best practices being developed by the research software + engineering community.

+

This talk will describe our Assembly in its current state, the changes that have been made to + modernize it thus far, and our goals for the future. Much of the focus has been and will be on + building + a community of practice through events like open discussions on best practices, but some of the + more mundane challenges will also be described - such as revitalizing our web presence and + ensuring collaboration instead of competition with peer groups within and outside of our + organization. We will also give a brief overview of our Improving Scientific Software + Conference, our efforts to modernize it (e.g., using Jupyter Notebooks for proceedings), and how + we use the Conference to drive interest in the SEA and vice versa. Finally, we will discuss some + lessons learned about sustaining a long-running interest group, and mention some of the things + we wish we had known at the start of this revitalization effort. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

In the spirit of this year's theme, we will present the past, present, and possible future of RSEs + at the National Center for Supercomputing Applications (NCSA), which was founded in 1986 as + one of the original five centers in NSF’s Supercomputing Centers program. While High + Performance Computing (HPC) was the center's initial emphasis, software was also a key part + of NCSA's work from the start, ebbing and flowing over time with a number of broad-reaching + applications, early insights into areas such as applied AI, and the need to support UIX within + research software. This led to the growth of RSEs at NCSA to a body of 50 or so RSEs today + supporting scores of projects across every scientific domain, identifying common needs, and + through that building larger, more sustainable software frameworks.

+

During the early years of the Center, the Software Development Group was formed and it + quickly began to produce a number of globally impactful software packages for the community + such as NCSA Telnet, Iperf, HDF (Hierarchical Data Format), Habanero and other tools. This + work continued and in 1993, NCSA released NCSA Mosaic, the first widespread graphical web + browser that directly led to Netscape, Internet Explorer, and Spyglass, and NCSA httpd, which + led to Apache httpd that in turn drove 90% of web servers at its peak. Though all were built + around enabling the use of supercomputers during the growth of the internet, they all also had + an enormous broader impact with the general public. During this period, NSF funded NCSA + (and likewise our sibling centers as part of the Supercomputing Centers Program) through a + "block grant" model that supported the majority of activity at NCSA; funding was ~$35M per + year. The funding model was a key to success since it allowed NCSA staff to more freely + explore ideas and thus we saw the significant contributions NCSA made. In 1997, that changed + as NSF shifted from the block grant model to funding efforts through a set of independently + awarded grants for specific work. This resulted in software developers being scattered across + smaller groups that supported less traditional users who did not need HPC, leaving the Center, + or supporting others' software on HPC resources after the Center took on a much stronger HPC + support and hardware focus across a chain of large NSF efforts such as TeraGrid, XSEDE, and + Blue Waters.

+

The subsequent evolution of RSEs at NCSA had very grassroots beginnings when a handful + of these small groups developing software decided to join forces: rather than competing with + each other in terms of collaborations, grants, and staff, they instead worked together, jointly + pursued funding, shared resources, and added greater security to all by having a larger portfolio + of collaborators and projects. The initial coalition was founded on a charter that prioritized trust + for improved efficiency. It emphasized respecting the PI’s role on projects and refraining from + interfering in another group’s project unless invited. The coalition also committed to supporting + each other if one group experienced a shortfall in projects. Over time, other groups also joined, + and through that software had a larger voice, enabling it to push for changes such as more + efficient hiring practices, support for green cards, standing up more flexible on-prem + cloud-based resources to support interactive web services and data sharing, adoption of the RSE title + as an official campus title, and a recognized career path. Today software exists as a top + level directorate within the NCSA organization. This talk will walk through the key changes + during the evolution of NCSA's RSE role in a manner that can be leveraged by other RSEs + starting new groups. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

In 2022 the Princeton Research Software Engineering group, in collaboration with Human Resources, + established a multi-level career path job family for Research Software Engineers (RSEs) at Princeton + University. Expanding on the existing "Research Software Engineer" and "Senior Research Software + Engineer" roles, the new job family creates a structured career ladder that includes roles for six + individual contributors (Associate RSE, RSE I, RSE II, Senior RSE, Lead RSE, Principal RSE), two + working + managers (Lead RSE, Principal RSE), and three leadership positions (Associate Director, Director, + and + Senior Director). This formally establishes guidelines for defining and differentiating between RSE + roles, enabling equitable hiring of RSEs at substantially different experience levels, and + establishing + promotional pathways for RSEs employed at Princeton University.

+

In this talk we will describe how the vision for the career path originated, the process behind + defining + the roles and grades within the career ladder, and how we bridged gaps in technical understanding + with + administrative partners who were unversed in the role of Research Software Engineers. By minimizing + technical jargon to 'standardize' job descriptions, roles could be defined with essential + requirements that allowed for proper compensation review and enabled the model to be effective for + broader use across campus departments. Finally, we will discuss important lessons learned from the + model + creation process through its implementation and use as we have successfully hired, reviewed, + promoted, + and retained RSEs at Princeton.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Research Software Engineering (RSE) covers a wide spectrum of people who fall somewhere in between + domain + research science and software engineering. While this makes the community highly inclusive, some + find it + difficult to know whether they qualify as an RSE and so hesitate to engage. In this talk I + will + share my personal journey from research in software engineering (SER) to RSE.

+

As someone who was never formally a software engineer in the classic sense but a researcher using + software engineering methods in domain science, I never felt like I had any particular identity. + Upon + first hearing the term “RSE”, I immediately identified with it. However, over the next two years of slowly + engaging with the community - including attending US-RSE’23 - I was still hesitant to see myself as + one, + as my journey and position looked different from most of those I was seeing. It wasn’t until + I attended + a recent Dagstuhl seminar that brought together SERs and RSEs that I was able to debate my + insecurities + first-hand and settle into my identity.

+

Throughout my experiences I have met a wide array of different types of RSEs, each with their + own + backgrounds, skill sets, job titles, daily practices, team composition, career priorities, and + challenges. Many of these types I have yet to see well represented or understood in the + community. + In my talk I will not only share my personal experiences, but also highlight several examples of + diverse + types of people who identify as RSEs in order to provide a broader representation to the community + and + reassure anyone on the edges, as I once needed, that they do belong. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

The Research Engineering Group at The Alan Turing Institute started our RSE journey 8 years ago as a + new + team at a new institute. Founded in 2016 and without the usual constraints and advantages of RSE + groups + based in universities, the team has had to find its own path in a rapidly evolving Institute and + field.

+

Over this time the Turing has grown and evolved considerably as a national institute, adding AI to + its + initial data science remit and refining its science and innovation agenda from a broad + programme-based + approach to a more focussed challenge-led one. As the Institute has grown and evolved over the + years, so + has the Research Engineering Group, growing from 4 to 45 and going through several iterations of how + we + structure ourselves and operate as a team in order to support the Institute's research.

+

As the team has evolved, we've expanded our range of research engineering roles to include those + more + focussed on data and computing, and we've built a sustainable career pathway for these roles within + the + Institute. Over the years we have refined our approach to recruitment, professional development and + career progression to attract a diverse range of talent and support them in their professional + journey, + with a clear pathway from our Junior level training role to our Principal level team leadership + role.

+

This journey has been guided by our principles: transparency in leadership and decisions; diversity + of + talent, people and experience; supporting individuals in their career journey; and focus on our role + as + expert collaborators. As we've progressed along this journey ourselves, we've also looked to support + others in doing so - both within the Turing as it has established other teams of related research + infrastructure professionals, and across the wider RSE community as other organisations have looked + to + establish or scale their own similar teams.

+

In this talk, we will share our journey and the lessons we've learned along the way. We hope that + our + story will be of interest both to those in leadership roles looking to establish or grow RSE teams + at + their own organisations and to team members within existing teams thinking about how they organise + themselves, their work and their culture as they grow and evolve as a team.

+
+
+
+ +
+ +--- + + +## Session 3: WetWare: Research Software in Chemical and Life Sciences + +
+ +
+
+
+ +
+
+ +
+
+

Lab notebooks are an integral part of science by documenting and tracking research progress in laboratories. However, existing electronic solutions have not properly leveraged the full extent of capabilities provided by a digital environment, resulting in most physics laboratory notebooks merely mimicking their physical counterparts on a computer. To address this situation, we report here preliminary work toward a novel electronic laboratory notebook, Lab Dragon, designed to empower researchers to create customized notebooks that optimize the benefits of digital technology. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

The integration of computational physical chemistry into undergraduate laboratories presents a unique opportunity for collaboration with the research software engineering field. To promote more efficient computational workflows and foster engagement among budding programmers in computational modeling, we present this notebook investigating small molecules with unpaired electrons (radicals). The CF3 radical has been extensively explored in the chemical literature owing to its importance in ozone depletion from CFCs (chlorofluorocarbons) and its unusual geometric structure, which deviates from the planar structure of the CH3 radical despite the similar size of the fluorine atom and the hydrogen atom. Exploring trends along chemical groups is commonplace in the chemical literature, and as such we have created a notebook demonstrating the facile preparation and analysis of a simple experiment substituting the F atoms in the CF3 radical for other halogens in the same group (Cl, Br, I) in a combinatorial fashion. From a single Excel sheet, input files for the quantum modeling software ORCA can be reproducibly generated. Upon completion of the requested calculations, the meaningful data is systematically extracted from the produced log files. This method contrasts with traditional practices in undergraduate labs, in which students manually construct input files and scroll through log files to copy/paste data, and demonstrates a more efficient and reproducible alternative. The notebook not only serves as an educational tool but also acquaints future research software engineers with the specialized software developed by computational chemists. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Domain research, particularly in the life sciences, has become increasingly complex due to the + diversity + of types and amounts of data concomitantly with the associated analytical methods and software. + Simultaneously, researchers must consider the trustworthiness of the software tools they use with + the + highest regard. As with any new physical laboratory technique, researchers should test and assess + any + software they use in the context of their planned research objectives.

+

As examples, bioinformatics software developers and contributors to community platforms that host a + variety of domain-specific tools, such as KBase (the DOE Systems Biology Knowledgebase) and Galaxy, + should design their tools with consideration for how users can assess and validate the correctness + of + their applications before opening their applications up to the community.

+

More attention should be placed on ensuring that computational tools offer robust platforms for + comparing experimental results and data across diverse studies. Many domain tools suffer from + inadequate + documentation, limited extensibility, and varying degrees of accuracy in data representation. This + lack + of standardization in biological research, in particular, diminishes the potential for + groundbreaking + insights and discoveries while also complicating domain scientists' ability to experiment, compare + findings, and confidently trust results across different studies.

+

Through several examples of tools in the biology domain, we demonstrate the issues that can arise in + these types of community-built domain-specific applications. Despite their open-source nature, we + note + issues related to transparency and accessibility resulting in unexpected behaviors that required + direct + engagement with developers to resolve. This experience underscores the importance of deeper openness + and + clarity in scientific software to ensure robustness and reliability in computational analyses.

+

Finally, we share several lessons learned that extend to research software in general and discuss + suggestions for the community.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Honeycomb is a template repository that standardizes best practices for building jsPsych-based + tasks. It offers continuous deployment for use in research settings, at home, and on the web. + The project's main aim is to improve the ability of psychiatry researchers to build, deploy, + maintain, reproduce, and share their own psychophysiological tasks (“behavioral experiments”).

+

Behavioral experiments are a useful tool for studying human behavior driven by mental processes + such as cognitive control, reward evaluation, and learning. Neural mechanisms during behavioral + tasks are often studied in the lab via simultaneous electrophysiological recordings. Uniquely + registered participants may be asked to concurrently complete the task at home where connecting + such specialized equipment is not feasible. Furthermore, online platforms such as Amazon + Mechanical Turk (MTurk) and Prolific enable deployment of tasks to large populations + simultaneously and at repeated intervals. Online distribution methods enable far more + participation than what labs can handle in a reasonable amount of time.

+

Honeycomb addresses the key challenge of using a single code base to deliver a task in each of + these environments. The benefits of Honeycomb were first seen in an ongoing study of deep brain + stimulation for obsessive compulsive disorder. Subsequent projects have included research on + decision making processes for people with obsessive compulsive disorder as well as gameplay + style differences between control, obsessive compulsive disorder, and + attention-deficit/hyperactivity disorder patients. The CCV additionally maintains a curated + public library, termed BeeHive, of ready-to-use tasks.

+

The project is open-source and directly supported by the Center for Computation and + Visualization at Brown University. It has been in active development since August of 2019 + (currently version 3.4) with version 4 and 5 releases roadmapped. An ultimate goal of the + project is to publish it as its own library to the node package manager (npm) registry. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Ecological Momentary Assessments (EMA) are often used in the field of Psychology to deliver + multiple data collection instruments to study participants at several time points in the day. As + these are often used to study participants' current (at the time of receiving the notification) + mood, activity or immediate company, it is important that they do not anticipate the + notification arrival at fixed times throughout the day. However, most traditional electronic data + capture (EDC) systems require participant notification schedules to be pre-determined with + little to no room for sending random events individualized to participants' environment (wake up + time, etc.). Given this problem, our team of RSEs has developed a cloud-first random EMA + notification system that can serve thousands of participants multiple random EMA push + notifications throughout the day. The system is capable of tracking user wake and sleep times + and adapting to weekend or weekday modes, and is configurable to work with different randomization logic + and anchors (points around which to randomize). During development the team prioritized the use + of proven architectural building blocks to maximize uptime, reduce cost and speed up + development. More importantly, the system was built to evolve hand in hand with the changing + requirements from research stakeholders. In this talk we will go over how we built this system + using Amazon Web Services Event Schedulers, low cost serverless components, lessons learned from + testing across various time zones, and compliance monitoring. We will look at the initial design + choices, their limitations and how they had to be adapted. Finally, we will go over how the + solution integrates with existing commercial EDC products such as Care Evolution’s MyDataHelps + offering. The solution is currently open sourced and can be adapted by RSE teams for their own + stakeholders’ studies.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Life sciences research is increasingly requiring researchers to do more difficult tasks, as datasets + are + becoming larger and more complex and new statistical methods are being advanced. Researchers are + consequently needing to create and use more research software tools to manage data and analyses. The + majority of researchers in life sciences are lagging behind on the computational skills that they + need + to stay on the cutting edge of modern research. Most are self-taught, as computational skills are + mostly + still not being taught in the curriculum. As these are challenging skills to teach oneself, the + process + is difficult and leads to gaps or inaccuracies in knowledge. Additionally, most researchers do not + have + the time to become software engineering experts while doing all of their other necessary tasks.

+

Our group at the University of Arizona is addressing this problem by helping life sciences + researchers + increase their computational skills through training and collaboration. We are a small group of data + scientists and research software engineers embedded in a division that includes departments for + agriculture, plant and animal sciences, and environmental science. We devote a substantial amount of + our + time teaching researchers in the division through a variety of programming. We develop curriculum + and + hold workshops, workshop series, learning groups, and lab-level trainings on a variety of + intermediate + topics on good software practices, programming libraries, and version control. We also teach a lot + of + people one-on-one. We approach teaching in an inclusive and accessible way, and hold the philosophy + that + almost everyone is capable of learning these skills. We build community among our research + community, + connecting folks who are isolated in their labs or departments. We have also discussed with many + people + what paths there are to move into research software engineering as a career.

+

The second part of our group's approach is to have devoted practitioners collaborate with life + sciences + researchers. We have advanced skillsets that researchers cannot have themselves because we devote + our + focus to learning skills and new tools to develop software, advance data management, and improve + reproducibility. By teaching researchers when possible and doing the necessary when they cannot, we + enable research to be done that could not be otherwise. We are able to do this because we only help + with + the research of others and do not have our own research program. Everyone in our group also has + domain + expertise in life sciences fields, and so is more familiar with those fields' challenges, data + types, + and language. We also have strong communication skills that are needed for excellent collaborative + work. + Lastly, our collaboration success comes from being a small and flexible group embedded in the domain + unit.

+

There are some challenges to how our group is helping improve scientific software use and creation + in + life sciences. Our approach is slow and not very scalable because we are working with individuals or + small groups. It can also be difficult for us to track our impact, even with gathering a diverse set + of + data and information about how we are helping others. Nevertheless, we are making a substantial difference in the + research that our institution's life sciences researchers are able to do.

+
+
+
+ +
+ +--- + + +## Session 4: Case Studies in Research Software + +
+ +
+
+
+ +
+
+ +
+
+

Science gateways have emerged as a popular and powerful interface to computational resources for + researchers. Most if not all of these science gateways now rely on container technology to + improve portability and scalability while simplifying maintenance. However, this can lead to + problems where the container image size can grow as more domain-specific packages and libraries + are needed for the tools deployed on these containers. This is particularly relevant in the case + of JupyterHub-based gateways, where the Python virtual environments or Conda environments + underlying the Jupyter kernels can often grow in size and number.

+

For example, a JupyterHub gateway that I work on as part of the NSF-funded I-GUIDE institute + required the installation of a large number of geospatial libraries, leading to the Jupyter + notebook container images approaching several gigabytes in size. To combat this, our team + decided to integrate the CernVM File System (also known as CVMFS) with Kubernetes, which acts as + a software distribution service and can provide software packages to the containers from a + separate server.

+

As a first step in this integration, we had to deploy our own CVMFS server on a separate virtual + machine and load it with the packages that were needed for distribution. CVMFS itself has two + main servers, which are the stratum 0 and the stratum 1. The stratum 0 is the main server for + configuration and packages, while the stratum 1 acts as a mirror of the stratum 0. Following the + deployment of the stratum 0 server, we were able to install the necessary Conda environments and + modules. The I-GUIDE JupyterHub platform is deployed on a Kubernetes cluster using the + Zero2JupyterHub recipes. In order to integrate the CVMFS server with this Kubernetes cluster, we + installed a CSI (container storage interface) driver provided by the developers of CVMFS to + connect the stratum server to the Kubernetes cluster. This then enabled us to create the + necessary storage class and persistent volumes in Kubernetes that could then be mounted into the + Jupyter notebook containers to serve the necessary Conda environments. Ultimately, this resulted + in containers having their sizes reduced significantly, from multiple gigabytes to only half of + a single gigabyte!

+

In conclusion, science gateways are incredibly impactful for researchers, but can take more + effort to maintain than most realize. Containers and virtual machines make this easier, yet can + contribute to their own issues by becoming bloated over time. These size issues can be resolved + with CVMFS, making the container sizes around three to four times smaller compared to before.

+

In this talk I will be presenting our deployment design as well as our experience through this + deployment process and lessons learned.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Citing software in research is crucially important for many reasons, from reproducibility, to + bolstering + the career of the research software engineers who worked on the code, to understanding the + provenance of + ideas. With the inclusion of DOIs on Zenodo, CRAN, and through integrations with GitHub, it is + easier + than ever to cite software as a first order research object. However, there are no standards on what + software should be cited in a paper, and authors often fail to cite software, or only cite + well-known, + charismatic, user-facing packages. There are few attempts at citing dependencies, in particular. +

+

Here, we took citing software as an ethical research goal to its logical but unfeasible conclusion, + citing all dependencies for software used in a research package, not only the top-level package + itself. + We present the open-source tool we used for finding DOIs and citation.cff files within dependencies, + and + talk about the implications of large numbers of citations within a paper that uses research software. + In + particular, we encourage the adoption of software bills of materials (SBOMs) for citing software, + especially research software.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Combining prose, code, and outputs in a single artifact, computational notebooks are + exceptionally valuable instruments in any context where 'research' and 'software' intersect. + However, the same features that make notebooks such effective tools also result in unique + issues that need to be addressed to ensure they can fulfill their full potential for the wider + community of software and research practitioners. One of the biggest challenges with + computational notebooks is ensuring that a notebook can be run by people other than its + author(s), on different computational environments, and/or at different times in the future after its creation, + an ability often known as computational reproducibility. While this is a general problem affecting + any context where notebooks (or indeed, any computational artifact) are used, these concerns + also represented a concrete issue for the computational notebooks submission track at the + US-RSE conference, affecting both authors and reviewers alike.

+

If reviewers are not able to run notebooks for the submissions they're reviewing, they'll likely be + unable to evaluate the submission based on its full intended functionality; or, they might try to + fix + the issues preventing the notebook from being run (missing dependencies, incompatible + versions, etc), which results in extra work, frustration, and/or less consistency across multiple + reviewers. Even when authors try their best to provide resources for reproducing a valid + computational environment in which their submission can be run (such as documentation, + packaging/environment metadata, etc), the lack of an automated way to test and a documented + standard for the computational environment that will be used limits their ability to validate their + resources (and, therefore, to estimate how likely it is that their notebooks will run as expected + during review) before finalizing their submission. As the program subcommittee responsible for + notebooks at US-RSE’24, a vital part of our role has been to streamline the submission and + review process to enable both authors and reviewers to concentrate on their respective duties. + Additionally, given the added technical complications unique to notebooks, any solution that + required unsustainable amounts of extra work on our side would also not be feasible to adopt. + This talk will provide an overview of the workflow we developed for US-RSE’24 to help deal + with these issues, as well as lessons learned on what worked well and what didn’t. Built using + open-source and/or freely available tools such as repo2docker, GitHub Actions, and + Binder, the infrastructure provides a set of automated checks that authors can enable to + test the repository before submission, based on the same standardized tools, specifications, + and computational environment available to reviewers.

+

Beyond the specific context of this year’s conference, we structured this talk to be relevant and + appealing for a broad audience of RSEs, especially (but not limited to) those interested in + computational notebooks, Continuous Integration and Development (CI/CD), and the challenges + and tradeoffs associated with designing workflows to be usable at all levels of prior experience. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Small to medium-sized research projects require increasingly sophisticated software stacks as + demand + continues to grow for high performance computing (HPC) resources and Kubernetes clusters for + web-based applications. Frequently these smaller projects do not have funding for dedicated DevOps + engineers and require their RSEs to take on that role. The effort + required + to manually provision each layer of this stack, from cluster node operating system configuration to + application deployment, will become infeasible without force-multiplying innovations, especially given the scarcity of RSEs. + Often these tasks are done early in a project and need to be re-learned + for + the next project. Additionally, work that would otherwise draw on a DevOps engineer's expertise, such as securing these + systems + and upgrading them during the project, falls on the RSE, further reducing the already scarce time available to + develop + the application.

+

We present the approach developed at NCSA to address this problem: a GitOps-based method of + bootstrapping virtual computing resources and Kubernetes clusters for composable deployment of + collaborative tools and services. Leveraging industry-standard software solutions we provide a free + and + open source foundation upon which open science can flourish, with an emphasis on decentralized + applications and protocols where possible. Leveraging this infrastructure, we can add new layers on + this + called DecentCI, allowing an RSE to quickly get a complex system up and running, allowing for shared + access to data, sharing ideas in forums, private messaging, websites, etc.

+

Building on the knowledge gained from many projects, we have created a set of recipes that allow a + new + project to be up and running in under 30 minutes. For example, in the case of Kubernetes, nodes are + created and configured, and clusters are initialized with ingress controllers, secret + management, + storage classes, etc. (all of this is configurable on a per-cluster basis). Deployed clusters can + easily be upgraded by applying newer centrally managed modules. New functionality + added centrally can be rolled out to the clusters over time.

+

During this talk we will discuss which tools are centrally managed and which tools are + installed in each cluster. We will describe how an RSE can add their applications to the system and + use + well-understood Git workflows to deploy new applications and work with other RSEs on the project. + The + end goal is a system that is decentralized and empowers the RSE to get new applications to the + scientists faster and more securely to help with their research.

+
+
+
+ +
+
+
+ +
+
+
+
+

Community resilience research is essential for anticipating, preventing, and mitigating the impacts of natural and anthropogenic disasters. To support this research, the Center for Risk-Based Community Resilience Planning, funded by the National Institute of Standards and Technology (NIST), developed the measurement science and metrics that can help communities plan for, adapt to, and recover from disasters. This measurement science is implemented on an open-source platform called the Interdependent Networked Community Resilience Modeling Environment (IN-CORE). On IN-CORE, users can run scientific analyses that model the impact of natural hazards and community resilience against these impacts.

+ +

This Jupyter Notebook uses the Joplin, MO community and the historical 2011 EF-5 Tornado event as an example of how to use IN-CORE to analyze community resilience. The city of Joplin, Missouri, USA, was hit by an EF-5 tornado on May 22, 2011 (NIST Report). Note that IN-CORE supports various hazards including earthquake, tornado, tsunami, flood, and hurricane.

+ +

The notebook contains the following analyses: structural damage analysis on buildings, electric power network damage, building functionality, economic impact analysis on the community’s economy, population dislocation analysis, housing household recovery analysis, and retrofit analysis on buildings. In addition, the notebook demonstrates the visualization of outputs from these analyses.

+ +

Lastly, the core logic of this notebook is used to power the IN-CORE Community Resilience Playbook, an interactive guide for community resilience planning. It has been used in workshops with the city planners and government officials, making it a valuable resource for resilience planning.

+
+
+
+ +
+ +--- + + +## Session 5: RSE in Action! + +
+ +
+
+
+ +
+
+ +
+
+

Many academics feel comfortable wrangling and analyzing data in R, but have little to no + experience working on the command line and may find job scheduling systems like SLURM + intimidating. This can be a significant barrier to using high performance computing, which + generally requires creating Bash scripts and submitting jobs via the command line. The {targets} + R package provides many benefits to researchers, one of which is running steps of an analysis + automatically as job requests on an HPC system, all from the comfort of R.

+

The {targets} package allows for workflow management of analysis pipelines in R where + dependencies among steps are automatically detected. When a {targets} pipeline is modified and + re-run, any steps (called “targets”) that do not need to be rerun are automatically skipped, + saving compute time. By default, {targets} pipelines are launched in a “clean” R session, which + enforces reproducibility (a blessing to some and a curse to others). It is relatively trivial to + parallelize a {targets} workflow so that independent targets are run on parallel workers either + locally as multiple R sessions, or using HPC or cloud computing resources. Users can define a + controller that runs their pipeline on the HPC using multiple workers running as separate SLURM + jobs (or PBS, SGE, etc.). It is also possible to define multiple controllers with different + resources for different targets so that tasks with heavier computational needs are run with more + CPUs, for example. All of this happens from the comfort of R without users needing to manually + create multiple R scripts and/or multiple SLURM submission scripts for each task.
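The skipping behavior described above can be illustrated with a small sketch. This is a Python analogy only, not the {targets} API (which is an R package): the `fingerprint` and `run_pipeline` names, the explicit dependency lists, and the use of a code string as the change signal are all assumptions made for illustration. {targets} itself detects dependencies automatically and also tracks stored values.

```python
import hashlib
import json


def fingerprint(code, upstream_hashes):
    """Hash a step's code together with the hashes of the steps it depends on."""
    payload = json.dumps({"code": code, "upstream": upstream_hashes}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(steps, cache):
    """Run only the steps whose fingerprint changed since the last run.

    steps: dict mapping name -> (code_string, function, [dependency names]),
           listed in dependency order (hypothetical structure).
    cache: dict mapping name -> {"hash": ..., "value": ...}, reused between runs.
    Returns the names of the steps that were actually executed.
    """
    executed = []
    hashes = {}
    for name, (code, func, deps) in steps.items():
        current = fingerprint(code, [hashes[d] for d in deps])
        hashes[name] = current
        entry = cache.get(name)
        if entry is not None and entry["hash"] == current:
            continue  # up to date: skip this step
        inputs = [cache[d]["value"] for d in deps]
        cache[name] = {"hash": current, "value": func(*inputs)}
        executed.append(name)
    return executed


# Hypothetical three-step pipeline: "total" depends on "raw"; "label" is independent.
pipeline = {
    "raw": ("v1", lambda: [1, 2, 3], []),
    "total": ("v1", lambda raw: sum(raw), ["raw"]),
    "label": ("v1", lambda: "demo", []),
}
store = {}
print(run_pipeline(pipeline, store))  # first run executes every step
print(run_pipeline(pipeline, store))  # nothing changed, so nothing is rerun
```

Changing the code of `raw` in this sketch would invalidate `raw` and `total` but leave `label` untouched, which is the time-saving property the abstract describes.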

+

RSEs and HPC professionals can help enable this powerful combination of technologies in a few + ways. At the University of Arizona, our group has created a template GitHub repository for a + {targets} project that can run on the UA HPC either through the command line, where targets are + run as SLURM requests, or with Open OnDemand, where targets are run on multiple R sessions. It + includes code for a controller function that works with the UA HPC and documentation about how + to modify the template and get it onto the HPC with git clone. We have previously run {targets} + workshops that help researchers refactor their analysis scripts into {targets} pipelines. We + hope to work with HPC professionals to increase awareness of {targets} as an option for R users + to harness the power of cluster computing. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Creating population estimates for the entire globe using machine learning is a challenging task. + One challenge is gathering and combining vast amounts of global GIS data at high resolutions. + Another challenge is processing the amount of complex GIS data required to make population + estimation possible in a reasonable amount of time. Speed is important in research because of + the need to iterate and evaluate the data for validity and accuracy. In this work, we present + the challenges of taking research code from a Jupyter notebook and creating a cloud-optimized + solution using infrastructure as code (IaC) to deploy a cluster in OpenStack. We show the code + modifications made to improve performance, comparisons of running machine learning on + multithreaded CPUs versus GPUs, and the architecture design for running a global dataset on a + Kubernetes cluster. +

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Have you ever asked, "did this output use the right version of the inputs and code?", "what software + does + this program require to execute again?", "how can I convert this pile-of-scripts to a containerized + workflow?", "how does the ancestry of these two outputs differ?", "who is using my software + library?", + or a similar question? All these are example questions that computational provenance can help + answer. + Computational provenance is the process by which a certain computational artifact was generated, + including its inputs (e.g., data, libraries, executables) and the computational provenance of those + inputs, for example, the figure on the right.

+

How can computational provenance be collected? We could ask the application developers to emit this kind + of + data. That approach requires a herculean effort to get all applications to comply. We could require + the + user to use workflow systems that explicitly declare the inputs and outputs of every node. This + approach + shifts the compliance burden onto the user. If the user misspecifies the workflow, it may still + execute + but the provenance would be wrong. The "holy grail" would be to collect provenance data at the + system level without modifying any application code, needing superuser privileges, or harming + performance.

+

Almost all prior attempts at unprivileged system-level provenance collection used the ptrace system call, + which + asks the Linux kernel to switch to the tracing process every time the tracee process executes a + system + call (this is how the strace tool works). Ptrace-based tracers meet most of the technical requirements + but + are prohibitively slow. Our recently accepted work (Grayson et al. 2024) observed a geometric mean + slowdown of + 1.5x for CARE (Janin et al. 2014), 2.5x for RR (O'Callahan et al. 2017), and 3x for + ReproZip (Chirigati et al. 2016) relative to the untraced runtime. Ptrace involves a context switch from + the + tracee process to the tracer process and back on every system call, of which there could be thousands + per + second. Each context switch causes scheduler overhead and clears caches (especially the translation + lookaside buffer).

+

We propose PROBE (Provenance for Replay OBservation Engine), a tool that collects system-level + provenance using library interpositioning (Curry 1994), also known as the LD_PRELOAD trick. Library + interpositioning happens all within the same virtual address space, which does not involve any + additional context switching. We have a working research prototype operating this way.

+

Another possible reason system-level provenance hasn't caught on is the lack of downstream tooling. + We + are developing several consumers of provenance including a graphical viewer, an automatic + containerization tool, an environment "diff", and a software citation generator. Furthermore, we + export + our provenance to Process Run RO Crate format (Leo et al. 2023), so that it can be interoperable + with + other provenance consumers.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Most research projects today involve the development of some research software. Therefore, it has + become + more important than ever to make research software reusable to enhance transparency, prevent + duplicate + efforts, and ultimately increase the pace of discoveries. The Findable, Accessible, Interoperable, + and + Reusable (FAIR) Principles for Research Software (or FAIR4RS Principles) provide a general framework + for + achieving that.1 Just like the original FAIR Principles2, the FAIR4RS Principles are designed to + be + aspirational and do not provide actionable instructions. To make their implementation easy, we + established the FAIR Biomedical Research Software (FAIR-BioRS) guidelines, which are minimal, + actionable, and step-by-step guidelines that biomedical researchers can follow to make their + research + software compliant with the FAIR4RS Principles.3,4 While they are designed to be easy to follow, we + learned that the FAIR-BioRS guidelines can still be time-consuming to implement, especially for + researchers without formal software development training. They are also prone to user errors as they + require several actions with each new version release of the software.

+

To address this challenge, we are developing codefair, a free and open-source GitHub app that acts + as a + personal assistant for making research software FAIR in line with the FAIR-BioRS guidelines.5,6 The + objective of codefair is to minimize developers’ time and effort in making their software FAIR so + they + can focus on the primary goals of their software. To use codefair, developers only need to install + it + from the GitHub marketplace. By leveraging tools such as Probot and GitHub API, codefair monitors + activities on the software repository and communicates with the developers via a GitHub issue + “dashboard” that lists issues related to FAIR-compliance (updated with each new commit). For each + issue, + there is a link that takes the developer to the codefair user interface (built with Nuxt, Naive UI + and + Tailwind) where they can better understand the issue, address it through an intuitive interface, and + submit a pull request automatically with necessary changes to address the related issue. Currently, + codefair is in the early stages of development and helps with including essential metadata elements + such + as a license file, a CITATION.cff metadata file, and a codemeta.json metadata file. Additional + features + are being added to provide support for complying with language-specific standards and best coding + practices, archiving on Zenodo and Software Heritage, registering on bio.tools, and much more to + cover + all the requirements for making software FAIR.
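As a rough illustration of the kind of check described above, the following sketch looks for the three metadata files named in the abstract in a local checkout. It is not codefair's implementation (codefair is a GitHub app built on Probot and the GitHub API); the `missing_metadata` function, the `REQUIRED` table, and the accepted license file names are assumptions made for this example.

```python
from pathlib import Path

# Hypothetical mapping from a metadata element to the file names that satisfy it.
REQUIRED = {
    "license": ("LICENSE", "LICENSE.md", "LICENSE.txt"),
    "citation": ("CITATION.cff",),
    "codemeta": ("codemeta.json",),
}


def missing_metadata(repo_root):
    """Return the labels of metadata elements not found at the repository root."""
    root = Path(repo_root)
    missing = []
    for label, candidates in REQUIRED.items():
        # An element counts as present if any of its candidate files exists.
        if not any((root / name).is_file() for name in candidates):
            missing.append(label)
    return missing


if __name__ == "__main__":
    # Report on the current directory; an empty list means all three are present.
    print(missing_metadata("."))
```

A check like this only confirms that the files exist; validating their contents (for example, that `CITATION.cff` is well-formed) would be a separate step.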

+

In this talk, we will highlight the current features of codefair, discuss upcoming features, and explain + how + the community can benefit from it and contribute to it. We believe codefair is an essential + and + impactful tool for enabling software curation at scale and turning FAIR software into reality. The + application of codefair is not limited to just making biomedical research software FAIR as it can be + extended to other fields and also provide support for software management aspects outside of the + FAIR + Principles, such as software quality and security. We believe this work is fully aligned with the + US-RSE’24 topic of “Software engineering approaches supporting research”. The conference + participants + will benefit greatly from this talk as they will learn about a tool that can enhance their software + development practices. We will similarly benefit as we are looking for community awareness and + contribution in the development of codefair, which is not currently supported through any funding + but is + the result of the authors' aim to reduce the burden of making software FAIR on fellow developers.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

The Hawaiʻi Climate Data Portal (HCDP) (available at https://www.hawaii.edu/climate-data-portal/data-portal/) provides access to 30+ years of climatological data collected from sensor stations around the state of Hawaiʻi and gridded data products derived from these values. The HCDP is a publicly available web application and is backed by an API that is accessible to researchers on request. This notebook demonstrates some of the abilities and usage of the HCDP API and the data provided by it. The notebook will show the user how to retrieve and map sensor station metadata and values, retrieve gridded data products, produce timeseries of station and gridded data, and generate data packages for large amounts of data that can be downloaded directly or sent to the user’s email. +

+
+
+
+ +
+ +--- + + +## Session 6: Reproducible Software Ecosystems + +
+ +
+
+
+ +
+
+ +
+
+

Science gateways provide an easy-to-use computational platform for research and educational purposes, + abstracting underlying infrastructure complexities while promoting an intuitive interface. In the + last + 15 years, quite a few mature science gateway frameworks and Application Programming Interfaces + (APIs) + have been developed fostering distinct communities and strengths that meet a diverse set of needs. + Examples such as HUBzero, Tapis, Galaxy, and OneSciencePlace are well-sustained science gateway + frameworks that create production quality gateways that facilitate collaborative workspaces. These + gateways enhance the research process by democratizing access to computational resources and + supporting + users in their exploration of research. Researchers benefit from streamlined access to various + resources, such as high-performance computing (HPC) systems, data repositories, and specialized + software + tools. The shared workspaces enable collaborative projects, facilitating communication and + cooperation + across different disciplines. Interdisciplinary collaboration is crucial to addressing many grand + scientific challenges such as climate modeling, genomics, or materials sciences. The standardized + environments these gateways provide promote data sharing and set the stage for the reproducibility + of + computational experiments, a cornerstone in science. For research software engineers, engagement + with + science gateways offers numerous advantages. These frameworks provide standardized interfaces and + mechanisms to interact with software libraries and tools, streamlining the development process and + ensuring compatibility. This reduces development time and complexity, allowing engineers to focus on + each community's unique requirements without dealing with low-level technical details. Automated + deployment features supported by many gateways further ease the process. 
Beyond the technical + benefits + above, engaging within a science gateway framework also means engaging with a larger community of + developers and users. This collaborative environment leads to shared knowledge, rapid issue + resolution, + and the opportunity to participate in joint development efforts. Ongoing feedback from + researchers using the tools enables continuous improvement, ensuring the software evolves to meet + changing user needs. From a professional development perspective, active participation in science + gateway frameworks exposes engineers to cutting-edge computational methodologies, cloud computing + principles, and big data techniques. This both enhances their skills and keeps them up-to-date with + the + latest technological advancements. Furthermore, experience with science gateways and the relevant + tech + stacks can open up career opportunities in academia and industry, given the growing + demand + for expertise in these areas. In summary, science gateway frameworks play a pivotal role in + accelerating + scientific research, providing enhanced accessibility and collaboration. For research software + engineers, these frameworks offer a rich environment for skill development, collaboration, and + career + advancement. As scientific research increasingly relies on collaborative, data-intensive approaches, + the + role of science gateways will continue to expand in the research ecosystem.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Computational notebooks provide a dynamic and interactive approach to scientific communication and + significantly enhance the reproducibility of scientific findings by seamlessly integrating code, + data, + images, and narrative text. While notebooks are increasing in popularity among researchers, the + traditional academic publishing paradigm often requires authors to extract elements from their + notebooks + into another format, losing the interactive and integrative benefits of the notebook format.

+

In response to this evolving landscape, the Software Engineering Assembly (SEA) Improving Scientific + Software Conference (ISS) has built a framework for paper submissions that accommodates Jupyter + Notebooks and Markdown files. This approach is designed to enhance the transparency and + accessibility of + research, enabling authors to submit and share their work in a more dynamic and interactive format. +

+

In this presentation, we will talk about this deployed framework and how it can be easily adopted + for + future conferences and journals. This framework is built on top of open-source tools such as Jupyter + Notebooks, JupyterBook, and Binder. In this framework, we utilized GitHub Workflows for the + automated + build and deployment of submissions into a cohesive JupyterBook format. The presentation will cover + the + challenges and solutions encountered in implementing this framework, aiming for its application in + future conferences. Additionally, we will share insights and experiences from developing and + deploying + this ecosystem, emphasizing how it can fundamentally change the way research is published, shared, + and + assessed within the open science and reproducibility paradigm.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Researchers support reproducibility and rigorous science by sharing and reviewing research + artifacts: the + documentation and code necessary to replicate a computational study (Hermann et al., 2020; + Papadopoulos + et al., 2021). Creating and sharing quality research artifacts and conducting reviews for conferences + and + journals are considered to be time-consuming and poorly rewarded activities (Balenson et al., 2021; + Collberg & Proebsting, 2016; Levin & Leonelli, 2017). To simplify these scholarly tasks, we studied + the + work of artifact evaluation for a recent ACM conference. We analyzed artifact READMEs and reviewers’ + comments, reviews, and responses to surveys. Through this work, we recognized common issues + reviewers + faced and the features of high-quality artifacts. To lessen the time and difficulty of artifact + creation + and evaluation, we suggest ways to improve artifact documentation and identify opportunities for + research infrastructure to better meet authors' and reviewers' needs. By applying the knowledge + gleaned + through our study, we hope to improve the usability of research infrastructure and, consequently, + the + reproducibility of research artifacts.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

The Department of Energy Systems Biology Knowledgebase (KBase) is a community-driven research + platform + designed for discovery, knowledge creation, and sharing. KBase is a free, open access platform that + integrates data and analysis tools into reproducible Narratives, provides access to scalable DOE + computing infrastructure, and enables collaborative research and publishing of findable, accessible, + interoperable, and reusable (FAIR) analyses.

+

The KBase Narrative - the primary user interface for the KBase platform - is built on top of the + Jupyter + Notebook application. This interface enables platform users to access an array of wrapped tools + (apps) + that are used throughout the computational biology community, many of which produce their own data + visualizations, which are also made available in Narratives. Within a Narrative, users can perform + analyses, display interactive results, and record interpretations. In contrast to analysis workflows + commonly used in bioinformatics, where researchers will run individual tools (potentially hosted in + different locations) sequentially, KBase allows users to build custom analytical pathways with + notebooks, where all the tools and data are contained in a single place to enable reproducibility of + analysis and provide data provenance.

+

The platform is built around creating a welcoming user experience for users with a broad range of + biology, bioinformatics, and computational expertise. To accomplish this, the KBase Narrative uses a + GUI + to generate code that runs analysis tools on DOE computing infrastructure. The output of these apps + can + be made more comprehensible for users in the form of interactive data visualizations, reports, or a + simple list of data objects created by a tool. At the same time, the Jupyter Notebook allows more + advanced users to supplement their app runs with custom Python code.

+

Reproducibility is one of the major concerns of the system: Narratives and apps are versioned, and + all + data in the system receives a persistent unique ID, so analyses can be rerun to ensure that the same + results are achieved. There is also an emphasis on tracking data through KBase and recording the + transformations it undergoes through the provenance system; every data object has an immutable + record of + how it was produced, and this provenance chain can be followed forwards or backwards to view the + original inputs or see the eventual output of a set of analyses.
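Following a provenance chain forwards or backwards, as described above, amounts to a graph traversal. The sketch below assumes a hypothetical, simplified representation in which each data object ID maps to the list of object IDs it was produced from; it does not reflect KBase's actual provenance records or API, and the IDs and function names are invented for illustration.

```python
from collections import deque


def upstream(provenance, obj_id):
    """Return every object that obj_id was derived from, nearest ancestors first."""
    seen, order = set(), []
    queue = deque(provenance.get(obj_id, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(provenance.get(current, ()))
    return order


def downstream(provenance, obj_id):
    """Return every object derived (directly or indirectly) from obj_id."""
    # Invert the "produced from" mapping, then reuse the same traversal.
    children = {}
    for child, parents in provenance.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    return upstream(children, obj_id)


# Hypothetical provenance records: object ID -> IDs of its direct inputs.
records = {
    "assembly/1": ["reads/1"],
    "annotation/1": ["assembly/1"],
    "model/1": ["annotation/1", "media/1"],
}
print(upstream(records, "model/1"))    # walk back toward the original inputs
print(downstream(records, "reads/1"))  # walk forward toward eventual outputs
```

Because each record is written once when an object is created and never modified, a traversal like this gives the same answer whenever it is run, which is what makes the chain useful for checking how a result was produced.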

+

KBase also strives to provide FAIR data access. Recent work has focused on assigning credit to users + who + do analyses and publish data generated through KBase. In an early step toward creating a publishable + Narrative, these documents can be exported to a “static” format providing a frozen snapshot + detailing + the analysis steps and data. Furthermore, DOIs (digital object identifiers) can be assigned to the + static Narrative. These features can be used for reproducibility in a publication and to ensure that users are + credited for their work. Markdown cells provide a mechanism for users to extend the documentation + automatically created by data provenance and add additional context to explain the background of the + data imported into KBase.

+

Together, these features make the KBase Narrative an application where analyses, results, and + interpretations can be viewed and shared in a single interactive document. There are still many + challenges ahead for KBase as it steps toward making publishable Narratives. These include updating + the + user experience as the platform expands and caters to a growing user-base, and challenges with + adapting + to recent advances in large language models and their utility in biological data science. We welcome + and + encourage community feedback and discussion.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

We present RainFlow as a case study in how collaborations between bench scientists and software + developers can deliver impactful solutions to the reproducibility crisis in scientific research. +

+

RainFlow is a macOS desktop application developed for Reproducible Analysis and Integration of + Flow-cytometry data. Flow cytometers are routinely used to collect rich biological data in clinical + settings and research laboratories alike. Modern flow cytometers are extremely sensitive instruments + that can measure the expression of 25-40 different proteins in millions of single cells in a matter + of + minutes. However, deriving actionable insights from this high-dimensional, high-volume data is + hindered + by the lack of reproducible analysis techniques.

+

Lack of reproducibility affects two aspects of the research data pipeline. First, technical noise in + the + sensitive instrumentation can confound accurate protein signal measurement during data collection. + Second, variations in the analytical choices made by individual researchers can confound + reproducibility + during data analysis.

+

We developed RainFlow in an effort to automate the process of data cleaning and analysis for flow + cytometry experiments. First, to decouple technical noise from true biological variation, we + developed + custom machine learning pipelines which reproducibly correct technical noise in the signal, as well + as + produce a quality score for each sample. The quality score can then be used to select + high-quality + samples before integrating several batches of data together for downstream analysis. Second, to aid + researchers in making good analytical choices and recording every analytical choice, we packaged the + algorithms into a user-friendly macOS desktop application called RainFlow.

+

RainFlow was specifically designed to be accessible to researchers with little to no coding ability, + i.e., + researchers who have “bench skills” for experimental data collection, but not necessarily + computational + data analysis skills. RainFlow guides the researcher step by step through transforming raw flow-cytometry + data + into cleaned, batch-normalized, quality-controlled data ready for integration. In addition, it + automatically records every data decision taken by the researcher during the analysis process and + exports the parameters for easy sharing. At every step, the researcher is able to visualize the + effect + of the machine learning algorithmic corrections on the data distribution. Helpful informational + guides + are provided to explain what each individual step or algorithm does, and how best to select the + required + analytical parameters. Additionally, we sought to automate parameter selection as much as possible, + so + that fewer total decisions relied on manual expertise.

+

This talk will focus on the lessons learnt during the development of RainFlow, which we hope will be + more broadly applicable for the research software engineering community. RainFlow was released in + the + Apple App Store in May 2024 and is available for free download.

+
+
+
+ +
+
+
+ +
+
+ +
+
+

Reproducibility of research that is dependent on software and data is a persistent and ongoing + problem. In this talk, I invite the emerging research software engineering community to leverage + the half century of specific knowledge offered by cybersecurity professionals. I demonstrate + that the practical needs of cybersecurity engineering and research software engineering overlap + significantly. In addition to enabling reliable reproducibility of research, I illustrate how + well-understood cybersecurity tools enable independent verification of research integrity and + increase the public trust in open science. +

+
+
+
+ +
+ +--- + + +## Working Group Fair + +This session is designed to provide attendees with an opportunity to meet and +interact with members of various Working Groups within the US-RSE community. + +**Featured Working Groups** + +- Diversity, Equity, and Inclusion (DEI): Explore initiatives aimed at promoting a diverse and inclusive environment within the research software engineering community. +- Education and Training: Discover programs and resources designed to enhance the skills and knowledge of RSE professionals. +- National Labs: Learn about efforts to empower RSEs in national labs and other similar organizations. +- And many more! + +Are you a current US-RSE member looking to deepen their involvement? A newcomer to the US-RSE community? Just curious? Join the session to learn, engage, and maybe even share your passion by becoming a Working Group member! + +
Oct 7 - Oct 11 - Online
1:00 PMWorkshops
2:30 PMBreak
3:00 PMWorkshops
5:00 PMPoster - Session / Happy Hour, La Sala
Oct + 7, 11:00 AMTutorial: + "Globus Compute: Managed Compute Across the Computing Continuum," + Kyle Chard, Yadu Babuji, Reid Mello - Register Now
Wednesday, Oct 16th
8:00 AMBreakfast
9:00 AMKeynote: TBA
10:00 AMBreak
10:30 AMTechnical Session
12:00 PMLunch +
Oct 7, 3:00 PM | Tutorial: "GitHub Actions for Scientific Data Workflows," Valentina Staneva, Quinn Brencher - Register Now
Oct 8, 11:00 AM | Tutorial: "Overview of GIS Open Source Software Ecosystem and Theory," Dennis Milechin - Register Now
Oct 9, 11:00 AM | Tutorial: "Rapid Prototyping for a Usable React-based Web Application with STRUDEL," Cody O'Donnell, Rajshree Deshmukh, Lavanya Ramakrishnan - Register Now
Oct 10, 11:00 AM | Tutorial: "Leveraging Django Views and Permissions at Object-Level: The gist of an envisioned solution for managing agricultural datasets," Diego Menéndez, Danying Shao - Register Now
Oct 11, 8:00 AM | Tutorial: "Research Data Automation with Globus Flows and Globus Compute," Lee Liming - Register Now
Tuesday, Oct 15th
8:00 AM | Breakfast / Registration
8:45 AM | Welcome (Lauren Milechin, Miranda Mundt, Ian Cosden, Sandra Gesing), Ballroom A
9:00 AM | Keynote: Simon Hettrick, Ballroom A
10:00 AM | Break
10:30 AM | Session 1A: RSE Pedagogy, Hopi/Tewa | Session 1B: AI, ML, and Automation, Taos | Session 1C: Insights on Research Software Practices and Principles, Cochiti
  • A Brief Introduction to HPC Carpentry
  • Lessons Learned While Building an Effective and Equitable Internship Program
  • Student Research Software Engineers: Insights from Macro and Micro Perspectives
  • Creating a Curriculum for Curating Code
  • Experiences in Teaching Prospective Research Software Engineers at the University of Chicago
  • Foundational competencies of an RSE: state of the project and what comes next

  • Leveraging LLMs for Effective Coding
  • Bridging the Valley of Death: From Research to Deployment in AI/ML
  • Enhancing the application of large language models with retrieval-augmented generation for a research community
  • An Empirical Survey of GitHub Repositories at Research Universities
  • Decentralized collaboration for the AutoRA Python package with a functional paradigm and namespace packages

  • User-Centric Science: Unveiling the Power of Design at NCSA
  • Containers, structured programming and encapsulation - what would Djikstra think
  • Building Bridges and Breaking Barriers: How Research Software Engineers and Software Engineering Researchers Can Partner for Success
  • Analyzing the Security Culture of Research Software Engineers
  • A Mature RSE Capability Examined: Space Analysis and Applications at Johns Hopkins APL
  • Future Beyond Borders: Transforming the Research Software through Internationalization, Localization and Translation
1:00 PM | Birds of a Feather / Workshop
12:00 PM | Lunch
2:30 PM | Break
1:00 PM | Workshop 1: "Research Data Automation with Globus Flows and Globus Compute," Lee Liming and Steve Turoscy, Hopi/Tewa | Workshop 2: "Community discussion: teachingRSE project," Jan Philipp Thiele, Taos | Workshop 3: "Emerging as a Team Leader through Cultural Challenges," Elaine M. Raybourn, Angela Herring, and Ryan Shaw, Cochiti
3:00 PM | Birds of a Feather / Workshop
2:30 PM | Break
6:00 PM | Conference Dinner / Award Ceremony
3:00 PM | Workshop 1: "Research Data Automation with Globus Flows and Globus Compute," Lee Liming and Steve Turoscy, Hopi/Tewa | Workshop 2: "Community discussion: teachingRSE project," Jan Philipp Thiele, Taos | Workshop 3: "Emerging as a Team Leader through Cultural Challenges," Elaine M. Raybourn, Angela Herring, and Ryan Shaw, Cochiti
Thursday, Oct 17th
8:00 AM | Breakfast
9:00 AM | Sponsor Talks
10:00 AM | Break
10:30 AM | Rapid Access Microtalks / Technical Session
12:00 PM | Lunch
5:00 PM | Poster Session / Happy hour, La Sala
6:30 PM | Mentor/Mentee Program Ice Cream Social, TBD
Wednesday, Oct 16th
8:00 AM | Breakfast
9:00 AM | Keynote: Sandra Gesing, Ballroom A
10:00 AM | Break
10:30 AM | BoFs 1A: Better Scientific Software Fellowship Community, Hopi/Tewa | Working Group Fair, Taos | BoFs 1B: RSEs in domain-specific ecosystems, Cochiti
12:00 PM | Lunch
1:00 PM | Session 2A: Software Sustainability and Legacy Code, Hopi/Tewa | Workshop 4: "Establishing RSE Programs - From early stage formalization to mature models," Ian Cosden, Sandra Gesing, and Adam Rubens, Taos | Session 2B: Unique Stories in Research Software Experience, Cochiti
  • Software Resurrection: Discovering Programming Pearls by Showing Modernity to Historical Software
  • Towards Sustainable and Discoverable Computational Models: The Engineering Common Model Framework (ECMF) Project at Sandia
  • Fortran at NERSC: A Cycle of Support
  • Preferred Practices Through a Project Template
  • Defining, Measuring, and Enhancing Sustainability in Scientific Software: Insights from CORSA

  • Do good: strategies for leading an inclusive collaborative data science and RSE teams
  • The UCAR SEA: Adapting a legacy employee resource group into an RSE community of practice
  • The Long Tale of NCSA’s RSEs
  • The Creation of an RSE Career Path at Princeton University
  • Long-time listener, first-time caller: My RSE Identity Journey
  • The long and winding road: Building and growing a research engineering team at the UK's national institute for data science and AI
1:00 PM | Birds of a Feather / Technical Session
2:30 PM | Break
2:30 PM | Break
3:00 PM | Session 3: WetWare: Research Software in Chemical and Life Sciences, Hopi/Tewa | Workshop 4: "Establishing RSE Programs - From early stage formalization to mature models," Ian Cosden, Sandra Gesing, and Adam Rubens, Taos | BoFs 2: Teaching Research Software Engineering, Cochiti
3:00 PM | Birds of a Feather / Technical Session
5:00 PM | End of Technical Conference Material
  • Lab Dragon: An electronic Laboratory
  • Using Radicals to Empower Budding Computational Chemists
  • Software testing in a community-driven web-based analysis platform for life science research
  • Honeycomb: a template for reproducible psychophysiological tasks for clinic, laboratory, and home use
  • Developing a Cloud First Personalized Ecological Momentary Assessments Delivery Tool
  • A Rising Tide: One model for improving research software use in the life sciences
6:00 PM | Conference Dinner / Award Ceremony, Ballroom A
Thursday, Oct 17th
8:00 AM | Breakfast
9:00 AM | Sponsor Talks, Ballroom A
10:00 AM | Break
10:30 AM | Rapid Access Microtalks (RAM), Ballroom A | BoFs 3: Brainstorming Strategies for Cultivating Successful and Collaborative RSE Teams, Hopi/Tewa | Session 4: Case Studies in Research Software, Cochiti
  • Leveraging CVMFS for Scaling and Optimizing JupyterHub-based Gateways
  • From Core to Atuin: Citing Software All The Way Down The Stack
  • Reproducing notebooks without (too much) effort: a simple but effective automated workflow for US-RSE’24 and beyond using Binder and GitHub Actions
  • Building a composable stack for research cyberinfrastructure
  • Community Resilience Research Using IN-CORE - Case Study with 2011 Tornado Event at Joplin, MO
12:00 PM | Lunch
1:00 PM | BoFs 4A: Exploring the Potential Impact of Advancements in Artificial Intelligence on the RSE Profession, Hopi/Tewa | BoFs 4B: Mapping Open Source Science, Taos | Session 5: RSE in Action!, Cochiti
  • Harnessing the power of HPC from the comfort of R
  • Population Modeling Workflow in OpenStack Cloud
  • PROBE4RSE: Provenance for Replay OBservation Engine for Research Software Engineers
  • Codefair: Your Personal Assistant for Developing FAIR Software
  • Hawai'i Climate Data Portal API Demo
2:30 PM | Break
3:00 PM | BoFs 5A: Navigating the Remote Landscape: Working Effectively with Stakeholders, Hopi/Tewa | BoFs 5B: Sharing lessons learned on the challenges of fielding research software proof-of-concepts / prototypes in Department of Defense (DoD) and other Government environments, Taos | Session 6: Reproducible Software Ecosystems, Cochiti
  • Importance of Science Gateway Frameworks for Research and Their Benefits for Research Software Engineers
  • Transforming Academic Publishing: A Jupyter Notebook-Based Submission Ecosystem implemented for SEA ISS Conference for Enhanced Open Science and Reproducibility
  • Documenting Research Artifacts for Reproducibility
  • The KBase Narrative: Reproducible, FAIR data access through Jupyter notebooks
  • From bench to desktop: a case study in enabling reproducible data analysis with RainFlow
  • Information Security Engineering and Research Software Engineering: Shared Goals, Shared Approaches, and Shared Success
4:30 PM | Adjourn