diff --git a/art.qmd b/art.qmd
index 22fdbb9..9105c87 100644
--- a/art.qmd
+++ b/art.qmd
@@ -2,14 +2,118 @@
 title: "GxP Compliance"
 ---
 
-## How do you establish reproducibility and traceability?
+## How do you establish accuracy, reproducibility and traceability?
 
-GxP compliance means establishing accuracy, reproducibility, and traceability. When working with open source solutions to process and analyze clinical trial data:
+The FDA definition of validation, and GxP compliance more broadly, means establishing accuracy, reproducibility and traceability. When working with open source solutions to process and analyze clinical trial data:
 
 - How do we establish reproducibility of the outputs?
 
 - How do we establish traceability of the input through to the output?
 
+## Are we validating R packages or the R environment?
+
+When discussing R validation, most of the discussion centres around validating R functions and packages individually. Do they give the "right" or "expected" answer within a defined tolerance / accuracy? Can we trust that the developers of the packages have adequately tested and documented all aspects of their package? Are there some packages that don't need validation?
+
+Using the FDA definition of "validation" - establishing accuracy, reproducibility and traceability - accuracy is only one piece of the puzzle. The ability to get that accurate answer every time, often over a period of years (in other words, maintaining a validated environment), is also critical. It is not unknown for regulatory agencies to request re-analysis, or extension of an analysis to additional data, many years after a regulatory submission. In that event it may be expedient to reuse the environment from the original analysis, rather than trying to replicate the old analysis in a new environment with new versions of software and packages.
+
+With that in mind, having a validated ***environment*** - tested, held under change control, and able to reproduce results for many years to come - would seem like an important aspect of the FDA validation definition. But this raises some interesting questions and problems: how do we maintain a "snapshot" of that environment such that it remains reproducible for years, when operating systems and underlying core software change more frequently? How do we balance the need for stability and change control with scientists' desire for the "latest and greatest" packages and versions, in order to be most efficient and do the best science? Can both of these states be possible?
+
+## R Validation Hub and \{riskmetric\}
+
+The R Consortium R Validation Hub working group has prepared a white paper on R validation and what it means from a package and environment perspective [@RValidation]. They propose a risk-based approach where a package's risk is assessed against a collection of metrics such as:
+
+- testing and documentation
+
+- active development and contribution
+
+- maintenance and release cycles
+
+- usage and licensing
+
+The {[riskmetric](https://pharmar.github.io/riskmetric/articles/riskmetric.html)} package [@riskmetric] has been developed to help collect these metrics for packages on CRAN or GitHub, and an associated Shiny application, {riskassessment} [@riskassessment], builds on it.
+
+The risk-based approach passes the assessment back to individual organisations. A package may be considered "high risk" but may also be business critical for delivery in a given part of the organisation. So instead of defining a cutoff above which packages would be "not recommended", the R Validation Hub and {riskmetric} encourage organisations (and individuals) to give this due consideration, and perhaps to focus effort and resources on assessing and testing packages of medium to higher risk, rather than those with lower risk.
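+
+As a minimal sketch, following the approach shown in the {riskmetric} documentation, a package can be assessed and scored in a few lines (here scoring {riskmetric} itself; any CRAN package name could be substituted):
+
+```r
+library(dplyr)      # provides the pipe used in the riskmetric examples
+library(riskmetric)
+
+pkg_ref("riskmetric") %>%  # reference a package source (CRAN, GitHub, local)
+  pkg_assess() %>%         # run the individual metric assessments
+  pkg_score()              # convert assessments into numeric risk scores
+```
+
+How the individual scores are weighted into an overall risk, and where any acceptance threshold should sit, is deliberately left to each organisation to decide.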
+
+### Testing
+
+Modern package development processes [@RPackages] encourage the use of test cases to demonstrate that functions within the package are performing as expected. Tools such as {covr} [@covr] examine the test cases within a package and report the test coverage - the proportion of functions within the package that have associated tests - which can be used as a metric of the extent to which the package developer has verified that functions work as expected (and fail when appropriate). The associated {covtracer} [@covtracer] package uses {covr} to identify which functions do or do not have associated tests. Together, these packages help build a picture of how thoroughly tested the functions and packages are.
+
+While not mandatory for submission to CRAN, this good practice of providing test cases and verifying function behaviour and performance goes a long way towards satisfying the validation requirement of accuracy. And because these tests are run as part of building the R package for inclusion on CRAN, they are exercised across many different operating systems, versions of R, and versions of the other CRAN packages they may rely on. With this in mind, the question arises: if a package has good test coverage, tests its key or critical functions adequately, and is tested via the CRAN build across a range of operating systems, is it really necessary for an organisation to build new tests and verify internally that the package does what it purports to do?
+
+If a package has ***low*** test coverage, or if key functions are ***not*** tested adequately, then what do we do? Rather than each organisation writing new tests, in the spirit of open-source development, surely the correct thing to do is to contribute new tests to the package for the benefit of the broader user community.
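+
+Finding where those gaps are is straightforward. As a minimal sketch with {covr} (the package path here is hypothetical):
+
+```r
+library(covr)
+
+# Compute test coverage for a package source directory
+cov <- package_coverage("~/dev/mypackage")
+
+percent_coverage(cov)  # overall proportion of code exercised by tests
+zero_coverage(cov)     # source lines with no test coverage at all
+```
+
+A high overall percentage is not by itself proof of adequate testing; reviewing the `zero_coverage()` output helps confirm that the untested lines do not sit in key or critical functions.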
+
+### Documentation
+
+A key attribute (and expectation) of an R package is that functions accessible to users should be adequately documented. In the past, this was done by hand-writing .Rd manual files. Today, tools such as {roxygen2} [@roxygen2] and {pkgdown} [@pkgdown] take much of that burden from the developer, and elegant documentation and associated web pages can be built with very little additional effort. Tools within Integrated Development Environments such as the RStudio IDE [@RStudioIDE] can assist the developer in creating the {roxygen2} header based on the inputs to the function being written. With these frameworks it is easier than ever to provide quality documentation for functions and packages. Documentation is ***not*** on the FDA list of validation attributes, but it could be argued that without good documentation a user could easily apply functions inappropriately, leading to poor accuracy of results.
+
+### Active contribution, maintenance and release cycles
+
+The R Validation Hub {riskmetric} package also assesses whether packages are actively maintained: whether the package has a named maintainer, whether bug reports are being actioned and closed, whether there is a NEWS file, and other attributes. Inactive packages increase the associated risk for organisations. If a bug is reported but not actioned, and the community is not made aware of it, that is a bad sign for a package. There could be many reasons that a package is no longer maintained - if it works exactly as advertised and has a very limited scope, then perhaps it does not need active maintenance or new releases. But this should be flagged for the user and their organisation to assess.
+
+### Licensing and usage
+
+Packages released as open source should have an associated license. This details liability from the developer's and user's standpoint, scope of use, and any limitations, stipulations or expectations around whether modifications of the source code must also be shared publicly. It is the user's responsibility to check whether the license permits them to use and / or modify the package in their intended way. At an organisation level, it may be beneficial to flag any packages with non-permissive licenses; {riskmetric} helps with that process.
+
+The number of downloads of a package gives some indication of the global user base of that package. While this doesn't guarantee quality by any means, it does give a measure of how many people are using and interacting with the package and its functions. If the number is large, an organisation might feel comfortable that issues or errors will be discovered quickly, and that remediation will be addressed promptly. If the number is small, an organisation might wish to examine the package more carefully. A small user base, coupled with a single maintainer, low levels of maintenance, or low test coverage, may be a red flag that issues will not be addressed, and so risk is increased.
+
+### CRAN
+
+When a package is submitted to the Comprehensive R Archive Network (CRAN), a number of checks and assessments are performed on the package before it is made available through the CRAN mirrors for download and installation. Many of the checks are automated, but the results are assessed manually, and actions are often sent by the CRAN team back to the developer to remedy. The CRAN team also checks many of the attributes given above - documentation, maintainer, successful tests, and so on.
+
+Packages on CRAN are also assessed for compatibility with other packages on CRAN. Any dependencies and reverse dependencies (packages that depend on the submitted package) are assessed, so that we can be sure that packages on CRAN work together as expected. This reduces uncertainty greatly, since any given daily snapshot of CRAN is guaranteed to work together without conflicts.
+
+This level of testing and rigour should not be taken for granted. The CRAN team ensures quality in the package set made available across CRAN mirrors globally for millions of R users. With this in mind, having a package available on CRAN elevates its quality (or lowers its risk), but not to zero.
+
+### Tidyverse / Posit packages
+
+Posit / RStudio PBC issued a statement in 2020 regarding validation of their {tidyverse}, {tidymodels} and {gt} packages and their r-lib GitHub organisation [@tidyValidation]. In it they detail how the packages listed have a verifiable software development life cycle and meet the attributes defined by the R Validation Hub for low-risk packages.
+
+## R package management - the problem
+
+We have discussed above how each CRAN daily snapshot ensures that packages from that snapshot date are guaranteed to work together, through the CRAN checks and CRAN team oversight. To address the validation requirements of reproducibility and traceability, the obvious answer would be to pick a snapshot date for CRAN packages and commit an R environment to that date - locking the package versions to that snapshot date, then testing, documenting and holding the environment under change control from that point. We would then be able to report the R version and the CRAN snapshot date, and hence the R package versions for that nominated date. This would enable any third party to recreate our environment and reproduce any analysis done with that environment. All we need is to stick with one snapshot date per release of R. Is that viable?
+
+But what happens if a user downloads a new package from a source outside of that snapshot date? If that package has dependencies that have since been updated, then the installation process will likely download the updated dependencies. Do we know exactly what has been updated, and when? Can we rely on reproducibility for this new set? If the user has kept careful note of exactly what was updated, and which packages and versions are now being used, then maybe.
+
+And over a period of months and years - as users update packages and add new ones, as packages stop being updated and functionality is superseded by other packages - how can we guarantee that at any given time we can "roll back" to our validated set and reproduce results?
+
+The tension comes because there are two competing priorities:
+
+1. having a set of packages in a well-managed, change-controlled, tested and documented environment.
+
+2. being flexible enough to update packages and versions at an arbitrary time, in order to get the latest, greatest features and functionality to address today's work.
+
+## R package management - possible solutions
+
+At an organisation level, we can use containerisation technologies like Podman [@Podman] or Docker to capture a complete R environment, including the underlying operating system, base R, and R packages. This snapshots the complete environment and ensures reproducibility for the long term: we can return to the container at any point (provided it is held under change control), deploy it, and reproduce a given analysis at an arbitrary point in time. Because of the effort required to build, test, document and deploy the environment, however, we can't do this on a daily basis (or for every CRAN snapshot date). Instead we may wish to do this periodically - for example, with each R minor release, e.g. R-4.2, R-4.3. This approach serves the first priority above well: a well-managed, change-controlled, tested and documented environment.
+
+But it does not address the second priority above - a flexible environment where users can get the latest and greatest set of packages.
+
+To address this need, we can turn to R package environment management tools like {renv} [@renv]. {renv} works in the context of a project and creates a self-contained cache of R packages within the project, so that all packages required for the project are kept alongside the project work. This is ideal for the cases where the managed container doesn't contain all packages required for a project. However, it relies on the project owner to maintain this package set and to qualify and document the package set used.
+
+For both approaches, we need a single source of R packages that allows users to "go back in time" to any given snapshot date, in order to recreate the state of the R environment used in the project or analysis. Posit Package Manager [@PPM] provides snapshots of CRAN across a range of dates, with unique URLs that let users define the specific repository used to access R packages as of a given snapshot date. Previously the Microsoft R Archive Network (MRAN) provided similar functionality, but unfortunately that service is no longer maintained.
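+
+As a minimal sketch of combining a dated snapshot repository with {renv} (the snapshot date and repository URL are illustrative only):
+
+```r
+# Pin the package source to a dated CRAN snapshot served by Posit Package Manager
+options(repos = c(CRAN = "https://packagemanager.posit.co/cran/2024-03-01"))
+
+# Within a project, let {renv} record and later restore that state
+renv::init()      # create a project-local library and an renv.lock file
+renv::snapshot()  # record package versions and repository URLs in renv.lock
+
+# ...months or years later, on a fresh machine:
+renv::restore()   # reinstall the exact versions recorded in renv.lock
+```
+
+Because the lockfile records repository URLs alongside package versions, a third party can "go back in time" to the same package set - provided those URLs remain accessible.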
+
+While the functionality above provides a means for individuals to instantiate and maintain a reproducible R environment, complete with the arbitrary packages needed for a specific analysis, it relies on those individuals to appropriately document and maintain that environment. The barrier to ***doing it*** is lowered, but the discipline involved in capturing details of the environment, and then managing it for the longer term, is higher than for the change-controlled container environment, which can be handled at the organisational level rather than the individual level. It is also almost impossible to retrofit the documentation and R package management environment: who knows whether the environment actually used in the analysis entirely matches the defined package set and local environment presented?
+
+The other issue is that only a minority of analysts are careful enough to manage their environment to this level. And for a "validated environment" for regulatory use, the level of information required to completely reproduce the environment is quite substantial. Are we confident that we have provided sufficient information for agency staff to do so?
+
+## Traceability - Where is the R log?
+
+Many tools are available, then, to help document which R environment was used for an analysis, and many of these are easily accessible if the analyst chooses to use them.
+
+{sessioninfo} [@sessioninfo] reports on the current R session: the underlying operating system, the R version, which packages were loaded (and their versions), and where those packages were installed from. This is the minimal information that should be provided for any analysis performed with R. Combined with package management tools like Posit Package Manager or CRAN, this information allows a good attempt at recreating the environment.
+
+A {renv} lock file is a similar tool that allows a third party to recreate the R environment used in an analysis, and it contains the URLs of the repositories used to install packages. Provided those URLs are accessible to the third party, it gives additional confidence in the ability to recreate the environment.
+
+The {logrx} package [@logrx] goes a step further, providing additional metadata: user, session and run-time information. By providing a wrapper for running an R script, it also prevents users from running commands interactively out of sequence, or picking up objects from the Global Environment instead of re-calculating them.
+
+This list is not exhaustive by any means, but each tool gives additional information that would assist forensic recreation of the R environment at an arbitrary time point. These tools do not magically recreate the R environment; they only document what ***was*** used.
+
+## The ideal solution
+
+The best solution is to combine a tool such as {logrx} with a well-managed container environment deployment of R, as described above. The log shows that the container was unchanged from the original tested and documented environment. And assuming that original container is held under Software Development Life Cycle change control, we can be confident of accuracy, reproducibility and traceability - and hence a "validated" environment.
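+
+As a minimal sketch of the traceability side (the script name is hypothetical), run inside the managed container:
+
+```r
+# Capture the session state directly at the end of an analysis...
+sessioninfo::session_info()
+
+# ...or run the whole script through {logrx}, which executes it
+# non-interactively and writes a log of session, package and
+# run-time metadata alongside the script's output
+logrx::axecute("adsl_summary.R")
+```
+
+Archiving the resulting log with the analysis outputs supplies the traceability evidence; the change-controlled container supplies the reproducibility.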
+
 ## How to Contribute
 
 Contribute to the discussion here in GitHub Discussions:\
diff --git a/references.bib b/references.bib
index 0220dbd..3880e15 100644
--- a/references.bib
+++ b/references.bib
@@ -16,4 +16,197 @@ @article{knuth84
   numpages = {15}
 }
 
+@misc{Bell2006,
+  year = {2006},
+  title = {Issues with Open Source Statistical Software in Industry: Validation, Legal Issues, and Regulatory Requirements},
+  howpublished = {\url{https://ww2.amstat.org/meetings/jsm/2006/onlineprogram/index.cfm?fuseaction=people_index&letter=B}},
+  author = {Bell, B.}
+}
+
+@misc{Soukup2007,
+  note = {Accessed: 2024-03-18},
+  year = {2007},
+  title = {Using R: Perspectives of a FDA Statistical RevieweR},
+  publisher = {useR},
+  howpublished = {\url{https://www.r-project.org/conferences/useR-2007/program/presentations/soukup.pdf}},
+  author = {Soukup, M.}
+}
+
+@misc{Schuette2018,
+  note = {Accessed: 2024-03-18},
+  year = {2018},
+  title = {Using R in a regulatory environment: some FDA perspectives},
+  howpublished = {\url{https://rinpharma.com/publication/rinpharma_7/}},
+  author = {Schuette, P.}
+}
+
+@misc{FDA2015,
+  note = {Accessed: 2024-03-18},
+  year = {2015},
+  title = {Statistical Software Clarifying Statement},
+  howpublished = {\url{https://www.fda.gov/media/161196/download}},
+  author = {{U.S. Food \& Drug Administration (FDA)}}
+}
+
+@misc{Novo2023,
+  note = {Accessed: 2024-03-18},
+  year = {2023},
+  title = {Novo Nordisk's Journey to an R based FDA Submission},
+  howpublished = {\url{https://www.youtube.com/watch?v=t33dS17QHuA}},
+  author = {Knoph, A.S. and Larsen, S.F. and Bilgrau, A.E.}
+}
+
+@misc{Roche2023,
+  note = {Accessed: 2024-03-18},
+  year = {2023},
+  title = {Shifting to an Open-Source Backbone in Clinical Trials with Roche},
+  howpublished = {\url{https://www.youtube.com/watch?v=nqJsLSLd39A}},
+  author = {Neitman, T. and Martin, K. and Leng, N. and Black, J.}
+}
+
+@misc{SubmissionsWG2021,
+  note = {Accessed: 2024-03-18},
+  year = {2021},
+  title = {R Consortium Submissions Working Group},
+  howpublished = {\url{https://rconsortium.github.io/submissions-wg/}},
+  author = {{R Consortium}}
+}
+
+@misc{OSNDAs,
+  note = {Accessed: 2024-03-20},
+  title = {Open-Source-in-New-Drug-Applications-NDAs-FDA},
+  howpublished = {\url{https://github.com/philbowsher/Open-Source-in-New-Drug-Applications-NDAs-FDA}},
+  author = {Bowsher, P.}
+}
+
+@misc{RValidation,
+  note = {Accessed: 2024-03-20},
+  title = {R Validation Hub},
+  howpublished = {\url{https://www.pharmar.org/}},
+  author = {{R Validation Hub}}
+}
+
+@Manual{riskmetric,
+  title = {riskmetric: Risk Metrics to Evaluating R Packages},
+  author = {{R Validation Hub} and Doug Kelkhoff and Marly Gotti and Eli Miller and Kevin K and Yilong Zhang and Eric Milliman and Juliane Manitz},
+  year = {2024},
+  note = {R package version 0.2.4.9000, https://github.com/pharmaR/riskmetric. Accessed: 2024-03-20},
+  url = {https://pharmar.github.io/riskmetric/}
+}
+
+@Manual{riskassessment,
+  title = {riskassessment: A web app designed to interface with the `riskmetric` package},
+  author = {Aaron Clark and Robert Krajcik and Jeff Thompson and Lars Andersen and Andrew Borgman and Marly Gotti and Maya Gans and Aravind Reddy Kallem and {Fission Labs India Pvt Ltd}},
+  year = {2023},
+  note = {R package version 3.0.0. Accessed: 2024-03-20},
+  url = {https://github.com/pharmaR/riskassessment}
+}
{} +} + +@book{RPackages, + title = {R Packages}, + author = {Wickham, Hadley and Bryan, Jenny}, + year = {2023}, + publisher = {O'Reilly Media, Incorporated}, + isbn = {9781098134945}, + abstract = {Organize, Test, Document, and Share Your Code}, + language = {en} +} + +@Manual{covr, + title = {covr: Test Coverage for Packages}, + author = {Jim Hester}, + year = {2023}, + note = {R package version 3.6.4, https://github.com/r-lib/covr}, + url = {https://covr.r-lib.org}, +} + +@Manual{covtracer, + title = {covtracer: Tools for contextualizing tests}, + author = {Doug Kelkhoff and Andrew McNeil}, + year = {2024}, + note = {R package version 0.0.0.9016}, + url = {https://github.com/genentech/covtracer}, +} + +@Manual{roxygen2, + title = {roxygen2: In-Line Documentation for R}, + author = {Hadley Wickham and Peter Danenberg and Gábor Csárdi and Manuel Eugster}, + year = {2024}, + note = {R package version 7.3.1, https://github.com/r-lib/roxygen2}, + url = {https://roxygen2.r-lib.org/}, +} + +@Manual{pkgdown, + title = {pkgdown: Make Static HTML Documentation for a Package}, + author = {Hadley Wickham and Jay Hesselberth and Maëlle Salmon}, + year = {2023}, + note = {R package version 2.0.7, https://github.com/r-lib/pkgdown}, + url = {https://pkgdown.r-lib.org}, +} + +@Manual{RStudioIDE, + title = {RStudio: Integrated Development Environment for R}, + author = {{RStudio Team}}, + organization = {RStudio, PBC.}, + address = {Boston, MA}, + year = {2020}, + url = {http://www.rstudio.com/}, +} + +@misc{tidyValidation, + note = {Accessed: 2024-03-27}, + year = {2020}, + title = {tidyverse, tidymodels, r-lib, and gt R packages: Regulatory Compliance and Validation Issues}, + publisher = {Posit / RStudio PBC}, + url = {https://www.rstudio.com/assets/img/validation-tidy.pdf} +} + +@Manual{renv, + title = {renv: Project Environments}, + author = {Kevin Ushey and Hadley Wickham}, + year = {2024}, + note = {R package version 1.0.5, https://github.com/rstudio/renv}, + url = {https://rstudio.github.io/renv/}, +} + +@misc{Podman, + title = {Getting Started with Podman}, + author = {Podman Team}, + year = {2024}, + url = {https://podman.io/docs}, + note = {Accessed: 2024-03-27}, +} + +@misc{PPM, + title = {Posit Public Package Manager}, + url = {https://packagemanager.posit.co/}, + note = {Accessed: 2024-03-27}, +} + +@Manual{sessioninfo, + title = {sessioninfo: R Session Information}, + author = {Hadley Wickham and Winston Chang and Robert Flight and Kirill Müller and Jim Hester}, + year = {2022}, + note = {https://github.com/r-lib/sessioninfo#readme, https://r-lib.github.io/sessioninfo/}, +} +@Manual{logrx, + title = {logrx: A Logging Utility Focus on Clinical Trial Programming Workflows}, + author = {Nathan Kosiba and Thomas Bermudez and Ben Straub and Michael Rimler and Nicholas Masel and Sam Parmar}, + year = {2023}, + note = {R package version 0.3.0}, + url = {https://github.com/pharmaverse/logrx}, +} \ No newline at end of file diff --git a/reg_accept.qmd b/reg_accept.qmd index 0b74f74..a501821 100644 --- a/reg_accept.qmd +++ b/reg_accept.qmd @@ -10,18 +10,53 @@ title: "Regulatory Acceptance" - Are there technical considerations for the creation of submission data packages? -## How to Contribute +## (Perceived) Fear + +There has been a long debate about whether regulatory agencies would accept submissions using R and Open Source tools. 
+There has been a long debate about whether regulatory agencies would accept submissions using R and open source tools. This is in spite of communications from agency staff for ***over 15 years*** refuting this and stating that the agency would not and could not endorse any specific software tool for sponsor submissions [@Bell2006; @Soukup2007; @FDA2015; @Schuette2018]. Yet there is still reticence among pharmaceutical industry sponsors around using R for clinical trials reporting and analysis of key endpoints. Meanwhile, clinical pharmacology and pharmacometrics groups within many pharma companies have been using R for over 10 years in regulatory submissions and interactions - preparing graphics for presentation, model diagnostics and data summaries. So while some business areas within the industry are comfortable using R and other open source tools, other parts are more nervous about committing fully to an open source software solution.
+
+What is driving this nervousness, then? If the regulatory agencies are telling us that they will accept submissions with R, then what is stopping it from being more widely used within and across pharmaceutical companies?
+
+Perhaps it is down to fear of having a dossier rejected by a regulatory agency - "Refusal to file" - due to application deficiencies. In this context, could the use of open source software trigger a "Refusal to file"? Since a "Refusal to file" could have a significant impact on a company's reputation, confidence and ultimately its stock price, it is understandable that companies are cagey. Also, who would want to be first to try out this untrodden path? It is easier to follow in the footsteps of somebody else than to forge a path yourself.
+
+## (Reducing) Uncertainty
+
+> "...there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns --- the ones we don't know we don't know."
+>
+> -- Donald Rumsfeld
+
+Fortunately, we now have a growing number of examples of companies who are on this journey towards using more open source software in drug development and, critically, in regulatory interactions and submissions [@Novo2023; @Roche2023; @OSNDAs]. There is also the R Consortium Submissions Working Group [@SubmissionsWG2021] - a cross-industry pharma working group focusing on improving practices for R-based clinical trial regulatory submissions. This group has an open dialogue with colleagues at the US FDA around the technical issues of filing submissions with R deliverables, and pilots these using open source examples in order to check and verify that submissions via the FDA electronic submissions process are received and can be validated within the agency. This removes some elements of fear and uncertainty from sponsors around the technical aspects, and allows regulatory agency colleagues to provide feedback to working group members in an environment which is not high-pressure or business critical to a sponsor. It is in the interest of both regulators and industry to modernise, to acknowledge a shift in the tools being used to conduct clinical development of new therapeutics, and to be ready for whatever future state emerges. Being able to do this with increased confidence and mutual understanding is vital to this modernisation.
+
+The Novo Nordisk experience [@Novo2023] provides critical learnings. In their filing to the US FDA, PMDA and other health authorities, they used a combination of SAS and R to prepare submission datasets and deliverables. They described their process as evolution rather than revolution, using R to replicate tables and figures usually generated via SAS. Key to the process was also building an infrastructure and environment around R that would meet expectations pertaining to quality and reproducibility. This, more so than the replication of SAS output, provides the backbone of "validated" work in R. They communicated with the regulatory authorities in advance of submission, which allowed both parties (sponsor and authority) to set expectations and "get ready" in good time. This did not prevent "Information Requests" from the FDA - causing anxious moments for the sponsor - but in the end these were about information sharing, reaching mutual understanding, seeking clarification, and ensuring that the FDA internal environment would match Novo Nordisk's, rather than a refusal to accept open source software approaches. The results of their work are giving Novo Nordisk, and the pharmaceutical industry generally, more confidence to move forward with R and open source tools at the core of their submission work.
+
+Similarly, Roche [@Roche2023] is investing a great deal of effort in preparation - organisationally and technologically - for submission to regulatory agencies. The common thread across both Novo Nordisk and Roche / Genentech is preparation and dialogue, both within the company (with regulatory, quality assurance and process groups) and with regulatory agencies, in the lead-up to submission. Proper preparation prevents poor performance, as the aphorism goes.
 
-Contribute to the discussion here in GitHub Discussions:
+## (Avoiding) Doubt
 
+As has been mentioned, the Submissions Working Group of the R Consortium is actively partnering with the US FDA to pilot submissions through the agency's electronic submissions portal. These pilot submissions are created and presented on GitHub as open source code, so it is possible for anyone to review what was submitted and how, along with the agency's feedback on which elements were successful and any issues found with the submission. The pilots are vital for sponsors to understand the mechanics of submission using open source software like R, and for regulatory agencies to understand how to reconstitute sponsor software environments, install packages and confirm results. This dialogue reduces the likelihood of fundamental issues with submissions using open source software, but does not completely eliminate potential problems. In a sense, the pilots uncover known unknowns - the things that we expect to cause issues - so that they can be discussed and addressed on both sides. The sponsor needs to provide sufficient information that the agency has the best chance of recreating the environment and reproducing the results. Through the pilot process, then, both parties can understand what needs to be communicated at the time of submission to reduce unexpected findings and business-critical issues that would need to be resolved quickly.
 
-1. [Will the **FDA** accept data and analyses generated with solutions developed and available as open source?](https://github.com/phuse-org/OSTCDA/discussions/6){target="_blank"}
+The pilots are also allowing sponsors and agencies to look at modernising the content of the submission, moving from static tables, listings and figures to more dynamic, interactive presentations through web applications and dynamic HTML. This goes beyond the "evolution" described by Novo Nordisk towards a true "revolution" in what sponsors submit to agencies and how they review the contents of those submissions.
 
-2. [Will **other regulatory agencies** accept data and analyses generated with solutions developed and available as open source?](https://github.com/phuse-org/OSTCDA/discussions/7){target="_blank"}
-## Guidance
+Yet still some doubts persist. If a sponsor submits analytical results using Tool X, but the agency reviewer re-analyses the data with Tool Y and sees a different result, who owns reconciling the differences? The sponsor, via a time-bound Information Request from the agency? The agency? The developer of Tool X? While most analyses use only one tool, this situation is likely to be a rare occurrence. But as the toolset available to both sponsors and agencies becomes wider, resolving these questions is likely to come up more often.
+
+And what about agencies who typically do not re-analyse sponsor data? How can the industry provide reassurance and proof that analytical environments are validated: accurate, reproducible and traceable? When installation of software into an environment is a one-step process, we can be fairly sure of consistency across analysts. But if there are multiple steps involved, and the software can change almost daily, then how do we ensure this consistency and reproducibility? This is not just the job of IT organisations within sponsor companies; it is also the responsibility of the individual analyst to ensure that they are working in compliance with organisational processes and in validated environments. Audit trails and traceability help ensure this, but it remains a potential source of doubt for both sponsor and regulatory authority.
+
+## How to Contribute
+
+Contribute to the discussion here in GitHub Discussions:\
+[Will the **FDA** accept data and analyses generated with solutions developed and available as open source?](https://github.com/phuse-org/OSTCDA/discussions/6){target="_blank"}
+
+All contributions should:
+
+- Provide your thoughts and perspectives
+- Provide references to articles, webinars, presentations (citations, links)
+- Be respectful in this community
+
+## References
+
+Bell, B. "Issues with Open Source Statistical Software in Industry: Validation, Legal Issues, and Regulatory Requirements". ASA JSM 2006.
+
+Soukup, M. "Using R: Perspectives of a FDA Statistical RevieweR". useR! 2007.
+
+U.S. Food & Drug Administration. (2015, May 6). Statistical Software Clarifying Statement. Retrieved from FDA.gov: