diff --git a/.RData b/.RData index 34ceee8..b096994 100644 Binary files a/.RData and b/.RData differ diff --git a/README.md b/README.md index 99f2f14..29d45e7 100644 --- a/README.md +++ b/README.md @@ -2,55 +2,42 @@ Data-Driven Optimization of Mass Spectrometry Methods +![Python Package](https://github.com/SlavovLab/DO-MS-DIA/actions/workflows/python-package.yml/badge.svg) ![GitHub release](https://img.shields.io/github/release/SlavovLab/DO-MS.svg) ![GitHub](https://img.shields.io/github/license/SlavovLab/DO-MS.svg) * [Website](https://do-ms.slavovlab.net) * [Get started now](#getting-started) * [Download](https://github.com/SlavovLab/DO-MS/releases/latest) -* [Manuscript](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039) +* [bioRxiv Preprint](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1) - + ## Getting Started Please read our detailed getting started guides: -* [Getting started on the application](https://do-ms.slavovlab.net/docs/getting-started-application) -* [Getting started on the command-line](https://do-ms.slavovlab.net/docs/getting-started-command-line) +* [Getting started with DIA preprocessing](https://do-ms.slavovlab.net/docs/getting-started-preprocessing) +* [Getting started with DIA reports](https://do-ms.slavovlab.net/docs/getting-started-dia-app) +* [Getting started with DDA reports](https://do-ms.slavovlab.net/docs/getting-started-dda-app) ### Requirements - -This application has been tested on R >= 4.0.0 and R <= 4.0.2, OSX 10.14 / Windows 7/8/10. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16 and MaxQuant < 2.0 +This application has been tested on R >= 3.5.0, OSX 10.14 / Windows 7/8/10/11. 
R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16 and DIA-NN > 1.8.1. The application suffers from visual glitches when displayed on unsupported older browsers (such as IE9 commonly packaged with RStudio on Windows). Please use IE >= 11, Firefox, or Chrome for the best user experience. -### Installation - -Install this application by downloading it from the [release page](https://github.com/SlavovLab/DO-MS/releases/latest). - -### Running +### Running the Interactive Application The easiest way to run the app is directly through RStudio, by opening the `DO-MS.Rproj` Rproject file - +![](https://do-ms.slavovlab.net/assets/images/do-ms-proj.png){: width="70%" .center-image} and clicking the "Run App" button at the top of the application, after opening the `server.R` file. We recommend checking the "Run External" option to open the application in your default browser instead of the RStudio Viewer. - +![](https://do-ms.slavovlab.net/assets/images/do-ms-run.png){: width="70%" .center-image} You can also start the application by running the `start_server.R` script. -### Automated Report Generation - -You can automatically generate PDF/HTML reports without having to launch the server by running the `do-ms_cmd.R` script, like so: - -``` -$ Rscript do-ms_cmd.R config_file.yaml -``` - -This requires a configuration file, and you can [find an example one here](https://github.com/SlavovLab/DO-MS/blob/master/example/config_file.yaml). See [Automating Report Generation](https://do-ms.slavovlab.net/docs/automation) for more details and instructions. - ### Customization DO-MS is designed to be easily user-customizable for in-house proteomics workflows. Please see [Building Your Own Modules](https://do-ms.slavovlab.net/docs/build-your-own) for more details. 
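Later in this diff, `do-ms_cmd.R` builds experiment labels by substituting placeholders from the `exp_name_format` config field (`%i` = experiment index, `%f` = folder name, `%e` = raw file name), optionally applying an `exp_name_pattern` extraction regex with a `'default'` fallback for files the pattern does not match. A rough Python sketch of that logic, for illustration only (the function name and signature are hypothetical; DO-MS itself implements this in R with `str_replace`):

```python
import re

def build_labels(raw_files, folders, name_format="%e", name_pattern=None):
    """Hypothetical sketch of DO-MS experiment labeling: substitute
    %i (index), %f (folder) and %e (raw file name), then optionally
    apply an extraction regex, falling back to 'default' on no match."""
    labels = []
    for i, (raw, folder) in enumerate(zip(raw_files, folders), start=1):
        label = (name_format
                 .replace("%i", str(i))
                 .replace("%f", folder)
                 .replace("%e", raw))
        if name_pattern is not None:
            m = re.search(name_pattern, label)
            # do-ms_cmd.R assigns 'default' to names the pattern fails on
            label = m.group(0) if m else "default"
        labels.append(label)
    return labels

print(build_labels(["wGW027", "wGW028"], ["exp1", "exp2"], name_format="%i: %e"))
# ['1: wGW027', '2: wGW028']
```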
@@ -59,9 +46,9 @@ DO-MS is designed to be easily user-customizable for in-house proteomics workflo Please see [Hosting as a Server](https://do-ms.slavovlab.net/docs/hosting-as-server) for more details. -### Search Engines Other Than MaxQuant +### Supporting Other Search Engines -This application is currently maintained for MaxQuant >= 1.6.0.16. Adapting to other search engines is possible but not provided out-of-the-box. Please see [Integrating Other Search Engines ](https://do-ms.slavovlab.net/docs/other-search-engines) for more details. +This application is currently maintained for [MaxQuant](https://www.nature.com/articles/nbt.1511) >= 1.6.0.16 and [DIA-NN](https://www.nature.com/articles/s41592-019-0638-x) >= 1.8. Adapting to other search engines is possible but not provided out-of-the-box. Please see [Integrating Other Search Engines](https://do-ms.slavovlab.net/docs/other-search-engines) for more details. ### Can I use this for Metabolomics, Lipidomics, etc... ? @@ -71,13 +58,8 @@ While the base library of modules are based around bottom-up proteomics by LC-MS ## About the project - - The manuscript for this tool is published at the Journal of Proteome Research: [https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039) - -The manuscript is also freely available on bioRxiv: [https://www.biorxiv.org/content/10.1101/512152v1](https://www.biorxiv.org/content/10.1101/512152v1). +The manuscript for the extended version 2.0 can be found on bioRxiv: [https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1). Contact the authors by email: [nslavov\{at\}northeastern.edu](mailto:nslavov@northeastern.edu). @@ -87,21 +69,11 @@ DO-MS is distributed by an [MIT license](https://github.com/SlavovLab/DO-MS/blob ### Contributing -Please feel free to contribute to this project by opening an issue or pull request. 
- - - - +Please feel free to contribute to this project by opening an issue or pull request in the [GitHub repository](https://github.com/SlavovLab/DO-MS). ------------- ## Help! -For any bugs, questions, or feature requests, +For any bugs, questions, or feature requests, please use the [GitHub issue system](https://github.com/SlavovLab/DO-MS/issues) to contact the developers. diff --git a/do-ms_cmd.R b/do-ms_cmd.R old mode 100755 new mode 100644 index 9a53d1c..033e884 --- a/do-ms_cmd.R +++ b/do-ms_cmd.R @@ -244,7 +244,7 @@ for(i in 1:length(config[['load_misc_input_files']])) { prnt(paste0('Loading misc file: ', name)) # read in as data frame (need to convert from tibble) - data[[name]] <- as.data.frame(read_tsv(file=path)) + data[[name]] <- as.data.frame(read_tsv(file=path, col_types = cols())) # rename columns (replace whitespace or special characters with '.') colnames(data[[name]]) <- gsub('\\s|\\(|\\)|\\/|\\[|\\]', '.', colnames(data[[name]])) @@ -363,6 +363,8 @@ for(f in config[['load_input_files']]) { # sort the raw files raw_files <- sort(raw_files) + + # load naming format file_levels <- rep(config[['exp_name_format']], length(raw_files)) @@ -377,8 +379,12 @@ file_levels <- str_replace(file_levels, '\\%i', as.character(seq(1, length(raw_f # folder name is stored as the names of the raw files vector file_levels <- str_replace(file_levels, '\\%f', names(raw_files)) + # replace %e with the raw file name file_levels <- str_replace(file_levels, '\\%e', raw_files) +print(file_levels) + +print(config[['exp_name_pattern']]) # apply custom string extraction expression to file levels if(!is.null(config[['exp_name_pattern']])) { @@ -387,11 +393,13 @@ if(!is.null(config[['exp_name_pattern']])) { file_levels[is.na(file_levels)] <- 'default' } +print(file_levels) + # apply custom names, as defined in the "exp_names" config field if(!is.null(config[['exp_names']]) & length(config[['exp_names']]) > 0) { file_levels[1:length(config[['exp_names']])] <- 
config[['exp_names']] } - +print(file_levels) # ensure there are no duplicate names # if so, then append a suffix to duplicate names to prevent refactoring errors @@ -477,4 +485,4 @@ generate_report(input, f_data, raw_files, config[['output']], progress_bar=FALSE # prnt(paste0('Report written to: ', config[['output']])) -prnt('Done!') +prnt('Done!') \ No newline at end of file diff --git a/docs/Gemfile b/docs/Gemfile index 3be9c3c..457656e 100644 --- a/docs/Gemfile +++ b/docs/Gemfile @@ -1,2 +1,6 @@ source "https://rubygems.org" gemspec + +gem "rexml", "~> 3.2" + +gem "webrick", "~> 1.8" diff --git a/docs/_config.yml b/docs/_config.yml index b3ce700..3eccc36 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -26,17 +26,15 @@ search_enabled: true # Aux links for the upper right navigation aux_links: - "DO-MS Preprint": - - "https://www.biorxiv.org/content/10.1101/512152v1" - "JPR Article": - - "https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039" + "bioRxiv Preprint": + - "https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1" "GitHub Repository": - "https://github.com/SlavovLab/DO-MS" "Slavov Lab": - "http://slavovlab.net" github_link: https://github.com/SlavovLab/DO-MS -preprint_link: https://www.biorxiv.org/content/10.1101/512152v1 +preprint_link: https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1 jpr_link: https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039 # Color scheme currently only supports "dark" or nil (default) diff --git a/docs/assets/images/dia_input.png b/docs/assets/images/dia_input.png new file mode 100644 index 0000000..b9b341c Binary files /dev/null and b/docs/assets/images/dia_input.png differ diff --git a/docs/assets/images/do-ms-dia-example.png b/docs/assets/images/do-ms-dia-example.png new file mode 100644 index 0000000..8d4ae3f Binary files /dev/null and b/docs/assets/images/do-ms-dia-example.png differ diff --git a/docs/assets/images/do-ms-dia-foldername.png b/docs/assets/images/do-ms-dia-foldername.png new file 
mode 100644 index 0000000..cf4b699 Binary files /dev/null and b/docs/assets/images/do-ms-dia-foldername.png differ diff --git a/docs/assets/images/do-ms-dia-generate-report.png b/docs/assets/images/do-ms-dia-generate-report.png new file mode 100644 index 0000000..b9632e3 Binary files /dev/null and b/docs/assets/images/do-ms-dia-generate-report.png differ diff --git a/docs/assets/images/do-ms-dia-import.png b/docs/assets/images/do-ms-dia-import.png new file mode 100644 index 0000000..e4d42a6 Binary files /dev/null and b/docs/assets/images/do-ms-dia-import.png differ diff --git a/docs/assets/images/do-ms-dia-info.png b/docs/assets/images/do-ms-dia-info.png new file mode 100644 index 0000000..a77dbbb Binary files /dev/null and b/docs/assets/images/do-ms-dia-info.png differ diff --git a/docs/assets/images/do-ms-dia-load.png b/docs/assets/images/do-ms-dia-load.png new file mode 100644 index 0000000..5c2decb Binary files /dev/null and b/docs/assets/images/do-ms-dia-load.png differ diff --git a/docs/assets/images/do-ms-dia-overview.png b/docs/assets/images/do-ms-dia-overview.png new file mode 100644 index 0000000..6079f1c Binary files /dev/null and b/docs/assets/images/do-ms-dia-overview.png differ diff --git a/docs/assets/images/do-ms-dia-pathname.png b/docs/assets/images/do-ms-dia-pathname.png new file mode 100644 index 0000000..b4b2b57 Binary files /dev/null and b/docs/assets/images/do-ms-dia-pathname.png differ diff --git a/docs/assets/images/do-ms-dia-rename.png b/docs/assets/images/do-ms-dia-rename.png new file mode 100644 index 0000000..debb1eb Binary files /dev/null and b/docs/assets/images/do-ms-dia-rename.png differ diff --git a/docs/assets/images/do-ms-dia-results.png b/docs/assets/images/do-ms-dia-results.png new file mode 100644 index 0000000..a130b41 Binary files /dev/null and b/docs/assets/images/do-ms-dia-results.png differ diff --git a/docs/assets/images/do-ms-dia_config.png b/docs/assets/images/do-ms-dia_config.png new file mode 100644 index 
0000000..1d79228 Binary files /dev/null and b/docs/assets/images/do-ms-dia_config.png differ diff --git a/docs/assets/images/do-ms-dia_mode.png b/docs/assets/images/do-ms-dia_mode.png new file mode 100644 index 0000000..0ab4770 Binary files /dev/null and b/docs/assets/images/do-ms-dia_mode.png differ diff --git a/docs/assets/images/do-ms-dia_title.png b/docs/assets/images/do-ms-dia_title.png new file mode 100755 index 0000000..276d1d2 Binary files /dev/null and b/docs/assets/images/do-ms-dia_title.png differ diff --git a/docs/assets/images/do-ms-dia_title_v2.png b/docs/assets/images/do-ms-dia_title_v2.png new file mode 100644 index 0000000..9226b55 Binary files /dev/null and b/docs/assets/images/do-ms-dia_title_v2.png differ diff --git a/docs/assets/images/preprocessing_input.png b/docs/assets/images/preprocessing_input.png new file mode 100644 index 0000000..49ac6e3 Binary files /dev/null and b/docs/assets/images/preprocessing_input.png differ diff --git a/docs/assets/images/preprocessing_output.png b/docs/assets/images/preprocessing_output.png new file mode 100644 index 0000000..a70bd6b Binary files /dev/null and b/docs/assets/images/preprocessing_output.png differ diff --git a/docs/docs/DO-MS_examples.md b/docs/docs/DO-MS_examples.md index 2e85552..1de564b 100644 --- a/docs/docs/DO-MS_examples.md +++ b/docs/docs/DO-MS_examples.md @@ -1,7 +1,7 @@ --- layout: default title: DO-MS Examples -nav_order: 3 +nav_order: 5 permalink: docs/DO-MS_examples parent: Getting Started --- @@ -10,6 +10,13 @@ parent: Getting Started Bellow are links to example DO-MS Reports +## DIA-NN DIA Reports +1. [MS2 number optimization](https://drive.google.com/uc?id=1mNrJsk6uaI3ljtQEadr0XwR5mlyO4xZg&export=download) +2. [Different MS2 strategies](https://drive.google.com/uc?id=1eOJXC2Zb0lmVwbsaj7Zlf6wm3SPGrbWl&export=download) +3. [Gradient length optimization](https://drive.google.com/uc?id=1dQdo3kR-WxWT8zEeOxztxXmJE1rSMsEM&export=download) +4. 
[MS1 sampling optimization](https://drive.google.com/uc?id=1YcT3al9ICu36_ornSsLZgHj1eTBBPOxb&export=download) +5. [Single cell quality control](https://drive.google.com/uc?id=1v62qJ8JVZ_TSkUa3K0i6nWoJDbW9EQHO&export=download) +## MaxQuant DDA Reports 1. [Isobaric carrier optimization](https://scope2.slavovlab.net/mass-spec/Isobaric-carrier-optimization#do-ms-reports) 2. [Technical improvements of SCoPE2](https://scope2.slavovlab.net/mass-spec/Increased-accuracy-and-throughput) diff --git a/docs/docs/getting_started.md b/docs/docs/getting_started.md index ea238bb..290e93d 100644 --- a/docs/docs/getting_started.md +++ b/docs/docs/getting_started.md @@ -8,4 +8,19 @@ has_children: true # Getting Started -DO-MS can be run either from the command-line or as interactive application. Follow the links below to get started using the implementation of your choice. For more details on the data display, read the [DO-MS article](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039). +DO-MS can be run either from the command line or as an interactive application. Follow the links below to get started using the implementation of your choice. For more details on the data display, read the [DO-MS 2.0 article](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1). + +Before starting DO-MS for the first time, the input data type has to be selected. DO-MS can work with both DDA results from MaxQuant and DIA results from DIA-NN. +The mode can be set in the config.yaml file. Open the file in RStudio or your editor of choice. +![]({{site.baseurl}}/assets/images/do-ms-dia_mode.png){: width="70%" .center-image} + +If you wish to analyze MaxQuant DDA results, change the parameter to `max_quant`; otherwise leave it as `dia-nn`. This setting needs to be changed before DO-MS is started or the R environment is initialized. It is also possible to keep both versions in two separate folders simultaneously. 
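The diff does not show the exact key used in `config.yaml`, so the following is a purely hypothetical sketch of what the entry might look like; check the comments in your copy of `config.yaml` for the real key name (the accepted values `max_quant` and `dia-nn` are from the text above):

```yaml
# Input data type: "dia-nn" for DIA-NN results (the default),
# or "max_quant" for MaxQuant DDA results.
input_format: dia-nn
```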
+![]({{site.baseurl}}/assets/images/do-ms-dia_config.png){: width="70%" .center-image} + +# Generating DO-MS Reports +Please read our detailed getting started guides: + +* [DIA Preprocessing]({{site.baseurl}}/docs/getting-started-preprocessing) +* [DIA Reports using the app]({{site.baseurl}}/docs/getting-started-dia-app) +* [DDA Reports using the app]({{site.baseurl}}/docs/getting-started-dda-app) +* [DDA Reports using the command line]({{site.baseurl}}/docs/getting-started-command-line) \ No newline at end of file diff --git a/docs/docs/getting_started_app.md b/docs/docs/getting_started_app.md index e207619..3d6c702 100644 --- a/docs/docs/getting_started_app.md +++ b/docs/docs/getting_started_app.md @@ -1,12 +1,12 @@ --- layout: default -title: Interactive Application -nav_order: 1 -permalink: docs/getting-started-application +title: DDA Reports in the App +nav_order: 3 +permalink: docs/getting-started-dda-app parent: Getting Started --- -# Getting Started -- Interactive DO-MS Application +# Getting Started -- DDA Reports in the App DO-MS is an application to visualize mass-spec data both in an interactive application and static reports generated via. the command-line. In this document we'll walk you through analyzing an example dataset in the interactive application. @@ -31,13 +31,8 @@ The only constraint for data in DO-MS is that it must be from MaxQuant version > ## Installation -Download the application via. a zip or tar archive from the [GitHub release page](https://github.com/SlavovLab/DO-MS/releases). Unzip the archive, and then open the `DO-MS.Rproj` to load the project into RStudio - -![]({{site.baseurl}}/assets/images/do-ms-proj.png){: width="60%" .center-image} - -To start the app open `server.R` in RStudio and on the top right corner of the editor click on the "Run App" button. To run the application in your browser (preferred option) rather than in RStudio, click the dropdown menu and select "Run External". 
- -![]({{site.baseurl}}/assets/images/do-ms-run.png){: width="85%" .center-image} +Please make sure that you installed DO-MS as described in the [installation]({{site.baseurl}}/docs/installation) section. +Note that DO-MS has to be configured for MaxQuant data. [Find out more]({{site.baseurl}}/docs/getting-started). ## Data Import diff --git a/docs/docs/getting_started_cmd.md b/docs/docs/getting_started_cmd.md index 655ca80..81ae9dc 100644 --- a/docs/docs/getting_started_cmd.md +++ b/docs/docs/getting_started_cmd.md @@ -1,7 +1,7 @@ --- layout: default -title: Command Line -nav_order: 2 +title: DO-MS Command Line +nav_order: 4 permalink: docs/getting-started-command-line parent: Getting Started --- diff --git a/docs/docs/getting_started_dia_app.md b/docs/docs/getting_started_dia_app.md new file mode 100644 index 0000000..91d8b8e --- /dev/null +++ b/docs/docs/getting_started_dia_app.md @@ -0,0 +1,100 @@ +--- +layout: default +title: DIA Reports in the App +nav_order: 2 +permalink: docs/getting-started-dia-app +parent: Getting Started +--- + +# Getting Started -- DIA Reports in the App + +DO-MS is an application to visualize mass-spec data both in an interactive application and static reports generated via the command line. In this document we'll walk you through analyzing an example dataset in the interactive application. + +**Table of Contents** + +1. [Example Data](#example-data) +2. [Installation](#installation) +3. [Data Import](#data-import) + 1. [Adding Folders](#adding-folders) + 2. [Renaming Experiments](#renaming-experiments) +4. [Interacting with Modules](#interacting-with-modules) +5. [Generate Report](#generate-report) + + + +## Example Data + +We have provided an example data set online, which contains parts of the MS2 number optimization in the paper. 
This data can also be obtained by following the preprocessing guide or can be downloaded as a zip file here: [https://drive.google.com/file/d/1BzWVKghIThtgYItgGy9vt6M214mHB74q/view?usp=share_link](https://drive.google.com/file/d/1BzWVKghIThtgYItgGy9vt6M214mHB74q/view?usp=share_link). + + +![]({{site.baseurl}}/assets/images/do-ms-dia-example.png){: width="70%" .center-image} + +## Installation + +Please make sure that you installed DO-MS as described in the [installation]({{site.baseurl}}/docs/installation) section. + + +## Data Import +Make sure DO-MS has been configured for DIA-NN data. [Find out more]({{site.baseurl}}/docs/getting-started). + +![]({{site.baseurl}}/assets/images/do-ms-dia-overview.png){: width="90%" .center-image} + +### Adding Folders + +DO-MS is designed to load _folders_ of analyses rather than individual files. To allow quick access to your analyses, as well as to allow analyzing multiple searches simultaneously, DO-MS provides a searchable "folder table" for all of your analyses. We will start by loading the folder with the example output. + +Start by clicking the "Add Folder" button at the top of the table + +![]({{site.baseurl}}/assets/images/do-ms-dia-import.png){: width="70%" .center-image} + +Then add the path of your folder into the textbox, as shown: + +![]({{site.baseurl}}/assets/images/do-ms-dia-foldername.png){: width="70%" .center-image} + +A folder path is the folder's absolute location on your machine. On Windows, you can get the folder path by navigating to it in Explorer, clicking on the top file path bar, and copying the resulting text with Ctrl+C. On Mac/OSX, you can get the folder path by right-clicking on the folder while pressing the 'option' key. + +![]({{site.baseurl}}/assets/images/do-ms-dia-pathname.png){: width="70%" .center-image} + +Note that in the example above we checked "Add Single Folder" to add just the folder path we pasted in. 
If, for example, you have a folder that contains many MaxQuant searches, you can select "Add Child Folders" to add all subfolders of the path specified, or "Add Recursively" to add _all_ folders that are below the path specified. + +Click "Confirm" and now you should see your folder added to the folder table. When you have multiple folders loaded into the table you can select more than one. + +![]({{site.baseurl}}/assets/images/do-ms-dia-load.png){: width="70%" .center-image} + +Finally, load the files from the selected folders by scrolling down and clicking on the big "Load Data" button. + +### Renaming Experiments + +An important aspect of data visualization is easy-to-read labels for your experiments. DO-MS provides an accessible interface for renaming your raw file names so that generated figures will be easier to interpret. + +Once your data is loaded, scroll down to the "Renaming Experiments" section. Here you will find a table of your loaded raw files and their associated "labels" that will be used when plotting. Double-click a label and enter the new label: + +![]({{site.baseurl}}/assets/images/do-ms-dia-rename.png){: width="100%" .center-image} + +Please rename all experiments as follows: + +``` +wGW027 => 4 MS2 +wGW028 => 8 MS2 +wGW029 => 10 MS2 +``` + +## Interacting with Modules + +Each module has a short description that can be accessed by hovering over the question mark icon next to the module title. + +![]({{site.baseurl}}/assets/images/do-ms-dia-info.png){: width="70%" .center-image} + +You can download the module plot as a PNG or PDF by clicking on the download buttons below each module plot. The underlying data used for the plot can also be downloaded by clicking on the "Download Data" button. This tab-delimited file can be imported into many other visualization packages. + +## Generate Report + +Click on "Generate Report" in the sidebar to access the report generation page. 
+ +![]({{site.baseurl}}/assets/images/do-ms-dia-generate-report.png){: width="70%" .center-image} + +Here you will find some options to customize your report. While we support PDF reports and PDF images, we strongly recommend that you generate your reports in HTML format with PNG image plots. Other configurations may result in graphical glitches or unwanted behavior. In addition, we recommend that you check your `pandoc` installation ([more details here]({{site.baseurl}}/docs/known-issues#pandoc-not-found)) as any issues will prevent the report generation. + +Click the "Download Report" button to begin generating the report. This takes a while as all plots have to be remade. A progress bar at the bottom of the page informs you of the progress. + +All images in the report are embedded in the markup, so feel free to share this single file with your colleagues/collaborators without having to include anything else. \ No newline at end of file diff --git a/docs/docs/getting_started_preprocessing.md b/docs/docs/getting_started_preprocessing.md new file mode 100644 index 0000000..1e9b6bf --- /dev/null +++ b/docs/docs/getting_started_preprocessing.md @@ -0,0 +1,130 @@ +--- +layout: default +title: DIA Preprocessing +nav_order: 1 +permalink: docs/getting-started-preprocessing +parent: Getting Started +--- + +# Getting Started -- DIA Preprocessing + +DO-MS is an application to visualize mass-spec data both in an interactive application and static reports generated via the command line. In this document we'll walk you through preprocessing an example dataset. + +**Table of Contents** + +1. [Example Data](#example-data) +2. [Installation](#installation) +3. [Processing Raw Data](#processing-raw-data) +4. [Command Line Interface](#command-line-interface) + +## Example Data +We have provided an example data set online, which contains parts of the MS2 number optimization in the paper. 
You can download a .zip bundle of it here: [https://drive.google.com/file/d/1bjFzKqTFLk7ECUJOTy8LNxsCOD0Xf96Q/view?usp=share_link](https://drive.google.com/file/d/1bjFzKqTFLk7ECUJOTy8LNxsCOD0Xf96Q/view?usp=share_link). The contents of the archive are the main DIA-NN report and the corresponding raw files. + +Your folder should look like this: +![]({{site.baseurl}}/assets/images/preprocessing_input.png){: width="70%" .center-image} + +## Installation +Please make sure that you installed DO-MS as described in the [installation]({{site.baseurl}}/docs/installation) section. To use the preprocessing pipeline, it is necessary to install ThermoRawFileParser and the Dinosaur feature detection tool. + +## Processing Raw Data +Open a terminal and navigate to the base folder of your DO-MS installation. +Make sure that your DO-MS environment is set up and activate it. +``` +conda activate doms +``` + +For processing, the pipeline module located at `pipeline/processing.py` is called with the following parameters. + +```bash +python pipeline/processing.py /location/to/example/report_filtered.tsv +``` + +The following additional options will be included: +```bash +# Activate Mono if using Mac or Linux. Mono is required to run the Thermo Raw File Parser on Linux and OSX. 
+-m +# location of the ThermoRawFileParser executable +--raw-parser-location /location/to/ThermoRawFileParser1.4.2/ThermoRawFileParser.exe +# location of the Dinosaur .jar file +--dinosaur-location /location/to/Dinosaur-1.2.0.free.jar +# location of the example raw data +-r /location/to/example +``` + +The full command needs to be a single line and will look like: +```bash +python pipeline/processing.py /location/to/example/report_filtered.tsv -m --raw-parser-location /location/to/ThermoRawFileParser1.4.2/ThermoRawFileParser.exe --dinosaur-location /location/to/Dinosaur-1.2.0.free.jar -r /location/to/example +``` + +After processing, the additional files should be part of your folder: +![]({{site.baseurl}}/assets/images/preprocessing_output.png){: width="70%" .center-image} + +Temporary .mzML files can be deleted. + +## Command Line Interface +The documentation for the various command line options can be found by typing `python pipeline/processing.py -h` + +```bash +usage: processing.py [-h] --raw-parser-location RAW_PARSER_LOCATION + [--dinosaur-location DINOSAUR_LOCATION] [-m] [-d] [-v] + [-t TEMPORARY_FOLDER] [-r RAW_FILE_LOCATION] + [--no-feature-detection] [--no-fill-times] [--no-tic] + [--no-sn] [--no-mzml-generation] + [--mz-bin-size MZ_BIN_SIZE] [--rt-bin-size RT_BIN_SIZE] + [--resolution RESOLUTION] [-p PROCESSES] [--isotopes-sn] + report + +Command line tool for feature detection in shotgun MS experiments. Can be used +together with DIA-NN to provide additional information on the peptide-like +features identified in the MS1 spectra. + +positional arguments: + report Location of the report.tsv output from DIA-NN which + should be used for analysis. + +options: + -h, --help show this help message and exit + --raw-parser-location RAW_PARSER_LOCATION + Path pointing to the ThermoRawFileParser executable. 
+ --dinosaur-location DINOSAUR_LOCATION + Path pointing to the Dinosaur jar executable. + -m, --mono Use mono for ThermoRawFileParser under Linux and OSX. + -d, --delete Delete generated mzML and copied raw files after + successful feature generation. + -v, --verbose Show verbose output. + -t TEMPORARY_FOLDER, --temporary-folder TEMPORARY_FOLDER + Input Raw files will be temporarily copied to this + folder. Required for use with Google drive. + -r RAW_FILE_LOCATION, --raw-file-location RAW_FILE_LOCATION + By default, raw files are loaded based on the + File.Name column in the report.tsv. With this option, + a different folder can be specified. + --no-feature-detection + All steps are performed as usual but Dinosaur feature + detection is skipped. No features.tsv file will be + generated. + --no-fill-times All steps are performed as usual but fill times are + not extracted. No fill_times.tsv file will be + generated. + --no-tic All steps are performed as usual but binned TIC is not + extracted. No tic.tsv file will be generated. + --no-sn Signal to Noise ratio is not estimated for precursors + --no-mzml-generation Raw files are not converted to .mzML. Nevertheless, + mzML files are expected in their theoretical output + location and loaded. Should only be used carefully for + repeated calculations or debugging. + --mz-bin-size MZ_BIN_SIZE + Bin size over the mz dimension for TIC binning. + --rt-bin-size RT_BIN_SIZE + Bin size over the RT dimension for TIC binning in + minutes. If a bin size of 0 is provided, binning will + not be applied and TIC is given per scan. + --resolution RESOLUTION + Set the resolution used for estimating counts from S/N + data + -p PROCESSES, --processes PROCESSES + Number of Processes + --isotopes-sn Use all isotopes from the same scan as the highest + intensity datapoint for estimating the SN and copy + number. 
+``` \ No newline at end of file diff --git a/docs/docs/installation.md b/docs/docs/installation.md new file mode 100644 index 0000000..6e5c877 --- /dev/null +++ b/docs/docs/installation.md @@ -0,0 +1,122 @@ +--- +layout: default +title: Installation +nav_order: 2 +permalink: docs/installation +--- + +**Table of Contents** + +1. [Interactive DO-MS App](#interactive-do-ms-app) + 1. [Installation](#installation) + 2. [Running](#running) +2. [DIA Python Pipeline](#dia-python-pipeline) + 1. [Installation](#installation) + 2. [Running](#running) + 3. [Setup Custom Script](#setup-custom-script) + 1. [Windows](#windows) + 2. [MacOS](#macos) + +# Interactive DO-MS App + +## Installation + +This application has been tested on R >= 3.5.0, OSX >= 10.14 / Windows 7/8/10/11. Make sure you have the most recent version of R and RStudio installed. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16. + +The application suffers from visual glitches when displayed on unsupported older browsers (such as IE9 commonly packaged with RStudio on Windows). Please use IE >= 11, Firefox, or Chrome for the best user experience. + +## Running + +The easiest way to run the app is directly through RStudio, by opening the `DO-MS.Rproj` R project file + +![]({{site.baseurl}}/assets/images/do-ms-proj.png){: width="70%" .center-image} + +and clicking the "Run App" button at the top of the application, after opening the `server.R` file. We recommend checking the "Run External" option to open the application in your default browser instead of the RStudio Viewer. + +![]({{site.baseurl}}/assets/images/do-ms-run.png){: width="70%" .center-image} + +You can also start the application by running the `start_server.R` script. 
+
+More information on using the DO-MS application is provided [here]({{site.baseurl}}/docs/getting-started-application).
+
+# DIA Python Pipeline
+
+## Installation
+
+1. Please make sure that [Conda](https://docs.conda.io/en/latest/) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) is installed. Use the provided conda configuration to create the environment with all required dependencies.
+```
+conda env create -f pipeline/env.yml
+```
+
+2. Activate the environment and check that the command can be run.
+```
+conda activate doms
+python pipeline/processing.py -h
+```
+
+3. For automatic conversion of Thermo Raw files to the open mzML format, ThermoRawFileParser (Hulstaert et al. 2020) is required. Download the latest release of the [ThermoRawFileParser](https://github.com/compomics/ThermoRawFileParser) (version v1.4.0 or newer) and write down the location of the ```ThermoRawFileParser.exe``` file. Under OSX and Linux, [Mono](https://www.mono-project.com/download/stable/) is required. Please make sure to use the option ```-m``` with the feature detection, which tells the script to use Mono.
+
+4. For feature detection, Dinosaur (Teleman et al. 2016) is used. Download the latest release of Dinosaur from [GitHub](https://github.com/fickludd/dinosaur) and install Java as recommended for your platform. Please write down the location of the ```Dinosaur-xx.jar``` file.
+
+5. Optionally, create a custom script for your system.
+
+## Running
+
+```python pipeline/processing.py -h```
+
+More information on using the python pipeline is provided [here]({{site.baseurl}}/docs/getting-started-preprocessing).
+
+## Custom Script for Preprocessing
+
+Using the feature detection requires the correct conda environment, ThermoRawFileParser location, and Dinosaur location. If the tool is used frequently, it is more convenient to combine the configuration in a script which is added to the system ```PATH```. 
This will register a local command which can be used anywhere on the system and allows default options to be set.
+
+### Windows
+
+1. Create a local folder, for example under ```C:\Users\xxx\Documents\bin```.
+
+2. Create a file named ```processing.bat``` with the following content. Make sure that all three file paths are changed to the corresponding locations on your system. Further default options can be added to this file if needed. An overview of all command line options can be found [here]({{site.baseurl}}/docs/getting-started-preprocessing).
+
+```
+@echo off
+conda activate doms & ^
+python C:\Users\xxx\pipeline\processing.py %* ^
+--dinosaur-location "C:\Users\xxx\dinosaur-1.2\Dinosaur-1.2.0.free.jar" ^
+--raw-parser-location "C:\Users\xxx\thermo_raw_file_parser_1.3.4\ThermoRawFileParser.exe"
+```
+
+3. Search for ```environment variables``` in the Windows search and click ```Edit the system environment variables```.
+
+4. Click ```Environment Variables``` in the bottom right.
+
+5. Select the variable ```Path``` in the upper panel labeled ```User variables ...``` and click ```Edit```.
+
+6. Click ```New``` and enter the location of the directory containing the ```processing.bat``` script.
+
+7. Now the feature processing, including the external tools, can be called from anywhere on the machine with the ```processing``` command.
+
+
+### MacOS
+
+1. Create a local folder, for example ```/Users/xxx/Documents/bin```.
+
+2. Create a file named ```processing``` with the following content. Make sure that all three file paths are changed to the corresponding locations on your system.
+```
+#!/bin/bash
+eval "$(conda shell.bash hook)"
+conda activate doms
+python /Users/xxx/pipeline/processing.py "$@" \
+-m \
+--dinosaur-location "/Users/xxx/Dinosaur/Dinosaur-1.2.0.free.jar" \
+--raw-parser-location "/Users/xxx/ThermoRawFileParser/ThermoRawFileParser.exe"
+```
+The ```eval``` line makes conda available to the script ([known issue](https://github.com/conda/conda/issues/7980)). 
Please note that the mono option ```-m``` is used by default. Further default options can be added to this file if needed. An overview of all command line options can be found [here]({{site.baseurl}}/docs/getting-started-preprocessing).
+
+3. Make the file executable with the following command: ```chmod +x processing```.
+
+4. Navigate to the location of your shell profile file in your home directory. This will be ```/Users/{username}/.zshrc``` for zsh and ```/Users/{username}/.bash_profile``` for bash on macOS, and ```/home/xxx/.bashrc``` for bash on Linux. Open the file in a text editor of your choice, for example vim: ```vim .bash_profile```. Go into edit mode by pressing ```i```.
+
+5. Add the line ```export PATH="/Users/xxx/Documents/bin:$PATH"``` to the end of the file, then save and exit by pressing ```ESC```, typing ```:wq```, and pressing ```Enter```.
+
+6. Restart your terminal. diff --git a/docs/docs/session_info.md b/docs/docs/session_info.md index 8bd9737..e01b150 100644 --- a/docs/docs/session_info.md +++ b/docs/docs/session_info.md @@ -7,16 +7,14 @@ permalink: docs/session-info # Session Info -DO-MS is tested on Windows (7/8/10), OSX (10.14.1), and Linux (Ubuntu 14.04). Development sessions and package versions are listed below: +DO-MS is tested on Windows (7/8/10/11), OSX (12.5.1), and Linux (Ubuntu 14.04). 
Development sessions and package versions are listed below: ``` -R version 3.5.2 (2018-12-20) -Platform: x86_64-apple-darwin15.6.0 (64-bit) -Running under: macOS Mojave 10.14.1 +R version 4.2.2 (2022-10-31) +Running under: macOS Monterey 12.5.1 Matrix products: default -BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib -LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib +LAPACK: /opt/homebrew/Cellar/r/4.2.2/lib/R/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 @@ -25,17 +23,34 @@ attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: - [1] argparse_2.0.0 knitr_1.21 bindrcpp_0.2.2 yaml_2.2.0 stringr_1.3.1 - [6] DT_0.5 rmarkdown_1.11 readr_1.3.1 RColorBrewer_1.1-2 reshape2_1.4.3 -[11] lattice_0.20-38 ggplot2_3.1.0 dplyr_0.7.8 shinyWidgets_0.4.4 shinydashboard_0.7.1 -[16] pacman_0.5.0 shiny_1.2.0 + [1] htmltools_0.5.3 viridis_0.6.2 MASS_7.3-58.1 + [4] ggpubr_0.5.0 viridisLite_0.4.1 yaml_2.3.5 + [7] stringr_1.4.1 DT_0.24 rmarkdown_2.16 +[10] readr_2.1.2 reshape2_1.4.4 tibble_3.1.8 +[13] knitr_1.40 lattice_0.20-45 ggplot2_3.4.0 +[16] tidyr_1.2.0 dplyr_1.0.10 shinydashboard_0.7.2 +[19] shinyWidgets_0.7.3 pacman_0.5.1 shiny_1.7.2 loaded via a namespace (and not attached): - [1] tidyselect_0.2.5 xfun_0.4 purrr_0.2.5 colorspace_1.4-0 getopt_1.20.2 htmltools_0.3.6 utf8_1.1.4 - [8] rlang_0.3.1 later_0.7.5 pillar_1.3.1 glue_1.3.0 withr_2.1.2 bindr_0.1.1 plyr_1.8.4 -[15] findpython_1.0.4 munsell_0.5.0 gtable_0.2.0 htmlwidgets_1.3 evaluate_0.12 labeling_0.3 httpuv_1.4.5.1 -[22] crosstalk_1.0.0 fansi_0.4.0 highr_0.7 Rcpp_1.0.0 xtable_1.8-3 promises_1.0.1 scales_1.0.0 -[29] jsonlite_1.6 mime_0.6 hms_0.4.2 digest_0.6.18 stringi_1.2.4 grid_3.5.2 cli_1.0.1 -[36] tools_3.5.2 magrittr_1.5 lazyeval_0.2.1 tibble_2.0.1 crayon_1.3.4 pkgconfig_2.0.2 assertthat_0.2.0 -[43] rstudioapi_0.9.0 
R6_2.3.0 compiler_3.5.2 + [1] sass_0.4.2 bit64_4.0.5 vroom_1.5.7 + [4] jsonlite_1.8.0 carData_3.0-5 bslib_0.4.0 + [7] pillar_1.8.1 backports_1.4.1 glue_1.6.2 +[10] digest_0.6.29 promises_1.2.0.1 ggsignif_0.6.3 +[13] colorspace_2.0-3 httpuv_1.6.5 plyr_1.8.7 +[16] pkgconfig_2.0.3 broom_1.0.1 purrr_0.3.4 +[19] xtable_1.8-4 scales_1.2.1 fontawesome_0.3.0 +[22] later_1.3.0 tzdb_0.3.0 farver_2.1.1 +[25] generics_0.1.3 car_3.1-1 ellipsis_0.3.2 +[28] cachem_1.0.6 withr_2.5.0 cli_3.5.0 +[31] magrittr_2.0.3 crayon_1.5.1 mime_0.12 +[34] memoise_2.0.1 evaluate_0.16 fansi_1.0.3 +[37] rstatix_0.7.2 tools_4.2.2 hms_1.1.2 +[40] lifecycle_1.0.3 munsell_0.5.0 compiler_4.2.2 +[43] jquerylib_0.1.4 rlang_1.0.6 grid_4.2.2 +[46] htmlwidgets_1.5.4 crosstalk_1.2.0 labeling_0.4.2 +[49] gtable_0.3.1 abind_1.4-5 R6_2.5.1 +[52] gridExtra_2.3 fastmap_1.1.0 bit_4.0.4 +[55] utf8_1.2.2 stringi_1.7.8 parallel_4.2.2 +[58] Rcpp_1.0.9 vctrs_0.5.1 tidyselect_1.2.0 +[61] xfun_0.32 ``` diff --git a/docs/index.md b/docs/index.md index 9e3b493..d2f82e2 100644 --- a/docs/index.md +++ b/docs/index.md @@ -12,33 +12,37 @@ permalink: / Data-Driven Optimization of Mass Spectrometry Methods {: .fs-6 .fw-300} +![Python Package](https://github.com/SlavovLab/DO-MS-DIA/actions/workflows/python-package.yml/badge.svg) ![GitHub release](https://img.shields.io/github/release/SlavovLab/DO-MS.svg) ![GitHub](https://img.shields.io/github/license/SlavovLab/DO-MS.svg) -[Get started now](#getting-started){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [Download](https://github.com/SlavovLab/DO-MS/releases/latest){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [JPR Article](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039){: .btn .fs-5 .mb-4 .mb-md-0 .mr-2 } [GitHub Repository](https://github.com/SlavovLab/DO-MS){: .btn .fs-5 .mb-4 .mb-md-0 } +[Get started now](#getting-started){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [Download](https://github.com/SlavovLab/DO-MS/releases/latest){: .btn .btn-primary .fs-5 
.mb-4 .mb-md-0 .mr-2 } [bioRxiv Preprint](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1){: .btn .fs-5 .mb-4 .mb-md-0 .mr-2 } [GitHub Repository](https://github.com/SlavovLab/DO-MS){: .btn .fs-5 .mb-4 .mb-md-0 } -![]({{site.baseurl}}/assets/images/TOC_Graphic_noOutline_abstract_v18.png){: width="70%" .center-image} +![]({{site.baseurl}}/assets/images/do-ms-dia_title_v2.png){: width="90%" .center-image} ## Aim of DO-MS -The performance of ultrasensitive liquid chromatography and tandem mass spectrometry (LC-MS/MS) methods, such as [single-cell proteomics by mass spectrometry](https://scope2.slavovlab.net/mass-spec/sensitive-mass-spectrometry-analysis), depends on multiple interdependent parameters. This interdependence makes it challenging to specifically pinpoint the sources of problems in the LC-MS/MS methods. For example, a low signal at the MS2 level can be due to poor LC separation, ionization, apex targeting, ion transfer, or ion detection. DO-MS aims to specifically diagnose such problems by interactively visualizing data from all levels of bottom-up LC-MS/MS analysis. +The performance of ultrasensitive liquid chromatography and tandem mass spectrometry (LC-MS/MS) methods, such as [single-cell proteomics by mass spectrometry](https://scope2.slavovlab.net/mass-spec/sensitive-mass-spectrometry-analysis) and [multiplexed data-independent acquisition experiments](https://plexdia.slavovlab.net/), depends on multiple interdependent parameters. This interdependence makes it challenging to specifically pinpoint the sources of problems in the LC-MS/MS methods. -## Getting Started +This applies to data-dependent acquisition (DDA) as well as to data-independent acquisition (DIA) experiments. For example, a low signal at the MS2 level in a DDA experiment can be due to poor LC separation, ionization, apex targeting, ion transfer, or ion detection. 
DO-MS aims to specifically diagnose such problems by interactively visualizing data from all levels of bottom-up LC-MS/MS analysis. -Please read our detailed getting started guides: -* [Getting started on the application]({{site.baseurl}}/docs/getting-started-application) -* [Getting started on the command-line]({{site.baseurl}}/docs/getting-started-command-line) +### Installation -### Requirements +Install this application by downloading it from the [release page](https://github.com/SlavovLab/DO-MS/releases/latest) and by following the [installation instructions]({{site.baseurl}}/docs/installation). -This application has been tested on R >= 3.5.0, OSX 10.14 / Windows 7/8/10. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16. -The application suffers from visual glitches when displayed on unsupported older browsers (such as IE9 commonly packaged with RStudio on Windows). Please use IE >= 11, Firefox, or Chrome for the best user experience. +## Getting Started +Please read our detailed getting started guides: +* [Getting started with DIA preprocessing]({{site.baseurl}}/docs/getting-started-preprocessing) +* [Getting started with DIA reports]({{site.baseurl}}/docs/getting-started-dia-app) +* [Getting started with DDA reports]({{site.baseurl}}/docs/getting-started-dda-app) -### Installation -Install this application by downloading it from the [release page](https://github.com/SlavovLab/DO-MS/releases/latest). +### Requirements +This application has been tested on R >= 3.5.0, OSX 10.14 / Windows 7/8/10/11. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16 and DIA-NN > 1.8.1. 
+
+The application suffers from visual glitches when displayed on unsupported older browsers (such as IE9 commonly packaged with RStudio on Windows). Please use IE >= 11, Firefox, or Chrome for the best user experience. -### Running +### Running the Interactive Application The easiest way to run the app is directly through RStudio, by opening the `DO-MS.Rproj` Rproject file @@ -50,16 +54,6 @@ and clicking the "Run App" button at the top of the application, after opening t You can also start the application by running the `start_server.R` script. -### Automated Report Generation - -You can automatically generate PDF/HTML reports without having to launch the server by running the `do-ms_cmd.R` script, like so: - -``` -$ Rscript do-ms_cmd.R config_file.yaml -``` - -This requires a configuration file, and you can [find an example one here](https://github.com/SlavovLab/DO-MS/blob/master/example/config_file.yaml). See [Automating Report Generation]({{site.baseurl}}/docs/automation) for more details and instructions. - ### Customization DO-MS is designed to be easily user-customizable for in-house proteomics workflows. Please see [Building Your Own Modules]({{site.baseurl}}/docs/build-your-own) for more details. @@ -68,9 +62,9 @@ DO-MS is designed to be easily user-customizable for in-house proteomics workflo Please see [Hosting as a Server]({{site.baseurl}}/docs/hosting-as-server) for more details. -### Search Engines Other Than MaxQuant +### Supporting Other Search Engines -This application is currently maintained for MaxQuant >= 1.6.0.16. Adapting to other search engines is possible but not provided out-of-the-box. Please see [Integrating Other Search Engines ]({{site.baseurl}}/docs/other-search-engines) for more details. +This application is currently maintained for [MaxQuant](https://www.nature.com/articles/nbt.1511) >= 1.6.0.16 and [DIA-NN](https://www.nature.com/articles/s41592-019-0638-x) >= 1.8. 
Adapting to other search engines is possible but not provided out-of-the-box. Please see [Integrating Other Search Engines]({{site.baseurl}}/docs/other-search-engines) for more details. ### Can I use this for Metabolomics, Lipidomics, etc... ? @@ -80,13 +74,8 @@ While the base library of modules are based around bottom-up proteomics by LC-MS ## About the project - - The manuscript for this tool is published at the Journal of Proteome Research: [https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039) - -The manuscript is also freely available on bioRxiv: [https://www.biorxiv.org/content/10.1101/512152v1](https://www.biorxiv.org/content/10.1101/512152v1). +The manuscript for the extended version 2.0 can be found on bioRxiv: [https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1) Contact the authors by email: [nslavov\{at\}northeastern.edu](mailto:nslavov@northeastern.edu). @@ -98,18 +87,6 @@ DO-MS is distributed by an [MIT license]({{site.github_link}}/blob/master/LICENS Please feel free to contribute to this project by opening an issue or pull request in the [GitHub repository]({{site.github_link}}). -### Data - - - -### Figures/Analysis - - - ------------- ## Help! 
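The DIA-NN example configuration that follows derives experiment labels from an `exp_name_format` string (`%i` raw-file index, `%f` folder name, `%e` raw file name) and an optional `exp_name_pattern` regular expression. A minimal Python sketch of that substitution logic, for illustration only — the actual implementation lives in the app's R code, and the function name here is hypothetical:

```python
import re

def format_exp_name(fmt, index, folder, raw_file, pattern=None):
    """Illustrative sketch of DO-MS experiment naming.

    %i -- index of the raw file, %f -- folder name, %e -- raw file name.
    If `pattern` is given, the first regex match is extracted from the
    formatted name (mirroring exp_name_pattern).
    """
    name = (fmt.replace('%i', str(index))
               .replace('%f', folder)
               .replace('%e', raw_file))
    if pattern is not None:
        match = re.search(pattern, name)
        if match:
            name = match.group(0)
    return name

# 'Exp %f %i' with folder 'do_ms_testcase' and index 1
print(format_exp_name('Exp %f %i', 1, 'do_ms_testcase', 'eLK002.raw'))
# -> 'Exp do_ms_testcase 1'

# extract a six-digit date-like token from the raw file name
print(format_exp_name('%e', 1, 'do_ms_testcase', '180614_S_A.raw', pattern='[0-9]{6}'))
# -> '180614'
```

With `exp_name_pattern: '[0-9]{6}'`, a six-digit run date such as `180614` would be extracted from the formatted name; without a pattern, the formatted name is used as-is.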
diff --git a/example/config_file_diann.yaml b/example/config_file_diann.yaml new file mode 100644 index 0000000..163656f --- /dev/null +++ b/example/config_file_diann.yaml @@ -0,0 +1,84 @@ +## DO-MS command line configuration +## for use with do-ms_cmd.R + +## Input/Output + +# paths to folders with DIA-NN output +input_folders: + - /Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase + +# files to load from each folder +load_input_files: + - features + - report + +# match up with misc_input_files list in global.R +#misc_input_files: +# inclusion_list: /path/to/inclusion_list.txt + +output: /Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase/report.html + +## Filters + +# regular expressions to match raw file names against +#include_files: '180614_S' +#exclude_files: '180614_S_A' + +# experiment name format string +# %i -- index of raw file +# %f -- folder name +# %e -- raw file name +exp_name_format: 'Exp %f %i' + +# optional regular expression pattern to extract +# from the experiment names after applying the format string +exp_name_pattern: '[0-9]{6}' + +# custom names for files +#exp_names: +# - Control +# - '2X' +# - '4X' +# - '10X' + +# custom order for files +#exp_order: [4,2,3,1] + +#pep_thresh: 0.01 +#pif_thresh: 0.7 +#remove_decoy: REV_ +#remove_contam: CON_ + +## Figure rendering options + +ppi: 150 + +# label font size +figure_title_font_size: 16 # axis tick label font size +figure_axis_font_size: 12 # facet label font size +figure_facet_font_size: 12 # line width +figure_line_width: 1 # show background grid +figure_show_grid: true + +## Report options + +# choices: pdf, html
report_format: html + +# choices: default, cerulean, flatly, darkly, readable, +# spacelab, united, cosmo, lumen, paper, sandstone, simplex, yeti +# previews: https://bootswatch.com/3/ +report_theme: readable + +# 
figure size (in inches) +report_figure_width: 5 +report_figure_height: 5 + +# figure format. choices: pdf, png +report_figure_format: png + + diff --git a/global.R b/global.R index 5a1a764..a5f0851 100644 --- a/global.R +++ b/global.R @@ -1,4 +1,4 @@ -version <- '1.0.11' +version <- '2.0.5' # check R version. required R >= 3.5.0 & R <= 4.0.2 if(as.numeric(R.Version()$major) < 4) { @@ -6,9 +6,9 @@ if(as.numeric(R.Version()$major) < 4) { } # check R version. required R <= 4.0.2 -if(as.numeric(R.Version()$minor) > 0.2) { - stop('R Version <= 4.0.2 required. Download R 4.0.2 from the CRAN page: https://cran.r-project.org/') -} +#if(as.numeric(R.Version()$minor) > 0.2) { +# stop('R Version <= 4.0.2 required. Download R 4.0.2 from the CRAN page: https://cran.r-project.org/') +#} # first, get pacman if(!'pacman' %in% installed.packages()[,'Package']) { @@ -18,7 +18,37 @@ library(pacman) # install/load dependencies p_load(shiny, shinyWidgets, shinydashboard, dplyr, tidyr, ggplot2, lattice, knitr, tibble, - reshape2, readr, rmarkdown, stats, DT, stringr, yaml, viridisLite, ggpubr) + reshape2, readr, rmarkdown, stats, DT, stringr, yaml, viridisLite, ggpubr, MASS, viridis) + +# look for pandoc - moved from start_server.R to global.R to make sure pandoc is always available +# stolen from https://github.com/r-lib/rappdirs/blob/master/R/utils.r +get_os <- function() { + if (.Platform$OS.type == "windows") { + "win" + } else if (Sys.info()["sysname"] == "Darwin") { + "mac" + } else if (.Platform$OS.type == "unix") { + "unix" + } else { + stop("Unknown OS") + } +} +os <- get_os() + +pandoc_osx <- "/Applications/RStudio.app/Contents/MacOS/quarto/bin/tools" +pandoc_windows <- "C:\\Program Files\\RStudio\\bin\\pandoc" +pandoc_linux <- "/usr/lib/rstudio/bin/pandoc" + +# try and predict pandoc directories +if(os == 'mac' & file.exists(pandoc_osx)) { + Sys.setenv(RSTUDIO_PANDOC=pandoc_osx) +} else if (os == 'win' & file.exists(pandoc_windows)) { + Sys.setenv(RSTUDIO_PANDOC=pandoc_windows) 
+} else if (os == 'unix' & file.exists(pandoc_linux)) { + Sys.setenv(RSTUDIO_PANDOC=pandoc_linux) +} else { + print('pandoc could not be found in default directories. If it is not available on the system PATH then PDF report generation will fail.') +} print('Checking online for latest version of DO-MS...') # check application version @@ -57,10 +87,29 @@ tryCatch({ # load application settings config <- read_yaml('settings.yaml') +do_ms_mode <- config[['do_ms_mode']] +# check if settings.yaml contains config for do_ms_mode +if (do_ms_mode %in% names(config)){ + print(paste('DO-MS mode:', do_ms_mode, 'found in settings.yaml')) + +} else { + + stop(paste('No config section for DO-MS mode', do_ms_mode, 'found in settings.yaml')) +} + +# Add all config attributes found in the do_ms_mode specific section to the base config level. +for (i in 1:length(names(config[[do_ms_mode]]))){ + current_name <- names(config[[do_ms_mode]])[i] + + config[[current_name]] <- config[[do_ms_mode]][[current_name]] +} + +# DO-MS mode specific module path +module_path <- file.path('modules', do_ms_mode) # load tabs first -tabs <- list.dirs('modules', recursive=F, full.names=F) +tabs <- list.dirs(module_path, recursive=F, full.names=F) # remove commented-out tabs (folders that start with "__") tabs <- tabs[substr(tabs, 1, 2) != '__'] @@ -74,14 +123,14 @@ tabs <- gsub('([0-9])+(\\s|_)', '', tabs) tabs <- gsub('_', ' ', tabs) modules <- list() -# loop thru tabs and populate modules +# loop through tabs and populate modules for(i in 1:length(tabs)) { tab_path <- tab_paths[i] # put modules for this tab in its own list modules[[i]] <- list() - module_files <- list.files(file.path('modules', tab_path)) + module_files <- list.files(file.path(module_path, tab_path)) for(j in 1:length(module_files)) { module_file <- module_files[j] @@ -89,7 +138,7 @@ for(i in 1:length(tabs)) { if(substr(module_file, 1, 2) == '__') { next } # source module to load the init named list - source(file.path('modules', 
tab_path, module_file)) + source(file.path(module_path, tab_path, module_file)) # load the module into the module list module_name <- gsub('.R', '', module_file) modules[[i]][[j]] <- init() @@ -108,6 +157,39 @@ for(i in 1:length(tabs)) { # to get custom panel heading colors for each tab, config[['tab_colors']] <- rep(config[['tab_colors']], 10) +# load modifications +if ("modifications" %in% names(config)){ + + if (length(config[['modifications']]) > 0){ + + real_mod_vec <- c(F, F) + name_vec <- c("All", "Unmodified") + unimod_vec <- c("all", "unmodified") + + for(i in 1:length(config[['modifications']])){ + name_vec <- c(name_vec, config[['modifications']][[i]]$name) + unimod_vec <- c(unimod_vec, config[['modifications']][[i]]$unimod) + real_mod_vec <- c(real_mod_vec, T) + } + + } else { + name_vec <- c("All") + unimod_vec <- c("all") + real_mod_vec <- c(F) + } + +} else { + name_vec <- c("All") + unimod_vec <- c("all") + real_mod_vec <- c(F) +} + +config[['modification_list']] <- data.frame(name = name_vec, + unimod = unimod_vec, + real_mod = real_mod_vec, + stringsAsFactors = FALSE) + + # load app.css into string app_css <- paste(readLines(file.path('resources', 'app.css')), collapse='') # load app.js into string @@ -155,6 +237,70 @@ theme_base <- function(input=list(), show_legend=F) { return(.theme) } +theme_diann <- function(input=list(), show_legend=F) { + + # default values + axis_font_size <- ifelse(is.null(input[['figure_axis_font_size']]), + 10, input[['figure_axis_font_size']]) + title_font_size <- ifelse(is.null(input[['figure_title_font_size']]), + 12, input[['figure_title_font_size']]) + facet_font_size <- ifelse(is.null(input[['figure_facet_font_size']]), + 10, input[['figure_facet_font_size']]) + + show_grid <- ifelse(is.null(input[['figure_show_grid']]), + TRUE, input[['figure_show_grid']]) + + .theme <- theme(text = element_text(face="bold", size=12, colour = "grey40"), + panel.grid.major = element_line(colour = "grey80", size = 0.4), + 
axis.ticks = element_line(colour = "grey80", size = 0.4), + panel.grid.minor.x = element_blank(), + panel.grid.minor = element_blank(), + #panel.background = element_rect(fill = NA), + axis.text = element_text(colour = "grey40", face = "bold", size = axis_font_size), + axis.text.x = element_text(angle=45, hjust=1, margin=margin(r=45)), + axis.line = element_blank(), + axis.title=element_text(size=title_font_size, colour = "grey20"), + strip.background = element_rect(colour = NA, fill = "grey90"), + strip.text = element_text(colour = "grey20", face = "bold", size = facet_font_size), + legend.text = element_text(colour = "grey40", face = "bold", size = 12), + legend.title = element_text(colour = "grey40", face = "bold", size = 12), + panel.background = element_rect(fill="white", colour = "white") + ) + if(!show_legend) { + .theme <- .theme + theme(legend.position="none") + } + + if(show_grid) { + .theme <- .theme + theme( + panel.grid.major = element_line(size=0.25, linetype="solid", color="lightgrey"), + panel.grid.minor = element_line(size=0.25, linetype="solid", color="lightgrey") + ) + } else { + .theme <- .theme + theme( + panel.grid.major = element_blank(), + panel.grid.minor = element_blank() + ) + } + + return(.theme) +} + downloadButtonFixed <- function(outputId, label = "Download", class = NULL, ...) { aTag <- tags$a( @@ -247,4 +393,200 @@ sanitize_text_output <- function(text) { text } +# create a new column with the chemical label +map_label <- function(sequence, labelsdata){ + + label = '' + + for (i in 1:length(labelsdata)){ + current_label <- labelsdata[[i]] + + if (grepl(current_label, sequence, fixed = TRUE)){ + label <- current_label + } + + } + return(label) +} + +count_pattern <- function(string, pattern){ + occurence <- str_count(string, pattern = pattern) + if (occurence > 0){ + return(pattern) + } else { + return("Unmodified") + } +} + +# returns the separator for a path +get_seperator <- function(instring){ + forward_count <- str_count(instring, "/") + backward_count <- str_count(instring, "\\\\") + + seperator <- if (forward_count > backward_count) "/" else "\\\\" + return(seperator) +} + +# accepts an MS1.extracted style matrix dataframe and returns a report.tsv style dataframe +# conversion of the matrix-based format to a row-based format allows the same statistics to be used as with the report.tsv +ms1_extracted_to_report <- function(.input_df){ + + # for debugging + #.input_df <- as.data.frame(read_tsv(file='G:/.shortcut-targets-by-id/1uQ4exoKlaZAGnOG1iCJPzYN3ooYYZB7g/MS/Users/GW/test_data/diann_v_16_d/Report.pr_matrix_channels_ms1_extracted.tsv',guess_max=1e5)) + + #.input_df <- as.data.frame(read_tsv(file='/Volumes/GoogleDrive/.shortcut-targets-by-id/1uQ4exoKlaZAGnOG1iCJPzYN3ooYYZB7g/MS/Users/GW/test_data/diann_v_16_raw/report.pr_matrix_channels_ms1_extracted.tsv.txt',guess_max=1e5)) + + # get slash direction. 
Last element is always a path + seperator <- get_seperator(tail(colnames(.input_df), n=1)) + + # get a vector of all column names which do not contain a slash + slash_occurences <- str_count(colnames(.input_df), seperator) + last_non_path_index <- max(which(slash_occurences == 0)) + dont_pivot <- colnames(.input_df)[0:last_non_path_index] + + print(dont_pivot) + .input_df <- .input_df %>% pivot_longer(cols = !all_of(dont_pivot), names_to='File.Name.Conv', values_to = "val") + print("done") + basename_filename <- matrix(unlist(strsplit(.input_df$File.Name.Conv, paste0(seperator,"\\s*(?=[^",seperator,"]+$)"), perl=TRUE)), ncol=2,byrow=TRUE) + + .input_df <- .input_df %>% dplyr::mutate(MS1.Name = basename_filename[,2]) + + + + # Old DIA-NN versions contain a Q.Value column in the MS1 extracted. + # New versions have a .Quality column. + # Checking the occurrences of both strings is used to determine the version. + qval_count <- length(grep(".QValue", .input_df$MS1.Name, fixed=TRUE)) + quality_count <- length(grep(".Quality", .input_df$MS1.Name, fixed=TRUE)) + + if (qval_count > quality_count){ + ms1_extracted_mode = '.QValue' + } else { + ms1_extracted_mode = '.Quality' + } + print(paste('Ms1_extracted mode:', ms1_extracted_mode)) + + # Match all rows whose file name ends with the detected suffix + .input_df <- .input_df %>% dplyr::mutate(Q = grepl(ms1_extracted_mode, MS1.Name, fixed=TRUE)) + + + identifier_comp <- matrix(unlist(strsplit(.input_df$MS1.Name, "-\\s*(?=[^-]+$)", perl=TRUE)), ncol=2,byrow=TRUE) + + # File.Name eLK002.raw + .input_df$File.Name <- identifier_comp[,1] + + # Raw.file eLK002 + .input_df$Raw.File <- matrix(unlist(strsplit(.input_df$File.Name, "\\.\\s*(?=[^\\.]+$)", perl=TRUE)), ncol=2,byrow=TRUE)[,1] + + .input_df$Channel <- strtoi(str_extract(identifier_comp[,2], '[0-9]+')) + + # create unique identifier for merging Q values to intensities + .input_df <- .input_df %>% + dplyr::mutate(Identifier = paste(Raw.File, Channel, Precursor.Id, 
sep='_')) + + # create separate data frames for Q-values and intensities + .input_df.val <- .input_df %>% + dplyr::filter(!Q) %>% + dplyr::mutate(val = replace_na(val, 0)) %>% + dplyr::rename(Ms1.Area = val) + + q_default <- if (ms1_extracted_mode == '.QValue') 1 else 0 + + .input_df.Q <- .input_df %>% + dplyr::filter(Q) %>% + dplyr::select(Identifier, val) %>% + dplyr::mutate(val = replace_na(val, q_default)) %>% + dplyr::rename(Quality = val) + + # Append q-value by joining the datasets + .input_df <- .input_df.val %>% + inner_join(.input_df.Q, by='Identifier') + + # create Precursor.Id with channel information + # create Modified.Sequence with channel information + # remove temporary columns + .input_df <- .input_df %>% + dplyr::mutate(Precursor.Id = str_replace_all(Precursor.Id, "(?<=\\()mTRAQ(?=\\))", paste0('mTRAQ',Channel))) %>% + dplyr::mutate(Modified.Sequence = str_replace_all(Modified.Sequence, "(?<=\\()mTRAQ(?=\\))", paste0('mTRAQ',Channel))) %>% + dplyr::select(-c('Q','File.Name.Conv','MS1.Name','Channel')) + + return(.input_df) +} + +# Translates new (post 1.8.1 b12) channels to the old format +# Old channels were denoted like (mTRAQ0), new ones are denoted like (mTRAQ-K-0) +translate_diann_channel_format <- function(.input_df, columns = c("Precursor.Id","Modified.Sequence")){ + if (length(columns) < 1){ + print('translate_diann_channel_format, no columns specified') + return(.input_df) + } + + if (nrow(.input_df) < 1){ + print('translate_diann_channel_format, dataframe is empty') + return(.input_df) + } + + # check if the channel is in the new format + test_precursor <- .input_df[[columns[1]]][[1]] + label_occurences <- str_count(test_precursor, 'mTRAQ-[a-zA-Z]-') + if(label_occurences == 0){ + return(.input_df) + } + + + for (column in columns) { + .input_df[[column]] = sapply(.input_df[[column]], .update_channel) + } + + + return(.input_df) + +} + +.update_channel <- function(sequence){ + groups <- str_match_all(sequence, "mTRAQ-([a-zA-Z])-([0-9]+)") + + if 
(nrow(groups[[1]]) > 0){ + groups <- groups[[1]] + + for(i in 1:nrow(groups)){ + sequence <- str_replace_all(sequence, groups[i,1], paste0('mTRAQ',groups[i,3])) + } + } + + return(sequence) +} + +custom_colors = c("#e8411c", "#f7c12a", "#329ebf",'#51c473','#c355d4','#6e6e6e',"#e8411c", "#f7c12a", "#329ebf",'#51c473','#c355d4','#6e6e6e') + +separate_channel_info <- function(df){ + channels <- config[['channels']] + + df$Label <- sapply(df$Precursor.Id, .get_channel, channels ) + + for (channel in channels) { + mod <- channel[['modification']] + # strip the label modification literally (fixed = TRUE avoids regex escaping) + df$Precursor.Id <- gsub(mod, '', df$Precursor.Id, fixed = TRUE) + } + + return(df) +} + +.get_channel <- function(sequence, channeldata){ + + label = '' + + for (channel in channeldata) { + current_label = channel[['name']] + mod = channel[['modification']] + if (grepl( mod, sequence, fixed = TRUE)){ + label <- current_label + } + } + + return(label) + +} diff --git a/modules/dia-nn/005_Summary/summary_001_experiments.R b/modules/dia-nn/005_Summary/summary_001_experiments.R new file mode 100644 index 0000000..0b73966 --- /dev/null +++ b/modules/dia-nn/005_Summary/summary_001_experiments.R @@ -0,0 +1,46 @@ +init <- function() { + + type <- 'text' + box_title <- 'DIA-NN Experiments' + help_text <- 'Lists the raw files and source paths found in the DIA-NN report.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']] + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, File.Name) %>% + dplyr::summarise(n=n()) + + outstr <- '' + for (i in 1:length(plotdata$Raw.file)){ + str <- paste(plotdata$Raw.file[[i]], plotdata$File.Name[[i]]) + outstr <- paste0(outstr, str, '\n \n') + + } + + outstr + } + 
return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot + )) +} + diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_08_ms1_fill_time_dist.R b/modules/dia-nn/020_Ion_Sampling/Instrument_08_ms1_fill_time_dist.R new file mode 100644 index 0000000..9699b6c --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_08_ms1_fill_time_dist.R @@ -0,0 +1,59 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms1 Fill Time Distribution' + help_text <- 'Distribution of Ms1 fill times across the gradient' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.tsv'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['fill_times']] + + plotdata <- plotdata[plotdata$RT.Start > config[['RT.Start']], ] + + plotdata <- plotdata %>% + filter(Ms.Level == 1) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + hist_bins <- 15 + + maxFT <- max(plotdata$Fill.Time) + minFT <- min(plotdata$Fill.Time) + rangeFT <- maxFT - minFT + minFT <- minFT - rangeFT/hist_bins + maxFT <- maxFT + rangeFT/hist_bins + + ggplot(plotdata) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + xlim(minFT, maxFT) + + geom_histogram(aes(x=Fill.Time), bins=hist_bins, fill=custom_colors[[6]])+ + labs(x='Fill times in ms', y='Number of Scans') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_09_ms1_fill_times.R
b/modules/dia-nn/020_Ion_Sampling/Instrument_09_ms1_fill_times.R new file mode 100644 index 0000000..5bcc63d --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_09_ms1_fill_times.R @@ -0,0 +1,74 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms1 Fill Times along Gradient' + help_text <- 'The average fill time is shown in magenta for different bins along the retention time gradient. The standard deviation is depicted as an area in blue; scans outside this area are shown as single datapoints.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.tsv'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['fill_times']] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(RT.Start > config[['RT.Start']]) %>% + filter(RT.Start < config[['RT.End']]) + + binned_data <- plotdata %>% + filter(Ms.Level == 1) %>% + mutate(bin = ntile(RT.Start, 15)) + + sum_data <- binned_data %>% + group_by(bin, Raw.file) %>% + summarise(mean = mean(Fill.Time), + sd = sd(Fill.Time), + RT.Mean = mean(RT.Start), + max = max(Fill.Time), .groups = "drop") %>% + mutate(lower = mean - sd/2, upper = pmin(mean + sd/2, max), type='sum') + + joined_data <- binned_data %>% + full_join(sum_data, by=c("bin", 'Raw.file')) %>% + mutate(type = 'joined') + + + plotdata <- list("sum" = sum_data, "joined" = joined_data) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata$sum) > 1), paste0('No Rows selected'))) + + sum_data <- plotdata$sum + joined_data <- plotdata$joined + + ggplot(sum_data) + + geom_point(data=joined_data, aes(x = RT.Start, y = Fill.Time), fill = 'grey80', color = "grey80") + + geom_ribbon(aes(ymin = lower, ymax = upper, x = RT.Mean), fill = custom_colors[[3]], alpha=0.6) + + geom_line(aes(y = mean, x = RT.Mean), color = 
custom_colors[[1]], size = 1) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(y='Fill Times in ms', x='Retention Time in minutes') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_10_ms1_tic.R b/modules/dia-nn/020_Ion_Sampling/Instrument_10_ms1_tic.R new file mode 100644 index 0000000..fc94c2f --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_10_ms1_tic.R @@ -0,0 +1,62 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms1 Total Ion Current along Gradient' + help_text <- 'The total ion current (TIC) is shown for bins along the retention time gradient.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['tic']],paste0('Upload tic.tsv'))) + validate(need(config[['RT.Start']],paste0('Please specify RT.Start parameter in the settings'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['tic']] + + # Apply retention time filter as specified in settings.yaml + tic.matrix <- plotdata %>% + filter(Retention.time > config[['RT.Start']]) %>% + filter(Retention.time < config[['RT.End']]) + + tic.mean <- tic.matrix %>% + group_by(Raw.file, Retention.time) %>% + summarise(mean = weighted.mean(MZ, TIC), .groups = "drop") + + plotdata <- list("matrix" = tic.matrix, "mean" = tic.mean) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + tic.matrix <- plotdata$matrix + tic.mean <- plotdata$mean + + ggplot(tic.matrix) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_tile(aes(Retention.time, MZ, fill= log10(TIC))) + + scale_fill_viridis(discrete=FALSE) + + geom_line(data = 
tic.mean, aes(y = mean, x = Retention.time, color = "TIC weighted m/z"), size = 1)+ + labs(y='m/z', x='Retention Time in minutes') + + scale_color_manual(name = "", values = c( "TIC weighted m/z" = custom_colors[[1]]))+ + theme(legend.position = "bottom")+ + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_11_ms2_fill_time_dist.R b/modules/dia-nn/020_Ion_Sampling/Instrument_11_ms2_fill_time_dist.R new file mode 100644 index 0000000..490114b --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_11_ms2_fill_time_dist.R @@ -0,0 +1,57 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms2 Fill Time Distribution' + help_text <- 'Distribution of Ms2 fill times across the gradient' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.tsv'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['fill_times']] + + plotdata <- plotdata[plotdata$RT.Start > config[['RT.Start']], ] + + plotdata <- plotdata %>% + filter(Ms.Level == 2) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + maxFT <- max(plotdata$Fill.Time) + minFT <- min(plotdata$Fill.Time) + rangeFT <- maxFT - minFT + minFT <- minFT - rangeFT/(15) + maxFT <- maxFT + rangeFT/(15) + + ggplot(plotdata) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + xlim(minFT, maxFT) + + geom_histogram(aes(x=Fill.Time), bins=15, fill=custom_colors[[6]])+ + labs(x='Fill times in ms', y='Number of Scans') + + theme_diann(input=input, show_legend=T) + + } + 
+ return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_12_ms2_fill_times.R b/modules/dia-nn/020_Ion_Sampling/Instrument_12_ms2_fill_times.R new file mode 100644 index 0000000..2ea6b37 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_12_ms2_fill_times.R @@ -0,0 +1,79 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms2 Fill Times along Gradient' + help_text <- 'The average fill time is shown in magenta for different bins along the retention time gradient. The standard deviation is depicted as an area in blue; scans outside this area are shown as single datapoints.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.tsv'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['fill_times']] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(RT.Start > config[['RT.Start']]) %>% + filter(RT.Start < config[['RT.End']]) + + binned_data <- plotdata %>% + filter(Ms.Level == 2) %>% + mutate(bin = ntile(RT.Start, 15)) + + sum_data <- binned_data %>% + group_by(bin, Raw.file) %>% + summarise(mean = mean(Fill.Time), + sd = sd(Fill.Time), + RT.Mean = mean(RT.Start), + max = max(Fill.Time), + .groups = "drop") %>% + mutate(lower = mean - sd/2, upper = pmin(mean + sd/2, max), type='sum') + + joined_data <- binned_data %>% + full_join(sum_data, by=c("bin", 'Raw.file')) %>% + mutate(type = 'joined') + + + plotdata <- list("sum" = sum_data, "joined" = joined_data) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + + validate(need((nrow(plotdata$sum) > 1), paste0('No Rows 
selected'))) + + + sum_data <- plotdata$sum + joined_data <- plotdata$joined + + ggplot(sum_data) + + geom_point(data=joined_data, aes(x = RT.Start, y = Fill.Time), fill = 'grey80', color = "grey80") + + geom_ribbon(aes(ymin = lower, ymax = upper, x = RT.Mean), fill = custom_colors[[3]], alpha=0.6) + + geom_line(aes(y = mean, x = RT.Mean), color = custom_colors[[1]], size = 1) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(y='Fill Times in ms', x='Retention Time in minutes') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_13_ms2_fill_time_matrix.R b/modules/dia-nn/020_Ion_Sampling/Instrument_13_ms2_fill_time_matrix.R new file mode 100644 index 0000000..38bd750 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_13_ms2_fill_time_matrix.R @@ -0,0 +1,89 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms2 Fill Time Matrix' + help_text <- 'The average Ms2 fill times are shown across the gradient for every distinct Ms2 window.' + source_file <- 'fill_times' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.tsv'))) + validate(need(data()[['fill_times']][['Window.Lower']],paste0('Your fill_times.tsv file does not contain the Window.Lower column for the isolation window. Please make sure that you use the latest version of DO-MS DIA.'))) + validate(need(data()[['fill_times']][['Window.Upper']],paste0('Your fill_times.tsv file does not contain the Window.Upper column for the isolation window. 
Please make sure that you use the latest version of DO-MS DIA.'))) + validate(need(config[['RT.Start']],paste0('Please specify RT.Start parameter in the settings'))) + validate(need(config[['RT.End']],paste0('Please specify RT.End parameter in the settings'))) + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['fill_times']] + + plotdata <- plotdata %>% + filter(RT.Start > config[['RT.Start']]) %>% + filter(RT.Start < config[['RT.End']]) + + plotdata <- plotdata %>% + filter(Ms.Level > 1) + + num_rt_bins = 10 + + rt_start = min(plotdata$RT.Start) + rt_end = max(plotdata$RT.Start) + + # Create RT column which has labels for N RT bins + plotdata <- plotdata %>% + mutate(RT = round((RT.Start - rt_start)/(rt_end-rt_start)*num_rt_bins)) + + # calculate mean fill times + plotdata <- plotdata %>% + group_by(Raw.file, Ms.Level, Window.Lower, Window.Upper, RT) %>% + summarise(Fill.Time = mean(Fill.Time), .groups = "drop") + + # Transform RT bins back to real RTs + plotdata <- plotdata %>% + mutate(RT = RT/num_rt_bins*(rt_end-rt_start)+rt_start) + + # create label columns, e.g. 
100 - 200 (mz) + plotdata <- plotdata %>% + mutate(Label = paste0(Window.Lower, '-', Window.Upper)) %>% + mutate(Window = (Window.Lower + Window.Upper)/2) + + label_index = sort(plotdata$Window, index.return=TRUE) + + label_sorted = plotdata$Label[label_index$ix] + label_unique = unique(label_sorted) + + plotdata$Label = factor(plotdata$Label, levels = label_unique) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata) + + facet_wrap(~Raw.file, nrow = 1, scales = "free") + + geom_tile(aes(RT, Label, fill= Fill.Time)) + + scale_fill_viridis(discrete=FALSE) + + labs(y='m/z', x='Retention Time in minutes', fill='Fill time (ms)') + + theme(legend.position = "bottom",legend.key.width = unit(2.5, "cm"), axis.text.y = element_text(angle = 45, vjust = 0.5, hjust=1, size=10))+ + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_01_DIA_MS1_int_ID_Label.R b/modules/dia-nn/020_Ion_Sampling/instrument_01_DIA_MS1_int_ID_Label.R new file mode 100644 index 0000000..a6a5ccd --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_01_DIA_MS1_int_ID_Label.R @@ -0,0 +1,66 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Channel-wise MS1 Intensity for Precursors' + help_text <- 'Plotting the MS1 intensity for all precursors that were associated with one of the defined channels.' 
+ source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Label')] + + + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + plotdata$Intensity <- log10(plotdata$Ms1.Area) + + # Thresholding data at the 1st and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + if(nrow(plotdata) > 0){ + plotdata$Intensity[plotdata$Intensity >= ceiling] <- ceiling + plotdata$Intensity[plotdata$Intensity <= floor] <- floor + } + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Intensity, color = Label)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + coord_flip() + + labs(x=expression(bold('Log'[10]*' Precursor Intensity')), y='Number of Precursors') + + scale_fill_manual(name = "plexDIA Label:", values = custom_colors)+ + scale_color_manual(name = "plexDIA Label:", values = custom_colors)+ + theme_diann(input=input, show_legend=T)+ + theme(legend.position = "bottom") + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_02_IDs_by_RT.R 
b/modules/dia-nn/020_Ion_Sampling/instrument_02_IDs_by_RT.R new file mode 100644 index 0000000..4e1e82c --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_02_IDs_by_RT.R @@ -0,0 +1,54 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Precursors Identified across Gradient' + help_text <- 'Precursors are plotted across the chromatographic gradient.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Please provide a report.tsv file'))) + } + + .plotdata <- function(data, input) { + + + + plotdata <- data()[['report']][,c('Raw.file', 'Retention.time')] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(Retention.time > config[['RT.Start']]) %>% + filter(Retention.time < config[['RT.End']]) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Retention.time)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + coord_flip() + + labs(x='Retention Time (min)', y='Number of Precursors') + + scale_color_manual(values=c(custom_colors[[1]], custom_colors[[6]]))+ + theme_diann(input=input, show_legend=T) + + } + + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_03_DIA_MS1_int_ID.R b/modules/dia-nn/020_Ion_Sampling/instrument_03_DIA_MS1_int_ID.R new file mode 100644 index 0000000..4b348f3 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_03_DIA_MS1_int_ID.R @@ -0,0 +1,68 @@ +init <- function() { + + type <- 'plot' + box_title <- 'MS1 Intensity summed 
over all Channels' + help_text <- 'Plotting the MS1 intensity for all precursors summed over all channels.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Label' )] + + + + + plotdata <- plotdata %>% + group_by(Raw.file, Precursor.Id) %>% + summarise(Ms1.Area = sum(Ms1.Area), .groups = "drop") + + plotdata <- dplyr::filter(plotdata, Ms1.Area>0) + plotdata$Intensity <- log10(plotdata$Ms1.Area) + + # Thresholding data at the 1st and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + if(nrow(plotdata) > 0){ + plotdata$Intensity[plotdata$Intensity >= ceiling] <- ceiling + plotdata$Intensity[plotdata$Intensity <= floor] <- floor + } + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + medianData = plotdata %>% group_by(Raw.file) %>% + summarise(median = median(Intensity), .groups = "drop") + + + ggplot(plotdata, aes(Intensity)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Precursor Intensity')), y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_04_DIA_MS1_Intersected_int.R 
b/modules/dia-nn/020_Ion_Sampling/instrument_04_DIA_MS1_Intersected_int.R new file mode 100644 index 0000000..990fff7 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_04_DIA_MS1_Intersected_int.R @@ -0,0 +1,107 @@ +init <- function() { + + type <- 'plot' + box_title <- 'MS1 Intensity for Intersected Precursors, summed over all Channels' + help_text <- 'Plotting the MS1 intensity for all precursors summed over all channels. Only intersected precursors present in all loaded experiments are shown.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area','Precursor.Id')] + labelsdata <- config[['ChemicalLabels']] + + # iterate over labels and calculate label-less ID + for (i in 1:length(labelsdata)){ + current_label <- labelsdata[[i]] + + # subtract all label modifications, but not other modifications + plotdata$Precursor.Id = gsub(current_label,'',plotdata$Precursor.Id, fixed = TRUE) + } + + plotdata <- plotdata %>% + group_by(Raw.file, Precursor.Id) %>% + summarise(Ms1.Area = sum(Ms1.Area), .groups = "drop") + + + plotdata <- dplyr::filter(plotdata, Ms1.Area>0) + plotdata$Intensity <- log10(plotdata$Ms1.Area) + + # Remove runs with fewer than 2 IDs + r_t<-table(plotdata$Raw.file) + r_rmv<-names(r_t)[r_t<2] + plotdata<-plotdata[!plotdata$Raw.file%in%r_rmv, ] + + + + # Thresholding data at the 1st and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + + if(nrow(plotdata) > 0){ + plotdata$Intensity[plotdata$Intensity >= ceiling] <- ceiling + plotdata$Intensity[plotdata$Intensity <= floor] <- floor + + #Assemble list of peptides in each Raw file + plotdata$Raw.file <- factor(plotdata$Raw.file) + expsInDF <- unique(plotdata$Raw.file) + peplist <- list() + for (i in 1:length(expsInDF)){ + peptidesDF <- dplyr::filter(plotdata, Raw.file == expsInDF[i]) + peptides <- dplyr::select(peptidesDF, Precursor.Id) + peplist[[i]] <- peptides + } + + #Get intersection of all peptides + intersectList <- as.vector(Reduce(intersect, peplist)) + + #Get reduced dataframe for elements that match intersection + plotdata_Intersected <- dplyr::filter(plotdata, Precursor.Id %in% intersectList$Precursor.Id) + + plotdata <- plotdata_Intersected + } + return(plotdata) + + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need(nrow(plotdata) > 19, paste0('Fewer than 20 peptides in common')), + need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + medianData = plotdata %>% group_by(Raw.file) %>% + summarise(median = median(Intensity), .groups = "drop") + + ggplot(plotdata, aes(Intensity)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + geom_text(data=medianData, + aes(label=paste0("median: ", round(median,2))), x = -Inf, y = -Inf, colour=custom_colors[[1]], + hjust = 0, vjust=0)+ + geom_vline(data=medianData, aes(xintercept = median),col=custom_colors[[1]],size=1) + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Precursor Intensity')), y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_05_DIA_MS1_Intersected_int_Norm.R 
b/modules/dia-nn/020_Ion_Sampling/instrument_05_DIA_MS1_Intersected_int_Norm.R new file mode 100644 index 0000000..7abacf9 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_05_DIA_MS1_Intersected_int_Norm.R @@ -0,0 +1,112 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Normalized MS1 Intensity for Intersected Precursors' + help_text <- 'Plotting the MS1 Intensity for intersected precursors summed over all channels. Experiments are normalized to the first experiment.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area','Precursor.Id')] + labelsdata <- config[['ChemicalLabels']] + + # iterate over labels and calculate label-less ID + for (i in 1:length(labelsdata)){ + current_label <- labelsdata[[i]] + + # subtract all label modifications, but not other modifications + plotdata$Precursor.Id = gsub(current_label,'',plotdata$Precursor.Id, fixed = TRUE) + } + + plotdata <- plotdata %>% + group_by(Raw.file, Precursor.Id) %>% + summarise(Ms1.Area = sum(Ms1.Area), .groups = "drop") + + plotdata <- dplyr::filter(plotdata, Ms1.Area>0) + plotdata$Intensity = plotdata$Ms1.Area + + # Remove runs with fewer than 2 IDs + r_t<-table(plotdata$Raw.file) + r_rmv<-names(r_t)[r_t<2] + plotdata<-plotdata[!plotdata$Raw.file%in%r_rmv, ] + + # Thresholding data at the 1st and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + + plotdata$Intensity[plotdata$Intensity >= ceiling] <- ceiling + plotdata$Intensity[plotdata$Intensity <= floor] <- floor + + #Assemble list of peptides in each Raw file + plotdata$Raw.file <- factor(plotdata$Raw.file) + expsInDF <- levels(plotdata$Raw.file) + peplist <- list() + for (i in 1:length(expsInDF)){ + peptidesDF <- dplyr::filter(plotdata, Raw.file == expsInDF[i]) + peptides <- dplyr::select(peptidesDF, Precursor.Id) + peplist[[i]] <- peptides + } + + #Get intersection of all peptides + intersectList <- as.vector(Reduce(intersect, peplist)) + + #Get reduced dataframe for elements that match intersection + plotdata_Intersected <- dplyr::filter(plotdata, Precursor.Id %in% intersectList$Precursor.Id) + + + + plotdata <- plotdata_Intersected %>% group_by(Raw.file, Precursor.Id) %>% + summarize(IntAvg=mean(Intensity, na.rm = TRUE), .groups = "drop") + + plotdata.w <- reshape2::dcast(plotdata, Precursor.Id ~ Raw.file, value.var = "IntAvg") + baselineInd <- ncol(plotdata.w) + 1 + plotdata.w$baseline <- plotdata.w[,2] + + for (j in 2:(baselineInd-1)){ + plotdata.w[,j] <- log10(plotdata.w[,j])-log10(plotdata.w[,baselineInd]) + + } + + plotdata.w.m <- reshape2::melt(plotdata.w, id.vars = 'Precursor.Id') + plotdata.w.m.clean <- plotdata.w.m %>% filter(!(variable == "baseline")) + colnames(plotdata.w.m.clean) <- c("Precursor.Id","Raw.file","Intensity") + plotdata <- plotdata.w.m.clean + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need(nrow(plotdata) > 19, paste0('Fewer than 20 peptides in common'))) + + ggplot(plotdata, aes(Intensity)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + xlim(c(-3,3))+ + coord_flip() + + labs(x=expression(bold('Log'[10]*' Intensity relative to first experiment')), y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, 
dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_06_charge_count.R b/modules/dia-nn/020_Ion_Sampling/instrument_06_charge_count.R new file mode 100644 index 0000000..8ecb113 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_06_charge_count.R @@ -0,0 +1,78 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Number of Precursors by Charge State' + help_text <- 'Number of precursors observed during MS1 scans by charge state' + source_file <- 'report' + + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Precursor.Charge','Ms1.Area','Precursor.Id')] + labelsdata <- config[['ChemicalLabels']] + + # iterate over labels and calculate label-less ID + for (i in 1:length(labelsdata)){ + current_label = labelsdata[[i]] + + # subtract all label modifications, but not other modifications + plotdata$Precursor.Id = gsub(current_label,'',plotdata$Precursor.Id, fixed = TRUE) + + } + + plotdata <- plotdata %>% + group_by(Raw.file, Precursor.Id) %>% + summarise(Ms1.Area = sum(Ms1.Area), Charge = first(Precursor.Charge), .groups = "drop") + + plotdata <- dplyr::filter(plotdata, Ms1.Area>0) + + plotdata$Charge[plotdata$Charge > 3] <- 4 + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Charge) %>% + dplyr::tally() + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata) + + geom_bar(aes(x=factor(Charge), y=n, fill=factor(Charge), colour=factor(Charge)), + stat='identity', 
position='dodge2', alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='Experiment', y='Count', fill='Charge State') + + theme_diann(input=input, show_legend=T)+ + scale_fill_manual(values = custom_colors, labels=c('1', '2', '3', '>3'))+ + scale_color_manual(values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank())+ + theme(legend.position = "bottom")+ + guides(fill = guide_legend(override.aes = list(color = NA)), + color = 'none', + shape = 'none') + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_14_DIA_MS1_counts.R b/modules/dia-nn/020_Ion_Sampling/instrument_14_DIA_MS1_counts.R new file mode 100644 index 0000000..636893b --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_14_DIA_MS1_counts.R @@ -0,0 +1,67 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Channel-wise MS1 Copy Number for Precursors' + help_text <- 'Plotting the MS1 copy numbers for all precursors which were associated with one of the defined channels. + The copy numbers are calculated using the signal-to-noise ratio as described in Derks et al. 2022. By default a resolution of 70,000 is used during preprocessing. 
It can be changed with the --resolution parameter.' + source_file <- 'sn' + + .validate <- function(data, input) { + validate(need(data()[['sn']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['sn']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['sn']][,c('Raw.file','Precursor.Id', 'Copy.Number')] + plotdata <- translate_diann_channel_format(plotdata, columns = c("Precursor.Id")) + plotdata <- separate_channel_info(plotdata) + + plotdata$Intensity <- log10(plotdata$Copy.Number) + + # Threshold data at the 1st and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + if(nrow(plotdata) > 0){ + plotdata$Intensity[plotdata$Intensity >= ceiling] <- ceiling + plotdata$Intensity[plotdata$Intensity <= floor] <- floor + } + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Intensity, color = Label)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Copy Number')), y='Number of Precursors') + + scale_fill_manual(name = "", values = custom_colors)+ + scale_color_manual(name = "", values = custom_colors)+ + theme(legend.position = "bottom")+ + theme_diann(input=input, show_legend=T) + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} 
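The percentile thresholding the module above applies in `.plotdata` (clamping log-scale intensities to the 1st/99th percentiles before histogramming) is a pattern shared by several of these modules. A minimal stand-alone sketch of that step; the function name `clamp_percentiles` and the example vector are illustrative, not part of DO-MS:

```r
# Clamp a vector of log10 intensities to its 1st/99th percentiles,
# mirroring the thresholding step used in the modules above.
clamp_percentiles <- function(x, lower = 0.01, upper = 0.99) {
  x <- x[is.finite(x)]                      # drop -Inf produced by log10(0)
  floor_v   <- quantile(x, probs = lower, na.rm = TRUE)
  ceiling_v <- quantile(x, probs = upper, na.rm = TRUE)
  pmin(pmax(x, floor_v), ceiling_v)         # squeeze both tails inward
}

clamped <- clamp_percentiles(log10(c(0, 1, 10, 100, 1000)))
```

Clamping (rather than filtering) keeps the extreme precursors in the plot while preventing a handful of outliers from stretching the intensity axis.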
diff --git a/modules/dia-nn/030_Identifications/00_PEP_colored_cumulative_update.R b/modules/dia-nn/030_Identifications/00_PEP_colored_cumulative_update.R new file mode 100644 index 0000000..dc2b730 --- /dev/null +++ b/modules/dia-nn/030_Identifications/00_PEP_colored_cumulative_update.R @@ -0,0 +1,102 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Number of Confident Precursor Identifications' + help_text <- 'Plotting the number of precursors identified at each given confidence level.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'PEP')] + + plotdata <- plotdata[complete.cases(plotdata[ , 'PEP']),] + + # build log10 PEP vector + peps <- seq(log10(max(c(min(plotdata$PEP)), 1e-8)), log10(max(plotdata$PEP)), length.out=500) + peps <- c(log10(.Machine$double.xmin), peps) + + + plotdata <- plotdata %>% + dplyr::mutate(bin=findInterval(PEP, 10**peps)) + + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, bin) + + + plotdata <- plotdata %>% + dplyr::summarise(n=dplyr::n()) + + + + plotdata <- plotdata %>% + dplyr::mutate(cy=cumsum(n)) + + + + plotdata$pep = 10**peps[plotdata$bin+1] + + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + # Rank the experiments by the number of peptides observed + + maxnum <- c() + rawnames <- c() + + for(X in unique(plotdata$Raw.file)){ + maxnum <- c(maxnum, max(plotdata$cy[plotdata$Raw.file %in% X]) ) + rawnames <- c(rawnames, X) + } + + names(maxnum) <- rawnames + rank_exp <- maxnum[order(maxnum, decreasing = T)] + rank_exp_ord <- seq(1, length(rank_exp),1) + names(rank_exp_ord) <- names(rank_exp) + plotdata$rank_ord <- 
NA + + for(X in levels(plotdata$Raw.file)) { + plotdata$rank_ord[plotdata$Raw.file %in% X] <- rank_exp_ord[X] + } + + cc <- scales::seq_gradient_pal('red', 'blue', 'Lab')(seq(0, 1, length.out=length(rank_exp_ord))) + + ggplot(plotdata, aes(x=pep, color=Raw.file, y=cy, group=Raw.file)) + + geom_line(size = input$figure_line_width) + + scale_colour_manual(name='Experiment', values=cc, labels=levels(plotdata$Raw.file)) + + coord_flip() + + scale_x_log10(limits=c(.000009,max(plotdata$pep)), breaks=c(.00001,.0001,.001,.01,.1), + labels=scales::trans_format('log10', scales::math_format(10^.x))) + + xlab('PEP') + ylab('Number of Precursors') + + theme_diann(input=input, show_legend=T) + + theme(panel.grid.major.x = element_line(colour = "grey80", size = 0.4), + legend.position='right', + legend.key=element_rect(fill='white')) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + #plot_height=500, # pixels + report_plot_width=7, # inches + report_plot_height=5 # inches + )) +} diff --git a/modules/dia-nn/030_Identifications/01_ms1vsms2_IDs.R b/modules/dia-nn/030_Identifications/01_ms1vsms2_IDs.R new file mode 100644 index 0000000..5d75910 --- /dev/null +++ b/modules/dia-nn/030_Identifications/01_ms1vsms2_IDs.R @@ -0,0 +1,65 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Precursors by Quantification Strategy' + help_text <- 'Number of precursors found based on quantification strategy. MS2-quantified precursors are counted based on Precursor.Quantity > 0 and MS1-quantified precursors are counted based on Ms1.Area > 0.' 
+ source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file','Precursor.Quantity', 'Ms1.Area')] + + full_counts = plotdata %>% + filter(Precursor.Quantity > 0) %>% + group_by(Raw.file) %>% + tally() %>% + mutate(Label='MS2 Quantified') + + ms1_counts <- plotdata %>% + filter(Ms1.Area > 0) %>% + group_by(Raw.file) %>% + tally() %>% + mutate(Label='MS1 Quantified') + + outdf = rbind(full_counts, ms1_counts) + outdf$Label = factor(outdf$Label, levels=c('MS2 Quantified','MS1 Quantified')) + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata, aes(x=Label, y=n, fill=Label, colour=Label)) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Precursors', fill='Quantification', colour='Quantification') + + theme_diann(input=input, show_legend=T) + + theme(legend.position = "bottom", + axis.text.x = element_blank(), + axis.ticks = element_blank()) + + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/030_Identifications/02_modified_IDs.R b/modules/dia-nn/030_Identifications/02_modified_IDs.R new file mode 100644 index 0000000..13a777c --- /dev/null +++ b/modules/dia-nn/030_Identifications/02_modified_IDs.R @@ -0,0 +1,116 @@ +init <- 
function() { + + type <- 'plot' + box_title <- 'Precursors by Modification' + help_text <- 'Number of precursors found based on modification types specified' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + validate(need(config[['modifications']], paste0('Please provide a list of modifications under the key: Modifications in the settings.yaml file'))) + } + + + + .get_occurence <- function(string, pattern){ + occurence <- str_count(string, pattern = pattern) + if (occurence > 0){ + return(pattern) + } else { + return("Unmodified") + } + } + + .plotdata <- function(data, input) { + + #retrieve modifications before the report, so columns can be selected + modsdata <- config[['modification_list']] + modsdata <- modsdata[modsdata$real_mod, ] + + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', modsdata[['unimod']], 'mod_sum' )] + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + labelsdata <- config[['ChemicalLabels']] + + + + outdf <- data.frame(Raw.file=character(), + Modification=character(), + Identifications=integer(), + stringsAsFactors=FALSE) + + # Iterate over all modifications + for (i in 1:nrow(modsdata)){ + + current_mod <- modsdata$unimod[[i]] + + plotdata_mod <- plotdata[plotdata[[current_mod]] > 0, ] + + plotdata_mod <- plotdata_mod %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), .groups = "drop") + + plotdata_mod$Modification <- modsdata$name[[i]] + + outdf <- rbind(outdf, plotdata_mod) + } + + + plotdata_total <- plotdata %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), .groups = "drop") + plotdata_total$Modification <- 'Total' + + outdf <- rbind(outdf, plotdata_total) + + 
plotdata_unmod <- plotdata[plotdata$mod_sum < 1, ] + plotdata_unmod <- plotdata_unmod %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), .groups = "drop") + plotdata_unmod$Modification <- 'Unmodified' + outdf <- rbind(outdf, plotdata_unmod) + + + levels_plot = c(modsdata[['name']],'Unmodified','Total') + outdf <- within(outdf, + Modification <- factor(Modification, levels=levels_plot)) + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 0), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Modification, y=Identifications, fill=Modification, colour=Modification)) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + #theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + labs(x='', y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + + theme(legend.position = "bottom", + axis.text.x = element_blank(), + axis.ticks = element_blank())+ + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/030_Identifications/05_miscleavage_table.R b/modules/dia-nn/030_Identifications/05_miscleavage_table.R new file mode 100644 index 0000000..e419c86 --- /dev/null +++ b/modules/dia-nn/030_Identifications/05_miscleavage_table.R @@ -0,0 +1,83 @@ +init <- function() { + + type <- 'table' + box_title <- 'Miscleavage Rate (percentage), PEP < 0.01' + help_text <- 'Miscleavage rate (percentage) for precursors identified with confidence PEP < 0.01' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + 
validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .get_internal_occurence <- function(string, char){ + occurence <- str_count(string, pattern = paste0(char,".")) + occurence_w_proline <- str_count(string, pattern = paste0(char,"P")) + return(occurence-occurence_w_proline) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Stripped.Sequence', 'PEP','Ms1.Area')] + plotdata <- plotdata[plotdata$PEP<0.01, ] + plotdata <- plotdata[plotdata$Ms1.Area>0, ] + + plotdata$Missed.cleavages.R = sapply(plotdata$Stripped.Sequence, .get_internal_occurence, char="R") + plotdata$Missed.cleavages.K = sapply(plotdata$Stripped.Sequence, .get_internal_occurence, char="K") + plotdata$Missed.cleavages = plotdata$Missed.cleavages.R + plotdata$Missed.cleavages.K + + # group by raw file and number of missed cleavages, wrangle data + plotdata <- plotdata %>% + dplyr::filter(!is.na(Missed.cleavages)) %>% + dplyr::group_by(Raw.file, Missed.cleavages) %>% + dplyr::tally() + + max_missed = max(plotdata$Missed.cleavages) + + plotdata <- plotdata %>% + tidyr::spread(Missed.cleavages, n) + + plotdata[is.na(plotdata)] = 0 + + if (max_missed==1){ + plotdata <- plotdata %>% + dplyr::mutate(`% Missed cleavages`=(`1`) / (`0` + `1`) * 100) %>% + dplyr::rename(None='0') + return(plotdata) + + } else if (max_missed==2){ + plotdata <- plotdata %>% + dplyr::mutate(`% Missed cleavages`=(`1` + 2*`2`) / (`0` + `1` + 2*`2`) * 100) %>% + dplyr::rename(None='0') + return(plotdata) + + } else if (max_missed==3){ + plotdata <- plotdata %>% + dplyr::mutate(`% Missed cleavages`=(`1` + 2*`2`+ 3*`3`) / (`0` + `1` + 2*`2`+ 3*`3`) * 100) %>% + dplyr::rename(None='0') + return(plotdata) + + } else { + return("Not supported for more than 3 missed cleavage sites") + } + + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 0), paste0('No Rows selected'))) + + 
plotdata + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot + )) +} diff --git a/modules/dia-nn/030_Identifications/10_miscleavage_K.R b/modules/dia-nn/030_Identifications/10_miscleavage_K.R new file mode 100644 index 0000000..49de757 --- /dev/null +++ b/modules/dia-nn/030_Identifications/10_miscleavage_K.R @@ -0,0 +1,64 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Miscleavage Rate (K), PEP < 0.01' + help_text <- 'Plotting frequency of lysine miscleavages in confidently identified precursors.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + } + + .get_internal_occurence <- function(string, char){ + occurence <- str_count(string, pattern = paste0(char,".")) + occurence_w_proline <- str_count(string, pattern = paste0(char,"P")) + return(occurence-occurence_w_proline) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Stripped.Sequence', 'PEP','Ms1.Area')] + plotdata <- plotdata[plotdata$PEP<0.01, ] + plotdata <- plotdata[plotdata$Ms1.Area>0, ] + + plotdata$Missed.cleavages = sapply(plotdata$Stripped.Sequence, .get_internal_occurence, char="K") + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Missed.cleavages) %>% + dplyr::tally() + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + plotdata <- plotdata %>% dplyr::filter(!is.na(Missed.cleavages)) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=factor(Missed.cleavages), y=n, fill=factor(Missed.cleavages), colour=factor(Missed.cleavages)), + stat='identity', position='dodge2') + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_bar(stat="identity", alpha=0.7) + + labs(x='Missed K Cleavages', y='Count') + + 
theme_diann(input=input, show_legend=T) + + theme(legend.position = "bottom")+ + scale_fill_manual(name = "", values = custom_colors)+ + scale_color_manual(name = "", values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank()) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/030_Identifications/10_miscleavage_R.R b/modules/dia-nn/030_Identifications/10_miscleavage_R.R new file mode 100644 index 0000000..3015bc5 --- /dev/null +++ b/modules/dia-nn/030_Identifications/10_miscleavage_R.R @@ -0,0 +1,66 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Miscleavage Rate (R), PEP < 0.01' + help_text <- 'Plotting frequency of arginine miscleavages in confidently identified precursors.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .get_internal_occurence <- function(string, char){ + occurence <- str_count(string, pattern = paste0(char,".")) + occurence_w_proline <- str_count(string, pattern = paste0(char,"P")) + return(occurence-occurence_w_proline) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Stripped.Sequence', 'PEP','Ms1.Area')] + plotdata <- plotdata[plotdata$PEP<0.01, ] + plotdata <- plotdata[plotdata$Ms1.Area>0, ] + + plotdata$Missed.cleavages = sapply(plotdata$Stripped.Sequence, .get_internal_occurence, char="R") + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Missed.cleavages) %>% + dplyr::tally() + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + plotdata <- plotdata %>% dplyr::filter(!is.na(Missed.cleavages)) + 
+ validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=factor(Missed.cleavages), y=n, fill=factor(Missed.cleavages), colour=factor(Missed.cleavages)), + stat='identity', position='dodge2') + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_bar(stat="identity", alpha=0.7) + + labs(x='Missed R Cleavages', y='Count') + + theme_diann(input=input, show_legend=T) + + scale_fill_manual(name = "", values = custom_colors)+ + scale_color_manual(name = "", values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank(), + legend.position = "bottom") + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/030_Identifications/20_protein_groups.R b/modules/dia-nn/030_Identifications/20_protein_groups.R new file mode 100644 index 0000000..23c92fa --- /dev/null +++ b/modules/dia-nn/030_Identifications/20_protein_groups.R @@ -0,0 +1,92 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Number of Protein Identifications' + help_text <- 'Number of proteotypic protein IDs found per run. Protein IDs are shown across all channels in an experiment.' 
+ source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) +} + + + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Proteotypic', 'Label', 'Protein.Ids','Protein.Group','Protein.Q.Value','PG.Q.Value')] + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + Proteins <- plotdata %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n_distinct(Protein.Ids), .groups = "drop") + Proteins$Label <- 'Proteins' + + Proteins.Q <- plotdata %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(Protein.Q.Value < 0.01) %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n_distinct(Protein.Ids), .groups = "drop") + Proteins.Q$Label <- 'Proteins, q-val < 1%' + + PG <- plotdata %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n_distinct(Protein.Group), .groups = "drop") + PG$Label <- 'Protein Groups' + + PG.Q <- plotdata %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(PG.Q.Value < 0.01) %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n_distinct(Protein.Group), .groups = "drop") + PG.Q$Label <- 'Protein Groups, q-val < 1%' + + plotdata <- rbind(Proteins, Proteins.Q, PG, PG.Q) + # create custom factor levels to influence the order of labels + levels_plot = c('Proteins', 'Proteins, q-val < 1%', 'Protein Groups', 'Protein Groups, q-val < 1%') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) 
> 0), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, colour=Label, fill=Label )) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Proteins', colour ='Filter', fill = 'Filter') + + theme_diann(input=input, show_legend=T) + + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank(), + legend.position = "bottom") + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/20_total_IDs.R b/modules/dia-nn/050_plex-DIA_Diagnostics/20_total_IDs.R new file mode 100644 index 0000000..f423225 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/20_total_IDs.R @@ -0,0 +1,123 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Identified Precursors per Channel' + help_text <- 'The number of precursors identified is shown for every channel together with the number of total and intersected precursors. The number of precursors is based on all precursors found in the report.tsv file which is by default controlled by a run-specific FDR.' 
+ source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Label','Translated.Q.Value')] + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + + plotdata_union <- plotdata %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications = n_distinct(Precursor.Id), + Identifications.O = n_distinct(Precursor.Id[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_channel <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::summarise(Identifications = n_distinct(Precursor.Id), + Identifications.O = n_distinct(Precursor.Id[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_I <- plotdata %>% + dplyr::group_by(Raw.file,Precursor.Id) %>% + dplyr::summarise(Channels = n_distinct(Label), + Channels.O = n_distinct(Label[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_intersected <- plotdata_I %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications = n_distinct(Precursor.Id[Channels == 3]), + Identifications.O = n_distinct(Precursor.Id[Channels.O == 3]), + .groups = "drop") + + # Set labels for newly created dataframes + plotdata_intersected$Label = "Intersected" + plotdata_union$Label = "Union" + + plotdata = rbind(plotdata_channel, plotdata_intersected, plotdata_union) + + channels = c() + for (channel in 
config[['channels']]) { + channels <- c(channels, channel[['name']]) + } + + # create custom factor levels to influence the order of labels + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + plotdata <- plotdata %>% + mutate(Identifications.T = Identifications - Identifications.O) %>% + gather("Type", "Identifications", c('Identifications.T','Identifications.O')) + + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + plotdata <- plotdata %>% + mutate(Type=recode(Type, + Identifications.T = "Translated", + Identifications.O = "Main Search")) + + levels_plot = c("Translated","Main Search") + plotdata <- within(plotdata, + Type <- factor(Type, levels=levels_plot)) + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, fill=Label, colour=Label, alpha=Type)) + + geom_bar(stat="identity") + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Precursors') + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + #theme_diann(input=input, show_legend=T) + + theme(legend.position = "bottom")+ + scale_alpha_manual(name="",values=c(0.4,0.8)) + + scale_fill_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]), guide = "none")+ + scale_color_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]), guide = "none") + + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git 
a/modules/dia-nn/050_plex-DIA_Diagnostics/21_channel_ids.R b/modules/dia-nn/050_plex-DIA_Diagnostics/21_channel_ids.R new file mode 100644 index 0000000..8871ae9 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/21_channel_ids.R @@ -0,0 +1,101 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Identified Precursors per Channel, Channel Q-Value' + help_text <- 'The number of precursors identified is shown for every channel together with the number of total and intersected precursors. The number of precursors is based on all precursors with a Channel.Q.Value < 0.01.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need(data()[['report']][['Channel.Q.Value']], paste0('Channel.Q.Value column was not found. Your DIA-NN version does not yet support this feature.'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Channel.Q.Value', 'Label')] + + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + plotdata <- plotdata %>% + dplyr::filter(Channel.Q.Value < 0.01) + + + # calculate channel-wise IDs + plotdata_n <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # calculate IDs across channels + plotdata_k <- plotdata %>% + dplyr::group_by(Raw.file, Precursor.Id) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # calculate union of all IDs + plotdata_union <- plotdata_k %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # calculate intersection of all IDs + plotdata_k = plotdata_k[plotdata_k$Identifications==3,] + plotdata_intersected <- plotdata_k %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # Set labels for newly created 
dataframe + plotdata_intersected$Label = "Intersected" + plotdata_union$Label = "Union" + + + plotdata = rbind(plotdata_n, plotdata_intersected) + plotdata = rbind(plotdata, plotdata_union) + + channels = c() + for (channel in config[['channels']]) { + channels <- c(channels, channel[['name']]) + } + + # create custom factor levels to influence the order of labels + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, fill=Label, colour=Label)) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Precursors') + + theme_diann(input=input, show_legend=F) + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + scale_fill_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]))+ + scale_color_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]])) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/24_protein_IDs.R b/modules/dia-nn/050_plex-DIA_Diagnostics/24_protein_IDs.R new file mode 100644 index 0000000..98413cc --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/24_protein_IDs.R @@ -0,0 +1,118 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Identified Proteins per Channel' + help_text <- 'The number of Proteins identified is shown for every channel together with the number of total and intersected 
proteins. The number of proteins is based on all proteotypic precursors independent of the Protein.Q.Value.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Proteotypic', 'Label', 'Protein.Ids','Protein.Group','Protein.Q.Value','PG.Q.Value', 'Translated.Q.Value')] + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + plotdata_union <- plotdata %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications = n_distinct(Protein.Ids), + Identifications.O = n_distinct(Protein.Ids[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_channel <- plotdata %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file,Label) %>% + dplyr::summarise(Identifications = n_distinct(Protein.Ids), + Identifications.O = n_distinct(Protein.Ids[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_I <- plotdata %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file,Protein.Ids) %>% + dplyr::summarise(Channels = n_distinct(Label), + Channels.O = n_distinct(Label[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_intersected <- plotdata_I %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications = n_distinct(Protein.Ids[Channels == 3]), + Identifications.O = n_distinct(Protein.Ids[Channels.O == 3]), + .groups = "drop") + + # Set labels for newly created dataframes + plotdata_intersected$Label = "Intersected" + plotdata_union$Label = "Union" + + + plotdata = rbind(plotdata_channel, plotdata_intersected, plotdata_union) + + channels = c() + for (channel in config[['channels']]) { + channels <- c(channels, channel[['name']]) + } + + # create custom factor levels to influence the order of labels + 
levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + plotdata <- plotdata %>% + mutate(Identifications.T = Identifications - Identifications.O) %>% + gather("Type", "Identifications", c('Identifications.T','Identifications.O')) + + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + plotdata <- plotdata %>% + mutate(Type=recode(Type, + Identifications.T = "Translated", + Identifications.O = "Main Search")) + + levels_plot = c("Translated","Main Search") + plotdata <- within(plotdata, + Type <- factor(Type, levels=levels_plot)) + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, fill=Label, colour=Label, alpha=Type)) + + geom_bar(stat="identity") + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Proteins') + + theme_diann(input=input, show_legend=T) + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + theme(legend.position = "bottom")+ + scale_alpha_manual(name="",values=c(0.4,0.8)) + + scale_fill_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]), guide = "none")+ + scale_color_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]), guide = "none") + + + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/30_channel_intensity.R b/modules/dia-nn/050_plex-DIA_Diagnostics/30_channel_intensity.R new file mode 100644 
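The protein-ID module above counts per-channel identifications alongside a "Union" (seen in any channel) and an "Intersected" (seen in all channels of a run) bar via grouped `n_distinct` calls. A minimal Python sketch of the same set logic, with invented channel names and protein IDs (none taken from the diff):

```python
# Hypothetical per-channel sets of identified protein IDs for one run.
ids_by_channel = {
    "d0": {"PEP1", "PEP2", "PEP3"},
    "d4": {"PEP2", "PEP3", "PEP4"},
    "d8": {"PEP2", "PEP3"},
}

# Union: identified in at least one channel of the run.
union = set().union(*ids_by_channel.values())
# Intersected: identified in every channel, mirroring
# n_distinct(Protein.Ids[Channels == 3]) for a 3-plex set.
intersected = set.intersection(*ids_by_channel.values())

counts = {ch: len(s) for ch, s in ids_by_channel.items()}
counts["Union"] = len(union)
counts["Intersected"] = len(intersected)
```

With this toy input the union holds 4 proteins and the intersection 2, which is exactly what the stacked per-run bars would display.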
index 0000000..e54a45c --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/30_channel_intensity.R @@ -0,0 +1,115 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Relative Single-Cell Intensity' + help_text <- 'Single-Cell intensity relative to the carrier channel for intersected precursors' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + # return the IDs for the carrier for a run + .get_carrier_ids <- function(rawfile, carriertable){ + carriertable <- carriertable[carriertable$Raw.file==rawfile,] + + if (nrow(carriertable) == 0){ + return(1) + } else { + return(carriertable$Identifications[1]) + } + + + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Label')] + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + # create empty dataframe for output + + outdf <- data.frame(Raw.file=factor(), + logratio=double(), + Label=factor(), + stringsAsFactors=FALSE) + + # iterate all experiments + + experiments = unique(plotdata[["Raw.file"]]) + for (i in 1:length(experiments)){ + experiment = experiments[[i]] + + subdf <- plotdata[plotdata$Raw.file==experiment,] + + # get carrier label, e.g. 
label with highest intensity + meandf <- subdf %>% + dplyr::group_by(Label) %>% + dplyr::summarise(mean=mean(Ms1.Area), + .groups = "drop") + meandf <- meandf[order(meandf$mean),] + + carrier_label <- 'd8' #meandf[1:1,][['Label']] + + # Calculate ratio to carrier for all but carrier label + + + + for (channel in config[['channels']]) { + current_label = channel[['name']] + if (current_label != carrier_label){ + + # join dataframes by intersected precursors + carrier_df <- subdf[subdf$Label==carrier_label,] + carrier_df <- carrier_df[,c("Precursor.Id","Ms1.Area")] + current_df <- subdf[subdf$Label==current_label,] + + joineddf <- merge(carrier_df, current_df, by = "Precursor.Id") + joineddf$logratio = log10(joineddf$Ms1.Area.y/joineddf$Ms1.Area.x) + + joineddf <- joineddf[,c("Raw.file","logratio","Label")] + + + outdf <- rbind(outdf, joineddf) + + } + } + + } + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + ggplot(plotdata, aes(x=Label, y=logratio)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + theme_diann(input=input, show_legend=T) + + geom_boxplot()+ + ylim(-2.5,1)+ + labs(x='', y='log10 Intensity Ratio') + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/31_quantification_variability_MS1.R b/modules/dia-nn/050_plex-DIA_Diagnostics/31_quantification_variability_MS1.R new file mode 100644 index 0000000..64375f9 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/31_quantification_variability_MS1.R @@ -0,0 +1,119 @@ +init <- function() { + + type <- 'plot' + box_title <- 'MS1 Quantification Variability' + help_text <- 'Single-Cell intensity relative to the carrier 
channel is not computed in this module; instead, the median coefficient of variation (CV) of normalized MS1 quantities per protein group is shown for each channel' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + # return the IDs for the carrier for a run + .get_carrier_ids <- function(rawfile, carriertable){ + carriertable <- carriertable[carriertable$Raw.file==rawfile,] + + if (nrow(carriertable) == 0){ + return(1) + } else { + return(carriertable$Identifications[1]) + } + + + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Label', 'Protein.Group','Stripped.Sequence','Precursor.Charge')] + + #plotdata <- as.data.frame(read_tsv(file='/Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase/report.tsv',guess_max=1e5)) + #plotdata['Raw.file'] <- plotdata['Run'] + #plotdata <- translate_diann_channel_format(plotdata) + #plotdata <- separate_channel_info(plotdata) + #plotdata <- plotdata[,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Label', 'Protein.Group','Stripped.Sequence','Precursor.Charge')] + + # Add Label column + # + # Add column for modified precursor without channel + #plotdata <- separate_channel_info(plotdata) + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + plotdata$seqcharge = paste0(plotdata$Stripped.Sequence, plotdata$Precursor.Charge) + + #normalize each sequence_charge to the mean of the set + plotdata <- plotdata %>% dplyr::group_by(Raw.file, seqcharge) %>% dplyr::mutate("Ms1.Area"=Ms1.Area/mean(Ms1.Area, na.rm=T)) %>% ungroup() + + # Normalize Ms1.Area with the median of each cell + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::mutate("Ms1.Area.Norm" = Ms1.Area/median(Ms1.Area, na.rm=T)) %>% dplyr::ungroup() + + # Keep protein groups with more than 3 precursors + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label, 
Protein.Group) %>% + dplyr::mutate("n" = n()) %>% dplyr::ungroup() + plotdata <- plotdata[plotdata$n>3,] + + + # Calculate CV for every protein group + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label, Protein.Group) %>% + dplyr::summarise("cv" = sd(Ms1.Area.Norm)/mean(Ms1.Area.Norm)) + + # Calculate median cv for every cell + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::summarise('median_cv' = median(cv)) %>% + dplyr::ungroup() + + # Calculate x coordinate as id within the set. + # Only needed as a helper for plotting + plotdata <- plotdata %>% + dplyr::group_by(Raw.file) %>% + dplyr::mutate('x' = row_number()) %>% + dplyr::ungroup() + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + ggplot(plotdata)+ + geom_jitter( aes(x=x, y=median_cv, color=Label), size=6)+ + theme_diann() + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + ylab("Quantification Variability")+ + xlab("")+ + ylim(0,1)+ + xlim(0,4)+ + theme(legend.position = "bottom")+ + theme(axis.text.x = element_blank())+ + scale_color_manual(name='plexDIA Label:', values = c(custom_colors[[1]],custom_colors[[2]],custom_colors[[3]])) + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/32_quantification_variability_MS2.R b/modules/dia-nn/050_plex-DIA_Diagnostics/32_quantification_variability_MS2.R new file mode 100644 index 0000000..3d0d2b6 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/32_quantification_variability_MS2.R @@ -0,0 +1,119 @@ +init <- function() { + + type <- 'plot' + box_title <- 'MS2 Quantification Variability' + help_text <- 'Single-Cell intensity 
variability at the MS2 level: the median coefficient of variation (CV) of normalized Precursor.Quantity values per protein group is shown for each channel' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + # return the IDs for the carrier for a run + .get_carrier_ids <- function(rawfile, carriertable){ + carriertable <- carriertable[carriertable$Raw.file==rawfile,] + + if (nrow(carriertable) == 0){ + return(1) + } else { + return(carriertable$Identifications[1]) + } + + + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Precursor.Quantity', 'Precursor.Id', 'Label', 'Protein.Group','Stripped.Sequence','Precursor.Charge')] + + #plotdata <- as.data.frame(read_tsv(file='/Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase/report.tsv',guess_max=1e5)) + #plotdata['Raw.file'] <- plotdata['Run'] + #plotdata <- translate_diann_channel_format(plotdata) + #plotdata <- separate_channel_info(plotdata) + #plotdata <- plotdata[,c('Raw.file', 'Precursor.Quantity', 'Precursor.Id', 'Label', 'Protein.Group','Stripped.Sequence','Precursor.Charge')] + + # Add Label column + # + # Add column for modified precursor without channel + #plotdata <- separate_channel_info(plotdata) + + plotdata <- plotdata[plotdata$Precursor.Quantity>0,] + + + plotdata$seqcharge = paste0(plotdata$Stripped.Sequence, plotdata$Precursor.Charge) + + #normalize each sequence_charge to the mean of the set + plotdata <- plotdata %>% dplyr::group_by(Raw.file, seqcharge) %>% dplyr::mutate("Precursor.Quantity"=Precursor.Quantity/mean(Precursor.Quantity, na.rm=T)) %>% ungroup() + + # Normalize Precursor.Quantity with the median of each cell + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::mutate("Precursor.Quantity.Norm" = Precursor.Quantity/median(Precursor.Quantity, na.rm=T)) %>% 
dplyr::ungroup() + + # Keep protein groups with more than 3 precursors + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label, Protein.Group) %>% + dplyr::mutate("n" = n()) %>% dplyr::ungroup() + plotdata <- plotdata[plotdata$n>3,] + + + # Calculate CV for every protein group + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label, Protein.Group) %>% + dplyr::summarise("cv" = sd(Precursor.Quantity.Norm)/mean(Precursor.Quantity.Norm)) + + # Calculate median cv for every cell + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::summarise('median_cv' = median(cv)) %>% + dplyr::ungroup() + + # Calculate x coordinate as id within the set. + # Only needed as a helper for plotting + plotdata <- plotdata %>% + dplyr::group_by(Raw.file) %>% + dplyr::mutate('x' = row_number()) %>% + dplyr::ungroup() + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + ggplot(plotdata)+ + geom_jitter( aes(x=x, y=median_cv, color=Label), size=6)+ + theme_diann() + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + ylab("Quantification Variability")+ + xlab("")+ + ylim(0,1)+ + xlim(0,4)+ + theme(legend.position = "bottom")+ + theme(axis.text.x = element_blank())+ + scale_color_manual(name='plexDIA Label:', values = c(custom_colors[[1]],custom_colors[[2]],custom_colors[[3]])) + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/40_ms1_ids.R b/modules/dia-nn/050_plex-DIA_Diagnostics/40_ms1_ids.R new file mode 100644 index 0000000..f517d6e --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/40_ms1_ids.R @@ -0,0 +1,101 @@ +init <- function() { + + type <- 'plot' 
+ box_title <- 'Total MS1 Precursors' + help_text <- 'Total number of precursors identified based on different confidence metrics.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['ms1_extracted']], paste0('Upload ms1_extracted.tsv'))) + validate(need((nrow(data()[['ms1_extracted']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + } + + + .plotdata <- function(data, input) { + + ms1_extracted <- data()[['ms1_extracted']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id')] + + + report <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id')] + + + ms1_extracted <- ms1_extracted[ms1_extracted$Ms1.Area>0,] + report <- report[report$Ms1.Area>0,] + + + # calculate run-wise IDs + ms1_extracted_s <- ms1_extracted %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # calculate run-wise IDs + report_s <- report %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # Set labels for the newly created dataframes + ms1_extracted_s$Label <- "Ms1_extracted.tsv" + report_s$Label <- "run q-value" + + plotdata = rbind(ms1_extracted_s, report_s) + + # create custom factor levels to influence the order of labels + levels_plot = c("run q-value","Ms1_extracted.tsv") + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + + if ("Channel.Q.Value" %in% names(report)){ + report_channel_q<- report %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(Channel.Q.Value < 0.01) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + report_channel_q$Label <- "channel q-value" + + plotdata = rbind(ms1_extracted_s, report_s, report_channel_q) + + # create custom factor levels to influence the order of labels + levels_plot = c("run 
q-value","Ms1_extracted.tsv","channel q-value") + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + } + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, fill=Label, colour=Label)) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(y='Number of Precursors',x='' ) + + theme_diann(input=input, show_legend=T) + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + scale_fill_manual(values = c(custom_colors[[4]],custom_colors[[5]],custom_colors[[6]]))+ + scale_color_manual(values = c(custom_colors[[4]],custom_colors[[5]],custom_colors[[6]])) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/50_missing_data_precursor.R b/modules/dia-nn/050_plex-DIA_Diagnostics/50_missing_data_precursor.R new file mode 100644 index 0000000..778221a --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/50_missing_data_precursor.R @@ -0,0 +1,119 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Missing Data, Precursor Level' + help_text <- 'Plotting the Jaccard Index for identified precursors for all channel combinations. 
' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + .jaccard <- function(a, b) { + intersection = length(intersect(a, b)) + union = length(a) + length(b) - intersection + return (intersection/union) + } + + .jaccard_by_label <- function(dataframe, experiment){ + + outdf <- data.frame(Raw.file=factor(), + Order=factor(), + Jacc=double(), + Label=factor(), + X=double(), + stringsAsFactors=FALSE) + + labels = sort(unique(dataframe[["Label"]])) + + rows <- length(labels) + idx <- which(lower.tri(matrix(, rows, rows), diag = FALSE) == TRUE, arr.ind=T) + + for(i in 1:nrow(idx)){ + first_index <- idx[i,'row'] + second_index <- idx[i,'col'] + + first_label <- labels[[first_index]] + second_label <- labels[[second_index]] + + precursors_first = dataframe[dataframe$Label == first_label, ] + precursors_first = precursors_first[["Precursor.Id"]] + + precursors_second = dataframe[dataframe$Label == second_label, ] + precursors_second = precursors_second[["Precursor.Id"]] + + dist <- .jaccard(precursors_first, precursors_second) + + x <- i/nrow(idx) + if ((second_label == 'd8') || (first_label == 'd8')) { + order <- 'reference' + } else { + order <- 'other' + } + outdf <- rbind(outdf, data.frame(Raw.file=experiment, Order=order, Jacc=dist, Label=first_label, X=x)) + + } + + return(outdf) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Label' )] + + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + outdf <- data.frame(Raw.file=factor(), + Order=factor(), + Jacc=double(), + Label=factor(), + X=double(), + stringsAsFactors=FALSE) + + experiments = unique(plotdata[["Raw.file"]]) + for (i in 1:length(experiments)){ + experiment = experiments[[i]] + subdf <- plotdata[plotdata$Raw.file==experiment,] + jaccdf <- .jaccard_by_label(subdf, 
experiment) + outdf <- rbind(outdf, jaccdf) + + + + } + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata)+ + geom_point(data=plotdata, aes(x=X, y=Jacc, color=Order), size=6)+ + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + scale_color_manual(name = "Comparison", values = c(custom_colors[[3]],custom_colors[[1]]))+ + theme(legend.position = "bottom")+ + theme_diann(input=input, show_legend=T) + + theme(axis.text.x = element_blank())+ + xlim(-1,2)+ + ylab("Jaccard Index")+ + xlab("")+ + ylim(0,1) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/51_missing_data_proteins.R b/modules/dia-nn/050_plex-DIA_Diagnostics/51_missing_data_proteins.R new file mode 100644 index 0000000..9bb7b31 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/51_missing_data_proteins.R @@ -0,0 +1,130 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Missing Data, Protein Level' + help_text <- 'Plotting the Jaccard Index for identified precursors for all channel combinations. 
' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + .jaccard <- function(a, b) { + intersection = length(intersect(a, b)) + union = length(a) + length(b) - intersection + return (intersection/union) + } + + .jaccard_by_label <- function(dataframe, experiment){ + + outdf <- data.frame(Raw.file=factor(), + Order=factor(), + Jacc=double(), + Label=factor(), + X=double(), + stringsAsFactors=FALSE) + + labels = sort(unique(dataframe[["Label"]])) + + rows <- length(labels) + idx <- which(lower.tri(matrix(, rows, rows), diag = FALSE) == TRUE, arr.ind=T) + + for(i in 1:nrow(idx)){ + first_index <- idx[i,'row'] + second_index <- idx[i,'col'] + + first_label <- labels[[first_index]] + second_label <- labels[[second_index]] + + precursors_first = dataframe[dataframe$Label == first_label, ] + precursors_first = unique(precursors_first[["Protein.Ids"]]) + + precursors_second = dataframe[dataframe$Label == second_label, ] + precursors_second = unique(precursors_second[["Protein.Ids"]]) + + dist <- .jaccard(precursors_first, precursors_second) + + x <- i/nrow(idx) + if ((second_label == 'd8') || (first_label == 'd8')) { + order <- 'reference' + } else { + order <- 'other' + } + outdf <- rbind(outdf, data.frame(Raw.file=experiment, Order=order, Jacc=dist, Label=first_label, X=x)) + + } + + return(outdf) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Label' ,'Protein.Ids','Proteotypic')] + + #plotdata <- read_tsv(file='G:/.shortcut-targets-by-id/1uQ4exoKlaZAGnOG1iCJPzYN3ooYYZB7g/MS/Users/GW/test_data/diann_v_16/report.tsv',guess_max=1e5, col_types = cols()) + #plotdata <- translate_diann_channel_format(plotdata) + + # Add column for modified precursor without channel + #plotdata <- separate_channel_info(plotdata) + #plotdata <- plotdata %>% + 
#mutate(Raw.file=Run) + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + plotdata <- plotdata[plotdata$Proteotypic==1,] + + outdf <- data.frame(Raw.file=factor(), + Order=factor(), + Jacc=double(), + Label=factor(), + X=double(), + stringsAsFactors=FALSE) + + experiments = unique(plotdata[["Raw.file"]]) + for (i in 1:length(experiments)){ + experiment = experiments[[i]] + + subdf <- plotdata[plotdata$Raw.file==experiment,] + jaccdf <- .jaccard_by_label(subdf, experiment) + + outdf <- rbind(outdf, jaccdf) + + + + } + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata)+ + geom_point(data=plotdata, aes(x=X, y=Jacc, color=Order), size=6)+ + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + scale_color_manual(name = "Comparison", values = c(custom_colors[[3]],custom_colors[[1]]))+ + theme(legend.position = "bottom")+ + theme_diann(input=input, show_legend=T) + + theme(axis.text.x = element_blank())+ + xlim(-1,2)+ + ylab("Jaccard Index")+ + xlab("")+ + ylim(0,1) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/01_charge_distribution.R b/modules/dia-nn/060_Feature_Detection/01_charge_distribution.R new file mode 100644 index 0000000..77c99cf --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/01_charge_distribution.R @@ -0,0 +1,85 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Features Identified by Charge' + help_text <- 'Identified features are reported based on the charge. Precursors quantified in separate channels are treated as separate precursors.' 
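The two missing-data modules above walk the lower triangle of the channel-by-channel matrix and compute a Jaccard index for each unordered pair of channels. The same pairing and index logic, sketched in Python with hypothetical channel sets (labels and IDs invented for illustration):

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Intersection over union, as in the modules' .jaccard helper."""
    union = len(a) + len(b) - len(a & b)
    return len(a & b) / union

# Hypothetical per-channel precursor sets for one run.
channels = {
    "d0": {"P1", "P2", "P3"},
    "d4": {"P2", "P3", "P4"},
    "d8": {"P1", "P2", "P3", "P4"},
}

# combinations() enumerates each unordered pair exactly once,
# matching the lower.tri() index trick in .jaccard_by_label.
pairs = {
    (a, b): jaccard(channels[a], channels[b])
    for a, b in combinations(sorted(channels), 2)
}
```

For a 3-plex set this yields three pairwise indices per run; a Jaccard index of 1 means the two channels identified identical sets, so lower values quantify channel-specific missing data.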
+ source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + } + + .get_label <- function(sequence, labelsdata){ + + label = '' + + for (i in 1:length(labelsdata)){ + current_label <- labelsdata[[i]] + + if (grepl( current_label, sequence, fixed = TRUE)){ + label <- current_label + } + + } + + return(label) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Protein.Group')] + featuredata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtEnd')] + labelsdata <- config[['ChemicalLabels']] + + featuredata$charge[featuredata$charge > 3] <- 4 + + featuredata_total <- featuredata %>% + dplyr::group_by(Raw.file, charge) %>% + dplyr::tally() + + + + return(featuredata_total) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata) + + geom_bar(aes(x=charge, y=n, fill=factor(charge), colour=factor(charge)), + stat='identity', position='dodge2', alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(x='Charge State', y='Count', fill='Charge State') + + + theme_diann(input=input, show_legend=T) + + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank(), + legend.position = "bottom") + + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors) + + guides(fill = guide_legend(override.aes = list(color = NA)), + color = 'none', + shape = 'none') + + + + } + + return(list( + type=type, + box_title=box_title, + 
help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/060_Feature_Detection/02_nIsotopes.R b/modules/dia-nn/060_Feature_Detection/02_nIsotopes.R new file mode 100644 index 0000000..54ed86e --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/02_nIsotopes.R @@ -0,0 +1,61 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Isotopic Peaks Identified per Feature' + help_text <- 'The number of isotopic peaks identified is shown for features detected in the Dinosaur search.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd','nIsotopes')] + + plotdata$nIsotopes[plotdata$nIsotopes > 5] <- 5 + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, nIsotopes) %>% + dplyr::tally() + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + maxIsotopes <- max(plotdata$nIsotopes) + + ggplot(plotdata) + + geom_bar(aes(x=nIsotopes, y=n, fill=factor(nIsotopes), colour=factor(nIsotopes)), + stat='identity', position='dodge2', alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(x='Number of Isotopic Peaks', y='Features identified', fill='Isotopes') + + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank(), + legend.position = "bottom") + + theme_diann(input=input, show_legend=T) + + guides(fill = guide_legend(override.aes = list(color = NA)), + color = 'none', + shape = 'none') + + + + } + + return(list( + type=type, + 
box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/03_nscans.R b/modules/dia-nn/060_Feature_Detection/03_nscans.R new file mode 100644 index 0000000..024c9b1 --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/03_nscans.R @@ -0,0 +1,57 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Number of Scans per feature' + help_text <- 'The number of MS1 scans is shown for all features identified in the Dinosaur search.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd','nScans')] + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$nScans, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$nScans, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(nScans)) + + plotdata[plotdata$nScans >= ceiling, 'nScans'] <- ceiling + plotdata[plotdata$nScans <= floor, 'nScans'] <- floor + + maxScans <- max(plotdata$nScans) + + ggplot(plotdata, aes(nScans)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=30, fill=custom_colors[[6]]) + + coord_flip() + + xlim(0, maxScans) + + labs(x='Number of Scans', y='Features identified') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + 
dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/041_IDs_by_RT.R b/modules/dia-nn/060_Feature_Detection/041_IDs_by_RT.R new file mode 100644 index 0000000..b68aba7 --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/041_IDs_by_RT.R @@ -0,0 +1,56 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Features Identified across Gradient' + help_text <- 'The frequency of precursor identifications based on the Dinosaur search is plotted across the chromatographic gradient.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd')] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(rtStart > config[['RT.Start']]) %>% + filter(rtStart < config[['RT.End']]) + + plotdata$Category = 'z = 1' + plotdata$Category[plotdata$charge > 1] <- 'z > 1' + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + maxRT <- max(plotdata$rtApex) + + ggplot(plotdata, aes(x=rtApex, color = Category)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + coord_flip() + + labs(x='Retention Time (min)', y='Number of Precursors') + + scale_color_manual(name='Charge:', values=c(custom_colors[[1]], custom_colors[[6]]))+ + theme_diann(input=input, show_legend=T)+ + theme(legend.position = "bottom") + + } + + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git 
a/modules/dia-nn/060_Feature_Detection/042_IDs_by_mz.R b/modules/dia-nn/060_Feature_Detection/042_IDs_by_mz.R new file mode 100644 index 0000000..7b1adab --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/042_IDs_by_mz.R @@ -0,0 +1,51 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Features Identified across m/z' + help_text <- 'The frequency of precursor identifications based on the Dinosaur search is plotted across the mass-to-charge ratio.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd')] + plotdata$Category = 'z = 1' + plotdata$Category[plotdata$charge > 1] <- 'z > 1' + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + maxMZ <- max(plotdata$mz) + minMZ <- min(plotdata$mz) + + ggplot(plotdata, aes(x=mz, color = Category)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + coord_flip() + + labs(x='m/z', y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + + scale_color_manual(name='Charge:', values=c(custom_colors[[1]], custom_colors[[6]]))+ + theme(legend.position = "bottom") + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/04_RT_length_base.R b/modules/dia-nn/060_Feature_Detection/04_RT_length_base.R new file mode 100644 index 0000000..6c2017e --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/04_RT_length_base.R
@@ -0,0 +1,60 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Retention Length of Features at Base' + help_text <- 'Plotting the retention length of identified features at the base.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd')] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(rtStart > config[['RT.Start']]) %>% + filter(rtStart < config[['RT.End']]) + + plotdata <- dplyr::mutate(plotdata, RT.Length = (rtEnd-rtStart)*60) + + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$RT.Length, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$RT.Length, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(RT.Length)) + + plotdata[plotdata$RT.Length >= ceiling, "RT.Length"] <- ceiling + plotdata[plotdata$RT.Length <= floor, "RT.Length"] <- floor + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(RT.Length)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + coord_flip() + + labs(x='Retention Lengths at base (sec)', y='Features identified') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/05_intensity_distribution.R b/modules/dia-nn/060_Feature_Detection/05_intensity_distribution.R new file mode 100644 index
0000000..880e71b --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/05_intensity_distribution.R @@ -0,0 +1,53 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Feature Intensity Distribution' + help_text <- 'The distribution of integrated intensities is shown for identified features.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd','intensitySum')] + plotdata$Intensity <- log10(plotdata$intensitySum) + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + if(nrow(plotdata) > 0){ + plotdata[plotdata$Intensity >= ceiling, 'Intensity'] <- ceiling + plotdata[plotdata$Intensity <= floor, 'Intensity'] <- floor + } + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata, aes(x=Intensity)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Feature Intensity')), y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/005_Summary/summary_005_maxquant.R b/modules/max_quant/005_Summary/summary_005_maxquant.R similarity index 100% rename from modules/005_Summary/summary_005_maxquant.R rename to modules/max_quant/005_Summary/summary_005_maxquant.R diff --git
a/modules/005_Summary/summary_010_expmap.R b/modules/max_quant/005_Summary/summary_010_expmap.R similarity index 100% rename from modules/005_Summary/summary_010_expmap.R rename to modules/max_quant/005_Summary/summary_010_expmap.R diff --git a/modules/005_Summary/summary_020_parameters.R b/modules/max_quant/005_Summary/summary_020_parameters.R similarity index 100% rename from modules/005_Summary/summary_020_parameters.R rename to modules/max_quant/005_Summary/summary_020_parameters.R diff --git a/modules/005_Summary/summary_030_expsummary.R b/modules/max_quant/005_Summary/summary_030_expsummary.R similarity index 100% rename from modules/005_Summary/summary_030_expsummary.R rename to modules/max_quant/005_Summary/summary_030_expsummary.R diff --git a/modules/010_Chromatography/chrom_00_FWHM.R b/modules/max_quant/010_Chromatography/chrom_00_FWHM.R similarity index 100% rename from modules/010_Chromatography/chrom_00_FWHM.R rename to modules/max_quant/010_Chromatography/chrom_00_FWHM.R diff --git a/modules/010_Chromatography/chrom_00_FWHM_most_intense.R b/modules/max_quant/010_Chromatography/chrom_00_FWHM_most_intense.R similarity index 100% rename from modules/010_Chromatography/chrom_00_FWHM_most_intense.R rename to modules/max_quant/010_Chromatography/chrom_00_FWHM_most_intense.R diff --git a/modules/010_Chromatography/chrom_01_IDfreq_by_RT.R b/modules/max_quant/010_Chromatography/chrom_01_IDfreq_by_RT.R similarity index 100% rename from modules/010_Chromatography/chrom_01_IDfreq_by_RT.R rename to modules/max_quant/010_Chromatography/chrom_01_IDfreq_by_RT.R diff --git a/modules/010_Chromatography/chrom_02_RT_length_base.R b/modules/max_quant/010_Chromatography/chrom_02_RT_length_base.R similarity index 100% rename from modules/010_Chromatography/chrom_02_RT_length_base.R rename to modules/max_quant/010_Chromatography/chrom_02_RT_length_base.R diff --git a/modules/020_Ion_Sampling/instrument_00_apex_offset.R 
b/modules/max_quant/020_Ion_Sampling/instrument_00_apex_offset.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_00_apex_offset.R rename to modules/max_quant/020_Ion_Sampling/instrument_00_apex_offset.R diff --git a/modules/020_Ion_Sampling/instrument_01_MS1_int_ID.R b/modules/max_quant/020_Ion_Sampling/instrument_01_MS1_int_ID.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_01_MS1_int_ID.R rename to modules/max_quant/020_Ion_Sampling/instrument_01_MS1_int_ID.R diff --git a/modules/020_Ion_Sampling/instrument_02_MS1_Intersected_int.R b/modules/max_quant/020_Ion_Sampling/instrument_02_MS1_Intersected_int.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_02_MS1_Intersected_int.R rename to modules/max_quant/020_Ion_Sampling/instrument_02_MS1_Intersected_int.R diff --git a/modules/020_Ion_Sampling/instrument_02_MS1_Intersected_int_Norm.R b/modules/max_quant/020_Ion_Sampling/instrument_02_MS1_Intersected_int_Norm.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_02_MS1_Intersected_int_Norm.R rename to modules/max_quant/020_Ion_Sampling/instrument_02_MS1_Intersected_int_Norm.R diff --git a/modules/020_Ion_Sampling/instrument_02_MS1_int_z_gt_1.R b/modules/max_quant/020_Ion_Sampling/instrument_02_MS1_int_z_gt_1.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_02_MS1_int_z_gt_1.R rename to modules/max_quant/020_Ion_Sampling/instrument_02_MS1_int_z_gt_1.R diff --git a/modules/020_Ion_Sampling/instrument_03_MS1_int_z_all.R b/modules/max_quant/020_Ion_Sampling/instrument_03_MS1_int_z_all.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_03_MS1_int_z_all.R rename to modules/max_quant/020_Ion_Sampling/instrument_03_MS1_int_z_all.R diff --git a/modules/020_Ion_Sampling/instrument_04_injection_time_PSM.R b/modules/max_quant/020_Ion_Sampling/instrument_04_injection_time_PSM.R similarity index 100% rename from 
modules/020_Ion_Sampling/instrument_04_injection_time_PSM.R rename to modules/max_quant/020_Ion_Sampling/instrument_04_injection_time_PSM.R diff --git a/modules/020_Ion_Sampling/instrument_04_ms2_per_PSM.R b/modules/max_quant/020_Ion_Sampling/instrument_04_ms2_per_PSM.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_04_ms2_per_PSM.R rename to modules/max_quant/020_Ion_Sampling/instrument_04_ms2_per_PSM.R diff --git a/modules/020_Ion_Sampling/instrument_05_injection_time_noPSM.R b/modules/max_quant/020_Ion_Sampling/instrument_05_injection_time_noPSM.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_05_injection_time_noPSM.R rename to modules/max_quant/020_Ion_Sampling/instrument_05_injection_time_noPSM.R diff --git a/modules/020_Ion_Sampling/instrument_06_MS1_fill_time.R b/modules/max_quant/020_Ion_Sampling/instrument_06_MS1_fill_time.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_06_MS1_fill_time.R rename to modules/max_quant/020_Ion_Sampling/instrument_06_MS1_fill_time.R diff --git a/modules/020_Ion_Sampling/instrument_07_MS1_dutycycle_RT.R b/modules/max_quant/020_Ion_Sampling/instrument_07_MS1_dutycycle_RT.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_07_MS1_dutycycle_RT.R rename to modules/max_quant/020_Ion_Sampling/instrument_07_MS1_dutycycle_RT.R diff --git a/modules/020_Ion_Sampling/instrument_09__duty_cycle_IDrate.R b/modules/max_quant/020_Ion_Sampling/instrument_09__duty_cycle_IDrate.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_09__duty_cycle_IDrate.R rename to modules/max_quant/020_Ion_Sampling/instrument_09__duty_cycle_IDrate.R diff --git a/modules/020_Ion_Sampling/instrument_10_RI_precursor.R b/modules/max_quant/020_Ion_Sampling/instrument_10_RI_precursor.R similarity index 93% rename from modules/020_Ion_Sampling/instrument_10_RI_precursor.R rename to modules/max_quant/020_Ion_Sampling/instrument_10_RI_precursor.R index 
1d00828..7028a08 100644 --- a/modules/020_Ion_Sampling/instrument_10_RI_precursor.R +++ b/modules/max_quant/020_Ion_Sampling/instrument_10_RI_precursor.R @@ -1,86 +1,86 @@ -init <- function() { - - type <- 'plot' - box_title <- 'MS2 Signal / MS1 Signal per peptide, log10' - help_text <- 'This plot displays the sum of the reporter ion intensities divided by its precursor intensity per run on a log10 scale. This can serve as an indicator of transmission efficiency.' - source_file <- 'evidence' - - .validate <- function(data, input) { - validate(need(data()[['evidence']], paste0('Upload evidence.txt'))) - - # require reporter ion quantification data - validate(need(any(grepl('Reporter.intensity.corrected', colnames(data()[['evidence']]))), - paste0('Loaded data does not contain reporter ion quantification'))) - } - - .plotdata <- function(data, input) { - plotdata <- data()[['evidence']] - plotdata$SeqCharge <- paste(plotdata$Sequence, plotdata$Charge, sep="") - duplicated_seqcharge <- plotdata[duplicated(plotdata$SeqCharge), ] - - - potential_unique <- duplicated_seqcharge$SeqCharge - subframes <- split(plotdata, plotdata$Raw.file) - - - final_unique = c() - for (i in 1:length(potential_unique)){ - bool_vec = c() - for (k in 1:length(subframes)){ - if (potential_unique[i] %in% subframes[[k]][['SeqCharge']]){ - bool_vec = c(bool_vec, TRUE) - } - } - if (sum(bool_vec, na.rm = TRUE) == length(subframes)){ - final_unique = c(final_unique, potential_unique[i]) - } - } - plotdata <- plotdata[plotdata$SeqCharge %in% final_unique,] - - plotdata <- plotdata %>% dplyr::filter(Type != "MULTI-MATCH") - - plotdata <- plotdata %>% - rowwise() %>% - mutate(RI.Sum = sum(across(starts_with("Reporter.intensity.corrected")), na.rm = T)) - plotdata <- plotdata %>% - rowwise() %>% - mutate(RI.Precursor.Ratio = log10(RI.Sum / Intensity)) - - - # Thresholding data at 1 and 99th percentiles - ceiling <- quantile(plotdata$RI.Precursor.Ratio, probs=.99, na.rm = TRUE) - floor <- 
quantile(plotdata$RI.Precursor.Ratio, probs=.01, na.rm = TRUE) - - plotdata <- dplyr::filter(plotdata, is.finite(RI.Precursor.Ratio)) - - - plotdata[plotdata$RI.Precursor.Ratio >= ceiling, ]['RI.Precursor.Ratio'] <- ceiling - plotdata[plotdata$RI.Precursor.Ratio <= floor, ]['RI.Precursor.Ratio'] <- floor - - return(plotdata) - } - - .plot <- function(data, input) { - .validate(data, input) - plotdata <- .plotdata(data, input) - - validate(need((nrow(plotdata) >= 1), paste0('No Rows selected'))) - - ggplot(plotdata, aes(RI.Precursor.Ratio)) + - facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + - geom_histogram(bins=100) + - coord_flip() + - labs(x=expression(bold('Log'[10]*' Ratio')), y='Number of Peptides') - - } - - return(list( - type=type, - box_title=box_title, - help_text=help_text, - source_file=source_file, - validate_func=.validate, - plotdata_func=.plotdata, - plot_func=.plot - )) -} +init <- function() { + + type <- 'plot' + box_title <- 'MS2 Signal / MS1 Signal per peptide, log10' + help_text <- 'This plot displays the sum of the reporter ion intensities divided by its precursor intensity per run on a log10 scale. This can serve as an indicator of transmission efficiency.' 
+ source_file <- 'evidence' + + .validate <- function(data, input) { + validate(need(data()[['evidence']], paste0('Upload evidence.txt'))) + + # require reporter ion quantification data + validate(need(any(grepl('Reporter.intensity.corrected', colnames(data()[['evidence']]))), + paste0('Loaded data does not contain reporter ion quantification'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['evidence']] + plotdata$SeqCharge <- paste(plotdata$Sequence, plotdata$Charge, sep="") + duplicated_seqcharge <- plotdata[duplicated(plotdata$SeqCharge), ] + + + potential_unique <- duplicated_seqcharge$SeqCharge + subframes <- split(plotdata, plotdata$Raw.file) + + + final_unique = c() + for (i in 1:length(potential_unique)){ + bool_vec = c() + for (k in 1:length(subframes)){ + if (potential_unique[i] %in% subframes[[k]][['SeqCharge']]){ + bool_vec = c(bool_vec, TRUE) + } + } + if (sum(bool_vec, na.rm = TRUE) == length(subframes)){ + final_unique = c(final_unique, potential_unique[i]) + } + } + plotdata <- plotdata[plotdata$SeqCharge %in% final_unique,] + + plotdata <- plotdata %>% dplyr::filter(Type != "MULTI-MATCH") + + plotdata <- plotdata %>% + rowwise() %>% + mutate(RI.Sum = sum(across(starts_with("Reporter.intensity.corrected")), na.rm = T)) + plotdata <- plotdata %>% + rowwise() %>% + mutate(RI.Precursor.Ratio = log10(RI.Sum / Intensity)) + + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$RI.Precursor.Ratio, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$RI.Precursor.Ratio, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(RI.Precursor.Ratio)) + + + plotdata[plotdata$RI.Precursor.Ratio >= ceiling, ]['RI.Precursor.Ratio'] <- ceiling + plotdata[plotdata$RI.Precursor.Ratio <= floor, ]['RI.Precursor.Ratio'] <- floor + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) >= 1), paste0('No 
Rows selected'))) + + ggplot(plotdata, aes(RI.Precursor.Ratio)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=100) + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Ratio')), y='Number of Peptides') + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot + )) +} \ No newline at end of file diff --git a/modules/030_Peptide_Identifications/00_miscleavage.R b/modules/max_quant/030_Peptide_Identifications/00_miscleavage.R similarity index 100% rename from modules/030_Peptide_Identifications/00_miscleavage.R rename to modules/max_quant/030_Peptide_Identifications/00_miscleavage.R diff --git a/modules/030_Peptide_Identifications/05_miscleavage_table.R b/modules/max_quant/030_Peptide_Identifications/05_miscleavage_table.R similarity index 100% rename from modules/030_Peptide_Identifications/05_miscleavage_table.R rename to modules/max_quant/030_Peptide_Identifications/05_miscleavage_table.R diff --git a/modules/030_Peptide_Identifications/10_PEP_colored_cumulative_update.R b/modules/max_quant/030_Peptide_Identifications/10_PEP_colored_cumulative_update.R similarity index 100% rename from modules/030_Peptide_Identifications/10_PEP_colored_cumulative_update.R rename to modules/max_quant/030_Peptide_Identifications/10_PEP_colored_cumulative_update.R diff --git a/modules/030_Peptide_Identifications/25_PIF.R b/modules/max_quant/030_Peptide_Identifications/25_PIF.R similarity index 100% rename from modules/030_Peptide_Identifications/25_PIF.R rename to modules/max_quant/030_Peptide_Identifications/25_PIF.R diff --git a/modules/030_Peptide_Identifications/30_fractional_ID_rate.R b/modules/max_quant/030_Peptide_Identifications/30_fractional_ID_rate.R similarity index 100% rename from modules/030_Peptide_Identifications/30_fractional_ID_rate.R rename to 
modules/max_quant/030_Peptide_Identifications/30_fractional_ID_rate.R diff --git a/modules/030_Peptide_Identifications/30_fractional_ID_rate_intensitybins.R b/modules/max_quant/030_Peptide_Identifications/30_fractional_ID_rate_intensitybins.R similarity index 100% rename from modules/030_Peptide_Identifications/30_fractional_ID_rate_intensitybins.R rename to modules/max_quant/030_Peptide_Identifications/30_fractional_ID_rate_intensitybins.R diff --git a/modules/030_Peptide_Identifications/40_total_ID_rate.R b/modules/max_quant/030_Peptide_Identifications/40_total_ID_rate.R similarity index 100% rename from modules/030_Peptide_Identifications/40_total_ID_rate.R rename to modules/max_quant/030_Peptide_Identifications/40_total_ID_rate.R diff --git a/modules/030_Peptide_Identifications/dart_00_a_link2.R b/modules/max_quant/030_Peptide_Identifications/dart_00_a_link2.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_00_a_link2.R rename to modules/max_quant/030_Peptide_Identifications/dart_00_a_link2.R diff --git a/modules/030_Peptide_Identifications/dart_00_pep_scatter.R b/modules/max_quant/030_Peptide_Identifications/dart_00_pep_scatter.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_00_pep_scatter.R rename to modules/max_quant/030_Peptide_Identifications/dart_00_pep_scatter.R diff --git a/modules/030_Peptide_Identifications/dart_01_fdr_increase.R b/modules/max_quant/030_Peptide_Identifications/dart_01_fdr_increase.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_01_fdr_increase.R rename to modules/max_quant/030_Peptide_Identifications/dart_01_fdr_increase.R diff --git a/modules/030_Peptide_Identifications/dart_02_alignment.R b/modules/max_quant/030_Peptide_Identifications/dart_02_alignment.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_02_alignment.R rename to modules/max_quant/030_Peptide_Identifications/dart_02_alignment.R diff --git 
a/modules/030_Peptide_Identifications/dart_03_ids_per_exp.R b/modules/max_quant/030_Peptide_Identifications/dart_03_ids_per_exp.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_03_ids_per_exp.R rename to modules/max_quant/030_Peptide_Identifications/dart_03_ids_per_exp.R diff --git a/modules/030_Peptide_Identifications/dart_04_fractional_ID_rate.R b/modules/max_quant/030_Peptide_Identifications/dart_04_fractional_ID_rate.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_04_fractional_ID_rate.R rename to modules/max_quant/030_Peptide_Identifications/dart_04_fractional_ID_rate.R diff --git a/modules/030_Peptide_Identifications/dart_05_total_ID_rate.R b/modules/max_quant/030_Peptide_Identifications/dart_05_total_ID_rate.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_05_total_ID_rate.R rename to modules/max_quant/030_Peptide_Identifications/dart_05_total_ID_rate.R diff --git a/modules/040_Contamination/contam_00_charge_count.R b/modules/max_quant/040_Contamination/contam_00_charge_count.R similarity index 100% rename from modules/040_Contamination/contam_00_charge_count.R rename to modules/max_quant/040_Contamination/contam_00_charge_count.R diff --git a/modules/040_Contamination/contam_01_charge_tic.R b/modules/max_quant/040_Contamination/contam_01_charge_tic.R similarity index 100% rename from modules/040_Contamination/contam_01_charge_tic.R rename to modules/max_quant/040_Contamination/contam_01_charge_tic.R diff --git a/modules/040_Contamination/contam_02_MS1_int_z_1.R b/modules/max_quant/040_Contamination/contam_02_MS1_int_z_1.R similarity index 100% rename from modules/040_Contamination/contam_02_MS1_int_z_1.R rename to modules/max_quant/040_Contamination/contam_02_MS1_int_z_1.R diff --git a/modules/040_Contamination/contam_03_mz_z_1.R b/modules/max_quant/040_Contamination/contam_03_mz_z_1.R similarity index 100% rename from modules/040_Contamination/contam_03_mz_z_1.R rename to 
modules/max_quant/040_Contamination/contam_03_mz_z_1.R diff --git a/modules/040_Contamination/contam_04_z1_int_by_RT.R b/modules/max_quant/040_Contamination/contam_04_z1_int_by_RT.R similarity index 100% rename from modules/040_Contamination/contam_04_z1_int_by_RT.R rename to modules/max_quant/040_Contamination/contam_04_z1_int_by_RT.R diff --git a/modules/040_Contamination/contam_04_z2_int_by_RT.R b/modules/max_quant/040_Contamination/contam_04_z2_int_by_RT.R similarity index 100% rename from modules/040_Contamination/contam_04_z2_int_by_RT.R rename to modules/max_quant/040_Contamination/contam_04_z2_int_by_RT.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_001_med_RI_heatmap.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_001_med_RI_heatmap.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_001_med_RI_heatmap.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_001_med_RI_heatmap.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_00_NAs.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_00_NAs.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_00_NAs.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_00_NAs.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_05_misscleavages_sc.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_05_misscleavages_sc.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_05_misscleavages_sc.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_05_misscleavages_sc.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_07_scRI_swarmplot.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_07_scRI_swarmplot.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_07_scRI_swarmplot.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_07_scRI_swarmplot.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_08_rRI_vs_RT.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_08_rRI_vs_RT.R similarity index 
100% rename from modules/050_SCoPE-MS_Diagnostics/scope_08_rRI_vs_RT.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_08_rRI_vs_RT.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_09_scRI_swarmplot_cors.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_09_scRI_swarmplot_cors.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_09_scRI_swarmplot_cors.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_09_scRI_swarmplot_cors.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_11_synthetic_peptides.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_11_synthetic_peptides.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_11_synthetic_peptides.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_11_synthetic_peptides.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_12_RI_contam.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_12_RI_contam.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_12_RI_contam.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_12_RI_contam.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_12_a_explain.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_12_a_explain.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_12_a_explain.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_12_a_explain.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_13_cor_mat.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_13_cor_mat.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_13_cor_mat.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_13_cor_mat.R diff --git a/modules/080_Carrier/carrier_00.R b/modules/max_quant/080_Carrier/carrier_00.R similarity index 100% rename from modules/080_Carrier/carrier_00.R rename to modules/max_quant/080_Carrier/carrier_00.R diff --git a/modules/080_Carrier/carrier_01_label_efficiency.R 
b/modules/max_quant/080_Carrier/carrier_01_label_efficiency.R similarity index 86% rename from modules/080_Carrier/carrier_01_label_efficiency.R rename to modules/max_quant/080_Carrier/carrier_01_label_efficiency.R index 805aca6..24c6669 100644 --- a/modules/080_Carrier/carrier_01_label_efficiency.R +++ b/modules/max_quant/080_Carrier/carrier_01_label_efficiency.R @@ -10,10 +10,10 @@ init <- function() { validate(need(data()[['Labeling_Efficiency']], paste0('Upload evidence.txt in labeling efficiency input'))) # require TMT as a variable mod - validate(need(any(grepl('TMT', data()[['Labeling_Efficiency']]$Modifications)), - paste0('Loaded data was not searched with TMT as a variable modification'))) - validate(need(any(grepl('TMTPro_K_LE', colnames(data()[['Labeling_Efficiency']]))), - paste0('Loaded data was not searched with TMT as a variable modification'))) + #validate(need(any(grepl('TMT', data()[['Labeling_Efficiency']]$Modifications)), + #paste0('Loaded data was not searched with TMT as a variable modification'))) + #validate(need(any(grepl('TMTPro_K_LE', colnames(data()[['Labeling_Efficiency']]))), + # paste0('Loaded data was not searched with TMT as a variable modification'))) } .plotdata <- function(data, input) { @@ -26,8 +26,8 @@ init <- function() { plotdata$K_count <- stringr::str_count(plotdata$Sequence, "K") plotdata$Acetyl_count = stringr::str_count(plotdata$Modifications, "Acetyl (Protein N-term)") - n_nam<-c("TMT11plex_N_LE", "TMTPro_Nter_LE","Variable_TMTpro16plex.N-term") - k_nam<-c("TMT11plex_K_LE", "TMTPro_K_LE","Variable_TMTpro16plex.Lys") + n_nam<-c("TMT11plex_N_LE", "TMTPro_Nter_LE") + k_nam<-c("TMT11plex_K_LE", "TMTPro_K_LE") n_ind<-which(colnames(plotdata)%in%n_nam) k_ind<-which(colnames(plotdata)%in%k_nam) diff --git a/modules/080_Carrier/carrier_02_RelativeAbundance.R b/modules/max_quant/080_Carrier/carrier_02_RelativeAbundance.R similarity index 95% rename from modules/080_Carrier/carrier_02_RelativeAbundance.R rename to 
modules/max_quant/080_Carrier/carrier_02_RelativeAbundance.R index ecb2a71..621b6b6 100644 --- a/modules/080_Carrier/carrier_02_RelativeAbundance.R +++ b/modules/max_quant/080_Carrier/carrier_02_RelativeAbundance.R @@ -46,12 +46,9 @@ init <- function() { validate(need(nrow(plotdata) > 2, paste0('Less than 2 peptides with labeled and unlabeled precursors'))) ggplot(plotdata, aes(Log2IntRatio)) + - #geom_bar(stat='identity') + geom_histogram() + coord_flip() + facet_wrap(~Raw.file.x, nrow=1) + - #scale_x_discrete(labels=c('Unlabeled', 'Labeled')) + - #scale_fill_discrete(guide=F) + labs(x="Log2(Precursor Intensity Ratio)\n(Labeled / Unlabeled)", y='# Peptides') + theme_base(input=input) + theme(axis.text.x=element_text(angle=45, hjust=1, vjust=1)) diff --git a/modules/080_Carrier/carrier_03__miscleavage_table.R b/modules/max_quant/080_Carrier/carrier_03__miscleavage_table.R similarity index 100% rename from modules/080_Carrier/carrier_03__miscleavage_table.R rename to modules/max_quant/080_Carrier/carrier_03__miscleavage_table.R diff --git a/modules/080_Carrier/carrier_04_R_vs_K_missedcleavages.R b/modules/max_quant/080_Carrier/carrier_04_R_vs_K_missedcleavages.R similarity index 100% rename from modules/080_Carrier/carrier_04_R_vs_K_missedcleavages.R rename to modules/max_quant/080_Carrier/carrier_04_R_vs_K_missedcleavages.R diff --git a/out.html b/out.html new file mode 100644 index 0000000..f9da014 --- /dev/null +++ b/out.html @@ -0,0 +1,1899 @@ + + + + + + + + + + + + + +DO-MS Report + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + +
+

+
+

Summary

+
+

DIA-NN Experiments

+

DIA-NN Experiments

+

wGW032 F:/GW/raw_data/wGW032.raw

+

wGW033 F:/GW/raw_data/wGW033.raw

+

wGW034 F:/GW/raw_data/wGW034.raw

+

wGW035 F:/GW/raw_data/wGW035.raw

+

wGW036 F:/GW/raw_data/wGW036.raw

+

wGW037 F:/GW/raw_data/wGW037.raw

+

wGW038 F:/GW/raw_data/wGW038.raw

+
+
+
+

Ion Sampling

+
+

Channel wise MS1 Intensity for Precursors

+

Plotting the MS1 intensity for all precursors associated with one of the defined channels.

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+

Precursors Identified across Gradient

+

Precursors are plotted across the chromatographic gradient.

+

+
+
+

MS1 Intensity summed over all Channels.

+

Plotting the MS1 intensity for all precursors summed over all channels.

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+

MS1 Intensity for Intersected Precursors, summed over all Channels.

+

Plotting the MS1 intensity for all precursors summed over all channels. Only intersected precursors present in all loaded experiments are shown.

+

+
+
+

Normalized MS1 Intensity for Intersected Precursors

+

Plotting the MS1 Intensity for intersected precursors summed over all channels. Experiments are normalized to the first experiment.

+

+
+
+

Number of Precursors by Charge State

+

Number of precursors observed during MS1 scans by charge state

+

+
+
+

Ms1 Fill Time Distribution

+

Ms1 fill times along gradient

+
## [1] "Plot failed to render. Reason: Error: Upload fill_times.txt\n"
+
+
+

Ms1 Fill Times along Gradient

+

The average fill time is shown in magenta for different bins along the retention time gradient. The standard deviation is depicted as a blue area; scans outside this area are shown as single data points.

+
## [1] "Plot failed to render. Reason: Error: Upload fill_times.txt\n"
+
+
+

Ms1 total Ion Current along Gradient

+

The total Ion Current (TIC) is shown for bins along the retention time gradient.

+
## [1] "Plot failed to render. Reason: Error: Upload tic.tsv\n"
+
+
+

Ms2 Fill Time Distribution

+

Ms2 fill times along gradient

+
## [1] "Plot failed to render. Reason: Error: Upload fill_times.txt\n"
+
+
+

Ms2 Fill Times along Gradient

+

The average fill time is shown in magenta for different bins along the retention time gradient. The standard deviation is depicted as a blue area; scans outside this area are shown as single data points.

+
## [1] "Plot failed to render. Reason: Error: Upload fill_times.txt\n"
+
+
+

Ms2 Fill Time Matrix

+

The average Ms2 fill times are shown across the gradient for every distinct Ms2 window.

+
## [1] "Plot failed to render. Reason: Error: Upload tic.tsv\n"
+
+
+

Channel wise MS1 Counts for Precursors

+

Plotting the MS1 counts based on the signal to noise ratio for all precursors which were associated with one of the defined channels.

+
## [1] "Plot failed to render. Reason: Error: Upload report.txt\n"
+
+
+
+

Identifications

+
+

Number of Confident Precursor Identifications

+

Plotting the number of precursors identified at each given confidence level.

+

+
+
+

Precursors by Quantification Strategy

+

Number of precursors found based on quantification strategy. Ms2 precursors are counted based on Precursor.Quantity > 0 and Ms1 precursors are counted based on Ms1.Area > 0.

+

+
+
+

Precursors by Modification

+

Number of precursors found based on modification types specified

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+

Miscleavage Rate (percentage), PEP < 0.01

+

Miscleavage rate (percentage) for precursors identified with confidence PEP < 0.01

+
+ +
+
+
+

Miscleavage Rate (K), PEP < 0.01

+

Plotting frequency of lysine miscleavages in confidently identified precursors.

+

+
+
+

Miscleavage Rate (R), PEP < 0.01

+

Plotting frequency of arginine miscleavages in confidently identified precursors.

+

+
+
+

Number of Protein Identifications

+

Number of proteotypic protein IDs found per run. Protein IDs are shown across all channels in an experiment.

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+
+

plex-DIA Diagnostics

+
+

Identified Precursors per Channel

+

The number of precursors identified is shown for every channel together with the number of total and intersected precursors. The number of precursors is based on all precursors found in the report.tsv file, which is by default controlled by a run-specific FDR.

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+

Identified Precursors per Channel, Channel Q-Value

+

The number of precursors identified is shown for every channel together with the number of total and intersected precursors. The number of precursors is based on all precursors with a Channel.Q.Value <= 0.01.

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+

Identified Proteins per Channel

+

The number of Proteins identified is shown for every channel together with the number of total and intersected proteins. The number of proteins is based on all proteotypic precursors independent of the Protein.Q.Value.

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+

Relative Single-Cell Intensity

+

Single-Cell intensity relative to the carrier channel for intersected precursors

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+

Total MS1 Precursors

+

Total number of precursors identified based on different confidence metrics.

+
## [1] "Plot failed to render. Reason: Error: Upload ms1_extracted.tsv\n"
+
+
+

Missing Data, Precursor Level

+

Plotting the Jaccard Index for identified precursors for all channel combinations.

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+

Missing Data, Protein Level

+

Plotting the Jaccard Index for identified proteins for all channel combinations.

+
## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+
+
+
+

Feature Detection

+
+

Features Identified by Charge

+

Identified features are reported based on the charge. Precursors quantified in separate channels are treated as separate precursors.

+
## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+
+
+

Isotopes Identified per Feature

+

The number of isotopes identified is shown for features detected in the Dinosaur search.

+
## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+
+
+

Number of Scans per feature

+

The number of MS1 scans is shown for all features identified in the Dinosaur search.

+
## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+
+
+

Retention Length of Features at Base

+

Plotting the retention length of identified features at the base.

+
## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+
+
+

Features Identified across Gradient

+

The frequency of precursor identifications based on the Dinosaur search is plotted across the chromatographic gradient.

+
## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+
+
+

Features Identified across m/z

+

The frequency of precursor identifications based on the Dinosaur search is plotted across the mass to charge ratio.

+
## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+
+
+

Feature Intensity Distribution

+

The distribution of integrated intensities is shown for identified features.

+
## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+
+
+
+ + + + +
+ + + + + + + + + + + + + + + diff --git a/pipeline/__init__ b/pipeline/__init__ new file mode 100644 index 0000000..e69de29 diff --git a/pipeline/env.yml b/pipeline/env.yml new file mode 100644 index 0000000..8bc63ee --- /dev/null +++ b/pipeline/env.yml @@ -0,0 +1,57 @@ +name: doms +channels: + - conda-forge + - defaults +dependencies: + - appdirs=1.4.4=pyh9f0ad1d_0 + - brotlipy=0.7.0=py311he2be06e_1005 + - bzip2=1.0.8=h3422bc3_4 + - ca-certificates=2022.12.7=h4653dfc_0 + - certifi=2022.12.7=pyhd8ed1ab_0 + - cffi=1.15.1=py311hae827db_3 + - charset-normalizer=2.1.1=pyhd8ed1ab_0 + - codecov=2.1.12=pyhd8ed1ab_0 + - colorama=0.4.6=pyhd8ed1ab_0 + - coverage=7.2.1=py311he2be06e_0 + - cryptography=39.0.1=py311h507f6e9_0 + - idna=3.4=pyhd8ed1ab_0 + - libblas=3.9.0=16_osxarm64_openblas + - libcblas=3.9.0=16_osxarm64_openblas + - libcxx=15.0.7=h75e25f2_0 + - libffi=3.4.2=h3422bc3_5 + - libgfortran=5.0.0=11_3_0_hd922786_28 + - libgfortran5=11.3.0=hdaf2cc0_28 + - liblapack=3.9.0=16_osxarm64_openblas + - libopenblas=0.3.21=openmp_hc731615_3 + - libsqlite=3.40.0=h76d750c_0 + - libzlib=1.2.13=h03a7124_4 + - llvm-openmp=15.0.7=h7cfbb63_0 + - ncurses=6.3=h07bb92c_1 + - numpy=1.24.2=py311h60f8152_0 + - openssl=3.0.8=h03a7124_0 + - packaging=23.0=pyhd8ed1ab_0 + - pandas=1.5.3=py311h4eec4a9_0 + - pip=23.0.1=pyhd8ed1ab_0 + - pooch=1.6.0=pyhd8ed1ab_0 + - pycparser=2.21=pyhd8ed1ab_0 + - pyopenssl=23.0.0=pyhd8ed1ab_0 + - pysocks=1.7.1=pyha2e5f31_6 + - python=3.11.0=h3ba56d0_1_cpython + - python-dateutil=2.8.2=pyhd8ed1ab_0 + - python_abi=3.11=3_cp311 + - pytz=2022.7.1=pyhd8ed1ab_0 + - readline=8.1.2=h46ed386_0 + - requests=2.28.2=pyhd8ed1ab_0 + - scipy=1.10.1=py311h0bcca16_0 + - setuptools=67.4.0=pyhd8ed1ab_0 + - six=1.16.0=pyh6c4a22f_0 + - tk=8.6.12=he1e0b03_0 + - tomli=2.0.1=pyhd8ed1ab_0 + - tqdm=4.64.1=pyhd8ed1ab_0 + - tzdata=2022g=h191b570_0 + - urllib3=1.26.14=pyhd8ed1ab_0 + - wheel=0.38.4=pyhd8ed1ab_0 + - xz=5.2.6=h57fd34a_0 + - pip: + - pymzml==2.5.2 + - regex==2022.10.31 \ No 
newline at end of file diff --git a/pipeline/env_loose.yml b/pipeline/env_loose.yml new file mode 100644 index 0000000..11ab3d2 --- /dev/null +++ b/pipeline/env_loose.yml @@ -0,0 +1,14 @@ +name: doms +channels: + - conda-forge + - defaults +dependencies: + - python>=3.8 + - pip + - tqdm + - codecov + - pandas + - numpy + - scipy + - pip: + - pymzml \ No newline at end of file diff --git a/pipeline/processing.py b/pipeline/processing.py new file mode 100644 index 0000000..fef1f2a --- /dev/null +++ b/pipeline/processing.py @@ -0,0 +1,705 @@ +#!/usr/bin/env python3 + +import argparse +import os +import pathlib +import shutil +import subprocess +import pandas as pd +from datetime import datetime +import pymzml +import numpy as np +from typing import List, Any + +from pathlib import Path +from tqdm import tqdm + +import multiprocessing + +def generate_parser(): + + # instantiate the parser + parser = argparse.ArgumentParser(description=('Command line tool for feature detection in shotgun MS experiments. 
Can be used together ' 'with DIA-NN to provide additional information on the peptide-like features identified in the MS1 spectra.')) + + # Required command line arguments + parser.add_argument('report', type=pathlib.Path, default=os.getcwd(), help="Location of the report.tsv output from DIA-NN which should be used for analysis.") + + parser.add_argument("--raw-parser-location", type=pathlib.Path, required=True, help="Path pointing to the ThermoRawFileParser executable.") + + # optional command line arguments + parser.add_argument("--dinosaur-location", help="Path pointing to the dinosaur jar executable.") + + parser.add_argument("-m","--mono", default=False, action='store_true', help="Use mono for ThermoRawFileParser under Linux and OSX.") + + parser.add_argument("-d","--delete", default=False, action='store_true', help="Delete generated mzML and copied raw files after successful feature generation.") + + parser.add_argument("-v","--verbose", default=False, action='store_true', help="Show verbose output.") + + parser.add_argument("-t","--temporary-folder", type=pathlib.Path, help="Input Raw files will be temporarily copied to this folder. Required for use with Google drive.") + + parser.add_argument("-r","--raw-file-location", type=pathlib.Path, help="By default, raw files are loaded based on the File.Name column in the report.tsv. With this option, a different folder can be specified.") + + parser.add_argument("--no-feature-detection", default=False, action='store_true', help="All steps are performed as usual but Dinosaur feature detection is skipped. No features.tsv file will be generated.") + + parser.add_argument("--no-fill-times", default=False, action='store_true', help="All steps are performed as usual but fill times are not extracted. No fill_times.tsv file will be generated.") + + parser.add_argument("--no-tic", default=False, action='store_true', help="All steps are performed as usual but binned TIC is not extracted.
No tic.tsv file will be generated.") + + parser.add_argument("--no-sn", default=False, action='store_true', help=('Signal to Noise ratio is not estimated for precursors')) + + parser.add_argument("--no-mzml-generation", default=False, action='store_true', help=('Raw files are not converted to .mzML. ' + 'Nevertheless, mzML files are expected in their theoretical output location and loaded. Should only be used carefully for repeated calculations or debugging')) + + parser.add_argument("--mz-bin-size", default=10.0, type=float, help="Bin size over the mz dimension for TIC binning.") + + parser.add_argument("--rt-bin-size", default=1, type=float, help="Bin size over the RT dimension for TIC binning in minutes. If a bin size of 0 is provided, binning will not be applied and TIC is given per scan.") + + parser.add_argument("--resolution", default=70000, type=int, help="Set the resolution used for estimating counts from S/N data") + + parser.add_argument("-p", "--processes", default=10, type=int, help="Number of Processes") + + parser.add_argument("--isotopes-sn", default=False, action='store_true', help="Use all isotopes from the same scan as the highest intensity datapoint for estimating the SN and copy number.") + + return parser + +class MatchReport: + # labels as found in the Precursor.Id + def __init__(self, + mzml_list: List, + report: str, + scan_window = 5, + mass_error = 15, + isotope_start = 0, + isotope_stop = 3, + add_isotopes = False, + keep_all = False, + resolution = 70000): + """ + The class can be used to match a DIA-NN report to a centroided or profile mode mzml file. + After matching peaks to a certain region, features and data can be extracted. + + Processing order and steps for customization: + + 1. The report dataframe is loaded and filtered based on the filter_report(report) function. + + 2. The report dataframe is split into sub dataframes which are generated based on the Run column. + + 3.
Precursors are then matched to the mzml file based on the retention time. + For every precursor, a list of potential hits is created based on the defined scan window, mass error and isotopes. + For every hit the function build_scan_features(self, scan, scan_id, scan_window_center, mz_id, j, report_row) is called. + + 4. The top hit is selected based on the intensity field in the feature dict. + If add_isotopes is set, all hits for different isotopes from the same scan as the top hit are passed to join_top_scans(self, feature_list) to combine the features. + + Args: + mzml_list (list(str)): List of mzml input files. The file name has to match the Run column in the DIA-NN report. + + report (str): Location of the DIA-NN report. + + mass_error (float): Mass error in parts per million. + + scan_window (int): Number of windows adjacent to the closest retention time. + + keep_all (bool): Keep all hits in the selected window. + """ + + self.mzml_list = mzml_list + self.report = report + + # processing parameters + self.scan_window = scan_window + self.mass_error = mass_error + self.isotope_start = isotope_start + self.isotope_stop = isotope_stop + self.add_isotopes = add_isotopes + self.resolution = resolution + self.keep_all = keep_all + + def __call__(self, *args: Any, **kwds: Any) -> Any: + + # get report.tsv + report = pd.read_csv(self.report, sep='\t') + report = self.filter_report(report) + + dfs = [] + for input_mzml in self.mzml_list: + name = Path(input_mzml).stem + + # create subframe for the specific mzml + subframe = report[report['Run'] == name] + + # load mzml file + dfs.append(self.match_mzml(input_mzml, subframe)) + return pd.concat(dfs) + + def filter_report(self, report_df): + """ + Apply filtering on the whole report, for example Ms1.Area + """ + + return report_df[report_df['Ms1.Area'] > 0] + + def build_scan_features(self, scan, scan_id, scan_window_center, mz_id, j, report_row): + report_row = report_row.copy() + + noise = scan['noise'][mz_id] + intensity =
scan['spectrum_intensity'][mz_id] + sn = intensity/noise + copy_number = sn * 3.5 * np.sqrt(240000/self.resolution) / report_row['Precursor.Charge'] + + report_row.update({ + 'Scan.Id':scan_id, + 'Isotope': j, + 'Intensity': intensity, + 'Noise': noise, + 'Baseline': scan['baseline'][mz_id], + 'mz': scan['spectrum_mz'][mz_id], + 'Mass.Error': scan['delta_mz'][mz_id], + 'sn': sn, + 'Copy.Number': copy_number + }) + + return report_row + + def join_top_scans(self, feature_list): + + return {'Datapoints': feature_list[0]['Datapoints'], + 'Precursor.Id': feature_list[0]['Precursor.Id'], + 'Scan.Id':feature_list[0]['Scan.Id'], + 'Num.Isotopes': len(feature_list), + 'Intensity': np.mean([feature['Intensity'] for feature in feature_list]), + 'Noise': np.mean([feature['Noise'] for feature in feature_list]), + 'Baseline': np.mean([feature['Baseline'] for feature in feature_list]), + 'Mass.Error': np.mean([feature['Mass.Error'] for feature in feature_list]), + 'sn': np.mean([feature['sn'] for feature in feature_list]), + 'Copy.Number': np.sum([feature['Copy.Number'] for feature in feature_list]) + } + + def match_mzml(self, mzml, subframe): + name = Path(mzml).stem + run = pymzml.run.Reader(mzml) + + # generate RT index for MS1's + indexed_ms1 = [] + + print('Indexing Ms1 spectra') + + indexed_ms1 = [] + + # initiate statistics + statistics = {'precursors' : 0, 'identified': 0, 'failed': 0} + + # iterate all ms1 scans and extract retention times + for i, spectrum in tqdm(enumerate(run)): + ms_level = spectrum.ms_level + + if ms_level == 1: + + # Collect data from mzml + row_dict = {'index': i, + 'retention_time': spectrum['MS:1000016'], + 'fill_time': spectrum['MS:1000927'], + 'tic': spectrum['MS:1000285']} + + # collect different types of spectra + try: + baseline_parameters = spectrum._get_encoding_parameters('sampled noise baseline array') + baseline = spectrum._decode(*baseline_parameters) + row_dict['baseline'] = np.array(baseline) + + noise_parameters = 
spectrum._get_encoding_parameters('sampled noise intensity array') + noise = spectrum._decode(*noise_parameters) + row_dict['noise'] = np.array(noise) + + mz_parameters = spectrum._get_encoding_parameters('sampled noise m/z array') + mz = spectrum._decode(*mz_parameters) + row_dict['mz'] = np.array(mz) + except Exception as e: + print(e) + + raise ValueError(f'Noise and baseline data could not be extracted from {name}. Please make sure the mzML file contains this data. Aborting') + + intensity_parameters = spectrum._get_encoding_parameters('intensity array') + intensity = spectrum._decode(*intensity_parameters) + row_dict['spectrum_intensity'] = np.array(intensity) + + rmz_parameters = spectrum._get_encoding_parameters('m/z array') + rmz = spectrum._decode(*rmz_parameters) + row_dict['spectrum_mz'] = np.array(rmz) + + indexed_ms1.append(row_dict) + + # create rt indexing for matching mzml + rt_index = np.array([scan['retention_time'] for scan in indexed_ms1]) + + out_df = [] + + for i, row_dict in tqdm(zip(range(len(subframe)), subframe.to_dict(orient="records"))): + delta_rt = np.abs(rt_index - row_dict['RT']) + ms_idx = np.argmin(delta_rt) + + # define lower and upper indices + ms_idx_lower = ms_idx - self.scan_window if (ms_idx - self.scan_window) >= 0 else 0 + ms_idx_upper = ms_idx + self.scan_window + 1 if (ms_idx + self.scan_window +1) < len(rt_index) else len(rt_index)-1 + + sn_arr = [] + + # Match scan and mz + for scan_idx in range(ms_idx_lower, ms_idx_upper): + scan = indexed_ms1[scan_idx] + + for j in range(self.isotope_start, self.isotope_stop): + isotope_mz = row_dict['Precursor.Mz'] + j/row_dict['Precursor.Charge'] + + scan['delta_mz'] = np.abs(scan['spectrum_mz'] - isotope_mz)/isotope_mz*1e6 + mz_idx = np.flatnonzero(scan['delta_mz'] < self.mass_error) + + for id in mz_idx: + sn_arr.append(self.build_scan_features(scan, scan_idx, ms_idx, id, j, row_dict)) + + + + # Pick top hit + statistics['precursors'] += 1 + if len(sn_arr) > 0: +
#print([el['intensity'] for el in sn_arr]) + intensity_arr = np.array([el['Intensity'] for el in sn_arr]) + hit_idx = np.argmax(intensity_arr) + + top_scan = sn_arr[hit_idx] + + for el in sn_arr: + el['Run'] = name + el['Datapoints'] = len(sn_arr) + + if self.keep_all: + out_df += sn_arr + else: + if self.add_isotopes: + top_scan_id = top_scan['Scan.Id'] + top_scans = [features for features in sn_arr if features['Scan.Id'] == top_scan_id] + top_scan = self.join_top_scans(top_scans) + + + out_df.append(top_scan) + + statistics['identified'] += 1 + else: + statistics['failed'] += 1 + + # Calculate statistics on success of matching + successrate = statistics['identified'] / statistics['precursors']*100 + print(f"{name} identified: {successrate:.2f}%") + + return pd.DataFrame(out_df) +def _validate_path(file_path, description): + if not os.path.isfile(file_path): + raise ValueError(f'{description} \n {file_path} does not exist.') + +def validate_path(list_or_str, description): + if isinstance(list_or_str,str): + _validate_path(list_or_str, description) + + elif isinstance(list_or_str,List): + for elem in list_or_str: + _validate_path(elem, description) + + else: + raise TypeError('Provide a string or list of strings') + +def _validate_filetype(file_path, description): + filename, file_extension = os.path.splitext(file_path) + if file_extension.lower() not in ['.raw','.mzml']: + raise ValueError(f'{description} \n {file_extension} not supported.') + +def validate_filetype(list_or_str, description): + if isinstance(list_or_str,str): + _validate_filetype(list_or_str, description) + + elif isinstance(list_or_str,List): + for elem in list_or_str: + _validate_filetype(elem, description) + + else: + raise TypeError('Provide a string or list of strings') + +class FeatureDetection(): + + def __init__(self): + pass + + def get_timestamp(self): + # datetime object containing current date and time + now = datetime.now() + + dt_string = now.strftime("%d/%m/%Y %H:%M:%S") + return "[" +
dt_string + "] " + + def log(self, msg): + print(self.get_timestamp() + msg) + + def __call__(self): + + parser = generate_parser() + self.args = parser.parse_args() + + self.output_folder = pathlib.Path(self.args.report).parent.resolve() + self.report_tsv = pd.read_csv(self.args.report, sep='\t') + self.experiment_files = list(set(self.report_tsv['File.Name'])) + + # Contains a list of the raw files in the report + # will be converted to unix style pathlib objects + self.experiment_files = [file.replace('\\','/') for file in self.experiment_files] + self.experiment_files = [pathlib.Path(file) for file in self.experiment_files] + + # Source raw files from alternative folder + if self.args.raw_file_location is not None: + self.log('Raw files will be sourced from the specified folder') + + if not os.path.isdir(self.args.raw_file_location): + raise ValueError(f'{self.args.raw_file_location} does not exist.') + + for i, file_path in enumerate(self.experiment_files): + file = os.path.basename(file_path) + self.experiment_files[i] = os.path.join(self.args.raw_file_location, file) + + # Raw file strings are then validated + self.log('Checking raw file locations') + + if not self.args.no_mzml_generation: + validate_path(self.experiment_files, 'DO-MS DIA feature extraction relies on the raw file locations found in the File.Name column in the report.tsv.') + validate_filetype(self.experiment_files, 'DO-MS DIA feature extraction relies on the raw file locations found in the File.Name column in the report.tsv.') + + self.experiment_names = [Path(path).stem for path in self.experiment_files] + + self.log('The following experiments were found:') + for exp in self.experiment_names: + self.log(exp) + + self.mzml_files = [os.path.join(self.output_folder,'.'.join([name,'mzML'])) for name in self.experiment_names] + self.mzml_profile_files = [os.path.join(self.output_folder,'.'.join([name,'profile','mzML'])) for name in self.experiment_names] + + if self.args.temporary_folder is not
None: + # temporary folder is defined, copy files to temporary folder and change input path + + for i, experiment in enumerate(self.experiment_files): + + input_raw = experiment + output_raw = os.path.join(self.args.temporary_folder, experiment.name) + shutil.copy(input_raw, output_raw) + self.log(f"{experiment.name} copied to {self.args.temporary_folder}") + + self.experiment_files[i] = output_raw + + # Check if --no-mzml-generation has been set + if not self.args.no_mzml_generation: + try: + self.mzml_generation() + except Exception as e: + self.log('mzml generation failed:') + print(e) + + # Check if --no-fill-times has been set + if not self.args.no_fill_times: + try: + self.fill_times() + except Exception as e: + self.log('Fill time extraction failed:') + print(e) + + # Check if --no-tic has been set + if not self.args.no_tic: + try: + self.tic() + except Exception as e: + self.log('TIC extraction failed:') + print(e) + + # Check if --no-sn has been set + if not self.args.no_sn: + try: + self.sn() + except Exception as e: + self.log('SN extraction failed:') + print(e) + + # Check if --no-feature-detection has been set + if not self.args.no_feature_detection: + try: + self.feature_detection() + except Exception as e: + self.log('Feature detection failed:') + print(e) + + # delete temporary files if specified + if self.args.delete: + for input_mzml in self.mzml_files: + self.log('delete ' + str(input_mzml)) + if os.path.isfile(input_mzml): + os.remove(input_mzml) + + for input_mzml in self.mzml_profile_files: + self.log('delete ' + str(input_mzml)) + if os.path.isfile(input_mzml): + os.remove(input_mzml) + + if self.args.temporary_folder is not None: + for experiment in self.experiment_files: + self.log(str(experiment)) + if os.path.isfile(experiment): + os.remove(experiment) + def sn(self): + matcher = MatchReport(self.mzml_files, self.args.report, add_isotopes=self.args.isotopes_sn) + df = matcher() + df.to_csv(os.path.join(self.output_folder,'sn.tsv')
,sep='\t', index=False) + + + def mzml_generation(self): + """ + The function converts the provided Thermo raw files to the open mzML format. + Output files with centroided and profile mode spectra are created as specified in self.mzml_files and self.mzml_profile_files + """ + + labels = ["-N"]*len(self.experiment_files)+["-p"]*len(self.experiment_files) + mzmls = self.mzml_files + self.mzml_profile_files + raw = self.experiment_files + self.experiment_files + + queue = list(zip(raw, mzmls, labels)) + + with multiprocessing.Pool(processes = self.args.processes) as pool: + pool.map(self.mzml_job, queue) + + def mzml_job(self, args): + + input_raw, mzml, label = args + + if self.args.mono: + process = subprocess.Popen(['mono',str(self.args.raw_parser_location),label,'-i',str(input_raw),'-b',str(mzml)],shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE) + else: + process = subprocess.Popen(['ThermoRawFileParser',label,'-i',str(input_raw),'-b',str(mzml)], executable=str(self.args.raw_parser_location), stdout=subprocess.PIPE, stderr=subprocess.PIPE) + + process.wait() + out, err = process.communicate() + + self.log(out.decode()) + self.log(err.decode()) + + def feature_detection(self): + + # Dinosaur output columns are defined to make reordering of the Run column easier + # Needs to be updated if any new columns are added + columns = ['mz','mostAbundantMz','charge','rtStart','rtApex','rtEnd','fwhm','nIsotopes','nScans','averagineCorr','mass','massCalib','intensityApex','intensitySum'] + + # check if --dinosaur-location has been provided and is a valid file path.
+ if (self.args.dinosaur_location is None) or (not os.path.isfile(self.args.dinosaur_location)): + self.log(f"Dinosaur location not valid: {self.args.dinosaur_location}") + return + process_list = [] + + # perform feature detection based on mzml file + for input_mzml in self.mzml_profile_files: + process_list.append(subprocess.Popen(['java', + '-jar', + str(self.args.dinosaur_location), + '--concurrency=10', + + str(input_mzml)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)) + + for process in process_list: + process.wait() + out, err = process.communicate() + + self.log(out.decode()) + self.log(err.decode()) + + + # convert outputs to dataframes + dfs = [] + for i, experiment_name in enumerate(self.experiment_names): + feature_file = '.'.join([experiment_name,'profile','features','tsv']) + input_feature = os.path.join(self.output_folder, feature_file) + + df = pd.read_csv(input_feature, sep='\t', header=0) + + # create new Run column for DO-MS compatibility + df['Run'] = experiment_name + dfs.append(df) + + # concatenate outputs to a single dataset + df = pd.concat(dfs, ignore_index=True) + # reorder columns + df = df.reindex(columns=['Run']+columns) + + output_features = os.path.join(self.output_folder, "features.tsv") + df.to_csv(output_features,index=False, header=True, sep='\t') + + + # delete dinosaur qc folder + qc_path = os.path.join(os.getcwd(),'QC') + if os.path.isdir(qc_path): + shutil.rmtree(qc_path) + + def fill_times(self): + # List which will be used to collect datapoints + df_to_be = [] + + + for input_mzml, name in zip(self.mzml_files, self.experiment_names): + self.log(f'Collecting fill times for {name}') + + run = pymzml.run.Reader(input_mzml) + for spectrum in run: + + ms_level = spectrum.ms_level + + cv_params = spectrum.get_element_by_path(['scanList', 'scan', 'cvParam']) + for el in cv_params: + key_val = el.attrib + + # parse + if key_val['name'] == 'scan start time': + scan_start_time = key_val['value'] + + if key_val['name'] == 'ion
injection time': + injection_time = key_val['value'] + + window_center = 0 + window_lower_delta = 0 + window_upper_delta = 0 + + if ms_level == 2: + + try: + window_center = spectrum['MS:1000827'] + window_lower_delta = spectrum['MS:1000828'] + # MS:1000829 is the isolation window upper offset + window_upper_delta = spectrum['MS:1000829'] + except: + pass + + window_lower = window_center - window_lower_delta + window_upper = window_center + window_upper_delta + + df_to_be.append([name, ms_level, window_lower, window_upper, scan_start_time, injection_time]) + + fill_times_df = pd.DataFrame(df_to_be, columns =['Run', 'Ms.Level', 'Window.Lower','Window.Upper','RT.Start', 'Fill.Time']) + output_fill_times = os.path.join(self.output_folder, "fill_times.tsv") + fill_times_df.to_csv(output_fill_times,index=False, header=True, sep='\t') + + def tic(self): + collect_dfs = [] + + for input_mzml, name in zip(self.mzml_files, self.experiment_names): + self.log(f'Collecting TIC for {name}') + # get tic for single run + result = self.tic_single(input_mzml, name) + + # only sparse mode returns df + if isinstance(result, pd.DataFrame): + collect_dfs.append(result) + + # concat TIC dfs from multiple runs and export them as tsv + if len(collect_dfs) > 0: + out_df = pd.concat(collect_dfs) + output_tic = os.path.join(self.output_folder, "tic.tsv") + out_df.to_csv(output_tic,index=False, header=True, sep='\t') + + def tic_single(self, input_mzml, name): + run = pymzml.run.Reader(input_mzml) + + ms1_id = -1 + + # tic values are collected spectrum wise in a sparse format + # contains lists of [[ms1_id, current_bin, current_tic], ...
] + spectra_batch = [] + + # contains retention times + rt_label = [] + + for spectrum in run: + ms_level = spectrum.ms_level + if ms_level == 1: + ms1_id += 1 + + scan_start_time = spectrum['MS:1000016'] + total_ion_current = spectrum['MS:1000285'] + + rt_label.append(scan_start_time) + + data = np.array(spectrum.peaks("raw")) + intensity = data[:,1] + mz = data[:,0] + + bins = self.calc_sparse_binned_tic(mz, intensity, total_ion_current, ms1_id) + spectra_batch.append(bins) + + # list is converted to array and sublists are concatenated + sparse_arr = np.concatenate(spectra_batch) + + # sparse tic matrix is converted to dense tic matrix + rt_col = sparse_arr[:,0].astype(int) + mz_col = sparse_arr[:,1] + tic = sparse_arr[:,2] + + rt_col = [rt_label[i] for i in rt_col] + # check if sparse output has been selected + # Easier to handle in ggplot2 + + raw_file = [name] * len(tic) + data = {'Raw.File': raw_file, + 'RT': rt_col, + 'MZ': mz_col, + 'TIC': tic} + + df = pd.DataFrame.from_dict(data) + + if self.args.rt_bin_size > 0: + df['RT'] = np.round(df['RT']/self.args.rt_bin_size)*self.args.rt_bin_size + df = df.groupby(['Raw.File','RT','MZ'])['TIC'].sum().reset_index() + + return df + + + def calc_sparse_binned_tic(self, mz, intensity, total_ion_current, ms1_id): + # Resolution, integer, number of mz / bin + # the integral of the provided spectra is approximated by using the rectangle rule + # This allows efficient binning by rounding without iterative calculation of integrals + mz_diff = np.diff(mz) + mz_diff = np.pad(mz_diff, (0, 1), 'constant') + + # The true integral is calculated as the dot product + integral = np.dot(mz_diff,intensity) + + # rescaled to resample TIC provided in meta information + scaling_factor = total_ion_current / integral + scaled_intensity = intensity * scaling_factor + binned_intensity = scaled_intensity * mz_diff + + binned_mz = np.round(mz/self.args.mz_bin_size) + binned_mz = binned_mz * self.args.mz_bin_size + + sparse_binned_tic = [] + + 
current_bin = binned_mz[0] + current_tic = 0 + + for mz_bin, int_bin in zip(binned_mz, binned_intensity): + if mz_bin == current_bin: + current_tic += int_bin + + else: + # close last bin, check if bin has tic + if current_tic > 0: + sparse_binned_tic.append((ms1_id, current_bin, current_tic)) + + # open new bin + current_bin = mz_bin + current_tic = int_bin + + return np.array(sparse_binned_tic) + + +if __name__ == "__main__": + feature_detection = FeatureDetection() + feature_detection() diff --git a/pipeline/requirements.txt b/pipeline/requirements.txt new file mode 100644 index 0000000..23302f4 --- /dev/null +++ b/pipeline/requirements.txt @@ -0,0 +1,4 @@ +pandas +scipy +pymzml +tqdm \ No newline at end of file diff --git a/pipeline/test_processing.py b/pipeline/test_processing.py new file mode 100644 index 0000000..e2caeb0 --- /dev/null +++ b/pipeline/test_processing.py @@ -0,0 +1,2 @@ +def test_processing(): + print('hey') \ No newline at end of file diff --git a/server.R b/server.R index eb7bb2b..37bff25 100644 --- a/server.R +++ b/server.R @@ -11,6 +11,7 @@ source(file.path('server', 'generate_report.R')) shinyServer(function(input, output, session) { + if(exists('latest_version')) { # in addition to printing the version message, show it as a notification here if(version == latest_version) { @@ -25,16 +26,18 @@ shinyServer(function(input, output, session) { + folders <- reactiveVal(data.frame( Folder.Name=as.character(c()), Has.Files=as.logical(c()), Path=as.character(c()) )) + if(file.exists('folder_list.txt')) { - .folders <- as.data.frame(read_tsv('folder_list.txt')) - + .folders <- as.data.frame(read_tsv('folder_list.txt', col_types = cols())) + # patch older versions of the folder_list where Has.Files doesn't exist if(ncol(.folders) < 3) { print('Detected legacy version of folder_list.txt. 
Patching now...') @@ -42,7 +45,7 @@ shinyServer(function(input, output, session) { # reorder columns .folders <- .folders[,c('Folder.Name', 'Has.Files', 'Path')] } - + folders <- reactiveVal(.folders) } @@ -72,7 +75,7 @@ shinyServer(function(input, output, session) { # listen to add folder modal completion observeEvent(input$add_folders_confirm, { - + # get a copy of the current list of folders .folders <- isolate(folders()) # list of selected files @@ -168,7 +171,7 @@ shinyServer(function(input, output, session) { # react when folders_d (debounced version) is updated observe({ # write folder list to file (overwrite previous) - write_tsv(folders_d(), path='folder_list.txt') + write_tsv(folders_d(), file='folder_list.txt') }) output$data_status <- renderUI({ @@ -293,7 +296,26 @@ shinyServer(function(input, output, session) { # since a lot of MS data is very sparse and only using the first 1000 # rows to guess may guess a column type wrong .dat <- as.data.frame(read_tsv(file=file.path(folder$Path, file[['file']]), - guess_max=1e5)) + guess_max=1e5, col_types = cols())) + + # Custom behavior for ms1_extracted + if(file$name == 'ms1_extracted') { + # transform matrix style output to report.tsv style + .dat <- ms1_extracted_to_report(.dat) + } + + # Custom behavior for report + if((file$name == 'report') && (! is.null(.dat))) { + + # DIA-NN versions > 1.8.1 beta 12 use a different channel identifier + # for the modified sequence and precursor Id. + # This will transform the new sequence format to the old one. 
+ .dat <- translate_diann_channel_format(.dat) + + # Add column for modified precursor without channel + .dat <- separate_channel_info(.dat) + + } # rename columns (replace whitespace or special characters with '.') .dat <- .dat %>% dplyr::rename_all(make.names) @@ -301,6 +323,8 @@ shinyServer(function(input, output, session) { # apply column aliases .dat <- apply_aliases(.dat) + + if('Raw.file' %in% colnames(.dat)) { # Remove any rows where "Total" is a raw file (e.g., summary.txt) .dat <- .dat %>% dplyr::filter(!Raw.file == 'Total') @@ -309,6 +333,8 @@ shinyServer(function(input, output, session) { .dat$Raw.file <- factor(.dat$Raw.file) } + + # Custom behavior for parameters.txt if(file$name == 'parameters') { # store folder name/path as a value in parameters.txt @@ -380,6 +406,7 @@ shinyServer(function(input, output, session) { # set the data data(.data) + showNotification(paste0('Loading Complete!'), type='message') }) @@ -441,7 +468,7 @@ shinyServer(function(input, output, session) { if(!is.null(.data[[file$name]])) { next } # read in as data frame (need to convert from tibble) - .data[[file$name]] <- as.data.frame(read_tsv(file=.file$datapath)) + .data[[file$name]] <- as.data.frame(read_tsv(file=.file$datapath, col_types = cols())) # rename columns (replace whitespace or special characters with '.') colnames(.data[[file$name]]) <- gsub('\\s|\\(|\\)|\\/|\\[|\\]', '.', colnames(.data[[file$name]])) @@ -469,6 +496,7 @@ shinyServer(function(input, output, session) { for(file in config[['input_files']]) { # don't do this with MaxQuant's summary.txt file since it has weird behavior if(file$name == 'summary') { next; } + if(file$name == 'features') { next; } # for each file, check if it has a raw file column if('Raw.file' %in% colnames(f_data[[file$name]])) { @@ -509,10 +537,10 @@ shinyServer(function(input, output, session) { } # apply re-ordering - .file_order <- file_order() - if(length(.file_order) > 1) { - .file_levels <- .file_levels[.file_order] - .raw_files 
<- .raw_files[.file_order] + nfile_order <- file_order() + if(length(nfile_order) > 1) { + .file_levels <- .file_levels[nfile_order] + .raw_files <- .raw_files[nfile_order] } data.frame( @@ -642,6 +670,7 @@ shinyServer(function(input, output, session) { # replace %e with the raw file name .file_levels <- str_replace(.file_levels, '\\%e', .raw_files) + print(.pattern) # apply custom string extraction expression to file levels if(!is.null(.pattern) & length(.pattern) > 0 & nchar(.pattern) > 0) { # account for users inputting bad regexes @@ -706,44 +735,48 @@ shinyServer(function(input, output, session) { # debounce (throttle) by 1000ms delay, because this expression is so costly filtered_data <- debounce(reactive({ f_data <- data() - + # skip if no data has been loaded yet if(is.null(f_data)) return() + + for(file in config[['input_files']]) { - # for each file, check if it has a raw file column - if('Raw.file' %in% colnames(f_data[[file$name]])) { + if('Raw.file' %in% colnames(f_data[[file$name]])) { # make a copy of the raw file column f_data[[file$name]]$Raw.file.orig <- f_data[[file$name]]$Raw.file - # rename the levels of this file - .levels <- levels(f_data[[file$name]]$Raw.file) - - .labels <- file_levels() + + olevels <- levels(f_data[[file$name]]$Raw.file) + nlabels <- file_levels() + nlevels <- as.vector(unlist(isolate(raw_files()), use.names=FALSE)) + # if labels are still not loaded or defined yet, then default them to the levels - if(is.null(.labels) | length(.labels) < 1 | length(.labels) != length(.levels)) { - .labels <- .levels + if(is.null(nlabels) | length(nlabels) < 1 | length(nlabels) != length(nlevels)) { + + nlabels <- nlevels } # if this file has a subset of raw files # then take the same subset of the labels vector - if(length(.labels) > length(.levels)) { - .labels <- .labels[1:length(.levels)] + if(length(nlabels) > length(nlevels)) { + nlabels <- nlabels[1:length(nlevels)] } # apply re-ordering - .file_order <- file_order() - 
if(length(.file_order) > 1) { - .levels <- .levels[.file_order] - .labels <- .labels[.file_order] + nfile_order <- file_order() + if(length(nfile_order) > 1) { + nlevels <- nlevels[nfile_order] + nlabels <- nlabels[nfile_order] } + # recalculate file levels f_data[[file$name]]$Raw.file <- factor(f_data[[file$name]]$Raw.file, - levels=.levels, labels=.labels) + levels=nlevels, labels=nlabels) # Filter for experiments as specified by user if(!is.null(input$Exp_Sets)) { @@ -757,30 +790,120 @@ shinyServer(function(input, output, session) { ## Filter observations - # Filter out decoys and contaminants, if the leading razor protein column exists - if('Leading.razor.protein' %in% colnames(f_data[[file$name]])) { - if(!is.null(config[['remove_contam']])) { - f_data[[file$name]] <- f_data[[file$name]] %>% - dplyr::filter(!grepl(config[['remove_contam']], Leading.razor.protein)) + if (config[['do_ms_mode']] == 'max_quant'){ + # Filter out decoys and contaminants, if the leading razor protein column exists + if('Leading.razor.protein' %in% colnames(f_data[[file$name]])) { + if(!is.null(config[['remove_contam']])) { + f_data[[file$name]] <- f_data[[file$name]] %>% + dplyr::filter(!grepl(config[['remove_contam']], Leading.razor.protein)) + } + if(!is.null(config[['remove_decoy']])) { + f_data[[file$name]] <- f_data[[file$name]] %>% + dplyr::filter(!grepl(config[['remove_decoy']], Leading.razor.protein)) + } + } + + # Filter by PEP + if('PEP' %in% colnames(f_data[[file$name]])) { + f_data[[file$name]] <- f_data[[file$name]] %>% + dplyr::filter(PEP < input$pep_thresh | is.na(PEP)) + } + + # Filter by PIF + if('PIF' %in% colnames(f_data[[file$name]])) { + #filter on PIF only if the value is not NA + f_data[[file$name]] <- f_data[[file$name]] %>% + dplyr::filter(PIF > input$pif_thresh | is.na(PIF)) } - if(!is.null(config[['remove_decoy']])) { - f_data[[file$name]] <- f_data[[file$name]] %>% - dplyr::filter(!grepl(config[['remove_decoy']], Leading.razor.protein)) + + } + else if 
(config[['do_ms_mode']] == 'dia-nn'){ + # calculate modification columns + + # Filter by PEP + if('PEP' %in% colnames(f_data[[file$name]])) { + f_data[[file$name]] <- f_data[[file$name]] %>% + dplyr::filter(PEP < input$pep_thresh | is.na(PEP)) + } + + + # apply modification filter + if('Precursor.Id' %in% colnames(f_data[[file$name]])) { + + modvec <- c() + # create modification columns + for(i in 1:length(config[['modifications']])){ + unimod <- config[['modifications']][[i]]$unimod + modvec <- c(modvec, config[['modifications']][[i]]$unimod) + + f_data[[file$name]][unimod] <- sapply(f_data[[file$name]]['Precursor.Id'], str_count, paste0('\\Q',unimod,'\\E')) + } + + # summarize over all modification columns + f_data[[file$name]]['mod_sum'] <- rowSums(f_data[[file$name]][,modvec]) + + # filter for modifications + modifications <- config[['modification_list']] + + if (input$modification == "All"){ + + } else if (input$modification == "Unmodified") { + + + f_data[[file$name]] <- f_data[[file$name]][f_data[[file$name]]['mod_sum'] < 1,] + + } else { + + # Try to resolve unimod key from modifications dataframe + + + modifications_filtered <- modifications[modifications$name == input$modification, ] + + if (nrow(modifications_filtered) > 0){ + # contains unimod label for the current modification + unimod <- modifications_filtered$unimod[[1]] + + #filter for all rows, which contain modification + f_data[[file$name]] <- f_data[[file$name]][f_data[[file$name]][unimod] > 0,] + + if (nrow(f_data[[file$name]]) == 0){ + showNotification(paste("No Precursors found with modification:",input$modification), type='warning') + } + + } + } } + + # apply MBR filter + + #if(!input$mbr){ + # if (file$name == 'report'){ + + # if ('report-first-pass' %in% names(f_data)){ + + # f_data[['report']] <- f_data[['report-first-pass']] + + # } else { + # showNotification("Cant show results without MBR, report-first-pass.tsv was not found.", type='warning') + # } + # } + + #} + } - # Filter by PEP 
- if('PEP' %in% colnames(f_data[[file$name]])) { - f_data[[file$name]] <- f_data[[file$name]] %>% - dplyr::filter(PEP < input$pep_thresh | is.na(PEP)) - } + + + # Filter by PIF - if('PIF' %in% colnames(f_data[[file$name]])) { + #if('PIF' %in% colnames(f_data[[file$name]])) { # filter on PIF only if the value is not NA - f_data[[file$name]] <- f_data[[file$name]] %>% - dplyr::filter(PIF > input$pif_thresh | is.na(PIF)) - } + # f_data[[file$name]] <- f_data[[file$name]] %>% + # dplyr::filter(PIF > input$pif_thresh | is.na(PIF)) + #} + + # filter by modification ## More filters, like Intensity? } @@ -797,4 +920,6 @@ shinyServer(function(input, output, session) { # PDF Report Generation download_report(input, output, filtered_data, exp_sets) + + traceback() }) diff --git a/server/build_modules.R b/server/build_modules.R index 6b5d38b..6db1212 100644 --- a/server/build_modules.R +++ b/server/build_modules.R @@ -51,7 +51,7 @@ attach_module_outputs <- function(input, output, filtered_data, exp_sets) { module$validate_func(filtered_data, input) plotdata <- module$plotdata_func(filtered_data, input) # TODO: options to configure output format (CSV, delimeters, quotes, etc) - write_tsv(plotdata, path=file) + write_tsv(plotdata, file=file) # finish progress bar progress$inc(1, detail='') diff --git a/server/generate_report.R b/server/generate_report.R index 1b47072..0f6a9cc 100644 --- a/server/generate_report.R +++ b/server/generate_report.R @@ -95,6 +95,8 @@ generate_report <- function(input, filtered_data, exp_sets, file, progress_bar=F for(t in 1:length(tabs)) { local({ + + # need to create copies of these indices in the local environment # otherwise, the indices will be frozen for each iteration, only generating the last one .t <- t @@ -112,15 +114,17 @@ generate_report <- function(input, filtered_data, exp_sets, file, progress_bar=F for(m in 1:length(modules_in_tab)) { local({ .m <- m module <- modules_in_tab[[.m]] + # debugging: - # print(paste0('tab ', .t)) - # 
print(paste0('module ', .m)) + print(paste0('tab ', .t)) + print(paste0('module ', .m)) if(progress_bar) { progress$inc(0.45/num_modules, detail=paste0('Adding module ', .m, ' from tab ', .t)) } + print('1') # create chunk name from module box title chunk_name <- module$id chunk_name <- gsub('\\s', '_', chunk_name) @@ -129,7 +133,7 @@ generate_report <- function(input, filtered_data, exp_sets, file, progress_bar=F # if dynamic plot width is defined, then inject that into this # R-markdown chunk instead # because dynamic width is defined in pixels -- need to convert to inches - + print('2') plot_width = NA if(!is.null(module$dynamic_width)) { num_files <- length(exp_sets) @@ -148,28 +152,28 @@ generate_report <- function(input, filtered_data, exp_sets, file, progress_bar=F # round up to an integer plot_width <- ceiling(plot_width) } - + print('3') # override existing/default plot width if it is explicitly defined if(!is.null(module$report_plot_width)) { plot_width <- module$report_plot_width } - + print('4') # if plot width was defined, then insert it into the block def if(!is.na(plot_width)) { plot_width <- paste0(', fig.width=', plot_width) } else { plot_width <- '' } - + print('5') plot_height <- '' # override existing/default plot height if it is explicitly defined if(!is.null(module$report_plot_height)) { plot_height <- paste0(', fig.height=', module$report_plot_height) } - + print('6') # prevent further processing with 'results=asis' flag? 
results_flag <- '' if(module$type == 'text') { results_flag <- ", results='asis'" } - + print('7') report <<- paste(report, paste0('### ', module$box_title, ' {.plot-title}'), '', module$help, '', @@ -181,30 +185,33 @@ generate_report <- function(input, filtered_data, exp_sets, file, progress_bar=F '}'), 'options( warn = -1 )', sep='\n') - + print('8') # call helper function to decide what to do with this module + report format report <<- paste(report, render_module(.t, .m, module$type, input$report_format), sep='\n') - + print('9') # store the output from the module plot function plots[[.m]] <<- tryCatch(module$plot_func(filtered_data, input), error = function(e) { paste0('Plot failed to render. Reason: ', e) }, finally={} ) - + print('10') # if this plot is a text, sanitize the text if(module$type == 'text') { plots[[.m]] <<- sanitize_text_output(plots[[.m]]) } + print('11') + print(module$type %in% c('table', 'datatable')) + print(all(class(plots[[.m]]) != 'character')) # if this plot is a table, sanitize the text in every cell in the table # note: sanitize_text_output ignores non-character values so don't worry about # inadvertently typecasting doubles or logicals to characters - if(module$type %in% c('table', 'datatable') & class(plots[[.m]]) != 'character') { + if(module$type %in% c('table', 'datatable') & all(class(plots[[.m]]) != 'character')) { plots[[.m]] <<- plots[[.m]] %>% dplyr::mutate_all(sanitize_text_output) %>% dplyr::rename_all(sanitize_text_output) } - + print('12') # grab module metadata, but exclude function definitions to save space meta[[.m]] <<- module[!grepl('Func', names(module))] diff --git a/settings.yaml b/settings.yaml index 8620f9c..504f4b3 100644 --- a/settings.yaml +++ b/settings.yaml @@ -1,57 +1,147 @@ --- - # Global Settings for the DO-MS app and report -# Input files to load from each search output folder -input_files: - evidence: - name: 'evidence' - file: 'evidence.txt' - help: 'MaxQuant evidence.txt file' - default_enabled: 
true - msms: - name: 'msms' - file: 'msms.txt' - help: 'MaxQuant msms.txt file' - default_enabled: true - msmsScans: - name: 'msmsScans' - file: 'msmsScans.txt' - help: 'MaxQuant msmsScans.txt file' - default_enabled: true - allPeptides: - name: 'allPeptides' - file: 'allPeptides.txt' - help: 'MaxQuant allPeptides.txt file' - default_enabled: true - summary: - name: 'summary' - file: 'summary.txt' - help: 'MaxQuant summary.txt file' - default_enabled: true - parameters: - name: 'parameters' - file: 'parameters.txt' - help: 'MaxQuant parameters.txt file' - default_enabled: true - msScans: - name: 'msScans' - file: 'msScans.txt' - help: 'MaxQuant msScans.txt file' - default_enabled: true - -# Misc. input files to provide upload forms for -misc_input_files: - inclusion_list: - name: 'inclusion_list' - help: 'Inclusion List .txt file' - Labeling_Efficiency_list: - name: 'Labeling_Efficiency' - help: 'Labeling Efficiency .txt file' - dartid_list: - name: 'DART-ID' - help: 'DART-ID evidence_updated.txt file' +#Switch between dia-nn and max_quant mode +# do_ms_mode: "dia-nn" +# do_ms_mode: "max_quant" + +do_ms_mode: "dia-nn" + +dia-nn: + mode_name: 'DIA-NN' + mode_skin: 'purple' + + ChemicalLabels: + - "(mTRAQ0)" + - "(mTRAQ4)" + - "(mTRAQ8)" + + channels: + mtraq0: + modification: "(mTRAQ0)" + name: "d0" + + mtraq4: + modification: "(mTRAQ4)" + name: "d4" + + mtraq8: + modification: "(mTRAQ8)" + name: "d8" + + + modifications: + phospho: + unimod: "(UniMod:21)" + name: "Phosphorylation" + + methyl: + unimod: "(UniMod:34)" + name: "Methylation" + + acetyl: + unimod: "(UniMod:1)" + name: "Acetylation" + + misc_input_files: + # None + + # Input files to load from each search output folder + input_files: + report: + name: 'report' + file: 'report.tsv' + help: 'DIA-NN report.tsv' + default_enabled: true + + ms1_extracted: + name: 'ms1_extracted' + file: 'report.pr_matrix_channels_ms1_extracted.tsv' + help: 'DIA-NN ms1_extracted.tsv' + default_enabled: true + + features: + 
name: 'features' + file: 'features.tsv' + help: 'feature-detection features.tsv' + default_enabled: true + + fill_times: + name: 'fill_times' + file: 'fill_times.tsv' + help: 'feature-detection fill_times.tsv' + default_enabled: true + + tic: + name: 'tic' + file: 'tic.tsv' + help: 'extracted total ion current tic.tsv' + default_enabled: true + + sn: + name: 'sn' + file: 'sn.tsv' + help: 'extracted signal to noise sn.tsv' + default_enabled: true + + # Retention time filter applied to all statistics displaying retention time information + RT.Start: 0 + RT.End: 90 +max_quant: + mode_name: 'Max Quant' + mode_skin: 'blue' + + # Input files to load from each search output folder + input_files: + evidence: + name: 'evidence' + file: 'evidence.txt' + help: 'MaxQuant evidence.txt file' + default_enabled: true + msms: + name: 'msms' + file: 'msms.txt' + help: 'MaxQuant msms.txt file' + default_enabled: true + msmsScans: + name: 'msmsScans' + file: 'msmsScans.txt' + help: 'MaxQuant msmsScans.txt file' + default_enabled: true + allPeptides: + name: 'allPeptides' + file: 'allPeptides.txt' + help: 'MaxQuant allPeptides.txt file' + default_enabled: true + summary: + name: 'summary' + file: 'summary.txt' + help: 'MaxQuant summary.txt file' + default_enabled: true + parameters: + name: 'parameters' + file: 'parameters.txt' + help: 'MaxQuant parameters.txt file' + default_enabled: true + msScans: + name: 'msScans' + file: 'msScans.txt' + help: 'MaxQuant msScans.txt file' + default_enabled: true + + # Misc. input files to provide upload forms for + misc_input_files: + inclusion_list: + name: 'inclusion_list' + help: 'Inclusion List .txt file' + Labeling_Efficiency_list: + name: 'Labeling_Efficiency' + help: 'Labeling Efficiency .txt file' + dartid_list: + name: 'DART-ID' + help: 'DART-ID evidence_updated.txt file' + # Allow addition of folders without MaxQuant txt output files? 
allow_all_folders: true @@ -63,7 +153,7 @@ pif_thresh: 0 remove_decoy: REV_ remove_contam: CON_ -exp_name_format: 'Exp %f %i' +exp_name_format: '%e' exp_name_pattern: '' ## Figure rendering options @@ -72,15 +162,15 @@ ppi: 150 ## download figure settings download_figure_units: 'in' -download_figure_width: 5 +download_figure_width: 6 download_figure_height: 5 # label font size -figure_title_font_size: 16 +figure_title_font_size: 12 # axis tick label font size -figure_axis_font_size: 12 +figure_axis_font_size: 10 # facet label font size -figure_facet_font_size: 12 +figure_facet_font_size: 10 # line width figure_line_width: 1 # show background grid @@ -115,8 +205,10 @@ tab_colors: ['#E41A1C', '#377EB8', '#4DAF4A', '#984EA3', '#FF7F00', '#1B9E77', ' aliases: 'Raw.file': + - Run - Raw.File - RawFile 'Retention.time': + - RT - Retention.Time - RetentionTime diff --git a/start_server.R b/start_server.R old mode 100644 new mode 100755 index 99ef236..fdd3c3d --- a/start_server.R +++ b/start_server.R @@ -5,40 +5,9 @@ repos['CRAN'] <- 'https://cloud.r-project.org' options(repos=repos) -# look for pandoc -# stolen from https://github.com/r-lib/rappdirs/blob/master/R/utils.r -get_os <- function() { - if (.Platform$OS.type == "windows") { - "win" - } else if (Sys.info()["sysname"] == "Darwin") { - "mac" - } else if (.Platform$OS.type == "unix") { - "unix" - } else { - stop("Unknown OS") - } -} -os <- get_os() - -pandoc_osx <- "/Applications/RStudio.app/Contents/MacOS/pandoc" -pandoc_windows <- "C:\\Program Files\\RStudio\\bin\\pandoc" -pandoc_linux <- "/usr/lib/rstudio/bin/pandoc" - -# try and predict pandoc directories -if(os == 'mac' & file.exists(pandoc_osx)) { - Sys.setenv(RSTUDIO_PANDOC=pandoc_osx) -} else if (os == 'win' & file.exists(pandoc_windows)) { - Sys.setenv(RSTUDIO_PANDOC=pandoc_windows) -} else if (os == 'unix' & file.exists(pandoc_linux)) { - Sys.setenv(RSTUDIO_PANDOC=pandoc_linux) -} else { - print('pandoc could not be found in default directories. 
If it is not available on the system PATH then PDF report generation will fail.') -} - library(shiny) port <- 8080 host <- '127.0.0.1' -shiny::runApp(getwd(), host=host, port=port, launch.browser = T) - +shiny::runApp(getwd(), host=host, port=port, launch.browser = T) \ No newline at end of file diff --git a/ui.R b/ui.R index e18025e..5d5e2ab 100644 --- a/ui.R +++ b/ui.R @@ -66,14 +66,76 @@ bsDep[[1]]$name <- "bootstrap2" pkDep <- htmltools::findDependencies(shinyWidgets:::attachShinyWidgetsDep(tags$div(), widget = "picker")) pkDep[[2]]$name <- "picker2" + +global_filters <- list() + +if (config[['do_ms_mode']] == 'max_quant'){ + + global_filters <- list( + shinyWidgets::sliderTextInput('pep_thresh', + label=tags$div(class='slider-label-header', + tags$span(class='slider-title', 'PEP Threshold:'), + # tooltip: https://getbootstrap.com/docs/3.3/javascript/#tooltips + tags$button(class='btn btn-secondary tooltip-btn', icon('question-sign', lib='glyphicon'), + `data-toggle`='tooltip', `data-placement`='right', + title='Filter identified peptides at an identification confidence threshold (Posterior error probability -- PEP). Peptides that have a PEP higher than this value will not be included in the module analyses or visualizations.' + ) + ), + grid = T, choices=c(1e-4, 1e-3, 1e-2, 1e-1, 1), selected=config[['pep_thresh']]), + # PIF filter slider + shinyWidgets::sliderTextInput('pif_thresh', + label=tags$div(class='slider-label-header', + tags$span(class='slider-title', 'PIF Threshold:'), + # tooltip: https://getbootstrap.com/docs/3.3/javascript/#tooltips + tags$button(class='btn btn-secondary tooltip-btn', icon('question-sign', lib='glyphicon'), + `data-toggle`='tooltip', `data-placement`='right', + title='Filter identified peptides at an isolation purity score threshold (Precursor Ion Fraction -- PIF). 
Peptides that have a PIF lower than this value will not be included in the module analyses or visualizations' + ) + ), + grid = T, choices=seq(0, 1, by=0.1), selected=config[['pif_thresh']]) + + ) +} else if (config[['do_ms_mode']] == 'dia-nn'){ + global_filters <- list( + shinyWidgets::sliderTextInput('pep_thresh', + label=tags$div(class='slider-label-header', + tags$span(class='slider-title', 'PEP Threshold:'), + # tooltip: https://getbootstrap.com/docs/3.3/javascript/#tooltips + tags$button(class='btn btn-secondary tooltip-btn', icon('question-sign', lib='glyphicon'), + `data-toggle`='tooltip', `data-placement`='right', + title='Filter identified peptides at an identification confidence threshold (Posterior error probability -- PEP). Peptides that have a PEP higher than this value will not be included in the module analyses or visualizations.' + ) + ), + grid = T, choices=c(1e-4, 1e-3, 1e-2, 1e-1, 1), selected=config[['pep_thresh']]), + shinyWidgets::pickerInput( + inputId = "modification", + choices = config[['modification_list']]$name, + label=tags$div(class='slider-label-header', + tags$span(class='slider-title', 'Modification:'), + # tooltip: https://getbootstrap.com/docs/3.3/javascript/#tooltips + tags$button(class='btn btn-secondary tooltip-btn', icon('question-sign', lib='glyphicon'), + `data-toggle`='tooltip', `data-placement`='right', + title='Filter dataset for certain modifications.' 
+ ) + ), + ) + + ) + +} + + + shinyUI( - dashboardPage(skin='blue', + dashboardPage(skin=config[['mode_skin']], dashboardHeader(title = "DO-MS Dashboard", + tags$li(class='dropdown', style='display:flex;flex-direction:row;align-items:center;height:50px', tags$a(class='github-btn', style='padding:5px;border-radius:10px', href='https://github.com/SlavovLab/DO-MS/', target='_blank', tags$img(src='GitHub_Logo_White.png', height='30px')), - tags$span(class='version-string', paste0('Version: ', version))) + tags$span(class='version-string', paste0('Version: ', version)), + tags$span(class='version-string', paste0('Mode: ', config[['mode_name']]))) # tags$li(class='dropdown', # tags$button(type='button', class='btn btn-default', `data-container`='body', `data-toggle`='popover', # `data-placement`='bottom', @@ -86,14 +148,14 @@ shinyUI( dashboardSidebar( sidebarMenu( # Sidebar Menu Options - menuItem("Import Data", tabName = "import", icon = icon("upload", lib="glyphicon")), + menuItem("Import Data", tabName = "import", icon = icon("upload", lib="glyphicon", verify_fa = FALSE)), menuItem("Dashboard", tabName = "dashboard", - icon = icon("signal", lib = "glyphicon"), startExpanded = TRUE, + icon = icon("signal", lib = "glyphicon", verify_fa = FALSE), startExpanded = TRUE, menu_items ), - menuItem("Generate Report", tabName = "report", icon = icon("file", lib="glyphicon")), - menuItem("Documentation", tabName = "documentation", icon = icon("book", lib="glyphicon")), - menuItem("Plot Settings", tabName = "settings", icon = icon("cog", lib="glyphicon")) + menuItem("Generate Report", tabName = "report", icon = icon("file", lib="glyphicon", verify_fa = FALSE)), + menuItem("Documentation", tabName = "documentation", icon = icon("book", lib="glyphicon", verify_fa = FALSE)), + menuItem("Plot Settings", tabName = "settings", icon = icon("cog", lib="glyphicon", verify_fa = FALSE)) ), tags$hr(), @@ -104,30 +166,13 @@ shinyUI( ), tags$hr(), + # global filters for MQ mode + 
global_filters # PEP filter slider - shinyWidgets::sliderTextInput('pep_thresh', - label=tags$div(class='slider-label-header', - tags$span(class='slider-title', 'PEP Threshold:'), - # tooltip: https://getbootstrap.com/docs/3.3/javascript/#tooltips - tags$button(class='btn btn-secondary tooltip-btn', icon('question-sign', lib='glyphicon'), - `data-toggle`='tooltip', `data-placement`='right', - title='Filter identified peptides at an identification confidence threshold (Posterior error probability -- PEP). Peptides that have a PEP higher than this value will not be included in the module analyses or visualizations.' - ) - ), - grid = T, choices=c(1e-4, 1e-3, 1e-2, 1e-1, 1), selected=config[['pep_thresh']]), - # PIF filter slider - shinyWidgets::sliderTextInput('pif_thresh', - label=tags$div(class='slider-label-header', - tags$span(class='slider-title', 'PIF Threshold:'), - # tooltip: https://getbootstrap.com/docs/3.3/javascript/#tooltips - tags$button(class='btn btn-secondary tooltip-btn', icon('question-sign', lib='glyphicon'), - `data-toggle`='tooltip', `data-placement`='right', - title='Filter identified peptides at an isolation purity score threshold (Precursor Ion Fraction -- PIF). Peptides that have a PIF lower than this value will not be included in the module analyses or visualizations' - ) - ), - grid = T, choices=seq(0, 1, by=0.1), selected=config[['pif_thresh']]) + + ), dashboardBody( diff --git a/ui/import_tab.R b/ui/import_tab.R index a97a3c4..9b06a13 100644 --- a/ui/import_tab.R +++ b/ui/import_tab.R @@ -32,7 +32,7 @@ import_tab <- tabItem(tabName='import', fluidPage( a(name='folder-select'), div(class='import-header', span(class='num', '1'), - h2('Select MaxQuant txt Output Folders')), + h2(paste('Select', config[['mode_name']], 'Output Folders'))), p(class='import-help', 'Click on a row in the table to select that folder. 
Click multiple rows to select multiple folders, and use Shift to select a series of folders.'), fluidRow( column(9, wellPanel( @@ -69,7 +69,7 @@ import_tab <- tabItem(tabName='import', fluidPage( div(class='upload-button-container', tags$button(id='confirm_folders', class='btn btn-primary action-button shiny-bound-input', - icon('file-upload'), 'Load Data') + icon('file-upload', verify_fa = FALSE), 'Load Data') ) ), column(6,
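The sparse TIC binning that the `pipeline` hunks above add can be exercised as a standalone function. This is a minimal sketch, not the pipeline's own API: `sparse_binned_tic` is a hypothetical free-function version of `calc_sparse_binned_tic`, with `mz_bin_size` standing in for `self.args.mz_bin_size` and the spectrum-index column omitted. Note one deliberate difference: the loop in the patch never appends the bin that is still open when the spectrum ends, so its last bin is silently dropped; the sketch flushes it.

```python
import numpy as np

def sparse_binned_tic(mz, intensity, total_ion_current, mz_bin_size=1.0):
    """Rectangle-rule approximation of a spectrum's integral, rescaled to the
    reported TIC and accumulated into m/z bins.

    Returns an array of (mz_bin, tic) rows for non-empty bins only.
    Assumes `mz` is sorted ascending, as peaks come out of an mzML spectrum.
    """
    # rectangle widths; pad with a trailing zero so the last peak has width 0
    mz_diff = np.pad(np.diff(mz), (0, 1), 'constant')

    # approximate integral = dot product of widths and intensities
    integral = np.dot(mz_diff, intensity)

    # rescale so the binned intensities sum exactly to the reported TIC
    scaled_intensity = intensity * (total_ion_current / integral)
    binned_intensity = scaled_intensity * mz_diff

    # snap each m/z value to the centre of its bin
    binned_mz = np.round(mz / mz_bin_size) * mz_bin_size

    out = []
    current_bin, current_tic = binned_mz[0], 0.0
    for mz_bin, tic in zip(binned_mz, binned_intensity):
        if mz_bin == current_bin:
            current_tic += tic
        else:
            # close the previous bin, keeping only non-empty bins (sparse output)
            if current_tic > 0:
                out.append((current_bin, current_tic))
            current_bin, current_tic = mz_bin, tic
    # flush the final bin (the loop in the patch omits this step)
    if current_tic > 0:
        out.append((current_bin, current_tic))
    return np.array(out)
```

Because the scaling factor is `total_ion_current / integral` and the binned intensities are `scaled_intensity * mz_diff`, the binned values sum back to the reported TIC by construction, so round-tripping the total is a cheap sanity check for the binning step.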