diff --git a/.RData b/.RData index 34ceee8..b096994 100644 Binary files a/.RData and b/.RData differ diff --git a/README.md b/README.md index 99f2f14..29d45e7 100644 --- a/README.md +++ b/README.md @@ -2,55 +2,42 @@ Data-Driven Optimization of Mass Spectrometry Methods +![Python Package](https://github.com/SlavovLab/DO-MS-DIA/actions/workflows/python-package.yml/badge.svg) ![GitHub release](https://img.shields.io/github/release/SlavovLab/DO-MS.svg) ![GitHub](https://img.shields.io/github/license/SlavovLab/DO-MS.svg) * [Website](https://do-ms.slavovlab.net) * [Get started now](#getting-started) * [Download](https://github.com/SlavovLab/DO-MS/releases/latest) -* [Manuscript](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039) +* [BioArxiv Preprint](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1) - + ## Getting Started Please read our detailed getting started guides: -* [Getting started on the application](https://do-ms.slavovlab.net/docs/getting-started-application) -* [Getting started on the command-line](https://do-ms.slavovlab.net/docs/getting-started-command-line) +* [Getting started with DIA preprocessing](https://do-ms.slavovlab.net/docs/getting-started-preprocessing) +* [Getting started with DIA reports](https://do-ms.slavovlab.net/docs/getting-started-dia-app) +* [Getting started with DDA reports](https://do-ms.slavovlab.net/docs/getting-started-dda-app) ### Requirements - -This application has been tested on R >= 4.0.0 and R <= 4.0.2, OSX 10.14 / Windows 7/8/10. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16 and MaxQuant < 2.0 +This application has been tested on R >= 3.5.0, OSX 10.14 / Windows 7/8/10/11. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16 and DIA-NN > 1.8.1. The application suffers from visual glitches when displayed on unsupported older browsers (such as IE9 commonly packaged with RStudio on Windows). Please use IE >= 11, Firefox, or Chrome for the best user experience. -### Installation - -Install this application by downloading it from the [release page](https://github.com/SlavovLab/DO-MS/releases/latest). - -### Running +### Running the Interactive Application The easiest way to run the app is directly through RStudio, by opening the `DO-MS.Rproj` Rproject file - +![](https://do-ms.slavovlab.net/assets/images/do-ms-proj.png){: width="70%" .center-image} and clicking the "Run App" button at the top of the application, after opening the `server.R` file. We recommend checking the "Run External" option to open the application in your default browser instead of the RStudio Viewer. - +![](https://do-ms.slavovlab.net/assets/images/do-ms-run.png){: width="70%" .center-image} You can also start the application by running the `start_server.R` script. -### Automated Report Generation - -You can automatically generate PDF/HTML reports without having to launch the server by running the `do-ms_cmd.R` script, like so: - -``` -$ Rscript do-ms_cmd.R config_file.yaml -``` - -This requires a configuration file, and you can [find an example one here](https://github.com/SlavovLab/DO-MS/blob/master/example/config_file.yaml). See [Automating Report Generation](https://do-ms.slavovlab.net/docs/automation) for more details and instructions. - ### Customization DO-MS is designed to be easily user-customizable for in-house proteomics workflows. Please see [Building Your Own Modules](https://do-ms.slavovlab.net/docs/build-your-own) for more details. @@ -59,9 +46,9 @@ DO-MS is designed to be easily user-customizable for in-house proteomics workflo Please see [Hosting as a Server](https://do-ms.slavovlab.net/docs/hosting-as-server) for more details. -### Search Engines Other Than MaxQuant +### Supporting other Search Engines -This application is currently maintained for MaxQuant >= 1.6.0.16. Adapting to other search engines is possible but not provided out-of-the-box. Please see [Integrating Other Search Engines ](https://do-ms.slavovlab.net/docs/other-search-engines) for more details. +This application is currently maintained for (MaxQuant)[https://www.nature.com/articles/nbt.1511] >= 1.6.0.16 and (DIA-NN)[https://www.nature.com/articles/s41592-019-0638-x] >= 1.8. Adapting to other search engines is possible but not provided out-of-the-box. Please see [Integrating Other Search Engines ](https://do-ms.slavovlab.net/docs/other-search-engines) for more details. ### Can I use this for Metabolomics, Lipidomics, etc... ? @@ -71,13 +58,8 @@ While the base library of modules are based around bottom-up proteomics by LC-MS ## About the project - - The manuscript for this tool is published at the Journal of Proteome Research: [https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039) - -The manuscript is also freely available on bioRxiv: [https://www.biorxiv.org/content/10.1101/512152v1](https://www.biorxiv.org/content/10.1101/512152v1). +The manuscript for the extended version 2.0 can be found on bioArxiv: [https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1) Contact the authors by email: [nslavov\{at\}northeastern.edu](mailto:nslavov@northeastern.edu). @@ -87,21 +69,11 @@ DO-MS is distributed by an [MIT license](https://github.com/SlavovLab/DO-MS/blob ### Contributing -Please feel free to contribute to this project by opening an issue or pull request. - - - - +Please feel free to contribute to this project by opening an issue or pull request in the [GitHub repository](https://github.com/SlavovLab/DO-MS). ------------- ## Help! -For any bugs, questions, or feature requests, +For any bugs, questions, or feature requests, please use the [GitHub issue system](https://github.com/SlavovLab/DO-MS/issues) to contact the developers. diff --git a/do-ms_cmd.R b/do-ms_cmd.R old mode 100755 new mode 100644 index 9a53d1c..033e884 --- a/do-ms_cmd.R +++ b/do-ms_cmd.R @@ -244,7 +244,7 @@ for(i in 1:length(config[['load_misc_input_files']])) { prnt(paste0('Loading misc file: ', name)) # read in as data frame (need to convert from tibble) - data[[name]] <- as.data.frame(read_tsv(file=path)) + data[[name]] <- as.data.frame(read_tsv(file=path, col_types = cols())) # rename columns (replace whitespace or special characters with '.') colnames(data[[name]]) <- gsub('\\s|\\(|\\)|\\/|\\[|\\]', '.', colnames(data[[name]])) @@ -363,6 +363,8 @@ for(f in config[['load_input_files']]) { # sort the raw files raw_files <- sort(raw_files) + + # load naming format file_levels <- rep(config[['exp_name_format']], length(raw_files)) @@ -377,8 +379,12 @@ file_levels <- str_replace(file_levels, '\\%i', as.character(seq(1, length(raw_f # folder name is stored as the names of the raw files vector file_levels <- str_replace(file_levels, '\\%f', names(raw_files)) + # replace %e with the raw file name file_levels <- str_replace(file_levels, '\\%e', raw_files) +print(file_levels) + +print(config[['exp_name_pattern']]) # apply custom string extraction expression to file levels if(!is.null(config[['exp_name_pattern']])) { @@ -387,11 +393,13 @@ if(!is.null(config[['exp_name_pattern']])) { file_levels[is.na(file_levels)] <- 'default' } +print(file_levels) + # apply custom names, as defined in the "exp_names" config field if(!is.null(config[['exp_names']]) & length(config[['exp_names']]) > 0) { file_levels[1:length(config[['exp_names']])] <- config[['exp_names']] } - +print(file_levels) # ensure there are no duplicate names # if so, then append a suffix to duplicate names to prevent refactoring errors @@ -477,4 +485,4 @@ generate_report(input, f_data, raw_files, config[['output']], progress_bar=FALSE # prnt(paste0('Report written to: ', config[['output']])) -prnt('Done!') +prnt('Done!') \ No newline at end of file diff --git a/docs/Gemfile b/docs/Gemfile index 3be9c3c..457656e 100644 --- a/docs/Gemfile +++ b/docs/Gemfile @@ -1,2 +1,6 @@ source "https://rubygems.org" gemspec + +gem "rexml", "~> 3.2" + +gem "webrick", "~> 1.8" diff --git a/docs/_config.yml b/docs/_config.yml index b3ce700..3eccc36 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -26,17 +26,15 @@ search_enabled: true # Aux links for the upper right navigation aux_links: - "DO-MS Preprint": - - "https://www.biorxiv.org/content/10.1101/512152v1" - "JPR Article": - - "https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039" + "BioArxiv Preprint": + - "https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1" "GitHub Repository": - "https://github.com/SlavovLab/DO-MS" "Slavov Lab": - "http://slavovlab.net" github_link: https://github.com/SlavovLab/DO-MS -preprint_link: https://www.biorxiv.org/content/10.1101/512152v1 +preprint_link: https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1 jpr_link: https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039 # Color scheme currently only supports "dark" or nil (default) diff --git a/docs/assets/images/dia_input.png b/docs/assets/images/dia_input.png new file mode 100644 index 0000000..b9b341c Binary files /dev/null and b/docs/assets/images/dia_input.png differ diff --git a/docs/assets/images/do-ms-dia-example.png b/docs/assets/images/do-ms-dia-example.png new file mode 100644 index 0000000..8d4ae3f Binary files /dev/null and b/docs/assets/images/do-ms-dia-example.png differ diff --git a/docs/assets/images/do-ms-dia-foldername.png b/docs/assets/images/do-ms-dia-foldername.png new file mode 100644 index 0000000..cf4b699 Binary files /dev/null and b/docs/assets/images/do-ms-dia-foldername.png differ diff --git a/docs/assets/images/do-ms-dia-generate-report.png b/docs/assets/images/do-ms-dia-generate-report.png new file mode 100644 index 0000000..b9632e3 Binary files /dev/null and b/docs/assets/images/do-ms-dia-generate-report.png differ diff --git a/docs/assets/images/do-ms-dia-import.png b/docs/assets/images/do-ms-dia-import.png new file mode 100644 index 0000000..e4d42a6 Binary files /dev/null and b/docs/assets/images/do-ms-dia-import.png differ diff --git a/docs/assets/images/do-ms-dia-info.png b/docs/assets/images/do-ms-dia-info.png new file mode 100644 index 0000000..a77dbbb Binary files /dev/null and b/docs/assets/images/do-ms-dia-info.png differ diff --git a/docs/assets/images/do-ms-dia-load.png b/docs/assets/images/do-ms-dia-load.png new file mode 100644 index 0000000..5c2decb Binary files /dev/null and b/docs/assets/images/do-ms-dia-load.png differ diff --git a/docs/assets/images/do-ms-dia-overview.png b/docs/assets/images/do-ms-dia-overview.png new file mode 100644 index 0000000..6079f1c Binary files /dev/null and b/docs/assets/images/do-ms-dia-overview.png differ diff --git a/docs/assets/images/do-ms-dia-pathname.png b/docs/assets/images/do-ms-dia-pathname.png new file mode 100644 index 0000000..b4b2b57 Binary files /dev/null and b/docs/assets/images/do-ms-dia-pathname.png differ diff --git a/docs/assets/images/do-ms-dia-rename.png b/docs/assets/images/do-ms-dia-rename.png new file mode 100644 index 0000000..debb1eb Binary files /dev/null and b/docs/assets/images/do-ms-dia-rename.png differ diff --git a/docs/assets/images/do-ms-dia-results.png b/docs/assets/images/do-ms-dia-results.png new file mode 100644 index 0000000..a130b41 Binary files /dev/null and b/docs/assets/images/do-ms-dia-results.png differ diff --git a/docs/assets/images/do-ms-dia_config.png b/docs/assets/images/do-ms-dia_config.png new file mode 100644 index 0000000..1d79228 Binary files /dev/null and b/docs/assets/images/do-ms-dia_config.png differ diff --git a/docs/assets/images/do-ms-dia_mode.png b/docs/assets/images/do-ms-dia_mode.png new file mode 100644 index 0000000..0ab4770 Binary files /dev/null and b/docs/assets/images/do-ms-dia_mode.png differ diff --git a/docs/assets/images/do-ms-dia_title.png b/docs/assets/images/do-ms-dia_title.png new file mode 100755 index 0000000..276d1d2 Binary files /dev/null and b/docs/assets/images/do-ms-dia_title.png differ diff --git a/docs/assets/images/do-ms-dia_title_v2.png b/docs/assets/images/do-ms-dia_title_v2.png new file mode 100644 index 0000000..9226b55 Binary files /dev/null and b/docs/assets/images/do-ms-dia_title_v2.png differ diff --git a/docs/assets/images/preprocessing_input.png b/docs/assets/images/preprocessing_input.png new file mode 100644 index 0000000..49ac6e3 Binary files /dev/null and b/docs/assets/images/preprocessing_input.png differ diff --git a/docs/assets/images/preprocessing_output.png b/docs/assets/images/preprocessing_output.png new file mode 100644 index 0000000..a70bd6b Binary files /dev/null and b/docs/assets/images/preprocessing_output.png differ diff --git a/docs/docs/DO-MS_examples.md b/docs/docs/DO-MS_examples.md index 2e85552..1de564b 100644 --- a/docs/docs/DO-MS_examples.md +++ b/docs/docs/DO-MS_examples.md @@ -1,7 +1,7 @@ --- layout: default title: DO-MS Examples -nav_order: 3 +nav_order: 5 permalink: docs/DO-MS_examples parent: Getting Started --- @@ -10,6 +10,13 @@ parent: Getting Started Bellow are links to example DO-MS Reports +## DIA-NN DIA Reports +1. [MS2 number optimization](https://drive.google.com/uc?id=1mNrJsk6uaI3ljtQEadr0XwR5mlyO4xZg&export=download) +2. [Different MS2 strategies](https://drive.google.com/uc?id=1eOJXC2Zb0lmVwbsaj7Zlf6wm3SPGrbWl&export=download) +3. [Gradient length optimization](https://drive.google.com/uc?id=1dQdo3kR-WxWT8zEeOxztxXmJE1rSMsEM&export=download) +4. [MS1 sampling optimization](https://drive.google.com/uc?id=1YcT3al9ICu36_ornSsLZgHj1eTBBPOxb&export=download) +5. [Single cell quality control](https://drive.google.com/uc?id=1v62qJ8JVZ_TSkUa3K0i6nWoJDbW9EQHO&export=download) +## MaxQuant DDA Reports 1. [Isobaric carrier optimization](https://scope2.slavovlab.net/mass-spec/Isobaric-carrier-optimization#do-ms-reports) 2. [Technical improvements of SCoPE2](https://scope2.slavovlab.net/mass-spec/Increased-accuracy-and-throughput) diff --git a/docs/docs/getting_started.md b/docs/docs/getting_started.md index ea238bb..290e93d 100644 --- a/docs/docs/getting_started.md +++ b/docs/docs/getting_started.md @@ -8,4 +8,19 @@ has_children: true # Getting Started -DO-MS can be run either from the command-line or as interactive application. Follow the links below to get started using the implementation of your choice. For more details on the data display, read the [DO-MS article](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039). +DO-MS can be run either from the command-line or as interactive application. Follow the links below to get started using the implementation of your choice. For more details on the data display, read the [DO-MS 2.0 article](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1). + +Before starting DO-MS the first time, the input data type has to be selected. DO-MS can work with both DDA results coming from MaxQuant as well as with DIA results coming from DIA-NN. +The mode can be set in the config.yaml file. Open the file in R-Studio or your editor of choice. +![]({{site.baseurl}}/assets/images/do-ms-dia_mode.png){: width="70%" .center-image} + +If You wish to analyze MaxQuant DDA results, change the parameter to `max_quant`, otherwise leave it as `dia-nn`. This setting needs to be changed before SO-MS is started or the R environment is initialized. It is also possible to keep both versionas in two separate folders simultanously. +![]({{site.baseurl}}/assets/images/do-ms-dia_config.png){: width="70%" .center-image} + +# Generating DO-MS Reports +Please read our detailed getting started guides: + +* [DIA Preprocessing]({{site.baseurl}}/docs/getting-started-preprocessing) +* [DIA Reports using the app]({{site.baseurl}}/docs/getting-started-dia-app) +* [DDA Reports using the app]({{site.baseurl}}/docs/getting-started-application) +* [DDA Reports using the command line]({{site.baseurl}}/docs/getting-started-application) \ No newline at end of file diff --git a/docs/docs/getting_started_app.md b/docs/docs/getting_started_app.md index e207619..3d6c702 100644 --- a/docs/docs/getting_started_app.md +++ b/docs/docs/getting_started_app.md @@ -1,12 +1,12 @@ --- layout: default -title: Interactive Application -nav_order: 1 -permalink: docs/getting-started-application +title: DDA Reports in the App +nav_order: 3 +permalink: docs/getting-started-dda-app parent: Getting Started --- -# Getting Started -- Interactive DO-MS Application +# Getting Started -- DDA Reports in the App DO-MS is an application to visualize mass-spec data both in an interactive application and static reports generated via. the command-line. In this document we'll walk you through analyzing an example dataset in the interactive application. @@ -31,13 +31,8 @@ The only constraint for data in DO-MS is that it must be from MaxQuant version > ## Installation -Download the application via. a zip or tar archive from the [GitHub release page](https://github.com/SlavovLab/DO-MS/releases). Unzip the archive, and then open the `DO-MS.Rproj` to load the project into RStudio - -![]({{site.baseurl}}/assets/images/do-ms-proj.png){: width="60%" .center-image} - -To start the app open `server.R` in RStudio and on the top right corner of the editor click on the "Run App" button. To run the application in your browser (preferred option) rather than in RStudio, click the dropdown menu and select "Run External". - -![]({{site.baseurl}}/assets/images/do-ms-run.png){: width="85%" .center-image} +Please make sure that you installed DO-MS as descibed in the [installation]({{site.baseurl}}/docs/installation) section. +Note that DO-MS has to be configured for MaxQuant data. [Find out More]({{site.baseurl}}/docs/getting-started) ## Data Import diff --git a/docs/docs/getting_started_cmd.md b/docs/docs/getting_started_cmd.md index 655ca80..81ae9dc 100644 --- a/docs/docs/getting_started_cmd.md +++ b/docs/docs/getting_started_cmd.md @@ -1,7 +1,7 @@ --- layout: default -title: Command Line -nav_order: 2 +title: DO-MS Command Line +nav_order: 4 permalink: docs/getting-started-command-line parent: Getting Started --- diff --git a/docs/docs/getting_started_dia_app.md b/docs/docs/getting_started_dia_app.md new file mode 100644 index 0000000..91d8b8e --- /dev/null +++ b/docs/docs/getting_started_dia_app.md @@ -0,0 +1,100 @@ +--- +layout: default +title: DIA Reports in the App +nav_order: 2 +permalink: docs/getting-started-dia-app +parent: Getting Started +--- + +# Getting Started -- DIA Reports in the App + +DO-MS is an application to visualize mass-spec data both in an interactive application and static reports generated via. the command-line. In this document we'll walk you through analyzing an example dataset in the interactive application. + +**Table of Contents** + +1. [Example Data](#example-data) +2. [Installation](#installation) +3. [Data Import](#data-import) + 1. [Adding Folders](#adding-folders) + 2. [Renaming Experiments](#renaming-experiments) +4. [Interacting with Modules](#interacting-with-modules) +5. [Generate Report](#generate-report) + + + +## Example Data + +We have provided an example data set online, which contains parts of the MS2 number optimization in the paper. This data can also be obtained by follwoing the guide on the preprocessing or can be downloaded as a zip file here: [https://drive.google.com/file/d/1BzWVKghIThtgYItgGy9vt6M214mHB74q/view?usp=share_link](https://drive.google.com/file/d/1BzWVKghIThtgYItgGy9vt6M214mHB74q/view?usp=share_link). + + +![]({{site.baseurl}}/assets/images/do-ms-dia-example.png){: width="70%" .center-image} + +## Installation + +Please make sure that you installed DO-MS as descibed in the [installation]({{site.baseurl}}/docs/installation) section. + + +## Data Import +Make sure DO-MS has been configured for DIA-NN data. [Find out More]({{site.baseurl}}/docs/getting-started) + +![]({{site.baseurl}}/assets/images/do-ms-dia-overview.png){: width="90%" .center-image} + +### Adding Folders + +DO-MS is designed to load _folders_ of analyses rather than individual files. To allow quick access to your analyses, as well as to allow analyzing multiple searches simultaneously, DO-MS provides a searchable "folder table" for all of your analyses. We will start loading the folder with the example output. + +Start by clicking the "Add Folder" button at the top of the table + +![]({{site.baseurl}}/assets/images/do-ms-dia-import.png){: width="70%" .center-image} + +Then add the path of your folder into the textbox, as shown: + +![]({{site.baseurl}}/assets/images/do-ms-dia-foldername.png){: width="70%" .center-image} + +A folder path is the folder's absolute location on your machine. On Windows, you can get the folder path by navigating to it in Explorer, clicking on the top file path bar, and copying the resulting text with Ctrl+C. On Mac/OSX, you can get the folder path by right clicking on the folder while presing the 'option' key. + +![]({{site.baseurl}}/assets/images/do-ms-dia-pathname.png){: width="70%" .center-image} + +Note that in the example above we checked "Add Single Folder" to add just the folder path we pasted in. If for example, you have a folder that contains many MaxQuant searches, you can select "Add Child Folders" to add all subfolders of the path specified, or "Add Recursively" to add _all_ folders that are below the path specified. + +Click "Confirm" and now you should see your folder added to the folder table. When you have multiple folders loaded into the table you can select more than one. + +![]({{site.baseurl}}/assets/images/do-ms-dia-load.png){: width="70%" .center-image} + +Finally, load the files from the selected folders by scrolling down and clicking on the big "Load Data" button + +### Renaming Experiments + +An important aspect of data visualization is easy-to-read labels for your experiments. DO-MS provides an accessible interface for renaming your raw file names so that generated figures will be easier to interpret. + +Once your data is loaded, scroll down to the "Renaming Experiments" section. Here you will find a table of your loaded raw files and their associated "labels" that will be used when plotting. Double click a label and type enter the new labe: + +![]({{site.baseurl}}/assets/images/do-ms-dia-rename.png){: width="100%" .center-image} + +Please rename all experiments as following: + +``` +wGW027 => 4 MS2 +wGW028 => 8 MS2 +wGW029 => 10 MS2 +``` + +## Interacting with Modules + +Each module has a short description that can be accessed by hovering over the question mark icon next to the module title. + +![]({{site.baseurl}}/assets/images/do-ms-dia-info.png){: width="70%" .center-image} + +You can also download the module plot as a PNG or PDF, by clicking on the download buttons below each module plot. You can also download the underlying data used for the plot by clicking on the "Download Data" button. This tab-delimited file can be imported into many other visualization packages. + +## Generate Report + +Click on "Generate Report" in the sidebar to access the report generation page. + +![]({{site.baseurl}}/assets/images/do-ms-dia-generate-report.png){: width="70%" .center-image} + +Here you will find some options to customize your report. While we support PDF reports and PDF images, we strongly recommend that you generate your reports in HTML format with PNG image plots. Other configurations may result in graphical glitches or unwanted behavior. In addition we recommend that you check your `pandoc` installation ([more details here]({{site.baseurl}}/docs/known-issues#pandoc-not-found)) as any issues will prevent the report generation. + +Click the "Download Report" button to begin generating the report. This takes a while as all plots have to be remade. A progress bar at the bottom of the page informs you of the progress. + +All images in the report are embedded in the markup, so feel free to share this single file to your colleagues/collaborators and don't worry about having to include anything else. \ No newline at end of file diff --git a/docs/docs/getting_started_preprocessing.md b/docs/docs/getting_started_preprocessing.md new file mode 100644 index 0000000..1e9b6bf --- /dev/null +++ b/docs/docs/getting_started_preprocessing.md @@ -0,0 +1,130 @@ +--- +layout: default +title: DIA Preprocesing +nav_order: 1 +permalink: docs/getting-started-preprocessing +parent: Getting Started +--- + +# Getting Started -- DIA Preprocesing + +DO-MS is an application to visualize mass-spec data both in an interactive application and static reports generated via. the command-line. In this document we'll walk you through analyzing an example dataset in the interactive application. + +**Table of Contents** + +1. [Example Data](#example-data) +2. [Installation](#installation) +3. [Processing Raw Data](#processing-raw-data) +4. [Command Line Interface](#command-line-interface) + +## Example Data +We have provided an example data set online, which contains parts of the MS2 number optimization in the paper. You can download a .zip bundle of it here: [https://drive.google.com/file/d/1bjFzKqTFLk7ECUJOTy8LNxsCOD0Xf96Q/view?usp=share_link](https://drive.google.com/file/d/1bjFzKqTFLk7ECUJOTy8LNxsCOD0Xf96Q/view?usp=share_link). The contents of the archive are the main dia-nn report and the corresponding raw files. + +Your folder should look like this: +![]({{site.baseurl}}/assets/images/preprocessing_input.png){: width="70%" .center-image} + +## Installation +Please make sure that you installed DO-MS as descibed in the [installation]({{site.baseurl}}/docs/installation) section. For using the preprocessing pipeline it is necessary to install the ThermoRawFileParser and the Dinosaur feature detection. + +## Processing Raw Data +Open a terminal and enter the base folder of your DO-MS installation. +Make sure that your DO-MS environment is set up and activate it. +``` +conda activate doms +``` + +For processing, the piplline module located at `pipeline/processing.py` will be called with the following parameters. + +```bash +python pipeline/processing.py /location/to/example/report_filtered.tsv +``` + +the following additional options will be included: +```bash +# Activate Mono if using Mac or Linux. Mono is required to run the Thermo Raw File Parser on Linux and OSX. +-m +# location of the ThermoRawFileParser executeable +--raw-parser-location /location/to/ThermoRawFileParser1.4.2/ThermoRawFileParser.exe +# location of the Dinosaur .jar file +--dinosaur-location /Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/Dinosaur-1.2.0.free.jar +# location of the example raw data +-r /location/to/example +``` + +The full command needs to be a single line and will look like: +```bash +python pipeline/processing.py /location/to/example/report_filtered.tsv -m --raw-parser-location /location/to/ThermoRawFileParser1.4.2/ThermoRawFileParser.exe --dinosaur-location /Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/Dinosaur-1.2.0.free.jar -r /location/to/example +``` + +After processing, the additional files should be part of your folder: +![]({{site.baseurl}}/assets/images/preprocessing_output.png){: width="70%" .center-image} + +Temporary .mzML files can be deleted. + +## Command Line Interface +The documentation for the various command line options can be found by typing `python pipeline/processing.py -h` + +```bash +usage: processing.py [-h] --raw-parser-location RAW_PARSER_LOCATION + [--dinosaur-location DINOSAUR_LOCATION] [-m] [-d] [-v] + [-t TEMPORARY_FOLDER] [-r RAW_FILE_LOCATION] + [--no-feature-detection] [--no-fill-times] [--no-tic] + [--no-sn] [--no-mzml-generation] + [--mz-bin-size MZ_BIN_SIZE] [--rt-bin-size RT_BIN_SIZE] + [--resolution RESOLUTION] [-p PROCESSES] [--isotopes-sn] + report + +Command line tool for feature detection in shotgun MS experiments. Can be used +together with DIA-NN to provide additional information on the peptide like +features identified in the MS1 spectra. + +positional arguments: + report Location of the report.tsv output from DIA-NN which + should be used for analysis. + +options: + -h, --help show this help message and exit + --raw-parser-location RAW_PARSER_LOCATION + Path pointing to the ThermoRawFileParser executeable. + --dinosaur-location DINOSAUR_LOCATION + Path pointing to the dinosaur jar executeable. + -m, --mono Use mono for ThermoRawFileParser under Linux and OSX. + -d, --delete Delete generated mzML and copied raw files after + successfull feature generation. + -v, --verbose Show verbose output. + -t TEMPORARY_FOLDER, --temporary-folder TEMPORARY_FOLDER + Input Raw files will be temporarilly copied to this + folder. Required for use with Google drive. + -r RAW_FILE_LOCATION, --raw-file-location RAW_FILE_LOCATION + By default, raw files are loaded based on the + File.Name column in the report.tsv. With this option, + a different folder can be specified. + --no-feature-detection + All steps are performed as usual but Dinosaur feature + detection is skipped. No features.tsv file will be + generated. + --no-fill-times All steps are performed as usual but fill times are + not extracted. No fill_times.tsv file will be + generated. + --no-tic All steps are performed as usual but binned TIC is not + extracted. No tic.tsv file will be generated. + --no-sn Signal to Noise ratio is not estimated for precursors + --no-mzml-generation Raw files are not converted to .mzML. Nevertheless, + mzML files are expected in their theoretical output + location and loaded. Should be only be carefully used + for repeated calulcations or debugging + --mz-bin-size MZ_BIN_SIZE + Bin size over the mz dimension for TIC binning. + --rt-bin-size RT_BIN_SIZE + Bin size over the RT dimension for TIC binning in + minutes. If a bin size of 0 is provided, binning will + not be applied and TIC is given per scan. + --resolution RESOLUTION + Set the resolution used for estimating counts from S/N + data + -p PROCESSES, --processes PROCESSES + Number of Processes + --isotopes-sn Use all isototopes from the same scan as the highest + intensity datapoint for estimating the SN and copy + number. +``` \ No newline at end of file diff --git a/docs/docs/installation.md b/docs/docs/installation.md new file mode 100644 index 0000000..6e5c877 --- /dev/null +++ b/docs/docs/installation.md @@ -0,0 +1,122 @@ +--- +layout: default +title: Installation +nav_order: 2 +permalink: docs/installation +--- + +**Table of Contents** + +1. [Interactive DO-MS App](#interactive-do-ms-app) + 1. [Installation](#installation) + 2. [Running](#running) +2. [DIA Python Pipeline](#dia-python-pipeline) + 1. [Installation](#installation) + 2. [Running](#running) + 3. [Setup Custom Script ](#setup-custom-script) + 1. [Windows](#windows) + 2. [MacOS](#macos) + +# Interactive DO-MS App + +## Installation + +This application has been tested on R >= 3.5.0, OSX >= 10.14 / Windows 7/8/10/11. Make sure you have the mos recent version of R and R Studio installed. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16. + +The application suffers from visual glitches when displayed on unsupported older browsers (such as IE9 commonly packaged with RStudio on Windows). Please use IE >= 11, Firefox, or Chrome for the best user experience. + +## Running + +The easiest way to run the app is directly through RStudio, by opening the `DO-MS.Rproj` Rproject file + +![]({{site.baseurl}}/assets/images/do-ms-proj.png){: width="70%" .center-image} + +and clicking the "Run App" button at the top of the application, after opening the `server.R` file. We recommend checking the "Run External" option to open the application in your default browser instead of the RStudio Viewer. + +![]({{site.baseurl}}/assets/images/do-ms-run.png){: width="70%" .center-image} + +You can also start the application by running the `start_server.R` script. + +More information on using the DO-MS application is provided [here]({{site.baseurl}}/docs/getting-started-application). + +# DIA Python Pipeline + +## Installation + +1. Please make sure that [Conda](https://docs.conda.io/en/latest/) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) are installed. +Use the provided conda configuration to create the environment with all required dependencies. +``` +conda env create -f pipeline/env.yml +``` + +2. Activate the environment and check that the command can be run. +``` +conda activate doms +python pipeline/processing.py -h +``` + +3. For automatic conversion of Thermo Raw files to the open mzML format ThermoRawFileParser (Hulstaert et al. 2020) is required. Download the latest release of the [ThermoRawFileParser](https://github.com/compomics/ThermoRawFileParser) (version v1.4.0 or newer) and write down the location of the ```ThermoRawFileParser.exe``` file. Under OSX and Linux, [Mono](https://www.mono-project.com/download/stable/). Please make sure to use the option ```-m``` with the feature detection which will tell the script to use Mono. + +4. For feature detection Dinosaur (Teleman et al. 2016) is used. Download the latest release of the Dinosaur from [GitHub](https://github.com/fickludd/dinosaur) and install Java as recommended on your platform. Please write down the location of the ```Dinosaur-xx.jar``` file. + +5. Optional, create a custom script for your system. + +## Running + +```python pipeline/processing.py -h``` + +More information on using the python pipeline is provided [here]({{site.baseurl}}/docs/getting-started-preprocessing). + +## Custom Script for Preprocessing + +Using the feature detection requires the correct conda environment, ThermoRawFileParser and Dinosaur location. If the tool is used frequently its more convenient to combine the configuration in a script which is added to the system ```PATH```. This will register a local command which can be used everywhere on the system and allows to set default options. + +### Windows + +1. Create a local folder for example under ```C:\Users\xxx\Documents\bin```. + +2. Create a file named ```processing.bat``` with the following content. Make sure that all three file paths are changed to the corresponding locations on your system. +Further default options can be added to this file if needed. An overview of all command line options can be found [here]({{site.baseurl}}/docs/getting-started-preprocessing). + +``` +@echo off +conda activate doms & ^ +python C:\Users\xxx\pipeline\processing.py %* ^ +--dinosaur-location "C:\Users\xxx\dinosaur-1.2\Dinosaur-1.2.0.free.jar" ^ +--raw-parser-location "C:\Users\xxx\thermo_raw_file_parser_1.3.4\ThermoRawFileParser.exe" +``` + +3. Search ```environment variables``` in the windows search and click ```Edit the system environment variables```. + +4. Click ```Environment Variables``` in the bottom right. + +5. Select the variable ```Path``` in the upper panel saying ```User variables ...``` and click ```Edit```. + +6. Click ```New``` and enter the location of the directory containing the ```processing.bat``` script. + +7. Now, the feature processing including the external tools can be called from anywhere on the machine with the ```processing``` command. + + +### MacOS + +1. Create a local folder for example ```/Users/xxx/Documents/bin```. + +2. Create a file named ```processing``` with the following content. Make sure that all three file paths are changed to the corresponding locations on your system. +``` +#!/bin/bash +eval "$(conda shell.bash hook)" +conda activate doms +python /Users/xxx/pipeline/processing.py "$@" \ +-m \ +--dinosaur-location "/Users/xxx/Dinosaur/Dinosaur-1.2.0.free.jar" \ +--raw-parser-location "/Users/xxx/ThermoRawFileParser/ThermoRawFileParser.exe" +``` +The first line makes conda available to the script ([known issue](https://github.com/conda/conda/issues/7980)). Please note how the mono option ```-m``` is used by default. Further default options can be added to this file if needed. An overview of all command line options can be found [here]({{site.baseurl}}/docs/getting-started-preprocessing). + +3. make the file executable with the following command ```chmod +x processing```. + +4. Navigate to the location of your bash profile file in your home directory. This will be ```/Users/{username}/.zshrc``` for zsh and ```/Users/{username}/.bash_profile``` for bash on macOS and ```/home/xxx/.bashrc``` for bash on linux. Open the file in a text editor of choice, for example vim ```vim .bash_profile```. Go into edit mode by pressing ```i```. + +5. Add the line ```export PATH="/Users/xxx/Documents/bin:$PATH"``` to the end of the file and save the file by pressing ```ESSC```, ```:wq```, ```Enter```. + +6. Restart your terminal. diff --git a/docs/docs/session_info.md b/docs/docs/session_info.md index 8bd9737..e01b150 100644 --- a/docs/docs/session_info.md +++ b/docs/docs/session_info.md @@ -7,16 +7,14 @@ permalink: docs/session-info # Session Info -DO-MS is tested on Windows (7/8/10), OSX (10.14.1), and Linux (Ubuntu 14.04). Development sessions and package versions are listed below: +DO-MS is tested on Windows (7/8/10/11), OSX (12.5.1), and Linux (Ubuntu 14.04). Development sessions and package versions are listed below: ``` -R version 3.5.2 (2018-12-20) -Platform: x86_64-apple-darwin15.6.0 (64-bit) -Running under: macOS Mojave 10.14.1 +R version 4.2.2 (2022-10-31) +Running under: macOS Monterey 12.5.1 Matrix products: default -BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib -LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib +LAPACK: /opt/homebrew/Cellar/r/4.2.2/lib/R/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 @@ -25,17 +23,34 @@ attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: - [1] argparse_2.0.0 knitr_1.21 bindrcpp_0.2.2 yaml_2.2.0 stringr_1.3.1 - [6] DT_0.5 rmarkdown_1.11 readr_1.3.1 RColorBrewer_1.1-2 reshape2_1.4.3 -[11] lattice_0.20-38 ggplot2_3.1.0 dplyr_0.7.8 shinyWidgets_0.4.4 shinydashboard_0.7.1 -[16] pacman_0.5.0 shiny_1.2.0 + [1] htmltools_0.5.3 viridis_0.6.2 MASS_7.3-58.1 + [4] ggpubr_0.5.0 viridisLite_0.4.1 yaml_2.3.5 + [7] stringr_1.4.1 DT_0.24 rmarkdown_2.16 +[10] readr_2.1.2 reshape2_1.4.4 tibble_3.1.8 +[13] knitr_1.40 lattice_0.20-45 ggplot2_3.4.0 +[16] tidyr_1.2.0 dplyr_1.0.10 shinydashboard_0.7.2 +[19] shinyWidgets_0.7.3 pacman_0.5.1 shiny_1.7.2 loaded via a namespace (and not attached): - [1] tidyselect_0.2.5 xfun_0.4 purrr_0.2.5 colorspace_1.4-0 getopt_1.20.2 htmltools_0.3.6 utf8_1.1.4 - [8] rlang_0.3.1 later_0.7.5 pillar_1.3.1 glue_1.3.0 withr_2.1.2 bindr_0.1.1 plyr_1.8.4 -[15] findpython_1.0.4 munsell_0.5.0 gtable_0.2.0 htmlwidgets_1.3 evaluate_0.12 labeling_0.3 httpuv_1.4.5.1 -[22] crosstalk_1.0.0 fansi_0.4.0 highr_0.7 Rcpp_1.0.0 xtable_1.8-3 promises_1.0.1 scales_1.0.0 -[29] jsonlite_1.6 mime_0.6 hms_0.4.2 digest_0.6.18 stringi_1.2.4 grid_3.5.2 cli_1.0.1 -[36] tools_3.5.2 magrittr_1.5 lazyeval_0.2.1 tibble_2.0.1 crayon_1.3.4 pkgconfig_2.0.2 assertthat_0.2.0 -[43] rstudioapi_0.9.0 R6_2.3.0 compiler_3.5.2 + [1] sass_0.4.2 bit64_4.0.5 vroom_1.5.7 + [4] jsonlite_1.8.0 carData_3.0-5 bslib_0.4.0 + [7] pillar_1.8.1 backports_1.4.1 glue_1.6.2 +[10] digest_0.6.29 promises_1.2.0.1 ggsignif_0.6.3 +[13] colorspace_2.0-3 httpuv_1.6.5 plyr_1.8.7 +[16] pkgconfig_2.0.3 broom_1.0.1 purrr_0.3.4 +[19] xtable_1.8-4 scales_1.2.1 fontawesome_0.3.0 +[22] later_1.3.0 tzdb_0.3.0 farver_2.1.1 +[25] generics_0.1.3 car_3.1-1 ellipsis_0.3.2 +[28] cachem_1.0.6 withr_2.5.0 cli_3.5.0 +[31] magrittr_2.0.3 crayon_1.5.1 mime_0.12 +[34] memoise_2.0.1 evaluate_0.16 fansi_1.0.3 +[37] rstatix_0.7.2 tools_4.2.2 hms_1.1.2 +[40] lifecycle_1.0.3 munsell_0.5.0 compiler_4.2.2 +[43] jquerylib_0.1.4 rlang_1.0.6 grid_4.2.2 +[46] htmlwidgets_1.5.4 crosstalk_1.2.0 labeling_0.4.2 +[49] gtable_0.3.1 abind_1.4-5 R6_2.5.1 +[52] gridExtra_2.3 fastmap_1.1.0 bit_4.0.4 +[55] utf8_1.2.2 stringi_1.7.8 parallel_4.2.2 +[58] Rcpp_1.0.9 vctrs_0.5.1 tidyselect_1.2.0 +[61] xfun_0.32 ``` diff --git a/docs/index.md b/docs/index.md index 9e3b493..d2f82e2 100644 --- a/docs/index.md +++ b/docs/index.md @@ -12,33 +12,37 @@ permalink: / Data-Driven Optimization of Mass Spectrometry Methods {: .fs-6 .fw-300} +![Python Package](https://github.com/SlavovLab/DO-MS-DIA/actions/workflows/python-package.yml/badge.svg) ![GitHub release](https://img.shields.io/github/release/SlavovLab/DO-MS.svg) ![GitHub](https://img.shields.io/github/license/SlavovLab/DO-MS.svg) -[Get started now](#getting-started){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [Download](https://github.com/SlavovLab/DO-MS/releases/latest){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [JPR Article](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039){: .btn .fs-5 .mb-4 .mb-md-0 .mr-2 } [GitHub Repository](https://github.com/SlavovLab/DO-MS){: .btn .fs-5 .mb-4 .mb-md-0 } +[Get started now](#getting-started){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [Download](https://github.com/SlavovLab/DO-MS/releases/latest){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [BioArxiv Preprint](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1){: .btn .fs-5 .mb-4 .mb-md-0 .mr-2 } [GitHub Repository](https://github.com/SlavovLab/DO-MS){: .btn .fs-5 .mb-4 .mb-md-0 } -![]({{site.baseurl}}/assets/images/TOC_Graphic_noOutline_abstract_v18.png){: width="70%" .center-image} +![]({{site.baseurl}}/assets/images/do-ms-dia_title_v2.png){: width="90%" .center-image} ## Aim of DO-MS -The performance of ultrasensitive liquid chromatography and tandem mass spectrometry (LC-MS/MS) methods, such as [single-cell proteomics by mass spectrometry](https://scope2.slavovlab.net/mass-spec/sensitive-mass-spectrometry-analysis), depends on multiple interdependent parameters. This interdependence makes it challenging to specifically pinpoint the sources of problems in the LC-MS/MS methods. For example, a low signal at the MS2 level can be due to poor LC separation, ionization, apex targeting, ion transfer, or ion detection. DO-MS aims to specifically diagnose such problems by interactively visualizing data from all levels of bottom-up LC-MS/MS analysis. +The performance of ultrasensitive liquid chromatography and tandem mass spectrometry (LC-MS/MS) methods, such as [single-cell proteomics by mass spectrometry](https://scope2.slavovlab.net/mass-spec/sensitive-mass-spectrometry-analysis) and [multiplexed data-independent acquisition experiments](https://plexdia.slavovlab.net/), depends on multiple interdependent parameters. This interdependence makes it challenging to specifically pinpoint the sources of problems in the LC-MS/MS methods. -## Getting Started +This applies to data-dependent acquisition (DDA) as well as to data-indepent acquisition (DIA) experiments. For example, a low signal at the MS2 level in a DDA experiment can be due to poor LC separation, ionization, apex targeting, ion transfer, or ion detection. DO-MS aims to specifically diagnose such problems by interactively visualizing data from all levels of bottom-up LC-MS/MS analysis. -Please read our detailed getting started guides: -* [Getting started on the application]({{site.baseurl}}/docs/getting-started-application) -* [Getting started on the command-line]({{site.baseurl}}/docs/getting-started-command-line) +### Installation -### Requirements +Install this application by downloading it from the [release page](https://github.com/SlavovLab/DO-MS/releases/latest) and by following the [installation instructions]({{site.baseurl}}/docs/installation). -This application has been tested on R >= 3.5.0, OSX 10.14 / Windows 7/8/10. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16. -The application suffers from visual glitches when displayed on unsupported older browsers (such as IE9 commonly packaged with RStudio on Windows). Please use IE >= 11, Firefox, or Chrome for the best user experience. +## Getting Started +Please read our detailed getting started guides: +* [Getting started with DIA preprocessing]({{site.baseurl}}/docs/getting-started-preprocessing) +* [Getting started with DIA reports]({{site.baseurl}}/docs/getting-started-dia-app) +* [Getting started with DDA reports]({{site.baseurl}}/docs/getting-started-dda-app) -### Installation -Install this application by downloading it from the [release page](https://github.com/SlavovLab/DO-MS/releases/latest). +### Requirements +This application has been tested on R >= 3.5.0, OSX 10.14 / Windows 7/8/10/11. R can be downloaded from the main [R Project page](https://www.r-project.org/) or downloaded with the [RStudio Application](https://www.rstudio.com/products/rstudio/download/). All modules are maintained for MaxQuant >= 1.6.0.16 and DIA-NN > 1.8.1. + +The application suffers from visual glitches when displayed on unsupported older browsers (such as IE9 commonly packaged with RStudio on Windows). Please use IE >= 11, Firefox, or Chrome for the best user experience. -### Running +### Running the Interactive Application The easiest way to run the app is directly through RStudio, by opening the `DO-MS.Rproj` Rproject file @@ -50,16 +54,6 @@ and clicking the "Run App" button at the top of the application, after opening t You can also start the application by running the `start_server.R` script. -### Automated Report Generation - -You can automatically generate PDF/HTML reports without having to launch the server by running the `do-ms_cmd.R` script, like so: - -``` -$ Rscript do-ms_cmd.R config_file.yaml -``` - -This requires a configuration file, and you can [find an example one here](https://github.com/SlavovLab/DO-MS/blob/master/example/config_file.yaml). See [Automating Report Generation]({{site.baseurl}}/docs/automation) for more details and instructions. - ### Customization DO-MS is designed to be easily user-customizable for in-house proteomics workflows. Please see [Building Your Own Modules]({{site.baseurl}}/docs/build-your-own) for more details. @@ -68,9 +62,9 @@ DO-MS is designed to be easily user-customizable for in-house proteomics workflo Please see [Hosting as a Server]({{site.baseurl}}/docs/hosting-as-server) for more details. -### Search Engines Other Than MaxQuant +### Supporting other Search Engines -This application is currently maintained for MaxQuant >= 1.6.0.16. Adapting to other search engines is possible but not provided out-of-the-box. Please see [Integrating Other Search Engines ]({{site.baseurl}}/docs/other-search-engines) for more details. +This application is currently maintained for (MaxQuant)[https://www.nature.com/articles/nbt.1511] >= 1.6.0.16 and (DIA-NN)[https://www.nature.com/articles/s41592-019-0638-x] >= 1.8. Adapting to other search engines is possible but not provided out-of-the-box. Please see [Integrating Other Search Engines ]({{site.baseurl}}/docs/other-search-engines) for more details. ### Can I use this for Metabolomics, Lipidomics, etc... ? @@ -80,13 +74,8 @@ While the base library of modules are based around bottom-up proteomics by LC-MS ## About the project - - The manuscript for this tool is published at the Journal of Proteome Research: [https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039](https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00039) - -The manuscript is also freely available on bioRxiv: [https://www.biorxiv.org/content/10.1101/512152v1](https://www.biorxiv.org/content/10.1101/512152v1). +The manuscript for the extended version 2.0 can be found on bioArxiv: [https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1](https://www.biorxiv.org/content/10.1101/2023.02.02.526809v1) Contact the authors by email: [nslavov\{at\}northeastern.edu](mailto:nslavov@northeastern.edu). @@ -98,18 +87,6 @@ DO-MS is distributed by an [MIT license]({{site.github_link}}/blob/master/LICENS Please feel free to contribute to this project by opening an issue or pull request in the [GitHub repository]({{site.github_link}}). -### Data - - - -### Figures/Analysis - - - ------------- ## Help! diff --git a/example/config_file_diann.yaml b/example/config_file_diann.yaml new file mode 100644 index 0000000..163656f --- /dev/null +++ b/example/config_file_diann.yaml @@ -0,0 +1,84 @@ +## DO-MS command line configuration +## for use with do-ms_cmd.R + +## Input/Output + +# paths to folders with MaxQuant txt output +input_folders: + - /Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase + +# files to load from each folder +load_input_files: + - features + - report + +# match up with misc_input_files list in global.R +#misc_input_files: +# inclusion_list: /path/to/inclusion_list.txt + +output: /Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase/report.html + +## Filters + +# regular expressions to match raw file names against +#include_files: '180614_S' +#exclude_files: '180614_S_A' + +# experiment name format string +# %i -- index of raw file +# %f -- folder name +# %e -- raw file name +exp_name_format: 'Exp %f %i' + +# optional regular expression pattern to extract +# from the experiment names after applying the format string +exp_name_pattern: '[0-9]{6}' + +# custom names for files +#exp_names: +# - Control +# - '2X' +# - '4X' +# - '10X' + +# custom order for files +#exp_order: [4,2,3,1] + +#pep_thresh: 0.01 +#pif_thresh: 0.7 +#remove_decoy: REV_ +#remove_contam: CON_ + +## Figure rendering options + +ppi: 150 + +# label font size +figure_title_font_size: 16 +# axis tick label font size +figure_axis_font_size: 12 +# facet label font size +figure_facet_font_size: 12 +# line width +figure_line_width: 1 +# show background grid +figure_show_grid: true + +## Report options + +# choices: pdf, html +report_format: html + +# choices: default, cerulean, flatly, darkly, readable, +# spacelab, united, cosmo, lumen, paper, sandstone, simplex, yeti +# previews: https://bootswatch.com/3/ +report_theme: readable + +# figure size (in inches) +report_figure_width: 5 +report_figure_height: 5 + +# figure format. choices: pdf, png +report_figure_format: png + + diff --git a/global.R b/global.R index 5a1a764..a5f0851 100644 --- a/global.R +++ b/global.R @@ -1,4 +1,4 @@ -version <- '1.0.11' +version <- '2.0.5' # check R version. required R >= 3.5.0 & R <= 4.0.2 if(as.numeric(R.Version()$major) < 4) { @@ -6,9 +6,9 @@ if(as.numeric(R.Version()$major) < 4) { } # check R version. required R <= 4.0.2 -if(as.numeric(R.Version()$minor) > 0.2) { - stop('R Version <= 4.0.2 required. Download R 4.0.2 from the CRAN page: https://cran.r-project.org/') -} +#if(as.numeric(R.Version()$minor) > 0.2) { +# stop('R Version <= 4.0.2 required. Download R 4.0.2 from the CRAN page: https://cran.r-project.org/') +#} # first, get pacman if(!'pacman' %in% installed.packages()[,'Package']) { @@ -18,7 +18,37 @@ library(pacman) # install/load dependencies p_load(shiny, shinyWidgets, shinydashboard, dplyr, tidyr, ggplot2, lattice, knitr, tibble, - reshape2, readr, rmarkdown, stats, DT, stringr, yaml, viridisLite, ggpubr) + reshape2, readr, rmarkdown, stats, DT, stringr, yaml, viridisLite, ggpubr, MASS, viridis) + +# look for pandoc - moved from start_server.R to gloabl to make sure pandoc is always available +# stolen from https://github.com/r-lib/rappdirs/blob/master/R/utils.r +get_os <- function() { + if (.Platform$OS.type == "windows") { + "win" + } else if (Sys.info()["sysname"] == "Darwin") { + "mac" + } else if (.Platform$OS.type == "unix") { + "unix" + } else { + stop("Unknown OS") + } +} +os <- get_os() + +pandoc_osx <- "/Applications/RStudio.app/Contents/MacOS/quarto/bin/tools" +pandoc_windows <- "C:\\Program Files\\RStudio\\bin\\pandoc" +pandoc_linux <- "/usr/lib/rstudio/bin/pandoc" + +# try and predict pandoc directories +if(os == 'mac' & file.exists(pandoc_osx)) { + Sys.setenv(RSTUDIO_PANDOC=pandoc_osx) +} else if (os == 'win' & file.exists(pandoc_windows)) { + Sys.setenv(RSTUDIO_PANDOC=pandoc_windows) +} else if (os == 'unix' & file.exists(pandoc_linux)) { + Sys.setenv(RSTUDIO_PANDOC=pandoc_linux) +} else { + print('pandoc could not be found in default directories. If it is not available on the system PATH then PDF report generation will fail.') +} print('Checking online for latest version of DO-MS...') # check application version @@ -57,10 +87,29 @@ tryCatch({ # load application settings config <- read_yaml('settings.yaml') +do_ms_mode <- config[['do_ms_mode']] +# check if settings.yaml contains config for do_ms_mode +if (do_ms_mode %in% names(config)){ + print(paste('DO-MS mode:',do_ms_mode, 'found in settings.yaml')) + +} else { + + stop(paste('No config section for DO-MS mode',do_ms_mode, 'found in settings.yaml')) +} + +# Add all config attributes found in the do_ms_mode specific section to the base config level. +for (i in 1:length(names(config[[do_ms_mode]]))){ + current_name <- names(config[[do_ms_mode]])[i] + + config[[current_name]] <- config[[do_ms_mode]][[current_name]] +} + +#DO-MS mode specific module path +module_path <- file.path('modules', do_ms_mode) # load tabs first -tabs <- list.dirs('modules', recursive=F, full.names=F) +tabs <- list.dirs(module_path, recursive=F, full.names=F) # remove commented-out tabs (folders that start with "__") tabs <- tabs[substr(tabs, 1, 2) != '__'] @@ -74,14 +123,14 @@ tabs <- gsub('([0-9])+(\\s|_)', '', tabs) tabs <- gsub('_', ' ', tabs) modules <- list() -# loop thru tabs and populate modules +# loop thru tabs and populate modulesapply_aliases for(i in 1:length(tabs)) { tab_path <- tab_paths[i] # put modules for this tab in its own list modules[[i]] <- list() - module_files <- list.files(file.path('modules', tab_path)) + module_files <- list.files(file.path(module_path, tab_path)) for(j in 1:length(module_files)) { module_file <- module_files[j] @@ -89,7 +138,7 @@ for(i in 1:length(tabs)) { if(substr(module_file, 1, 2) == '__') { next } # source module to load the init named list - source(file.path('modules', tab_path, module_file)) + source(file.path(module_path, tab_path, module_file)) # load the module into the module list module_name <- gsub('.R', '', module_file) modules[[i]][[j]] <- init() @@ -108,6 +157,39 @@ for(i in 1:length(tabs)) { # to get custom panel heading colors for each tab, config[['tab_colors']] <- rep(config[['tab_colors']], 10) +# load modifications +if ("modifications" %in% names(config)){ + + if (length(config[['modifications']]) > 0){ + + real_mod_vec <- c(F, F) + name_vec <- c("All", "Unmodified") + unimod_vec <- c("all", "unmodified") + + for(i in 1:length(config[['modifications']])){ + name_vec <- c(name_vec, config[['modifications']][[i]]$name) + unimod_vec <- c(unimod_vec, config[['modifications']][[i]]$unimod) + real_mod_vec <- c(real_mod_vec, T) + } + + } else { + name_vec <- c("All") + unimod_vec <- c("all") + real_mod_vec <- c(F) + } + +} else { + name_vec <- c("All") + unimod_vec <- c("all") + real_mod_vec <- c(F) +} + +config[['modification_list']] <- data.frame(name = name_vec, + unimod = unimod_vec, + real_mod = real_mod_vec, + stringsAsFactors = FALSE) + + # load app.css into string app_css <- paste(readLines(file.path('resources', 'app.css')), collapse='') # load app.js into string @@ -155,6 +237,70 @@ theme_base <- function(input=list(), show_legend=F) { return(.theme) } +theme_diann <- function(input=list(), show_legend=F) { + + # default values + axis_font_size <- ifelse(is.null(input[['figure_axis_font_size']]), + 10, input[['figure_axis_font_size']]) + title_font_size <- ifelse(is.null(input[['figure_title_font_size']]), + 12, input[['figure_title_font_size']]) + facet_font_size <- ifelse(is.null(input[['figure_facet_font_size']]), + 10, input[['figure_facet_font_size']]) + + show_grid <- ifelse(is.null(input[['figure_show_grid']]), + TRUE, input[['figure_show_grid']]) + + .theme <- theme(text = element_text(face="bold", size=12, colour = "grey40"), + panel.grid.major = element_line(colour = "grey80", size = 0.4), + axis.ticks = element_line(colour = "grey80", size = 0.4), + panel.grid.minor.x = element_blank(), + panel.grid.minor = element_blank(), + #panel.background = element_rect(fill = NA), + axis.text = element_text(colour = "grey40", face = "bold", size = axis_font_size), + axis.text.x = element_text(angle=45, hjust=1, margin=margin(r=45)), + axis.line = element_blank(), + axis.title=element_text(size=title_font_size, colour = "grey20"), + strip.background = element_rect(colour = NA, fill = "grey90"), + strip.text = element_text(colour = "grey20", face = "bold", size = facet_font_size), + legend.text = element_text(colour = "grey40", face = "bold", size = 12), + legend.title = element_text(colour = "grey40", face = "bold", size = 12), + panel.background = element_rect(fill="white", colour = "white") + ) + if(!show_legend) { + .theme <- .theme + theme(legend.position="none") + } + + if(show_grid) { + .theme <- .theme + theme( + panel.grid.major = element_line(size=0.25, linetype="solid", color="lightgrey"), + panel.grid.minor = element_line(size=0.25, linetype="solid", color="lightgrey") + ) + } else { + .theme <- .theme + theme( + panel.grid.major = element_blank(), + panel.grid.minor = element_blank() + ) + } + + return(.theme) + + .theme <- .theme + theme(text = element_text(face="bold", size=12, colour = "grey40"), + panel.grid.major = element_line(colour = "grey80", size = 0.4), + axis.ticks = element_line(colour = "grey80", size = 0.4), + panel.grid.minor.x = element_blank(), + panel.grid.minor = element_blank(), + #panel.background = element_rect(fill = NA), + axis.text = element_text(colour = "grey40", face = "bold", size = axis_font_size), + axis.line = element_blank(), + axis.title=element_text(size=title_font_size, colour = "grey20"), + strip.background = element_rect(colour = NA, fill = "grey90"), + strip.text.x = element_text(colour = "grey20", face = "bold", size = facet_font_size), + legend.text = element_text(colour = "grey40", face = "bold", size = 12), + legend.title = element_text(colour = "grey40", face = "bold", size = 12)) + + return(.theme) +} + downloadButtonFixed <- function(outputId, label = "Download", class = NULL, ...) { aTag <- tags$a( @@ -247,4 +393,200 @@ sanitize_text_output <- function(text) { text } +print("global.R") +# create a new column with the chemical label +map_label <- function(sequence, labelsdata){ + + label = '' + + for (i in 1:length(labelsdata)){ + current_label <- labelsdata[[i]] + + if (grepl( current_label, sequence, fixed = TRUE)){ + label <- current_label + } + + } + return(label) +} + +count_pattern <- function(string, pattern){ + occurence <- str_count(string, pattern = pattern) + if (occurence > 0){ + return(pattern) + } else { + return("Unmodified") + } +} + +#returns the seperator for a path +get_seperator <- function(instring){ + forward_count <- str_count(instring, "/") + backward_count <- str_count(instring, "\\\\") + + seperator <- if (forward_count > backward_count) "/" else "\\\\" + return(seperator) +} + +# accepts an MS1.extracted style matrix dataframe and returns an report.tsv style dataframe +# conversion of the matrix based format to a row based format allows to use the same satatistics as with the report.tsv +ms1_extracted_to_report <- function(.input_df){ + + # for debugging + #.input_df <- as.data.frame(read_tsv(file='G:/.shortcut-targets-by-id/1uQ4exoKlaZAGnOG1iCJPzYN3ooYYZB7g/MS/Users/GW/test_data/diann_v_16_d/Report.pr_matrix_channels_ms1_extracted.tsv',guess_max=1e5)) + + #.input_df <- as.data.frame(read_tsv(file='/Volumes/GoogleDrive/.shortcut-targets-by-id/1uQ4exoKlaZAGnOG1iCJPzYN3ooYYZB7g/MS/Users/GW/test_data/diann_v_16_raw/report.pr_matrix_channels_ms1_extracted.tsv.txt',guess_max=1e5)) + + # get slash direction. Last element is always path + seperator <- get_seperator(tail(colnames(.input_df), n=1)) + + # get a vector of all column names which do not contain a slash + slash_occurences <- str_count(colnames(.input_df), seperator) + last_non_path_index <- max(which(slash_occurences == 0)) + dont_pivot <- colnames(.input_df)[0:last_non_path_index] + + print(dont_pivot) + .input_df <- .input_df %>% pivot_longer(cols = !all_of(dont_pivot), names_to='File.Name.Conv', values_to = "val") + print("done") + basename_filename <- matrix(unlist(strsplit(.input_df$File.Name.Conv, paste0(seperator,"\\s*(?=[^",seperator,"]+$)"), perl=TRUE)), ncol=2,byrow=TRUE) + + .input_df <- .input_df %>% dplyr::mutate(MS1.Name = basename_filename[,2]) + + + + # Old DIA-NN versions contain a Q.Value column in the MS1 extracted. + # New versions have a .Quality column + # checking the occurrences of both strings is used to determine the verson. + qval_count <- length(grep(".QValue", .input_df$MS1.Name, fixed=TRUE)) + quality_count <- length(grep(".Quality", .input_df$MS1.Name, fixed=TRUE)) + + if (qval_count > quality_count){ + ms1_extracted_mode = '.QValue' + } else { + ms1_extracted_mode = '.Quality' + } + print(paste('Ms1_extracted mode:', ms1_extracted_mode)) + + # Match all rows which contain .QValue at the end of the file name + .input_df <- .input_df %>% dplyr::mutate(Q = grepl(ms1_extracted_mode, MS1.Name, fixed=TRUE)) + + + identifier_comp <- matrix(unlist(strsplit(.input_df$MS1.Name, "-\\s*(?=[^-]+$)", perl=TRUE)), ncol=2,byrow=TRUE) + + # File.Name eLK002.raw + .input_df$File.Name <- identifier_comp[,1] + + # Raw.file eLK002.raw + .input_df$Raw.File <- matrix(unlist(strsplit(.input_df$File.Name, "\\.\\s*(?=[^\\.]+$)", perl=TRUE)), ncol=2,byrow=TRUE)[,1] + + .input_df$Channel <- strtoi(str_extract(identifier_comp[,2], '[0-9]+')) + + # create unique identifier for merging Q values to intensities + .input_df <- .input_df %>% + dplyr::mutate(Identifier = paste(Raw.File, Channel, Precursor.Id, sep='_')) + + # create seperate dataframes for Q-values and intensities + .input_df.val <- .input_df %>% + dplyr::filter(!Q) %>% + dplyr::mutate(val = replace_na(val, 0)) %>% + dplyr::rename(Ms1.Area = val) + + q_default <- if (ms1_extracted_mode == '.Qvalue') 1 else 0 + + .input_df.Q <- .input_df %>% + dplyr::filter(Q) %>% + dplyr::select(Identifier, val) %>% + dplyr::mutate(val = replace_na(val, q_default)) %>% + dplyr::rename(Quality = val) + + # Append q-value by joining the datasets + .input_df <- .input_df.val %>% + inner_join(.input_df.Q, by='Identifier') + + # create Precursor.Id with channel information + # create Modified.Sequence with channel information + # remove temporary columns + .input_df <- .input_df %>% + dplyr::mutate(Precursor.Id = str_replace_all(Precursor.Id, "(?<=\\()mTRAQ(?=\\))", paste0('mTRAQ',Channel))) %>% + dplyr::mutate(Modified.Sequence= str_replace_all(Modified.Sequence, "(?<=\\()mTRAQ(?=\\))", paste0('mTRAQ',Channel))) %>% + dplyr::select(-c('Q','File.Name.Conv','MS1.Name','Channel')) + + return(.input_df) +} + +# Translates new (post 1.8.1 b12) channels to the old format +# Old Channels were denoted like (mTRAQ0) new ones are denoted like (mTRAQ-K-0) +translate_diann_channel_format <- function(.input_df, columns = c("Precursor.Id","Modified.Sequence")){ + if (length(columns) < 1){ + print('translate_diann_channel_format, no columns specified') + return(.input_df) + } + + if (nrow(.input_df) < 1){ + print('translate_diann_channel_format, dataframe is empty') + return(.input_df) + } + + # check if channel is in old format + test_precursor <- .input_df[[columns[1]]][[1]] + label_occurences <- str_count(test_precursor, 'mTRAQ-[a-zA-Z]-') + if(label_occurences == 0){ + return(.input_df) + } + + + for (column in columns) { + .input_df[[column]] = sapply(.input_df[[column]], .update_channel) + } + + + return(.input_df) + +} + +.update_channel <- function(sequence){ + groups <- str_match_all(sequence, "mTRAQ-([a-zA-Z])-([0-9]+)") + + if (length(groups) > 0 ){ + groups <- groups[[1]] + + for(i in 1:nrow(groups)){ + sequence <- str_replace_all(sequence, groups[i,1], paste0('mTRAQ',groups[i,3])) + } + } + + return(sequence) +} + +custom_colors = c("#e8411c", "#f7c12a", "#329ebf",'#51c473','#c355d4','#6e6e6e',"#e8411c", "#f7c12a", "#329ebf",'#51c473','#c355d4','#6e6e6e') +custom_theme = + +separate_channel_info <- function(df){ + channels <- config[['channels']] + + df$Label <- sapply(df$Precursor.Id, .get_channel, channels ) + + for (channel in channels) { + mod <- channel[['modification']] + df$Precursor.Id <- gsub(paste0('\\Q',mod,'\\E'),'',df$Precursor.Id) + } + + return(df) +} + +.get_channel <- function(sequence, channeldata){ + + label = '' + + for (channel in channeldata) { + current_label = channel[['name']] + mod = channel[['modification']] + if (grepl( mod, sequence, fixed = TRUE)){ + label <- current_label + } + } + + return(label) + +} diff --git a/modules/dia-nn/005_Summary/summary_001_experiments.R b/modules/dia-nn/005_Summary/summary_001_experiments.R new file mode 100644 index 0000000..0b73966 --- /dev/null +++ b/modules/dia-nn/005_Summary/summary_001_experiments.R @@ -0,0 +1,46 @@ +init <- function() { + + type <- 'text' + box_title <- 'DIA-NN Experiments' + help_text <- 'DIA-NN Experiments' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload parameters.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']] + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, File.Name) %>% + dplyr::summarise(n=n()) + + outstr <- '' + for (i in 1:length(plotdata$Raw.file)){ + str <- paste(plotdata$Raw.file[[i]], plotdata$File.Name[[i]]) + outstr <- paste0(outstr, str, sep='\n \n') + + } + + outstr + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot + )) +} + diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_08_ms1_fill_time_dist.R b/modules/dia-nn/020_Ion_Sampling/Instrument_08_ms1_fill_time_dist.R new file mode 100644 index 0000000..9699b6c --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_08_ms1_fill_time_dist.R @@ -0,0 +1,59 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms1 Fill Time Distribution' + help_text <- 'Ms1 fill times along gradient' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.txt'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['fill_times']] + + plotdata <- plotdata[plotdata$RT.Start > config[['RT.Start']], ] + + plotdata <- plotdata %>% + filter(Ms.Level == 1) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + hist_bins <- 15 + + maxFT <- max(plotdata$Fill.Time) + minFT <- min(plotdata$Fill.Time) + rangeFT <- maxFT - minFT + minFT <- minFT - rangeFT/(15) + maxFT <- maxFT + rangeFT/(15) + + ggplot(plotdata) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + xlim(minFT, maxFT) + + geom_histogram(aes(x=Fill.Time), bins=15, fill=custom_colors[[6]])+ + labs(x='Fill times in ms', y=' Number of Scans') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_09_ms1_fill_times.R b/modules/dia-nn/020_Ion_Sampling/Instrument_09_ms1_fill_times.R new file mode 100644 index 0000000..5bcc63d --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_09_ms1_fill_times.R @@ -0,0 +1,74 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms1 Fill Times along Gradient' + help_text <- 'The averge fill time is shown in magenta for different bins along the retention time gradient. The standard deviation is depicted as area in blue, scans outside this area are shown as single datapoints.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.txt'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['fill_times']] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(RT.Start > config[['RT.Start']]) %>% + filter(RT.Start < config[['RT.End']]) + + binned_data <- plotdata %>% + filter(Ms.Level == 1) %>% + mutate(bin = ntile(RT.Start, 15)) + + sum_data <- binned_data %>% + group_by(bin, Raw.file) %>% + summarise(mean = mean(Fill.Time), + sd = sd(Fill.Time), + RT.Mean = mean(RT.Start), + max = max(Fill.Time), .groups = "drop") %>% + mutate(lower = mean - sd/2, upper = pmin(mean + sd/2, max), type='sum') + + joined_data <- binned_data %>% + full_join(sum_data, by=c("bin", 'Raw.file')) %>% + mutate(type = 'joined') + + + plotdata <- list("sum" = sum_data, "joined" = joined_data) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata$sum) > 1), paste0('No Rows selected'))) + + sum_data <- plotdata$sum + joined_data <- plotdata$joined + + ggplot(sum_data) + + geom_point(data=joined_data, aes(x = RT.Start, y = Fill.Time), fill = 'grey80', color = "grey80") + + geom_ribbon(aes(ymin = lower, ymax = upper, x = RT.Mean), fill = custom_colors[[3]], alpha=0.6) + + geom_line(aes(y = mean, x = RT.Mean), color = custom_colors[[1]], size = 1) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(y='Fill Times in ms', x='Retention Time in minutes') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_10_ms1_tic.R b/modules/dia-nn/020_Ion_Sampling/Instrument_10_ms1_tic.R new file mode 100644 index 0000000..fc94c2f --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_10_ms1_tic.R @@ -0,0 +1,62 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms1 total Ion Current along Gradient' + help_text <- 'The toal Ion Current (TIC) is shown for bins along the retention time gradient.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['tic']],paste0('Upload tic.tsv'))) + validate(need(config[['RT.Start']],paste0('Please specify RT.Start parameter in the settings'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['tic']] + + # Apply retention time filter as specified in settings.yaml + tic.matrix <- plotdata %>% + filter(Retention.time > config[['RT.Start']]) %>% + filter(Retention.time < config[['RT.End']]) + + tic.mean <- tic.matrix %>% + group_by(Raw.file, Retention.time) %>% + summarise(mean = weighted.mean(MZ, TIC), .groups = "drop") + + plotdata <- list("matrix" = tic.matrix, "mean" = tic.mean) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + tic.matrix <- plotdata$matrix + tic.mean <- plotdata$mean + + ggplot(tic.matrix) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_tile(aes(Retention.time, MZ, fill= log10(TIC))) + + scale_fill_viridis(discrete=FALSE) + + geom_line(data = tic.mean, aes(y = mean, x =Retention.time, color = "TIC weighted m/z"), size = 1)+ + labs(y='m/z', x='Retention Time in minutes') + + scale_color_manual(name = "", values = c( "TIC weighted m/z" = custom_colors[[1]]))+ + theme(legend.position = "bottom")+ + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_11_ms2_fill_time_dist.R b/modules/dia-nn/020_Ion_Sampling/Instrument_11_ms2_fill_time_dist.R new file mode 100644 index 0000000..490114b --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_11_ms2_fill_time_dist.R @@ -0,0 +1,57 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms2 Fill Time Distribution' + help_text <- 'Ms2 fill times along gradient' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.txt'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['fill_times']] + + plotdata <- plotdata[plotdata$RT.Start > config[['RT.Start']], ] + + plotdata <- plotdata %>% + filter(Ms.Level == 2) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + maxFT <- max(plotdata$Fill.Time) + minFT <- min(plotdata$Fill.Time) + rangeFT <- maxFT - minFT + minFT <- minFT - rangeFT/(15) + maxFT <- maxFT + rangeFT/(15) + + ggplot(plotdata) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(x='Fill times in ms', y='Scans') + + geom_histogram(aes(x=Fill.Time), bins=15, fill=custom_colors[[6]])+ + labs(x='Fill times in ms', y=' Number of Scans') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_12_ms2_fill_times.R b/modules/dia-nn/020_Ion_Sampling/Instrument_12_ms2_fill_times.R new file mode 100644 index 0000000..2ea6b37 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_12_ms2_fill_times.R @@ -0,0 +1,79 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms2 Fill Times along Gradient' + help_text <- 'The averge fill time is shown in magenta for different bins along the retention time gradient. The standard deviation is depicted as area in blue, scans outside this area are shown as single datapoints.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload fill_times.txt'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['fill_times']] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(RT.Start > config[['RT.Start']]) %>% + filter(RT.Start < config[['RT.End']]) + + binned_data <- plotdata %>% + filter(Ms.Level == 2) %>% + mutate(bin = ntile(RT.Start, 15)) + + sum_data <- binned_data %>% + group_by(bin, Raw.file) %>% + summarise(mean = mean(Fill.Time), + sd = sd(Fill.Time), + RT.Mean = mean(RT.Start), + max = max(Fill.Time), + .groups = "drop") %>% + mutate(lower = mean - sd/2, upper = pmin(mean + sd/2, max), type='sum') + + joined_data <- binned_data %>% + full_join(sum_data, by=c("bin", 'Raw.file')) %>% + mutate(type = 'joined') + + + plotdata <- list("sum" = sum_data, "joined" = joined_data) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + + validate(need((nrow(plotdata$sum) > 1), paste0('No Rows selected'))) + + + sum_data <- plotdata$sum + joined_data <- plotdata$joined + + maxRT <- max(sum_data$RT.Mean) + + ggplot(sum_data) + + geom_point(data=joined_data, aes(x = RT.Start, y = Fill.Time), fill = 'grey80', color = "grey80") + + geom_ribbon(aes(ymin = lower, ymax = upper, x = RT.Mean), fill = custom_colors[[3]], alpha=0.6) + + geom_line(aes(y = mean, x = RT.Mean), color = custom_colors[[1]], size = 1) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(y='Fill Times in ms', x='Retention Time in minutes') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/Instrument_13_ms2_fill_time_matrix.R b/modules/dia-nn/020_Ion_Sampling/Instrument_13_ms2_fill_time_matrix.R new file mode 100644 index 0000000..38bd750 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/Instrument_13_ms2_fill_time_matrix.R @@ -0,0 +1,89 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Ms2 Fill Time Matrix' + help_text <- 'The average Ms2 fill times are shown across the gradient for every distinct Ms2 window.' + source_file <- 'fill_times' + + .validate <- function(data, input) { + validate(need(data()[['fill_times']],paste0('Upload tic.tsv'))) + validate(need(data()[['fill_times']][['Window.Lower']],paste0('Your fill_times.tsv file does not contain the Window.Lower column for the isolation window. Please make sure that you use the latest version of DO-MS DIA.'))) + validate(need(data()[['fill_times']][['Window.Upper']],paste0('Your fill_times.tsv file does not contain the Window.Upper column for the isolation window. Please make sure that you use the latest version of DO-MS DIA.'))) + validate(need(config[['RT.Start']],paste0('Please specify RT.Start parameter in the settings'))) + validate(need(config[['RT.End']],paste0('Please specify RT.End parameter in the settings'))) + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['fill_times']] + + plotdata <- plotdata %>% + filter(RT.Start > config[['RT.Start']]) %>% + filter(RT.Start < config[['RT.End']]) + + #plotdata <- as.data.frame(read_tsv(file='/Users/georgwallmann/Documents/testdaten/2022_06_24_MS2_number_wGW011-wGW017/fill_times.tsv',guess_max=1e5)) + + plotdata <- plotdata %>% + filter(Ms.Level > 1) + + num_rt_bins = 10 + + rt_start = min(plotdata$RT.Start) + rt_end = max(plotdata$RT.Start) + + # Create RT column which has labels for N RT bins + plotdata <- plotdata %>% + mutate(RT = round((RT.Start - rt_start)/(rt_end-rt_start)*num_rt_bins)) + + # calculate mean fill times + plotdata <- plotdata %>% + group_by(Raw.file, Ms.Level, Window.Lower, Window.Upper, RT) %>% + summarise(Fill.Time = mean(Fill.Time), .groups = "drop") + + # Transform RT bins back to real RTs + plotdata <- plotdata %>% + mutate(RT = RT/num_rt_bins*(rt_end-rt_start)+rt_start) + + # create label columns eg. 100 - 200 (mz) + plotdata <- plotdata %>% + mutate(Label = paste0(Window.Lower, '-', Window.Upper)) %>% + mutate(Window = (Window.Lower + Window.Upper)/2) + + label_index = sort(plotdata$Window, index.return=TRUE) + + label_sorted = plotdata$Label[label_index$ix] + label_unique = unique(label_sorted) + + plotdata$Label = factor(plotdata$Label, levels = label_unique) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata) + + facet_wrap(~Raw.file, nrow = 1, scales = "free") + + geom_tile(aes(RT, Label, fill= Fill.Time)) + + scale_fill_viridis(discrete=FALSE) + + labs(y='m/z', x='Retention Time in minutes', fill='Fill time (ms)') + + theme(legend.position = "bottom",legend.key.width = unit(2.5, "cm"), axis.text.y = element_text(angle = 45, vjust = 0.5, hjust=1, size=10))+ + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + box_width=12, + plot_height=300, # pixels + dynamic_width=300, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_01_DIA_MS1_int_ID_Label.R b/modules/dia-nn/020_Ion_Sampling/instrument_01_DIA_MS1_int_ID_Label.R new file mode 100644 index 0000000..a6a5ccd --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_01_DIA_MS1_int_ID_Label.R @@ -0,0 +1,66 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Channel wise MS1 Intensity for Precursors' + help_text <- 'Plotting the MS1 intensity for all precursors which were associated with one of the defined channels.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Label')] + + + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + plotdata$Intensity <- log10(plotdata$Ms1.Area) + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + if(nrow(plotdata) > 0){ + plotdata[plotdata$Intensity >= ceiling, 2] <- ceiling + plotdata[plotdata$Intensity <= floor, 2] <- floor + } + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Intensity, color = Label)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + coord_flip() + + labs(x=expression(bold('Log'[10]*' Precursor Intensity')), y='Number of Precursors') + + scale_fill_manual(name = "plexDIA Label:", values = custom_colors)+ + scale_color_manual(name = "plexDIA Label:", values = custom_colors)+ + theme_diann(input=input, show_legend=T)+ + theme(legend.position = "bottom") + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_02_IDs_by_RT.R b/modules/dia-nn/020_Ion_Sampling/instrument_02_IDs_by_RT.R new file mode 100644 index 0000000..4e1e82c --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_02_IDs_by_RT.R @@ -0,0 +1,54 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Precursors Identified across Gradient' + help_text <- 'Precursor are plotted across the chromatographic gradient.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Please provide a report.tsv file'))) + } + + .plotdata <- function(data, input) { + + + + plotdata <- data()[['report']][,c('Raw.file', 'Retention.time')] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(Retention.time > config[['RT.Start']]) %>% + filter(Retention.time < config[['RT.End']]) + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Retention.time)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + coord_flip() + + labs(x='Retention Time (min)', y='Number of Precursors') + + scale_color_manual(values=c(custom_colors[[1]], custom_colors[[6]]))+ + theme_diann(input=input, show_legend=T) + + } + + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_03_DIA_MS1_int_ID.R b/modules/dia-nn/020_Ion_Sampling/instrument_03_DIA_MS1_int_ID.R new file mode 100644 index 0000000..4b348f3 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_03_DIA_MS1_int_ID.R @@ -0,0 +1,68 @@ +init <- function() { + + type <- 'plot' + box_title <- 'MS1 Intensity summed over all Channels.' + help_text <- 'Plotting the MS1 intensity for all precursors summed over all channels.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Label' )] + + + + + plotdata %>% + group_by(Raw.file, Precursor.Id) %>% + summarise(Ms1.Area = sum(Ms1.Area), .groups = "drop") + + plotdata <- dplyr::filter(plotdata, Ms1.Area>0) + plotdata$Intensity <- log10(plotdata$Ms1.Area) + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + if(nrow(plotdata) > 0){ + plotdata[plotdata$Intensity >= ceiling, 2] <- ceiling + plotdata[plotdata$Intensity <= floor, 2] <- floor + } + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + medianData = plotdata %>% group_by(Raw.file) %>% + summarise(median = median(Intensity), .groups = "drop") + + + ggplot(plotdata, aes(Intensity)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Precursor Intensity')), y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_04_DIA_MS1_Intersected_int.R b/modules/dia-nn/020_Ion_Sampling/instrument_04_DIA_MS1_Intersected_int.R new file mode 100644 index 0000000..990fff7 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_04_DIA_MS1_Intersected_int.R @@ -0,0 +1,107 @@ +init <- function() { + + type <- 'plot' + box_title <- 'MS1 Intensity for Intersected Precursors, summed over all Channels.' + help_text <- 'Plotting the MS1 intensity for all precursors summed over all channels. Only intersected precursors present in all loaded experiments are shown.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area','Precursor.Id')] + labelsdata <- config[['ChemicalLabels']] + + # iterate over labels and calculate label-less ID + for (i in 1:length(labelsdata)){ + current_label <- labelsdata[[i]] + + # subtract all label modifications, but not other modifications + plotdata$Precursor.Id = gsub(paste0('\\Q',current_label,'\\E'),'',plotdata$Precursor.Id) + } + + plotdata %>% + group_by(Raw.file, Precursor.Id) %>% + summarise(Ms1.Area = sum(Ms1.Area), .groups = "drop") + + + plotdata <- dplyr::filter(plotdata, Ms1.Area>0) + plotdata$Intensity <- log10(plotdata$Ms1.Area) + + # Remove runs with less than 200 IDs + r_t<-table(plotdata$Raw.file) + r_rmv<-names(r_t)[r_t<2] + plotdata<-plotdata[!plotdata$Raw.file%in%r_rmv, ] + + + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + + if(nrow(plotdata) > 0){ + plotdata[plotdata$Intensity >= ceiling, 2] <- ceiling + plotdata[plotdata$Intensity <= floor, 2] <- floor + + #Assemble list of peptides in each Raw file + plotdata$Raw.file <- factor(plotdata$Raw.file) + expsInDF <- unique(plotdata$Raw.file) + peplist <- list() + for (i in 1:length(expsInDF)){ + peptidesDF <- dplyr::filter(plotdata, Raw.file == expsInDF[i]) + peptides <- dplyr::select(peptidesDF, Precursor.Id) + peplist[[i]] <- peptides + } + + #Get intersection of all peptides + intersectList <- as.vector(Reduce(intersect, peplist)) + + #Get reduced dataframe for elements that match intersection + plotdata_Intersected <- dplyr::filter(plotdata, Precursor.Id %in% intersectList$Precursor.Id) + + plotdata <- plotdata_Intersected + } + return(plotdata) + + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need(nrow(plotdata) > 19, paste0('Less than 20 peptides in common')), + need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + medianData = plotdata %>% group_by(Raw.file) %>% + summarise(median = median(Intensity), .groups = "drop") + + ggplot(plotdata, aes(Intensity)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + geom_text(data=medianData, + aes(label=paste0("median: ", round(median,2))), x = -Inf, y = -Inf, colour=custom_colors[[1]], + hjust = 0, vjust=0)+ + geom_vline(data=medianData, aes(xintercept = median),col=custom_colors[[1]],size=1) + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Precursor Intensity')), y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_05_DIA_MS1_Intersected_int_Norm.R b/modules/dia-nn/020_Ion_Sampling/instrument_05_DIA_MS1_Intersected_int_Norm.R new file mode 100644 index 0000000..7abacf9 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_05_DIA_MS1_Intersected_int_Norm.R @@ -0,0 +1,112 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Normalized MS1 Intensity for Intersected Precursors' + help_text <- 'Plotting the MS1 Intensity for intersected precursors summed over all channels. Experiments are normalized to the first experiment. ' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area','Precursor.Id')] + labelsdata <- config[['ChemicalLabels']] + + # iterate over labels and calculate label-less ID + for (i in 1:length(labelsdata)){ + current_label <- labelsdata[[i]] + + # subtract all label modifications, but not other modifications + plotdata$Precursor.Id = gsub(paste0('\\Q',current_label,'\\E'),'',plotdata$Precursor.Id) + } + + plotdata %>% + group_by(Raw.file, Precursor.Id) %>% + summarise(Ms1.Area = sum(Ms1.Area), .groups = "drop") + + plotdata <- dplyr::filter(plotdata, Ms1.Area>0) + plotdata$Intensity = plotdata$Ms1.Area + + # Remove runs with less than 200 IDs + r_t<-table(plotdata$Raw.file) + r_rmv<-names(r_t)[r_t<2] + plotdata<-plotdata[!plotdata$Raw.file%in%r_rmv, ] + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + + plotdata[plotdata$Intensity >= ceiling, 2] <- ceiling + plotdata[plotdata$Intensity <= floor, 2] <- floor + + #Assemble list of peptides in each Raw file + plotdata$Raw.file <- factor(plotdata$Raw.file) + expsInDF <- levels(plotdata$Raw.file) + peplist <- list() + for (i in 1:length(expsInDF)){ + peptidesDF <- dplyr::filter(plotdata, Raw.file == expsInDF[i]) + peptides <- dplyr::select(peptidesDF, Precursor.Id) + peplist[[i]] <- peptides + } + + #Get intersection of all peptides + intersectList <- as.vector(Reduce(intersect, peplist)) + + #Get reduced dataframe for elements that match intersection + plotdata_Intersected <- dplyr::filter(plotdata, Precursor.Id %in% intersectList$Precursor.Id) + + + + plotdata <- plotdata_Intersected %>% group_by(Raw.file, Precursor.Id) %>% + summarize(IntAvg=mean(Intensity, na.rm = TRUE), .groups = "drop") + + plotdata.w <- reshape2::dcast(plotdata, Precursor.Id ~ Raw.file, value = "IntAvg") + baselineInd <- ncol(plotdata.w) + 1 + plotdata.w$baseline <- plotdata.w[,2] + + for (j in 2:(baselineInd-1)){ + plotdata.w[,j] <- log10(plotdata.w[,j])-log10(plotdata.w[,baselineInd]) + + } + + plotdata.w.m <- melt(plotdata.w) + plotdata.w.m.clean <- plotdata.w.m %>% filter(!(variable == "baseline")) + colnames(plotdata.w.m.clean) <- c("Precursor.Id","Raw.file","Intensity") + plotdata <- plotdata.w.m.clean + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need(nrow(plotdata) > 19, paste0('Less than 20 peptides in common'))) + + ggplot(plotdata, aes(Intensity)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + xlim(c(-3,3))+ + coord_flip() + + labs(x=expression(bold('Log'[10]*' Intensity Relative to first experiment')), y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_06_charge_count.R b/modules/dia-nn/020_Ion_Sampling/instrument_06_charge_count.R new file mode 100644 index 0000000..8ecb113 --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_06_charge_count.R @@ -0,0 +1,78 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Number of Precursors by Charge State' + help_text <- 'Number of precursors observed during MS1 scans by charge state' + source_file <- 'report' + + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Precursor.Charge','Ms1.Area','Precursor.Id')] + labelsdata <- config[['ChemicalLabels']] + + # iterate over labels and calculate label-less ID + for (i in 1:length(labelsdata)){ + current_label = labelsdata[[i]] + + # subtract all label modifications, but not other modifications + plotdata$Precursor.Id = gsub(paste0('\\Q',current_label,'\\E'),'',plotdata$Precursor.Id) + + } + + plotdata <- plotdata %>% + group_by(Raw.file, Precursor.Id) %>% + summarise(Ms1.Area = sum(Ms1.Area), Charge = first(Precursor.Charge), .groups = "drop") + + plotdata <- dplyr::filter(plotdata, Ms1.Area>0) + + plotdata$Charge[plotdata$Charge > 3] <- 4 + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Charge) %>% + dplyr::tally() + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata) + + geom_bar(aes(x=factor(Charge), y=n, fill=factor(Charge), colour=factor(Charge)), + stat='identity', position='dodge2', alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + scale_fill_hue(labels=c('1', '2', '3', '>3')) + + labs(x='Experiment', y='Count', fill='Charge State') + + theme_diann(input=input, show_legend=T)+ + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank())+ + theme(legend.position = "bottom")+ + guides(fill = guide_legend(override.aes = list(color = NA)), + color = 'none', + shape = 'none') + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/020_Ion_Sampling/instrument_14_DIA_MS1_counts.R b/modules/dia-nn/020_Ion_Sampling/instrument_14_DIA_MS1_counts.R new file mode 100644 index 0000000..636893b --- /dev/null +++ b/modules/dia-nn/020_Ion_Sampling/instrument_14_DIA_MS1_counts.R @@ -0,0 +1,67 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Channel wise MS1 Copy Number for Precursors' + help_text <- 'Plotting the MS1 copy numbers for all precursors which were associated with one of the defined channels. + The copy numbers are calculated using the signal to noise ratio as described in Derks et al. 2022. By default a resolution of 70,000 is used during preprocessing. It can be changed with the --resolution parameter' + source_file <- 'sn' + + .validate <- function(data, input) { + validate(need(data()[['sn']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['sn']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + + } + + .plotdata <- function(data, input) { + plotdata <- data()[['sn']][,c('Raw.file','Precursor.Id', 'Copy.Number')] + plotdata <- translate_diann_channel_format(plotdata, columns = c("Precursor.Id")) + plotdata <- separate_channel_info(plotdata) + + plotdata$Intensity <- log10(plotdata$Copy.Number) + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + if(nrow(plotdata) > 0){ + plotdata[plotdata$Intensity >= ceiling, 2] <- ceiling + plotdata[plotdata$Intensity <= floor, 2] <- floor + } + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Intensity, color = Label)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Copy Number')), y='Number of Precursors') + + scale_fill_manual(name = "", values = custom_colors)+ + scale_color_manual(name = "", values = custom_colors)+ + theme(legend.position = "bottom")+ + theme_diann(input=input, show_legend=T) + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/030_Identifications/00_PEP_colored_cumulative_update.R b/modules/dia-nn/030_Identifications/00_PEP_colored_cumulative_update.R new file mode 100644 index 0000000..dc2b730 --- /dev/null +++ b/modules/dia-nn/030_Identifications/00_PEP_colored_cumulative_update.R @@ -0,0 +1,102 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Number of Confident Precursor Identifications' + help_text <- 'Plotting the number of precursors identified at each given confidence level.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload evidence.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'PEP')] + + plotdata <- plotdata[complete.cases(plotdata[ , 'PEP']),] + + # build log10 PEP vector + peps <- seq(log10(max(c(min(plotdata$PEP)), 1e-8)), log10(max(plotdata$PEP)), length.out=500) + peps <- c(log10(.Machine$double.xmin), peps) + + + plotdata <- plotdata %>% + dplyr::mutate(bin=findInterval(PEP, 10**peps)) + + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, bin) + + + plotdata <- plotdata %>% + dplyr::summarise(n=dplyr::n()) + + + + plotdata <- plotdata %>% + dplyr::mutate(cy=cumsum(n)) + + + + plotdata$pep = 10**peps[plotdata$bin+1] + + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + # Rank the Experiments by most number of peptides observed + + maxnum <- c() + rawnames <- c() + + for(X in unique(plotdata$Raw.file)){ + maxnum <- c(maxnum, max(plotdata$cy[plotdata$Raw.file %in% X]) ) + rawnames <- c(rawnames, X) + } + + names(maxnum) <- rawnames + rank_exp <- maxnum[order(maxnum, decreasing = T)] + rank_exp_ord <- seq(1, length(rank_exp),1) + names(rank_exp_ord) <- names(rank_exp) + plotdata$rank_ord <- NA + + for(X in levels(plotdata$Raw.file)) { + plotdata$rank_ord[plotdata$Raw.file %in% X] <- rank_exp_ord[X] + } + + cc <- scales::seq_gradient_pal('red', 'blue', 'Lab')(seq(0, 1, length.out=length(rank_exp_ord))) + + ggplot(plotdata, aes(x=pep, color=Raw.file, y=cy, group=Raw.file)) + + geom_line(size = input$figure_line_width) + + scale_colour_manual(name='Experiment', values=cc, labels=levels(plotdata$Raw.file)) + + coord_flip() + + scale_x_log10(limits=c(.000009,max(plotdata$pep)), breaks=c(.00001,.0001,.001,.01,.1), + labels=scales::trans_format('log10', scales::math_format(10^.x))) + + xlab('PEP') + ylab('Number of Precursors') + + theme_diann(input=input, show_legend=T) + + theme(panel.grid.major.x = element_line(colour = "grey80", size = 0.4), + legend.position='right', + legend.key=element_rect(fill='white')) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + #plot_height=500, # pixels + report_plot_width=7, # inches + report_plot_height=5 # inches + )) +} diff --git a/modules/dia-nn/030_Identifications/01_ms1vsms2_IDs.R b/modules/dia-nn/030_Identifications/01_ms1vsms2_IDs.R new file mode 100644 index 0000000..5d75910 --- /dev/null +++ b/modules/dia-nn/030_Identifications/01_ms1vsms2_IDs.R @@ -0,0 +1,65 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Precursors by Quantification Strategy' + help_text <- 'Number of precursors found based on quantification startegy. Ms2 precursors are counted based on Precursor.Quantity > 0 and Ms1 precursors are counted based on Ms1.Area > 0.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file','Precursor.Quantity', 'Ms1.Area')] + #plotdata <- as.data.frame(read_tsv(file='/Users/georgwallmann/Documents/testdaten/2022_06_24_MS2_number_wGW011-wGW017/report.tsv',guess_max=1e5)) + + full_counts = plotdata %>% + filter(Precursor.Quantity > 0) %>% + group_by(Raw.file) %>% + tally() %>% + mutate(Label='MS2 Quantified') + + ms1_counts <- plotdata %>% + filter(Ms1.Area > 0)%>% + group_by(Raw.file) %>% + tally() %>% + mutate(Label='MS1 Quantified') + + outdf = rbind(full_counts, ms1_counts) + outdf$Label = factor(outdf$Label, levels=c('MS2 Quantified','MS1 Quantified')) + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata, aes(x=Label, y=n, fill=Label, colour=Label)) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Precursors', fill='Quantification', colour='Quantification') + + theme_diann(input=input, show_legend=T) + + theme(legend.position = "bottom", + axis.text.x = element_blank(), + axis.ticks = element_blank()) + + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/030_Identifications/02_modified_IDs.R b/modules/dia-nn/030_Identifications/02_modified_IDs.R new file mode 100644 index 0000000..13a777c --- /dev/null +++ b/modules/dia-nn/030_Identifications/02_modified_IDs.R @@ -0,0 +1,116 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Precursors by Modification' + help_text <- 'Number of precursors found based on modification types specified' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + validate(need(config[['modifications']], paste0('Please provide a list of modifications under the key: Modifications in the settings.yaml file'))) + } + + + + .get_occurence <- function(string, pattern){ + occurence <- str_count(string, pattern = pattern) + if (occurence > 0){ + return(pattern) + } else { + return("Unmodified") + } + } + + .plotdata <- function(data, input) { + + #retrive modifications before report, so columns can be selected + modsdata <- config[['modification_list']] + modsdata <- modsdata[modsdata$real_mod, ] + + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', modsdata[['unimod']], 'mod_sum' )] + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + labelsdata <- config[['ChemicalLabels']] + + + + outdf <- data.frame(Raw.file=character(), + Modification=character(), + Identifications=integer(), + stringsAsFactors=FALSE) + + # Iterate over all modifications + for (i in 1:nrow(modsdata)){ + + current_mod <- modsdata$unimod[[i]] + + plotdata_mod <- plotdata[plotdata[[current_mod]] > 0, ] + + plotdata_mod <- plotdata_mod %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), .groups = "drop") + + plotdata_mod$Modification <- modsdata$name[[i]] + + outdf <- rbind(outdf, plotdata_mod) + } + + + plotdata_total <- plotdata %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), .groups = "drop") + plotdata_total$Modification <- 'Total' + + outdf <- rbind(outdf, plotdata_total) + + plotdata_unmod <- plotdata[plotdata$mod_sum < 1, ] + plotdata_unmod <- plotdata_unmod %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), .groups = "drop") + plotdata_unmod$Modification <- 'Unmodified' + outdf <- rbind(outdf, plotdata_unmod) + + + levels_plot = c(modsdata[['name']],'Unmodified','Total') + outdf <- within(outdf, + Modification <- factor(Modification, levels=levels_plot)) + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 0), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Modification, y=Identifications, fill=Modification, colour=Modification)) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + #theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + labs(x='', y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + + theme(legend.position = "bottom", + axis.text.x = element_blank(), + axis.ticks = element_blank())+ + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/030_Identifications/05_miscleavage_table.R b/modules/dia-nn/030_Identifications/05_miscleavage_table.R new file mode 100644 index 0000000..e419c86 --- /dev/null +++ b/modules/dia-nn/030_Identifications/05_miscleavage_table.R @@ -0,0 +1,83 @@ +init <- function() { + + type <- 'table' + box_title <- 'Miscleavage Rate (percentage), PEP < 0.01' + help_text <- 'Miscleavage rate (percentage) for precursors identified with confidence PEP < 0.01' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .get_internal_occurence <- function(string, char){ + occurence <- str_count(string, pattern = paste0(char,".")) + occurence_w_proline <- str_count(string, pattern = paste0(char,"P")) + return(occurence-occurence_w_proline) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Stripped.Sequence', 'PEP','Ms1.Area')] + plotdata <- plotdata[plotdata$PEP<0.01, ] + plotdata <- plotdata[plotdata$Ms1.Area>0, ] + + plotdata$Missed.cleavages.R = sapply(plotdata$Stripped.Sequence, .get_internal_occurence, char="R") + plotdata$Missed.cleavages.K = sapply(plotdata$Stripped.Sequence, .get_internal_occurence, char="K") + plotdata$Missed.cleavages = plotdata$Missed.cleavages.R + plotdata$Missed.cleavages.K + + # group by raw file and number of missed cleavages, wrangle data + plotdata <- plotdata %>% + dplyr:: filter(!is.na(Missed.cleavages)) %>% + dplyr::group_by(Raw.file, Missed.cleavages) %>% + dplyr::tally() + + max_missed = max(plotdata$Missed.cleavages) + + plotdata <- plotdata %>% + tidyr::spread(Missed.cleavages, n) + + plotdata[is.na(plotdata)] = 0 + + if (max_missed==1){ + plotdata <- plotdata %>% + dplyr::mutate(`% Missed cleavages`=(`1`) / (`0` + `1`) * 100) %>% + dplyr::rename(None='0') + return(plotdata) + + } else if (max_missed==2){ + plotdata <- plotdata %>% + dplyr::mutate(`% Missed cleavages`=(`1` + 2*`2`) / (`0` + `1` + 2*`2`) * 100) %>% + dplyr::rename(None='0') + return(plotdata) + + } else if (max_missed==3){ + plotdata <- plotdata %>% + dplyr::mutate(`% Missed cleavages`=(`1` + 2*`2`+ 3*`3`) / (`0` + `1` + 2*`2`+ 3*`3`) * 100) %>% + dplyr::rename(None='0') + return(plotdata) + + } else { + return("Not supported for more than 2 missed cleavage sites") + } + + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 0), paste0('No Rows selected'))) + + plotdata + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot + )) +} diff --git a/modules/dia-nn/030_Identifications/10_miscleavage_K.R b/modules/dia-nn/030_Identifications/10_miscleavage_K.R new file mode 100644 index 0000000..49de757 --- /dev/null +++ b/modules/dia-nn/030_Identifications/10_miscleavage_K.R @@ -0,0 +1,64 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Miscleavage Rate (K), PEP < 0.01' + help_text <- 'Plotting frequency of lysine miscleavages in confidently identified precursors.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + } + + .get_internal_occurence <- function(string, char){ + occurence <- str_count(string, pattern = paste0(char,".")) + occurence_w_proline <- str_count(string, pattern = paste0(char,"P")) + return(occurence-occurence_w_proline) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Stripped.Sequence', 'PEP','Ms1.Area')] + plotdata <- plotdata[plotdata$PEP<0.01, ] + plotdata <- plotdata[plotdata$Ms1.Area>0, ] + + plotdata$Missed.cleavages = sapply(plotdata$Stripped.Sequence, .get_internal_occurence, char="K") + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Missed.cleavages) %>% + dplyr::tally() + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + plotdata <- plotdata %>% dplyr::filter(!is.na(Missed.cleavages)) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=factor(Missed.cleavages), y=n, fill=factor(Missed.cleavages), colour=factor(Missed.cleavages)), + stat='identity', position='dodge2') + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_bar(stat="identity", alpha=0.7) + + labs(x='Missed K Cleavages', y='Count') + + theme_diann(input=input, show_legend=T) + + theme(legend.position = "bottom")+ + scale_fill_manual(name = "", values = custom_colors)+ + scale_color_manual(name = "", values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank()) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/030_Identifications/10_miscleavage_R.R b/modules/dia-nn/030_Identifications/10_miscleavage_R.R new file mode 100644 index 0000000..3015bc5 --- /dev/null +++ b/modules/dia-nn/030_Identifications/10_miscleavage_R.R @@ -0,0 +1,66 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Miscleavage Rate (R), PEP < 0.01' + help_text <- 'Plotting frequency of arginine miscleavages in confidently identified precursors.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .get_internal_occurence <- function(string, char){ + occurence <- str_count(string, pattern = paste0(char,".")) + occurence_w_proline <- str_count(string, pattern = paste0(char,"P")) + return(occurence-occurence_w_proline) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Stripped.Sequence', 'PEP','Ms1.Area')] + plotdata <- plotdata[plotdata$PEP<0.01, ] + plotdata <- plotdata[plotdata$Ms1.Area>0, ] + + plotdata$Missed.cleavages = sapply(plotdata$Stripped.Sequence, .get_internal_occurence, char="R") + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Missed.cleavages) %>% + dplyr::tally() + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + plotdata <- plotdata %>% dplyr::filter(!is.na(Missed.cleavages)) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=factor(Missed.cleavages), y=n, fill=factor(Missed.cleavages), colour=factor(Missed.cleavages)), + stat='identity', position='dodge2') + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_bar(stat="identity", alpha=0.7) + + labs(x='Missed R Cleavages', y='Count') + + theme_base(input=input, show_legend=T)+ + theme_diann(input=input, show_legend=T) + + scale_fill_manual(name = "", values = custom_colors)+ + scale_color_manual(name = "", values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank(), + legend.position = "bottom") + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/030_Identifications/20_protein_groups.R b/modules/dia-nn/030_Identifications/20_protein_groups.R new file mode 100644 index 0000000..23c92fa --- /dev/null +++ b/modules/dia-nn/030_Identifications/20_protein_groups.R @@ -0,0 +1,92 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Number of Protein Identifications' + help_text <- 'Number of proteotypic protein IDs found per run. Protein IDs are shown across all channels in an experiment.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) +} + + + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Proteotypic', 'Label', 'Protein.Ids','Protein.Group','Protein.Q.Value','PG.Q.Value')] + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + Proteins <- plotdata %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n_distinct(Protein.Ids), .groups = "drop") + Proteins$Label <- 'Proteins' + + Proteins.Q <- plotdata %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(Protein.Q.Value < 0.01) %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n_distinct(Protein.Ids), .groups = "drop") + Proteins.Q$Label <- 'Proteins, q-val < 1%' + + PG <- plotdata %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n_distinct(Protein.Group), .groups = "drop") + PG$Label <- 'Protein Groups' + + PG.Q <- plotdata %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(PG.Q.Value < 0.01) %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n_distinct(Protein.Group), .groups = "drop") + PG.Q$Label <- 'Protein Groups, q-val < 1%' + + plotdata <- rbind(Proteins, Proteins.Q, PG, PG.Q) + # create custom factor levels to influence the order of labels + levels_plot = c('Proteins', 'Proteins, q-val < 1%', 'Protein Groups', 'Protein Groups, q-val < 1%') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 0), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, colour=Label, fill=Label )) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Proteins', colour ='Filter', fill = 'Filter') + + theme_diann(input=input, show_legend=T) + + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank(), + legend.position = "bottom") + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/20_total_IDs.R b/modules/dia-nn/050_plex-DIA_Diagnostics/20_total_IDs.R new file mode 100644 index 0000000..f423225 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/20_total_IDs.R @@ -0,0 +1,123 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Identified Precursors per Channel' + help_text <- 'The number of precursors identified is shown for every channel together with the number of total and intersected precursors. The number of precursors is based on all precursors found in the report.tsv file which is by default controlled by a run-specific FDR.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Label','Translated.Q.Value')] + + #plotdata <- read_tsv(file='/Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase/report.tsv',guess_max=1e5, col_types = cols()) + #plotdata <- translate_diann_channel_format(plotdata) + + # Add column for modified precursor without channel + #plotdata <- separate_channel_info(plotdata) + #plotdata <- plotdata %>% + # mutate(Raw.file=Run) + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + + plotdata_union <- plotdata %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications = n_distinct(Precursor.Id), + Identifications.O = n_distinct(Precursor.Id[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_channel <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::summarise(Identifications = n_distinct(Precursor.Id), + Identifications.O = n_distinct(Precursor.Id[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_I <- plotdata %>% + dplyr::group_by(Raw.file,Precursor.Id) %>% + dplyr::summarise(Channels = n_distinct(Label), + Channels.O = n_distinct(Label[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_intersected <- plotdata_I %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications = n_distinct(Precursor.Id[Channels == 3]), + Identifications.O = n_distinct(Precursor.Id[Channels.O == 3]), + .groups = "drop") + + # Set labels dor newly create dataframe + plotdata_intersected$Label = "Intersected" + plotdata_union$Label = "Union" + + plotdata = rbind(plotdata_channel, plotdata_intersected, plotdata_union) + + channels = c() + for (channel in config[['channels']]) { + channels <- c(channels, channel[['name']]) + } + + # create custom factor levels to influence the order of labels + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + plotdata <- plotdata %>% + mutate(Identifications.T = Identifications - Identifications.O) %>% + gather("Type", "Identifications", c('Identifications.T','Identifications.O')) + + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + plotdata <- plotdata %>% + mutate(Type=recode(Type, + Identifications.T = "Translated", + Identifications.O = "Main Search")) + + levels_plot = c("Translated","Main Search") + plotdata <- within(plotdata, + Type <- factor(Type, levels=levels_plot)) + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, fill=Label, colour=Label, alpha=Type)) + + geom_bar(stat="identity") + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Precursors') + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + #theme_diann(input=input, show_legend=T) + + theme(legend.position = "bottom")+ + scale_alpha_manual(name="",values=c(0.4,0.8)) + + scale_fill_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]), guide = "none")+ + scale_color_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]), guide = "none") + + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/21_channel_ids.R b/modules/dia-nn/050_plex-DIA_Diagnostics/21_channel_ids.R new file mode 100644 index 0000000..8871ae9 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/21_channel_ids.R @@ -0,0 +1,101 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Identified Precursors per Channel, Channel Q-Value' + help_text <- 'The number of precursors identified is shown for every channel together with the number of total and intersected precursors. The number of precursors is based on all precursors with a Channel.Q.Value <= 0.01.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need(data()[['report']][['Channel.Q.Value']], paste0('Channel.Q.Value column was not found. Your DIA-NN version does not yet support this feature.'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Channel.Q.Value', 'Label')] + + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + plotdata <- plotdata %>% + dplyr::filter(Channel.Q.Value < 0.01) + + + # calculate channel wise IDs + plotdata_n <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # calculate IDs across channels + plotdata_k <- plotdata %>% + dplyr::group_by(Raw.file, Precursor.Id) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # calculate union of all IDs + plotdata_union <- plotdata_k %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # calculate intersection of all IDs + plotdata_k = plotdata_k[plotdata_k$Identifications==3,] + plotdata_intersected <- plotdata_k %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # Set labels dor newly create dataframe + plotdata_intersected$Label = "Intersected" + plotdata_union$Label = "Union" + + + plotdata = rbind(plotdata_n, plotdata_intersected) + plotdata = rbind(plotdata, plotdata_union) + + channels = c() + for (channel in config[['channels']]) { + channels <- c(channels, channel[['name']]) + } + + # create custom factor levels to influence the order of labels + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, fill=Label, colour=Label)) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Precursors') + + theme_diann(input=input, show_legend=F) + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + scale_fill_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]))+ + scale_color_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]])) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/24_protein_IDs.R b/modules/dia-nn/050_plex-DIA_Diagnostics/24_protein_IDs.R new file mode 100644 index 0000000..98413cc --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/24_protein_IDs.R @@ -0,0 +1,118 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Identified Proteins per Channel' + help_text <- 'The number of Proteins identified is shown for every channel together with the number of total and intersected proteins. The number of proteins is based on all proteotypic precursors independent of the Protein.Q.Value.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Proteotypic', 'Label', 'Protein.Ids','Protein.Group','Protein.Q.Value','PG.Q.Value', 'Translated.Q.Value')] + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + plotdata_union <- plotdata %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications = n_distinct(Protein.Ids), + Identifications.O = n_distinct(Protein.Ids[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_channel <- plotdata %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file,Label) %>% + dplyr::summarise(Identifications = n_distinct(Protein.Ids), + Identifications.O = n_distinct(Protein.Ids[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_I <- plotdata %>% + dplyr::filter(Proteotypic == 1) %>% + dplyr::group_by(Raw.file,Protein.Ids) %>% + dplyr::summarise(Channels = n_distinct(Label), + Channels.O = n_distinct(Label[Translated.Q.Value <= 0.01]), + .groups = "drop") + + plotdata_intersected <- plotdata_I %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications = n_distinct(Protein.Ids[Channels == 3]), + Identifications.O = n_distinct(Protein.Ids[Channels.O == 3]), + .groups = "drop") + + # Set labels dor newly create dataframe + plotdata_intersected$Label = "Intersected" + plotdata_union$Label = "Union" + + + plotdata = rbind(plotdata_channel, plotdata_intersected, plotdata_union) + + channels = c() + for (channel in config[['channels']]) { + channels <- c(channels, channel[['name']]) + } + + # create custom factor levels to influence the order of labels + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + plotdata <- plotdata %>% + mutate(Identifications.T = Identifications - Identifications.O) %>% + gather("Type", "Identifications", c('Identifications.T','Identifications.O')) + + levels_plot = c('Intersected',channels,'Union') + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + plotdata <- plotdata %>% + mutate(Type=recode(Type, + Identifications.T = "Translated", + Identifications.O = "Main Search")) + + levels_plot = c("Translated","Main Search") + plotdata <- within(plotdata, + Type <- factor(Type, levels=levels_plot)) + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, fill=Label, colour=Label, alpha=Type)) + + geom_bar(stat="identity") + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(x='', y='Number of Proteins') + + theme_diann(input=input, show_legend=T) + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + theme(legend.position = "bottom")+ + scale_alpha_manual(name="",values=c(0.4,0.8)) + + scale_fill_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]), guide = "none")+ + scale_color_manual(values = c(custom_colors[[6]],custom_colors[[1]],custom_colors[[2]],custom_colors[[3]],custom_colors[[6]]), guide = "none") + + + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/30_channel_intensity.R b/modules/dia-nn/050_plex-DIA_Diagnostics/30_channel_intensity.R new file mode 100644 index 0000000..e54a45c --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/30_channel_intensity.R @@ -0,0 +1,115 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Relative Single-Cell Intensity' + help_text <- 'Single-Cell intensity relative to the carrier channel for intersected precursors' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + # return the IDs for the carrier for a run + .get_carrrier_ids <- function(rawfile, carriertable){ + carriertable <- carriertable[carriertable$Raw.file==rawfile,] + + if (nrow(carriertable) == 0){ + return(1) + } else { + return(carriertable$Identifications[1]) + } + + + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Label')] + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + # create empty dataframe for output + + outdf <- data.frame(Raw.file=factor(), + logratio=double(), + Label=factor(), + stringsAsFactors=FALSE) + + # iterate all experiments + + experiments = unique(plotdata[["Raw.file"]]) + for (i in 1:length(experiments)){ + experiment = experiments[[i]] + + subdf <- plotdata[plotdata$Raw.file==experiment,] + + # get carrier label, eg. label with highest intensity + meandf <- subdf %>% + dplyr::group_by(Label) %>% + dplyr::summarise(mean=mean(Ms1.Area), + .groups = "drop") + meandf <- meandf[order(meandf$mean),] + + carrier_label <- 'd8' #meandf[1:1,][['Label']] + + # Calculate ratio to carrier for all but carrier label + + + + for (channel in config[['channels']]) { + current_label = channel[['name']] + if (current_label != carrier_label){ + + # join dataframes by intersected precursors + carrier_df <- subdf[subdf$Label==carrier_label,] + carrier_df <- carrier_df[,c("Precursor.Id","Ms1.Area")] + current_df <- subdf[subdf$Label==current_label,] + + joineddf <- merge(carrier_df, current_df, by = "Precursor.Id") + joineddf$logratio = log10(joineddf$Ms1.Area.y/joineddf$Ms1.Area.x) + + joineddf <- joineddf[,c("Raw.file","logratio","Label")] + + + outdf <- rbind(outdf, joineddf) + + } + } + + } + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + ggplot(plotdata, aes(x=Label, y=logratio)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + theme_diann(input=input, show_legend=T) + + geom_boxplot()+ + ylim(-2.5,1)+ + labs(x='', y='log10 Intensity Ratio') + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/31_quantification_variability_MS1.R b/modules/dia-nn/050_plex-DIA_Diagnostics/31_quantification_variability_MS1.R new file mode 100644 index 0000000..64375f9 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/31_quantification_variability_MS1.R @@ -0,0 +1,119 @@ +init <- function() { + + type <- 'plot' + box_title <- 'MS1 Quantification Variability' + help_text <- 'Single-Cell intensity relative to the carrier channel for intersected precursors' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + # return the IDs for the carrier for a run + .get_carrrier_ids <- function(rawfile, carriertable){ + carriertable <- carriertable[carriertable$Raw.file==rawfile,] + + if (nrow(carriertable) == 0){ + return(1) + } else { + return(carriertable$Identifications[1]) + } + + + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Label', 'Protein.Group','Stripped.Sequence','Precursor.Charge')] + + #plotdata <- as.data.frame(read_tsv(file='/Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase/report.tsv',guess_max=1e5)) + #plotdata['Raw.file'] <- plotdata['Run'] + #plotdata <- translate_diann_channel_format(plotdata) + #plotdata <- separate_channel_info(plotdata) + #plotdata <- plotdata[,c('Raw.file', 'Ms1.Area', 'Precursor.Id', 'Label', 'Protein.Group','Stripped.Sequence','Precursor.Charge')] + + # Add Label column + # + # Add column for modified precursor without channel + #plotdata <- separate_channel_info(plotdata) + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + plotdata$seqcharge = paste0(plotdata$Stripped.Sequence, plotdata$Precursor.Charge) + + #normalize each sequence_charge to the mean of the set + plotdata <- plotdata %>% dplyr::group_by(Raw.file, seqcharge) %>% dplyr::mutate("Ms1.Area"=Ms1.Area/mean(Ms1.Area, na.rm=T)) %>% ungroup() + + # Normalize Ms1.Area based with the median of each cell + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::mutate("Ms1.Area.Norm" = Ms1.Area/median(Ms1.Area, na.rm=T)) %>% dplyr::ungroup() + + # Filter for at least 3 Precursors / Protein.Group + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label, Protein.Group) %>% + dplyr::mutate("n" = n()) %>% dplyr::ungroup() + plotdata <- plotdata[plotdata$n>3,] + + + # Calculate CV for every protein group + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label, Protein.Group) %>% + dplyr::summarise("cv" = sd(Ms1.Area.Norm)/mean(Ms1.Area.Norm)) + + # Calculate median cv for every cell + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::summarise('median_cv' = median(cv)) %>% + dplyr::ungroup() + + # Calculate x coordinate as id in set. + # Only needed as heloper for plotting + plotdata <- plotdata %>% + dplyr::group_by(Raw.file) %>% + dplyr::mutate('x' = row_number()) %>% + dplyr::ungroup() + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + ggplot(plotdata)+ + geom_jitter( aes(x=x, y=median_cv, color=Label), size=6)+ + theme_diann() + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + ylab("Quantification Variability")+ + xlab("")+ + ylim(0,1)+ + xlim(0,4)+ + theme_diann() + + theme(legend.position = "bottom")+ + theme(axis.text.x = element_blank())+ + scale_color_manual(name='plexDIA Label:', values = c(custom_colors[[1]],custom_colors[[2]],custom_colors[[3]])) + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/32_quantification_variability_MS2.R b/modules/dia-nn/050_plex-DIA_Diagnostics/32_quantification_variability_MS2.R new file mode 100644 index 0000000..3d0d2b6 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/32_quantification_variability_MS2.R @@ -0,0 +1,119 @@ +init <- function() { + + type <- 'plot' + box_title <- 'MS2 Quantification Variability' + help_text <- 'Single-Cell intensity relative to the carrier channel for intersected precursors' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + # return the IDs for the carrier for a run + .get_carrrier_ids <- function(rawfile, carriertable){ + carriertable <- carriertable[carriertable$Raw.file==rawfile,] + + if (nrow(carriertable) == 0){ + return(1) + } else { + return(carriertable$Identifications[1]) + } + + + } + + .plotdata <- function(data, input) { + + plotdata <- data()[['report']][,c('Raw.file', 'Precursor.Quantity', 'Precursor.Id', 'Label', 'Protein.Group','Stripped.Sequence','Precursor.Charge')] + + #plotdata <- as.data.frame(read_tsv(file='/Users/georgwallmann/Library/CloudStorage/OneDrive-Personal/Studium/Northeastern/DO-MS-DIA/supplementary_information/do_ms_testcase/report.tsv',guess_max=1e5)) + #plotdata['Raw.file'] <- plotdata['Run'] + #plotdata <- translate_diann_channel_format(plotdata) + #plotdata <- separate_channel_info(plotdata) + #plotdata <- plotdata[,c('Raw.file', 'Precursor.Quantity', 'Precursor.Id', 'Label', 'Protein.Group','Stripped.Sequence','Precursor.Charge')] + + # Add Label column + # + # Add column for modified precursor without channel + #plotdata <- separate_channel_info(plotdata) + + plotdata <- plotdata[plotdata$Precursor.Quantity>0,] + + + plotdata$seqcharge = paste0(plotdata$Stripped.Sequence, plotdata$Precursor.Charge) + + #normalize each sequence_charge to the mean of the set + plotdata <- plotdata %>% dplyr::group_by(Raw.file, seqcharge) %>% dplyr::mutate("Precursor.Quantity"=Precursor.Quantity/mean(Precursor.Quantity, na.rm=T)) %>% ungroup() + + # Normalize Precursor.Quantity based with the median of each cell + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::mutate("Precursor.Quantity.Norm" = Precursor.Quantity/median(Precursor.Quantity, na.rm=T)) %>% dplyr::ungroup() + + # Filter for at least 3 Precursors / Protein.Group + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label, Protein.Group) %>% + dplyr::mutate("n" = n()) %>% dplyr::ungroup() + plotdata <- plotdata[plotdata$n>3,] + + + # Calculate CV for every protein group + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label, Protein.Group) %>% + dplyr::summarise("cv" = sd(Precursor.Quantity.Norm)/mean(Precursor.Quantity.Norm)) + + # Calculate median cv for every cell + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, Label) %>% + dplyr::summarise('median_cv' = median(cv)) %>% + dplyr::ungroup() + + # Calculate x coordinate as id in set. + # Only needed as heloper for plotting + plotdata <- plotdata %>% + dplyr::group_by(Raw.file) %>% + dplyr::mutate('x' = row_number()) %>% + dplyr::ungroup() + + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + ggplot(plotdata)+ + geom_jitter( aes(x=x, y=median_cv, color=Label), size=6)+ + theme_diann() + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + ylab("Quantification Variability")+ + xlab("")+ + ylim(0,1)+ + xlim(0,4)+ + theme_diann() + + theme(legend.position = "bottom")+ + theme(axis.text.x = element_blank())+ + scale_color_manual(name='plexDIA Label:', values = c(custom_colors[[1]],custom_colors[[2]],custom_colors[[3]])) + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/40_ms1_ids.R b/modules/dia-nn/050_plex-DIA_Diagnostics/40_ms1_ids.R new file mode 100644 index 0000000..f517d6e --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/40_ms1_ids.R @@ -0,0 +1,101 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Total MS1 Precursors' + help_text <- 'Total number of precursors identified based on different confidence metrics.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['ms1_extracted']], paste0('Upload ms1_extracted.tsv'))) + validate(need((nrow(data()[['ms1_extracted']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + } + + + .plotdata <- function(data, input) { + + ms1_extracted <- data()[['ms1_extracted']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id')] + + + report <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id')] + + + ms1_extracted <- ms1_extracted[ms1_extracted$Ms1.Area>0,] + report <- report[report$Ms1.Area>0,] + + + # calculate channel wise IDs + ms1_extracted_s <- ms1_extracted %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # calculate channel wise IDs + report_s <- report %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + # Set labels dor newly create dataframe + ms1_extracted_s$Label <- "Ms1_extracted.tsv" + report_s$Label <- "run q-value" + + plotdata = rbind(ms1_extracted_s, report_s) + + # create custom factor levels to influence the order of labels + levels_plot = c("run q-value","Ms1_extracted.tsv") + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + + + if ("Channel.Q.Value" %in% names(report)){ + report_channel_q<- report %>% + dplyr::filter(Ms1.Area > 0) %>% + dplyr::filter(Channel.Q.Value < 0.01) %>% + dplyr::group_by(Raw.file) %>% + dplyr::summarise(Identifications=n(), + .groups = "drop") + + report_channel_q$Label <- "channel q-value" + + plotdata = rbind(ms1_extracted_s, report_s, report_channel_q) + + # create custom factor levels to influence the order of labels + levels_plot = c("run q-value","Ms1_extracted.tsv","channel q-value") + plotdata <- within(plotdata, + Label <- factor(Label, levels=levels_plot)) + } + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(x=Label, y=Identifications, fill=Label, colour=Label)) + + geom_bar(stat="identity", alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x")+ + labs(y='Number of Precursors',x='' ) + + theme_diann(input=input, show_legend=T) + + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ + scale_fill_manual(values=c("#DC267F","#FFB000","#648FFF")) + + scale_fill_manual(values = c(custom_colors[[4]],custom_colors[[5]],custom_colors[[6]]))+ + scale_color_manual(values = c(custom_colors[[4]],custom_colors[[5]],custom_colors[[6]])) + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/50_missing_data_precursor.R b/modules/dia-nn/050_plex-DIA_Diagnostics/50_missing_data_precursor.R new file mode 100644 index 0000000..778221a --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/50_missing_data_precursor.R @@ -0,0 +1,119 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Missing Data, Precursor Level' + help_text <- 'Plotting the Jaccard Index for identified precursors for all channel combinations. ' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + .jaccard <- function(a, b) { + intersection = length(intersect(a, b)) + union = length(a) + length(b) - intersection + return (intersection/union) + } + + .jaccard_by_label <- function(dataframe, experiment){ + + outdf <- data.frame(Raw.file=factor(), + Order=factor(), + Jacc=double(), + Label=factor(), + X=double(), + stringsAsFactors=FALSE) + + labels = sort(unique(dataframe[["Label"]])) + + rows <- length(labels) + idx <- which(lower.tri(matrix(, rows, rows), diag = FALSE) == TRUE, arr.ind=T) + + for(i in 1:nrow(idx)){ + first_index <- idx[i,'row'] + second_index <- idx[i,'col'] + + first_label <- labels[[first_index]] + second_label <- labels[[second_index]] + + precursors_first = dataframe[dataframe$Label == first_label, ] + precursors_first = precursors_first[["Precursor.Id"]] + + precursors_second = dataframe[dataframe$Label == second_label, ] + precursors_second = precursors_second[["Precursor.Id"]] + + dist <- .jaccard(precursors_first, precursors_second) + + x <- i/nrow(idx) + if ((second_label == 'd8') || (first_label == 'd8')) { + order <- 'reference' + } else { + order <- 'other' + } + outdf <- rbind(outdf, data.frame(Raw.file=experiment, Order=order, Jacc=dist, Label=first_label, X=x)) + + } + + return(outdf) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Label' )] + + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + + + outdf <- data.frame(Raw.file=factor(), + Order=factor(), + Jacc=double(), + Label=factor(), + X=double(), + stringsAsFactors=FALSE) + + experiments = unique(plotdata[["Raw.file"]]) + for (i in 1:length(experiments)){ + experiment = experiments[[i]] + subdf <- plotdata[plotdata$Raw.file==experiment,] + jaccdf <- .jaccard_by_label(subdf, experiment) + outdf <- rbind(outdf, jaccdf) + + + + } + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata)+ + geom_point(data=plotdata, aes(x=X, y=Jacc, color=Order), size=6)+ + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + scale_color_manual(name = "Comparison", values = c(custom_colors[[3]],custom_colors[[1]]))+ + theme(legend.position = "bottom")+ + theme_diann(input=input, show_legend=T) + + theme(axis.text.x = element_blank())+ + xlim(-1,2)+ + ylab("Jaccard Index")+ + xlab("")+ + ylim(0,1) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/050_plex-DIA_Diagnostics/51_missing_data_proteins.R b/modules/dia-nn/050_plex-DIA_Diagnostics/51_missing_data_proteins.R new file mode 100644 index 0000000..9bb7b31 --- /dev/null +++ b/modules/dia-nn/050_plex-DIA_Diagnostics/51_missing_data_proteins.R @@ -0,0 +1,130 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Missing Data, Protein Level' + help_text <- 'Plotting the Jaccard Index for identified precursors for all channel combinations. ' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['report']], paste0('Upload report.txt'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + } + + + .jaccard <- function(a, b) { + intersection = length(intersect(a, b)) + union = length(a) + length(b) - intersection + return (intersection/union) + } + + .jaccard_by_label <- function(dataframe, experiment){ + + outdf <- data.frame(Raw.file=factor(), + Order=factor(), + Jacc=double(), + Label=factor(), + X=double(), + stringsAsFactors=FALSE) + + labels = sort(unique(dataframe[["Label"]])) + + rows <- length(labels) + idx <- which(lower.tri(matrix(, rows, rows), diag = FALSE) == TRUE, arr.ind=T) + + for(i in 1:nrow(idx)){ + first_index <- idx[i,'row'] + second_index <- idx[i,'col'] + + first_label <- labels[[first_index]] + second_label <- labels[[second_index]] + + precursors_first = dataframe[dataframe$Label == first_label, ] + precursors_first = unique(precursors_first[["Protein.Ids"]]) + + precursors_second = dataframe[dataframe$Label == second_label, ] + precursors_second = unique(precursors_second[["Protein.Ids"]]) + + dist <- .jaccard(precursors_first, precursors_second) + + x <- i/nrow(idx) + if ((second_label == 'd8') || (first_label == 'd8')) { + order <- 'reference' + } else { + order <- 'other' + } + outdf <- rbind(outdf, data.frame(Raw.file=experiment, Order=order, Jacc=dist, Label=first_label, X=x)) + + } + + return(outdf) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Label' ,'Protein.Ids','Proteotypic')] + + #plotdata <- read_tsv(file='G:/.shortcut-targets-by-id/1uQ4exoKlaZAGnOG1iCJPzYN3ooYYZB7g/MS/Users/GW/test_data/diann_v_16/report.tsv',guess_max=1e5, col_types = cols()) + #plotdata <- translate_diann_channel_format(plotdata) + + # Add column for modified precursor without channel + #plotdata <- separate_channel_info(plotdata) + #plotdata <- plotdata %>% + #mutate(Raw.file=Run) + + plotdata <- plotdata[plotdata$Ms1.Area>0,] + plotdata <- plotdata[plotdata$Proteotypic==1,] + + outdf <- data.frame(Raw.file=factor(), + Order=factor(), + Jacc=double(), + Label=factor(), + X=double(), + stringsAsFactors=FALSE) + + experiments = unique(plotdata[["Raw.file"]]) + for (i in 1:length(experiments)){ + print(i) + print(experiments[[i]]) + experiment = experiments[[i]] + + subdf <- plotdata[plotdata$Raw.file==experiment,] + jaccdf <- .jaccard_by_label(subdf, experiment) + + outdf <- rbind(outdf, jaccdf) + + + + } + + return(outdf) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata)+ + geom_point(data=plotdata, aes(x=X, y=Jacc, color=Order), size=6)+ + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + scale_color_manual(name = "Comparison", values = c(custom_colors[[3]],custom_colors[[1]]))+ + theme(legend.position = "bottom")+ + theme_diann(input=input, show_legend=T) + + theme(axis.text.x = element_blank())+ + xlim(-1,2)+ + ylab("Jaccard Index")+ + xlab("")+ + ylim(0,1) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/01_charge_distribution.R b/modules/dia-nn/060_Feature_Detection/01_charge_distribution.R new file mode 100644 index 0000000..77c99cf --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/01_charge_distribution.R @@ -0,0 +1,85 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Features Identified by Charge' + help_text <- 'Identified features are reported based on the charge. Precursors quantified in seperate channels are treated as separate precursors..' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + validate(need(data()[['report']], paste0('Upload report.tsv'))) + validate(need((nrow(data()[['report']]) > 1), paste0('No Rows selected'))) + validate(need(config[['ChemicalLabels']], paste0('Please provide a list of labels under the key: ChemicalLabels in the settings.yaml file'))) + } + + .get_label <- function(sequence, labelsdata){ + + label = '' + + for (i in 1:length(labelsdata)){ + current_label <- labelsdata[[i]] + + if (grepl( current_label, sequence, fixed = TRUE)){ + label <- current_label + } + + } + + return(label) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['report']][,c('Raw.file', 'Ms1.Area', 'Precursor.Id','Protein.Group')] + featuredata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtEnd')] + labelsdata <- config[['ChemicalLabels']] + + featuredata$charge[featuredata$charge > 3] <- 4 + + featuredata_total <- featuredata %>% + dplyr::group_by(Raw.file, charge) %>% + dplyr::tally() + + + + return(featuredata_total) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata) + + geom_bar(aes(x=charge, y=n, fill=factor(charge), colour=factor(charge)), + stat='identity', position='dodge2', alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(x='Charge State', y='Count', fill='Charge State') + + + theme_diann(input=input, show_legend=T) + + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank(), + legend.position = "bottom") + + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors) + + guides(fill = guide_legend(override.aes = list(color = NA)), + color = 'none', + shape = 'none') + + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} + diff --git a/modules/dia-nn/060_Feature_Detection/02_nIsotopes.R b/modules/dia-nn/060_Feature_Detection/02_nIsotopes.R new file mode 100644 index 0000000..54ed86e --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/02_nIsotopes.R @@ -0,0 +1,61 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Isotopic Peaks Identified per Feature' + help_text <- 'The number of isotopic peaks identified is shown for features detected in the Dinosaur search.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd','nIsotopes')] + + plotdata$nIsotopes[plotdata$nIsotopes > 5] <- 5 + + plotdata <- plotdata %>% + dplyr::group_by(Raw.file, nIsotopes) %>% + dplyr::tally() + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + maxIsotopes <- max(plotdata$nIsotopes) + + ggplot(plotdata) + + geom_bar(aes(x=nIsotopes, y=n, fill=factor(nIsotopes), colour=factor(nIsotopes)), + stat='identity', position='dodge2', alpha=0.7) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + labs(x='Number of Isotopic Peaks', y='Features identified', fill='Isotopes') + + scale_fill_manual(values = custom_colors)+ + scale_color_manual(values = custom_colors)+ + theme(axis.text.x=element_blank(), + axis.ticks.x=element_blank(), + legend.position = "bottom") + + theme_diann(input=input, show_legend=T) + + guides(fill = guide_legend(override.aes = list(color = NA)), + color = 'none', + shape = 'none') + + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/03_nscans.R b/modules/dia-nn/060_Feature_Detection/03_nscans.R new file mode 100644 index 0000000..024c9b1 --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/03_nscans.R @@ -0,0 +1,57 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Number of Scans per feature' + help_text <- 'The number of MS1 scans is shown for all features identified in the Dinosaur search.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd','nScans')] + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$nScans, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$nScans, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(nScans)) + + plotdata[plotdata$nScans >= ceiling, 'nScans'] <- ceiling + plotdata[plotdata$nScans <= floor, 'nScans'] <- floor + + maxScans <- max(plotdata$nScans) + + ggplot(plotdata, aes(nScans)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=30, fill=custom_colors[[6]]) + + coord_flip() + + xlim(0, maxScans) + + labs(x='Number of Scans', y='Features identified') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/041_IDs_by_RT.R b/modules/dia-nn/060_Feature_Detection/041_IDs_by_RT.R new file mode 100644 index 0000000..b68aba7 --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/041_IDs_by_RT.R @@ -0,0 +1,56 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Features Identified across Gradient' + help_text <- 'The frequency of precursor identifications based on the Dinosaur search is plotted across the chromatographic gradient.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd')] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(rtStart > config[['RT.Start']]) %>% + filter(rtStart < config[['RT.End']]) + + plotdata$Category = 'z = 1' + plotdata$Category[plotdata$charge > 1] <- 'z > 1' + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + maxRT <- max(plotdata$rtApex) + + ggplot(plotdata, aes(x=rtApex, color = Category)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + coord_flip() + + labs(x='Retention Time (min)', y='Number of Precursors') + + scale_color_manual(name='Charge:', values=c(custom_colors[[1]], custom_colors[[6]]))+ + theme_diann(input=input, show_legend=T)+ + theme(legend.position = "bottom") + + } + + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/042_IDs_by_mz.R b/modules/dia-nn/060_Feature_Detection/042_IDs_by_mz.R new file mode 100644 index 0000000..7b1adab --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/042_IDs_by_mz.R @@ -0,0 +1,51 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Features Identified across m/z' + help_text <- 'The frequency of precursor identifications based on the Dinosaur search is plotted across the mass to charge ratio.' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd')] + plotdata$Category = 'z = 1' + plotdata$Category[plotdata$charge > 1] <- 'z > 1' + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + maxMZ <- max(plotdata$mz) + minMZ <- max(plotdata$mz) + + ggplot(plotdata, aes(x=mz, color = Category)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + + stat_bin(aes(y=..count..), size = 0.8, bins=100,position = "identity",geom="step")+ + coord_flip() + + labs(x='m/z', y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + + scale_color_manual(name='Charge:', values=c(custom_colors[[1]], custom_colors[[6]]))+ + theme(legend.position = "bottom") + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/04_RT_length_base.R b/modules/dia-nn/060_Feature_Detection/04_RT_length_base.R new file mode 100644 index 0000000..6c2017e --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/04_RT_length_base.R @@ -0,0 +1,60 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Retention Length of Features at Base' + help_text <- 'Plotting the retention length of identified features at the base.' + source_file <- 'report' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd')] + + # Apply retention time filter as specified in settings.yaml + plotdata <- plotdata %>% + filter(rtStart > config[['RT.Start']]) %>% + filter(rtStart < config[['RT.End']]) + + plotdata <- dplyr::mutate(plotdata, RT.Length = (rtEnd-rtStart)*60) + + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$RT.Length, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$RT.Length, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(RT.Length)) + + plotdata[plotdata$RT.Length >= ceiling, "RT.Length"] <- ceiling + plotdata[plotdata$RT.Length <= floor, "RT.Length"] <- floor + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + validate(need((nrow(plotdata) > 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(RT.Length)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + coord_flip() + + labs(x='Retention Lengths at base (sec)', y='Features identified') + + theme_diann(input=input, show_legend=T) + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/dia-nn/060_Feature_Detection/05_intensity_distribution.R b/modules/dia-nn/060_Feature_Detection/05_intensity_distribution.R new file mode 100644 index 0000000..880e71b --- /dev/null +++ b/modules/dia-nn/060_Feature_Detection/05_intensity_distribution.R @@ -0,0 +1,53 @@ +init <- function() { + + type <- 'plot' + box_title <- 'Feature Intensity Distribution' + help_text <- 'The distribution of integrated intensities is shown for identified features. ' + source_file <- 'features' + + .validate <- function(data, input) { + validate(need(data()[['features']], paste0('Please provide a features.tsv file'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['features']][,c('Raw.file', 'mz', 'rtStart','charge','rtApex','rtEnd','intensitySum')] + plotdata$Intensity <- log10(plotdata$intensitySum) + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$Intensity, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$Intensity, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(Intensity)) + if(nrow(plotdata) > 0){ + plotdata[plotdata$Intensity >= ceiling, 2] <- ceiling + plotdata[plotdata$Intensity <= floor, 2] <- floor + } + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + ggplot(plotdata, aes(x=Intensity )) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=50, fill=custom_colors[[6]]) + + coord_flip() + + labs(x=expression(bold('Log'[10]*'Feature Intensity')), y='Number of Precursors') + + theme_diann(input=input, show_legend=T) + + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot, + dynamic_width=200, + dynamic_width_base=50 + )) +} diff --git a/modules/005_Summary/summary_005_maxquant.R b/modules/max_quant/005_Summary/summary_005_maxquant.R similarity index 100% rename from modules/005_Summary/summary_005_maxquant.R rename to modules/max_quant/005_Summary/summary_005_maxquant.R diff --git a/modules/005_Summary/summary_010_expmap.R b/modules/max_quant/005_Summary/summary_010_expmap.R similarity index 100% rename from modules/005_Summary/summary_010_expmap.R rename to modules/max_quant/005_Summary/summary_010_expmap.R diff --git a/modules/005_Summary/summary_020_parameters.R b/modules/max_quant/005_Summary/summary_020_parameters.R similarity index 100% rename from modules/005_Summary/summary_020_parameters.R rename to modules/max_quant/005_Summary/summary_020_parameters.R diff --git a/modules/005_Summary/summary_030_expsummary.R b/modules/max_quant/005_Summary/summary_030_expsummary.R similarity index 100% rename from modules/005_Summary/summary_030_expsummary.R rename to modules/max_quant/005_Summary/summary_030_expsummary.R diff --git a/modules/010_Chromatography/chrom_00_FWHM.R b/modules/max_quant/010_Chromatography/chrom_00_FWHM.R similarity index 100% rename from modules/010_Chromatography/chrom_00_FWHM.R rename to modules/max_quant/010_Chromatography/chrom_00_FWHM.R diff --git a/modules/010_Chromatography/chrom_00_FWHM_most_intense.R b/modules/max_quant/010_Chromatography/chrom_00_FWHM_most_intense.R similarity index 100% rename from modules/010_Chromatography/chrom_00_FWHM_most_intense.R rename to modules/max_quant/010_Chromatography/chrom_00_FWHM_most_intense.R diff --git a/modules/010_Chromatography/chrom_01_IDfreq_by_RT.R b/modules/max_quant/010_Chromatography/chrom_01_IDfreq_by_RT.R similarity index 100% rename from modules/010_Chromatography/chrom_01_IDfreq_by_RT.R rename to modules/max_quant/010_Chromatography/chrom_01_IDfreq_by_RT.R diff --git a/modules/010_Chromatography/chrom_02_RT_length_base.R b/modules/max_quant/010_Chromatography/chrom_02_RT_length_base.R similarity index 100% rename from modules/010_Chromatography/chrom_02_RT_length_base.R rename to modules/max_quant/010_Chromatography/chrom_02_RT_length_base.R diff --git a/modules/020_Ion_Sampling/instrument_00_apex_offset.R b/modules/max_quant/020_Ion_Sampling/instrument_00_apex_offset.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_00_apex_offset.R rename to modules/max_quant/020_Ion_Sampling/instrument_00_apex_offset.R diff --git a/modules/020_Ion_Sampling/instrument_01_MS1_int_ID.R b/modules/max_quant/020_Ion_Sampling/instrument_01_MS1_int_ID.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_01_MS1_int_ID.R rename to modules/max_quant/020_Ion_Sampling/instrument_01_MS1_int_ID.R diff --git a/modules/020_Ion_Sampling/instrument_02_MS1_Intersected_int.R b/modules/max_quant/020_Ion_Sampling/instrument_02_MS1_Intersected_int.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_02_MS1_Intersected_int.R rename to modules/max_quant/020_Ion_Sampling/instrument_02_MS1_Intersected_int.R diff --git a/modules/020_Ion_Sampling/instrument_02_MS1_Intersected_int_Norm.R b/modules/max_quant/020_Ion_Sampling/instrument_02_MS1_Intersected_int_Norm.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_02_MS1_Intersected_int_Norm.R rename to modules/max_quant/020_Ion_Sampling/instrument_02_MS1_Intersected_int_Norm.R diff --git a/modules/020_Ion_Sampling/instrument_02_MS1_int_z_gt_1.R b/modules/max_quant/020_Ion_Sampling/instrument_02_MS1_int_z_gt_1.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_02_MS1_int_z_gt_1.R rename to modules/max_quant/020_Ion_Sampling/instrument_02_MS1_int_z_gt_1.R diff --git a/modules/020_Ion_Sampling/instrument_03_MS1_int_z_all.R b/modules/max_quant/020_Ion_Sampling/instrument_03_MS1_int_z_all.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_03_MS1_int_z_all.R rename to modules/max_quant/020_Ion_Sampling/instrument_03_MS1_int_z_all.R diff --git a/modules/020_Ion_Sampling/instrument_04_injection_time_PSM.R b/modules/max_quant/020_Ion_Sampling/instrument_04_injection_time_PSM.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_04_injection_time_PSM.R rename to modules/max_quant/020_Ion_Sampling/instrument_04_injection_time_PSM.R diff --git a/modules/020_Ion_Sampling/instrument_04_ms2_per_PSM.R b/modules/max_quant/020_Ion_Sampling/instrument_04_ms2_per_PSM.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_04_ms2_per_PSM.R rename to modules/max_quant/020_Ion_Sampling/instrument_04_ms2_per_PSM.R diff --git a/modules/020_Ion_Sampling/instrument_05_injection_time_noPSM.R b/modules/max_quant/020_Ion_Sampling/instrument_05_injection_time_noPSM.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_05_injection_time_noPSM.R rename to modules/max_quant/020_Ion_Sampling/instrument_05_injection_time_noPSM.R diff --git a/modules/020_Ion_Sampling/instrument_06_MS1_fill_time.R b/modules/max_quant/020_Ion_Sampling/instrument_06_MS1_fill_time.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_06_MS1_fill_time.R rename to modules/max_quant/020_Ion_Sampling/instrument_06_MS1_fill_time.R diff --git a/modules/020_Ion_Sampling/instrument_07_MS1_dutycycle_RT.R b/modules/max_quant/020_Ion_Sampling/instrument_07_MS1_dutycycle_RT.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_07_MS1_dutycycle_RT.R rename to modules/max_quant/020_Ion_Sampling/instrument_07_MS1_dutycycle_RT.R diff --git a/modules/020_Ion_Sampling/instrument_09__duty_cycle_IDrate.R b/modules/max_quant/020_Ion_Sampling/instrument_09__duty_cycle_IDrate.R similarity index 100% rename from modules/020_Ion_Sampling/instrument_09__duty_cycle_IDrate.R rename to modules/max_quant/020_Ion_Sampling/instrument_09__duty_cycle_IDrate.R diff --git a/modules/020_Ion_Sampling/instrument_10_RI_precursor.R b/modules/max_quant/020_Ion_Sampling/instrument_10_RI_precursor.R similarity index 93% rename from modules/020_Ion_Sampling/instrument_10_RI_precursor.R rename to modules/max_quant/020_Ion_Sampling/instrument_10_RI_precursor.R index 1d00828..7028a08 100644 --- a/modules/020_Ion_Sampling/instrument_10_RI_precursor.R +++ b/modules/max_quant/020_Ion_Sampling/instrument_10_RI_precursor.R @@ -1,86 +1,86 @@ -init <- function() { - - type <- 'plot' - box_title <- 'MS2 Signal / MS1 Signal per peptide, log10' - help_text <- 'This plot displays the sum of the reporter ion intensities divided by its precursor intensity per run on a log10 scale. This can serve as an indicator of transmission efficiency.' - source_file <- 'evidence' - - .validate <- function(data, input) { - validate(need(data()[['evidence']], paste0('Upload evidence.txt'))) - - # require reporter ion quantification data - validate(need(any(grepl('Reporter.intensity.corrected', colnames(data()[['evidence']]))), - paste0('Loaded data does not contain reporter ion quantification'))) - } - - .plotdata <- function(data, input) { - plotdata <- data()[['evidence']] - plotdata$SeqCharge <- paste(plotdata$Sequence, plotdata$Charge, sep="") - duplicated_seqcharge <- plotdata[duplicated(plotdata$SeqCharge), ] - - - potential_unique <- duplicated_seqcharge$SeqCharge - subframes <- split(plotdata, plotdata$Raw.file) - - - final_unique = c() - for (i in 1:length(potential_unique)){ - bool_vec = c() - for (k in 1:length(subframes)){ - if (potential_unique[i] %in% subframes[[k]][['SeqCharge']]){ - bool_vec = c(bool_vec, TRUE) - } - } - if (sum(bool_vec, na.rm = TRUE) == length(subframes)){ - final_unique = c(final_unique, potential_unique[i]) - } - } - plotdata <- plotdata[plotdata$SeqCharge %in% final_unique,] - - plotdata <- plotdata %>% dplyr::filter(Type != "MULTI-MATCH") - - plotdata <- plotdata %>% - rowwise() %>% - mutate(RI.Sum = sum(across(starts_with("Reporter.intensity.corrected")), na.rm = T)) - plotdata <- plotdata %>% - rowwise() %>% - mutate(RI.Precursor.Ratio = log10(RI.Sum / Intensity)) - - - # Thresholding data at 1 and 99th percentiles - ceiling <- quantile(plotdata$RI.Precursor.Ratio, probs=.99, na.rm = TRUE) - floor <- quantile(plotdata$RI.Precursor.Ratio, probs=.01, na.rm = TRUE) - - plotdata <- dplyr::filter(plotdata, is.finite(RI.Precursor.Ratio)) - - - plotdata[plotdata$RI.Precursor.Ratio >= ceiling, ]['RI.Precursor.Ratio'] <- ceiling - plotdata[plotdata$RI.Precursor.Ratio <= floor, ]['RI.Precursor.Ratio'] <- floor - - return(plotdata) - } - - .plot <- function(data, input) { - .validate(data, input) - plotdata <- .plotdata(data, input) - - validate(need((nrow(plotdata) >= 1), paste0('No Rows selected'))) - - ggplot(plotdata, aes(RI.Precursor.Ratio)) + - facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + - geom_histogram(bins=100) + - coord_flip() + - labs(x=expression(bold('Log'[10]*' Ratio')), y='Number of Peptides') - - } - - return(list( - type=type, - box_title=box_title, - help_text=help_text, - source_file=source_file, - validate_func=.validate, - plotdata_func=.plotdata, - plot_func=.plot - )) -} +init <- function() { + + type <- 'plot' + box_title <- 'MS2 Signal / MS1 Signal per peptide, log10' + help_text <- 'This plot displays the sum of the reporter ion intensities divided by its precursor intensity per run on a log10 scale. This can serve as an indicator of transmission efficiency.' + source_file <- 'evidence' + + .validate <- function(data, input) { + validate(need(data()[['evidence']], paste0('Upload evidence.txt'))) + + # require reporter ion quantification data + validate(need(any(grepl('Reporter.intensity.corrected', colnames(data()[['evidence']]))), + paste0('Loaded data does not contain reporter ion quantification'))) + } + + .plotdata <- function(data, input) { + plotdata <- data()[['evidence']] + plotdata$SeqCharge <- paste(plotdata$Sequence, plotdata$Charge, sep="") + duplicated_seqcharge <- plotdata[duplicated(plotdata$SeqCharge), ] + + + potential_unique <- duplicated_seqcharge$SeqCharge + subframes <- split(plotdata, plotdata$Raw.file) + + + final_unique = c() + for (i in 1:length(potential_unique)){ + bool_vec = c() + for (k in 1:length(subframes)){ + if (potential_unique[i] %in% subframes[[k]][['SeqCharge']]){ + bool_vec = c(bool_vec, TRUE) + } + } + if (sum(bool_vec, na.rm = TRUE) == length(subframes)){ + final_unique = c(final_unique, potential_unique[i]) + } + } + plotdata <- plotdata[plotdata$SeqCharge %in% final_unique,] + + plotdata <- plotdata %>% dplyr::filter(Type != "MULTI-MATCH") + + plotdata <- plotdata %>% + rowwise() %>% + mutate(RI.Sum = sum(across(starts_with("Reporter.intensity.corrected")), na.rm = T)) + plotdata <- plotdata %>% + rowwise() %>% + mutate(RI.Precursor.Ratio = log10(RI.Sum / Intensity)) + + + # Thresholding data at 1 and 99th percentiles + ceiling <- quantile(plotdata$RI.Precursor.Ratio, probs=.99, na.rm = TRUE) + floor <- quantile(plotdata$RI.Precursor.Ratio, probs=.01, na.rm = TRUE) + + plotdata <- dplyr::filter(plotdata, is.finite(RI.Precursor.Ratio)) + + + plotdata[plotdata$RI.Precursor.Ratio >= ceiling, ]['RI.Precursor.Ratio'] <- ceiling + plotdata[plotdata$RI.Precursor.Ratio <= floor, ]['RI.Precursor.Ratio'] <- floor + + return(plotdata) + } + + .plot <- function(data, input) { + .validate(data, input) + plotdata <- .plotdata(data, input) + + validate(need((nrow(plotdata) >= 1), paste0('No Rows selected'))) + + ggplot(plotdata, aes(RI.Precursor.Ratio)) + + facet_wrap(~Raw.file, nrow = 1, scales = "free_x") + + geom_histogram(bins=100) + + coord_flip() + + labs(x=expression(bold('Log'[10]*' Ratio')), y='Number of Peptides') + + } + + return(list( + type=type, + box_title=box_title, + help_text=help_text, + source_file=source_file, + validate_func=.validate, + plotdata_func=.plotdata, + plot_func=.plot + )) +} \ No newline at end of file diff --git a/modules/030_Peptide_Identifications/00_miscleavage.R b/modules/max_quant/030_Peptide_Identifications/00_miscleavage.R similarity index 100% rename from modules/030_Peptide_Identifications/00_miscleavage.R rename to modules/max_quant/030_Peptide_Identifications/00_miscleavage.R diff --git a/modules/030_Peptide_Identifications/05_miscleavage_table.R b/modules/max_quant/030_Peptide_Identifications/05_miscleavage_table.R similarity index 100% rename from modules/030_Peptide_Identifications/05_miscleavage_table.R rename to modules/max_quant/030_Peptide_Identifications/05_miscleavage_table.R diff --git a/modules/030_Peptide_Identifications/10_PEP_colored_cumulative_update.R b/modules/max_quant/030_Peptide_Identifications/10_PEP_colored_cumulative_update.R similarity index 100% rename from modules/030_Peptide_Identifications/10_PEP_colored_cumulative_update.R rename to modules/max_quant/030_Peptide_Identifications/10_PEP_colored_cumulative_update.R diff --git a/modules/030_Peptide_Identifications/25_PIF.R b/modules/max_quant/030_Peptide_Identifications/25_PIF.R similarity index 100% rename from modules/030_Peptide_Identifications/25_PIF.R rename to modules/max_quant/030_Peptide_Identifications/25_PIF.R diff --git a/modules/030_Peptide_Identifications/30_fractional_ID_rate.R b/modules/max_quant/030_Peptide_Identifications/30_fractional_ID_rate.R similarity index 100% rename from modules/030_Peptide_Identifications/30_fractional_ID_rate.R rename to modules/max_quant/030_Peptide_Identifications/30_fractional_ID_rate.R diff --git a/modules/030_Peptide_Identifications/30_fractional_ID_rate_intensitybins.R b/modules/max_quant/030_Peptide_Identifications/30_fractional_ID_rate_intensitybins.R similarity index 100% rename from modules/030_Peptide_Identifications/30_fractional_ID_rate_intensitybins.R rename to modules/max_quant/030_Peptide_Identifications/30_fractional_ID_rate_intensitybins.R diff --git a/modules/030_Peptide_Identifications/40_total_ID_rate.R b/modules/max_quant/030_Peptide_Identifications/40_total_ID_rate.R similarity index 100% rename from modules/030_Peptide_Identifications/40_total_ID_rate.R rename to modules/max_quant/030_Peptide_Identifications/40_total_ID_rate.R diff --git a/modules/030_Peptide_Identifications/dart_00_a_link2.R b/modules/max_quant/030_Peptide_Identifications/dart_00_a_link2.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_00_a_link2.R rename to modules/max_quant/030_Peptide_Identifications/dart_00_a_link2.R diff --git a/modules/030_Peptide_Identifications/dart_00_pep_scatter.R b/modules/max_quant/030_Peptide_Identifications/dart_00_pep_scatter.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_00_pep_scatter.R rename to modules/max_quant/030_Peptide_Identifications/dart_00_pep_scatter.R diff --git a/modules/030_Peptide_Identifications/dart_01_fdr_increase.R b/modules/max_quant/030_Peptide_Identifications/dart_01_fdr_increase.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_01_fdr_increase.R rename to modules/max_quant/030_Peptide_Identifications/dart_01_fdr_increase.R diff --git a/modules/030_Peptide_Identifications/dart_02_alignment.R b/modules/max_quant/030_Peptide_Identifications/dart_02_alignment.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_02_alignment.R rename to modules/max_quant/030_Peptide_Identifications/dart_02_alignment.R diff --git a/modules/030_Peptide_Identifications/dart_03_ids_per_exp.R b/modules/max_quant/030_Peptide_Identifications/dart_03_ids_per_exp.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_03_ids_per_exp.R rename to modules/max_quant/030_Peptide_Identifications/dart_03_ids_per_exp.R diff --git a/modules/030_Peptide_Identifications/dart_04_fractional_ID_rate.R b/modules/max_quant/030_Peptide_Identifications/dart_04_fractional_ID_rate.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_04_fractional_ID_rate.R rename to modules/max_quant/030_Peptide_Identifications/dart_04_fractional_ID_rate.R diff --git a/modules/030_Peptide_Identifications/dart_05_total_ID_rate.R b/modules/max_quant/030_Peptide_Identifications/dart_05_total_ID_rate.R similarity index 100% rename from modules/030_Peptide_Identifications/dart_05_total_ID_rate.R rename to modules/max_quant/030_Peptide_Identifications/dart_05_total_ID_rate.R diff --git a/modules/040_Contamination/contam_00_charge_count.R b/modules/max_quant/040_Contamination/contam_00_charge_count.R similarity index 100% rename from modules/040_Contamination/contam_00_charge_count.R rename to modules/max_quant/040_Contamination/contam_00_charge_count.R diff --git a/modules/040_Contamination/contam_01_charge_tic.R b/modules/max_quant/040_Contamination/contam_01_charge_tic.R similarity index 100% rename from modules/040_Contamination/contam_01_charge_tic.R rename to modules/max_quant/040_Contamination/contam_01_charge_tic.R diff --git a/modules/040_Contamination/contam_02_MS1_int_z_1.R b/modules/max_quant/040_Contamination/contam_02_MS1_int_z_1.R similarity index 100% rename from modules/040_Contamination/contam_02_MS1_int_z_1.R rename to modules/max_quant/040_Contamination/contam_02_MS1_int_z_1.R diff --git a/modules/040_Contamination/contam_03_mz_z_1.R b/modules/max_quant/040_Contamination/contam_03_mz_z_1.R similarity index 100% rename from modules/040_Contamination/contam_03_mz_z_1.R rename to modules/max_quant/040_Contamination/contam_03_mz_z_1.R diff --git a/modules/040_Contamination/contam_04_z1_int_by_RT.R b/modules/max_quant/040_Contamination/contam_04_z1_int_by_RT.R similarity index 100% rename from modules/040_Contamination/contam_04_z1_int_by_RT.R rename to modules/max_quant/040_Contamination/contam_04_z1_int_by_RT.R diff --git a/modules/040_Contamination/contam_04_z2_int_by_RT.R b/modules/max_quant/040_Contamination/contam_04_z2_int_by_RT.R similarity index 100% rename from modules/040_Contamination/contam_04_z2_int_by_RT.R rename to modules/max_quant/040_Contamination/contam_04_z2_int_by_RT.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_001_med_RI_heatmap.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_001_med_RI_heatmap.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_001_med_RI_heatmap.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_001_med_RI_heatmap.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_00_NAs.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_00_NAs.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_00_NAs.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_00_NAs.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_05_misscleavages_sc.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_05_misscleavages_sc.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_05_misscleavages_sc.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_05_misscleavages_sc.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_07_scRI_swarmplot.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_07_scRI_swarmplot.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_07_scRI_swarmplot.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_07_scRI_swarmplot.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_08_rRI_vs_RT.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_08_rRI_vs_RT.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_08_rRI_vs_RT.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_08_rRI_vs_RT.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_09_scRI_swarmplot_cors.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_09_scRI_swarmplot_cors.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_09_scRI_swarmplot_cors.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_09_scRI_swarmplot_cors.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_11_synthetic_peptides.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_11_synthetic_peptides.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_11_synthetic_peptides.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_11_synthetic_peptides.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_12_RI_contam.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_12_RI_contam.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_12_RI_contam.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_12_RI_contam.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_12_a_explain.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_12_a_explain.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_12_a_explain.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_12_a_explain.R diff --git a/modules/050_SCoPE-MS_Diagnostics/scope_13_cor_mat.R b/modules/max_quant/050_SCoPE-MS_Diagnostics/scope_13_cor_mat.R similarity index 100% rename from modules/050_SCoPE-MS_Diagnostics/scope_13_cor_mat.R rename to modules/max_quant/050_SCoPE-MS_Diagnostics/scope_13_cor_mat.R diff --git a/modules/080_Carrier/carrier_00.R b/modules/max_quant/080_Carrier/carrier_00.R similarity index 100% rename from modules/080_Carrier/carrier_00.R rename to modules/max_quant/080_Carrier/carrier_00.R diff --git a/modules/080_Carrier/carrier_01_label_efficiency.R b/modules/max_quant/080_Carrier/carrier_01_label_efficiency.R similarity index 86% rename from modules/080_Carrier/carrier_01_label_efficiency.R rename to modules/max_quant/080_Carrier/carrier_01_label_efficiency.R index 805aca6..24c6669 100644 --- a/modules/080_Carrier/carrier_01_label_efficiency.R +++ b/modules/max_quant/080_Carrier/carrier_01_label_efficiency.R @@ -10,10 +10,10 @@ init <- function() { validate(need(data()[['Labeling_Efficiency']], paste0('Upload evidence.txt in labeling efficiency input'))) # require TMT as a variable mod - validate(need(any(grepl('TMT', data()[['Labeling_Efficiency']]$Modifications)), - paste0('Loaded data was not searched with TMT as a variable modification'))) - validate(need(any(grepl('TMTPro_K_LE', colnames(data()[['Labeling_Efficiency']]))), - paste0('Loaded data was not searched with TMT as a variable modification'))) + #validate(need(any(grepl('TMT', data()[['Labeling_Efficiency']]$Modifications)), + #paste0('Loaded data was not searched with TMT as a variable modification'))) + #validate(need(any(grepl('TMTPro_K_LE', colnames(data()[['Labeling_Efficiency']]))), + # paste0('Loaded data was not searched with TMT as a variable modification'))) } .plotdata <- function(data, input) { @@ -26,8 +26,8 @@ init <- function() { plotdata$K_count <- stringr::str_count(plotdata$Sequence, "K") plotdata$Acetyl_count = stringr::str_count(plotdata$Modifications, "Acetyl (Protein N-term)") - n_nam<-c("TMT11plex_N_LE", "TMTPro_Nter_LE","Variable_TMTpro16plex.N-term") - k_nam<-c("TMT11plex_K_LE", "TMTPro_K_LE","Variable_TMTpro16plex.Lys") + n_nam<-c("TMT11plex_N_LE", "TMTPro_Nter_LE") + k_nam<-c("TMT11plex_K_LE", "TMTPro_K_LE") n_ind<-which(colnames(plotdata)%in%n_nam) k_ind<-which(colnames(plotdata)%in%k_nam) diff --git a/modules/080_Carrier/carrier_02_RelativeAbundance.R b/modules/max_quant/080_Carrier/carrier_02_RelativeAbundance.R similarity index 95% rename from modules/080_Carrier/carrier_02_RelativeAbundance.R rename to modules/max_quant/080_Carrier/carrier_02_RelativeAbundance.R index ecb2a71..621b6b6 100644 --- a/modules/080_Carrier/carrier_02_RelativeAbundance.R +++ b/modules/max_quant/080_Carrier/carrier_02_RelativeAbundance.R @@ -46,12 +46,9 @@ init <- function() { validate(need(nrow(plotdata) > 2, paste0('Less than 2 peptides with labeled and unlabeled precursors'))) ggplot(plotdata, aes(Log2IntRatio)) + - #geom_bar(stat='identity') + geom_histogram() + coord_flip() + facet_wrap(~Raw.file.x, nrow=1) + - #scale_x_discrete(labels=c('Unlabeled', 'Labeled')) + - #scale_fill_discrete(guide=F) + labs(x="Log2(Precursor Intensity Ratio)\n(Labeled / Unlabeled)", y='# Peptides') + theme_base(input=input) + theme(axis.text.x=element_text(angle=45, hjust=1, vjust=1)) diff --git a/modules/080_Carrier/carrier_03__miscleavage_table.R b/modules/max_quant/080_Carrier/carrier_03__miscleavage_table.R similarity index 100% rename from modules/080_Carrier/carrier_03__miscleavage_table.R rename to modules/max_quant/080_Carrier/carrier_03__miscleavage_table.R diff --git a/modules/080_Carrier/carrier_04_R_vs_K_missedcleavages.R b/modules/max_quant/080_Carrier/carrier_04_R_vs_K_missedcleavages.R similarity index 100% rename from modules/080_Carrier/carrier_04_R_vs_K_missedcleavages.R rename to modules/max_quant/080_Carrier/carrier_04_R_vs_K_missedcleavages.R diff --git a/out.html b/out.html new file mode 100644 index 0000000..f9da014 --- /dev/null +++ b/out.html @@ -0,0 +1,1899 @@ + + + + +
+ + + + + + + + +DIA-NN Experiments
+wGW032 F:/GW/raw_data/wGW032.raw
+wGW033 F:/GW/raw_data/wGW033.raw
+wGW034 F:/GW/raw_data/wGW034.raw
+wGW035 F:/GW/raw_data/wGW035.raw
+wGW036 F:/GW/raw_data/wGW036.raw
+wGW037 F:/GW/raw_data/wGW037.raw
+wGW038 F:/GW/raw_data/wGW038.raw
+Plotting the MS1 intensity for all precursors which were associated with one of the defined channels.
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+Precursor are plotted across the chromatographic gradient.
+ +Plotting the MS1 intensity for all precursors summed over all channels.
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+Plotting the MS1 intensity for all precursors summed over all channels. Only intersected precursors present in all loaded experiments are shown.
+ +Plotting the MS1 Intensity for intersected precursors summed over all channels. Experiments are normalized to the first experiment.
+ +Number of precursors observed during MS1 scans by charge state
+ +Ms1 fill times along gradient
+## [1] "Plot failed to render. Reason: Error: Upload fill_times.txt\n"
+The averge fill time is shown in magenta for different bins along the retention time gradient. The standard deviation is depicted as area in blue, scans outside this area are shown as single datapoints.
+## [1] "Plot failed to render. Reason: Error: Upload fill_times.txt\n"
+The toal Ion Current (TIC) is shown for bins along the retention time gradient.
+## [1] "Plot failed to render. Reason: Error: Upload tic.tsv\n"
+Ms2 fill times along gradient
+## [1] "Plot failed to render. Reason: Error: Upload fill_times.txt\n"
+The averge fill time is shown in magenta for different bins along the retention time gradient. The standard deviation is depicted as area in blue, scans outside this area are shown as single datapoints.
+## [1] "Plot failed to render. Reason: Error: Upload fill_times.txt\n"
+The average Ms2 fill times are shown across the gradient for every distinct Ms2 window.
+## [1] "Plot failed to render. Reason: Error: Upload tic.tsv\n"
+Plotting the MS1 counts based on the signal to noise ratio for all precursors which were associated with one of the defined channels.
+## [1] "Plot failed to render. Reason: Error: Upload report.txt\n"
+Plotting the number of precursors identified at each given confidence level.
+ +Number of precursors found based on quantification startegy. Ms2 precursors are counted based on Precursor.Quantity > 0 and Ms1 precursors are counted based on Ms1.Area > 0.
+ +Number of precursors found based on modification types specified
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+Miscleavage rate (percentage) for precursors identified with confidence PEP < 0.01
+Plotting frequency of lysine miscleavages in confidently identified precursors.
+ +Plotting frequency of arginine miscleavages in confidently identified precursors.
+ +Number of proteotypic protein IDs found per run. Protein IDs are shown across all channels in an experiment.
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+The number of precursors identified is shown for every channel together with the number of total and intersected precursors. The number of precursors is based on all precursors found in the report.tsv file which is by default controlled by a run-specific FDR.
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+The number of precursors identified is shown for every channel together with the number of total and intersected precursors. The number of precursors is based on all precursors with a Channel.Q.Value <= 0.01.
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+The number of Proteins identified is shown for every channel together with the number of total and intersected proteins. The number of proteins is based on all proteotypic precursors independent of the Protein.Q.Value.
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+Single-Cell intensity relative to the carrier channel for intersected precursors
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+Total number of precursors identified based on different confidence metrics.
+## [1] "Plot failed to render. Reason: Error: Upload ms1_extracted.tsv\n"
+Plotting the Jaccard Index for identified precursors for all channel combinations.
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+Plotting the Jaccard Index for identified precursors for all channel combinations.
+## [1] "Plot failed to render. Reason: Error in `[.data.frame`(data()[[\"report\"]], , c(\"Raw.file\", \"Ms1.Area\", : undefined columns selected\n"
+Identified features are reported based on the charge. Precursors quantified in seperate channels are treated as separate precursors..
+## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+The number of isotopes identified is shown for features detected in the Dinosaur search.
+## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+The number of MS1 scans is shown for all features identified in the Dinosaur search.
+## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+Plotting the retention length of identified features at the base.
+## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+The frequency of precursor identifications based on the Dinosaur search is plotted across the chromatographic gradient.
+## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+The frequency of precursor identifications based on the Dinosaur search is plotted across the mass to charge ratio.
+## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+The distribution of integrated intensities is shown for identified features.
+## [1] "Plot failed to render. Reason: Error: Please provide a features.tsv file\n"
+