diff --git a/README.md b/README.md
index 377eeea..55fb7a4 100644
--- a/README.md
+++ b/README.md
@@ -18,13 +18,10 @@ Assuming you have completed all pre-processing and normalization procedures, her
 1. **Your Data**:
     + Ensure your methylation data is loaded into R's Global Environment as a numeric matrix in *probe by sample* form: with probe IDs as your row names and sample IDs as the column names
     + Your phenotype / clinical data should be a data frame with a column called `Sample` (spelled exactly with an upper-case "S"); these sample IDs should match the column names of the methylation data
-2. **Pre-Defined Regions**: Load the list of pre-calculated regions of "contiguous" CpGs which matches your Illumina data type. We have pre-calculated some of these lists of regions. We used the `CloseBySingleRegion()` function with `maxGap = 200` (genomic locations within 200 base pairs are placed in the same cluster) and `minCpGs = 3` (we need at least 3 CpGs to retain the location). These data files are:
-    + Genic regions, 450k array: `extdata/450k_Gene_3_200.rds`. Load this via `system.file("extdata", "450k_Gene_3_200.rds", package = "coMethDMR", mustWork = TRUE)`
-    + Inter-genic regions, 450k array: `extdata/450k_InterGene_3_200.rds`.Load this via `system.file("extdata", "450k_InterGene_3_200.rds", package = "coMethDMR", mustWork = TRUE)`
-    + Genic regions, EPIC array: download the supplemental data file from
-    + Inter-genic regions, EPIC array: download the supplemental data file from
-    + Both Genic and Inter-genic regions, 450k array: downlaod the supplemental data file [hg19 data](https://github.com/TransBioInfoLab/coMethDMR_data/blob/main/data/450k_All_3_200_hg19.rds) or [hg38 data](https://github.com/TransBioInfoLab/coMethDMR_data/blob/main/data/450k_All_3_200_hg38.rds)
-    + Both Genic and Inter-genic regions, EPIC array: downlaod the supplemental data file [hg19 data](https://github.com/TransBioInfoLab/coMethDMR_data/blob/main/data/EPIC_10b4_All_3_200_hg19.rds) or [hg38 data](https://github.com/TransBioInfoLab/coMethDMR_data/blob/main/data/EPIC_10b4_All_3_200_hg38.rds)
+2. **Pre-Defined Regions**: Load the list of pre-calculated regions of "contiguous" CpGs which matches your Illumina data type. We have pre-calculated some of these lists of regions. We used the `CloseByAllRegions()` function with `maxGap = 200` (genomic locations within 200 base pairs are placed in the same cluster) and `minCpGs = 3` (we need at least 3 CpGs to retain the location). These data files are:
+    + Both Genic and Inter-genic regions, 450k array: download the supplemental data file [hg19 regions](https://github.com/TransBioInfoLab/coMethDMR_data/blob/main/data/450k_All_3_200_hg19.rds) or [hg38 regions](https://github.com/TransBioInfoLab/coMethDMR_data/blob/main/data/450k_All_3_200_hg38.rds)
+    + Both Genic and Inter-genic regions, EPIC array: download the supplemental data file [hg19 regions](https://github.com/TransBioInfoLab/coMethDMR_data/blob/main/data/EPIC_10b4_All_3_200_hg19.rds) or [hg38 regions](https://github.com/TransBioInfoLab/coMethDMR_data/blob/main/data/EPIC_10b4_All_3_200_hg38.rds)
+    + The code that generated the above files is here
 3. **Adjust Methylation for Covariates** with the `GetResiduals()` function; your methylation values may be confounded by clinical variables unrelated to your treatment, such as sex, age, or even [the square of age](https://www.nature.com/articles/s41598-021-88504-0)
     + *Input*: your methylation data and covariates from **Step 1**
     + *Output*: a matrix of methylation residuals in *probe by sample* form
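
The input requirements in Step 1 of this hunk can be checked before any package function is called. The following is a minimal base-R sketch, not part of coMethDMR: the object names `meth_mat` and `pheno_df`, the helper `check_inputs()`, and all toy values are hypothetical and exist only for illustration. It assumes that "match" means every column of the methylation matrix has a row in the phenotype data; whether identical ordering is also required is not stated in this hunk.

```r
# Toy inputs for illustration only; names, probe IDs, and values are made up.
meth_mat <- matrix(
  c(0.10, 0.20, 0.30,
    0.40, 0.50, 0.60),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("cg00000001", "cg00000002"),  # probe IDs as row names
    c("S1", "S2", "S3")             # sample IDs as column names
  )
)
pheno_df <- data.frame(
  Sample = c("S1", "S2", "S3"),     # column must be spelled "Sample"
  age = c(61, 70, 58),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns TRUE when the Step 1 requirements hold,
# otherwise stops with an error.
check_inputs <- function(meth, pheno) {
  stopifnot(
    is.matrix(meth), is.numeric(meth),
    !is.null(rownames(meth)), !is.null(colnames(meth)),
    is.data.frame(pheno),
    "Sample" %in% colnames(pheno)
  )
  # Assumption: every methylation column needs a matching phenotype row
  missing_ids <- setdiff(colnames(meth), as.character(pheno$Sample))
  if (length(missing_ids) > 0) {
    stop("Samples without phenotype rows: ",
         paste(missing_ids, collapse = ", "))
  }
  TRUE
}

inputs_ok <- check_inputs(meth_mat, pheno_df)
```

The region files linked in Step 2 are `.rds` files, so once one has been downloaded, calling `readRDS()` on its local path should load it into the session.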