Maf documentation #73
base: develop
docs/Data/File_Formats/MAF_Format.md
Outdated
|---|---|---|---|
|Annotated_somatic_mutation|Controlled|Annotated VCF|MAF produced from one caller at the aliquot level.|
|Aggregated_somatic_mutation|Controlled|Aggregation of VCFs into one MAF file (*.protected.maf.gz)|Aggregation of aliquot-level MAFs|
|Masked_somatic_mutation|Open\*|Filtered version of aggregated_somatic_mutation MAF (*.somatic.maf.gz)|Filtered aggregation of aliquot-level MAFs|
We don't use the protected or somatic naming anymore.
I like the figure, but the numbered list needs some work.
docs/Data/File_Formats/MAF_Format.md
Outdated
* __deleterious_low_confidence__: Less likely to have a phenotypic effect than 'deleterious'

### Aliquot-Level MAF Files (Data Release >17.0)
Aliquot-level MAF files (annotated somatic mutations) are produced for each aliquot per variant caller. These files are then run through the Aliquot Ensemble Somatic Variant Merging and Masking workflow. A few filters are applied at this step: the variants must be somatic, the variant size must be ≤ 50 bp, and each variant must pass the filters for its caller, except for MuSE which passes on filters for Tier 1-4 and the panel of normals. This workflow produces two files: the aggregated somatic mutation file and the masked somatic mutation file. The aggregated somatic mutation file is the aggregation of all variants from the multiple variant callers for each aliquot, with these filters applied. The masked somatic mutation file is the aggregation of all variants from the multiple variant callers for each aliquot, which are then passed through a second filtering process.
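The per-caller filters described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the workflow's actual implementation: the `VariantCall` record and its field names, the size calculation, the `LowQual` filter value, and the MuSE tag strings (`Tier1` to `Tier4`) are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical per-caller variant record; field names are illustrative,
# not actual MAF column names.
@dataclass(frozen=True)
class VariantCall:
    caller: str
    chrom: str
    pos: int
    ref: str
    alt: str
    is_somatic: bool
    filter_value: str  # semicolon-separated FILTER tags, "PASS" if clean


MAX_VARIANT_SIZE_BP = 50

# Assumption: MuSE calls tagged only with these values are treated as passing.
# The strings are hypothetical placeholders for "Tier 1-4" and the panel of normals.
MUSE_ALLOWED_TAGS = {"Tier1", "Tier2", "Tier3", "Tier4", "panel_of_normals"}


def variant_size(call: VariantCall) -> int:
    # Approximation (assumption): size is the longer of the REF and ALT alleles.
    return max(len(call.ref), len(call.alt))


def passes_caller_filters(call: VariantCall) -> bool:
    if call.filter_value == "PASS":
        return True
    if call.caller == "MuSE":
        # Every tag on the MuSE call must be one of the tolerated tags.
        tags = set(call.filter_value.split(";"))
        return tags <= MUSE_ALLOWED_TAGS
    return False


def keep_for_merging(call: VariantCall) -> bool:
    # The three criteria from the text: somatic, <= 50 bp, passes caller filters.
    return (
        call.is_somatic
        and variant_size(call) <= MAX_VARIANT_SIZE_BP
        and passes_caller_filters(call)
    )


calls = [
    VariantCall("MuTect2", "chr1", 100, "A", "T", True, "PASS"),
    VariantCall("MuTect2", "chr1", 200, "A", "A" * 60, True, "PASS"),  # too long
    VariantCall("MuSE", "chr1", 300, "C", "G", True, "Tier2"),
    VariantCall("VarScan2", "chr1", 400, "G", "C", True, "LowQual"),  # hypothetical failing tag
    VariantCall("MuSE", "chr1", 500, "T", "G", False, "PASS"),  # not somatic
]
kept = [c.pos for c in calls if keep_for_merging(c)]
print(kept)  # [100, 300]
```

The MuSE exception is modeled as a subset check so that a call carrying any tag outside the tolerated set is still rejected.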
It's really for each tumor-normal pair of aliquots
except for MuSE which passes on filters for Tier 1-4 and the panel of normals
This may confuse people, because this panel of normals is specific to MuTect2, which tags it in the VCF. We also have our own PoN filter that is applied to all MAFs, and that isn't the one we are referring to here.
These files are then run through the Aliquot Ensemble Somatic Variant Merging and Masking workflow.
Maybe we should be explicit: For each tumor-normal pair, the per-caller Aliquot-level MAF files are then run through the...
Edit: OK, I see you clarify this at the end of the paragraph.
Re: MuSE and the PoN @kmhernan
Which panel of normals is used for MuSE? Is it the MuTect2 PoN, the GDC filter PoN, or some mysterious third PoN?
* The annotated somatic mutation aliquot-level MAF files produced by the different callers are merged into one raw merged aliquot-level MAF file. Variants are then selected based on the following low-quality variant filtering and germline masking:
1. The variant must occur within at least two of the callers.
2. Remaining variants with __FILTER != panel_of_normals__ are __removed__. Note that the `FILTER != panel_of_normals` value is only relevant for the variants generated from the MuTect2 pipeline.
3. The __non-TCGA ExAC allele frequency__ variants (0.001; common\_in\_exac) are __kept__.
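Criterion 1 in the list above (the variant occurs in at least two callers) can be sketched as a support count over the merged calls. This minimal sketch assumes calls from different callers are matched on an exact `(chrom, pos, ref, alt)` key, which is a hypothetical simplification; the function name `merge_and_require_support` is illustrative, and criteria 2 and 3 are not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical key for matching the same variant across callers.
VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

MIN_SUPPORTING_CALLERS = 2


def merge_and_require_support(
    calls: Iterable[Tuple[str, VariantKey]],
    min_callers: int = MIN_SUPPORTING_CALLERS,
) -> Dict[VariantKey, Set[str]]:
    """Merge per-caller calls and keep variants seen by >= min_callers callers."""
    support: Dict[VariantKey, Set[str]] = defaultdict(set)
    for caller, key in calls:
        # A set of caller names, so repeated calls from one caller count once.
        support[key].add(caller)
    return {k: callers for k, callers in support.items() if len(callers) >= min_callers}


calls = [
    ("MuTect2", ("chr1", 100, "A", "T")),
    ("MuSE", ("chr1", 100, "A", "T")),
    ("VarScan2", ("chr1", 200, "G", "C")),
    ("MuTect2", ("chr1", 200, "G", "C")),
    ("MuTect2", ("chr1", 300, "C", "A")),
    ("MuTect2", ("chr1", 300, "C", "A")),  # same caller twice: still one caller
]
merged = merge_and_require_support(calls)
print(sorted(merged))  # [('chr1', 100, 'A', 'T'), ('chr1', 200, 'G', 'C')]
```

Counting distinct caller names rather than raw rows means a variant reported twice by the same caller does not count as support from two callers.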
I'm confused: we remove variants with a non-TCGA ExAC allele frequency greater than the cutoff; however, they can be rescued based on item number 4.
docs/Data/File_Formats/MAF_Format.md
Outdated
The process for converting the aliquot-level MAF files into a masked somatic mutation aliquot-level MAF is as follows:
1. The variant must occur within at least two of the callers.
Maybe: The variant must be supported by at least two of the callers
docs/Data/File_Formats/MAF_Format.md
Outdated
## MAF File Structure
The MAF file structure can be found in the following GitHub repository:
The MAF columns are defined in the following GitHub repository
Do you have a new link we can use for this? The one below cannot be accessed
No description provided.