diff --git a/index.html b/index.html index 5393408..5ac58e5 100644 --- a/index.html +++ b/index.html @@ -317,6 +317,7 @@

Welcome to the icaparser Python
  • Strip JSON files from variants that do not pass quality criteria
  • Load all mutations from a stripped Illumina Connected Annotations JSON file
  • Filter mutations based on annotations and positions
  • +
  • Aggregate mutations to gene level
  • Create annotated tables of filtered mutations
  • See the examples and the API documentation for diff --git a/search/search_index.json b/search/search_index.json index 816f13a..88f25a5 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to the icaparser Python package","text":"

    This Python package provides functions for parsing JSON files created by Illumina's Connected Annotations (ICA) pipeline. ICA annotates mutations with a set of tools and data sources. This package allows you to:

    See the examples and the API documentation for further details.

    "},{"location":"examples/","title":"Examples","text":""},{"location":"examples/#stripping-very-large-json-files","title":"Stripping very large JSON files","text":"

    Some JSON files from Illumina TSO panels (for example, TSO500) are not QC filtered and contain all detected genomic variants, irrespective of whether they pass the quality criteria. Such files can get very large, too large to be processed by JSON parsers that read the entire file into memory. If your JSON file does not contain only QC-filtered variants (\"PASS\"), it needs to be stripped (filtered) first before using the icaparser module for further processing.

    The code below can be run in Python in a terminal or in a Jupyter notebook. Terminal is recommended.

    import icaparser as icap\nicap.strip_json_files(source_dir='../Data/Original', target_dir='../Data/Derived')\n
    "},{"location":"examples/#simple-example","title":"Simple example","text":"

    The code below is the Hello World example for reading and filtering ICA JSON files with default filtering rules. For more sophisticated filtering options, see the API reference.

    import icaparser as icap\njson_files = icap.get_dna_json_files('../Data/Derived')\nfirst_file = json_files[0]\n# Get the annotation data sources\nicap.get_data_sources(first_file)\n# Get pipeline run metadata\nicap.get_pipeline_metadata(json_files)\n# Get a mutation table\nmut_table = icap.get_mutation_table_for_files(json_files)\n
    "},{"location":"installation/","title":"Installation instructions","text":""},{"location":"installation/#installation-of-the-icaparser-package","title":"Installation of the icaparser package","text":"

    It is recommended to create a new virtual environment with Python >= 3.9 and to install the icaparser package in that environment. Activate the environment and run:

    pip install \"git+https://github.com/Bayer-Group/ica-parser.git#subdirectory=icaparser\"\n

    If you want to install a particular development branch, use

    pip install \"git+https://github.com/Bayer-Group/ica-parser.git@BRANCHNAME#subdirectory=icaparser\"\n

    If you use Jupyter notebooks, the virtual environment should be added as a new Jupyter kernel. See Using Virtual Environments in Jupyter Notebook and Python (Parametric Thoughts) for instructions.

    "},{"location":"installation/#installation-of-ipywidgets","title":"Installation of ipywidgets","text":"

    Required for progress bars in Jupyter. Please refer to the Jupyter or JupyterLab documentation for how to install the widgets. For example:

    conda install jupyter # if not installed yet\nconda install jupyterlab_widgets\njupyter labextension install jupyter-matplotlib\njupyter lab build\nexit\n

    \u2192 Restart Jupyter

    "},{"location":"reference/","title":"API documentation","text":"

    Parser for JSON files from Illumina Connected Annotations pipeline.

    "},{"location":"reference/#icaparser.icaparser.add_gene_types","title":"add_gene_types(positions)","text":"

    Adds the gene type to each transcript.

    Transcripts will be annotated with the gene type (oncogene, tsg, mixed) by adding a new attribute geneType. Only transcripts with one of these three gene types get this additional annotation. Other transcripts will not get the geneType attribute.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> positions = icap.add_gene_types(positions)\n
    "},{"location":"reference/#icaparser.icaparser.apply_mutation_classification_rules","title":"apply_mutation_classification_rules(positions, rule_set=get_default_mutation_classification_rules(), gene_type_map=get_default_gene_type_map(), hide_progress=False)","text":"

    Applies mutation classification rules to all positions.

    Each variant is categorized for each transcript that overlaps with the genomic position of the variant. Each transcript that passes the \"mutated\" or \"uncertain\" mutation classification rules gets a new attribute mutation_status with the value \"mutated\" or \"uncertain\". The input list of positions is modified by adding the mutation_status attribute to transcripts, and the modified list of positions is returned as the first element of the returned tuple.

    In addition to modifying and returning the list of positions, this function also returns the assembled mutation status after aggregating the impact on all transcripts covering a variant. This is returned as the second item of the returned tuple. The impact depends on the type of gene (\"gof\" or \"lof\"), so the impacts are assembled separately for each gene type.

    The impact of a particular mutational variant can be different for different overlapping transcript variants of a gene, and the transcript variants can also belong to different genes. The strongest impact on any overlapping transcript of a gene is defined as the impact of that mutational variant on the gene. The analyst must decide which isoforms are used to classify genes. For example, only canonical transcripts may be considered. Alternatively, all transcripts or a subset of transcripts may be used. Therefore, it is necessary to first apply transcript-level filters to all genomic positions before this function is called for determining the mutation status of genes.

    The returned value is a multi-dimensional dictionary:

    sample_id \u2192 gene \u2192 gene_type \u2192 variant_id \u2192 mutationStatus
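
    As a rough sketch of the nested structure described above (not the package's implementation), the following builds such a dictionary from per-transcript records and keeps the strongest status per variant. The record layout, the helper name assemble_sample_muts, the variant identifiers, and the assumption that mutated outranks uncertain are all illustrative.

```python
# Assumption: "mutated" is a stronger impact than "uncertain".
STRENGTH = {"uncertain": 1, "mutated": 2}


def assemble_sample_muts(records):
    """Build sample_id -> gene -> gene_type -> variant_id -> status."""
    sample_muts = {}
    for sample_id, gene, gene_type, variant_id, status in records:
        variants = (
            sample_muts.setdefault(sample_id, {})
            .setdefault(gene, {})
            .setdefault(gene_type, {})
        )
        current = variants.get(variant_id)
        # Keep the strongest status seen on any transcript of this variant.
        if current is None or STRENGTH[status] > STRENGTH[current]:
            variants[variant_id] = status
    return sample_muts


# Hypothetical per-transcript records: two transcripts of the same TP53 variant.
records = [
    ("S1", "TP53", "lof", "17-7674220-C-T", "uncertain"),
    ("S1", "TP53", "lof", "17-7674220-C-T", "mutated"),
    ("S1", "KRAS", "gof", "12-25245350-C-T", "mutated"),
]
sample_muts = assemble_sample_muts(records)
```

    Looking up sample_muts['S1']['TP53']['lof'] then gives one status per variant, which is the shape an aggregation step can count over.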

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> positions, sample_muts = icap.apply_mutation_classification_rules(positions)\n
    "},{"location":"reference/#icaparser.icaparser.cleanup_cosmic","title":"cleanup_cosmic(positions)","text":"

    Remove Cosmic entries with alleles not matching the variant alleles.

    ICA attaches Cosmic entries to variants based on position only, which leads to wrong assignments of Cosmic entries to variants. This function removes all Cosmic entries from a variant for which reference and altered alleles do not match those of the variant.

    Filtering is done in place.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.common_variant_filter","title":"common_variant_filter(variant, max_af=0.001)","text":"

    Get a variant filter based on gnomAD, gnomAD Exome, and 1000 Genomes.

    Returns True if none of the maximum allele frequencies from gnomAD, gnomAD Exome, and 1000 Genomes is greater than max_af. The default value of 0.1 % for the maximum allele frequency corresponds to that of the AACR GENIE project.
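
    The decision rule can be sketched on its own, assuming the three maximum allele frequencies have already been extracted into a plain dictionary. The helper name is_not_common and the dictionary keys are hypothetical, and how the package treats missing frequencies is not covered here.

```python
def is_not_common(max_afs, max_af=0.001):
    """Return True if no source reports an allele frequency above max_af."""
    return all(af <= max_af for af in max_afs.values())


# Hypothetical, already extracted maximum allele frequencies per source.
rare = {"gnomad": 0.0002, "gnomad_exome": 0.0001, "onekg": 0.0}
common = {"gnomad": 0.0002, "gnomad_exome": 0.02, "onekg": 0.0}

rare_passes = is_not_common(rare)
common_passes = is_not_common(common)
```

    A single source above the threshold is enough to reject the variant, which is why the sketch uses all() rather than an average.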

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.explode_consequence","title":"explode_consequence(mutation_table, inplace=False)","text":"

    Explode the VEP consequence column of a mutation table.

    Exploding the VEP consequence column with the standard Pandas explode() function would return consequences as strings, not as ordered categories. This function will instead return a consequence column which is an ordered category. The categories are ordered by their impact.

    Exploding means that if a row of the input table has multiple consequences in the consequence column, the list of consequences will be split into single consequences and the output table will have multiple rows with a single consequence per row.
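
    To illustrate what exploding and impact ordering mean, here is a small pandas-free sketch on a list of dictionaries. It is not how the package works internally; the column name consequence, the helper explode_rows, and the three-term impact order are assumptions chosen for the example.

```python
# Hypothetical impact order, strongest first; an illustrative subset only.
IMPACT_ORDER = ["stop_gained", "missense_variant", "synonymous_variant"]


def explode_rows(rows, column="consequence"):
    """Turn each row with a list of consequences into one row per consequence."""
    exploded = []
    for row in rows:
        for consequence in row[column]:
            exploded.append({**row, column: consequence})
    return exploded


rows = [
    {"gene": "TP53", "consequence": ["synonymous_variant", "stop_gained"]},
    {"gene": "KRAS", "consequence": ["missense_variant"]},
]
exploded = explode_rows(rows)

# Sorting by position in IMPACT_ORDER mimics what an ordered category allows.
by_impact = sorted(exploded, key=lambda r: IMPACT_ORDER.index(r["consequence"]))
```

    With plain strings, sorting would be alphabetical; an ordered category lets sorting and comparisons follow impact instead.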

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.explode_consequence(mutation_table, inplace=True)\n
    >>> mutation_table_exploded = icap.explode_consequence(mutation_table)\n
    "},{"location":"reference/#icaparser.icaparser.filter_positions_by_transcripts","title":"filter_positions_by_transcripts(positions, filter_func)","text":"

    Filter positions based on a filter function for transcripts.

    Apply a filter function to all transcripts of each position. Transcripts not passing the filter are removed from the variants of a position. Variants without any transcript left are removed from a position. Positions without any variants left are removed from the returned list of positions.
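
    The three pruning levels can be sketched as follows, assuming each position holds a variants list and each variant holds a transcripts list. The function name is hypothetical, and unlike the package function this sketch may differ in whether the input is modified: it builds new dictionaries and leaves the input untouched.

```python
def filter_positions_by_transcripts_sketch(positions, filter_func):
    """Drop failing transcripts, then empty variants, then empty positions."""
    kept_positions = []
    for position in positions:
        kept_variants = []
        for variant in position.get("variants", []):
            transcripts = [
                t for t in variant.get("transcripts", []) if filter_func(t)
            ]
            if transcripts:  # variants without transcripts are dropped
                kept_variants.append({**variant, "transcripts": transcripts})
        if kept_variants:  # positions without variants are dropped
            kept_positions.append({**position, "variants": kept_variants})
    return kept_positions


# Toy positions; real ICA positions carry many more attributes.
positions = [
    {"position": 1, "variants": [
        {"vid": "a", "transcripts": [{"isCanonical": True}, {"isCanonical": False}]},
    ]},
    {"position": 2, "variants": [
        {"vid": "b", "transcripts": [{"isCanonical": False}]},
    ]},
]
canonical = filter_positions_by_transcripts_sketch(
    positions, lambda t: t.get("isCanonical", False)
)
```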

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> is_canonical_transcript = lambda x: x.get('isCanonical', False)\n>>> canonical_positions = icap.filter_positions_by_transcripts(\n        non_common_positions,\n        is_canonical_transcript\n    )\n
    "},{"location":"reference/#icaparser.icaparser.filter_positions_by_variants","title":"filter_positions_by_variants(positions, filter_func)","text":"

    Filter positions based on a filter function for variants.

    Apply a filter function to all variants of each position. Variants not passing the filter are removed from a position. Positions without any variants passing the filter are removed from the returned list.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> max_af = 0.001\n>>> is_not_common_variant = lambda x: icap.common_variant_filter(x, max_af)\n>>> non_common_positions = icap.filter_positions_by_variants(\n        positions,\n        is_not_common_variant\n    )\n
    "},{"location":"reference/#icaparser.icaparser.filter_variants_by_transcripts","title":"filter_variants_by_transcripts(variants, filter_func)","text":"

    Filter variants based on a filter function for transcripts.

    Apply a filter function to all transcripts of each variant. Transcripts not passing the filter are removed from a variant. Variants without any transcripts passing the filter are removed from the returned list.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_aggregated_mutation_table","title":"get_aggregated_mutation_table(positions, sample_muts=None, mutation_classification_rules=get_default_mutation_classification_rules(), mutation_aggregation_rules=get_default_mutation_aggregation_rules(), gene_type_map=get_default_gene_type_map(), hide_progress=False)","text":"

    Returns a sample-gene-mutationStatus table.

    This function applies mutation classification rules to all mutational variants and aggregates the mutations according to the aggregation rules. This results in a table with one row for each sample-gene pair. The table contains several columns with impacts according to lof and gof rules on allele level and gene level and with one additional column with the maximum impact for both allele and gene level.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_biotype_priority","title":"get_biotype_priority(biotype)","text":"

    Get the numeric priority of a biotype.

    The numeric priority of a biotype that is returned by this function is the same as defined by vcf2maf.pl by MSKCC. Biotypes are 'protein_coding', 'LRG_gene', 'miRNA', ...

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_clinvar","title":"get_clinvar(variant)","text":"

    Get a table of all ClinVar annotations for a variant.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_clinvar_max_significance","title":"get_clinvar_max_significance(variant, ordered_significances=_CLINVAR_ORDERED_SIGNIFICANCES)","text":"

    Get the maximum significance for all ClinVar annotations of a variant.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_consequences","title":"get_consequences(transcript)","text":"

    Get a list of consequences for a transcript.

    A list of consequences of a variant for a transcript is returned. If any of the annotated consequences is a combination of single consequences, separated by ampersands (&) or commas, the consequence is split into single consequences.
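
    The splitting step can be sketched with a regular expression, assuming the annotated consequences are already available as a list of strings. The helper name split_consequences is hypothetical and only mirrors the behavior described above.

```python
import re


def split_consequences(consequences):
    """Split combined consequence terms on '&' or ',' into single terms."""
    single = []
    for term in consequences:
        single.extend(part for part in re.split(r"[&,]", term) if part)
    return single


combined = ["missense_variant&splice_region_variant", "intron_variant"]
single_terms = split_consequences(combined)
```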

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_cosmic_max_sample_count","title":"get_cosmic_max_sample_count(variant, only_allele_specific=True)","text":"

    Get the maximum sample count for all Cosmic annotations of a variant.

    A variant can have no, one or multiple associated Cosmic identifiers. This function returns the maximum sample count of all Cosmic identifiers. For each Cosmic identifier, sample numbers are summed up across all indications. Returns 0 if no Cosmic identifier exists for this variant.

    The 'only_allele_specific' argument is used to exclude Cosmic entries that annotate the same chromosomal location but an allele that is different from the allele of the annotated variant. ICA annotates a variant with all Cosmic entries for that chromosomal location, irrespective of alleles. When counting Cosmic samples, this leads to an overestimation of Cosmic sample counts for a particular variant. Therefore, 'only_allele_specific' is True by default to count only samples from Cosmic entries with matching alleles. Occasionally, it may be desired, though, to count all samples with mutations at a given position, irrespective of allele. For example, several different alleles at a functional site of a gene can lead to function-disrupting mutations, so we want to get the maximum sample count for any allele at that position. One might also think of adding the sample counts for all Cosmic entries annotating a variant, but this does not work due to redundancy of Cosmic entries. Older Cosmic versions often included the same sample in different Cosmic entries. And newer Cosmic versions often have multiple entries for an allele, one for each transcript variant, with the same underlying samples.
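
    The reasoning above (take the maximum rather than the sum, and optionally require matching alleles) can be sketched on simplified entries. The entry keys ref, alt and sample_count and the helper name max_sample_count are hypothetical and do not reflect the actual layout of Cosmic annotations in ICA JSON files.

```python
def max_sample_count(ref, alt, cosmic_entries, only_allele_specific=True):
    """Return the largest sample count among the (optionally matching) entries."""
    counts = [
        entry["sample_count"]
        for entry in cosmic_entries
        if not only_allele_specific
        or (entry["ref"] == ref and entry["alt"] == alt)
    ]
    # max() instead of sum(): redundant entries may share the same samples.
    return max(counts, default=0)


# Two redundant C>T entries and one entry for a different allele.
entries = [
    {"ref": "C", "alt": "T", "sample_count": 12},
    {"ref": "C", "alt": "T", "sample_count": 12},
    {"ref": "C", "alt": "A", "sample_count": 40},
]
allele_specific = max_sample_count("C", "T", entries)
any_allele = max_sample_count("C", "T", entries, only_allele_specific=False)
```

    Summing the two redundant C>T entries would report 24 samples although only 12 may exist, which is the overcounting the maximum avoids.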

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_data_sources","title":"get_data_sources(file)","text":"

    Extract a table with annotation data sources from the JSON header.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_default_gene_type_map","title":"get_default_gene_type_map()","text":"

    Returns the default gene type map.

    The canonical gene types are gof, lof, and the union of both. Genes that need to be activated to drive a tumor are of type gof. Genes that need to be deactivated to drive a tumor are of type lof. Genes that need to be activated or deactivated depending on the context are of the union of both types. Genes for which it is unknown if they need to be activated or deactivated are also annotated with both types. Genes can be originally annotated with other type names than the canonical ones. The gene type map is used to map these other gene type names to the canonical gene types.

    The default map is:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_default_gene_type_map()\n
    "},{"location":"reference/#icaparser.icaparser.get_default_mutation_aggregation_rules","title":"get_default_mutation_aggregation_rules()","text":"

    Returns the default mutation aggregation rules.

    Two types of the mutation status of a gene are defined - allele level and gene level:

    For gain of function (gof) genes, the classifications at both the allele and gene levels are identical unless there is supplementary information about activating modifications beyond mutations. In contrast, for loss of function (lof) genes, classifications at the allele and gene levels may diverge. For instance, a truncating mutation in a tumor suppressor gene typically disrupts the function of the affected allele. However, other alleles of the same gene may remain functionally active, meaning the gene as a whole can still be operational, unless the mutated allele is a dominant negative variant. For a gene to be considered completely dysfunctional, all its alleles must be impaired, either through additional mutations or other mechanisms such as copy number deletions or hypermethylation. Consequently, a single variant that disrupts function at the allele level does not necessarily imply disruption at the gene level.

    For loss of function (lof) genes, the available information often falls short of allowing a reliable estimation of functional effects. As a result, heuristic rules must be employed, and the analyst is tasked with deciding whether to utilize allele-level or gene-level classifications. A lof gene is classified as functionally disrupted at gene level (strong impact) if it harbors at least two mutations, each either of strong impact or of uncertain impact. Should a lof gene possess only one such mutation, it is classified as having an uncertain impact at the gene level, regardless of whether the mutation exhibits a strong impact at the allele level. By differentiating the effects at both the allele and gene levels, we maintain the flexibility to determine in subsequent analyses how to consolidate these categories for further statistical evaluations.

    The function returns a dictionary containing two keys: gof and lof. Associated with each key is a function that accepts a dictionary of counts as its input and outputs a tuple comprising two elements: the mutation status at the allele level and at the gene level. The input dictionary of counts is expected to have two keys, mutated and uncertain. The value for each key represents the number of variants within a gene classified as mutated or uncertain, respectively.
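
    A minimal sketch of such a rule dictionary, following the heuristics described above, could look like this. It is not the package's default implementation: the helper names are hypothetical, and returning None when a gene has no counted variants is an assumption, since the label used for unaffected genes is not stated here.

```python
def allele_level_status(counts):
    """Strongest status of any single variant in the gene (None if no variants)."""
    if counts.get("mutated", 0) > 0:
        return "mutated"
    if counts.get("uncertain", 0) > 0:
        return "uncertain"
    return None  # assumption: no label for genes without counted variants


def gof_rule(counts):
    # For gof genes, allele-level and gene-level status are identical.
    status = allele_level_status(counts)
    return status, status


def lof_rule(counts):
    # For lof genes, at least two mutated/uncertain variants are needed
    # to call the gene disrupted; a single one is only "uncertain".
    n_hits = counts.get("mutated", 0) + counts.get("uncertain", 0)
    if n_hits >= 2:
        gene_status = "mutated"
    elif n_hits == 1:
        gene_status = "uncertain"
    else:
        gene_status = None
    return allele_level_status(counts), gene_status


aggregation_rules = {"gof": gof_rule, "lof": lof_rule}

single_truncation = aggregation_rules["lof"]({"mutated": 1, "uncertain": 0})
two_hits = aggregation_rules["lof"]({"mutated": 1, "uncertain": 1})
```

    The single truncating mutation is strong at allele level but only uncertain at gene level, which is the divergence the text describes for lof genes.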

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_default_mutation_aggregation_rules()\n
    "},{"location":"reference/#icaparser.icaparser.get_default_mutation_classification_rules","title":"get_default_mutation_classification_rules(cosmic_threshold=10)","text":"

    Returns the default rules for classifying mutations.

    Defines the default rules for classifying mutations. The returned dictionary has keys \"gof\" and \"lof\", and the respective values are the rule sets for these gene types. Each rule set is a dictionary with the keys \"mutated\" and \"uncertain\". The values for \"mutated\" or \"uncertain\" are dictionaries with three filter functions, a \"position_filter\", a \"variant_filter\", and a \"transcript_filter\". For example, a transcript will be called \"mutated\" if all three filters for \"mutated\" return True, and it will be called \"uncertain\", if all three filter functions for \"uncertain\" return True.

    These are the default rules returned by this function:

    GOF

    mutated: non-deleterious hotspot mutations.

    uncertain: non-deleterious mutations that aren't hotspots.

    LOF

    mutated: deleterious mutations (such as truncations, start or stop codon loss).

    uncertain: amino acid sequence modifying mutations that are not most likely deleterious. This includes missense mutations and in-frame insertions and deletions.
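
    How a rule set with three filters per status might be evaluated is sketched below. This is an illustration under assumptions, not the package's code: the classify helper, the toy rule set, the is_hotspot attribute, the filter signatures (each filter receives its own object), and checking mutated before uncertain are all hypothetical.

```python
def classify(rule_set, position, variant, transcript):
    """Return the first status whose three filters all return True."""
    for status in ("mutated", "uncertain"):
        filters = rule_set[status]
        if (
            filters["position_filter"](position)
            and filters["variant_filter"](variant)
            and filters["transcript_filter"](transcript)
        ):
            return status
    return None


def always(obj):
    return True


# Toy gof-like rule set: hotspots are "mutated", everything else "uncertain".
toy_gof_rules = {
    "mutated": {
        "position_filter": always,
        "variant_filter": lambda v: v.get("is_hotspot", False),
        "transcript_filter": always,
    },
    "uncertain": {
        "position_filter": always,
        "variant_filter": lambda v: not v.get("is_hotspot", False),
        "transcript_filter": always,
    },
}

hotspot_status = classify(toy_gof_rules, {}, {"is_hotspot": True}, {})
other_status = classify(toy_gof_rules, {}, {"is_hotspot": False}, {})
```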

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_default_mutation_classification_rules()\n>>> icap.get_default_mutation_classification_rules(cosmic_threshold=20)\n
    "},{"location":"reference/#icaparser.icaparser.get_dna_json_files","title":"get_dna_json_files(base_dir, pattern='*MergedVariants_Annotated_filtered.json.gz')","text":"

    Find DNA annotation JSON files in or below base_dir.

    Searches for ICA DNA annotation JSON files in and below base_dir. All file names matching pattern are returned.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_gene_type","title":"get_gene_type(gene_symbol)","text":"

    Get the gene type (oncogene, tsg, mixed) for a gene.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_genes","title":"get_genes(file)","text":"

    Extract gene annotation from an ICA JSON file.

    The genes section of ICA JSON files is optional. If this section is not included in the file, an empty list is returned.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_gnomad_exome_max_af","title":"get_gnomad_exome_max_af(variant, cohorts=['afr', 'amr', 'eas', 'nfe', 'sas'])","text":"

    Get the maximum allele frequency for gnomAD Exome.

    Get the maximum allele frequency across all major cohorts annotated by gnomAD Exome, excluding bottleneck populations (Ashkenazi Jews and Finnish) and other.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_gnomad_max_af","title":"get_gnomad_max_af(variant, cohorts=['afr', 'amr', 'eas', 'nfe', 'sas'])","text":"

    Get the maximum allele frequency for gnomAD.

    Get the maximum allele frequency across all major cohorts annotated by gnomAD, excluding bottleneck populations (Ashkenazi Jews and Finnish) and other.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_header","title":"get_header(file)","text":"

    Extract the header element from an ICA JSON file.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_header_scalars","title":"get_header_scalars(file)","text":"

    Extract a table with all scalar attributes from the JSON header.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_max_af","title":"get_max_af(variant, source, cohorts=None)","text":"

    Get the maximum allele frequency for a particular annotation source.

    Get the maximum allele frequency across all cohorts annotated by the annotation source.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_max_af(variant, 'gnomad')\n
    "},{"location":"reference/#icaparser.icaparser.get_multi_sample_positions","title":"get_multi_sample_positions(files, *args, **kwargs)","text":"

    Extract all positions for a set of ICA JSON files.

    The sample id is stored as an additional new attribute of the samples element of a position. The samples element is a list, although ICA usually only creates single sample JSON files.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> positions = icap.get_multi_sample_positions(json_files)\n>>> print(positions[0]['samples'][0]['sampleId'])\n
    "},{"location":"reference/#icaparser.icaparser.get_mutation_table_for_files","title":"get_mutation_table_for_files(json_files, max_af=0.001, min_vep_consequence_priority=6, min_cosmic_sample_count=0, only_canonical=False, extra_variant_filters=[], extra_transcript_filters=[])","text":"

    Get an annotated table of all filtered transcripts from a list of ICA JSON files.

    Load all positions from a list of ICA JSON files and filter them. Positions having any remaining variants and transcripts passing the filter are returned as an annotated table.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> extra_transcript_filters = [\n        lambda x: x.get('source', '') == 'Ensembl',\n        lambda x: x.get('hgnc', '') == 'KRAS'\n    ]\n>>> mut_table = icap.get_mutation_table_for_files(\n        json_files,\n        extra_transcript_filters=extra_transcript_filters\n    )\n
    "},{"location":"reference/#icaparser.icaparser.get_mutation_table_for_position","title":"get_mutation_table_for_position(position)","text":"

    Get an annotated table of all transcripts for a single position.

    Returns an annotated table of all transcripts that are affected by a mutation at a position.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_mutation_table_for_positions","title":"get_mutation_table_for_positions(positions, hide_progress=False)","text":"

    Get an annotated table of all transcripts for all positions.

    Returns an annotated table of all transcripts that are affected by a mutation at any of the positions.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_onekg_max_af","title":"get_onekg_max_af(variant)","text":"

    Get the maximum allele frequency for the 1000 Genomes Project.

    Get the maximum allele frequency across all cohorts annotated by the 1000 Genomes Project.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_pipeline_metadata","title":"get_pipeline_metadata(files)","text":"

    Extract a table with metadata of the annotation pipeline run from the JSON header.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_position_by_coordinates","title":"get_position_by_coordinates(positions, chromosome, position)","text":"

    Extract a particular position from a position list.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_position_by_coordinates(positions, 'chr1', 204399064)\n
    "},{"location":"reference/#icaparser.icaparser.get_positions","title":"get_positions(file, variant_filters=[], transcript_filters=[])","text":"

    Extract all positions from an ICA JSON file.

    The sample id is stored as an additional new attribute of the samples element of a position. The samples element is a list, although ICA usually only creates single sample JSON files.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> transcript_filters = [\n        lambda x: x.get('source', '') == 'Ensembl',\n        lambda x: x.get('hgnc', '') == 'KRAS'\n    ]\n>>> positions = icap.get_positions(\n        json_file,\n        transcript_filters=transcript_filters\n    )\n>>> print(positions[0]['samples'][0]['sampleId'])\n
    "},{"location":"reference/#icaparser.icaparser.get_sample","title":"get_sample(file, suffix='(-D[^.]*)?\\\\.bam')","text":"

    Extract the sample name from an ICA JSON file.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_strongest_vep_consequence_name","title":"get_strongest_vep_consequence_name(transcript)","text":"

    Get the name of the strongest VEP consequence for a transcript.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_strongest_vep_consequence_priority","title":"get_strongest_vep_consequence_priority(transcript)","text":"

    Get the strongest priority of VEP consequence for a transcript.

    Get the strongest numeric priority of all VEP consequences for a transcript. Smaller numeric priorities mean stronger impact.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_strongest_vep_consequence_rank","title":"get_strongest_vep_consequence_rank(transcript)","text":"

    Get the strongest rank of VEP consequences for a transcript.

    Get the strongest numeric rank of all VEP consequences for a transcript. Smaller ranks mean stronger impact.

    The priority of consequences is taken into account first. So if two consequences have different priorities, the consequence with the higher priority (lower priority number) will be used, and the rank for this consequence will be returned. If there are multiple consequences with the same priority, the lowest (strongest) rank will be returned.

    For clarification: ranks are unique, i.e. all VEP consequences ordered as listed on the VEP documentation page get the row number of this table assigned as rank.

    However, several consequences can have the same priority (e.g., stop gained and frameshift have the same priority). Priorities are copied from vcf2maf.pl of MSKCC.
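
    The selection logic (priority first, rank as tie-breaker) can be sketched with a toy lookup table. The consequence names and all numbers below are placeholders invented for illustration; they are not the real VEP ranks or vcf2maf.pl priorities, and strongest_rank is a hypothetical helper.

```python
# Placeholder table: ranks are unique, priorities may be shared.
TOY_TABLE = {
    "consequence_a": {"rank": 3, "priority": 2},
    "consequence_b": {"rank": 4, "priority": 1},
    "consequence_c": {"rank": 5, "priority": 1},
    "consequence_d": {"rank": 9, "priority": 3},
}


def strongest_rank(consequences, table=TOY_TABLE):
    """Pick the lowest priority number first, then the lowest rank."""
    best = min(
        consequences,
        key=lambda c: (table[c]["priority"], table[c]["rank"]),
    )
    return table[best]["rank"]


# consequence_a has the lowest rank, but consequence_b wins on priority.
mixed = strongest_rank(["consequence_a", "consequence_c", "consequence_b"])
# Same priority for both: the lower rank is returned.
tied = strongest_rank(["consequence_c", "consequence_b"])
```

    Sorting on the tuple (priority, rank) expresses both steps in one comparison, so no separate tie-breaking branch is needed.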

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_vep_consequence_for_rank","title":"get_vep_consequence_for_rank(rank)","text":"

    Get the VEP consequence term of a numeric rank.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_vep_priority_for_consequence","title":"get_vep_priority_for_consequence(consequence)","text":"

    Get the numeric priority of a VEP consequence term.

    The numeric priority of a consequence that is returned by this function is the same as defined by vcf2maf.pl of MSKCC.

    Parameters:

    Returns: the priority of the consequence; smaller values mean higher priority.

    "},{"location":"reference/#icaparser.icaparser.get_vep_rank_for_consequence","title":"get_vep_rank_for_consequence(consequence)","text":"

    Get the numeric rank of a VEP consequence term.

    The numeric rank of a consequence is the position of the consequence in this list of consequences for the Variant Effect Predictor VEP.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.split_multi_sample_json_file","title":"split_multi_sample_json_file(json_file, output_dir)","text":"

    Splits a multi-sample JSON file into sample specific JSON files.

    This function reads a multi-sample JSON file that was generated by annotating a multi-sample VCF file with ICA and splits it into sample-specific JSON files.

    Annotating a large number of single-sample VCF files with ICA is very time-consuming because ICA reads all annotation sources for each VCF file, and this dominates the runtime. It is therefore helpful to first merge many single-sample VCF files into one or a small number of multi-sample VCF files (for example, with bcftools merge), annotate the multi-sample VCF file with ICA, and then split the multi-sample JSON output of ICA into single-sample JSON files. These single-sample JSON files are required for the rest of this package.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.strip_json_file","title":"strip_json_file(ifname, ofname)","text":"

    Reduce the JSON file size by keeping only 'PASS' variants.

    JSON files from Illumina's ICA pipeline can be very large because they contain any deviation from the reference genome, irrespective of the quality of the mutation call. Gzip compressed JSON files with sizes in the gigabyte range cannot be processed by JSON packages that read the entire file into memory. It is necessary to first reduce the size of JSON files by removing all variants that do not meet Illumina's quality criteria.

    This function reads a single JSON file and creates a single JSON output file by removing all variants that do not pass Illumina's quality criteria.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.strip_json_files","title":"strip_json_files(source_dir, target_dir, pattern='*.json.gz')","text":"

    Strip all JSON files of a project by keeping only 'PASS' variants.

    JSON files from Illumina's ICA pipeline can be very large because they contain any deviation from the reference genome, irrespective of the quality of the mutation call. Gzip compressed JSON files with sizes in the gigabyte range cannot be processed by JSON packages that read the entire file into memory. It is necessary to first reduce the size of JSON files by removing all variants that do not meet Illumina's quality criteria.

    This function searches source_dir recursively for all files matching pattern. Each of those files is processed and a stripped version keeping only variants that PASS Illumina's quality criteria is created. The directory structure below source_dir is replicated in target_dir. Each output file is named after its input file and gets the suffix '_filtered.json.gz'.
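
    How an input path might map to an output path when the directory structure is replicated is sketched below. The helper target_path is hypothetical, and the naming rule (replacing a trailing .json.gz with _filtered.json.gz) is an assumption; the package may derive output names differently.

```python
from pathlib import PurePosixPath


def target_path(source_file, source_dir, target_dir, suffix="_filtered.json.gz"):
    """Map a file below source_dir to the mirrored location below target_dir."""
    relative = PurePosixPath(source_file).relative_to(source_dir)
    name = relative.name
    # Assumption: the original '.json.gz' ending is replaced by the suffix.
    if name.endswith(".json.gz"):
        name = name[: -len(".json.gz")]
    return PurePosixPath(target_dir) / relative.parent / (name + suffix)


out = target_path(
    "../Data/Original/run1/sample1.json.gz",
    "../Data/Original",
    "../Data/Derived",
)
```

    The sketch uses pure paths, so it only computes names and never touches the file system.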

    Parameters:

    Returns:

    Examples:

    >>> strip_json_files('../Data/Original', '../Data/Derived')\n
    "}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to the icaparser Python package","text":"

    This Python package provides functions for parsing JSON files created by Illumina's Connected Annotations (ICA) pipeline. ICA annotates mutations with a set of tools and data sources. This package allows you to:

    See the examples and the API documentation for further details.

    "},{"location":"examples/","title":"Examples","text":""},{"location":"examples/#stripping-very-large-json-files","title":"Stripping very large JSON files","text":"

    Some JSON files from Illumina TSO panels (for example, TSO500) are not QC filtered and contain all detected genomic variants, irrespective of whether they pass the quality criteria. Such files can get very large, too large to be processed by any JSON parser. If your JSON file does not contain only QC-filtered variants (\"PASS\"), it needs to be stripped (filtered) first before using the icaparser module for further processing.

    The code below can be run in Python in a terminal or in a Jupyter notebook. A terminal is recommended.

    import icaparser as icap\nicap.strip_json_files(source_dir='../Data/Original', target_dir='../Data/Derived')\n
    "},{"location":"examples/#simple-example","title":"Simple example","text":"

    The code below is the Hello World example for reading and filtering ICA JSON files with default filtering rules. For more sophisticated filtering options, see the API reference.

    import icaparser as icap\njson_files = icap.get_dna_json_files('../Data/Derived')\nfirst_file = json_files[0]\n# Get the annotation data sources\nicap.get_data_sources(first_file)\n# Get pipeline run metadata\nicap.get_pipeline_metadata(json_files)\n# Get a mutation table\nmut_table = icap.get_mutation_table_for_files(json_files)\n
    "},{"location":"installation/","title":"Installation instructions","text":""},{"location":"installation/#installation-of-the-icaparser-package","title":"Installation of the icaparser package","text":"

    It is recommended to create a new virtual environment with Python >= 3.9 and to install the icaparser package in that environment. Activate the environment and run:

    pip install \"git+https://github.com/Bayer-Group/ica-parser.git#subdirectory=icaparser\"\n

    If you want to install a particular development branch, use

    pip install \"git+https://github.com/Bayer-Group/ica-parser.git@BRANCHNAME#subdirectory=icaparser\"\n

    If you use Jupyter notebooks, the virtual environment should be added as a new Jupyter kernel. See Using Virtual Environments in Jupyter Notebook and Python - Parametric Thoughts for how to do that.

    "},{"location":"installation/#installation-of-ipywidgets","title":"Installation of ipywidgets","text":"

    Required for progress bars in Jupyter. Please refer to the Jupyter or JupyterLab documentation for how to install the widgets. For example:

    conda install jupyter # if not installed yet\nconda install jupyterlab_widgets\njupyter labextension install jupyter-matplotlib\njupyter lab build\nexit\n

    \u2192 Restart Jupyter

    "},{"location":"reference/","title":"API documentation","text":"

    Parser for JSON files from Illumina Connected Annotations pipeline.

    "},{"location":"reference/#icaparser.icaparser.add_gene_types","title":"add_gene_types(positions)","text":"

    Adds the gene type to each transcript.

    Transcripts will be annotated with the gene type (oncogene, tsg, mixed) by adding a new attribute geneType. Only transcripts with one of these three gene types get this additional annotation. Other transcripts will not get the geneType attribute.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> positions = icap.add_gene_types(positions)\n
    "},{"location":"reference/#icaparser.icaparser.apply_mutation_classification_rules","title":"apply_mutation_classification_rules(positions, rule_set=get_default_mutation_classification_rules(), gene_type_map=get_default_gene_type_map(), hide_progress=False)","text":"

    Applies mutation classification rules to all positions.

    Each variant is categorized for each transcript that overlaps with the genomic position of the variant. Each transcript that passes the \"mutated\" or \"uncertain\" mutation classification rules gets a new attribute mutation_status with the value \"mutated\" or \"uncertain\". The input list of positions is modified by adding the mutation_status attribute to transcripts, and the modified list of positions is returned as the first element of the returned tuple.

    In addition to modifying and returning the list of positions, this function also returns the assembled mutation status after aggregating the impact on all transcripts covering a variant. This is returned as the second item of the returned tuple. The impact depends on the type of gene (\"gof\" or \"lof\"), so the impacts are assembled separately for each gene type.

    The impact of a particular mutational variant can be different for different overlapping transcript variants of a gene, and the transcript variants can also belong to different genes. The strongest impact on any overlapping transcript of a gene is defined as the impact of that mutational variant on the gene. The analyst must decide which isoforms are used to classify genes. For example, only canonical transcripts may be considered. Alternatively, all transcripts or a subset of transcripts may be used. Therefore, it is necessary to first apply transcript-level filters to all genomic positions before this function is called for determining the mutation status of genes.

    The returned value is a multi-dimensional dictionary:

    sample_id \u2192 gene \u2192 gene_type \u2192 variant_id \u2192 mutationStatus

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> positions, sample_muts = icap.apply_mutation_classification_rules(positions)\n
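
    The nested dictionary described above (sample_id, gene, gene_type, variant_id, mutation status) can be walked with plain loops. The following is only a sketch on hand-made data: the helper name count_statuses and the sample, gene, and variant identifiers are hypothetical and not part of icaparser.

```python
from collections import Counter


def count_statuses(sample_muts):
    """Count mutation statuses per (sample_id, gene, gene_type).

    sample_muts is assumed to have the layout
    sample_id -> gene -> gene_type -> variant_id -> status.
    """
    counts = {}
    for sample_id, genes in sample_muts.items():
        for gene, gene_types in genes.items():
            for gene_type, variants in gene_types.items():
                # variants maps variant_id -> "mutated" or "uncertain"
                counts[(sample_id, gene, gene_type)] = Counter(variants.values())
    return counts


# Hand-made example data with hypothetical identifiers.
example_muts = {
    "S1": {
        "TP53": {"lof": {"v1": "mutated", "v2": "uncertain"}},
        "KRAS": {"gof": {"v3": "mutated"}},
    }
}
example_counts = count_statuses(example_muts)
print(example_counts[("S1", "TP53", "lof")])
```

    Counts of this shape are the kind of input that the aggregation rules (a dictionary with the keys mutated and uncertain) expect.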
    "},{"location":"reference/#icaparser.icaparser.cleanup_cosmic","title":"cleanup_cosmic(positions)","text":"

    Remove Cosmic entries with alleles not matching the variant alleles.

    ICA attaches Cosmic entries to variants based on position only, which leads to wrong assignments of Cosmic entries to variants. This function removes all Cosmic entries from a variant for which reference and altered alleles do not match those of the variant.

    Filtering is done in place.

    Parameters:

    Returns:
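
    The allele matching described above might look roughly like the sketch below. It assumes that a variant is a dictionary with the keys refAllele, altAllele, and cosmic, and that each Cosmic entry carries its own refAllele and altAllele. These key names and the helper name are assumptions made for illustration and may differ from what icaparser actually uses.

```python
def remove_mismatched_cosmic(variant):
    """Drop Cosmic entries whose alleles differ from the variant's alleles.

    The variant dictionary is modified in place and also returned.
    """
    if "cosmic" not in variant:
        return variant
    ref = variant.get("refAllele")
    alt = variant.get("altAllele")
    variant["cosmic"] = [
        entry
        for entry in variant["cosmic"]
        if entry.get("refAllele") == ref and entry.get("altAllele") == alt
    ]
    return variant


# Hand-made example with hypothetical identifiers.
example_variant = {
    "refAllele": "C",
    "altAllele": "T",
    "cosmic": [
        {"id": "COSM_A", "refAllele": "C", "altAllele": "T"},
        {"id": "COSM_B", "refAllele": "C", "altAllele": "G"},
    ],
}
remove_mismatched_cosmic(example_variant)
print([entry["id"] for entry in example_variant["cosmic"]])
```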

    "},{"location":"reference/#icaparser.icaparser.common_variant_filter","title":"common_variant_filter(variant, max_af=0.001)","text":"

    Get a variant filter based on gnomAD, gnomAD Exome, and 1000 Genomes.

    Returns True if none of the maximum allele frequencies from gnomAD, gnomAD Exome, and 1000 Genomes is greater than max_af. The default value of 0.1 % for the maximum allele frequency corresponds to that of the AACR GENIE project.

    Parameters:

    Returns:
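
    The decision rule stated above can be sketched on plain numbers. The helper below is hypothetical: it takes the per-source maximum allele frequencies directly instead of a variant, so it only illustrates the threshold logic and not how the frequencies are extracted.

```python
def passes_common_variant_filter(max_afs, max_af=0.001):
    """Return True if none of the given allele frequencies exceeds max_af.

    max_afs holds one maximum allele frequency per annotation source,
    for example gnomAD, gnomAD Exome, and 1000 Genomes.
    """
    return all(af <= max_af for af in max_afs)


# A rare variant passes; a variant that is common in one source does not.
rare = passes_common_variant_filter([0.0002, 0.0, 0.0005])
common = passes_common_variant_filter([0.0002, 0.02, 0.0005])
print(rare, common)
```

    A frequency exactly equal to max_af still passes, because the rule only rejects values greater than the threshold.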

    "},{"location":"reference/#icaparser.icaparser.explode_consequence","title":"explode_consequence(mutation_table, inplace=False)","text":"

    Explode the VEP consequence column of a mutation table.

    Exploding the VEP consequence column with the standard Pandas explode() function would return consequences as strings, not as ordered categories. This function will instead return a consequence column which is an ordered category. The categories are ordered by their impact.

    Exploding means that if a row of the input table has multiple consequences in the consequence column, the list of consequences will be split into single consequences and the output table will have multiple rows with a single consequence per row.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.explode_consequence(mutation_table, inplace=True)\n
    >>> mutation_table_exploded = icap.explode_consequence(mutation_table)\n
    "},{"location":"reference/#icaparser.icaparser.filter_positions_by_transcripts","title":"filter_positions_by_transcripts(positions, filter_func)","text":"

    Filter positions based on a filter function for transcripts.

    Apply a filter function to all transcripts of each position. Transcripts not passing the filter are removed from the variants of a position. Variants without any transcript left are removed from a position. Positions without any variants left are removed from the returned list of positions.

    Parameters:

    Returns:

    Examples:

    >>> is_canonical_transcript = lambda x: x.get('isCanonical', False)\n>>> canonical_positions = icap.filter_positions_by_transcripts(\n        non_common_positions,\n        is_canonical_transcript\n    )\n
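
    The pruning order described above (transcripts first, then empty variants, then empty positions) can be sketched with plain dictionaries. This is not the package's implementation. It assumes that each position has a variants list and each variant a transcripts list, and unlike the real function it may differ in whether the input is modified; the sketch builds new lists and leaves the input untouched.

```python
def filter_positions_by_transcripts_sketch(positions, filter_func):
    """Keep transcripts passing filter_func; drop emptied variants and positions."""
    kept_positions = []
    for position in positions:
        kept_variants = []
        for variant in position.get("variants", []):
            transcripts = [
                t for t in variant.get("transcripts", []) if filter_func(t)
            ]
            if transcripts:
                # Shallow copy so that the input variant is not modified.
                new_variant = dict(variant)
                new_variant["transcripts"] = transcripts
                kept_variants.append(new_variant)
        if kept_variants:
            new_position = dict(position)
            new_position["variants"] = kept_variants
            kept_positions.append(new_position)
    return kept_positions


# Hand-made example data with hypothetical identifiers.
example_positions = [
    {"chromosome": "chr1", "position": 100, "variants": [
        {"vid": "v1", "transcripts": [
            {"transcript": "T1", "isCanonical": True},
            {"transcript": "T2"},
        ]},
    ]},
    {"chromosome": "chr2", "position": 200, "variants": [
        {"vid": "v2", "transcripts": [{"transcript": "T3"}]},
    ]},
]
canonical = filter_positions_by_transcripts_sketch(
    example_positions, lambda t: t.get("isCanonical", False)
)
print(len(canonical))
```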
    "},{"location":"reference/#icaparser.icaparser.filter_positions_by_variants","title":"filter_positions_by_variants(positions, filter_func)","text":"

    Filter positions based on a filter function for variants.

    Apply a filter function to all variants of each position. Variants not passing the filter are removed from a position. Positions without any variants passing the filter are removed from the returned list.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> max_af = 0.001\n>>> is_not_common_variant = lambda x: icap.common_variant_filter(x, max_af)\n>>> non_common_positions = icap.filter_positions_by_variants(\n        positions,\n        is_not_common_variant\n    )\n
    "},{"location":"reference/#icaparser.icaparser.filter_variants_by_transcripts","title":"filter_variants_by_transcripts(variants, filter_func)","text":"

    Filter variants based on a filter function for transcripts.

    Apply a filter function to all transcripts of each variant. Transcripts not passing the filter are removed from a variant. Variants without any transcripts passing the filter are removed from the returned list.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_aggregated_mutation_table","title":"get_aggregated_mutation_table(positions, sample_muts=None, mutation_classification_rules=get_default_mutation_classification_rules(), mutation_aggregation_rules=get_default_mutation_aggregation_rules(), gene_type_map=get_default_gene_type_map(), hide_progress=False)","text":"

    Returns a sample-gene-mutationStatus table.

    This function applies mutation classification rules to all mutational variants and aggregates the mutations according to the aggregation rules. This results in a table with one row for each sample-gene pair. The table contains several columns with impacts according to lof and gof rules on allele level and gene level and with one additional column with the maximum impact for both allele and gene level.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_biotype_priority","title":"get_biotype_priority(biotype)","text":"

    Get the numeric priority of a biotype.

    The numeric priority of a biotype that is returned by this function is the same as defined by vcf2maf.pl by MSKCC. Biotypes are 'protein_coding', 'LRG_gene', 'miRNA', ...

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_clinvar","title":"get_clinvar(variant)","text":"

    Get a table of all ClinVar annotations for a variant.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_clinvar_max_significance","title":"get_clinvar_max_significance(variant, ordered_significances=_CLINVAR_ORDERED_SIGNIFICANCES)","text":"

    Get the maximum significance for all ClinVar annotations of a variant.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_consequences","title":"get_consequences(transcript)","text":"

    Get a list of consequences for a transcript.

    A list of consequences of a variant for a transcript is returned. If any of the annotated consequences is a combination of single consequences, separated by ampersands (&) or commas, the consequence is split into single consequences.

    Parameters:

    Returns:
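
    The splitting of combined consequences can be sketched as follows. The helper works on a plain list of consequence strings, which is an assumption about where the terms come from; the function name is hypothetical.

```python
import re


def split_consequences(consequences):
    """Split combined consequence terms at ampersands and commas."""
    single = []
    for term in consequences:
        for part in re.split(r"[&,]", term):
            part = part.strip()
            if part:
                single.append(part)
    return single


example_terms = split_consequences(
    ["missense_variant&splice_region_variant", "intron_variant"]
)
print(example_terms)
```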

    "},{"location":"reference/#icaparser.icaparser.get_cosmic_max_sample_count","title":"get_cosmic_max_sample_count(variant, only_allele_specific=True)","text":"

    Get the maximum sample count for all Cosmic annotations of a variant.

    A variant can have no, one or multiple associated Cosmic identifiers. This function returns the maximum sample count of all Cosmic identifiers. For each Cosmic identifier, sample numbers are summed up across all indications. Returns 0 if no Cosmic identifier exists for this variant.

    The 'only_allele_specific' argument is used to exclude Cosmic entries that annotate the same chromosomal location but an allele that is different from the allele of the annotated variant. ICA annotates a variant with all Cosmic entries for that chromosomal location, irrespective of alleles. When counting Cosmic samples, this leads to an overestimation of Cosmic sample counts for a particular variant. Therefore, 'only_allele_specific' is True by default to count only samples from Cosmic entries with matching alleles. Occasionally, it may be desired, though, to count all samples with mutations at a given position, irrespective of allele. For example, several different alleles at a functional site of a gene can lead to function-disrupting mutations, so we want to get the maximum sample count for any allele at that position. One might also think of adding the sample counts for all Cosmic entries annotating a variant, but this does not work due to redundancy of Cosmic entries. Older Cosmic versions often included the same sample in different Cosmic entries. And newer Cosmic versions often have multiple entries for an allele, one for each transcript variant, with the same underlying samples.

    Parameters:

    Returns:
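
    The counting rule (sum over indications per Cosmic identifier, then take the maximum over identifiers, and return 0 if there are none) can be sketched on a simplified input. The mapping from identifier to per-indication counts used here is a hypothetical structure chosen for illustration, not the layout of the ICA JSON.

```python
def max_cosmic_sample_count(cosmic_counts):
    """Return the largest per-identifier sample total, or 0 if there is none.

    cosmic_counts maps a Cosmic identifier to {indication: sample count}.
    """
    totals = [
        sum(by_indication.values()) for by_indication in cosmic_counts.values()
    ]
    return max(totals, default=0)


# Hypothetical identifiers and counts.
example_max = max_cosmic_sample_count(
    {"COSM_A": {"lung": 3, "colon": 4}, "COSM_B": {"skin": 5}}
)
print(example_max)
```

    Taking the maximum instead of the sum over identifiers avoids counting the same samples repeatedly when several Cosmic entries are redundant.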

    "},{"location":"reference/#icaparser.icaparser.get_data_sources","title":"get_data_sources(file)","text":"

    Extract a table with annotation data sources from the JSON header.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_default_gene_type_map","title":"get_default_gene_type_map()","text":"

    Returns the default gene type map.

    The canonical gene types are gof, lof, and the union of both. Genes that need to be activated to drive a tumor are of type gof. Genes that need to be deactivated to drive a tumor are of type lof. Genes that need to be activated or deactivated depending on the context are of the union of both types. Genes for which it is unknown if they need to be activated or deactivated are also annotated with both types. Genes can be originally annotated with other type names than the canonical ones. The gene type map is used to map these other gene type names to the canonical gene types.

    The default map is:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_default_gene_type_map()\n
    "},{"location":"reference/#icaparser.icaparser.get_default_mutation_aggregation_rules","title":"get_default_mutation_aggregation_rules()","text":"

    Returns the default mutation aggregation rules.

    Two types of the mutation status of a gene are defined - allele level and gene level:

    For gain of function (gof) genes, the classifications at both the allele and gene levels are identical unless there is supplementary information about activating modifications beyond mutations. In contrast, for loss of function (lof) genes, classifications at the allele and gene levels may diverge. For instance, a truncating mutation in a tumor suppressor gene typically disrupts the function of the affected allele. However, other alleles of the same gene may remain functionally active, meaning the gene as a whole can still be operational, unless the mutated allele is a dominant negative variant. For a gene to be considered completely dysfunctional, all its alleles must be impaired, either through additional mutations or other mechanisms such as copy number deletions or hypermethylation. Consequently, a single variant that disrupts function at the allele level does not necessarily imply disruption at the gene level.

    For loss of function (lof) genes, the available information often falls short of allowing a reliable estimation of functional effects. As a result, heuristic rules must be employed, and the analyst is tasked with deciding whether to utilize allele-level or gene-level classifications. A lof gene is classified as functionally disrupted at gene level (strong impact) if it harbors at least two mutations, each either of strong impact or of uncertain impact. Should a lof gene possess only one such mutation, it is classified as having an uncertain impact at the gene level, regardless of whether the mutation exhibits a strong impact at the allele level. By differentiating the effects at both the allele and gene levels, we maintain the flexibility to determine in subsequent analyses how to consolidate these categories for further statistical evaluations.

    The function returns a dictionary containing two keys: gof and lof. Associated with each key is a function that accepts a dictionary of counts as its input and outputs a tuple comprising two elements: the mutation status at the allele level and at the gene level. The input dictionary of counts is expected to have two keys, mutated and uncertain. The value for each key represents the number of variants within a gene classified as mutated or uncertain, respectively.

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_default_mutation_aggregation_rules()\n
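
    The heuristics described above can be written down as two small functions that take a dictionary of counts and return a tuple (allele-level status, gene-level status). This is a sketch of the described rules, not the code returned by the package. The status labels, the use of None for genes without relevant mutations, and the allele-level logic are assumptions.

```python
def allele_status(counts):
    """Allele-level status: the strongest single classification present."""
    if counts.get("mutated", 0) > 0:
        return "mutated"
    if counts.get("uncertain", 0) > 0:
        return "uncertain"
    return None


def gof_rule(counts):
    """For gof genes, allele level and gene level are identical."""
    status = allele_status(counts)
    return status, status


def lof_rule(counts):
    """For lof genes, two or more hits are needed for a gene-level call."""
    total = counts.get("mutated", 0) + counts.get("uncertain", 0)
    if total >= 2:
        gene_status = "mutated"
    elif total == 1:
        gene_status = "uncertain"
    else:
        gene_status = None
    return allele_status(counts), gene_status


sketch_rules = {"gof": gof_rule, "lof": lof_rule}
single_hit = sketch_rules["lof"]({"mutated": 1, "uncertain": 0})
double_hit = sketch_rules["lof"]({"mutated": 1, "uncertain": 1})
print(single_hit, double_hit)
```

    A single truncating mutation in a lof gene is therefore strong at allele level but only uncertain at gene level, which matches the reasoning about remaining functional alleles.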
    "},{"location":"reference/#icaparser.icaparser.get_default_mutation_classification_rules","title":"get_default_mutation_classification_rules(cosmic_threshold=10)","text":"

    Returns the default rules for classifying mutations.

    Defines the default rules for classifying mutations. The returned dictionary has keys \"gof\" and \"lof\", and the respective values are the rule sets for these gene types. Each rule set is a dictionary with the keys \"mutated\" and \"uncertain\". The values for \"mutated\" or \"uncertain\" are dictionaries with three filter functions, a \"position_filter\", a \"variant_filter\", and a \"transcript_filter\". For example, a transcript will be called \"mutated\" if all three filters for \"mutated\" return True, and it will be called \"uncertain\", if all three filter functions for \"uncertain\" return True.

    These are the default rules returned by this function:

    GOF

    mutated: non-deleterious hotspot mutations.

    uncertain: non-deleterious mutations that aren't hotspots.

    LOF

    mutated: deleterious mutations (such as truncations, start or stop codon loss).

    uncertain: amino acid sequence modifying mutations that are not most likely deleterious. This includes missense mutations and in-frame insertions and deletions.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_default_mutation_classification_rules()\n>>> icap.get_default_mutation_classification_rules(cosmic_threshold=20)\n
    "},{"location":"reference/#icaparser.icaparser.get_dna_json_files","title":"get_dna_json_files(base_dir, pattern='*MergedVariants_Annotated_filtered.json.gz')","text":"

    Find DNA annotation JSON files in or below base_dir.

    Searches for ICA DNA annotation JSON files in and below base_dir. All file names matching pattern are returned.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_gene_type","title":"get_gene_type(gene_symbol)","text":"

    Get the gene type (oncogene, tsg, mixed) for a gene.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_genes","title":"get_genes(file)","text":"

    Extract gene annotation from an ICA JSON file.

    The genes section of ICA JSON files is optional. If this section is not included in the file, an empty list is returned.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_gnomad_exome_max_af","title":"get_gnomad_exome_max_af(variant, cohorts=['afr', 'amr', 'eas', 'nfe', 'sas'])","text":"

    Get the maximum allele frequency for gnomAD Exome.

    Get the maximum allele frequency across all major cohorts annotated by gnomAD Exome, excluding bottleneck populations (Ashkenazi Jews and Finnish) and other.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_gnomad_max_af","title":"get_gnomad_max_af(variant, cohorts=['afr', 'amr', 'eas', 'nfe', 'sas'])","text":"

    Get the maximum allele frequency for gnomAD.

    Get the maximum allele frequency across all major cohorts annotated by gnomAD, excluding bottleneck populations (Ashkenazi Jews and Finnish) and other.

    Parameters:

    Returns:
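
    Taking the maximum over a list of cohorts can be sketched as below. The sketch assumes that the annotation for one source is a flat dictionary whose frequency keys are the cohort name followed by Af (for example afrAf). That naming, the helper name, and the fallback value of 0.0 for missing data are assumptions.

```python
def max_af_over_cohorts(annotation, cohorts):
    """Return the largest allele frequency among the selected cohorts."""
    freqs = [
        annotation[f"{cohort}Af"]
        for cohort in cohorts
        if f"{cohort}Af" in annotation
    ]
    return max(freqs, default=0.0)


# The Finnish cohort ("fin") is not in the cohort list and is ignored.
example_annotation = {"afrAf": 0.01, "nfeAf": 0.002, "finAf": 0.2}
example_af = max_af_over_cohorts(
    example_annotation, ["afr", "amr", "eas", "nfe", "sas"]
)
print(example_af)
```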

    "},{"location":"reference/#icaparser.icaparser.get_header","title":"get_header(file)","text":"

    Extract the header element from an ICA JSON file.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_header_scalars","title":"get_header_scalars(file)","text":"

    Extract a table with all scalar attributes from the JSON header.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_max_af","title":"get_max_af(variant, source, cohorts=None)","text":"

    Get the maximum allele frequency for a particular annotation source.

    Get the maximum allele frequency across all cohorts annotated by the annotation source.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_max_af(variant, 'gnomad')\n
    "},{"location":"reference/#icaparser.icaparser.get_multi_sample_positions","title":"get_multi_sample_positions(files, *args, **kwargs)","text":"

    Extract all positions for a set of ICA JSON files.

    The sample id is stored as an additional new attribute of the samples element of a position. The samples element is a list, although ICA usually only creates single sample JSON files.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> positions = icap.get_multi_sample_positions(json_files)\n>>> print(positions[0]['samples'][0]['sampleId'])\n
    "},{"location":"reference/#icaparser.icaparser.get_mutation_table_for_files","title":"get_mutation_table_for_files(json_files, max_af=0.001, min_vep_consequence_priority=6, min_cosmic_sample_count=0, only_canonical=False, extra_variant_filters=[], extra_transcript_filters=[])","text":"

    Get an annotated table of all filtered transcripts from a list of ICA JSON files.

    Load all positions from a list of ICA JSON files and filter them. Positions having any remaining variants and transcripts passing the filter are returned as an annotated table.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> extra_transcript_filters = [\n        lambda x: x.get('source', '') == 'Ensembl',\n        lambda x: x.get('hgnc', '') == 'KRAS'\n    ]\n>>> mut_table = icap.get_mutation_table_for_files(\n        json_files,\n        extra_transcript_filters=extra_transcript_filters\n    )\n
    "},{"location":"reference/#icaparser.icaparser.get_mutation_table_for_position","title":"get_mutation_table_for_position(position)","text":"

    Get an annotated table of all transcripts for a single position.

    Returns an annotated table of all transcripts that are affected by a mutation at a position.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_mutation_table_for_positions","title":"get_mutation_table_for_positions(positions, hide_progress=False)","text":"

    Get an annotated table of all transcripts for all positions.

    Returns an annotated table of all transcripts that are affected by a mutation at any of the positions.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_onekg_max_af","title":"get_onekg_max_af(variant)","text":"

    Get the maximum allele frequency for the 1000 Genomes Project.

    Get the maximum allele frequency across all cohorts annotated by the 1000 Genomes Project.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_pipeline_metadata","title":"get_pipeline_metadata(files)","text":"

    Extract a table with metadata of the annotation pipeline run from the JSON header.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_position_by_coordinates","title":"get_position_by_coordinates(positions, chromosome, position)","text":"

    Extract a particular position from a position list.

    Parameters:

    Returns:

    Examples:

    >>> import icaparser as icap\n>>> icap.get_position_by_coordinates(positions, 'chr1', 204399064)\n
    "},{"location":"reference/#icaparser.icaparser.get_positions","title":"get_positions(file, variant_filters=[], transcript_filters=[])","text":"

    Extract all positions from an ICA JSON file.

    The sample id is stored as an additional new attribute of the samples element of a position. The samples element is a list, although ICA usually only creates single sample JSON files.

    Parameters:

    Returns:

    Examples:

    >>> transcript_filters = [\n        lambda x: x.get('source', '') == 'Ensembl',\n        lambda x: x.get('hgnc', '') == 'KRAS'\n    ]\n>>> positions = icap.get_positions(\n        json_file,\n        transcript_filters=transcript_filters\n    )\n>>> print(positions[0]['samples'][0]['sampleId'])\n
    "},{"location":"reference/#icaparser.icaparser.get_sample","title":"get_sample(file, suffix='(-D[^.]*)?\\\\.bam')","text":"

    Extract the sample name from an ICA JSON file.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_strongest_vep_consequence_name","title":"get_strongest_vep_consequence_name(transcript)","text":"

    Get the name of the strongest VEP consequence for a transcript.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_strongest_vep_consequence_priority","title":"get_strongest_vep_consequence_priority(transcript)","text":"

    Get the strongest priority of VEP consequence for a transcript.

    Get the strongest numeric priority of all VEP consequences for a transcript. Smaller numeric priorities mean stronger impact.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_strongest_vep_consequence_rank","title":"get_strongest_vep_consequence_rank(transcript)","text":"

    Get the strongest rank of VEP consequences for a transcript.

    Get the strongest numeric rank of all VEP consequences for a transcript. Smaller ranks mean stronger impact.

    The priority of consequences is taken into account first. So if two consequences have different priorities, the consequence with the higher priority (lower priority number) will be used, and the rank for this consequence will be returned. If there are multiple consequences with the same priority, the lowest (strongest) rank will be returned.

    For clarification: ranks are unique, i.e. all VEP consequences ordered as listed on the VEP documentation page get the row number of this table assigned as rank.

    However, several consequences can have the same priority (e.g., stop gained and frameshift have the same priority). Priorities are copied from vcf2maf.pl of MSKCC.

    Parameters:

    Returns:
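
    The interplay of priority and rank can be sketched with a toy lookup table. The consequence names and numbers below are placeholders, not the real VEP ranks or vcf2maf.pl priorities. The point is only the ordering: priority is compared first, and rank breaks ties.

```python
# Toy table: consequence -> (priority, rank). All values are made up.
TOY_TABLE = {
    "cons_a": (2, 30),
    "cons_b": (3, 5),
    "cons_c": (3, 6),
}


def strongest_rank(consequences, table=TOY_TABLE):
    """Return the rank of the consequence with the lowest (priority, rank)."""
    # Tuples compare element by element, so priority decides first.
    _priority, rank = min(table[c] for c in consequences)
    return rank


same_priority = strongest_rank(["cons_b", "cons_c"])
different_priority = strongest_rank(["cons_a", "cons_b"])
print(same_priority, different_priority)
```

    In the second call the consequence with the better priority wins although its rank number is larger.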

    "},{"location":"reference/#icaparser.icaparser.get_vep_consequence_for_rank","title":"get_vep_consequence_for_rank(rank)","text":"

    Get the VEP consequence term of a numeric rank.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.get_vep_priority_for_consequence","title":"get_vep_priority_for_consequence(consequence)","text":"

    Get the numeric priority of a VEP consequence term.

    The numeric priority of a consequence that is returned by this function is the same as defined by vcf2maf.pl of MSKCC.

    Parameters:

    Returns: the priority of the consequence, smaller values mean higher priority.

    "},{"location":"reference/#icaparser.icaparser.get_vep_rank_for_consequence","title":"get_vep_rank_for_consequence(consequence)","text":"

    Get the numeric rank of a VEP consequence term.

    The numeric rank of a consequence is the position of the consequence in this list of consequences for the Variant Effect Predictor VEP.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.split_multi_sample_json_file","title":"split_multi_sample_json_file(json_file, output_dir)","text":"

    Splits a multi-sample JSON file into sample specific JSON files.

    This function reads a multi-sample JSON file that was generated by annotating a multi-sample VCF file with ICA and splits it into sample-specific JSON files.

    Annotating many single-sample VCF files with ICA is very time-consuming because ICA reads all annotation sources for each VCF file, and this step dominates the runtime. It is therefore helpful to first merge the single-sample VCF files into one or a small number of multi-sample VCF files (for example, with bcftools merge), to annotate the multi-sample VCF file with ICA, and then to split the multi-sample JSON output of ICA into single-sample JSON files. These single-sample JSON files are required for the rest of this package.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.strip_json_file","title":"strip_json_file(ifname, ofname)","text":"

    Reduce the JSON file size by keeping only 'PASS' variants.

    JSON files from Illumina's ICA pipeline can be very large because they contain any deviation from the reference genome, irrespective of the quality of the mutation call. Gzip compressed JSON files with sizes in the gigabyte range cannot be processed by JSON packages that read the entire file into memory. It is necessary to first reduce the size of JSON files by removing all variants that do not meet Illumina's quality criteria.

    This function reads a single JSON file and creates a single JSON output file by removing all variants that do not pass Illumina's quality criteria.

    Parameters:

    Returns:

    "},{"location":"reference/#icaparser.icaparser.strip_json_files","title":"strip_json_files(source_dir, target_dir, pattern='*.json.gz')","text":"

    Strip all JSON files of a project by keeping only 'PASS' variants.

    JSON files from Illumina's ICA pipeline can be very large because they contain any deviation from the reference genome, irrespective of the quality of the mutation call. Gzip compressed JSON files with sizes in the gigabyte range cannot be processed by JSON packages that read the entire file into memory. It is necessary to first reduce the size of JSON files by removing all variants that do not meet Illumina's quality criteria.

    This function searches source_dir recursively for all files matching pattern. Each of those files is processed and a stripped version keeping only variants that PASS Illumina's quality criteria is created. The output file has the same name as the input file. The directory structure below source_dir is replicated in target_dir. Output files get the suffix '_filtered.json.gz'.

    Parameters:

    Returns:

    Examples:

    >>> strip_json_files('../Data/Original', '../Data/Derived')\n
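
    How the directory structure might be replicated can be sketched with pathlib. The helper below is hypothetical and only computes the output path. It assumes that the suffix '_filtered.json.gz' replaces the '.json.gz' ending of the input file name, which is one reading of the description above.

```python
from pathlib import Path


def target_path_for(source_file, source_dir, target_dir,
                    suffix="_filtered.json.gz"):
    """Map a file below source_dir to its counterpart below target_dir."""
    relative = Path(source_file).relative_to(source_dir)
    name = relative.name
    if name.endswith(".json.gz"):
        # Remove the old ending before the new suffix is appended.
        name = name[: -len(".json.gz")]
    return Path(target_dir) / relative.parent / (name + suffix)


example_target = target_path_for(
    "Data/Original/run1/sampleA.json.gz", "Data/Original", "Data/Derived"
)
print(example_target)
```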
    "}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index cf7cbb8..b1a4ca1 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ