diff --git a/rnapolii/modeling/.template.deposition.ipynb b/rnapolii/modeling/.template.deposition.ipynb
index 4de916c..d005760 100644
--- a/rnapolii/modeling/.template.deposition.ipynb
+++ b/rnapolii/modeling/.template.deposition.ipynb
@@ -165,6 +165,7 @@
     "import IMP.pmi.mmcif\n",
     "import ihm\n",
     "import ihm.location\n",
+    "import ihm.reference\n",
     "import ihm.model"
    ]
   },
@@ -737,6 +738,47 @@
     "last_step.num_models_end = 200000"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Add UniProt sequence information {#uniprot}\n",
+    "\n",
+    "Usually the sequences for each subunit we modeled are available in a reference database such as\n",
+    "[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
+    "to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
+    "python-ihm API to add ``ihm.reference.UniProtSequence`` objects. These are added per *entity*, not\n",
+    "per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
+    "they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
+    "names (without copy numbers) to ``ihm.Entity`` objects:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for subunit, accession in [('Rpb1', 'P04050'),\n",
+    "                           ('Rpb2', 'P08518')]:\n",
+    "    ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
+    "    po.entities[subunit].references.append(ref)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here, ``~ihm.reference.UniProtSequence.from_accession`` queries the UniProt API to get full information\n",
+    "(so requires a network connection). Alternatively, we could create ``ihm.reference.UniProtSequence``\n",
+    "objects ourselves. Here we just populate the first two sequences for illustration.\n",
+    "\n",
+    "If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
+    "between the two and any single-point mutations should be annotated with ``ihm.reference.Alignment``\n",
+    "and ``ihm.reference.SeqDif`` objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
+    "for an example."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1073,7 +1115,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
@@ -1087,7 +1129,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.5"
+   "version": "3.12.5"
   }
  },
  "nbformat": 4,
diff --git a/rnapolii/modeling/deposition-colab.ipynb b/rnapolii/modeling/deposition-colab.ipynb
index b7d074d..d81f7d1 100644
--- a/rnapolii/modeling/deposition-colab.ipynb
+++ b/rnapolii/modeling/deposition-colab.ipynb
@@ -22,6 +22,7 @@
     " - [Polishing the deposition](#polishing)\n",
     " - [Cross-linker type](#xltype)\n",
     " - [Correct number of output models](#fixnummodel)\n",
+    " - [Add UniProt sequence information](#uniprot)\n",
     " - [Add model coordinates](#addcoords)\n",
     " - [Replace local links with DOIs](#adddois)\n",
     " - [Output](#output)\n",
@@ -161,6 +162,7 @@
     "import IMP.pmi.mmcif\n",
     "import ihm\n",
     "import ihm.location\n",
+    "import ihm.reference\n",
     "import ihm.model"
    ]
   },
@@ -729,6 +731,47 @@
     "last_step.num_models_end = 200000"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Add UniProt sequence information\n",
+    "\n",
+    "Usually the sequences for each subunit we modeled are available in a reference database such as\n",
+    "[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
+    "to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
+    "python-ihm API to add [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence) objects. These are added per *entity*, not\n",
+    "per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
+    "they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
+    "names (without copy numbers) to [ihm.Entity](https://python-ihm.readthedocs.io/en/latest/main.html#ihm.Entity) objects:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for subunit, accession in [('Rpb1', 'P04050'),\n",
+    "                           ('Rpb2', 'P08518')]:\n",
+    "    ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
+    "    po.entities[subunit].references.append(ref)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here, [from_accession](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence.from_accession) queries the UniProt API to get full information\n",
+    "(so requires a network connection). Alternatively, we could create [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence)\n",
+    "objects ourselves. Here we just populate the first two sequences for illustration.\n",
+    "\n",
+    "If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
+    "between the two and any single-point mutations should be annotated with [ihm.reference.Alignment](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.Alignment)\n",
+    "and [ihm.reference.SeqDif](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.SeqDif) objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
+    "for an example."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1065,7 +1108,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
@@ -1079,7 +1122,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.5"
+   "version": "3.12.5"
   }
  },
  "nbformat": 4,
diff --git a/rnapolii/modeling/deposition.ipynb b/rnapolii/modeling/deposition.ipynb
index b93e45c..f6e082f 100644
--- a/rnapolii/modeling/deposition.ipynb
+++ b/rnapolii/modeling/deposition.ipynb
@@ -22,6 +22,7 @@
     " - [Polishing the deposition](#polishing)\n",
     " - [Cross-linker type](#xltype)\n",
     " - [Correct number of output models](#fixnummodel)\n",
+    " - [Add UniProt sequence information](#uniprot)\n",
     " - [Add model coordinates](#addcoords)\n",
     " - [Replace local links with DOIs](#adddois)\n",
     " - [Output](#output)\n",
@@ -133,6 +134,7 @@
     "import IMP.pmi.mmcif\n",
     "import ihm\n",
     "import ihm.location\n",
+    "import ihm.reference\n",
     "import ihm.model"
    ]
   },
@@ -701,6 +703,47 @@
     "last_step.num_models_end = 200000"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Add UniProt sequence information\n",
+    "\n",
+    "Usually the sequences for each subunit we modeled are available in a reference database such as\n",
+    "[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
+    "to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
+    "python-ihm API to add [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence) objects. These are added per *entity*, not\n",
+    "per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
+    "they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
+    "names (without copy numbers) to [ihm.Entity](https://python-ihm.readthedocs.io/en/latest/main.html#ihm.Entity) objects:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for subunit, accession in [('Rpb1', 'P04050'),\n",
+    "                           ('Rpb2', 'P08518')]:\n",
+    "    ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
+    "    po.entities[subunit].references.append(ref)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here, [from_accession](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence.from_accession) queries the UniProt API to get full information\n",
+    "(so requires a network connection). Alternatively, we could create [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence)\n",
+    "objects ourselves. Here we just populate the first two sequences for illustration.\n",
+    "\n",
+    "If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
+    "between the two and any single-point mutations should be annotated with [ihm.reference.Alignment](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.Alignment)\n",
+    "and [ihm.reference.SeqDif](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.SeqDif) objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
+    "for an example."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1037,7 +1080,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
@@ -1051,7 +1094,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.5"
+   "version": "3.12.5"
   }
  },
  "nbformat": 4,
diff --git a/rnapolii/modeling/deposition.py b/rnapolii/modeling/deposition.py
index 621673e..d2d8f4d 100755
--- a/rnapolii/modeling/deposition.py
+++ b/rnapolii/modeling/deposition.py
@@ -6,6 +6,7 @@
 import IMP.pmi.mmcif
 import ihm
 import ihm.location
+import ihm.reference
 import ihm.model
 
 import IMP
@@ -267,6 +268,11 @@
 # Correct number of output models to account for multiple runs
 last_step.num_models_end = 200000
 
+for subunit, accession in [('Rpb1', 'P04050'),
+                           ('Rpb2', 'P08518')]:
+    ref = ihm.reference.UniProtSequence.from_accession(accession)
+    po.entities[subunit].references.append(ref)
+
 # Get last protocol in the file
 protocol = po.system.orphan_protocols[-1]
 # State that we filtered the 200000 frames down to one cluster of