diff --git a/rnapolii/modeling/.template.deposition.ipynb b/rnapolii/modeling/.template.deposition.ipynb
index 4de916c..d005760 100644
--- a/rnapolii/modeling/.template.deposition.ipynb
+++ b/rnapolii/modeling/.template.deposition.ipynb
@@ -165,6 +165,7 @@
"import IMP.pmi.mmcif\n",
"import ihm\n",
"import ihm.location\n",
+ "import ihm.reference\n",
"import ihm.model"
]
},
@@ -737,6 +738,47 @@
"last_step.num_models_end = 200000"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Add UniProt sequence information {#uniprot}\n",
+ "\n",
+ "Usually the sequences for each subunit we modeled are available in a reference database such as\n",
+ "[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
+ "to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
+ "python-ihm API to add ``ihm.reference.UniProtSequence`` objects. These are added per *entity*, not\n",
+ "per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
+ "they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
+ "names (without copy numbers) to ``ihm.Entity`` objects:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for subunit, accession in [('Rpb1', 'P04050'),\n",
+ " ('Rpb2', 'P08518')]:\n",
+ " ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
+ " po.entities[subunit].references.append(ref)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here, ``~ihm.reference.UniProtSequence.from_accession`` queries the UniProt API to get full information\n",
+ "(so it requires a network connection). Alternatively, we could create ``ihm.reference.UniProtSequence``\n",
+ "objects ourselves. Here we just populate the first two sequences for illustration.\n",
+ "\n",
+ "If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
+ "between the two and any single-point mutations should be annotated with ``ihm.reference.Alignment``\n",
+ "and ``ihm.reference.SeqDif`` objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
+ "for an example."
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -1073,7 +1115,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1087,7 +1129,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.9.5"
+ "version": "3.12.5"
}
},
"nbformat": 4,
diff --git a/rnapolii/modeling/deposition-colab.ipynb b/rnapolii/modeling/deposition-colab.ipynb
index b7d074d..d81f7d1 100644
--- a/rnapolii/modeling/deposition-colab.ipynb
+++ b/rnapolii/modeling/deposition-colab.ipynb
@@ -22,6 +22,7 @@
" - [Polishing the deposition](#polishing)\n",
" - [Cross-linker type](#xltype)\n",
" - [Correct number of output models](#fixnummodel)\n",
+ " - [Add UniProt sequence information](#uniprot)\n",
" - [Add model coordinates](#addcoords)\n",
" - [Replace local links with DOIs](#adddois)\n",
" - [Output](#output)\n",
@@ -161,6 +162,7 @@
"import IMP.pmi.mmcif\n",
"import ihm\n",
"import ihm.location\n",
+ "import ihm.reference\n",
"import ihm.model"
]
},
@@ -729,6 +731,47 @@
"last_step.num_models_end = 200000"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Add UniProt sequence information\n",
+ "\n",
+ "Usually the sequences for each subunit we modeled are available in a reference database such as\n",
+ "[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
+ "to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
+ "python-ihm API to add [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence) objects. These are added per *entity*, not\n",
+ "per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
+ "they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
+ "names (without copy numbers) to [ihm.Entity](https://python-ihm.readthedocs.io/en/latest/main.html#ihm.Entity) objects:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for subunit, accession in [('Rpb1', 'P04050'),\n",
+ " ('Rpb2', 'P08518')]:\n",
+ " ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
+ " po.entities[subunit].references.append(ref)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here, [from_accession](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence.from_accession) queries the UniProt API to get full information\n",
+ "(so it requires a network connection). Alternatively, we could create [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence)\n",
+ "objects ourselves. Here we just populate the first two sequences for illustration.\n",
+ "\n",
+ "If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
+ "between the two and any single-point mutations should be annotated with [ihm.reference.Alignment](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.Alignment)\n",
+ "and [ihm.reference.SeqDif](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.SeqDif) objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
+ "for an example."
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -1065,7 +1108,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1079,7 +1122,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.9.5"
+ "version": "3.12.5"
}
},
"nbformat": 4,
diff --git a/rnapolii/modeling/deposition.ipynb b/rnapolii/modeling/deposition.ipynb
index b93e45c..f6e082f 100644
--- a/rnapolii/modeling/deposition.ipynb
+++ b/rnapolii/modeling/deposition.ipynb
@@ -22,6 +22,7 @@
" - [Polishing the deposition](#polishing)\n",
" - [Cross-linker type](#xltype)\n",
" - [Correct number of output models](#fixnummodel)\n",
+ " - [Add UniProt sequence information](#uniprot)\n",
" - [Add model coordinates](#addcoords)\n",
" - [Replace local links with DOIs](#adddois)\n",
" - [Output](#output)\n",
@@ -133,6 +134,7 @@
"import IMP.pmi.mmcif\n",
"import ihm\n",
"import ihm.location\n",
+ "import ihm.reference\n",
"import ihm.model"
]
},
@@ -701,6 +703,47 @@
"last_step.num_models_end = 200000"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Add UniProt sequence information\n",
+ "\n",
+ "Usually the sequences for each subunit we modeled are available in a reference database such as\n",
+ "[UniProt](https://www.uniprot.org/). IMP doesn't need to know the database accession codes in order\n",
+ "to perform the modeling, but it is useful to link them for the deposition. We can do this using the\n",
+ "python-ihm API to add [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence) objects. These are added per *entity*, not\n",
+ "per subunit (an entity has a unique sequence; if multiple subunits or copies have the same sequence,\n",
+ "they all map to the same entity). ProtocolOutput provides an `entities` dict, which maps our subunit\n",
+ "names (without copy numbers) to [ihm.Entity](https://python-ihm.readthedocs.io/en/latest/main.html#ihm.Entity) objects:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for subunit, accession in [('Rpb1', 'P04050'),\n",
+ " ('Rpb2', 'P08518')]:\n",
+ " ref = ihm.reference.UniProtSequence.from_accession(accession)\n",
+ " po.entities[subunit].references.append(ref)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here, [from_accession](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence.from_accession) queries the UniProt API to get full information\n",
+ "(so it requires a network connection). Alternatively, we could create [ihm.reference.UniProtSequence](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.UniProtSequence)\n",
+ "objects ourselves. Here we just populate the first two sequences for illustration.\n",
+ "\n",
+ "If for some reason the sequence modeled by IMP is different from that in UniProt, both the alignment\n",
+ "between the two and any single-point mutations should be annotated with [ihm.reference.Alignment](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.Alignment)\n",
+ "and [ihm.reference.SeqDif](https://python-ihm.readthedocs.io/en/latest/reference.html#ihm.reference.SeqDif) objects. See the [pol_ii_g scripts](https://github.com/integrativemodeling/pol_ii_g/blob/a416964fe024352352789d1be8fbd7cfd288832f/production_scripts/sample.py#L288-L305)\n",
+ "for an example."
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -1037,7 +1080,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1051,7 +1094,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.9.5"
+ "version": "3.12.5"
}
},
"nbformat": 4,
diff --git a/rnapolii/modeling/deposition.py b/rnapolii/modeling/deposition.py
index 621673e..d2d8f4d 100755
--- a/rnapolii/modeling/deposition.py
+++ b/rnapolii/modeling/deposition.py
@@ -6,6 +6,7 @@
import IMP.pmi.mmcif
import ihm
import ihm.location
+import ihm.reference
import ihm.model
import IMP
@@ -267,6 +268,11 @@
# Correct number of output models to account for multiple runs
last_step.num_models_end = 200000
+for subunit, accession in [('Rpb1', 'P04050'),
+ ('Rpb2', 'P08518')]:
+ ref = ihm.reference.UniProtSequence.from_accession(accession)
+ po.entities[subunit].references.append(ref)
+
# Get last protocol in the file
protocol = po.system.orphan_protocols[-1]
# State that we filtered the 200000 frames down to one cluster of