## ColabFold v1.5.3: AlphaFold2 using MMseqs2

Easy-to-use protein structure and complex prediction using AlphaFold2 and AlphaFold2-multimer. Sequence alignments/templates are generated through MMseqs2 and HHsearch. For more details, see the bottom of the notebook, check out the ColabFold GitHub, and read our manuscript. Old versions: v1.4, v1.5.1, v1.5.2

## News

- 2023/07/31: The ColabFold MSA server is back to normal. It was using an older DB (UniRef30 2202/PDB70 220313) from the 27th ~8:30 AM CEST to the 31st ~11:10 AM CEST.
- 2023/06/12: New databases! UniRef30 updated to 2023_02 and PDB to 230517. We now use PDB100 instead of PDB70 (see notes).
- 2023/06/12: We introduced a new default pairing strategy: previously, for multimer predictions with more than 2 chains, we paired sequences only if all of them matched taxonomically ("complete" pairing). The new default "greedy" strategy pairs any taxonomically matching subsets.
# @title Input protein sequence(s), then hit `Runtime` -> `Run all` { display-mode: "form" }
from google.colab import files
import os
import re
import hashlib
import random

from sys import version_info
python_version = f"{version_info.major}.{version_info.minor}"

def add_hash(x, y):
  # append a short hash of the sequence so each query gets a unique jobname
  return x + "_" + hashlib.sha1(y.encode()).hexdigest()[:5]

query_sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK' #@param {type:"string"}
#@markdown - Use `:` to specify inter-protein chainbreaks for **modeling complexes** (supports homo- and hetero-oligomers). For example **PI...SK:PI...SK** for a homodimer
jobname = 'test' #@param {type:"string"}
# number of models to use
num_relax = 0 #@param [0, 1, 5] {type:"raw"}
#@markdown - specify how many of the top-ranked structures to relax using amber
template_mode = "none" #@param ["none", "pdb100","custom"]
#@markdown - `none` = no template information is used. `pdb100` = detect templates in PDB100 (see [notes](#pdb100)). `custom` = upload and search own templates (PDB or mmCIF format, see [notes](#custom_templates))

use_amber = num_relax > 0

# remove whitespaces
query_sequence = "".join(query_sequence.split())

basejobname = "".join(jobname.split())
basejobname = re.sub(r'\W+', '', basejobname)
jobname = add_hash(basejobname, query_sequence)

# check if a directory with the jobname already exists
def check(folder):
  return not os.path.exists(folder)

if not check(jobname):
  n = 0
  while not check(f"{jobname}_{n}"): n += 1
  jobname = f"{jobname}_{n}"

# make directory to save results
os.makedirs(jobname, exist_ok=True)

# save queries
queries_path = os.path.join(jobname, f"{jobname}.csv")
with open(queries_path, "w") as text_file:
  text_file.write(f"id,sequence\n{jobname},{query_sequence}")

if template_mode == "pdb100":
  use_templates = True
  custom_template_path = None
elif template_mode == "custom":
  custom_template_path = os.path.join(jobname, "template")
  os.makedirs(custom_template_path, exist_ok=True)
  uploaded = files.upload()
  use_templates = True
  for fn in uploaded.keys():
    os.rename(fn, os.path.join(custom_template_path, fn))
else:
  custom_template_path = None
  use_templates = False

print("jobname", jobname)
print("sequence", query_sequence)
print("length", len(query_sequence.replace(":", "")))

jobname test_a5e17
sequence PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK
length 59

# @title Install dependencies { display-mode: "form" }
%%time
import os
USE_AMBER = use_amber
USE_TEMPLATES = use_templates
PYTHON_VERSION = python_version

if not os.path.isfile("COLABFOLD_READY"):
  print("installing colabfold...")
  os.system("pip install -q --no-warn-conflicts 'colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold'")
  os.system("pip install --upgrade dm-haiku")
  os.system("ln -s /usr/local/lib/python3.*/dist-packages/colabfold colabfold")
  os.system("ln -s /usr/local/lib/python3.*/dist-packages/alphafold alphafold")
  # patch for jax > 0.3.25
  os.system("sed -i 's/weights = jax.nn.softmax(logits)/logits=jnp.clip(logits,-1e8,1e8);weights=jax.nn.softmax(logits)/g' alphafold/model/modules.py")
  os.system("touch COLABFOLD_READY")

if USE_AMBER or USE_TEMPLATES:
  if not os.path.isfile("CONDA_READY"):
    print("installing conda...")
    os.system("wget -qnc https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh")
    os.system("bash Mambaforge-Linux-x86_64.sh -bfp /usr/local")
    os.system("mamba config --set auto_update_conda false")
    os.system("touch CONDA_READY")

# install hhsuite and amber in one call when both are needed, otherwise separately
if USE_TEMPLATES and not os.path.isfile("HH_READY") and USE_AMBER and not os.path.isfile("AMBER_READY"):
  print("installing hhsuite and amber...")
  os.system(f"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 openmm=7.7.0 python='{PYTHON_VERSION}' pdbfixer")
  os.system("touch HH_READY")
  os.system("touch AMBER_READY")
else:
  if USE_TEMPLATES and not os.path.isfile("HH_READY"):
    print("installing hhsuite...")
    os.system(f"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 python='{PYTHON_VERSION}'")
    os.system("touch HH_READY")
  if USE_AMBER and not os.path.isfile("AMBER_READY"):
    print("installing amber...")
    os.system(f"mamba install -y -c conda-forge openmm=7.7.0 python='{PYTHON_VERSION}' pdbfixer")
    os.system("touch AMBER_READY")

installing colabfold...
CPU times: user 114 ms, sys: 27.2 ms, total: 142 ms
Wall time: 42.9 s

#@markdown ### MSA options (custom MSA upload, single sequence, pairing mode)
msa_mode = "mmseqs2_uniref_env" #@param ["mmseqs2_uniref_env", "mmseqs2_uniref","single_sequence","custom"]
pair_mode = "unpaired_paired" #@param ["unpaired_paired","paired","unpaired"] {type:"string"}
#@markdown - "unpaired_paired" = pair sequences from the same species + unpaired MSA, "unpaired" = separate MSA for each chain, "paired" = only use paired sequences.

# decide which a3m to use
if "mmseqs2" in msa_mode:
  a3m_file = os.path.join(jobname, f"{jobname}.a3m")

elif msa_mode == "custom":
  a3m_file = os.path.join(jobname, f"{jobname}.custom.a3m")
  if not os.path.isfile(a3m_file):
    custom_msa_dict = files.upload()
    custom_msa = list(custom_msa_dict.keys())[0]
    header = 0
    import fileinput
    # rewrite the uploaded A3M in place: drop empty lines and
    # take the first (query) sequence from the first entry
    for line in fileinput.FileInput(custom_msa, inplace=1):
      if line.startswith(">"):
        header = header + 1
      if not line.rstrip():
        continue
      if not line.startswith(">") and header == 1:
        query_sequence = line.rstrip()
      print(line, end='')

    os.rename(custom_msa, a3m_file)
    queries_path = a3m_file
    print(f"moving {custom_msa} to {a3m_file}")

else:
  a3m_file = os.path.join(jobname, f"{jobname}.single_sequence.a3m")
  with open(a3m_file, "w") as text_file:
    text_file.write(">1\n%s" % query_sequence)

# @title { display-mode: "form" }
#@markdown ### Advanced settings
model_type = "auto" #@param ["auto", "alphafold2_ptm", "alphafold2_multimer_v1", "alphafold2_multimer_v2", "alphafold2_multimer_v3"]
#@markdown - if `auto` is selected, `alphafold2_ptm` is used for monomer prediction and `alphafold2_multimer_v3` for complex prediction.
#@markdown Any of the model_types can be used, regardless of whether the input is a monomer or a complex.
num_recycles = "3" #@param ["auto", "0", "1", "3", "6", "12", "24", "48"]
#@markdown - if `auto` is selected, `num_recycles=20` is used if `model_type=alphafold2_multimer_v3`, else `num_recycles=3`.
recycle_early_stop_tolerance = "auto" #@param ["auto", "0.0", "0.5", "1.0"]
#@markdown - if `auto` is selected, `tol=0.5` is used if `model_type=alphafold2_multimer_v3`, else `tol=0.0`.
relax_max_iterations = 200 #@param [0, 200, 2000] {type:"raw"}
#@markdown - maximum amber relax iterations; `0` = unlimited (AlphaFold2 default, can take very long)
pairing_strategy = "greedy" #@param ["greedy", "complete"] {type:"string"}
#@markdown - `greedy` = pair any taxonomically matching subsets, `complete` = all sequences have to match in one line.


#@markdown #### Sample settings
#@markdown - enable dropout and increase the number of seeds to sample predictions from the uncertainty of the model.
#@markdown - decrease `max_msa` to increase uncertainty
max_msa = "auto" #@param ["auto", "512:1024", "256:512", "64:128", "32:64", "16:32"]
num_seeds = 1 #@param [1,2,4,8,16] {type:"raw"}
use_dropout = False #@param {type:"boolean"}

num_recycles = None if num_recycles == "auto" else int(num_recycles)
recycle_early_stop_tolerance = None if recycle_early_stop_tolerance == "auto" else float(recycle_early_stop_tolerance)
if max_msa == "auto": max_msa = None

#@markdown #### Save settings
save_all = False #@param {type:"boolean"}
save_recycles = False #@param {type:"boolean"}
save_to_google_drive = False #@param {type:"boolean"}
#@markdown - if the save_to_google_drive option was selected, the result zip will be uploaded to your Google Drive
dpi = 200 #@param {type:"integer"}
#@markdown - set dpi for image resolution

if save_to_google_drive:
  from pydrive.drive import GoogleDrive
  from pydrive.auth import GoogleAuth
  from google.colab import auth
  from oauth2client.client import GoogleCredentials
  auth.authenticate_user()
  gauth = GoogleAuth()
  gauth.credentials = GoogleCredentials.get_application_default()
  drive = GoogleDrive(gauth)
  print("You are logged into Google Drive and are good to go!")

#@markdown Don't forget to hit `Runtime` -> `Run all` after updating the form.

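For reference, the `auto` rules described in this cell can be restated in a few lines. This is an illustrative sketch of the documented behavior above, not the code ColabFold runs internally:

```python
# Illustrative restatement of the "auto" rules documented above
# (assumption: mirrors the described behavior, not ColabFold's own code).
def resolve_auto(model_type, is_complex, num_recycles, tolerance):
    if model_type == "auto":
        model_type = "alphafold2_multimer_v3" if is_complex else "alphafold2_ptm"
    if num_recycles == "auto":
        num_recycles = 20 if model_type == "alphafold2_multimer_v3" else 3
    if tolerance == "auto":
        tolerance = 0.5 if model_type == "alphafold2_multimer_v3" else 0.0
    return model_type, num_recycles, tolerance

# a two-chain query with everything left on "auto":
print(resolve_auto("auto", True, "auto", "auto"))
# ('alphafold2_multimer_v3', 20, 0.5)
```
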
#@title Run Prediction
display_images = True #@param {type:"boolean"}

import sys
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from Bio import BiopythonDeprecationWarning
warnings.simplefilter(action='ignore', category=BiopythonDeprecationWarning)
from pathlib import Path
from colabfold.download import download_alphafold_params, default_data_dir
from colabfold.utils import setup_logging
from colabfold.batch import get_queries, run, set_model_type
from colabfold.plot import plot_msa_v2
from colabfold.colabfold import plot_protein
import matplotlib.pyplot as plt

import os
import numpy as np

# warn if running on a Tesla K80, which lacks memory for long sequences
try:
  K80_chk = os.popen('nvidia-smi | grep "Tesla K80" | wc -l').read()
except Exception:
  K80_chk = "0"
if "1" in K80_chk:
  print("WARNING: found GPU Tesla K80: limited to total length < 1000")
  if "TF_FORCE_UNIFIED_MEMORY" in os.environ:
    del os.environ["TF_FORCE_UNIFIED_MEMORY"]
  if "XLA_PYTHON_CLIENT_MEM_FRACTION" in os.environ:
    del os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"]

# For some reason we need that to get pdbfixer to import
if use_amber and f"/usr/local/lib/python{python_version}/site-packages/" not in sys.path:
  sys.path.insert(0, f"/usr/local/lib/python{python_version}/site-packages/")

def input_features_callback(input_features):
  if display_images:
    plot_msa_v2(input_features)
    plt.show()
    plt.close()

def prediction_callback(protein_obj, length,
                        prediction_result, input_features, mode):
  model_name, relaxed = mode
  if not relaxed and display_images:
    fig = plot_protein(protein_obj, Ls=length, dpi=150)
    plt.show()
    plt.close()

result_dir = jobname
log_filename = os.path.join(jobname, "log.txt")
setup_logging(Path(log_filename))

queries, is_complex = get_queries(queries_path)
model_type = set_model_type(is_complex, model_type)

# disable cluster profiles when max_msa is reduced for multimer models
if "multimer" in model_type and max_msa is not None:
  use_cluster_profile = False
else:
  use_cluster_profile = True

download_alphafold_params(model_type, Path("."))
results = run(
    queries=queries,
    result_dir=result_dir,
    use_templates=use_templates,
    custom_template_path=custom_template_path,
    num_relax=num_relax,
    msa_mode=msa_mode,
    model_type=model_type,
    num_models=5,
    num_recycles=num_recycles,
    relax_max_iterations=relax_max_iterations,
    recycle_early_stop_tolerance=recycle_early_stop_tolerance,
    num_seeds=num_seeds,
    use_dropout=use_dropout,
    model_order=[1,2,3,4,5],
    is_complex=is_complex,
    data_dir=Path("."),
    keep_existing_results=False,
    rank_by="auto",
    pair_mode=pair_mode,
    pairing_strategy=pairing_strategy,
    stop_at_score=float(100),
    prediction_callback=prediction_callback,
    dpi=dpi,
    zip_results=False,
    save_all=save_all,
    max_msa=max_msa,
    use_cluster_profile=use_cluster_profile,
    input_features_callback=input_features_callback,
    save_recycles=save_recycles,
    user_agent="colabfold/google-colab-main",
)
results_zip = f"{jobname}.result.zip"
os.system(f"zip -r {results_zip} {jobname}")

Downloading alphafold2 weights to .: 100%|██████████| 3.47G/3.47G [02:40<00:00, 23.2MB/s]
2023-12-07 22:02:42,821 Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
2023-12-07 22:02:42,823 Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
2023-12-07 22:02:44,726 Running on GPU
2023-12-07 22:02:44,904 Found 4 citations for tools or databases
2023-12-07 22:02:44,904 Query 1/1: test_a5e17 (length 59)

COMPLETE: 100%|██████████| 150/150 [elapsed: 00:01 remaining: 00:00]

2023-12-07 22:02:47,182 Setting max_seq=512, max_extra_seq=5120
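The `Setting max_seq=512, max_extra_seq=5120` line reports the MSA subsampling limits chosen for this run (here the defaults, since `max_msa` was left on `auto`). When `max_msa` is set explicitly in the advanced settings, its `"N:M"` value corresponds to these two limits; a minimal sketch of that interpretation, which is an assumption worth verifying against your own log:

```python
# Hedged sketch: interpreting a non-"auto" max_msa value as "max_seq:max_extra_seq".
max_msa_example = "256:512"  # hypothetical value from the advanced-settings dropdown
max_seq, max_extra_seq = (int(x) for x in max_msa_example.split(":"))
print(max_seq, max_extra_seq)  # 256 512
```
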
#@title Display 3D structure {run: "auto"}
import py3Dmol
import glob
import matplotlib.pyplot as plt
from colabfold.colabfold import plot_plddt_legend
from colabfold.colabfold import pymol_color_list, alphabet_list
rank_num = 1 #@param ["1", "2", "3", "4", "5"] {type:"raw"}
color = "lDDT" #@param ["chain", "lDDT", "rainbow"]
show_sidechains = False #@param {type:"boolean"}
show_mainchains = False #@param {type:"boolean"}

tag = results["rank"][0][rank_num - 1]
jobname_prefix = ".custom" if msa_mode == "custom" else ""
pdb_filename = f"{jobname}/{jobname}{jobname_prefix}_unrelaxed_{tag}.pdb"
pdb_file = glob.glob(pdb_filename)

def show_pdb(rank_num=1, show_sidechains=False, show_mainchains=False, color="lDDT"):
  model_name = f"rank_{rank_num}"
  view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js')
  view.addModel(open(pdb_file[0], 'r').read(), 'pdb')

  if color == "lDDT":
    view.setStyle({'cartoon': {'colorscheme': {'prop': 'b', 'gradient': 'roygb', 'min': 50, 'max': 90}}})
  elif color == "rainbow":
    view.setStyle({'cartoon': {'color': 'spectrum'}})
  elif color == "chain":
    chains = len(queries[0][1]) + 1 if is_complex else 1
    for n, chain, chain_color in zip(range(chains), alphabet_list, pymol_color_list):
      view.setStyle({'chain': chain}, {'cartoon': {'color': chain_color}})

  if show_sidechains:
    BB = ['C', 'O', 'N']
    view.addStyle({'and': [{'resn': ["GLY", "PRO"], 'invert': True}, {'atom': BB, 'invert': True}]},
                  {'stick': {'colorscheme': "WhiteCarbon", 'radius': 0.3}})
    view.addStyle({'and': [{'resn': "GLY"}, {'atom': 'CA'}]},
                  {'sphere': {'colorscheme': "WhiteCarbon", 'radius': 0.3}})
    view.addStyle({'and': [{'resn': "PRO"}, {'atom': ['C', 'O'], 'invert': True}]},
                  {'stick': {'colorscheme': "WhiteCarbon", 'radius': 0.3}})
  if show_mainchains:
    BB = ['C', 'O', 'N', 'CA']
    view.addStyle({'atom': BB}, {'stick': {'colorscheme': "WhiteCarbon", 'radius': 0.3}})

  view.zoomTo()
  return view

show_pdb(rank_num, show_sidechains, show_mainchains, color).show()
if color == "lDDT":
  plot_plddt_legend().show()

#@title Plots {run: "auto"}
from IPython.display import display, HTML
import base64
from html import escape

# see: https://stackoverflow.com/a/53688522
def image_to_data_url(filename):
  ext = filename.split('.')[-1]
  prefix = f'data:image/{ext};base64,'
  with open(filename, 'rb') as f:
    img = f.read()
  return prefix + base64.b64encode(img).decode('utf-8')

pae = image_to_data_url(os.path.join(jobname, f"{jobname}{jobname_prefix}_pae.png"))
cov = image_to_data_url(os.path.join(jobname, f"{jobname}{jobname_prefix}_coverage.png"))
plddt = image_to_data_url(os.path.join(jobname, f"{jobname}{jobname_prefix}_plddt.png"))
display(HTML(f"""
<style>
  img {{
    float:left;
  }}
  .full {{
    max-width:100%;
  }}
  .half {{
    max-width:50%;
  }}
  @media (max-width:640px) {{
    .half {{
      max-width:100%;
    }}
  }}
</style>
<div style="max-width:90%; padding:2em;">
  <h1>Plots for {escape(jobname)}</h1>
  <img src="{pae}" class="full" />
  <img src="{cov}" class="half" />
  <img src="{plddt}" class="half" />
</div>
"""))

#@title Package and download results
#@markdown If you are having issues downloading the result archive, try disabling your adblocker and run this cell again. If that fails, click on the little folder icon to the left, navigate to the file `jobname.result.zip`, right-click and select "Download" (see [screenshot](https://pbs.twimg.com/media/E6wRW2lWUAEOuoe?format=jpg&name=small)).

if msa_mode == "custom":
  print("Don't forget to cite your custom MSA generation method.")

files.download(f"{jobname}.result.zip")

if save_to_google_drive and drive:
  uploaded = drive.CreateFile({'title': f"{jobname}.result.zip"})
  uploaded.SetContentFile(f"{jobname}.result.zip")
  uploaded.Upload()
  print(f"Uploaded {jobname}.result.zip to Google Drive with ID {uploaded.get('id')}")

## Instructions

### Quick start

1. Paste your protein sequence(s) in the input field.
2. Press "Runtime" -> "Run all".
3. The pipeline consists of 5 steps. The currently running step is indicated by a circle with a stop sign next to it.
### Result zip file contents

- PDB formatted structures, sorted by average pLDDT (complexes are sorted by pTM-score); unrelaxed and relaxed structures are included if `use_amber` is enabled.
- Plots of the model quality.
- Plots of the MSA coverage.
- Parameter log file.
- A3M formatted input MSA.
- A `predicted_aligned_error_v1.json` using AlphaFold-DB's format and a `scores.json` for each model, which contains an array (list of lists) for PAE, a list with the per-residue pLDDT and the pTM-score (see the sketch after this list).
- BibTeX file with citations for all used tools and databases.

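As a quick illustration of the scores file described above, the per-model JSON can be inspected directly after a run. The key names (`pae`, `plddt`, `ptm`) are an assumption here; check them against your own file:

```python
# Minimal sketch for inspecting a per-model scores file from the result directory.
# Assumed key names: "pae", "plddt", "ptm" -- verify against your own output.
import glob
import json

scores_file = sorted(glob.glob(f"{jobname}/*scores*.json"))[0]
with open(scores_file) as f:
    scores = json.load(f)

pae = scores["pae"]      # list of lists: predicted aligned error (residue x residue)
plddt = scores["plddt"]  # per-residue predicted lDDT
print("mean pLDDT:", sum(plddt) / len(plddt))
print("pTM:", scores.get("ptm"))
```
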
At the end of the job a download modal box will pop up with a `jobname.result.zip` file. Additionally, if the `save_to_google_drive` option was selected, the `jobname.result.zip` will be uploaded to your Google Drive.
### MSA generation for complexes

For complex prediction we use unpaired and paired MSAs. The unpaired MSA is generated the same way as for monomer structure prediction, by searching UniRef100 and environmental sequences with three iterations each.
The paired MSA is generated by searching the UniRef100 database and pairing the best hits sharing the same NCBI taxonomic identifier (= species or sub-species). With the default `greedy` strategy, any taxonomically matching subsets of chains are paired; with the `complete` strategy, sequences are paired only if all query sequences are present for the respective taxonomic identifier.
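To make the difference between the two strategies concrete, here is a toy sketch (not ColabFold's implementation) of how hits grouped by NCBI taxon would be paired for a three-chain query:

```python
# Toy illustration of "complete" vs "greedy" pairing (not ColabFold's code).
# hits_per_chain[i] maps an NCBI taxon id to the best hit for chain i.
hits_per_chain = [
    {"9606": "A_human", "10090": "A_mouse"},
    {"9606": "B_human", "10090": "B_mouse", "7955": "B_zebrafish"},
    {"9606": "C_human", "7955": "C_zebrafish"},
]

# complete: a taxon contributes a paired row only if every chain has a hit for it.
complete = [t for t in hits_per_chain[0] if all(t in h for h in hits_per_chain)]

# greedy: any taxon matching at least two chains contributes a (partially) paired row.
greedy = sorted({t for h in hits_per_chain for t in h
                 if sum(t in g for g in hits_per_chain) >= 2})

print("complete:", complete)  # ['9606'] -- human hits exist for all three chains
print("greedy:", greedy)      # ['10090', '7955', '9606'] -- subsets also pair
```
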
### Using a custom MSA as input

To predict the structure with a custom MSA (A3M formatted): (1) change the `msa_mode` to "custom", (2) wait for an upload box to appear at the end of the "MSA options ..." box and upload your A3M. The first fasta entry of the A3M must be the query sequence without gaps.
It is also possible to provide custom MSAs for complex predictions. Read more about the format here.
As an alternative for MSA generation, the HHblits Toolkit server can be used. After submitting your query, click "Query Template MSA" -> "Download Full A3M". Download the A3M file and upload it in this notebook. A minimal check of the A3M requirement is sketched below.
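A minimal sketch for checking that requirement before uploading (the file name is an example; `query_sequence` is defined in the first cell):

```python
# Check that the first A3M entry is the ungapped query sequence.
def first_a3m_entry(path):
    seq_lines, headers_seen = [], 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("#"):      # A3M files may start with a comment line
                continue
            if line.startswith(">"):
                headers_seen += 1
                if headers_seen > 1:
                    break
            elif headers_seen == 1:
                seq_lines.append(line)
    return "".join(seq_lines)

first = first_a3m_entry("my_alignment.a3m")  # example path
assert "-" not in first, "first A3M entry must not contain gaps"
assert first == query_sequence, "first A3M entry must match the query sequence"
```
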
### PDB100

As of 23/06/08, we have transitioned from the PDB70 to a 100% clustered PDB, the PDB100. The construction methodology of PDB100 differs from that of PDB70: the PDB70 was constructed by running each PDB70 representative sequence through HHblits against the Uniclust30, whereas the PDB100 is built by searching each PDB100 representative structure with Foldseek against the AlphaFold Database.
To maintain compatibility with older notebook versions and local installations, the generated files and API responses will continue to be named "PDB70", even though we are now using the PDB100.
### Custom templates

To predict the structure with a custom template (PDB or mmCIF formatted): (1) change the `template_mode` to "custom" in the execute cell and (2) wait for an upload box to appear at the end of the "Input protein" box. Select and upload your templates (multiple choices are possible). A sketch for checking a template before upload follows this list.

- Templates must follow the four-letter PDB naming and use lower-case letters.
- Templates in mmCIF format must contain `_entity_poly_seq`. An error is thrown if this field is not present. The field `_pdbx_audit_revision_history.revision_date` is automatically generated if it is not present.
- Templates in PDB format are automatically converted to the mmCIF format; `_entity_poly_seq` and `_pdbx_audit_revision_history.revision_date` are automatically generated.

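A small sketch for checking an mmCIF template for the required category before uploading, using Biopython's mmCIF parser (the file name is an example):

```python
# Check a custom mmCIF template for the required _entity_poly_seq category.
from Bio.PDB.MMCIF2Dict import MMCIF2Dict  # requires biopython

cif = MMCIF2Dict("1abc.cif")  # example path; use your template file
has_poly_seq = any(key.startswith("_entity_poly_seq.") for key in cif)
has_revision = "_pdbx_audit_revision_history.revision_date" in cif
print("entity_poly_seq present:", has_poly_seq)  # must be True for mmCIF templates
print("revision_date present:", has_revision)    # auto-generated if missing
```
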
If you encounter problems, please report them to this issue.
### Comparison to the full AlphaFold2 and AlphaFold2 Colab

This notebook replaces the homology detection and MSA pairing of AlphaFold2 with MMseqs2. For a comparison against the AlphaFold2 Colab and the full AlphaFold2 system, read our paper.
### Troubleshooting

- Check that the runtime type is set to GPU at "Runtime" -> "Change runtime type".
- Try to restart the session via "Runtime" -> "Factory reset runtime".
- Check your input sequence.

### Known issues

- Google Colab assigns different types of GPUs with varying amounts of memory. Some might not have enough memory to predict the structure of a long sequence.
- Your browser can block the pop-up for downloading the result file. You can choose the `save_to_google_drive` option to upload to Google Drive instead, or manually download the result file: click on the little folder icon to the left, navigate to the file `jobname.result.zip`, right-click and select "Download" (see screenshot).

### Limitations

- Computing resources: our MMseqs2 API can handle ~20-50k requests per day.
- MSAs: MMseqs2 is very precise and sensitive but might find fewer hits compared to HHblits/HMMer searches against BFD or MGnify.
- We recommend additionally using the full AlphaFold2 pipeline.

### Description of the plots

- Number of sequences per position: we want to see at least 30 sequences per position; for best performance, ideally 100 sequences (a sketch for computing this from the input A3M follows this list).
- Predicted lDDT per position: model confidence (out of 100) at each position. The higher the better.
- Predicted Aligned Error: for homooligomers, this can be a useful metric to assess how confident the model is about the interface. The lower the better.

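The per-position sequence count can be computed directly from the input A3M; a minimal sketch (in A3M, lower-case letters are insertions relative to the query and occupy no query column):

```python
# Count aligned (non-gap) sequences per query position from an A3M file.
def coverage_per_position(a3m_path):
    counts = None
    with open(a3m_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith((">", "#")):
                continue
            cols = [c for c in line if not c.islower()]  # drop insertions
            if counts is None:
                counts = [0] * len(cols)
            for i, c in enumerate(cols):
                if c != "-":
                    counts[i] += 1
    return counts

counts = coverage_per_position(f"{jobname}/{jobname}.a3m")
print("min/max sequences per position:", min(counts), max(counts))
```
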
### Bugs

- If you encounter any bugs, please report the issue to https://github.com/sokrypton/ColabFold/issues

### License

The source code of ColabFold is licensed under MIT. Additionally, this notebook uses the AlphaFold2 source code and its parameters, licensed under Apache 2.0 and CC BY 4.0 respectively. Read more about the AlphaFold license here.
### Acknowledgments

- We thank the AlphaFold team for developing an excellent model and open-sourcing the software.
- KOBIC and the Söding Lab for providing the computational resources for the MMseqs2 MSA server.
- Richard Evans for helping to benchmark ColabFold's AlphaFold-Multimer support.
- David Koes for his awesome py3Dmol plugin, without which these notebooks would be quite boring!
- Do-Yoon Kim for creating the ColabFold logo.

A colab by Sergey Ovchinnikov (@sokrypton), Milot Mirdita (@milot_mirdita) and Martin Steinegger (@thesteinegger).