All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fixed a bug in the `pseudolikelihood_ratio` variant scoring strategy when `parallel_chains` was greater than 1.
- Added the ability to save results (output sequences and scores, plus a few other tidbits) to a CSV file by calling `save_results()` on the `DirectedEvolution` object.
- Minor modification to `embeddings.py` to support pLMs using mixed precision.
- Added unit tests for the `VariantScoring` class and a new unit test for the sampler to test saving results.
- Fixed a bug with `torch.softmax` in `utils.safe_logits_to_probs`.
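The `save_results()` entry above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation; the function name, column names, and arguments below are hypothetical, shown only to indicate the kind of CSV output (output sequences plus scores) being saved:

```python
import csv
import io

def save_results_sketch(csvfile, sequences, scores, chain_ids):
    """Hypothetical mirror of DirectedEvolution.save_results():
    writes one row per (chain, sequence, score). The column schema
    here is illustrative, not the library's actual format."""
    writer = csv.writer(csvfile)
    writer.writerow(["chain", "sequence", "score"])
    for chain, seq, score in zip(chain_ids, sequences, scores):
        writer.writerow([chain, seq, score])

# Toy data: two parallel chains, one variant each
buf = io.StringIO()
save_results_sketch(buf, ["MKTAY", "MKSAY"], [0.12, 0.34], [0, 1])
print(buf.getvalue().splitlines()[0])  # → chain,sequence,score
```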
- The ability to change the expert variant scoring strategy has been added. There is now a class `VariantScoring` which can be configured with a `scoring_strategy` argument (currently supported: `attribute_value`, `pseudolikelihood_ratio`, and `mutant_marginal` (NEW)). Each expert has an instance of a `VariantScoring` class. It is defined in `evo_prot_grad.common.variant_scoring`.
- The main entry point for instantiating an expert, `get_expert`, now has a `scoring_strategy` argument for configuring the expert.
- The `use_without_wildtype` argument of the Expert class has been removed. Each scoring strategy normalizes the score with respect to the wildtype score, so this argument was superfluous. If you want to instantiate an expert and use it outside of the DirectedEvolution class, you have to explicitly call `expert.init_wildtype(wt_seq)` before calling the expert, in order to cache the wildtype score (see below).
- The Expert private class method `_model_output_to_scalar_score` has been removed in favor of a public-facing method `get_model_output`. This method can be used to directly get expert scores for sequences.
- The `Expert` class no longer has a `wt_score` attribute. The wildtype score is now stored in the `VariantScoring` class (`wt_score_cache`).
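The pattern described above (every scoring strategy normalizes variant scores against a wildtype score cached by `init_wildtype`, stored in `wt_score_cache`) can be sketched in plain Python. This is a hypothetical stand-in, not the actual `VariantScoring` code; the toy scoring function and class name are invented for illustration:

```python
class VariantScoringSketch:
    """Hypothetical stand-in for evo_prot_grad's VariantScoring.
    Caches the wildtype score once, then normalizes every variant
    score against it (variant score minus wildtype score)."""

    def __init__(self, score_fn):
        self.score_fn = score_fn      # maps a sequence to a raw scalar score
        self.wt_score_cache = None    # filled by init_wildtype, as in the real API

    def init_wildtype(self, wt_seq):
        # Analogous to calling expert.init_wildtype(wt_seq) before
        # using an expert outside of DirectedEvolution.
        self.wt_score_cache = self.score_fn(wt_seq)

    def __call__(self, variant_seq):
        if self.wt_score_cache is None:
            raise RuntimeError("call init_wildtype(wt_seq) before scoring variants")
        return self.score_fn(variant_seq) - self.wt_score_cache

# Toy score: count of hydrophobic residues (purely illustrative)
toy_score = lambda seq: sum(aa in "AILMFVW" for aa in seq)
scorer = VariantScoringSketch(toy_score)
scorer.init_wildtype("MKTAY")   # caches the WT score (2: M and A)
print(scorer("MKTAW"))          # → 1 (W adds one hydrophobic residue over WT)
```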
- The `Expert` abstract class now publicly exposes the following methods: `init_wildtype`, for storing the wildtype string sequence and caching the WT score; `tokenize`, for tokenizing a sequence; and `get_model_output`, which accepts a list of protein sequence strings and returns the one-hot encoded sequences and the expert model's predictions.
- Renamed `experts.base_experts.HuggingFaceExpert` to `experts.base_experts.ProteinLMExpert`.
- Improved error message reporting for `get_expert`.
- Upgraded `transformers[torch]` to `4.38.0`.