CPythia version 1.1.0
We retrained Pythia with additional MSAs. The larger training set consists of:
11 108 DNA MSAs
979 Protein MSAs
460 Morphological MSAs
= 12 547 MSAs, all empirical data of course :-)
The new predictor shows improved accuracy:
Mean absolute error: 0.07 (previously 0.09)
Mean absolute percentage error: 1.7% (previously 2.5%)
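For readers unfamiliar with the two metrics, here is a minimal sketch of the standard textbook definitions. The release notes do not state how the reported values were computed, so the function names and the toy difficulty values below are purely illustrative assumptions, not Pythia's evaluation code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average of |true - predicted| / |true|, in percent.

    Undefined when a true value is 0, so such samples must be excluded
    beforehand.
    """
    return 100.0 * sum(
        abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical values, made up for illustration only.
true_difficulty = [0.2, 0.5, 0.8]
predicted_difficulty = [0.25, 0.45, 0.8]

mae = mean_absolute_error(true_difficulty, predicted_difficulty)
mape = mean_absolute_percentage_error(true_difficulty, predicted_difficulty)
print(f"MAE: {mae:.3f}, MAPE: {mape:.1f}%")
```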
Pythia is trained on two additional features: the patterns-over-site ratio and an entropy-like measurement based on the number and frequency of patterns in the MSA.
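To give an idea of what these two features capture, here is a rough sketch that is not taken from Pythia's code. It assumes that a "pattern" is a unique alignment column and uses the Shannon entropy of the pattern frequencies as a stand-in for the entropy-like measurement. The exact formula Pythia uses is not given here, and the function name pattern_features is hypothetical.

```python
import math
from collections import Counter


def pattern_features(sequences):
    """Return (patterns-over-sites ratio, pattern entropy) for an MSA.

    `sequences` is a list of equal-length aligned strings.
    Assumption: the entropy is the Shannon entropy (in bits) of the
    column-pattern frequencies; Pythia's actual definition may differ.
    """
    if not sequences or not sequences[0]:
        raise ValueError("MSA must contain at least one non-empty sequence")
    n_sites = len(sequences[0])
    if any(len(seq) != n_sites for seq in sequences):
        raise ValueError("all sequences in an MSA must have the same length")

    # Each alignment column (one character per sequence) is one site pattern.
    pattern_counts = Counter(zip(*sequences))

    ratio = len(pattern_counts) / n_sites
    entropy = -sum(
        (count / n_sites) * math.log2(count / n_sites)
        for count in pattern_counts.values()
    )
    return ratio, entropy


# Toy alignment, made up for illustration only.
toy_msa = ["ACGTAC", "ACGTAC", "ACGAAC", "TCGAAC"]
ratio, entropy = pattern_features(toy_msa)
print(f"patterns/sites: {ratio:.3f}, pattern entropy: {entropy:.3f} bits")
```

In this toy alignment, two of the six columns are identical, so there are five unique patterns. An MSA in which every column is unique would have a ratio of 1 and the maximum possible entropy for its length.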