Default weights used for scoring #30
Hi @korseby,
In the Galaxy tool development, default weights were given for the scoring approaches FragmenterScore, OfflineMetFusionScore and SuspectListScore. These were originally given weights of 0.4, 0.6 and 1.0 respectively. They were added when the Galaxy tool was updated from the PhenoMeNal MetFrag Galaxy tool wrapper. I just wanted to see if you know where these weights came from?
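(For concreteness, a hypothetical reconstruction of how those legacy defaults might look in a MetFrag CLI parameter file; the mapping of weights to score types is assumed from the order listed above and is not confirmed by the thread:)

```
# Hypothetical reconstruction of the legacy Galaxy defaults — the
# weight-to-score-type mapping is assumed from the order listed above.
MetFragScoreTypes = FragmenterScore,OfflineMetFusionScore,SuspectListScore
MetFragScoreWeights = 0.4,0.6,1.0
```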
Comments
Thanks Kristian!
Thanks for linking me in! I don't recognise this weighting scheme; in our 2016 paper we had a different set of weightings that totalled 1, but in our internal MetFrag use we now tend to keep all scores weighted at 1, so that the total maximum score equals the number of terms used. I am not sure which database and suspect-list combination you are using @Tomnl, but for e.g. PubChem or PubChemLite we would usually do:
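(A minimal sketch of what such an equal-weights parameter file could look like, assuming standard MetFrag CLI parameter names; the database choice, score types, paths and values here are illustrative, not schymane's actual settings:)

```
# Minimal sketch, not an actual configuration. Each scoring term is
# weighted 1.0, so the maximum total score equals the number of terms.
# Paths and values are placeholders.
MetFragDatabaseType = PubChem
MetFragScoreTypes = FragmenterScore,OfflineMetFusionScore
MetFragScoreWeights = 1.0,1.0
PeakListPath = example_peaklist.txt
MetFragCandidateWriter = CSV
```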
We have found the exact spectral similarity score a lot easier to interpret (and integrate into automated scoring schemes) than the MetFusion score.
Thank you @schymane and @korseby, this is really useful information! It is unfortunate that we do not know the origin of these weightings, but perhaps now is a good point to revise them in the Galaxy tool based on how MetFrag is actually used in the community.

In Birmingham we have been using MetFrag for a few different use cases, but a reasonably common setup for general annotation uses the FragmenterScore, a local PubChem database, a list of natural products (provided by Kristian, derived from the Universal Natural Products Database) as the inclusion list, and the Offline MetFusion score (see the sketch below). We then combine this with other annotation approaches that use different scoring schemes and weights. In some complicated workflows it does make a lot of sense to take the scores out of MetFrag (as @schymane has done with the spectral matching), and I will probably change to something similar in the future.

@schymane, do you have general weightings that you use for these scores? Would that be something you could share, or is it really dependent on your own analysis and evaluations?

I wonder… although it could be very useful for users to have default weightings in the tool, perhaps for the Galaxy tool we should, for the moment, remove the default weightings and force the user to determine their own weightings based on their own preferences. Does this seem OK to you @korseby? This seems to follow the logic of the MetFrag CLI, which does not provide default weightings for the scores (at least I could not find any).
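(A hypothetical parameter sketch of the use case described above, again assuming MetFrag CLI parameter names; the file paths, the local database format and the suspect-list file name are placeholders, not an actual Birmingham configuration:)

```
# Hypothetical sketch of the setup described above — paths, database
# format and file names are placeholders.
MetFragDatabaseType = LocalPSV                 # local PubChem extract, format assumed
LocalDatabasePath = pubchem_local.psv
MetFragScoreTypes = FragmenterScore,OfflineMetFusionScore,SuspectListScore
MetFragScoreWeights = 1.0,1.0,1.0
ScoreSuspectLists = unpd_natural_products.txt  # suspect/inclusion list, name hypothetical
```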
Honestly, over several years and many, many users we have found it best to keep each scoring term at a weighting of 1, so that the scores become additive. This seems to be much easier for people to understand intuitively when selecting their candidates afterwards (the maximum score is then the number of scoring terms chosen; see the illustration below). So this is what I would recommend as the preferred default behaviour, while leaving the option for people to tweak the score weightings if they wish.
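(A numerical illustration of the additive behaviour; the individual scores are invented for the example:)

```
# Illustration only — all numbers invented. Three scoring terms, each
# normalised to [0, 1] and weighted 1.0:
#   FragmenterScore        = 0.85
#   OfflineMetFusionScore  = 0.60
#   SuspectListScore       = 1.00
#   aggregate score        = 2.45 out of a maximum of 3
```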
What is important then is to ensure that the two experimental terms (e.g. the MetFrag FragmenterScore and the MoNA score) are reported clearly and separately in the output along with the aggregated score, because these two are the key terms that help you see whether the MetFrag and MoNA results really match the input spectrum, or whether it is just the "best match" that explains the experimental data only poorly. Frank also experimented with calculating a spectral match based on the MetFrag-predicted fragments here. If you want to see some of this pictorially, please see some examples here:
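(To make the point about separate reporting concrete, a hypothetical shape of a per-candidate results row; the column names and numbers are assumed for illustration, not taken from actual MetFrag output:)

```
# Hypothetical per-candidate results row (column names and values assumed):
# Identifier, FragmenterScore, OfflineMetFusionScore, SuspectListScore, Score
# CID12345,   0.85,            0.60,                  1.00,             2.45
```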
Thanks @schymane for sharing, this is really useful. I agree that a simple additive default scoring for the Galaxy tool, as you describe, seems sensible and intuitive (while letting the user change the weightings if they would like to). I will update the Galaxy tool with those defaults and check the outputs to see how the experimental terms are separated. Thanks for your help! And thanks for including the links; I will have a read to explore a bit more.
Sorry for the delay; we had a workshop this week and I am incredibly busy right now. So yes, I am completely fine with this weighting. We used my weighting a few years ago together with a suspect list of natural products. This way, natural products were ranked much higher and we were able to get a lot of candidates ranked first. We mostly applied this weighting scheme to non-model plant species. The use of libraries other than PubChem has rendered this weighting scheme obsolete anyway.