ERROR: No variation in sampling dates! Please specify your clock rate explicitly #227

Open
BCMollett opened this issue Mar 7, 2023 · 7 comments

Comments

@BCMollett

Hi,

I am running treetime with `treetime --covariation --confidence --clock-filter 5 --tree <input.nwk> --aln <input.aln.fasta> --dates <input.csv>` on a selection of N1 subtype influenza viruses, and it returns the following:

ValueError: No variation in sampling dates! Please specify your clock rate explicitly.

ERROR: No variation in sampling dates! Please specify your clock rate explicitly.

ERROR in TreeTime.run: An error occurred which was not properly handled in TreeTime. If this error persists, please let us know by filing a new issue including the original command and the error above at: https://github.com/neherlab/treetime/issues

The dataset contains sequences with dates from 2014 to 2021, and I have previously used the same command for the N2 subtype and all other gene segments without error. I am sure all headers and dates are correct and matching.

Do you have any idea/advice on how to get around this issue?

Thanks,
Ben

@corneliusroemer
Member

Hi Ben, happy to help! It sounds like somewhere between you and treetime there's a misunderstanding about what the sampling dates are. It could be as simple as a different column name for your dates. But rather than speculating, the best way forward is for you to share your inputs (the exact files), the exact command you use (copy-paste), and the output of treetime --version. You can send the files to [email protected] if you can't share them publicly.

@BCMollett
Author

Thank you for the quick reply!
I am just checking what restrictions may be in place on sharing files on my end, but when/if possible I will send the files by email.

@corneliusroemer
Member

corneliusroemer commented Mar 8, 2023

It should be possible to debug with a lot of columns removed to reduce scope of sharing.

You could try reducing sample numbers to 5 or so, maybe you have some public samples in there anyways, just keep these?

Otherwise, just the header of the csv could be useful - that shouldn't contain anything sensitive.
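A quick way to look at the header and the date values yourself before sharing anything is a few lines of Python. This is only a sketch using the standard library: the function name `summarize_dates` is made up for illustration, the column names `strain` and `date` are assumptions about your file, and the example rows are invented.

```python
import csv
import io


def summarize_dates(handle, name_col="strain", date_col="date"):
    """Return the CSV header, the number of rows, and the distinct date values."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    # Fail early if the expected columns are not in the header.
    missing = [col for col in (name_col, date_col) if col not in header]
    if missing:
        raise KeyError(f"missing column(s) {missing}; header is {header}")
    rows = list(reader)
    dates = sorted({row[date_col].strip() for row in rows})
    return header, len(rows), dates


# Invented example data; replace with open("your_dates.csv", newline="").
example = io.StringIO("strain,date\nA,2014-03-01\nB,2018-07-15\nC,2021-11-30\n")
header, n_rows, dates = summarize_dates(example)
print("header:", header)
print("rows:", n_rows, "distinct dates:", len(dates))
```

If the number of distinct dates comes back as 1, or the header does not contain the columns you expect, that already narrows things down.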

@BCMollett
Author

I have sent through the files. Did you receive them?

@corneliusroemer
Member

> I have sent through the files. Did you receive them?

Yes, thanks! Just had a look. It appears that the clock-filter filters out too many tips/sequences causing some assumption somewhere to be violated. This case should probably be handled better, so thanks a lot for the report!

As a workaround you could try some of the following options:

  • Switch off clock filter, by passing --clock-filter 0

In the future, you could try to find out more about what's going on inside treetime by passing e.g. --verbose 4 or an even higher number to see more verbose output.

A key line in the output is:

 0.90    TreeTime.clock_filter: More than a third of leaves have been excluded by
         the clock filter. Please check your input data.

When treetime runs successfully (which you can achieve by passing --clock-filter 0) you'll see why the clock filter ends up throwing out almost all of the data:

[image: plot from the successful run]

Almost none of the data lies in the "acceptable" regression range unless you use large clock filter values (10 or more interquartile distances, the unit --clock-filter is measured in) or switch the filter off altogether. Your data deviates so much from the assumptions of the clock filter model that it fails here.
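To make the mechanism a bit more concrete, here is a toy sketch of a clock filter. It is not TreeTime's implementation: `fit_line` and `clock_filter` are hypothetical helpers, the data are invented, and the cutoff in units of the interquartile distance of the residuals is an assumption based on the `n_iqd` argument name. It fits root-to-tip distance against sampling date, drops tips far from the line, and shows one way a "no variation in sampling dates" error can arise when the tips that remain all share the same date.

```python
from statistics import mean, quantiles


def fit_line(dates, dists):
    """Ordinary least-squares fit: dist = slope * date + intercept."""
    mean_t, mean_d = mean(dates), mean(dists)
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        # With identical dates the slope (clock rate) is undefined.
        raise ValueError("No variation in sampling dates!")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, dists))
    slope = sxy / sxx
    return slope, mean_d - slope * mean_t


def clock_filter(dates, dists, n_iqd):
    """Return indices of tips whose residual is within n_iqd interquartile distances."""
    slope, intercept = fit_line(dates, dists)
    residuals = [d - (slope * t + intercept) for t, d in zip(dates, dists)]
    q1, _, q3 = quantiles(residuals, n=4)
    cutoff = n_iqd * (q3 - q1)
    return [i for i, r in enumerate(residuals) if abs(r) <= cutoff]


# Invented, well-behaved data: years since the first sample, one clear outlier.
years = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [0.0005, -0.0005] * 4
dists = [0.003 * t + e for t, e in zip(years, noise)]
dists[3] += 0.05  # tip 3 sits far above the regression line
kept = clock_filter(years, dists, n_iqd=3)
print("tips kept:", kept)

# If the tips that survive filtering all share one date, a refit cannot work.
try:
    fit_line([2018.5, 2018.5, 2018.5], [0.010, 0.012, 0.011])
    failure = None
except ValueError as err:
    failure = str(err)
print("refit error:", failure)
```

In a dataset that follows the clock model, only the odd outlier is removed. When most points sit far from the line, as in the plot above, the filter can remove so much that the remaining tips no longer support a regression.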

You can find this plot and other diagnostic information in the run-output folder, which should appear in your working directory. See the screenshot for the standard content:
[image: screenshot of the run-output folder contents]

@corneliusroemer
Member

This is the full log I get with default verbosity:

treetime --covariation --confidence --clock-filter 5 --tree N1_subset.aln.clean.fasta.treefile.nwk --aln N1_subset.aln.clean.fasta --dates Matched_Metadata.csv            

Attempting to parse dates...
        Using column 'strain' as name. This needs match the taxon names in the tree!!
        Using column 'date' as date.

0.00    -TreeAnc: set-up

0.16    WARNING: Previous versions of TreeTime (<0.7.0) RECONSTRUCTED sequences of
        tips at positions with AMBIGUOUS bases. This resulted in unexpected
        behavior is some cases and is no longer done by default. If you want to
        replace those ambiguous sites with their most likely state, rerun with
        `reconstruct_tip_states=True` or `--reconstruct-tip-states`.

0.66    TreeTime.reroot: with method or node: least-squares

0.66    TreeTime.reroot: rerooting will ignore covariance and shared ancestry.

0.90    TreeTime.clock_filter: More than a third of leaves have been excluded by
        the clock filter. Please check your input data.

0.91    TreeTime.reroot: with method or node: least-squares

0.91    TreeTime.reroot: rerooting will account for covariance and shared ancestry.
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 57, in run
    return self._run(**kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 221, in _run
    self.clock_filter(reroot=reroot_mechanism, n_iqd=n_iqd, plot=plot_rtt, fixed_clock_rate=fixed_clock_rate)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 439, in clock_filter
    self.reroot(root=reroot, clock_rate=fixed_clock_rate)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 521, in reroot
    new_root = self._find_best_root(covariation=use_cov,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treetime.py", line 949, in _find_best_root
    return Treg.optimal_reroot(force_positive=force_positive, slope=slope, keep_node_order=self.keep_node_order)['node']
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 433, in optimal_reroot
    best_root = self.find_best_root(force_positive=force_positive, slope=slope)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 340, in find_best_root
    x, chisq = self._optimal_root_along_branch(n, tv, bv, var, slope=slope)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 396, in _optimal_root_along_branch
    chisq_grid = np.array([chisq(x) for x in grid])
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 396, in <listcomp>
    chisq_grid = np.array([chisq(x) for x in grid])
                           ^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 386, in chisq
    return base_regression(tmpQ, slope=slope)['chisq']
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/py11/lib/python3.11/site-packages/treetime/treeregression.py", line 32, in base_regression
    raise ValueError("No variation in sampling dates! Please specify your clock rate explicitly.")
ValueError: No variation in sampling dates! Please specify your clock rate explicitly.

ERROR: No variation in sampling dates! Please specify your clock rate explicitly. 
 
ERROR in TreeTime.run: An error occurred which was not properly handled in TreeTime. If this error persists, please let us know by filing a new issue including the original command and the error above at: https://github.com/neherlab/treetime/issues

Some things to address within treetime to make such issues easier to debug for users:

The log message 0.90 TreeTime.clock_filter: More than a third of leaves have been excluded by the clock filter. Please check your input data. is hard to spot. In this case it correctly indicates a path to the root cause, but this tip would be better in the error itself.

When that "no variation in sampling dates" error happens, it would be good to help the user by reporting the following:

  • How many samples are left at this point in the program (post clockfilter): "only 0/1/5/10 samples left, check whether or why clockfilter has filtered them all out"
  • What the sampling dates are: maybe the wrong column was inferred: sampling dates are "strain A: 2045-12-23, ...", report up to say ~5 so the user can quickly recognize whether this is the problem or not
  • Suggest the user set --clock-filter 0 in case clock filter causes the problem, then inspect the "root_to_tip_regression.pdf" to see what's going on
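As a rough illustration of what such a message could look like, here is a sketch. The helper `describe_date_problem` and its arguments are hypothetical and not part of treetime; only the first line of the message, the `--clock-filter 0` flag, and the `root_to_tip_regression.pdf` file name come from this thread.

```python
def describe_date_problem(tip_dates, n_total, max_examples=5):
    """Build a more helpful message for the 'no variation in sampling dates' case.

    tip_dates: dict mapping tip name -> date string for tips still in the analysis.
    n_total: number of tips before any filtering.
    """
    # Show only the first few tips so the message stays readable.
    shown = list(tip_dates.items())[:max_examples]
    examples = ", ".join(f"{name}: {date}" for name, date in shown)
    lines = [
        "No variation in sampling dates! Please specify your clock rate explicitly.",
        f"Only {len(tip_dates)} of {n_total} samples are left after the clock filter.",
        f"Sampling dates in use: {examples or 'none'}",
        "If the clock filter removed too many tips, rerun with --clock-filter 0 "
        "and inspect root_to_tip_regression.pdf.",
    ]
    return "\n".join(lines)


# Invented example: two tips with the same date survive out of 40.
message = describe_date_problem({"A": "2018-05-01", "B": "2018-05-01"}, n_total=40)
print(message)
```

Listing a handful of tip names with their dates would make a wrong date column obvious at a glance, and the count of remaining samples points at the clock filter when it is the cause.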

@BCMollett
Author

I'm glad it was a relatively simple issue! You have given me a bit to think about with this dataset and with treetime troubleshooting.

Thanks so much for your assistance.

corneliusroemer added a commit to bioconda/bioconda-recipes that referenced this issue Mar 13, 2023
Unpin biopython as bug has been fixed neherlab/treetime#227
BiocondaBot added a commit to bioconda/bioconda-recipes that referenced this issue Mar 13, 2023
Merge PR #39871, commits were: 
 * Update meta.yaml

Unpin biopython as bug has been fixed neherlab/treetime#227
 * Update bcbio-gff to 0.7.0