training on arbitrary sims #40

Open
agladstein opened this issue Dec 7, 2020 · 3 comments

Comments

@agladstein
Contributor

I tried to update the .toml config to use my own simulated .trees files. I changed it to:

[sim.tranche]
# The labels and modelspec(s) for each tranche. The network will be trained to
# classify data as coming from one of these tranches. Each tranche consists of
# a list of simulation modelspecs.
# Only two tranches are supported.
"constant_2pop" = [
	"genomatnn_data/trees/constant_2pop",

	# Skip this for now, as it's too computationally intensive
	# to do many replicates for training. :-(
	#"HomSap/HomininComposite_4G20/DFE",
]

single_pulse_uni_AB = [
	"genomatnn_data/trees/single_pulse_uni_AB",
]

But, I get the error:

Traceback (most recent call last):
  File "/home/aglad/.conda/envs/genomatnn/bin/genomatnn", line 33, in <module>
    sys.exit(load_entry_point('genomatnn==0.1.dev115+g4e4a918', 'console_scripts', 'genomatnn')())
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/cli.py", line 587, in main
    args = parse_args(args_list)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/cli.py", line 540, in parse_args
    args.conf = config.Config(args.conf)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/config.py", line 79, in __init__
    self._getcfg_sim()
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/config.py", line 142, in _getcfg_sim
    self._getcfg_tranche(self.sim["tranche"])
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/config.py", line 150, in _getcfg_tranche
    model = sim.get_demog_model(modelspec)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/sim.py", line 841, in get_demog_model
    raise ValueError(f"{modelspec} not found")
ValueError: genomatnn_data/trees/constant_2pop not found

So, am I right in interpreting this to mean that genomatnn is not currently set up to use arbitrary simulations defined in the toml? Or am I missing something about how to define them?

@grahamgower
Owner

> So, am I right in interpreting this that genomatnn is not currently setup to use any arbitrary simulations defined in the toml?

Unfortunately, you're right: it isn't.

The "paths" that are specified in that part of the toml file must also be present in the internal data structures in genomatnn/sim.py. The internals use stdpopsim-style models right now. When the models are simulated, they're then given paths on the filesystem that match.

One reason the internal data structures are required is so that the population id numbers in the tree sequence files can be matched to human-readable population names (genomatnn orders the populations in the genotype matrices, for both simulated and empirical data sources). Tree sequences output by SLiM differ from those output by msprime in that the population name metadata is not present in SLiM-generated tree sequences. There's lots of new stuff in tskit/msprime-1.0 that will make it easy for stdpopsim to modify the metadata after using its SLiM engine, so that the metadata can be made consistent regardless of which engine is used to do the simulation. So genomatnn doesn't actually look at the tree sequence metadata at all right now; it uses the models defined in the code (I've used SLiM simulations almost exclusively).
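A minimal sketch of the lookup described above, to show why an unregistered path fails. This is not genomatnn's actual code: `KNOWN_MODELS`, `lookup_model`, `order_samples`, and the example modelspec and population names are all hypothetical. Only the error message format is taken from the traceback in this thread.

```python
from typing import Dict, List, Sequence

# Hypothetical registry standing in for the internal data structures in
# genomatnn/sim.py. It maps a modelspec string to population names, where
# the list index plays the role of the population id in the tree sequence.
KNOWN_MODELS: Dict[str, List[str]] = {
    "ExampleSpecies/ExampleModel/Neutral": ["popA", "popB"],
}


def lookup_model(modelspec: str) -> List[str]:
    """Return the population names for a registered modelspec.

    A modelspec that is only a directory on disk, and has no registry
    entry, fails here before any .trees file is opened.
    """
    try:
        return KNOWN_MODELS[modelspec]
    except KeyError:
        raise ValueError(f"{modelspec} not found") from None


def order_samples(
    sample_pop_ids: Sequence[int],
    pop_names: Sequence[str],
    wanted_order: Sequence[str],
) -> List[int]:
    """Return sample indices grouped by population, in the requested order.

    The id -> name step uses the registry, not tree sequence metadata,
    so it works even when the metadata carries no population names.
    """
    by_name: Dict[str, List[int]] = {name: [] for name in wanted_order}
    for index, pop_id in enumerate(sample_pop_ids):
        name = pop_names[pop_id]
        if name in by_name:
            by_name[name].append(index)
    return [i for name in wanted_order for i in by_name[name]]
```

Under these assumptions, `lookup_model("genomatnn_data/trees/constant_2pop")` raises the same kind of `ValueError` as in the report above, because that string is a filesystem path and not a key in the registry.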

@agladstein
Contributor Author

I see!
You can close this if you like.

@grahamgower
Owner

I'll leave it open. I do plan to make things easier in the future, once the new stuff in msprime-1.0 and demes are released. (Although this will probably get into stdpopsim first.)
