Training step fails with undefined variable 'chroms' #36

Open
gregorydonahue opened this issue May 9, 2024 · 0 comments
Hello - I'm trying to run basepairmodels on the provided CTCF ChIP-seq data from ENCODE. I got as far as the model training step, where the job failed with:

Traceback (most recent call last):
  File "/home/gdonahue/software/MiniConda/miniconda3/envs/basepairmodels/bin/train", line 8, in <module>
    sys.exit(main())
  File "/home/gdonahue/software/MiniConda/miniconda3/envs/basepairmodels/lib/python3.7/site-packages/basepairmodels/cli/bpnettrainer.py", line 136, in main
    args.mnll_loss_background_sample_weight)
  File "/home/gdonahue/software/MiniConda/miniconda3/envs/basepairmodels/lib/python3.7/site-packages/basepairmodels/common/training.py", line 774, in train_and_validate_ksplits
    train_chroms = list(chroms.difference(
NameError: name 'chroms' is not defined

Looking at training.py, I gather that it takes all the chromosome IDs from chroms.txt (which are passed on the command line via --chroms) and subtracts the ones in splits.json to get the list of training chromosomes. But I don't see where the set of all chromosome IDs is created. I thought it might have come in through an import, but I didn't see it there either.
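For reference, here is a minimal sketch of what I understand that step is meant to compute. This is not the code from training.py; the function name `training_chroms` and its arguments are hypothetical, and I'm only assuming the intended result is "all chromosomes minus the val and test ones".

```python
import json

def training_chroms(all_chroms, split):
    """Hypothetical helper: chromosomes left over after removing val and test."""
    held_out = set(split["val"]) | set(split["test"])
    # A list comprehension keeps the original chromosome order,
    # which a plain set difference would not guarantee.
    return [c for c in all_chroms if c not in held_out]

# The splits.json contents and --chroms values from this report.
splits = json.loads('{"0": {"val": ["chr10", "chr8"], "test": ["chr1"]}}')
all_chroms = (
    "chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr11 chr10 chr12 "
    "chr13 chr14 chr15 chr16 chr17 chr18 chr20 chr19 chrY chr22 chr21 chrM"
).split()

train = training_chroms(all_chroms, splits["0"])
print(train)
```

If this reading is right, then `chroms` in the failing line should be a set built from the --chroms argument before the difference is taken.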

This is my splits.json:

{  
  "0": {  
    "val": ["chr10", "chr8"],  
    "test": ["chr1"]  
  }  
}

This is the command used to run it (through LSF on a local cluster):

bsub -M 64000 -n 16 -o Logs/1.train.out train --input-data $INPUT_DATA --output-dir $MODEL_DIR --reference-genome $REFERENCE_GENOME --chroms $(paste -s -d ' ' $REFERENCE_DIR/chroms.txt) --chrom-sizes $CHROM_SIZES --splits $CV_SPLITS --model-arch-name BPNet --model-arch-params-json $MODEL_PARAMS --sequence-generator-name BPNet --model-output-filename model --input-seq-len 2114 --output-len 1000 --shuffle --threads 10 --epochs 100 --learning-rate 0.004

I also ran it directly from the command line, to be sure that LSF was not doing something unexpected. Same problem.

I then went through each of the variables in the command and ran ls on each to be sure the file exists and that I didn't screw up something in my data directory setup. No problems there. Additionally, the evaluation of this:

$> paste -s -d ' ' $REFERENCE_DIR/chroms.txt  
chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr11 chr10 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr20 chr19 chrY chr22 chr21 chrM

...also checks out. What am I missing?
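To rule out a mismatch between splits.json and the chromosome list, the check I have in mind can be sketched as below. The helper name `missing_split_chroms` is my own (hypothetical) and is not part of basepairmodels; it only reports val/test chromosomes that do not appear in the --chroms list.

```python
def missing_split_chroms(all_chroms, splits):
    """Hypothetical check: map each split name to its val/test chromosomes
    that are absent from the full chromosome list."""
    known = set(all_chroms)
    missing = {}
    for name, split in splits.items():
        absent = [
            chrom
            for part in ("val", "test")
            for chrom in split.get(part, [])
            if chrom not in known
        ]
        if absent:
            missing[name] = absent
    return missing

# Values taken from this report.
chrom_list = (
    "chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr11 chr10 chr12 "
    "chr13 chr14 chr15 chr16 chr17 chr18 chr20 chr19 chrY chr22 chr21 chrM"
).split()
cv_splits = {"0": {"val": ["chr10", "chr8"], "test": ["chr1"]}}

print(missing_split_chroms(chrom_list, cv_splits))
```

With my inputs nothing is reported as missing, so the splits and the chromosome list agree with each other.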
