PyTorch implementation of a diffusion model for DNA sequence reconstruction and continuous score prediction. For example, given a ChIP score, it generates a DNA sequence that is likely to produce that score.
N.B. GRAHAM ON COMPUTE CANADA IS DOWN UNTIL JAN 7. THIS CODE IS NOT UP TO DATE AND IS MISSING A LOT THAT I CANNOT ACCESS.
- Handles variable-length DNA sequences (up to a maximum length) using padding and masking
- Trains on combined sequence reconstruction and score prediction objectives, which helps the model learn better
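The two features above can be sketched as follows. This is a minimal illustration, not the repository's code: the vocabulary, the function names `encode_batch` and `combined_loss`, the `score_weight` parameter, and the choice of cross-entropy plus mean squared error are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F

# Hypothetical vocabulary: index 0 is reserved for padding.
VOCAB = {"<pad>": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def encode_batch(seqs, maxlen):
    """Return (tokens, mask): tokens padded to maxlen, mask True at real bases."""
    tokens = torch.zeros(len(seqs), maxlen, dtype=torch.long)
    mask = torch.zeros(len(seqs), maxlen, dtype=torch.bool)
    for i, seq in enumerate(seqs):
        seq = seq[:maxlen].upper()  # truncate anything longer than maxlen
        ids = [VOCAB[base] for base in seq]
        tokens[i, : len(seq)] = torch.tensor(ids, dtype=torch.long)
        mask[i, : len(seq)] = True
    return tokens, mask


def combined_loss(logits, score_pred, tokens, mask, scores, score_weight=1.0):
    """Masked reconstruction loss plus a weighted score regression loss."""
    # logits is (batch, maxlen, vocab); cross_entropy wants (batch, vocab, maxlen).
    per_position = F.cross_entropy(
        logits.transpose(1, 2), tokens, reduction="none"
    )
    # Average only over real bases so padding does not contribute.
    reconstruction = (per_position * mask).sum() / mask.sum()
    score = F.mse_loss(score_pred, scores)
    return reconstruction + score_weight * score
```

The mask keeps padded positions out of the reconstruction term, so short sequences are not rewarded for predicting padding.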
Clone the repository and install it in editable mode:
    git clone https://github.com/yourusername/dna-diffusion-model.git
    cd dna-diffusion-model
    pip install -e .
Data should look like this, with a score followed by a sequence on each line:
    1.23 ATCGTAA
    0.89 GCTAGCTGCTA
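To check a file against this layout before training, something like the sketch below may help. It assumes whitespace-separated fields and one record per line. The name `parse_records`, the allowed alphabet `ACGTN`, and the choice to skip malformed or over-long lines rather than truncate them are assumptions for illustration, not behavior taken from the repository.

```python
def parse_records(lines, maxlen=512):
    """Parse 'score sequence' lines into a list of (float, str) tuples."""
    allowed = set("ACGTN")  # assumed alphabet
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        score_text, seq = parts
        seq = seq.upper()
        if len(seq) > maxlen or set(seq) - allowed:
            continue  # skip sequences that are too long or contain other symbols
        records.append((float(score_text), seq))
    return records
```

For example, `parse_records(open("dataset.txt"))` would return one tuple per valid line, where `dataset.txt` is a placeholder filename.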
In main.py, modify the data generator to look like this:
    data_generator = BEDFileDataGenerator(
        filepath='/path/to/your/dataset.bed',
        num_sequences=15000,
        maxlen=512,
    )
Run with:
    python main.py