FeatureDB.update on disk db very slow #227
Comments
OK, figured out what the issue is here. This is effectively updating the database with itself, so it's reading and writing simultaneously. That is, db.create_introns() is still reading features from the database while db.update() is writing new ones to it.
Three quick fixes:
1. Convert to a list of introns first
Consume the create_introns() generator before updating. Instead of
    db.update(db.create_introns(), **kwargs)
use
    db.update(list(db.create_introns()), **kwargs)
This will increase memory usage, but it works.
2. Use WAL
WAL (write-ahead logging) allows simultaneous reads/writes without blocking. Warning, this does NOT work on a networked filesystem like those typically used on an HPC cluster!
    db.set_pragmas({'journal_mode': 'WAL'})
    db.update(db.create_introns(), **kwargs)
3. Write to intermediate file
If memory is an issue and you're using a networked filesystem, then you can write out to a file first:
    with open('tmp.gtf', 'w') as fout:
        for intron in db.create_introns():
            fout.write(str(intron) + '\n')
    db.update(gffutils.DataIterator('tmp.gtf'), **kwargs)
Summary
I'm not sure if anything in the code should be changed to address this. These different solutions would each be useful in different situations. So I think the best thing to do is to add some explanatory text to both
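To make the read/write conflict concrete, here is a minimal sketch that uses only Python's standard sqlite3 module, not gffutils. It assumes the slowdown comes from one connection still reading the database file while a second connection tries to commit to it; gffutils internals may differ in detail. The `features` table, the helper names (`make_db`, `read_features`, `try_update`, `run_demo`) and the `timeout=0` setting are all hypothetical and chosen for illustration only. With `timeout=0` a blocked write is reported immediately as an error instead of waiting.

```python
import os
import sqlite3
import tempfile


def make_db(path, wal=False):
    """Create a small on-disk database with a hypothetical 'features' table."""
    conn = sqlite3.connect(path)
    if wal:
        conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE features (name TEXT)")
    conn.executemany(
        "INSERT INTO features VALUES (?)", [(f"exon{i}",) for i in range(100)]
    )
    conn.commit()
    conn.close()


def read_features(conn):
    """Lazily yield names, like a generator that reads from the database."""
    for (name,) in conn.execute("SELECT name FROM features"):
        yield name


def try_update(path, materialize):
    """Read rows with one connection and write a derived row with another.

    Returns True if the write committed, False if SQLite reported an error
    such as "database is locked".
    """
    reader = sqlite3.connect(path)
    writer = sqlite3.connect(path, timeout=0)  # report a lock at once, do not wait
    gen = read_features(reader)
    try:
        # Fix 1 analogue: list() finishes the read before any write starts.
        rows = iter(list(gen) if materialize else gen)
        first = next(rows)  # lazy case: the SELECT is still open after this
        writer.execute("INSERT INTO features VALUES (?)", (first + "_intron",))
        writer.commit()  # needs exclusive access in the default journal mode
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        gen.close()  # release the open SELECT, if any
        writer.rollback()
        writer.close()
        reader.close()


def run_demo():
    """Run the three scenarios, each on a fresh database file."""
    scenarios = {
        "lazy": (False, False),  # read and write at the same time
        "list": (False, True),   # materialize first (fix 1 analogue)
        "wal": (True, False),    # WAL journal mode (fix 2 analogue)
    }
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, (wal, materialize) in scenarios.items():
            path = os.path.join(tmp, label + ".db")
            make_db(path, wal=wal)
            results[label] = try_update(path, materialize)
    return results


if __name__ == "__main__":
    print(run_demo())
```

On a local filesystem the "lazy" scenario is expected to fail to commit, while the "list" scenario commits because the read has finished before the write begins. The "wal" scenario depends on the filesystem supporting WAL, which matches the warning above about networked filesystems.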
Addressed in #231
I'm trying to use FeatureDB.update and FeatureDB.create_introns to add intron features to the database.
If the database is created in memory, this is very fast, but if it is created on disk, it appears to be very slow.
gffutils version: 0.12
python version: 3.12.0
The GTF file I'm using is from the RefSeq FTP:
It's a subset of GCF_000001405.25_GRCh37.p13_genomic.gtf.gz.
Code:
If db is created on disk, I observed that it hangs on this step:
Populating features table and first-order relations: 0 features
What could cause this? Thanks in advance for any insights!