8_processing notebook process_patient takes extremely long #10
Comments
The SQL queries for chartevents are taking a very long time. My chartevents.csv is around 3.3GB, but I am not sure how to optimize these queries. Please let me know if there is any better way:
Hey Jared, same issue here. I don't have a machine as powerful as yours, and it took me about 5 days to finish that cell. Also, the last cell of that notebook gave me the error "name 'admission_first_ids_set' is not defined". Are you seeing the same thing?
I had the same issue and figured I would share my solution for anyone else in the future. There are two things that greatly improve the performance:
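# getConnection() is the connection helper defined earlier in the notebook.
conn = getConnection()
cur = conn.cursor()
# Build a covering index on hadm_id for each of the 17 chartevents partitions.
# Note: the INCLUDE clause requires PostgreSQL 11 or later.
for i in range(17):
    print(i+1)
    query = f'''DROP INDEX IF EXISTS chartevents_{i+1}_idx_hadm;
    CREATE INDEX chartevents_{i+1}_idx_hadm ON mimiciii.chartevents_{i+1} (hadm_id) INCLUDE (charttime,itemid,value,valuenum,valueuom)'''
    cur.execute(query)
    conn.commit()
# Alternative for indexing the parent chartevents table directly:
# query = '''DROP INDEX IF EXISTS chartevents_idx_hadm;
# CREATE INDEX chartevents_idx_hadm ON mimiciii.chartevents (hadm_id) INCLUDE (charttime,itemid,value,valuenum,valueuom);'''
# cur.execute(query)
# conn.commit()
conn.close()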
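For anyone wondering why the index helps, here is a minimal sketch of the effect. It uses an in-memory SQLite table as a stand-in for one chartevents partition, purely so it runs without a MIMIC database; the table contents and the query_plan helper are made up for illustration, and SQLite has no INCLUDE clause, so a multi-column index plays the role of the covering index. On PostgreSQL you would check the real plan with EXPLAIN instead.

```python
import sqlite3

# Stand-in for a chartevents partition (hypothetical data, not MIMIC).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chartevents "
    "(hadm_id INTEGER, charttime TEXT, itemid INTEGER, valuenum REAL)"
)
conn.executemany(
    "INSERT INTO chartevents VALUES (?, ?, ?, ?)",
    [(i % 100, "2100-01-01", i % 7, float(i)) for i in range(10_000)],
)


def query_plan(connection):
    """Return SQLite's plan description for a per-admission lookup."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT charttime, itemid, valuenum FROM chartevents WHERE hadm_id = ?",
        (42,),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# Without an index, every lookup has to scan the whole table.
plan_before = query_plan(conn)

# SQLite equivalent of a covering index: all selected columns are in the index.
conn.execute(
    "CREATE INDEX chartevents_idx_hadm "
    "ON chartevents (hadm_id, charttime, itemid, valuenum)"
)

# With the index, the lookup becomes an index search on hadm_id.
plan_after = query_plan(conn)

print(plan_before)
print(plan_after)
```

The first plan should report a table scan and the second an index search. Since process_patient issues one such lookup per admission, turning each scan into an index search is where the time is saved.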
We have replaced the notebooks with a single preprocessing script; see #21. It runs considerably faster: with 4 cores it should finish within 1 day.
The function process_patient(aid) takes extremely long to run in the notebook: about 30 seconds per patient. Even running the subsequent part with 20 workers would still take several days to complete.
Is there anything I can change to get this to complete in a reasonable amount of time? I'm running this on a 20 Core Intel(R) Core(TM) i9-9900X CPU @ 3.50GHz.