The mapped concept set query is very slow on sqlite when the concept set is large, sometimes taking minutes to complete. The query lives in OhdsiShinyModules/R/cohort-diagnostics-shared.R (line 727 at commit 42f5bb4). It can be improved by adding indexes and performing the final join in R instead of in the database. @cebarboza and I were working on it.
Indexes drastically sped up this part of the query:
WITH resolved_concepts_mapped AS (
  SELECT concept_sets.concept_id AS resolved_concept_id,
    c1.concept_id,
    c1.concept_name,
    c1.domain_id,
    c1.vocabulary_id,
    c1.concept_class_id,
    c1.standard_concept,
    c1.concept_code
  FROM (
    SELECT DISTINCT concept_id
    FROM @schema.@resolved_concepts
    WHERE database_id IN (@databaseIds)
      AND cohort_id = @cohort_id
  ) concept_sets
  INNER JOIN @schema.@concept_relationship cr
    ON concept_sets.concept_id = cr.concept_id_2
  INNER JOIN @schema.@concept c1
    ON cr.concept_id_1 = c1.concept_id
  WHERE cr.relationship_id = 'Maps to'
    AND c1.standard_concept IS NULL
)
However, the final join was still slow. It is a many-to-many join that repeats each row of the mapped concept table (the result from above) for every combination of database, concept set, and cohort it appears in.
SELECT
c.database_id,
c.cohort_id,
c.concept_set_id,
mapped.*
FROM (SELECT DISTINCT concept_id, database_id, cohort_id, concept_set_id FROM @schema.@resolved_concepts) c
INNER JOIN resolved_concepts_mapped mapped ON c.concept_id = mapped.resolved_concept_id
{@cohort_id != ''} ? { WHERE c.cohort_id = @cohort_id};
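For context, the @-prefixed names and the {...} ? {...} block are SqlRender parameter and conditional syntax. A minimal sketch of rendering the full query (CTE plus final SELECT) and running it against sqlite; sql, con, and the parameter values here are illustrative assumptions, not the app's actual code:

# Render the SqlRender template, translate to the sqlite dialect, and execute
library(SqlRender)
rendered <- render(sql,
                   schema = "main",
                   resolved_concepts = "resolved_concepts",
                   concept_relationship = "concept_relationship",
                   concept = "concept",
                   databaseIds = "'database1'",
                   cohort_id = 1001)
translated <- translate(rendered, targetDialect = "sqlite")
mappedConcepts <- DBI::dbGetQuery(con, translated)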
However, when we downloaded the result of SELECT DISTINCT concept_id, database_id, cohort_id, concept_set_id FROM @schema.@resolved_concepts into R and then did the join there, it was very fast.
Perhaps there is a way to speed up the whole query in SQL. For us, adding indexes, splitting the query in two, and doing the final join in R fixed the slowness in the shiny app.
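A minimal sketch of the split approach, assuming con is a DBI connection to the sqlite file and mappedSql holds the rendered query for the CTE portion above (both names are illustrative):

# Run the two halves separately and do the many-to-many join in R
mapped <- DBI::dbGetQuery(con, mappedSql)
lookup <- DBI::dbGetQuery(con,
  "SELECT DISTINCT concept_id, database_id, cohort_id, concept_set_id
     FROM resolved_concepts;")
# merge() performs the inner join in memory, which was much faster for us
mappedConceptSets <- merge(lookup, mapped,
                           by.x = "concept_id", by.y = "resolved_concept_id")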
I was also thinking that the app might not really need to do this final join at all, since it duplicates a lot of data. In any case, we are recording our findings in this issue and can make a PR if the approach we came up with seems reasonable.
The proposal would be to optimize the mappedConceptSet function by adding indexes if they are not already there (for sqlite only, since I'm not sure adding indexes works across all dbms), splitting the query into two parts as shown above, and then doing the final join in R instead of in SQL.
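Since sqlite supports CREATE INDEX IF NOT EXISTS, the indexes can be created idempotently; a minimal sketch, guarded so it only touches sqlite connections (the index shown is one of the three listed below):

# Only create the index on sqlite; other dbms are left alone
if (inherits(con, "SQLiteConnection")) {
  DBI::dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_resolved_concepts
    ON resolved_concepts (concept_id, database_id, cohort_id, concept_set_id);")
}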
Adding these indexes to the sqlite database improves the speed and user experience of the cohort diagnostics shiny app:
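# con is an open DBI connection to the cohort diagnostics sqlite database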
DBI::dbExecute(con, "CREATE INDEX idx_resolved_concepts ON resolved_concepts (concept_id, database_id, cohort_id, concept_set_id);")
DBI::dbExecute(con, "CREATE INDEX idx_concept_relationship ON concept_relationship (concept_id_2, relationship_id);")
DBI::dbExecute(con, "CREATE INDEX idx_concept_main ON concept (concept_id, standard_concept);")
We could also consider pre-joining this data once when we build the sqlite database, rather than joining in the database while the user is interacting with the app. It would add some columns to the resolved_concepts table and so require more disk space, but it might be better for the app's user experience.
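A sketch of what that pre-join could look like at build time; mapped_concepts_flat is a hypothetical table name, and the per-cohort filters from the CTE are dropped because all combinations get materialized up front:

# Materialize the many-to-many join once when the sqlite file is built
DBI::dbExecute(con, "
  CREATE TABLE mapped_concepts_flat AS
  WITH resolved_concepts_mapped AS (
    SELECT concept_sets.concept_id AS resolved_concept_id,
           c1.concept_id, c1.concept_name, c1.domain_id, c1.vocabulary_id,
           c1.concept_class_id, c1.standard_concept, c1.concept_code
    FROM (SELECT DISTINCT concept_id FROM resolved_concepts) concept_sets
    INNER JOIN concept_relationship cr ON concept_sets.concept_id = cr.concept_id_2
    INNER JOIN concept c1 ON cr.concept_id_1 = c1.concept_id
    WHERE cr.relationship_id = 'Maps to' AND c1.standard_concept IS NULL
  )
  SELECT c.database_id, c.cohort_id, c.concept_set_id, mapped.*
  FROM (SELECT DISTINCT concept_id, database_id, cohort_id, concept_set_id
        FROM resolved_concepts) c
  INNER JOIN resolved_concepts_mapped mapped
    ON c.concept_id = mapped.resolved_concept_id;")

The app could then read from this table directly (optionally with an index on cohort_id) instead of running the join on every request.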