Decouple from Solr #4459

mjgiarlo · 2023-03-23T15:27:38Z

This is an "SDR evolution" ticket, the intent of which is to reduce dependencies within DSA. (HT to @jcoyne for the idea!)

As far as we know, Solr is used within DSA for:

* Reports (could move to Argo, which is more dependent on Solr)
* Embargo release
* Collection members
* Virtual objects

Since DSA already sits atop the source of truth for Cocina (Postgres), and it's queryable, DSA can get this information directly from Postgres without needing to consult Solr.
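As a rough illustration of what "directly from Postgres" could look like for collection members, the sketch below builds a JSONB containment query. The table name `dros`, the columns `external_identifier` and `structural`, the `isMemberOf` key, and the helper `collection_members_query` are all assumptions made for illustration, not confirmed details of the DSA schema. The `@>` operator is standard Postgres JSONB containment.

```ruby
require 'json'

# Hypothetical helper: returns SQL plus bind parameters for finding the
# members of a collection. Table, column, and key names are assumed for
# illustration and may not match the real DSA schema.
def collection_members_query(collection_druid)
  sql = <<~SQL
    SELECT external_identifier
    FROM dros
    WHERE structural @> $1::jsonb
  SQL
  # JSONB containment: matches rows whose isMemberOf array includes the druid.
  containment = JSON.generate('isMemberOf' => [collection_druid])
  [sql, [containment]]
end

sql, params = collection_members_query('druid:bc123df4567')
puts sql
puts params.inspect
```

The SQL and parameters are kept separate so the druid is passed as a bind value rather than interpolated into the query string.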

Fixes #4459

This is a spike commit towards an SDR Evolution the team has been batting around for a while now, namely severing DSA's dependency on Solr. The spike largely replaces Solr queries with direct DB queries, and for most use cases this works just fine. The key word here is "most":

* The Solr queries have been replaced with DB queries that reach into JSONB columns, which results in table scans. I tested all of these queries in stage with large-ish, but not prod-huge, data sets (~25K records) and most of them perform fine. That said, we might want to test this with prod-like data and do some benchmarking to determine if we want to index more of the JSONB data.
* A notable performance outlier is `MemberService.for`, which needs to make a single Workflow API call for *each* member of a virtual object. These are impressively slow for a virtual object with a few thousand members, taking over a minute to complete.

Another question we'd need to answer to take this work forward is what to do about `bin/generate-druid-list`, which allows a user to issue Solr queries directly, and `lib/tasks/missing_druids.rake`, which compares what's in the DSA DB and what's in Solr to determine if any objects need (re-)indexing. Are these still useful? If so, could they live elsewhere or could we solve these problems in a different way? If the answer is no, we may not want to proceed with this decoupling.

**NOTE:** Since this is a spike meant to generate discussion, I have not yet bothered with changing the tests (or caring about linting). That will naturally come later if we decide the idea and implementation have merit.
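To make the `MemberService.for` concern concrete, here is a minimal sketch that counts round trips for a virtual object with 3,000 members. `FakeWorkflowClient` and both of its methods are hypothetical stand-ins. In particular, the bulk `statuses` lookup is an assumption used only for comparison; this sketch does not claim the Workflow API offers such an endpoint.

```ruby
# Stand-in for a remote workflow client. It counts round trips instead of
# making HTTP calls. Both lookup methods are hypothetical.
class FakeWorkflowClient
  attr_reader :calls

  def initialize(statuses)
    @statuses = statuses
    @calls = 0
  end

  # One round trip per druid (the pattern described for MemberService.for).
  def status(druid)
    @calls += 1
    @statuses.fetch(druid)
  end

  # Hypothetical bulk lookup: one round trip for a whole slice of druids.
  def statuses(druids)
    @calls += 1
    @statuses.slice(*druids)
  end
end

members = (1..3000).map { |i| format('druid:member%04d', i) }
data = members.to_h { |druid| [druid, 'accessioned'] }

per_member = FakeWorkflowClient.new(data)
members.each { |druid| per_member.status(druid) }

batched = FakeWorkflowClient.new(data)
members.each_slice(500) { |slice| batched.statuses(slice) }

puts "per-member round trips: #{per_member.calls}" # 3000
puts "batched round trips:    #{batched.calls}"    # 6
```

The round-trip count grows linearly with membership in the per-member case, which is consistent with the slowdown reported above for large virtual objects.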

mjgiarlo added tech debt SDR migration carryover labels Mar 23, 2023

ndushay removed the SDR migration carryover label Aug 3, 2023

mjgiarlo self-assigned this Aug 30, 2023

mjgiarlo linked a pull request Aug 30, 2023 that will close this issue

[SPIKE] Decouple DSA from Solr #4578

Draft
