Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example of scope/method that has an association with a tsvector generated column #3

Closed
minimul opened this issue Feb 3, 2024 · 4 comments

Comments

@minimul
Copy link

minimul commented Feb 3, 2024

Context.

How would one do something like this with the pg_fulltext gem? (pg_search fantasy example but associations with tsvector column are not supported in that gem).

Thanks in advance.

  class Organization < ApplicationRecord
  
  pg_search_scope :searchable_name_description_location,
                  against: :searchable_name_description,
                  using: {
                    tsearch: {
                      dictionary: 'english',
                      tsvector_column: 'searchable_name_description'
                    }
                  },
                  associated_against: {
                    geo_locations: {
                      using: {
                         tsearch: {
                            dictionary: 'english',
                             tsvector_column: 'searchable_city_state_neighborhood'
                         }
                      }
                    }
                  }
@arcreative
Copy link
Owner

arcreative commented Feb 7, 2024

@minimul Hi there--the idea with this gem is that it just uses the existing TSVs that you provide instead of generating them on the fly (which is horrifically slow and hard on your database). In my projects, I set up triggers on my database tables in a migration so the TSV is always waiting, and you can optionally use JOINs to associate any other tables with it that you wish:

CREATE OR REPLACE FUNCTION update_users_tsv() RETURNS trigger AS $$
  BEGIN
    NEW.tsv := (
      SELECT
        setweight(to_tsvector('simple', NEW.id::varchar || ' ' || NEW.first_name || ' ' || NEW.last_name || ' ' || NEW.system_first_name || ' ' || NEW.system_last_name || ' ' || NEW.nickname || ' ' || NEW.email || ' ' || NEW.empl_id), 'A') ||
        setweight(to_tsvector('simple', NEW.about || ' ' || NEW.interests), 'B') ||
        setweight(to_tsvector('simple', array_to_string(array_agg(DISTINCT skills.name), ' ')), 'C') ||
        setweight(to_tsvector('simple', array_to_string(array_agg(DISTINCT certifications.title), ' ')), 'C') ||
        setweight(to_tsvector('simple', array_to_string(array_agg(DISTINCT certifications.school), ' ')), 'C') ||
        setweight(to_tsvector('simple', array_to_string(array_agg(DISTINCT cv_entries.account_name), ' ')), 'C') ||
        setweight(to_tsvector('simple', array_to_string(array_agg(DISTINCT cv_entries.account_name || ' ' || cv_entries.short_description || ' ' || cv_entries.description), ' ')), 'D')
      FROM users u
      LEFT JOIN assignments sa ON sa.subject_type = 'User' AND sa.subject_id = u.id AND sa.assignable_type = 'Skill'
      LEFT JOIN skills ON skills.id::varchar = sa.assignable_id
      LEFT JOIN cv_entries ON cv_entries.user_id = u.id
      LEFT JOIN certifications ON certifications.user_id = u.id
      WHERE u.id = NEW.id
      GROUP BY u.id
    );
    RETURN NEW;
  END;
$$ LANGUAGE plpgsql;

and

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON users FOR EACH ROW EXECUTE PROCEDURE
update_users_tsv();

Then your scope definition is as easy as

add_search_scope

in your model.

Just be aware that when JOINing, it might be finicky when the record is new (you can check the TG_OP within the trigger to conditionally update based on the users table only, or associated records on update), I just don't have that issue with this particular use case because our users are imported from another system.

Please note that you can also use multiple tsvector columns--one for "basic" searches and another for "advanced" searches with associated records, and you can just add a second search scope using the optional name parameter, e.g. add_search_scope(:search_associated).

Feel free to ping me with any additional questions if needed :-)

@minimul
Copy link
Author

minimul commented Feb 16, 2024

Thanks so much @arcreative

instead of generating them on the fly (which is horrifically slow and hard on your database).

What is your opinion regarding an article like this that states a generated column is performant with triggers?

@arcreative
Copy link
Owner

@minimul that looks cool, but I’m not sure if it would work with joins? I can’t remember where I saw it, but I think there was a limitation for generated columns that prevents you from using joins and thus creating associated tsvectors. If you get it working, let me know, as I could definitely use it! If you’re just doing simple tsvectors, generated looks like the way to go for its insert performance.

@minimul
Copy link
Author

minimul commented Feb 21, 2024

I did get two generated stored TSVECTOR columns working in a join but the problem I was having was returning an ActiveRecord::Relation.

I'll be soon working again on that code and I'll share something then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants