You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When we have around 25 million entries in the feed, with Atom Hopper 1.2.33 and categories stored in a Postgres array, a search for entries with a given category value is prohibitively slow. When we get a page of entries, we also get the link to an older page. Finding the marker entry id for that link is what slows down everything. We find that atom hopper uses the SQL below.
Because of "limit 1", Postgres chooses to traverse the table using the primary key index (datelastupdated, id) instead of using the categories index.
We have also seen that categories can be stored in a dedicated column when a prefix column map is used, such that posting with the category customer:12312 would put 12312 in the column customer_id. That seems to be working well. However, I wonder why that option isn't properly documented. Are there great downsides to it, or better options? We don't want to rewrite Atom Hopper's SQL...
(SELECT * FROM entries WHERE feed = 'comp/dummyfeed' AND datelastupdated = '2017-05-16 11:22:16.710796' AND categories @> '{cat432542}'::varchar[] AND id < 14729912)
UNION ALL
(SELECT * FROM entries WHERE feed = 'comp/dummyfeed' AND datelastupdated < '2017-05-16 11:22:16.710796' AND categories @> '{cat432542}'::varchar[]
ORDER BY datelastupdated DESC, id DESC limit 1)
ORDER BY datelastupdated DESC, id DESC limit 1
The text was updated successfully, but these errors were encountered:
Hi, and thanks for this nice feed provider :)
When we have around 25 million entries in the feed, with Atom Hopper 1.2.33 and categories stored in a Postgres array, a search for entries with a given category value is prohibitively slow. When we get a page of entries, we also get the link to an older page. Finding the marker entry id for that link is what slows down everything. We find that atom hopper uses the SQL below.
We have also seen that categories can be stored in a dedicated column when a prefix column map is used, such that posting with the category customer:12312 would put 12312 in the column customer_id. That seems to be working well. However, I wonder why that option isn't properly documented. Are there great downsides to it, or better options? We don't want to rewrite Atom Hopper's SQL...
(SELECT * FROM entries WHERE feed = 'comp/dummyfeed' AND datelastupdated = '2017-05-16 11:22:16.710796' AND categories @> '{cat432542}'::varchar[] AND id < 14729912)
UNION ALL
(SELECT * FROM entries WHERE feed = 'comp/dummyfeed' AND datelastupdated < '2017-05-16 11:22:16.710796' AND categories @> '{cat432542}'::varchar[]
ORDER BY datelastupdated DESC, id DESC limit 1)
ORDER BY datelastupdated DESC, id DESC limit 1
The text was updated successfully, but these errors were encountered: