Rummager indexes content into Elasticsearch and serves the GOV.UK search API.
alphagov/finder-frontend uses the search API to render site search and finder pages (such as gov.uk/aaib-reports).
For example, to search for "taxes":
https://www.gov.uk/api/search.json?q=taxes
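The same request can be built from code. The sketch below uses only Ruby's standard library. The count parameter is included as an assumption, so check the Search API documentation for the parameters it currently supports. The search_url and search helper names are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Base URL of the public search API, as in the example above.
SEARCH_API = "https://www.gov.uk/api/search.json"

# Build a search API URL from a hash of query parameters.
# URI.encode_www_form escapes values, e.g. a space becomes "+".
def search_url(params)
  uri = URI(SEARCH_API)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Fetch a page of results and parse the JSON body.
# Not called here, because it needs network access.
def search(params)
  JSON.parse(Net::HTTP.get(URI(search_url(params))))
end

puts search_url(q: "taxes", count: 5) # prints https://www.gov.uk/api/search.json?q=taxes&count=5
```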
For the most up-to-date query syntax and API output, see the Search API documentation.
You can also find some examples in the blog post: "Use the search API to get useful information about GOV.UK content".
Rummager is a Sinatra application that interfaces with Elasticsearch.
There are two ways documents get added to a search index:
- HTTP requests to Rummager's Documents API (deprecated)
- RabbitMQ messages from the Publishing API, which Rummager subscribes to
Note: Once Whitehall documents are using the new indexing process, the Documents API will be removed and Rummager will consume only from the Publishing API.
Rummager search results are weighted by popularity. We rebuild the index nightly to incorporate the latest analytics.
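To illustrate why the analytics data matters, here is a toy example of blending a popularity signal into a relevance score. The formula, the Result struct and the example links are hypothetical and are not how Rummager actually computes its weighting; the point is only that a fresher popularity rank can change the final ordering.

```ruby
# Toy illustration only: the formula below is hypothetical and is not
# Rummager's actual popularity weighting.
Result = Struct.new(:link, :relevance, :popularity_rank)

# Rank 1 is the most-viewed page, so lower ranks get a bigger boost.
def popularity_boost(rank)
  1.0 / rank
end

# Scale the text-relevance score by (1 + boost).
def weighted_score(result)
  result.relevance * (1.0 + popularity_boost(result.popularity_rank))
end

results = [
  Result.new("/rarely-visited-page", 1.0, 1000), # slightly better text match
  Result.new("/frequently-visited-page", 0.9, 1), # far more popular
]

# Highest weighted score first.
ranked = results.sort_by { |r| -weighted_score(r) }
puts ranked.map(&:link).inspect # prints ["/frequently-visited-page", "/rarely-visited-page"]
```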
- Link: Either the base path for a content item, or an external link.
- Document: An Elasticsearch document, something we can search for.
- Document Type: An Elasticsearch document type specifies the fields for a particular type of document. All our document types are defined in config/schema/elasticsearch_types.
- Index: An Elasticsearch search index. Rummager maintains several separate indices (detailed, government and govuk), but searches return documents from all of them.
- Index Group: An alias in Elasticsearch that points to one index at a time. This allows us to rebuild indices without downtime.
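An index group makes zero-downtime rebuilds possible because Elasticsearch can repoint an alias atomically. The sketch below builds the body for Elasticsearch's POST /_aliases endpoint, which applies all of its actions in one step. The index names govuk-old and govuk-new are hypothetical placeholders, not Rummager's naming scheme.

```ruby
require "json"

# Body for Elasticsearch's POST /_aliases endpoint. Elasticsearch applies the
# listed actions atomically, so searches against the alias never see a moment
# where it points at no index.
def swap_alias_body(alias_name, old_index, new_index)
  {
    "actions" => [
      { "remove" => { "index" => old_index, "alias" => alias_name } },
      { "add" => { "index" => new_index, "alias" => alias_name } },
    ],
  }
end

# Hypothetical index names; the alias is the index group name.
body = swap_alias_body("govuk", "govuk-old", "govuk-new")
puts JSON.pretty_generate(body)
```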
Rummager depends on:
- Elasticsearch - "You Know, for Search...".
- Redis - used by indexing workers.
You only need to create indices when you aren't restoring from a backup or replicating data into the development VM.
To create an empty index:
bundle exec rake rummager:create_index[<index_name>]
To create empty indices for all of Rummager's indices:
RUMMAGER_INDEX=all bundle exec rake rummager:create_all_indices
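How the rake tasks read RUMMAGER_INDEX isn't described here. As a hypothetical sketch, resolving the variable might look like this: "all" expands to the three indices named in the glossary, and anything else is treated as a comma-separated list. The comma-separated form and the indices_from_env helper are assumptions, not documented features.

```ruby
# Hypothetical sketch: the index names come from this README, but the parsing
# rules (including comma-separated lists) are assumptions.
ALL_INDICES = %w[detailed government govuk].freeze

def indices_from_env(env = ENV)
  value = env.fetch("RUMMAGER_INDEX") # raises KeyError if the variable is unset
  return ALL_INDICES.dup if value == "all"

  names = value.split(",").map(&:strip)
  unknown = names - ALL_INDICES
  raise ArgumentError, "unknown index: #{unknown.join(', ')}" unless unknown.empty?

  names
end
```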
If you're running the GDS development VM, you need to have Elasticsearch running before running the tests or starting the application.
Elasticsearch should start when you start up your dev VM, but if it doesn't, run:
sudo service elasticsearch-development.development start
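If you want a script or test setup to fail early when Elasticsearch isn't up, a small check like the one below can help. It assumes Elasticsearch is listening on its default address, http://localhost:9200; the helper name elasticsearch_up? is hypothetical.

```ruby
require "net/http"
require "uri"

# Return true if an Elasticsearch node answers HTTP at the given URL.
# http://localhost:9200 is Elasticsearch's default; adjust for your setup.
def elasticsearch_up?(url = "http://localhost:9200")
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 2) do |http|
    http.get("/").is_a?(Net::HTTPSuccess)
  end
rescue SystemCallError, SocketError, Net::OpenTimeout, Net::ReadTimeout
  # Connection refused, unknown host or timeout: treat as "not running".
  false
end
```

For example, a script could abort with a helpful message unless elasticsearch_up? returns true, before doing anything else.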
To run the tests:
bundle exec rake
If you're running the GDS development VM:
cd /var/govuk/govuk-puppet/development-vm && bundle exec bowl rummager
Rummager should then be available at rummager.dev.gov.uk.
If you're not running the GDS development VM:
./startup.sh
Rummager uses Sidekiq to manage its indexing workload. To run this in the development VM, you need to run both of these commands:
# to start the Sidekiq process
bundle exec rake jobs:work
# to start the rummager webapp
bundle exec mr-sparkle --force-polling -- -p 3009
Rummager subscribes to a RabbitMQ queue of updates from the Publishing API. Processing these updates still requires Sidekiq to be running. To start the listener:
bundle exec rake message_queue:insert_data_into_govuk
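Each message from the Publishing API carries a content item. A hypothetical sketch of turning one into a search document follows. The input fields (base_path, content_id, title, description, document_type) are standard Publishing API content item fields, but the output field names, the mapping and the search_document_from helper are assumptions rather than Rummager's actual transformation. The link field follows the glossary definition of a Link as a content item's base path.

```ruby
require "json"

# Hypothetical mapping from a Publishing API content item to a search
# document. Input field names are standard content item fields; the output
# field names are assumptions.
def search_document_from(content_item)
  {
    "link" => content_item.fetch("base_path"), # a Link is the base path
    "content_id" => content_item["content_id"],
    "title" => content_item["title"],
    "description" => content_item["description"],
    "document_type" => content_item["document_type"],
  }
end

# A made-up message payload for illustration.
payload = JSON.parse(<<~PAYLOAD)
  {
    "content_id": "00000000-0000-0000-0000-000000000000",
    "base_path": "/example-guide",
    "title": "Example guide",
    "description": "An example content item",
    "document_type": "guide"
  }
PAYLOAD

puts search_document_from(payload)["link"] # prints /example-guide
```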
There is also a separate process that listens only to 'links' updates from the Publishing API. It is used to update the old indices that are populated through the '/documents' API (government and detailed), and can be removed once those indices no longer exist.
bundle exec rake message_queue:listen_to_publishing_queue
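To tell 'links' updates apart from other messages, a consumer can look at each message's routing key. This sketch assumes routing keys of the form <schema_name>.<event_type>, for example guide.links; treat that format and the links_update? helper as assumptions and check the Publishing API documentation before relying on them.

```ruby
# Hypothetical: assumes routing keys look like "<schema_name>.<event_type>".
def links_update?(routing_key)
  routing_key.split(".").last == "links"
end

keys = %w[guide.major guide.links detailed_guide.links]
links_only = keys.select { |key| links_update?(key) }
puts links_only.inspect # prints ["guide.links", "detailed_guide.links"]
```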
The ab_tests parameter can be used to switch between two variants of the search query.
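To compare variants, you can request the same query once per variant. The sketch below assumes the parameter takes a value of the form <test_name>:<variant>; that format, the test name relevance and the ab_urls helper are assumptions for illustration.

```ruby
require "uri"

# Hypothetical: assumes ab_tests takes "<test_name>:<variant>".
# Returns a hash of variant => URL for the same query.
def ab_urls(query, test_name, base: "https://www.gov.uk/api/search.json")
  %w[A B].each_with_object({}) do |variant, urls|
    uri = URI(base)
    uri.query = URI.encode_www_form(q: query, ab_tests: "#{test_name}:#{variant}")
    urls[variant] = uri.to_s
  end
end

ab_urls("taxes", "relevance").each { |variant, url| puts "#{variant}: #{url}" }
```

Note that URI.encode_www_form escapes the colon as %3A, which servers decode back to ":".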
Using search-performance-explorer, you can compare the results side by side.
The health check script can be used to evaluate Rummager using a set of judgments about which documents are 'good' results for some sample queries.
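The health check's actual metrics aren't described here. As a hypothetical sketch of the general idea, the function below scores a set of judgements by the fraction of queries that have at least one 'good' link in their top N results. The data shapes, the metric and the example links are made up for illustration.

```ruby
# Hypothetical judgement-based score: fraction of queries with at least one
# 'good' link among the top_n results.
def hit_rate(judgements, results, top_n: 3)
  return 0.0 if judgements.empty?

  hits = judgements.count do |query, good_links|
    top = results.fetch(query, []).first(top_n)
    (top & good_links).any?
  end
  hits.fdiv(judgements.size)
end

judgements = { "vat" => ["/vat-rates"], "passport" => ["/renew-adult-passport"] }
results = {
  "vat" => ["/vat-rates", "/vat-registration"],
  # The good link is 4th here, so it falls outside the top 3.
  "passport" => ["/get-a-passport-urgently", "/passport-fees", "/apply-first-adult-passport", "/renew-adult-passport"],
}
puts hit_rate(judgements, results) # prints 0.5
```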
After changing the schema, you'll need to recreate the indices. The following task does this and reindexes the documents from the existing indices:
RUMMAGER_INDEX=all bundle exec rake rummager:migrate_schema
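Reindexing means copying every document from the old index into a new one created with the updated schema. The sketch below shows the shape of an Elasticsearch _bulk request body that would do the copying; it is not the migrate_schema task's implementation. It includes _type in each action line, assuming an Elasticsearch version that still has document types (as the glossary implies); newer versions drop that field. The index name govuk-new, the document type and the documents are hypothetical.

```ruby
require "json"

# A document to copy: its id, its document type and its source fields.
Doc = Struct.new(:id, :type, :source)

# Build a newline-delimited _bulk body: for each document, an action line
# naming the target index, then the document source. The body must end
# with a newline.
def bulk_index_body(target_index, docs)
  lines = docs.flat_map do |doc|
    [
      { "index" => { "_index" => target_index, "_type" => doc.type, "_id" => doc.id } }.to_json,
      doc.source.to_json,
    ]
  end
  lines.join("\n") + "\n"
end

docs = [
  Doc.new("/example-guide", "edition", { "title" => "Example guide" }),
  Doc.new("/example-news", "edition", { "title" => "Example news" }),
]
body = bulk_index_body("govuk-new", docs)
print body
```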
There are some other APIs that are only exposed internally:
- doc/content-api.md for the /content/* endpoint.
- doc/documents.md for the /documents/* endpoint.
These are used by search admin.
Further documentation:
- New indexing process: how to update a format to use the new indexing process
- Schemas: how to work with schemas and the document types
- Popularity information: Rummager uses Google Analytics data to improve search results.
- Publishing advanced search: Information about the advanced search finder
- Publishing document finders: Information about publishing finders using rake tasks