Rummager

Rummager indexes content into Elasticsearch and serves the GOV.UK search API.

Live examples

GOV.UK search

alphagov/finder-frontend uses the search API to render site search and finder pages (such as gov.uk/aaib-reports).

The public search API

https://www.gov.uk/api/search.json?q=taxes

For the most up-to-date query syntax and API output, see the Search API documentation.

You can also find some examples in the blog post: "Use the search API to get useful information about GOV.UK content".
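
As a minimal sketch of how a client might build the example URL above, the following Ruby uses only the standard library. The helper name `search_url` is hypothetical and is not part of Rummager. `q` is the only parameter shown in this README, so see the Search API documentation for the others.

```ruby
require "uri"

# Endpoint taken from the example URL in this README.
SEARCH_ENDPOINT = "https://www.gov.uk/api/search.json"

# Hypothetical helper: turn a hash of query parameters into a search URL.
# Values are form-encoded, so spaces and punctuation are escaped safely.
def search_url(params)
  uri = URI(SEARCH_ENDPOINT)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts search_url(q: "taxes")
```

To fetch results, the URL can be passed to `Net::HTTP.get(URI(url))` and the response body parsed with `JSON.parse`.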

Technical documentation

Rummager is a Sinatra application that interfaces with Elasticsearch.

There are two ways documents get added to a search index:

  1. HTTP requests to Rummager's Documents API (deprecated).
  2. RabbitMQ messages from the Publishing API, which Rummager subscribes to.

Note: Once Whitehall documents are using the new indexing process, the Documents API will be removed and Rummager will consume only from the Publishing API.

Rummager search results are weighted by popularity. We rebuild the index nightly to incorporate the latest analytics.

Nomenclature

  • Link: Either the base path for a content item, or an external link.
  • Document: An Elasticsearch document, something we can search for.
  • Document Type: An Elasticsearch document type specifies the fields for a particular type of document. All our document types are defined in config/schema/elasticsearch_types.
  • Index: An Elasticsearch search index. Rummager maintains several separate indices (detailed, government and govuk), but searches return documents from all of them.
  • Index Group: An alias in Elasticsearch that points to one index at a time. This allows us to rebuild indices without downtime.
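
To illustrate why an alias allows rebuilds without downtime, here is a rough sketch of the request body that Elasticsearch's `_aliases` endpoint accepts for repointing an alias in one atomic step. This is not Rummager's own code. The helper name and the index names (`govuk-old`, `govuk-new`) are hypothetical.

```ruby
require "json"

# Hypothetical helper: build the body for a POST to Elasticsearch's
# /_aliases endpoint. Both actions are applied atomically, so searches
# against the alias never see a moment where it points at no index.
def alias_swap_actions(alias_name, old_index, new_index)
  {
    actions: [
      { remove: { index: old_index, alias: alias_name } },
      { add: { index: new_index, alias: alias_name } },
    ],
  }
end

# Index names here are made up for illustration.
puts JSON.pretty_generate(alias_swap_actions("govuk", "govuk-old", "govuk-new"))
```

The new index can be fully populated before the swap, and the old one deleted afterwards.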

Dependencies

Creating search indexes from scratch

(This is not necessary when restoring from a backup or replicating data into the development VM)

To create an empty index:

bundle exec rake rummager:create_index[<index_name>]

To create an empty index for each of the Rummager indices:

RUMMAGER_INDEX=all bundle exec rake rummager:create_all_indices

Starting Elasticsearch

If you're running the GDS development VM, Elasticsearch must be running before you run the tests or start the application.

Elasticsearch should start when you start up your dev VM, but if it doesn't, run:

sudo service elasticsearch-development.development start

Running the test suite

bundle exec rake

Running the application

If you're running the GDS development VM:

cd /var/govuk/govuk-puppet/development-vm && bundle exec bowl rummager

Rummager should then be available at rummager.dev.gov.uk.

If you're not running the GDS development VM:

./startup.sh

Workers

Rummager uses Sidekiq to manage its indexing workload. To run this in the development VM, you need to run both of these commands:

# to start the Sidekiq process
bundle exec rake jobs:work

# to start the rummager webapp
bundle exec mr-sparkle --force-polling -- -p 3009

Publishing API integration

Rummager subscribes to a RabbitMQ queue of updates from publishing-api. This still requires Sidekiq to be running.

bundle exec rake message_queue:insert_data_into_govuk

There is also a separate process that listens only to 'links' updates from the Publishing API. This is used for updating old indices that are populated through the '/documents' API (government, detailed) and can be removed once those indices no longer exist.

bundle exec rake message_queue:listen_to_publishing_queue

Evaluating search results

The ab_tests parameter can be used to distinguish between two versions of the search query.
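
As a sketch of how the two versions might be requested, the snippet below builds one search URL per variant. The `name:variant` value format and the test name `example_test` are assumptions, not taken from this README, so check the Search API documentation for the exact syntax. The helper name is hypothetical.

```ruby
require "uri"

# Hypothetical helper: build one search URL per A/B variant.
# ASSUMPTION: ab_tests takes a "test_name:variant" value.
def ab_variant_urls(query, test_name, variants = %w[A B])
  variants.map do |variant|
    params = { q: query, ab_tests: "#{test_name}:#{variant}" }
    "https://www.gov.uk/api/search.json?" + URI.encode_www_form(params)
  end
end

# "example_test" is a made-up test name for illustration.
ab_variant_urls("taxes", "example_test").each { |url| puts url }
```

Fetching both URLs and comparing the result lists gives a quick manual check of how a variant changes the ranking.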

Using search-performance-explorer, you can compare the results side by side.

The health check script can be used to evaluate Rummager using a set of judgments about which documents are 'good' results for some sample queries.

Changing the schema/Reindexing

After changing the schema, you'll need to recreate the index. This reindexes documents from the existing index.

RUMMAGER_INDEX=all bundle exec rake rummager:migrate_schema

Internal only APIs

There are some other APIs that are only exposed internally. These are used by search admin.

Additional Docs

Licence

MIT License
