# OpenTofu Registry Search Contribution Guide

Welcome, and thank you for contributing to the OpenTofu Registry Search. Before you begin, please familiarize yourself with the README, which explains how to run the indexing and how to run the frontend.

## Signing off your commits

If you decide to contribute to this repository, we ask for a Developer Certificate of Origin sign-off on your commits. You can add one by passing the `-s` option to `git commit`:

```sh
git commit -s -m "Commit message"
```

Please make sure that the `user.name` and `user.email` settings in your local git configuration match your GitHub settings.
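
For example, you can set both values for the current repository as follows (the name and email are placeholders):

```sh
# Set the committer identity for this repository (placeholder values)
git config user.name "Your Name"
git config user.email "you@example.com"
```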

## Architecture

This repository is split into three parts:

### Backend

The backend uses libregistry to access the OpenTofu Registry dataset. It generates the JSON files containing version information, clones the repositories of each module and provider to extract their Markdown documentation, and uploads the results to an S3-style bucket. The backend only runs when we index data; the dataset is otherwise served statically to the frontend.

### Frontend

The frontend is built using ReactJS and fetches its data through the search worker from the API generated by the backend. (The API is currently hosted at api.opentofu.org.)

### Search worker

In order to power search, we index the data in a PostgreSQL database and provide responses to search queries using a Cloudflare worker. The search worker also proxies requests through to the S3-style bucket.

## Development environment

### Backend

You can run the backend by running `go run ./cmd/generate/main.go` in the `backend` directory. This command has a number of options, detailed below.

#### General options

| Option | Description |
|--------|-------------|
| `--licenses-file` | JSON file containing a list of SPDX codes of approved licenses to index. (Required.) |
| `--skip-update-providers` | Do not update providers; only update modules (if enabled) and regenerate the search index. |
| `--skip-update-modules` | Do not update modules; only update providers (if enabled) and regenerate the search index. |
| `--namespace` | Limit updates to a namespace. |
| `--name` | Limit updates to a name. Only works in conjunction with `--namespace`. For providers, this will result in a single provider getting updated. For modules, this will update all target systems under the name. |
| `--target-system` | Limit updates to a target system (module updates only). Only works in conjunction with `--namespace` and `--name`. |
| `--log-level` | Set the log level (trace, debug, info, warn, error). |
| `--registry-dir` | Directory to check out the registry in. |
| `--vcs-dir` | Directory to use for checking out providers and modules. |
| `--commit-parallelism` | Number of parallel uploads to use on commit. |
| `--tofu-binary-path` | Temporary: Tofu binary path to use for module schema extraction. This binary must support the `tofu metadata dump` command. |
| `--force-regenerate` | Force regenerating a namespace, name, or target system. This parameter is a comma-separated list of entries, each consisting of a namespace, a namespace and name separated by a `/`, or a namespace, name, and target system separated by `/`. Example: `namespace/name/targetsystem,othernamespace/othername` |
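
For example, an indexing run limited to a single provider might look like this (the namespace, name, and licenses file name below are illustrative):

```sh
# Index a single (hypothetical) provider using an approved-licenses file
go run ./cmd/generate/main.go \
  --licenses-file licenses.json \
  --namespace examplecorp \
  --name exampleprovider \
  --log-level info
```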

#### Storage options

The system supports storing the results either in a local directory or in an S3 bucket. Passing any of the S3 options automatically switches the storage to S3.

##### Local directory

By default, the storage will write the resulting documents to a local directory. You can change where this directory is located by passing the `--destination-dir` option.
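
For example, a local run could write the documents to a directory of your choosing (the paths are illustrative):

```sh
# Write the generated documents to ./generated instead of the default location
go run ./cmd/generate/main.go \
  --licenses-file licenses.json \
  --destination-dir ./generated
```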

##### S3 bucket

The S3 storage has the following options:

| Command line option | Environment variable | Description |
|---------------------|----------------------|-------------|
| | `AWS_ACCESS_KEY_ID` | Access key (required). |
| | `AWS_SECRET_ACCESS_KEY` | Secret key (required). |
| `--s3-bucket` | | S3 bucket to use for uploads (required). |
| `--s3-endpoint` | `AWS_ENDPOINT_URL_S3` | S3 endpoint to use for uploads. |
| `--s3-path-style` | | Use path-style URLs for S3. |
| `--s3-ca-cert-file` | `AWS_CA_BUNDLE` | File containing the CA certificate for the S3 endpoint. Defaults to the system certificates. |
| `--s3-region` | `AWS_REGION` | Region to use for S3 uploads. Defaults to auto-detection. |
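
As a sketch, an S3 upload could be configured as follows (the credentials, bucket name, and endpoint are placeholders):

```sh
# Credentials are passed via the environment (placeholder values)
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"

go run ./cmd/generate/main.go \
  --licenses-file licenses.json \
  --s3-bucket my-registry-bucket \
  --s3-endpoint https://s3.example.com
```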

#### Blocklist

This system supports explicitly blocking providers and modules from being indexed by using a blocklist. You can pass a JSON file using the `--blocklist` option in the following format:

```json
{
  "providers": {
    "namespace/name": "Reason why it was blocked (shown to the user).",
    "namespace/": "This entire namespace is blocked."
  },
  "modules": {
    "namespace/name/targetsystem": "Reason why it was blocked (shown to the user).",
    "namespace/name/": "Anything under this name is blocked.",
    "namespace//": "This entire namespace is blocked."
  }
}
```

As the listed providers and modules are still part of the registry dataset, they will still appear in the UI, but their documentation is not shown. This is useful for complying with DMCA take-down requests, working around erroneous license detection, or honoring the wishes of the provider/module author.
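
To apply a blocklist, pass the file to the indexing run (the file name is illustrative):

```sh
go run ./cmd/generate/main.go \
  --licenses-file licenses.json \
  --blocklist blocklist.json
```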

### Frontend

In order to run the frontend, enter the `frontend` directory, install the required dependencies by running `pnpm install`, and then run `pnpm run dev`. You can create a `.env` file to configure where the generated dataset and search API are:

```
VITE_DATA_API_URL=http://localhost:8000
```
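
Putting the steps together:

```sh
cd frontend
pnpm install
pnpm run dev
```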

### Search

The search API is independent of the backend because it requires a PostgreSQL database. It consists of an indexer to fill the database and a Cloudflare worker to answer search queries.

The indexer reads the search index feed from the `/search.ndjson` endpoint of the data generated by the backend and fills the database. You can pass the `--connection-string` option in order to supply it with a database connection.
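
As a sketch, an indexer run against a local PostgreSQL instance might look like this (the entry point and connection string are illustrative; check the repository for the actual command):

```sh
# Hypothetical entry point; only the --connection-string flag is documented above
go run ./cmd/index/main.go \
  --connection-string "postgres://postgres:postgres@localhost:5432/registry?sslmode=disable"
```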

The worker receives all calls to the API from the frontend and processes search requests, passing on any requests it cannot handle to the R2 (S3-style) bucket containing the dataset. For development, you can set up a `wrangler.toml` file in the following format:

```toml
#:schema node_modules/wrangler/config-schema.json
name = "registry-ui-search"
main = "src/index.ts"
compatibility_date = "2024-08-21"
compatibility_flags = ["nodejs_compat"]
```
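
With this configuration in place, you should be able to run the worker locally (assuming wrangler is installed as a dependency):

```sh
npx wrangler dev
```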

## GitHub Actions

This repository also contains an automated workflow for GitHub Actions. This workflow periodically runs the indexing and requires the following secrets to be set up:

- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_ENDPOINT_URL_S3`
- `S3_BUCKET`

Additionally, the Index (manual) workflow allows you to manually trigger the generation for a namespace, name, or target system, including an option to force the regeneration from scratch.