diff --git a/.github/ISSUE_TEMPLATE/backend.yml b/.github/ISSUE_TEMPLATE/backend.yml new file mode 100644 index 00000000..dff9473a --- /dev/null +++ b/.github/ISSUE_TEMPLATE/backend.yml @@ -0,0 +1,11 @@ +name: "Backend bug" +description: "Report a problem with the indexed dataset" +labels: ["bug","Backend"] +body: + - type: textarea + id: text + attributes: + label: Please explain the bug + value: + validations: + required: true diff --git a/.github/ISSUE_TEMPLATE/beta-feedback.yml b/.github/ISSUE_TEMPLATE/beta-feedback.yml new file mode 100644 index 00000000..0ab49df0 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/beta-feedback.yml @@ -0,0 +1,11 @@ +name: "Beta: Feedback" +description: "General feedback for the beta." +labels: ["feedback"] +body: + - type: textarea + id: text + attributes: + label: Feedback + value: + validations: + required: true diff --git a/.github/ISSUE_TEMPLATE/beta-inclusion.yml b/.github/ISSUE_TEMPLATE/beta-inclusion.yml new file mode 100644 index 00000000..3770bdee --- /dev/null +++ b/.github/ISSUE_TEMPLATE/beta-inclusion.yml @@ -0,0 +1,12 @@ +name: "Beta: Add a provider/module" +description: "The beta does not include the full registry dataset. Use this issue to request the inclusion of a provider." +labels: ["feedback"] +body: + - type: textarea + id: repository + attributes: + label: Repository URL + placeholder: https://github.com/... 
+ value: + validations: + required: true diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 00000000..a49eab2f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1 @@ +blank_issues_enabled: true \ No newline at end of file diff --git a/.github/ISSUE_TEMPLATE/frontend.yml b/.github/ISSUE_TEMPLATE/frontend.yml new file mode 100644 index 00000000..4a69a6df --- /dev/null +++ b/.github/ISSUE_TEMPLATE/frontend.yml @@ -0,0 +1,33 @@ +name: "Frontend bug" +description: "Report a rendering error or bug in the frontend" +labels: ["bug","frontend"] +body: + - type: textarea + id: url + attributes: + label: URL of the page that's broken + placeholder: https://search.opentofu.org/... + value: + validations: + required: true + - type: textarea + id: screenshot + attributes: + label: Screenshot + value: + validations: + required: true + - type: textarea + id: browser + attributes: + label: OS, browser version, installed extensions + value: + validations: + required: true + - type: textarea + id: extra + attributes: + label: Additional information + value: + validations: + required: false diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 00000000..f092512b --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,7 @@ + +## Checklist + +- [ ] I have read the [contribution guide](https://github.com/opentofu/opentofu/blob/main/CONTRIBUTING.md). +- [ ] I have not used an AI coding assistant to create this PR. +- [ ] My contribution is compatible with the MPL-2.0 license and I have provided a DCO sign-off on all my commits. +- [ ] I have written all code in this PR myself OR I have marked all code I have not written myself (including modified code, e.g. copied from other places and then modified) with a comment indicating where it came from. 
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..752d264b --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,142 @@ +# OpenTofu Registry Search Contribution Guide + +Welcome and thank you for contributing to the OpenTofu Registry Search. Before you begin, please familiarize yourself with the [README](README.md), which contains all the information on how to run the indexing and how to run the frontend. + +## Signing off your commits + +If you decide to contribute to this repository, we ask for a [Developer Certificate of Origin](https://developercertificate.org/) sign-off on your commits. You can do so by passing the `-s` option to `git commit`: + +``` +git commit -s -m "Commit message" +``` + +Please make sure that the `user.name` and `user.email` settings in your local git config match your GitHub settings. + +## Architecture + +This repository is split into 3 parts. + +### Backend + +The backend uses [libregistry](https://github.com/opentofu/libregistry) to access the [OpenTofu Registry dataset](https://github.com/opentofu/registry) in order to generate the JSON files containing version information, clone the repositories of each module and provider to extract the markdown documentation, and upload it to an S3-style bucket. The backend only runs when we index data; the dataset is otherwise statically served to the frontend. + +### Frontend + +The frontend is built using ReactJS and fetches the data from the API generated by the backend through the search worker. (This is currently hosted at [api.opentofu.org](https://api.opentofu.org/).) + +### Search worker + +In order to power search, we index the data in a PostgreSQL database and provide responses to search queries using a Cloudflare worker. The search worker also proxies requests through to the S3-style bucket. + +## Development environment + +### Backend + +You can run the backend by running `go run ./cmd/generate/main.go` in the [`backend`](backend) directory. This command
This command +has a number of options detailed below. + +#### General options + +| Option | Description | +|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `--licenses-file` | JSON file containing a list SPDX codes of approved licenses to index. (Required) | | +| `--skip-update-providers` | Do not update providers, only update modules (if enabled) and regenerate the search index. | +| `--skip-update-modules` | Do not update modules, only update providers (if enabled) and regenerate the search index. | +| `--namespace` | Limit updates to a namespace. | +| `--name` | Limit updates to a name. Only works in conjunction with `--namespace`. For providers, this will result in a single provider getting updated. For modules, this will update all target systems under a name. | +| `--target-system` | Limit updates to a target system for module updates only. Only works in conjunction with `--namespace` and `--name`. | +| `--log-level` | "Set the log level (trace, debug, info, warn, error). | +| `--registry-dir` | Directory to check out the registry in. | +| `--vcs-dir` | Directory to use for checking out providers and modules in. | +| `--commit-parallelism` | Parallel uploads to use on commit. | +| `--tofu-binary-path` | Temporary: Tofu binary path to use for module schema extraction. This binary must support the `tofu metadata dump` command. | +| `--force-regenerate` | Force regenerating a namespace, name, or target system. This parameter is a comma-separate list consisting of either a namespace, a namespace and a name separated by a `/`, or a namespace, name and target system separated by a `/`. 
Example: `namespace/name/targetsystem,othernamespace/othername` | + +#### Storage options + +The system supports storing in a local directory or in an S3 bucket. Passing S3 options automatically switches to +S3 storage. + +##### Local directory + +By default, the storage will write the resulting documents in a local directory. You can change where this directory +is located by passing the `--destination-dir` option. + +##### S3 bucket + +The S3 storage has the following options: + +| Command line option | Environment variable | Description | +|---------------------|-------------------------|----------------------------------------------------------------------------------------------| +| | `AWS_ACCESS_KEY_ID` | Access key (required). | +| | `AWS_SECRET_ACCESS_KEY` | Secret key (required). | +| `--s3-bucket` | | S3 bucket to use for uploads (required). | +| `--s3-endpoint` | `AWS_ENDPOINT_URL_S3` | S3 endpoint to use for uploads. | +| `--s3-path-style` | | Use path-style URLs for S3. | +| `--s3-ca-cert-file` | `AWS_CA_BUNDLE` | File containing the CA certificate for the S3 endpoint. Defaults to the system certificates. | +| `--s3-region` | `AWS_REGION` | Region to use for S3 uploads. Defaults to auto-detection. | + +#### Blocklist + +This system supports explicitly blocking providers and modules from being indexed by using a blocklist. You can pass a JSON file in the following format using the `--blocklist` option: + +```json +{ + "providers": { + "namespace/name": "Reason why it was blocked (shown to the user).", + "namespace/": "This entire namespace is blocked." + }, + "modules": { + "namespace/name/targetsystem": "Reason why it was blocked (shown to the user).", + "namespace/name/": "Anything under this name is blocked.", + "namespace//": "This entire namespace is blocked." + } +} +``` + +As the listed providers and modules are in the registry dataset, they will still appear in the UI, but the contents of +the documentation are not shown. 
This is useful to comply with DMCA take-down requests, to correct erroneous license detection, or +to honor the wishes of the provider/module author. + +### Frontend + +In order to run the frontend, enter the [frontend](frontend) directory and run `npm run dev`. You can create a `.env` +file to configure where the generated dataset and search API are: + +```env +VITE_DATA_API_URL=http://localhost:8000 +``` + +### Search + +The search API is independent of the backend because it requires a PostgreSQL database. It consists of an indexer to +fill the database and a Cloudflare worker to answer search queries. + +The indexer reads from the generated search index feed located at the `/search.ndjson` endpoint of the generated data +from the backend and fills the database. You can pass the `--connection-string` option in order to supply it with a +database connection. + +The worker receives all calls to the API from the frontend and processes search requests, passing on any requests it +cannot handle to the R2 (S3-style) bucket containing the dataset. For development, you can set up a `wrangler.toml` in +the following format: + +```toml +#:schema node_modules/wrangler/config-schema.json +name = "registry-ui-search" +main = "src/index.ts" +compatibility_date = "2024-08-21" +compatibility_flags = ["nodejs_compat"] +``` + +## GitHub Actions + +This repository also contains an automated workflow for GitHub Actions. This workflow periodically runs the indexing and +requires the following secrets to be set up: + +- `AWS_ACCESS_KEY_ID` +- `AWS_SECRET_ACCESS_KEY` +- `AWS_ENDPOINT_URL_S3` +- `S3_BUCKET` + +Additionally, the `Index (manual)` workflow allows you to manually trigger the generation for a namespace, name, or +target system, including an option to force the regeneration from scratch. 
diff --git a/README.md b/README.md index dee32ee4..c82096f8 100644 --- a/README.md +++ b/README.md @@ -1,114 +1,4 @@ -# OpenTofu registry user interface +# OpenTofu Registry Search -This repository contains the code to generate the dataset for the OpenTofu Registry user interface (backend), -the user interface itself (frontend), as well as the Cloudflare worker powering search. - -## Backend - -You can run the backend by running `go run ./cmd/generate/main.go` in the [`backend`](backend) directory. This command -has a number of options detailed below. - -### General options - -| Option | Description | -|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `--licenses-file` | JSON file containing a list SPDX codes of approved licenses to index. (Required) | | -| `--skip-update-providers` | Do not update providers, only update modules (if enabled) and regenerate the search index. | -| `--skip-update-modules` | Do not update modules, only update providers (if enabled) and regenerate the search index. | -| `--namespace` | Limit updates to a namespace. | -| `--name` | Limit updates to a name. Only works in conjunction with `--namespace`. For providers, this will result in a single provider getting updated. For modules, this will update all target systems under a name. | -| `--target-system` | Limit updates to a target system for module updates only. Only works in conjunction with `--namespace` and `--name`. | -| `--log-level` | "Set the log level (trace, debug, info, warn, error). | -| `--registry-dir` | Directory to check out the registry in. | -| `--vcs-dir` | Directory to use for checking out providers and modules in. | -| `--commit-parallelism` | Parallel uploads to use on commit. 
| -| `--tofu-binary-path` | Temporary: Tofu binary path to use for module schema extraction. This binary must support the `tofu metadata dump` command. | -| `--force-regenerate` | Force regenerating a namespace, name, or target system. This parameter is a comma-separate list consisting of either a namespace, a namespace and a name separated by a `/`, or a namespace, name and target system separated by a `/`. Example: `namespace/name/targetsystem,othernamespace/othername` | - -### Storage options - -The system supports storing in a local directory or in an S3 bucket. Passing S3 options automatically switches to the -S3 option. - -#### Local directory - -By default, the storage will write the resulting documents in a local directory. You can change where this directory -is located by passing the `--destination-dir` option. - -#### S3 bucket - -The S3 storage has the following options: - -| Command line option | Environment variable | Description | -|---------------------|-------------------------|----------------------------------------------------------------------------------------------| -| | `AWS_ACCESS_KEY_ID` | Access key (required). | -| | `AWS_SECRET_ACCESS_KEY` | Secret key (required). | -| `--s3-bucket` | | S3 bucket to use for uploads (required). | -| `--s3-endpoint` | `AWS_ENDPOINT_URL_S3` | S3 endpoint to use for uploads. | -| `--s3-path-style` | | Use path-style URLs for S3. | -| `--s3-ca-cert-file` | `AWS_CA_BUNDLE` | File containing the CA certificate for the S3 endpoint. Defaults to the system certificates. | -| `--s3-region` | `AWS_REGION` | Region to use for S3 uploads. Defaults to auto-detection. | - -### Blocklist - -This system supports explicitly blocking providers and modules from being indexed by using a blocklist. 
You can pass a JSON file using the `--blocklist` option of the following format: - -```json -{ - "providers": { - "namespace/name": "Reason why it was blocked (shown to the user).", - "namespace/": "This entire namespace is blocked." - }, - "modules": { - "namespace/name/targetsystem": "Reason why it was blocked (shown to the user).", - "namespace/name/": "Anything under this name is blocked.", - "namespace//": "This entire namespace is blocked." - } -} -``` - -As the listed providers and modules are in the registry dataset, they will still appear in the UI, but the contents of -the documentation are not shown. This is useful to comply with DMCA take-down requests, erroneous license detection, or -to honor the wishes of the provider/module author. - -### GitHub Actions - -This repository also contains a workflow for GitHub Actions. This workflow periodically runs the indexing and requires -the following secrets to be set up: - -- `AWS_ACCESS_KEY_ID` -- `AWS_SECRET_ACCESS_KEY` -- `AWS_ENDPOINT_URL_S3` -- `S3_BUCKET` - -Additionally, the `Generate (manual)` workflow allows you to manually trigger the generation for a namespace, name, or target system, including an option to force the regeneration from scratch. - -## Frontend - -In order to run the frontend, enter the [frontend](frontend) directory and run `npm run dev`. You can create a `.env` -file to configure where the generated dataset and search API are: - -```env -VITE_DATA_API_URL=http://localhost:8000 -``` - -## Search - -The search API is independent of the backend because it requires a PostgreSQL database. It consists of an indexer to -fill the database and a Cloudflare worker to answer search queries. - -The indexer reads from the generated search index feed located at the `/search.ndjson` endpoint of the generated data -from the backend and fills the database. You can pass the `--connection-string` option in order to supply it with a -database connection. 
- -The worker receives all calls to the API from the frontend and processes search requests, passing on any requests it -cannot handle to the R2 (S3-style) bucket containing the dataset. For development, you can set up a `wrangler.toml` in -the following format: - -```toml -#:schema node_modules/wrangler/config-schema.json -name = "registry-ui-search" -main = "src/index.ts" -compatibility_date = "2024-08-21" -compatibility_flags = ["nodejs_compat"] -``` +This repository contains the code that powers the [OpenTofu Registry Search](https://search.opentofu.org). +Please see the [contribution guide](CONTRIBUTING.md) for details on how to work with this codebase.