Introduce a reproducible build
This is an initial take on introducing a fully hermetic, reproducible
build for lucene-cuvs.

Broadly speaking, it does the following:

1. Get all dependencies we need, including CUDA and a base C++
   toolchain.
2. Take the base dependencies to build a Clang/LLVM toolchain that is
   used as host compiler for CUDA.
3. Build the C++ shared object.

This also provides the infrastructure to target AMD GPUs (though the
kernels need to be written first :D)

Additionally this commit:

- Changes raft to the new cuvs repo where applicable.
- Bumps all dependencies to more or less the latest version.
- Removes redundant files (pom.xml, CMakeLists.txt, obsolete headers).
- Introduces pre-commit hooks for all files.
aaronmondal committed May 30, 2024
1 parent 89afa74 commit ed82e37
Showing 61 changed files with 11,400 additions and 555 deletions.
54 changes: 54 additions & 0 deletions .bazelrc
@@ -0,0 +1,54 @@
# Don't inherit PATH and LD_LIBRARY_PATH.
build --incompatible_strict_action_env

# Use a prebuilt JDK instead of relying on the host's java runtime.
common --java_runtime_version=remotejdk_21
common --tool_java_runtime_version=remotejdk_21

# TODO(aaronmondal): Remove after https://github.com/bazelbuild/bazel/pull/22001
build --noincompatible_sandbox_hermetic_tmp

# Enforce C++20 as the default for rules_cc, regardless of toolchain config.
build --cxxopt=-std=c++20 --host_cxxopt=-std=c++20

# Since we expect rules_cc targets to be mainly exec_tools, use O3.
build --cxxopt=-O3 --host_cxxopt=-O3

# Forbid network access unless explicitly enabled.
build --sandbox_default_allow_network=false

# Use correct runfile locations.
build --nolegacy_external_runfiles

# Enable sandboxing for exclusive tests like GPU performance tests.
test --incompatible_exclusive_test_sandboxed

# Make sure rules_cc uses the correct transition mechanism.
build --incompatible_enable_cc_toolchain_resolution

# Propagate tags such as no-remote for precompilations to downstream actions.
common --incompatible_allow_tags_propagation

# Bzlmod configuration.
common --enable_bzlmod
common --registry=https://raw.githubusercontent.com/bazelbuild/bazel-central-registry/main/
common --registry=https://raw.githubusercontent.com/eomii/bazel-eomii-registry/main/

# Remote optimizations.
build --remote_build_event_upload=minimal
build --remote_download_minimal
build --nolegacy_important_outputs

# Smaller profiling. Careful. Disabling this might explode remote cache usage.
build --slim_profile
build --experimental_profile_include_target_label
build --noexperimental_profile_include_primary_output

# Nix-generated action env for rules_ll.
try-import %workspace%/.bazelrc.ll

# Nix-generated flags for LRE.
try-import %workspace%/.bazelrc.lre

# Allow user-side customization.
try-import %workspace%/.bazelrc.user
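
The three `try-import` lines above rely on Bazel skipping an rc file that does not exist, whereas a plain `import` of a missing file is an error. The following is a minimal Python sketch of that behavior, not Bazel's actual parser: `load_rc` and `demo` are hypothetical names, and the model only handles whole-line comments, `import`, `try-import`, and `%workspace%` expansion.

```python
import os
import tempfile


def load_rc(path, workspace, required=True, seen=None):
    """Return (command, option) pairs from a bazelrc-style file.

    Simplified model for illustration only (hypothetical helper).
    """
    seen = set() if seen is None else seen
    path = path.replace("%workspace%", workspace)
    if not os.path.isfile(path):
        if required:
            raise FileNotFoundError(path)  # `import` of a missing file fails.
        return []  # `try-import` silently skips a missing file.
    if path in seen:
        return []  # Guard against import cycles.
    seen.add(path)

    options = []
    with open(path, encoding="utf-8") as rc:
        for raw in rc:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            command, _, rest = line.partition(" ")
            if command in ("import", "try-import"):
                options += load_rc(
                    rest.strip(), workspace,
                    required=(command == "import"), seen=seen,
                )
            else:
                options += [(command, opt) for opt in rest.split()]
    return options


def demo():
    """Build a throwaway workspace where only `.bazelrc.user` exists."""
    with tempfile.TemporaryDirectory() as ws:
        with open(os.path.join(ws, ".bazelrc"), "w", encoding="utf-8") as f:
            f.write(
                "build --cxxopt=-O3 --host_cxxopt=-O3\n"
                "try-import %workspace%/.bazelrc.ll\n"
                "try-import %workspace%/.bazelrc.user\n"
            )
        with open(os.path.join(ws, ".bazelrc.user"), "w", encoding="utf-8") as f:
            f.write("build --jobs=4\n")
        return load_rc("%workspace%/.bazelrc", ws)
```

In this sketch the missing `.bazelrc.ll` contributes nothing, while the options from `.bazelrc.user` are appended after the checked-in ones. This is why the generated and user-side files can stay out of version control without breaking the build for people who do not have them.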
1 change: 1 addition & 0 deletions .bazelversion
@@ -0,0 +1 @@
8.0.0-pre.20240516.1
1 change: 1 addition & 0 deletions .envrc
@@ -0,0 +1 @@
use flake --impure
26 changes: 25 additions & 1 deletion .gitignore
@@ -1,2 +1,26 @@
build
target
target

# Generated by Bazel
/bazel-*

# Generated by rules_ll
/.bazelrc.ll

# Generated by LRE
/.bazelrc.lre

# Custom user-side configuration
/.bazelrc.user

# Generated by direnv
/.direnv

# Generated by the pre-commit nix flake module
/.pre-commit-config.yaml

# Generated by `ll up`
/kustomize.yaml

# NativeLink's local Pulumi stack.
/Pulumi.dev.yaml
1 change: 1 addition & 0 deletions BUILD.bazel
@@ -0,0 +1 @@
# See `lucene` and `cuda` subdirectories.
1 change: 0 additions & 1 deletion LICENSE.txt
@@ -223,4 +223,3 @@ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

134 changes: 134 additions & 0 deletions MODULE.bazel
@@ -0,0 +1,134 @@
module(
    name = "lucene-cuvs",
    version = "0.0.0",
    compatibility_level = 0,
)

# Platform support. A base requirement for everything else.
#
# See: https://github.com/bazelbuild/platforms
bazel_dep(name = "platforms", version = "0.0.10")

# C++ rules. Don't use Bazel's legacy builtin rules.
#
# See: https://github.com/bazelbuild/rules_cc
bazel_dep(name = "rules_cc", version = "0.0.9")

# Basic starlark extensions. Always good to have available.
#
# See: https://github.com/bazelbuild/bazel-skylib
bazel_dep(name = "bazel_skylib", version = "1.6.1")

# Java rules. Don't use Bazel's legacy builtin rules.
#
# See: https://bazel.build/reference/be/java for the rules,
# https://github.com/bazelbuild/rules_java for the repository.
bazel_dep(name = "rules_java", version = "7.5.0")

# Bug-fix to prevent an annoying debug message because of duplicate maven repos.
#
# TODO(aaronmondal): Remove this after:
# https://github.com/bazelbuild/rules_jvm_external/issues/916
bazel_dep(name = "protobuf", version = "26.0.bcr.1")

# Java dependencies.
#
# Run `bazel query "@maven//..."` to print available targets.
#
# See: https://github.com/bazelbuild/rules_jvm_external/blob/master/docs/bzlmod.md
bazel_dep(name = "rules_jvm_external", version = "6.1")

maven = use_extension("@rules_jvm_external//:extensions.bzl", "maven")
maven.install(
    artifacts = [
        "org.apache.lucene:lucene-core:9.9.0",
        "org.apache.lucene:lucene-codecs:9.9.0",
        "com.opencsv:opencsv:5.3",
        "commons-io:commons-io:2.15.1",
        "com.github.fommil:jniloader:1.1",
    ],
    lock_file = "//:maven_install.json",
)
use_repo(maven, "maven", "unpinned_maven")
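
As a rough guide to the target names that `bazel query "@maven//..."` prints: rules_jvm_external derives each label from the group and artifact IDs, drops the version, and replaces `.`, `-`, and `:` with underscores. A minimal Python sketch of that mapping follows. `maven_label` is a hypothetical helper written for illustration, and the query output remains the authoritative list.

```python
import re


def maven_label(coordinate, repo="maven"):
    """Map `group:artifact:version` to the versionless Bazel label.

    Hypothetical helper; assumes plain three-part coordinates without
    a packaging or classifier component.
    """
    group, artifact, _version = coordinate.split(":")
    name = re.sub(r"[.\-:]", "_", f"{group}:{artifact}")
    return f"@{repo}//:{name}"


ARTIFACTS = [
    "org.apache.lucene:lucene-core:9.9.0",
    "org.apache.lucene:lucene-codecs:9.9.0",
    "com.opencsv:opencsv:5.3",
    "commons-io:commons-io:2.15.1",
    "com.github.fommil:jniloader:1.1",
]

if __name__ == "__main__":
    for coordinate in ARTIFACTS:
        print(f"{coordinate} -> {maven_label(coordinate)}")
```

A `java_library` depending on Lucene would then list the label produced for `lucene-core` in its `deps`.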

# JNI rules for C++/Java FFIs.
#
# We apply some visibility patching to directly access the jni headers.
#
# See: https://github.com/fmeum/rules_jni
bazel_dep(name = "rules_jni", version = "0.9.1")
git_override(
    module_name = "rules_jni",
    commit = "7cb9c69d4d1f9ca2fae93d21d9c3498a9d0657a0",
    patch_strip = 1,
    patches = ["//patches:rules_jni_public_headers.diff"],
    remote = "https://github.com/fmeum/rules_jni",
)

# The Clang/LLVM toolchain and rules for CUDA compilation.
#
# See: https://github.com/eomii/rules_ll
bazel_dep(name = "rules_ll", version = "0")
git_override(
    module_name = "rules_ll",
    # Note: Keep this commit in sync with the one in flake.nix.
    commit = "3ee809512cfb605a00fe5eb938eab0e4f8705204",
    remote = "https://github.com/eomii/rules_ll",
)

# We need explicit access to the `@llvm-project` workspace for OpenMP. The
# `llvm_project_overlay` extension aggregates patches across all modules. This
# means that rules_ll's patches remain implicitly applied and caches are
# identical with any other project using rules_ll at the same commit.
#
# Don't mix this module up with the `llvm-project` module in the
# `bazel-central-registry`. The module we're using here is from the
# `bazel-eomii-registry`. Upstreaming the patch aggregation logic or finding
# a different solution is still a work in progress at
# https://github.com/llvm/llvm-project/pull/88927.
#
# See: https://github.com/eomii/bazel-eomii-registry/tree/main/modules/llvm-project-overlay
bazel_dep(name = "llvm-project-overlay", version = "17-init-bcr.3")

llvm_project_overlay = use_extension(
    "@llvm-project-overlay//utils/bazel:extensions.bzl",
    "llvm_project_overlay",
)
use_repo(
    llvm_project_overlay,
    "llvm-project",
)

# The demo dataset. Available via the `@dataset//file` Bazel target.
#
# See: https://bazel.build/rules/lib/repo/http#http_file
http_file = use_repo_rule(
    "@bazel_tools//tools/build_defs/repo:http.bzl",
    "http_file",
)

http_file(
    name = "dataset",
    downloaded_file_path = "dataset.zip",  # Must have a `.zip` extension.
    integrity = "sha256-gHb64BruF4r2U+dxbdKoz1yJqHIOXzQxyrtb0va32L4=",
    url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip",
)
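
The `integrity` value uses the Subresource Integrity format: the hash algorithm name, a dash, and the base64-encoded raw digest of the downloaded file. A minimal Python sketch of producing and decoding such a string is shown here. `sri_sha256` and `sri_to_hex` are hypothetical helper names, and the sketch does not download or verify the dataset itself.

```python
import base64
import hashlib


def sri_sha256(data: bytes) -> str:
    """Return the `sha256-<base64>` integrity string for `data`."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def sri_to_hex(integrity: str) -> str:
    """Convert a `sha256-<base64>` string to the hex form that
    tools such as `sha256sum` print."""
    algorithm, _, encoded = integrity.partition("-")
    if algorithm != "sha256":
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    return base64.b64decode(encoded).hex()


if __name__ == "__main__":
    # Hash some local bytes and show both representations.
    payload = b"example payload"
    integrity = sri_sha256(payload)
    print(integrity)
    print(sri_to_hex(integrity))
```

When bumping the dataset URL, hashing the new archive this way (reading the file in binary mode) gives the string to paste into `integrity`. Bazel fails the fetch if the downloaded bytes do not match.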

# External dependencies. See the `thirdparty` directory for build files.
#
# See: https://bazel.build/external/extension
lucene_cuvs_deps = use_extension(
    "@lucene-cuvs//:extensions.bzl",
    "lucene_cuvs_dependencies",
)
use_repo(
    lucene_cuvs_deps,
    "cccl",  # https://github.com/NVIDIA/cccl
    "cutlass",  # https://github.com/NVIDIA/cutlass
    "cuvs",  # https://github.com/rapidsai/cuvs
    "fmt",  # https://github.com/fmtlib/fmt
    "local-remote-execution",  # https://github.com/TraceMachina/nativelink/tree/main/local-remote-execution
    "raft",  # https://github.com/rapidsai/raft
    "rmm",  # https://github.com/rapidsai/rmm
    "spdlog",  # https://github.com/gabime/spdlog
)