GitHub - treeverse/lakefs-iceberg: A custom Iceberg catalog implementation for lakeFS

lakeFS Iceberg Catalog

lakeFS enriches your Iceberg tables with Git capabilities: create a branch and make your changes in isolation, without affecting other team members.

See the instructions below on how to use it, and check out the integration in action in the lakeFS samples repository.

Install

Use the following Maven dependency to install the lakeFS custom catalog:

<dependency>
  <groupId>io.lakefs</groupId>
  <artifactId>lakefs-iceberg</artifactId>
  <version>0.1.4</version>
</dependency>

Configure

Here is how to configure the lakeFS custom catalog in Spark:

conf.set("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog");
conf.set("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog");
conf.set("spark.sql.catalog.lakefs.warehouse", "lakefs://example-repo");

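You will also need to configure the S3A Hadoop FileSystem to interact with lakeFS. These are Hadoop properties: if conf is a SparkConf rather than a Hadoop Configuration, prefix each key with spark.hadoop. so that Spark passes it on to Hadoop: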

conf.set("fs.s3a.access.key", "AKIAlakefs12345EXAMPLE");
conf.set("fs.s3a.secret.key", "abc/lakefs/1234567bPxRfiCYEXAMPLEKEY");
conf.set("fs.s3a.endpoint", "https://example-org.us-east-1.lakefscloud.io");
conf.set("fs.s3a.path.style.access", "true");
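
As a rough sketch, the catalog and S3A settings above can be collected in one place so that a job sets them consistently. The class name LakeFSSparkSettings and its build method are hypothetical and not part of lakefs-iceberg. The sketch assumes the properties will be applied to a SparkConf, which is why the S3A keys carry the spark.hadoop. prefix; drop the prefix if you set them on a Hadoop Configuration directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of lakefs-iceberg): gathers the settings from this section. */
final class LakeFSSparkSettings {
    private LakeFSSparkSettings() {}

    /**
     * Returns Spark properties for a catalog named "lakefs" whose warehouse is the given repository.
     * Assumption: the result is applied to a SparkConf, so Hadoop keys are prefixed with "spark.hadoop.".
     */
    static Map<String, String> build(String repo, String endpoint, String accessKey, String secretKey) {
        Map<String, String> props = new LinkedHashMap<>();

        // Iceberg catalog settings, as shown in the Configure section.
        props.put("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog");
        props.put("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog");
        props.put("spark.sql.catalog.lakefs.warehouse", "lakefs://" + repo);

        // S3A settings; Spark copies "spark.hadoop.*" keys into the Hadoop configuration.
        props.put("spark.hadoop.fs.s3a.access.key", accessKey);
        props.put("spark.hadoop.fs.s3a.secret.key", secretKey);
        props.put("spark.hadoop.fs.s3a.endpoint", endpoint);
        props.put("spark.hadoop.fs.s3a.path.style.access", "true");
        return props;
    }
}
```

Each entry of the returned map can then be passed to conf.set(key, value). Reading the access key and secret key from the environment or a secrets store, rather than hard-coding them, is usually the safer choice.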

Create a table

To create a table on your main branch, use the following syntax, where lakefs is the catalog name configured above and main is the branch:

CREATE TABLE lakefs.main.table1 (id int, data string);

Create a branch

We can now commit the creation of the table to the main branch:

lakectl commit lakefs://example-repo/main -m "my first iceberg commit"

Then, create a branch:

lakectl branch create lakefs://example-repo/dev -s lakefs://example-repo/main
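
If the commit and branch steps need to run from a JVM job rather than a terminal, the two lakectl invocations above could be assembled as argument lists and launched with ProcessBuilder. This is only a sketch: LakectlCommands and its methods are hypothetical names, and it assumes lakectl is installed, on the PATH, and already configured with credentials. The subcommands and the -m and -s flags are exactly the ones shown above.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical wrapper around the two lakectl commands shown above. */
final class LakectlCommands {
    private LakectlCommands() {}

    /** Builds: lakectl commit lakefs://<repo>/<branch> -m <message> */
    static List<String> commit(String repo, String branch, String message) {
        return List.of("lakectl", "commit", "lakefs://" + repo + "/" + branch, "-m", message);
    }

    /** Builds: lakectl branch create lakefs://<repo>/<newBranch> -s lakefs://<repo>/<sourceBranch> */
    static List<String> createBranch(String repo, String newBranch, String sourceBranch) {
        return List.of("lakectl", "branch", "create",
                "lakefs://" + repo + "/" + newBranch,
                "-s", "lakefs://" + repo + "/" + sourceBranch);
    }

    /** Runs the command, forwarding its output, and returns the exit code (assumes lakectl is on the PATH). */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

For example, LakectlCommands.run(LakectlCommands.createBranch("example-repo", "dev", "main")) would launch the same branch command as above. Passing the arguments as a list avoids shell quoting problems with commit messages that contain spaces.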

Make changes on the branch

We can now make changes on the branch:

INSERT INTO lakefs.dev.table1 VALUES (3, 'data3');

Query the table

If we query the table on the branch, we will see the data we inserted:

SELECT * FROM lakefs.dev.table1;

Results in (assuming rows 1 and 2 were inserted on main before the branch was created):

+----+------+
| id | data |
+----+------+
| 1  | data1|
| 2  | data2|
| 3  | data3|
+----+------+

However, if we query the table on the main branch, we will not see the new changes:

SELECT * FROM lakefs.main.table1;

Results in:

+----+------+
| id | data |
+----+------+
| 1  | data1|
| 2  | data2|
+----+------+
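
To see exactly which rows a branch adds relative to main, the two queries above can be combined with Spark SQL's EXCEPT. The sketch below only builds the query strings; BranchQueries and its methods are hypothetical names, the catalog name lakefs is taken from the configuration above, and the identifiers are concatenated without validation, so they should come from trusted input.

```java
/** Hypothetical helper that builds branch-qualified table names and a simple branch comparison query. */
final class BranchQueries {
    private BranchQueries() {}

    /** Builds lakefs.<branch>.<table>, for example lakefs.dev.table1. */
    static String tableOn(String branch, String table) {
        return "lakefs." + branch + "." + table;
    }

    /** Query for rows present on `branch` but absent from `base`; EXCEPT also removes duplicate rows. */
    static String rowsOnlyOn(String branch, String base, String table) {
        return "SELECT * FROM " + tableOn(branch, table)
                + " EXCEPT SELECT * FROM " + tableOn(base, table);
    }
}
```

Given a SparkSession named spark (an assumption), spark.sql(BranchQueries.rowsOnlyOn("dev", "main", "table1")).show() would run the comparison. With the data shown above, only the row inserted on dev (3, 'data3') should differ between the two branches.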
