Start by ⭐️ starring the lakeFS open source project.

This repository includes lakeFS with Flink, which you can run on your local machine.
Clone this repository:

```bash
git clone https://github.com/treeverse/lakeFS-samples && cd lakeFS-samples/01_standalone_examples/flink
```
You now have two options:
If you have already installed lakeFS or are using lakeFS Cloud, follow these steps:

- lakeFS uses its S3 Gateway to communicate with Flink, so change the `fs.s3a.endpoint`, `fs.s3a.access.key` and `fs.s3a.secret.key` Flink properties for the `jobmanager` and `taskmanager` services in the `docker-compose.yml` file to your lakeFS endpoint (e.g. `https://username.aws_region_name.lakefscloud.io` if you are using lakeFS Cloud), your lakeFS access key and your lakeFS secret key (see the sketch after these steps for where these properties live in `docker-compose.yml`):

  ```
  FLINK_PROPERTIES=
  jobmanager.rpc.address: jobmanager
  state.backend: filesystem
  fs.s3a.path.style.access: true
  fs.s3a.endpoint: http://lakefs:8000
  fs.s3a.access.key: AKIAIOSFOLKFSSAMPLES
  fs.s3a.secret.key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  ```
- Run only the Flink server:

  ```bash
  docker compose up
  ```
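For reference, the official Flink Docker images pick up these settings from a `FLINK_PROPERTIES` environment variable, so in `docker-compose.yml` the block above normally sits under each service's `environment` key. The sketch below is only an illustration, assuming that layout; the image tag is a placeholder, so keep whatever this sample's `docker-compose.yml` already specifies:

```yaml
services:
  jobmanager:
    image: flink:1.17   # placeholder tag; keep the image from this sample's docker-compose.yml
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        state.backend: filesystem
        fs.s3a.path.style.access: true
        fs.s3a.endpoint: https://username.aws_region_name.lakefscloud.io
        fs.s3a.access.key: <your lakeFS access key>
        fs.s3a.secret.key: <your lakeFS secret key>
  taskmanager:
    image: flink:1.17   # placeholder tag
    environment:
      # repeat the same FLINK_PROPERTIES block here, alongside any taskmanager-specific settings
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        fs.s3a.endpoint: https://username.aws_region_name.lakefscloud.io
        fs.s3a.access.key: <your lakeFS access key>
        fs.s3a.secret.key: <your lakeFS secret key>
```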
If you want to provision a lakeFS server as well as MinIO for your object store, plus Flink, then bring up the full stack:

```bash
docker compose --profile local-lakefs up
```
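Whichever option you pick, a standard Docker Compose command lets you confirm the containers are up before moving on (run it from this sample's directory):

```bash
# Show the services in this compose project, their state, and their published ports
docker compose ps
```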
Once the services are up you'll have:

- Flink Dashboard: http://localhost:8081/

If you've brought up the full stack you'll also have:

- lakeFS: http://localhost:38000/ (access key `AKIAIOSFOLKFSSAMPLES` / secret key `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`)
- MinIO: http://localhost:39001/ (`minioadmin` / `minioadmin`)
To deploy Flink's example Word Count job to the running Flink server, issue the following command. The job reads the `README.md` file from the `main` branch of the `quickstart` lakeFS repository and writes its output back to the `word-count` folder in the same lakeFS repository and branch. If you want to use another lakeFS repository, branch, or text file, change the command accordingly (a hypothetical variant is sketched after the command):

```bash
docker exec -it lakefs-with-flink-jobmanager \
  ./bin/flink run examples/streaming/WordCount.jar \
  --input s3://quickstart/main/README.md \
  --output s3://quickstart/main/word-count
```
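As an illustration of adapting the command, here is the same job pointed at a different repository, branch, and input file; the names below are placeholders that this sample does not create for you, so substitute a repository and branch that exist on your lakeFS server:

```bash
# Hypothetical variant: all repository, branch, and file names are placeholders
docker exec -it lakefs-with-flink-jobmanager \
  ./bin/flink run examples/streaming/WordCount.jar \
  --input s3://my-repo/my-branch/data/notes.txt \
  --output s3://my-repo/my-branch/word-count
```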