Showing 47 changed files with 401 additions and 190 deletions.
@@ -0,0 +1,54 @@
FROM rust:1.67 as feature-store-server

USER root

ENV WORK_DIR /opt/feature-store
RUN mkdir -p $WORK_DIR

WORKDIR $WORK_DIR

RUN apt update
RUN apt install -y python3 python3-pip wget ca-certificates
RUN apt install -y postgresql-client

ADD ../nyc-taxi-feature-store-demo/server/model/requirements.txt $WORK_DIR/model-pipreqs.txt
ADD ../nyc-taxi-feature-store-demo/generator/requirements.txt $WORK_DIR/generator-pipreqs.txt
RUN pip3 install --upgrade pip
RUN pip3 install -r $WORK_DIR/model-pipreqs.txt
RUN pip3 install -r $WORK_DIR/generator-pipreqs.txt
RUN pip3 install risingwave

RUN apt install -y lsof curl openssl libssl-dev pkg-config build-essential
RUN apt install -y cmake librdkafka-dev

# Install .NET 6.0
RUN wget https://packages.microsoft.com/config/debian/11/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
RUN dpkg -i packages-microsoft-prod.deb
RUN rm packages-microsoft-prod.deb
RUN apt-get update && apt-get install -y dotnet-sdk-6.0
RUN apt install -y liblttng-ust0

# `cargo build` included in ./build
ADD ../nyc-taxi-feature-store-demo/server $WORK_DIR/build/server
ADD ../nyc-taxi-feature-store-demo/simulator $WORK_DIR/build/simulator
RUN cargo build --manifest-path $WORK_DIR/build/server/Cargo.toml --release
RUN cargo build --manifest-path $WORK_DIR/build/simulator/Cargo.toml --release

RUN cp $WORK_DIR/build/server/target/release/server $WORK_DIR/feature-store-server
RUN cp $WORK_DIR/build/simulator/target/release/simulator $WORK_DIR/feature-store-simulator
RUN rm -rf $WORK_DIR/build

ADD ../nyc-taxi-feature-store-demo/server/model $WORK_DIR/server/model
ADD ../nyc-taxi-feature-store-demo/server/udf.py $WORK_DIR/udf.py
ADD ../nyc-taxi-feature-store-demo/generator $WORK_DIR/generator
ADD ../nyc-taxi-feature-store-demo/taxi-start.sql $WORK_DIR/taxi-start.sql
ADD ../nyc-taxi-feature-store-demo/mfa-start.sql $WORK_DIR/mfa-start.sql
RUN mkdir $WORK_DIR/run-sh
ADD ../nyc-taxi-feature-store-demo/run.sh $WORK_DIR/run-sh/
ADD ../nyc-taxi-feature-store-demo/run-mfa.sh $WORK_DIR/run-sh/

# Use the MFA variant as the container entry script
RUN cp $WORK_DIR/run-sh/run-mfa.sh $WORK_DIR/run.sh

RUN chmod +x $WORK_DIR/run.sh && rm -rf $WORK_DIR/run-sh

CMD ["sh", "-c", "sleep 10 && ./run.sh"]
@@ -0,0 +1,31 @@
# Description

#### Feature store demo

We use `simulator` processes to generate input data.

All messages are sent to the `server` and written to `Kafka`, then ingested into `RisingWave`, which processes the data with pre-defined operations.

We also use the `simulator` to simulate user queries against our features. The `server` receives a request, queries the feature data, and returns the result.

To change our business logic, we only need to update the materialized view in `RisingWave` using SQL statements.
||
#### Specific case: | ||
|
||
This chapter is a simple demo of feature extraction in `RisingWave`, primarily showcasing the real-time feature aggregation and updating capabilities of `RisingWave`. | ||
|
||
In this case, we need to calculate the frequency and count of user account changes over a period of time. We mainly use SQL aggregation functions and UDFs (User-Defined Functions) to achieve this. | ||
|
||
Due to the similarity between the code in this demo and another code, the implementation code is located in the `nyc-taxi-feature-store-demo` folder. | ||
|
||
## Installation | ||
|
||
Run it in local. | ||
|
||
1. Build docker. Kafka RisingWave and Feature Store. | ||
|
||
```docker compose up --build``` | ||
|
||
2. Then we can get the simulation results for Feature store in `.log`. | ||
|
||
```cat .log/simulator_log``` |
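The aggregation itself is simple to state. Sketched in plain Python (a hypothetical event shape, not the demo's actual SQL or UDF), counting a user's account changes within a time window and deriving a change frequency might look like:

```python
from datetime import datetime, timedelta

def change_stats(events, user_id, window, now):
    """Count account changes for `user_id` within the last `window`
    and derive a frequency in changes per hour.
    `events` is an iterable of (user_id, timestamp) pairs."""
    start = now - window
    count = sum(1 for uid, ts in events if uid == user_id and start <= ts <= now)
    frequency = count / (window.total_seconds() / 3600)  # changes per hour
    return count, frequency

now = datetime(2023, 9, 1, 12, 0)
events = [
    ("u1", now - timedelta(minutes=10)),
    ("u1", now - timedelta(minutes=50)),
    ("u1", now - timedelta(hours=5)),   # outside a 1-hour window
    ("u2", now - timedelta(minutes=5)),
]
count, freq = change_stats(events, "u1", timedelta(hours=1), now)
print(count, freq)  # 2 2.0
```

In the demo this logic lives in SQL aggregations plus a UDF, so RisingWave keeps the counts up to date incrementally instead of rescanning the event history per query.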
119 changes: 119 additions & 0 deletions in integration_tests/account-change-feature-demo/docker-compose.yml
@@ -0,0 +1,119 @@
---
version: "3"
services:
  kafka:
    image: confluentinc/cp-kafka:7.1.0
    platform: linux/amd64
    hostname: kafka
    container_name: kafka
    ports:
      - "29092:29092"
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TOOLS_LOG4J_LOGLEVEL: ERROR
    depends_on: [ zookeeper ]
    healthcheck:
      test: [ "CMD-SHELL", "kafka-topics --bootstrap-server kafka:9092 --list" ]
      interval: 5s
      timeout: 10s
      retries: 5

  init-kafka:
    image: confluentinc/cp-kafka:7.1.0
    depends_on:
      - kafka
    entrypoint: [ '/bin/sh', '-c' ]
    command: |
      "
      # blocks until kafka is reachable
      kafka-topics --bootstrap-server kafka:9092 --list
      echo -e 'Creating kafka topics'
      kafka-topics --bootstrap-server kafka:9092 --create --if-not-exists --topic taxi --replication-factor 1 --partitions 1
      echo -e 'Creating kafka topics'
      kafka-topics --bootstrap-server kafka:9092 --create --if-not-exists --topic mfa --replication-factor 1 --partitions 1
      echo -e 'Successfully created the following topics:'
      kafka-topics --bootstrap-server kafka:9092 --list
      "
  zookeeper:
    image: confluentinc/cp-zookeeper:7.1.0
    platform: linux/amd64
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  compactor-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: compactor-0
  compute-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: compute-node-0
    volumes:
      - "./server/udf.py:/udf.py"
      - "./mfa-start.sql:/mfa-start.sql"
      - "./mfa-mock.sql:/mfa-mock.sql"
  feature-store:
    image: rust:1.67
    build:
      context: .
      target: feature-store-server
    depends_on: [ kafka, meta-node-0, frontend-node-0 ]
    volumes:
      - "./log:/opt/feature-store/.log"
  etcd-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: etcd-0
  frontend-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: frontend-node-0
  grafana-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: grafana-0
  meta-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: meta-node-0
    ports:
      - "8815:8815"
    depends_on: [ kafka ]
  minio-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: minio-0
  prometheus-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: prometheus-0
  connector-node:
    extends:
      file: ../../docker/docker-compose.yml
      service: connector-node
volumes:
  etcd-0:
    external: false
  grafana-0:
    external: false
  minio-0:
    external: false
  prometheus-0:
    external: false
name: risingwave-compose
This file was deleted.
File renamed without changes.
@@ -0,0 +1,42 @@
# Description

#### Feature store demo

We use `simulator` processes to generate input data.

All messages are sent to the `server` and written to `Kafka`, then ingested into `RisingWave`, which processes the data with pre-defined operations.

We also use the `simulator` to simulate user queries against our features. The `server` receives a request, queries the feature data, and returns the result.

To change our business logic, we only need to update the materialized view in `RisingWave` using SQL statements.

#### Specific case

The case in this chapter is New York taxi fare prediction: we predict the taxi fare based on the starting and ending points of the trip.

We use the starting and ending points as the primary key, extract and transform the corresponding features, and store them in `RisingWave`. These features are updated based on user behavior.

When a user needs to train a model, they can retrieve all the stored features as training data.

When a user needs to make a prediction using these features, they can provide their starting and ending points, query the corresponding features in `RisingWave`, and inject them into a machine learning model for real-time fare prediction.

## Installation

1. Build the Docker images: Kafka, RisingWave, and the Feature Store.

```docker compose up --build```

The Feature Store system performs several tasks in sequence:

- Writes simulated offline data into `Kafka`→`RisingWave`, extracting behavior and feature tables.

- Joins the behavior and feature tables to obtain the corresponding offline training data and conducts model training.

- Writes simulated online feature data into `Kafka`→`RisingWave`.

- Uses `do_location_id` (destination location) and `pu_location_id` (pickup location) to query the latest online features in RisingWave, and uses these online features together with the trained model for predictions.

2. Then we can read the Feature Store simulation results in `.log`.

```cat .log/simulator_log```
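To make the serving step concrete, here is a minimal sketch in plain Python of the online lookup-and-predict flow, keyed by `pu_location_id` and `do_location_id` as in the demo. The in-memory feature table, feature names, and linear-model coefficients are all made up for illustration; the real demo queries RisingWave and uses the model trained on the offline data:

```python
# Hypothetical stand-in for the online feature table kept in RisingWave,
# keyed by (pu_location_id, do_location_id).
online_features = {
    (100, 200): {"avg_fare": 23.5, "avg_trip_minutes": 18.0},
    (100, 300): {"avg_fare": 41.0, "avg_trip_minutes": 35.0},
}

def predict_fare(pu_location_id, do_location_id):
    """Look up the latest online features for a pickup/destination pair
    and run them through a toy linear model (made-up coefficients)."""
    feats = online_features[(pu_location_id, do_location_id)]
    # fare ~ base + w1 * historical average fare + w2 * average trip duration
    return 2.5 + 0.8 * feats["avg_fare"] + 0.1 * feats["avg_trip_minutes"]

print(round(predict_fare(100, 200), 2))  # 23.1
```

Because the feature table is a materialized view, the values fetched at prediction time are already up to date with the latest simulated events, with no batch recomputation in the serving path.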