Commit: add readme
xxhZs committed Sep 8, 2023
1 parent 0a52c3b commit bf115c1
Showing 47 changed files with 401 additions and 190 deletions.
315 changes: 155 additions & 160 deletions Cargo.lock

Large diffs are not rendered by default.

54 changes: 54 additions & 0 deletions integration_tests/account-change-feature-demo/Dockerfile
@@ -0,0 +1,54 @@
FROM rust:1.67 as feature-store-server

USER root

ENV WORK_DIR /opt/feature-store
RUN mkdir -p $WORK_DIR

WORKDIR $WORK_DIR

RUN apt update
RUN apt install -y python3 python3-pip wget ca-certificates
RUN apt install -y postgresql-client

ADD ../nyc-taxi-feature-store-demo/server/model/requirements.txt $WORK_DIR/model-pipreqs.txt
ADD ../nyc-taxi-feature-store-demo/generator/requirements.txt $WORK_DIR/generator-pipreqs.txt
RUN pip3 install --upgrade pip
RUN pip3 install -r $WORK_DIR/model-pipreqs.txt
RUN pip3 install -r $WORK_DIR/generator-pipreqs.txt
RUN pip3 install risingwave

RUN apt install -y lsof curl openssl libssl-dev pkg-config build-essential
RUN apt install -y cmake librdkafka-dev

# Install .NET 6.0
RUN wget https://packages.microsoft.com/config/debian/11/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
RUN dpkg -i packages-microsoft-prod.deb
RUN rm packages-microsoft-prod.deb
RUN apt-get update && apt-get install -y dotnet-sdk-6.0
RUN apt install -y liblttng-ust0

# `cargo build` included in ./build
ADD ../nyc-taxi-feature-store-demo/server $WORK_DIR/build/server
ADD ../nyc-taxi-feature-store-demo/simulator $WORK_DIR/build/simulator
RUN cargo build --manifest-path $WORK_DIR/build/server/Cargo.toml --release
RUN cargo build --manifest-path $WORK_DIR/build/simulator/Cargo.toml --release

RUN cp $WORK_DIR/build/server/target/release/server $WORK_DIR/feature-store-server
RUN cp $WORK_DIR/build/simulator/target/release/simulator $WORK_DIR/feature-store-simulator
RUN rm -rf $WORK_DIR/build

ADD ../nyc-taxi-feature-store-demo/server/model $WORK_DIR/server/model
ADD ../nyc-taxi-feature-store-demo/server/udf.py $WORK_DIR/udf.py
ADD ../nyc-taxi-feature-store-demo/generator $WORK_DIR/generator
ADD ../nyc-taxi-feature-store-demo/taxi-start.sql $WORK_DIR/taxi-start.sql
ADD ../nyc-taxi-feature-store-demo/mfa-start.sql $WORK_DIR/mfa-start.sql
RUN mkdir $WORK_DIR/run-sh
ADD ../nyc-taxi-feature-store-demo/run.sh $WORK_DIR/run-sh/
ADD ../nyc-taxi-feature-store-demo/run-mfa.sh $WORK_DIR/run-sh/

RUN cp $WORK_DIR/run-sh/run-mfa.sh $WORK_DIR/run.sh

RUN chmod +x $WORK_DIR/run.sh && rm -rf $WORK_DIR/run-sh

CMD ["sh", "-c", "sleep 10 && ./run.sh"]
31 changes: 31 additions & 0 deletions integration_tests/account-change-feature-demo/README.md
@@ -0,0 +1,31 @@
# Description

#### Feature store demo.

We use a `simulator` to generate the input data.

All messages are sent to the `server`, which writes them to `Kafka` -> `RisingWave`; `RisingWave` then processes the data according to pre-defined operations.

We also use the `simulator` to issue user queries against our features: the `server` receives a request, queries the data, and returns the result.

If we want to change our business logic, we only need to update the materialized view in `RisingWave` with SQL statements.

#### Specific case:

This chapter is a simple demo of feature extraction in `RisingWave`, primarily showcasing RisingWave's real-time feature aggregation and update capabilities.

In this case, we need to calculate the count and frequency of a user's account changes over a period of time. We mainly use SQL aggregation functions and UDFs (user-defined functions) to achieve this.

Because this demo shares most of its code with the NYC taxi demo, the implementation lives in the `nyc-taxi-feature-store-demo` folder.
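In the demo itself this aggregation is a RisingWave materialized view (see `mfa-start.sql`) combined with a Python UDF. As a rough, stand-alone illustration of what the windowed aggregation computes, here is a plain-Python sketch — the field names (`user_id`, `ts`) and the 7-day window are assumptions for illustration, not the demo's actual schema:

```python
from datetime import datetime, timedelta

def change_stats(events, user_id, window=timedelta(days=7), now=None):
    """Count and daily rate of account-change events for one user inside a
    trailing time window -- a stand-in for the SQL windowed aggregation
    that the demo defines as a materialized view in RisingWave."""
    now = now or datetime.utcnow()
    # Keep only this user's events that fall inside the trailing window.
    in_window = [e["ts"] for e in events
                 if e["user_id"] == user_id and now - e["ts"] <= window]
    count = len(in_window)
    freq_per_day = count / (window.total_seconds() / 86400)
    return {"change_cnt": count, "changes_per_day": freq_per_day}

events = [
    {"user_id": "u1", "ts": datetime(2023, 9, 7, 12)},
    {"user_id": "u1", "ts": datetime(2023, 9, 5, 9)},
    {"user_id": "u1", "ts": datetime(2023, 8, 1, 9)},   # outside the window
    {"user_id": "u2", "ts": datetime(2023, 9, 6, 9)},   # different user
]
stats = change_stats(events, "u1", now=datetime(2023, 9, 8))
# two qualifying events in the last 7 days -> change_cnt == 2
```

The real pipeline recomputes this incrementally as new events arrive in Kafka; the sketch only shows the aggregation semantics on a static list.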

## Installation

Run it locally.

1. Build and start the containers: Kafka, RisingWave, and the Feature Store.

```docker compose up --build```

2. Then view the Feature Store simulation results in `.log`.

```cat .log/simulator_log```
119 changes: 119 additions & 0 deletions integration_tests/account-change-feature-demo/docker-compose.yml
@@ -0,0 +1,119 @@
---
version: "3"
services:
  kafka:
    image: confluentinc/cp-kafka:7.1.0
    platform: linux/amd64
    hostname: kafka
    container_name: kafka
    ports:
      - "29092:29092"
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TOOLS_LOG4J_LOGLEVEL: ERROR
    depends_on: [ zookeeper ]
    healthcheck:
      test: [ "CMD-SHELL", "kafka-topics --bootstrap-server kafka:9092 --list" ]
      interval: 5s
      timeout: 10s
      retries: 5

  init-kafka:
    image: confluentinc/cp-kafka:7.1.0
    depends_on:
      - kafka
    entrypoint: [ '/bin/sh', '-c' ]
    command: |
      "
      # blocks until kafka is reachable
      kafka-topics --bootstrap-server kafka:9092 --list
      echo -e 'Creating kafka topics'
      kafka-topics --bootstrap-server kafka:9092 --create --if-not-exists --topic taxi --replication-factor 1 --partitions 1
      echo -e 'Creating kafka topics'
      kafka-topics --bootstrap-server kafka:9092 --create --if-not-exists --topic mfa --replication-factor 1 --partitions 1
      echo -e 'Successfully created the following topics:'
      kafka-topics --bootstrap-server kafka:9092 --list
      "

  zookeeper:
    image: confluentinc/cp-zookeeper:7.1.0
    platform: linux/amd64
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  compactor-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: compactor-0
  compute-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: compute-node-0
    volumes:
      - "./server/udf.py:/udf.py"
      - "./mfa-start.sql:/mfa-start.sql"
      - "./mfa-mock.sql:/mfa-mock.sql"
  feature-store:
    image: rust:1.67
    build:
      context: .
      target: feature-store-server
    depends_on: [ kafka, meta-node-0, frontend-node-0 ]
    volumes:
      - "./log:/opt/feature-store/.log"
  etcd-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: etcd-0
  frontend-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: frontend-node-0
  grafana-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: grafana-0
  meta-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: meta-node-0
    ports:
      - "8815:8815"
    depends_on: [ kafka ]
  minio-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: minio-0
  prometheus-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: prometheus-0
  connector-node:
    extends:
      file: ../../docker/docker-compose.yml
      service: connector-node

volumes:
  etcd-0:
    external: false
  grafana-0:
    external: false
  minio-0:
    external: false
  prometheus-0:
    external: false

name: risingwave-compose

30 changes: 0 additions & 30 deletions integration_tests/mfa-change-demo/README.md

This file was deleted.

42 changes: 42 additions & 0 deletions integration_tests/nyc-taxi-feature-store-demo/README.md
@@ -0,0 +1,42 @@
# Description

#### Feature store demo.

We use a `simulator` to generate the input data.

All messages are sent to the `server`, which writes them to `Kafka` -> `RisingWave`; `RisingWave` then processes the data according to pre-defined operations.

We also use the `simulator` to issue user queries against our features: the `server` receives a request, queries the data, and returns the result.

If we want to change our business logic, we only need to update the materialized view in `RisingWave` with SQL statements.

#### Specific case:

The case in this chapter is New York taxi fare prediction: we need to predict the taxi fare based on the starting and ending points of a trip.

We use the starting and ending points as the primary key, extract and transform the corresponding features, and store them in `RisingWave`. These features are updated as user behavior comes in.

When a model needs to be trained, all of the stored features can be retrieved as training data.

When a user needs a prediction, they provide their starting and ending points, the corresponding features are queried from `RisingWave`, and the features are fed into a machine learning model for real-time fare prediction.
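In the demo the lookup is a SQL query against RisingWave and the model lives in the `server` crate; as a hypothetical sketch of that serving path, here a dict stands in for the online feature store and a toy linear model stands in for the trained one (the feature names and weights are invented for illustration):

```python
# Online features keyed by (pickup, dropoff) location, as in the demo's
# primary key; in the real system this lookup is a query to RisingWave.
feature_store = {
    (132, 236): {"avg_fare": 52.0, "avg_distance_km": 18.3, "trip_cnt": 410.0},
    (236, 161): {"avg_fare": 11.5, "avg_distance_km": 2.1,  "trip_cnt": 980.0},
}

def predict_fare(pu_location_id, do_location_id, weights, bias=2.5):
    """Fetch the latest online features for a route and apply a linear
    model -- a stand-in for the demo's trained fare-prediction model."""
    feats = feature_store[(pu_location_id, do_location_id)]  # online lookup
    return bias + sum(weights[name] * value for name, value in feats.items())

# Toy weights; a real model would be fit on the offline training data.
weights = {"avg_fare": 0.8, "avg_distance_km": 0.5, "trip_cnt": 0.0}
fare = predict_fare(132, 236, weights)
```

The point of the architecture is that the features behind the lookup stay fresh automatically: RisingWave keeps the materialized view updated as new trips stream in, so the serving path never recomputes aggregates itself.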

## Installation

1. Build and start the containers: Kafka, RisingWave, and the Feature Store.

```docker compose up --build```


The Feature Store system performs several tasks in sequence:

- Writes simulated offline data into `Kafka` -> `RisingWave`, extracting behavior and feature tables.

- Joins the behavior and feature tables to obtain the corresponding offline training data and trains the model.

- Writes simulated online feature data into `Kafka` -> `RisingWave`.

- Uses `do_location_id` (destination location) and `pu_location_id` (pickup location) to query the latest online features in RisingWave, and feeds these online features, along with the trained model, into the prediction step.
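The offline step joins behavior rows (trips with the fare actually paid, the label) against feature rows on the route key. A minimal stand-alone sketch of that join, with hypothetical column names — in the demo this is a SQL join inside RisingWave, not Python:

```python
# Feature rows keyed by route; behavior rows carry the training label.
features = {
    (132, 236): {"avg_distance_km": 18.3, "trip_cnt": 410},
    (236, 161): {"avg_distance_km": 2.1,  "trip_cnt": 980},
}
behaviors = [
    {"pu": 132, "do": 236, "fare": 54.0},
    {"pu": 236, "do": 161, "fare": 12.0},
    {"pu": 1,   "do": 2,   "fare": 9.0},  # no features for this route yet
]

def training_rows(behaviors, features):
    """Inner-join behaviors with features on (pu, do), emitting one
    labeled training row per matched behavior event."""
    rows = []
    for b in behaviors:
        feats = features.get((b["pu"], b["do"]))
        if feats is None:
            continue  # inner-join semantics: drop routes without features
        rows.append({**feats, "label": b["fare"]})
    return rows

rows = training_rows(behaviors, features)
# only the two routes present in both tables survive the join
```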

2. Then view the Feature Store simulation results in `.log`.

```cat .log/simulator_log```
File renamed without changes.
