Skip to content

Commit

Permalink
Merge branch 'main' into xxh/clickhouse_sink_upsert2
Browse files Browse the repository at this point in the history
  • Loading branch information
xxhZs authored Oct 24, 2023
2 parents 2268063 + e818508 commit 434e69a
Show file tree
Hide file tree
Showing 176 changed files with 2,905 additions and 2,375 deletions.
34 changes: 16 additions & 18 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

74 changes: 54 additions & 20 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,13 @@
</picture>
</p>

[![Slack](https://badgen.net/badge/Slack/Join%20RisingWave/0abd59?icon=slack)](https://risingwave.com/slack)
[![Build status](https://badge.buildkite.com/9394d2bca0f87e2e97aa78b25f765c92d4207c0b65e7f6648f.svg)](https://buildkite.com/risingwavelabs/main)
[![codecov](https://codecov.io/gh/risingwavelabs/risingwave/branch/main/graph/badge.svg?token=EB44K9K38B)](https://codecov.io/gh/risingwavelabs/risingwave)

<p align="center">
<b>Stream Processing Redefined.</b>
</p>
<div align="center">

### 🌊Stream Processing Redefined.

</div>

<p align="center">
<a
href="https://docs.risingwave.com/"
Expand All @@ -21,7 +21,7 @@
<a
href="https://tutorials.risingwave.com/"
target="_blank"
><b>Hands-on Tutorials</b></a>&nbsp;&nbsp;&nbsp;🌊&nbsp;&nbsp;&nbsp;
><b>Hands-on Tutorials</b></a>&nbsp;&nbsp;&nbsp;🎯&nbsp;&nbsp;&nbsp;
<a
href="https://cloud.risingwave.com/"
target="_blank"
Expand All @@ -33,6 +33,27 @@
<b>Get Instant Help</b>
</a>
</p>
<div align="center">
<a
href="https://risingwave.com/slack"
target="_blank"
>
<img alt="Slack" src="https://badgen.net/badge/Slack/Join%20RisingWave/0abd59?icon=slack" />
</a>
<a
href="https://buildkite.com/risingwavelabs/main"
target="_blank"
>
<img alt="Build status" src="https://badge.buildkite.com/9394d2bca0f87e2e97aa78b25f765c92d4207c0b65e7f6648f.svg" />
</a>
<a
href="https://codecov.io/gh/risingwavelabs/risingwave"
target="_blank"
>
<img alt="codecov" src="https://codecov.io/gh/risingwavelabs/risingwave/branch/main/graph/badge.svg?token=EB44K9K38B" />
</a>
</div>
RisingWave is a distributed SQL streaming database that enables <b>simple</b>, <b>efficient</b>, and <b>reliable</b> processing of streaming data.

![RisingWave](https://github.com/risingwavelabs/risingwave-docs/blob/0f7e1302b22493ba3c1c48e78810750ce9a5ff42/docs/images/archi_simple.png)
Expand All @@ -59,22 +80,35 @@ Learn more at [Quick Start](https://docs.risingwave.com/docs/current/get-started

## Why RisingWave for stream processing?
RisingWave adaptly tackles some of the most challenging problems in stream processing. Compared to existing stream processing systems, RisingWave shines through with the following key features:
* **Easy to learn:** RisingWave speaks PostgreSQL-style SQL, enabling users to dive into stream processing in much the same way as operating a PostgreSQL database.
* **Highly efficient in multi-stream joins:** RisingWave has made significant optimizations for multiple stream join scenarios. Users can easily join 10-20 streams (or more) efficiently in a production environment.
* **High resource utilization:** Queries within RisingWave benefit from shared computational resources, obviating the need for users to manually allocate resources for individual queries.
* **No compromise on large state management:**: The decoupled compute-storage architecture of RisingWave ensures remote persistence of internal states, and users never need to worry about the size of internal states when handling complex queries.
* **Transparent dynamic scaling:** RisingWave supports near-instantaneous dynamic scaling without any service interruptions.
* **Instant failure recovery:** RisingWave's state management mechanism allows it to recover from failure in seconds, not minutes or hours.
* **Easy to verify correctness:** RisingWave persists results in materialized views and allow users to break down complex stream computation programs into stacked materialized views, simplifying program development and result verification.
* **Simplified data stack:** RisingWave's ability to store data and serve queries eliminates the need for separate maintenance of stream processors and databases. Users can effortlessly link RisingWave to their preferred BI tools or through client libraries.
* **Simple to maintain and operate:** - RisingWave abstracts away unnecessary low-level details, allowing users to concentrate solely on SQL code-level issues.
* **Rich ecosystem:** With integrations to a diverse range of cloud systems and the PostgreSQL ecosystem, RisingWave boasts a rich and expansive ecosystem.
* **Easy to learn**
* RisingWave speaks PostgreSQL-style SQL, enabling users to dive into stream processing in much the same way as operating a PostgreSQL database.
* **Highly efficient in multi-stream joins**
* RisingWave has made significant optimizations for multiple stream join scenarios. Users can easily join 10-20 streams (or more) efficiently in a production environment.
* **High resource utilization**
* Queries in RisingWave leverage shared computational resources, eliminating the need for users to manually allocate resources for each query.
* **No compromise on large state management**
* The decoupled compute-storage architecture of RisingWave ensures remote persistence of internal states, and users never need to worry about the size of internal states when handling complex queries.
* **Transparent dynamic scaling**
* RisingWave supports near-instantaneous dynamic scaling without any service interruptions.
* **Instant failure recovery**
* RisingWave's state management mechanism allows it to recover from failure in seconds, not minutes or hours.
* **Easy to verify correctness**
* RisingWave persists results in materialized views and allow users to break down complex stream computation programs into stacked materialized views, simplifying program development and result verification.
* **Simplified data stack**
* RisingWave's ability to store data and serve queries eliminates the need for separate maintenance of stream processors and databases. Users can effortlessly connect RisingWave to their preferred BI tools or through client libraries.
* **Simple to maintain and operate**
* RisingWave abstracts away unnecessary low-level details, allowing users to concentrate solely on SQL code-level issues.
* **Rich ecosystem**
* With integrations to a diverse range of cloud systems and the PostgreSQL ecosystem, RisingWave boasts a rich and expansive ecosystem.

## RisingWave's limitations
RisingWave isn’t a panacea for all data engineering hurdles. It has its own set of limitations:
* **No programmable interfaces:** RisingWave does not provide low-level APIs in languages like Java and Scala, and does not allow users to manage internal states manually (unless you want to hack!). For coding in Java, Scala, and other languages, please consider using RisingWave's User-Defined Functions (UDF).
* **No support for transaction processing:** RisingWave isn’t cut out for transactional workloads, thus it’s not a viable substitute for operational databases dedicated to transaction processing. However, it supports read-only transactions, ensuring data freshness and consistency. It also comprehends the transactional semantics of upstream database Change Data Capture (CDC).
* **Not tailored for ad-hoc analytical queries:** RisingWave's row store design is tailored for optimal stream processing performance rather than interactive analytical workloads. Hence, it's not a suitable replacement for OLAP databases. Yet, a reliable integration with many OLAP databases exists, and a collaborative use of RisingWave and OLAP databases is a common practice among many users.
* **No programmable interfaces**
* RisingWave does not provide low-level APIs in languages like Java and Scala, and does not allow users to manage internal states manually (unless you want to hack!). For coding in Java, Scala, and other languages, please consider using RisingWave's User-Defined Functions (UDF).
* **No support for transaction processing**
* RisingWave isn’t cut out for transactional workloads, thus it’s not a viable substitute for operational databases dedicated to transaction processing. However, it supports read-only transactions, ensuring data freshness and consistency. It also comprehends the transactional semantics of upstream database Change Data Capture (CDC).
* **Not tailored for ad-hoc analytical queries**
* RisingWave's row store design is tailored for optimal stream processing performance rather than interactive analytical workloads. Hence, it's not a suitable replacement for OLAP databases. Yet, a reliable integration with many OLAP databases exists, and a collaborative use of RisingWave and OLAP databases is a common practice among many users.


## RisingWave Cloud
Expand Down
1 change: 1 addition & 0 deletions ci/scripts/deterministic-recovery-test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ chmod +x ./risingwave_simulation

export RUST_LOG="info,\
risingwave_meta::barrier::recovery=debug,\
risingwave_meta::manager::catalog=debug,\
risingwave_meta::rpc::ddl_controller=debug,\
risingwave_meta::barrier::mod=debug,\
risingwave_simulation=debug"
Expand Down
2 changes: 2 additions & 0 deletions ci/scripts/run-micro-benchmarks.sh
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ main() {
echo "--- Getting aws instance type"
local instance_type=$(get_instance_type)
echo "instance_type: $instance_type"
echo "$instance_type" > microbench_instance_type.txt
buildkite-agent artifact upload ./microbench_instance_type.txt
if [[ $instance_type != "m6i.4xlarge" ]]; then
echo "Only m6i.4xlarge is supported, skipping microbenchmark"
exit 0
Expand Down
13 changes: 13 additions & 0 deletions ci/scripts/upload-micro-bench-results.sh
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,19 @@ get_commit() {
| sed 's/\"//g'
}

get_machine() {
buildkite-agent artifact download microbench_instance_type.txt ./
cat ./microbench_instance_type.txt
}

echo "--- Checking microbench_instance_type"
INSTANCE_TYPE=$(get_machine)
echo "instance type: $INSTANCE_TYPE"
if [[ $INSTANCE_TYPE != "m6i.4xlarge" ]]; then
echo "Only m6i.4xlarge is supported, microbenchmark was skipped"
exit 0
fi

setup

BUILDKITE_BUILD_URL="https://buildkite.com/risingwavelabs/main-cron/builds/$BUILDKITE_BUILD_NUMBER"
Expand Down
10 changes: 5 additions & 5 deletions docker/docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
version: "3"
services:
compactor-0:
image: "ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.2.0}"
image: "ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.3.0}"
command:
- compactor-node
- "--listen-addr"
Expand Down Expand Up @@ -37,7 +37,7 @@ services:
timeout: 5s
retries: 5
compute-node-0:
image: "ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.2.0}"
image: "ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.3.0}"
command:
- compute-node
- "--listen-addr"
Expand Down Expand Up @@ -122,7 +122,7 @@ services:
timeout: 5s
retries: 5
frontend-node-0:
image: "ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.2.0}"
image: "ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.3.0}"
command:
- frontend-node
- "--listen-addr"
Expand Down Expand Up @@ -179,7 +179,7 @@ services:
timeout: 5s
retries: 5
meta-node-0:
image: "ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.2.0}"
image: "ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.3.0}"
command:
- meta-node
- "--listen-addr"
Expand Down Expand Up @@ -295,7 +295,7 @@ services:
timeout: 5s
retries: 5
connector-node:
image: ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.2.0}
image: ghcr.io/risingwavelabs/risingwave:${RW_IMAGE_VERSION:-v1.3.0}
entrypoint: "/risingwave/bin/connector-node/start-service.sh"
ports:
- 50051
Expand Down
1 change: 1 addition & 0 deletions proto/expr.proto
Original file line number Diff line number Diff line change
Expand Up @@ -348,6 +348,7 @@ message AggCall {
MODE = 24;
LAST_VALUE = 25;
GROUPING = 26;
INTERNAL_LAST_SEEN_VALUE = 27;
}
Type type = 1;
repeated InputRef args = 2;
Expand Down
2 changes: 1 addition & 1 deletion src/batch/src/executor/aggregation/filter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ impl AggregateFunction for Filter {
mod tests {
use risingwave_common::test_prelude::StreamChunkTestExt;
use risingwave_expr::aggregate::{build_append_only, AggCall};
use risingwave_expr::expr::{build_from_pretty, Expression, LiteralExpression};
use risingwave_expr::expr::{build_from_pretty, ExpressionBoxExt, LiteralExpression};

use super::*;

Expand Down
2 changes: 1 addition & 1 deletion src/batch/src/executor/project_set.rs
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ mod tests {
use risingwave_common::catalog::{Field, Schema};
use risingwave_common::test_prelude::*;
use risingwave_common::types::DataType;
use risingwave_expr::expr::{Expression, InputRefExpression, LiteralExpression};
use risingwave_expr::expr::{ExpressionBoxExt, InputRefExpression, LiteralExpression};
use risingwave_expr::table_function::repeat;

use super::*;
Expand Down
14 changes: 12 additions & 2 deletions src/expr/core/src/aggregate/def.rs
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,9 @@ pub enum AggKind {
PercentileDisc,
Mode,
Grouping,

/// Return last seen one of the input values.
InternalLastSeenValue,
}

impl AggKind {
Expand Down Expand Up @@ -264,6 +267,7 @@ impl AggKind {
PbType::PercentileDisc => Ok(AggKind::PercentileDisc),
PbType::Mode => Ok(AggKind::Mode),
PbType::Grouping => Ok(AggKind::Grouping),
PbType::InternalLastSeenValue => Ok(AggKind::InternalLastSeenValue),
PbType::Unspecified => bail!("Unrecognized agg."),
}
}
Expand Down Expand Up @@ -294,8 +298,9 @@ impl AggKind {
Self::VarSamp => PbType::VarSamp,
Self::PercentileCont => PbType::PercentileCont,
Self::PercentileDisc => PbType::PercentileDisc,
Self::Grouping => PbType::Grouping,
Self::Mode => PbType::Mode,
Self::Grouping => PbType::Grouping,
Self::InternalLastSeenValue => PbType::InternalLastSeenValue,
}
}
}
Expand Down Expand Up @@ -422,6 +427,7 @@ pub mod agg_kinds {
| AggKind::BoolAnd
| AggKind::BoolOr
| AggKind::ApproxCountDistinct
| AggKind::InternalLastSeenValue
};
}
pub use single_value_state;
Expand Down Expand Up @@ -450,7 +456,11 @@ impl AggKind {
/// Get the total phase agg kind from the partial phase agg kind.
pub fn partial_to_total(self) -> Option<Self> {
match self {
AggKind::BitXor | AggKind::Min | AggKind::Max | AggKind::Sum => Some(self),
AggKind::BitXor
| AggKind::Min
| AggKind::Max
| AggKind::Sum
| AggKind::InternalLastSeenValue => Some(self),
AggKind::Sum0 | AggKind::Count => Some(AggKind::Sum0),
agg_kinds::simply_cannot_two_phase!() => None,
agg_kinds::rewritten!() => None,
Expand Down
Loading

0 comments on commit 434e69a

Please sign in to comment.