Initial commit - import code
jeffxiang committed Aug 7, 2024
1 parent 1c72411 commit 360291b
Showing 365 changed files with 9,067 additions and 0 deletions.
29 changes: 29 additions & 0 deletions .gitignore
@@ -0,0 +1,29 @@
target/
pom.xml.tag
pom.xml.releaseBackup
pom.xml.versionsBackup
pom.xml.next
release.properties
dependency-reduced-pom.xml
buildNumber.properties
.mvn/timing.properties
# https://github.com/takari/maven-wrapper#usage-without-binary-jar
.mvn/wrapper/maven-wrapper.jar

# Eclipse m2e generated files
# Eclipse Core
.project
# JDT-specific (Eclipse Java Development Tools)
.classpath

# intellij
.idea
*.iml

# vscode
.vscode

# quickstart
ts-examples/quickstart-scripts/target
ts-examples/src/main/resources/server.properties
ts-examples/quickstart-scripts/config/storage/*.json
7 changes: 7 additions & 0 deletions ADOPTERS.md
@@ -0,0 +1,7 @@
# Adopters

This is an alphabetical list of people and organizations who are using this
project. If you'd like to be included here, please send a Pull Request that
adds your information to this file.

- [Pinterest](https://www.pinterest.com/)
40 changes: 40 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,40 @@
# Code of Conduct

At Pinterest, we work hard to ensure that our work environment is welcoming
and inclusive to as many people as possible. We are committed to creating this
environment for everyone involved in our open source projects as well. We
welcome all participants regardless of ability, age, ethnicity, identified
gender, religion (or lack thereof), sexual orientation and socioeconomic
status.

This code of conduct details our expectations for upholding these values.

## Good behavior

We expect members of our community to exhibit good behavior including (but of
course not limited to):

- Using intentional and empathetic language.
- Focusing on resolving instead of escalating conflict.
- Providing constructive feedback.

## Unacceptable behavior

Some examples of unacceptable behavior (again, this is not an exhaustive
list):

- Harassment, publicly or in private.
- Trolling.
- Sexual advances (this isn’t the place for it).
- Publishing others’ personal information.
- Any behavior which would be deemed unacceptable in a professional environment.

## Recourse

If you are witness to or the target of unacceptable behavior, it should be
reported to Pinterest at [email protected]. All reporters will
be kept confidential and an appropriate response for each incident will be
evaluated.

If the maintainers do not uphold and enforce this code of conduct in
good faith, community leadership will hold them accountable.
58 changes: 58 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,58 @@
# Contributing

First off, thanks for taking the time to contribute! This guide will answer
some common questions about how this project works.

While this is a Pinterest open source project, we welcome contributions from
everyone. Regular outside contributors can become project maintainers.

## Help

If you're having trouble using this project, please start by reading the [`README.md`](README.md)
and searching for solutions in the existing open and closed issues.

## Security

If you've found a security issue in one of our open source projects,
please report it at [Bugcrowd](https://bugcrowd.com/pinterest); you may even
make some money!

## Code of Conduct

Please be sure to read and understand our [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md).
We work hard to ensure that our projects are welcoming and inclusive to as many
people as possible.

## Reporting Issues

If you have a bug report, please provide as much information as possible so that
we can help you out:

- Version of the project you're using.
- Code (or, even better, a whole project) that reproduces the issue.
- Steps which reproduce the issue.
- Screenshots, GIFs or videos (if relevant).
- Stack traces for crashes.
- Any logs produced.

## Making Changes

1. Fork this repository to your own account
2. Make your changes and verify that tests pass
3. Commit your work and push to a new branch on your fork
4. Submit a pull request
5. Participate in the code review process by responding to feedback

Once there is agreement that the code is in good shape, one of the project's
maintainers will merge your contribution.

To increase the chances that your pull request will be accepted:

- Follow the coding style
- Write tests for your changes
- Write a good commit message

## License

By contributing to this project, you agree that your contributions will be
licensed under its [license](LICENSE).
17 changes: 17 additions & 0 deletions README.md
@@ -0,0 +1,17 @@
# Tiered Storage Common

## Overview
This module contains common classes and interfaces that are used by the `ts-consumer` and `ts-segment-uploader` modules,
such as Metrics and StorageEndpointProvider.

## Build
To build with maven:
```
mvn clean install
```

## Test
To run tests with maven:
```
mvn test
```
9 changes: 9 additions & 0 deletions SECURITY.md
@@ -0,0 +1,9 @@
# Reporting Security Issues

If you discover a security issue in this project, please report it using
[Bugcrowd](https://bugcrowd.com/pinterest).

This will allow us to assess the risk and make a fix available before we
publish a public bug report.

Thanks for helping us make our software safe for everyone!
Binary file added docs/images/architecture.png
Binary file added docs/images/consumer.png
Binary file added docs/images/uploader.png
65 changes: 65 additions & 0 deletions docs/quickstart.md
@@ -0,0 +1,65 @@
# Quick Start

## Prerequisites
- Java 11+
- Maven 3+
- AWS S3 bucket of your choice
- Linux machine with access to the S3 bucket
- At least ~10-20 GB of free disk space

Clone the repo and then build with Maven:
```
mvn clean package -DskipTests
```

Now, navigate to the `ts-examples` directory: `cd ts-examples`.

## 1. Start Local Kafka
First, let's start a local Kafka cluster. You can do this by running
`./quickstart-scripts/startlocalkafka.sh`. This will do the following:
1. Download Kafka 2.3.1 binaries
2. Start a local cluster with 1 local broker
3. Create a topic `my_test_topic` with a single partition

## 2. Run Local SegmentUploader
Now, let's run the uploader locally.

### Configurations

#### Storage Endpoint Configuration
Create a JSON file inside [ts-examples/quickstart-scripts/config/storage](../ts-examples/quickstart-scripts/config/storage) and name it `my_test_kafka_cluster.json`. Don't worry about accidentally committing this file - files in this directory are ignored by `.gitignore`. The content of the JSON file should look like:
```
{
  "bucket_public": "my-bucket-public", ## change this to a bucket you have access to
  "bucket_pii": "my-bucket-pii", ## change this to a bucket you have access to
  "prefix": "kafka_tiered_storage_logs/prefix", ## whatever prefix you'd like
  "topics": {
    "my_test_topic": {
      "pii": true
    }
  }
}
```
This JSON file specifies the upload S3 destination for each topic and is read by [ExampleS3StorageServiceEndpointProvider.java](../ts-examples/src/main/java/com/pinterest/kafka/tieredstorage/common/discovery/s3/ExampleS3StorageServiceEndpointProvider.java).
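
To make the role of each JSON field concrete, here is a minimal sketch of how a destination could be derived from it. It is not the code in `ExampleS3StorageServiceEndpointProvider`: the class name `EndpointSketch`, the method `resolveDestination`, and the rule that a topic with `"pii": true` goes to `bucket_pii` while every other configured topic goes to `bucket_public` are all assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical sketch: derives an S3 destination from the quickstart JSON fields. */
final class EndpointSketch {

    /**
     * Returns the S3 URI prefix for a topic.
     * Assumption (not confirmed by the project): topics flagged "pii": true are
     * written to bucket_pii, all other configured topics to bucket_public.
     */
    static String resolveDestination(String bucketPublic, String bucketPii, String prefix,
                                     Map<String, Boolean> topicPii, String topic) {
        Boolean pii = topicPii.get(topic);
        if (pii == null) {
            // The topic has no entry under "topics" in the JSON file.
            throw new IllegalArgumentException("No storage config for topic: " + topic);
        }
        String bucket = pii ? bucketPii : bucketPublic;
        return "s3://" + bucket + "/" + prefix;
    }

    public static void main(String[] args) {
        // Values mirror the example JSON above.
        Map<String, Boolean> topics = Map.of("my_test_topic", true);
        System.out.println(resolveDestination(
                "my-bucket-public", "my-bucket-pii",
                "kafka_tiered_storage_logs/prefix", topics, "my_test_topic"));
    }
}
```

Under these assumptions, marking `my_test_topic` as `"pii": true` means its segments end up in the bucket you configured as `bucket_pii`, so that is the bucket to check in later steps.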

#### Uploader Configuration
The uploader configuration file has already been pre-filled for you [here](../ts-examples/quickstart-scripts/config/my_test_kafka_cluster.properties). Inspect it to see how the `storage.service.endpoint.provider.class` configuration takes the fully qualified class name of the `StorageServiceEndpointProvider` implementation that the uploader should use.
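
A class-name property like this is typically resolved with reflection. The sketch below shows that general pattern only; it is not taken from the uploader's source. The `EndpointProvider` interface and `ExampleProvider` class are hypothetical stand-ins for the project's real types, and only the property key comes from this guide.

```java
import java.util.Properties;

/** Hypothetical sketch of loading a provider class named in a properties file. */
final class ProviderLoadingSketch {

    /** Stand-in for the project's StorageServiceEndpointProvider type (assumption). */
    interface EndpointProvider {
        String describe();
    }

    /** Stand-in implementation so the sketch runs on its own (assumption). */
    static final class ExampleProvider implements EndpointProvider {
        public ExampleProvider() {}

        @Override
        public String describe() {
            return "example provider";
        }
    }

    /** Instantiates the class named by storage.service.endpoint.provider.class. */
    static EndpointProvider load(Properties props) {
        String className = props.getProperty("storage.service.endpoint.provider.class");
        if (className == null) {
            throw new IllegalArgumentException("Missing storage.service.endpoint.provider.class");
        }
        try {
            // asSubclass fails fast if the named class does not implement the interface.
            return Class.forName(className)
                    .asSubclass(EndpointProvider.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("storage.service.endpoint.provider.class",
                ExampleProvider.class.getName());
        System.out.println(load(props).describe());
    }
}
```

The practical consequence of this pattern is that the value must be a class name that is on the uploader's classpath and has a no-argument constructor; whether the real uploader imposes exactly these requirements is not stated here.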

### Start Uploader
Simply execute `./quickstart-scripts/startuploader.sh` to start the uploader process.

## 3. Send Test Data
Let's send some test data to our local topic `my_test_topic`. Do this by running `./quickstart-scripts/sendtestdata.sh`, which will send 10,000,000 (10 million) messages to the topic every time you run the script.

### Track the log directory
Inspect the `/tmp/kafka-logs/my_test_topic-0` directory; log segment files should be appearing there. The one we're interested in ends in `.log` (likely `00000000000000000000.log` at the moment). Once that file reaches 1 GiB (the default `log.segment.bytes`), Kafka closes that segment and opens a new one. This is also the condition that triggers our uploader to upload the segment to S3.

Re-run the `./quickstart-scripts/sendtestdata.sh` script as many times as needed until the segment fills up to 1 GiB and a new segment is created.
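
If you would rather check for rotation programmatically than by eye, the following sketch lists the segments Kafka has already closed. It relies on the fact that segment files are named after their 20-digit, zero-padded base offset, so the `.log` file with the highest name is the active one. The class and method names are hypothetical, and this is not how the uploader itself detects rotation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: reports which log segments in a partition directory are closed. */
final class SegmentRotationSketch {

    /** Parses the base offset from a segment file name such as 00000000000000000000.log. */
    static long baseOffset(String fileName) {
        return Long.parseLong(fileName.substring(0, fileName.indexOf('.')));
    }

    /**
     * Returns every .log file name except the highest one, in offset order.
     * Names are zero-padded, so lexicographic order equals offset order.
     */
    static List<String> closedSegments(List<String> fileNames) {
        List<String> logs = fileNames.stream()
                .filter(name -> name.endsWith(".log"))
                .sorted()
                .collect(Collectors.toList());
        // The last entry is the active segment that Kafka is still appending to.
        return logs.isEmpty() ? logs : logs.subList(0, logs.size() - 1);
    }

    /** Lists the plain file names in a directory. */
    static List<String> listFileNames(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/kafka-logs/my_test_topic-0");
        for (String name : closedSegments(listFileNames(dir))) {
            System.out.println(name + " (base offset " + baseOffset(name) + ")");
        }
    }
}
```

While only `00000000000000000000.log` exists, the program prints nothing; once a second `.log` file appears, the first one is reported as closed.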

## 4. Verify Upload
Once the segment is rotated, the uploader should have triggered an upload of that `.log` segment file, along with the `.index` and `.timeindex` files. The uploader will also upload an `offset.wm` file after all three have uploaded successfully. The uploader logs should confirm this.

You can also verify the successful upload by inspecting the S3 path that you've defined in step 2 using `aws s3 ls s3://<my-bucket>/<my-prefix>/my_test_cluster/my_test_topic-0/`.
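
As a rough guide to what that listing should contain, the sketch below builds the object keys for one rotated segment. The directory layout is taken from the `aws s3 ls` path above; that each file keeps its local name under that directory, and the order shown among the three segment files, are assumptions. `UploadLayoutSketch` and its methods are hypothetical names.

```java
import java.util.List;

/** Hypothetical sketch of the S3 keys expected for one rotated segment. */
final class UploadLayoutSketch {

    /** Builds <prefix>/<cluster>/<topic>-<partition>/ as in the aws s3 ls example. */
    static String partitionPrefix(String prefix, String cluster, String topic, int partition) {
        return prefix + "/" + cluster + "/" + topic + "-" + partition + "/";
    }

    /**
     * Lists the keys for a segment: the three segment files first, then offset.wm,
     * which this guide says is uploaded only after the other three succeed.
     * Assumption: objects keep their local file names.
     */
    static List<String> expectedKeys(String partitionPrefix, long baseOffset) {
        // Kafka names segment files after the zero-padded 20-digit base offset.
        String base = String.format("%020d", baseOffset);
        return List.of(
                partitionPrefix + base + ".log",
                partitionPrefix + base + ".index",
                partitionPrefix + base + ".timeindex",
                partitionPrefix + "offset.wm");
    }

    public static void main(String[] args) {
        String dir = partitionPrefix("kafka_tiered_storage_logs/prefix",
                "my_test_cluster", "my_test_topic", 0);
        expectedKeys(dir, 0L).forEach(System.out::println);
    }
}
```

If the listing shows the three segment files but no `offset.wm`, the uploader logs are the place to look, since that file is described as the last one written.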

## 5. Consume Data
Finally, we can consume the data that we've uploaded. Do so by executing `./quickstart-scripts/consumetestdata.sh`. This runs a [TieredStorageConsumer](../ts-consumer/src/main/java/com/pinterest/kafka/tieredstorage/consumer/TieredStorageConsumer.java) instance using `TIERED_STORAGE_ONLY` consumption mode, meaning that it will only read data available on S3, and not from the Kafka broker directly.
134 changes: 134 additions & 0 deletions pom.xml
@@ -0,0 +1,134 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.pinterest.kafka.tieredstorage</groupId>
<artifactId>kafka-tiered-storage</artifactId>
<version>0.0.2-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

<artifactId>ts-common</artifactId>

<properties>
<maven.compiler.source>20</maven.compiler.source>
<maven.compiler.target>20</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.12</artifactId>
<version>2.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.17.1</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.17.1</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.6</version>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3</artifactId>
<version>2.17.273</version>
</dependency>
<dependency>
<groupId>io.dropwizard.metrics</groupId>
<artifactId>metrics-core</artifactId>
<version>4.1.2</version>
</dependency>
<dependency>
<groupId>io.dropwizard.metrics</groupId>
<artifactId>metrics-jvm</artifactId>
<version>4.1.2</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>29.0-jre</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>RELEASE</version>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.9.1</version>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.0.0-M5</version>
<configuration>
<!--suppress UnresolvedMavenProperty -->
<argLine>${argLine}</argLine>
</configuration>
</plugin>
</plugins>
</build>

</project>
17 changes: 17 additions & 0 deletions ts-common/README.md
@@ -0,0 +1,17 @@
# Tiered Storage Common

## Overview
This module contains common classes and interfaces that are used by the `ts-consumer` and `ts-segment-uploader` modules,
such as Metrics and StorageEndpointProvider.

## Build
To build with maven:
```
mvn clean install
```

## Test
To run tests with maven:
```
mvn test
```