Docs & notebook to use the published jar
No need to build the jar by oneself any more
juhoautio committed Apr 16, 2021
1 parent 81f1d13 commit 99eb45c
Showing 2 changed files with 60 additions and 119 deletions.
28 changes: 9 additions & 19 deletions README.md
@@ -108,14 +108,15 @@ satisfies the requirements of `DruidSource`
First, set the following spark conf:

```python
.conf("spark.jars",
"s3://my-bucket/my/prefix/rovio-ingest-1.0-SNAPSHOT.jar") \

.conf("spark.jars.repositories",
"https://s01.oss.sonatype.org/content/repositories/snapshots") \
.conf("spark.jars.packages",
"com.rovio.ingest:rovio-ingest:1.0.0_spark_3.0.1-SNAPSHOT") \
.conf("spark.submit.pyFiles",
"s3://my-bucket/my/prefix/rovio_ingest.zip")
```

-This is assuming that you [built from source](#building-from-source) and copied the packages to s3.
+This is assuming that you [built a python zip](#building-rovio_ingest-python) and copied it to s3.
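
For reference, a minimal sketch of how those confs might be applied when building a PySpark session: the repository URL, package coordinates, and s3 path are the values from the snippet above, while the app name and the use of the standard `SparkSession.builder.config(...)` call (in place of the snippet's chained `.conf(...)`) are illustrative assumptions.

```python
from pyspark.sql import SparkSession

# Sketch only: resolve the published rovio-ingest snapshot from the Sonatype
# snapshots repository and ship the python zip to executors via pyFiles.
spark = (
    SparkSession.builder
    .appName("rovio-ingest-example")  # placeholder app name
    .config("spark.jars.repositories",
            "https://s01.oss.sonatype.org/content/repositories/snapshots")
    .config("spark.jars.packages",
            "com.rovio.ingest:rovio-ingest:1.0.0_spark_3.0.1-SNAPSHOT")
    .config("spark.submit.pyFiles",
            "s3://my-bucket/my/prefix/rovio_ingest.zip")
    .getOrCreate()
)
```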

```python
from rovio_ingest import DRUID_SOURCE
@@ -170,14 +171,13 @@ Maven: see [Java](#java).

A `Dataset[Row]` extension is provided to repartition the dataset for the `DruidSource` Datasource.

-First, set the following spark conf:
+For an interactive spark session you can set the following spark conf:

```scala
("spark.jars", "s3://my-bucket/my/prefix/rovio-ingest-1.0-SNAPSHOT.jar")
("spark.jars.repositories", "https://s01.oss.sonatype.org/content/repositories/snapshots"),
("spark.jars.packages", "com.rovio.ingest:rovio-ingest:1.0.0_spark_3.0.1-SNAPSHOT")
```

-This is assuming that you [built from source](#building-rovio-ingest-jar) and copied the jar to s3.

```scala
import org.apache.spark.sql.{Dataset, Row, SaveMode, SparkSession}
import com.rovio.ingest.extensions.DruidDatasetExtensions._
@@ -235,14 +235,6 @@ Maven (for a full example, see [examples/rovio-ingest-maven-example](examples/ro

A `DruidDataset` wrapper class is provided to repartition the dataset for the `DruidSource` DataSource.

-First, set the following spark conf:
-
-```java
-("spark.jars", "s3://my-bucket/my/prefix/rovio-ingest-1.0-SNAPSHOT.jar")
-```
-
-This is assuming that you [built from source](#building-rovio-ingest-jar) and copied the jar to s3.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
@@ -372,9 +364,7 @@ To build the jar package:

The recommended way is to build a shaded jar and use it.

-Another option is to depend on `rovio-ingest` as a maven module (or use the plain jar), but there may
-be version conflicts between maven dependencies. If you'd still like to do it that way, see
-[this notebook](python/notebooks/druid_ingestion_test.ipynb) for guidance.
+To test the jar in practice, see [this notebook](python/notebooks/druid_ingestion_test.ipynb) as an example.

#### Building rovio_ingest (python)
