Skip to content

Latest commit

 

History

History
708 lines (556 loc) · 25.3 KB

README.md

File metadata and controls

708 lines (556 loc) · 25.3 KB

Recent changes

Java 17 Support

To support Java 17, we needed to add support for the RUNTIME_VERSION option when creating procedures and functions with JVM languages. The runtime parameter was added to the applications DSL to support this change, but the default value will stay 11 for now to align with the Snowflake default.

Additionally, Java 17 is also being used now to compile and build this plugin.

Motivation

It needs to be easy to develop and test JVM applications even if they are being deployed to Snowflake. Using Gradle, we can easily build shaded JAR files with all dependencies included using the shadow plugin, and I've provided a sample Java project that demonstrates this basic use case:

cd examples/java-manual &&
./gradlew build

But this JAR would still have to be uploaded to a stage in Snowflake, and the UDF would have to be created or possibly recreated if its signature changed. I wanted an experience using Snowflake that is as natural to developers using IntelliJ or VS Code for standard JVM projects.

The Gradle Snowflake Plugin

This plugin provides easy configuration options for those getting started with Gradle but also provides advanced features for teams already using Gradle in other areas of the organization. It has three basic modes:

  1. Lightweight publishing to internal Snowflake stages using Snowpark.
  2. Slightly heavier publishing using external Snowflake stages and autoconfiguration of the maven-publish plugin.
  3. Publishing to Snowflake using external stages and custom configuration of the maven-publish plugin.

Have a look at the API docs.

This plugin can be used to build UDFs and procedures in any JVM language supported by Gradle, which currently provides official support for Java, Scala, Kotlin and Groovy. See the examples directory for examples using different languages.

Internal Stages using Snowpark

Unless you have a heavy investment in Gradle as an organization, this is likely the option you want to use. Additionally, if you plan on sharing UDFs across Snowflake accounts, this is the option you have to use, as JARs need to be in named internal stages. Look at the sample Java project using the plugin and you'll notice a few differences in the build file.

cd examples/java &&
cat build.gradle

We applied io.github.stewartbryson.snowflake and removed com.github.johnrengelman.shadow because the shadow plugin is automatically applied by the snowflake plugin:

plugins {
   id 'java'
   id 'io.github.stewartbryson.snowflake' version '@version@'
}

We now have the snowflakeJvm task available:

❯ ./gradlew help --task snowflakeJvm

> Task :help
Detailed task information for snowflakeJvm

Path
     :snowflakeJvm

Type
     SnowflakeJvm (io.github.stewartbryson.SnowflakeJvm)

Options
     --config     Custom credentials config file.

     --connection     Override the credentials connection to use. Default: use the base connection info in credentials config.

     --jar     Optional: manually pass a JAR file path to upload instead of relying on Gradle metadata.

     --stage     Override the Snowflake stage to publish to.

     --rerun     Causes the task to be re-run even if up-to-date.

Description
     A Cacheable Gradle task for publishing UDFs and procedures to Snowflake

Group
     publishing

BUILD SUCCESSFUL in 2s
5 actionable tasks: 3 executed, 2 up-to-date

Several command-line options mention overriding other configuration values. This is because the plugin also provides a DSL closure called snowflake that we can use to configure our build, which is documented in the class API docs:

snowflake {
   connection = 'gradle_plugin'
   stage = 'upload'
   applications {
      add_numbers {
         inputs = ["a integer", "b integer"]
         returns = "string"
         runtime = '17'
         handler = "Sample.addNum"
      }
   }
}

Snowflake credentials are managed in a config file, with the default being ~/.snowflake/config.toml as prescribed by the Snowflake Developer CLI project. As a secondary location, we also support the SnowSQL config file. In searching for a credentials config file, the plugin works in the following order:

  1. A custom location of your choosing, configured with the --config option in applicable tasks.
  2. <HOME_DIR>/.snowflake/config.toml
  3. <HOME_DIR>/.snowsql/config
  4. ./config.toml (This is useful in CI/CD pipelines, where secrets can be written easily to this file.)

The connection property in the plugin DSL defines which connection to use from the config file, relying on the default values if none is provided. It first loads all the default values, and replaces any values from the connection, similar to how Snowflake CLI and SnowSQL work.

The nested applications DSL might seem a bit daunting. This is a simple way to configure all the different UDFs and procedures we want to automatically create (or recreate) each time we publish the JAR file. The example above will generate and execute the following statement. Notice that the imports part of the statement is automatically added by the plugin as the JAR (including our code and all dependencies) is automatically uploaded to Snowflake:

CREATE OR REPLACE function add_numbers (a integer, b integer)
  returns string
  language JAVA
  handler = 'Sample.addNum'
  runtime_version = '17'
  imports = ('@upload/libs/java-0.1.0-all.jar')

With the configuration complete, we can execute the snowflakeJvm task, which will run any unit tests (see Testing below) and then publish our JAR and create our function. Note that if the named internal stage does not exist, the plugin will create it first:

❯ ./gradlew snowflakeJvm

> Task :snowflakeJvm
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
File java-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
  returns string
  language JAVA
  handler = 'Sample.addNum'
  runtime_version = '17'
  imports = ('@upload/libs/java-0.1.0-all.jar')


BUILD SUCCESSFUL in 5s
7 actionable tasks: 2 executed, 5 up-to-date

Our function now exists in Snowflake:

select add_numbers(1,2);

Sum is: 3

The snowflakeJvm task was written to be incremental and cacheable. If we run the task again without making any changes to task inputs (our code) or outputs, then the execution is avoided, which we know because of the up-to-date keyword.

❯ ./gradlew snowflakeJvm

BUILD SUCCESSFUL in 624ms
3 actionable tasks: 3 up-to-date

This project uses the S3 Gradle build cache plugin and we are very happy with it. Check the Gradle settings file for an implementation example.

Testing

Gradle has built-in Testing Suites for JVM projects, and using this functionality for JVM applications with Snowflake was my main motivation for writing this plugin. We'll now describe two topics regarding this plugin: unit testing and functional testing.

Unit Testing

Unit testing with Gradle is a dense topic, and the documentation will inform developers better than we can. In general, unit tests are what developers write regardless of where the code eventually gets executed. In the Java with testing example, you can see a sample testing specification (spec) using the Spock Framework and written in Groovy, which is a requirement for that framework, as is applying the groovy plugin to the project.

cd examples/java-testing

Our plugins DSL from the build file:

plugins {
    id 'java'
    id 'groovy' // needed for Spock testing framework
    id 'io.github.stewartbryson.snowflake' version '@version@'
}

Our choice to use the Spock framework in the default test testing suite:

test {
   useSpock('2.3-groovy-3.0')
}

And the unit test SampleTest spec in src/test/groovy:

import spock.lang.Shared
import spock.lang.Specification
import spock.lang.Subject

class SampleTest extends Specification {
   @Shared
   @Subject
   def sample = new Sample()

   def "adding 1 and 2"() {
      when: "Two numbers"
      def a = 1
      def b = 2
      then: "Add numbers"
      sample.addNum(a, b) == "Sum is: 3"
   }

   def "adding 3 and 4"() {
      when: "Two numbers"
      def a = 3
      def b = 4
      then: "Add numbers"
      sample.addNum(a, b) == "Sum is: 7"
   }
}

All unit tests in either src/test/java (written using JUnit or something else) or src/test/groovy (written with Spock) will automatically run whenever the test or build task is executed.

❯ ./gradlew build

> Task :test

SampleTest

  Test adding 1 and 2 PASSED
  Test adding 3 and 4 PASSED

SUCCESS: Executed 2 tests in 487ms

BUILD SUCCESSFUL in 1s
9 actionable tasks: 5 executed, 4 up-to-date

All Gradle testing tasks are automatically incremental and cacheable, and would be avoided if executed again without changes to the code in either the source or the spec. The same applies for the topic of functional testing below.

Functional Testing

Functional testing describes what the system does, and in my mind, this involves testing our deployed code in Snowflake. Regardless of what we call it, we know we need this as a crucial component in our build chain. This plugin contains a custom Spock Specification class called SnowflakeSpec that can be used in a new test suite. By default, this test suite is called functionalTest, though the name can be configured using the testSuite property.

Here is our configuration of the functionalTest test suite. The DSL provided by Gradle in JVM Testing is convoluted (in my opinion), but no one asked me:

functionalTest(JvmTestSuite) {
   targets {
       all {
           useSpock('2.3-groovy-3.0')
           dependencies {
               implementation "io.github.stewartbryson:gradle-snowflake-plugin:@version@"
           }
           testTask.configure {
               failFast true
               // which credentials connection to use
               systemProperty 'connection', project.snowflake.connection
           }
       }
   }
}

I'll walk through a few of these points. So that the SnowflakeSpec is available in the test classpath, we have to declare the plugin as a dependency to the test suite. Notice that we use the library maven coordinates, which are different from the coordinates in the plugins DSL. Additionally, our test specs are unaware of all the configurations of our Gradle build, so we have to pass our connection property as a Java system property to the SnowflakeSpec class.

This is the SnowflakeSampleTest spec in src/functionalTest/groovy:

import groovy.util.logging.Slf4j
import io.github.stewartbryson.SnowflakeSpec

/**
 * The SnowflakeSpec used for testing functions.
 */
@Slf4j
class SnowflakeSampleTest extends SnowflakeSpec {

   def 'ADD_NUMBERS() function with 1 and 2'() {
      when: "Two numbers exist"
      def a = 1
      def b = 2

      then: 'Add two numbers using ADD_NUMBERS()'
      selectFunction("add_numbers", [a,b]) == 'Sum is: 3'
   }

   def 'ADD_NUMBERS() function with 3 and 4'() {
      when: "Two numbers exist"
      def a = 3
      def b = 4

      then: 'Add two numbers using ADD_NUMBERS()'
      selectFunction("add_numbers", [a,b]) == 'Sum is: 7'
   }

}

The selectFunction method is an easy way to execute a function and test the results by just passing the function name and a list of arguments to pass to that function. And of course, this executes against Snowflake in real time.

❯ ./gradlew functionalTest

> Task :test

SampleTest

  Test adding 1 and 2 PASSED
  Test adding 3 and 4 PASSED

SUCCESS: Executed 2 tests in 481ms


> Task :snowflakeJvm
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
File java-testing-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
  returns string
  language JAVA
  handler = 'Sample.addNum'
  runtime_version = '17'
  imports = ('@upload/libs/java-testing-0.1.0-all.jar')


> Task :functionalTest

SnowflakeSampleTest

  Test ADD_NUMBERS() function with 1 and 2 PASSED (1.1s)
  Test ADD_NUMBERS() function with 3 and 4 PASSED

SUCCESS: Executed 2 tests in 3.8s


BUILD SUCCESSFUL in 11s
11 actionable tasks: 7 executed, 4 up-to-date

Testing with Ephemeral Database Clones

Running functional tests using static Snowflake databases is boring, especially considering the zero-copy cloning functionality available. The plugin supports cloning an ephemeral database from the database we connect to and using it for testing our application. This workflow is useful for CI/CD processes and is configured with the plugin DSL. The plugin is aware when it is running in CI/CD environments and currently supports:

Because the plugin is aware when executing in CICD environments, we expose that information through the DSL, and can use it to control our cloning behavior.

Our build file change to enable ephemeral testing:

snowflake {
   connection = 'gradle_plugin'
   stage = 'upload'
   useEphemeral = project.snowflake.isCI() // use ephemeral with CICD workflows
   keepEphemeral = project.snowflake.isPR() // keep ephemeral for PRs
   applications {
      add_numbers {
         inputs = ["a integer", "b integer"]
         returns = "string"
         runtime = '17'
         handler = "Sample.addNum"
      }
   }
}

The useEphemeral property will determine whether the createEphemeral and dropEphemeral tasks are added at the beginning and end of the build, respectively. This allows for the functionalTest task to be run in the ephemeral clone just after our application is published. We've also added a little extra magic to keep the clone when building a pull request. The createEphemeral task issues a CREATE DATABASE... IF NOT EXISTS statement, so if will not fail if the clone exists from a prior run. Remember that our SnowflakeSpec class doesn't automatically know the details of our build, so we have to provide the ephemeral name using java system properties. Here is our modified testing suite:

functionalTest(JvmTestSuite) {
   targets {
       all {
           useSpock('2.3-groovy-3.0')
           dependencies {
               implementation "io.github.stewartbryson:gradle-snowflake-plugin:1.1.4"
           }
           testTask.configure {
               failFast true
               // which credentials connection to use
               systemProperty 'connection', project.snowflake.connection
               // if this is ephemeral, the test spec needs the name to connect to
               if (project.snowflake.useEphemeral) {
                   systemProperty 'ephemeralName', snowflake.ephemeralName
               }
           }
       }
   }
}

We can simulate a GitHub Actions environment just by setting the GITHUB_ACTIONS environment variable:

export GITHUB_ACTIONS=true
❯ ./gradlew functionalTest

> Task :createEphemeral
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
Ephemeral clone EPHEMERAL_JAVA_TESTING_PR_4 created.

> Task :snowflakeJvm
Reusing existing connection.
File java-testing-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
  returns string
  language JAVA
  handler = 'Sample.addNum'
  runtime_version = '17'
  imports = ('@upload/libs/java-testing-0.1.0-all.jar')


> Task :functionalTest

SnowflakeSampleTest

  Test ADD_NUMBERS() function with 1 and 2 PASSED
  Test ADD_NUMBERS() function with 3 and 4 PASSED

SUCCESS: Executed 2 tests in 3s

BUILD SUCCESSFUL in 6s
13 actionable tasks: 4 executed, 9 up-to-date

When the CI/CD environment is detected, the plugin will name the ephemeral database clone based on the pull request number, the branch name, or the tag name instead of an autogenerated one. If we prefer to simply specify a clone name instead of relying on the plugin to generate it, that is supported as well:

ephemeralName = 'testing_db'
❯ ./gradlew functionalTest

> Task :createEphemeral
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
Ephemeral clone testing_db created.

> Task :snowflakeJvm
Reusing existing connection.
File java-testing-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
 returns string
 language JAVA
 handler = 'Sample.addNum'
 runtime_version = '17'
 imports = ('@upload/libs/java-testing-0.1.0-all.jar')


> Task :functionalTest

SnowflakeSampleTest

 Test ADD_NUMBERS() function with 1 and 2 PASSED
 Test ADD_NUMBERS() function with 3 and 4 PASSED

SUCCESS: Executed 2 tests in 3.3s


> Task :dropEphemeral
Reusing existing connection.
Ephemeral clone testing_db dropped.

BUILD SUCCESSFUL in 7s
13 actionable tasks: 4 executed, 9 up-to-date

Autoconfiguration of maven-publish with External Stages

This option is useful when you want your artifacts available to consumers other than just Snowflake without publishing them to disparate locations. Gradle has built-in support for S3 or GCS as a Maven repository, and Snowflake has support for S3 or GCS external stages, so we simply marry the two in a single location. Looking at the sample project, notice we've populated a few additional properties:

snowflake {
   connection = 'gradle_plugin'
   stage = 's3_maven'
   groupId = 'io.github.stewartbryson'
   artifactId = 'sample-udfs'
   applications {
      add_numbers {
         inputs = ["a integer", "b integer"]
         returns = "string"
         runtime = '17'
         handler = "Sample.addNum"
      }
   }
}

The groupId and artifactId, plus the built-in version property that exists for all Gradle builds, provide the Maven coordinates for publishing externally to S3 or GCS. I've also created a property in my local gradle.properties file for the bucket:

# local file in ~/.gradle/gradle.properties
snowflake.publishUrl='s3://myrepo/release'

The plugin doesn't create the stage, but it does do a check to ensure that the Snowflake stage metadata matches the value in publishUrl. We get a few new tasks added to our project:

❯ ./gradlew tasks --group publishing

> Task :tasks

------------------------------------------------------------
Tasks runnable from root project 'java-external-stage'
------------------------------------------------------------

Publishing tasks
----------------
generateMetadataFileForSnowflakePublication - Generates the Gradle metadata file for publication 'snowflake'.
generatePomFileForSnowflakePublication - Generates the Maven POM file for publication 'snowflake'.
publish - Publishes all publications produced by this project.
publishAllPublicationsToS3_mavenRepository - Publishes all Maven publications produced by this project to the s3_maven repository.
publishSnowflakePublicationToMavenLocal - Publishes Maven publication 'snowflake' to the local Maven repository.
publishSnowflakePublicationToS3_mavenRepository - Publishes Maven publication 'snowflake' to Maven repository 's3_maven'.
publishToMavenLocal - Publishes all Maven publications produced by this project to the local Maven cache.
snowflakeJvm - A Cacheable Gradle task for publishing UDFs and procedures to Snowflake
snowflakePublish - Lifecycle task for all Snowflake publication tasks.

To see all tasks and more detail, run gradlew tasks --all

To see more detail about a task, run gradlew help --task <task>

BUILD SUCCESSFUL in 875ms
5 actionable tasks: 1 executed, 4 up-to-date

These are granular tasks for building metadata and POM files and publishing that along with the artifacts to S3. But the snowflakeJvm task initiates all these dependent tasks, including publishSnowflakePublicationToS3_mavenRepository which uploads the artifact but unfortunately doesn't provide console output to that effect:

❯ ./gradlew snowflakeJvm --no-build-cache --console plain
> Task :gradle-snowflake:gradle-snowflake-plugin:compileJava NO-SOURCE
> Task :gradle-snowflake:gradle-snowflake-plugin:compileGroovy UP-TO-DATE
> Task :gradle-snowflake:gradle-snowflake-plugin:pluginDescriptors UP-TO-DATE
> Task :gradle-snowflake:gradle-snowflake-plugin:processResources UP-TO-DATE
> Task :gradle-snowflake:gradle-snowflake-plugin:classes UP-TO-DATE
> Task :gradle-snowflake:gradle-snowflake-plugin:jar UP-TO-DATE
> Task :compileJava
> Task :processResources NO-SOURCE
> Task :classes
> Task :shadowJar
> Task :compileTestJava NO-SOURCE
> Task :processTestResources NO-SOURCE
> Task :testClasses UP-TO-DATE
> Task :test NO-SOURCE
> Task :generatePomFileForSnowflakePublication
> Task :publishSnowflakePublicationToS3_mavenRepository

> Task :snowflakeJvm
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
  returns string
  language JAVA
  handler = 'Sample.addNum'
  runtime_version = '17'
  imports = ('@s3_maven/io/github/stewartbryson/sample-udfs/0.1.0/sample-udfs-0.1.0-all.jar')


BUILD SUCCESSFUL in 4s
9 actionable tasks: 5 executed, 4 up-to-date

Manual configuration of maven-publish with External Stages

For organizations that already use maven-publish extensively, or have customizations outside the scope of autoconfiguration, the plugin supports disabling autoconfiguration:

useCustomMaven = true

We then configure publications and repositories as described in the maven-publish documentation, and add task dependencies for the snowflakeJvm task. The publishUrl property is no longer required because it's configured in the publications closure, but if provided, the plugin will ensure it matches the metadata for the stage property. groupId and artifactId are still required so that snowflakeJvm can autogenerate the imports section of the CREATE OR REPLACE... statement.

Contributing

Anyone can contribute! You don't need permission, my blessing, or expertise, clearly. To make changes to the README.md file, please make them in the master README file instead. The version tokens in this file are automatically replaced with the current value before publishing.

Three different testing tasks are defined:

❯ ./gradlew tasks --group verification

> Task :tasks

------------------------------------------------------------
Tasks runnable from root project 'gradle-snowflake'
------------------------------------------------------------

Verification tasks
------------------
check - Runs all checks.
functionalTest - Runs the functional test suite.
integrationTest - Runs the integration test suite.
test - Runs the test suite.

To see all tasks and more detail, run gradlew tasks --all

To see more detail about a task, run gradlew help --task <task>

BUILD SUCCESSFUL in 1s
1 actionable task: 1 executed

The functionalTest task contains all the tests that actually make a connection to Snowflake and test a deployment, except those involved with external stages. You need to add a connection in ~/.snowflake/config.toml called gradle_plugin.

WARNING: Ensure that the credentials you provide in gradle_plugin are safe for development purposes.

The integrationTest task requires the following external stages to exist in your Snowflake account:

  • gcs_maven: An external stage in GCS.
  • s3_maven: An external stage in S3.

It also requires the following gradle properties to be set, with the easiest method being placing them in ~/.gradle/gradle.properties:

  • gcsPublishUrl: the GCS url of gcs_maven.
  • s3PublishUrl: the S3 url of s3_maven.

It is understandable if you are unable to test external stages as part of your contribution. We segmented them out for this reason.

Open a pull request against the develop branch so that it can be merged and possibly tweaked before we open the PR against the main branch. This will also enable us to test external stages.