To support Java 17, we needed to add support for the RUNTIME_VERSION
option when creating procedures and functions with JVM languages.
The runtime
parameter was added to the applications
DSL to support this change, but the default value will stay 11
for now to align with the Snowflake default.
Additionally, Java 17 is also being used now to compile and build this plugin.
It needs to be easy to develop and test JVM applications even if they are being deployed to Snowflake. Using Gradle, we can easily build shaded JAR files with all dependencies included using the shadow plugin, and I've provided a sample Java project that demonstrates this basic use case:
cd examples/java-manual &&
./gradlew build
But this JAR would still have to be uploaded to a stage in Snowflake, and the UDF would have to be created or possibly recreated if its signature changed. I wanted an experience using Snowflake that is as natural to developers using IntelliJ or VS Code for standard JVM projects.
This plugin provides easy configuration options for those getting started with Gradle but also provides advanced features for teams already using Gradle in other areas of the organization. It has three basic modes:
- Lightweight publishing to internal Snowflake stages using Snowpark.
- Slightly heavier publishing using external Snowflake stages and autoconfiguration of
the
maven-publish
plugin. - Publishing to Snowflake using external stages and custom configuration of
the
maven-publish
plugin.
Have a look at the API docs.
This plugin can be used to build UDFs and procedures in any JVM language supported by Gradle, which currently provides official support for Java, Scala, Kotlin and Groovy. See the examples directory for examples using different languages.
Unless you have a heavy investment in Gradle as an organization, this is likely the option you want to use. Additionally, if you plan on sharing UDFs across Snowflake accounts, this is the option you have to use, as JARs need to be in named internal stages. Look at the sample Java project using the plugin and you'll notice a few differences in the build file.
cd examples/java &&
cat build.gradle
We applied io.github.stewartbryson.snowflake
and removed com.github.johnrengelman.shadow
because the shadow
plugin
is automatically applied by the snowflake
plugin:
plugins {
id 'java'
id 'io.github.stewartbryson.snowflake' version '@version@'
}
We now have the snowflakeJvm
task available:
❯ ./gradlew help --task snowflakeJvm
> Task :help
Detailed task information for snowflakeJvm
Path
:snowflakeJvm
Type
SnowflakeJvm (io.github.stewartbryson.SnowflakeJvm)
Options
--config Custom credentials config file.
--connection Override the credentials connection to use. Default: use the base connection info in credentials config.
--jar Optional: manually pass a JAR file path to upload instead of relying on Gradle metadata.
--stage Override the Snowflake stage to publish to.
--rerun Causes the task to be re-run even if up-to-date.
Description
A Cacheable Gradle task for publishing UDFs and procedures to Snowflake
Group
publishing
BUILD SUCCESSFUL in 2s
5 actionable tasks: 3 executed, 2 up-to-date
Several command-line options mention overriding other configuration values.
This is because the plugin also provides a DSL closure called snowflake
that we can use to configure our build, which
is documented in
the class API docs:
snowflake {
connection = 'gradle_plugin'
stage = 'upload'
applications {
add_numbers {
inputs = ["a integer", "b integer"]
returns = "string"
runtime = '17'
handler = "Sample.addNum"
}
}
}
Snowflake credentials are managed in a config file, with the default being ~/.snowflake/config.toml
as prescribed by the Snowflake Developer CLI project.
As a secondary location, we also support the SnowSQL config file.
In searching for a credentials config file, the plugin works in the following order:
- A custom location of your choosing, configured with the
--config
option in applicable tasks. <HOME_DIR>/.snowflake/config.toml
<HOME_DIR>/.snowsql/config
./config.toml
(This is useful in CI/CD pipelines, where secrets can be written easily to this file.)
The connection
property in the plugin DSL defines which connection to use from the config file, relying on the default values if none is
provided.
It first loads all the default values, and replaces any values from the connection, similar to how Snowflake CLI and SnowSQL work.
The nested
applications
DSL
might seem a bit daunting.
This is a simple way to configure all the different UDFs and procedures we want to automatically create (or recreate)
each time we publish the JAR file.
The example above will generate and execute the following statement.
Notice that the imports
part of the statement is automatically added by the plugin as the JAR (including our code and all dependencies) is automatically uploaded to Snowflake:
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
runtime_version = '17'
imports = ('@upload/libs/java-0.1.0-all.jar')
With the configuration complete, we can execute the snowflakeJvm
task, which will run any unit tests (see Testing
below) and then publish
our JAR and create our function.
Note that if the named internal stage does not exist, the plugin will create it first:
❯ ./gradlew snowflakeJvm
> Task :snowflakeJvm
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
File java-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
runtime_version = '17'
imports = ('@upload/libs/java-0.1.0-all.jar')
BUILD SUCCESSFUL in 5s
7 actionable tasks: 2 executed, 5 up-to-date
Our function now exists in Snowflake:
select add_numbers(1,2);
Sum is: 3
The snowflakeJvm
task was written to be incremental and cacheable.
If we run the task again without making any changes to task inputs (our code) or outputs, then the execution is avoided,
which we know because of the up-to-date keyword.
❯ ./gradlew snowflakeJvm
BUILD SUCCESSFUL in 624ms
3 actionable tasks: 3 up-to-date
This project uses the S3 Gradle build cache plugin and we are very happy with it. Check the Gradle settings file for an implementation example.
Gradle has built-in Testing Suites for JVM projects, and using this functionality for JVM applications with Snowflake was my main motivation for writing this plugin. We'll now describe two topics regarding this plugin: unit testing and functional testing.
Unit testing with Gradle is a dense topic, and the documentation will inform developers better than we can.
In general, unit tests are what developers write regardless of where the code eventually gets executed.
In the Java with testing example, you can see a sample testing specification (spec) using the
Spock Framework and written in Groovy, which is a requirement for that framework,
as is applying the groovy
plugin to the project.
cd examples/java-testing
Our plugins
DSL from the build file:
plugins {
id 'java'
id 'groovy' // needed for Spock testing framework
id 'io.github.stewartbryson.snowflake' version '@version@'
}
Our choice to use the Spock framework in the default test
testing suite:
test {
useSpock('2.3-groovy-3.0')
}
And the unit test SampleTest
spec in src/test/groovy
:
import spock.lang.Shared
import spock.lang.Specification
import spock.lang.Subject
class SampleTest extends Specification {
@Shared
@Subject
def sample = new Sample()
def "adding 1 and 2"() {
when: "Two numbers"
def a = 1
def b = 2
then: "Add numbers"
sample.addNum(a, b) == "Sum is: 3"
}
def "adding 3 and 4"() {
when: "Two numbers"
def a = 3
def b = 4
then: "Add numbers"
sample.addNum(a, b) == "Sum is: 7"
}
}
All unit tests in either src/test/java
(written using JUnit or something else) or src/test/groovy
(written with Spock)
will automatically run whenever the test
or build
task is executed.
❯ ./gradlew build
> Task :test
SampleTest
Test adding 1 and 2 PASSED
Test adding 3 and 4 PASSED
SUCCESS: Executed 2 tests in 487ms
BUILD SUCCESSFUL in 1s
9 actionable tasks: 5 executed, 4 up-to-date
All Gradle testing tasks are automatically incremental and cacheable, and would be avoided if executed again without changes to the code in either the source or the spec. The same applies for the topic of functional testing below.
Functional testing describes what the system does,
and in my mind, this involves testing our deployed code in Snowflake.
Regardless of what we call it, we know we need this as a crucial component in our build chain.
This plugin contains a custom Spock Specification class called SnowflakeSpec
that can be used in a new test suite.
By default, this test suite is called functionalTest
, though the name can be configured using the testSuite
property.
Here is our configuration of the functionalTest
test suite.
The DSL provided by Gradle in JVM Testing is convoluted (in my opinion), but no one asked me:
functionalTest(JvmTestSuite) {
targets {
all {
useSpock('2.3-groovy-3.0')
dependencies {
implementation "io.github.stewartbryson:gradle-snowflake-plugin:@version@"
}
testTask.configure {
failFast true
// which credentials connection to use
systemProperty 'connection', project.snowflake.connection
}
}
}
}
I'll walk through a few of these points.
So that the SnowflakeSpec
is available in the test classpath, we have to declare the plugin as a dependency
to the test suite.
Notice that we use the library maven coordinates, which are different from the coordinates in the plugins
DSL.
Additionally, our test specs are unaware of all the configurations of our Gradle build, so we have to pass our connection
property as a Java system property to the SnowflakeSpec
class.
This is the SnowflakeSampleTest
spec in src/functionalTest/groovy
:
import groovy.util.logging.Slf4j
import io.github.stewartbryson.SnowflakeSpec
/**
* The SnowflakeSpec used for testing functions.
*/
@Slf4j
class SnowflakeSampleTest extends SnowflakeSpec {
def 'ADD_NUMBERS() function with 1 and 2'() {
when: "Two numbers exist"
def a = 1
def b = 2
then: 'Add two numbers using ADD_NUMBERS()'
selectFunction("add_numbers", [a,b]) == 'Sum is: 3'
}
def 'ADD_NUMBERS() function with 3 and 4'() {
when: "Two numbers exist"
def a = 3
def b = 4
then: 'Add two numbers using ADD_NUMBERS()'
selectFunction("add_numbers", [a,b]) == 'Sum is: 7'
}
}
The selectFunction
method is an easy way to execute a function and test the results by just passing the function name and a list of arguments to pass to that function.
And of course, this executes against Snowflake in real time.
❯ ./gradlew functionalTest
> Task :test
SampleTest
Test adding 1 and 2 PASSED
Test adding 3 and 4 PASSED
SUCCESS: Executed 2 tests in 481ms
> Task :snowflakeJvm
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
File java-testing-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
runtime_version = '17'
imports = ('@upload/libs/java-testing-0.1.0-all.jar')
> Task :functionalTest
SnowflakeSampleTest
Test ADD_NUMBERS() function with 1 and 2 PASSED (1.1s)
Test ADD_NUMBERS() function with 3 and 4 PASSED
SUCCESS: Executed 2 tests in 3.8s
BUILD SUCCESSFUL in 11s
11 actionable tasks: 7 executed, 4 up-to-date
Running functional tests using static Snowflake databases is boring, especially considering the zero-copy cloning functionality available. The plugin supports cloning an ephemeral database from the database we connect to and using it for testing our application. This workflow is useful for CI/CD processes and is configured with the plugin DSL. The plugin is aware when it is running in CI/CD environments and currently supports:
Because the plugin is aware when executing in CICD environments, we expose that information through the DSL, and can use it to control our cloning behavior.
Our build file change to enable ephemeral testing:
snowflake {
connection = 'gradle_plugin'
stage = 'upload'
useEphemeral = project.snowflake.isCI() // use ephemeral with CICD workflows
keepEphemeral = project.snowflake.isPR() // keep ephemeral for PRs
applications {
add_numbers {
inputs = ["a integer", "b integer"]
returns = "string"
runtime = '17'
handler = "Sample.addNum"
}
}
}
The useEphemeral
property will determine whether the createEphemeral
and dropEphemeral
tasks
are added at the beginning and end of the build, respectively.
This allows for the functionalTest
task to be run in the ephemeral clone just after our application is published.
We've also added a little extra magic to keep the clone when building a pull request.
The createEphemeral
task issues a CREATE DATABASE... IF NOT EXISTS
statement, so if will not fail if the clone exists from a prior run.
Remember that our SnowflakeSpec
class doesn't automatically know the details of our build, so we have to provide
the ephemeral name using java system properties. Here is our modified testing suite:
functionalTest(JvmTestSuite) {
targets {
all {
useSpock('2.3-groovy-3.0')
dependencies {
implementation "io.github.stewartbryson:gradle-snowflake-plugin:1.1.4"
}
testTask.configure {
failFast true
// which credentials connection to use
systemProperty 'connection', project.snowflake.connection
// if this is ephemeral, the test spec needs the name to connect to
if (project.snowflake.useEphemeral) {
systemProperty 'ephemeralName', snowflake.ephemeralName
}
}
}
}
}
We can simulate a GitHub Actions environment just by setting the GITHUB_ACTIONS
environment variable:
❯ export GITHUB_ACTIONS=true
❯ ./gradlew functionalTest
> Task :createEphemeral
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
Ephemeral clone EPHEMERAL_JAVA_TESTING_PR_4 created.
> Task :snowflakeJvm
Reusing existing connection.
File java-testing-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
runtime_version = '17'
imports = ('@upload/libs/java-testing-0.1.0-all.jar')
> Task :functionalTest
SnowflakeSampleTest
Test ADD_NUMBERS() function with 1 and 2 PASSED
Test ADD_NUMBERS() function with 3 and 4 PASSED
SUCCESS: Executed 2 tests in 3s
BUILD SUCCESSFUL in 6s
13 actionable tasks: 4 executed, 9 up-to-date
When the CI/CD environment is detected, the plugin will name the ephemeral database clone based on the pull request number, the branch name, or the tag name instead of an autogenerated one. If we prefer to simply specify a clone name instead of relying on the plugin to generate it, that is supported as well:
ephemeralName = 'testing_db'
❯ ./gradlew functionalTest
> Task :createEphemeral
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
Ephemeral clone testing_db created.
> Task :snowflakeJvm
Reusing existing connection.
File java-testing-0.1.0-all.jar: UPLOADED
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
runtime_version = '17'
imports = ('@upload/libs/java-testing-0.1.0-all.jar')
> Task :functionalTest
SnowflakeSampleTest
Test ADD_NUMBERS() function with 1 and 2 PASSED
Test ADD_NUMBERS() function with 3 and 4 PASSED
SUCCESS: Executed 2 tests in 3.3s
> Task :dropEphemeral
Reusing existing connection.
Ephemeral clone testing_db dropped.
BUILD SUCCESSFUL in 7s
13 actionable tasks: 4 executed, 9 up-to-date
This option is useful when you want your artifacts available to consumers other than just Snowflake without publishing them to disparate locations. Gradle has built-in support for S3 or GCS as a Maven repository, and Snowflake has support for S3 or GCS external stages, so we simply marry the two in a single location. Looking at the sample project, notice we've populated a few additional properties:
snowflake {
connection = 'gradle_plugin'
stage = 's3_maven'
groupId = 'io.github.stewartbryson'
artifactId = 'sample-udfs'
applications {
add_numbers {
inputs = ["a integer", "b integer"]
returns = "string"
runtime = '17'
handler = "Sample.addNum"
}
}
}
The groupId
and artifactId
, plus the built-in version
property that exists for all Gradle builds, provide
the Maven coordinates for publishing externally to S3 or GCS.
I've also created a property in my local gradle.properties
file for the bucket:
# local file in ~/.gradle/gradle.properties
snowflake.publishUrl='s3://myrepo/release'
The plugin doesn't create the stage, but it does do a check to ensure that the Snowflake stage metadata matches the
value in publishUrl
. We get a few new tasks added to our project:
❯ ./gradlew tasks --group publishing
> Task :tasks
------------------------------------------------------------
Tasks runnable from root project 'java-external-stage'
------------------------------------------------------------
Publishing tasks
----------------
generateMetadataFileForSnowflakePublication - Generates the Gradle metadata file for publication 'snowflake'.
generatePomFileForSnowflakePublication - Generates the Maven POM file for publication 'snowflake'.
publish - Publishes all publications produced by this project.
publishAllPublicationsToS3_mavenRepository - Publishes all Maven publications produced by this project to the s3_maven repository.
publishSnowflakePublicationToMavenLocal - Publishes Maven publication 'snowflake' to the local Maven repository.
publishSnowflakePublicationToS3_mavenRepository - Publishes Maven publication 'snowflake' to Maven repository 's3_maven'.
publishToMavenLocal - Publishes all Maven publications produced by this project to the local Maven cache.
snowflakeJvm - A Cacheable Gradle task for publishing UDFs and procedures to Snowflake
snowflakePublish - Lifecycle task for all Snowflake publication tasks.
To see all tasks and more detail, run gradlew tasks --all
To see more detail about a task, run gradlew help --task <task>
BUILD SUCCESSFUL in 875ms
5 actionable tasks: 1 executed, 4 up-to-date
These are granular tasks for building metadata and POM files and publishing that along with the artifacts to S3.
But the snowflakeJvm
task initiates all these dependent tasks,
including publishSnowflakePublicationToS3_mavenRepository
which uploads the artifact but unfortunately doesn't provide
console output to that effect:
❯ ./gradlew snowflakeJvm --no-build-cache --console plain
> Task :gradle-snowflake:gradle-snowflake-plugin:compileJava NO-SOURCE
> Task :gradle-snowflake:gradle-snowflake-plugin:compileGroovy UP-TO-DATE
> Task :gradle-snowflake:gradle-snowflake-plugin:pluginDescriptors UP-TO-DATE
> Task :gradle-snowflake:gradle-snowflake-plugin:processResources UP-TO-DATE
> Task :gradle-snowflake:gradle-snowflake-plugin:classes UP-TO-DATE
> Task :gradle-snowflake:gradle-snowflake-plugin:jar UP-TO-DATE
> Task :compileJava
> Task :processResources NO-SOURCE
> Task :classes
> Task :shadowJar
> Task :compileTestJava NO-SOURCE
> Task :processTestResources NO-SOURCE
> Task :testClasses UP-TO-DATE
> Task :test NO-SOURCE
> Task :generatePomFileForSnowflakePublication
> Task :publishSnowflakePublicationToS3_mavenRepository
> Task :snowflakeJvm
Using credentials config file: /Users/stewartbryson/.snowflake/config.toml
Deploying ==>
CREATE OR REPLACE function add_numbers (a integer, b integer)
returns string
language JAVA
handler = 'Sample.addNum'
runtime_version = '17'
imports = ('@s3_maven/io/github/stewartbryson/sample-udfs/0.1.0/sample-udfs-0.1.0-all.jar')
BUILD SUCCESSFUL in 4s
9 actionable tasks: 5 executed, 4 up-to-date
For organizations that already use maven-publish
extensively, or have customizations outside the scope of
autoconfiguration, the plugin supports disabling autoconfiguration:
useCustomMaven = true
We then configure publications
and repositories
as described in
the maven-publish
documentation, and
add task dependencies for the snowflakeJvm
task.
The publishUrl
property is no longer required because it's configured in the publications
closure, but if provided,
the plugin will ensure it matches the metadata for the stage
property.
groupId
and artifactId
are still required so that snowflakeJvm
can autogenerate the imports
section of
the CREATE OR REPLACE...
statement.
Anyone can contribute! You don't need permission, my blessing, or expertise, clearly.
To make changes to the README.md
file, please make them in the master README file instead.
The version tokens in this file are automatically replaced with the current value before publishing.
Three different testing tasks are defined:
❯ ./gradlew tasks --group verification
> Task :tasks
------------------------------------------------------------
Tasks runnable from root project 'gradle-snowflake'
------------------------------------------------------------
Verification tasks
------------------
check - Runs all checks.
functionalTest - Runs the functional test suite.
integrationTest - Runs the integration test suite.
test - Runs the test suite.
To see all tasks and more detail, run gradlew tasks --all
To see more detail about a task, run gradlew help --task <task>
BUILD SUCCESSFUL in 1s
1 actionable task: 1 executed
The functionalTest
task contains all the tests that actually make a connection to Snowflake and test a deployment,
except those involved with external stages.
You need to add a connection in ~/.snowflake/config.toml
called gradle_plugin
.
WARNING: Ensure that the credentials you provide in
gradle_plugin
are safe for development purposes.
The integrationTest
task requires the following external stages to exist in your Snowflake account:
gcs_maven
: An external stage in GCS.s3_maven
: An external stage in S3.
It also requires the following gradle properties to be set, with the easiest method being placing them in ~/.gradle/gradle.properties
:
gcsPublishUrl
: the GCS url ofgcs_maven
.s3PublishUrl
: the S3 url ofs3_maven
.
It is understandable if you are unable to test external stages as part of your contribution. We segmented them out for this reason.
Open a pull request against the develop
branch so that it can be merged and possibly tweaked before
we open the PR against the main
branch.
This will also enable us to test external stages.