
Extending the Snowpark Scala APIs #56

Status: Open · wants to merge 4 commits into base: main
Conversation

@sfc-gh-mrojas (Collaborator) commented Oct 10, 2023

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #802269

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Replace this with a short description of how your code change solves the related issue.

Pre-review checklist

(For Snowflake employees)

  • [x] This change has passed precommit
  • I have reviewed the code coverage report for my PR in Sonarqube

This PR addresses these APIs by adding the following methods:

| API | Description |
| --- | --- |
| DataFrame.selectExpr | Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions. |
| DataFrame.filter | Filters rows using the given SQL expression. |
| DataFrame.dropDuplicates | Returns a new DataFrame with duplicate rows removed, considering only the given subset of columns. |
| DataFrame.transform | Chains custom transformations. |
| DataFrame.head | Returns the first row / the first n rows. |
| DataFrame.take | Returns the first n rows. |
| DataFrame.cache | Alias for cacheResult. |
| DataFrame.orderBy | Alias for sort (two overloads). |
| DataFrame.printSchema | Shortcut for schema.printTreeString. |
| DataFrame.toJSON | Returns data as JSON. |
| DataFrame.collectAsList | Collects results as a list of Row. |
| DataFrame.withColumnRenamed | Returns a new DataFrame with the given column renamed. |
| Session.getOrCreate | Gets the active session or creates a new one. |
| Column.isin | Overload that accepts an array of strings. |
| Column.isNotNull | Alias for is_not_null. |
| Column.isNull | Alias for is_null. |
| Column.startsWith | |
| Column.contains | |
| Column.regexp_replace | |
| Column.as | Overload for Symbol. |
| Column.isNaN | |
| Column.substr | Overload for Column arguments. |
| Column.substr | Overload for Int arguments. |
| Column.notEqual | Overload for not_equal. |
| Column.like | Overload for String argument. |
| Column.rlike | Alias for regexp. |
| Column.bitwiseAND | Alias for bitand. |
| Column.bitwiseOR | Alias for bitor. |
| Column.bitwiseXOR | Alias for bitxor. |
| Column.getItem | Calls builtin get. |
| Column.getField | Calls builtin get. |
| Column.cast | With string expression. |
| Column.eqNullSafe | Alias for equal_null. |
| CaseExpr.when | Support for when with Int, String, Float, Double, and Boolean. |
| CaseExpr.otherwise | Support for otherwise with Int, String, Float, Double, and Boolean. |
| CaseExpr.else | Support for otherwise with Int, String, Float, Double, and Boolean. |
| functions.expr | Alias for sqlExpr. |
| functions.desc | Equivalent to Column.desc. |
| functions.asc | Equivalent to Column.asc. |
| functions.size | Equivalent to array_size. |
| functions.array | Alias for array_construct. |
| functions.date_format | Alias for to_varchar. |
| functions.last | LAST_VALUE |
| functions.format_string | FORMAT_STRING |
| functions.locate | POSITION |
| functions.log10 | Calls builtin LOG. |
| functions.log1p | ln(x + 1) |
| functions.nanvl | Checks if NaN. |
| functions.base64 | BASE64_ENCODE |
| functions.unbase64 | BASE64_DECODE_STRING |
| functions.ntile | NTILE |
| functions.shiftleft | Alias for bitshiftleft. |
| functions.shiftright | Alias for bitshiftright. |
| functions.hex | HEX_ENCODE |
| functions.unhex | HEX_DECODE_STRING |
| functions.randn | RANDOM |
| functions.json_tuple | JSON_EXTRACT_PATH_TEXT |
| functions.cbrt | CBRT |
| functions.from_json | TRY_PARSE_JSON |
| functions.date_sub | Implements the Spark equivalent. |
| functions.regexp_extract | Implements the equivalent of regexp_extract. |
| functions.signum | SIGN |
| functions.substring_index | Implements the Spark equivalent. |
| functions.collect_list | Alias for array_agg. |
| functions.reverse | REVERSE |
| functions.isnull | Alias for is_null. |
| functions.conv | CONV |
| functions.unix_timestamp | Datetime to epoch. |
| functions.regexp_replace | REGEXP_REPLACE |
| functions.date_add | Adds days. |
| functions.collect_set | ARRAY_AGG(DISTINCT) |
| functions.from_unixtime | Epoch to datetime. |
| functions.monotonically_increasing_id | Alias for seq8. |
| functions.months_between | MONTHS_BETWEEN |
| functions.instr | REGEXP_INSTR |
| functions.from_utc_timestamp | TO_TIMESTAMP_TZ |
| functions.format_number | TO_VARCHAR |
| functions.log2 | LOG |
| functions.element_at | Alias for get_path. |
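Many of these additions are thin aliases or delegations to existing calls, while `DataFrame.transform` enables chaining custom transformations. As a hedged sketch (the `MiniFrame` class, its row representation, and the method bodies below are stand-ins invented for illustration; the actual PR extends Snowpark's real `DataFrame`), the pattern behind `transform`, `take`/`head`, and `dropDuplicates` looks like:

```scala
// Stand-in frame to illustrate the delegation pattern; the real PR
// targets com.snowflake.snowpark.DataFrame (an assumption here).
final case class MiniFrame(rows: Seq[Seq[Int]]) {
  // DataFrame.transform: apply a user-supplied transformation, enabling chaining
  def transform(f: MiniFrame => MiniFrame): MiniFrame = f(this)

  // DataFrame.take / head: return the first n rows
  def take(n: Int): Seq[Seq[Int]] = rows.take(n)
  def head(n: Int): Seq[Seq[Int]] = take(n)

  // DataFrame.dropDuplicates over a subset of columns
  // (column indices stand in for column names)
  def dropDuplicates(cols: Seq[Int]): MiniFrame =
    MiniFrame(rows.distinctBy(row => cols.map(row)))
}

object Demo extends App {
  val df = MiniFrame(Seq(Seq(1, 10), Seq(1, 20), Seq(2, 30)))

  // chain two custom transformations, then take the first row
  val out = df
    .transform(f => MiniFrame(f.rows.map(_.map(_ * 2))))
    .transform(f => MiniFrame(f.rows.reverse))
    .head(1)
  println(out)

  // keep one row per distinct value in column 0
  println(df.dropDuplicates(Seq(0)).rows.length)
}
```

Because `transform` just applies the caller's function and returns the result, custom pipeline steps compose with the fluent style the rest of the API already uses.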

src/CHANGELOG.md: one review comment (outdated, resolved).
@sfc-gh-bli (Collaborator) left a comment

  1. When adding new functions, please update the Scala and Java APIs together.
  2. Please add tests for any new code.

@orellabac

> 1. When adding new functions, please update the Scala and Java APIs together.
> 2. Please add tests for any new code.

@sfc-gh-bli Are there any examples of the recommended practice for creating and running new tests? Are there documented contribution steps we can follow?

@sfc-gh-bli (Collaborator)

> Are there any examples of the recommended practice for creating and running new tests? Are there documented contribution steps we can follow?

Create a profile.properties file in the root directory and fill in all the login info. Here is an example profile file:
https://github.com/snowflakedb/snowpark-java-scala/blob/main/profile.properties.example
Then you can run any existing test.
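The setup flow described above can be sketched as a shell session. The property keys below are illustrative placeholders only, not the exact contents of the repo's profile.properties.example (see the link above), and the tiny example file is fabricated here just so the sketch is self-contained:

```shell
# Stand-in for the repo's profile.properties.example (placeholder keys only;
# in a real checkout this file already exists at the repository root).
cat > profile.properties.example <<'EOF'
# illustrative placeholder keys -- use the repo's real example file
URL=<account>.snowflakecomputing.com
USER=<username>
EOF

# Copy the template to the file name the test suite reads...
cp profile.properties.example profile.properties
# ...then edit profile.properties and fill in your real login info.

test -f profile.properties && echo "profile ready"
# With the profile in place, run any existing test via the project's build tool.
```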

3 participants