
support for pyspark connection method #308

Closed
wants to merge 1 commit

Conversation


@cccs-jc cccs-jc commented Mar 30, 2022

#305

resolves #

Description

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-spark next" section.

cla-bot bot commented Mar 30, 2022

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: cccs-jc.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings; see https://github.com/settings/emails
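The three steps above can be run together as a quick shell check ([email protected] is a placeholder; substitute the address registered in your GitHub account):

```shell
# 1. Check whether git already has an email configured
git config --list | grep email || echo "no email configured"

# 2. If not, set one globally (placeholder address shown)
git config --global user.email "[email protected]"

# 3. Verify the value that will be recorded on future commits
git config --global user.email
```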

Collaborator

@JCZuurmond JCZuurmond left a comment


Hi @cccs-jc: I have added some comments. Could you merge your changes with the existing Spark session module?

try:
    from pyspark.rdd import _load_from_socket
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession
Collaborator


pyspark.sql.functions and SparkSession are not used in this file.


def __init__(self, python_module):
    self.result = None
    if python_module:
Collaborator


I prefer to avoid such a hook; it's very specific. The python_module is an unexpected parameter for PysparkConnectionWrapper: it's unclear why it is needed and how it works.

We could add docs about this, but it would still be confusing to write PysparkConnectionWrapper(python_module).
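For illustration, one less surprising alternative (a hypothetical sketch, not code from this PR) is to inject a ready session object instead of a module hook, so the wrapper carries no hidden import logic:

```python
class PysparkConnectionWrapper:
    """Hypothetical sketch: the caller builds (or customizes) the
    SparkSession and hands it in, instead of passing a python_module
    hook for the wrapper to invoke internally."""

    def __init__(self, spark):
        self.spark = spark  # any object exposing a .sql(query) method
        self.result = None

    def execute(self, sql):
        # Run the query and keep the last result, as the PR's wrapper does
        self.result = self.spark.sql(sql)
        return self.result
```

Any session customization then lives with the caller, and the constructor signature documents itself.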

Author


I can change it. What do you propose?

self.result = self.spark.sql(sql)
logger.debug("Executed with no errors")
if "show tables" in sql:
    self.result = self.result.withColumn("description", F.lit(""))
Collaborator


Why add the description column?

Author


This is an Iceberg-specific issue: when using Iceberg, the description column is missing from the show tables output. I'll remove this from the PR.
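For context, the intent of that workaround can be sketched in plain Python (a hypothetical helper; the actual change uses pyspark's withColumn with F.lit("")):

```python
def add_missing_description(rows):
    """Hypothetical illustration: pad `show tables` result rows with an
    empty 'description' field when the engine (e.g. Iceberg) omits it,
    so downstream code can rely on the column being present."""
    return [{**row, "description": row.get("description", "")} for row in rows]
```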

@github-actions
Contributor

This PR has been marked as Stale because it has been open for 180 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 28, 2022
@github-actions github-actions bot closed this Oct 6, 2022