-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support for pyspark connection method #308
Conversation
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: cccs-jc.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cccs-jc : I have added some comments. Could you merge your changes with the existing Spark session module?
try: | ||
from pyspark.rdd import _load_from_socket | ||
import pyspark.sql.functions as F | ||
from pyspark.sql import SparkSession |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions and Sparksessions are not used in this file
|
||
def __init__(self, python_module): | ||
self.result = None | ||
if python_module: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to avoid such a hook, it's very specific. The python_module
is a unexpected parameter for PysparkConnectionWrapper
, it's unclear why it is needed and how it works.
We could add docs about this, still it is confusing to write PysparkConnectionWrapper(python_module)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it, what do you propose ?
self.result = self.spark.sql(sql) | ||
logger.debug("Executed with no errors") | ||
if "show tables" in sql: | ||
self.result = self.result.withColumn("description", F.lit("")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add the description column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an iceberg specific issue. When using iceberg it's missing the column. I'll remove this from the PR
This PR has been marked as Stale because it has been open for 180 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR, or it will be closed in 7 days. |
#305
resolves #
Description
Checklist
CHANGELOG.md
and added information about my change to the "dbt-spark next" section.