[CT-1561] [Feature] Support defining server_side_parameters inside of dbt_project.yml #529
Comments
@jtcohen6 @dbeatty10 could you and the core team have a look at this feature suggestion? This is a very common use case and a useful feature. |
Hi @akashrn5, I have a feeling this is the same problem for
Would it make sense to adopt a similar approach, adding a |
Hi @Fleid, thank you for responding to the issue; I had been waiting for a reply for a long time. As I mentioned in the issue, if two projects refer to the same profile and each has different property or config requirements, which is a valid use case, a profile-level setting cannot satisfy both. So it would be better to have this at the project level, in the dbt_project.yml file, to satisfy all the conditions. What do you think? I would also love to contribute this feature to the community; I have already made the changes in my custom application with the dbt CLI, and they work. If approved, I would love to contribute the code. Hoping to get clarity soon from the community. Thank you :) |
In #591, we're discussing adding a new parameter to the profile, of type string, so it can be overloaded via an environment variable at each invocation. But as you mention, we could also move that field into the project file (and use variables if need be). The thing is that we're currently looking at that issue in a pragmatic way: where it can fit easily, rather than where it should be in the first place. |
@Fleid Sorry for the delay in replying. There are model-level, project-level and profile-level configs. Priority can be assigned as model level >> project level >> profile level. This would make more sense, and it would be the better fix from a long-term perspective. |
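The precedence proposed above (model level over project level over profile level) can be sketched as a layered merge. This is only an illustration under stated assumptions: the function name `resolve_server_side_parameters` and the idea of receiving the three levels as separate dictionaries are hypothetical and are not existing dbt-spark behavior.

```python
from typing import Any, Dict, Mapping, Optional


def resolve_server_side_parameters(
    profile: Optional[Mapping[str, Any]] = None,
    project: Optional[Mapping[str, Any]] = None,
    model: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge parameters so that model >> project >> profile (hypothetical helper)."""
    resolved: Dict[str, Any] = {}
    # Apply the lowest-priority layer first; later layers overwrite earlier keys.
    for layer in (profile, project, model):
        if layer:
            resolved.update(layer)
    return resolved


# Example values; the Spark property names are real, the levels are assumed.
profile_params = {"spark.driver.memory": "4g", "spark.executor.memoryOverhead": "512m"}
project_params = {"spark.driver.memory": "8g"}
model_params = {"spark.sql.shuffle.partitions": "50"}

resolved = resolve_server_side_parameters(profile_params, project_params, model_params)
```

Here the project-level `spark.driver.memory` replaces the profile-level value, while keys defined at only one level pass through unchanged.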
Hey @akashrn5, could you please check this discussion, I'm trying to regroup all the threads on that topic in one place. |
Hi @Fleid , yeah sure, this works. We can discuss and take a decision. |
@JCZuurmond I just changed the title. We can continue to iterate as-needed -- just reach out if you think we should change it further. |
Hi @JCZuurmond @dbeatty10, sorry for the delay in responding. @dbeatty10 thanks for changing the title. Can I take up this feature of bringing it to the project level and making the profile one the priority? I wanted to work on making it work with the session connection type, but I see @JCZuurmond has already made the changes and raised a PR. |
@akashrn5 it would be great if you implemented this! An open question I have: where in the If I understand the Furthermore, I have only seen the fields with a This implies that we have to put the I have some doubts on defining the I would prefer to define the |
@JCZuurmond thanks for your comments and doubts. Here is my take on this, would love your feedback on this, also @Fleid @dbeatty10 . You are right about the So i would suggest a new key like below with
This way we can make it bit generalised. Please give your feedback on this. |
Something like that makes sense to me, maybe with a subfield
Is that something that Note on the dash |
What do you think @dataders? |
hey @akashrn5 -- really appreciate both how clearly you've spelled out the use case as well as your diligence in following up and providing more context. What you're asking for is something we as the Core maintainers would like to happen; namely that all of these configurations can be set at a model-level. But that requires significant investment both within dbt-core as well as on the various database drivers upon which dbt adapters depend. This might be possible within just dbt-spark, but I'm not sure how that might happen. If you'll allow me to paraphrase your ask as the below, then perhaps YAML anchors might serve as a workaround?
Below is an example where I define an anchor:

    jaffle_shop:
      target: dev
      outputs:
        dev:
          type: spark
          method: odbc
          driver: path/to/driver
          schema: my_schema
          host: myorg.sparkhost.com
          token: abc123
          cluster: '123'
          server_side_parameters: &server_params
            spark.driver.memory: 4g
            spark.executor.memoryOverhead: 512m
        prd:
          type: spark
          method: odbc
          driver: path/to/driver
          schema: my_schema
          host: myorg_PROD.sparkhost.com
          token: abc456
          cluster: '456'
          server_side_parameters: *server_params

For now, I'm going to close this issue, but please feel free to respond and let me know what you think of my proposed workaround. |
Is this your first time submitting a feature request?
Describe the feature
Background
When the connection type is session for the spark adapter and a user wants to pass specific application-level properties, the only option is to change the properties in the spark-defaults.conf file under the SPARK_HOME path. The user should be able to pass these parameters so that they are applied when the SparkSession is initialized.
Describe alternatives you've considered
Currently Supported or a way to achieve this:
If a user needs to provide dynamically configurable Spark properties, they can be passed using the pre_hook tag in model configurations. But suppose there are multiple dbt projects and the user wants to pass application-specific memory configurations for each of them. In that case, the only possible way is to change the config in spark-defaults.conf, which is not feasible.
Who will this benefit?
Any Spark user can benefit from this feature. Use cases where many applications are submitted with different configs or memory requirements would benefit in particular, since users would not need to change the cluster or warehouse config directly. This solution can be discussed and made more general for any adapter.
Are you interested in contributing this feature?
Yes. I would love to work on this feature and contribute back to community.
Anything else?
What can be done
Currently, server_side_parameters can be given in the profile, according to #201. So code changes can be made to read this info and set it when starting the Spark session in sessions.py. But suppose there are multiple dbt projects referring to the same profile while their application properties or memory requirements differ. In that case the user again has to intervene and update the profiles.yml file, which is not desirable.
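As a rough sketch of the "set when starting the Spark session" step, the parameters could be applied one by one to a session builder before the session is created. The helper name `apply_server_side_parameters` is hypothetical and is not taken from dbt-spark. The only thing assumed about the builder is a chainable `config(key, value)` method, which is what PySpark's `SparkSession.builder` exposes; a small stand-in builder is used here so the example runs without Spark installed.

```python
from typing import Any, Dict, Mapping


def apply_server_side_parameters(builder: Any, parameters: Mapping[str, Any]) -> Any:
    """Apply each parameter to a builder exposing config(key, value) (hypothetical helper)."""
    for key, value in parameters.items():
        # Spark properties are passed as strings here, so values are normalized.
        builder = builder.config(key, str(value))
    return builder


class RecordingBuilder:
    """Stand-in for SparkSession.builder that only records the settings it receives."""

    def __init__(self) -> None:
        self.settings: Dict[str, str] = {}

    def config(self, key: str, value: str) -> "RecordingBuilder":
        self.settings[key] = value
        return self


params = {"spark.driver.memory": "4g", "spark.executor.memoryOverhead": "512m"}
configured = apply_server_side_parameters(RecordingBuilder(), params)

# With PySpark installed, the same helper could be used as (assumption, not tested here):
#   from pyspark.sql import SparkSession
#   spark = apply_server_side_parameters(SparkSession.builder, params).getOrCreate()
```

Where the `parameters` mapping comes from (profile, project, or both) is exactly the open question in this issue; the sketch is agnostic about that.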
Proposal
I plan to introduce a feature to support the above use case. Please evaluate this so we can discuss it. My plan is to introduce a new generalized config file for adding custom properties that can be set per dbt project. That way the above use cases will be satisfied.
I plan to contribute this feature to open source and need help from the community to evaluate and finalize a solution.