using hive for metastore #11
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
+ Coverage   71.42%   78.57%   +7.14%
==========================================
  Files           1        1
  Lines          14       28      +14
==========================================
+ Hits           10       22      +12
- Misses          4        6       +2

View full report in Codecov by Sentry.
src/spark/utils.py
    for key, value in delta_conf.items():
        spark_conf.set(key, value)

    return SparkSession.builder.config(conf=spark_conf).enableHiveSupport().getOrCreate()
I don't know anything about hive, really - why is it needed here? Doesn't DeltaLake take care of the metadata?
With this approach, we can establish a permanent view that everyone can query without rebuilding or reloading the table. I removed enableHiveSupport() from the general Spark session builder and now enable it via conf only for Delta Lake.
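For readers following along, enabling Hive through configuration rather than `enableHiveSupport()` might look roughly like the sketch below. The helper names `delta_hive_conf` and `build_session` and the warehouse path are hypothetical and are not taken from this PR; the configuration keys themselves are standard Spark and Delta Lake settings.

```python
def delta_hive_conf(warehouse_dir):
    """Return Spark settings that enable Delta Lake with a Hive-backed catalog.

    Hypothetical helper: the PR's real delta_conf is not shown in the diff.
    """
    return {
        # Standard Delta Lake session settings.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # Same effect as calling enableHiveSupport() on the builder.
        "spark.sql.catalogImplementation": "hive",
        # Assumed location for managed tables.
        "spark.sql.warehouse.dir": warehouse_dir,
    }


def build_session(conf, app_name="delta-example"):
    """Build a SparkSession from a plain dict of settings (requires pyspark)."""
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    spark_conf = SparkConf().setAppName(app_name)
    for key, value in conf.items():
        spark_conf.set(key, value)
    return SparkSession.builder.config(conf=spark_conf).getOrCreate()
```

Because `spark.sql.catalogImplementation` is a static setting, it only takes effect if it is set before the first session is created in the process.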
So the delta lake SW doesn't maintain the table metadata anywhere other than local memory?
Where are the metadata persisted?
Yes, in memory by default. With Hive enabled, the metadata is persisted in the metastore_db folder, which is created by default under /cdm_shared_folder and mounted via the Rancher settings.
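As a rough illustration of pinning the metastore location explicitly, the sketch below assumes the default embedded Derby metastore, which is what a metastore_db folder usually indicates. The helper name `derby_metastore_conf` is hypothetical, and the PR may rely on the working directory instead of setting the connection URL.

```python
import posixpath


def derby_metastore_conf(shared_dir):
    """Point the embedded Derby metastore at a fixed folder.

    Assumption: the metastore is embedded Derby; the PR does not say so explicitly.
    """
    db_path = posixpath.join(shared_dir, "metastore_db")
    return {
        # Hive's JDO connection URL, passed through Spark's spark.hadoop.* prefix.
        "spark.hadoop.javax.jdo.option.ConnectionURL":
            f"jdbc:derby:;databaseName={db_path};create=true",
    }
```

Every container that should see the same tables would then need the same folder mounted at the same path.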
So anyone wanting to use the shared tables needs the same volume mount for their notebook container? What happens if two people try to create conflicting tables with the same name at the same time?
What's the lock mechanism?
There is a .lck file. Just delete that file.
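A minimal sketch of that cleanup, assuming an embedded Derby metastore: Derby lets only one JVM open the database at a time and marks it with lock files such as db.lck in the metastore folder. The function name `remove_stale_locks` is hypothetical, and deleting the files is only safe when no other session still has the metastore open.

```python
from pathlib import Path


def remove_stale_locks(metastore_dir):
    """Delete *.lck files left in the metastore folder and return their names.

    Assumption: no live Spark session is still using this metastore.
    """
    removed = []
    for lock_file in sorted(Path(metastore_dir).glob("*.lck")):
        lock_file.unlink()
        removed.append(lock_file.name)
    return removed
```

For example, `remove_stale_locks("/cdm_shared_folder/metastore_db")` would clear leftover locks after a notebook kernel crashed without shutting down its session.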
OK, it seems like that should work regardless of Docker.
Can you check and see if there's any way for the tables to be stored in minio so other systems that can talk deltalake can access them?
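One possible direction, sketched under assumptions: MinIO speaks the S3 API, so Spark can usually reach it through the s3a connector, provided the hadoop-aws libraries are on the classpath. The helper names, bucket name, and endpoint below are hypothetical and not taken from this PR; the `fs.s3a.*` keys are standard Hadoop settings.

```python
def minio_s3a_conf(endpoint, access_key, secret_key):
    """Return Spark settings for reaching a MinIO server through s3a."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is normally addressed path-style rather than by virtual host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def delta_table_path(bucket, *parts):
    """Build an s3a:// location for a Delta table (bucket name is an assumption)."""
    return "s3a://" + "/".join([bucket, *parts])
```

With those settings applied to the session, a table written with `df.write.format("delta").save(delta_table_path("cdm-lake", "warehouse", "events"))` keeps its transaction log next to the data in the bucket, so other Delta-aware readers could open it by path without going through the Hive metastore.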
Yes, I will look into that.
No description provided.