Hi @johannestang, thanks for the quick-start repo you have made!
I am also thinking of setting up a big data stack for a Docker cluster, and possibly doing so using Helm charts for k8s.
Before I start testing with your stack, I would like to ask a couple of questions to understand which direction to take when making possible improvements:
First off, what do you think should be improved first in this project? Do you have a desired roadmap ahead?
Secondly, if I understand correctly, the Hive service is only needed as a requirement for Presto to run SQL, right? Is it possible to get rid of Hive if we only want to use Presto/Impala, or is that not currently possible?
Thirdly, in your blog post you state that "There are of course many other interesting big data SQL engines, e.g. Impala, Spark SQL, and Drill. For background on these (and more) have a look at this great post." Does that mean that if one wants to use Spark (and thus Spark SQL), one can use Hive and remove Presto from the stack, or would you still recommend connecting Spark SQL to Presto to run queries?
Thanks for the clarification.
P.S. Have you published any newer blog posts since 2019? Let me know.
Thanks for reaching out. I'm afraid it has been a very long time since I've worked on the components in this repo. Since putting together this stack I have also started using k8s, so that's probably the way I would go today. However, I'm not quite up to date with what has happened with the different projects in the stack, so there might be things that could/should be done differently today.
Since the blog post (and no, I never got around to writing part 2), I worked on adding Kafka to the stack, using Kafka Connect to persist the data streams to Minio/S3, so that historical data could be queried from S3 and live data from Kafka, both through Presto. However, I hit a roadblock with an incompatibility between Confluent's schema registry and Hive (which might have been fixed since). So while most of it worked, it never reached a state I was willing to make publicly available. But to answer your question: the next thing I would add is Kafka.
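For anyone picking this idea up, here is a minimal sketch of what that setup would look like from the query side, using the presto-python-client package. The host, catalog, schema, and table names are placeholders for how such a stack might be wired up, not something taken from this repo:

```python
import prestodb  # pip install presto-python-client

# Minimal sketch, assuming a Presto coordinator is reachable at presto:8080
# and that a "hive" catalog (historical data persisted to Minio/S3 by Kafka
# Connect) and a "kafka" catalog (live topics) have been configured.
# All names below are placeholders, not part of this repo.
conn = prestodb.dbapi.connect(
    host="presto",
    port=8080,
    user="demo",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# Historical data: files written to Minio/S3 and registered in the Hive metastore.
cur.execute("SELECT count(*) FROM hive.default.events")  # hypothetical table
print(cur.fetchall())

# Live data: the same stream read straight from a Kafka topic via Presto's Kafka connector.
cur.execute("SELECT count(*) FROM kafka.default.events")  # hypothetical topic mapping
print(cur.fetchall())
```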
On your second question: I don't know, I at least didn't find a way of doing it. Things might be different today.
On the third: yes, if you were to use Spark, then Presto could be removed.
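To illustrate that last point, here is a minimal PySpark sketch of Spark SQL talking directly to the Hive metastore, with no Presto involved. The metastore URI and table name are assumptions about how the stack would be wired up, not something taken from this repo:

```python
from pyspark.sql import SparkSession

# Minimal sketch, assuming the Hive metastore is exposed as a thrift service
# at hive-metastore:9083 (service name and port are assumptions).
spark = (
    SparkSession.builder
    .appName("spark-sql-without-presto")
    .config("hive.metastore.uris", "thrift://hive-metastore:9083")
    .enableHiveSupport()
    .getOrCreate()
)

# Any table registered in the metastore (e.g. data sitting on Minio/S3)
# can now be queried with plain SQL through Spark instead of Presto.
spark.sql("SHOW DATABASES").show()
spark.sql("SELECT * FROM default.example_table LIMIT 10").show()  # hypothetical table
```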