
Feature Request: Implement Progress Tracking for Long-Running Queries #285

Open · xujryan opened this issue Jan 2, 2024 · 5 comments
Labels: enhancement (New feature or request)

xujryan commented Jan 2, 2024

Description:

I would like to suggest implementing a progress tracking mechanism for long-running queries, such as inserts from S3. This would make it much easier to monitor the execution of these extensive operations.

Motivation:

In many cases, queries in ClickHouse can take a substantial amount of time to execute, ranging from several minutes to hours or even days. During such long-running operations, users currently have no way to monitor the progress of these queries. A progress tracking feature, akin to what the ClickHouse CLI client offers, would not only improve the user experience by providing real-time updates on query execution, but also help in diagnosing and troubleshooting issues that arise during these lengthy operations.

genzgd (Collaborator) commented Jan 2, 2024

Unfortunately there's currently no way to do this using the existing Python HTTP libraries and ClickHouse's HTTP 1.1 interface. While intermediate progress headers are returned by ClickHouse, neither the requests nor the httpx library actually reads those headers. So it will take a fair amount of work on either the Python side or the ClickHouse side (possibly in the form of HTTP/2 support) to implement this feature.
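For context, ClickHouse emits progress as repeated `X-ClickHouse-Progress` response headers once the `send_progress_in_http_headers` setting is enabled (with `http_headers_progress_interval_ms` controlling the interval). A minimal sketch of the limitation described above, assuming a local ClickHouse server on the default HTTP port 8123; the query is only a placeholder:

```python
import requests

# ClickHouse streams repeated X-ClickHouse-Progress headers while the
# query runs, but requests only hands back the fully parsed header block
# after the response completes, with duplicates folded into a single
# comma-joined string -- so no real-time progress is observable this way.
resp = requests.post(
    "http://localhost:8123/",
    params={
        "send_progress_in_http_headers": 1,
        "http_headers_progress_interval_ms": 100,
    },
    data="SELECT count() FROM system.numbers LIMIT 100000000",
)
print(resp.headers.get("X-ClickHouse-Progress"))
```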

genzgd added the enhancement (New feature or request) label on Oct 7, 2024
pkit commented Nov 15, 2024

They do read them. The http.client stdlib just ignores duplicate headers and doesn't give you access to the parse_headers function.
The reason is, as usual, that they think Python users are dumb.
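Since nothing in the stdlib exposes the header lines as they arrive, one workaround is to speak HTTP/1.1 over a raw socket. A hypothetical sketch, not part of clickhouse-connect, again assuming a local server on port 8123 and a placeholder query:

```python
import json
import socket

query = b"SELECT count() FROM system.numbers LIMIT 100000000"
request = (
    b"POST /?send_progress_in_http_headers=1"
    b"&http_headers_progress_interval_ms=100 HTTP/1.1\r\n"
    b"Host: localhost:8123\r\n"
    b"Content-Length: %d\r\n"
    b"Connection: close\r\n"
    b"\r\n" % len(query)
) + query

with socket.create_connection(("localhost", 8123)) as sock:
    sock.sendall(request)
    reader = sock.makefile("rb")
    reader.readline()  # status line, e.g. b"HTTP/1.1 200 OK\r\n"
    for line in reader:
        if line == b"\r\n":  # blank line ends the header block
            break
        # Each progress header can be reported the moment it is flushed.
        if line.lower().startswith(b"x-clickhouse-progress:"):
            print("progress:", json.loads(line.split(b":", 1)[1]))
    body = reader.read()  # whatever remains is the (possibly chunk-encoded) result
```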

genzgd (Collaborator) commented Nov 15, 2024

@pkit -- Yes, I realized that recently when digging into the code. It is truly irritating that there are no hooks or other means to actually capture these in real time.

pkit commented Nov 15, 2024

Fortunately the aiohttp authors are much smarter: they do expose a set_parser call that can do what's needed.
Unfortunately, converting everything to async is tedious in Python...
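The same raw-header trick also works with plain asyncio streams, without reaching into aiohttp internals. Another hypothetical sketch under the same assumptions (local server, placeholder query):

```python
import asyncio
import json

async def query_with_progress() -> bytes:
    query = b"SELECT count() FROM system.numbers LIMIT 100000000"
    reader, writer = await asyncio.open_connection("localhost", 8123)
    writer.write(
        b"POST /?send_progress_in_http_headers=1"
        b"&http_headers_progress_interval_ms=100 HTTP/1.1\r\n"
        b"Host: localhost:8123\r\n"
        b"Content-Length: %d\r\n"
        b"Connection: close\r\n"
        b"\r\n" % len(query) + query
    )
    await writer.drain()
    await reader.readline()  # status line
    # Header lines arrive as ClickHouse flushes them, so progress can be
    # reported while the query is still running.
    while (line := await reader.readline()) != b"\r\n":
        if line.lower().startswith(b"x-clickhouse-progress:"):
            print("progress:", json.loads(line.split(b":", 1)[1]))
    body = await reader.read()  # remaining (possibly chunk-encoded) result
    writer.close()
    await writer.wait_closed()
    return body

asyncio.run(query_with_progress())
```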

pkit commented Nov 15, 2024

BTW, the authors of the http.client stdlib made parse_headers not a method but an internal function, just to make it much more difficult to replace. It's the only non-method in the whole class interface...
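That said, since HTTPResponse.begin() looks parse_headers up as a module-level global at call time, it can at least be monkeypatched from outside. A sketch; note this still only fires after the whole header block has been read, so it recovers the duplicate progress headers but not real-time updates:

```python
import http.client
import json

_orig_parse_headers = http.client.parse_headers

def parse_headers_with_progress(fp, *args, **kwargs):
    # Delegate to the stdlib parser, then pull out every occurrence of
    # the progress header (get_all preserves the duplicates).
    msg = _orig_parse_headers(fp, *args, **kwargs)
    for snapshot in msg.get_all("X-ClickHouse-Progress") or []:
        print("progress snapshot:", json.loads(snapshot))
    return msg

# HTTPResponse.begin() resolves parse_headers as a module global at call
# time, so patching the module attribute is enough to intercept it.
http.client.parse_headers = parse_headers_with_progress
```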
