
Feature Request: Implement Progress Tracking for Long-Running Queries #285

Open · xujryan opened this issue Jan 2, 2024 · 5 comments
Labels: enhancement (New feature or request)

xujryan commented Jan 2, 2024

Description:

I would like to suggest implementing a progress tracking mechanism for long-running queries, such as inserts from S3. This would make it much easier to monitor the execution of these extensive operations.

Motivation:

In many cases, queries in ClickHouse can take a substantial amount of time to execute, ranging from several minutes to hours or even days. During such long-running operations, users currently have no way to monitor the progress of these queries. A progress tracking feature, akin to what the ClickHouse CLI client offers, would not only improve the user experience by providing real-time updates on query execution, but also help in diagnosing and troubleshooting issues that arise during these lengthy operations.

genzgd (Collaborator) commented Jan 2, 2024

Unfortunately there's currently no way to do this using the existing Python HTTP libraries and ClickHouse's HTTP 1.1 interface. While intermediate progress headers are returned by ClickHouse, neither the requests nor the httpx library actually reads those headers. So it will take a fair amount of work on either the Python side or the ClickHouse side (possibly in the form of HTTP/2 support) to implement this feature.
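For context, ClickHouse emits progress as repeated `X-ClickHouse-Progress` response headers once the `send_progress_in_http_headers` setting is enabled (with `http_headers_progress_interval_ms` controlling the interval). A minimal sketch of the limitation described above, assuming a local ClickHouse server on the default HTTP port 8123; the query is only a placeholder:

```python
import requests

# ClickHouse streams repeated X-ClickHouse-Progress headers while the
# query runs, but requests only hands back the fully parsed header block
# after the response completes, with duplicates folded into a single
# comma-joined string -- so no real-time progress is observable this way.
resp = requests.post(
    "http://localhost:8123/",
    params={
        "send_progress_in_http_headers": 1,
        "http_headers_progress_interval_ms": 100,
    },
    data="SELECT count() FROM system.numbers LIMIT 100000000",
)
print(resp.headers.get("X-ClickHouse-Progress"))
```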

genzgd added the enhancement (New feature or request) label on Oct 7, 2024
pkit commented Nov 15, 2024

They do read them. The http.client stdlib just ignores duplicate headers and doesn't give you access to the parse_headers function.
The reason is, as usual, that they think Python users are dumb.
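Since nothing in the stdlib exposes the header lines as they arrive, one workaround is to speak HTTP/1.1 over a raw socket. A hypothetical sketch, not part of clickhouse-connect, again assuming a local server on port 8123 and a placeholder query:

```python
import json
import socket

query = b"SELECT count() FROM system.numbers LIMIT 100000000"
request = (
    b"POST /?send_progress_in_http_headers=1"
    b"&http_headers_progress_interval_ms=100 HTTP/1.1\r\n"
    b"Host: localhost:8123\r\n"
    b"Content-Length: %d\r\n"
    b"Connection: close\r\n"
    b"\r\n" % len(query)
) + query

with socket.create_connection(("localhost", 8123)) as sock:
    sock.sendall(request)
    reader = sock.makefile("rb")
    reader.readline()  # status line, e.g. b"HTTP/1.1 200 OK\r\n"
    for line in reader:
        if line == b"\r\n":  # blank line ends the header block
            break
        # Each progress header can be reported the moment it is flushed.
        if line.lower().startswith(b"x-clickhouse-progress:"):
            print("progress:", json.loads(line.split(b":", 1)[1]))
    body = reader.read()  # whatever remains is the (possibly chunk-encoded) result
```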

genzgd (Collaborator) commented Nov 15, 2024

@pkit -- Yes, I realized that recently when digging into the code. It is truly irritating that there are no hooks or other means to actually capture these in real time.

pkit commented Nov 15, 2024

Fortunately the aiohttp authors are much smarter: they do expose a set_parser call that can do what's needed.
Unfortunately, converting everything to async is tedious in Python...
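The same raw-header trick also works with plain asyncio streams, without reaching into aiohttp internals. Another hypothetical sketch under the same assumptions (local server, placeholder query):

```python
import asyncio
import json

async def query_with_progress() -> bytes:
    query = b"SELECT count() FROM system.numbers LIMIT 100000000"
    reader, writer = await asyncio.open_connection("localhost", 8123)
    writer.write(
        b"POST /?send_progress_in_http_headers=1"
        b"&http_headers_progress_interval_ms=100 HTTP/1.1\r\n"
        b"Host: localhost:8123\r\n"
        b"Content-Length: %d\r\n"
        b"Connection: close\r\n"
        b"\r\n" % len(query) + query
    )
    await writer.drain()
    await reader.readline()  # status line
    # Header lines arrive as ClickHouse flushes them, so progress can be
    # reported while the query is still running.
    while (line := await reader.readline()) != b"\r\n":
        if line.lower().startswith(b"x-clickhouse-progress:"):
            print("progress:", json.loads(line.split(b":", 1)[1]))
    body = await reader.read()  # remaining (possibly chunk-encoded) result
    writer.close()
    await writer.wait_closed()
    return body

asyncio.run(query_with_progress())
```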

pkit commented Nov 15, 2024

BTW, the authors of the http.client stdlib made parse_headers not a method but an internal function, just to make it much more difficult to replace. It's the only non-method in the whole class interface...
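That said, since HTTPResponse.begin() looks parse_headers up as a module-level global at call time, it can at least be monkeypatched from outside. A sketch; note this still only fires after the whole header block has been read, so it recovers the duplicate progress headers but not real-time updates:

```python
import http.client
import json

_orig_parse_headers = http.client.parse_headers

def parse_headers_with_progress(fp, *args, **kwargs):
    # Delegate to the stdlib parser, then pull out every occurrence of
    # the progress header (get_all preserves the duplicates).
    msg = _orig_parse_headers(fp, *args, **kwargs)
    for snapshot in msg.get_all("X-ClickHouse-Progress") or []:
        print("progress snapshot:", json.loads(snapshot))
    return msg

# HTTPResponse.begin() resolves parse_headers as a module global at call
# time, so patching the module attribute is enough to intercept it.
http.client.parse_headers = parse_headers_with_progress
```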
