A database client should be based on asyncio #141

elrik75 · 2023-03-04T19:02:32Z

Major DBs have their asyncio client nowdays:

postgresql: asyncpg
redis: aioredis
kafka: aiokafka
etc

It's logical for any I/O access for python to be async.
Note that there is an unmaintained async client for clickhouse: https://github.com/maximdanilchenko/aiochclient
Having an official async client for clickhouse with cython and full-featured will be great !

genzgd · 2023-03-04T23:03:02Z

Agreed, this should have had its own enhancement request/issue before now.

pkit · 2023-04-20T15:11:27Z

@elrik75 No, it shouldn't yet.
asyncio is so brain-dead right now that it would need a couple of years and a lot of changes to become usable.

vsbaldeev · 2023-07-14T09:48:23Z

Hello everyone!
As I understand official aio client for python is not developed yet.
So, which programming language and library should I use to make asynchronous requests to clickhouse ?

genzgd · 2023-07-14T10:15:34Z

The await/async keywords are part of the Python standard asyncio library and work fine with clickhouse-connect. The main caveat is that ClickHouse doesn't allow concurrent queries in the same session, so you should disable the autogenerate_session_id common setting before calling get_client. See the explanation here: https://clickhouse.com/docs/en/integrations/python#managing-clickhouse-session-ids

genzgd · 2023-07-14T10:24:53Z

To clarify, I see this issue as an enhancement request to either create a clean async wrapper around the library that feels more "async native" or even better, use a true async HTTP client. The urllib3 library doesn't use any async io so in that sense clickhouse-connect doesn't really do things in a real event driven way. But you can still use clickhouse-connect with async code.

araa47 · 2023-08-21T11:34:03Z

+1 , feel like this is a required feature for a modern python library

chrisprobst · 2024-04-11T08:36:37Z

@elrik75 No, it shouldn't yet. asyncio is so brain-dead right now that it would need a couple of years and a lot of changes to become usable.

Mind elaborating what you mean with brain-dead? It's very usable and used everywhere. What aspect of it is brain-dead and better than thread-blocking urllib3 (which means much more brain-dead to me)?

zerafachris · 2024-05-14T06:38:33Z

Hi @elrik75 @araa47 @genzgd @vsbaldeev ,
there seems to be a package which wraps on the old clickhouse driver here https://github.com/long2ice/asynch .
Maybe this could help?

pkit · 2024-05-16T05:48:33Z

Mind elaborating what you mean with brain-dead? It's very usable and used everywhere. What aspect of it is brain-dead and better than thread-blocking urllib3 (which means much more brain-dead to me)?

See here: https://stackoverflow.com/questions/46827007/runtimeerror-this-event-loop-is-already-running-in-python

Essentially brain-dead is the "all or nothing" implementation. No project can be converted to async ever without a complete rewrite of that project and all of the libraries it uses. And the libraries that use these libraries.
No nested tasks are allowed, no gradual I/O conversion, full rewrite only.

elrik75 · 2024-05-16T06:12:45Z

Essentially brain-dead is the "all or nothing" implementation.

That's correct. The point is more: what kind of DB client should CH support then? sync or async? What kind makes more sense? My point of view (because I work on backend servers) is that a sync IO lib is not very helpful.

genzgd · 2024-05-18T12:49:48Z

@alexted We in the process of defining the roadmap for clickhouse-connect and other integrations, but for the moment this work is not planned for the immediate future. The current thinking is that true async support would involve swapping out to the httpx HTTP client with integrated async support (instead of the venerable urllib3), but that is of course a fairly major change.

To be clear, this is definitely work we want to do, but it may take a while to come to the top of the priority list given all of the work on other projects. As always, community contributions to help us get moving in the right direction are always appreciated. :)

samuelcolvin · 2024-06-13T00:28:26Z

In case anyone is struggling with this, if the postgres connection port is open, you can connect to clickhouse with psycopg3:

    conn = await psycopg.AsyncConnection.connect(
        dbname='default',
        user='default',
        password='...',
        host='localhost',
        port='9005',
        cursor_factory=psycopg.AsyncClientCursor,
        autocommit=True,
    )

You need cursor_factory=psycopg.AsyncClientCursor, autocommit=True to get psycopg to use the simple postgres protocol since clickhouse doesn't support the extended protocol, for the same reason you can't use asyncpg.

All data types seem to be returned as strings, but it's better than nothing (and pydantic does a pretty good job of coercing data to an expected type)

genzgd · 2024-06-13T17:38:13Z

I've added an example of how to run clickhouse-connect queries asynchronously (it includes a semaphore just for fun as well). Based on very limited testing, this solution still significantly outperforms other ClickHouse "async" libraries. You're still running on a single Python thread at a time, but the HTTP requests are obviously very much I/O bound and the Cython transform code is extremely fast, so the GIL isn't as big a problem as you might think.

If you do experiment with similar solutions, please report your results (good or bad) here.

slvrtrn · 2024-07-09T11:32:40Z

Since 0.7.16, ClickHouse-Connect provides a convenience AsyncClient wrapper over the standard Client, so that it is no longer required to write your own. See this new entry in the docs. The async usage example was also updated.

samuelcolvin · 2024-07-10T10:02:42Z

Since 0.7.16, ClickHouse-Connect provides a convenience AsyncClient wrapper over the standard Client, so that it is no longer required to write your own. See this new entry in the docs. The async usage example was also updated.

This is very unfortunate. As I think I said to @tbragin, pretending to have any async client by just running the sync code in a thread pool is an own goal for ClickHouse. It'll often be slower than just using the sync client. You'd be better off either recommending loop.run_in_executor or the HTTP API and httpx.

clickhouse-connect/clickhouse_connect/driver/asyncclient.py

Lines 79 to 88 in ee3af79

    
           def _query(): 
        
               return self.client.query(query=query, parameters=parameters, settings=settings, query_formats=query_formats, 
        
                                        column_formats=column_formats, encoding=encoding, use_none=use_none, 
        
                                        column_oriented=column_oriented, use_numpy=use_numpy, max_str_len=max_str_len, 
        
                                        context=context, query_tz=query_tz, column_tzs=column_tzs, 
        
                                        external_data=external_data) 
        
           loop = asyncio.get_running_loop() 
        
           result = await loop.run_in_executor(None, _query) 
        
           return result

To actually solve this properly, you need to either:

implement a real async client from the ground up
or (preferable) support Arrow Flight SQL in ClickHouse, and then rely on a real high quality Arrow Flight SQL client library in Python, there's already a Rust client for Arrow Flight SQL, so building a Python one should be pretty easy

mshustov · 2024-07-10T11:41:34Z

@samuelcolvin, As you can see, the issue is not closed. We will support asyncio natively, however the implementation will take time. AsyncClient wrapper is created to close the gap already today

genzgd · 2024-07-10T17:40:05Z

@samuelcolvin Putting aside the Arrow Flight question for the moment, I'm not sure what problem you are trying to "solve". Queries run using the wrapper yield the asyncio event loop when waiting for the HTTP I/O. So your main thread will spend its time in the CPU bound parts of the query (generally, transforming data). The fact that urllib3 itself doesn't have an async API doesn't change how the asyncio event loop behaves while waiting for network data.

I can assure you that using clickhouse-connect with the async wrapper will outperform any code using a client based on httpx and a text format like JSON or CSV. So if you are getting an async API that yields the main event loop thread while waiting for I/O, and has high performance, what exactly is missing?

elrik75 added the enhancement New feature or request label Mar 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A database client should be based on asyncio #141

A database client should be based on asyncio #141

elrik75 commented Mar 4, 2023

genzgd commented Mar 4, 2023

pkit commented Apr 20, 2023

vsbaldeev commented Jul 14, 2023

genzgd commented Jul 14, 2023

genzgd commented Jul 14, 2023

araa47 commented Aug 21, 2023

chrisprobst commented Apr 11, 2024

zerafachris commented May 14, 2024

pkit commented May 16, 2024

elrik75 commented May 16, 2024

genzgd commented May 18, 2024

samuelcolvin commented Jun 13, 2024 •

edited

Loading

genzgd commented Jun 13, 2024

slvrtrn commented Jul 9, 2024 •

edited

Loading

samuelcolvin commented Jul 10, 2024 •

edited

Loading

mshustov commented Jul 10, 2024 •

edited

Loading

genzgd commented Jul 10, 2024

A database client should be based on asyncio #141

A database client should be based on asyncio #141

Comments

elrik75 commented Mar 4, 2023

genzgd commented Mar 4, 2023

pkit commented Apr 20, 2023

vsbaldeev commented Jul 14, 2023

genzgd commented Jul 14, 2023

genzgd commented Jul 14, 2023

araa47 commented Aug 21, 2023

chrisprobst commented Apr 11, 2024

zerafachris commented May 14, 2024

pkit commented May 16, 2024

elrik75 commented May 16, 2024

genzgd commented May 18, 2024

samuelcolvin commented Jun 13, 2024 • edited Loading

genzgd commented Jun 13, 2024

slvrtrn commented Jul 9, 2024 • edited Loading

samuelcolvin commented Jul 10, 2024 • edited Loading

mshustov commented Jul 10, 2024 • edited Loading

genzgd commented Jul 10, 2024

samuelcolvin commented Jun 13, 2024 •

edited

Loading

slvrtrn commented Jul 9, 2024 •

edited

Loading

samuelcolvin commented Jul 10, 2024 •

edited

Loading

mshustov commented Jul 10, 2024 •

edited

Loading