CSV bulk insert performance #1206

Open
jamesmstone opened this issue Nov 3, 2018 · 3 comments
Labels: enhancement (a feature, ready for implementation), perf

Comments

@jamesmstone

Environment

  • PostgreSQL version: Docker f0b44c71fee9 postgres:9.6
  • PostgREST version: Docker 3153f5b6d430 postgrest/postgrest
  • Operating system: Linux

Description of issue

Not an issue per se; however, I am getting poor performance when uploading a relatively small CSV file. I am essentially uploading time-series data.

Below is the relevant excerpt of the DB schema.

create table consumption_shape
(
	id serial not null
		constraint consumption_shape_pkey
			primary key,
	name text,
	filename text,
	owner_id integer default request.user_id()
		constraint consumption_shape_owner_id_fkey
			references user
)
;



create table shape_value
(
	id serial not null
		constraint shape_value_pkey
			primary key,
	time_ending timestamp not null,
	mwh double precision not null,
	consumption_shape integer not null
		constraint shape_value_consumption_shape_fkey
			references consumption_shape
)
;



CREATE VIEW shape_values AS SELECT shape_value.id,
    shape_value.time_ending,
    shape_value.mwh,
    shape_value.consumption_shape
   FROM data.shape_value;

The key table is the shape_value table.

Test

$ time curl 'http://172.17.0.1:3000/shape_values' -H 'Content-Type: text/csv' -H 'Accept: application/json' --data-binary @example.csv -D -
HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Date: Sat, 03 Nov 2018 05:48:56 GMT
Server: postgrest/5.1.0 (3cef4b7)
Content-Range: */*


real	0m18.528s

I have uploaded the CSV file I am inserting as a gist.

I have noticed in the DB logs that Postgres seems to be converting the CSV to JSON before inserting. Is this the expected behaviour?

Any help would be greatly appreciated, thanks!

I imagine I am sending the request incorrectly.

@steve-chavez
Member

Yes, we convert the CSV to JSON and then insert it into Postgres. This is because our pg lib doesn't currently support the pg COPY command (see nikita-volkov/hasql#1); once that's supported, we could insert the CSV in one go with no additional processing.
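
For illustration, the JSON path amounts to something roughly like the following (a sketch of the general shape of the generated statement, not the exact SQL PostgREST emits):

-- Rough illustration only: the CSV body is first converted to a JSON array of
-- objects, which is then unpacked server-side into a set of rows.
INSERT INTO shape_values (time_ending, mwh, consumption_shape)
SELECT time_ending, mwh, consumption_shape
FROM json_populate_recordset(null::shape_values,
       '[{"time_ending": "2018-01-01 00:30:00", "mwh": 1.5, "consumption_shape": 1}]'::json);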

Bulk inserting JSON could also be made a lot faster, see #690.
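
For comparison, the COPY route mentioned above would let Postgres ingest the CSV directly, with no intermediate JSON. A minimal psql sketch (plain psql usage, not something PostgREST exposes today):

-- Sketch only: loading the same CSV straight into the table with COPY,
-- bypassing the CSV-to-JSON conversion. Assumes example.csv has a header row.
\copy data.shape_value (time_ending, mwh, consumption_shape) from 'example.csv' with (format csv, header)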

@steve-chavez added the enhancement (a feature, ready for implementation) and perf labels Mar 7, 2019
@nathancahill

Curious if the Hasql support is still a blocker for this feature?

@steve-chavez
Member

@nathancahill No, not really. In fact I don't see us using COPY now since it'll prevent using libpq pipeline mode (#2707).

I think the solution for this is #2826, which will provide an escape hatch for doing your own CSV parsing in SQL (or any pg language, like plrust) and will have higher perf. I expect to work on it for the next release.
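
As a rough sketch of what "doing your own CSV parsing in SQL" could look like once such an escape hatch exists (the function name, its text argument, and how the raw body would reach it are assumptions for illustration, not the #2826 design):

-- Hypothetical helper: parses a raw CSV body in SQL and inserts the rows.
-- Assumes a header line and the column order time_ending,mwh,consumption_shape.
create or replace function import_shape_values(body text) returns void as $$
  insert into data.shape_value (time_ending, mwh, consumption_shape)
  select (cols[1])::timestamp,
         (cols[2])::double precision,
         (cols[3])::integer
  from (
    select string_to_array(line, ',') as cols
    from regexp_split_to_table(body, E'\n') with ordinality as t(line, lineno)
    where lineno > 1      -- skip the header row
      and line <> ''      -- skip a trailing blank line
  ) parsed;
$$ language sql;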
