CSV bulk insert performance #1206

Open
jamesmstone opened this issue Nov 3, 2018 · 3 comments
Labels: enhancement (a feature, ready for implementation), perf

Comments

@jamesmstone

Environment

  • PostgreSQL version: Docker f0b44c71fee9 postgres:9.6
  • PostgREST version: Docker 3153f5b6d430 postgrest/postgrest
  • Operating system: Linux

Description of issue

Not an issue per se; however, I am getting poor performance when uploading a relatively small CSV file. I am essentially uploading time-series data.

Below is the relevant excerpt of the DB schema.

create table consumption_shape
(
	id serial not null
		constraint consumption_shape_pkey
			primary key,
	name text,
	filename text,
	owner_id integer default request.user_id()
		constraint consumption_shape_owner_id_fkey
			references user
)
;



create table shape_value
(
	id serial not null
		constraint shape_value_pkey
			primary key,
	time_ending timestamp not null,
	mwh double precision not null,
	consumption_shape integer not null
		constraint shape_value_consumption_shape_fkey
			references consumption_shape
)
;



CREATE VIEW shape_values AS SELECT shape_value.id,
    shape_value.time_ending,
    shape_value.mwh,
    shape_value.consumption_shape
   FROM data.shape_value;

The key table is the shape_value table.

Test

$ time curl 'http://172.17.0.1:3000/shape_values' -H 'Content-Type: text/csv' -H 'Accept: application/json' --data-binary @example.csv -D -
HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Date: Sat, 03 Nov 2018 05:48:56 GMT
Server: postgrest/5.1.0 (3cef4b7)
Content-Range: */*


real	0m18.528s

I have uploaded the CSV file I am inserting as a gist.

I have noticed in the DB logs that Postgres seems to be converting the CSV to JSON before inserting. Is this the expected behaviour?

Any help would be greatly appreciated, thanks!

I imagine I am sending the request incorrectly.

@steve-chavez
Member

Yes, we convert the CSV to JSON and then insert it into Postgres. This is because our pg lib doesn't currently support the pg COPY command (see nikita-volkov/hasql#1); once that's supported, we could insert the CSV in one go with no additional processing.
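
For illustration, the JSON path amounts to something roughly like the following (a sketch of the general shape of the generated statement, not the exact SQL PostgREST emits):

-- Rough illustration only: the CSV body is first converted to a JSON array of
-- objects, which is then unpacked server-side into a set of rows.
INSERT INTO shape_values (time_ending, mwh, consumption_shape)
SELECT time_ending, mwh, consumption_shape
FROM json_populate_recordset(null::shape_values,
       '[{"time_ending": "2018-01-01 00:30:00", "mwh": 1.5, "consumption_shape": 1}]'::json);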

Bulk inserting JSON could also be made a lot faster, see #690.
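
For comparison, the COPY route mentioned above would let Postgres ingest the CSV directly, with no intermediate JSON. A minimal psql sketch (plain psql usage, not something PostgREST exposes today):

-- Sketch only: loading the same CSV straight into the table with COPY,
-- bypassing the CSV-to-JSON conversion. Assumes example.csv has a header row.
\copy data.shape_value (time_ending, mwh, consumption_shape) from 'example.csv' with (format csv, header)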

@steve-chavez added the enhancement (a feature, ready for implementation) and perf labels Mar 7, 2019
@nathancahill

Curious if the Hasql support is still a blocker for this feature?

@steve-chavez
Member

@nathancahill No, not really. In fact I don't see us using COPY now since it'll prevent using libpq pipeline mode (#2707).

I think the solution for this is #2826, which will provide an escape hatch for doing your own CSV parsing in SQL (or any pg language, like plrust) and will have higher perf. I expect to work on it for the next release.
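
As a rough sketch of what "doing your own CSV parsing in SQL" could look like once such an escape hatch exists (the function name, its text argument, and how the raw body would reach it are assumptions for illustration, not the #2826 design):

-- Hypothetical helper: parses a raw CSV body in SQL and inserts the rows.
-- Assumes a header line and the column order time_ending,mwh,consumption_shape.
create or replace function import_shape_values(body text) returns void as $$
  insert into data.shape_value (time_ending, mwh, consumption_shape)
  select (cols[1])::timestamp,
         (cols[2])::double precision,
         (cols[3])::integer
  from (
    select string_to_array(line, ',') as cols
    from regexp_split_to_table(body, E'\n') with ordinality as t(line, lineno)
    where lineno > 1      -- skip the header row
      and line <> ''      -- skip a trailing blank line
  ) parsed;
$$ language sql;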
