Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multi-row inserts #107

Merged
merged 2 commits into from
Mar 2, 2022
Merged

Support multi-row inserts #107

merged 2 commits into from
Mar 2, 2022

Conversation

hariso
Copy link
Contributor

@hariso hariso commented Dec 17, 2021

Hello, Aiven team!

This is in relation to #40 (and, originally #34). It takes the spike in #39 from @ahmeroxa as a basis.

Multi-row insert is implemented in a separate dialect and added to only Pg and SQLite. I wasn't comfortable adding multi-row support to GenericDatabaseDialect, since I wasn't able to actually confirm that all databases support it. I found that for example, PostgreSQL didn't support this until 8.2 But if we can, for example, assume that it's only the minority of DBs (and versions) which do not support it, then that could be a reason for putting multi-row inserts into GenericDatabaseDialect.

Best wishes,

Haris

Update: Since it looks like the most used DBs (MySQL, Pg, SQL Server) all support multi-row inserts, the default for the dialects is to support multi-row inserts.

@hariso hariso requested a review from a team as a code owner December 17, 2021 18:19
@hariso
Copy link
Contributor Author

hariso commented Jan 17, 2022

Bump.: )

Copy link
Contributor

@juha-aiven juha-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the PR.

I had a quick initial look and left some comments.

I wasn't comfortable adding multi-row support to GenericDatabaseDialect, since I wasn't able to actually confirm that all databases support it.

What would happen if the multi-row support was added to all dialects and some DB didn't support it? The inserts would fail and things would crash?

Would it make sense to include the support "globally" and then it would be a "user error" if his DB didn't support it?

It seems that e.g. MySQL supports multi-row insert. I wouldn't like to take the route of not supporting it for a DB that has the feature, e.g. MySQL.

So, either we should double check which ones support it and add support for those or just add the support globally and it wouldn't just work if the underlying DB didn't support it.

build.gradle Outdated
@@ -117,6 +118,7 @@ dependencies {

implementation "com.google.guava:guava:27.1-jre"
implementation "org.slf4j:slf4j-api:$slf4jVersion"
implementation "com.amazon.redshift:redshift-jdbc42:2.1.0.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed in implementation-scope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have even escaped from another PR I plan to open (a Redshift dialect). I think I can remove it completely.

@@ -22,4 +23,4 @@ log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c:%L)%n

log4j.logger.org.apache.kafka=ERROR
log4j.logger.io.aiven.connect=ERROR
log4j.logger.io.aiven.connect=DEBUG
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this? I think if you face errors while developing, you can change the log level. I'd revert the change in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. 👍

@@ -1,6 +1,7 @@
/*
* Copyright 2020 Aiven Oy
* Copyright 2018 Confluent Inc.
* Copyright 2021 Meroxa, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong year. Applies to all files where this line appears.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. Written last year.: )

@juha-aiven
Copy link
Contributor

Hey, I saw your comments, but apparently no new commits?

What about the more important question:

Would it make sense to include the support "globally" and then it would be a "user error" if his DB didn't support it?

@hariso
Copy link
Contributor Author

hariso commented Feb 7, 2022

Hey, I saw your comments, but apparently no new commits?

What about the more important question:

Would it make sense to include the support "globally" and then it would be a "user error" if his DB didn't support it?

Hey Juha!

Sorry for the lack of activity here, unfortunately I wasn't able to get to it. However, I think today/tomorrow I'll be done with the comments here.

The reason why I believed it might make sense for this to be turned off by default is because I simply wasn't sure what would make sense. Now that you mentioned MySQL (I simply forgot checking it), I do believe it should be the default.

Edit: I've just checked this: https://insights.stackoverflow.com/survey/2021#most-popular-technologies-database-prof. The top 4 mentioned here do support multi-row inserts.

@hariso hariso requested a review from juha-aiven February 7, 2022 11:34
Copy link
Contributor

@juha-aiven juha-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some very minor comments.

Is the idea to git rid of MultiInsertDialect, move the code to GenericDatabaseDialect and make all dbs support multi insert?

break;
default:
throw new AssertionError();

}
statement.addBatch();
// in a multi-row insert, all records are a single item in the batch
if (!(insertMode == MULTI)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps?

if (insertMode != MULTI) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup!

import static org.mockito.Mockito.when;

public class BufferedRecordsTest {

private final SqliteHelper sqliteHelper = new SqliteHelper(getClass().getSimpleName());
private String dbUrl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, now that you ask, I think yes. I put it into setUp(), since I thought sqliteHelper.sqliteUri() can be called only after sqliteHelper is actually set up, but that doesn't appear to be the case.

connection);

final Schema schema = newPlanetSchema();
for (int i = 1; i <= 5; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea to create five rows, but since the batch size is two, there is not a flush for the last row?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Basically, a test for the flushing logic too. I'll add a comment, since it looks like it's not that obvious.: )

@hariso
Copy link
Contributor Author

hariso commented Feb 9, 2022

Left some very minor comments.

Is the idea to git rid of MultiInsertDialect, move the code to GenericDatabaseDialect and make all dbs support multi insert?

That's correct. The multi-insert support is part of GenericDatabaseDialect, but I forgot deleting MultiInsertDialect. I'll do that now and address the other feedback you left.

Thanks a lot!

@hariso hariso requested a review from juha-aiven February 9, 2022 11:50
@hariso
Copy link
Contributor Author

hariso commented Feb 9, 2022

OK, so for some my IntelliJ doesn't show Checkstyle errors anymore, and that's why I missed those above. I'll fix soon.

akudiyar
akudiyar previously approved these changes Feb 23, 2022
@hariso
Copy link
Contributor Author

hariso commented Feb 23, 2022

@akudiyar @juha-aiven I force-pushed the branch, to have the commits signed. When you have some time, it'd be great if you could check this PR again.: )

Copy link
Contributor

@juha-aiven juha-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor comments, let's address these and this PR should be ready.

@hariso hariso requested review from juha-aiven and removed request for a team March 2, 2022 10:29
@juha-aiven juha-aiven merged commit e5b2b14 into Aiven-Open:master Mar 2, 2022
@jeremytee97
Copy link

any idea on when redshift dialect would be supported?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants