Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idempotent flag ignored for retries #331

Open
pdbossman opened this issue Nov 8, 2024 · 8 comments
Open

Idempotent flag ignored for retries #331

pdbossman opened this issue Nov 8, 2024 · 8 comments
Assignees

Comments

@pdbossman
Copy link

pdbossman commented Nov 8, 2024

Hello,
I was testing goCQL behavior with request shedding enabled. So I set
max_concurrent_requests_per_shard: 10

I was testing vanilla insert with value clause.

Initially, requestRetryPolicy was disabled, and I received shedding errors.

I then enabled requestRetryPolicy with ExponentialBackoffRetryPolicy.

    requestRetryPolicy := &gocql.ExponentialBackoffRetryPolicy{
		Min:        100 * time.Millisecond,  // Minimum retry delay
        //Min:        time.Second,  // Minimum retry delay
        Max:        10 * time.Second, // Maximum retry delay
        NumRetries: 10,           // Maximum number of retries
    }

As expected, queries were retried. I still observed shedding in monitoring, but the driver retried the requests with exponential backoff.

I then tested with idempotent set to false.

func insertQuery(session *gocql.Session, ctx context.Context, firstName, lastName, address, pictureLocation string, increment int, logger *zap.Logger) bool {
	//sp := &gocql.SimpleSpeculativeExecution{NumAttempts: 2, TimeoutDelay: 1 * time.Millisecond}
	query := session.Query("INSERT INTO mutant_data (first_name,last_name,address,picture_location) VALUES (?,?,?,?)", firstName, lastName, address, pictureLocation)
	//query.SetSpeculativeExecutionPolicy(sp)
	query.Idempotent(false)
	if err := query.Exec(); err != nil {
		logger.Error("insert mutant_data "+firstName+" "+lastName, zap.Error(err))
	}
}

Problem: Even with Idemotent false, which should block retry - the queries were still retried with exponential backoff policy. It also looks like it might be in general for all retries (speculative and timeout):
apache@92b4056

All retry policies should honor the idempotent flag.

@tarzanek
Copy link

tarzanek commented Nov 8, 2024

if we decide to merge upstream, resp. fix this
we should consider that there are users
who depend on this behaviour - e.g. they retry counter queries with retry policy - so we will need to get them a way to use old behaviour (I expect making sure you can force counter query to not be idempotent and then having above upstream fix will make it fly)

@pdbossman
Copy link
Author

if we decide to merge upstream, resp. fix this we should consider that there are users who depend on this behaviour - e.g. they retry counter queries with retry policy - so we will need to get them a way to use old behaviour (I expect making sure you can force counter query to not be idempotent and then having above upstream fix will make it fly)

@tarzanek the flag is settable by the user.
They can just add query.Idempotent(true) now to tell the driver to retry it.

@Michal-Leszczynski
Copy link

@pdbossman the scylla-go-driver is a repo for a gocql replacement project that was started, but never finished.
This issue should be ceated in the gocql repo.

@Michal-Leszczynski
Copy link

Unfortunately, I don't have permissions to transfer this issue there.
@Lorak-mmk do you have such permissions, and if so, could you do it?

@Lorak-mmk
Copy link

I don't really work with any Go drivers and have no permissions for those repositories.
@sylwiaszunejko / @dkropachev may be able to do this.

@sylwiaszunejko
Copy link
Collaborator

I don't have permissions to scylla-go-driver do I cannot transfer the issue

@mykaul mykaul transferred this issue from scylladb/scylla-go-driver Nov 13, 2024
@mykaul
Copy link

mykaul commented Nov 13, 2024

I don't have permissions to scylla-go-driver do I cannot transfer the issue

Done!

@dkropachev
Copy link
Collaborator

We need to let retry policy to decide, if it is retries in these circumstances.
We also need to remember not to break old behavior. Here are the steps I propse:

  1. Make writeCoalescer or Conn.exec to include a flag into error that signals that query could have been actually executed, despite error been recieved.
  2. Pick up this error in queryExecutor.do and if query is idempotent wrap it with user-facing error that would signal to retry policy and to user that this idempotent query was failed by could be actually executed.
  3. If it is needed we can implement RetryPolicyV2 that allows retry policy to properly decide if such queries to be retried or not.
  4. Regarding server errors, as of now, if server responds with any error, we assume that query could have been executed, but in future we could have had a list of exceptions for server errors, .
  5. Additionally we can solve Query retry logic is not triggered in certain cases #326 in the same PR

While we are doing that we need to remember that some customers rely on current behavior and we either should not break it and then we can release it as next patch version, if it is not possible then we need to have a clear message in the change log and go with minor release.

@dkropachev dkropachev removed the triage label Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants