Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

moving backfilled oracle blocking start work to background thread #1201

Merged
merged 2 commits into from
Jul 24, 2024

Conversation

patrickhuie19
Copy link
Collaborator

Motivation

Old impl of backfilled oracle would run log poller in a non-blocking fashion on job creation. The new implementation has it blocking on job creation, which is impacting load testing.

Solution

Move back to the old background thread pattern.

Copy link
Contributor

@dimkouv dimkouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, just please make sure to check the mentioned concerns before merging.

@@ -175,7 +180,6 @@ func (r *ChainAgnosticBackFilledOracle) Start(ctx context.Context) error {
}
if err := ctx.Err(); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% related to this PR but we could do here:

	eg := new(errgroup.Group)

	// Replay in parallel if both requested.
	eg.Go(func() error {
		s := time.Now()
		if err := r.srcProvider.Start(ctx); err != nil {
			return err
		}
		r.lggr.Infow("finished replaying src chain", "time", time.Since(s))
		return nil
	})

	eg.Go(func() error {
		s := time.Now()
		if err := r.dstProvider.Start(ctx); err != nil {
			return err
		}
		r.lggr.Infow("finished replaying dst chain", "time", time.Since(s))
		return nil
	})

	if err := eg.Wait(); err != nil {
		r.lggr.Criticalw("unexpected error replaying, continuing plugin boot without all the logs backfilled", "err", err)
	}

@@ -141,7 +141,12 @@ type ChainAgnosticBackFilledOracle struct {
}

func (r *ChainAgnosticBackFilledOracle) Start(ctx context.Context) error {
ctx, cancelFn := context.WithCancel(context.Background())
go r.run(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure that the caller of Start() does not expect to have something ready.

e.g. The caller might expect that the blocks are replayed before moving on.

@@ -141,7 +141,12 @@ type ChainAgnosticBackFilledOracle struct {
}

func (r *ChainAgnosticBackFilledOracle) Start(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we 100% sure that we can keep the caller's ctx while the process is running in the background?
Maybe we want to do ctx, cancelFn := context.WithCancel(context.Background()), maybe caller's context is cancelled after Start responds.

@patrickhuie19 patrickhuie19 merged commit 5ef031c into ccip-develop Jul 24, 2024
102 checks passed
@patrickhuie19 patrickhuie19 deleted the fix/backfilled-oracle branch July 24, 2024 11:01
@cl-sonarqube-production
Copy link

mateusz-sekara added a commit that referenced this pull request Jul 24, 2024
## Solution

It adds the missing part to this PR
#1201

We can't pass the foreground routine context to the task run in the
background and could take minutes or hours to finish. Instead of that,
we create a background context and expose cancelFn. This is exactly how
it used to work before introducing loopp layer
asoliman92 pushed a commit that referenced this pull request Jul 31, 2024
)

## Motivation
[Old
impl](https://github.com/smartcontractkit/ccip/blob/7d2cae3660bac29ba9dd48b03d88a761c154d6ae/core/services/ocr2/plugins/ccip/internal/oraclelib/backfilled_oracle.go#L41)
of backfilled oracle would run log poller in a non-blocking fashion on
job creation. The new implementation has it blocking on job creation,
which is impacting load testing.

## Solution
Move back to the old background thread pattern.
asoliman92 pushed a commit that referenced this pull request Jul 31, 2024
## Solution

It adds the missing part to this PR
#1201

We can't pass the foreground routine context to the task run in the
background and could take minutes or hours to finish. Instead of that,
we create a background context and expose cancelFn. This is exactly how
it used to work before introducing loopp layer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants