Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auditing: construct metrics probe during service construction #24108

Conversation

michael-redpanda
Copy link
Contributor

Move the construction of the audit probe from service start to service construction. Previously, the audit probe was created as a unique pointer when the service was started and destroyed when the service stopped.

Since the audit service has the same lifespan as the application, creating the probe during service construction reduces the risk of dereferencing a null pointer if the probe is accessed before the service has started.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Bug Fixes

  • Construct audit metrics probe during service initialization to prevent null pointer access.

Move the construction of the audit probe from service start to service
construction. Previously, the audit probe was created as a unique pointer
when the service was started and destroyed when the service stopped.

Since the audit service has the same lifespan as the application, creating
the probe during service construction reduces the risk of dereferencing a
null pointer if the probe is accessed before the service has started.

Signed-off-by: Michael Boquard <[email protected]>
@michael-redpanda michael-redpanda requested a review from a team November 12, 2024 21:34
@michael-redpanda michael-redpanda self-assigned this Nov 12, 2024
@michael-redpanda michael-redpanda requested review from aanthony-rp and oleiman and removed request for a team November 12, 2024 21:34
@oleiman oleiman requested a review from BenPope November 12, 2024 22:43
Copy link
Member

@oleiman oleiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. Added Ben to the review because I'm never too sure about this stuff.

Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks safe, but I have concerns that the service is being used before it is started.

For example, it's not clear without deep inspection of the code that calling drain is safe before the sink has been started.

It's also not clear that client_shard_id has been constructed before events can be received on other shards.

I think it may make sense to set the drain timer in start(), and move stuff from start to the constructor where possible.

Overall, it would be better to fix this by starting the service before it is used..

@michael-redpanda
Copy link
Contributor Author

This looks safe, but I have concerns that the service is being used before it is started.

Yeah I was discussing exactly this with @oleiman yesterday. The problem is that I don't think we can kick off the sink until after the kafka API is available.

For example, it's not clear without deep inspection of the code that calling drain is safe before the sink has been started.

So the drain timer isn't armed until after the sink has been started so I believe this is safe.

It's also not clear that client_shard_id has been constructed before events can be received on other shards.

It should be constructed when the service is constructed, no?

I think it may make sense to set the drain timer in start(), and move stuff from start to the constructor where possible.

It is, in a somewhat round-about way based on the value of audit_enabled

Overall, it would be better to fix this by starting the service before it is used..

So again, I don't think we can start the sink before the kafka service is up. One idea I had was to start the service very early and then enable the sink via another method once the kafka API is up and running.

@BenPope
Copy link
Member

BenPope commented Nov 14, 2024

This looks safe, but I have concerns that the service is being used before it is started.

Yeah I was discussing exactly this with @oleiman yesterday. The problem is that I don't think we can kick off the sink until after the kafka API is available.

Agreed.

For example, it's not clear without deep inspection of the code that calling drain is safe before the sink has been started.

So the drain timer isn't armed until after the sink has been started so I believe this is safe.

OK, I see it's ultimately called from audit_log_manager::start().

It's also not clear that client_shard_id has been constructed before events can be received on other shards.

It should be constructed when the service is constructed, no?

I think I fumbled that comment. I was talking about audit_log_manager::_sink on shard client_id_shard, the concern was that the shards are created without an ordering. But if drain cannot be called before audit_log_manager::start has been called (despite receiving events), I think it's safe.

I think it may make sense to set the drain timer in start(), and move stuff from start to the constructor where possible.

It is, in a somewhat round-about way based on the value of audit_enabled

Yeah, after digging around, I see that.

Overall, it would be better to fix this by starting the service before it is used..

So again, I don't think we can start the sink before the kafka service is up. One idea I had was to start the service very early and then enable the sink via another method once the kafka API is up and running.

I get it now.

@michael-redpanda michael-redpanda merged commit c84cb6a into redpanda-data:dev Nov 14, 2024
17 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.1.x

Comment on lines +404 to +406
// Probe is mutable so it can be modified in const methods when they need to
// report auditing failures
mutable audit_probe _probe;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik the contract isn't "so i can modify it in const methods" it's that the public api maintains logical constness. since the probe is exposed (e.g. probe()) it's not clear that this is technically correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants