NAT RPW #3804
Conversation
/// empty vec is returned.
#[endpoint {
    method = GET,
    path = "/rpw/nat/ipv4/changeset/{from_gen}"
Tiny nitpick -- I'd just prefix these paths with "/nat"; the fact that they're called from Dendrite is an implementation detail.
I think I still might remove the /rpw prefix; the fact that Nexus processes this as an RPW is an implementation detail.
* Add db table for tracking nat entries
* Add endpoint for retrieving changesets
* Update instance sagas to update table and trigger RPW
* Periodically clean up soft-deleted entries that no longer need to be sync'd by dendrite.

The other half of the RPW lives in Dendrite. It will periodically check for a changeset, or check for a changeset when the trigger endpoint is called by the relevant saga / nexus operation.
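For orientation, a minimal sketch of how such a changeset endpoint might look as a Dropshot handler. The ServerContext and Ipv4NatEntryView types and the datastore method name are assumptions for illustration, not the code under review:

use std::sync::Arc;

use dropshot::{endpoint, HttpError, HttpResponseOk, Path, RequestContext};
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Deserialize, JsonSchema)]
struct RpwNatPathParam {
    /// Generation number to start the changeset from.
    from_gen: i64,
}

/// Fetch the NAT entries added or removed since `from_gen`; when
/// nothing has changed, an empty vec is returned.
#[endpoint {
    method = GET,
    path = "/rpw/nat/ipv4/changeset/{from_gen}",
}]
async fn ipv4_nat_changeset(
    rqctx: RequestContext<Arc<ServerContext>>,
    path_params: Path<RpwNatPathParam>,
) -> Result<HttpResponseOk<Vec<Ipv4NatEntryView>>, HttpError> {
    let from_gen = path_params.into_inner().from_gen;
    // Hypothetical datastore call; the real method name may differ.
    let mut changeset =
        rqctx.context().datastore.ipv4_nat_changeset(from_gen).await?;
    // Return changes in generation order so callers can page through.
    changeset.sort_by_key(|e| e.gen);
    Ok(HttpResponseOk(changeset))
}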
My first pass here is focused a bit more on the DB than the RPW itself.
nexus/db-model/src/ipv4_nat_entry.rs (outdated)
// TODO correctness
// If we're not going to store ipv4 and ipv6
// NAT entries in the same table, and we don't
// need any of the special properties of the IpNetwork
// column type, does it make sense to use a different
// column type?
A couple of questions about this:
- I don't see anything in this struct that appears IPv4-specific -- doesn't ipnetwork::IpNetwork encapsulate both v4 and v6 addresses? Could this entire struct just be IpNatValues (and the same elsewhere in this file), or would that be a bad idea?
- If that genuinely is a bad idea, https://docs.rs/ipnetwork/0.20.0/ipnetwork/struct.Ipv4Network.html exists, and we could use that instead.
Either way, we should resolve this comment before merging!
By keeping the tables for v4 and v6 separate, we maintain a 1:1 mapping with the tables in dendrite, which does make troubleshooting, observability, and state reconciliation a bit more straightforward, so my gut says "keep them separate".
I'll try using Ipv4Network unless someone has an objection :)
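For illustration, a sketch of what the more restrictive typing might look like with ipnetwork::Ipv4Network; the struct shape here is assumed rather than taken from the PR:

use ipnetwork::Ipv4Network;

// With Ipv4Network instead of IpNetwork, an accidental IPv6 value is
// rejected at the type level rather than discovered at runtime.
pub struct Ipv4NatValues {
    pub external_address: Ipv4Network,
    pub sled_address: Ipv4Network,
    pub first_port: u16,
    pub last_port: u16,
    // ...vni, mac, and the remaining columns.
}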
nexus/db-model/src/ipv4_nat_entry.rs (outdated)
// TODO correctness
// If we're not going to store ipv4 and ipv6
// NAT entries in the same table, we should probably
// make the types more restrictive to prevent an
// accidental ipv6 entry from being created.
Same as the previous comment
// different target sled / vpc / mac, we will violate a uniqueness
// constraint (which is desired in this case).
//
// It seems that this isn't straightforward to express in diesel
- Diesel supports insert_into from SELECT statements: https://docs.diesel.rs/master/diesel/prelude/trait.Insertable.html#method.insert_into
- Diesel supports NOT: https://docs.diesel.rs/master/diesel/dsl/fn.not.html
- Diesel supports EXISTS: https://docs.diesel.rs/master/diesel/expression/exists/fn.exists.html
I know it's not trivial, but it's probably worth expressing this in Diesel -- if we added a necessary column here, this raw SQL code would still compile, but fail to actually insert a valid entry.
I do think you had the right idea -- get moving with sql_query -- but hopefully the following code lets us add a little more type-safety here.
I have not executed this code, but I have confirmed it compiles:
use db::schema::ipv4_nat_entry::dsl;
use diesel::sql_types;

// Look up any NAT entries that already have the exact parameters
// we're trying to INSERT.
let matching_entry_subquery = dsl::ipv4_nat_entry
    .filter(dsl::external_address.eq(nat_entry.external_address))
    .filter(dsl::first_port.eq(nat_entry.first_port))
    .filter(dsl::last_port.eq(nat_entry.last_port))
    .filter(dsl::sled_address.eq(nat_entry.sled_address))
    .filter(dsl::vni.eq(nat_entry.vni))
    .filter(dsl::mac.eq(nat_entry.mac))
    .select((
        dsl::external_address,
        dsl::first_port,
        dsl::last_port,
        dsl::sled_address,
        dsl::vni,
        dsl::mac,
    ));

// SELECT exactly the values we're trying to INSERT, but only
// if it does not already exist.
let new_entry_subquery = diesel::dsl::select((
    nat_entry.external_address.into_sql::<sql_types::Inet>(),
    nat_entry.first_port.into_sql::<sql_types::Int4>(),
    nat_entry.last_port.into_sql::<sql_types::Int4>(),
    nat_entry.sled_address.into_sql::<sql_types::Inet>(),
    nat_entry.vni.into_sql::<sql_types::Int4>(),
    nat_entry.mac.into_sql::<sql_types::BigInt>(),
))
.filter(diesel::dsl::not(diesel::dsl::exists(matching_entry_subquery)));

diesel::insert_into(dsl::ipv4_nat_entry)
    .values(new_entry_subquery)
    .into_columns((
        dsl::external_address,
        dsl::first_port,
        dsl::last_port,
        dsl::sled_address,
        dsl::vni,
        dsl::mac,
    ))
    .execute_async(&*self.pool_connection_authorized(opctx).await?)
    .await
    .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))?;
Will give this a try! Thanks for giving this a look!
" | ||
UPDATE omicron.public.ipv4_nat_entry | ||
SET | ||
version_removed = nextval('omicron.public.ipv4_nat_version'), | ||
time_deleted = now() | ||
WHERE time_deleted IS NULL AND version_removed IS NULL AND id = $1 AND version_added = $2 | ||
", | ||
) |
I think we definitely need custom SQL to handle nextval, but the rest of this could be vanilla diesel.
If we add the following function:
use diesel::sql_types::BigInt;

fn ipv4_nat_next_version() -> diesel::expression::SqlLiteral<BigInt> {
    diesel::dsl::sql::<BigInt>("nextval('omicron.public.ipv4_nat_version')")
}
Then this particular snippet of SQL could be replaced with the following:
use db::schema::ipv4_nat_entry::dsl;

let updated_rows = diesel::update(dsl::ipv4_nat_entry)
    .set((
        dsl::version_removed.eq(ipv4_nat_next_version().nullable()),
        dsl::time_deleted.eq(Utc::now()),
    ))
    .filter(dsl::time_deleted.is_null())
    .filter(dsl::version_removed.is_null())
    .filter(dsl::id.eq(nat_entry.id))
    .filter(dsl::version_added.eq(nat_entry.version_added))
    .execute_async(&*self.pool_connection_authorized(opctx).await?)
    .await
    .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))?;
nexus/db-model/src/ipv4_nat_entry.rs (outdated)
pub version_added: SqlU32,
pub version_removed: Option<SqlU32>,
Should we be storing these versions as BigInts, rather than SqlU32s?
If we overflow a u32 (very possible!) these values would start throwing deserialization errors.
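A sketch of the suggested widening; i64 maps to SQL BigInt in Diesel, and the rest of the struct is elided:

// Assumed excerpt: i64 versions won't hit deserialization errors when
// the sequence passes u32::MAX.
pub struct Ipv4NatEntry {
    // ...other columns as quoted above...
    pub version_added: i64,
    pub version_removed: Option<i64>,
}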
    changeset.sort_by_key(|e| e.gen);
    Ok(HttpResponseOk(changeset))
};
apictx.external_latencies.instrument_dropshot_handler(&rqctx, handler).await
Suggested change:
- apictx.external_latencies.instrument_dropshot_handler(&rqctx, handler).await
+ apictx.internal_latencies.instrument_dropshot_handler(&rqctx, handler).await
    let view = Ipv4NatGenView { gen };
    Ok(HttpResponseOk(view))
};
apictx.external_latencies.instrument_dropshot_handler(&rqctx, handler).await
Suggested change:
- apictx.external_latencies.instrument_dropshot_handler(&rqctx, handler).await
+ apictx.internal_latencies.instrument_dropshot_handler(&rqctx, handler).await
    Ok(())
}

pub async fn ipv4_nat_cancel_delete(
Could this be marked #[cfg(test)]? The act of cancelling a delete seems somewhat unsafe to me outside test contexts.
This was originally intended for usage in saga unwinds, but I think we've since decided that we shouldn't try to unwind deletion of network configuration and instead consider it the "point of no return"
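For illustration, the suggested gating would look something like this; the signature here is hypothetical:

// Compiling the helper only for tests keeps it unreachable from
// production saga code. Signature and body are assumed.
#[cfg(test)]
pub async fn ipv4_nat_cancel_delete(
    &self,
    opctx: &OpContext,
    nat_entry: &Ipv4NatEntry,
) -> Result<(), Error> {
    // ...clear version_removed / time_deleted on the matching row...
    Ok(())
}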
struct RpwNatPathParam {
    /// which change number to start generating
    /// the change set from
    from_gen: u32,
Same as the other comment about u32 vs u64 representation of generation numbers.
Looks like this still needs updating
// TODO: Remove (maybe)
// This endpoint may be more trouble than it's worth. The reason this endpoint
// may be a headache is because the underlying SEQUENCE in the DB will advance,
// even on failed transactions. This is not a big deal, but it will lead to
// callers trying to catch up to entries that don't exist if they check this
// endpoint first. As an alternative, we could just page through the
// changesets until we get nothing back, then revert to periodic polling
// of the changeset endpoint.
Where'd we land on this?
I haven't seen a place where it has merit, so I'll remove it.
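For concreteness, a sketch of the page-until-empty alternative on the Dendrite side; the client and helper names are illustrative only:

// Page through changesets until an empty batch comes back, then fall
// back to periodic polling of the changeset endpoint.
let mut gen = last_seen_gen;
loop {
    // `client` is assumed to be the Nexus internal API client.
    let batch = client.ipv4_nat_changeset(gen).await?;
    if batch.is_empty() {
        break; // fully caught up
    }
    for entry in &batch {
        apply_nat_entry(entry)?; // hypothetical table-programming helper
        gen = gen.max(entry.gen);
    }
}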
async {
    let log = &opctx.log;

    let result = self.datastore.ipv4_nat_current_version(opctx).await;
To verify my understanding:
- This task periodically reads the latest generation number from the DB
- It sends that generation number to Dendrite instances (client.ipv4_nat_generation())
- It cleans old records

So the "dendrite-playing-catch-up" entirely relies on Dendrite reaching back to the internal API within Nexus, correct?
That's pretty close to how it works:
- Omicron periodically reads the latest generation in the DB
- Omicron checks to see what generation numbers the various dendrite instances have (we are querying dendrite, not triggering any work in dendrite)
- Omicron only cleans up records older than the earliest found dendrite generation

The idea is that we don't clean up changes that a dendrite instance hasn't picked up yet. We also shouldn't have a detrimental TOCTOU race, because the only way a dendrite instance moves backwards is if it crashes, and in that case it's okay that we're cleaning up old entries because we don't want that dendrite instance to re-apply them anyway.

Dendrite only catches up when:
- An event in the control plane calls the trigger endpoint on the dendrite API (a "doorbell" of sorts)
- Dendrite's internal RPW interval timer expires

In these events, yes, Dendrite will reach back via the internal API to catch up on the changes that have occurred.
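A minimal sketch of the cleanup rule described above; client.ipv4_nat_generation() appears in the comment, while the datastore method name is an assumption:

// Find the earliest generation that every dendrite instance has
// applied...
let mut min_gen = i64::MAX;
for client in &dendrite_clients {
    // Query (not trigger) each dendrite for its current generation.
    let gen = client.ipv4_nat_generation().await?;
    min_gen = min_gen.min(gen);
}
// ...and only garbage-collect soft-deleted entries at or below it, so
// no switch loses changes it hasn't picked up yet. `ipv4_nat_cleanup`
// is a hypothetical datastore method.
datastore.ipv4_nat_cleanup(opctx, min_gen).await?;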
When you get a chance, could you merge in main so this can be tested in the a4x2 testbed topology?
Note that since we are changing some of the types used by
Last few comments are mostly nitpicks, but this looks good! Thanks for the query cleanup!
struct RpwNatPathParam {
    /// which change number to start generating
    /// the change set from
    from_gen: u32,
Looks like this still needs updating
/// empty vec is returned.
#[endpoint {
    method = GET,
    path = "/rpw/nat/ipv4/changeset/{from_gen}"
I think I still might remove the /rpw prefix; the fact that Nexus processes this as an RPW is an implementation detail.
    .await?;
}
IpNetwork::V6(_v6net) => {
    // TODO: implement handling of v6 nat.
If this is unimplemented, we should throw an error rather than silently doing nothing
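For example, the V6 arm could fail loudly along these lines (assuming omicron's usual Error::internal_error constructor):

IpNetwork::V6(_v6net) => {
    // Surface the gap instead of silently skipping v6 configuration.
    return Err(Error::internal_error(
        "IPv6 NAT entries are not yet implemented",
    ));
}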
nexus/db-model/src/ipv4_nat_entry.rs (outdated)
    }
}

/// Database representation of an Ipv4 NAT Generation.
Hey, it took me a while to figure this out -- this is a view of a sequence, not a table, and the log_cnt + is_called values are synthetically provided by CockroachDB.

That's worth documenting! I do not find that behavior obvious - I was trying to find out where these were used.

Related: Do we actually need to include log_cnt and is_called here? When we query for the Ipv4NatGen, we only read last_value, right?
Yes, thus far we only care about last_value; I can delete the other two columns from the struct.
Actually, it turns out we're not using this struct anymore, thanks to some of the other changes that have been made :D
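For reference, if only last_value is ever needed, a minimal sketch of reading it without modeling the full sequence row (assuming async-bb8-diesel, as in the rest of the PR):

use diesel::sql_types::BigInt;

#[derive(diesel::QueryableByName)]
struct SequenceValue {
    #[diesel(sql_type = BigInt)]
    last_value: i64,
}

// CockroachDB synthesizes log_cnt / is_called when selecting from a
// sequence, but we never read them, so the struct omits them.
let current: SequenceValue = diesel::sql_query(
    "SELECT last_value FROM omicron.public.ipv4_nat_version",
)
.get_result_async(&*self.pool_connection_authorized(opctx).await?)
.await
.map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))?;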
@@ -2899,14 +2847,67 @@ CREATE TABLE IF NOT EXISTS omicron.public.db_metadata (
    CHECK (singleton = true)
);

CREATE SEQUENCE IF NOT EXISTS omicron.public.ipv4_nat_version START 1 INCREMENT 1;
This is a super minor nitpick, but we did some work to ensure that the order of declarations in dbinit.sql should mostly not matter.
Can we move this stuff up earlier? I find it a little odd to have the ordering of dbinit be:
1. Most of the structures we use
2. The db_metadata table declaration
3. The tables that were added for NAT
4. INSERT into db_metadata

(I kinda think db_metadata's declaration + INSERT should be right next to each other, at the end of the file, if possible)
One minor ordering comment aside, this looks good to merge to me!
schema/crdb/10.0.0/up1.sql (outdated)
@@ -0,0 +1 @@
CREATE SEQUENCE IF NOT EXISTS omicron.public.ipv4_nat_version START 1 INCREMENT 1;
Before you merge this -- looks like 10.0.0 is already used on main? https://github.com/oxidecomputer/omicron/tree/main/schema/crdb/10.0.0
Cool, I'll need to bump dendrite before merging anyway, so it will give me one more chance to get my version ahead of main 😂
Awaiting merge of https://github.com/oxidecomputer/dendrite/pull/627
Dendrite PR merged, bumping Omicron dependency version and Schema version
This was accidentally merged in with 11.0.0 in oxidecomputer#3804. Fixes oxidecomputer#4530 (but see oxidecomputer#4531 for followup work).