Remove nexus-test-utils build.rs #4056
This file was deleted.
@@ -4,29 +4,76 @@

 //! Database testing facilities.

+use camino::Utf8PathBuf;
 use omicron_test_utils::dev;
 use slog::Logger;
-use std::path::PathBuf;

-/// Path to the "seed" CockroachDB directory.
-///
-/// Populating CockroachDB unfortunately isn't free - creation of
-/// tables, indices, and users takes several seconds to complete.
-///
-/// By creating a "seed" version of the database, we can cut down
-/// on the time spent performing this operation. Instead, we opt
-/// to copy the database from this seed location.
-fn seed_dir() -> PathBuf {
-    PathBuf::from(concat!(env!("OUT_DIR"), "/crdb-base"))
+// Creates a string identifier for the current DB schema and version.
+//
+// The goal here is to allow to create different "seed" directories
+// for each revision of the DB.
+fn digest_unique_to_schema() -> String {
+    let schema = include_str!("../../../schema/crdb/dbinit.sql");
+    let crdb_version = include_str!("../../../tools/cockroachdb_version");
+    let mut ctx = ring::digest::Context::new(&ring::digest::SHA256);
+    ctx.update(&schema.as_bytes());
+    ctx.update(&crdb_version.as_bytes());
+    let digest = ctx.finish();
+    hex::encode(digest.as_ref())
 }

+// Seed directories will be created within:
+//
+// - /tmp/crdb-base/<digest unique to schema>
+//
+// However, the process for creating these seed directories is not atomic.
+// We create a temporary directory within:
+//
+// - /tmp/crdb-base/...
+//
+// And rename it to the final "digest" location once it has been fully created.
Comment on lines +29 to +34
Thinking about this a bit more (and assuming it's hard to serialize the creation of this directory ahead of test invocations, which I assumed, but as I said, don't have a lot of experience with nextest yet and am less sure of what features it has), one way to make things more explicit could be to use a file as a lock to take ownership of setting up the directory. So the winning test path would:
To protect against races, I think an approach that might work is:
What do you think?

I thought about this approach, but it falls over if any test crashes (or is
I do understand that the
Also, ultimately, if some process failed partway through creation, and we needed to hand-off creation of the "seed directory" to a different test process, we'd need additional logic to distinguish between "this directory exists because I should use it" vs "this directory exists and I should try to salvage it, or destroy it and start over".
I thought the approach of "do an atomic rename" seemed simpler -- the directory can only (atomically) exist after it has been successfully created, and if any test failed during setup, their output would be isolated.
Besides, the work of "spin up a CRDB node and populate it" was the default behavior for all tests using the DB before I tried adding this "seed" optimization. At worst, it's only a small performance hit for the first couple tests that are run (and if the schema changes, that seed directory still exists, so it can be re-used for subsequent test runs).

I'd have to second @smklein's concern about using file locks anywhere for tests.

First, a few things about file locking:
I think this logic is relatively simple: put a marker file in the directory, or write marker data into the lock file, once the setup is complete. If the marker is not present and correct when you get to lock the file, remove the database files and start again. Otherwise, release the lock and use the database seed.
You could also use a combination of the atomic directory rename you're talking about, and the lock file: lock the lock file, and:
Either way, once you drop the lock, there will be a correctly provisioned seed in database.
Here is a simple example of using flock(3C) through the fs4 crate: https://github.com/jclulow/junk-locker

If it's okay with y'all, I'd like to punt on this. Not necessarily saying we shouldn't do it, but maybe not within this PR?

If we're going to toss it out anyway, could we just wait until we've properly assessed and prioritized whether we should do the nextest feature?

Another option I've heard people use is a
The cleanest solution would be the upstream feature. If that were coming soon, then that would be ideal. If not though, basically this is a game of "which way of emulating this feature sucks the least." Build scripts have their own annoying downsides.
But I also agree with @smklein that this discussion feels like premature optimization. Is that actually the case? Does this patch slow down the test suite, and if so, by how much?

FWIW @steveklabnik I was more worried about intentionally racing the tests against each other. Maybe that's not a fair concern. I also agree that the upstream feature is cleanest, and hence was curious about that.
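
For reference, the marker-file check proposed in this thread could look roughly like the sketch below. It is a minimal, std-only illustration and not code from this PR: the marker name `.seed-complete`, the helpers `seed_is_complete` and `prepare_seed`, and the `populate` callback are all hypothetical. Acquiring the lock itself (for example through the fs4 crate linked above) is deliberately left out, so this only covers the "check the marker, otherwise wipe and rebuild" step that would run while the lock is held.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical marker file name; not taken from the PR.
const MARKER: &str = ".seed-complete";

/// Returns true if `seed_dir` contains the completion marker.
fn seed_is_complete(seed_dir: &Path) -> bool {
    seed_dir.join(MARKER).is_file()
}

/// Ensures `seed_dir` is fully provisioned.
///
/// Assumes the caller already holds whatever lock serializes seed
/// creation; that part is not shown here.
fn prepare_seed<F>(seed_dir: &Path, populate: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    if seed_is_complete(seed_dir) {
        return Ok(());
    }
    // A missing marker means an earlier attempt never finished, so
    // discard whatever it left behind and start over.
    match fs::remove_dir_all(seed_dir) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    fs::create_dir_all(seed_dir)?;
    populate(seed_dir)?;
    // Write the marker last, so it only exists after setup succeeded.
    fs::write(seed_dir.join(MARKER), b"ok")?;
    Ok(())
}
```

A caller would take the lock, call `prepare_seed(&dir, |d| ...)` with whatever populates the database files, and then release the lock. Without the lock, two processes could wipe each other's work, which is the trade-off against the rename-based approach debated above.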
+async fn ensure_seed_directory_exists(log: &Logger) -> Utf8PathBuf {
+    let base_seed_dir = Utf8PathBuf::from_path_buf(std::env::temp_dir())
+        .expect("Not a UTF-8 path")
+        .join("crdb-base");
+    std::fs::create_dir_all(&base_seed_dir).unwrap();
+    let desired_seed_dir = base_seed_dir.join(digest_unique_to_schema());
+
+    // If the directory didn't exist when we started, try to create it.
+    //
+    // Note that this may be executed concurrently by many tests, so
+    // we should consider it possible for another caller to create this
+    // seed directory before we finish setting it up ourselves.
+    if !desired_seed_dir.exists() {
+        let tmp_seed_dir =
+            camino_tempfile::Utf8TempDir::new_in(base_seed_dir).unwrap();
+        dev::test_setup_database_seed(log, tmp_seed_dir.path()).await;
+
+        // If we can successfully perform the rename, we made the seed directory
+        // faster than other tests.
+        //
+        // If we couldn't perform the rename, the directory might already exist.
+        // Check that this is the error we encountered -- otherwise, we're
+        // struggling.
+        if let Err(err) =
+            std::fs::rename(tmp_seed_dir.path(), &desired_seed_dir)
+        {
+            if !desired_seed_dir.exists() {
+                panic!("Cannot rename seed directory for CockroachDB: {err}");
+            }
+        }
+    }
+
+    desired_seed_dir
+}

 /// Wrapper around [`dev::test_setup_database`] which uses a a
-/// seed directory provided at build-time.
+/// seed directory that we construct if it does not already exist.
 pub async fn test_setup_database(log: &Logger) -> dev::db::CockroachInstance {
-    let dir = seed_dir();
+    let dir = ensure_seed_directory_exists(log).await;
     dev::test_setup_database(
         log,
-        dev::StorageSource::CopyFromSeed { input_dir: dir },
+        dev::StorageSource::CopyFromSeed { input_dir: dir.into() },
     )
     .await
 }
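
The "populate a scratch directory, then rename it into place" pattern used by `ensure_seed_directory_exists` can be tried in isolation with the std-only sketch below. It is an illustration under stated assumptions rather than the PR's code: `publish_seed` and its `populate` callback are hypothetical names, and the scratch directory is derived from the process ID instead of `camino_tempfile`, so it is only unique per process, not per thread.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds a seed in a private scratch directory under `base`, then renames
/// it to `base/name`. `populate` is a hypothetical stand-in for the
/// expensive database setup.
fn publish_seed<F>(base: &Path, name: &str, populate: F) -> io::Result<PathBuf>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let target = base.join(name);
    if target.exists() {
        // Someone already published a complete seed; reuse it.
        return Ok(target);
    }

    // The scratch directory lives next to the target so that the rename
    // stays on one filesystem. Simplification: unique per process only.
    let scratch = base.join(format!(".tmp-{}-{}", name, std::process::id()));
    fs::create_dir_all(&scratch)?;
    populate(&scratch)?;

    match fs::rename(&scratch, &target) {
        // We won the race: the fully built seed is now visible.
        Ok(()) => Ok(target),
        // The rename failed but the target exists, so another caller
        // published first. Discard our copy and use theirs.
        Err(_) if target.exists() => {
            let _ = fs::remove_dir_all(&scratch);
            Ok(target)
        }
        // Any other failure is a real error.
        Err(e) => Err(e),
    }
}
```

Because the target path only appears through the rename, a reader never sees a half-populated seed. The cost is that concurrent callers may each do the expensive setup once before one of them wins.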
This data used to be placed in:
(Which is an environment variable cargo sets for build scripts)
Well, this isn't run by a build script any longer, so now it's going in $TEST_TMPDIR, which is why I'm removing it explicitly now.