Refactor file structure of PROBE source #77

Open
wants to merge 6 commits into
base: main
6 changes: 5 additions & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -43,4 +43,8 @@ jobs:

- run: nix flake check --all-systems --print-build-logs

- run: nix build --print-build-logs .#probe-bundled .#probe-py
- run: nix build --print-build-logs .#probe-bundled

# The devshell uses a slightly different build process than the Nix package,
# so we might as well test that too.
- run: nix develop --command just compile fix check test-native
3 changes: 2 additions & 1 deletion .gitignore
@@ -9,6 +9,7 @@
**/.nextflow
**/work
**/.pytest_cache
**/__pycache__/

# build directories
**/target
@@ -21,4 +22,4 @@
**/desktop.ini

probe_log
dataflow_graph.pkl
.dmypy.json
55 changes: 36 additions & 19 deletions Justfile
@@ -1,34 +1,51 @@
fix-format-nix:
fix-nix:
alejandra .

fix-ruff:
#ruff format probe_src # TODO: uncomment
ruff check --fix probe_src
fix-py: compile-cli
# fix-py depends on compile-cli for the autogen python code
#ruff format probe_py/ tests/ libprobe/generator/ # TODO: uncomment
ruff check --fix probe_py/ tests/ libprobe/generator/

fix-format-rust:
env --chdir probe_src/frontend cargo fmt
fix-cli:
# cargo clippy refuses to run if there are unstaged changes (fixes may be destructive),
# so we git add -A first
env --chdir cli-wrapper git add -A
env --chdir cli-wrapper cargo clippy --fix --allow-staged -- --deny warnings
env --chdir cli-wrapper cargo fmt

fix-clippy:
git add -A
env --chdir probe_src/frontend cargo clippy --fix --allow-staged
fix: fix-nix fix-py fix-cli

check-mypy:
mypy --strict probe_src/libprobe
mypy --strict --package probe_py.generated
mypy --strict --package probe_py.manual
check-py: compile-cli
# dmypy == daemon mypy; much faster.
dmypy run -- --strict --no-namespace-packages --pretty probe_py/ tests/ libprobe/generator/

check-cli:
env --chdir cli-wrapper cargo doc --workspace

check: check-py check-cli

compile-lib:
make --directory=probe_src/libprobe all
make --directory=libprobe all

compile-cli:
env --chdir=probe_src/frontend cargo build --release
env --chdir=cli-wrapper cargo build --release
env --chdir=cli-wrapper cargo build

compile-tests:
make --directory=probe_src/tests/c all
make --directory=tests/examples all

compile: compile-lib compile-cli compile-tests

test-dev: compile
pytest probe_src --failed-first --maxfail=1
test-nix:
nix build .#probe-bundled
nix flake check --all-systems

test-native: compile
python -m pytest tests/ -ra --failed-first --maxfail=1 -v

test: test-native
# Unless the user explicitly asks (`just test-nix`),
# we don't really need to run test-nix.
# It runs the same checks as `just test` and `just check`, but in Nix.

pre-commit: fix-format-nix fix-ruff fix-format-rust fix-clippy compile check-mypy test-dev
pre-commit: fix check compile test
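
The new recipe graph is easier to follow with the dependencies expanded. The sketch below is an illustration only and not part of this PR. It copies the dependency lists from the new Justfile into a Python dict and walks them depth-first, assuming that `just` runs dependencies before the recipe that names them and runs each recipe at most once per invocation. The `DEPS` table and the `run_order` helper are hypothetical names.

```python
# Dependencies copied from the new Justfile in this PR. Recipes without
# dependencies (fix-nix, fix-cli, check-cli, ...) are simply absent as keys.
DEPS: dict[str, list[str]] = {
    "pre-commit": ["fix", "check", "compile", "test"],
    "fix": ["fix-nix", "fix-py", "fix-cli"],
    "fix-py": ["compile-cli"],
    "check": ["check-py", "check-cli"],
    "check-py": ["compile-cli"],
    "compile": ["compile-lib", "compile-cli", "compile-tests"],
    "test": ["test-native"],
    "test-native": ["compile"],
}


def run_order(target: str, deps: dict[str, list[str]] = DEPS) -> list[str]:
    """Return recipes in the order a depth-first, run-once walk visits them."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(recipe: str) -> None:
        if recipe in seen:
            return  # already scheduled; assumed to run only once
        seen.add(recipe)
        for dep in deps.get(recipe, []):
            visit(dep)
        order.append(recipe)  # a recipe runs after all of its dependencies

    visit(target)
    return order


if __name__ == "__main__":
    print(" -> ".join(run_order("pre-commit")))
```

Under these assumptions `compile-cli` is scheduled once and early, because `fix-py` needs the generated Python code, and the later `compile` step only adds `compile-lib` and `compile-tests`.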
25 changes: 0 additions & 25 deletions Makefile

This file was deleted.

52 changes: 25 additions & 27 deletions README.md
@@ -109,30 +109,28 @@ probe export --help

7. **Before submitting a PR**, run `just pre-commit` which will run pre-commit checks.

## Research reading list

- [_Provenance for Computational Tasks: A Survey_ by Freire, et al. in CiSE 2008](https://sci.utah.edu/~csilva/papers/cise2008a.pdf) for an overview of provenance in general.

- [_Transparent Result Caching_ by Vahdat and Anderson in USENIX ATC 1998](https://www.usenix.org/legacy/publications/library/proceedings/usenix98/full_papers/vahdat/vahdat.pdf) for an early system-level provenance tracer in Solaris using the `/proc` fs. Linux's `/proc` fs doesn't have the same functionality. However, this paper discusses two interesting application of provenance: unmake (query lineage information) and transparent Make (more generally, incremental computation).

- [_CDE: Using System Call Interposition to Automatically Create Portable Software Packages_ by Guo and Engler in USENIX ATC 2011](https://www.usenix.org/legacy/events/atc11/tech/final_files/GuoEngler.pdf) for an early system-level provenance tracer. Their only application is software execution replay, but replay is quite an important application.

- [_Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?_ by Thain, Meng, and Ivie in 2015 ](https://curate.nd.edu/articles/journal_contribution/Techniques_for_Preserving_Scientific_Software_Executions_Preserve_the_Mess_or_Encourage_Cleanliness_/24824439?file=43664937) discusses whether enabling automatic-replay is actually a good idea. A cursory glance makes PROBE seem more like "preserving the mess", but I think, with some care in the design choices, it actually can be more like "encouraging cleanliness", for example, by having heuristics that help cull/simplify provenance and generating human readable/editable package-manager recipes.

- [_SoK: History is a Vast Early Warning System: Auditing the Provenance of System Intrusions_ by Inam et al. in IEEE Symposium on Security and Privacy 2023](https://adambates.org/documents/Inam_Oakland23.pdf) see specifically Inam's survey of different possibilities for the "Capture layer", "Reduction layer", and "Infrastructure layer". Although provenance-for-security has different constraints than provenance for other purposes, the taxonomy that Inam lays out is still useful. PROBE operates by intercepting libc calls, which is essentially a "middleware" in Table I (platform modification, no program modification, no config change, incomplete mediation, not tamperproof, inter-process tracing, etc.).

- [_System-Level Provenance Tracers_ by me et al. in ACM REP 2023](./docs/acm-rep-pres.pdf) for a motivation of this work. It surveys prior work, identifies potential gaps, and explains why I think library interposition is a promising path for future research.

- [_Computational Experiment Comprehension using Provenance Summarization_ by Bufford et al. in ACM REP 2023](https://dl.acm.org/doi/pdf/10.1145/3641525.3663617) discusses how to implement an interface for querying provenance information. They compare classical graph-based visualization with an interactive LLM in a user-study.

## Prior art

- [RR-debugger](https://github.com/rr-debugger/rr) which is much slower, but features more complete capturing, lets you replay but doesn't let you do any other analysis.

- [Sciunits](https://github.com/depaul-dice/sciunit) which is much slower, more likely to crash, has less complete capturing, lets you replay but doesn't let you do other analysis.

- [Reprozip](https://www.reprozip.org/) which is much slower and has less complete capturing.

- [CARE](https://proot-me.github.io/care/) which is much slower, has less complete capturing, and lets you do containerized replay but not unprivileged native replay and not other analysis.

- [FSAtrace](https://github.com/jacereda/fsatrace) which is more likely to crash, has less complete capturing, and doesn't have replay or other analyses.
## Directory structure

- `libprobe`: Library that implements interposition (C, Make, Python; a mix of hand-written and generated code).
- `libprobe/include`: Headers that will be used by the Rust wrapper to read PROBE data.
- `libprobe/src`: Main C sources of `libprobe`.
- `libprobe/generator`: Python and C-template code-generator.
- `libprobe/generated`: (Generated, not committed to Git) output of code-generation.
- `libprobe/Makefile`: Makefile that builds all of `libprobe`; run `just compile-lib` to invoke.
- `cli-wrapper`: (Cargo workspace) code that wraps libprobe.
- `cli-wrapper/cli`: (Cargo crate) main CLI.
- `cli-wrapper/lib`: (Cargo crate) supporting library functions.
- `cli-wrapper/macros`: (Cargo crate) supporting macros; they use structs from `libprobe/include` to create Rust structs and Python dataclasses.
- `cli-wrapper/frontend.nix`: Nix code that builds the Cargo workspace; gets included in `flake.nix`.
- `probe_py`: Python code that implements analysis of PROBE data (a mix of hand-written and generated code); `nix develop` should add it to `$PYTHONPATH`.
- `probe_py/probe_py`: Main package to be imported or run.
- `probe_py/pyproject.toml`: Definition of main package and dependencies.
- `probe_py/tests`: Python unit tests, i.e., `from probe_py import foobar; test_foobar()`; run `just test-py`.
- `probe_py/mypy_stubs`: "Stub" files that tell Mypy how to check untyped library code. Should be added to `$MYPYPATH` by `nix develop`.
- `tests`: End-to-end opaque-box tests. They will be run with Pytest, but they will not test Python directly; they should always `subprocess.run(["probe", ...])`. Additionally, some tests have to be manually invoked.
- `docs`: Documentation and papers.
- `benchmark`: Programs and infrastructure for benchmarking.
- `benchmark/REPRODUCING.md`: Read this first!
- `flake.nix`: Nix code that defines packages and the devshell.
- `setup_devshell.sh`: Helps instantiate Nix devshell.
- `Justfile`: "Shortcuts" for defining and running common commands (e.g., `just --list`).
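
The `tests` entry above asks for opaque-box tests that only talk to the CLI through `subprocess.run(["probe", ...])`. A minimal sketch of what such a test might look like follows. The helper names (`probe_cmd`, `run_probe`, `test_export_help`) are hypothetical, and the sketch assumes that `probe` is on `$PATH` and that `probe export --help` exits with status 0.

```python
import shutil
import subprocess
from typing import Optional


def probe_cmd(*args: str) -> list[str]:
    """Build the argv for the `probe` executable (assumed to be on $PATH)."""
    return ["probe", *args]


def run_probe(*args: str) -> Optional[subprocess.CompletedProcess]:
    """Run `probe` as a child process; return None if it is not installed."""
    if shutil.which("probe") is None:
        return None
    return subprocess.run(
        probe_cmd(*args), capture_output=True, text=True, check=False
    )


def test_export_help() -> None:
    # Assumption: `probe export --help` exits with status 0.
    result = run_probe("export", "--help")
    if result is None:
        return  # no binary available; a real suite might skip the test instead
    assert result.returncode == 0, result.stderr
```

Because the test never imports `probe_py`, it exercises the same path a user would, including the Rust wrapper that dispatches to the Python package.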
File renamed without changes.
4 changes: 2 additions & 2 deletions probe_src/frontend/Cargo.lock → cli-wrapper/Cargo.lock


4 changes: 2 additions & 2 deletions probe_src/frontend/Cargo.toml → cli-wrapper/Cargo.toml
@@ -1,8 +1,8 @@
[workspace]
resolver = "2"
members = [
members = [
"cli",
"lib",
"lib",
"macros",
]

File renamed without changes.
File renamed without changes.
@@ -19,7 +19,7 @@ exec = "0.3.1"
flate2 = "1.0.30"
libc = "0.2.155"
log = "0.4.21"
probe_frontend = { path = "../lib" }
probe_lib = { path = "../lib" }
rand = "0.8.5"
serde = "1.0.203"
serde_json = "1.0.118"
@@ -7,7 +7,7 @@ use std::{

use chrono::{DateTime, SecondsFormat};
use color_eyre::eyre::{eyre, Result, WrapErr};
use probe_frontend::ops;
use probe_lib::ops;
use serde::{Deserialize, Serialize};

/// Print the ops from a probe log out for humans.
@@ -10,7 +10,7 @@ mod dump;
/// Run commands under provenance and generate probe record directory.
mod record;

/// Wrapper over [`probe_frontend::transcribe`].
/// Wrapper over [`probe_lib::transcribe`].
mod transcribe;

/// Utility code for creating temporary directories.
@@ -137,7 +137,7 @@ fn main() -> Result<()> {

let exit = std::process::Command::new("python3")
.arg("-m")
.arg("probe_py.manual.cli")
.arg("probe_py.cli")
.arg(subcommand)
.args(&args)
.spawn()
File renamed without changes.
@@ -10,7 +10,7 @@ pub fn transcribe<P: AsRef<Path>, T: Write>(
) -> Result<()> {
let log_dir = Dir::temp(true).wrap_err("Failed to create temp directory for transcription")?;

probe_frontend::transcribe::parse_top_level(record_dir, &log_dir)
probe_lib::transcribe::parse_top_level(record_dir, &log_dir)
.wrap_err("Failed to transcribe record directory")?;

tar.append_dir_all(".", &log_dir)
File renamed without changes.
File renamed without changes.
136 changes: 136 additions & 0 deletions cli-wrapper/frontend.nix
@@ -0,0 +1,136 @@
{
pkgs,
craneLib,
rust-target,
advisory-db,
system,
python,
lib,
}: rec {
# See https://crane.dev/examples/quick-start-workspace.html

src = craneLib.cleanCargoSource ./.;

# Common arguments can be set here to avoid repeating them later
commonArgs = {
inherit src;
strictDeps = true;

# All the crates in this workspace either use rust-bindgen or depend
# on a local crate that does.
nativeBuildInputs = [
pkgs.rustPlatform.bindgenHook
];

CARGO_BUILD_TARGET = rust-target;
CARGO_BUILD_RUSTFLAGS = "-C target-feature=+crt-static";
CPATH = ../libprobe/include;

# pygen needs to know where to write the python file
preConfigurePhases = [
"pygenConfigPhase"
];
pygenConfigPhase = ''
export PYGEN_OUTFILE="$out/resources/ops.py"
mkdir --parents "$(dirname "$PYGEN_OUTFILE")"
echo "Sending python code to $PYGEN_OUTFILE"
'';
};

# Build *just* the cargo dependencies (of the entire workspace),
# so we can reuse all of that work (e.g. via cachix) when running in CI
# It is *highly* recommended to use something like cargo-hakari to avoid
# cache misses when building individual top-level-crates
cargoArtifacts = craneLib.buildDepsOnly commonArgs;

individualCrateArgs =
commonArgs
// {
inherit cargoArtifacts;
inherit (craneLib.crateNameFromCargoToml {inherit src;}) version;
# disable tests since we'll run them all via cargo-nextest
doCheck = false;
};

fileSetForCrate = crates:
lib.fileset.toSource {
root = ./.;
fileset = lib.fileset.unions ([
./Cargo.toml
./Cargo.lock
]
++ (builtins.map craneLib.fileset.commonCargoSources crates));
};

packages = rec {
inherit cargoArtifacts;

# Prior to this version, the old code had one derivation per crate (probe-cli, probe-lib, and probe-macros).
# What could go wrong?
# Since the old version used `src = ./.`, it would rebuild all three if any one changed.

# craneLib's workspace example [1] says to use `src = fileSetForCrate ./path/to/crate`.
# However, when I tried doing that, it would say "failed to load manifest for workspace member lib" because "failed to read macros/Cargo.toml".
# Because `lib/Cargo.toml` has a dependency on `{path = "../macros"}`,
# I think the source code of both crates has to be present at build-time of lib.
# Which means no source filtering is possible.
# Indeed the exposed packages in craneLib's example (my-cli and my-server) [1] do not depend on each other.
# They depend on my-common, which is *not* filtered out (*is* included) in the `src` for those crates.
# If it's possible to simultaneously:
# - expose two Cargo crates A and B
# - where A depends on B
# - when A changes only A needs to be rebuilt
# then I don't know how to do it.
# Therefore, I will only offer one crate as a Nix package.
#
# https://crane.dev/examples/quick-start-workspace.html

probe-cli = craneLib.buildPackage (individualCrateArgs
// {
pname = "probe-cli";
cargoExtraArgs = "-p probe_cli";
src = fileSetForCrate [
./cli
./lib
./macros
];
});
};
checks = {
probe-workspace-clippy = craneLib.cargoClippy (commonArgs
// {
inherit (packages) cargoArtifacts;
cargoClippyExtraArgs = "--all-targets -- --deny warnings";
});

probe-workspace-doc = craneLib.cargoDoc (commonArgs
// {
inherit (packages) cargoArtifacts;
});

# Check formatting
probe-workspace-fmt = craneLib.cargoFmt {
inherit src;
};

# Audit dependencies
probe-workspace-audit = craneLib.cargoAudit {
inherit src advisory-db;
};

# Audit licenses
probe-workspace-deny = craneLib.cargoDeny {
inherit src;
};

# Run tests with cargo-nextest
# this is why `doCheck = false` on the crate derivations, so as to not
# run the tests twice.
probe-workspace-nextest = craneLib.cargoNextest (commonArgs
// {
inherit (packages) cargoArtifacts;
partitions = 1;
partitionType = "count";
});
};
}
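
The long comment in `packages` amounts to a transitive-closure argument: a crate's `src` has to contain every crate it reaches through `path` dependencies. The Python sketch below illustrates that reasoning and is not part of the PR. The `PATH_DEPS` table and `required_sources` helper are hypothetical. The `cli` to `lib` edge comes from the `probe_lib = { path = "../lib" }` line in the Cargo.toml hunk, the `lib` to `macros` edge comes from the Nix comment, and `macros` having no path dependencies is an assumption.

```python
# Path dependencies between the workspace crates (see the lead-in for which
# edges come from the diff and which are assumed).
PATH_DEPS: dict[str, list[str]] = {
    "cli": ["lib"],
    "lib": ["macros"],
    "macros": [],
}


def required_sources(crate: str, deps: dict[str, list[str]] = PATH_DEPS) -> list[str]:
    """Return every crate directory that must be present to build `crate`."""
    needed: list[str] = []
    stack = [crate]
    while stack:
        current = stack.pop()
        if current in needed:
            continue  # already collected via another path
        needed.append(current)
        stack.extend(deps.get(current, []))
    return sorted(needed)


if __name__ == "__main__":
    for name in PATH_DEPS:
        print(name, "needs", required_sources(name))
```

For `cli` the closure is all three crates, which matches the `fileSetForCrate [./cli ./lib ./macros]` call: under these assumptions source filtering removes nothing, so exposing a single package loses little.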