TODO

API endpoints:

RFD 24: regions, AZs, etc.
(lots more)

Work queue (see also: existing GitHub issues):

use CARGO_BIN_EXE_<name> for paths to binaries in integration tests https://doc.rust-lang.org/cargo/reference/environment-variables.html#environment-variables-cargo-sets-for-crates
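Here is a minimal sketch of the difference, assuming a hypothetical binary target named `sled-agent`. Cargo sets `CARGO_BIN_EXE_<name>` only when building integration tests and benchmarks, where `env!("CARGO_BIN_EXE_sled-agent")` is the usual form. This snippet uses `option_env!` and a hypothetical hard-coded fallback path only so that it also compiles outside such a target.

```rust
use std::path::PathBuf;

/// Resolves the path to a binary under test (sketch; names are hypothetical).
/// `compiled_in` is the value of `CARGO_BIN_EXE_<name>` captured at compile
/// time, which Cargo provides only to integration tests and benchmarks.
fn resolve_bin(compiled_in: Option<&'static str>, fallback: &str) -> PathBuf {
    match compiled_in {
        Some(path) => PathBuf::from(path),
        None => PathBuf::from(fallback),
    }
}

/// Path to the hypothetical `sled-agent` binary. Inside `tests/*.rs`,
/// `env!("CARGO_BIN_EXE_sled-agent")` would fail the build if the target
/// did not exist, instead of silently using the fallback.
fn sled_agent_path() -> PathBuf {
    resolve_bin(
        option_env!("CARGO_BIN_EXE_sled-agent"),
        "target/debug/sled-agent",
    )
}
```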
dropshot: allow consumers to provide error codes for dropshot errors
general maintenance and cleanup
- replace &Arc<T> with &T, and some instances of Arc<T> as well
- all identifiers could be newtypes, with a prefix for the type (like AWS "i-123" for instances)
- rethink ApiError a bit: should it use thiserror, or at least implement std::error::Error?
- scope out switching to sync (see RFD 79)
- proper config + logging for sled agent
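The following sketch combines three of the maintenance items, and every type name in it is hypothetical. It shows a prefixed newtype identifier (`InstanceId`, printed as `i-123`), an `ApiError` that implements `std::error::Error` by hand (`thiserror` would derive equivalent `Display` and `Error` impls), and a function that takes `&Registry` instead of `&Arc<Registry>`.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical API error. Implementing `std::error::Error` lets it
/// compose with `?` and `Box<dyn Error>`.
#[derive(Debug, PartialEq)]
enum ApiError {
    InvalidId(String),
    NotFound(InstanceId),
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::InvalidId(s) => write!(f, "invalid identifier: {:?}", s),
            ApiError::NotFound(id) => write!(f, "not found: {}", id),
        }
    }
}

impl std::error::Error for ApiError {}

/// Newtype identifier whose string form carries a type prefix ("i-123"),
/// so an instance id cannot be passed where another kind of id is expected.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct InstanceId(u64);

impl fmt::Display for InstanceId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "i-{}", self.0)
    }
}

impl FromStr for InstanceId {
    type Err = ApiError;
    /// Accepts only "i-<number>"; anything else is an InvalidId error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.strip_prefix("i-")
            .and_then(|n| n.parse::<u64>().ok())
            .map(InstanceId)
            .ok_or_else(|| ApiError::InvalidId(s.to_string()))
    }
}

/// Hypothetical registry of known instances.
struct Registry {
    instances: Vec<InstanceId>,
}

/// Takes `&Registry` rather than `&Arc<Registry>`.
fn lookup(registry: &Registry, id: InstanceId) -> Result<InstanceId, ApiError> {
    registry
        .instances
        .iter()
        .copied()
        .find(|i| *i == id)
        .ok_or(ApiError::NotFound(id))
}
```

Callers that hold an `Arc<Registry>` can still call `lookup(&registry, id)`. Deref coercion turns `&Arc<Registry>` into `&Registry`, so the `Arc` stays an ownership detail of the caller.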
settle on an approach for modification of resources and implement it once
implement server restart behavior (e.g., the sled agent starting up)
- This would help validate some of the architectural choices. Current thinking is that the sled agent will notify OXCP of the restart, and OXCP will find the instances that are supposed to be on that server and run instance_ensure() for each of them. It will also want to do the same for the disks associated with those instances. IMPORTANT: this process should also remove any resources that are currently on that system, so the restart notification to OXCP may need to include the list of resources that the sled agent (SA) knows about and their current states.
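Here is a sketch of the reconciliation step described above. It assumes that resources are identified by strings, and that "remove" applies to resources the sled agent reports but the control plane does not expect on that server. `plan_restart` and `RestartPlan` are hypothetical names. A real notification would also carry each resource's state.

```rust
use std::collections::BTreeSet;

/// What the control plane would do after a sled agent restart (sketch).
#[derive(Debug, PartialEq)]
struct RestartPlan {
    /// Everything expected on the server: run instance_ensure() (or the disk
    /// equivalent) on each, whether or not the agent reported it.
    ensure: Vec<String>,
    /// Reported by the agent but not expected there: to be removed.
    remove: Vec<String>,
}

/// `expected`: resources the control plane believes belong on the server.
/// `reported`: resources the sled agent listed in its restart notification.
fn plan_restart(expected: &BTreeSet<String>, reported: &BTreeSet<String>) -> RestartPlan {
    RestartPlan {
        ensure: expected.iter().cloned().collect(),
        remove: reported.difference(expected).cloned().collect(),
    }
}
```

The sketch ensures every expected resource rather than only the missing ones. This assumes instance_ensure() is idempotent.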
implement audit log
implement alerts
implement external user authentication
implement external user authorization mechanism
implement throttling and load shedding described in RFD 6
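RFD 6 is not reproduced here, so this is not its design. As an assumption, the sketch below shows a generic token bucket, which is one common building block for server-side throttling. A request that finds the bucket empty is rejected immediately (load shedding) rather than queued. Time is passed in as seconds so that the logic stays deterministic, and all names are hypothetical.

```rust
/// Generic token-bucket limiter (sketch, not the RFD 6 design).
struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_refill: f64,
}

impl TokenBucket {
    /// Starts full, so a burst of up to `capacity` requests is admitted.
    fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        TokenBucket { capacity, refill_per_sec, tokens: capacity, last_refill: now }
    }

    /// Returns true if the request may proceed. false means shed it
    /// (for example with 429 or 503) instead of queueing more work.
    fn try_acquire(&mut self, now: f64) -> bool {
        // Refill in proportion to elapsed time, capped at capacity.
        if now > self.last_refill {
            let refill = (now - self.last_refill) * self.refill_per_sec;
            self.tokens = (self.tokens + refill).min(self.capacity);
            self.last_refill = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```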
implement hardening in RFD 10
implement ETag / If-Match / If-None-Match
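Here is a simplified sketch of evaluating these headers, following the order in RFC 9110 (If-Match first, then If-None-Match). If-Match uses strong comparison, and If-None-Match uses weak comparison. A matching If-None-Match yields 304 for GET/HEAD and 412 otherwise. The function and type names are hypothetical, and the tag parser assumes that entity tags contain no commas.

```rust
/// Outcome of evaluating conditional request headers.
#[derive(Debug, PartialEq)]
enum Precondition {
    Proceed,
    NotModified,        // 304: GET/HEAD whose If-None-Match matched
    PreconditionFailed, // 412
}

/// Splits a header value like `"a", W/"b"` into individual entity tags.
fn tags(header: &str) -> impl Iterator<Item = &str> {
    header.split(',').map(str::trim).filter(|t| !t.is_empty())
}

/// Drops the weak-validator prefix, for weak comparison.
fn strip_weak(tag: &str) -> &str {
    tag.strip_prefix("W/").unwrap_or(tag)
}

/// `current` is the resource's ETag (quoted, e.g. `"v2"`), or None if the
/// resource does not exist.
fn evaluate(
    method: &str,
    current: Option<&str>,
    if_match: Option<&str>,
    if_none_match: Option<&str>,
) -> Precondition {
    if let Some(h) = if_match {
        // Strong comparison: a weak current tag never matches.
        let ok = match current {
            None => false,
            Some(cur) => h.trim() == "*" || (!cur.starts_with("W/") && tags(h).any(|t| t == cur)),
        };
        if !ok {
            return Precondition::PreconditionFailed;
        }
    }
    if let Some(h) = if_none_match {
        // Weak comparison: ignore the W/ prefix on both sides.
        let matched = match current {
            None => false,
            Some(cur) => h.trim() == "*" || tags(h).any(|t| strip_weak(t) == strip_weak(cur)),
        };
        if matched {
            return if method == "GET" || method == "HEAD" {
                Precondition::NotModified
            } else {
                Precondition::PreconditionFailed
            };
        }
    }
    Precondition::Proceed
}
```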
implement limits for all types of resources
implement scheme for API versioning
- how to identify the requested version: header or URI?
- translators for older versions?
- integration of supported API versions into build artifact?
- Should all the uses of serde_json disallow unrecognized fields? Should any?
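The sketch below shows one possible way to resolve the requested version, assuming that both options are accepted. A client can use a `/v{N}/` path prefix or a hypothetical `api-version` header. Requests are rejected when the two disagree or when the version is not in the build's supported list. On the serde_json question, serde's `#[serde(deny_unknown_fields)]` container attribute is the usual per-type switch.

```rust
/// Where the resolved version came from (hypothetical conventions).
#[derive(Debug, PartialEq)]
enum VersionSource {
    Uri,
    Header,
    Fallback,
}

/// Resolves the API version from a `/v{N}/...` path prefix or an
/// `api-version` header, then checks it against the supported versions.
fn resolve_version(
    path: &str,
    api_version_header: Option<&str>,
    supported: &[u32],
    default: u32,
) -> Result<(u32, VersionSource), String> {
    // "/v2/instances" -> Some(2); "/vpcs/..." -> None.
    let from_uri = path
        .strip_prefix("/v")
        .and_then(|rest| rest.split('/').next())
        .and_then(|n| n.parse::<u32>().ok());
    // A present but malformed header is an error, not silently ignored.
    let from_header = match api_version_header {
        None => None,
        Some(h) => Some(
            h.trim()
                .parse::<u32>()
                .map_err(|_| format!("invalid api-version header: {:?}", h))?,
        ),
    };
    let (version, source) = match (from_uri, from_header) {
        (Some(u), Some(h)) if u != h => {
            return Err(format!("URI requests v{} but header requests v{}", u, h));
        }
        (Some(u), _) => (u, VersionSource::Uri),
        (None, Some(h)) => (h, VersionSource::Header),
        (None, None) => (default, VersionSource::Fallback),
    };
    if supported.contains(&version) {
        Ok((version, source))
    } else {
        Err(format!("unsupported API version {}", version))
    }
}
```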
debugging/monitoring: Prometheus?
debugging/monitoring: OpenTracing? OpenTelemetry?
debugging/monitoring: Dynamic tracing?
debugging/monitoring: Core files?
Automated testing
- General API testing: there’s a lot of boilerplate in hand-written tests for each kind of resource. Would it be reasonable (and possible) to have a sort of omnibus test that is given the OpenAPI spec (or something like it), creates a hierarchy with at least one of every possible resource, and then, for each possible resource, does things like:
  - attempt to (create, GET, PUT, DELETE) one with an invalid name
  - attempt to (GET, DELETE, PUT) one that does not exist
  - attempt to create one with invalid JSON
  - attempt to create one with a duplicate name of the one we know about
  - exercise list operation with marker and limit (may need to create many of them)
  - for each required input property: attempt to create a resource without that property
  - for each input property: attempt to create a resource with invalid values for that property
  - list instances of that resource and expect to find the one we know about
  - GET the one instance we know about
  - DELETE the one instance we know about
  - GET the one instance we know about again and expect it to fail
  - list instances again and expect to find nothing
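The checklist above is regular enough to generate from a resource description. The following sketch assumes a hypothetical `ResourceSpec` extracted from the OpenAPI spec. It only produces the list of checks; a real harness would turn each `Check` into HTTP requests and expected status codes.

```rust
/// Hypothetical resource description, e.g. extracted from the OpenAPI spec.
struct ResourceSpec {
    kind: &'static str,
    required: Vec<&'static str>,
    properties: Vec<&'static str>,
}

/// One generated check, mirroring the checklist above.
#[derive(Debug, PartialEq)]
enum Check {
    InvalidName,                // create/GET/PUT/DELETE with an invalid name
    Nonexistent,                // GET/DELETE/PUT one that does not exist
    InvalidJson,                // create with malformed JSON
    DuplicateName,              // create with the known resource's name
    Paginate,                   // list with marker and limit
    OmitRequired(&'static str), // create without this required property
    InvalidValue(&'static str), // create with a bad value for this property
    Lifecycle,                  // list, GET, DELETE, GET (expect failure), list (expect empty)
}

/// Expands one resource description into the full list of checks.
fn checks_for(spec: &ResourceSpec) -> Vec<Check> {
    let mut checks = vec![
        Check::InvalidName,
        Check::Nonexistent,
        Check::InvalidJson,
        Check::DuplicateName,
        Check::Paginate,
    ];
    checks.extend(spec.required.iter().copied().map(Check::OmitRequired));
    checks.extend(spec.properties.iter().copied().map(Check::InvalidValue));
    checks.push(Check::Lifecycle);
    checks
}

/// Expands every resource kind, tagging each check with the kind it targets.
fn plan(specs: &[ResourceSpec]) -> Vec<(&'static str, Check)> {
    specs
        .iter()
        .flat_map(|s| checks_for(s).into_iter().map(move |c| (s.kind, c)))
        .collect()
}
```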
We will need archivers for deleted records, especially saga logs

External dependencies / open questions:

Should we create a more first-class notion of objects in the API?
- This would be a good way to enforce built-in limits.
- This would be a good way to enforce uniformity of pagination.
- If each resource provides a way to construct ETags, we could provide automatic implementation of If-Match, etc.
- With the right interface, we could provide automatic implementations of PUT or PATCH with JSON Merge Patch and JSON Patch given any one of these.
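Here is a sketch of what such a notion might look like. `ApiObject`, `Disk`, and the helper functions are all hypothetical. Once every resource exposes a name and an ETag, pagination, limit checks, and a simplified (exact-match) If-Match check can each be written once.

```rust
/// Hypothetical trait for a "first-class" API object.
trait ApiObject {
    /// Unique name, also used as the pagination marker.
    fn name(&self) -> &str;
    /// Opaque entity tag, e.g. derived from a generation number.
    fn etag(&self) -> String;
}

/// Example implementer (hypothetical): the ETag is the quoted generation.
struct Disk {
    name: String,
    generation: u64,
}

impl ApiObject for Disk {
    fn name(&self) -> &str {
        &self.name
    }
    fn etag(&self) -> String {
        format!("\"{}\"", self.generation)
    }
}

/// Uniform pagination: up to `limit` items whose name sorts after `marker`.
/// Assumes `items` is already sorted by name.
fn paginate<'a, T: ApiObject>(items: &'a [T], marker: Option<&str>, limit: usize) -> Vec<&'a T> {
    items
        .iter()
        .filter(|o| marker.map_or(true, |m| o.name() > m))
        .take(limit)
        .collect()
}

/// Built-in limit, checked the same way for every resource type.
fn check_limit<T: ApiObject>(existing: &[T], max: usize) -> Result<(), String> {
    if existing.len() >= max {
        Err(format!("limit of {} reached", max))
    } else {
        Ok(())
    }
}

/// Simplified If-Match: "*" or an exact match against any listed tag.
fn if_match<T: ApiObject>(obj: &T, header: &str) -> bool {
    let current = obj.etag();
    header.trim() == "*" || header.split(',').any(|t| t.trim() == current)
}
```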
would like to require that servers have unique, immutable UUIDs
TLS:
- How will we do TLS termination?
- How will we manage server certificates?
- How will we manage client certificates?
what does bootstrapping / key management look like?
what does internal authorization look like?

Other activities:

Performance testing
Stress testing
Fault testing / under load
Fuzz testing
Security review

Nice-to-haves:

API consistency checks (e.g., camelCase everywhere)

Things we’re going to want to build once:

metric export
structured event reporting (e.g., audit log, alert log, fault log)
opentracing-type reporting
client-side circuit breakers
service discovery
client connection pooling
server-side throttling
command-line utilities
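For the client-side circuit breaker item, here is a minimal sketch of the usual closed/open/half-open state machine. The thresholds and names are assumptions. After `failure_threshold` consecutive failures, the breaker opens and requests fail fast. After `cooldown`, one trial request is let through, and its outcome either closes or reopens the breaker.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum BreakerState {
    Closed,
    Open { since: u64 },
    HalfOpen,
}

/// Minimal client-side circuit breaker (sketch). Time is passed in
/// explicitly (e.g. milliseconds) so the state machine is easy to test.
struct CircuitBreaker {
    state: BreakerState,
    consecutive_failures: u32,
    failure_threshold: u32,
    cooldown: u64,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cooldown: u64) -> Self {
        CircuitBreaker { state: BreakerState::Closed, consecutive_failures: 0, failure_threshold, cooldown }
    }

    /// Whether a request may be sent now.
    fn allow_request(&mut self, now: u64) -> bool {
        match self.state {
            BreakerState::Closed => true,
            // A trial request is already in flight; keep failing fast.
            BreakerState::HalfOpen => false,
            BreakerState::Open { since } => {
                if now.saturating_sub(since) >= self.cooldown {
                    self.state = BreakerState::HalfOpen; // let one trial through
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = BreakerState::Closed;
    }

    fn record_failure(&mut self, now: u64) {
        self.consecutive_failures += 1;
        // A failed trial reopens immediately; otherwise wait for the threshold.
        if self.state == BreakerState::HalfOpen || self.consecutive_failures >= self.failure_threshold {
            self.state = BreakerState::Open { since: now };
        }
    }
}
```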

Check out Linkerd for inspiration (it looks Kubernetes-specific)
