Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement rewrite API #2789

Closed
wants to merge 29 commits into from
Closed

Implement rewrite API #2789

wants to merge 29 commits into from

Conversation

the10thWiz
Copy link
Collaborator

The goal of this PR is to implement a Rewrite API, following the basic outline presented in #2716.

The goal is to replace the existing Options on FileServer to instead use a rewrite based API. This will make FileServer pluggable, making it possible to implement things like cached compression implementable outside of Rocket.

The basic usage example is as follows:

FileServer::new("static")
   .rewrite(DotFiles)
   .rewrite(NormalizeDirs)
   .rewrite(Index("index.html"))

There are a couple of major limitations to the rewrite trait, specifically the file to be served must exist on the local file system, and the trait is synchronous, so it should not do any IO when rewriting paths. However, I would argue that if you want to move beyond these, you should just implement your own Handler, and fully replace FileServer.

@the10thWiz
Copy link
Collaborator Author

the10thWiz commented May 12, 2024

I've implemented a basic gzip compression rewriter, which provides an API similar to what was discussed in #2716.

FileServer::new("static", Options::None)
   .rewrite(NormalizeDirs)
   .rewrite(Index("index.txt"))
   .rewrite(CachedCompression::new()),

Full example, with implementation: https://github.com/the10thWiz/rocket-caching-layer

This particular implementation generates the compressed version of the file once it's been requested once, and responds with the uncompressed version in the meantime. It's not too hard to imagine different implementations, with different trade-offs. I built this as a proof-of-concept, to show that the new API actually worked for the intended use-case.

I've also marked most of the Options as deprecated, and implemented a quick check to generate an initial set of rewrites based on the Options provided.

@the10thWiz the10thWiz marked this pull request as ready for review May 12, 2024 04:49
@the10thWiz the10thWiz requested a review from SergioBenitez May 18, 2024 02:41
@SergioBenitez
Copy link
Member

I'm about to review this. Note that I tried to implement this API myself some time ago and ran into a bunch of issues. Here's my attempt, in case it helps with anything: https://paste.rs/wEcVu.rs

Hopefully you've manage to overcome the limitations I found!

@the10thWiz
Copy link
Collaborator Author

@SergioBenitez

Hopefully you've manage to overcome the limitations I found!

I'm not sure what the exact limitations you came across, but here are a few things I've done differently:

  • Rewrites can only produce the content of a file on disk. This simplifies the API somewhat, since a rewrite can only take one of a few specific actions. This also allows further modification in pretty much every case, so every rewrite is always applied in every case.
  • I've implemented DotFiles differently. Rather than adding a rewrite that either allows or disallows them, I have set up the FileServer to output a Hidden response, that the DotFiles rewrite converts to a normal File response.
  • Rather than using Cow<Path>, I'm using PathBuf. I'm pretty sure we always need to allocate the path, so we wouldn't gain much from Cow<Path>, but I'm open to switching if we can potentially remove some allocations.

There are a few points we need to decide on before this implementation can be finalized:

  • Is calling .is_dir() (internally std::fs::metadata) acceptable in an async environment? There is an async tokio version, because this is an IO call, but the current implementation uses .is_dir() internally already. My current version also uses .is_dir(), and I'm not sure if this is a big deal or not.
  • How do we want to handle Missing and IndexFile. These are an interesting case, because their primary function is a check on startup. We could extend the Rewrite API to include a startup check (and default to passing), but Missing would actually want us to ignore an error. For now, I've included IndexFile (which does nothing), and excluded Missing because I don't have a way to implement it.
  • How do we want to roll this out? My current implementation is technically backwards compatible, and outputs a deprecation warning. If we intend to introduce this as part of 0.6, we could make this a breaking change, but I'd to know your thoughts before I go in on either option.
  • Should we include a compression rewrite? I've created one (see the link in a previous comment) outside of Rocket, but compression is such a common requirement we should consider including it either in core, or as a contrib crate.

@SergioBenitez
Copy link
Member

The entire patch for my rewrite API implementation is here: https://paste.rs/AbR41.patch

I'll be referring to it in my review.

Copy link
Member

@SergioBenitez SergioBenitez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary difference I see between this API and the one I implemented, at a more fundamental level, is the degree to which it seeks to make FileServer completely generic. In my implementation, FileServer becomes purely a sequence of rewrites:

+pub struct FileServer {
+    rewriters: Vec<Arc<dyn Rewriter>>,
+}

Whereas here, we still have the notion of a root and some options. I think to consider this a successful API, we need something that looks more like the former: we want to be able to take a path from the client and progressively apply rewrites until we have the response we're looking for, with those rewrites being controlled completely in a generic manner.

To that effect, I do think we need to make FileServer be:

pub struct FileServer {
    rewriters: Vec<Arc<dyn Rewriter>>,
}

And then be able to recover the existing functionality as a series of rewrites that we provide out of the box.


The second core difference I note is in the treatment of responses, in particular with respect to headers. In my changeset, I introduce changes to NamedFile in which I add a HeaderMap to the struct:

+pub struct NamedFile {
+    file: File,
+    path: PathBuf,
+    headers: HeaderMap<'static>,
+}

(In reality I also added Metadata, but I think that was a mistake.)

I think this makes sense as it creates parity between NamedFile and FileServer. My intent was to allow the rewriter to return a Path and optionally a HeaderMap and then use those to construct a NamedFile that gets used to generate the response. This proved to not be enough as we also need to support rewriting into redirects, and thus returning a Response was added.

In your implementation, you have an enum that effectively replaces the need for Response in my changeset.

I think the right approach is somewhere in-between.


At the end of my experiment, and after taking a look at the code here, I'm left with the impression that the rewrite API I suggested is fundamentally trying to be monadic in Option<FileResponse> of some sort, and that we should really approach that API a little more directly. That is, to boil things down to a sequence of calls of the form Option<FileResponse> -> Option<FileResponse>, where FileResponse looks something like FileServerResponse:

enum FileResponse<'a> {
    File(File<'a>),
    Redirect(Redirect<'a>),
}

struct File<'a> {
    path: Cow<'a, Path>,
    headers: HeaderMap<'a>,
}

Then, FileServer is really just an Option<FileResponse<'a>> that starts as a Some(FileResponse::File(File::from(req.uri()))) (or something morally equivalent) and proceeds through a series of and_then()s, map()s, and filter()s. I think this presentation would also make the ordering argument a bit clearer as we're already used to the ordering consequences between these operations.

If this path is followed, then the Rewrite trait becomes:

pub trait Rewrite: Send + Sync + Any {
    fn rewrite<'r>(&self, response: Option<FileResponse<'r>>) -> Option<FileResponse<'r>>;
}

It can be implemented for all function of the same form (for<'r> F: Fn(Option<FileResponse<'r>>) -> Option<FileResponse<'r>>) as well as the bind form (FileResponse<'r> -> Option<FileResponse<'r>>) and the map form (FileResponse<'r> -> FileResponse<'r>). We can also specialize implementations for common forms such as File<'a> -> File<'a>, File<'a> -> Option<File<'a>>, and File<'a> -> FileResponse<'r>.

This would mean we can implement the basic usage as something like:

FileServer::new()
    .filter_file(|file| !file.path.components().any(|c| c.starts_with('.')))
    .map_file(|file| if file.path.is_dir() && !file.path.ends_with('/') {
        Redirect(Redirect::to(file.path + '/'))
    } else {
        File(file)
    })
   .map_file(|file| if file.path.is_dir() && file.path.join("index.html").exists() {
        File(file.path.join("index.html"))
    } else {
        file
    })

Of course, we'd want to provide these as reusable components, but the point is that the implementation becomes significantly more trivial.

core/http/src/uri/segments.rs Outdated Show resolved Hide resolved
@the10thWiz
Copy link
Collaborator Author

I think the APIs we have proposed (the two rewrite traits) are to some extent isomorphic. My trait has the request and file server root as additional parameters, but I'm pretty sure we could eliminate that without any issues. (I've already refactored away the need for root, and NormalizedDirs is the only one that still requires req, and this could be removed pretty easily). From there, FileServerResponse::NotFound is sort of equivalent to None.

The reason I opted to go with an explicit NotFound variant was to enable filtering dot files by default, i.e. FileServer::new("static") implies that dot files should not be served. To implement this, the None case needs to carry the path, enabling the DotFiles rewrite to convert it back to a Some case. This is also the reason I added to_path_buf_dotfiles.

I opted to retain Options for two reasons: first, to maintain backwards compatibility (and avoid the need to defer this to 0.6), but it sounds like you want this to be a breaking change. The second reason is that we can't mimic the behavior of Options::Missing and Options::IndexFile cleanly with a rewrite based API, since their primary function is a check during startup. If we are willing to lose this functionality, we can fully eliminate options from the FileServer struct.

With that in mind, FileServer becomes:

pub struct FileServer {
    root: PathBuf,
    rewrites: Vec<Arc<dyn Rewriter>>,
    rank: isize,
}

I think including root and rank is worthwhile. rank should be included, because it can't really be replicated by rewrites. I think we should include root, because we should restrict FileServer to a single root. This means handle becomes (roughly) File(self.root.join(req.uri().path()), a series of rewrites, and converting the final output to a Response.

Overall, I agree with most of the API changes you propose. The only changes I'm not willing to lose is hiding dot files by default, and some kind of startup check for whether the root directory exists.

@SergioBenitez
Copy link
Member

SergioBenitez commented May 29, 2024

The reason I opted to go with an explicit NotFound variant was to enable filtering dot files by default, i.e. FileServer::new("static") implies that dot files should not be served. To implement this, the None case needs to carry the path, enabling the DotFiles rewrite to convert it back to a Some case. This is also the reason I added to_path_buf_dotfiles.

We don't need to force filtering dot files by default: it would suffice to make the default constructor include a rewrite that filters dotfiles. This way we avoid negative logic.

I opted to retain Options for two reasons: first, to maintain backwards compatibility (and avoid the need to defer this to 0.6), but it sounds like you want this to be a breaking change.

0.6 is right around the corner. And in general, we shouldn't compromise on the API to avoid a breaking change unless there's a huge benefit to be had, which I don't feel is the case here.

The second reason is that we can't mimic the behavior of Options::Missing and Options::IndexFile cleanly with a rewrite based API, since their primary function is a check during startup. If we are willing to lose this functionality, we can fully eliminate options from the FileServer struct.

I think Options::Missing should become part of the functionality of specific rewrites. If a rewrite works from some directory, then it can/should be responsible for erroring as needed. This can be part of the constructor for that rewrite.

Options::IndexFile can and should be a rewrite, and it is in my implementation. In fact, it's one of the simplest of rewrites:

|_| File(self.0)

I think including root and rank is worthwhile. rank should be included, because it can't really be replicated by rewrites.

Absolutely.

I think we should include root, because we should restrict FileServer to a single root.

I think this is the case for FileServer today, but nothing should prevent a rewrite that looks in multiple roots, for example. Furthermore, we likely want to eventually extend FileServer to serve files included in the binary directly, and this would prevent that from being possible. In general, I don't think we gain anything from having root be endemic to FileServer but stand to lose quite a bit of flexibility.

Overall, I agree with most of the API changes you propose. The only changes I'm not willing to lose is hiding dot files by default, and some kind of startup check for whether the root directory exists.

I agree, and it doesn't need to be in conflict with the changes I propose. For the former, we can include it as part of the default constructor, not as part of FileServer directly. For the latter, a check can be done by the rewrite instead of by FileServer.

@the10thWiz
Copy link
Collaborator Author

0.6 is right around the corner
I was a bit disconnected from Rocket for the past couple months, so I wasn't sure when 0.6 would happen. Thanks for clarifying this for me.

I've implemented changes based on this discussion, and cleaned up the API quite a bit. There are a couple specific pain points with the new API, but they are some pretty minor nits, and should be fixable without any sweeping changes.

First, filter_file and map_file have weird trait bounds, since I wanted Root, BlockDotfiles and Index to work with these methods. However, I think I'm going to simplify this, and make them simple Fn(..) -> .. bounds, and convert all of the provided rewrites to just be functions with the correct signature. This will mean they aren't usable in and_rewrite, but I think this is a worthwhile sacrifice to make the API cleaner.

Second, I hoped to avoid any kind of remove_rewrites (to avoid reverse logic), but I've left it in for now. If we are willing to make allowing dot files more complicated, removing the remove_rewrites and allow_dotfiles methods forces users who want this functionality to use empty, and add Root (along with other rewrites) by hand. I'm on the fence about this, let me know what you think.

Finally, I'm going to need to rewrite the documentation and examples for FileServer. I wanted to nail down a design before I dove into this, but it's looking like it's time to work on this.

@the10thWiz
Copy link
Collaborator Author

@SergioBenitez I think I've got this to a pretty good place now. I'm pretty sure there aren't any subtle regressions left, but I'm not 100% sure. The test suite does pass now, so that's a good sign.

Looking forward, do we want to add a pre-computed compression rewrite to core or contrib? I'm happy to open another PR for an implementation. If not, I'll likely create and publish a separate crate.

core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Outdated Show resolved Hide resolved
core/lib/src/fs/server.rs Show resolved Hide resolved
@the10thWiz
Copy link
Collaborator Author

I'm updated this PR (using the changes you've made), to pass CI. Overall, I like most of the changes. I'm not sure how I feel about Rewrite implementing Rewriter, and the change to using temporary redirects.

To that end, I've changed a few things:

  • Added FileServer::directory (One doc-test uses it).
  • Updated tests to reflect a change in TrailingDirs's behavior - it now generates temporary redirects, unlike the old file server which generated permanent redirects.
  • Re-exported Rewrite and Rewriter from rocket::fs.
  • Added From<File> and From<Redirect> impls for Option<Rewrite>.

Comment on lines 224 to 228
impl<'r> From<File<'r>> for Option<Rewrite<'r>> {
fn from(value: File<'r>) -> Self {
Some(Rewrite::File(value))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the utility of this impl? I don't see it used anywhere, and I'm not sure we really want to hide the functionality under an opaque .into().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other impl (From<Redirect> for Option<Rewrite>) is actually used in at least one doc-test.

This makes it possible for a rewrite to return File { .. }.into(), rather than Some(File { .. }.into()). I think it makes the API more succinct, without losing any information. I think it might be worth looking at the other doc-tests, and using this impl if appropriate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When adding documentation, now both are used (in the doc-tests for FileServer::filter and FileServer::map respectively).

Comment on lines 97 to 165
/// Constructs a new `FileServer` that serves files from the file system
/// `path` with a default rank.
///
/// Adds a set of default rewrites:
/// - [`filter_dotfiles`]: Hides all dotfiles.
/// - [`prefix(path)`](prefix): Applies the root path.
/// - [`normalize_dirs`]: Normalizes directories to have a trailing slash.
pub fn directory<P: AsRef<Path>>(path: P) -> Self {
Self::empty()
.filter(|f, _| f.is_visible())
.rewrite(Prefix::checked(path))
.rewrite(TrailingDirs)
}
Copy link
Member

@SergioBenitez SergioBenitez Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I had implemented this exact method but I removed it because I doubted its utility, given that this would make new() exactly Self::directory(path).rewrite(DirIndex::unconditional("index.html")). If we're going to keep this, we should document the two methods well with why/when you would use one or the other. And we should use it in new()'s implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed something like that might have happened. It was still used in one of the doc-tests (of DirIndex specifically). I think it's worth including, but I'm thinking it might be worth changing the name in addition to writing better docs.

Comment on lines 79 to 82
/// # Windows Note
///
/// This does *not* check the file metadata on any platform, so hidden files
/// on Windows will not be detected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make it so that it does check the metadata?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bit more complex than the dotfiles check. First, it would only work after Prefix (much like is_dir or exists), and the current dotfiles check is typically done before Prefix.

Second, to match the behavior of dotfiles, we would likely want to also check all parent folders, which would likely represent a non-trivial amount of work. Right now, each Rewrite requires at most one syscall, and I think that's a reasonable limit.

Overall, I'm not sure if it's worthwhile. We could provide an alternate method that does check fs metadata, but I'm not sure how much value it would provide. We would also likely need to create a more complex test for this functionality, since I'm not sure git preserves Windows file properties.

the10thWiz and others added 24 commits July 6, 2024 13:21
Apparently, no_compile doesn't prevent the doctest from being compiled
- Rewriter is now Monadic on `Option<FileResponse>`
- `new` (+variants) now adds a proper set of default
  rewrites
- filter_dotfiles now removed dotfiles
- Compatibility has been broken
Changes `File` to contain the full `Origin` uri from the request, so
normalize_dir doesn't have to deal with platform specific behavior.

This is also a more convient way to handle Redirects, as well as
provide more consistent access to the query parameters.
It doesn't do what I need it do.
- Add #[non_exaustive] to allow adding more FileResponses in the future
- Improve readability of `FileServer::handle`
- Take advantage of `NameFile`'s handling missing files
- Add `strip_trailing_slash` to ensure files can be read from disk
- Change name of `missing_root`
- Update documentation to reference non-panicing
  variant
- Fix issue caused by attempting to open directories
- Introduce `FileMap` and `FileFilter` traits
- `Rewrite` and `Rewriter` are no longer re-exported from `fs`.
- Renamed `FileServer::directory` to `FileServer::new_without_index`.
- Added missing documentation in a number of places.
@SergioBenitez
Copy link
Member

Merged! 65e3b87

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr: merged This pull request was merged manually.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants