Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Range requests #887

Closed
wants to merge 18 commits into from
Closed

Conversation

Drakulix
Copy link

This is an attempt to fix #806 (at least partially).

This transparently handles Range requests for &[u8], Vec<u8>, File and NamedFile, when the total size is known, allowing partial downloads.

It also advertises this feature for the listed types by adding an AcceptRanges header.

The implementation follows the MDN examples and the corresponding RFC, but only supports Single part ranges, which is compliant with the RFC, as a server may choose to ignore or reject any range request.

Multipart ranges would most likely be better implemented as a custom Body variant and would be more complicated. I have no use case for those, which is why I did not bother, but if that would be blocking and someone can provide some directions/suggestion on how to implement those (I am not very familiar with rocket's codebase), I would try to tackle that as well. But I would strongly prefer to do that in a separate PR.

I am looking for some initial feedback on the approach. I did implement it in the corresponding Responder implementations to have access to the Range header of the request and the response as well. It seemed fitting. But it would probably also be possible to just parse the header in the Responder and just store that in the Response or Body and deal with it later in the chain (when writing out the Response for example).

Docs and Tests are completely lacking, also formatting was not working well, cargo fmt changed the whole codebase. So I would appreciate some more comments on what I should provide on those subjects as well.

Thanks for your time :)

Copy link
Collaborator

@jebrosen jebrosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks okay as a starting point. I think it would be nicer if this could be handled transparently in Rocket regardless of the Responder used and/or provided as another Responder implementation, but that might be impractical since it requires Seek.

@SergioBenitez worked a bit on range requests a while back but never completed it; it might be good to compare this with some of the work done if I can track it down.

(Thanks for the reminder on IRC that this hadn't been looked at yet)

core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
);
}
let size = body.seek(io::SeekFrom::End(0))
.expect("Attempted to retrieve size by seeking, but failed.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncomfortable with this expect. I think it's fine for Vec<u8> and [u8] for example, but I'm not so sure about File.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you suggest to "fix" this?

To me this in an "Internal Server Error", because the state of the seek pointer is unknown at this point, which means we cannot fallback anymore, because this will also have an impact on Read.
Rocket would catch this (unlikely) panic anyway, but I guess I could create a 500-response instead and return that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I prefer to return some kind of Error directly rather than panicking. Although looking at it, Response::set_sized_body itself panics instead of returning Result.

I looked at a few of the platform implementations of Seek for File. It looks like *nix and Windows only panic when the offset is out of range for an i64, or for any platform-specific reasons (e.g. the File object is in reality a handle to a pipe or socket, not a file).

My original thinking was that in the case of a "seek to get size" error we could just seek back to the start, ignoring the range request, and/or return a range not satisfiable. But I'm not sure if that's really worth the complexity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one seek already has failed, it is very likely, that another to the beginning will fail as well.

The current approach is my favorite, but I if you would like to see something else, just tell me.

core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
@Drakulix
Copy link
Author

@jebrosen Thanks for taking a quick look at it. I tried to address most of your concerns.

Another Responder similar to how Stream works today would also be possible. My reasoning was, that you probably want this behavior for any binary-data anyway. The only downside is, that it is not possible to make use of this functionality with any user-provided types, while theoretically any Body implementing Seek is applicable. But then again I cannot think of much more useful types implementing Seek then files and in-memory buffers.

Providing this functionality for all bodies on the other side might not be very helpful, but would introduce a good amount of additional complexity. (E.g. I cannot think of a good reason to request a certain byte-range of a json record. Most of the time this should be solved at the user-level.)

@Drakulix
Copy link
Author

Drakulix commented May 2, 2019

This is starting to collect some dust, but I would still like to see this merged someday in any shape or form. Could I get another round of feedback please? Ping @jebrosen or @SergioBenitez

@jebrosen jebrosen self-requested a review May 2, 2019 20:07
Copy link
Collaborator

@jebrosen jebrosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still love to provide a way for downstream crates to leverage this, but I'm not sure where exactly to put that. I think it would be possible to do struct RangeResponder<B: Read + Seek>(B); impl<'r, B: Read + Seek> Responder for RangeResponder, and that might also reduce some of the awkwardness I mention below.

core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
@@ -217,34 +219,142 @@ impl<'r> Responder<'r> for String {
}
}

fn try_range_response<'r, B: io::Read + io::Seek + 'r>(req: Request<'r>, mut body: B) -> Result<response::Response<'r>, B> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature does feel really awkward, but that's mostly because of the Seek bound, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really because of Seek (this implementation panics, if seeking fails), but to provide a way to recover of other errors. I could move those checks into respond_to instead. That would likely fix the awkwardness, but result in some code-duplication.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have explained that more clearly: I found it awkward because this isn't something like Response -> Response, but instead this function requires the caller to move the body in and out. That's why I was hoping a Responder bounded on Seek would make that go away.

fn respond_to(self, req: &Request) -> response::Result<'r> {
let mut body = Cursor::new(self);

match try_range_response(req, body) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of the respond_to methods are harder to read than without Range support, mostly because of that awkward function signature. I might play around with this and try to make things a bit nicer.

@Drakulix
Copy link
Author

Drakulix commented May 5, 2019

I would still love to provide a way for downstream crates to leverage this, but I'm not sure where exactly to put that. I think it would be possible to do struct RangeResponder<B: Read + Seek>(B); impl<'r, B: Read + Seek> Responder for RangeResponder, and that might also reduce some of the awkwardness I mention below.

While this likely would look a lot cleaner, it would duplicate a lot of code and I would like to make this transparent. There is no reason not to provide range-support for types implementing Seek, because this depends on client support anyway and is 100% backwards compatible.

@jebrosen
Copy link
Collaborator

jebrosen commented May 5, 2019

My idea was to replace the entire try_range_request function with the hypothetical RangeResponder -- it shouldn't duplicate any code, and it should make both respond_to methods shorter. If it worked, we could write this instead:

Response::build_from(RangeResponse(self))
    .header(ContentType::Binary)
    .ok()

That way RangeResponder would be responsible for both adding the AcceptRanges header and the fallback to the whole body (reducing the duplication in respond_to), and it would be easier to reuse and maybe even make it public. I'm happy to try this route myself, but it will be a while before I can seriously focus on it.


Aside from code, we will also need:

@Drakulix
Copy link
Author

Drakulix commented May 15, 2019

My idea was to replace the entire try_range_request function with the hypothetical RangeResponder -- it shouldn't duplicate any code, and it should make both respond_to methods shorter. If it worked, we could write this instead:

Response::build_from(RangeResponse(self))
    .header(ContentType::Binary)
    .ok()

That way RangeResponder would be responsible for both adding the AcceptRanges header and the fallback to the whole body (reducing the duplication in respond_to), and it would be easier to reuse and maybe even make it public. I'm happy to try this route myself, but it will be a while before I can seriously focus on it.

Alright, I have made some improvements:

  • The first new commit adds a new range_response function, that handles everything without returning the body, utilizing Response::build_from (I was not aware of this function, thanks. With this change alone it looks much clearer now.)
  • The second new commit adds the RangeResponder as a public struct as you were suggesting. I am not totally sure, that this is a good idea. People might get the impression that File/&[u8]/Vec do not support Range requests by default. But I think this is solvable with good docs, so everybody is aware that RangeResponder is a helper struct for new downstream implementations.

But I have also introduced a new issue in the process. The file handler is now much simpler, but not equivalent. It previously used metadata to construct a Sized body manually and otherwise construct a Streaming body. The new implementation does always use Seek to determine the size (because it just uses sized_body, if no Range is requested).

I do not know, why the old code, did not use Seek as well, I assume that this was some kind of optimization? I also assume, that it is an oversight in that case, that sized_body of the ResponseBuilder uses a File in the documentation example? Because otherwise, the docs would encourage a different implementation than what Rocket does by default, which seems a little paradox?

I am not sure, there is a good way to restore the old implementation when implementing range responses like this. At least no good idea jumps to my mind. But we definitely want range responses for files by default in my opinion.

Copy link
Collaborator

@jebrosen jebrosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patience with this. I think I'm pretty excited about this direction after seeing it.

The second new commit adds the RangeResponder as a public struct as you were suggesting. I am not totally sure, that this is a good idea. People might get the impression that File/&[u8]/Vec do not support Range requests by default. But I think this is solvable with good docs

This is my biggest hesitation to making it public -- documentation. We should definitely document that File etc. support range requests now. If we make RangeResponder public, it will also need its own documentation and tests and maybe even a different name.

But I have also introduced a new issue in the process. The file handler is now much simpler, but not equivalent. It previously used metadata to construct a Sized body manually and otherwise construct a Streaming body. The new implementation does always use Seek to determine the size (because it just uses sized_body, if no Range is requested).

I do not know, why the old code, did not use Seek as well, I assume that this was some kind of optimization? I also assume, that it is an oversight in that case, that sized_body of the ResponseBuilder uses a File in the documentation example? Because otherwise, the docs would encourage a different implementation than what Rocket does by default, which seems a little paradox?

I am not sure, there is a good way to restore the old implementation when implementing range responses like this. At least no good idea jumps to my mind. But we definitely want range responses for files by default in my opinion.

The short answer to all of this is: it's likely platform dependent, and you can probably use File to serve things that aren't really files. But I am a bit suspicious of any file-like thing where metadata outright fails but reading the file succeeds. I think it deserves investigation, and maybe we should continue to use Stream (without Range support) in cases where the metadata check fails.


Side note: hyper 0.12 removes typed headers, so in that porting process we will probably have to revive the header parsing code from the last time Sergio worked on this in the range branch.

core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
@jebrosen
Copy link
Collaborator

Note that RangeResponder can be implemented outside of Rocket itself. It's my opinion that support for Range is nonetheless important enough to include for File and buffers (privately or publicly), and if not in rocket itself then in rocket_contrib - especially for StaticFiles.

Copy link
Collaborator

@jebrosen jebrosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another few math errors (thanks go to foba in IRC/Matrix who helped discover these).

core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
@Drakulix Drakulix force-pushed the feature/range branch 2 times, most recently from fc289a3 to 18fc7e0 Compare June 1, 2019 11:58
@jebrosen
Copy link
Collaborator

jebrosen commented Jun 1, 2019

Awesome! I will try this out.

In addition to &[u8] Vec<u8> and File, any Responder that uses them as the underlying implementation automatically gets range request support. This includes at least NamedFile and MsgPack. I am not sure that automatically adding support to &[u8] and Vec<u8> are the right thing to do, especially as it's not possible to opt out. It might be expensive to regenerate the response data and/or likely to change between requests, which are both undesirable with Range request handling.

Is this something I should take care of? Or is that just a note for whoever will work on the hyper upgrade?

Just something to be handled during the hyper upgrade.

Copy link
Collaborator

@jebrosen jebrosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played with this for a while and finally realized the byte ranges are supposed to be inclusive. It's simpler to keep end as an exclusive position within the code because it corresponds directly to size, but we need to transform it on the way in and out.

core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
core/lib/src/response/responder.rs Outdated Show resolved Hide resolved
@Drakulix
Copy link
Author

Drakulix commented Jun 2, 2019

Should all be fixed. @jebrosen

Thanks for all the testing and debugging (also @ foba). My use-case for this PR is long gone, I hope this gets merged soon. 😄

@Drakulix
Copy link
Author

Drakulix commented Jun 3, 2019

Forgot to fix up the tests. Hopefully, travis is now happy again.

@jebrosen jebrosen self-requested a review June 4, 2019 01:53
@Drakulix
Copy link
Author

@jebrosen Ping. Just in case you might have forgotten this PR. No rush.

Copy link
Collaborator

@jebrosen jebrosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooray, it passes the "10 seconds test" without modification!

My biggest concerns are mostly procedural:

  • Whether to apply this automatically to &[u8] and Vec<u8> - my inclination is no on Vec<u8>, but &[u8] doesn't bother me as much because it could only be borrowed from global state or the request metadata.
  • Naming and placing RangeResponder

But I also had a troubling thought earlier today regarding chaining Responders. As an example, the ETag header needs to be calculated and applied based on the entire representation, and If-None-Match should be evaluated and possibly resolved as a 304 before a range response is even considered. If a responder or fairing processes Range before ETag or If-None-Match handling, the result will be wrong. Content-Encoding is a similar situation.

I think the only correct place for range request processing is as the last ("outermost") responder that runs, and this PR as written would break both Compress<File>, available in rocket_contrib, and ETag<File>, which isn't in rocket or rocket_contrib but implementations exist. I just wish I had realized this problem sooner :/

hyp_res.headers_mut().set(header::ContentLength(size));
let mut stream = hyp_res.start()?;
io::copy(body, &mut stream)?;
io::copy(&mut body.take(size), &mut stream)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergioBenitez do you know why an equivalent of this wasn't being done already? i.e. why is it allowed to pass (via set_raw_body) a body longer than the size was specified to be?

&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
}

mod tests {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other interesting test case ideas: apple-banana, 3-20, -1-3, 9-9, -0. A lot of the code here can probably be deduplicated with helper functions like assert_range_succeeds(range_spec, expected_range, expected_bytes) and assert_range_fails(range_spec).

@Drakulix
Copy link
Author

I think the only correct place for range request processing is as the last ("outermost") responder that runs, and this PR as written would break both Compress<File>, available in rocket_contrib, and ETag<File>, which isn't in rocket or rocket_contrib but implementations exist. I just wish I had realized this problem sooner :/

Okay, this sucks (but it is probably better to think about it now, rather than after the fact).
So we basically cannot do this transparently, but rather need to be able to do something like Range<Compress<File>> and add documentation to Range that it needs to be outmost?

Or is there a way to enforce the last constraint? Maybe something like a new trait FinalResponder, which can be returned by a route and is already implemented for every Responder, but cannot be wrapped by another responder, because those only accept the Responder-trait (not FinalResponder)??

This sounds really cumbersome.

@jebrosen
Copy link
Collaborator

The problem is precisely with having File (or anything else) handle Range automatically - as it breaks every subsequent fairing or wrapping Responder that cares about the body.

The obvious solutions to my mind are a) making it strictly opt in or b) apply it automatically to File, in both cases explaining the ordering problem. Fairings and Responders that would modify the body should not operate when Content-Range header has been set. One significant downside to b) is that every implementer of Responder or on_response needs to be aware of this behavior.

Option c) is to do this in rocket itself automatically, but that will be a lot more work, possibly changing Body itself, to carry the Seek requirement all the way to response time. I haven't checked whether async will make this a terrible fit either (i.e. if AsyncSeek exists and/or how AsyncRead and Seek interact)

Option d) is something like FinalResponder, and I have a gut feeling that such an API wouldn't be fun to implement or use

@Drakulix
Copy link
Author

I think we are not doing anyone a favor with Option b.

I can either quickly change this PR to do Option a and then tool around with d (if rustc is able to infer a lot of this mess, it might be reasonable to use and just a little painful to implement)

or just go for c, but I have no idea about that async-stuff.

Do you have a preference for either of those? I think both have their cons and pros, but I would definitely try to tackle them. Time is not an issue, I rather do this right then to get this merged quickly.

@jebrosen
Copy link
Collaborator

For now I think option a is the best choice, sadly. Unless a design for option c is relatively trivial (and I think it wouldn't be), I would like to avoid the headache of rewriting it again so soon for async.

The conundrum of order-sensitive Responders seriously bothers me, so I'm going to spend some time thinking on it this weekend.

@agersant
Copy link

agersant commented Jan 21, 2020

@jebrosen Bumping this because I'm very interested and activity on this PR has died down. (apologies if you had not forgotten about it)

@jebrosen
Copy link
Collaborator

So I did write a blog post with a sad conclusion, but never really publicized or linked it.

I still think "Option a", the opt-in wrapping Responder along with clear documentation explaining potential pitfalls, is the right way to go. So far having range requests be done automatically for all responses (or even just File) feels like "too much" magic that can have risky interactions with other Responders or fairings.

@agersant
Copy link

agersant commented Jan 21, 2020

Oh, thank you for linking this - I had missed it in my searches.

As an end user, I would be very content with option A. It would match exactly what I've been doing except without having to maintain an extra 160 lines of code for the one endpoint on which I want to support range requests.

@Drakulix
Copy link
Author

Just a quick note, because I have stumbled on this (rather dead) PR again:
My previous iterations on this feature were motivated by work for my previous employer. While I would like to finish this, I do not have the free time to do it (at least right now). While I appreciate the very thoughtful review process, this simply took too much time.

I am happy to answer questions about the current implementation, just ping me. But other then that, I will not further participate. I encourage anyone reading through the pull request to continue, were we left of. All the work and discussion is public and collected at this place. I hope this feature will be finished some time in the future. Good luck to everyone involved. :)

@Drakulix Drakulix closed this Oct 27, 2020
@richo
Copy link
Contributor

richo commented Nov 14, 2020

Hi! I need to support a Range request in my application, so I'm basically trying to do the math on whether a cobbled implementation for one route in my application is the right move, or whether I could carry this PR across the line.

I've just read the discussion above (And thankyou @Drakulix for most of an implementation) and my understanding is:

  • There's no real appetite for automatically implementing Range support for File and friends
  • There are real problems using Range requests with responders that would either modify the body or which make assumptions about the body, it sounds as though for now folks are comfortable with the safety rails for these things being clear documentation that explains what a bad idea that is.

So to get this across the line I believe the actions are:

  • Remove the auto implementations for File, &[u8] etc.
  • Produce documentation that clearly articulates what this does and how it interacts with other wrapping responders.
  • It looks like potentially write some more tests?

Does that sound accurate @jebrosen ?

richo added a commit to richo/Rocket that referenced this pull request Nov 14, 2020
This is a rebase by richo of PR rwf2#887 as a starting point toward
mergability.
@jebrosen
Copy link
Collaborator

jebrosen commented Nov 14, 2020

That sounds about right. Automatic compression via Content-Encoding and automatic ETag are two examples of relatively commonly used or requested middleware that are straightforward to apply to all responses, yet would do the wrong thing if the response was a partial range. The main options I see here are:

  • Use automatic Range handling by default on File et al. Fairings or Responder types that inspect or change the existing response body have to be aware of Content-Range and skip processing those responses (losing out on the functionality they provide).
  • Make automatic Range handling opt-in via APIs such as RangeResponder, NamedFile::with_range_handling(), etc., and describe the caveats in the documentation.

I'm in favor of this second option now, but I'd love to hear variations on these or other ideas.

@richo
Copy link
Contributor

richo commented Nov 14, 2020 via email

@jebrosen
Copy link
Collaborator

Ah, sorry - either way, I would like to keep a wrapping RangeResponder as an option. In that comment I was thinking more about possible defaults and/or convenient APIs, for the most common use cases.

@richo
Copy link
Contributor

richo commented Nov 15, 2020

Gotcha, I am with you now! I will try to get a PR up for a wrapping responder, once that makes it in maybe we can discuss how to make the ergnomics better?

@AseCoder
Copy link

AseCoder commented Jan 4, 2025

I know this has been dead for four years but I would really like this feature – or at least some guidance on how to implement Range requests – for my own Rocket project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Request] - Support Range header for streaming media to Safari
5 participants