Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor library so that it can work with any HTTP client crate #97

Open
evanlinjin opened this issue Sep 6, 2024 · 18 comments · May be fixed by #99
Open

Refactor library so that it can work with any HTTP client crate #97

evanlinjin opened this issue Sep 6, 2024 · 18 comments · May be fixed by #99
Assignees
Labels
enhancement New feature or request

Comments

@evanlinjin
Copy link
Member

The Problem

Right now rust-esplora-client is designed to only work with HTTP client crates that are baked in (i.e. minreq for blocking, reqwest for async). However, these HTTP client crates may not be preferred by the caller. The caller may also wish to use a different version of the HTTP client crate.

Proposed Solution

Represent each endpoint call as a distinct type. Introduce a trait that is to be implemented for each endpoint call type. This trait will contain the template of how to form the HTTP request and parse the HTTP response body and status code.

pub trait HttpRequest {
    type Output: for<'a> serde::de::Deserialize<'a>;

    fn request_url_path(&self) -> String;
    fn request_method(&self) -> &str;
    fn request_body(&self) -> Vec<u8>;
    fn parse_response<E>(status: i32, body: &[u8]) -> Result<Self::Output, ResponseError<E>>;
}

pub enum ResponseError<E> {
    Json(serde_json::Error),
    HttpStatus { status: i32, body: Vec<u8> },
    Client(E),
}

Examples of endpoint types...

pub struct GetTx {
    pub txid: Txid,
}

impl HttpRequest for GetTx {
    type Output = Option<Transaction>;

    fn request_url_path(&self) -> String {
        format!("/tx/{}/raw", self.txid)
    }

    fn request_method(&self) -> &str {
        "GET"
    }

    fn request_body(&self) -> Vec<u8> {
        Vec::with_capacity(0)
    }

    fn parse_response<E>(status: i32, body: &[u8]) -> Result<Self::Output, ResponseError<E>> {
        match status {
            200 => Ok(serde_json::from_slice(body).map_err(ResponseError::Json)?),
            404 => Ok(None),
            error_status => Err(ResponseError::HttpStatus {
                status: error_status,
                body: body.to_vec(),
            }),
        }
    }
}

pub struct PostTx {
    pub tx: Transaction,
}

impl HttpRequest for PostTx {
    type Output = ();

    fn request_url_path(&self) -> String {
        "/tx".to_string()
    }

    fn request_method(&self) -> &str {
        "POST"
    }

    fn request_body(&self) -> Vec<u8> {
        bitcoin::consensus::encode::serialize(&self.tx)
            .to_lower_hex_string()
            .as_bytes()
            .to_vec()
    }

    fn parse_response<E>(status: i32, body: &[u8]) -> Result<Self::Output, ResponseError<E>> {
        match status {
            200 => Ok(()),
            error_status => Err(ResponseError::HttpStatus {
                status: error_status,
                body: body.to_vec(),
            }),
        }
    }
}

Example for calling any endpoint with minreq:

pub fn call_with_minreq<R: HttpRequest>(
    url_base: &str,
    request: R,
) -> Result<R::Output, ResponseError<minreq::Error>> {
    let req = match request.request_method() {
        "GET" => minreq::get(format!("{}{}", url_base, request.request_url_path())),
        "POST" => minreq::post(format!("{}{}", url_base, request.request_url_path())),
        unhandled_request_method => {
            panic!("unexpected request method: {}", unhandled_request_method)
        }
    }
    .with_body(request.request_body());

    let resp = req.send().map_err(ResponseError::Client)?;
    R::parse_response(resp.status_code, resp.as_bytes())
}

Example for calling any endpoint with reqwest:

pub async fn call_with_reqwest<R: HttpRequest>(
    client: Client,
    url_base: &str,
    request: R,
) -> Result<R::Output, ResponseError<reqwest::Error>> {
    let method =
        reqwest::Method::from_str(request.request_method()).expect("request method must be valid");
    let resp = client
        .request(
            method,
            &format!("{}{}", url_base, request.request_url_path()),
        )
        .body(request.request_body().to_vec())
        .send()
        .await
        .map_err(ResponseError::Client)?;
    let status = resp.status().as_u16() as i32;
    let body = resp.bytes().await.map_err(ResponseError::Client)?.to_vec();
    R::parse_response(status, &body)
}

In Conclusion

As can be seen, the code needed for a HTTP client crate to work with the trait is minimal.

@evanlinjin evanlinjin added the enhancement New feature or request label Sep 6, 2024
@evanlinjin evanlinjin added this to BDK Sep 6, 2024
@evanlinjin evanlinjin moved this to Discussion in BDK Sep 6, 2024
@evanlinjin
Copy link
Member Author

@oleonardolima
Copy link
Collaborator

oleonardolima commented Sep 6, 2024

Indeed it's a problem, and I've been facing this while I work on adding Tor support with arti-client on #67.

Thanks for the suggestion! I've been working on this one, but with a different approach, here: 9cbc387

It mainly focuses on having a trait (e.g trait AsyncEsploraClient) that have the methods: get_response, get_opt_response, get_response_json, and so on ...

AFAICT it would require to anyone that's implementing a new client just to implement this trait for the client, as each client could (e.g reqwest) or not have (e.g hyper) useful methods for parsing, as long as they're parse to the correct types.

I would say it's on a higher level, specifically for the methods+parsing we expect. It wouldn't require for us and users have to implement a trait for each method on esplora API 🤔.

I'm wondering which one of both approaches would require minimal changes or be better on the long run. I'll tidy up my PR and also take a deeper look on your suggestion.

@evanlinjin
Copy link
Member Author

evanlinjin commented Sep 6, 2024

I think it's okay to break things completely since it's just a client library. Aiming for "minimal changes" shouldn't be one of our requirements imo.


I would say it's on a higher level, specifically for the methods+parsing we expect. It wouldn't require for us and users have to implement a trait for each method on esplora API 🤔.

Correction: It's us (the devs of rust-esplora-client) that has to create a type for each endpoint and impl a trait for each endpoint type. The users of this library definitely do not have to do this. Users just create a single function for a given HTTP client type (if it's not provided). This single function works with all endpoints.


Abstracting out the transport (IO) means we need two traits, one for blocking and one for async. Abstracting out the request formation and response handling means we only need one trait (since this will not change regardless of the transport).

Requiring users to impl a trait for an HTTP client type means they need to wrap the HTTP client type. Since one cannot impl an external trait for an external type.


However, having each endpoint as a method on a type is a nicer API. It makes all endpoints easier to find than separate types for each endpoint.

@tnull
Copy link
Contributor

tnull commented Sep 9, 2024

I think it's okay to break things completely since it's just a client library. Aiming for "minimal changes" shouldn't be one of our requirements imo.

In this regard, I want to mention that we're indeed leaning on specific error codes and reqwest error variants to discern the reasons why a specific call failed. If the updated API would still allow to do this, I agree that API breaks are not the end of the world and I'm happy to switch. However, it would be an issue for us, e.g., if the error types would get much less expressive due to the need of finding the least common denominator (in fact, we'd highly appreciate more expressive error types to allow discerning different HTTP 400 errors which is currently only possibly by matching on the error message, i.e., error-prone).

@oleonardolima
Copy link
Collaborator

Abstracting out the transport (IO) means we need two traits, one for blocking and one for async. Abstracting out the request formation and response handling means we only need one trait (since this will not change regardless of the transport).

Requiring users to impl a trait for an HTTP client type means they need to wrap the HTTP client type. Since one cannot impl an external trait for an external type.

Cool! I've thinking a bit about both approaches. I will move and explore more towards your proposed one.

As minimal changes are not a problem, it's better to make it generic over the request/response instead of IO. It becomes simple enough for the user, and more future-proof instead of the one I was working on.

--

In this regard, I want to mention that we're indeed leaning on specific error codes and reqwest error variants to discern the reasons why a specific call failed. If the updated API would still allow to do this, I agree that API breaks are not the end of the world and I'm happy to switch. However, it would be an issue for us, e.g., if the error types would get much less expressive due to the need of finding the least common denominator (in fact, we'd highly appreciate more expressive error types to allow discerning different HTTP 400 errors which is currently only possibly by matching on the error message, i.e., error-prone).

Good! Thanks for raising awareness on error handling, I'll have that in mind while working at it.

@evanlinjin
Copy link
Member Author

I think there is no point in having multiple request methods on HttpRequest. We can either introduce a type alias or a struct with public fields. I'm leaning towards struct with public fields (i.e. named RequestData) since it's easier to work with for the user.

@LLFourn
Copy link
Contributor

LLFourn commented Sep 15, 2024

Why would anyone care about what HTTP client is being used under the hood? I mean as long as it does HTTP correctly.

iirc the reason we had two different HTTP clients in the first place was that @tcharding needed something that compiled to SGX. reqwest didn't but ureq did. Then somehow we got minireq (can't remember why).

I think we should just choose an HTTP client and stick with it. We should use the same for blocking and non-blocking or even consider removing the non-blocking version.

@evanlinjin
Copy link
Member Author

evanlinjin commented Sep 16, 2024

Why would anyone care about what HTTP client is being used under the hood? I mean as long as it does HTTP correctly.

I can think of a couple examples where this is useful.

  1. If someone wants to access Esplora via some specialized network (i.e. TOR with something like arti-client).
  2. If someone is serving Esplora on a webserver with some other HTTP services and is already accessing the services with another HTTP client library and they would like to reuse the TCP connection.
  3. A combination of the aforementioned points.

I think we should just choose an HTTP client and stick with it. We should use the same for blocking and non-blocking or even consider removing the non-blocking version.

I agree with this. However, making this library easily extensible would be useful imo.

@LLFourn
Copy link
Contributor

LLFourn commented Sep 16, 2024

If someone wants to access Esplora via some specialized network (i.e. TOR with something like arti-client).

The better way to design this is to have the option for the user to pass in the TCP connection (or whatever) to the HTTP client:

https://docs.rs/hyper/latest/hyper/client/conn/http1/fn.handshake.html

If someone is serving Esplora on a webserver with some other HTTP services and is already accessing the services with another HTTP client library and they would like to reuse the TCP connection.

This seems severely under motivated. I'm pretty sure we are making a bunch of parallel connections anyway. Unless we are using HTTP 2.0 now?

The question about whether HTTP 2.0 is important because if we are creating lots of parallel connections then we'd have to pass in a "connection builder" to do parallel requests. If it's HTTP 2.0 we can just reuse the same connection always.

I agree with this. However, making this library easily extensible would be useful imo.

Yes but not extensible with multiple HTTP implementations. There's one way to do it. Stick to one implementation.

@oleonardolima
Copy link
Collaborator

oleonardolima commented Sep 16, 2024

The better way to design this is to have the option for the user to pass in the TCP connection (or whatever) to the HTTP client:

https://docs.rs/hyper/latest/hyper/client/conn/http1/fn.handshake.html

I do agree, but AFAICT hyper is the only one that currently supports this approach/design of replaceable transport. It looks like ureq will also have a similar feature on v3, but it isn't out yet.

Yes but not extensible with multiple HTTP implementations. There's one way to do it. Stick to one implementation.

Still, we'd need to have a single implementation that does both blocking/non-blocking, right?

However, we could also be opinionated about that (e.g supporting only blocking), as long as the user has alternatives.

I think that by making it extensible, we can choose to support a single implementation (e.g. minreq), while still making it simple enough for the user who would like to use something different to implement a custom handler with its HTTP client of choice.

@tcharding
Copy link
Contributor

iirc the reason we had two different HTTP clients in the first place was that @tcharding needed something that compiled to SGX

I might be guilty but I don't remember sorry, it was almost 3 years ago I stopped messing with SGX.

@evanlinjin
Copy link
Member Author

The better way to design this is to have the option for the user to pass in the TCP connection (or whatever) to the HTTP client.

Yes but not extensible with multiple HTTP implementations. There's one way to do it. Stick to one implementation.

@LLFourn, I guess you are suggesting that we stick to hyper, since, as @oleonardolima pointed out, it's the only HTTP client crate that allows the transport layer to be custom.

I think this is a great long-term solution. However, there is a lot of features that reqwest does nicely that pure hyper doesn't by itself. I.e. proxy stuff via CONNECT, TLS, figuring out which version of HTTP to use via TLS-ALPN.

We can figure how to provide these features out-of-the-box with hyper, but it will take time.

For TOR HTTP support via arti-client, I think it makes sense to do it via hyper (since it's the only way to do it right now).

Still, we'd need to have a single implementation that does both blocking/non-blocking, right?

I think it's easier to run async code in a blocking code-base than it is to do the reverse. I would be okay to support async only, and get users to use something like block_on for blocking.


In summary, I think for now, at a minimum, we should support both reqwest and hyper. reqwest would be there to have those nice features baked in. hyper is there so that we can swap-out the transport.

I think my proposal still makes sense for this. It's either we abstract the transport side of things, or we abstract out the data model. Working with reqwest vs hyper directly will be very different.

@oleonardolima
Copy link
Collaborator

I think this is a great long-term solution. However, there is a lot of features that reqwest does nicely that pure hyper doesn't by itself. I.e. proxy stuff via CONNECT, TLS, figuring out which version of HTTP to use via TLS-ALPN.

We can figure how to provide these features out-of-the-box with hyper, but it will take time.

ACK on it, I think it intersects with the ecosystem pain of not having an async HTTP client alternative to reqwest with minimal/slim dependencies.

Also agree it would be a great addition not only for rust-esplora-client scope, but the ecosystem in general, but as aforementioned it'll take time.

I also think it should be a different project/library that this one would rely on, and could lead to removing hyper and relying on it (as it's built on top of it).

In summary, I think for now, at a minimum, we should support both reqwest and hyper. reqwest would be there to have those nice features baked in. hyper is there so that we can swap-out the transport.

I think the only problem with reqwest, is its huge number of dependencies, and some problems caused by that, such as bitcoindevkit/bdk#1504.

However, it's an issue we already have today, until we have a better async option for the ecosystem, as it were discussed about at the summit.

I think my proposal still makes sense for this. It's either we abstract the transport side of things, or we abstract out the data model. Working with reqwest vs hyper directly will be very different.

Yes, I think we can keep this approach, starting at #99, but only with two supported clients for now, AsyncHyperClient and AsyncReqwestClient?

Abstracting the data model would allow us to still have a BlockingClient with minreq as an example, and allow the user to use any other one, but I now feel I need think a little bit more about it.

@tnull
Copy link
Contributor

tnull commented Sep 19, 2024

Hmm, so I'm wondering: if everybody is aware of the issues reqwest has (huge dependency tree in general, questionable implementation for blocking), and we all agree that eventually we should switch to use the new HTTP client project once someone finds the time working on it, shouldn't we just wait with the refactor until then?

Or to ask differently: what concrete/pressing issue is the complete refactoring of the rust-esplora-client crate solving that warrants spending time and effort on throwaway work (since we agree we'd want to redo it again soon) now, as well as the potential for introducing new bugs? Does anyone have a concrete pain point that needs to be addressed?

@evanlinjin
Copy link
Member Author

Or to ask differently: what concrete/pressing issue is the complete refactoring of the rust-esplora-client crate solving that warrants spending time and effort on throwaway work (since we agree we'd want to redo it again soon) now, as well as the potential for introducing new bugs? Does anyone have a concrete pain point that needs to be addressed?

The pain point is how to use a custom transport (not TCP) such as TOR. To do this with our current codebase means duplicating a lot of code.

There will be no throwaway work. In the future, if there is a new HTTP client which allows us to swap out the transport, with all those amazing features built in, this proposal will allow us to support it with ease with no breaking changes.

I think bugs will be to a minimum. This is a HTTP client library after all. As long as we test every endpoint, it is fine.

@LLFourn
Copy link
Contributor

LLFourn commented Sep 27, 2024

I think this is a great long-term solution. However, there is a lot of features that reqwest does nicely that pure hyper doesn't by itself. I.e. proxy stuff via CONNECT, TLS, figuring out which version of HTTP to use via TLS-ALPN.

We can figure how to provide these features out-of-the-box with hyper, but it will take time.

Five hours to be exact! It took me much longer than I thought. I wanted to figure out whether what I was suggesting would work and so I just I did it: https://github.com/LLFourn/hyper-http-cli

I implemented CONNECT proxy and ALPN. These were not really the hard bits. The hard bit is actually just figuring how how to glue these things together and all the circumstances where something would only work on one HTTP server but not others. There are several not obvious "util" crates you need to get things done.

I'd say hyper + tokio + rustls is the way to go despite my experience being a bit hairy. At least hyper is mostly decoupled from TLS and tokio so either could be switched out later. It would be way better if we had HTTP and TLS implementations that were done sans-io but this is world we live in.

For TOR HTTP support via arti-client, I think it makes sense to do it via hyper (since it's the only way to do it right now).

Still, we'd need to have a single implementation that does both blocking/non-blocking, right?

I think it's easier to run async code in a blocking code-base than it is to do the reverse. I would be okay to support async only, and get users to use something like block_on for blocking.

Yes I think we should only support async especially in this crate where that is mandatory due to needing to support WASM.

In summary, I think for now, at a minimum, we should support both reqwest and hyper. reqwest would be there to have those nice features baked in. hyper is there so that we can swap-out the transport.

I don't see how this works. How are you going to use ALPN logic from reqwest with arti for example. The main benefit of reqwest is that it supports WASM transparently -- which is part of the reason it doesn't support BYO TCP/TLS connection. I guess that we will want reqwest if the target is wasm and just implement the client purely with reqwest with a bunch of stuff disabled if it's WASM.

Abstracting the data model would allow us to still have a BlockingClient with minreq as an example, and allow the user to use any other one, but I now feel I need think a little bit more about it.

I really think there is no benefit to using different HTTP impls. We should just choose the best one that:

  1. Allows us to BYO TCP connections/TLS connection/whatever
  2. Supports HTTP2

As far as I can see that's hyper. I wouldn't really mind if we had a blocking client that satisfied these two things. We could manage two different clients like we do now. But if hyper is the answer then I'd say we just go async only and leave the fake "blocking" to the user.

@evanlinjin
Copy link
Member Author

Just a side note, I think fake blocking is very easy to do.

@LLFourn
Copy link
Contributor

LLFourn commented Sep 28, 2024

If we wanted to take flexibility to the next level we could implement the HTTP protocol sans-io on the bdk_esplora side like:

pub struct EsploraSyncSession<I> {
   sync_request: SyncRequest<I>
}

impl EsploraSyncSession {
    pub fn poll_request(&mut self) -> Option<(RequestHandle, ResourceRequest)> {
        // e.g.  take stuff from internal `sync_request` and turn it into `ResourceRequest`
        todo!()
    }

    pub fn provide_response(&mut self, handle: RequestHandle, Resource) -> Result<(), ResponseError> {
       // insert stuff into internal SyncResult
       todo!()
    }
  
    pub fn is_finished(&self) -> bool { todo!() }

    pub fn into_update(self) -> TxUpdate<...> { todo!() }
}


struct ResourceRequest {
    path: String // e.g. "/api/blocks/tip/height",
    method: &'static str // e.g. GET, POST
    body: Vec<u8> // mostly (maybe always?) empty
}

// leave up to imagination to fill in the other things

Then we wouldn't even need this crate for syncing in bdk. We could provide bdk_http_client that uses tokio + rustls + hyper to just implement polling a generic HttpProtocol (which would define poll_request, provide_response) to gather ResourceRequests, fetch them and provide the Resource back until its done. Both EsploraSyncSession and a "full scan" variant can be implementations of HttpProtocol trait.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Discussion
5 participants