Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce configurable DNS resolution timeout #1113

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

muzarski
Copy link
Contributor

fix: #1033

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@muzarski muzarski self-assigned this Oct 30, 2024
Copy link

github-actions bot commented Oct 30, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: 86e69d8

Comment on lines +765 to +829
/// Changes DNS hostname resolution timeout.
/// The default is 5 seconds.
///
/// # Example
/// ```
/// # use scylla::{Session, SessionBuilder};
/// # use std::time::Duration;
/// # async fn example() -> Result<(), Box<dyn std::error::Error>> {
/// let session: Session = SessionBuilder::new()
/// .known_node("127.0.0.1:9042")
/// .hostname_resolution_timeout(Duration::from_secs(10))
/// .build() // Turns SessionBuilder into Session
/// .await?;
/// # Ok(())
/// # }
/// ```
pub fn hostname_resolution_timeout(mut self, duration: Duration) -> Self {
self.config.hostname_resolution_timeout = Some(duration);
self
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should accept Option<Duration>, especially that the default is Some. Now it's impossible to set no timeout with SessionBuilder's API.

Copy link
Contributor Author

@muzarski muzarski Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to be consistent with other optional timeout options. See for example SessionBuilder::keepalive_interval() It accepts Duration, and sets the corresponding option to Some(duration) as well.

If user wants to set any of these to None, he can do so by direct access to the field (fields are public):

sb.config.hostname_resolution_timeout = None;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I dislike the idea of SessionBuilder exposing only subset of possible configuration options using methods, with some being available only through fields.
WDYT @Lorak-mmk ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could expose another methods (disable_hostname_resolution_timeout, disable_keepalive) which would be a bit more descriptive.

Comment on lines +282 to +286
/// Performs a DNS lookup with provided optional timeout.
async fn lookup_host_with_timeout<T: ToSocketAddrs>(
host: T,
hostname_resolution_timeout: Option<Duration>,
) -> Result<impl Iterator<Item = SocketAddr>, DnsLookupError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use impl ToSocketAddrs, as it's more concise and we get nothing with explicit type parameter here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I'm suprised that such a trait exists

Comment on lines +305 to 313
let addrs = match lookup_host_with_timeout(hostname, hostname_resolution_timeout).await {
Ok(addrs) => itertools::Either::Left(addrs),
// Use a default port in case of error, but propagate the original error on failure
Err(e) => {
let addrs = lookup_host((hostname, 9042)).await.or(Err(e))?;
let addrs = lookup_host_with_timeout((hostname, 9042), hostname_resolution_timeout)
.await
.or(Err(e))?;
itertools::Either::Right(addrs)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that the timeout is effectively twice the value provided in the config...
I'm not convinced it's correct. This must be at least noted in documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... you are right. I wonder whether we should leave it as is, or rather recompute the timeout for the second try in case first failed.

    let deadline = hostname_resolution_timeout.map(|dur| tokio::time::Instant::now() + dur);
    let addrs = match lookup_host_with_timeout(hostname, hostname_resolution_timeout).await {
        Ok(addrs) => itertools::Either::Left(addrs),
        // Use a default port in case of error, but propagate the original error on failure
        Err(e) => {
            let new_timeout =
                deadline.map(|deadline| deadline.duration_since(tokio::time::Instant::now()));
            let addrs = lookup_host_with_timeout((hostname, 9042), new_timeout)
                .await
                .or(Err(e))?;
            itertools::Either::Right(addrs)
        }
    };

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the first attempt failed from reasons different that a timeout, then such recomputation makes perfect sense. However, if it failed due to a timeout (because the wrong port somehow caused a timeout - idk how likely this scenario is), then we won't be able to do the second attempt.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas @Lorak-mmk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also execute resolve_hostname function with a given timeout (i.e. tokio::time::timeout(t, resolve_hostname)). It's much easier and cleaner than recomputing the timeout. However, in such case it's still true that if first attempt takes too long, we won't be able to do a second attempt.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can it happen that the port is not specified and what is the expected behavior of lookup_host in this case?

If it always fails immediately with a specific error, then we can check this error and perform lookup with default port with the same timeout.

If it can timeout, or take some time in general, then I'd perform another lookup with the same timeout and document it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my investigation, it looks like port is not used in the actual DNS lookup (which makes sense).

What's the difference between String and (String, u16), then?
When String is provided, both tokio and std::net expect that it is of form "host:port". If it's not the case, the error is returned immidiately. Otherwise, host part is used in DNS lookup which may timeout.

When (String, u16) is provided, the DNS lookup is performed based on the string - port is ignored during lookup (same as above).

This means that:

  • if hostname is of form "addr:port", then the first lookup timeouts iff the second lookup (with default port) timeouts
  • if hostname is not of this form, then the first lookup will fail immidiately. We can try with default port.

In that case, I think that we can match the error returned from lookup_host_with_timeout If it's something other than DnsLookupError::Timeout(_), then we can try again with the same timeout.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Let's describe this behavior in the docs.

This option is responsible for setting the DNS hostname
resolution timeout.

The default timeout is set to 5 seconds.
…nfig

These structs need this context to pass it forward to functions
responsible for DNS lookup.
Passed the hostname_resolution_timeout down to the functions
responsible for DNS resolution logic.

Created a pub(crate) error type to distinguish between errors that
can occur during hostname resolution. Notice: it's pub(crate) since
the users of this API only emit logs in case of error. The errors are
not passed to public API.

Created a `lookup_host_with_timeout` function, and extracted
some logic here. The purpose is not to introduce complex branching
in original function.
@muzarski muzarski force-pushed the dns-resolution-timeout branch from a01ba38 to 86e69d8 Compare December 11, 2024 16:05
@muzarski
Copy link
Contributor Author

Rebased on main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configurable DNS resolve timeout
3 participants