Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add related source code locations to errors #13664

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,7 @@ sqlparser = { version = "0.52.0", features = ["visitor"] }
tempfile = "3"
tokio = { version = "1.36", features = ["macros", "rt", "sync"] }
url = "2.2"
derivative = "2.2.0"

[profile.release]
codegen-units = 1
Expand All @@ -177,3 +178,7 @@ large_futures = "warn"
[workspace.lints.rust]
unexpected_cfgs = { level = "warn", check-cfg = ["cfg(tarpaulin)"] }
unused_qualifications = "deny"

## Temp patch for sqlparser
[patch.crates-io]
sqlparser = { git = "https://github.com/apache/datafusion-sqlparser-rs.git", rev = "4ab3ab91473d152c652e6582b63abb13535703f9" }
1 change: 1 addition & 0 deletions datafusion/common/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,7 @@ pyo3 = { version = "0.22.0", optional = true }
recursive = { workspace = true }
sqlparser = { workspace = true }
tokio = { workspace = true }
derivative = { workspace = true }

[target.'cfg(target_family = "wasm")'.dependencies]
web-time = "1.1.0"
Expand Down
76 changes: 72 additions & 4 deletions datafusion/common/src/column.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,22 +18,38 @@
//! Column

use arrow_schema::{Field, FieldRef};
use sqlparser::tokenizer::Span;

use crate::error::_schema_err;
use crate::utils::{parse_identifiers_normalized, quote_identifier};
use crate::{DFSchema, Result, SchemaError, TableReference};
use crate::{
DFSchema, Diagnostic, DiagnosticEntry, DiagnosticEntryKind, Result, SchemaError,
TableReference,
};
use derivative::Derivative;
use std::collections::HashSet;
use std::convert::Infallible;
use std::fmt;
use std::str::FromStr;

/// A named reference to a qualified field in a schema.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
#[derive(Debug, Derivative)]
#[derivative(PartialEq, Eq, Hash, PartialOrd, Ord, Clone)]
pub struct Column {
/// relation/table reference.
pub relation: Option<TableReference>,
/// field/column name.
pub name: String,
#[derivative(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an interesting pattern -- it is different than the way sqlparser did it (which is to effectively wrap the span with a struct that is ignored when comparing eq, hash, etc

Did you consider a similar approach?

pub struct Column {
...
  // Wrapped span that does not contribute to PartialEq, Eq, Hash, etc
  pub span: DiagnosticOnly<Span>
}

I think this approach is less verbose and might be easier to understand for the casual reader

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also see WithSpan has a similar approach (but that is for wrapping things with Spans 🤔 )

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed interesting. 😅 I was going to use derivative for this problem in sqlparser initially actually, but I wasn't sure if you would want to add a new dependency. If that is not a concern, I think using derivative for cases like this is pretty nice as it is also clear at the usage site that it is being ignored, unlike AttachedToken which kind of hides the behavior. If a wrapper type is preferred, I now think actually maybe the struct in sqlparser should have been called DiagnosticOnly as well 😂

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the AttachedToken approach that we took in sqlparser could be a functionally equivalent alternative. And I understand that not everybody might be familiar with derivative.

What I don't 100% love about it is that the PartialEq implementation on AttachedToken is a bit "impure". In that, it serves the one specific purpose of making it ignored when used in a struct field, but prevents you from actually comparing two instances of it because they would always are equal (the implementation returns always true), which is something you might want to do at some point. Especially because the name AttachedToken doesn't clearly convey that intent, and does more than just being a passthrough for PartialEq since its main purpose is tying together a Span and a Token.

When looking at a struct that uses it:

#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
pub struct Select {
    /// Token for the `SELECT` keyword
    pub select_token: AttachedToken,
	// ...
}

it's not clear to me that AttachedToken is being ignored. I only realise that when I go look at the implementation of PartialEq.

Overall, the derivative approach to me more clearly lets me keep the sensible implementation of PartialEq for Span, while also conveying that "when used as a field in this particular struct, I want it to not influence the struct's comparison". But in other structs, I might want it to influence the comparison.

I personally find this approach more readable and flexible, but I'm open to converting to a wrapper type :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also see WithSpan has a similar approach (but that is for wrapping things with Spans 🤔 )

Ah yes good catch! I could definitely have used derivative for WithSpan too. And about WithSpan: do you like the general idea of it?

It's meant to be a "sidecar" to get a Span all the way to where it's needed to create a meaningful diagnostic, by attaching it to values that are sort-of-related, but wouldn't make a lot of sense to add as a struct field.

For example, the two DataTypes in a binary expression come from the two expressions in the SQL query, so they (in some sense) relate to a portion of the code. But putting span as a field of DataType sounds wrong.

Maybe a better name could be SpanRelated or something like that?.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my perspective some design goals should be:

  1. Make it easy for someone reading the DataFusion source code to understand what is going on
  2. If people are not interested in the span / diagnostic information they can ignore it

From my perspective, using new crates does reduce the amount of code in the DataFusion crate, but can often increase cognitive load (as now to understand the DataFusion code you need to understand the crates)

I think the AttachedToken approach that we took in sqlparser could be a functionally equivalent alternative. And I understand that not everybody might be familiar with derivative.

Yeah, that is why I think AttachedToken might be better in this case

PartialEq = "ignore",
Hash = "ignore",
PartialOrd = "ignore",
Ord = "ignore"
)]
/// The location in the source code where this column originally came from.
/// Does not change as the [`Column`] undergoes normalization and other
/// transformations.
pub spans: Vec<Span>,
}

impl Column {
Expand All @@ -50,6 +66,7 @@ impl Column {
Self {
relation: relation.map(|r| r.into()),
name: name.into(),
spans: vec![],
}
}

Expand All @@ -58,6 +75,7 @@ impl Column {
Self {
relation: None,
name: name.into(),
spans: vec![],
}
}

Expand All @@ -68,6 +86,7 @@ impl Column {
Self {
relation: None,
name: name.into(),
spans: vec![],
}
}

Expand Down Expand Up @@ -99,7 +118,11 @@ impl Column {
// identifiers will be treated as an unqualified column name
_ => return None,
};
Some(Self { relation, name })
Some(Self {
relation,
name,
spans: vec![],
})
}

/// Deserialize a fully qualified name string into a column
Expand All @@ -113,6 +136,7 @@ impl Column {
Self {
relation: None,
name: flat_name,
spans: vec![],
},
)
}
Expand All @@ -124,6 +148,7 @@ impl Column {
Self {
relation: None,
name: flat_name,
spans: vec![],
},
)
}
Expand Down Expand Up @@ -239,7 +264,33 @@ impl Column {

// If not due to USING columns then due to ambiguous column name
return _schema_err!(SchemaError::AmbiguousReference {
field: Column::new_unqualified(self.name),
field: Column::new_unqualified(&self.name),
})
.map_err(|err| {
err.with_diagnostic(|_| {
Diagnostic::new(
[DiagnosticEntry::new(
format!("Ambiguous reference to '{}'", &self.name),
DiagnosticEntryKind::Error,
*self.spans.first().unwrap_or(&Span::empty()),
)]
.into_iter()
.chain(
columns.iter().flat_map(|c| {
c.spans.iter().map(|span| {
DiagnosticEntry::new(
format!(
"Possible reference to '{}' here",
&c.name
),
DiagnosticEntryKind::Note,
*span,
)
})
}),
),
)
})
});
}
}
Expand All @@ -254,6 +305,23 @@ impl Column {
.collect(),
})
}

/// Attaches a [`Span`] to the [`Column`], i.e. its location in the source
/// SQL query.
pub fn with_span(mut self, span: Span) -> Self {
self.spans = vec![span];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this perhaps append a span to the Column rather than overwrite it?

self
}

/// Attaches multiple [`Span`]s to the [`Column`], i.e. its locations in the
/// source SQL query.
pub fn with_spans<IT>(mut self, spans: IT) -> Self
where
IT: IntoIterator<Item = Span>,
{
self.spans = spans.into_iter().collect();
self
}
}

impl From<&str> for Column {
Expand Down
42 changes: 42 additions & 0 deletions datafusion/common/src/dfschema/fields_spans.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
use sqlparser::tokenizer::Span;

use crate::JoinType;

/// The location in the source code where the fields are defined (e.g. in the
/// body of a CTE), if any.
#[derive(Debug, Clone)]
pub struct FieldsSpans(Vec<Vec<Span>>);

impl FieldsSpans {
pub fn empty(field_count: usize) -> Self {
Self((0..field_count).map(|_| Vec::new()).collect())
}

pub fn iter(&self) -> impl Iterator<Item = &Vec<Span>> {
self.0.iter()
}

pub fn join(
&self,
other: &FieldsSpans,
join_type: &JoinType,
_left_cols_len: usize,
) -> FieldsSpans {
match join_type {
JoinType::Inner | JoinType::Left | JoinType::Right | JoinType::Full => {
FieldsSpans(self.0.iter().chain(other.0.iter()).cloned().collect())
}
JoinType::LeftSemi => todo!(),
JoinType::RightSemi => todo!(),
JoinType::LeftAnti => todo!(),
JoinType::RightAnti => todo!(),
JoinType::LeftMark => todo!(),
}
}
}

impl FromIterator<Vec<Span>> for FieldsSpans {
fn from_iter<T: IntoIterator<Item = Vec<Span>>>(iter: T) -> Self {
FieldsSpans(iter.into_iter().collect())
}
}
Loading
Loading