-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add related source code locations to errors #13664
Draft
eliaperantoni
wants to merge
9
commits into
apache:main
Choose a base branch
from
eliaperantoni:eper/diagnostics
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
e4bbffa
Temp patch to new sqlparser
alamb 3545647
Update for new sqlparser API
alamb 34c388a
feat: init diagnostics
eliaperantoni dee7959
feat: diagnostics for unknown columns and missing in `GROUP BY`
eliaperantoni 515bb5e
feat: `FieldSpans` and ambiguous column reference diagnostics
eliaperantoni cdee494
Merge branch 'main' into eper/diagnostics
eliaperantoni 03afbf6
feat: UNION wrong number of fields diagnostics
eliaperantoni 2d3e2a1
feat: collect more errors
eliaperantoni 0506a80
feat: ignorable errors
eliaperantoni File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,22 +18,38 @@ | |
//! Column | ||
|
||
use arrow_schema::{Field, FieldRef}; | ||
use sqlparser::tokenizer::Span; | ||
|
||
use crate::error::_schema_err; | ||
use crate::utils::{parse_identifiers_normalized, quote_identifier}; | ||
use crate::{DFSchema, Result, SchemaError, TableReference}; | ||
use crate::{ | ||
DFSchema, Diagnostic, DiagnosticEntry, DiagnosticEntryKind, Result, SchemaError, | ||
TableReference, | ||
}; | ||
use derivative::Derivative; | ||
use std::collections::HashSet; | ||
use std::convert::Infallible; | ||
use std::fmt; | ||
use std::str::FromStr; | ||
|
||
/// A named reference to a qualified field in a schema. | ||
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] | ||
#[derive(Debug, Derivative)] | ||
#[derivative(PartialEq, Eq, Hash, PartialOrd, Ord, Clone)] | ||
pub struct Column { | ||
/// relation/table reference. | ||
pub relation: Option<TableReference>, | ||
/// field/column name. | ||
pub name: String, | ||
#[derivative( | ||
PartialEq = "ignore", | ||
Hash = "ignore", | ||
PartialOrd = "ignore", | ||
Ord = "ignore" | ||
)] | ||
/// The location in the source code where this column originally came from. | ||
/// Does not change as the [`Column`] undergoes normalization and other | ||
/// transformations. | ||
pub spans: Vec<Span>, | ||
} | ||
|
||
impl Column { | ||
|
@@ -50,6 +66,7 @@ impl Column { | |
Self { | ||
relation: relation.map(|r| r.into()), | ||
name: name.into(), | ||
spans: vec![], | ||
} | ||
} | ||
|
||
|
@@ -58,6 +75,7 @@ impl Column { | |
Self { | ||
relation: None, | ||
name: name.into(), | ||
spans: vec![], | ||
} | ||
} | ||
|
||
|
@@ -68,6 +86,7 @@ impl Column { | |
Self { | ||
relation: None, | ||
name: name.into(), | ||
spans: vec![], | ||
} | ||
} | ||
|
||
|
@@ -99,7 +118,11 @@ impl Column { | |
// identifiers will be treated as an unqualified column name | ||
_ => return None, | ||
}; | ||
Some(Self { relation, name }) | ||
Some(Self { | ||
relation, | ||
name, | ||
spans: vec![], | ||
}) | ||
} | ||
|
||
/// Deserialize a fully qualified name string into a column | ||
|
@@ -113,6 +136,7 @@ impl Column { | |
Self { | ||
relation: None, | ||
name: flat_name, | ||
spans: vec![], | ||
}, | ||
) | ||
} | ||
|
@@ -124,6 +148,7 @@ impl Column { | |
Self { | ||
relation: None, | ||
name: flat_name, | ||
spans: vec![], | ||
}, | ||
) | ||
} | ||
|
@@ -239,7 +264,33 @@ impl Column { | |
|
||
// If not due to USING columns then due to ambiguous column name | ||
return _schema_err!(SchemaError::AmbiguousReference { | ||
field: Column::new_unqualified(self.name), | ||
field: Column::new_unqualified(&self.name), | ||
}) | ||
.map_err(|err| { | ||
err.with_diagnostic(|_| { | ||
Diagnostic::new( | ||
[DiagnosticEntry::new( | ||
format!("Ambiguous reference to '{}'", &self.name), | ||
DiagnosticEntryKind::Error, | ||
*self.spans.first().unwrap_or(&Span::empty()), | ||
)] | ||
.into_iter() | ||
.chain( | ||
columns.iter().flat_map(|c| { | ||
c.spans.iter().map(|span| { | ||
DiagnosticEntry::new( | ||
format!( | ||
"Possible reference to '{}' here", | ||
&c.name | ||
), | ||
DiagnosticEntryKind::Note, | ||
*span, | ||
) | ||
}) | ||
}), | ||
), | ||
) | ||
}) | ||
}); | ||
} | ||
} | ||
|
@@ -254,6 +305,23 @@ impl Column { | |
.collect(), | ||
}) | ||
} | ||
|
||
/// Attaches a [`Span`] to the [`Column`], i.e. its location in the source | ||
/// SQL query. | ||
pub fn with_span(mut self, span: Span) -> Self { | ||
self.spans = vec![span]; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this perhaps append a span to the Column rather than overwrite it? |
||
self | ||
} | ||
|
||
/// Attaches multiple [`Span`]s to the [`Column`], i.e. its locations in the | ||
/// source SQL query. | ||
pub fn with_spans<IT>(mut self, spans: IT) -> Self | ||
where | ||
IT: IntoIterator<Item = Span>, | ||
{ | ||
self.spans = spans.into_iter().collect(); | ||
self | ||
} | ||
} | ||
|
||
impl From<&str> for Column { | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
use sqlparser::tokenizer::Span; | ||
|
||
use crate::JoinType; | ||
|
||
/// The location in the source code where the fields are defined (e.g. in the | ||
/// body of a CTE), if any. | ||
#[derive(Debug, Clone)] | ||
pub struct FieldsSpans(Vec<Vec<Span>>); | ||
|
||
impl FieldsSpans { | ||
pub fn empty(field_count: usize) -> Self { | ||
Self((0..field_count).map(|_| Vec::new()).collect()) | ||
} | ||
|
||
pub fn iter(&self) -> impl Iterator<Item = &Vec<Span>> { | ||
self.0.iter() | ||
} | ||
|
||
pub fn join( | ||
&self, | ||
other: &FieldsSpans, | ||
join_type: &JoinType, | ||
_left_cols_len: usize, | ||
) -> FieldsSpans { | ||
match join_type { | ||
JoinType::Inner | JoinType::Left | JoinType::Right | JoinType::Full => { | ||
FieldsSpans(self.0.iter().chain(other.0.iter()).cloned().collect()) | ||
} | ||
JoinType::LeftSemi => todo!(), | ||
JoinType::RightSemi => todo!(), | ||
JoinType::LeftAnti => todo!(), | ||
JoinType::RightAnti => todo!(), | ||
JoinType::LeftMark => todo!(), | ||
} | ||
} | ||
} | ||
|
||
impl FromIterator<Vec<Span>> for FieldsSpans { | ||
fn from_iter<T: IntoIterator<Item = Vec<Span>>>(iter: T) -> Self { | ||
FieldsSpans(iter.into_iter().collect()) | ||
} | ||
} |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an interesting pattern -- it is different than the way sqlparser did it (which is to effectively wrap the span with a struct that is ignored when comparing eq, hash, etc
Did you consider a similar approach?
I think this approach is less verbose and might be easier to understand for the casual reader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also see
WithSpan
has a similar approach (but that is for wrapping things with Spans 🤔 )There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed interesting. 😅 I was going to use derivative for this problem in sqlparser initially actually, but I wasn't sure if you would want to add a new dependency. If that is not a concern, I think using derivative for cases like this is pretty nice as it is also clear at the usage site that it is being ignored, unlike
AttachedToken
which kind of hides the behavior. If a wrapper type is preferred, I now think actually maybe the struct in sqlparser should have been called DiagnosticOnly as well 😂There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the
AttachedToken
approach that we took insqlparser
could be a functionally equivalent alternative. And I understand that not everybody might be familiar withderivative
.What I don't 100% love about it is that the
PartialEq
implementation onAttachedToken
is a bit "impure". In that, it serves the one specific purpose of making it ignored when used in a struct field, but prevents you from actually comparing two instances of it because they would always are equal (the implementation returns alwaystrue
), which is something you might want to do at some point. Especially because the nameAttachedToken
doesn't clearly convey that intent, and does more than just being a passthrough forPartialEq
since its main purpose is tying together aSpan
and aToken
.When looking at a struct that uses it:
it's not clear to me that
AttachedToken
is being ignored. I only realise that when I go look at the implementation ofPartialEq
.Overall, the
derivative
approach to me more clearly lets me keep the sensible implementation ofPartialEq
forSpan
, while also conveying that "when used as a field in this particular struct, I want it to not influence the struct's comparison". But in other structs, I might want it to influence the comparison.I personally find this approach more readable and flexible, but I'm open to converting to a wrapper type :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes good catch! I could definitely have used
derivative
forWithSpan
too. And aboutWithSpan
: do you like the general idea of it?It's meant to be a "sidecar" to get a
Span
all the way to where it's needed to create a meaningful diagnostic, by attaching it to values that are sort-of-related, but wouldn't make a lot of sense to add as a struct field.For example, the two
DataType
s in a binary expression come from the two expressions in the SQL query, so they (in some sense) relate to a portion of the code. But puttingspan
as a field ofDataType
sounds wrong.Maybe a better name could be
SpanRelated
or something like that?.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective some design goals should be:
From my perspective, using new crates does reduce the amount of code in the DataFusion crate, but can often increase cognitive load (as now to understand the DataFusion code you need to understand the crates)
Yeah, that is why I think AttachedToken might be better in this case