Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add related source code locations to errors #13662

Open
eliaperantoni opened this issue Dec 5, 2024 · 4 comments · May be fixed by #13664
Open

Add related source code locations to errors #13662

eliaperantoni opened this issue Dec 5, 2024 · 4 comments · May be fixed by #13664
Labels
enhancement New feature or request

Comments

@eliaperantoni
Copy link

Is your feature request related to a problem or challenge?

DataFusionError does not provide information about the location inside the SQL query that caused it. It's difficult for the end user to understand where, in a large query, to go look for the cause of the error.

Example:

WITH products_with_price AS (
	SELECT
		id,
		name,
		price_usd
	FROM
		products
	JOIN
		prices
	ON
		products.id = prices.product_id
)
SELECT
	id,
	name,
	price_usd + '$' AS display_price_usd
FROM
	products_with_price

When creating a logical plan for this query, we'd return:

Cannot coerce arithmetic expression Int64 + Utf8 to valid types

It's up to the user to figure out that the error lies on the third-to-last line.

Describe the solution you'd like

With the AST nodes from the parser now containing the source code location (thanks to apache/datafusion-sqlparser-rs#1435), it now becomes possible to add the same information to errors. This could seriously improve the experience for end users, since they would be able to clearly see where in the queries the errors lie.

In the example above, the desired feature of this ticket would be to add a reference to the third-to-last line, which is the cause of the error: price_usd + '$' AS display_price_usd.

Furthermore, it would be helpful to link to additonal segments of code note that, despite not representing the root cause of the error, help the user understand the context around the error and the elements involved. In this example, we'd highlight the projected price_usd column in the CTE to help the user understand where it comes from.

Describe alternatives you've considered

DataFusionError::Context only adds additional debug strings, but doesn't attach source code location info.

Additional context

No response

@eliaperantoni eliaperantoni added the enhancement New feature or request label Dec 5, 2024
@eliaperantoni eliaperantoni linked a pull request Dec 5, 2024 that will close this issue
@comphead
Copy link
Contributor

comphead commented Dec 5, 2024

Datafusion supports backtraces to find where the problem happens in the first place https://datafusion.apache.org/user-guide/crate-configuration.html#enable-backtraces

@eliaperantoni
Copy link
Author

Datafusion supports backtraces to find where the problem happens in the first place datafusion.apache.org/user-guide/crate-configuration.html#enable-backtraces

That's true, and I think it's an invaluable tool for debugging and developing DataFusion or software that relies on it. But in my opinion it's not really the best end user facing error reporting solution.

  • It's not really readable on its own (for end users, not developers)
  • Can't extract a nicer message from it
  • References DataFusion's source code locations, not the SQL queries' locations

Take this trace:

   3: datafusion_common::error::DataFusionError::get_back_trace
             at /Users/eper/code/datafusion/datafusion/common/src/error.rs:410:30
   4: datafusion_sql::utils::check_column_satisfies_expr
             at /Users/eper/code/datafusion/datafusion/sql/src/utils.rs:153:16
   5: datafusion_sql::utils::check_columns_satisfy_exprs
             at /Users/eper/code/datafusion/datafusion/sql/src/utils.rs:141:18
   6: datafusion_sql::select::<impl datafusion_sql::planner::SqlToRel<S>>::aggregate
             at /Users/eper/code/datafusion/datafusion/sql/src/select.rs:807:9
   7: datafusion_sql::select::<impl datafusion_sql::planner::SqlToRel<S>>::select_to_plan
             at /Users/eper/code/datafusion/datafusion/sql/src/select.rs:200:13
   8: datafusion_sql::query::<impl datafusion_sql::planner::SqlToRel<S>>::query_to_plan
             at /Users/eper/code/datafusion/datafusion/sql/src/query.rs:55:28
   9: datafusion_sql::statement::<impl datafusion_sql::planner::SqlToRel<S>>::sql_statement_to_plan_with_context_impl
             at /Users/eper/code/datafusion/datafusion/sql/src/statement.rs:220:40
  10: datafusion_sql::statement::<impl datafusion_sql::planner::SqlToRel<S>>::sql_statement_to_plan
             at /Users/eper/code/datafusion/datafusion/sql/src/statement.rs:183:9
  11: datafusion_sql::statement::<impl datafusion_sql::planner::SqlToRel<S>>::statement_to_plan
             at /Users/eper/code/datafusion/datafusion/sql/src/statement.rs:171:42
  12: datafusion::execution::session_state::SessionState::statement_to_plan::{{closure}}
             at /Users/eper/code/datafusion/datafusion/core/src/execution/session_state.rs:560:9
  13: datafusion::execution::session_state::SessionState::create_logical_plan::{{closure}}
             at /Users/eper/code/datafusion/datafusion/core/src/execution/session_state.rs:593:54

The query was SELECT name FROM users GROUP BY id, so the error is that name must be in the GROUP BY. The closest hint we have from the backtrace is the call to SqlToRel::aggregate but that's it.

With the changes that I'm contributing in #13664, the same query would produce diagnostics data that look like this:

(line 1, column 8) error: 'users.name' in projection does not appear in GROUP BY clause
(line 1, column 33) note: GROUP BY clause is here
(line 1, column 33) Help: Add 'users.name' to GROUP BY clause

@comphead
Copy link
Contributor

comphead commented Dec 6, 2024

(line 1, column 8) error: 'users.name' in projection does not appear in GROUP BY clause
(line 1, column 33) note: GROUP BY clause is here
(line 1, column 33) Help: Add 'users.name' to GROUP BY clause

Thanks @eliaperantoni
But this more looks for as a parse challenge 🤔 shouldn't be it in the parser phase?
https://github.com/apache/datafusion-sqlparser-rs

@eliaperantoni
Copy link
Author

(line 1, column 8) error: 'users.name' in projection does not appear in GROUP BY clause
(line 1, column 33) note: GROUP BY clause is here
(line 1, column 33) Help: Add 'users.name' to GROUP BY clause

Thanks @eliaperantoni But this more looks for as a parse challenge 🤔 shouldn't be it in the parser phase? apache/datafusion-sqlparser-rs

I think the parser and the logical planner capture different categories of errors. They do it already now, just not in a user-friendly way.

The parser would catch errors such as a syntactically invalid query (e.g. SELECT id name FROM users, missing a comma), while the logical planner would catch more sophisticated errors that appear in a grammatically correct query (e.g. SELECT id + name FROM users, trying to do an arithmetic sum between an integer and a string).

Though, you are right that the source location information must come from the parser, and we have contributed that here datafusion-sqlparser-rs#1435. It's just that the parser alone doesn't understand how the AST tree is being interpreted, the logical planner does that.

One key consideration is that in #13664 I'm enhancing errors from the logical planner only: syntactically invalid queries would still result in a user-unfriendly error from datafusion-sqlparser-rs. Perhaps we can, at some point, do the same to datafusion-sqlparser-rs too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants