Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Column and Table comparison yields boolean output #10136

Open
1 task done
tokoko opened this issue Sep 16, 2024 · 6 comments
Open
1 task done

bug: Column and Table comparison yields boolean output #10136

tokoko opened this issue Sep 16, 2024 · 6 comments
Assignees
Labels
bug Incorrect behavior inside of ibis
Milestone

Comments

@tokoko
Copy link
Contributor

tokoko commented Sep 16, 2024

What happened?

Comparing Table and Column objects escapes ibis operator overrides and instead yields a boolean output. This is not necessarily wrong, but can lead to confusing errors when user is trying to use another table in a scalar subquery, but forgets to call .as_scalar method on the table. Instead of throwing an error, such a comparison will always be interpreted as a boolean literal false.

import ibis

con = ibis.connect("duckdb://")

con.create_table("orders", orders_pa)
con.create_table("stores", stores_pa)

print(
    ibis.to_sql(
        orders.select(
            orders["fk_store_id"] == stores.select("store_id").limit(1) # .as_scalar() call missing
        )
    )
)

What version of ibis are you using?

9.5.0

What backend(s) are you using, if any?

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@tokoko tokoko added the bug Incorrect behavior inside of ibis label Sep 16, 2024
@gforsyth gforsyth added this to the 10.0 milestone Sep 16, 2024
@anjakefala
Copy link
Contributor

/take

@anjakefala
Copy link
Contributor

Thank you for reporting this bug @tokoko, and for the start of a repro! I am going to investigate how the ibis over-ride escapes this.

@cpcloud What would be the expected behaviour? Would it be to error?

I put together a thorough repro:

import ibis
import pyarrow as pa

orders_data = pa.Table.from_pydict({
                "order_id": [1,2,3,4],
                "fk_store_id": [1,2,1,3],
                "amount": [100,200,150,300]
                })

stores_data = pa.Table.from_pydict({
    "store_id": [1,2,3],
    "name": ["Store A", "Store B", "Store C"]
    })

con = ibis.connect("duckdb://")
orders = con.create_table("orders", orders_data)
stores = con.create_table("stores", stores_data)

print("\nCase 1: Direct comparison")
expr = orders.select(
    orders["fk_store_id"],
    comparison_result=(orders["fk_store_id"] == stores.select("store_id").limit(1))
)
print("\nSQL generated:")
print(ibis.to_sql(expr))
print("\nResult:")
print(expr.execute())

print("\nCase 2: Correct usage with as_scalar()")
expr_correct = orders.select(
        orders["fk_store_id"],
        comparison_result=(orders["fk_store_id"] == stores.select("store_id").limit(1).as_scalar())
)
print("\nSQL generated:")
print(ibis.to_sql(expr_correct))
print("\nResult:")
print(expr_correct.execute())

# Case 3: What happens with different comparison types?
print("\nCase 3: Different comparison types")
# Column to literal
print("Column to literal:", type(orders["fk_store_id"] == 1))
# Column to Column
print("Column to Column:", type(orders["fk_store_id"] == orders["order_id"]))
# Column to Table
print("Column to Table:", type(orders["fk_store_id"] == stores.select("store_id").limit(1)))
# Column to Scalar
print("Column to Scalar:", type(orders["fk_store_id"] == stores.select("store_id").limit(1).as_scalar()))

This is the output:

Case 1: Direct comparison

SQL generated:
SELECT
  "t0"."fk_store_id",
  FALSE AS "comparison_result"
FROM "memory"."main"."orders" AS "t0"

Result:
   fk_store_id  comparison_result
0            1              False
1            2              False
2            1              False
3            3              False

Case 2: Correct usage with as_scalar()

SQL generated:
SELECT
  "t0"."fk_store_id",
  "t0"."fk_store_id" = (
    SELECT
      "t1"."store_id"
    FROM "memory"."main"."stores" AS "t1"
    LIMIT 1
  ) AS "comparison_result"
FROM "memory"."main"."orders" AS "t0"

Result:
   fk_store_id  comparison_result
0            1               True
1            2              False
2            1               True
3            3              False

Case 3: Different comparison types
Column to literal: <class 'ibis.expr.types.logical.BooleanColumn'>
Column to Column: <class 'ibis.expr.types.logical.BooleanColumn'>
Column to Table: <class 'bool'>
Column to Scalar: <class 'ibis.expr.types.logical.BooleanColumn'>

It seems only Columns compared with Tables is returning the boolean.

@anjakefala
Copy link
Contributor

It seems that in _binop, op_class is running into a SignatureValidationError which results to falling back to the default Python ==.

-> def _binop(op_class: type[ops.Binary], left: ir.Value, right: ir.Value) -> ir.Value:                                                                                                    
 > /home/anja/git/ibis/ibis/expr/types/core.py(803)_binop()                                                                                                                                 
 -> try:                                                                                                                                                                                    
 > /home/anja/git/ibis/ibis/expr/types/core.py(804)_binop()                                                                                                                                 
 -> node = op_class(left, right)                                                                                                                                                            
ibis.common.annotations.SignatureValidationError: Equals(r0 := DatabaseTable: memory.main.orders                                                                                           
   order_id    int64                                                                                                                                                                        
   fk_store_id int64                                                                                                                                                                        
   amount      int64                                                                                                                                                                        
                                                                                                                                                                                            
 fk_store_id: r0.fk_store_id, r0 := DatabaseTable: memory.main.stores                                                                                                                       
   store_id int64                                                                                                                                                                           
  name     string                                                                                                                                                                          
                                                                                                                                                                                            
 r1 := Project[r0]                                                                                                                                                                          
   store_id: r0.store_id                                                                                                                                                                    
                                                                                                                                                                                           
Limit[r1, n=1]) has failed due to the following errors:                                                                                                                                    
`right`: r0 := DatabaseTable: memory.main.stores                                                                                                                                         
 store_id int64                                                                                                                                                                           
  name     string                                                                                                                                                                          
                                                                                                                                                                                            
 r1 := Project[r0]                                                                                                                                                                          
   store_id: r0.store_id                                                                                                                                                                    
                                                                                                                                                                                          
 Limit[r1, n=1] is not coercible to a Value                                                                                                                                                 
                                                                                                                                                                                            
 Expected signature: Equals(left: Value, right: Value)                                                                                                                                      

@anjakefala
Copy link
Contributor

anjakefala commented Nov 14, 2024

My interpreation is that Ibis tried to create an Equals operation. The Equals operation required both operands to be Value types. It can't automatically coerce a Table to a Value. So Python's == is short-circuiting the comparison, and we lose the informative error message. I'll try to understand where the short circuiting is happening.

Edit Probably because of:

> /home/anja/git/ibis/ibis/expr/types/core.py(805)_binop()                                                                                                                                      
-> except (ValidationError, NotImplementedError):                                                                                                                                               
> /home/anja/git/ibis/ibis/expr/types/core.py(806)_binop()                                                                                                                                      
-> return NotImplemented      

And by default when there's a NotImplemented, it defaults to Python's ==.

@anjakefala
Copy link
Contributor

@cpcloud Should we be raising ValidationError instead of converting it into a NotImplemented?

@cpcloud
Copy link
Member

cpcloud commented Nov 16, 2024

I think for some cases NotImplemented might be necessary, but I would try raising the error in the __eq__ and __ne__ cases only and see if anything breaks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis
Projects
Status: backlog
Development

No branches or pull requests

4 participants