ENH: pd.to_sql(upsert=True, upsert_on_columns=['game_id']) #60434

Open
2 of 3 tasks
vile319 opened this issue Nov 27, 2024 · 0 comments
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

vile319 commented Nov 27, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I think it would be fantastic if upserting were built into the pandas .to_sql() function. At the moment I have to give my tables a unique index separately before calling to_sql, which is annoying.

Feature Description

Have the upsert option add a unique constraint to the target table on the columns the user specifies. Of course, users should be warned that this will remove duplicates (but if they want to upsert, they probably don't want duplicates in the first place). Something similar to the code below could run when calling pd.to_sql(upsert=True, upsert_on_columns=['game_id', 'player_id']):

```python
import sqlalchemy
from sqlalchemy import text


def create_table_with_unique_constraint(table_name, engine, unique_columns):
    """Create a new table with unique constraints and copy data from old table."""
    # Get column names and types from existing table
    inspector = sqlalchemy.inspect(engine)
    columns = inspector.get_columns(table_name)

    # Create column definitions preserving data types
    cols_sql = ', '.join([f'{col["name"]} {col["type"]}' for col in columns])
    unique_cols = ', '.join(unique_columns)

    # Create new table with unique constraint
    # (UNIQUE ... ON CONFLICT REPLACE is SQLite syntax)
    temp_table = f"{table_name}_temp"
    create_sql = f'CREATE TABLE {temp_table} ({cols_sql}, UNIQUE({unique_cols}) ON CONFLICT REPLACE)'

    with engine.begin() as conn:
        # Create new table
        conn.execute(text(create_sql))

        # Copy data from old to new table
        conn.execute(text(f"INSERT INTO {temp_table} SELECT * FROM {table_name}"))

        # Drop old table
        conn.execute(text(f"DROP TABLE {table_name}"))

        # Rename new table to original name
        conn.execute(text(f"ALTER TABLE {temp_table} RENAME TO {table_name}"))


def to_sql_upsert(df, table_name, engine, unique_columns):
    """
    Write DataFrame to SQL with upsert functionality.
    If table exists with unique constraint, appends directly.
    If not, creates table with constraint after appending.

    Args:
        df: DataFrame to write
        table_name: Name of target SQL table
        engine: SQLAlchemy engine
        unique_columns: List of columns for unique constraint
    """
    inspector = sqlalchemy.inspect(engine)

    # Check if table exists and has unique constraint
    has_constraint = False
    if inspector.has_table(table_name):
        unique_constraints = inspector.get_unique_constraints(table_name)
        for constraint in unique_constraints:
            if set(constraint['column_names']) == set(unique_columns):
                has_constraint = True
                break

    # Write data
    df.to_sql(table_name, engine, if_exists='append', index=False)

    # Add constraint if needed
    if not has_constraint:
        create_table_with_unique_constraint(table_name, engine, unique_columns)
```
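
To make the duplicate-removal warning concrete, here is a minimal sketch of the same rebuild step using only the standard-library `sqlite3` module. It assumes SQLite, since `ON CONFLICT REPLACE` on a `UNIQUE` constraint is SQLite syntax. The `games` table, its columns, and the `rebuild_with_unique` helper are made up for illustration and are not part of pandas.

```python
import sqlite3


def rebuild_with_unique(conn, table, columns_sql, unique_cols):
    """Copy `table` into a new table with UNIQUE(...) ON CONFLICT REPLACE.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly into the SQL, so only trusted names should be passed in.
    """
    temp = f"{table}_temp"
    unique = ", ".join(unique_cols)
    with conn:  # commit on success, roll back on error
        conn.execute(
            f"CREATE TABLE {temp} ({columns_sql}, "
            f"UNIQUE({unique}) ON CONFLICT REPLACE)"
        )
        # Copy in rowid order so that, among duplicates, the newest row wins.
        conn.execute(f"INSERT INTO {temp} SELECT * FROM {table} ORDER BY rowid")
        conn.execute(f"DROP TABLE {table}")
        conn.execute(f"ALTER TABLE {temp} RENAME TO {table}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (game_id INTEGER, player_id INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 10, 7), (2, 10, 3)],  # first two rows share the same key
)
conn.commit()

rebuild_with_unique(
    conn,
    "games",
    "game_id INTEGER, player_id INTEGER, score INTEGER",
    ["game_id", "player_id"],
)

rows = conn.execute(
    "SELECT game_id, player_id, score FROM games ORDER BY game_id"
).fetchall()
print(rows)
```

After the rebuild, only one row per `(game_id, player_id)` pair remains, and it is the one that was inserted last. The earlier duplicate is silently dropped, which is why a warning seems worthwhile.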

Alternative Solutions

I don't know of any existing functionality for this, and a solution as simple as this does not seem to exist in third-party packages.

Additional Context

This SHOULD be able to handle three cases: (A) you have no table and it is being created for the first time (it will automatically be made into a unique index table); (B) you have a table but it does not have unique on conflict replace (the old data will be copied over into a new unique index table); and (C) you have a table and it does have unique on conflict replace (it will just to_sql append). From there, SQL will automatically handle the upserting.
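
Case C can be checked in isolation with the standard-library `sqlite3` module. This is only a sketch under the assumption that the database is SQLite. The `games` table and the `append_rows` helper are hypothetical, with plain `INSERT` statements standing in for what `df.to_sql(..., if_exists='append')` would issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table that already carries the unique constraint (case C).
conn.execute(
    "CREATE TABLE games ("
    "game_id INTEGER, player_id INTEGER, score INTEGER, "
    "UNIQUE(game_id, player_id) ON CONFLICT REPLACE)"
)


def append_rows(conn, rows):
    """Plain INSERTs, standing in for an append-mode to_sql call (assumption)."""
    with conn:
        conn.executemany("INSERT INTO games VALUES (?, ?, ?)", rows)


append_rows(conn, [(1, 10, 5), (2, 10, 3)])
# Second batch: the key (1, 10) already exists, the key (3, 10) is new.
append_rows(conn, [(1, 10, 9), (3, 10, 4)])

result = conn.execute(
    "SELECT game_id, player_id, score FROM games ORDER BY game_id"
).fetchall()
print(result)
```

One caveat: SQLite's `REPLACE` conflict resolution deletes the conflicting row and inserts the new one, so this is a whole-row replacement rather than a column-by-column update.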
Please, let me know if you have any questions, and thank you!

@vile319 vile319 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 27, 2024