
[Bug]: CSV Export Creates Corrupted Files #6

Closed
arjun-katonic-ai opened this issue Nov 22, 2024 · 5 comments


arjun-katonic-ai commented Nov 22, 2024

Description:
The export_to_csv() function generates corrupted CSV files that cannot be opened correctly in standard spreadsheet applications like Microsoft Excel or Google Sheets. The issue occurs due to improper encoding, inconsistent delimiters, and unescaped special characters (e.g., commas, quotes, and newlines) in the exported data.

Reproduction Steps:

  1. Call export_to_csv() with data containing special characters such as quotes, commas, or newline characters.
  2. Attempt to open the generated file in Excel or a text editor.
  3. Observe one or more of the following issues:
    • File fails to open in Excel.
    • Rows appear misaligned due to improper handling of delimiters.
    • Special characters in the data are not escaped, breaking the CSV format.

Problematic Code Example:

with open('data.csv', 'w') as file:
    for row in data:
        file.write(','.join(row) + '\n')

Issues Identified:

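  • Improper Encoding: open() is called without an encoding argument, so the file is written in the platform's default encoding, which may not be able to represent non-ASCII characters and can lead to corrupted files.
  • Delimiter Issues: Fields containing commas aren't enclosed in quotes, so each embedded comma is read as a field separator and the columns shift.
  • Lack of Escaping: Quotes and newlines inside a field are written verbatim, so they are read back as field boundaries or row breaks.

Impact: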
This bug affects users exporting data with special characters or international text, resulting in unusable files or data loss.

Expected Behavior:

  • Generate valid CSV files adhering to the CSV standard.
  • Properly escape special characters and delimiters.
  • Use encoding compatible with international text (e.g., UTF-8).
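
For illustration, here is a minimal sketch of what "properly escaped" output looks like when it is produced with Python's csv module. The helper name rows_to_csv_text and the sample row are assumptions made up for this example; they are not part of the project.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize rows to CSV text in memory (hypothetical helper, not project code)."""
    buffer = io.StringIO(newline='')
    # Default dialect: fields are quoted only when they contain a delimiter,
    # a quote character, or a line break; embedded quotes are doubled.
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# One row containing a comma, a pair of quotes, and a newline.
sample = [['John, Doe', 'say "hi"', 'line1\nline2']]
text = rows_to_csv_text(sample)

# Reading the text back should recover the original fields unchanged.
parsed = list(csv.reader(io.StringIO(text, newline='')))
print(repr(text))
print(parsed == sample)
```

Each of the three fields is wrapped in quotes and the embedded quotes are doubled, so a CSV parser can recover the original values.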

arjun-katonic-ai commented Nov 25, 2024

QA:
The export_to_csv() function creates corrupted files when the data contains special characters such as commas or quotes. For example, consider this data:

data = [["John, Doe", "30", "New York"], ["Jane Doe", "25", "Los Angeles"]]
export_to_csv(data)

This produces:

John, Doe,30,New York
Jane Doe,25,Los Angeles

When opened in Excel, the first row appears misaligned because the name “John, Doe” isn’t enclosed in quotes. Can we address this issue by adhering to CSV standards?
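
The misalignment can also be checked without Excel. The sketch below rebuilds the output with the same ','.join approach as the problematic code and parses it with csv.reader. The variable names are chosen for this example only.

```python
import csv
import io

data = [["John, Doe", "30", "New York"], ["Jane Doe", "25", "Los Angeles"]]

# Same approach as the problematic code: join fields with commas, no quoting.
naive_text = ''.join(','.join(row) + '\n' for row in data)

# Parse the result the way a CSV consumer would.
parsed = list(csv.reader(io.StringIO(naive_text)))
field_counts = [len(row) for row in parsed]

print(parsed[0])      # ['John', ' Doe', '30', 'New York']
print(field_counts)   # [4, 3]
```

The first row comes back with four fields instead of three, which is the column shift seen in the spreadsheet.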

@arjun-katonic-ai

Developer:
Thanks for reporting this! I reviewed the export_to_csv() function, and here’s what I found:

  1. We’re not escaping special characters like commas, quotes, or newlines.
  2. The file is saved without specifying UTF-8 encoding, causing issues with non-ASCII characters.

I’ll refactor the function to:

  • Use Python’s csv module for proper CSV generation.
  • Ensure all special characters are escaped.
  • Set UTF-8 encoding as the default.

Working on the fix now.

@arjun-katonic-ai

Developer:
I’ve updated the export_to_csv() function to address the issues:

  1. Used the csv Module:
    csv.writer with quoting=csv.QUOTE_ALL wraps every field in quotes and doubles any embedded quotes, so commas, quotes, and newlines inside a field no longer break the row.

  2. Specified UTF-8 Encoding:
    Added UTF-8 encoding to handle international text.

  3. Enhanced Testing:
    Tested with data containing special characters, quotes, and newlines.

Updated code snippet:

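import csv

def export_to_csv(data, file_name='data.csv'):
    """Write rows to a UTF-8 CSV file, quoting every field."""
    try:
        # newline='' lets the csv module control line endings itself
        with open(file_name, 'w', newline='', encoding='utf-8') as file:
            writer = csv.writer(file, quoting=csv.QUOTE_ALL)
            writer.writerows(data)
        print(f"File exported successfully: {file_name}")
    except Exception as e:
        # Chain the original exception so its traceback is preserved
        raise ValueError(f"Failed to export CSV: {e}") from e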

The changes are in branch fix/csv-export-bug. Please validate and let me know if further adjustments are needed.

@arjun-katonic-ai

QA:
Tested the updated function with the following scenarios:

  • Special Characters: Successfully exported data with commas, quotes, and newlines.
  • International Text: Handled non-ASCII characters like “Élève” and “München” without issues.
  • File Integrity: Opened files in Excel and Google Sheets without alignment issues or errors.

Everything looks good. This fix resolves the issue completely. Great work!
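
For anyone who wants to repeat the check without opening a spreadsheet, a round-trip test along these lines could be used. This is only a sketch: export_to_csv below is a trimmed copy of the function from this thread (no print or error wrapping), and read_csv, check_round_trip, and the sample rows are assumptions made up for this example.

```python
import csv
import os
import tempfile


def export_to_csv(data, file_name='data.csv'):
    """Trimmed copy of the fixed function from this thread (assumption)."""
    with open(file_name, 'w', newline='', encoding='utf-8') as file:
        csv.writer(file, quoting=csv.QUOTE_ALL).writerows(data)


def read_csv(file_name):
    """Read the file back with the same encoding and newline handling."""
    with open(file_name, newline='', encoding='utf-8') as file:
        return list(csv.reader(file))


def check_round_trip(rows):
    """Export rows to a temporary file and confirm they read back unchanged."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'export.csv')
        export_to_csv(rows, path)
        return read_csv(path) == rows


# Hypothetical rows covering commas, quotes, newlines, and non-ASCII text.
rows = [
    ['John, Doe', 'He said "hi"', 'line one\nline two'],
    ['Élève', 'München', '30'],
]
round_trip_ok = check_round_trip(rows)
print(round_trip_ok)
```

If the exported rows read back identically, the escaping and encoding are at least self-consistent. How Excel or Google Sheets displays the file still has to be checked by hand.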

@arjun-katonic-ai

Developer:
Thanks for validating! I’ve merged the fix into the main branch. The function now writes UTF-8 CSV files with every field quoted, so embedded commas, quotes, and newlines are escaped correctly. Marking this issue as resolved.
