Opening from a csv file, support Polars code execution in custom operations #342

biiiipy · 2024-11-16T23:26:24Z

Environment data

VS Code version: 1.95.2
Data Wrangler Extension version (available under the Extensions sidebar): v1.12.1
Jupyter Extension version (available under the Extensions sidebar): v2024.10.0
Python Extension version (available under the Extensions sidebar): v2024.20.0
OS (Windows | Mac | Linux distro) and version: Windows
Pandas version: None
Python and/or Anaconda version: 3.11.4
Type of virtual environment used (N/A | venv | virtualenv | conda | ...): N/A

Expected behaviour

Data Wrangler can execute Polars code in the custom operations window.

Actual behaviour

Exception: AttributeError: 'DataFrame' object has no attribute 'sort' (full traceback under Logs)

I think this happens because a plain CSV file is opened without any runtime context, so Data Wrangler defaults to pandas and does not support other libraries.

Steps to reproduce:

Open a CSV file
Click "Open in Data Wrangler"
Write Polars DataFrame code such as df = df.sort("value", reverse=True) and execute it

Logs

Output for Jupyter in the Output panel (View→Output, change the drop-down in the upper-right of the Output panel to Jupyter)

AttributeError: 'DataFrame' object has no attribute 'sort'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_40096\1255212943.py in ?(code, old_ns, new_ns)
     38             name = get_ipython().compile.cache(code)
     39         except Exception:
     40             name = "<string>"
     41 
---> 42         exec(compile(code, name, 'exec'), session['namespaces']["create"](new_ns, old_ns))

~\AppData\Local\Temp\ipykernel_40096\2332454470.py in ?()
      1 # Sort by value in descending order
----> 2 df = df.sort("value", reverse=True)

~\AppData\Roaming\Python\Python311\site-packages\pandas\core\generic.py in ?(self, name)
   6295             and name not in self._accessors
   6296             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297         ):
   6298             return self[name]
-> 6299         return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'sort'
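
The traceback shows the custom operation running against a pandas DataFrame, which has no `sort` method. As a minimal sketch of the mismatch and a possible interim workaround, the snippet below reproduces the error outside Data Wrangler and then applies what should be the equivalent pandas call. The sample data is hypothetical; only the "value" column name comes from the report.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents; only "value" is taken from the report.
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [2, 3, 1]})

# The Polars-style call fails on a pandas DataFrame, as in the traceback.
try:
    df.sort("value", reverse=True)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# pandas spelling of the same operation: sort by "value" in descending order.
sorted_df = df.sort_values("value", ascending=False)
```

This only sidesteps the error while working in pandas; it does not make the custom operations window accept Polars code.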

pwang347 · 2024-11-20T00:17:30Z

Hi @biiiipy, thanks for opening this issue! We don't currently support using Polars code to manipulate the DataFrame (we only support loading from Polars DataFrames by converting them into Pandas).

For your use-case, do you mostly care that the exported code is in Polars? (e.g. you interact with the DataFrame during the interactive Data Wrangler session using the built-in operations UI and Pandas, and we translate the code on export, which can be used in a data pipeline written with Polars)

Or alternatively, is it more important to be able to work directly in Polars (for example, because you are working locally with really large files that you are unable to sample effectively)?

Thanks!

brianmcdonald · 2025-01-12T11:06:56Z

The second option would be preferred (although understandably more work from the dev perspective) as it allows the user to stick with using just Polars and avoids any potential translation issues or having to read both Polars and Pandas code.

pwang347 added the labels "feature" (Feature request) and "info needed" · Nov 20, 2024
