Opening from a csv file, support Polars code execution in custom operations #342

Open
biiiipy opened this issue Nov 16, 2024 · 2 comments
Labels
feature Feature request info needed

Comments

@biiiipy

biiiipy commented Nov 16, 2024

Environment data

  • VS Code version: 1.95.2
  • Data Wrangler Extension version (available under the Extensions sidebar): v1.12.1
  • Jupyter Extension version (available under the Extensions sidebar): v2024.10.0
  • Python Extension version (available under the Extensions sidebar): v2024.20.0
  • OS (Windows | Mac | Linux distro) and version: Windows
  • Pandas version: None
  • Python and/or Anaconda version: 3.11.4
  • Type of virtual environment used (N/A | venv | virtualenv | conda | ...): N/A

Expected behaviour

Data Wrangler can execute Polars code in custom operations window

Actual behaviour

Exception (shown as a screenshot in the original issue):

I think it is because opening plain csv doesn't have any runtime context, so it defaults to using pandas and doesn't support other libraries.
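One way to check this guess is to ask the object itself which package defines its class. The helper below is a minimal sketch; `backing_library` is a hypothetical name and not part of Data Wrangler, pandas, or Polars. It is demonstrated on standard-library objects so it runs anywhere.

```python
from fractions import Fraction


def backing_library(obj) -> str:
    """Return the top-level package that defines obj's class."""
    return type(obj).__module__.split(".")[0]


# Demonstration with standard-library objects only.
print(backing_library([]))              # builtins
print(backing_library(Fraction(1, 2)))  # fractions
```

Called as `backing_library(df)` in a custom operation, this would be expected to report `pandas` for a pandas DataFrame and `polars` for a Polars one, which shows which API the `df` variable accepts.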

Steps to reproduce:

  1. Open a CSV file
  2. Click "Open in Data Wrangler"
  3. Write Polars DataFrame code such as df = df.sort("value", reverse=True) and execute it

Logs

Output for Jupyter in the Output panel (View > Output, change the drop-down in the upper-right of the Output panel to Jupyter)

AttributeError: 'DataFrame' object has no attribute 'sort'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_40096\1255212943.py in ?(code, old_ns, new_ns)
     38             name = get_ipython().compile.cache(code)
     39         except Exception:
     40             name = "<string>"
     41 
---> 42         exec(compile(code, name, 'exec'), session['namespaces']["create"](new_ns, old_ns))

~\AppData\Local\Temp\ipykernel_40096\2332454470.py in ?()
      1 # Sort by value in descending order
----> 2 df = df.sort("value", reverse=True)

~\AppData\Roaming\Python\Python311\site-packages\pandas\core\generic.py in ?(self, name)
   6295             and name not in self._accessors
   6296             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297         ):
   6298             return self[name]
-> 6299         return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'sort'
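The last frame of the traceback is in pandas\core\generic.py, so `df` here is a pandas DataFrame, and pandas DataFrames do not have a `sort` method. As a stopgap, the same descending sort can be written with the pandas API. This is a minimal sketch; the sample data is made up, and only the column name `value` comes from the report.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the issue.
df = pd.DataFrame({"value": [1, 3, 2]})

# pandas equivalent of the Polars call df.sort("value", reverse=True)
df = df.sort_values("value", ascending=False)

print(df["value"].tolist())
```

This does not give the reporter what was requested (Polars code in custom operations); it only shows the pandas spelling that the current session accepts.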

@pwang347
Member

Hi @biiiipy, thanks for opening this issue! We don't currently support using Polars code to manipulate the DataFrame (we only support loading Polars DataFrames by converting them into Pandas).

For your use-case, do you mostly care that the exported code is in Polars? (e.g. you interact with the DataFrame during the interactive Data Wrangler session using the built-in operations UI and Pandas, and we translate the code on export, which can be used in a data pipeline written with Polars)

Or alternatively, is it more important to be able to work directly in Polars (for example, you have really large files you are working with locally that you are unable to effectively sample)?

Thanks!
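Since the session's `df` is a pandas DataFrame, one possible workaround is to round-trip through Polars by hand. The sketch below is not confirmed anywhere in this thread to work inside Data Wrangler. It assumes pandas, Polars, and pyarrow are installed in the kernel that runs the code; `sort_with_polars` is a hypothetical helper name. It also uses `descending=True`, the keyword in recent Polars releases, where older releases used `reverse`.

```python
try:
    import pandas as pd
    import polars as pl
    import pyarrow  # noqa: F401  (needed for the pandas <-> Polars conversion)
    HAVE_DEPS = True
except ImportError:
    # The sketch is skipped when any of the three packages is missing.
    HAVE_DEPS = False


def sort_with_polars(pandas_df, column):
    """Convert to Polars, sort descending, and convert back to pandas."""
    polars_df = pl.from_pandas(pandas_df)
    polars_df = polars_df.sort(column, descending=True)
    return polars_df.to_pandas()


if HAVE_DEPS:
    df = pd.DataFrame({"value": [1, 3, 2]})  # hypothetical sample data
    df = sort_with_polars(df, "value")
    print(df["value"].tolist())
```

Each call copies the data twice, so this is unlikely to help with the large-file case raised above.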

@pwang347 pwang347 added feature Feature request info needed labels Nov 20, 2024
@brianmcdonald

The second option would be preferred (although understandably more work from the dev perspective) as it allows the user to stick with using just Polars and avoids any potential translation issues or having to read both Polars and Pandas code.
