Antje Janosch edited this page Nov 7, 2022 · 12 revisions

(valid for Python Integration >= 4.1, 2020-09)

Python Requirements

The Python nodes work with a local Python installation. The preference settings let the user set the path to a Python executable for Python 3 and for Python 2, and select which of the two the nodes should use.

Required Packages

pandas >= 0.23.4
pickle
matplotlib
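
Whether an installation meets these requirements can be checked by running a short script with the Python executable configured in the preferences. The following is a minimal sketch and not part of the nodes; the helper names version_tuple and check_requirements are made up for illustration.

```python
import importlib
import re


def version_tuple(text):
    """Turn a version string such as '0.23.4' into a comparable tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def check_requirements():
    """Return a list of human-readable problems (empty if everything is fine)."""
    problems = []
    for name in ("pandas", "pickle", "matplotlib"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(name + " is not installed")
            continue
        # Only pandas has a minimum version in the list above.
        if name == "pandas" and version_tuple(module.__version__) < (0, 23, 4):
            problems.append("pandas " + module.__version__ + " is older than 0.23.4")
    return problems


for problem in check_requirements():
    print(problem)
```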

Python Nodes concept and architecture

Data transfer to/from Python

Data is exchanged between the Python node and Python as CSV. The tables below give an overview of data type support. KNIME RowIDs are transferred as the index of the pandas data frame.
Experimental features of pandas 1.x, such as consistent missing value support and dedicated string/boolean column types, are currently not implemented.

| KNIME column | Pandas column | Representation of missing values |
|---|---|---|
| String (or String Compatible) | Object | NaN |
| Double | Float64 | NaN |
| Int | Int64 or Float64 (if it contains missing values) | NaN |
| Long | - | (no way to represent missing values) |
| Bool | Bool | True (bug in pandas import) |
| Date/Time/Durations | | |
| LocalDate | Datetime64[ns] (no extreme value support) | NaT |
| LocalDateTime | Datetime64[ns] (no extreme value support) | NaT |
| LocalTime | Datetime64[ns] (adds the current date to the time!) | NaT |
| Duration | Timedelta64 (ISO 8601 String import with parsing bugs on pandas side, recommendation to transfer as String instead) | NaT |
| Period | Timedelta64 (ISO 8601 String import with parsing bugs on pandas side, recommendation to transfer as String instead) | NaT |

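
Several entries in this table follow from how pandas reads CSV. The sketch below uses a small, made-up CSV snippet (the actual file layout written by the node is not shown on this page and may differ) to illustrate why an Int column with missing values arrives as Float64 and how the RowIDs end up as the data frame index.

```python
import io

import pandas as pd

# Made-up CSV content; the first column plays the role of the KNIME RowID.
csv_text = (
    "RowID,count_full,count_gaps,score\n"
    "Row0,1,10,0.5\n"
    "Row1,2,,1.5\n"
    "Row2,3,30,\n"
)

# index_col=0 turns the first column into the data frame index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# count_full has no gaps and is read as int64.
# count_gaps has an empty cell, which pandas stores as NaN, so the
# whole column is read as float64.
print(df.dtypes)
print(df.index.tolist())
```
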
| Pandas column | Support of missing values | KNIME column |
|---|---|---|
| Object (no transfer of '\r' or '\\') | NaN | String |
| Float64 | NaN | Double |
| Int64 | | Long |
| Bool | | Bool |
| datetime/timedelta | | |
| Datetime64 | NaT | LocalDateTime (truncated to microseconds due to a pandas export bug) |
| Timedelta64 | NaT | Duration (parsed ISO 8601 String) |
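
Since carriage returns and backslashes in Object columns are not transferred, it can help to check a result table for them before handing it back. The following is only a sketch; the helper name columns_with_untransferable_chars is hypothetical and nothing like it ships with the nodes.

```python
import pandas as pd


def columns_with_untransferable_chars(data_frame):
    """Return the names of columns that contain '\\r' or a backslash in any string cell."""
    flagged = []
    for name in data_frame.columns:
        has_bad_cell = data_frame[name].map(
            lambda value: isinstance(value, str) and ("\r" in value or "\\" in value)
        ).any()
        if has_bad_cell:
            flagged.append(name)
    return flagged


# Made-up example data: only the "path" column contains a backslash.
example = pd.DataFrame(
    {"label": ["a", "b"], "path": ["C:\\temp", "x"], "n": [1, 2]},
    index=["Row0", "Row1"],
)
print(columns_with_untransferable_chars(example))
```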

Scripting Basics

Every Python node comes with an example script that should work right away. A KNIME input table is made available to Python as a pandas data frame named kIn. The node expects a pandas data frame named pyOut to be returned to KNIME.
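
A script following this convention could look like the sketch below. Inside a node, kIn already exists; it is built by hand here, with made-up column names and values, only so that the snippet runs on its own.

```python
import pandas as pd

# Stand-in for the table KNIME would provide; the index holds the RowIDs.
kIn = pd.DataFrame(
    {"value": [1.0, 2.5, None]},
    index=["Row0", "Row1", "Row2"],
)

# Work on a copy and add a derived column.
pyOut = kIn.copy()
pyOut["value_doubled"] = pyOut["value"] * 2
```

The index of pyOut is left unchanged here, so the rows keep the identifiers they had in kIn.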
KNIME flow variables can be used with the template 'FLOWVAR(myFlowVariableName)'. The template is replaced with the content of the variable before the script is executed.
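
The replacement can be pictured as a plain text substitution on the script. The sketch below is an illustration in Python, not the node's actual implementation; the function name substitute_flow_variables, the variable minScore and the assumption that the template is written without surrounding quotes are all made up for this example.

```python
import re


def substitute_flow_variables(script, flow_variables):
    """Replace every FLOWVAR(name) in the script text with the variable's content."""
    pattern = re.compile(r"FLOWVAR\(([^)]+)\)")

    def replace(match):
        name = match.group(1).strip()
        return str(flow_variables[name])

    return pattern.sub(replace, script)


script = "threshold = FLOWVAR(minScore)\npyOut = kIn[kIn['score'] >= threshold]"
resolved = substitute_flow_variables(script, {"minScore": 0.5})
print(resolved)
```

If the substitution is purely textual, as assumed here, a flow variable holding text has to be wrapped in quotes in the script so that the result is valid Python.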

Open External

Every Python node can push its input table(s) and script to an external Python session for troubleshooting or prototyping.

There are two ways:

  • Command line: Python is called (using the executable from the preference settings) with a prepared script that reads in the KNIME input table. The Python code of the node is then available as clipboard content.
  • Jupyter: a notebook is launched that is prepared to read the KNIME input table from the CSV file and already contains the Python code of the node as well as procedures to export the result to CSV.

The preference settings can be adapted as needed.
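
In both cases the external session follows the same pattern: read the input table from CSV, run the node's code, and (in the notebook) write the result back to CSV. The sketch below illustrates that pattern only; the generated script itself is not shown on this page, so the helper names, file names and CSV layout used here are assumptions.

```python
import os
import tempfile

import pandas as pd


def read_knime_table(csv_path):
    """Read an exported table; the first column is assumed to hold the RowIDs."""
    return pd.read_csv(csv_path, index_col=0)


def write_result(data_frame, csv_path):
    """Write the result table, including its index, to CSV."""
    data_frame.to_csv(csv_path)


# Create a made-up input file so the sketch runs on its own.
work_dir = tempfile.mkdtemp()
input_path = os.path.join(work_dir, "knime_input.csv")
output_path = os.path.join(work_dir, "python_output.csv")
pd.DataFrame({"score": [0.2, 0.9]}, index=["Row0", "Row1"]).to_csv(input_path)

kIn = read_knime_table(input_path)

# The node's script would be pasted here; a simple filter stands in for it.
pyOut = kIn[kIn["score"] > 0.5]

write_result(pyOut, output_path)
```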

Command Line Setup

No setup is required. A terminal (Mac/Linux) or PowerShell (Windows) window opens and the selected Python executable is called with the prepared Python script. The Python code of the node is provided as clipboard content.

Jupyter Setup

The Jupyter setup will be explained in more detail on a separate wiki page, since it is planned to provide it for the R nodes too.
Jupyter Preferences

Python Plots

Python plot nodes provide the created image in at least two ways: in the node view and at the image port. Optionally, the image can also be exported as a file.

Output Options

The plot node offers an additional configuration tab to provide basic control over some image features.

File Type

Applies only to file export, not to the image port or the node view.
Supported file types:

  • PNG
  • JPEG
  • SVG
  • PDF
  • TIF

Image Dimensions

Image width and height in pixels.

DPI

Resolution in dots per inch. Applies to all images (view, port and exported file).
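
Matplotlib sizes figures in inches, so a pixel size and a DPI value relate to the figure size as inches = pixels / DPI. The sketch below shows that conversion; how the node applies the settings internally is not described here, and the function names figure_size_inches and save_plot are made up for illustration.

```python
def figure_size_inches(width_px, height_px, dpi):
    """Convert a pixel size to the (width, height) in inches that matplotlib expects."""
    return (width_px / dpi, height_px / dpi)


def save_plot(path, width_px, height_px, dpi):
    """Draw a small example plot with the requested pixel size and resolution."""
    import matplotlib

    matplotlib.use("Agg")  # render without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(
        figsize=figure_size_inches(width_px, height_px, dpi), dpi=dpi
    )
    ax.plot([0, 1, 2], [0, 1, 4])
    fig.savefig(path, dpi=dpi)
    plt.close(fig)


print(figure_size_inches(800, 600, 100))
```

With the same pixel dimensions, a higher DPI yields a smaller figure in inches, so text and line widths appear larger relative to the image.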

File Name

If a file name is given and the box "Write image to file" is checked, the image is exported as a file. An existing file is overwritten only if the checkbox for overwriting is checked. The file name supports the following templates:

  • $$DATE$$ for the current date,
  • $$USER$$ for the user name,
  • $$WS$$ for the workspace directory, and
  • FLOWVAR(variable name) to use flow variable values in the file name.
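
The effect of these templates can be illustrated with a small text substitution. This is a sketch only: the function name expand_file_name is hypothetical, and the date format (ISO, YYYY-MM-DD) is an assumption, since the format the node actually uses is not stated here.

```python
import datetime
import re


def expand_file_name(template, user, workspace, flow_variables, today=None):
    """Fill in $$DATE$$, $$USER$$, $$WS$$ and FLOWVAR(name) in a file name template."""
    today = today or datetime.date.today()
    result = template.replace("$$DATE$$", today.isoformat())  # assumed date format
    result = result.replace("$$USER$$", user)
    result = result.replace("$$WS$$", workspace)
    return re.sub(
        r"FLOWVAR\(([^)]+)\)",
        lambda match: str(flow_variables[match.group(1).strip()]),
        result,
    )


# Made-up values for illustration.
file_name = expand_file_name(
    "$$WS$$/plots/$$USER$$_$$DATE$$_FLOWVAR(plate).png",
    user="alice",
    workspace="/home/alice/knime-workspace",
    flow_variables={"plate": "P001"},
    today=datetime.date(2022, 11, 7),
)
print(file_name)
```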

View

The view shows a PNG with the given dimensions and DPI. The image dimensions are shown in the lower left corner. When the user resizes the view, the image is rescaled, which might not look very nice. A double click on the image forces it to be recreated with the new dimensions (which are still shown in the lower left corner).