-
Integration with scikit-learn: sklearn.compose can be used to create a column transformer that plugs into a sklearn.pipeline pipeline.
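As a rough illustration of that integration, here is a minimal sketch that assumes scikit-learn is installed. The toy DataFrame, its column names and the choice of StandardScaler, OneHotEncoder and LogisticRegression are all made up for the example and were not part of the talk.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric column, one categorical column, one target.
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 30, 64],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Nice"],
    "bought": [0, 0, 1, 1, 0, 1],
})

# The column transformer routes each DataFrame column to its own preprocessing step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# The transformer is simply the first step of a regular scikit-learn pipeline.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

features = df[["age", "city"]]
model.fit(features, df["bought"])
predictions = model.predict(features)
```

Because the transformer selects columns by name, the pipeline can be fed the DataFrame directly, without converting it to a NumPy array first.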
-
Extending pandas: cyberpandas, the official example repo showing how to extend pandas.
-
Fletcher: another example of a tool built on the ExtensionArray interface. It lets you use Arrow columns in a pandas DataFrame, with multiple benefits (debug information, random data generation, etc.).
import fletcher as fr
import pandas as pd
df = pd.DataFrame({
'str_column': fr.FletcherArray(['Test', None, 'Strings'])
})
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 1 columns):
# str_column 2 non-null string
# dtypes: string(1)
# memory usage: 108.0 bytes
-
Little assign trick: in Python 3 (3.6+), you can use a newly created column later in the same assign call, right after it is created.
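A minimal sketch of the trick, assuming pandas 0.23 or later on Python 3.6+. The DataFrame and the column names (price, qty, total, total_doubled) are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

result = df.assign(
    # First keyword: creates the "total" column.
    total=lambda d: d["price"] * d["qty"],
    # Second keyword: can already read "total", created just above.
    total_doubled=lambda d: d["total"] * 2,
)
```

The lambdas matter here: each one receives the intermediate DataFrame, which is what makes the freshly created column visible to the next keyword.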
-
Deprecation of the inplace parameter: meant to foster more "functional" data pipelines built by chaining assign, map, apply, etc.
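To make the chaining style concrete, here is a small sketch. The data and the cleaning steps are invented for illustration, not taken from the talk. Each method returns a new DataFrame, so nothing needs inplace=True.

```python
import pandas as pd

# Hypothetical raw data with messy names and a missing value.
raw = pd.DataFrame({"Name": [" ada ", "bob", None], "Age": [36, 41, 29]})

clean = (
    raw
    .rename(columns=str.lower)       # lower-case the column names
    .dropna(subset=["name"])         # drop rows without a name
    .assign(name=lambda d: d["name"].str.strip().str.title())
    .reset_index(drop=True)
)
```

The original `raw` frame is left untouched, which makes each step easier to test and to reorder.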
-
Lazy evaluation: in pandas, use nlargest instead of sorting and then calling head, for better performance.
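The two approaches give the same rows, as in this small sketch (toy data and column names are hypothetical). No timing is shown, since the gain depends on the size of the data.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "population": [500, 1200, 300, 900],
})

# Full sort, then keep the first rows.
top2_sorted = df.sort_values("population", ascending=False).head(2)

# Ask directly for the two largest values, with no full sort of the frame.
top2_nlargest = df.nlargest(2, "population")
```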
-
Mentions of cool projects for efficient high-level operations, like Dask for distributed computation and Ibis, an abstraction layer over SQL and storage platforms.
-
Arrow: the future solution for pandas memory performance. Fletcher gives a nice introduction to what that future will look like.
-
Kernel Density Estimate: the new pandas.Series.plot.kde, added during a sprint, which can be useful for supervised learning models such as SVMs, to normalize inputs.
-
The pandas community recognizes the limitations of the current memory backend and is preparing a switch to Arrow in the future, but you can already use Fletcher in your projects 😲.
-
Even if most of the API stays the same, don't forget to check for new features.
-
I got another nail in the coffin for my use of inplace 😬.