All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Added an upper bound of <0.24.0 to the scikit-learn version requirement (#59)
- Fix dependencies to make stacking compatible with scikit-learn 0.23+ (#54)
- Removed support for Python <3.6. (#55)
- Make stacking compatible with scikit-learn v0.22.1. (#52)
- Turn on Python 3.7 and 3.8 for Travis CI builds. (#50)
- Removed the upper version bound for sklearn. (#50)
- Update tests and requirements.txt to allow sklearn 0.20 and above. (#47)
- Instead of a boolean flag for dummy_na, accept None/False (no dummying), 'expanded' (matches the previous True behavior), and 'all' (dummy NAs in all columns where they appear, not just the ones being categorically expanded). (#44)
- Raise a RuntimeError if there are more than 5000 levels in a column (#42)
- Emit a warning if the column levels during transform don't overlap at all with levels from fitting (#41)
- In DataFrameETL, don't check for levels to expand in columns which are slated to be dropped. This will avoid raising a warning for too many levels in a column if the user has intentionally excluded that column (#39).
- Fixed DataFrameETL transformations of DataFrames with a non-trivial index when preserving DataFrame output type (#32, #33)
- Add pandas version restrictions by Python version (#37)
- Fix code which was incompatible with older pandas versions (#37)
- Added debug logging to the DataFrameETL transformer (#24, #27)
- Added debug logging to the HyperbandSearchCV estimator (#28, #29)
- Emit a warning if the user attempts to expand a column with too many categories (#25, #26)
- Now caching CV indices. When CV generators are passed with shuffle=True and no random_state is set, they produce different CV folds on each call to split (#22).
- Updated the scipy dependency in the requirements.txt file to scipy>=0.14,<2.0
- DataFrameETL now correctly handles all Categorical-type columns in input DataFrames. The fix also improves execution time of transform calls by 2-3x (#20).
- Added check_null_cols argument to check for null columns (#13)
- Fixed bug with fit_params handling in stacking (#12)
- Resolved issues with one and two-level edge cases for categorical expansion (#10)
- Included y=None in the fit method definition of DataFrameETL (#7)
- Improved parallel performance for hyperband (#8)
- Fixed version requirements for scikit-learn to properly import MaskedArray (#4).
- In the stacking estimators, get_params no longer throws an index error when estimator_list is an empty list (#6).
- Initial commit
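The CV-index caching entry (#22) addresses a scikit-learn behavior that a short sketch can illustrate. A splitter such as KFold created with shuffle=True and random_state=None reshuffles on every call to split, so two passes over the same data can see different folds. The sketch uses KFold only as a stand-in for whatever splitter a user might pass. It shows the general caching idea of materializing the indices once; it is not this project's actual caching code.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples

# shuffle=True without random_state: every split() call draws a new shuffle,
# so these two passes will usually yield different test folds.
cv = KFold(n_splits=3, shuffle=True)
first_pass = [test.tolist() for _, test in cv.split(X)]
second_pass = [test.tolist() for _, test in cv.split(X)]

# Caching: materialize the (train, test) index pairs once and reuse the
# same list for every subsequent fit, so all consumers see identical folds.
cached_folds = list(cv.split(X))

# Alternatively, fixing random_state makes split() deterministic across calls.
cv_fixed = KFold(n_splits=3, shuffle=True, random_state=0)
fixed_a = [test.tolist() for _, test in cv_fixed.split(X)]
fixed_b = [test.tolist() for _, test in cv_fixed.split(X)]
```

Caching works regardless of how the user configured the splitter. Requiring a fixed random_state would instead push that choice onto the caller.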