Improve crossvalidation gridsearch to support group shufflesplit #12

danielmlow · 2020-05-21T17:30:42Z

Also for nested resampling. Here's some code that one of us can start with:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 10 samples in 8 groups (groups 4 and 5 each hold two samples)
X = np.array([[0, 1, 2]] * 10)
y = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
groups = np.array([0, 1, 2, 3, 4, 4, 5, 5, 6, 7])
seed_value = 1234
n_splits = 30
test_size = 0.2

folds = GroupShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed_value)
splits = folds.split(X, y, groups=groups)

for train_index, test_index in splits:
    # X and y are already NumPy arrays, so they can be indexed directly
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
    # A group must never be split: you shouldn't see 4 in both train and test (same with 5)
    print(groups[train_index])
    print(groups[test_index])
    print('\n')
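
To connect this to grid search and nested resampling, here is a minimal sketch, assuming scikit-learn's GridSearchCV is the search being used. The random data, the LogisticRegression estimator, the parameter grid and the split sizes are placeholders chosen for illustration, not taken from this project. The outer GroupShuffleSplit holds out whole groups for testing, and the inner one is passed as `cv` so that hyperparameter tuning also keeps groups together.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupShuffleSplit

# Placeholder data (assumption): 10 groups with 4 samples each, both classes in every group
rng = np.random.default_rng(1234)
X = rng.normal(size=(40, 3))
y = np.tile([0, 1], 20)
groups = np.repeat(np.arange(10), 4)

# Outer loop estimates performance, inner loop tunes hyperparameters
outer = GroupShuffleSplit(n_splits=5, test_size=0.2, random_state=1234)
inner = GroupShuffleSplit(n_splits=3, test_size=0.25, random_state=1234)

outer_scores = []
for train_index, test_index in outer.split(X, y, groups=groups):
    # No group may appear on both sides of the outer split
    assert not set(groups[train_index]) & set(groups[test_index])

    search = GridSearchCV(
        LogisticRegression(),
        param_grid={"C": [0.1, 1.0, 10.0]},  # hypothetical grid
        cv=inner,
    )
    # The groups of the training portion are forwarded to the inner splitter
    search.fit(X[train_index], y[train_index], groups=groups[train_index])
    outer_scores.append(search.score(X[test_index], y[test_index]))

print(outer_scores)
```

The key point is that `groups` has to be sliced with the same `train_index` as `X` and `y` before it is handed to `fit`, otherwise the inner splitter sees group labels that do not line up with the training rows.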

satra · 2020-05-21T22:03:26Z

Just a note that this already exists as a pydra task in pydra-ml.
