I noticed that PCA, KDiscordODetector and Telemanon don't make predictions for all data points provided. The issue shows up after training with .fit(X), when you call .predict(Y) for evaluation. Let's say we want to run the following code:
X # Training data
Y # Eval data
y_true # Eval true labels
model # Either PCA, KDiscordODetector or Telemanon
model.fit(X) # Train model on X
y_pred = model.predict(Y) # Make model prediction on Y
# Analyse evaluation results
accuracy_score(y_true, y_pred)
confusion_matrix(y_true, y_pred)
classification_report(y_true, y_pred)
The last three lines won't run because y_pred is always shorter than y_true. This is because these methods use the function get_sub_matrices(X, window_size, step, return_numpy, flatten, flatten_order) (found in utility.py), which returns a numpy array of shape (valid_len, window_size*n_sequences), where each row is a flattened sub-matrix (a copy of the function is included below). The function cuts the data into matrices based on the window_size and step parameters. However, if the last points in the data are not enough to form a new sub-matrix, they are left out of the prediction. So when analysing the evaluation results, you have to change the above example code to:
X # Training data
Y # Eval data
y_true # Eval true labels
model # Either PCA, KDiscordODetector or Telemanon
model.fit(X) # Train model on X
y_pred = model.predict(Y) # Make model prediction on Y
# Analyse evaluation results
accuracy_score(y_true[:len(y_pred)], y_pred)
confusion_matrix(y_true[:len(y_pred)], y_pred)
classification_report(y_true[:len(y_pred)], y_pred)
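The workaround above can be wrapped in a small helper so the truncation is explicit and checked. This is only a sketch: `align_labels` is a hypothetical name that does not exist in the library, the arrays are made-up stand-ins for real labels and predictions, and it assumes that prediction i corresponds to sample i, which may not hold when step is larger than 1.

```python
import numpy as np


def align_labels(y_true, y_pred):
    """Hypothetical helper: drop trailing labels that have no prediction.

    Assumes prediction i belongs to sample i (an assumption, not verified
    against the detectors discussed in this issue).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if len(y_pred) > len(y_true):
        raise ValueError("more predictions than labels")
    return y_true[:len(y_pred)], y_pred


# Stand-in data: 10 labels, but only 7 predictions came back.
y_true = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0])

y_true_cut, y_pred_cut = align_labels(y_true, y_pred)

# Plain NumPy accuracy; the sklearn metrics accept the same aligned arrays.
accuracy = float(np.mean(y_true_cut == y_pred_cut))
print(len(y_true_cut), accuracy)
```

Once the two arrays have the same length, accuracy_score, confusion_matrix and classification_report can be called on them as in the original snippet.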
Here is the code where the sub_matrices are produced:
def get_sub_matrices(X, window_size, step=1, return_numpy=True, flatten=True,
                     flatten_order='F'):
    """Chop a multivariate time series into sub sequences (matrices).
    Parameters
    ----------
    X : numpy array of shape (n_samples,)
        The input samples.
    window_size : int
        The moving window size.
    step_size : int, optional (default=1)
        The displacement for moving window.
    return_numpy : bool, optional (default=True)
        If True, return the data format in 3d numpy array.
    flatten : bool, optional (default=True)
        If True, flatten the returned array in 2d.
    flatten_order : str, optional (default='F')
        Decide the order of the flatten for multivarite sequences.
        ‘C’ means to flatten in row-major (C-style) order.
        ‘F’ means to flatten in column-major (Fortran- style) order.
        ‘A’ means to flatten in column-major order if a is Fortran contiguous in memory,
        row-major order otherwise. ‘K’ means to flatten a in the order the elements occur in memory.
        The default is ‘F’.
    Returns
    -------
    X_sub : numpy array of shape (valid_len, window_size*n_sequences)
        The numpy matrix with each row stands for a flattend submatrix.
    """
    X = check_array(X).astype(np.float)
    n_samples, n_sequences = X.shape[0], X.shape[1]
    # get the valid length
    valid_len = get_sub_sequences_length(n_samples, window_size, step)
    X_sub = []
    X_left_inds = []
    X_right_inds = []
    # exclude the edge
    steps = list(range(0, n_samples, step))
    steps = steps[:valid_len]
    # print(n_samples, n_sequences)
    for idx, i in enumerate(steps):
        X_sub.append(X[i: i + window_size, :])
        X_left_inds.append(i)
        X_right_inds.append(i + window_size)
    X_sub = np.asarray(X_sub)
    if return_numpy:
        if flatten:
            temp_array = np.zeros([valid_len, window_size * n_sequences])
            if flatten_order == 'C':
                for i in range(valid_len):
                    temp_array[i, :] = X_sub[i, :, :].flatten(order='C')
            else:
                for i in range(valid_len):
                    temp_array[i, :] = X_sub[i, :, :].flatten(order='F')
            return temp_array, np.asarray(X_left_inds), np.asarray(
                X_right_inds)
        else:
            return np.asarray(X_sub), np.asarray(X_left_inds), np.asarray(
                X_right_inds)
    else:
        return X_sub, np.asarray(X_left_inds), np.asarray(X_right_inds)
def get_sub_sequences_length(n_samples, window_size, step):
    """Pseudo chop a univariate time series into sub sequences. Return valid
    length only.
    Parameters
    ----------
    X : numpy array of shape (n_samples,)
        The input samples.
    window_size : int
        The moving window size.
    step_size : int, optional (default=1)
        The displacement for moving window.
    Returns
    -------
    valid_len : int
        The number of subsequences.
    """
    # if X.shape[0] == 1:
    #     n_samples = X.shape[1]
    # elif X.shape[1] == 1:
    #     n_samples = X.shape[0]
    # else:
    #     raise ValueError("X is not a univarite series. The shape is {shape}.".format(shape=X.shape))
    # valid_len = n_samples - window_size + 1
    # valida_len = int_down(n_samples-window_size)/step + 1
    valid_len = int(np.floor((n_samples - window_size) / step)) + 1
    return valid_len
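To make the dropped tail concrete, here is a small standalone sketch that reproduces only the index arithmetic of the two functions above, without touching the library. `window_bounds` is a hypothetical name, and integer floor division stands in for the np.floor call, which gives the same result for integer inputs.

```python
def window_bounds(n_samples, window_size, step=1):
    """Return (left, right) index pairs for every complete window.

    Mirrors valid_len = floor((n_samples - window_size) / step) + 1;
    the right index is exclusive, as in X[i: i + window_size].
    """
    valid_len = max(0, (n_samples - window_size) // step + 1)
    starts = list(range(0, n_samples, step))[:valid_len]
    return [(i, i + window_size) for i in starts]


n_samples = 11
bounds = window_bounds(n_samples, window_size=4, step=3)

# Samples at or beyond the right edge of the last window are never used.
covered = bounds[-1][1]
dropped = list(range(covered, n_samples))

print(bounds)   # [(0, 4), (3, 7), (6, 10)]
print(dropped)  # [10]
```

With 11 samples, a window of 4 and a step of 3, only three sub-matrices are produced and the sample at index 10 never reaches the model, so the output has 3 rows for 11 input points.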