
Commit

Merge pull request #37 from vladdsm/0.5.2
0.5.2
vzhomeexperiments authored Jun 20, 2021
2 parents fff901d + 21e0915 commit ce44974
Showing 10 changed files with 300 additions and 126 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: lazytrade
Type: Package
Title: Learn Computer and Data Science using Algorithmic Trading
Version: 0.5.2.9050
Version: 0.5.2
Author: Vladimir Zhbanko
Maintainer: Vladimir Zhbanko <[email protected]>
Description: Provide sets of functions and methods to learn and practice data science using idea of algorithmic trading.
10 changes: 5 additions & 5 deletions NEWS.md
@@ -3,21 +3,21 @@
## Planned Changes

* setup github actions
* add fail safe for function input parameters

# lazytrade 0.5.2

# Version 0.5.2

## Changes

* add second parameter to simulation function
* add second parameter to simulation function `aml_simulation`
* option to use full columns for model training when selecting 0 as a parameter `num_cols_used`
* add suppress messages option during `readr::read_csv()` function calls
* option to use full columns for model training when selecting 0 as a parameter num_cols_used
* fail safe in `aml_collect_data` function will delete already recorded rds file if it has different
amount of columns
* fail safe in `aml_collect_data` function will delete already recorded rds file if it has different amount of columns
* add new function `util_find_pid` to find the PIDs of the terminal.exe application

* function `mt_stat_transf` is now using a rule to assign 3 market type classes
* rewrite function `mt_make_model` with the same philosophy as in `aml_make_model`

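The new fixed-structure and epoch options from this release can be exercised as a quick sketch; the paths, epoch count and hidden-layer sizes below are illustrative values taken from the updated package examples, not recommended settings:

```r
library(dplyr)
library(magrittr)
library(h2o)
library(lazytrade)

path_model <- normalizePath(tempdir(), winslash = "/")
path_data  <- normalizePath(tempdir(), winslash = "/")

data(macd_ML60M)

h2o.init(nthreads = 2)

# num_nn_options = 0 skips the random structure search and trains
# the fixed structure given in fixed_nn_struct
mt_make_model(indicator_dataset = macd_ML60M,
              num_bars = 64,
              timeframe = 60,
              path_model = path_model,
              path_data = path_data,
              activate_balance = TRUE,
              num_nn_options = 0,
              fixed_nn_struct = c(10, 10),
              num_epoch = 10)

h2o.shutdown(prompt = FALSE)
```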
# lazytrade 0.5.1

171 changes: 112 additions & 59 deletions R/mt_make_model.R
@@ -1,38 +1,55 @@
#' Function to train Deep Learning Classification model for Market Type recognition
#'
#' @description Function is training h2o deep learning model to match manually classified patterns of the financial
#' indicator. Main idea is to be able to detect Market Type by solely relying on the current indicator pattern.
#' This is in the attempt to evaluate current market type and to use proper trading strategy.
#' Function will always try to gather more data to update the model.
#'
#' Selected Market Periods according to the theory from Van K. Tharp:
#' @description Function is training h2o deep learning model to match
#' classified patterns of the financial indicator.
#' Main idea is to be able to detect Market Type by solely relying on the
#' current indicator pattern.
#' This is in the attempt to evaluate current market type for trading purposes.
#'
#' Selected Market Periods could be manually classified
#' according to the theory from Van K. Tharp:
#' 1. Bull normal, BUN
#' 2. Bull volatile, BUV
#' 3. Bear normal, BEN
#' 4. Bear volatile, BEV
#' 5. Sideways quiet, RAN
#' 6. Sideways volatile, RAV
#'
#' `r lifecycle::badge('stable')`
#' For automatic classification, only the BUN, BEN, and RAN market types can be used
#'
#' `r lifecycle::badge('experimental')`
#'
#' @details Function is using manually prepared dataset and tries several different random neural network structures.
#' Once the best neural network is found then the better model is trained and stored.
#' @details Function is using labeled dataset and tries several different random
#' neural network structures. Once the best neural network is found, that
#' model is selected and stored. Dataset can be either manually labelled
#' or generated using the function mt_stat_transf. In the latter case the
#' parameter is_cluster shall be set to TRUE.
#'
#' @author (C) 2020, 2021 Vladimir Zhbanko
#' @backref Market Type research of Van Tharp Institute: <https://www.vantharp.com/>
#'
#' @param indicator_dataset Dataframe, Dataset containing indicator patterns to train the model
#' @param indicator_dataset Data frame, Data set containing indicator patterns to train the model
#' @param num_bars Integer, Number of bars used to detect pattern
#' @param timeframe Integer, Data timeframe in Minutes.
#' @param timeframe Integer, Data time frame in minutes.
#' @param path_model String, Path where the models are stored
#' @param path_data String, Path where the aggregated historical data is stored, if exists in rds format
#' @param activate_balance Boolean, option to choose if to balance market type classes or not, default TRUE
#' @param num_nn_options Integer, value from 3 to 24 or more. Used to change number of variants
#' of the random neural network structures. Value 3 will mean that only one
#' random structure will be used. To avoid warnings make sure to set this value
#' @param path_data String, Path where the aggregated historical data is stored,
#' if exists, in rds format
#' @param activate_balance Boolean, option to choose whether to balance market type classes,
#' default TRUE
#' @param num_nn_options Integer, value from 0 to 24 or more, as a multiple of 3.
#' Used to change the number of randomly generated variants
#' of the 3 hidden layer structure.
#' When the value 0 is set then a fixed structure will be used as
#' defined by the parameter fixed_nn_struct.
#' To avoid warnings make sure to set this value as a
#' multiple of 3. Higher values will increase computation time.
#' @param fixed_nn_struct Integer vector with numeric elements, see parameter hidden in ?h2o.deeplearning,
#' default value is c(100,100).
#' Note this will only work if num_nn_options is 0
#' @param num_epoch Integer, see parameter epochs in ?h2o.deeplearning, default value is 100.
#' Higher numbers may lead to longer code execution.
#' @param is_cluster Boolean, set TRUE to use automatically clustered data
#'
#'
#' @return Function writes a file object with the model
#' @export
#'
@@ -58,17 +75,18 @@
#' h2o.init(nthreads = 2)
#'
#'
#' # performing Deep Learning Classification using the custom function manually prepared data
#' # performing Deep Learning Classification using manually labelled data
#' mt_make_model(indicator_dataset = macd_ML60M,
#' num_bars = 64,
#' timeframe = 60,
#' path_model = path_model,
#' path_data = path_data,
#' activate_balance = TRUE,
#' num_nn_options = 3)
#' num_nn_options = 3,
#' num_epoch = 10)
#'
#' data(price_dataset_big)
#' data <- head(price_dataset_big, 500) #reduce computational time
#' data <- head(price_dataset_big, 5000) #reduce computational time
#'
#' ai_class <- mt_stat_transf(indicator_dataset = data,
#' num_bars = 64,
@@ -83,10 +101,23 @@
#' path_model = path_model,
#' path_data = path_data,
#' activate_balance = TRUE,
#' num_nn_options = 3,
#' num_nn_options = 6,
#' num_epoch = 10,
#' is_cluster = TRUE)
#'
#'
#' # performing Deep Learning Classification using the custom function auto clustered data
#' # and fixed nn structure
#' mt_make_model(indicator_dataset = ai_class,
#' num_bars = 64,
#' timeframe = 60,
#' path_model = path_model,
#' path_data = path_data,
#' activate_balance = TRUE,
#' num_nn_options = 0,
#' fixed_nn_struct = c(10, 10),
#' num_epoch = 10,
#' is_cluster = TRUE)
#'
#' # stop h2o engine
#' h2o.shutdown(prompt = FALSE)
#'
@@ -98,22 +129,31 @@
#'
#'
mt_make_model <- function(indicator_dataset,
num_bars,
num_bars = 64,
timeframe = 60,
path_model, path_data,
path_model,
path_data,
activate_balance = TRUE,
num_nn_options = 24,
fixed_nn_struct = c(100, 100),
num_epoch = 100,
is_cluster = FALSE){

requireNamespace("dplyr", quietly = TRUE)
requireNamespace("readr", quietly = TRUE)
requireNamespace("h2o", quietly = TRUE)

# generate a file name for model
m_name <- paste0("DL_Classification", "_", timeframe, "M")
m_path <- file.path(path_model, m_name)

if(is_cluster == TRUE){
num_bars <- ncol(indicator_dataset)-1
}

macd_ML2 <- indicator_dataset %>% dplyr::mutate_at("M_T", as.factor)
macd_ML2 <- indicator_dataset %>%
#make sure column with label is a factor
dplyr::mutate(across("M_T", as.factor))

# check if we don't have too much data
x1_nrows <- macd_ML2 %>% nrow()
@@ -125,19 +165,36 @@ mt_make_model <- function(indicator_dataset,
utils::head(40000)
}

# split data into 2 groups
# split data to train and test blocks
# note: model will be trained on the OLDEST data
test_ind <- 1:round(0.3*x1_nrows) #test indices 1:xxx
dat21 <- macd_ML2[test_ind, ] #dataset to test the model using 30% of data
dat22 <- macd_ML2[-test_ind, ] #dataset to train the model
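# illustration (not part of the diff): with x1_nrows = 1000 the split gives
# test_ind = 1:300, hence dat21 (test) holds 300 rows and dat22 (train) 700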


# get this data into h2o:
macd_ML <- as.h2o(x = macd_ML2, destination_frame = "macd_ML")

macd_ML <- h2o::as.h2o(x = dat22, destination_frame = "macd_ML")
recent_ML <- h2o::as.h2o(x = dat21, destination_frame = "recent_ML")

# for loop to select the best neural network structure
### fixed or random network structure
###
n_layers <- length(fixed_nn_struct)

if(num_nn_options == 0){
nn_sets <- fixed_nn_struct %>% matrix(ncol = n_layers)
} else {
nn_sets <- sample.int(n = 100, num_nn_options) %>% matrix(ncol = 3)
}
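# illustration (not part of the diff): with num_nn_options = 6,
# sample.int(n = 100, 6) %>% matrix(ncol = 3) gives a 2 x 3 matrix,
# i.e. two random candidate networks with three hidden layers each;
# with num_nn_options = 0, nn_sets is the single-row matrix built
# from fixed_nn_struct, so exactly one structure is trained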

# try different models and choose the best one...
### random network structure
nn_sets <- sample.int(n = 100, num_nn_options) %>% matrix(ncol = 3)

for (i in 1:dim(nn_sets)[1]) {

# i <- 1


ModelC <- h2o.deeplearning(
ModelM <- h2o.deeplearning(
model_id = paste0("DL_Classification", "_", timeframe, "M"),
x = names(macd_ML[,1:num_bars]),
y = "M_T",
@@ -152,14 +209,18 @@
distribution = "AUTO",
stopping_metric = "AUTO",
balance_classes = activate_balance,
epochs = 200)

#ModelC
#summary(ModelC)
#h2o.performance(ModelC)
RMSE <- h2o::h2o.performance(ModelC)@metrics$RMSE %>% as.data.frame()
epochs = num_epoch)

#ModelM
#summary(ModelM)
#h2o.performance(ModelM)

### define best model using RMSE
RMSE <- h2o::h2o.performance(ModelM,newdata = recent_ML)@metrics$RMSE %>%
as.data.frame()
#RMSE <- h2o::h2o.performance(ModelM)@metrics$RMSE %>% as.data.frame()
names(RMSE) <- 'RMSE'

# record results of modelling
if(!exists("df_res")){
df_res <- nn_sets[i,] %>% t() %>% as.data.frame() %>% dplyr::bind_cols(RMSE)
@@ -168,32 +229,24 @@
df_res <- df_res %>% dplyr::bind_rows(df_row)
}



#save intermediate models!
# save model object
temp_model_path <- file.path(path_model, i)
if(!dir.exists(temp_model_path)){dir.create(temp_model_path)}
h2o::h2o.saveModel(ModelM, path = temp_model_path, force = T)

} # end of for loop

# find which row in the df_res has the smallest RMSE value
lowest_RMSE <- df_res %>% dplyr::slice(which.min(RMSE)) %>% select(-RMSE) %>% unlist() %>% unname()

ModelC <- h2o.deeplearning(
model_id = paste0("DL_Classification", "_", timeframe, "M"),
x = names(macd_ML[,1:num_bars]),
y = "M_T",
training_frame = macd_ML,
activation = "Tanh",
overwrite_with_best_model = TRUE,
autoencoder = FALSE,
hidden = lowest_RMSE,
loss = "Automatic",
sparse = TRUE,
l1 = 1e-4,
distribution = "AUTO",
stopping_metric = "AUTO",
balance_classes = activate_balance,
epochs = 200)

h2o.saveModel(ModelC, path = path_model, force = TRUE)
best_row <- which.min(df_res$RMSE)

## retrieve and copy/paste the best model
best_model_location <- file.path(path_model, best_row, m_name)
best_model_destination <- file.path(path_model, m_name)
# copy best model object
file.copy(best_model_location, path_model, overwrite = TRUE)
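# illustration (not part of the diff): with timeframe = 60 each loop
# iteration saves <path_model>/<i>/DL_Classification_60M, and the model
# from the iteration with the lowest test RMSE is then copied up to
# <path_model>/DL_Classification_60M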


#h2o.shutdown(prompt = FALSE)

1 change: 1 addition & 0 deletions R/mt_stat_evaluate.R
@@ -55,6 +55,7 @@
#' path_data = path_data,
#' activate_balance = TRUE,
#' num_nn_options = 3,
#' num_epoch = 10,
#' is_cluster = TRUE)
#'
#'
