Team 5's pull request #6

Open: wants to merge 36 commits into base: master

Commits
14bcd95
fix issue: turn factor type columns to numeric
zoltanszebenyi May 8, 2019
2adc2b1
Create initial package
TiborSzabo42 May 8, 2019
a2d84c5
bmarketing.R moved to folder R
TiborSzabo42 May 8, 2019
a41b5e3
R script moved into root directory where it belongs
BalintKomjati May 8, 2019
bfff9e4
R script moved to inst
BalintKomjati May 8, 2019
a7f72b5
csv moved to inst
BalintKomjati May 8, 2019
90c7d2e
# calcPerformance function added
TiborSzabo42 May 8, 2019
15f9d98
added function "clean"
zoltanszebenyi May 8, 2019
5d3c3a7
Transform function added
BalintKomjati May 8, 2019
5005d23
Name of the function corrected in calcPerformance.R
TiborSzabo42 May 8, 2019
649ace5
Readme updated
TiborSzabo42 May 8, 2019
4a37c22
Transform function comments translated so Team4 can read them as well…
BalintKomjati May 9, 2019
2550682
Transform funciton comments translated to EN so Team4 colleagues can …
BalintKomjati May 9, 2019
1cdbcbc
Update DESCRIPTION
BalintKomjati May 9, 2019
61849dd
Delete .RData
BalintKomjati May 9, 2019
6d402ed
create package documentation with roxygen
zoltanszebenyi May 9, 2019
c67d7cf
create documentation for function "clean" by roxygen
zoltanszebenyi May 9, 2019
ec5fbf5
All functions are now available in the default namespace. Transform f…
BalintKomjati May 9, 2019
1778be4
functions are not REALLY available (fingers crossed)
BalintKomjati May 9, 2019
2d023c6
README.Rmd edited
TiborSzabo42 May 9, 2019
36880bf
Normality test changed to work with 5000+ samples
BalintKomjati May 9, 2019
084f410
fix issues #5, #8 and #10
zoltanszebenyi May 9, 2019
c6ec13d
update ignores
zoltanszebenyi May 9, 2019
9a8f2c0
calcPerformance now returns a list containing the results
TiborSzabo42 May 9, 2019
28e9838
Readme.Rdm is updated
TiborSzabo42 May 9, 2019
73a7641
calcPerformnce's example is corrected, fitModel.R added
TiborSzabo42 May 9, 2019
b1a80ee
fitModel.R added
TiborSzabo42 May 9, 2019
da26287
Documentations of calcPerformance.R and fitModel.R are corrected
TiborSzabo42 May 9, 2019
28c8b25
Readme.rd is updated
TiborSzabo42 May 9, 2019
373eade
add bmarketing and bmarketing2 datasets
zoltanszebenyi May 9, 2019
336b663
add example in docu for function clean
zoltanszebenyi May 9, 2019
2c52dd0
standarization added to Transformation
BalintKomjati May 9, 2019
50abd9d
function predictByModel is added
TiborSzabo42 May 9, 2019
fe3f738
Merge branch 'master' of https://github.com/BalintKomjati/bmarketing
TiborSzabo42 May 9, 2019
d60f612
Readme is updated
TiborSzabo42 May 9, 2019
d5f6ca7
fixed bmarketing2 data and implement meanimpute functionality in clean()
zoltanszebenyi May 9, 2019
3 changes: 3 additions & 0 deletions .Rbuildignore
@@ -0,0 +1,3 @@
^test\.Rproj$
^\.Rproj\.user$
^README\.Rmd$
512 changes: 512 additions & 0 deletions .Rhistory

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
.Rproj.user
.Rhistory
.RData
15 changes: 15 additions & 0 deletions DESCRIPTION
@@ -0,0 +1,15 @@
Package: bmarketing
Title: Analyze Datasets of Banking Customers
Version: 0.0.0.9000
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
Description: Provides functions for data cleansing, modelling and reporting tasks on datasets of banking customers.
License: What license is it under?
Encoding: UTF-8
LazyData: true
Imports: tidyverse,
rpart,
rpart.plot,
dplyr,
nortest
RoxygenNote: 6.1.1

7 changes: 7 additions & 0 deletions NAMESPACE
@@ -0,0 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(calcPerformance)
export(clean)
export(fitModel)
export(predictByModel)
export(transform)
34 changes: 34 additions & 0 deletions R/Transform.R
@@ -0,0 +1,34 @@
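#' Log-transforms and standardizes input data for easy model development
#'
#' For each column of the input data frame: \cr
#' if the column is numeric and strictly positive \cr
#' AND it appears non-normal (Anderson-Darling test, p < 0.05), \cr
#' the function performs a log transformation. \cr
#' Every numeric column is then standardized (mean 0, standard deviation 1).
#'
#' @param input A data frame
#' @return A data frame with the transformed columns
#' @examples
#' transform(data.frame(a = c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89)))
#' @export
#'


# TODO:
# - give a warning if a variable was log-transformed
# - make the log transformation optional

transform <- function(input) {
  output <- as.data.frame(lapply(input, function(x) {
    if (is.numeric(x)) {
      # Log-transform strictly positive variables that look non-normal
      # (nortest::ad.test() needs more than 7 non-missing values)
      if (min(x, na.rm = TRUE) > 0 && nortest::ad.test(x)$p.value < .05) {
        x <- log(x)
      }
      # Standardize; as.vector() drops the matrix attributes returned by scale()
      x <- as.vector(scale(x))
    }
    x
  }))
  output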
}
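The standardization step in transform() is a z-score: subtract the mean and divide by the standard deviation. The base-R sketch below uses a hypothetical helper, standardize(), which is not part of the package. It shows that this matches scale() once scale()'s matrix attributes are dropped, and that the result has mean 0 and standard deviation 1.

```r
# Hypothetical helper (not exported by bmarketing): z-score a numeric vector,
# ignoring missing values when computing the mean and standard deviation
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(1, 10, 100, 1000)   # strictly positive and strongly skewed
z <- standardize(log(x))   # log-transform first, then standardize

# scale() returns a one-column matrix; as.vector() drops those attributes
all.equal(z, as.vector(scale(log(x))))
round(c(mean = mean(z), sd = sd(z)), 10)
```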




14 changes: 14 additions & 0 deletions R/bmarketing.R
@@ -0,0 +1,14 @@
#' bmarketing: A package for analyzing datasets of banking customers.
#'
#' The bmarketing package provides three important functions:
#' clean, transform and calcPerformance.
#'
#' @section bmarketing functions: \cr
#' - clean: A function to clean data (clean NA values, basic checks)\cr
#' - transform: A function to log-transform and standardize values\cr
#' - calcPerformance: A function to ...
#'
#' @docType package
#' @name bmarketing
#'
NULL
50 changes: 50 additions & 0 deletions R/calcPerformance.R
@@ -0,0 +1,50 @@
#' Reports model's classification accuracy measures
#'
#' @param y Target variable (class or numeric)
#' @param y_pred Predicted values of the target variable (class or numeric)
#' @return Prints a classification report (confusion matrix, sensitivity, specificity, precision and accuracy) and returns a list containing the confusion matrix and a data frame of these measures
#' @examples
#' y_example = c(0,1,1,0)
#' y_pred_example = c(1,1,1,0)
#' results <- calcPerformance(y = y_example, y_pred = y_pred_example)
#' @export
#'
calcPerformance <- function(y, y_pred) {

  if (length(y) != length(y_pred)) {
    stop("y and y_pred do not have the same number of observations")
  }

  if (any(is.na(y))) {
    stop("y contains NA value(s)")
  }

  if (any(is.na(y_pred))) {
    stop("y_pred contains NA value(s)")
  }

  # Use the same sorted levels for y and y_pred so the confusion matrix is
  # always 2 x 2, even if one class is never predicted.
  # The second level (e.g. 1 or "yes") is treated as the positive class.
  lv <- sort(unique(c(as.character(y), as.character(y_pred))))
  if (length(lv) != 2) {
    stop("y and y_pred must contain exactly two classes in total")
  }
  cm <- table(y = factor(as.character(y), levels = lv),
              y_pred = factor(as.character(y_pred), levels = lv))

  res <- data.frame(test = c("TPR",
                             "TNR",
                             "Precision",
                             "Accuracy"),
                    value = c(round(100 * cm[2, 2] / (cm[2, 2] + cm[2, 1]), 3),
                              round(100 * cm[1, 1] / (cm[1, 1] + cm[1, 2]), 3),
                              round(100 * cm[2, 2] / (cm[2, 2] + cm[1, 2]), 3),
                              round(100 * sum(diag(cm)) / sum(cm), 3))
  )

  print("Confusion matrix")
  print(cm)
  print("")
  print(paste("True Positive Rate (Sensitivity):", res[1, 2], "%"))
  print(paste("True Negative Rate (Specificity):", res[2, 2], "%"))
  print(paste("Precision:", res[3, 2], "%"))
  print(paste("Accuracy:", res[4, 2], "%"))

  return(list(cm = cm, res = res))
}
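As a hand check of the measures calcPerformance() reports, the base-R sketch below recomputes them for the roxygen example (y = c(0,1,1,0), y_pred = c(1,1,1,0)), treating 1 as the positive class. It fixes the factor levels before calling table(), an assumption made here so that the confusion matrix stays 2 x 2 even when one class is never predicted.

```r
y      <- c(0, 1, 1, 0)
y_pred <- c(1, 1, 1, 0)

# Fix the levels so the table is always 2 x 2 (rows = actual, columns = predicted)
lv <- c(0, 1)
cm <- table(y = factor(y, levels = lv), y_pred = factor(y_pred, levels = lv))

tpr       <- cm[2, 2] / (cm[2, 2] + cm[2, 1])  # sensitivity: TP / (TP + FN)
tnr       <- cm[1, 1] / (cm[1, 1] + cm[1, 2])  # specificity: TN / (TN + FP)
precision <- cm[2, 2] / (cm[2, 2] + cm[1, 2])  # TP / (TP + FP)
accuracy  <- sum(diag(cm)) / sum(cm)           # share of correct predictions

round(100 * c(TPR = tpr, TNR = tnr, Precision = precision, Accuracy = accuracy), 3)
# TPR 100, TNR 50, Precision 66.667, Accuracy 75

# Without fixed levels, a constant prediction yields a 2 x 1 table,
# so indexing cm[2, 2] would fail with "subscript out of bounds"
dim(table(y, rep(1, 4)))
```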

39 changes: 39 additions & 0 deletions R/clean.R
@@ -0,0 +1,39 @@
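#' Clean function.
#'
#' Cleans a dataset: \cr
#' - returns an error if the target variable contains any missing values (NAs). \cr
#' - gives clear warnings for all other variables which contain NAs. \cr
#' - removes any columns (and reports them in a warning) which contain more than 50\% NAs. \cr
#' - optionally replaces NAs in numeric columns with the column mean.
#'
#' @param x A dataframe
#' @param t The name of the target variable column of dataframe x
#' @param meanimpute Logical; if TRUE, NAs in numeric columns are replaced by the column mean (default FALSE)
#' @return The cleaned dataframe
#' @examples
#' cleaned_data <- clean(bmarketing, "y")
#' @export
#'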



clean <- function(x, t, meanimpute = FALSE) {
  if (!is.data.frame(x)) {stop("Parameter x must be a dataframe")}
  if (!(t %in% names(x))) {stop("Parameter t must be the name (string) of a column in the dataframe")}
  if (any(is.na(x[[t]]))) {stop(paste("The target variable", t, "contains NA values"))}
  if (any(is.na(x[, colnames(x) != t]))) {warning("Explanatory variables contain NA values")}
  # Share of missing values in each column
  count_na <- sapply(x, function(y) mean(is.na(y)))
  cols_to_remove <- names(count_na[count_na > 0.5])
  if (meanimpute) {
    cols_imputed <- c()
    for (i in seq_len(ncol(x))) {
      # x[[i]] extracts the column as a vector (also for tibbles)
      if (is.numeric(x[[i]]) && any(is.na(x[[i]]))) {
        x[[i]][is.na(x[[i]])] <- mean(x[[i]], na.rm = TRUE)
        cols_imputed <- c(cols_imputed, colnames(x)[i])
      }
    }
    if (length(cols_imputed) > 0) {
      warning(paste("The following columns were mean-imputed:", paste(cols_imputed, collapse = ", ")))
    }
  }
  if (length(cols_to_remove) == 0) {return(x)}
  warning(paste("The following columns are removed:", paste(cols_to_remove, collapse = ", ")))
  # %in% matches every column to remove; drop = FALSE keeps a dataframe
  x[, !(colnames(x) %in% cols_to_remove), drop = FALSE]
}
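The column-removal step in clean() has to compare every column name against the whole set of names to drop. The base-R sketch below, on a small made-up data frame, shows why `%in%` is needed: `==` recycles the shorter vector element by element, so with two columns to drop it matches only one of them here. It then mean-imputes the remaining numeric columns the way the meanimpute option of clean() does.

```r
df <- data.frame(
  y = c(1, 0, 1, 0),
  a = c(NA, NA, NA, 4),   # 75% missing: removed
  b = c(1, NA, 3, NA),    # 50% missing: kept (threshold is > 0.5)
  c = c(NA, NA, NA, 2)    # 75% missing: removed
)

na_share  <- sapply(df, function(col) mean(is.na(col)))
to_remove <- names(na_share[na_share > 0.5])   # "a" "c"

# `==` recycles to_remove to c("a", "c", "a", "c") and only flags column 4
which(colnames(df) == to_remove)

# `%in%` checks each column name against all names to drop
kept <- df[, !(colnames(df) %in% to_remove), drop = FALSE]

# Mean-impute the remaining numeric columns
for (nm in names(kept)) {
  if (is.numeric(kept[[nm]]) && anyNA(kept[[nm]])) {
    kept[[nm]][is.na(kept[[nm]])] <- mean(kept[[nm]], na.rm = TRUE)
  }
}
kept
```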
35 changes: 35 additions & 0 deletions R/fitModel.R
@@ -0,0 +1,35 @@
#' Fits a logistic regression or a decision tree model
#'
#' @param data data.frame used for model fitting
#' @param y name of the target variable (quoted character)
#' @param modelType string, name of the requested model type: either 'Logistic' or 'DecisionTree'
#' @param explVars either NULL or character vector containing list of explanatory variables
#' @return The fitted model object (of class \code{glm} or \code{rpart})
#' @examples
#' df <- data.frame(y = c(0,1,1,0), a = c('a', 'b', 'c', 'a'), b = c(12,121,11,12))
#' varList <- c('a','b')
#' results <- fitModel(data = df, y = 'y', modelType = 'Logistic', explVars = varList)
#' @export
#'

fitModel <- function(data, y, modelType, explVars = NULL) {

  if (!(modelType %in% c("Logistic", "DecisionTree"))) {
    stop("Unknown model type")
  }

  # Concatenate the model formula
  if (is.null(explVars)) {
    modelFormula <- paste(y, "~ .")
  } else {
    modelFormula <- paste(y, "~", paste(explVars, collapse = "+"))
  }

  # Which model is requested?
  if (modelType == "DecisionTree") {
    fit <- rpart::rpart(as.formula(modelFormula), data = data)
  } else {
    fit <- glm(as.formula(modelFormula), data = data, family = "binomial")
  }

  # Return the fitted model visibly (an assignment as the last expression returns invisibly)
  fit
}
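A function whose last expression is an assignment returns its value invisibly, which is why fitModel() should end by returning the fitted object explicitly. The sketch below illustrates this, and also builds the same kind of formula with base R's stats::reformulate() instead of pasting strings. It uses the built-in mtcars data (am is a 0/1 transmission indicator); build_formula() and the two small test functions are hypothetical helpers, not part of the package.

```r
# Hypothetical helper mirroring fitModel()'s formula logic with reformulate()
build_formula <- function(y, explVars = NULL) {
  if (is.null(explVars)) {
    as.formula(paste(y, "~ ."))
  } else {
    reformulate(explVars, response = y)
  }
}

f <- build_formula("am", c("hp", "wt"))
fit <- glm(f, data = mtcars, family = "binomial")
coef(fit)

# An assignment as the last expression makes the result invisible
ends_with_assignment <- function() { res <- 1 }
ends_with_value      <- function() { res <- 1; res }
withVisible(ends_with_assignment())$visible  # FALSE
withVisible(ends_with_value())$visible       # TRUE
```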

29 changes: 29 additions & 0 deletions R/predictByModel.R
@@ -0,0 +1,29 @@
#' Predicts output from a DecisionTree or Logistic model
#'
#' @param data dataset to be used for prediction
#' @param model2Predict model which the predictions are based on
#' @param modelType type of the model
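#' @return A factor of predicted classes ("no"/"yes" for Logistic models)
#' @examples
#' df <- data.frame(y = factor(c("no", "yes", "yes", "no")), b = c(12, 121, 11, 12))
#' fit <- fitModel(data = df, y = "y", modelType = "Logistic")
#' predictions <- predictByModel(data = df, model2Predict = fit, modelType = "Logistic")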
#' @export
#'

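predictByModel <- function(data, model2Predict, modelType) {

  if (!(modelType %in% c("Logistic", "DecisionTree"))) {
    stop("Unknown model type")
  }

  if (modelType == "Logistic") {
    # predict() expects new observations in `newdata`; an argument named `data`
    # is silently ignored and the training-set fitted values are returned
    prob <- predict(object = model2Predict, newdata = data, type = "response")
    # Fixing both levels keeps the labels correct even if only one class is predicted
    pred <- factor(ifelse(prob > 0.5, "yes", "no"), levels = c("no", "yes"))
  } else {
    pred <- predict(object = model2Predict, newdata = data, type = "class")
  }

  return(pred)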
}
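Two details matter in predictByModel(): predict() methods for glm and rpart objects take new observations through the `newdata` argument, and recoding 0/1 predictions with `levels<-` mislabels them when only one class is predicted. The base-R sketch below illustrates both on the built-in mtcars data; the "no"/"yes" labels follow the package's convention.

```r
fit <- glm(am ~ hp + wt, data = mtcars, family = "binomial")
new_obs <- mtcars[1:3, ]

# A `data` argument is swallowed by `...` and ignored: the 32 fitted values come back
p_ignored <- predict(fit, data = new_obs, type = "response")
# `newdata` is the argument predict.glm() actually uses
p_new <- predict(fit, newdata = new_obs, type = "response")
c(length(p_ignored), length(p_new))  # 32 3

# Renaming the levels of a one-level factor maps its only level to "no"
bad <- as.factor(c(1, 1, 1))
levels(bad) <- c("no", "yes")
as.character(bad)  # "no" "no" "no"

# Building the factor from the labels with fixed levels avoids that
good <- factor(ifelse(p_new > 0.5, "yes", "no"), levels = c("no", "yes"))
levels(good)
```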


60 changes: 53 additions & 7 deletions README.Rmd
@@ -4,20 +4,66 @@ output: github_document

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
fig.path = "man/figures/README-",
out.width = "100%"
)
library(bmarketing)
library(rpart)
library(rpart.plot)
```
# bmarketing

[![Travis Build Status](https://travis-ci.org/Quantargo/bmarketing.svg?branch=master)](https://travis-ci.org/Quantargo/bmarketing)
[![Coverage Status](https://img.shields.io/codecov/c/github/Quantargo/bmarketing/master.svg)](https://codecov.io/github/Quantargo/bmarketing?branch=master)
<!-- badges: start -->
<!-- badges: end -->

## Overview
The goal of bmarketing is to provide functions useful for data cleansing, modelling and reporting tasks.

The bmarketing dataset
## Installation

<!-- TODO: Change README to make it more descriptive, add examples, etc. -->
You can install the development version of bmarketing from [GitHub](https://github.com/BalintKomjati/bmarketing) with:

```{r, eval = FALSE}
devtools::install_github("BalintKomjati/bmarketing")
library(bmarketing)
```

## Example

A basic workflow for using the package is the following:

1) Import the package

```{r}
library(bmarketing)
```

2) Import the data you want to analyse, for example:

```{r}
bmarketing <- read.csv2("inst/bmarketing.csv",dec = ".")
```

3) Clean the data with the function clean():

```{r}
bmarketing <- clean(x = bmarketing, t = "y")
```

4) Fit a decision tree, plot it and make predictions:

```{r}
dt_model <- fitModel(data = bmarketing, y = 'y', modelType = 'DecisionTree')
rpart.plot(dt_model)

predictions <- predictByModel(data = bmarketing, model2Predict = dt_model, modelType = "DecisionTree")
```

5) Finally, you can create a report on model performance:

```{r}
results <- calcPerformance(y = bmarketing$y, y_pred = predictions)
```
