This is a work-in-progress website of R panel data and optimization examples for Statistics/Econometrics/Economic Analysis. Book version: bookdown site and bookdown pdf. The materials are gathered from various projects in which R code is used. Files are from Fan's R4Econ repository. This is not an R package, but a list of examples in PDF/HTML/Rmd formats. REconTools is an installable package that collects tools used in projects involving R.

Bullet points show which base R, tidyverse or other functions/commands are used to achieve various objectives. An effort is made to use only base R and tidyverse packages whenever possible to reduce dependencies. The goal of this repository is to make it easier to find and re-use code produced for various projects.

From Fan's other repositories: for dynamic borrowing and savings problems, see Dynamic Asset Repository; for code examples, see also Matlab Example Code and Stata Example Code; for intro econ with Matlab, see Intro Mathematics for Economists; and for intro stat with R, see Intro Statistics for Undergraduates. See here for all of Fan's public repositories.

Please contact FanWangEcon for issues or problems.

1 Array, Matrix, Dataframe

1.1 List

  1. Multi-dimensional Named Lists: rmd | r | pdf | html
    • Initiate Empty List. Named one and two dimensional lists.
    • r: vector(mode = "list", length = it_N) + names(list) <- paste0('e',seq()) + dimnames(ls2d)[[1]] <- paste0('r',seq()) + dimnames(ls2d)[[2]] <- paste0('c',seq())
    • tidyr: unnest()
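
As a minimal sketch of the named-list pattern listed above (the sizes and stored values are made up for illustration, and the two-dimensional names are set with a single `dimnames()` assignment rather than one dimension at a time):

```r
# Hypothetical sizes and values, for illustration only
it_N <- 3

# Empty list of length it_N, then name its elements e1, e2, e3
ls_named <- vector(mode = "list", length = it_N)
names(ls_named) <- paste0('e', seq(1, it_N))
ls_named[['e2']] <- c(1.5, 2.5)

# Empty 2 x 3 list-matrix with row names r1, r2 and column names c1, c2, c3
ls2d <- vector(mode = "list", length = 2 * 3)
dim(ls2d) <- c(2, 3)
dimnames(ls2d) <- list(paste0('r', seq(1, 2)), paste0('c', seq(1, 3)))

# Store and retrieve a cell by row and column name
ls2d[['r1', 'c2']] <- 'abc'
print(ls2d[['r1', 'c2']])
```

Cells that have not been assigned remain `NULL`, so each cell can later hold an object of any type or length.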

1.2 Array

  1. Arrays Operations in R: rmd | r | pdf | html
    • Basic array operations in R.
    • r: head() + tail() + na_if()
  2. Generate Special Arrays: rmd | r | pdf | html
    • Generate special arrays: log spaced array
    • r: seq()
  3. String Operations: rmd | r | pdf | html
    • Split, concatenate, subset strings
    • r: paste0() + sub() + gsub() + grepl() + sprintf() + tail() + strsplit() + basename() + dirname()
  4. Array Combinations as Matrix: rmd | r | pdf | html
    • Combinations of two arrays to matrix form (meshgrid)
    • tidyr: expand_grid() + expand.grid()
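
A short sketch of two of the array tasks above, using base R only. The grid bounds and array values are arbitrary choices for illustration, and `expand.grid()` stands in for the tidyr `expand_grid()` variant:

```r
# Log-spaced array: equal steps in logs, then exponentiate
ar_log <- exp(seq(log(1), log(100), length.out = 5))

# Meshgrid: every combination of two arrays, one row per pair
ar_a <- c(1, 2, 3)
ar_b <- c(10, 20)
mt_mesh <- as.matrix(expand.grid(a = ar_a, b = ar_b))

print(ar_log)
print(mt_mesh)
```

Note that `expand.grid()` varies its first argument fastest, while `tidyr::expand_grid()` varies its last argument fastest, so the row order differs between the two.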

1.3 Matrix

  1. Matrix Basics: rmd | r | pdf | html
    • Generate and combine fixed and random matrices
    • r: rbind() + matrix()
  2. Linear Algebra Operations: rmd | r | pdf | html

1.4 Variables in Dataframes

  1. Tibble Basics: rmd | r | pdf | html
    • generate tibbles, rename tibble variables, tibble row and column names
    • rename numeric sequential columns with string prefix and suffix
    • dplyr: as_tibble(mt) + rename_all(~c(ar_names)) + rename_at(vars(starts_with("xx")), funs(str_replace(., "yy", "yyyy"))) + rename_at(vars(num_range('',ar_it)), funs(paste0(st,.))) + rowid_to_column() + colnames + rownames
  2. Label and Combine Factor Variables: rmd | r | pdf | html
    • Convert numeric variables to factor variables, generate joint factors, and label factors.
    • Graph MPG and 1/4 Miles Time (qsec) from the mtcars dataset over joint shift-type (am) and engine-type (vs) categories.
    • forcats: as_factor() + fct_recode() + fct_cross()
  3. Examples of Random Draws in R: rmd | r | pdf | html
  4. R Tibble Dataframe NA Values: rmd | r | pdf | html
  5. R Tibble Dataframe String Manipulations: rmd | r | pdf | html

2 Summarize Data

2.1 Counting Observation

  1. Counting Basics: rmd | r | pdf | html
    • uncount to generate panel skeleton from years in survey
    • dplyr: uncount(yr_n) + group_by() + mutate(yr = row_number() + start_yr)
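
A minimal sketch of the uncount pattern, assuming a hypothetical survey frame `df_survey` in which `start_yr` is the first year each individual is observed and `yr_n` is the number of years observed (hence the `- 1`, which is an assumption of this sketch):

```r
library(dplyr)
library(tidyr)

# Hypothetical survey frame: one row per individual
df_survey <- tibble(id = c(1, 2), start_yr = c(2000, 2005), yr_n = c(3, 2))

# Repeat each row yr_n times, then number the rows within individual
df_panel <- df_survey %>%
  uncount(yr_n) %>%
  group_by(id) %>%
  mutate(yr = row_number() + start_yr - 1) %>%
  ungroup()

print(df_panel)
```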

2.2 Sorting, Indexing, Slicing

  1. Sorted Index, Interval Index and Expand Value from One Row: rmd | r | pdf | html
    • Sort and generate index for rows
    • Generate negative and positive index based on deviations
    • Populate Values from one row to other rows
    • dplyr: arrange() + row_number() + mutate(lowest = min(Sepal.Length)) + case_when(row_number()==x ~ Sepal.Length) + mutate(Sepal.New = Sepal.Length[Sepal.Index == 1])
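
The sort, index and populate steps can be sketched on the built-in `iris` data as follows. The column names `Sepal.Index`, `lowest` and `Sepal.New` follow the bullet above; the exact sequence of steps is an assumption:

```r
library(dplyr)

df_sorted <- iris %>%
  arrange(Sepal.Length) %>%
  mutate(Sepal.Index = row_number(),                    # index after sorting
         lowest = min(Sepal.Length),                    # same scalar in every row
         Sepal.New = Sepal.Length[Sepal.Index == 1])    # value from row 1, copied to all rows

print(head(df_sorted))
```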

2.3 Group Statistics

  1. Count Unique Groups and Mean within Groups: rmd | r | pdf | html
    • Unique groups defined by multiple values and count obs within group.
    • Mean, sd, observation count for non-NA within unique groups.
    • dplyr: group_by() + summarise(n()) + summarise_if(is.numeric, funs(mean = mean(., na.rm = TRUE), n = sum(is.na(.)==0)))
  2. By Groups, One Variable All Statistics: rmd | r | pdf | html
    • Pick stats, overall, and by multiple groups, stats as matrix or wide row with name=(ctsvar + catevar + catelabel).
    • tidyr: group_by() + summarize_at(, funs()) + rename(!!var := !!sym(var)) + mutate(!!var := paste0(var,'str',!!!syms(vars))) + gather() + unite() + spread(varcates, value)
  3. By within Individual Groups Variables, Averages: rmd | r | pdf | html
    • By Multiple within Individual Groups Variables.
    • Averages for all numeric variables within all groups of all group variables. Long to Wide to very Wide.
    • tidyr: gather() + group_by() + summarise_if(is.numeric, funs(mean(., na.rm = TRUE))) + mutate(all_m_cate = paste0(variable, '_c', value)) + unite() + spread()
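
A sketch of group counts and means on the built-in `mtcars` data. It uses `across()` where the bullets above use `summarise_if()` with `funs()`, since `funs()` is deprecated in current dplyr; the grouping variables and the output column names are choices made for this illustration:

```r
library(dplyr)

df_stats <- mtcars %>%
  group_by(am, vs) %>%
  summarise(n_obs = n(),                                   # rows in each group
            across(c(mpg, qsec),
                   list(mean = ~mean(.x, na.rm = TRUE),    # mean over non-NA values
                        n = ~sum(!is.na(.x)))),            # count of non-NA values
            .groups = "drop")

print(df_stats)
```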

2.4 Distributional Statistics

  1. Tibble Basics: rmd | r | pdf | html
    • input multiple variables with comma separated text strings
    • quantitative/continuous and categorical/discrete variables
    • histogram and summary statistics
    • tibble: ar_one <- c(107.72,101.28) + ar_two <- c(101.72,101.28) + mt_data <- cbind(ar_one, ar_two) + as_tibble(mt_data)

2.5 Summarize Multiple Variables

  1. R Example Apply the Same Function Over Multiple Variables: rmd | r | pdf | html

3 Functions

3.1 Dataframe Mutate

  1. Nonlinear Function over Rows: rmd | r | pdf | html
    • Evaluate nonlinear function f(x_i, y_i, ar_x, ar_y, c, d), where c and d are constants, and ar_x and ar_y are arrays, both fixed. x_i and y_i vary over each row of matrix.
    • dplyr: rowwise() + mutate(out = funct(inputs))
  2. DPLYR Evaluate Functions at Many States and Choices Each State: rmd | r | pdf | html
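
A minimal sketch of the rowwise pattern from item 1. The function `ffi_nonlin` and all input values are hypothetical; they only illustrate a function that takes row-specific scalars together with fixed arrays and constants:

```r
library(dplyr)

# Hypothetical nonlinear function of row scalars, fixed arrays, and constants
ffi_nonlin <- function(x_i, y_i, ar_x, ar_y, c, d) {
  sum((x_i * ar_x + y_i * ar_y)^c) + d
}

# Fixed arrays shared by all rows
ar_x <- c(1, 2)
ar_y <- c(3, 4)

# x_i and y_i vary by row
df_in <- tibble(x_i = c(0.5, 1), y_i = c(1, 2))

df_out <- df_in %>%
  rowwise() %>%
  mutate(out = ffi_nonlin(x_i, y_i, ar_x, ar_y, c = 2, d = 1)) %>%
  ungroup()

print(df_out)
```

Without `rowwise()`, `x_i` and `y_i` would enter the function as full columns and recycle against `ar_x` and `ar_y`, which is not the row-by-row evaluation intended here.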

3.2 Dataframe Do Anything

  1. Evaluate Function Do Anything Group Stack Results: rmd | r | pdf | html
    • Group dataframe by categories, compute category specific output scalar or arrays based on within category variable information.
    • dplyr: group_by(ID) + do(inc = rnorm(.$N, mean=.$mn, sd=.$sd)) + unnest(c(inc)) + left_join(df, by="ID")
  2. DPLYR Expand Dataframe with Function: rmd | r | pdf | html

3.3 Apply and pmap

  1. Apply and Mutate over Rows: rmd | r | pdf | html
    • Evaluate function f(x_i,y_i,c), where c is a constant and x and y vary over each row of a matrix, with index i indicating rows.
    • Get same results using apply, sapply, and dplyr mutate.
    • r: do.call() + apply(mt, 1, func) + sapply(ls_ar, func, ar1, ar2)
    • purrr: rowwise() + unnest(out) + pmap(func) + unlist()
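
The equivalence of the different approaches can be sketched in base R as follows. The function `ffi_lin` and the inputs are hypothetical, and the purrr variant is omitted to keep the example free of dependencies:

```r
# Hypothetical function f(x_i, y_i, c)
ffi_lin <- function(x, y, c) { x * y + c }

mt_xy <- cbind(x = c(1, 2, 3), y = c(4, 5, 6))
fl_c <- 10

# apply over rows: each row arrives as a named vector
ar_apply <- apply(mt_xy, 1, function(row) ffi_lin(row[['x']], row[['y']], fl_c))

# sapply over row indices
ar_sapply <- sapply(seq_len(nrow(mt_xy)),
                    function(i) ffi_lin(mt_xy[i, 'x'], mt_xy[i, 'y'], fl_c))

# do.call with a named argument list; works because ffi_lin is vectorized
ar_docall <- do.call(ffi_lin, list(x = mt_xy[, 'x'], y = mt_xy[, 'y'], c = fl_c))

print(rbind(ar_apply, ar_sapply, ar_docall))
```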

4 Panel

4.1 Generate and Join

  1. TIDYVERSE Generate Panel Data Structures: rmd | r | pdf | html
    • Build skeleton panel frame with N observations and T periods.
    • tidyr: rowid_to_column() + uncount() + group_by() + row_number() + ungroup()
  2. R DPLYR Join Multiple Dataframes Together: rmd | r | pdf | html
    • Join dataframes together with one or multiple keys. Stack dataframes together.
    • dplyr: filter() + rename(!!sym(vsta) := !!sym(vstb)) + mutate(var = rnorm(n())) + left_join(df, by=(c('id'='id', 'vt'='vt'))) + left_join(df, by=setNames(c('id', 'vt'), c('id', 'vt'))) + bind_rows()
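
A small sketch of joining on two keys and of stacking, with two hypothetical frames `df_a` and `df_b` that share the keys `id` and `vt`:

```r
library(dplyr)

# Hypothetical frames sharing the keys id and vt
df_a <- tibble(id = c(1, 1, 2), vt = c(1, 2, 1), x = c(10, 11, 20))
df_b <- tibble(id = c(1, 2), vt = c(1, 1), y = c(0.5, 0.7))

# Join on both keys; rows of df_a without a match get NA for y
df_join <- df_a %>%
  left_join(df_b, by = c('id' = 'id', 'vt' = 'vt'))

# Stack frames; columns missing from one frame are filled with NA
df_stack <- bind_rows(df_a, df_b)

print(df_join)
print(df_stack)
```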

4.2 Wide and Long

  1. TIDYR Pivot Wider and Pivot Longer Examples: rmd | r | pdf | html
    • Long roster to wide roster and cumulative sum attendance by date.
    • dplyr: mutate(var = case_when(rnorm(n()) < 0 ~ 1, TRUE ~ 0)) + rename_at(vars(num_range('', ar_it)), list(~paste0(st_prefix, . , ''))) + mutate_at(vars(contains(str)), list(~replace_na(., 0))) + mutate_at(vars(contains(str)), list(~cumsum(.)))
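
A minimal sketch of going from a long attendance roster to a wide roster of cumulative attendance. The frame `df_long`, its column names, and the `day` prefix are hypothetical:

```r
library(dplyr)
library(tidyr)

# Hypothetical long roster: one row per individual and day
df_long <- tibble(id = c(1, 1, 2, 2),
                  day = c(1, 2, 1, 2),
                  attend = c(1, 0, 1, 1))

df_wide <- df_long %>%
  group_by(id) %>%
  arrange(day, .by_group = TRUE) %>%
  mutate(attend_cum = cumsum(attend)) %>%   # cumulative attendance within individual
  ungroup() %>%
  select(-attend) %>%
  pivot_wider(names_from = day, values_from = attend_cum, names_prefix = 'day')

print(df_wide)
```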

5 Linear Regression

5.1 OLS and IV

  1. IV/OLS Regression: rmd | r | pdf | html
    • R Instrumental Variables and Ordinary Least Square Regression store all Coefficients and Diagnostics as Dataframe Row.
    • AER: library(AER) + ivreg(as.formula, diagnostics = TRUE)
  2. M Outcomes and N RHS Alternatives: rmd | r | pdf | html
    • There are M outcome variables and N alternative explanatory variables. Regress all M outcome variables on N endogenous/independent right hand side variables one by one, with controls and/or IVs, collect coefficients.
    • dplyr: bind_rows(lapply(listx, function(x)(bind_rows(lapply(listy, regf.iv))))) + starts_with() + ends_with() + reduce(full_join)

5.2 Decomposition

  1. Regression Decomposition: rmd | r | pdf | html
    • Post multiple regressions, fraction of outcome variables' variances explained by multiple subsets of right hand side variables.
    • dplyr: gather() + group_by(var) + mutate_at(vars, funs(mean = mean(.))) + rowSums(matmat) + mutate_if(is.numeric, funs(frac = (./value_var)))

6 Nonlinear Regression

6.1 Logit Regression

  1. Logit Regression: rmd | r | pdf | html
    • Logit regression testing and prediction.
    • stats: glm(as.formula(), data, family='binomial') + predict(rs, newdata, type = "response")
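
A short sketch of fitting and predicting with a logit model on the built-in `mtcars` data. The choice of `vs` as outcome, `mpg` as regressor, and the prediction points are illustrative assumptions:

```r
# Logit of engine type (vs, 0/1) on miles per gallon
rs_logit <- glm(as.formula('vs ~ mpg'), data = mtcars, family = 'binomial')

# Predicted probabilities at two hypothetical mpg values
df_new <- data.frame(mpg = c(15, 25))
ar_prob <- predict(rs_logit, newdata = df_new, type = "response")

print(summary(rs_logit)$coefficients)
print(ar_prob)
```

With `type = "response"` the predictions are probabilities; the default `type = "link"` would return the linear index instead.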

7 Optimization

7.1 Bisection

  1. Concurrent Bisection over Dataframe Rows: rmd | r | pdf | html
    • Post multiple regressions, fraction of outcome variables' variances explained by multiple subsets of right hand side variables.
    • tidyr: pivot_longer(cols = starts_with('abc'), names_to = c('a', 'b'), names_pattern = paste0('prefix', "(.)_(.)"), values_to = val) + pivot_wider(names_from = !!sym(name), values_from = val) + mutate(!!sym(abc) := case_when(efg < 0 ~ !!sym(opq), TRUE ~ iso))
    • ggplot2: geom_line() + facet_wrap() + geom_hline()
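
As a rough sketch of what bisecting over all dataframe rows at once can look like, the base R example below solves a hypothetical row-specific problem, x^2 - a_i = 0 on [0, 10], for every row in each iteration. It does not use the pivot-based approach listed above, and the objective, bounds and iteration count are assumptions:

```r
# Hypothetical problem: for each row i, find x in [0, 10] with x^2 - a_i = 0
df_bisec <- data.frame(a = c(1, 4, 9))
ffi_obj <- function(x, a) { x^2 - a }

# One lower and one upper bound per row
ar_lo <- rep(0, nrow(df_bisec))
ar_hi <- rep(10, nrow(df_bisec))

for (it in seq_len(60)) {
  ar_mid <- (ar_lo + ar_hi) / 2
  ar_f_mid <- ffi_obj(ar_mid, df_bisec$a)
  # The objective is increasing in x: if f(mid) < 0 the root lies above mid
  ar_lo <- ifelse(ar_f_mid < 0, ar_mid, ar_lo)
  ar_hi <- ifelse(ar_f_mid < 0, ar_hi, ar_mid)
}

df_bisec$x_root <- (ar_lo + ar_hi) / 2
print(df_bisec)
```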

8 Mathematics and Statistics

8.1 Distributions

  1. Integrate Normal Shocks: rmd | r | pdf | html
    • Random Sampling (Monte Carlo) integrate shocks.
    • Trapezoidal rule (symmetric rectangles) integrate normal shock.
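
Both approaches can be sketched on a case with a known answer: for a normal shock with mean zero and standard deviation sd, the expectation of exp(shock) equals exp(sd^2/2). The integrand, the value of `fl_sd`, the number of draws and the grid are illustrative assumptions:

```r
set.seed(123)
fl_sd <- 0.5
fl_true <- exp(fl_sd^2 / 2)   # analytical value of E[exp(eps)]

# Monte Carlo: average the integrand over random draws of the shock
ar_draws <- rnorm(100000, mean = 0, sd = fl_sd)
fl_mc <- mean(exp(ar_draws))

# Trapezoidal rule: integrand times normal density on an even grid
ar_grid <- seq(-6 * fl_sd, 6 * fl_sd, length.out = 1001)
ar_fx <- exp(ar_grid) * dnorm(ar_grid, mean = 0, sd = fl_sd)
fl_trapz <- sum(diff(ar_grid) * (head(ar_fx, -1) + tail(ar_fx, -1)) / 2)

print(c(true = fl_true, mc = fl_mc, trapz = fl_trapz))
```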

8.2 Analytical Solutions

  1. linear solve x with f(x) = 0: rmd | r | pdf | html
    • Evaluate and solve statistically relevant problems with one equation and one unknown that permit analytical solutions.

8.3 Inequality Models

  1. Gini for Discrete Samples: rmd | r | pdf | html
    • Given a sample of discrete data points, compute the approximate Gini coefficient.
    • r: sort() + cumsum() + sum()
  2. CES and Atkinson Utility: rmd | r | pdf | html
    • Analyze how changing individual outcomes shift utility given inequality preference parameters.
    • Draw Cobb-Douglas, Utilitarian and Leontief indifference curves.
    • r: apply(mt, 1, function(x){}) + do.call(rbind, ls_mt)
    • tidyr: expand_grid()
    • ggplot2: geom_line() + facet_wrap()
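
For the discrete-sample Gini in item 1, a minimal sketch built from `sort()`, `cumsum()` and `sum()` is shown below. The function name `ffi_gini` is hypothetical, and the formula is one common sample Gini formula, which may differ in normalization from the one used in the linked example:

```r
# Hypothetical helper: sample Gini coefficient for non-negative data
ffi_gini <- function(ar_x) {
  ar_sorted <- sort(ar_x)
  it_n <- length(ar_sorted)
  # G = (n + 1 - 2 * sum of cumulative sums / total) / n
  (it_n + 1 - 2 * sum(cumsum(ar_sorted)) / sum(ar_sorted)) / it_n
}

print(ffi_gini(c(1, 1, 1, 1)))   # perfect equality
print(ffi_gini(c(0, 0, 0, 1)))   # one individual holds everything
```

Under this formula, the most unequal sample of size n gives (n - 1)/n rather than 1, which is why the second call returns a value below one.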
