-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
3 changed files
with
510 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,339 @@ | ||
--- | ||
title: "Hierarchical Spatial Simultaneous Autoregressive Model (HSAR)" | ||
output: rmarkdown::html_vignette | ||
vignette: > | ||
%\VignetteIndexEntry{HSAR} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
%\VignetteEncoding{UTF-8} | ||
--- | ||
|
||
|
||
|
||
An application of HSAR for asking prices in the municipality of Athens | ||
==================================================================== | ||
|
||
An application of `hsar()`, based on rel data, will be illustrated. The design of the weight matrices needed and the random effect design matrix will be explained. | ||
|
||
### Libraries | ||
|
||
We start by loading the libraries that will be used. | ||
|
||
|
||
``` r | ||
library(sf) | ||
## Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE | ||
library(spdep) | ||
## Loading required package: spData | ||
library(tidyverse) | ||
## ── Attaching core tidyverse packages ─────────────────────────────────────────── tidyverse 2.0.0 ── | ||
## ✔ dplyr 1.1.4 ✔ readr 2.1.5 | ||
## ✔ forcats 1.0.0 ✔ stringr 1.5.1 | ||
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1 | ||
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1 | ||
## ✔ purrr 1.0.2 | ||
## ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ── | ||
## ✖ dplyr::filter() masks stats::filter() | ||
## ✖ dplyr::lag() masks stats::lag() | ||
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors | ||
library(HSAR) | ||
``` | ||
|
||
### Reading the datasets | ||
|
||
At the higher level, we have the seven departments of the municipality of Athens and at the lower level we have the point data of the properties. | ||
|
||
|
||
``` r | ||
data(depmunic) | ||
data(properties) | ||
plot(st_geometry(depmunic),col = sf.colors(12, categorical = TRUE), border = 'grey') | ||
plot(st_geometry(properties),add=TRUE,col="red",pch=16,cex=0.6) | ||
``` | ||
|
||
![](../man/figures/hsar/p1-1.png) | ||
|
||
The characteristics that come with the areal data are the id of the department, the number of airbnb properties, the number of museums, the population, the number of citizens with origin a non european union country, the area of the green space (m^2) and the area of the polygon (km^2). | ||
|
||
|
||
``` r | ||
names(depmunic) | ||
## [1] "num_dep" "airbnb" "museums" "population" "pop_rest" "greensp" "area" | ||
## [8] "geometry" | ||
depmunic$pop_rest | ||
## [1] 8202 5009 2735 4167 5099 16531 8017 | ||
``` | ||
|
||
The characteristics of the properties are the size (m^2), the asking price (euros), the price per square meter, the age (years) and the shortest distance to metro/train station (m). | ||
|
||
|
||
``` r | ||
names(properties) | ||
## [1] "id" "size" "price" "prpsqm" "age" "dist_metro" "geometry" | ||
hist(properties$age, xlab = "Age", main="Age of the properties") | ||
``` | ||
|
||
![](../man/figures/hsar/p2-1.png) | ||
|
||
Now we are going to create two more variables at the higher, municipality department, level. The first one is the population density per 10k citizens, and the second one is the percentage of non EU citizens. | ||
|
||
|
||
``` r | ||
depmunic$popdens <- depmunic$population/ (10000*depmunic$area) | ||
depmunic$foreigners <- 100 * depmunic$pop_rest/ depmunic$population | ||
``` | ||
|
||
The next step is to create the model data that are going to use in the hsar model. For that, we need for each property (lower data), the data from the relevant department(higher level). | ||
|
||
|
||
``` r | ||
properties_in_dd <- st_join(properties, depmunic, join = st_within) | ||
``` | ||
|
||
So now, we know each property, in which department resides and the coresponding data for that polygon. We also need that data in sorting order. | ||
|
||
|
||
``` r | ||
model.data <- properties_in_dd[order(properties_in_dd$num_dep),] | ||
``` | ||
|
||
### Create matrices used in the hsar function | ||
|
||
In order to run the model we need to create the effect design matrix (Delta), the weight matrix for the high-level - polygon data (M), and the weight matrix for the lower level - point data (W). | ||
|
||
In order to define the random effect matrix, we start with estimating the number of properties in each municipality department | ||
|
||
|
||
``` r | ||
properties_count <- count(as_tibble(model.data), num_dep) | ||
MM <- as.data.frame(properties_count) | ||
``` | ||
|
||
and by geting the total number of municipality departments (7), we define a vector with the number of municipality department that each property belongs | ||
|
||
|
||
``` r | ||
Utotal <- dim(MM)[1] | ||
Unum <- MM[,2] | ||
Uid <- rep(c(1:Utotal),Unum) | ||
``` | ||
|
||
We then define the random effect matrix (Delta) wich has a dimension of 1000x7 | ||
|
||
|
||
``` r | ||
n <- nrow(properties) | ||
Delta <- matrix(0,nrow=n,ncol=Utotal) | ||
for(i in 1:Utotal) { | ||
Delta[Uid==i,i] <- 1 | ||
} | ||
|
||
Delta <- as(Delta,"dgCMatrix") | ||
``` | ||
|
||
|
||
Now we estimate the spatial weight matrix at the higher level which in our case is the municipality departments (polygons). So we start with poly2nb which constructs the neighbours list for polygons and then with nb2mat we generate the weight matrix for the neighbours list previously created. Then we transform the weight matrix in a sparse matrix format. | ||
|
||
|
||
``` r | ||
nb.list <- poly2nb(depmunic) | ||
mat.list <- nb2mat(nb.list,style="W") | ||
M <- as(mat.list,"dgCMatrix") | ||
``` | ||
|
||
to have a closer look at M , we can visualize it | ||
|
||
|
||
``` r | ||
plot(st_geometry(depmunic),border = 'grey') | ||
plot(st_centroid(depmunic), add = TRUE) | ||
## Warning: st_centroid assumes attributes are constant over geometries | ||
## Warning in plot.sf(st_centroid(depmunic), add = TRUE): ignoring all but the first attribute | ||
plot(nb.list, st_centroid(depmunic), add = TRUE) | ||
## Warning: st_centroid assumes attributes are constant over geometries | ||
``` | ||
|
||
![](../man/figures/hsar/p3-1.png) | ||
|
||
Similarly, we create the spatial weight matrix at the lower level of properties (point data). So we create the neighbour list at a distance of 1300 meters | ||
|
||
|
||
``` r | ||
nb.1300 <- dnearneigh(properties,0,1300) | ||
|
||
``` | ||
|
||
and the weights matrix W as follows | ||
|
||
|
||
``` r | ||
mat.1300 <- nb2mat(nb.1300,style="W") | ||
W <- as(mat.1300,"dgCMatrix") | ||
``` | ||
|
||
For the W matrix, we can check the neighbours statistics | ||
|
||
|
||
``` r | ||
nb.1300 | ||
## Neighbour list object: | ||
## Number of regions: 1000 | ||
## Number of nonzero links: 170254 | ||
## Percentage nonzero weights: 17.0254 | ||
## Average number of links: 170.254 | ||
``` | ||
|
||
### Run the models | ||
|
||
So, having ready the matrices Delta, M and W, we wun the `hsar()` function | ||
|
||
|
||
``` r | ||
res.formula <- prpsqm ~ size + age + greensp + population + museums + airbnb | ||
res <- hsar(res.formula,data=model.data,W=W,M=M,Delta=Delta, | ||
burnin=500, Nsim=1000) | ||
## Warning in spdep::mat2listw(W): style is M (missing); style should be set to a valid value | ||
## Warning in sn2listw(df, style = style, zero.policy = zero.policy, from_mat2listw = TRUE): style is | ||
## M (missing); style should be set to a valid value | ||
## Warning in spdep::mat2listw(W): style is M (missing); style should be set to a valid value | ||
## Warning in sn2listw(df, style = style, zero.policy = zero.policy, from_mat2listw = TRUE): style is | ||
## M (missing); style should be set to a valid value | ||
summary(res) | ||
## | ||
## Call: | ||
## hsar(formula = res.formula, data = model.data, W = W, M = M, | ||
## Delta = Delta, burnin = 500, Nsim = 1000) | ||
## Type: hsar | ||
## | ||
## Coefficients: | ||
## Mean SD | ||
## (Intercept) 1.880461e+03 9.794419e+00 | ||
## size 4.301935e+00 4.713561e-01 | ||
## age -1.991273e+01 1.339089e+00 | ||
## greensp 7.790633e-04 7.862559e-04 | ||
## population -9.240542e-03 2.397341e-03 | ||
## museums -4.531282e+01 1.033345e+01 | ||
## airbnb 5.803532e-01 2.232756e-01 | ||
## | ||
## Spatial Coefficients: | ||
## rho lambda | ||
## [1,] 0.201284 0.195202 | ||
## | ||
## Diagnostics | ||
## Deviance information criterion (DIC): 28197.36 | ||
## Effective number of parameters (pd): -1.635173 | ||
## Log likelihood: -14100.32 | ||
## Pseudo R squared: 0.3587075 | ||
## | ||
## Impacts: | ||
## direct indirect total | ||
## (Intercept) 1.881108e+03 4.730904e+02 2.354198e+03 | ||
## size 4.303415e+00 1.082290e+00 5.385706e+00 | ||
## age -1.991958e+01 -5.009688e+00 -2.492927e+01 | ||
## greensp 7.793313e-04 1.959984e-04 9.753297e-04 | ||
## population -9.243721e-03 -2.324756e-03 -1.156848e-02 | ||
## museums -4.532841e+01 -1.139990e+01 -5.672831e+01 | ||
## airbnb 5.805528e-01 1.460065e-01 7.265593e-01 | ||
## | ||
## Quantiles: | ||
## 5% 25% 50% 75% 95% | ||
## (Intercept) 1.863987e+03 1.874152e+03 1.881128e+03 1.887705e+03 1.894747e+03 | ||
## size 3.501653e+00 3.996034e+00 4.328532e+00 4.625902e+00 5.033293e+00 | ||
## age -2.205857e+01 -2.086760e+01 -1.991433e+01 -1.902301e+01 -1.773964e+01 | ||
## greensp -2.238065e-04 2.536224e-04 6.319674e-04 1.162900e-03 2.560007e-03 | ||
## population -1.313017e-02 -1.086792e-02 -9.282930e-03 -7.491176e-03 -5.489072e-03 | ||
## museums -6.251765e+01 -5.195792e+01 -4.486279e+01 -3.794775e+01 -2.997397e+01 | ||
## airbnb 1.780049e-01 4.327122e-01 6.070358e-01 7.323369e-01 9.083725e-01 | ||
``` | ||
|
||
and the two simpler models defined for rho = 0 and lambda=0. | ||
So, firstly, assuming rho = 0 (no interaction effects at the lower level) we get | ||
|
||
|
||
``` r | ||
res_1 <- hsar(res.formula,data=model.data,W=NULL,M=M,Delta=Delta,burnin=500, Nsim=1000) | ||
## Warning in spdep::mat2listw(W): style is M (missing); style should be set to a valid value | ||
## Warning in sn2listw(df, style = style, zero.policy = zero.policy, from_mat2listw = TRUE): style is | ||
## M (missing); style should be set to a valid value | ||
summary(res_1) | ||
## | ||
## Call: | ||
## hsar(formula = res.formula, data = model.data, W = NULL, M = M, | ||
## Delta = Delta, burnin = 500, Nsim = 1000) | ||
## Type: hsar with rho = 0 | ||
## | ||
## Coefficients: | ||
## Mean SD | ||
## (Intercept) 1.880969e+03 1.004751e+01 | ||
## size 4.299105e+00 4.215697e-01 | ||
## age -1.996865e+01 1.310667e+00 | ||
## greensp 5.304726e-04 6.544811e-04 | ||
## population -6.654456e-03 1.060707e-03 | ||
## museums -4.511823e+01 9.175141e+00 | ||
## airbnb 7.234227e-01 2.314932e-01 | ||
## | ||
## Spatial Coefficients: | ||
## lambda | ||
## 0.070876 | ||
## | ||
## Diagnostics | ||
## Deviance information criterion (DIC): 28198.46 | ||
## Effective number of parameters (pd): -1.870333 | ||
## Log likelihood: -14101.1 | ||
## Pseudo R squared: 0.3582752 | ||
## | ||
## Quantiles: | ||
## 5% 25% 50% 75% 95% | ||
## (Intercept) 1.864880e+03 1.874552e+03 1.880957e+03 1.887456e+03 1.897001e+03 | ||
## size 3.610783e+00 4.006603e+00 4.322592e+00 4.583691e+00 4.983597e+00 | ||
## age -2.207850e+01 -2.087071e+01 -1.995724e+01 -1.901799e+01 -1.790105e+01 | ||
## greensp -6.921436e-04 1.703272e-04 5.736104e-04 9.622126e-04 1.489968e-03 | ||
## population -8.359189e-03 -7.311060e-03 -6.666792e-03 -5.998892e-03 -4.847390e-03 | ||
## museums -6.003316e+01 -5.108847e+01 -4.522155e+01 -3.899018e+01 -2.960428e+01 | ||
## airbnb 3.630902e-01 5.791529e-01 7.000081e-01 8.588804e-01 1.159867e+00 | ||
``` | ||
|
||
and secondly, given lambda = 0 (no interaction at the higher level) we get | ||
|
||
|
||
``` r | ||
res_2 <- hsar(res.formula,data=model.data,W=W,M=NULL,Delta=Delta,burnin=500, Nsim=1000) | ||
## Warning in spdep::mat2listw(W): style is M (missing); style should be set to a valid value | ||
## Warning in sn2listw(df, style = style, zero.policy = zero.policy, from_mat2listw = TRUE): style is | ||
## M (missing); style should be set to a valid value | ||
summary(res_2) | ||
## | ||
## Call: | ||
## hsar(formula = res.formula, data = model.data, W = W, M = NULL, | ||
## Delta = Delta, burnin = 500, Nsim = 1000) | ||
## Type: hsar with lambda = 0 | ||
## | ||
## Coefficients: | ||
## Mean SD | ||
## (Intercept) 1.880360e+03 9.638392003 | ||
## size 4.274640e+00 0.475946778 | ||
## age -2.000767e+01 1.416531702 | ||
## greensp 8.151624e-05 0.001316126 | ||
## population -8.668497e-03 0.002604684 | ||
## museums -4.511405e+01 9.814353408 | ||
## airbnb 6.302068e-01 0.318353574 | ||
## | ||
## Spatial Coefficients: | ||
## rho | ||
## 0.189216 | ||
## | ||
## Diagnostics | ||
## Deviance information criterion (DIC): 28196.66 | ||
## Effective number of parameters (pd): -1.67705 | ||
## Log likelihood: -14100.01 | ||
## Pseudo R squared: 0.3591423 | ||
## | ||
## Quantiles: | ||
## 5% 25% 50% 75% 95% | ||
## (Intercept) 1864.24515875 1.873775e+03 1.880736e+03 1.886645e+03 1.895744e+03 | ||
## size 3.50820040 3.921107e+00 4.282089e+00 4.580065e+00 5.067648e+00 | ||
## age -22.32636764 -2.097454e+01 -1.997216e+01 -1.902378e+01 -1.763772e+01 | ||
## greensp -0.00297142 -2.411191e-04 5.260136e-04 9.078372e-04 1.450913e-03 | ||
## population -0.01257212 -1.045538e-02 -8.815213e-03 -7.081089e-03 -4.074837e-03 | ||
## museums -61.04617221 -5.171848e+01 -4.530204e+01 -3.812907e+01 -2.869411e+01 | ||
## airbnb 0.20185017 4.330003e-01 5.837671e-01 7.511525e-01 1.288780e+00 | ||
``` |
Oops, something went wrong.