forestGISML: link to Journal Article (Computers and Electronics in Agriculture) and Master Thesis

[Figure: graphical abstract]

1 Overview [ Step 1: Scope ]

This research investigates forest productivity by identifying the conditioning factors that help explain forest growth: climatic variables derived from rainfall and temperature, topographic attributes derived from a digital elevation model, and edaphic attributes (geological composition and soil attributes). Several machine learning algorithms were applied to identify these factors.

1.1 Goal

The goal is to find the model that best explains the target variable, the site index (SI), defined as the mean height of the dominant trees at a site and recorded per site. The SI values are derived from a non-linear regression model that establishes the canopy growth potential at a site. The multiple conditioning factors are then fitted as model inputs to explain productivity by location.

1.2 Data Source

The SI data were provided by the forestry company Timberlands Pacific.
Supervisors for this research:
∆ Dr. Matthew J Cracknell (Postdoctoral Research Fellow in Earth Informatics at the ARC Industrial Transformation Research Hub for Transforming the Mining Value Chain)
∆ Dr. Robert Musk (Data Analyst/Forest Scientist at Timberlands Pacific)

1.3 Data Dimension [ Step 2: Data Definition & Baseline ]

The survey data selected for analysis were collected from five sites with wide variation in landscape and climatic conditioning factors, and with diverse geology and soil attributes that influence the productivity of radiata pine across the estate. The modelling data comprise 23 datasets and 953,556 observations.

1.4 Contribution

This research contributes a novel technique based on ensemble learning that produces predictive models to help forest managers use resources optimally and maximize productivity.

2 Scripts Guideline

2.1 Getting Started

  1. libraries.txt
  2. data_preprocessing.py

2.2 Machine Learning Algorithms (MLAs) [ Step 3: Modeling: train model ]

∆ linear_regression.py
∆ polynomial_regression.py
∆ decisionTree_regression.py
∆ randomForest_regression.py
∆ gradientBoostedDT_regression.py
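
As a rough illustration of how the five regressor families listed above might be trained and compared, the sketch below uses scikit-learn on synthetic data. It is not the code in these scripts: the synthetic features and target, the hyperparameters, and the `build_models` helper are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor


def build_models(seed=0):
    """Return one estimator per regressor family (hypothetical settings)."""
    return {
        "linear": LinearRegression(),
        "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
        "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }


# Synthetic stand-in for the predictors and the site index target (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 25 + 2.0 * X[:, 0] - 1.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in build_models().items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name:>18}: R2 = {score:.3f}")
```

Because the synthetic target contains a squared term, the plain linear model is expected to score clearly below the polynomial one here. On real data the ranking can differ.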

2.3 Exploratory Data Analysis (EDA)

∆ EDA.py
--Data visualization: bi-variate plots, bar plots...
--EDA Report (summary)

2.4 Statistics

∆ statistics.py
--Descriptive Statistics
--Correlation Coefficient analysis (Spearman)
--Principal Component Analysis (PCA)
--Regression assumptions: Kernel Density Estimate plot, Shapiro-Wilk test, normal Q-Q plot, normal distribution plot
--Confidence Intervals for Regression Accuracy
--95% Prediction Interval
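
A minimal sketch of three of the checks listed above (Spearman correlation, the Shapiro-Wilk test on residuals, and an approximate 95% prediction interval), assuming SciPy is available. The `elevation` predictor and the synthetic `site_index` values are hypothetical, and the interval uses a simple normal approximation based on the residual spread rather than whatever method `statistics.py` implements.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): site index decreasing with elevation plus noise.
rng = np.random.default_rng(42)
elevation = rng.uniform(100, 600, size=200)
site_index = 35 - 0.02 * elevation + rng.normal(scale=1.0, size=200)

# Spearman rank correlation between the predictor and the target.
rho, rho_p = stats.spearmanr(elevation, site_index)

# Ordinary least-squares line and its residuals.
fit = stats.linregress(elevation, site_index)
predicted = fit.intercept + fit.slope * elevation
residuals = site_index - predicted

# Shapiro-Wilk test: a small p-value suggests the residuals are not normal.
w_stat, w_p = stats.shapiro(residuals)

# Approximate 95% prediction interval: prediction +/- z * residual std.
z = stats.norm.ppf(0.975)
half_width = z * residuals.std(ddof=2)
lower, upper = predicted - half_width, predicted + half_width

# Share of observations that fall inside their interval (in-sample check).
coverage = np.mean((site_index >= lower) & (site_index <= upper))

print(f"Spearman rho = {rho:.3f} (p = {rho_p:.2g})")
print(f"Shapiro-Wilk W = {w_stat:.3f} (p = {w_p:.3f})")
print(f"Interval half-width = {half_width:.2f}, coverage = {coverage:.2%}")
```

The normal approximation ignores the uncertainty in the fitted coefficients, so it is only reasonable when the sample is large relative to the number of parameters.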

2.5 Exploratory Spatial Data Analysis (ESDA)

∆ spatial_analysis.py
∆ spatialAnalysis_beforeML.py
∆ spatialAnalysis_afterML.py
--Spatial Visualization: shapefile, raster, las files
--Convert to shapefile, raster, add geometry points, assign projection
--Terrain analysis (digital elevation model=DEM)
--LiDAR (laz, las format) visualization with Python
--Spatial Autocorrelation: spatial weights, Moran's I
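
To make the spatial autocorrelation step concrete, here is a small NumPy-only sketch of global Moran's I on a regular grid with binary rook-contiguity weights. The grid, the two test patterns, and the helper names `rook_weights` and `morans_i` are assumptions for illustration. The scripts above may instead build weights from the actual site geometries with a dedicated spatial library.

```python
import numpy as np


def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights: 1 for grid cells that share an edge."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:  # neighbour below
                j = (r + 1) * n_cols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < n_cols:  # neighbour to the right
                j = r * n_cols + c + 1
                w[i, j] = w[j, i] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * (z' W z) / (z' z), z = deviations from the mean."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)


w = rook_weights(10, 10)
rows, cols = np.indices((10, 10))

# A smooth trend: neighbouring cells have similar values.
gradient = (rows + cols).ravel().astype(float)
# A checkerboard: every neighbour has the opposite value.
checkerboard = ((rows + cols) % 2).ravel().astype(float)

i_gradient = morans_i(gradient, w)
i_checker = morans_i(checkerboard, w)

print(f"Moran's I, gradient:     {i_gradient:.3f}")
print(f"Moran's I, checkerboard: {i_checker:.3f}")
```

Values near +1 indicate that similar values cluster in space, values near -1 indicate that neighbours tend to differ, and values near 0 suggest no spatial structure. Comparing Moran's I of the target before modelling with Moran's I of the residuals after modelling is one way to read the split between the `beforeML` and `afterML` scripts, though that interpretation is an assumption.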

2.6 Performance and Model Validation [ Step 3: Modeling: perform error analysis ]

∆ validation_models.py
--Summary of outputs (Binder)
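
One common way to carry out this kind of error analysis is k-fold cross-validation with R2 and RMSE as metrics. The sketch below shows the idea with scikit-learn on synthetic data. The data, the choice of a random forest, and the five-fold setup are assumptions and may not match what `validation_models.py` does.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for predictors and site index (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 20 + 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=300)

# Five shuffled folds; each fold is held out once for testing.
cv = KFold(n_splits=5, shuffle=True, random_state=1)
results = cross_validate(
    RandomForestRegressor(n_estimators=50, random_state=1),
    X,
    y,
    cv=cv,
    scoring=("r2", "neg_root_mean_squared_error"),
)

r2_per_fold = results["test_r2"]
# scikit-learn reports the negated RMSE so that higher is always better.
rmse_per_fold = -results["test_neg_root_mean_squared_error"]

print(f"R2:   {r2_per_fold.mean():.3f} +/- {r2_per_fold.std():.3f}")
print(f"RMSE: {rmse_per_fold.mean():.3f} +/- {rmse_per_fold.std():.3f}")
```

For spatial data such as site-level observations, random folds can overstate accuracy when nearby points end up in both the training and test sets. Grouping folds by site is a possible refinement.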