forestGISML: links to the journal article (Computers and Electronics in Agriculture) and the Master's thesis
This research investigates forest productivity by identifying the conditioning factors that help explain forest growth: climatic variables derived from rainfall and temperature, topographic attributes derived from a digital elevation model (DEM), and edaphic attributes (geological composition and soil attributes). Several machine learning methods were applied to identify these factors.
The goal is to find the model that best explains the observed target variable. The target variable is the site index (SI, the mean height of the dominant trees at a site), recorded by location. The SI values used as model input were derived from a non-linear regression model that estimates the potential canopy growth at each site; the multiple conditioning factors are then fitted against these values to explain productivity by location.
The SI data were provided by the forestry company Timberlands Pacific.
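A minimal sketch of how the candidate regressors in this repository (linear, polynomial, decision tree, random forest and gradient boosted trees) could be compared on a common footing, assuming scikit-learn and k-fold cross-validated RMSE as the selection criterion. The function name `compare_models`, the hyperparameters and the synthetic data from `make_regression` are illustrative assumptions, not the repository's actual code or data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated RMSE of each candidate regressor (hypothetical helper)."""
    models = {
        "linear": LinearRegression(),
        "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
        "decision_tree": DecisionTreeRegressor(max_depth=6, random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        # scikit-learn maximises scores, so RMSE comes back negated
        neg_rmse = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
        scores[name] = float(-np.mean(neg_rmse))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for the covariate table (climate, terrain, soil) and the SI target
    X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
    for name, rmse in sorted(compare_models(X, y).items(), key=lambda kv: kv[1]):
        print(f"{name:18s} RMSE = {rmse:.2f}")
```

Random K-fold splits treat observations as independent. Because SI observations are likely spatially autocorrelated, grouping folds by site (for example with scikit-learn's `GroupKFold`) would probably give a more conservative error estimate.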
Supervisors for this research:
∆ Dr. Matthew J Cracknell (Postdoctoral Research Fellow in Earth Informatics at the ARC Industrial Transformation Research Hub for Transforming the Mining Value Chain)
∆ Dr. Robert Musk (Data Analyst/Forest Scientist at Timberlands Pacific)
The survey data selected for analysis were collected from five sites spanning a wide range of landscape and climatic conditions, with diverse geology and soil attributes that influence the productivity of radiata pine across the estate. The modelling data comprise 23 datasets and 953,556 observations.
This research contributes a novel ensemble-learning-based technique that produces predictive models to help forest managers use resources optimally while maximizing productivity.
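The README does not describe the novel ensemble technique itself, so the sketch below is not that technique. It only illustrates the general idea of ensemble learning with a generic stacking ensemble in scikit-learn, assuming base learners similar to those in this repository and a `RidgeCV` meta-learner; `build_stacked_model` and the `make_friedman1` synthetic data are hypothetical stand-ins.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def build_stacked_model(seed=0):
    """Generic stacking ensemble (hypothetical helper): linear and tree-based base learners."""
    base_learners = [
        ("linear", LinearRegression()),
        ("random_forest", RandomForestRegressor(n_estimators=100, random_state=seed)),
        ("gradient_boosting", GradientBoostingRegressor(random_state=seed)),
    ]
    # cv=5: the ridge meta-learner is trained on out-of-fold predictions of the base
    # learners, so it cannot simply trust whichever base model overfits the training data
    return StackingRegressor(estimators=base_learners, final_estimator=RidgeCV(), cv=5)


if __name__ == "__main__":
    # Synthetic, non-linear stand-in for the SI modelling table
    X, y = make_friedman1(n_samples=600, noise=1.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    model = build_stacked_model().fit(X_train, y_train)
    print(f"Held-out R^2: {r2_score(y_test, model.predict(X_test)):.3f}")
```

Stacking lets a meta-learner weight the strengths of different model families; averaging (`VotingRegressor`) is a simpler alternative with no extra fitted layer.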
- libraries.txt
- data_preprocessing.py
∆ linear_regression.py
∆ polynomial_regression.py
∆ decisionTree_regression.py
∆ randomForest_regression.py
∆ gradientBoostedDT_regression.py
∆ EDA.py
--Data visualization: bivariate plots, bar plots...
--EDA Report (summary)
∆ statistics.py
--Descriptive Statistics
--Correlation Coefficient analysis (Spearman)
--Principal Component Analysis (PCA)
--Regression assumptions: Kernel Density Estimate (KDE) plot, Shapiro-Wilk test, normal Q-Q plot, normal distribution plot
--Confidence Intervals for Regression Accuracy
--95% Prediction Intervals
∆ spatial_analysis.py
∆ spatialAnalysis_beforeML.py
∆ spatialAnalysis_afterML.py
--Spatial Visualization: shapefiles, rasters, LAS files
--Conversion to shapefile and raster, adding geometry points, assigning a projection
--Terrain analysis (digital elevation model, DEM)
--LiDAR (LAZ/LAS format) visualization with Python
--Spatial Autocorrelation: spatial weights, Moran's I
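The spatial autocorrelation step (Moran's I) can be illustrated with a small NumPy sketch. It assumes binary distance-band weights (w_ij = 1 when two points lie within `threshold` of each other) and the standard global statistic I = (n / S0) * (z^T W z) / (z^T z), where z holds deviations from the mean and S0 is the sum of all weights. The helper names are hypothetical, and spatial_analysis.py may instead rely on a library such as PySAL's `esda`.

```python
import numpy as np


def distance_band_weights(coords, threshold):
    """Binary weights: w_ij = 1 if 0 < distance(i, j) <= threshold, else 0 (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return ((dist > 0) & (dist <= threshold)).astype(float)


def morans_i(values, w):
    """Global Moran's I for one variable and a dense spatial weights matrix w."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()   # deviations from the mean
    n = x.size
    s0 = w.sum()       # sum of all weights
    return (n / s0) * (z @ w @ z) / (z @ z)


if __name__ == "__main__":
    # 10 x 10 regular grid; threshold=1.0 links each cell to its rook (N/S/E/W) neighbours
    xs, ys = np.meshgrid(np.arange(10), np.arange(10))
    coords = np.column_stack([xs.ravel(), ys.ravel()])
    w = distance_band_weights(coords, threshold=1.0)

    checkerboard = ((xs + ys) % 2).ravel()
    east_west_trend = xs.ravel()
    print(f"checkerboard: {morans_i(checkerboard, w):.3f}")        # -1.000 (perfect dispersion)
    print(f"east-west trend: {morans_i(east_west_trend, w):.3f}")  # 0.889 (strong clustering)
```

Values near +1 indicate clustering of similar values, values near -1 a checkerboard-like alternation, and under spatial randomness the expected value is -1/(n - 1). The dense n x n matrix is only practical for a few thousand points; a dataset of the size described above would need a sparse neighbour structure.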
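For the 95% prediction intervals item in statistics.py, a sketch for ordinary least squares is shown here. It assumes the classical interval y_hat0 +/- t(1 - alpha/2, n - p) * s * sqrt(1 + x0^T (X^T X)^-1 x0), which relies on normally distributed, homoscedastic errors. The function name is hypothetical and this may differ from how statistics.py computes its intervals.

```python
import numpy as np
from scipy import stats


def ols_prediction_interval(X, y, X_new, level=0.95):
    """Point predictions and prediction-interval bounds for an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    X_new = np.column_stack([np.ones(len(X_new)), np.asarray(X_new, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, p = X.shape                                    # p counts the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)                      # residual variance estimate
    xtx_inv = np.linalg.inv(X.T @ X)
    # x0^T (X^T X)^-1 x0 for every new row x0
    h = np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - p)
    y_hat = X_new @ beta
    half_width = t_crit * np.sqrt(s2 * (1 + h))       # "1 +" adds the new observation's noise
    return y_hat, y_hat - half_width, y_hat + half_width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 200)                       # synthetic predictor
    y = 2 + 3 * x + rng.normal(0, 1.5, x.size)        # synthetic response
    y_hat, lower, upper = ols_prediction_interval(x, y, np.array([2.5, 7.5]))
    for yh, lo, hi in zip(y_hat, lower, upper):
        print(f"{yh:6.2f}  [{lo:6.2f}, {hi:6.2f}]")
```

Dropping the "1 +" term gives the narrower confidence interval for the mean response instead of an interval for a single new observation.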
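One way to attach a confidence interval to a regression accuracy metric (the "Confidence Intervals for Regression Accuracy" item) is a percentile bootstrap over the test-set errors. The sketch assumes RMSE as the metric and independent observations; `bootstrap_rmse_ci` is a hypothetical helper, not necessarily what statistics.py does.

```python
import numpy as np


def bootstrap_rmse_ci(y_true, y_pred, n_boot=2000, level=0.95, seed=0):
    """RMSE with a percentile-bootstrap confidence interval (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sq_err = (y_true - y_pred) ** 2
    rng = np.random.default_rng(seed)
    # Each row of idx is one bootstrap resample (drawn with replacement) of the test points
    idx = rng.integers(0, sq_err.size, size=(n_boot, sq_err.size))
    boot_rmse = np.sqrt(sq_err[idx].mean(axis=1))
    alpha = 1 - level
    lower, upper = np.quantile(boot_rmse, [alpha / 2, 1 - alpha / 2])
    return float(np.sqrt(sq_err.mean())), float(lower), float(upper)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y_true = rng.normal(20, 4, 300)                # synthetic observed values
    y_pred = y_true + rng.normal(0, 1.5, 300)      # synthetic model predictions
    rmse, lower, upper = bootstrap_rmse_ci(y_true, y_pred)
    print(f"RMSE = {rmse:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because neighbouring plots are likely spatially autocorrelated, resampling individual observations probably understates the uncertainty; resampling whole sites or spatial blocks is a common alternative.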
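The terrain analysis item can be illustrated by deriving slope and aspect from a DEM grid with NumPy central differences. This assumes a north-up raster with square cells of size `cell_size`, in the same units as elevation; `slope_aspect` is a hypothetical helper. GIS tools often use a 3x3 kernel instead (for example, GDAL's `gdaldem slope` defaults to Horn's method), so cell-level results will differ slightly.

```python
import numpy as np


def slope_aspect(dem, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) of a north-up DEM."""
    dem = np.asarray(dem, dtype=float)
    # np.gradient returns the derivative along rows (axis 0) first, then columns (axis 1)
    dz_drow, dz_dcol = np.gradient(dem, cell_size)
    dz_dx = dz_dcol       # east is +column
    dz_dy = -dz_drow      # north is -row in a north-up raster
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the compass direction of steepest descent, i.e. of (-dz_dx, -dz_dy)
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect


if __name__ == "__main__":
    # Toy 5 x 6 DEM rising 10 m per 10 m cell towards the east
    dem = np.tile(np.arange(6) * 10.0, (5, 1))
    slope, aspect = slope_aspect(dem, cell_size=10.0)
    print(f"{slope.mean():.1f} {aspect.mean():.1f}")  # 45.0 270.0 (45 degree slope facing west)
```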