Code to process spatial data into a grid for the continental US. The data include weather, soil, and land use.
i. Describe a database with weather, soil, and land use data for the US
ii. Share the code used to obtain it and the methodology followed for aggregating the data
iii. Provide a tutorial on how to use it for maps and for obtaining characteristics at an exact point (example: for trial characterization)
Questions about the code and methodology: German Mandrini, Dpt of Crop Sciences, University of Illinois at Urbana-Champaign, [email protected]
Questions about collaborations: Nicolas F Martin, Dpt of Crop Sciences, University of Illinois at Urbana-Champaign, [email protected]
The script grid_tutorial.R provides a tutorial with examples of possible uses of the database
Link to database files: https://uofi.box.com/s/yatgv535y7ouai3k8b0nwluj062uy6sj
The database has four output files:
Spatial data: the only spatial file, consisting of a 5 by 5 km grid over the US.
Columns:
· id_5000: unique number that identifies each cell. Key for merging with all other data sets.
· id_tile: unique number that identifies each tile. A tile is a group of cells used for splitting the data set into small portions that can be processed separately.
· cult_count: NASS provides a raster called the National Cultivated Layer, in which each 30x30 m cell is classified as cultivated or not cultivated over the period from 2013 to 2017. This variable is the count of cultivated 30x30 m cells in each 5000x5000 m cell.
· US_state: state in the US, based on the centroid of the cell.
· US_region: region in the US, based on the centroid of the cell.
Landuse: csv file showing the land allocation for each of the 5x5 km cells. Only 19 representative crops were selected: Corn, Soybeans, Winter Wheat, Fallow/Idle Cropland, Alfalfa, Spring Wheat, Cotton, Sorghum, Dbl Crop WinWht/Soybeans, Rice, Barley, Dry Beans, Durum Wheat, Canola, Oats, Peanuts, Almonds, Sunflower, Peas.
Columns:
· id_5000: unique number that identifies each cell. Key for merging with all other data sets.
· id_tile: unique number that identifies each tile.
· Source: where the data was obtained
· Variable: one of the 19 crops
· Unit: unit in which the variable was measured. In this case it is the count of 30x30 m cells inside each 5000x5000 m cell. It can be converted to area by multiplying the count by the area of a 30x30 m cell.
· Year: year in which the variable was measured
· Value: value of the variable
More info: https://www.nass.usda.gov/Research_and_Science/Cropland/Release/
Weather: csv file showing weather variables for each of the 5x5 km cells. Four variables were selected: precipitation (prcp), maximum and minimum temperature (tmax, tmin) and solar radiation (srad). Prcp is shown as the monthly total. Tmax, tmin and srad are daily averages by month.
Columns:
· id_5000: unique number that identifies each cell. Key for merging with all other data sets.
· id_tile: unique number that identifies each tile.
· Source: where the data was obtained
· Variable: one of the four weather variables
· Unit: unit in which the variable was measured
· Year: year in which the variable was measured
· Month: month in which the variable was measured
· Value: value of the variable
More info: https://daac.ornl.gov/DAYMET/guides/Daymet_V3_CFMosaics.html
Soils: csv file showing soil variables for each of the 5x5 km cells. Twelve variables that characterize soil chemical and physical properties were selected.
Columns:
· id_5000: unique number that identifies each cell. Key for merging with all other data sets.
· id_tile: unique number that identifies each tile.
· Source: where the data was obtained
· Variable: one of the 12 soil variables (described at the link below)
· Unit: unit in which the variable was measured
· Value: value of the variable
More info: https://files.isric.org/soilgrids/data/recent/META_GEOTIFF_1B.csv
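Because every table carries id_5000, the csv files can be joined to the spatial grid with a plain merge. The following is a minimal sketch in base R, not taken from grid_tutorial.R: the two data frames are made-up stand-ins for the real files, and the exact column names and capitalization are assumed from the descriptions above.

```r
# Toy stand-ins for the spatial grid and the weather table.
# All values are invented for illustration only.
grid <- data.frame(
  id_5000  = c(1, 2, 3),
  id_tile  = c(10, 10, 11),
  US_state = c("Illinois", "Illinois", "Iowa")
)

weather <- data.frame(
  id_5000  = c(1, 1, 2, 2, 3, 3),
  Variable = c("prcp", "tmax", "prcp", "tmax", "prcp", "tmax"),
  Year     = 2017,
  Month    = 7,
  Value    = c(95, 29.5, 110, 30.1, 80, 28.7)
)

# The tables are in long format: keep one variable first,
# then attach the grid attributes through the shared key.
prcp_july    <- weather[weather$Variable == "prcp", ]
prcp_by_cell <- merge(grid, prcp_july, by = "id_5000")

print(prcp_by_cell[, c("id_5000", "US_state", "Value")])
```

The same pattern should apply to the land use and soils tables, since they share the id_5000 key.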
The following scripts are only relevant to update the database or create another one.
Link to folder with all data files: https://uofi.box.com/s/k46278qikvncl9marbmhnetbrn7fqz72
(download the folder to your computer, name it “grid_data_box”, and point the scripts to it)
Order | File name | Objective
1 | functions_grid_Dec10.R | Functions used at different points of the project
2 | grid_5000_creation_Dic10.R | Create a grid of 5x5 km over the US
3 | daymetA_download.R | Download daily weather data
4 | daymetB_make_monthly.R | Transform the daily data into monthly (call)
5 | daymetC_make_monthly_parallel.R | Transform the daily data into monthly (execute)
6 | daymetD_processing_call.R | Process the monthly data for each cell of the grid (call)
7 | daymetE_processing_parallel.R | Process the monthly data for each cell of the grid (execute)
8 | landuseA_mergeCDL.R | Merge several CDL rasters (downloaded manually)
9 | landuseB_process_CDL_paralel.R | Obtain the CDL information for each cell of the grid
10 | soilsA_call.R | Obtain the soil information for each cell of the grid
11 | soilsB_processing_paralel.R | Obtain the soil information for each cell of the grid
National Cultivated Layer: based on the most recent five years (2013-2017).
The National Cultivated Layer is a 30x30 m raster that classifies each cell according to whether it was cultivated in at least one year between 2013 and 2017.
A 5x5 km grid was created using the same extent, and for each cell the number of cultivated 30x30 m cells was counted. Only cells with a positive count were kept. Each cell was identified by a unique number called id_5000.
Tiles creation: another raster was created with a 100x100 km resolution. Each tile was identified with a unique number called id_tile. The 5x5 km raster was overlaid on this 100x100 km raster and the id_tile was transferred to each cell. This groups the 5x5 km cells into tiles of 400 cells.
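The figure of 400 cells per tile follows from the two resolutions (100 / 5 = 20 cells per side). The sketch below illustrates that arithmetic and one possible way of deriving a tile number from a cell's position by integer division. It is only an illustration: the project scripts transfer id_tile by overlaying rasters, and the row/column positions and `n_tile_cols` used here are hypothetical.

```r
cell_size_km <- 5
tile_size_km <- 100

cells_per_side <- tile_size_km / cell_size_km  # 20 cells along each tile edge
cells_per_tile <- cells_per_side^2             # 400 cells per tile

# Hypothetical 0-based row/column positions of four 5x5 km cells
cell_row <- c(0, 19, 20, 45)
cell_col <- c(0, 19, 0, 61)

# Integer division maps each cell to the tile that contains it
tile_row <- cell_row %/% cells_per_side
tile_col <- cell_col %/% cells_per_side

# Hypothetical number of tile columns, used only to build a unique number
n_tile_cols <- 50
id_tile <- tile_row * n_tile_cols + tile_col + 1

print(cells_per_tile)
print(id_tile)
```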
The spatial file has 141,438 cells, each with an area of 5x5 km.
· National CDLs from 2008 to 2017: 30 by 30 m rasters in which each cell is assigned an id that refers to a land use
· https://www.nass.usda.gov/Research_and_Science/Cropland/Release/
From all the different uses in the National CDLs, we focused on 19 crops, called Target Crops. They are:
1 | Corn
2 | Soybeans
3 | Winter Wheat
4 | Fallow/Idle Cropland
5 | Alfalfa
6 | Spring Wheat
7 | Cotton
8 | Sorghum
9 | Dbl Crop WinWht/Soybeans
10 | Rice
11 | Barley
12 | Dry Beans
13 | Durum Wheat
14 | Canola
15 | Oats
16 | Peanuts
17 | Almonds
18 | Sunflower
19 | Peas
For each id_5000 cell, the 30x30 m cells assigned to each crop were counted for each year. This count can be converted into area by considering the size of the 30x30 m cells, or into the proportion of land allocated to the crop by dividing by the total area of the 5x5 km cell. For example: if a cell has a corn count of 4500 in 2008, then 4.05 km2 of the cell's 25 km2 were used for corn. That is 16.2% of the area.
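The conversion in the example above can be written as two small helper functions. This is a minimal sketch in base R; the function names `count_to_area_km2` and `count_to_share` are made up for illustration and are not part of the project scripts.

```r
# Area of one 30x30 m cell and of one 5x5 km cell, both in km2
cell_30m_area_km2 <- (30 * 30) / 1e6   # 0.0009 km2
cell_5km_area_km2 <- 5 * 5             # 25 km2

# Hypothetical helpers: count of 30x30 m cells -> area and share of the cell
count_to_area_km2 <- function(count) count * cell_30m_area_km2
count_to_share    <- function(count) count_to_area_km2(count) / cell_5km_area_km2

corn_count <- 4500
corn_area  <- count_to_area_km2(corn_count)     # 4.05 km2
corn_pct   <- 100 * count_to_share(corn_count)  # 16.2 percent

print(corn_area)
print(corn_pct)
```

Both functions are vectorized, so they can be applied directly to the Value column of the land use table.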
A second cleaning step was applied to the spatial grid. Cells that did not have any of the 19 chosen crops in the 10 years considered were removed. This reduces the size of the files and the processing time.
The land use table has 9,423,912 rows. Each row is the count of cells for a given crop in a given year.
· Seminar: https://www.youtube.com/watch?v=lR--GmLCkPU
· daymetr R package
· Catalog: https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/1328/catalog.html
· User guide: https://daac.ornl.gov/DAYMET/guides/Daymet_V3_CFMosaics.html
Four variables were downloaded for the period 1980 to 2017. The variables are: precipitation (prcp), maximum and minimum temperature (tmax, tmin) and solar radiation (srad). They were processed in three steps:
i. Download: this was done in RStudio using the function GET from the httr package. For each of the 4 variables, a raster stack of 365 layers was downloaded for each year. Each layer has daily data for a 1x1 km raster.
ii. Monthly aggregation: each of the stacks was aggregated by month. For precipitation, the daily values were summed by month. For the other variables, the daily values were averaged by month.
iii. Aggregation to the grid: for each id_5000 cell, the average value of all the 1x1 km cells inside it was obtained and summarized in a data table.
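The two aggregation steps described above (daily to monthly, then 1x1 km to 5x5 km) can be illustrated on plain vectors. This is a sketch in base R with made-up numbers, not the raster-based code of the daymet scripts; the count of 25 one-kilometer cells per grid cell is the nominal value and is assumed here.

```r
# Made-up daily series for one 1x1 km cell, January-February 2017
dates <- seq(as.Date("2017-01-01"), as.Date("2017-02-28"), by = "day")
prcp  <- rep(c(0, 2), length.out = length(dates))  # precipitation per day
tmax  <- rep(c(1, 3), length.out = length(dates))  # maximum temperature per day

month <- format(dates, "%Y-%m")

# Time aggregation: precipitation is summed, the other variables are averaged
prcp_monthly <- tapply(prcp, month, sum)
tmax_monthly <- tapply(tmax, month, mean)

# Spatial aggregation: mean of the 1x1 km cells inside one 5x5 km cell
# (nominally 25 of them; the values are invented)
tmax_1km <- c(rep(20, 10), rep(22, 15))
tmax_5km <- mean(tmax_1km)

print(prcp_monthly)
print(tmax_monthly)
print(tmax_5km)
```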
The weather data table has 191,789,928 rows. It contains the monthly values of the four variables for the period between 1980 and 2018, for each of the 5x5 km cells.
· Source: Soilgrids.com
· Catalog: https://files.isric.org/soilgrids/data/recent/
· Description of variables: https://files.isric.org/soilgrids/data/recent/META_GEOTIFF_1B.csv
Different variables that characterize soil chemical and physical properties were selected (see above for the description of the variables).
The raster files were downloaded from the SoilGrids catalog. Then, for each id_5000 cell, the mean value of each soil variable was calculated.
The soils data table has 1,697,256 rows. It contains the value of the 12 variables for each of the 5x5 km cells.
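The row count of the soils table is consistent with one row per cell and per variable, using the 141,438 cells reported for the spatial file. A quick check in R:

```r
n_cells          <- 141438  # cells in the spatial file
n_soil_variables <- 12      # soil variables kept per cell

# One row per cell and variable
n_soil_rows <- n_cells * n_soil_variables

print(n_soil_rows)
```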