Code to process spatial data into a 5x5 km grid for the continental US. The data include weather, soil, and land use.

germanmandrini/grid_data

SPATIAL GRIDDED DATA FOR WEATHER, SOIL AND LAND USE

Code to process spatial data into a grid for the continental US. The data include weather, soil, and land use.

OBJECTIVES

i. Describe a database with weather, soil, and land use data for the US

ii. Share the code used to build it and the methodology followed for aggregating the data

iii. Provide a tutorial on how to use it for maps and for obtaining characteristics at an exact point (for example, for trial characterization)

Contact

Questions about the code and methodology: German Mandrini, Dpt of Crop Sciences, University of Illinois at Urbana-Champaign, [email protected]

Questions about collaborations: Nicolas F Martin, Dpt of Crop Sciences, University of Illinois at Urbana-Champaign, [email protected]

 

1.        DATABASE DESCRIPTION

The script grid_tutorial.R provides a tutorial with examples of possible uses of the database

Link to database files: https://uofi.box.com/s/yatgv535y7ouai3k8b0nwluj062uy6sj

The database has four output files:

Spatial data: the only spatial file, consisting of a 5 by 5 km grid over the US.

 Columns:

·         id_5000: unique number that identifies each cell. Key for merging with all other data sets.

·         id_tile: unique number that identifies each tile. A tile is a group of cells used to split the data set into smaller portions that can be processed separately.

·         cult_count: NASS provides a raster called the National Cultivated Layer, in which each 30x30 m cell is classified as cultivated or not cultivated over the period from 2013 to 2017. This variable is the count of cultivated 30x30 m cells in each 5000x5000 m cell.

·         US_state: State in the US. Based on the centroid of the cell

·         US_region: Region in the US. Based on the centroid of the cell

 

Landuse: csv file, showing the land allocation for each of the 5x5 km cells.

Only 19 representative crops were selected: Corn, Soybeans, Winter Wheat, Fallow/Idle Cropland, Alfalfa, Spring Wheat, Cotton, Sorghum, Dbl Crop WinWht/Soybeans, Rice, Barley, Dry Beans, Durum Wheat, Canola, Oats, Peanuts, Almonds, Sunflower, Peas.

 

Columns:

·         id_5000: unique number that identifies each cell. Key for merging with all other data sets.

·         id_tile: unique number that classifies each tile.

·         Source: where data was obtained

·         Variable: one of the 19 crops.

·         Unit: unit in which the variable was measured. In this case it is the count of 30x30 m cells inside each 5000x5000 m cell. It can be converted to area by multiplying the count by the area of one 30x30 m cell (900 m2).

·         Year: year in which the variable was measured

·         Value: value of the variable

More info: https://www.nass.usda.gov/Research_and_Science/Cropland/Release/

 

Weather: csv file, showing weather variables for each of the 5x5 km cells.

Four variables were selected: precipitation (prcp), maximum and minimum temperature (tmax, tmin), and solar radiation (srad). Prcp is shown as the total amount per month. Tmax, tmin, and srad are daily averages per month.

 

Columns:

·         id_5000: unique number that identifies each cell. Key for merging with all other data sets.

·         id_tile: unique number that classifies each tile.

·         Source: where data was obtained

·         Variable: one of the four weather variables (prcp, tmax, tmin, srad).

·         Unit: unit in which the variable was measured

·         Year: year in which the variable was measured

·         Month: in which the variable was measured

·         Value: value of the variable

More info: https://daac.ornl.gov/DAYMET/guides/Daymet_V3_CFMosaics.html
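
Because every table shares the id_5000 key, the long-format weather table can be filtered and then joined to the grid attributes. The following is only a minimal base R sketch: the two data frames are small made-up stand-ins that follow the column names listed above, and the values, the states, and the choice of July tmax are illustrative assumptions, not records from the database.

```r
# Toy stand-ins for the grid attributes and the weather table (values are made up)
grid <- data.frame(id_5000  = c(1, 2, 3),
                   US_state = c("Illinois", "Illinois", "Iowa"))

weather <- data.frame(id_5000  = c(1, 1, 2, 2, 3, 3),
                      Variable = c("tmax", "prcp", "tmax", "prcp", "tmax", "prcp"),
                      Year     = 2017,
                      Month    = 7,
                      Value    = c(30.1, 95, 29.4, 110, 28.7, 120))

# Keep one variable for one month, then join on the shared key
tmax_july    <- weather[weather$Variable == "tmax" & weather$Month == 7, ]
tmax_by_cell <- merge(grid, tmax_july, by = "id_5000")

# Mean July tmax per state
state_means <- aggregate(Value ~ US_state, data = tmax_by_cell, FUN = mean)
print(state_means)
```

The same filter-then-merge pattern should apply to the land use and soils tables, since they carry the same key.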

Soils: csv file, showing soil variables for each of the 5x5 km cells.

Twelve variables that characterize soil chemical and physical properties were selected. They are described in the table below.

 

Columns:

·         id_5000: unique number that identifies each cell. Key for merging with all other data sets.

·         id_tile: unique number that classifies each tile.

·         Source: where data was obtained

·         Unit: unit in which the variable was measured

·         Value: value of the variable

·         Variables:

 

| Variable | Description |
|---|---|
| BDRICM_M | Depth to bedrock (R horizon) up to 200 cm |
| CLYPPT_M_sl3 | Clay content (0-2 micrometers) mass fraction in % at depth 0.15 m |
| OCDENS_M_sl1 | Soil organic carbon density in kg per cubic meter at depth 0.00 m |
| PHIHOX_M_sl3 | Soil pH x 10 in H2O at depth 0.15 m |
| SNDPPT_M_sl3 | Sand content (50-2000 micrometers) mass fraction in % at depth 0.15 m |
| AWCh1_M_sl1 | Derived available soil water capacity (volumetric fraction) with FC = pF 2.0 for depth 0 cm |
| AWCh1_M_sl2 | Derived available soil water capacity (volumetric fraction) with FC = pF 2.0 for depth 5 cm |
| AWCh1_M_sl3 | Derived available soil water capacity (volumetric fraction) with FC = pF 2.0 for depth 15 cm |
| AWCh1_M_sl4 | Derived available soil water capacity (volumetric fraction) with FC = pF 2.0 for depth 30 cm |
| AWCh1_M_sl5 | Derived available soil water capacity (volumetric fraction) with FC = pF 2.0 for depth 60 cm |
| AWCh1_M_sl6 | Derived available soil water capacity (volumetric fraction) with FC = pF 2.0 for depth 100 cm |
| AWCh1_M_sl7 | Derived available soil water capacity (volumetric fraction) with FC = pF 2.0 for depth 200 cm |

 

More info: https://files.isric.org/soilgrids/data/recent/META_GEOTIFF_1B.csv
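
Since PHIHOX_M_sl3 is described as soil pH multiplied by 10, its values need to be divided by 10 before being read as pH. A minimal base R sketch, assuming a long-format soils table with the columns listed above; the data frame and its values are made up for illustration.

```r
# Toy stand-in for the soils table (values are made up)
soils <- data.frame(id_5000  = c(1, 1, 2, 2),
                    Variable = c("PHIHOX_M_sl3", "CLYPPT_M_sl3",
                                 "PHIHOX_M_sl3", "CLYPPT_M_sl3"),
                    Value    = c(62, 24, 71, 31))

# Select the pH rows and rescale from "pH x 10" to pH
ph <- soils[soils$Variable == "PHIHOX_M_sl3", c("id_5000", "Value")]
ph$pH <- ph$Value / 10
print(ph)
```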

 

 

 

2.        CODES INDEX

 

The following scripts are only relevant to update the database or create another one.

Link to folder with all data files: https://uofi.box.com/s/k46278qikvncl9marbmhnetbrn7fqz72

(download to computer and name as “grid_data_box” and direct the scripts to it)

| Order | File name | Objective |
|---|---|---|
| 1 | functions_grid_Dec10.R | Functions used at different points of the project |
| 2 | grid_5000_creation_Dic10.R | Create a grid of 5x5 km over the US |
| 3 | daymetA_download.R | Download daily weather data |
| 4 | daymetB_make_monthly.R | Transform the daily data into monthly (call) |
| 5 | daymetC_make_monthly_parallel.R | Transform the daily data into monthly (execute) |
| 6 | daymetD_processing_call.R | Process the monthly data for each cell of the grid (call) |
| 7 | daymetE_processing_parallel.R | Process the monthly data for each cell of the grid (execute) |
| 8 | landuseA_mergeCDL.R | Merge several CDL rasters (downloaded manually) |
| 9 | landuseB_process_CDL_paralel.R | Obtain the CDL information for each cell of the grid |
| 10 | soilsA_call.R | Obtain the soil information for each cell of the grid (call) |
| 11 | soilsB_processing_paralel.R | Obtain the soil information for each cell of the grid (execute) |

 

3.        METHODOLOGY

 

3.1.  GRID CREATION

3.1.1.                SOURCE

National Cultivated Layer: The Cultivated Layer is based on the most recent five years (2013-2017).

3.1.2.                PROCESSING

The National Cultivated Layer is a 30x30 m raster that classifies each cell according to whether or not it was cultivated in at least one year between 2013 and 2017.

A 5x5 km grid was created using the same extent, and for each cell the number of cultivated 30x30 m cells was counted. Only cells with a positive count were kept. Each cell is identified by a unique number called id_5000.

Tile creation: another raster was created with a 100x100 km resolution. Each tile is identified by a unique number called id_tile. The 5x5 km raster was overlaid on this 100x100 km raster and the id_tile was transferred to each cell. This made it possible to group the 5x5 km cells into tiles of 400 cells.
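
The figure of 400 cells per tile follows from the resolutions: 100 km / 5 km = 20 cells per side, and 20 x 20 = 400. The sketch below illustrates that arithmetic in base R with integer division on cell coordinates. It is an assumption-laden illustration, not the method of the scripts (which overlay rasters): the coordinates, the 200x100 km toy extent, and the tile numbering scheme are all hypothetical.

```r
cell_size <- 5000     # 5 km cells, in meters
tile_size <- 100000   # 100 km tiles, in meters
cells_per_tile <- (tile_size / cell_size)^2   # 20 x 20 = 400

# Hypothetical lower-left corners of cells covering a 200 x 100 km area,
# in meters relative to the grid origin
cells <- expand.grid(x = seq(0, 195000, by = cell_size),
                     y = seq(0, 95000, by = cell_size))

# Integer division gives the tile column and row of each cell
n_tile_cols <- 2
tile_col <- cells$x %/% tile_size
tile_row <- cells$y %/% tile_size

# Hypothetical numbering: row-major, starting at 1
cells$id_tile <- tile_row * n_tile_cols + tile_col + 1

print(table(cells$id_tile))
```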

3.1.3.                SIZE

The spatial file has 141,438 cells. Each one covers 5x5 km (25 km2).

3.2.  LAND ALLOCATION

3.2.1.                SOURCE

·         National CDLs from 2008 to 2017. Each is a 30 by 30 m raster where every cell is assigned an id that references a land use.

·         https://www.nass.usda.gov/Research_and_Science/Cropland/Release/

3.2.2.                PROCESSING

From all the different uses in the National CDLs, we focused on 19 crops, called Target Crops. They are:

1. Corn

2. Soybeans

3. Winter Wheat

4. Fallow/Idle Cropland

5. Alfalfa

6. Spring Wheat

7. Cotton

8. Sorghum

9. Dbl Crop WinWht/Soybeans

10. Rice

11. Barley

12. Dry Beans

13. Durum Wheat

14. Canola

15. Oats

16. Peanuts

17. Almonds

18. Sunflower

19. Peas

 

For each id_5000 cell, the number of 30x30 m cells allocated to each crop in each year was counted. This value can be converted into area by multiplying it by the area of a 30x30 m cell (900 m2), or into a proportion of land allocation by dividing that area by the total area of the 5x5 km cell. For example, if a cell has a corn count of 4500 in 2008, then 4500 x 900 m2 = 4.05 km2 of the cell's 25 km2 were used for corn. That is 16.2% of the area.
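
The conversion from counts to area and to share of the cell can be written as two small helper functions. This is a base R sketch; the function names are hypothetical and are not taken from the repository scripts.

```r
cell_30m_area_km2 <- (30 * 30) / 1e6   # one 30x30 m cell = 0.0009 km2
cell_5km_area_km2 <- 5 * 5             # one 5x5 km cell  = 25 km2

# Hypothetical helpers: count of 30x30 m cells -> area and share of the 5x5 km cell
count_to_area_km2 <- function(count) count * cell_30m_area_km2
count_to_share    <- function(count) count_to_area_km2(count) / cell_5km_area_km2

corn_count <- 4500
area_km2  <- count_to_area_km2(corn_count)     # about 4.05 km2
share_pct <- count_to_share(corn_count) * 100  # about 16.2 %

print(c(area_km2 = area_km2, share_pct = share_pct))
```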

A second cleaning step was applied to the spatial grid. Cells that did not have any of the 19 chosen crops in any of the 10 years considered were removed. This reduces the size of the files and the processing time.
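
This filter can be sketched in base R as follows: sum the crop counts per cell over all crops and years, then keep only the cells whose total is positive. The data frame is a made-up stand-in for the land use table, and the repository scripts may implement the step differently.

```r
# Toy land use table: cell 2 never has any of the target crops
landuse <- data.frame(id_5000  = c(1, 1, 2, 2, 3, 3),
                      Variable = rep(c("Corn", "Soybeans"), times = 3),
                      Year     = 2008,
                      Value    = c(10, 0, 0, 0, 0, 5))

# Total count of target-crop cells per id_5000 across crops and years
total_by_cell <- aggregate(Value ~ id_5000, data = landuse, FUN = sum)

# Keep only cells with at least one target-crop cell
keep_ids      <- total_by_cell$id_5000[total_by_cell$Value > 0]
landuse_clean <- landuse[landuse$id_5000 %in% keep_ids, ]

print(keep_ids)
```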

3.2.3.                SIZE

The land use table has 9,423,912 rows. Each row is the count of cells for a given crop in a given year

3.3.  WEATHER

3.3.1.                SOURCE

·         Seminar: https://www.youtube.com/watch?v=lR--GmLCkPU

·         daymetr package for R

·         Catalog: https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/1328/catalog.html

·         User guide: https://daac.ornl.gov/DAYMET/guides/Daymet_V3_CFMosaics.html

3.3.2.                PROCESSING

Four variables were downloaded for the period 1980 to 2017: precipitation (prcp), maximum and minimum temperature (tmax, tmin), and solar radiation (srad). They were processed in three steps.

3.3.3.                STEP 1: DOWNLOADING.

This was done in RStudio using the GET function from the httr package. For each of the 4 variables and for each year, a raster stack of 365 layers was downloaded. Each layer has daily data on a 1x1 km raster.

3.3.4.                STEP 2: TIME AGGREGATION

Each stack was aggregated in time by month. For precipitation, the daily values were summed by month. For the other variables, the daily values were averaged by month.
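
The two aggregation rules (sum for precipitation, mean for the other variables) can be illustrated for a single location with base R. This is a simplified sketch: the actual scripts operate on raster stacks, and the daily series below is made up with constant values so the result is easy to check.

```r
# Made-up daily series for one location and one year
dates <- seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")
daily <- data.frame(date = dates,
                    prcp = rep(1, length(dates)),    # 1 mm every day
                    tmax = rep(20, length(dates)))   # 20 degrees C every day
daily$month <- as.integer(format(daily$date, "%m"))

# Precipitation: monthly total. Temperature: monthly mean of daily values.
prcp_monthly <- aggregate(prcp ~ month, data = daily, FUN = sum)
tmax_monthly <- aggregate(tmax ~ month, data = daily, FUN = mean)

print(merge(prcp_monthly, tmax_monthly, by = "month"))
```

With 1 mm of rain per day, each monthly total equals the number of days in that month, while the monthly tmax stays at 20.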

3.3.5.                STEP 3: SPATIAL AGGREGATION

For each id_5000 cell, the average value of all the 1x1 km cells it contains was computed and stored in a data table.
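
The spatial aggregation amounts to a block mean: every 5x5 group of 1 km cells is averaged into one value. The sketch below shows the idea on a plain matrix in base R. It is an illustration under simplifying assumptions (a toy 10x10 km extent, perfectly aligned cells, a hypothetical numbering of the coarse cells) and does not reproduce the raster-based code of the repository.

```r
# Toy 10 x 10 km "raster" of 1 km cells, filled column by column with 1..100
fine  <- matrix(1:100, nrow = 10, ncol = 10)
block <- 5   # 5 fine cells per coarse cell side

# Block index of each fine cell along rows and columns (0-based)
row_block <- (row(fine) - 1) %/% block
col_block <- (col(fine) - 1) %/% block

# Hypothetical coarse cell id, numbered down the rows first
cell_id <- row_block + col_block * (nrow(fine) / block) + 1

# Mean of the 25 fine cells inside each coarse cell
coarse_means <- tapply(as.vector(fine), as.vector(cell_id), mean)
print(coarse_means)
```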

3.3.6.                SIZE

The weather data table has 191,789,928 rows. It is composed of the monthly values of the four variables for the period between 1980 and 2018, for each of the 5x5 km cells.

3.4.  SOILS

3.4.1.                SOURCE

·         Source: Soilgrids.com

·         Catalog: https://files.isric.org/soilgrids/data/recent/

·         Description of variables: https://files.isric.org/soilgrids/data/recent/META_GEOTIFF_1B.csv

3.4.2.                PROCESSING

Different variables that characterize soil chemical and physical properties were selected (see above for the description of the variables).

The raster files were downloaded from the SoilGrids catalog. Then, for each id_5000 cell, the mean value of each soil variable was calculated.

3.4.3.                SIZE

The soils data table has 1,697,256 rows. It is composed of the values of the 12 variables for each of the 5x5 km cells.

 

 
