Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with tibble format in sits #1224

Open
safalabolo opened this issue Oct 15, 2024 · 0 comments
Open

Issue with tibble format in sits #1224

safalabolo opened this issue Oct 15, 2024 · 0 comments
Assignees
Milestone

Comments

@safalabolo
Copy link

I would like to use sits starting from my raster data and ground truth, but I am encountering some issues. In particular, it seems that the tibble I created (tibble_result), although very similar to the one provided in the documentation (cerrado_2classes), is not in a format that sits can process.
Below is the code.
Thank you for your feedback.

Load necessary packages

library(sf) # For handling spatial data (shapefiles)
library(terra) # For handling raster data
library(tidyverse) # For data manipulation and visualization

1. Load the shapefile containing points

shapefile_path <- "D:/2023_agropontino/03_EO/class/input/shape_sub/gt_erbaceo_distretti_maggio_agosto_rev2.shp"
points <- st_read(shapefile_path) # Read the shapefile into a spatial dataframe
Reading layer gt_erbaceo_distretti_maggio_agosto_rev2' from data source D:\2023_agropontino\03_EO\class\input\shape_sub\gt_erbaceo_distretti_maggio_agosto_rev2.shp' using driver `ESRI Shapefile'
Simple feature collection with 1987 features and 10 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 319366.8 ymin: 4569939 xmax: 353301.1 ymax: 4604548
Projected CRS: WGS 84 / UTM zone 33N

2. Create a data frame for correspondence mapping

corrispondenze <- data.frame(

  • class2 = c(0, 2, 10, 11, 12, 13, 14, 15, 16, 17, 22),
  • nome = c("nudo", "kiwi", "vernino", "mais1", "ortive1", "mais2", "medica", "doppia", "ortive2", "asparago", "erbai")
  • )

3. Join the correspondence dataframe with the shapefile

points <- points %>%

  • left_join(corrispondenze, by = "class2") # Merge based on the 'class2' column

4. Verify the addition of the 'nome' column

print(head(points)) # Display the first few rows to confirm the merge
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 330199.6 ymin: 4573826 xmax: 352157.7 ymax: 4595130
Projected CRS: WGS 84 / UTM zone 33N
Source gid id name crop crop2 date note class class2 nome geometry
1 gt_20230810 107 1300 WPT 2157 residui grano residui grano 20230810 10 10 vernino POINT (330456.3 4594962)
2 gt_20230810 183 1376 WPT 2233 residui grano residui grano 20230810 0 0 nudo POINT (330199.6 4595130)
3 gt_20230810 231 1424 WPT 1143 giardino in irrigazione giardino irriguo 20230810 x 5 22 erbai POINT (351926.8 4573826)
4 gt_20230810 234 1427 WPT 1146 residui grano residui grano 20230810 0 0 nudo POINT (351995 4574030)
5 gt_20230810 237 1430 WPT 1149 zucchine zucchine 20230810 15 15 doppia POINT (352157.7 4574082)
6 gt_20230810 238 1431 WPT 1150 ortaggi ortaggi 20230810 17 17 asparago POINT (352142 4573974)

4. (Optional) Save the new shapefile with the added column

#output_shapefile_path <- "D:/2023_agropontino/03_EO/class/input/shape_sub/gt_erbaceo_distretti_maggio_agosto_rev2_modificato.shp"
#st_write(points, output_shapefile_path) # Save the modified points with the new column

5. Define the input directory and load NDVI raster files

indir <- 'D:/2023_agropontino/03_EO/class/input/raster/ndvi_ws_asIn_conf/'
inFile <- list.files(path = indir, pattern = glob2rx("*.tif$"), full.names = TRUE, recursive = FALSE) # List all .tif files

3. Extract dates from the .tif filenames

dates_ndvi <- as.Date(substring(basename(inFile), 1, 8), "%Y%m%d") # Convert the extracted substrings to Date objects

6. Use terra::rast to create a raster stack (SpatRaster)

ndvi_stack <- rast(inFile) # Load raster files into a SpatRaster object

7. Ensure the projection of points matches that of the raster

points <- st_transform(points, crs = crs(ndvi_stack)) # Transform the CRS of points to match the raster's

8. Precise extraction of NDVI values from points for each date

ndvi_values <- terra::extract(ndvi_stack, vect(points), fun = mean, na.rm = TRUE) # Extract mean NDVI values at point locations

7. Create a tibble for each point

tibble_result <- tibble(

  • latitude = st_coordinates(points)[, 2], # Get latitude from coordinates
  • longitude = st_coordinates(points)[, 1], # Get longitude from coordinates
  • start_date = min(dates_ndvi), # Start date (first observation)
  • end_date = max(dates_ndvi), # End date (last observation)
  • label = as.character(points$nome), # Labels from points
  • cube = "S2" # Dataset name
  • )

9. Add extracted NDVI values to the tibble

Convert ndvi_values to a dataframe and remove the ID column

ndvi_df <- as.data.frame(ndvi_values)
colnames(ndvi_df) <- c("ID", as.character(dates_ndvi)) # Rename columns with dates
ndvi_df <- ndvi_df[, -1] # Remove the ID column

Create a new tibble to combine NDVI information

tibble_ndvi <- tibble(

  • Index = rep(dates_ndvi, each = nrow(ndvi_df)), # Repeat dates for each point
  • NDVI = as.vector(t(ndvi_df)) # Transpose NDVI values into a vector
  • )

Add NDVI values to the resulting tibble

tibble_result <- tibble_result %>%

  • mutate(time_series = map(1:nrow(ndvi_df), ~ tibble_ndvi[seq(.x, nrow(tibble_ndvi), by = nrow(points)), ])) # Create time series for each point

print(tibble_result) # Print the result tibble

A tibble: 1,987 × 7

latitude longitude start_date end_date label cube time_series

1 4594962. 330456. 2023-01-05 2023-12-21 vernino S2 <tibble [42 × 2]>
2 4595130. 330200. 2023-01-05 2023-12-21 nudo S2 <tibble [42 × 2]>
3 4573826. 351927. 2023-01-05 2023-12-21 erbai S2 <tibble [42 × 2]>
4 4574030. 351995. 2023-01-05 2023-12-21 nudo S2 <tibble [42 × 2]>
5 4574082. 352158. 2023-01-05 2023-12-21 doppia S2 <tibble [42 × 2]>
6 4573974. 352142. 2023-01-05 2023-12-21 asparago S2 <tibble [42 × 2]>
7 4573924. 352174. 2023-01-05 2023-12-21 doppia S2 <tibble [42 × 2]>
8 4573893. 352402. 2023-01-05 2023-12-21 nudo S2 <tibble [42 × 2]>
9 4574184. 352539. 2023-01-05 2023-12-21 doppia S2 <tibble [42 × 2]>
10 4574203. 352513. 2023-01-05 2023-12-21 asparago S2 <tibble [42 × 2]>

ℹ 1,977 more rows

ℹ Use print(n = ...) to see more rows

tibble_result[[7]][[1]] # Access specific entry in the tibble

A tibble: 42 × 2

Index NDVI

1 2023-01-05 0.221
2 2023-01-30 0.185
3 2023-02-04 0.807
4 2023-02-14 0.960
5 2023-03-16 0.309
6 2023-03-21 0.00304
7 2023-03-26 0.768
8 2023-04-10 0.799
9 2023-04-20 0.346
10 2023-04-25 0.120

ℹ 32 more rows

ℹ Use print(n = ...) to see more rows

Load additional package for time series analysis

library(sits)

Check data types of the tibble

sapply(tibble_result, class) # Check classes of columns in the result tibble
latitude longitude start_date end_date label cube time_series
"numeric" "numeric" "Date" "Date" "character" "character" "list"
sapply(cerrado_2classes, class) # Check classes in another dataset
longitude latitude start_date end_date label cube time_series
"numeric" "numeric" "Date" "Date" "character" "character" "list"

Show the first few rows of the result tibble

head(tibble_result) # Display the first rows of the tibble

A tibble: 6 × 7

latitude longitude start_date end_date label cube time_series

1 4594962. 330456. 2023-01-05 2023-12-21 vernino S2 <tibble [42 × 2]>
2 4595130. 330200. 2023-01-05 2023-12-21 nudo S2 <tibble [42 × 2]>
3 4573826. 351927. 2023-01-05 2023-12-21 erbai S2 <tibble [42 × 2]>
4 4574030. 351995. 2023-01-05 2023-12-21 nudo S2 <tibble [42 × 2]>
5 4574082. 352158. 2023-01-05 2023-12-21 doppia S2 <tibble [42 × 2]>
6 4573974. 352142. 2023-01-05 2023-12-21 asparago S2 <tibble [42 × 2]>

head(cerrado_2classes) # Display the first rows of the other dataset

A tibble: 6 × 7

longitude latitude start_date end_date label cube time_series

1 -54.2 -14.0 2000-09-13 2001-08-29 Cerrado MOD13Q1 <tibble [23 × 3]>
2 -54.2 -14.0 2001-09-14 2002-08-29 Cerrado MOD13Q1 <tibble [23 × 3]>
3 -54.2 -14.0 2002-09-14 2003-08-29 Cerrado MOD13Q1 <tibble [23 × 3]>
4 -54.2 -14.0 2003-09-14 2004-08-28 Cerrado MOD13Q1 <tibble [23 × 3]>
5 -54.2 -14.0 2004-09-13 2005-08-29 Cerrado MOD13Q1 <tibble [23 × 3]>
6 -54.2 -14.0 2005-09-14 2006-08-29 Cerrado MOD13Q1 <tibble [23 × 3]>

Perform k-fold validation using Random Forest on the results

rfor_validate <- sits_kfold_validate(

  • samples = tibble_result, # Use the result tibble as samples
  • folds = 5, # Number of folds for validation
  • ml_method = sits_rfor(), # Specify the machine learning method (Random Forest)
  • multicores = 5 # Number of cores for parallel processing
  • )
    Error in purrr::map():
    ℹ In index: 1.
    Caused by error in UseMethod():
    ! su un oggetto di classe ".predictors" è stato usato un metodo non applicabile per "c('tbl_df', 'tbl', 'data.frame')"
    Run rlang::last_trace() to see where the error occurred.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: To do
Development

No branches or pull requests

3 participants