From 28fc2b44b3764cc4bba0a18de5020911e94693db Mon Sep 17 00:00:00 2001
From: dagibbs22
Date: Mon, 29 Mar 2021 10:30:24 -0400
Subject: [PATCH] Version 1 2 1 2020 tcl update (#12)

* Starting to work on updating flux model with 2020 tree cover loss. First, updating burned area.
* Fixing command for uploading hdf to s3.
* Trying again from step 2 of burned area.
* Updated c++ scripts for 2020 TCL and made some other 2020 updates. Ran into some issue at the end of step 4 of burned area, so trying that again.
* Ran individual model steps locally for test tile 00N_110E. Now going to try to change emissions c++ to use a constants header file.
* Ran individual model steps locally for test tile 00N_110E. Added constants.h for the emissions model, so standard and sensitivity analyses get their constants and file patterns from the same sources. Easier for updating sensitivity analyses that way. Couldn't figure out how to make equations.cpp also use constants.h, so the model_years in equations.cpp needs to be updated separately for now. Using constants.h compiles and runs locally.
* Corrected the calls for checking for empty tiles in the annual and gross removals scripts. Now, annual removals only uploads tiles with data. Gross removals should be able to skip tile_ids that don't have the necessary input tiles (gain year count and annual removals).
* Still trying to figure out whether only annual removal factor tiles that have data are being copied to s3.
* Fixing the tile data check function.
* Correcting issue with tiles not existing for carbon pool creation.
* Changing output folder dates.
* Revised readme. Model ran on test tiles through aggregation. Need to fix supplementary output creation script error.
* Final test of supplementary output creation step.
* Going to run the full flux model with 2020 TCL data as model v1.2.1. Running from model_extent to emission-year carbon pool creation with today's date and everything after (emissions onwards) with date 20219999 because I'll need to rerun those steps with the updated drivers model. Tested on a few test tiles but this is my first run on all tiles with the 2020 update.
* I've gotten errors a few times during full model runs when the script is trying to upload model logs. It seems to happen towards the end of a tile list and in stages that use more processors (though that could be coincidental). It seems to happen only sometimes, because I ran the same script a few times and sometimes the error happened and sometimes it didn't. My guess is that different log uploads sometimes compete with each other and that causes an error. So I'm reducing the calls for uploading the log by removing uploads when normal statements are printed (in print_log). Instead, I added upload_log() commands at the end of each tile being processed (in end_of_fx_summary), at the end of each model stage (in upload_final_set), and at the very end of the model (last line of run_full_model). Hopefully this'll reduce the conflict between log uploads. Exceptions and subprocess commands still automatically trigger log uploads.
* Various small fixes to model. Should run smoothly from model_extent until create_supplementary_outputs. Ran into an error with a gross removals tile not existing at that stage.
* Fixing the supplementary outputs step. Issue with what tile_id_list to use.
* Jimmy MacCarthy updated the TCL 2020 drivers, so ready to process the 2020 drivers and run emissions, net flux, and their aggregation/supplementary outputs for real now.
* During emissions metadata creation, the log upload function got overwhelmed. Removed the log upload call in the subprocess.check_call function.
* Need to rerun emissions onwards because I have the corrected 2020 TCL driver map now (previous run used an erroneous driver map). This should be the final run of the 2020 emissions model (v1.2.1).
* Still got an error from uploading logs too quickly to s3. So, removed log upload call from the subprocess call.
* For emissions and final steps using the final, corrected drivers for flux model v1.2.1 (2001-2020 update).
* Now with the correctly reprojected driver map.
* Ran into issue with deleting extra tiles after creating supplementary outputs.
* Successfully ran main flux model with biomass_soil emissions, going to run soil_only gross emissions for model v1.2.1 (2001-2020) now. Corrected the output soil_only emissions tile names... needed one final correction.
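A minimal sketch of the reduced log-upload pattern described above (print_log and upload_log are the function names in universal_util.py, but the bodies here, the local log path, and the s3 destination are illustrative assumptions, not the actual implementation):

    import subprocess

    LOG_PATH = 'flux_model_log.txt'            # hypothetical local log file
    LOG_S3_DEST = 's3://bucket/model_logs/'    # hypothetical s3 destination

    def print_log(*args):
        # Writes to the local log only; no s3 upload per call, so frequent
        # messages from parallel workers no longer trigger competing uploads.
        text = ' '.join(str(a) for a in args)
        print(text)
        with open(LOG_PATH, 'a') as f:
            f.write(text + '\n')

    def upload_log():
        # Called only at coarse checkpoints: end of each tile (end_of_fx_summary),
        # end of each model stage (upload_final_set), and the last line of run_full_model.
        subprocess.check_call(['aws', 's3', 'cp', LOG_PATH, LOG_S3_DEST])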
--- Dockerfile | 5 +- analyses/mp_aggregate_results_to_4_km.py | 6 +- analyses/mp_create_supplementary_outputs.py | 8 +- analyses/net_flux.py | 8 +- burn_date/hansen_burnyear_final.py | 4 +- burn_date/mp_burn_year.py | 87 ++-- burn_date/stack_ba_hv.py | 3 +- burn_date/utilities.py | 2 +- carbon_pools/create_carbon_pools.py | 23 +- carbon_pools/mp_create_carbon_pools.py | 18 +- carbon_pools/mp_create_soil_C.py | 268 ++++++----- constants_and_names.py | 94 ++-- data_import.bat | 32 ++ data_prep/mp_model_extent.py | 4 +- data_prep/mp_plantation_preparation.py | 12 +- data_prep/mp_prep_other_inputs.py | 446 +++++++++--------- data_prep/plantation_preparation.py | 20 +- ...c_gross_emissions_convert_to_grassland.cpp | 101 ++-- .../cpp_util/calc_gross_emissions_generic.cpp | 166 ++++--- .../calc_gross_emissions_no_shifting_ag.cpp | 97 ++-- .../calc_gross_emissions_soil_only.cpp | 134 ++++-- emissions/cpp_util/constants.h | 57 +++ emissions/cpp_util/equations.cpp | 19 +- emissions/mp_calculate_gross_emissions.py | 17 +- ...nual_gain_rate_AGC_BGC_all_forest_types.py | 18 +- gain/forest_age_category_IPCC.py | 5 +- gain/gain_year_count_all_forest_types.py | 6 +- gain/gross_removals_all_forest_types.py | 24 +- ...nual_gain_rate_AGC_BGC_all_forest_types.py | 22 +- gain/mp_gain_year_count_all_forest_types.py | 6 +- gain/mp_gross_removals_all_forest_types.py | 27 +- readme.md | 95 ++-- run_full_model.py | 15 +- universal_util.py | 152 ++++-- 34 files changed, 1226 insertions(+), 775 deletions(-) create mode 100644 data_import.bat create mode 100644 emissions/cpp_util/constants.h diff --git a/Dockerfile b/Dockerfile index 82bc904e..b2ee223d 100644 --- a/Dockerfile +++ b/Dockerfile @@ -11,7 +11,7 @@ ENV SECRETS_PATH /usr/secrets # set timezone fo tzdata RUN ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime -# Install missing dependencies +# Install dependencies RUN apt-get update -y && apt-get install -y \ make \ automake \ @@ -53,6 +53,9 @@ RUN cd /usr/include && ln -s ./ gdal #https://www.continualintegration.com/miscellaneous-articles/all/how-do-you-troubleshoot-usr-bin-env-python-no-such-file-or-directory/ RUN ln -s /usr/bin/python3 /usr/bin/python +# Enable ec2 to interact with GitHub +RUN git config --global user.email dagibbs22@gmail.com + ## Check out the branch that I'm currently using for model development #RUN git checkout model_v_1.2.0 # diff --git a/analyses/mp_aggregate_results_to_4_km.py b/analyses/mp_aggregate_results_to_4_km.py index 6d1945a0..e108ece7 100644 --- a/analyses/mp_aggregate_results_to_4_km.py +++ b/analyses/mp_aggregate_results_to_4_km.py @@ -44,8 +44,8 @@ def mp_aggregate_results_to_4_km(sensit_type, thresh, tile_id_list, std_net_flux # Files to download for this script download_dict = { - cn.annual_gain_AGC_all_types_dir: [cn.pattern_annual_gain_AGC_all_types], - cn.cumul_gain_AGCO2_BGCO2_all_types_dir: [cn.pattern_cumul_gain_AGCO2_BGCO2_all_types], + # cn.annual_gain_AGC_all_types_dir: [cn.pattern_annual_gain_AGC_all_types], + # cn.cumul_gain_AGCO2_BGCO2_all_types_dir: [cn.pattern_cumul_gain_AGCO2_BGCO2_all_types], cn.gross_emis_all_gases_all_drivers_biomass_soil_dir: [cn.pattern_gross_emis_all_gases_all_drivers_biomass_soil], cn.net_flux_dir: [cn.pattern_net_flux] } @@ -219,7 +219,7 @@ def mp_aggregate_results_to_4_km(sensit_type, thresh, tile_id_list, std_net_flux for tile_name in tile_list: tile_id = uu.get_tile_id(tile_name) - os.remove('{0}_{1}.tif'.format(tile_id, pattern)) + # os.remove('{0}_{1}.tif'.format(tile_id, pattern)) 
os.remove('{0}_{1}_rewindow.tif'.format(tile_id, pattern)) os.remove('{0}_{1}_0_4deg.tif'.format(tile_id, pattern)) diff --git a/analyses/mp_create_supplementary_outputs.py b/analyses/mp_create_supplementary_outputs.py index 3300d288..489e0c68 100644 --- a/analyses/mp_create_supplementary_outputs.py +++ b/analyses/mp_create_supplementary_outputs.py @@ -32,8 +32,10 @@ def mp_create_supplementary_outputs(sensit_type, tile_id_list, run_date = None): os.chdir(cn.docker_base_dir) + tile_id_list_outer = tile_id_list + # If a full model run is specified, the correct set of tiles for the particular script is listed - if tile_id_list == 'all': + if tile_id_list_outer == 'all': # List of tiles to run in the model tile_id_list_outer = uu.tile_list_s3(cn.net_flux_dir, sensit_type) @@ -43,7 +45,7 @@ def mp_create_supplementary_outputs(sensit_type, tile_id_list, run_date = None): # Files to download for this script download_dict = { - cn.cumul_gain_AGCO2_BGCO2_all_types_dir: [cn.pattern_cumul_gain_AGCO2_BGCO2_all_types], + # cn.cumul_gain_AGCO2_BGCO2_all_types_dir: [cn.pattern_cumul_gain_AGCO2_BGCO2_all_types], cn.gross_emis_all_gases_all_drivers_biomass_soil_dir: [cn.pattern_gross_emis_all_gases_all_drivers_biomass_soil], cn.net_flux_dir: [cn.pattern_net_flux] } @@ -107,7 +109,7 @@ def mp_create_supplementary_outputs(sensit_type, tile_id_list, run_date = None): # List of tiles to run in the model tile_id_list_input = uu.tile_list_s3(input_dir, sensit_type) else: - tile_id_list_input = tile_id_list + tile_id_list_input = tile_id_list_outer uu.print_log(tile_id_list_input) uu.print_log("There are {} tiles to process".format(str(len(tile_id_list_input))) + "\n") diff --git a/analyses/net_flux.py b/analyses/net_flux.py index 788e4a4d..50b116d4 100644 --- a/analyses/net_flux.py +++ b/analyses/net_flux.py @@ -29,9 +29,9 @@ def net_calc(tile_id, pattern, sensit_type): kwargs = removals_src.meta # Grabs the windows of the tile (stripes) so we can iterate over the entire tif without running out of memory windows = removals_src.block_windows(1) - uu.print_log(" Gross removals tile {} found".format(removals_in)) + uu.print_log(" Gross removals tile found for {}".format(removals_in)) except: - uu.print_log(" No gross removals tile {} found".format(removals_in)) + uu.print_log(" No gross removals tile found for {}".format(removals_in)) try: emissions_src = rasterio.open(emissions_in) @@ -39,9 +39,9 @@ def net_calc(tile_id, pattern, sensit_type): kwargs = emissions_src.meta # Grabs the windows of the tile (stripes) so we can iterate over the entire tif without running out of memory windows = emissions_src.block_windows(1) - uu.print_log(" Gross emissions tile {} found".format(emissions_in)) + uu.print_log(" Gross emissions tile found for {}".format(emissions_in)) except: - uu.print_log(" No gross emissions tile {} found".format(emissions_in)) + uu.print_log(" No gross emissions tile found for {}".format(emissions_in)) # Skips the tile if there is neither a gross emissions nor a gross removals tile. # This should only occur for biomass_swap sensitivity analysis, which gets its net flux tile list from diff --git a/burn_date/hansen_burnyear_final.py b/burn_date/hansen_burnyear_final.py index d5c2f8d0..845d4914 100644 --- a/burn_date/hansen_burnyear_final.py +++ b/burn_date/hansen_burnyear_final.py @@ -23,7 +23,7 @@ def hansen_burnyear(tile_id): # once metadata tags have been added. 
out_tile_no_tag = '{0}_{1}_no_tag.tif'.format(tile_id, cn.pattern_burn_year) out_tile = '{0}_{1}.tif'.format(tile_id, cn.pattern_burn_year) - loss = '{0}_{1}.tif'.format(cn.pattern_loss, tile_id) + loss = '{0}.tif'.format(tile_id) # Does not continue processing tile if no loss (because there will not be any output) if not os.path.exists(loss): @@ -145,7 +145,7 @@ def hansen_burnyear(tile_id): out_tile_tagged.update_tags( units='year (2001, 2002, 2003...)') out_tile_tagged.update_tags( - source='MODIS collection 6 burned area') + source='MODIS collection 6 burned area, https://modis-fire.umd.edu/files/MODIS_C6_BA_User_Guide_1.3.pdf') out_tile_tagged.update_tags( extent='global') diff --git a/burn_date/mp_burn_year.py b/burn_date/mp_burn_year.py index 1ed072a8..b568b428 100644 --- a/burn_date/mp_burn_year.py +++ b/burn_date/mp_burn_year.py @@ -1,28 +1,23 @@ ''' -Creates tiles of when tree cover loss coincides with burning. -There are kind of four steps to this: 1) acquire raw hdfs from MODIS burned area ftp; 2) make tifs of burned area for -each year in each MODUS h-v tile; 3) make annual Hansen tiles of burned area; 4) make tiles of where TCL and burning -coincided (same year or with 1 year lag). +Creates tiles of when tree cover loss coincides with burning or precedes burning by one year. +There are four steps to this: 1) acquire raw hdfs from MODIS burned area sftp; 2) make tifs of burned area for +each year in each MODIS h-v tile; 3) make annual Hansen-style (extent, res, etc.) tiles of burned area; +4) make tiles of where TCL and burning coincided (same year or with 1 year lag). To update this, steps 1-3 can be run on only the latest year of MODIS burned area product. Only step 4 needs to be run on the entire time series. That is, steps 1-3 operate on burned area products separately for each year, so adding another year of data won't change steps 1-3 for preceding years. -When I ran this for the model v1.2.0 update, I ran it step by step, so I've never run this all in one go and don't know -if there are issues with doing that (storage, path names, etc.). However, any issues like that should be easy enough -to fix now that this is consolidated into one master script. - -Step 4 takes many hours to run, mostly because it only uses five processors since each one requires so much memory. -The other three steps can also take a few hours, I believe. Point is-- updating burned area takes a while. - -This is still basically as Sam Gibbes wrote it in early 2018, with file name changes and other cosmetic changes -by David Gibbs. The real processing code is still all by Sam. - -NOTE: The step in which hdf files are downloaded from the MODIS burned area site using wget (step 1) requires +NOTE: The step in which hdf files are opened and converted to tifs (step 2) requires osgeo/gdal:ubuntu-full-X.X.X Docker image. The "small' Docker image doesn't have an hdf driver in gdal, so it can't read the hdf files on the ftp site. The rest of the burned area analysis can be done with a 'small' version of the Docker image (though that would require terminating the Docker container and restarting it, which would only make sense if the analysis was being continued later). +Step 4 takes many hours to run, mostly because it only uses five processors since each one requires so much memory. +The other steps might take an hour or two to run. +This is still basically as Sam Gibbes wrote it in early 2018, with file name changes and other input/output changes +by David Gibbs. 
The real processing code is still all Sam's. ''' import multiprocessing @@ -61,19 +56,10 @@ def mp_burn_year(tile_id_list, run_date = None): output_dir_list = [cn.burn_year_dir] output_pattern_list = [cn.pattern_burn_year] - # Step 1: - # Downloads the latest year of raw burn area hdfs to the spot machine. - # This step requires using osgeo/gdal:ubuntu-full-X.X.X Docker image because the small image doesn't have an - # hdf driver in gdal. - file_name = "*.hdf" - raw_source = '{0}/20{1}'.format(cn.burn_area_raw_ftp, cn.loss_years) - cmd = ['wget', '-r', '--ftp-user=user', '--ftp-password=burnt_data', '--accept', file_name] - cmd += ['--no-directories', '--no-parent', raw_source] - uu.log_subprocess_output_full(cmd) - - # Uploads the latest year of raw burn area hdfs to s3 - cmd = ['aws', 's3', 'cp', '.', cn.burn_year_hdf_raw_dir, '--recursive', '--exclude', '*', '--include', '*hdf'] - uu.log_subprocess_output_full(cmd) + # A date can optionally be provided. + # This replaces the date in constants_and_names. + if run_date is not None: + output_dir_list = uu.replace_output_dir_date(output_dir_list, run_date) global_grid_hv = ["h00v08", "h00v09", "h00v10", "h01v07", "h01v08", "h01v09", "h01v10", "h01v11", "h02v06", "h02v08", "h02v09", "h02v10", "h02v11", "h03v06", "h03v07", "h03v09", "h03v10", "h03v11", @@ -106,8 +92,34 @@ "h32v11", "h32v12", "h33v07", "h33v08", "h33v09", "h33v10", "h33v11", "h34v07", "h34v08", "h34v09", "h34v10", "h35v08", "h35v09", "h35v10"] + + # Step 1: download hdf files for relevant year(s) from sftp site. + # This only needs to be done for the most recent year of data. + + ''' + Downloading the hdf files from the sftp burned area site is done outside the script in the sftp shell on the command line. + This will download all the 2020 hdfs to the spot machine. It will take a few minutes before the first + hdf is downloaded but then it should go quickly. + Change 2020 to other year for future years of downloads. + https://modis-fire.umd.edu/files/MODIS_C6_BA_User_Guide_1.3.pdf, page 24, section 4.1.3 + + sftp fire@fuoco.geog.umd.edu + [For password] burnt + cd data/MODIS/C6/MCD64A1/HDF + ls [to check that it's the folder with all the tile folders] + get h??v??/MCD64A1.A2020* + bye //exits the sftp shell + ''' + + # Uploads the latest year of raw burn area hdfs to s3. + # All hdfs go in this folder + cmd = ['aws', 's3', 'cp', '{0}/burn_date/'.format(cn.docker_app), cn.burn_year_hdf_raw_dir, '--recursive', '--exclude', '*', '--include', '*hdf'] + uu.log_subprocess_output_full(cmd) + + Step 2: - # Makes burned area rasters for each year for each MODIS horizontal-vertical tile + # Makes burned area rasters for each year for each MODIS horizontal-vertical tile. + # This only needs to be done for the most recent year of data (set in stack_ba_hv). uu.print_log("Stacking hdf into MODIS burned area tifs by year and MODIS hv tile...") count = multiprocessing.cpu_count() @@ -116,17 +128,20 @@ def mp_burn_year(tile_id_list, run_date = None): pool.close() pool.join() - # For single processor use - for hv_tile in global_grid_hv: - stack_ba_hv.stack_ba_hv(hv_tile) + # # For single processor use + # for hv_tile in global_grid_hv: + # stack_ba_hv.stack_ba_hv(hv_tile) # Step 3: # Creates a 10x10 degree wgs 84 tile of .00025 res burned year. # Downloads all MODIS hv tiles from s3, # makes a mosaic for each year, and warps to Hansen extent. 
- # Range is inclusive at lower end and exclusive at upper end (e.g., 2001, 2020 goes from 2001 to 2019) - for year in range(2019, 2020): + # Range is inclusive at lower end and exclusive at upper end (e.g., 2001, 2021 goes from 2001 to 2020). + # This only needs to be done for the most recent year of data. + # NOTE: The first time I ran this for the 2020 TCL update, I got an error about uploading the log to s3 + # after most of the tiles were processed. I didn't know why it happened, so I reran the step and it went fine. + for year in range(2020, 2021): uu.print_log("Processing", year) @@ -188,13 +203,15 @@ def mp_burn_year(tile_id_list, run_date = None): # Step 4: # Creates a single Hansen tile covering all years that represents where burning coincided with tree cover loss + # or preceded TCL by one year. + # This needs to be done on all years each time burned area is updated. # Downloads the loss tiles uu.s3_folder_download(cn.loss_dir, '.', 'std', cn.pattern_loss) uu.print_log("Extracting burn year data that coincides with tree cover loss...") - # Downloads the 10x10 deg burn year tiles (1 for each year in which there was burned areaa), stack and evaluate + # Downloads the 10x10 deg burn year tiles (1 for each year in which there was burned area), stack and evaluate # to return burn year values on hansen loss pixels within 1 year of loss date if cn.count == 96: processes = 5 diff --git a/burn_date/stack_ba_hv.py b/burn_date/stack_ba_hv.py index 73da78a4..b06798b0 100644 --- a/burn_date/stack_ba_hv.py +++ b/burn_date/stack_ba_hv.py @@ -11,7 +11,7 @@ def stack_ba_hv(hv_tile): - for year in range(2019, 2020): # End year is not included in burn year product + for year in range(2020, 2021): # End year is not included in burn year product # Download hdf files from s3 into folders by h and v output_dir = utilities.makedir('{0}/{1}/raw/'.format(hv_tile, year)) @@ -23,7 +23,6 @@ def stack_ba_hv(hv_tile): if len(hdf_files) > 0: array_list = [] for hdf in hdf_files: - uu.print_log("converting hdf to array") array = utilities.hdf_to_array(hdf) array_list.append(array) diff --git a/burn_date/utilities.py b/burn_date/utilities.py index f01bbb98..ff0b4109 100644 --- a/burn_date/utilities.py +++ b/burn_date/utilities.py @@ -130,7 +130,7 @@ def makedir(dir): def download_df(year, hv_tile, output_dir): - include = '*A{0}*{1}*'.format(year, hv_tile) + include = 'MCD64A1.A{0}*{1}*'.format(year, hv_tile) cmd = ['aws', 's3', 'cp', cn.burn_year_hdf_raw_dir, output_dir, '--recursive', '--exclude', "*", '--include', include] diff --git a/carbon_pools/create_carbon_pools.py b/carbon_pools/create_carbon_pools.py index 8ac9d58b..df536c62 100644 --- a/carbon_pools/create_carbon_pools.py +++ b/carbon_pools/create_carbon_pools.py @@ -70,10 +70,10 @@ def create_AGC(tile_id, sensit_type, carbon_pool_extent): loss_year = '{}_{}.tif'.format(tile_id, cn.pattern_Mekong_loss_processed) else: uu.print_log(" Hansen loss tile found for {}".format(tile_id)) - loss_year = '{0}_{1}.tif'.format(cn.pattern_loss, tile_id) + loss_year = '{0}.tif'.format(tile_id) - # This input should exist - removal_forest_type_src = rasterio.open(removal_forest_type) + # This input is required to exist + loss_year_src = rasterio.open(loss_year) # Opens the input tiles if they exist try: @@ -107,17 +107,17 @@ def create_AGC(tile_id, sensit_type, carbon_pool_extent): uu.print_log(" No gain tile found for", tile_id) try: - loss_year_src = rasterio.open(loss_year) - uu.print_log(" Loss tile found for", tile_id) + removal_forest_type_src = 
rasterio.open(removal_forest_type) + uu.print_log(" Removal type tile found for", tile_id) except: - uu.print_log(" No loss tile found for", tile_id) + uu.print_log(" No removal type tile found for", tile_id) # Grabs the windows of a tile to iterate over the entire tif without running out of memory - windows = removal_forest_type_src.block_windows(1) + windows = loss_year_src.block_windows(1) # Grabs metadata for one of the input tiles, like its location/projection/cellsize - kwargs = removal_forest_type_src.meta + kwargs = loss_year_src.meta # Updates kwargs for the output dataset. # Need to update data type to float 32 so that it can handle fractional carbon @@ -129,7 +129,6 @@ def create_AGC(tile_id, sensit_type, carbon_pool_extent): dtype='float32' ) - # The output files: aboveground carbon density in 2000 and in the year of loss. Creates names and rasters to write to. if '2000' in carbon_pool_extent: output_pattern_list = [cn.pattern_AGC_2000] @@ -167,7 +166,7 @@ def create_AGC(tile_id, sensit_type, carbon_pool_extent): for idx, window in windows: # Reads the input tiles' windows. For windows from tiles that may not exist, an array of all 0s is created. - removal_forest_type_window = removal_forest_type_src.read(1, window=window) + loss_year_window = loss_year_src.read(1, window=window) try: annual_gain_AGC_window = annual_gain_AGC_src.read(1, window=window) except: @@ -177,9 +176,9 @@ def create_AGC(tile_id, sensit_type, carbon_pool_extent): except: cumul_gain_AGCO2_window = np.zeros((window.height, window.width), dtype='float32') try: - loss_year_window = loss_year_src.read(1, window=window) + removal_forest_type_window = removal_forest_type_src.read(1, window=window) except: - loss_year_window = np.zeros((window.height, window.width), dtype='uint8') + removal_forest_type_window = np.zeros((window.height, window.width), dtype='uint8') try: gain_window = gain_src.read(1, window=window) except: diff --git a/carbon_pools/mp_create_carbon_pools.py b/carbon_pools/mp_create_carbon_pools.py index 6f57794b..c714b335 100644 --- a/carbon_pools/mp_create_carbon_pools.py +++ b/carbon_pools/mp_create_carbon_pools.py @@ -1,5 +1,5 @@ ''' -This script creates carbon emitted_pools. +This script creates carbon pools in the year of loss (emitted-year carbon) and in 2000. For the year 2000, it creates aboveground, belowground, deadwood, litter, and total carbon emitted_pools (soil is created in a separate script but is brought in to create total carbon). All but total carbon are to the extent of WHRC and mangrove biomass 2000, while total carbon is to the extent of WHRC AGB, mangrove AGB, and soil C. @@ -48,9 +48,10 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da # If a full model run is specified, the correct set of tiles for the particular script is listed. - # For runs generating carbon pools in emissions year, only tiles with model extent and loss are relevant. + # For runs generating carbon pools in emissions year, only tiles with model extent and loss are relevant + # because there must be loss pixels for emissions-year carbon pools to exist. 
if (tile_id_list == 'all') & (carbon_pool_extent == 'loss'): - # Lists the tiles that have both model extent and loss pixels + # Lists the tiles that have both model extent and loss pixels, both being necessary precursors for emissions model_extent_tile_id_list = uu.tile_list_s3(cn.model_extent_dir, sensit_type=sensit_type) loss_tile_id_list = uu.tile_list_s3(cn.loss_dir, sensit_type=sensit_type) uu.print_log("Carbon pool at emissions year is combination of model_extent and loss tiles:") @@ -197,7 +198,7 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da processes = 16 # 16 processors = XXX GB peak else: processes = 20 # 25 processors > 750 GB peak; 16 = 560 GB peak; - # 18 = 570 GB peak; 19 = 620 GB peak; 20 = 670 GB peak; 21 > 750 GB peak + # 18 = 570 GB peak; 19 = 620 GB peak; 20 = 690 GB peak (stops at 600, then increases slowly); 21 > 750 GB peak else: # For 2000, or loss & 2000 processes = 15 # 12 processors = 490 GB peak (stops around 455, then increases slowly); 15 = XXX GB peak else: @@ -239,7 +240,7 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da if sensit_type == 'biomass_swap': processes = 30 # 30 processors = XXX GB peak else: - processes = 38 # 20 processors = 370 GB peak; 32 = 590 GB peak; 36 = 670 GB peak; 38 = 700 GB peak + processes = 39 # 20 processors = 370 GB peak; 32 = 590 GB peak; 36 = 670 GB peak; 38 = 690 GB peak; 39 = XXX GB peak else: # For 2000, or loss & 2000 processes = 30 # 20 processors = 370 GB peak; 25 = 460 GB peak; 30 = XXX GB peak else: @@ -272,7 +273,6 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da tiles_to_delete = [] tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_BGC_2000))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_removal_forest_type))) - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_loss))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_gain))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_soil_C_full_extent_2000))) @@ -291,7 +291,7 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da if sensit_type == 'biomass_swap': processes = 10 # 10 processors = XXX GB peak else: - processes = 14 # 32 processors = >750 GB peak; 24 > 750 GB peak; 14 = 650 GB peak; 15 = 700 GB peak + processes = 15 # 32 processors = >750 GB peak; 24 > 750 GB peak; 14 = 685 GB peak (stops around 600, then increases very very slowly); 15 = 700 GB peak else: # For 2000, or loss & 2000 ### Note: deleted precip, elevation, and WHRC AGB tiles at equatorial latitudes as deadwood and litter were produced. ### There wouldn't have been enough room for all deadwood and litter otherwise. 
@@ -356,7 +356,7 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da if sensit_type == 'biomass_swap': processes = 36 # 36 processors = XXX GB peak else: - processes = 42 # 24 processors = 360 GB peak; 32 = 490 GB peak; 38 = 580 GB peak; 42 = XXX GB peak + processes = 44 # 24 processors = 360 GB peak; 32 = 490 GB peak; 38 = 580 GB peak; 42 = 640 GB peak; 44 = XXX GB peak else: # For 2000, or loss & 2000 processes = 12 # 12 processors = XXX GB peak else: @@ -410,7 +410,7 @@ def mp_create_carbon_pools(sensit_type, tile_id_list, carbon_pool_extent, run_da if sensit_type == 'biomass_swap': processes = 14 # 14 processors = XXX GB peak else: - processes = 18 # 20 processors > 750 GB peak (by just a bit, I think); 15 = 550 GB peak; 18 = XXX GB peak + processes = 19 # 20 processors > 750 GB peak (by just a bit, I think); 15 = 550 GB peak; 18 = 660 GB peak; 19 = XXX GB peak else: # For 2000, or loss & 2000 processes = 12 # 12 processors = XXX GB peak else: diff --git a/carbon_pools/mp_create_soil_C.py b/carbon_pools/mp_create_soil_C.py index 1540b196..c2d2e8b0 100644 --- a/carbon_pools/mp_create_soil_C.py +++ b/carbon_pools/mp_create_soil_C.py @@ -36,7 +36,8 @@ def mp_create_soil_C(tile_id_list): if tile_id_list == 'all': # List of tiles to run in the model tile_id_list = uu.create_combined_tile_list(cn.WHRC_biomass_2000_unmasked_dir, - cn.mangrove_biomass_2000_dir + cn.mangrove_biomass_2000_dir, + set3=cn.gain_dir ) uu.print_log(tile_id_list) @@ -50,104 +51,104 @@ def mp_create_soil_C(tile_id_list): ### Soil carbon density - # uu.print_log("Downloading mangrove soil C rasters") - # uu.s3_file_download(os.path.join(cn.mangrove_soil_C_dir, cn.name_mangrove_soil_C), cn.docker_base_dir, sensit_type) - # - # # For downloading all tiles in the input folders. - # input_files = [cn.mangrove_biomass_2000_dir] - # - # for input in input_files: - # uu.s3_folder_download(input, cn.docker_base_dir, sensit_type) - # - # # Download raw mineral soil C density tiles. - # # First tries to download index.html.tmp from every folder, then goes back and downloads all the tifs in each folder - # # Based on https://stackoverflow.com/questions/273743/using-wget-to-recursively-fetch-a-directory-with-arbitrary-files-in-it - # # There are 12951 tiles and it takes about 3 hours to download them! 
- # cmd = ['wget', '--recursive', '-nH', '--cut-dirs=6', '--no-parent', '--reject', 'index.html*', - # '--accept', '*.tif', '{}'.format(cn.mineral_soil_C_url)] - # uu.log_subprocess_output_full(cmd) - # - # uu.print_log("Unzipping mangrove soil C rasters...") - # cmd = ['unzip', '-j', cn.name_mangrove_soil_C, '-d', cn.docker_base_dir] - # uu.log_subprocess_output_full(cmd) - # - # # Mangrove soil receives precedence over mineral soil - # uu.print_log("Making mangrove soil C vrt...") - # check_call('gdalbuildvrt mangrove_soil_C.vrt *{}*.tif'.format(cn.pattern_mangrove_soil_C_raw), shell=True) - # uu.print_log("Done making mangrove soil C vrt") - # - # uu.print_log("Making mangrove soil C tiles...") - # - # if cn.count == 96: - # processes = 32 # 32 processors = 570 GB peak - # else: - # processes = int(cn.count/3) - # uu.print_log('Mangrove soil C max processors=', processes) - # pool = multiprocessing.Pool(processes) - # pool.map(create_soil_C.create_mangrove_soil_C, tile_id_list) - # pool.close() - # pool.join() - # - # # # For single processor use - # # for tile_id in tile_id_list: - # # - # # create_soil_C.create_mangrove_soil_C(tile_id) - # - # uu.print_log('Done making mangrove soil C tiles', '\n') - # - # uu.print_log("Making mineral soil C vrt...") - # check_call('gdalbuildvrt mineral_soil_C.vrt *{}*'.format(cn.pattern_mineral_soil_C_raw), shell=True) - # uu.print_log("Done making mineral soil C vrt") - # - # # Creates mineral soil C density tiles - # source_raster = 'mineral_soil_C.vrt' - # out_pattern = 'mineral_soil' - # dt = 'Int16' - # if cn.count == 96: - # processes = 50 # 32 processors = 100 GB peak; 50 = XXX GB peak - # else: - # processes = int(cn.count/2) - # uu.print_log("Creating mineral soil C density tiles with {} processors...".format(processes)) - # pool = multiprocessing.Pool(processes) - # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - # pool.close() - # pool.join() - # - # # # For single processor use - # # for tile_id in tile_id_list: - # # - # # create_soil_C.create_mineral_soil_C(tile_id) - # - # uu.print_log("Done making mineral soil C tiles", "\n") - # - # - # uu.print_log("Making combined (mangrove & non-mangrove) soil C tiles...") - # - # if cn.count == 96: - # processes = 45 # 45 processors = XXX GB peak - # else: - # processes = int(cn.count/2) - # uu.print_log('Combined soil C max processors=', processes) - # pool = multiprocessing.Pool(processes) - # pool.map(create_soil_C.create_combined_soil_C, tile_id_list) - # pool.close() - # pool.join() - # - # # # For single processor use - # # for tile in tile_list: - # # - # # create_soil_C.create_combined_soil_C(tile_id) + uu.print_log("Downloading mangrove soil C rasters") + uu.s3_file_download(os.path.join(cn.mangrove_soil_C_dir, cn.name_mangrove_soil_C), cn.docker_base_dir, sensit_type) + + # For downloading all tiles in the input folders. + input_files = [cn.mangrove_biomass_2000_dir] + + for input in input_files: + uu.s3_folder_download(input, cn.docker_base_dir, sensit_type) + + # Download raw mineral soil C density tiles. + # First tries to download index.html.tmp from every folder, then goes back and downloads all the tifs in each folder + # Based on https://stackoverflow.com/questions/273743/using-wget-to-recursively-fetch-a-directory-with-arbitrary-files-in-it + # There are 12951 tiles and it takes about 3 hours to download them! 
+ cmd = ['wget', '--recursive', '-nH', '--cut-dirs=6', '--no-parent', '--reject', 'index.html*', + '--accept', '*.tif', '{}'.format(cn.mineral_soil_C_url)] + uu.log_subprocess_output_full(cmd) + + uu.print_log("Unzipping mangrove soil C rasters...") + cmd = ['unzip', '-j', cn.name_mangrove_soil_C, '-d', cn.docker_base_dir] + uu.log_subprocess_output_full(cmd) + + # Mangrove soil receives precedence over mineral soil + uu.print_log("Making mangrove soil C vrt...") + check_call('gdalbuildvrt mangrove_soil_C.vrt *{}*.tif'.format(cn.pattern_mangrove_soil_C_raw), shell=True) + uu.print_log("Done making mangrove soil C vrt") + + uu.print_log("Making mangrove soil C tiles...") + + if cn.count == 96: + processes = 32 # 32 processors = 570 GB peak + else: + processes = int(cn.count/3) + uu.print_log('Mangrove soil C max processors=', processes) + pool = multiprocessing.Pool(processes) + pool.map(create_soil_C.create_mangrove_soil_C, tile_id_list) + pool.close() + pool.join() + + # # For single processor use + # for tile_id in tile_id_list: # - # uu.print_log("Done making combined soil C tiles") + # create_soil_C.create_mangrove_soil_C(tile_id) + + uu.print_log('Done making mangrove soil C tiles', '\n') + + uu.print_log("Making mineral soil C vrt...") + check_call('gdalbuildvrt mineral_soil_C.vrt *{}*'.format(cn.pattern_mineral_soil_C_raw), shell=True) + uu.print_log("Done making mineral soil C vrt") + + # Creates mineral soil C density tiles + source_raster = 'mineral_soil_C.vrt' + out_pattern = 'mineral_soil' + dt = 'Int16' + if cn.count == 96: + processes = 50 # 32 processors = 100 GB peak; 50 = XXX GB peak + else: + processes = int(cn.count/2) + uu.print_log("Creating mineral soil C density tiles with {} processors...".format(processes)) + pool = multiprocessing.Pool(processes) + pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) + pool.close() + pool.join() + + # # For single processor use + # for tile_id in tile_id_list: # - # uu.print_log("Uploading soil C density tiles") - # uu.upload_final_set(output_dir_list[0], output_pattern_list[0]) + # create_soil_C.create_mineral_soil_C(tile_id) + + uu.print_log("Done making mineral soil C tiles", "\n") + + + uu.print_log("Making combined (mangrove & non-mangrove) soil C tiles...") + + if cn.count == 96: + processes = 45 # 45 processors = XXX GB peak + else: + processes = int(cn.count/2) + uu.print_log('Combined soil C max processors=', processes) + pool = multiprocessing.Pool(processes) + pool.map(create_soil_C.create_combined_soil_C, tile_id_list) + pool.close() + pool.join() + + # # For single processor use + # for tile in tile_list: # - # # Need to delete soil c density rasters because they have the same pattern as the standard deviation rasters - # uu.print_log("Deleting raw soil C density rasters") - # c_stocks = glob.glob('*{}*'.format(cn.pattern_soil_C_full_extent_2000)) - # for c_stock in c_stocks: - # os.remove(c_stock) + # create_soil_C.create_combined_soil_C(tile_id) + + uu.print_log("Done making combined soil C tiles") + + uu.print_log("Uploading soil C density tiles") + uu.upload_final_set(output_dir_list[0], output_pattern_list[0]) + + # Need to delete soil c density rasters because they have the same pattern as the standard deviation rasters + uu.print_log("Deleting raw soil C density rasters") + c_stocks = glob.glob('*{}*'.format(cn.pattern_soil_C_full_extent_2000)) + for c_stock in c_stocks: + os.remove(c_stock) ### Soil carbon density uncertainty @@ -159,35 +160,35 @@ 
def mp_create_soil_C(tile_id_list): vrt_CI95 = 'mineral_soil_C_CI95.vrt' soil_C_stdev_global = 'soil_C_stdev.tif' - # # Download raw mineral soil C density 5% CI tiles - # # First tries to download index.html.tmp from every folder, then goes back and downloads all the tifs in each folder - # # Based on https://stackoverflow.com/questions/273743/using-wget-to-recursively-fetch-a-directory-with-arbitrary-files-in-it - # # Like soil C density rasters, there are 12951 tifs and they take about 3 hours to download. - # os.mkdir(dir_CI05) - # - # cmd = ['wget', '--recursive', '-nH', '--cut-dirs=6', '--no-parent', '--reject', 'index.html*', - # '--directory-prefix={}'.format(dir_CI05), - # '--accept', '*.tif', '{}'.format(cn.CI5_mineral_soil_C_url)] - # uu.log_subprocess_output_full(cmd) - # - # uu.print_log("Making mineral soil C 5% CI vrt...") + # Download raw mineral soil C density 5% CI tiles + # First tries to download index.html.tmp from every folder, then goes back and downloads all the tifs in each folder + # Based on https://stackoverflow.com/questions/273743/using-wget-to-recursively-fetch-a-directory-with-arbitrary-files-in-it + # Like soil C density rasters, there are 12951 tifs and they take about 3 hours to download. + os.mkdir(dir_CI05) - # check_call('gdalbuildvrt {0} {1}*{2}*'.format(vrt_CI05, dir_CI05, cn.pattern_uncert_mineral_soil_C_raw), shell=True) - # uu.print_log("Done making mineral soil C CI05 vrt") - # - # # Download raw mineral soil C density 5% CI tiles - # # Like soil C density rasters, there are 12951 tifs and they take about 3 hours to download. - # os.mkdir(dir_CI95) - # - # cmd = ['wget', '--recursive', '-nH', '--cut-dirs=6', '--no-parent', '--reject', 'index.html*', - # '--directory-prefix={}'.format(dir_CI95), - # '--accept', '*.tif', '{}'.format(cn.CI95_mineral_soil_C_url)] - # uu.log_subprocess_output_full(cmd) - # - # uu.print_log("Making mineral soil C 95% CI vrt...") + cmd = ['wget', '--recursive', '-nH', '--cut-dirs=6', '--no-parent', '--reject', 'index.html*', + '--directory-prefix={}'.format(dir_CI05), + '--accept', '*.tif', '{}'.format(cn.CI5_mineral_soil_C_url)] + uu.log_subprocess_output_full(cmd) - # check_call('gdalbuildvrt {0} {1}*{2}*'.format(vrt_CI95, dir_CI95, cn.pattern_uncert_mineral_soil_C_raw), shell=True) - # uu.print_log("Done making mineral soil C CI95 vrt") + uu.print_log("Making mineral soil C 5% CI vrt...") + + check_call('gdalbuildvrt {0} {1}*{2}*'.format(vrt_CI05, dir_CI05, cn.pattern_uncert_mineral_soil_C_raw), shell=True) + uu.print_log("Done making mineral soil C CI05 vrt") + + # Download raw mineral soil C density 95% CI tiles + # Like soil C density rasters, there are 12951 tifs and they take about 3 hours to download. + os.mkdir(dir_CI95) + + cmd = ['wget', '--recursive', '-nH', '--cut-dirs=6', '--no-parent', '--reject', 'index.html*', + '--directory-prefix={}'.format(dir_CI95), + '--accept', '*.tif', '{}'.format(cn.CI95_mineral_soil_C_url)] + uu.log_subprocess_output_full(cmd) + + uu.print_log("Making mineral soil C 95% CI vrt...") + + check_call('gdalbuildvrt {0} {1}*{2}*'.format(vrt_CI95, dir_CI95, cn.pattern_uncert_mineral_soil_C_raw), shell=True) + uu.print_log("Done making mineral soil C CI95 vrt") uu.print_log("Creating raster of standard deviations in soil C at native SoilGrids250 resolution. 
This may take a while...") @@ -227,6 +228,26 @@ def mp_create_soil_C(tile_id_list): pool.close() pool.join() + + # Checks the gross removals outputs for tiles with no data + for output_pattern in output_pattern_list: + if cn.count <= 2: # For local tests + processes = 1 + uu.print_log("Checking for empty tiles of {0} pattern with {1} processors using light function...".format( + output_pattern, processes)) + pool = multiprocessing.Pool(processes) + pool.map(partial(uu.check_and_delete_if_empty_light, output_pattern=output_pattern), tile_id_list) + pool.close() + pool.join() + else: + processes = 55 # 50 processors = XXX GB peak + uu.print_log( + "Checking for empty tiles of {0} pattern with {1} processors...".format(output_pattern, processes)) + pool = multiprocessing.Pool(processes) + pool.map(partial(uu.check_and_delete_if_empty, output_pattern=output_pattern), tile_id_list) + pool.close() + pool.join() + uu.print_log("Uploading soil C density standard deviation tiles") uu.upload_final_set(output_dir_list[1], output_pattern_list[1]) @@ -245,5 +266,6 @@ def mp_create_soil_C(tile_id_list): # Create the output log uu.initiate_log(tile_id_list, run_date=run_date) + tile_id_list = uu.tile_id_list_check(tile_id_list) mp_create_soil_C(tile_id_list=tile_id_list) \ No newline at end of file diff --git a/constants_and_names.py b/constants_and_names.py index ca695e05..ab93a420 100644 --- a/constants_and_names.py +++ b/constants_and_names.py @@ -8,12 +8,12 @@ ######## ######## # Model version -version = '1.2.0' +version = '1.2.1' version_filename = version.replace('.', '_') # Number of years of tree cover loss. If input loss raster is changed, this must be changed, too. -loss_years = 19 +loss_years = 20 # Number of years in tree cover gain. If input gain raster is changed, this must be changed, too. 
gain_years = 12 @@ -108,7 +108,7 @@ ### Model extent ###### pattern_model_extent = 'model_extent' -model_extent_dir = os.path.join(s3_base_dir, 'model_extent/standard/20200920/') +model_extent_dir = os.path.join(s3_base_dir, 'model_extent/standard/20210223/') ###### ### Biomass tiles @@ -145,9 +145,9 @@ gain_spreadsheet = 'gain_rate_continent_ecozone_age_20200820.xlsx' gain_spreadsheet_dir = os.path.join(s3_base_dir, 'removal_rate_tables/') -# Annual Hansen loss tiles (2001-2019) -pattern_loss = 'GFW2019' -loss_dir = 's3://gfw2-data/forest_change/hansen_2019/' +# Annual Hansen loss tiles (2001-2020) +pattern_loss = '' +loss_dir = 's3://gfw-data-lake/umd_tree_cover_loss/v1.8/raw/geotiff/' # Hansen gain tiles (2001-2012) pattern_gain = 'Hansen_GFC2015_gain' @@ -209,17 +209,17 @@ # Drivers of tree cover loss drivers_raw_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/tree_cover_loss_drivers/raw/') -pattern_drivers_raw = 'Goode_FinalClassification_19_Excludeduncertain_Expanded_05pcnt_reproj__20200722' +pattern_drivers_raw = 'Final_Classification_2020__reproj_nearest_0-005_0-005_deg__20210323.tif' pattern_drivers = 'tree_cover_loss_driver_processed' -drivers_processed_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/tree_cover_loss_drivers/processed/drivers_2019/20200724/') +drivers_processed_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/tree_cover_loss_drivers/processed/drivers_2020/20210323/') # Burn year -burn_area_raw_ftp = 'ftp://ba1.geog.umd.edu/Collection6/HDF/' # Copies a specific year of burn data -burn_year_hdf_raw_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/20200807/raw_hdf/') -burn_year_stacked_hv_tif_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/20200807/stacked_hv_tifs/') -burn_year_warped_to_Hansen_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/20200807/burn_year_warped_to_Hansen/') -pattern_burn_year = "burnyear" -burn_year_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/20200807/burn_year_with_Hansen_loss/') +burn_area_raw_ftp = 'sftp://fuoco.geog.umd.edu/data/MODIS/C6/MCD64A1/HDF/' # per https://modis-fire.umd.edu/files/MODIS_C6_BA_User_Guide_1.3.pdf +burn_year_hdf_raw_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/raw_hdf/') +burn_year_stacked_hv_tif_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/stacked_hv_tifs/') +burn_year_warped_to_Hansen_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/burn_year_10x10_clip/') +pattern_burn_year = "burnyear_with_Hansen_loss" +burn_year_dir = os.path.join(s3_base_dir, 'other_emissions_inputs/burn_year/burn_year_with_Hansen_loss/20210218/') ###### ### Plantation processing @@ -261,7 +261,7 @@ # Age categories over entire model extent, as a precursor to assigning IPCC default removal rates pattern_age_cat_IPCC = 'forest_age_category_IPCC__1_young_2_mid_3_old' -age_cat_IPCC_dir = os.path.join(s3_base_dir, 'forest_age_category_IPCC/standard/20200920/') +age_cat_IPCC_dir = os.path.join(s3_base_dir, 'forest_age_category_IPCC/standard/20210223/') ### US-specific removal precursors @@ -324,29 +324,29 @@ # Annual aboveground biomass gain rate using IPCC default removal rates pattern_annual_gain_AGB_IPCC_defaults = 'annual_removal_factor_AGB_Mg_ha_IPCC_defaults_all_ages' -annual_gain_AGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGB_IPCC_defaults_all_ages/standard/20200920/') +annual_gain_AGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 
'annual_removal_factor_AGB_IPCC_defaults_all_ages/standard/20210223/') # Annual aboveground biomass gain rate using IPCC default removal rates pattern_annual_gain_BGB_IPCC_defaults = 'annual_removal_factor_BGB_Mg_ha_IPCC_defaults_all_ages' -annual_gain_BGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGB_IPCC_defaults_all_ages/standard/20200920/') +annual_gain_BGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGB_IPCC_defaults_all_ages/standard/20210223/') # Annual aboveground gain rate for all forest types pattern_annual_gain_AGC_all_types = 'annual_removal_factor_AGC_Mg_ha_all_forest_types' -annual_gain_AGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGC_all_forest_types/standard/20200920/') +annual_gain_AGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGC_all_forest_types/standard/20210223/') # Annual belowground gain rate for all forest types pattern_annual_gain_BGC_all_types = 'annual_removal_factor_BGC_Mg_ha_all_forest_types' -annual_gain_BGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGC_all_forest_types/standard/20200920/') +annual_gain_BGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_BGC_all_forest_types/standard/20210223/') -# Annual belowground gain rate for all forest types +# Annual aboveground+belowground gain rate for all forest types pattern_annual_gain_AGC_BGC_all_types = 'annual_removal_factor_AGC_BGC_Mg_ha_all_forest_types' -annual_gain_AGC_BGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGC_BGC_all_forest_types/standard/20200920/') +annual_gain_AGC_BGC_all_types_dir = os.path.join(s3_base_dir, 'annual_removal_factor_AGC_BGC_all_forest_types/standard/20210223/') ### Removal forest types (sources) # Forest type used in removals model pattern_removal_forest_type = 'removal_forest_type' -removal_forest_type_dir = os.path.join(s3_base_dir, 'removal_forest_type/standard/20200920/') +removal_forest_type_dir = os.path.join(s3_base_dir, 'removal_forest_type/standard/20210223/') # Removal model forest type codes mangrove_rank = 6 @@ -361,26 +361,26 @@ # Number of gain years for all forest types pattern_gain_year_count = 'gain_year_count_all_forest_types' -gain_year_count_dir = os.path.join(s3_base_dir, 'gain_year_count_all_forest_types/standard/20200920/') +gain_year_count_dir = os.path.join(s3_base_dir, 'gain_year_count_all_forest_types/standard/20210224/') ### Cumulative carbon dioxide removals # Gross aboveground removals for all forest types pattern_cumul_gain_AGCO2_all_types = 'gross_removals_AGCO2_Mg_ha_all_forest_types_2001_{}'.format(loss_years) -cumul_gain_AGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_all_forest_types/standard/per_hectare/20200920/') +cumul_gain_AGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_all_forest_types/standard/per_hectare/20210224/') # Gross belowground removals for all forest types pattern_cumul_gain_BGCO2_all_types = 'gross_removals_BGCO2_Mg_ha_all_forest_types_2001_{}'.format(loss_years) -cumul_gain_BGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_BGCO2_all_forest_types/standard/per_hectare/20200920/') +cumul_gain_BGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_BGCO2_all_forest_types/standard/per_hectare/20210224/') # Gross aboveground and belowground removals for all forest types in all pixels pattern_cumul_gain_AGCO2_BGCO2_all_types = 'gross_removals_AGCO2_BGCO2_Mg_ha_all_forest_types_2001_{}'.format(loss_years) 
-cumul_gain_AGCO2_BGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_hectare/20200824/') +cumul_gain_AGCO2_BGCO2_all_types_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_hectare/20210224/') # Gross aboveground and belowground removals for all forest types in pixels within forest extent pattern_cumul_gain_AGCO2_BGCO2_all_types_forest_extent = 'gross_removals_AGCO2_BGCO2_Mg_ha_all_forest_types_forest_extent_2001_{}'.format(loss_years) -cumul_gain_AGCO2_BGCO2_all_types_forest_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/forest_extent/per_hectare/20200920/') +cumul_gain_AGCO2_BGCO2_all_types_forest_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/forest_extent/per_hectare/20210225/') ###### @@ -416,7 +416,7 @@ ## Carbon emitted_pools in loss year # Date to include in the output directory for all emissions year carbon emitted_pools -emis_pool_run_date = '20200920' +emis_pool_run_date = '20210224' # Aboveground carbon in the year of emission for all forest types in loss pixels pattern_AGC_emis_year = "Mg_AGC_ha_emis_year" @@ -487,7 +487,7 @@ ### Emissions from biomass and soil (all carbon emitted_pools) # Date to include in the output directory -emis_run_date_biomass_soil = '20200824' +emis_run_date_biomass_soil = '20210323' # pattern_gross_emis_commod_biomass_soil = 'gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_{}'.format(loss_years) pattern_gross_emis_commod_biomass_soil = 'gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_{}'.format(loss_years) @@ -526,7 +526,7 @@ ### Emissions from soil only # Date to include in the output directory -emis_run_date_soil_only = '20200828' +emis_run_date_soil_only = '20210324' pattern_gross_emis_commod_soil_only = 'gross_emis_commodity_Mg_CO2e_ha_soil_only_2001_{}'.format(loss_years) gross_emis_commod_soil_only_dir = '{0}gross_emissions/commodities/soil_only/standard/{1}/'.format(s3_base_dir, emis_run_date_soil_only) @@ -564,11 +564,11 @@ # Net emissions for all forest types and all carbon emitted_pools in all pixels pattern_net_flux = 'net_flux_Mg_CO2e_ha_biomass_soil_2001_{}'.format(loss_years) -net_flux_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/full_extent/per_hectare/20200824/') +net_flux_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/full_extent/per_hectare/20210323/') # Net emissions for all forest types and all carbon emitted_pools in forest extent pattern_net_flux_forest_extent = 'net_flux_Mg_CO2e_ha_biomass_soil_forest_extent_2001_{}'.format(loss_years) -net_flux_forest_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/forest_extent/per_hectare/20200920/') +net_flux_forest_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/forest_extent/per_hectare/20210323/') ### Per pixel model outputs @@ -576,27 +576,27 @@ # Gross removals per pixel in all pixels pattern_cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent = 'gross_removals_AGCO2_BGCO2_Mg_pixel_all_forest_types_full_extent_2001_{}'.format(loss_years) -cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_pixel/20200824/') - -# Gross emissions per pixel in all pixels 
### Per pixel model outputs
@@ -576,27 +576,27 @@
# Gross removals per pixel in all pixels
pattern_cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent = 'gross_removals_AGCO2_BGCO2_Mg_pixel_all_forest_types_full_extent_2001_{}'.format(loss_years)
-cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_pixel/20200824/')
-
-# Gross emissions per pixel in all pixels
-pattern_gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent = 'gross_emis_all_gases_all_drivers_Mg_CO2e_pixel_biomass_soil_full_extent_2001_{}'.format(loss_years)
-gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_emissions/all_drivers/all_gases/biomass_soil/standard/full_extent/per_pixel/20200824/')
-
-# Net flux per pixel in all pixels
-pattern_net_flux_per_pixel_full_extent = 'net_flux_Mg_CO2e_pixel_biomass_soil_full_extent_2001_{}'.format(loss_years)
-net_flux_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/full_extent/per_pixel/20200824/')
+cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/full_extent/per_pixel/20210225/')
# Gross removals per pixel in forest extent
pattern_cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent = 'gross_removals_AGCO2_BGCO2_Mg_pixel_all_forest_types_forest_extent_2001_{}'.format(loss_years)
-cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/forest_extent/per_pixel/20200824/')
+cumul_gain_AGCO2_BGCO2_all_types_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'gross_removals_AGCO2_BGCO2_all_forest_types/standard/forest_extent/per_pixel/20210225/')
+
+# Gross emissions per pixel in all pixels
+pattern_gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent = 'gross_emis_all_gases_all_drivers_Mg_CO2e_pixel_biomass_soil_full_extent_2001_{}'.format(loss_years)
+gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'gross_emissions/all_drivers/all_gases/biomass_soil/standard/full_extent/per_pixel/20210323/')
# Gross emissions per pixel in forest extent
pattern_gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent = 'gross_emis_all_gases_all_drivers_Mg_CO2e_pixel_biomass_soil_forest_extent_2001_{}'.format(loss_years)
-gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'gross_emissions/all_drivers/all_gases/biomass_soil/standard/forest_extent/per_pixel/20200824/')
+gross_emis_all_gases_all_drivers_biomass_soil_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'gross_emissions/all_drivers/all_gases/biomass_soil/standard/forest_extent/per_pixel/20210323/')
+
+# Net flux per pixel in all pixels
+pattern_net_flux_per_pixel_full_extent = 'net_flux_Mg_CO2e_pixel_biomass_soil_full_extent_2001_{}'.format(loss_years)
+net_flux_per_pixel_full_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/full_extent/per_pixel/20210323/')
# Net flux per pixel in forest extent
pattern_net_flux_per_pixel_forest_extent = 'net_flux_Mg_CO2e_pixel_biomass_soil_forest_extent_2001_{}'.format(loss_years)
-net_flux_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/forest_extent/per_pixel/20200824/')
+net_flux_per_pixel_forest_extent_dir = os.path.join(s3_base_dir, 'net_flux_all_forest_types_all_drivers/biomass_soil/standard/forest_extent/per_pixel/20210323/')
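The per_pixel variants above are the per_hectare rasters scaled by each pixel's area, which shrinks with latitude. A minimal sketch of the conversion (hypothetical inputs; the model takes pixel areas from the pixel area tiles referenced further down in this patch):

import numpy as np

def per_hectare_to_per_pixel(per_ha, pixel_area_m2):
    # Mg/ha * (pixel area in ha) = Mg/pixel; 1 ha = 10,000 m2.
    return per_ha * (pixel_area_m2 / 10000.0)

# A 0.00025-degree pixel near the equator covers roughly 770 m2:
print(per_hectare_to_per_pixel(np.array([100.0]), np.array([770.0])))  # ~[7.7]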
### 4x4 km aggregation tiles for mapping
@@ -606,7 +606,7 @@
pattern_aggreg_sensit_perc_diff = 'net_flux_0_4deg_modelv{}_perc_diff_std'.format(version_filename)
pattern_aggreg_sensit_sign_change = 'net_flux_0_4deg_modelv{}_sign_change_std'.format(version_filename)
-output_aggreg_dir = os.path.join(s3_base_dir, '0_4deg_output_aggregation/biomass_soil/standard/20200920/')
+output_aggreg_dir = os.path.join(s3_base_dir, '0_4deg_output_aggregation/biomass_soil/standard/20210323/')
@@ -644,11 +644,11 @@
# Standard deviation for annual aboveground biomass removal factors using IPCC default removal rates
pattern_stdev_annual_gain_AGB_IPCC_defaults = 'annual_removal_factor_stdev_AGB_Mg_ha_IPCC_defaults_all_ages'
-stdev_annual_gain_AGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGB_IPCC_defaults_all_ages/standard/20200824/')
+stdev_annual_gain_AGB_IPCC_defaults_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGB_IPCC_defaults_all_ages/standard/20210223/')
# Standard deviation for aboveground removal factors for all forest types
pattern_stdev_annual_gain_AGC_all_types = 'annual_removal_factor_stdev_AGC_Mg_ha_all_forest_types'
-stdev_annual_gain_AGC_all_types_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGC_all_forest_types/standard/20200920/')
+stdev_annual_gain_AGC_all_types_dir = os.path.join(s3_base_dir, 'stdev_annual_removal_factor_AGC_all_forest_types/standard/20210223/')
# Raw mineral soil C file site
diff --git a/data_import.bat b/data_import.bat
new file mode 100644
index 00000000..bbf7e558
--- /dev/null
+++ b/data_import.bat
@@ -0,0 +1,32 @@
+:: Script to import all csvs created by Geotrellis zonal stats into tables in a local Postgres database
+:: Lines must be uncommented according to the model being imported, e.g., standard, maxgain, soil_only, etc.
+:: David Gibbs, david.gibbs@wri.org
+
+FOR %%I IN (output\carbonflux_20210324_0439\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy standard_iso_summary_20210323 FROM %%I CSV HEADER DELIMITER e'\t'"
+FOR %%I IN (output\carbonflux_20210324_0439\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy standard_iso_change_20210323 FROM %%I CSV HEADER DELIMITER e'\t'"
+
+::FOR %%I IN (output\soil_only\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy soil_only_iso_summary_20200904 FROM %%I CSV HEADER DELIMITER e'\t'"
+::FOR %%I IN (output\soil_only\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy soil_only_iso_change_20200904 FROM %%I CSV HEADER DELIMITER e'\t'"
+
+::FOR %%I IN (output\maxgain\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy maxgain_iso_summary_20200921 FROM %%I CSV HEADER DELIMITER e'\t'"
+::FOR %%I IN (output\maxgain\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy maxgain_iso_change FROM %%I CSV HEADER DELIMITER e'\t'"
+
+::FOR %%I IN (output\no_shifting_ag\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy no_shifting_ag_iso_summary_20200921 FROM %%I CSV HEADER DELIMITER e'\t'"
+::FOR %%I IN (output\no_shifting_ag\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy no_shifting_ag_iso_change FROM %%I CSV HEADER DELIMITER e'\t'"
+
+::FOR %%I IN (output\convert_to_grassland\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy convert_to_grassland_iso_summary_20200921 FROM %%I CSV HEADER DELIMITER e'\t'"
+::FOR %%I IN (output\convert_to_grassland\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy convert_to_grassland_iso_change FROM %%I CSV HEADER DELIMITER e'\t'"
+
+::FOR %%I IN (output\biomass_swap\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy biomass_swap_iso_summary_20200921 FROM %%I CSV HEADER DELIMITER e'\t'"
+::FOR %%I IN (output\biomass_swap\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy biomass_swap_iso_change FROM %%I CSV HEADER DELIMITER e'\t'"
+
+::FOR %%I IN (output\US_removals\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy US_removals_iso_summary_20200921 FROM %%I CSV HEADER DELIMITER e'\t'"
+::FOR %%I IN (output\US_removals\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy US_removals_iso_change FROM %%I CSV HEADER DELIMITER e'\t'"
+
+::FOR %%I IN (output\no_primary_gain\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy no_primary_gain_iso_summary_20200921 FROM %%I CSV HEADER DELIMITER e'\t'"
+::FOR %%I IN (output\no_primary_gain\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy no_primary_gain_iso_change FROM %%I CSV HEADER DELIMITER e'\t'"
+
+::FOR %%I IN (output\legal_Amazon_loss\iso\summary\*.csv) DO psql -d flux_model -U postgres -c "\copy legal_Amazon_loss_iso_summary_20200921 FROM %%I CSV HEADER DELIMITER e'\t'"
+::FOR %%I IN (output\legal_Amazon_loss\iso\change\*.csv) DO psql -d flux_model -U postgres -c "\copy legal_Amazon_loss_iso_change_20200921 FROM %%I CSV HEADER DELIMITER e'\t'"
+
+
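The same import can be driven from Python by shelling out to psql exactly as the batch file above does, one \copy per tab-delimited csv with a header row. A minimal sketch (hypothetical glob and table name; assumes psql is on PATH and the local database accepts the connection):

import glob
import subprocess

def import_csvs(csv_glob, table, db='flux_model', user='postgres'):
    # Mirrors the batch file's options: CSV format, header row, tab delimiter.
    for csv_path in glob.glob(csv_glob):
        copy_cmd = "\\copy {0} FROM '{1}' CSV HEADER DELIMITER E'\\t'".format(table, csv_path)
        subprocess.check_call(['psql', '-d', db, '-U', user, '-c', copy_cmd])

# import_csvs(r'output\carbonflux_20210324_0439\iso\summary\*.csv', 'standard_iso_summary_20210323')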
diff --git a/data_prep/mp_model_extent.py b/data_prep/mp_model_extent.py
index 9f5edb31..4ddce956 100644
--- a/data_prep/mp_model_extent.py
+++ b/data_prep/mp_model_extent.py
@@ -96,7 +96,7 @@ def mp_model_extent(sensit_type, tile_id_list, run_date = None):
# 36 = 550 GB peak; 40 = 590 GB peak; 42 = XXX GB peak
else:
processes = 3
- uu.print_log('Removal model forest extent processors=', processes)
+ uu.print_log('Model extent processors=', processes)
pool = multiprocessing.Pool(processes)
pool.map(partial(model_extent.model_extent, pattern=pattern, sensit_type=sensit_type), tile_id_list)
pool.close()
@@ -116,7 +116,7 @@ def mp_model_extent(sensit_type, tile_id_list, run_date = None):
pool.close()
pool.join()
else:
- processes = 50 # 50 processors = XXX GB peak
+ processes = 58 # 50 processors = 620 GB peak; 55 = 640 GB; 58 = 650 GB (continues to increase very slowly several hundred tiles in)
uu.print_log("Checking for empty tiles of {0} pattern with {1} processors...".format(output_pattern, processes))
pool = multiprocessing.Pool(processes)
pool.map(partial(uu.check_and_delete_if_empty, output_pattern=output_pattern), tile_id_list)
diff --git a/data_prep/mp_plantation_preparation.py b/data_prep/mp_plantation_preparation.py
index 335c0409..a86ef009 100644
--- a/data_prep/mp_plantation_preparation.py
+++ b/data_prep/mp_plantation_preparation.py
@@ -349,12 +349,12 @@ def mp_plantation_preparation(gadm_index_shp, planted_index_shp):
pool.join()
# This rasterizes the plantation removal factor standard deviations
- # processes=50 peaks at about 450 GB
+ # processes=50 peaks at about 450 GB
num_of_processes = 50
- pool = Pool(num_of_processes)
- pool.map(plantation_preparation.create_1x1_plantation_stdev_from_1x1_planted, planted_list_1x1)
- pool.close()
- pool.join()
+ pool = Pool(num_of_processes)
+ pool.map(plantation_preparation.create_1x1_plantation_stdev_from_1x1_planted, planted_list_1x1)
+ pool.close()
+ pool.join()
### All script entry points meet here: creation of 10x10 degree planted forest gain rate and type tiles
@@ -414,7 +414,7 @@ def mp_plantation_preparation(gadm_index_shp, planted_index_shp):
plant_stdev_1x1_vrt = 'plant_stdev_1x1.vrt'
# Creates a mosaic of all the 1x1 plantation gain rate standard deviation tiles
- print "Creating vrt of 1x1 plantation gain rate standard deviation tiles"
+ uu.print_log("Creating vrt of 1x1 plantation gain rate standard deviation tiles")
os.system('gdalbuildvrt {} plant_stdev_*.tif'.format(plant_stdev_1x1_vrt))
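Once the 1x1 standard deviation tiles are mosaicked into the vrt, each 10x10 degree output is just a windowed gdalwarp of that vrt; create_10x10_plantation_gain_stdev further down in this patch shows the full version. A condensed sketch of the same call (the default res here assumes the model's 0.00025-degree Hansen grid):

import subprocess

def warp_1x1_vrt_to_10x10(vrt, out_tif, xmin, ymin, xmax, ymax, res='0.00025'):
    # Clip and snap the 1x1 mosaic to one 10x10 degree tile on the Hansen grid.
    cmd = ['gdalwarp', '-tr', res, res, '-co', 'COMPRESS=LZW', '-tap',
           '-te', str(xmin), str(ymin), str(xmax), str(ymax),
           '-dstnodata', '0', '-t_srs', 'EPSG:4326', '-overwrite',
           '-ot', 'Float32', vrt, out_tif]
    subprocess.check_call(cmd)

# e.g. warp_1x1_vrt_to_10x10('plant_stdev_1x1.vrt', '00N_110E_plantation_stdev.tif', 110, -10, 120, 0)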
# Creates 10x10 degree tiles of plantation gain rate standard deviation by iterating over the set of pixel area tiles supplied
diff --git a/data_prep/mp_prep_other_inputs.py b/data_prep/mp_prep_other_inputs.py
index 84b4db82..da0dbeef 100644
--- a/data_prep/mp_prep_other_inputs.py
+++ b/data_prep/mp_prep_other_inputs.py
@@ -1,7 +1,9 @@
'''
This script processes the inputs for the emissions script that haven't been processed by another script.
At this point, that is: climate zone, Indonesia/Malaysia plantations before 2000, tree cover loss drivers (TSC drivers),
-and combining IFL2000 (extratropics) and primary forests (tropics) into a single layer.
+combining IFL2000 (extratropics) and primary forests (tropics) into a single layer,
+Hansenizing some removal factor standard deviation inputs, Hansenizing the European removal factors,
+and Hansenizing three US-specific removal factor inputs.
'''
from subprocess import Popen, PIPE, STDOUT, check_call
@@ -25,34 +27,54 @@ def mp_prep_other_inputs(tile_id_list, run_date):
# If a full model run is specified, the correct set of tiles for the particular script is listed
if tile_id_list == 'all':
# List of tiles to run in the model
+ ### BUG: THIS SHOULD ALSO INCLUDE cn.annual_gain_AGC_BGC_planted_forest_unmasked_dir IN ITS LIST
tile_id_list = uu.create_combined_tile_list(cn.WHRC_biomass_2000_unmasked_dir, cn.mangrove_biomass_2000_dir,
- set3=cn.annual_gain_AGC_BGC_planted_forest_unmasked_dir
+ set3=cn.gain_dir
)
uu.print_log(tile_id_list)
uu.print_log("There are {} tiles to process".format(str(len(tile_id_list))) + "\n")
+ '''
+ Before processing the driver, it needs to be reprojected from Goode Homolosine to WGS84.
+ gdalwarp was producing a weird output, so I did it in ArcMap for the 2020 update,
+ with the output cell size being 0.005 x 0.005 degree and the method being nearest.
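+
+ For reference, a GDAL-based equivalent of the ArcPy call below would look roughly like this
+ (kept only as a note, and untested here, since gdalwarp gave unreliable output for this raster):
+
+ from osgeo import gdal
+
+ # Reference-only sketch; ArcMap was used for the actual 2020 reprojection.
+ gdal.Warp('Final_Classification_2020__reproj_nearest_0-005_0-005_deg__20210323.tif',
+ 'FinalClassification_2020_v2__from_Jimmy_MacCarthy_20210323.tif',
+ dstSRS='EPSG:4326', xRes=0.005, yRes=0.005, resampleAlg='near')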
+ + arcpy.ProjectRaster_management(in_raster="C:/GIS/Drivers of loss/2020_drivers__tif__from_Forrest_Follett_20210323/FinalClassification_2020_v2__from_Jimmy_MacCarthy_20210323.tif", + out_raster="C:/GIS/Drivers of loss/2020_drivers__tif__from_Forrest_Follett_20210323/Final_Classification_2020__reproj_nearest_0-005_0-005_deg__20210323.tif", + out_coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]", + resampling_type="NEAREST", cell_size="0.005 0.005", geographic_transform="", + Registration_Point="", + in_coor_system="PROJCS['WGS_1984_Goode_Homolosine',GEOGCS['GCS_unknown',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Goode_Homolosine'],PARAMETER['False_Easting',0.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',0.0],PARAMETER['Option',1.0],UNIT['Meter',1.0]]", + vertical="NO_VERTICAL") + ''' # List of output directories and output file name patterns - output_dir_list = [cn.climate_zone_processed_dir, cn.plant_pre_2000_processed_dir, - cn.drivers_processed_dir, cn.ifl_primary_processed_dir, - cn.annual_gain_AGC_natrl_forest_young_dir, - cn.stdev_annual_gain_AGC_natrl_forest_young_dir, - cn.annual_gain_AGC_BGC_natrl_forest_Europe_dir, - cn.stdev_annual_gain_AGC_BGC_natrl_forest_Europe_dir, - cn.FIA_forest_group_processed_dir, - cn.age_cat_natrl_forest_US_dir, - cn.FIA_regions_processed_dir] - output_pattern_list = [cn.pattern_climate_zone, cn.pattern_plant_pre_2000, - cn.pattern_drivers, cn.pattern_ifl_primary, - cn.pattern_annual_gain_AGC_natrl_forest_young, - cn.pattern_stdev_annual_gain_AGC_natrl_forest_young, - cn.pattern_annual_gain_AGC_BGC_natrl_forest_Europe, - cn.pattern_stdev_annual_gain_AGC_BGC_natrl_forest_Europe, - cn.pattern_FIA_forest_group_processed, - cn.pattern_age_cat_natrl_forest_US, - cn.pattern_FIA_regions_processed] + output_dir_list = [ + # cn.climate_zone_processed_dir, cn.plant_pre_2000_processed_dir, + cn.drivers_processed_dir + # cn.ifl_primary_processed_dir, + # cn.annual_gain_AGC_natrl_forest_young_dir, + # cn.stdev_annual_gain_AGC_natrl_forest_young_dir, + # cn.annual_gain_AGC_BGC_natrl_forest_Europe_dir, + # cn.stdev_annual_gain_AGC_BGC_natrl_forest_Europe_dir, + # cn.FIA_forest_group_processed_dir, + # cn.age_cat_natrl_forest_US_dir, + # cn.FIA_regions_processed_dir + ] + output_pattern_list = [ + # cn.pattern_climate_zone, cn.pattern_plant_pre_2000, + cn.pattern_drivers + # cn.pattern_ifl_primary, + # cn.pattern_annual_gain_AGC_natrl_forest_young, + # cn.pattern_stdev_annual_gain_AGC_natrl_forest_young, + # cn.pattern_annual_gain_AGC_BGC_natrl_forest_Europe, + # cn.pattern_stdev_annual_gain_AGC_BGC_natrl_forest_Europe, + # cn.pattern_FIA_forest_group_processed, + # cn.pattern_age_cat_natrl_forest_US, + # cn.pattern_FIA_regions_processed + ] # If the model run isn't the standard one, the output directory and file names are changed @@ -69,42 +91,40 @@ def mp_prep_other_inputs(tile_id_list, run_date): output_dir_list = uu.replace_output_dir_date(output_dir_list, run_date) - # Files to process: climate zone, IDN/MYS plantations before 2000, tree cover loss drivers, combine IFL and primary forest - uu.s3_file_download(os.path.join(cn.climate_zone_raw_dir, cn.climate_zone_raw), cn.docker_base_dir, sensit_type) - uu.s3_file_download(os.path.join(cn.plant_pre_2000_raw_dir, '{}.zip'.format(cn.pattern_plant_pre_2000_raw)), cn.docker_base_dir, sensit_type) - 
uu.s3_file_download(os.path.join(cn.drivers_raw_dir, '{}.zip'.format(cn.pattern_drivers_raw)), cn.docker_base_dir, sensit_type) - uu.s3_file_download(os.path.join(cn.annual_gain_AGC_BGC_natrl_forest_Europe_raw_dir, cn.name_annual_gain_AGC_BGC_natrl_forest_Europe_raw), cn.docker_base_dir, sensit_type) - uu.s3_file_download(os.path.join(cn.stdev_annual_gain_AGC_BGC_natrl_forest_Europe_raw_dir, cn.name_stdev_annual_gain_AGC_BGC_natrl_forest_Europe_raw), cn.docker_base_dir, sensit_type) - uu.s3_file_download(os.path.join(cn.FIA_regions_raw_dir, cn.name_FIA_regions_raw), cn.docker_base_dir, sensit_type) - uu.s3_file_download(os.path.join(cn.age_cat_natrl_forest_US_raw_dir, cn.name_age_cat_natrl_forest_US_raw), cn.docker_base_dir, sensit_type) - uu.s3_file_download(os.path.join(cn.FIA_forest_group_raw_dir, cn.name_FIA_forest_group_raw), cn.docker_base_dir, sensit_type) - # For some reason, using uu.s3_file_download or otherwise using AWSCLI as a subprocess doesn't work for this raster. - # Thus, using wget instead. - cmd = ['wget', '{}'.format(cn.annual_gain_AGC_natrl_forest_young_raw_URL), '-P', '{}'.format(cn.docker_base_dir)] - process = Popen(cmd, stdout=PIPE, stderr=STDOUT) - with process.stdout: - uu.log_subprocess_output(process.stdout) - uu.s3_file_download(cn.stdev_annual_gain_AGC_natrl_forest_young_raw_URL, cn.docker_base_dir, sensit_type) - cmd = ['aws', 's3', 'cp', cn.primary_raw_dir, cn.docker_base_dir, '--recursive'] - uu.log_subprocess_output_full(cmd) - - uu.s3_flexible_download(cn.ifl_dir, cn.pattern_ifl, cn.docker_base_dir, sensit_type, tile_id_list) - - uu.print_log("Unzipping pre-2000 plantations...") - cmd = ['unzip', '-j', '{}.zip'.format(cn.pattern_plant_pre_2000_raw)] - uu.log_subprocess_output_full(cmd) - - uu.print_log("Unzipping drivers...") - cmd = ['unzip', '-j', '{}.zip'.format(cn.pattern_drivers_raw)] - uu.log_subprocess_output_full(cmd) - - - # Creates tree cover loss driver tiles - source_raster = '{}.tif'.format(cn.pattern_drivers_raw) + # # Files to process: climate zone, IDN/MYS plantations before 2000, tree cover loss drivers, combine IFL and primary forest + # uu.s3_file_download(os.path.join(cn.climate_zone_raw_dir, cn.climate_zone_raw), cn.docker_base_dir, sensit_type) + # uu.s3_file_download(os.path.join(cn.plant_pre_2000_raw_dir, '{}.zip'.format(cn.pattern_plant_pre_2000_raw)), cn.docker_base_dir, sensit_type) + uu.s3_file_download(os.path.join(cn.drivers_raw_dir, cn.pattern_drivers_raw), cn.docker_base_dir, sensit_type) + # uu.s3_file_download(os.path.join(cn.annual_gain_AGC_BGC_natrl_forest_Europe_raw_dir, cn.name_annual_gain_AGC_BGC_natrl_forest_Europe_raw), cn.docker_base_dir, sensit_type) + # uu.s3_file_download(os.path.join(cn.stdev_annual_gain_AGC_BGC_natrl_forest_Europe_raw_dir, cn.name_stdev_annual_gain_AGC_BGC_natrl_forest_Europe_raw), cn.docker_base_dir, sensit_type) + # uu.s3_file_download(os.path.join(cn.FIA_regions_raw_dir, cn.name_FIA_regions_raw), cn.docker_base_dir, sensit_type) + # uu.s3_file_download(os.path.join(cn.age_cat_natrl_forest_US_raw_dir, cn.name_age_cat_natrl_forest_US_raw), cn.docker_base_dir, sensit_type) + # uu.s3_file_download(os.path.join(cn.FIA_forest_group_raw_dir, cn.name_FIA_forest_group_raw), cn.docker_base_dir, sensit_type) + # # For some reason, using uu.s3_file_download or otherwise using AWSCLI as a subprocess doesn't work for this raster. + # # Thus, using wget instead. 
+ # cmd = ['wget', '{}'.format(cn.annual_gain_AGC_natrl_forest_young_raw_URL), '-P', '{}'.format(cn.docker_base_dir)] + # process = Popen(cmd, stdout=PIPE, stderr=STDOUT) + # with process.stdout: + # uu.log_subprocess_output(process.stdout) + # uu.s3_file_download(cn.stdev_annual_gain_AGC_natrl_forest_young_raw_URL, cn.docker_base_dir, sensit_type) + # cmd = ['aws', 's3', 'cp', cn.primary_raw_dir, cn.docker_base_dir, '--recursive'] + # uu.log_subprocess_output_full(cmd) + # + # uu.s3_flexible_download(cn.ifl_dir, cn.pattern_ifl, cn.docker_base_dir, sensit_type, tile_id_list) + # + # uu.print_log("Unzipping pre-2000 plantations...") + # cmd = ['unzip', '-j', '{}.zip'.format(cn.pattern_plant_pre_2000_raw)] + # uu.log_subprocess_output_full(cmd) + + # Creates tree cover loss driver tiles. + # The raw driver tile should have NoData for unassigned drivers as opposed to 0 for unassigned drivers. + # For the 2020 driver update, I reclassified the 0 values as NoData in ArcMap. I also unprojected the global drivers + # map to WGS84 because running the homolosine projection that Jimmy provided was giving incorrect processed results. + source_raster = cn.pattern_drivers_raw out_pattern = cn.pattern_drivers dt = 'Byte' if cn.count == 96: - processes = 80 # 45 processors = 70 GB peak; 70 = 90 GB peak; 80 = XXX GB peak + processes = 87 # 45 processors = 70 GB peak; 70 = 90 GB peak; 80 = 100 GB peak; 87 = 125 GB peak else: processes = int(cn.count/2) uu.print_log("Creating tree cover loss driver tiles with {} processors...".format(processes)) @@ -114,165 +134,167 @@ def mp_prep_other_inputs(tile_id_list, run_date): pool.join() - # Creates young natural forest removal rate tiles - source_raster = cn.name_annual_gain_AGC_natrl_forest_young_raw - out_pattern = cn.pattern_annual_gain_AGC_natrl_forest_young - dt = 'float32' - if cn.count == 96: - processes = 80 # 32 processors = 210 GB peak; 60 = 370 GB peak; 80 = XXX GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating young natural forest gain rate tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - pool.close() - pool.join() - - # Creates young natural forest removal rate standard deviation tiles - source_raster = cn.name_stdev_annual_gain_AGC_natrl_forest_young_raw - out_pattern = cn.pattern_stdev_annual_gain_AGC_natrl_forest_young - dt = 'float32' - if cn.count == 96: - processes = 80 # 32 processors = 210 GB peak; 60 = 370 GB peak; 80 = XXX GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating standard deviation for young natural forest removal rate tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - pool.close() - pool.join() - - - # Creates pre-2000 oil palm plantation tiles - if cn.count == 96: - processes = 80 # 45 processors = 100 GB peak; 80 = XXX GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating pre-2000 oil palm plantation tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(prep_other_inputs.rasterize_pre_2000_plantations, tile_id_list) - pool.close() - pool.join() - - - # Creates climate zone tiles - if cn.count == 96: - processes = 80 # 45 processors = 230 GB peak (on second step); 80 = XXX GB peak - else: - processes = int(cn.count/2) 
- uu.print_log("Creating climate zone tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(prep_other_inputs.create_climate_zone_tiles, tile_id_list) - pool.close() - pool.join() - - # Creates European natural forest removal rate tiles - source_raster = cn.name_annual_gain_AGC_BGC_natrl_forest_Europe_raw - out_pattern = cn.pattern_annual_gain_AGC_BGC_natrl_forest_Europe - dt = 'float32' - if cn.count == 96: - processes = 60 # 32 processors = 60 GB peak; 60 = XXX GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating European natural forest gain rate tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - pool.close() - pool.join() - - # Creates European natural forest standard deviation of removal rate tiles - source_raster = cn.name_stdev_annual_gain_AGC_BGC_natrl_forest_Europe_raw - out_pattern = cn.pattern_stdev_annual_gain_AGC_BGC_natrl_forest_Europe - dt = 'float32' - if cn.count == 96: - processes = 32 # 32 processors = 60 GB peak; 60 = XXX GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating standard deviation for European natural forest gain rate tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - pool.close() - pool.join() - - - # Creates a vrt of the primary forests with nodata=0 from the continental primary forest rasters - uu.print_log("Creating vrt of humid tropial primary forest...") - primary_vrt = 'primary_2001.vrt' - os.system('gdalbuildvrt -srcnodata 0 {} *2001_primary.tif'.format(primary_vrt)) - uu.print_log(" Humid tropical primary forest vrt created") - - # Creates primary forest tiles - source_raster = primary_vrt - out_pattern = 'primary_2001' - dt = 'Byte' - if cn.count == 96: - processes = 45 # 45 processors = 650 GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating primary forest tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - pool.close() - pool.join() - - - # Creates a combined IFL/primary forest raster - # Uses very little memory since it's just file renaming - if cn.count == 96: - processes = 60 # 60 processors = 10 GB peak - else: - processes = int(cn.count/2) - uu.print_log("Assigning each tile to ifl2000 or primary forest with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(prep_other_inputs.create_combined_ifl_primary, tile_id_list) - pool.close() - pool.join() - - - # Creates forest age category tiles for US forests - source_raster = cn.name_age_cat_natrl_forest_US_raw - out_pattern = cn.pattern_age_cat_natrl_forest_US - dt = 'Byte' - if cn.count == 96: - processes = 70 # 32 processors = 35 GB peak; 70 = XXX GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating US forest age category tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - pool.close() - pool.join() - - # Creates forest groups for US forests - source_raster = cn.name_FIA_forest_group_raw - out_pattern = 
cn.pattern_FIA_forest_group_processed - dt = 'Byte' - if cn.count == 96: - processes = 80 # 32 processors = 25 GB peak; 80 = XXX GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating US forest group tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - pool.close() - pool.join() - - # Creates FIA regions for US forests - source_raster = cn.name_FIA_regions_raw - out_pattern = cn.pattern_FIA_regions_processed - dt = 'Byte' - if cn.count == 96: - processes = 70 # 32 processors = 35 GB peak; 70 = XXX GB peak - else: - processes = int(cn.count/2) - uu.print_log("Creating US forest region tiles with {} processors...".format(processes)) - pool = multiprocessing.Pool(processes) - pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) - pool.close() - pool.join() - - - for output_pattern in [cn.pattern_annual_gain_AGC_natrl_forest_young, cn.pattern_stdev_annual_gain_AGC_natrl_forest_young]: + # # Creates young natural forest removal rate tiles + # source_raster = cn.name_annual_gain_AGC_natrl_forest_young_raw + # out_pattern = cn.pattern_annual_gain_AGC_natrl_forest_young + # dt = 'float32' + # if cn.count == 96: + # processes = 80 # 32 processors = 210 GB peak; 60 = 370 GB peak; 80 = XXX GB peak + # else: + # processes = int(cn.count/2) + # uu.print_log("Creating young natural forest gain rate tiles with {} processors...".format(processes)) + # pool = multiprocessing.Pool(processes) + # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) + # pool.close() + # pool.join() + # + # # Creates young natural forest removal rate standard deviation tiles + # source_raster = cn.name_stdev_annual_gain_AGC_natrl_forest_young_raw + # out_pattern = cn.pattern_stdev_annual_gain_AGC_natrl_forest_young + # dt = 'float32' + # if cn.count == 96: + # processes = 80 # 32 processors = 210 GB peak; 60 = 370 GB peak; 80 = XXX GB peak + # else: + # processes = int(cn.count/2) + # uu.print_log("Creating standard deviation for young natural forest removal rate tiles with {} processors...".format(processes)) + # pool = multiprocessing.Pool(processes) + # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) + # pool.close() + # pool.join() + # + # + # # Creates pre-2000 oil palm plantation tiles + # if cn.count == 96: + # processes = 80 # 45 processors = 100 GB peak; 80 = XXX GB peak + # else: + # processes = int(cn.count/2) + # uu.print_log("Creating pre-2000 oil palm plantation tiles with {} processors...".format(processes)) + # pool = multiprocessing.Pool(processes) + # pool.map(prep_other_inputs.rasterize_pre_2000_plantations, tile_id_list) + # pool.close() + # pool.join() + # + # + # # Creates climate zone tiles + # if cn.count == 96: + # processes = 80 # 45 processors = 230 GB peak (on second step); 80 = XXX GB peak + # else: + # processes = int(cn.count/2) + # uu.print_log("Creating climate zone tiles with {} processors...".format(processes)) + # pool = multiprocessing.Pool(processes) + # pool.map(prep_other_inputs.create_climate_zone_tiles, tile_id_list) + # pool.close() + # pool.join() + # + # # Creates European natural forest removal rate tiles + # source_raster = cn.name_annual_gain_AGC_BGC_natrl_forest_Europe_raw + # out_pattern = 
cn.pattern_annual_gain_AGC_BGC_natrl_forest_Europe
+ # dt = 'float32'
+ # if cn.count == 96:
+ # processes = 60 # 32 processors = 60 GB peak; 60 = XXX GB peak
+ # else:
+ # processes = int(cn.count/2)
+ # uu.print_log("Creating European natural forest gain rate tiles with {} processors...".format(processes))
+ # pool = multiprocessing.Pool(processes)
+ # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list)
+ # pool.close()
+ # pool.join()
+ #
+ # # Creates European natural forest standard deviation of removal rate tiles
+ # source_raster = cn.name_stdev_annual_gain_AGC_BGC_natrl_forest_Europe_raw
+ # out_pattern = cn.pattern_stdev_annual_gain_AGC_BGC_natrl_forest_Europe
+ # dt = 'float32'
+ # if cn.count == 96:
+ # processes = 32 # 32 processors = 60 GB peak; 60 = XXX GB peak
+ # else:
+ # processes = int(cn.count/2)
+ # uu.print_log("Creating standard deviation for European natural forest gain rate tiles with {} processors...".format(processes))
+ # pool = multiprocessing.Pool(processes)
+ # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list)
+ # pool.close()
+ # pool.join()
+ #
+ #
+ # # Creates a vrt of the primary forests with nodata=0 from the continental primary forest rasters
+ # uu.print_log("Creating vrt of humid tropical primary forest...")
+ # primary_vrt = 'primary_2001.vrt'
+ # os.system('gdalbuildvrt -srcnodata 0 {} *2001_primary.tif'.format(primary_vrt))
+ # uu.print_log(" Humid tropical primary forest vrt created")
+ #
+ # # Creates primary forest tiles
+ # source_raster = primary_vrt
+ # out_pattern = 'primary_2001'
+ # dt = 'Byte'
+ # if cn.count == 96:
+ # processes = 45 # 45 processors = 650 GB peak
+ # else:
+ # processes = int(cn.count/2)
+ # uu.print_log("Creating primary forest tiles with {} processors...".format(processes))
+ # pool = multiprocessing.Pool(processes)
+ # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list)
+ # pool.close()
+ # pool.join()
+ #
+ #
+ # # Creates a combined IFL/primary forest raster
+ # # Uses very little memory since it's just file renaming
+ # if cn.count == 96:
+ # processes = 60 # 60 processors = 10 GB peak
+ # else:
+ # processes = int(cn.count/2)
+ # uu.print_log("Assigning each tile to ifl2000 or primary forest with {} processors...".format(processes))
+ # pool = multiprocessing.Pool(processes)
+ # pool.map(prep_other_inputs.create_combined_ifl_primary, tile_id_list)
+ # pool.close()
+ # pool.join()
+ #
+ #
+ # # Creates forest age category tiles for US forests
+ # source_raster = cn.name_age_cat_natrl_forest_US_raw
+ # out_pattern = cn.pattern_age_cat_natrl_forest_US
+ # dt = 'Byte'
+ # if cn.count == 96:
+ # processes = 70 # 32 processors = 35 GB peak; 70 = XXX GB peak
+ # else:
+ # processes = int(cn.count/2)
+ # uu.print_log("Creating US forest age category tiles with {} processors...".format(processes))
+ # pool = multiprocessing.Pool(processes)
+ # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list)
+ # pool.close()
+ # pool.join()
+ #
+ # # Creates forest groups for US forests
+ # source_raster = cn.name_FIA_forest_group_raw
+ # out_pattern = cn.pattern_FIA_forest_group_processed
+ # dt = 'Byte'
+ # if cn.count == 96:
+ # processes = 80 # 32 processors = 25 GB peak; 80 = XXX GB peak
+ # else:
+ # processes = int(cn.count/2)
+ # uu.print_log("Creating US forest group tiles with 
{} processors...".format(processes)) + # pool = multiprocessing.Pool(processes) + # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) + # pool.close() + # pool.join() + # + # # Creates FIA regions for US forests + # source_raster = cn.name_FIA_regions_raw + # out_pattern = cn.pattern_FIA_regions_processed + # dt = 'Byte' + # if cn.count == 96: + # processes = 70 # 32 processors = 35 GB peak; 70 = XXX GB peak + # else: + # processes = int(cn.count/2) + # uu.print_log("Creating US forest region tiles with {} processors...".format(processes)) + # pool = multiprocessing.Pool(processes) + # pool.map(partial(uu.mp_warp_to_Hansen, source_raster=source_raster, out_pattern=out_pattern, dt=dt), tile_id_list) + # pool.close() + # pool.join() + # + # + for output_pattern in [cn.pattern_drivers + # ,cn.pattern_annual_gain_AGC_natrl_forest_young, cn.pattern_stdev_annual_gain_AGC_natrl_forest_young + ]: # For some reason I can't figure out, the young forest rasters (rate and stdev) have NaN values in some places where 0 (NoData) # should be. These NaN values show up as values when the check_and_delete_if_empty function runs, making the tiles not diff --git a/data_prep/plantation_preparation.py b/data_prep/plantation_preparation.py index b30d88bf..351d942d 100644 --- a/data_prep/plantation_preparation.py +++ b/data_prep/plantation_preparation.py @@ -188,9 +188,9 @@ def create_1x1_plantation_stdev_from_1x1_planted(tile_1x1): ymax_1x1 = int(coords[2]) ymin_1x1 = ymax_1x1 - 1 - print "For", tile_1x1, "-- xmin_1x1:", xmin_1x1, "; xmax_1x1:", xmax_1x1, "; ymin_1x1", ymin_1x1, "; ymax_1x1:", ymax_1x1 + print("For", tile_1x1, "-- xmin_1x1:", xmin_1x1, "; xmax_1x1:", xmax_1x1, "; ymin_1x1", ymin_1x1, "; ymax_1x1:", ymax_1x1) - print "There are plantations in {}. Converting to stdev raster...".format(tile_1x1) + print("There are plantations in {}. Converting to stdev raster...".format(tile_1x1)) # https://gis.stackexchange.com/questions/187224/how-to-use-gdal-rasterize-with-postgis-vector cmd = ['gdal_rasterize', '-tr', '{}'.format(cn.Hansen_res), '{}'.format(cn.Hansen_res), '-co', 'COMPRESS=LZW', 'PG:dbname=ubuntu', @@ -260,33 +260,33 @@ def create_10x10_plantation_type(tile_id, plant_type_1x1_vrt): else: - print " No data found. Not copying {}.".format(tile_id) + print(" No data found. 
Not copying {}.".format(tile_id)) # Combines the 1x1 plantation tiles into 10x10 plantation carbon gain rate tiles, the final output of this process def create_10x10_plantation_gain_stdev(tile_id, plant_stdev_1x1_vrt): - print "Getting bounding coordinates for tile", tile_id + print("Getting bounding coordinates for tile", tile_id) xmin, ymin, xmax, ymax = uu.coords(tile_id) - print " xmin:", xmin, "; xmax:", xmax, "; ymin", ymin, "; ymax:", ymax + print(" xmin:", xmin, "; xmax:", xmax, "; ymin", ymin, "; ymax:", ymax) tile_10x10 = '{0}_{1}.tif'.format(tile_id, cn.pattern_planted_forest_stdev_unmasked) - print "Rasterizing", tile_10x10 + print("Rasterizing", tile_10x10) cmd = ['gdalwarp', '-tr', '{}'.format(str(cn.Hansen_res)), '{}'.format(str(cn.Hansen_res)), '-co', 'COMPRESS=LZW', '-tap', '-te', str(xmin), str(ymin), str(xmax), str(ymax), '-dstnodata', '0', '-t_srs', 'EPSG:4326', '-overwrite', '-ot', 'Float32', plant_stdev_1x1_vrt, tile_10x10] subprocess.check_call(cmd) - print "Checking if {} contains any data...".format(tile_id) + print("Checking if {} contains any data...".format(tile_id)) stats = uu.check_for_data_old(tile_10x10) if stats[0] > 0: - print " Data found in {}. Copying tile to s3...".format(tile_id) + print(" Data found in {}. Copying tile to s3...".format(tile_id)) uu.upload_final(cn.planted_forest_stdev_unmasked_dir, tile_id, cn.pattern_planted_forest_stdev_unmasked) - print " Tile converted and copied to s3" + print(" Tile converted and copied to s3") else: - print " No data found. Not copying {}.".format(tile_id) + print(" No data found. Not copying {}.".format(tile_id)) diff --git a/emissions/cpp_util/calc_gross_emissions_convert_to_grassland.cpp b/emissions/cpp_util/calc_gross_emissions_convert_to_grassland.cpp index c978a606..3d4980df 100644 --- a/emissions/cpp_util/calc_gross_emissions_convert_to_grassland.cpp +++ b/emissions/cpp_util/calc_gross_emissions_convert_to_grassland.cpp @@ -36,6 +36,7 @@ // These provide constants for the emissions equations #include "flu_val.cpp" #include "equations.cpp" +#include "constants.h" using namespace std; @@ -54,59 +55,67 @@ string infolder = argv[3]; // The folder which has all the input files cout << "Gross emissions C++ infolder:" << infolder << endl; // Model constants +int model_years; // How many loss years are in the model +model_years = constants::model_years; +string model_years_str; +model_years_str = to_string(model_years); + int CH4_equiv; // The CO2 equivalency (global warming potential) of CH4 -CH4_equiv = 28; +CH4_equiv = constants::CH4_equiv; int N2O_equiv; // The CO2 equivalency (global warming potential) of N2O -N2O_equiv = 265; +N2O_equiv = constants::N2O_equiv; float C_to_CO2; // The conversion of carbon to CO2 -C_to_CO2 = 44.0/12.0; +C_to_CO2 = constants::C_to_CO2; -int model_years; // How many loss years are in the model -model_years = 19; -string model_years_str; -model_years_str = to_string(model_years); +float biomass_to_c; // Fraction of carbon in biomass +biomass_to_c = constants::biomass_to_c; int tropical; // The ecozone code for the tropics -tropical = 1; +tropical = constants::tropical; int temperate; // The ecozone code for the temperate zone -temperate = 3; +temperate = constants::temperate; int boreal; // The ecozone code for the boreal zone -boreal = 2; +boreal = constants::boreal; + +int soil_emis_period; // The number of years over which soil emissions are calculated (separate from model years) +soil_emis_period = constants::soil_emis_period; + // Input files -// Carbon pools -string 
agc_name = infolder + tile_id + "_Mg_AGC_ha_emis_year.tif"; -string bgc_name = infolder + tile_id + "_Mg_BGC_ha_emis_year.tif"; -string dead_name = infolder + tile_id + "_Mg_deadwood_C_ha_emis_year_2000.tif"; -string litter_name = infolder + tile_id + "_Mg_litter_C_ha_emis_year_2000.tif"; -string soil_name = infolder + tile_id + "_Mg_soil_C_ha_emis_year_2000.tif"; +// Carbon pools use the standard names for this sensitivity analysis +string agc_name = infolder + tile_id + constants::AGC_emis_year + ".tif"; +string bgc_name = infolder + tile_id + constants::BGC_emis_year + ".tif"; +string dead_name = infolder + tile_id + constants::deadwood_C_emis_year + ".tif"; +string litter_name = infolder + tile_id + constants::litter_C_emis_year + ".tif"; +string soil_name = infolder + tile_id + constants::soil_C_emis_year + ".tif"; // Other inputs -string loss_name = infolder + "GFW2019_" + tile_id + ".tif"; -string burn_name = infolder + tile_id + "_burnyear.tif"; -string ecozone_name = infolder + tile_id + "_fao_ecozones_bor_tem_tro_processed.tif"; -string climate_name = infolder + tile_id + "_climate_zone_processed.tif"; -string drivermodel_name = infolder + tile_id + "_tree_cover_loss_driver_processed.tif"; -string peat_name = infolder + tile_id + "_peat_mask_processed.tif"; -string ifl_primary_name = infolder + tile_id + "_ifl_2000_primary_2001_merged.tif"; -string plant_name = infolder + tile_id + "_plantation_type_oilpalm_woodfiber_other_unmasked.tif"; +string loss_name = infolder + tile_id + ".tif"; +string burn_name = infolder + tile_id + constants::burnyear; +string ecozone_name = infolder + tile_id + constants::fao_ecozones; +string climate_name = infolder + tile_id + constants::climate_zones; +string drivermodel_name = infolder + tile_id + constants::tcl_drivers; +string peat_name = infolder + tile_id + constants::peat_mask; +string ifl_primary_name = infolder + tile_id + constants::ifl_primary; +string plant_name = infolder + tile_id + constants::plantation_type; // Output files: tonnes CO2/ha for each tree cover loss driver, their total, and the node for the decision tree // that determines emissions -string out_name1 = tile_id + "_gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name2 = tile_id + "_gross_emis_shifting_ag_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name3 = tile_id + "_gross_emis_forestry_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name4 = tile_id + "_gross_emis_wildfire_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name5 = tile_id + "_gross_emis_urbanization_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name6 = tile_id + "_gross_emis_no_driver_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name10 = tile_id + "_gross_emis_all_gases_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name11 = tile_id + "_gross_emis_CO2_only_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name12 = tile_id + "_gross_emis_non_CO2_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; -string out_name20 = tile_id + "_gross_emis_decision_tree_nodes_biomass_soil_2001_" + model_years_str + "_convert_to_grassland.tif"; +string out_name1 = tile_id + 
constants::commod_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name2 = tile_id + constants::shifting_ag_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name3 = tile_id + constants::forestry_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name4 = tile_id + constants::wildfire_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name5 = tile_id + constants::urbanization_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name6 = tile_id + constants::no_driver_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name10 = tile_id + constants::all_gases_all_drivers_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name11 = tile_id + constants::CO2_only_all_drivers_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name12 = tile_id + constants::non_CO2_all_drivers_emis + model_years_str + "_convert_to_grassland.tif"; +string out_name20 = tile_id + constants::decision_tree_all_drivers_emis + model_years_str + "_convert_to_grassland.tif"; + // Setting up the variables to hold the pixel location in x/y values int x, y; @@ -402,9 +411,9 @@ for(x=0; x 0) // Commodity, peat { @@ -566,10 +575,10 @@ for(x=0; x 0) // Shifting ag, peat { @@ -719,8 +728,8 @@ for(x=0; x 0) // Forestry, peat { @@ -777,8 +786,8 @@ for(x=0; x 0) // Wildfire, peat { @@ -836,10 +845,10 @@ for(x=0; x 0) // Urbanization, peat { @@ -981,8 +990,8 @@ for(x=0; x 0) // No driver, peat { diff --git a/emissions/cpp_util/calc_gross_emissions_generic.cpp b/emissions/cpp_util/calc_gross_emissions_generic.cpp index 95b62f3a..396accfa 100644 --- a/emissions/cpp_util/calc_gross_emissions_generic.cpp +++ b/emissions/cpp_util/calc_gross_emissions_generic.cpp @@ -33,9 +33,10 @@ #include #include -// These provide constants for the emissions equations +// These provide constants for the emissions equations and universal constants #include "flu_val.cpp" #include "equations.cpp" +#include "constants.h" using namespace std; @@ -54,86 +55,121 @@ string infolder = argv[3]; // The folder which has all the input files cout << "Gross emissions C++ infolder:" << infolder << endl; // Model constants +int model_years; // How many loss years are in the model +model_years = constants::model_years; +string model_years_str; +model_years_str = to_string(model_years); + int CH4_equiv; // The CO2 equivalency (global warming potential) of CH4 -CH4_equiv = 28; +CH4_equiv = constants::CH4_equiv; int N2O_equiv; // The CO2 equivalency (global warming potential) of N2O -N2O_equiv = 265; +N2O_equiv = constants::N2O_equiv; float C_to_CO2; // The conversion of carbon to CO2 -C_to_CO2 = 44.0/12.0; +C_to_CO2 = constants::C_to_CO2; -int model_years; // How many loss years are in the model -model_years = 19; -string model_years_str; -model_years_str = to_string(model_years); +float biomass_to_c; // Fraction of carbon in biomass +biomass_to_c = constants::biomass_to_c; int tropical; // The ecozone code for the tropics -tropical = 1; +tropical = constants::tropical; int temperate; // The ecozone code for the temperate zone -temperate = 3; +temperate = constants::temperate; int boreal; // The ecozone code for the boreal zone -boreal = 2; +boreal = constants::boreal; + +int soil_emis_period; // The number of years over which soil emissions are calculated (separate from model years) +soil_emis_period = constants::soil_emis_period; + // Input files // Carbon pools - // Carbon pools default to the standard model names -string agc_name = infolder + tile_id + 
"_Mg_AGC_ha_emis_year.tif"; -string bgc_name = infolder + tile_id + "_Mg_BGC_ha_emis_year.tif"; -string dead_name = infolder + tile_id + "_Mg_deadwood_C_ha_emis_year_2000.tif"; -string litter_name = infolder + tile_id + "_Mg_litter_C_ha_emis_year_2000.tif"; -string soil_name = infolder + tile_id + "_Mg_soil_C_ha_emis_year_2000.tif"; +string agc_name = infolder + tile_id + constants::AGC_emis_year + ".tif"; +string bgc_name = infolder + tile_id + constants::BGC_emis_year + ".tif"; +string dead_name = infolder + tile_id + constants::deadwood_C_emis_year + ".tif"; +string litter_name = infolder + tile_id + constants::litter_C_emis_year + ".tif"; +string soil_name = infolder + tile_id + constants::soil_C_emis_year + ".tif"; if (sensit_type != "std") { - agc_name = infolder + tile_id + "_Mg_AGC_ha_emis_year_" + sensit_type +".tif"; - bgc_name = infolder + tile_id + "_Mg_BGC_ha_emis_year_" + sensit_type +".tif"; - dead_name = infolder + tile_id + "_Mg_deadwood_C_ha_emis_year_2000_" + sensit_type +".tif"; - litter_name = infolder + tile_id + "_Mg_litter_C_ha_emis_year_2000_" + sensit_type +".tif"; - soil_name = infolder + tile_id + "_Mg_soil_C_ha_emis_year_2000_" + sensit_type +".tif"; + agc_name = infolder + tile_id + constants::AGC_emis_year + "_" + sensit_type +".tif"; + bgc_name = infolder + tile_id + constants::BGC_emis_year + "_" + sensit_type +".tif"; + dead_name = infolder + tile_id + constants::deadwood_C_emis_year + "_" + sensit_type +".tif"; + litter_name = infolder + tile_id + constants::litter_C_emis_year + "_" + sensit_type +".tif"; + soil_name = infolder + tile_id + constants::soil_C_emis_year + "_" + sensit_type +".tif"; } // Other inputs -string loss_name = infolder + "GFW2019_" + tile_id + ".tif"; -if (sensit_type == "legal_Amazon_loss"); - loss_name = infolder + tile_id + "_legal_Amazon_annual_loss_2001_2019.tif"; -string burn_name = infolder + tile_id + "_burnyear.tif"; -string ecozone_name = infolder + tile_id + "_fao_ecozones_bor_tem_tro_processed.tif"; -string climate_name = infolder + tile_id + "_climate_zone_processed.tif"; -string drivermodel_name = infolder + tile_id + "_tree_cover_loss_driver_processed.tif"; -string peat_name = infolder + tile_id + "_peat_mask_processed.tif"; -string ifl_primary_name = infolder + tile_id + "_ifl_2000_primary_2001_merged.tif"; -string plant_name = infolder + tile_id + "_plantation_type_oilpalm_woodfiber_other_unmasked.tif"; +string loss_name = infolder + tile_id + ".tif"; + +if (sensit_type == "legal_Amazon_loss") { + loss_name = infolder + tile_id + constants::legal_Amazon_loss; +} + +string burn_name = infolder + tile_id + constants::burnyear; +string ecozone_name = infolder + tile_id + constants::fao_ecozones; +string climate_name = infolder + tile_id + constants::climate_zones; +string drivermodel_name = infolder + tile_id + constants::tcl_drivers; +string peat_name = infolder + tile_id + constants::peat_mask; +string ifl_primary_name = infolder + tile_id + constants::ifl_primary; +string plant_name = infolder + tile_id + constants::plantation_type; // Output files: tonnes CO2/ha for each tree cover loss driver, their total, and the node for the decision tree // that determines emissions // Output files default to the standard model names -string out_name1 = tile_id + "_gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name2 = tile_id + "_gross_emis_shifting_ag_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name3 = tile_id + 
"_gross_emis_forestry_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name4 = tile_id + "_gross_emis_wildfire_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name5 = tile_id + "_gross_emis_urbanization_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name6 = tile_id + "_gross_emis_no_driver_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name10 = tile_id + "_gross_emis_all_gases_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name11 = tile_id + "_gross_emis_CO2_only_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name12 = tile_id + "_gross_emis_non_CO2_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + ".tif"; -string out_name20 = tile_id + "_gross_emis_decision_tree_nodes_biomass_soil_2001_" + model_years_str + ".tif"; +string out_name1 = tile_id + constants::commod_emis + model_years_str + ".tif"; +string out_name2 = tile_id + constants::shifting_ag_emis + model_years_str + ".tif"; +string out_name3 = tile_id + constants::forestry_emis + model_years_str + ".tif"; +string out_name4 = tile_id + constants::wildfire_emis + model_years_str + ".tif"; +string out_name5 = tile_id + constants::urbanization_emis + model_years_str + ".tif"; +string out_name6 = tile_id + constants::no_driver_emis + model_years_str + ".tif"; +string out_name10 = tile_id + constants::all_gases_all_drivers_emis + model_years_str + ".tif"; +string out_name11 = tile_id + constants::CO2_only_all_drivers_emis + model_years_str + ".tif"; +string out_name12 = tile_id + constants::non_CO2_all_drivers_emis + model_years_str + ".tif"; +string out_name20 = tile_id + constants::decision_tree_all_drivers_emis + model_years_str + ".tif"; if (sensit_type != "std") { - out_name1 = tile_id + "_gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name2 = tile_id + "_gross_emis_shifting_ag_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name3 = tile_id + "_gross_emis_forestry_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name4 = tile_id + "_gross_emis_wildfire_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name5 = tile_id + "_gross_emis_urbanization_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name6 = tile_id + "_gross_emis_no_driver_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name10 = tile_id + "_gross_emis_all_gases_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name11 = tile_id + "_gross_emis_CO2_only_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name12 = tile_id + "_gross_emis_non_CO2_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; - out_name20 = tile_id + "_gross_emis_decision_tree_nodes_biomass_soil_2001_" + model_years_str + "_" + sensit_type + ".tif"; + out_name1 = tile_id + constants::commod_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name2 = tile_id + constants::shifting_ag_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name3 = tile_id + constants::forestry_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name4 = tile_id + constants::wildfire_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name5 = tile_id + 
constants::urbanization_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name6 = tile_id + constants::no_driver_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name10 = tile_id + constants::all_gases_all_drivers_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name11 = tile_id + constants::CO2_only_all_drivers_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name12 = tile_id + constants::non_CO2_all_drivers_emis + model_years_str + "_" + sensit_type + ".tif"; + out_name20 = tile_id + constants::decision_tree_all_drivers_emis + model_years_str + "_" + sensit_type + ".tif"; } +//// Print input and output tile names +//cout << "AGC full tile:" << agc_name << endl; +//cout << "BGC full tile:" << bgc_name << endl; +//cout << "deadwood full tile:" << dead_name << endl; +//cout << "litter full tile:" << litter_name << endl; +//cout << "soil full tile:" << soil_name << endl; +// +//cout << "burn tile:" << burn_name << endl; +//cout << "ecozone tile:" << ecozone_name << endl; +//cout << "climate zone tile:" << climate_name << endl; +//cout << "driver tile:" << drivermodel_name << endl; +//cout << "peat tile:" << peat_name << endl; +//cout << "ifl tile:" << ifl_primary_name << endl; +//cout << "plantation tile:" << plant_name << endl; +// +//cout << "commod tile:" << out_name1 << endl; +//cout << "shifting ag tile:" << out_name2 << endl; +//cout << "forestry tile:" << out_name3 << endl; +//cout << "wildfire tile:" << out_name4 << endl; +//cout << "urbanization tile:" << out_name5 << endl; +//cout << "no driver tile:" << out_name6 << endl; +//cout << "all gases tile:" << out_name10 << endl; +//cout << "CO2 only tile:" << out_name11 << endl; +//cout << "non-CO2 tile:" << out_name12 << endl; +//cout << "decision tree tile:" << out_name20 << endl; + // Setting up the variables to hold the pixel location in x/y values int x, y; @@ -429,9 +465,9 @@ for(x=0; x 0) // Commodity, peat { @@ -593,10 +629,10 @@ for(x=0; x 0) // Shifting ag, peat { @@ -719,7 +755,7 @@ for(x=0; x 780) && (agc_data[x] < 155) && (agc_data[x] > 154) && (loss_data[x] = 3) && (soil_data[x] = 135) && (drivermodel_data[x] == 2)) @@ -776,8 +812,8 @@ for(x=0; x 0) // Forestry, peat { @@ -834,8 +870,8 @@ for(x=0; x 0) // Wildfire, peat { @@ -893,10 +929,10 @@ for(x=0; x 0) // Urbanization, peat { @@ -1038,8 +1074,8 @@ for(x=0; x 0) // No driver, peat { diff --git a/emissions/cpp_util/calc_gross_emissions_no_shifting_ag.cpp b/emissions/cpp_util/calc_gross_emissions_no_shifting_ag.cpp index 51b8532c..b7217a55 100644 --- a/emissions/cpp_util/calc_gross_emissions_no_shifting_ag.cpp +++ b/emissions/cpp_util/calc_gross_emissions_no_shifting_ag.cpp @@ -36,6 +36,7 @@ // These provide constants for the emissions equations #include "flu_val.cpp" #include "equations.cpp" +#include "constants.h" using namespace std; @@ -54,59 +55,67 @@ string infolder = argv[3]; // The folder which has all the input files cout << "Gross emissions C++ infolder:" << infolder << endl; // Model constants +int model_years; // How many loss years are in the model +model_years = constants::model_years; +string model_years_str; +model_years_str = to_string(model_years); + int CH4_equiv; // The CO2 equivalency (global warming potential) of CH4 -CH4_equiv = 28; +CH4_equiv = constants::CH4_equiv; int N2O_equiv; // The CO2 equivalency (global warming potential) of N2O -N2O_equiv = 265; +N2O_equiv = constants::N2O_equiv; float C_to_CO2; // The conversion of carbon to CO2 -C_to_CO2 = 44.0/12.0; +C_to_CO2 = constants::C_to_CO2; -int 
model_years; // How many loss years are in the model -model_years = 19; -string model_years_str; -model_years_str = to_string(model_years); +float biomass_to_c; // Fraction of carbon in biomass +biomass_to_c = constants::biomass_to_c; int tropical; // The ecozone code for the tropics -tropical = 1; +tropical = constants::tropical; int temperate; // The ecozone code for the temperate zone -temperate = 3; +temperate = constants::temperate; int boreal; // The ecozone code for the boreal zone -boreal = 2; +boreal = constants::boreal; + +int soil_emis_period; // The number of years over which soil emissions are calculated (separate from model years) +soil_emis_period = constants::soil_emis_period; + // Input files -// Carbon pools -string agc_name = infolder + tile_id + "_Mg_AGC_ha_emis_year.tif"; -string bgc_name = infolder + tile_id + "_Mg_BGC_ha_emis_year.tif"; -string dead_name = infolder + tile_id + "_Mg_deadwood_C_ha_emis_year_2000.tif"; -string litter_name = infolder + tile_id + "_Mg_litter_C_ha_emis_year_2000.tif"; -string soil_name = infolder + tile_id + "_Mg_soil_C_ha_emis_year_2000.tif"; +// Carbon pools use the standard names for this sensitivity analysis +string agc_name = infolder + tile_id + constants::AGC_emis_year + ".tif"; +string bgc_name = infolder + tile_id + constants::BGC_emis_year + ".tif"; +string dead_name = infolder + tile_id + constants::deadwood_C_emis_year + ".tif"; +string litter_name = infolder + tile_id + constants::litter_C_emis_year + ".tif"; +string soil_name = infolder + tile_id + constants::soil_C_emis_year + ".tif"; // Other inputs -string loss_name = infolder + "GFW2019_" + tile_id + ".tif"; -string burn_name = infolder + tile_id + "_burnyear.tif"; -string ecozone_name = infolder + tile_id + "_fao_ecozones_bor_tem_tro_processed.tif"; -string climate_name = infolder + tile_id + "_climate_zone_processed.tif"; -string drivermodel_name = infolder + tile_id + "_tree_cover_loss_driver_processed.tif"; -string peat_name = infolder + tile_id + "_peat_mask_processed.tif"; -string ifl_primary_name = infolder + tile_id + "_ifl_2000_primary_2001_merged.tif"; -string plant_name = infolder + tile_id + "_plantation_type_oilpalm_woodfiber_other_unmasked.tif"; +string loss_name = infolder + tile_id + ".tif"; +string burn_name = infolder + tile_id + constants::burnyear; +string ecozone_name = infolder + tile_id + constants::fao_ecozones; +string climate_name = infolder + tile_id + constants::climate_zones; +string drivermodel_name = infolder + tile_id + constants::tcl_drivers; +string peat_name = infolder + tile_id + constants::peat_mask; +string ifl_primary_name = infolder + tile_id + constants::ifl_primary; +string plant_name = infolder + tile_id + constants::plantation_type; // Output files: tonnes CO2/ha for each tree cover loss driver, their total, and the node for the decision tree // that determines emissions -string out_name1 = tile_id + "_gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name2 = tile_id + "_gross_emis_shifting_ag_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name3 = tile_id + "_gross_emis_forestry_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name4 = tile_id + "_gross_emis_wildfire_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name5 = tile_id + "_gross_emis_urbanization_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name6 = tile_id + 
"_gross_emis_no_driver_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name10 = tile_id + "_gross_emis_all_gases_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name11 = tile_id + "_gross_emis_CO2_only_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name12 = tile_id + "_gross_emis_non_CO2_all_drivers_Mg_CO2e_ha_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; -string out_name20 = tile_id + "_gross_emis_decision_tree_nodes_biomass_soil_2001_" + model_years_str + "_no_shifting_ag.tif"; +string out_name1 = tile_id + constants::commod_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name2 = tile_id + constants::shifting_ag_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name3 = tile_id + constants::forestry_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name4 = tile_id + constants::wildfire_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name5 = tile_id + constants::urbanization_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name6 = tile_id + constants::no_driver_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name10 = tile_id + constants::all_gases_all_drivers_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name11 = tile_id + constants::CO2_only_all_drivers_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name12 = tile_id + constants::non_CO2_all_drivers_emis + model_years_str + "_no_shifting_ag.tif"; +string out_name20 = tile_id + constants::decision_tree_all_drivers_emis + model_years_str + "_no_shifting_ag.tif"; + // Setting up the variables to hold the pixel location in x/y values int x, y; @@ -400,9 +409,9 @@ for(x=0; x 0) // Commodity, peat { @@ -563,8 +572,8 @@ for(x=0; x 0) // Forestry, peat { @@ -621,8 +630,8 @@ for(x=0; x 0) // Wildfire, peat { @@ -680,10 +689,10 @@ for(x=0; x 0) // Urbanization, peat { @@ -825,8 +834,8 @@ for(x=0; x 0) // No driver, peat { diff --git a/emissions/cpp_util/calc_gross_emissions_soil_only.cpp b/emissions/cpp_util/calc_gross_emissions_soil_only.cpp index 17d09d53..96878a47 100644 --- a/emissions/cpp_util/calc_gross_emissions_soil_only.cpp +++ b/emissions/cpp_util/calc_gross_emissions_soil_only.cpp @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -36,6 +37,7 @@ // These provide constants for the emissions equations #include "flu_val.cpp" #include "equations.cpp" +#include "constants.h" using namespace std; @@ -54,59 +56,97 @@ string infolder = argv[3]; // The folder which has all the input files cout << "Gross emissions C++ infolder:" << infolder << endl; // Model constants +int model_years; // How many loss years are in the model +model_years = constants::model_years; +string model_years_str; +model_years_str = to_string(model_years); + int CH4_equiv; // The CO2 equivalency (global warming potential) of CH4 -CH4_equiv = 28; +CH4_equiv = constants::CH4_equiv; int N2O_equiv; // The CO2 equivalency (global warming potential) of N2O -N2O_equiv = 265; +N2O_equiv = constants::N2O_equiv; float C_to_CO2; // The conversion of carbon to CO2 -C_to_CO2 = 44.0/12.0; +C_to_CO2 = constants::C_to_CO2; -int model_years; // How many loss years are in the model -model_years = 19; -string model_years_str; -model_years_str = to_string(model_years); +float biomass_to_c; // Fraction of carbon in biomass +biomass_to_c = constants::biomass_to_c; int tropical; // The ecozone code for the tropics -tropical = 
1; +tropical = constants::tropical; int temperate; // The ecozone code for the temperate zone -temperate = 3; +temperate = constants::temperate; int boreal; // The ecozone code for the boreal zone -boreal = 2; +boreal = constants::boreal; + +int soil_emis_period; // The number of years over which soil emissions are calculated (separate from model years) +soil_emis_period = constants::soil_emis_period; + // Input files -// Carbon pools -string agc_name = infolder + tile_id + "_Mg_AGC_ha_emis_year.tif"; -string bgc_name = infolder + tile_id + "_Mg_BGC_ha_emis_year.tif"; -string dead_name = infolder + tile_id + "_Mg_deadwood_C_ha_emis_year_2000.tif"; -string litter_name = infolder + tile_id + "_Mg_litter_C_ha_emis_year_2000.tif"; -string soil_name = infolder + tile_id + "_Mg_soil_C_ha_emis_year_2000.tif"; +// Carbon pools use the standard names for this sensitivity analysis +string agc_name = infolder + tile_id + constants::AGC_emis_year + ".tif"; +string bgc_name = infolder + tile_id + constants::BGC_emis_year + ".tif"; +string dead_name = infolder + tile_id + constants::deadwood_C_emis_year + ".tif"; +string litter_name = infolder + tile_id + constants::litter_C_emis_year + ".tif"; +string soil_name = infolder + tile_id + constants::soil_C_emis_year + ".tif"; // Other inputs -string loss_name = infolder + "GFW2019_" + tile_id + ".tif"; -string burn_name = infolder + tile_id + "_burnyear.tif"; -string ecozone_name = infolder + tile_id + "_fao_ecozones_bor_tem_tro_processed.tif"; -string climate_name = infolder + tile_id + "_climate_zone_processed.tif"; -string drivermodel_name = infolder + tile_id + "_tree_cover_loss_driver_processed.tif"; -string peat_name = infolder + tile_id + "_peat_mask_processed.tif"; -string ifl_primary_name = infolder + tile_id + "_ifl_2000_primary_2001_merged.tif"; -string plant_name = infolder + tile_id + "_plantation_type_oilpalm_woodfiber_other_unmasked.tif"; +string loss_name = infolder + tile_id + ".tif"; +string burn_name = infolder + tile_id + constants::burnyear; +string ecozone_name = infolder + tile_id + constants::fao_ecozones; +string climate_name = infolder + tile_id + constants::climate_zones; +string drivermodel_name = infolder + tile_id + constants::tcl_drivers; +string peat_name = infolder + tile_id + constants::peat_mask; +string ifl_primary_name = infolder + tile_id + constants::ifl_primary; +string plant_name = infolder + tile_id + constants::plantation_type; // Output files: tonnes CO2/ha for each tree cover loss driver, their total, and the node for the decision tree -// that determines emissions -string out_name1 = tile_id + "_gross_emis_commodity_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name2 = tile_id + "_gross_emis_shifting_ag_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name3 = tile_id + "_gross_emis_forestry_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name4 = tile_id + "_gross_emis_wildfire_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name5 = tile_id + "_gross_emis_urbanization_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name6 = tile_id + "_gross_emis_no_driver_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name10 = tile_id + "_gross_emis_all_gases_all_drivers_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name11 = tile_id + "_gross_emis_CO2_only_all_drivers_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name12 = tile_id + 
"_gross_emis_non_CO2_all_drivers_Mg_CO2e_ha_soil_only_2001_" + model_years_str + ".tif"; -string out_name20 = tile_id + "_gross_emis_decision_tree_nodes_soil_only_2001_" + model_years_str + ".tif"; +// that determines emissions. +// regex_replace from https://stackoverflow.com/a/41294178 +string out_name1_pre = constants::commod_emis; +out_name1_pre = std::regex_replace(out_name1_pre, std::regex("biomass_soil"), "soil_only"); +string out_name1 = tile_id + out_name1_pre + model_years_str + ".tif"; + +string out_name2_pre = constants::shifting_ag_emis; +out_name2_pre = std::regex_replace(out_name2_pre, std::regex("biomass_soil"), "soil_only"); +string out_name2 = tile_id + out_name2_pre + model_years_str + ".tif"; + +string out_name3_pre = constants::forestry_emis; +out_name3_pre = std::regex_replace(out_name3_pre, std::regex("biomass_soil"), "soil_only"); +string out_name3 = tile_id + out_name3_pre + model_years_str + ".tif"; + +string out_name4_pre = constants::wildfire_emis; +out_name4_pre = std::regex_replace(out_name4_pre, std::regex("biomass_soil"), "soil_only"); +string out_name4 = tile_id + out_name4_pre + model_years_str + ".tif"; + +string out_name5_pre = constants::urbanization_emis; +out_name5_pre = std::regex_replace(out_name5_pre, std::regex("biomass_soil"), "soil_only"); +string out_name5 = tile_id + out_name5_pre + model_years_str + ".tif"; + +string out_name6_pre = constants::no_driver_emis; +out_name6_pre = std::regex_replace(out_name6_pre, std::regex("biomass_soil"), "soil_only"); +string out_name6 = tile_id + out_name6_pre + model_years_str + ".tif"; + +string out_name10_pre = constants::all_gases_all_drivers_emis; +out_name10_pre = std::regex_replace(out_name10_pre, std::regex("biomass_soil"), "soil_only"); +string out_name10 = tile_id + out_name10_pre + model_years_str + ".tif"; + +string out_name11_pre = constants::CO2_only_all_drivers_emis; +out_name11_pre = std::regex_replace(out_name11_pre, std::regex("biomass_soil"), "soil_only"); +string out_name11 = tile_id + out_name11_pre + model_years_str + ".tif"; + +string out_name12_pre = constants::non_CO2_all_drivers_emis; +out_name12_pre = std::regex_replace(out_name12_pre, std::regex("biomass_soil"), "soil_only"); +string out_name12 = tile_id + out_name12_pre + model_years_str + ".tif"; + +string out_name20_pre = constants::decision_tree_all_drivers_emis; +out_name20_pre = std::regex_replace(out_name20_pre, std::regex("biomass_soil"), "soil_only"); +string out_name20 = tile_id + out_name20_pre + model_years_str + ".tif"; + // Setting up the variables to hold the pixel location in x/y values int x, y; @@ -402,9 +442,9 @@ for(x=0; x 0) // Commodity, peat { @@ -566,10 +606,10 @@ for(x=0; x 0) // Shifting ag, peat { @@ -719,8 +759,8 @@ for(x=0; x 0) // Forestry, peat { @@ -777,8 +817,8 @@ for(x=0; x 0) // Wildfire, peat { @@ -836,10 +876,10 @@ for(x=0; x 0) // Urbanization, peat { @@ -981,8 +1021,8 @@ for(x=0; x 0) // No driver, peat { diff --git a/emissions/cpp_util/constants.h b/emissions/cpp_util/constants.h new file mode 100644 index 00000000..d613275c --- /dev/null +++ b/emissions/cpp_util/constants.h @@ -0,0 +1,57 @@ +// Model constants and variable names used in all c++ scripts except equations.cpp. +// model_years needs to be updated in equations.cpp separately at this time. + +namespace constants +{ + // Emissions constants + // per https://www.learncpp.com/cpp-tutorial/global-constants-and-inline-variables/ + constexpr int model_years {20}; // How many loss years are in the model. 
Must also be updated in equations.cpp! + + constexpr int CH4_equiv {28}; // The CO2 equivalency (global warming potential) of CH4 + + constexpr int N2O_equiv {265}; // The CO2 equivalency (global warming potential) of N2O + + constexpr float C_to_CO2 {44.0/12.0}; // The conversion of carbon to CO2 + + constexpr float biomass_to_c {0.47}; // Fraction of carbon in biomass + + constexpr int tropical {1}; // The ecozone code for the tropics + + constexpr int temperate {3}; // The ecozone code for the temperate zone + + constexpr int boreal {2}; // The ecozone code for the boreal zone + + constexpr int soil_emis_period {20}; // The number of years over which soil emissions are calculated (separate from model years) + + // Input and output variable patterns + // per https://stackoverflow.com/questions/27123306/is-it-possible-to-use-stdstring-in-a-constexpr + // Inputs + constexpr char AGC_emis_year[] = "_Mg_AGC_ha_emis_year"; + constexpr char BGC_emis_year[] = "_Mg_BGC_ha_emis_year"; + constexpr char deadwood_C_emis_year[] = "_Mg_deadwood_C_ha_emis_year_2000"; + constexpr char litter_C_emis_year[] = "_Mg_litter_C_ha_emis_year_2000"; + constexpr char soil_C_emis_year[] = "_Mg_soil_C_ha_emis_year_2000"; + + constexpr char legal_Amazon_loss[] = "_legal_Amazon_annual_loss_2001_2019.tif"; + + constexpr char burnyear[] = "_burnyear_with_Hansen_loss.tif"; + constexpr char fao_ecozones[] = "_fao_ecozones_bor_tem_tro_processed.tif"; + constexpr char climate_zones[] = "_climate_zone_processed.tif"; + constexpr char tcl_drivers[] = "_tree_cover_loss_driver_processed.tif"; + constexpr char peat_mask[] = "_peat_mask_processed.tif"; + constexpr char ifl_primary[] = "_ifl_2000_primary_2001_merged.tif"; + constexpr char plantation_type[] = "_plantation_type_oilpalm_woodfiber_other_unmasked.tif"; + + // Outputs + constexpr char commod_emis[] = "_gross_emis_commodity_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char shifting_ag_emis[] = "_gross_emis_shifting_ag_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char forestry_emis[] = "_gross_emis_forestry_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char wildfire_emis[] = "_gross_emis_wildfire_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char urbanization_emis[] = "_gross_emis_urbanization_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char no_driver_emis[] = "_gross_emis_no_driver_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char all_gases_all_drivers_emis[] = "_gross_emis_all_gases_all_drivers_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char CO2_only_all_drivers_emis[] = "_gross_emis_CO2_only_all_drivers_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char non_CO2_all_drivers_emis[] = "_gross_emis_non_CO2_all_drivers_Mg_CO2e_ha_biomass_soil_2001_"; + constexpr char decision_tree_all_drivers_emis[] = "_gross_emis_decision_tree_nodes_biomass_soil_2001_"; + +} \ No newline at end of file diff --git a/emissions/cpp_util/equations.cpp b/emissions/cpp_util/equations.cpp index ff3e0587..6f93b47d 100644 --- a/emissions/cpp_util/equations.cpp +++ b/emissions/cpp_util/equations.cpp @@ -4,13 +4,20 @@ #include #include #include + +//// http://www.math.uaa.alaska.edu/~afkjm/csce211/handouts/SeparateCompilation.pdf +//#ifndef CONSTANTS_H +//#define CONSTANTS_H +////#include "constants.h" +//#endif + using namespace std; void def_variables(float *q, int ecozone, int forestmodel_data, int ifl, int climate, int plant_data, int lossyr) { int model_years; // How many loss years are in the model - model_years = 19; + model_years = 20; int tropical; // The ecozone code for the tropics tropical = 1; 
@@ -19,6 +26,16 @@ void def_variables(float *q, int ecozone, int forestmodel_data, int ifl, int cli int boreal; // The ecozone code for the boreal zone boreal = 2; +// int model_years; // How many loss years are in the model +// model_years = constants::model_years; +// +// int tropical; // The ecozone code for the tropics +// tropical = constants::tropical; +// int temperate; // The ecozone code for the temperate zone +// temperate = constants::temperate; +// int boreal; // The ecozone code for the boreal zone +// boreal = constants::boreal; + // instantiates Cf, CO2, CH4, N2O, peatburn, peat_drain_total. // peatburn and peat_drain_annual both have CO2 and non-CO2 components. They are calculated separately and // passed back to the main script as separate values. diff --git a/emissions/mp_calculate_gross_emissions.py b/emissions/mp_calculate_gross_emissions.py index a6f5e26b..18191cfc 100644 --- a/emissions/mp_calculate_gross_emissions.py +++ b/emissions/mp_calculate_gross_emissions.py @@ -11,7 +11,7 @@ For the sensitivity analyses that use a different gross emissions C++ script (currently, soil_only, no_shifting_ag, and convert_to_grassland), do: c++ /usr/local/app/emissions/cpp_util/calc_gross_emissions_.cpp -o /usr/local/app/emissions/cpp_util/calc_gross_emissions_.exe -lgdal -Run mp_calculate_gross_emissions.py by typing python mp_calculate_gross_emissions.py -p [POOL_OPTION] -t [MODEL_TYPE]. +Run by typing python mp_calculate_gross_emissions.py -p [POOL_OPTION] -t [MODEL_TYPE] -l [TILE_LIST] -d [RUN_DATE] The Python script will call the compiled C++ code as needed. The other C++ scripts (equations.cpp and flu_val.cpp) do not need to be compiled. The --emitted_pools-to-use argument specifies whether to calculate gross emissions from biomass+soil or just from soil. @@ -166,14 +166,15 @@ def mp_calculate_gross_emissions(sensit_type, tile_id_list, emitted_pools, run_d uu.print_log("Changing output directory and file name pattern based on sensitivity analysis") output_dir_list = uu.alter_dirs(sensit_type, output_dir_list) output_pattern_list = uu.alter_patterns(sensit_type, output_pattern_list) - uu.print_log(output_dir_list) - uu.print_log(output_pattern_list) # A date can optionally be provided by the full model script or a run of this script. # This replaces the date in constants_and_names. if run_date is not None: output_dir_list = uu.replace_output_dir_date(output_dir_list, run_date) + uu.print_log(output_dir_list) + uu.print_log(output_pattern_list) + # The C++ code expects certain tiles for every input 10x10. # However, not all Hansen tiles have all of these inputs. 
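A note for readers following the Python-to-C++ handoff described in the module docstring above ("The Python script will call the compiled C++ code as needed"): the per-tile call is essentially a subprocess invocation of the compiled binary. The sketch below is illustrative only; the C++ sources in this patch confirm that `infolder` is `argv[3]`, but the executable path, the order of the first two arguments, and the helper name are assumptions.

```python
# Hypothetical sketch of dispatching one tile to the compiled emissions binary.
# Only infolder = argv[3] is confirmed by the C++ sources in this patch; the
# executable path and the order of the first two arguments are assumptions.
from subprocess import check_call

def calc_emissions_one_tile(tile_id, sensit_type, infolder,
                            exe='/usr/local/app/emissions/cpp_util/calc_gross_emissions.exe'):
    # argv[1] = tile_id, argv[2] = sensitivity type, argv[3] = input folder (assumed order)
    check_call([exe, tile_id, sensit_type, infolder])

# Example: calc_emissions_one_tile('00N_110E', 'std', '/usr/local/tiles/')
```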
@@ -192,7 +193,7 @@ def mp_calculate_gross_emissions(sensit_type, tile_id_list, emitted_pools, run_d uu.create_blank_tile_txt() for pattern in pattern_list: - pool = multiprocessing.Pool(processes=60) # 60 = 100 GB peak + pool = multiprocessing.Pool(processes=80) # 60 = 100 GB peak; 80 = XXX GB peak pool.map(partial(uu.make_blank_tile, pattern=pattern, folder=folder, sensit_type=sensit_type), tile_id_list) pool.close() @@ -211,7 +212,7 @@ def mp_calculate_gross_emissions(sensit_type, tile_id_list, emitted_pools, run_d if sensit_type == 'biomass_swap': processes = 15 # 15 processors = XXX GB peak else: - processes = 19 # 17 = 650 GB peak; 18 = 677 GB peak; 19 = 714 GB peak + processes = 19 # 17 = 650 GB peak; 18 = 677 GB peak; 19 = 716 GB peak else: processes = 9 uu.print_log('Gross emissions max processors=', processes) @@ -235,7 +236,7 @@ def mp_calculate_gross_emissions(sensit_type, tile_id_list, emitted_pools, run_d uu.print_log("Adding metadata tags for pattern {}".format(pattern)) if cn.count == 96: - processes = 45 # 45 processors = XXX GB peak + processes = 75 # 45 processors = ~30 GB peak; 55 = XXX GB peak; 75 = XXX GB peak else: processes = 9 uu.print_log('Adding metadata tags max processors=', processes) @@ -276,15 +277,11 @@ def mp_calculate_gross_emissions(sensit_type, tile_id_list, emitted_pools, run_d # Create the output log uu.initiate_log(tile_id_list=tile_id_list, sensit_type=sensit_type, run_date=run_date, emitted_pools=emitted_pools) - # Checks whether the sensitivity analysis argument is valid - uu.check_sensit_type(sensit_type) - # Checks whether the sensitivity analysis and tile_id_list arguments are valid uu.check_sensit_type(sensit_type) if 's3://' in tile_id_list: tile_id_list = uu.tile_list_s3(tile_id_list, 'std') - else: tile_id_list = uu.tile_id_list_check(tile_id_list) diff --git a/gain/annual_gain_rate_AGC_BGC_all_forest_types.py b/gain/annual_gain_rate_AGC_BGC_all_forest_types.py index 7918d278..366b348b 100644 --- a/gain/annual_gain_rate_AGC_BGC_all_forest_types.py +++ b/gain/annual_gain_rate_AGC_BGC_all_forest_types.py @@ -10,7 +10,7 @@ def annual_gain_rate_AGC_BGC_all_forest_types(tile_id, output_pattern_list, sensit_type): - uu.print_log("Mapping forest type for removal and AGB and BGB removal rates:", tile_id) + uu.print_log("Mapping removal rate source and AGB and BGB removal rates:", tile_id) # Start time start = datetime.datetime.now() @@ -64,50 +64,50 @@ def annual_gain_rate_AGC_BGC_all_forest_types(tile_id, output_pattern_list, sens mangrove_AGB_src = rasterio.open(mangrove_AGB) mangrove_BGB_src = rasterio.open(mangrove_BGB) mangrove_AGB_stdev_src = rasterio.open(mangrove_AGB_stdev) - uu.print_log(" Mangrove tiles (AGB and BGB) found for {}".format(tile_id)) + uu.print_log(" Mangrove tiles (AGB and BGB) for {}".format(tile_id)) except: uu.print_log(" No mangrove tile for {}".format(tile_id)) try: europe_AGC_BGC_src = rasterio.open(europe_AGC_BGC) europe_AGC_BGC_stdev_src = rasterio.open(europe_AGC_BGC_stdev) - uu.print_log(" Europe removal factor tile found for {}".format(tile_id)) + uu.print_log(" Europe removal factor tile for {}".format(tile_id)) except: uu.print_log(" No Europe removal factor tile for {}".format(tile_id)) try: plantations_AGC_BGC_src = rasterio.open(plantations_AGC_BGC) plantations_AGC_BGC_stdev_src = rasterio.open(plantations_AGC_BGC_stdev) - uu.print_log(" Planted forest tile found for {}".format(tile_id)) + uu.print_log(" Planted forest tile for {}".format(tile_id)) except: uu.print_log(" No planted forest tile for 
{}".format(tile_id)) try: us_AGC_BGC_src = rasterio.open(us_AGC_BGC) us_AGC_BGC_stdev_src = rasterio.open(us_AGC_BGC_stdev) - uu.print_log(" US removal factor tile found for {}".format(tile_id)) + uu.print_log(" US removal factor tile for {}".format(tile_id)) except: uu.print_log(" No US removal factor tile for {}".format(tile_id)) try: young_AGC_src = rasterio.open(young_AGC) young_AGC_stdev_src = rasterio.open(young_AGC_stdev) - uu.print_log(" Young forest removal factor tile found for {}".format(tile_id)) + uu.print_log(" Young forest removal factor tile for {}".format(tile_id)) except: uu.print_log(" No young forest removal factor tile for {}".format(tile_id)) try: age_category_src = rasterio.open(age_category) - uu.print_log(" Age category tile found for {}".format(tile_id)) + uu.print_log(" Age category tile for {}".format(tile_id)) except: uu.print_log(" No age category tile for {}".format(tile_id)) try: ipcc_AGB_default_src = rasterio.open(ipcc_AGB_default) ipcc_AGB_default_stdev_src = rasterio.open(ipcc_AGB_default_stdev) - uu.print_log(" IPCC default removal rate tile found for {}".format(tile_id)) - except: uu.print_log(" IPCC default removal rate tile for {}".format(tile_id)) + except: + uu.print_log(" No IPCC default removal rate tile for {}".format(tile_id)) # Opens the output tile, giving it the arguments of the input tiles removal_forest_type_dst = rasterio.open(removal_forest_type, 'w', **kwargs) diff --git a/gain/forest_age_category_IPCC.py b/gain/forest_age_category_IPCC.py index a6620c09..49d48660 100644 --- a/gain/forest_age_category_IPCC.py +++ b/gain/forest_age_category_IPCC.py @@ -17,7 +17,6 @@ def forest_age_category(tile_id, gain_table_dict, pattern, sensit_type): # Gets the bounding coordinates of each tile. Needed to determine if the tile is in the tropics (within 30 deg of the equator) xmin, ymin, xmax, ymax = uu.coords(tile_id) - uu.print_log(" ymax:", ymax) # Default value is that the tile is not in the tropics tropics = 0 @@ -27,7 +26,7 @@ def forest_age_category(tile_id, gain_table_dict, pattern, sensit_type): tropics = 1 - uu.print_log(" Tile in tropics:", tropics) + uu.print_log(" Tile {} in tropics:".format(tile_id), tropics) # Names of the input tiles gain = '{0}_{1}.tif'.format(cn.pattern_gain, tile_id) @@ -49,7 +48,7 @@ def forest_age_category(tile_id, gain_table_dict, pattern, sensit_type): elif sensit_type == 'Mekong_loss': loss = '{0}_{1}.tif'.format(tile_id, cn.pattern_Mekong_loss_processed) else: - loss = '{0}_{1}.tif'.format(cn.pattern_loss, tile_id) + loss = '{0}.tif'.format(tile_id) uu.print_log("Using Hansen loss tile {0} for {1} model run".format(tile_id, sensit_type)) uu.print_log(" Assigning age categories") diff --git a/gain/gain_year_count_all_forest_types.py b/gain/gain_year_count_all_forest_types.py index aa51f412..ee7f2246 100644 --- a/gain/gain_year_count_all_forest_types.py +++ b/gain/gain_year_count_all_forest_types.py @@ -15,7 +15,7 @@ def tile_names(tile_id, sensit_type): if sensit_type == 'legal_Amazon_loss': loss = '{0}_{1}.tif'.format(tile_id, cn.pattern_Brazil_annual_loss_processed) else: - loss = '{0}_{1}.tif'.format(cn.pattern_loss, tile_id) + loss = '{0}.tif'.format(tile_id) gain = '{0}_{1}.tif'.format(cn.pattern_gain, tile_id) model_extent = uu.sensit_tile_rename(sensit_type, tile_id, cn.pattern_model_extent) @@ -263,6 +263,8 @@ def create_gain_year_count_merge(tile_id, pattern, sensit_type): nodata=0 ) + uu.print_log(" No change tile exists for {} by default".format(tile_id)) + # Opens the other gain year count tiles. 
They may not exist for all other tiles. try: loss_only_gain_years_src = rasterio.open(loss_only_gain_years) @@ -282,10 +284,8 @@ def create_gain_year_count_merge(tile_id, pattern, sensit_type): except: uu.print_log(" No loss and gain tile found for {}".format(tile_id)) - # Opens the output tile, giving it the arguments of the input tiles gain_year_count_merged_dst = rasterio.open(gain_year_count_merged, 'w', **kwargs) - uu.print_log("No change tile exists for {} by default".format(tile_id)) # Adds metadata tags to the output raster uu.add_rasterio_tags(gain_year_count_merged_dst, sensit_type) diff --git a/gain/gross_removals_all_forest_types.py b/gain/gross_removals_all_forest_types.py index 045c3689..6329cb6a 100644 --- a/gain/gross_removals_all_forest_types.py +++ b/gain/gross_removals_all_forest_types.py @@ -24,10 +24,26 @@ def gross_removals_all_forest_types(tile_id, output_pattern_list, sensit_type): cumulative_gain_BGCO2 = '{0}_{1}.tif'.format(tile_id, output_pattern_list[1]) cumulative_gain_AGCO2_BGCO2 = '{0}_{1}.tif'.format(tile_id, output_pattern_list[2]) - # Opens the input tiles - gain_rate_AGC_src = rasterio.open(gain_rate_AGC) - gain_rate_BGC_src = rasterio.open(gain_rate_BGC) - gain_year_count_src = rasterio.open(gain_year_count) + # Opens the input tiles if they exist. If one of the inputs doesn't exist, the function returns without creating a gross removals tile. + try: + gain_rate_AGC_src = rasterio.open(gain_rate_AGC) + uu.print_log(" Aboveground removal factor tile for", tile_id) + except: + uu.print_log(" No aboveground removal factor tile for {}. Not creating gross removals.".format(tile_id)) + return + try: + gain_rate_BGC_src = rasterio.open(gain_rate_BGC) + uu.print_log(" Belowground removal factor tile for", tile_id) + except: + uu.print_log(" No belowground removal factor tile for {}. Not creating gross removals.".format(tile_id)) + return + try: + gain_year_count_src = rasterio.open(gain_year_count) + uu.print_log(" Gain year count tile for", tile_id) + except: + uu.print_log(" No gain year count tile for {}.
Not creating gross removals.".format(tile_id)) + return + # Grabs metadata for an input tile kwargs = gain_rate_AGC_src.meta diff --git a/gain/mp_annual_gain_rate_AGC_BGC_all_forest_types.py b/gain/mp_annual_gain_rate_AGC_BGC_all_forest_types.py index b2c6a541..a37c13d4 100644 --- a/gain/mp_annual_gain_rate_AGC_BGC_all_forest_types.py +++ b/gain/mp_annual_gain_rate_AGC_BGC_all_forest_types.py @@ -91,7 +91,7 @@ def mp_annual_gain_rate_AGC_BGC_all_forest_types(sensit_type, tile_id_list, run_ if sensit_type == 'biomass_swap': processes = 13 else: - processes = 17 # 30 processors > 740 GB peak; 18 = >740 GB peak; 16 = 660 GB peak; 17 = XXX GB peak + processes = 17 # 30 processors > 740 GB peak; 18 = >740 GB peak; 16 = 660 GB peak; 17 = >680 GB peak else: processes = 2 uu.print_log('Removal factor processors=', processes) @@ -105,6 +105,26 @@ def mp_annual_gain_rate_AGC_BGC_all_forest_types(sensit_type, tile_id_list, run_ # for tile_id in tile_id_list: # annual_gain_rate_AGC_BGC_all_forest_types.annual_gain_rate_AGC_BGC_all_forest_types(tile_id, sensit_type) + # Checks the gross removals outputs for tiles with no data + for output_pattern in output_pattern_list: + if cn.count <= 2: # For local tests + processes = 1 + uu.print_log("Checking for empty tiles of {0} pattern with {1} processors using light function...".format( + output_pattern, processes)) + pool = multiprocessing.Pool(processes) + pool.map(partial(uu.check_and_delete_if_empty_light, output_pattern=output_pattern), tile_id_list) + pool.close() + pool.join() + else: + processes = 55 # 50 processors = XXX GB peak + uu.print_log( + "Checking for empty tiles of {0} pattern with {1} processors...".format(output_pattern, processes)) + pool = multiprocessing.Pool(processes) + pool.map(partial(uu.check_and_delete_if_empty, output_pattern=output_pattern), tile_id_list) + pool.close() + pool.join() + + # Uploads output tiles to s3 for i in range(0, len(output_dir_list)): uu.upload_final_set(output_dir_list[i], output_pattern_list[i]) diff --git a/gain/mp_gain_year_count_all_forest_types.py b/gain/mp_gain_year_count_all_forest_types.py index b4fc5cc1..06fbd6e8 100644 --- a/gain/mp_gain_year_count_all_forest_types.py +++ b/gain/mp_gain_year_count_all_forest_types.py @@ -1,5 +1,5 @@ ''' -Creates tiles of the number of years in which carbon removals occur during the model duration (2001 to 2019 currently). +Creates tiles of the number of years in which carbon removals occur during the model duration (2001 to 2020 currently). It is based on the annual Hansen loss data and the 2000-2012 Hansen gain data. First it separately calculates rasters of gain years for model pixels that had loss only, gain only, neither loss nor gain, and both loss and gain. 
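The docstring above describes splitting model pixels into four loss/gain classes before assigning each class its own number of removal years. As a rough numpy illustration of that four-way split (the per-class year counts below are placeholders, not the rules actually used in `gain_year_count_all_forest_types.py`):

```python
import numpy as np

MODEL_YEARS = 20  # 2001-2020, matching constants.h in this patch

def gain_year_count_sketch(loss, gain):
    # loss: 0 = no loss, 1..MODEL_YEARS = year of loss; gain: 0/1 Hansen 2000-2012 gain
    years = np.zeros(loss.shape, dtype=np.uint8)
    no_change     = (loss == 0) & (gain == 0)
    loss_only     = (loss > 0)  & (gain == 0)
    gain_only     = (loss == 0) & (gain == 1)
    loss_and_gain = (loss > 0)  & (gain == 1)

    years[no_change]     = MODEL_YEARS           # removals accrue in every model year
    years[loss_only]     = loss[loss_only] - 1   # placeholder: removals only before the loss year
    years[gain_only]     = MODEL_YEARS // 2      # placeholder: growth assumed to start mid-period
    years[loss_and_gain] = MODEL_YEARS // 2      # placeholder: some years before loss, some after
    return years
```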
@@ -27,8 +27,8 @@ def mp_gain_year_count_all_forest_types(sensit_type, tile_id_list, run_date = No # If a full model run is specified, the correct set of tiles for the particular script is listed if tile_id_list == 'all': - # List of tiles to run in the model - tile_id_list = uu.tile_list_s3(cn.model_extent_dir, sensit_type) + # No point in making gain year count tiles for tiles that don't have annual removals + tile_id_list = uu.tile_list_s3(cn.annual_gain_AGC_all_types_dir, sensit_type) uu.print_log(tile_id_list) uu.print_log("There are {} tiles to process".format(str(len(tile_id_list))) + "\n") diff --git a/gain/mp_gross_removals_all_forest_types.py b/gain/mp_gross_removals_all_forest_types.py index 0356037b..3b1c11fa 100644 --- a/gain/mp_gross_removals_all_forest_types.py +++ b/gain/mp_gross_removals_all_forest_types.py @@ -26,7 +26,11 @@ def mp_gross_removals_all_forest_types(sensit_type, tile_id_list, run_date = Non # If a full model run is specified, the correct set of tiles for the particular script is listed if tile_id_list == 'all': # List of tiles to run in the model - tile_id_list = uu.tile_list_s3(cn.model_extent_dir, sensit_type) + # tile_id_list = uu.tile_list_s3(cn.model_extent_dir, sensit_type) + gain_year_count_tile_id_list = uu.tile_list_s3(cn.gain_year_count_dir, sensit_type=sensit_type) + annual_removals_tile_id_list = uu.tile_list_s3(cn.annual_gain_AGC_all_types_dir, sensit_type=sensit_type) + tile_id_list = list(set(gain_year_count_tile_id_list).intersection(annual_removals_tile_id_list)) + uu.print_log("Gross removals tile_id_list is combination of gain_year_count and annual_removals tiles:") uu.print_log(tile_id_list) uu.print_log("There are {} tiles to process".format(str(len(tile_id_list))) + "\n") @@ -69,7 +73,7 @@ def mp_gross_removals_all_forest_types(sensit_type, tile_id_list, run_date = Non if sensit_type == 'biomass_swap': processes = 18 else: - processes = 22 # 50 processors > 740 GB peak; 25 = >740 GB peak; 15 = 490 GB peak; 20 = 590 GB peak; 22 = XXX GB peak + processes = 22 # 50 processors > 740 GB peak; 25 = >740 GB peak; 15 = 490 GB peak; 20 = 590 GB peak; 22 = 710 GB peak else: processes = 2 uu.print_log('Gross removals max processors=', processes) @@ -83,6 +87,25 @@ def mp_gross_removals_all_forest_types(sensit_type, tile_id_list, run_date = Non # for tile_id in tile_id_list: # gross_removals_all_forest_types.gross_removals_all_forest_types(tile_id, output_pattern_list, sensit_type) + # Checks the gross removals outputs for tiles with no data + for output_pattern in output_pattern_list: + if cn.count <= 2: # For local tests + processes = 1 + uu.print_log("Checking for empty tiles of {0} pattern with {1} processors using light function...".format( + output_pattern, processes)) + pool = multiprocessing.Pool(processes) + pool.map(partial(uu.check_and_delete_if_empty_light, output_pattern=output_pattern), tile_id_list) + pool.close() + pool.join() + else: + processes = 55 # 55 processors = 670 GB peak + uu.print_log( + "Checking for empty tiles of {0} pattern with {1} processors...".format(output_pattern, processes)) + pool = multiprocessing.Pool(processes) + pool.map(partial(uu.check_and_delete_if_empty, output_pattern=output_pattern), tile_id_list) + pool.close() + pool.join() + # Uploads output tiles to s3 for i in range(0, len(output_dir_list)): diff --git a/readme.md b/readme.md index 45fbad3a..8b8d3ff6 100644 --- a/readme.md +++ b/readme.md @@ -3,7 +3,7 @@ ### Purpose and scope This model maps gross annual greenhouse gas emissions from forests, 
gross carbon removals (sequestration) by forests, and the difference between them -(net flux), all between 2001 and 2019. +(net flux), all between 2001 and 2020. Gross emissions include CO2, CH4, and N2O and all carbon pools, and gross removals include removals into aboveground and belowground biomass carbon. Although the model is run for all tree canopy densities (per Hansen et al. 2013), it is most relevant to pixels with canopy density >30% in 2000 or pixels which subsequently had tree cover gain (per Hansen et al. 2013). @@ -26,36 +26,52 @@ The input processing scripts are scattered among almost all the folders, unfortu which I haven't fixed yet. The data prep scripts are generally in the folder for which their outputs are most relevant. ### Outputs -There are three key outputs produced: gross GHG emissions, gross removals, and net flux, all for 2001-2019. +There are three key outputs produced: gross GHG emissions, gross removals, and net flux, all for 2001-2020. These are produced at two resolutions: 0.00025 x 0.00025 degrees (approximately 30 x 30 m at the equator) in 10 x 10 degree rasters (to make outputs a manageable size), and 0.04 x 0.04 degrees (approximately 4 x 4 km at the equator) as global rasters. +Model runs also automatically generate a txt log that is saved to s3. This log includes nearly everything that is output in the console. +This log is useful for documenting model runs and checking for mistakes/errors in retrospect, although it does not capture errors that terminate the model. +For example, users can examine it to see if the correct input tiles were downloaded or if the intended tiles were used during the model run. + #### 30-m outputs The 30-m outputs are used for zonal statistics analyses (i.e. emissions, removals, or net in polygons of interest) and mapping on the Global Forest Watch web platform or at small scales (where 30-m pixels can be distinguished). Individual emissions can be assigned years based on Hansen loss during further analyses but removals and net flux are cumulative over the entire model run and cannot be assigned specific years. -This 30-m output is in megagrams CO2e/ha 2001-2019 (i.e. densities) and includes all tree cover densities ("full extent") +This 30-m output is in megagrams CO2e/ha 2001-2020 (i.e. densities) and includes all tree cover densities ("full extent"):
However, the model is designed to be used specifically for forests, so the model creates three derivative 30-m outputs for each key output (gross emissions, gross removals, net flux) as well (only for the standard model, not for sensitivity analyses): -1) Per pixel values for the full model extent (all tree cover densities) +1) Per pixel values for the full model extent (all tree cover densities): `(((TCD2000>0 AND WHRC AGB2000>0) OR Hansen gain=1 OR mangrove AGB2000>0) NOT IN pre-2000 plantations)` -2) Per hectare values for forest pixels only +2) Per hectare values for forest pixels only: `(((TCD2000>30 AND WHRC AGB2000>0) OR Hansen gain=1 OR mangrove AGB2000>0) NOT IN pre-2000 plantations)` -3) Per pixel values for forest pixels only +3) Per pixel values for forest pixels only: `(((TCD2000>30 AND WHRC AGB2000>0) OR Hansen gain=1 OR mangrove AGB2000>0) NOT IN pre-2000 plantations)` -The per hectare outputs are used for making maps, while the per pixel outputs are used for analyses because the values +The per hectare outputs are used for making pixel-level maps (essentially showing emission and removal factors), +while the per pixel outputs are used for analyses because the values of those pixels can be summed within areas of interest. -(The pixels of the per hectare outputs should not be summed but they can be averaged.) +(The pixels of the per hectare outputs should not be summed but they can be averaged in areas of interest.) Statistics from this model are always based on the "forest extent" rasters, not the "full extent" rasters. The full model extent outputs should generally not be used but are created by the model in case they are needed. +In addition to these three key outputs, there are many intermediate output rasters from the model, +some of which may be useful for QC, analyses by area of interest, or something else. +All of these are at 0.00025 x 0.00025 degree resolution and reported as per hectare values (as opposed to per pixel values), if applicable. +Intermediate outputs include the annual aboveground and belowground biomass removal rates +for all kinds of forests, the type of removal factor applied to each pixel, the carbon pool densities in 2000, +carbon pool densities in the year of tree cover loss, and the number of years in which removals occurred. + +Almost all model output have metadata associated with them, viewable using the `gdalinfo` command line utility (https://gdal.org/programs/gdalinfo.html). +Metadata includes units, date created, model version, geographic extent, and more. Unfortunately, the metadata are not viewable in ArcMap +or in the versions of these files downloadable from the Global Forest Watch Open Data Portal. + #### 4-km outputs The 4-km outputs are used for static large-scale maps, like in publications and presentations. @@ -66,19 +82,6 @@ Although gross emissions are traditionally given positive (+) values and gross removals are traditionally given negative (-) values, the 30-m gross removals rasters are positive, while the 4-km gross removals rasters are negative. Net flux at both scales can be positive or negative depending on the balance of emissions and removals in the area of interest. -In addition to these three key outputs, there are many intermediate output rasters from the model, -some of which may be useful for QC, analyses by area of interest, or something else. -All of these are at 0.00025 x 0.00025 degree resolution and reported as per hectare values (as opposed to per pixel values), if applicable. 
-Intermediate outputs include the annual aboveground and belowground biomass removal rates -for all kinds of forests, the type of removal factor applied to each pixel, the carbon pool densities in 2000, -carbon pool densities in the year of tree cover loss, and the number of years in which removals occurred. - -Almost all model output have metadata associated with them, viewable using the `gdalinfo` command line utility (https://gdal.org/programs/gdalinfo.html). -Metadata includes units, date created, model version, geographic extent, and more. Unfortunately, the metadata are not viewable in ArcMap. - -Model runs also automatically generate a txt log that is saved to s3. This log includes nearly everything that is output in the console. -This log is useful for documenting model runs and checking for mistakes/errors in retrospect, although it does not capture errors that terminate the model. -For example, users can examine it to see if the correct input tiles were downloaded or if the intended tiles were used during the model run. ### Running the model There are two ways to run the model: as a series of individual scripts, or from a master script, which runs the individual scripts sequentially. @@ -196,23 +199,59 @@ Some use all tiles and some use a smaller extent. | `legal_Amazon_loss` | Uses Brazil's PRODES annual deforestation system instead of Hansen loss | Legal Amazon| `mp_model_extent.py` | | `Mekong_loss` | Uses Hansen loss v2.0 (multiple loss in same pixel). NOTE: Not used for flux model v1.2.0, so this is not currently supported. | Mekong region | N/A | + +### Updating the model with new tree cover loss +For the current general configuration of the model, these are the changes that need to be made to update the +model with a new year of tree cover loss data. In the order in which the changes would be needed for rerunning the model: + +1) Update the model version in `constants_and_names.py`. + +2) Change the tree cover loss tile source to the new tree cover loss tiles in `constants_and_names.py`. If the tree cover +loss tile pattern is different from the previous year, that will need to be changed in several places in various scripts, unfortunately. + +3) Change the number of loss years in `constants_and_names.py`. (A sketch of steps 1-3 follows at the end of this section.) + +4) Make sure that any changes in forest age category produced by `mp_forest_age_category_IPCC.py` + and the number of gain years produced by `mp_gain_year_count_all_forest_types.py` still make sense. + +5) Obtain and pre-process the updated tree cover loss drivers model in `mp_prep_other_inputs.py`. + +6) Create a new year of burned area data using `mp_burn_year.py` (multiple changes to script needed, and potentially + some reworking if the burned area ftp site has changed its structure or download protocol). + +7) In `constants.h`, change the number of model years. + +8) In `equations.cpp`, change the number of model years. + +Strictly speaking, if only the drivers, burn year, and tree cover loss are being updated, the model only needs to be +run from forest_age_category_IPCC onwards (loss affects IPCC age category but model extent isn't affected by +any of these inputs). +However, for completeness, I suggest running all stages of the model from model_extent onwards for an update so that
model outputs from all stages have the same version in their metadata and the same dates of output as the model stages +that are actually being changed.
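To make steps 1-3 above concrete, here is a hypothetical sketch of the kind of values they touch in `constants_and_names.py`. The variable names and the s3 path are illustrative guesses, not the file's actual contents; the empty loss pattern matches the `pattern == ''` handling of loss tiles visible in the `universal_util.py` changes below.

```python
# Hypothetical constants_and_names.py edits for steps 1-3 (names/values illustrative)

version = '1.2.1'    # step 1: model version

# step 2: point to the new loss tiles. The 2020 tiles are named [tile_id].tif with
# no prefix, so the pattern is empty (see the pattern == '' branches in universal_util.py).
pattern_loss = ''
loss_dir = 's3://[bucket]/[path-to-2020-loss-tiles]/'

loss_years = 20      # step 3: loss years 2001-2020 inclusive
```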
+ + ### Modifying the model It is recommended that any changes to the model be tested in a local Docker instance before running on an EC2 instance. +I like to output files to test folders with dates 20219999 because that is clearly not a real run date. A standard development route is: -1) make changes to a single model script and run using the single processor option on a single tile (easiest for debugging) in local Docker, +1) Make changes to a single model script and run using the single processor option on a single tile (easiest for debugging) in local Docker. -2) run single script on a few representative tiles using a single processor in local Docker, +2) Run single script on a few representative tiles using a single processor in local Docker. -3) run single script on a few representative tiles using multiple processor option, +3) Run single script on a few representative tiles using the multiple processor option. -4) run the single script from the master script on a few representative tiles using multiple processor option to confirm that changes work when using master script, +4) Run the single script from the master script on a few representative tiles using the multiple processor option to + confirm that changes work when using the master script. -5) run single script on a few representative tiles using multiple processors on EC2 instance (need to commit and push changes to GitHub first), +5) Run single script on a few representative tiles using multiple processors on an EC2 instance (need to commit and push changes to GitHub first). -6) run master script on all tiles using multiple processors on EC2 instance. If the changes likely affected memory usage, make sure to watch memory with `htop` to make sure that too much memory isn't required. If too much memory is needed, reduce the number of processors being called in the script. +6) Run master script on all tiles using multiple processors on an EC2 instance. + If the changes likely affected memory usage, make sure to watch memory with `htop` to confirm that too much memory isn't required. + If too much memory is needed, reduce the number of processors being called in the script. -Obviously, depending on the changes being made, some of these steps can be ommitted. +Depending on the complexity of the changes being made, some of these steps can be omitted. (A sketch of the single-/multi-processor toggle these steps alternate between follows below.) ### Dependencies Theoretically, this model should run anywhere that the correct Docker container can be started and there is access to the AWS s3 bucket.
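Before leaving the readme: the development route in "Modifying the model" above alternates between the single processor option and full multiprocessing. The pattern the mp_* scripts in this patch use for that toggle looks roughly like the sketch below; `process_tile`, `run_stage`, and the processor threshold are illustrative names, while the small-count local-test check mirrors the `cn.count <= 2` branches visible in the diffs above.

```python
from functools import partial
import multiprocessing

def process_tile(tile_id, sensit_type):
    ...  # per-tile work goes here

def run_stage(tile_id_list, sensit_type, processor_count):
    if processor_count <= 2:  # local Docker test: plain loop, easiest to debug
        for tile_id in tile_id_list:
            process_tile(tile_id, sensit_type)
    else:                     # EC2 run: map the same per-tile function across a pool
        pool = multiprocessing.Pool(processes=processor_count)
        pool.map(partial(process_tile, sensit_type=sensit_type), tile_id_list)
        pool.close()
        pool.join()
```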
diff --git a/run_full_model.py b/run_full_model.py index 65827aa5..e9a5ae3f 100644 --- a/run_full_model.py +++ b/run_full_model.py @@ -229,7 +229,7 @@ def main (): output_dir_list = output_dir_list + [cn.net_flux_dir] - if create_supplementary_outputs in actual_stages: + if 'create_supplementary_outputs' in actual_stages: output_dir_list = output_dir_list + \ [cn.cumul_gain_AGCO2_BGCO2_all_types_per_pixel_full_extent_dir, cn.cumul_gain_AGCO2_BGCO2_all_types_forest_extent_dir, @@ -479,7 +479,6 @@ def main (): uu.print_log(":::::Freeing up memory for net flux creation by deleting unneeded tiles") tiles_to_delete = [] - tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_loss))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_gross_emis_non_co2_all_drivers_biomass_soil))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_gross_emis_co2_only_all_drivers_biomass_soil))) tiles_to_delete.extend(glob.glob('*{}*tif'.format(cn.pattern_gross_emis_commod_biomass_soil))) @@ -539,6 +538,16 @@ def main (): # Converts gross emissions, gross removals and net flux from per hectare rasters to per pixel rasters if 'create_supplementary_outputs' in actual_stages: + uu.print_log(":::::Deleting rewindowed tiles") + tiles_to_delete = [] + tiles_to_delete.extend(glob.glob('*rewindow*tif')) + uu.print_log(" Deleting", len(tiles_to_delete), "tiles...") + + for tile_to_delete in tiles_to_delete: + os.remove(tile_to_delete) + uu.print_log(":::::Deleted unneeded tiles") + uu.check_storage() + uu.print_log(":::::Creating supplementary versions of main model outputs (forest extent, per pixel)") start = datetime.datetime.now() @@ -573,7 +582,7 @@ def main (): script_end = datetime.datetime.now() script_elapsed_time = script_end - script_start uu.print_log(":::::Processing time for entire run:", script_elapsed_time, "\n") - + uu.upload_log() if __name__ == '__main__': main() \ No newline at end of file diff --git a/universal_util.py b/universal_util.py index 095fd94a..544c5f62 100644 --- a/universal_util.py +++ b/universal_util.py @@ -16,6 +16,9 @@ import re import pandas as pd from osgeo import gdal +import time +from random import seed +from random import random # Prints the date as YYYYmmdd_hhmmss d = datetime.datetime.today() @@ -26,6 +29,11 @@ # Uploads the output log to the designated s3 folder def upload_log(): + # Builds a slightly variable delay into the log uploading so that a ton of log uploads at once don't overwhelm s3 + seed() + lag = random()*2 + time.sleep(lag) + cmd = ['aws', 's3', 'cp', os.path.join(cn.docker_app, cn.model_log), cn.model_log_dir, '--quiet'] check_call(cmd) @@ -51,25 +59,41 @@ def initiate_log(tile_id_list=None, sensit_type=None, run_date=None, stage_input logging.info("Standard net flux for comparison with sensitivity analysis net flux (optional): {}".format(std_net_flux)) logging.info("Include mangrove removal scripts in model run (optional): {}".format(include_mangroves)) logging.info("Include US removal scripts in model run (optional): {}".format(include_us)) - logging.info("AWS ec2 instance type and AMI id:") - # try: - # cmd = ['curl', 'http://169.254.169.254/latest/meta-data/instance-type'] # https://stackoverflow.com/questions/625644/how-to-get-the-instance-id-from-within-an-ec2-instance - # process = Popen(cmd, stdout=PIPE, stderr=STDOUT) - # with process.stdout: - # log_subprocess_output(process.stdout) - # cmd = ['curl', 'http://169.254.169.254/latest/meta-data/ami-id'] # 
https://stackoverflow.com/questions/625644/how-to-get-the-instance-id-from-within-an-ec2-instance - # process = Popen(cmd, stdout=PIPE, stderr=STDOUT) - # with process.stdout: - # log_subprocess_output(process.stdout) - # except: - # logging.info("Not running on AWS ec2 instance") - logging.info("Available processors: {}".format(cn.count)) - logging.info("") + logging.info("AWS ec2 instance type and AMI ID:") + + # https://stackoverflow.com/questions/13735051/how-to-capture-curl-output-to-a-file + # https://stackoverflow.com/questions/625644/how-to-get-the-instance-id-from-within-an-ec2-instance + try: + cmd = ['curl', 'http://169.254.169.254/latest/meta-data/instance-type', '-o', 'instance_type.txt', '--silent'] + process = Popen(cmd, stdout=PIPE, stderr=STDOUT) + with process.stdout: + log_subprocess_output(process.stdout) + cmd = ['curl', 'http://169.254.169.254/latest/meta-data/ami-id', '-o', 'ami_id.txt', '--silent'] + process = Popen(cmd, stdout=PIPE, stderr=STDOUT) + with process.stdout: + log_subprocess_output(process.stdout) + + type_file = open("instance_type.txt", "r") + type_lines = type_file.readlines() + for line in type_lines: + logging.info(" Instance type: {}".format(line.strip())) + + ami_file = open("ami_id.txt", "r") + ami_lines = ami_file.readlines() + for line in ami_lines: + logging.info(" AMI ID: {}".format(line.strip())) + + except: + logging.info(" Not running on AWS ec2 instance") + + logging.info("Available processors: {}".format(cn.count) + "\n") # Suppresses logging from rasterio and botocore below ERROR level for the entire model logging.getLogger("rasterio").setLevel(logging.ERROR) # https://www.tutorialspoint.com/How-to-disable-logging-from-imported-modules-in-Python logging.getLogger("botocore").setLevel(logging.ERROR) # "Found credentials in environment variables." is logged by botocore: https://github.com/boto/botocore/issues/1841 + upload_log() + # Prints the output statement in the console and adds it to the log. It can handle an indefinite number of string to print def print_log(*args): @@ -86,8 +110,12 @@ def print_log(*args): # Prints to console print("LOG: " + full_statement) - # Every time a line is added to the log, it is copied to s3 - upload_log() + # # Every time a line is added to the log, it is copied to s3. + # # NOTE: During model 1.2.1 runs, I started getting repeated errors about uploading the log to s3. + # # I don't know why it happened, but my guess is that it's because I had too many things trying to copy + # # to that s3 location at once. So I'm reducing the occasions for uploading the log by removing uploads + # # whenever anything is printed. Instead, I'll upload at the end of each tile and each model stage. + # upload_log() # Logs fatal errors to the log txt, uploads to s3, and then terminates the program with an exception in the console @@ -128,8 +156,10 @@ def log_subprocess_output(pipe): # logging.info("\n") # print("\n") - # After the subprocess finishes, the log is uploaded to s3 - upload_log() + # # After the subprocess finishes, the log is uploaded to s3. + # # Having too many tiles finish running subprocesses at once can cause the upload to get overwhelmed and cause + # # an error. So, I've commented out the log upload because it's not really necessary here. 
+ # upload_log() def log_subprocess_output_simple(cmd): @@ -161,8 +191,8 @@ def log_subprocess_output_full(cmd): # logging.info("\n") # print("\n") - # After the subprocess finishes, the log is uploaded to s3 - upload_log() + # # After the subprocess finishes, the log is uploaded to s3 + # upload_log() # Checks the OS for how much storage is available in the system, what's being used, and what percent is being used @@ -222,7 +252,7 @@ def tile_list_s3(source, sensit_type='std'): else: new_source = source.replace('standard', sensit_type) - print_log("Creating list of tiles in", new_source) + print_log('\n' + "Creating list of tiles in", new_source) ## For an s3 folder in a bucket using AWSCLI # Captures the list of the files in the folder @@ -255,7 +285,7 @@ def tile_list_s3(source, sensit_type='std'): # In case the change of directories to look for sensitivity versions yields an empty folder. # This could be done better by using boto3 to check the potential s3 folders for files upfront but I couldn't figure # out how to do that. - print_log("Creating list of tiles in", source) + print_log('\n' + "Creating list of tiles in", source) ## For an s3 folder in a bucket using AWSCLI # Captures the list of the files in the folder @@ -526,8 +556,13 @@ def count_tiles_s3(source, pattern=None): num = len(line.strip('\n').split(" ")) tile_name = line.strip('\n').split(" ")[num - 1] + if pattern == '': # For Hansen loss tiles, which have no pattern + if tile_name.endswith('.tif'): + tile_id = get_tile_id(tile_name) + file_list.append(tile_id) + # For gain, tcd, and pixel area tiles, which have the tile_id after the pattern - if pattern in [cn.pattern_gain, cn.pattern_tcd, cn.pattern_pixel_area, cn.pattern_loss]: + elif pattern in [cn.pattern_gain, cn.pattern_tcd, cn.pattern_pixel_area]: if tile_name.endswith('.tif'): tile_id = get_tile_id(tile_name) file_list.append(tile_id) @@ -583,7 +618,9 @@ def s3_flexible_download(source_dir, pattern, dest, sensit_type, tile_id_list): # Creates a full download name (path and file) for tile_id in tile_id_list: - if pattern in [cn.pattern_gain, cn.pattern_tcd, cn.pattern_pixel_area, cn.pattern_loss]: # For tiles that do not have the tile_id first + if pattern == cn.pattern_loss: # For Hansen loss tiles + source = '{0}{1}.tif'.format(source_dir, tile_id) + elif pattern in [cn.pattern_gain, cn.pattern_tcd, cn.pattern_pixel_area]: # For tiles that do not have the tile_id first source = '{0}{1}_{2}.tif'.format(source_dir, pattern, tile_id) else: # For every other type of tile source = '{0}{1}_{2}.tif'.format(source_dir, tile_id, pattern) @@ -606,11 +643,27 @@ def s3_folder_download(source, dest, sensit_type, pattern = None): local_tile_count = len(glob.glob('*{}*.tif'.format(pattern))) # For tile types that have the tile_id after the pattern - if pattern in [cn.pattern_gain, cn.pattern_tcd, cn.pattern_loss, cn.pattern_pixel_area]: + if pattern in [cn.pattern_gain, cn.pattern_tcd, cn.pattern_pixel_area]: local_tile_count = len(glob.glob('{}*.tif'.format(pattern))) + # Because loss tiles don't have a pattern after the tile_id, the count of loss tiles on the spot machine + # needs to be handled separately. + if pattern == '': + + local_tile_count = 0 + + all_local_tiles = glob.glob('*tif') + + for tile in all_local_tiles: + + # Loss tiles are 12 characters long (just [tile_id].tif). Only tiles with that length name are counted. + # Using regex to identify the loss tiles would be better but I couldn't get that to work.
 
 
 # Changes the path to download from based on the sensitivity analysis being run and whether that particular input
@@ -732,8 +785,14 @@ def s3_file_download(source, dest, sensit_type):
 
         # If not already downloaded, first tries to download the sensitivity analysis version
         try:
-            # Based on https://www.thetopsites.net/article/50187246.shtml#:~:text=Fastest%20way%20to%20find%20out,does%20not%20exist%22%20if%20s3.
-            s3.Object('gfw2-data', '{0}/{1}'.format(dir_sens[15:], file_name_sens)).load()
+            # Determines which bucket to check
+            if 'gfw-data-lake' in source:
+                bucket = 'gfw-data-lake'
+                folder = source[19:]
+            else:
+                bucket = 'gfw2-data'
+                folder = source[15:]
+            s3.Object(bucket, '{0}/{1}'.format(folder, file_name_sens)).load()
             cmd = ['aws', 's3', 'cp', '{0}/{1}'.format(dir_sens, file_name_sens), dest, '--only-show-errors']
             log_subprocess_output_full(cmd)
             print_log("  Option 2 success: Sensitivity analysis tile {0}/{1} found on s3 and downloaded".format(dir_sens, file_name_sens))
@@ -759,8 +818,15 @@ def s3_file_download(source, dest, sensit_type):
 
         # If not already downloaded, the final option is to try to download the standard version of the tile.
         # If this doesn't work, the script throws a fatal error because no variant of this tile was found.
         try:
+            # Determines which bucket to check
+            if 'gfw-data-lake' in source:
+                bucket = 'gfw-data-lake'
+                folder = source[19:]
+            else:
+                bucket = 'gfw2-data'
+                folder = source[15:]
             # Based on https://www.thetopsites.net/article/50187246.shtml#:~:text=Fastest%20way%20to%20find%20out,does%20not%20exist%22%20if%20s3.
-            s3.Object('gfw2-data', '{0}'.format(source[15:])).load()
+            s3.Object(bucket, folder).load()
             cmd = ['aws', 's3', 'cp', source, dest, '--only-show-errors']
             log_subprocess_output_full(cmd)
             print_log("  Option 4 success: Standard tile {} found on s3 and downloaded".format(source))
@@ -786,18 +852,25 @@ def s3_file_download(source, dest, sensit_type):
         source = os.path.join(dir, file_name)
 
         try:
+            # Determines which bucket to check
+            if 'gfw-data-lake' in source:
+                bucket = 'gfw-data-lake'
+                folder = source[19:]
+            else:
+                bucket = 'gfw2-data'
+                folder = source[15:]
             # Based on https://www.thetopsites.net/article/50187246.shtml#:~:text=Fastest%20way%20to%20find%20out,does%20not%20exist%22%20if%20s3.
-            s3.Object('gfw2-data', '{0}'.format(source[15:])).load()
+            s3.Object(bucket, folder).load()
             cmd = ['aws', 's3', 'cp', source, dest, '--only-show-errors']
             log_subprocess_output_full(cmd)
-            print_log("  Option 2 success: Tile {} found on s3 and downloaded".format(source))
+            print_log("  Option 2 success: Tile {} found on s3 and downloaded".format(source) + "\n")
             print_log("")
             return
         except botocore.exceptions.ClientError as e:
             if e.response['Error']['Code'] == "404":
-                print_log("  Option 2 failure: Tile {0} not found on s3. Tile not found but it seems it should be. Check file paths and names.".format(source))
+                print_log("  Option 2 failure: Tile {0} not found on s3. Tile not found but it seems it should be. Check file paths and names.".format(source) + "\n")
             else:
-                print_log("  Option 2 failure: Some other error occurred while looking for {0}".format(source))
+                print_log("  Option 2 failure: Some other error occurred while looking for {0}".format(source) + "\n")
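The bucket/folder block above is now repeated in three places, with the prefix lengths (15 for s3://gfw2-data/, 19 for s3://gfw-data-lake/) hard-coded. A possible consolidation, sketched here only (s3_object_exists is a hypothetical helper, not in the repo; the .load() call and 404 check are the same ones the patch already uses):

    import boto3
    import botocore

    s3 = boto3.resource('s3')

    def s3_object_exists(s3_path):
        # 's3://bucket/some/key.tif' -> ('bucket', 'some/key.tif'), for any bucket name length
        bucket, key = s3_path[len('s3://'):].split('/', 1)
        try:
            # load() issues a HEAD request, so existence is checked without downloading the object
            s3.Object(bucket, key).load()
            return True
        except botocore.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "404":
                return False
            raise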
 
 
 # Uploads all tiles of a pattern to specified location
 def upload_final_set(upload_dir, pattern):
@@ -812,6 +885,9 @@ def upload_final_set(upload_dir, pattern):
     except:
         print_log("Error uploading output tile(s)")
 
+    # Uploads the log as each model stage is finished
+    upload_log()
+
 
 # Uploads tile to specified location
 def upload_final(upload_dir, tile_id, pattern):
@@ -863,6 +939,11 @@ def check_and_delete_if_empty(tile_id, output_pattern):
 
     tile_name = '{0}_{1}.tif'.format(tile_id, output_pattern)
 
+    # Only checks for data if the tile exists
+    if not os.path.exists(tile_name):
+        print_log(tile_name, "does not exist. Skipping check of whether there is data.")
+        return
+
     print_log("Checking if {} contains any data...".format(tile_name))
     no_data = check_for_data(tile_name)
@@ -922,6 +1003,9 @@ def end_of_fx_summary(start, tile_id, pattern):
     print_log("Processing time for tile", tile_id, ":", elapsed_time)
     count_completed_tiles(pattern)
 
+    # Uploads the log as each tile is finished
+    upload_log()
+
 
 # Warps raster to Hansen tiles using multiple processors
 def mp_warp_to_Hansen(tile_id, source_raster, out_pattern, dt):
@@ -1023,11 +1107,11 @@ def make_blank_tile(tile_id, pattern, folder, sensit_type):
     # Preferentially uses Hansen loss tile as the template for creating a blank plantation tile
     # (tile extent, resolution, pixel alignment, compression, etc.).
     # If the tile is already on the spot machine, it uses the downloaded tile.
-    if os.path.exists(os.path.join(folder, '{0}_{1}.tif'.format(cn.pattern_loss, tile_id))):
+    if os.path.exists(os.path.join(folder, '{0}.tif'.format(tile_id))):
         print_log("Hansen loss tile exists for {}. Using that as template for blank tile.".format(tile_id))
         cmd = ['gdal_merge.py', '-createonly', '-init', '0', '-co', 'COMPRESS=LZW', '-ot', 'Byte',
                '-o', '{0}/{1}_{2}.tif'.format(folder, tile_id, pattern),
-               '{0}/{1}_{2}.tif'.format(folder, cn.pattern_loss, tile_id)]
+               '{0}/{1}.tif'.format(folder, tile_id)]
         check_call(cmd)
 
     # If the Hansen loss tile isn't already on the spot machine
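For reference, the gdal_merge.py -createonly call above could also be done in-process. A rasterio sketch under the same assumptions (zero-filled Byte band, LZW compression, georeferencing copied from the template); make_blank_tile_sketch is illustrative only, and it allocates the full tile array in memory, so the gdal_merge.py route stays lighter for production tiles:

    import numpy as np
    import rasterio

    def make_blank_tile_sketch(template_path, out_path):
        # Copy extent, resolution, and pixel alignment from the template tile
        with rasterio.open(template_path) as template:
            profile = template.profile
        # Mirror the gdal_merge.py flags: single Byte band of zeros, LZW-compressed
        profile.update(dtype=rasterio.uint8, count=1, compress='lzw')
        with rasterio.open(out_path, 'w', **profile) as dst:
            dst.write(np.zeros((profile['height'], profile['width']), dtype=np.uint8), 1)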