Consider refactoring components with strict methodological significance to pacta.data.preparation or pacta.scenario.preparation #94

Closed
13 tasks done
jdhoffa opened this issue Feb 8, 2024 · 3 comments


@jdhoffa
Member

jdhoffa commented Feb 8, 2024

PROBABLY DON'T DO THIS PRIOR TO SUCCESSFUL DELIVERY OF PACTA COP CH 2024

Towards a world where workflows mainly handle file I/O, configuration, and DevOps,
and pacta.* handles as much methodology as possible. Relates to RMI/practices#2

  • maybe refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/357 and use pacta.data.preparation::determine_relevant_years() #208)

    relevant_years <- sort(
      unique(
        market_share_target_reference_year:(market_share_target_reference_year + time_horizon)
      )
    )

  • refactor to pacta.scenario.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/358 and use pacta.data.preparation::prepare_scenarios_long() #209) not going to do this until after a switch to pacta.scenario.data.preparation is made

    # scenario values will be linearly interpolated for each group below
    interpolation_groups <- c(
      "source",
      "scenario",
      "sector",
      "technology",
      "scenario_geography",
      "indicator",
      "units"
    )
    scenario_raw_data %>%
      pacta.scenario.preparation::interpolate_yearly(!!!rlang::syms(interpolation_groups)) %>%
      filter(.data$year >= .env$market_share_target_reference_year) %>%
      pacta.scenario.preparation::add_market_share_columns(reference_year = market_share_target_reference_year) %>%
      pacta.scenario.preparation::format_p4i(green_techs) %>%

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/359 and use pacta.data.preparation::standardize_asset_type_names() #210)

    factset_issue_code_bridge <-
      pacta.data.preparation::factset_issue_code_bridge %>%
      select(issue_type_code, asset_type) %>%
      mutate(
        asset_type = case_when(
          .data$asset_type == "Listed Equity" ~ "Equity",
          .data$asset_type == "Corporate Bond" ~ "Bonds",
          .data$asset_type == "Fund" ~ "Funds",
          .data$asset_type == "Other" ~ "Others",
          TRUE ~ "Others"
        )
      )

  • refactor to pacta.scenario.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/358 and use pacta.data.preparation::prepare_scenarios_long() #209) not going to do this until after a switch to pacta.scenario.data.preparation is made

    scenarios_long <- scenario_raw %>%
      inner_join(
        pacta.scenario.preparation::scenario_source_pacta_geography_bridge,
        by = c(
          scenario_source = "source",
          scenario_geography = "scenario_geography_source"
        )
      ) %>%
      select(-"scenario_geography") %>%
      rename(scenario_geography = "scenario_geography_pacta") %>%
      filter(
        .data$scenario_source %in% .env$scenario_sources_list,
        .data$ald_sector %in% c(.env$sector_list, .env$other_sector_list),
        .data$scenario_geography %in% unique(.env$scenario_regions$scenario_geography),
        .data$year %in% unique(
          c(.env$relevant_years, .env$market_share_target_reference_year + 10)
        )
      )

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/348)

    ar_company_id__country_of_domicile <-
      entity_info %>%
      select("ar_company_id", "country_of_domicile") %>%
      filter(!is.na(.data$ar_company_id)) %>%
      distinct()
    ar_company_id__credit_parent_ar_company_id <-
      entity_info %>%
      select("ar_company_id", "credit_parent_ar_company_id") %>%
      filter(!is.na(.data$ar_company_id)) %>%
      distinct()

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/360 and Use pacta.data.preparation::prepare_masterdata_debt() #211)

    masterdata_debt <- readr::read_csv(masterdata_debt_path, na = "", show_col_types = FALSE)
    company_id__creditor_company_id <-
      masterdata_debt %>%
      select("company_id", "creditor_company_id") %>%
      distinct() %>%
      mutate(across(.cols = dplyr::everything(), .fns = as.character))
    masterdata_debt %>%
      pacta.data.preparation::prepare_masterdata(
        ar_company_id__country_of_domicile,
        pacta_financial_timestamp,
        zero_emission_factor_techs
      ) %>%
      left_join(company_id__creditor_company_id, by = c(id = "company_id")) %>%
      left_join(ar_company_id__credit_parent_ar_company_id, by = c(id = "ar_company_id")) %>%
      mutate(id = if_else(!is.na(.data$credit_parent_ar_company_id), .data$credit_parent_ar_company_id, .data$id)) %>%
      mutate(id = if_else(!is.na(.data$creditor_company_id), .data$creditor_company_id, .data$id)) %>%
      mutate(id_name = "credit_parent_ar_company_id") %>%
      group_by(
        .data$id, .data$id_name, .data$ald_sector, .data$ald_location,
        .data$technology, .data$year, .data$country_of_domicile,
        .data$ald_production_unit, .data$ald_emissions_factor_unit,
      ) %>%
      summarise(
        ald_emissions_factor = stats::weighted.mean(.data$ald_emissions_factor, .data$ald_production, na.rm = TRUE),
        ald_production = sum(.data$ald_production, na.rm = TRUE),
        .groups = "drop"
      ) %>%
      saveRDS(file.path(data_prep_outputs_path, "masterdata_debt_datastore.rds"))

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/348 and https://github.com/RMI-PACTA/pacta.data.preparation/pull/353)

    ar_company_id__sectors_with_assets__ownership <-
      readRDS(file.path(data_prep_outputs_path, "masterdata_ownership_datastore.rds")) %>%
      filter(year %in% relevant_years) %>%
      select(ar_company_id = id, ald_sector) %>%
      distinct() %>%
      group_by(ar_company_id) %>%
      summarise(sectors_with_assets = paste(unique(ald_sector), collapse = " + "))
    financial_data %>%
      left_join(factset_entity_id__ar_company_id, by = "factset_entity_id") %>%
      left_join(factset_entity_id__security_mapped_sector, by = "factset_entity_id") %>%
      left_join(ar_company_id__sectors_with_assets__ownership, by = "ar_company_id") %>%
      mutate(has_asset_level_data = if_else(is.na(sectors_with_assets) | sectors_with_assets == "", FALSE, TRUE)) %>%
      mutate(has_ald_in_fin_sector = if_else(stringr::str_detect(sectors_with_assets, security_mapped_sector), TRUE, FALSE)) %>%
      select(
        isin,
        has_asset_level_data,
        has_ald_in_fin_sector,
        sectors_with_assets
      ) %>%
      saveRDS(file.path(data_prep_outputs_path, "abcd_flags_equity.rds"))

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/348 and https://github.com/RMI-PACTA/pacta.data.preparation/pull/353)

    ar_company_id__sectors_with_assets__debt <-
      readRDS(file.path(data_prep_outputs_path, "masterdata_debt_datastore.rds")) %>%
      filter(year %in% relevant_years) %>%
      select(ar_company_id = id, ald_sector) %>%
      distinct() %>%
      group_by(ar_company_id) %>%
      summarise(sectors_with_assets = paste(unique(ald_sector), collapse = " + "))
    financial_data %>%
      left_join(factset_entity_id__ar_company_id, by = "factset_entity_id") %>%
      left_join(factset_entity_id__security_mapped_sector, by = "factset_entity_id") %>%
      left_join(ar_company_id__sectors_with_assets__debt, by = "ar_company_id") %>%
      mutate(has_asset_level_data = if_else(is.na(sectors_with_assets) | sectors_with_assets == "", FALSE, TRUE)) %>%
      mutate(has_ald_in_fin_sector = if_else(stringr::str_detect(sectors_with_assets, security_mapped_sector), TRUE, FALSE)) %>%
      left_join(
        select(entity_info, "factset_entity_id", "credit_parent_id"),
        by = "factset_entity_id"
      ) %>%
      mutate(
        # If FactSet has no credit_parent, we define the company as its own parent
        credit_parent_id = if_else(is.na(credit_parent_id), factset_entity_id, credit_parent_id)
      ) %>%
      group_by(credit_parent_id) %>%
      summarise(
        has_asset_level_data = sum(has_asset_level_data, na.rm = TRUE) > 0,
        has_ald_in_fin_sector = sum(has_ald_in_fin_sector, na.rm = TRUE) > 0,
        sectors_with_assets = paste(sort(unique(na.omit(unlist(str_split(sectors_with_assets, pattern = " [+] "))))), collapse = " + ")
      ) %>%
      ungroup() %>%

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/351)

    fund_data <-
      fund_data %>%
      group_by(factset_fund_id, fund_reported_mv) %>%
      filter((fund_reported_mv[[1]] - sum(holding_reported_mv)) / fund_reported_mv[[1]] > -1e-5) %>%
      ungroup()
    # build MISSINGWEIGHT for under and over
    fund_missing_mv <-
      fund_data %>%
      group_by(factset_fund_id, fund_reported_mv) %>%
      summarise(
        holding_isin = "MISSINGWEIGHT",
        holding_reported_mv = fund_reported_mv[[1]] - sum(holding_reported_mv),
        .groups = "drop"
      ) %>%
      ungroup() %>%
      filter(holding_reported_mv != 0)
    fund_data %>%
      bind_rows(fund_missing_mv) %>%
      saveRDS(file.path(data_prep_outputs_path, "fund_data.rds"))
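
To make the fund-coverage logic above concrete, here is a minimal sketch on made-up numbers. The fund IDs, ISINs, and market values are hypothetical, and base R stands in for the dplyr pipeline so the snippet runs without extra packages. It only illustrates the arithmetic and is not the code that landed in pacta.data.preparation.

```r
# Hypothetical toy holdings: fund A is under-reported, fund B sums exactly,
# fund C holds 10% more than its reported value.
fund_data <- data.frame(
  factset_fund_id = c("A", "A", "B", "B", "C"),
  fund_reported_mv = c(100, 100, 50, 50, 200),
  holding_isin = c("X1", "X2", "X3", "X4", "X5"),
  holding_reported_mv = c(60, 30, 20, 30, 220),
  stringsAsFactors = FALSE
)

# Sum of holdings per fund, aligned with the rows of fund_data.
holdings_sum <- ave(fund_data$holding_reported_mv, fund_data$factset_fund_id, FUN = sum)

# Relative gap between the reported fund value and the summed holdings.
rel_gap <- (fund_data$fund_reported_mv - holdings_sum) / fund_data$fund_reported_mv

# Keep funds whose holdings do not exceed the reported value beyond the tolerance.
kept <- fund_data[rel_gap > -1e-5, ]

# One MISSINGWEIGHT row per kept fund, dropped when the gap is exactly zero.
per_fund <- aggregate(
  holding_reported_mv ~ factset_fund_id + fund_reported_mv,
  data = kept,
  FUN = sum
)
missing_mv <- data.frame(
  factset_fund_id = per_fund$factset_fund_id,
  fund_reported_mv = per_fund$fund_reported_mv,
  holding_isin = "MISSINGWEIGHT",
  holding_reported_mv = per_fund$fund_reported_mv - per_fund$holding_reported_mv,
  stringsAsFactors = FALSE
)
missing_mv <- missing_mv[missing_mv$holding_reported_mv != 0, ]

result <- rbind(kept, missing_mv)
```

With these inputs, fund A gains a MISSINGWEIGHT row of 10, fund B gains none because its holdings sum exactly to the reported value, and fund C is dropped by the tolerance filter.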

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/352)

    isin_to_fund_table <- readRDS(factset_isin_to_fund_table_path)
    # filter out fsyms that have more than 1 row and no fund data
    isin_to_fund_table <-
      isin_to_fund_table %>%
      mutate(has_fund_data = factset_fund_id %in% fund_data$factset_fund_id) %>%
      group_by(fsym_id) %>%
      mutate(n = n()) %>%
      filter(n == 1 | (n > 1 & has_fund_data)) %>%
      ungroup() %>%
      select(-n, -has_fund_data)
    # filter out fsyms that have more than 1 row and have fund data for both rows
    isin_to_fund_table <-
      isin_to_fund_table %>%
      mutate(has_fund_data = factset_fund_id %in% fund_data$factset_fund_id) %>%
      group_by(fsym_id) %>%
      mutate(n = n()) %>%
      filter(!(all(has_fund_data) & n > 1)) %>%
      ungroup() %>%
      select(-n, -has_fund_data)
    isin_to_fund_table %>%
      saveRDS(file.path(data_prep_outputs_path, "isin_to_fund_table.rds"))

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/348)

    iss_company_emissions <-
      readRDS(factset_iss_emissions_data_path) %>%
      group_by(factset_entity_id) %>%
      summarise(
        icc_total_emissions = sum(icc_total_emissions + icc_scope_3_emissions, na.rm = TRUE),
        .groups = "drop"
      ) %>%
      mutate(icc_total_emissions_units = "tCO2e") # units are defined in the ISS/FactSet documentation (see #144)

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/361 and use new ISS prep functions #213)

    iss_entity_emission_intensities <-
      readRDS(factset_entity_financing_data_path) %>%
      left_join(currencies, by = "currency") %>%
      mutate(
        ff_mkt_val = ff_mkt_val * exchange_rate,
        ff_debt = ff_debt * exchange_rate,
        currency = "USD"
      ) %>%
      select(-exchange_rate) %>%
      group_by(factset_entity_id, currency) %>%
      summarise(
        ff_mkt_val = sum(ff_mkt_val, na.rm = TRUE),
        ff_debt = sum(ff_debt, na.rm = TRUE),
        .groups = "drop"
      ) %>%
      inner_join(iss_company_emissions, by = "factset_entity_id") %>%
      transmute(
        factset_entity_id = factset_entity_id,
        emission_intensity_per_mkt_val = if_else(
          ff_mkt_val == 0,
          NA_real_,
          icc_total_emissions / ff_mkt_val
        ),
        emission_intensity_per_debt = if_else(
          ff_debt == 0,
          NA_real_,
          icc_total_emissions / ff_debt
        ),
        ff_mkt_val,
        ff_debt,
        units = paste0(icc_total_emissions_units, " / ", "$ USD")
      )

  • refactor to pacta.data.preparation (done in https://github.com/RMI-PACTA/pacta.data.preparation/pull/361 and use new ISS prep functions #213)

    iss_entity_emission_intensities %>%
      inner_join(factset_entity_info, by = "factset_entity_id") %>%
      group_by(sector_code, factset_sector_desc, units) %>%
      summarise(
        emission_intensity_per_mkt_val = weighted.mean(
          emission_intensity_per_mkt_val,
          ff_mkt_val,
          na.rm = TRUE
        ),
        emission_intensity_per_debt = weighted.mean(
          emission_intensity_per_debt,
          ff_debt,
          na.rm = TRUE
        ),
        .groups = "drop"
      ) %>%
      ungroup() %>%
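
For anyone reviewing the two ISS intensity steps above, here is a small worked example of how the zero-denominator guard interacts with the value-weighted sector mean. The emissions and market values are invented, and the variable names are placeholders rather than the column names of any real dataset. It is a sketch of the arithmetic in base R, not the implementation in pacta.data.preparation.

```r
# Hypothetical entities in a single sector: the second has zero market value.
emissions <- c(20, 5, 120)  # made-up totals, e.g. tCO2e
mkt_val <- c(10, 0, 30)     # made-up market values, e.g. USD

# Entity-level intensity, set to NA where the denominator is zero
# (mirrors the if_else(ff_mkt_val == 0, NA_real_, ...) guard above).
intensity <- ifelse(mkt_val == 0, NA_real_, emissions / mkt_val)

# Sector-level intensity: value-weighted mean, skipping NA intensities.
sector_intensity <- weighted.mean(intensity, mkt_val, na.rm = TRUE)

# Equivalent ratio of sums over the entities with a non-zero market value.
ratio_of_sums <- sum(emissions[mkt_val != 0]) / sum(mkt_val[mkt_val != 0])
```

Both `sector_intensity` and `ratio_of_sums` come out to 3.5 here: weighting each intensity by its own denominator reduces to total emissions over total value, and the zero-value entity drops out entirely.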

AB#10388

@jdhoffa
Member Author

jdhoffa commented Feb 8, 2024

cc: @AlexAxthelm and @cjyetman this would get us closer to having "all methodology" in pacta.* and "all file I/O" in workflow.*
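
As a rough illustration of that split, the sketch below keeps the calculation in a pure function and leaves reading configuration and writing files to a thin wrapper. The function names, the config list, and the output file name are all hypothetical and are not the actual pacta.* or workflow.* API. The year calculation is the `relevant_years` expression from the first task above.

```r
# Methodology side (would live in pacta.*): a pure function, no file access.
# Hypothetical name; mirrors the relevant_years expression in the issue.
relevant_years_sketch <- function(reference_year, time_horizon) {
  sort(unique(reference_year:(reference_year + time_horizon)))
}

# Workflow side (would live in workflow.*): reads configuration, calls the
# methodology function, and handles the file I/O. Hypothetical name and path.
run_step_sketch <- function(config, output_dir) {
  years <- relevant_years_sketch(
    config$market_share_target_reference_year,
    config$time_horizon
  )
  out_path <- file.path(output_dir, "relevant_years.rds")
  saveRDS(years, out_path)
  out_path
}

# Example run with made-up configuration values, writing to a temp directory.
config <- list(market_share_target_reference_year = 2023L, time_horizon = 5L)
out_path <- run_step_sketch(config, tempdir())
years_read_back <- readRDS(out_path)
```

The point of the split is that `relevant_years_sketch()` can be unit tested without touching the file system, while the wrapper stays small enough that it mostly needs integration testing.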

@cjyetman
Member

note that any of the code here referencing masterdata* is unlikely to get added to pacta.data.preparation since we're very close to not using/relying on the masterdata files at all anymore

cjyetman added a commit that referenced this issue Mar 25, 2024
@cjyetman cjyetman added the ADO label Mar 25, 2024
cjyetman added a commit that referenced this issue Apr 5, 2024
cjyetman added a commit that referenced this issue Apr 5, 2024
cjyetman added a commit that referenced this issue Apr 5, 2024
@cjyetman
Member

cjyetman commented Apr 5, 2024

all but the scenario stuff has been implemented, closing
(scenario stuff will defer until pacta.scenario.data.preparation is implemented)

@cjyetman cjyetman closed this as completed Apr 5, 2024

2 participants