From 92b980513ffa39eb59696b94ca1e615f948b8477 Mon Sep 17 00:00:00 2001 From: CJ Yetman Date: Fri, 6 Dec 2024 14:15:41 +0100 Subject: [PATCH 1/8] grammar fixes in cookbook analysis --- vignettes/cookbook_running_the_analysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/cookbook_running_the_analysis.Rmd b/vignettes/cookbook_running_the_analysis.Rmd index 9c8a7a7f..c80dacec 100644 --- a/vignettes/cookbook_running_the_analysis.Rmd +++ b/vignettes/cookbook_running_the_analysis.Rmd @@ -106,7 +106,7 @@ The next step in the analysis is to run the matching process. Assuming you have pacta.multi.loanbook::match_loanbooks(config_path) ``` -After the matching process is complete, you will need to do some manual matching. This means that you will need to manually inspect the suggested matches that the tool has found and decide which ones to keep or to remove. This is especially important when using text based matching, as there is no guarantee that similar company names as identified by the algorithms will actually refer to the same companies in the raw loan books and the ABCD. Thus, a manually validation step is crucial in the analysis, as the quality of the matches will determine the quality of the results of any further calculations. +After the matching process is complete, you will need to do some manual matching. This means that you will need to manually inspect the suggested matches that the tool has found and decide which ones to keep or to remove. This is especially important when using text based matching, as there is no guarantee that similar company names as identified by the algorithms will actually refer to the same companies in the raw loan books and the ABCD. Thus, a manual validation step is crucial in the analysis, as the quality of the matches will determine the quality of the results of any further calculations. The manual matching process is not automated and will require some time and effort on your part. 
You can find the matched loan books in the `.../matched_loanbooks` folder. The matched loan books will be stored in CSV files, one for each raw loan book. You can open these files in a spreadsheet program to verify the matches. Importantly, you will need to make a copy for each of the matched loan book files in the same `.../matched_loanbooks` folder and rename that copy by adding the suffix `_manual` to the file name. The following steps of the analysis expect this pattern, so it is important to follow this naming convention. From 25174fac9a67d6a004a7a888b2aba2c25d554a98 Mon Sep 17 00:00:00 2001 From: CJ Yetman Date: Fri, 6 Dec 2024 14:20:52 +0100 Subject: [PATCH 2/8] grammar fix --- vignettes/cookbook_running_the_analysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/cookbook_running_the_analysis.Rmd b/vignettes/cookbook_running_the_analysis.Rmd index c80dacec..aa17b978 100644 --- a/vignettes/cookbook_running_the_analysis.Rmd +++ b/vignettes/cookbook_running_the_analysis.Rmd @@ -115,7 +115,7 @@ You can find more detailed information about the matching process in the [traini ### Some expectations for the matching process - It is unlikely that you will be able to match all of the loans from your raw loan books to the ABCD data set. This is expected and has the following reasons: - - Raw loan books often include companies that are not in scope of the PACTA analysis, for example there may be companies active in the financial sector or in manufacturing of IT products. Both these sectors are fully out of scope. There may also be companies that are active in upstream or downstream activities of the sectors covered by PACTA. This means that the company activities are not at the part of the value chain that is covered by PACTA and accordingly the companies are not matched. Examples for this are power distribution companies or companies that manufacture air crafts. 
+ - Raw loan books often include companies that are not in scope of the PACTA analysis, for example there may be companies active in the financial sector or in manufacturing of IT products. Both these sectors are fully out of scope. There may also be companies that are active in upstream or downstream activities of the sectors covered by PACTA. This means that the company activities are not at the part of the value chain that is covered by PACTA and accordingly the companies are not matched. Examples for this are power distribution companies or companies that manufacture aircraft. - The ABCD data set may not cover all companies that are in scope of the PACTA analysis. While coverage of the real economy sectors is usually rather high in the data sets that are commonly used for PACTA, there are gaps. This implies that some in-scope companies cannot be matched because the ABCD data set does not include them. Advanced users may research the production profiles of such companies by themselves and add them to the ABCD data manually, however this is a very involved process and not standard procedure and will therefore not be covered in this cookbook. - If you are using sector classifications for the matching process (which is recommended whenever possible), some matches may not be identified in case the companies in the raw loan book are misclassified. For example, if a utility that is focused on coal-fired power generation is classified as a coal mining company, the matching function will not suggest a match. - Given that it is unlikely to match all loans, it is recommended to try and match the companies with the largest financial exposures first, as this ensures the best possible financial coverage of the loan book in the analysis. 
From d83cbd72f1edd4d72082d4ce1ee94084dfd3267d Mon Sep 17 00:00:00 2001 From: CJ Yetman Date: Fri, 6 Dec 2024 14:25:40 +0100 Subject: [PATCH 3/8] grammar fixes --- vignettes/cookbook_running_the_analysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/cookbook_running_the_analysis.Rmd b/vignettes/cookbook_running_the_analysis.Rmd index aa17b978..b77ad1dc 100644 --- a/vignettes/cookbook_running_the_analysis.Rmd +++ b/vignettes/cookbook_running_the_analysis.Rmd @@ -126,7 +126,7 @@ You can find more detailed information about the matching process in the [traini The `match_loanbooks()` function has a number of options that can be set in the `config.yml` file. These options include: - specifications for the approach to matching the raw loan book with the ABCD [relevant section on matching](https://rmi-pacta.github.io/pacta.multi.loanbook/articles/config_yml.html#matching) in the `vignette("config_yml")`). Note that these parameters are all based on the `r2dii.match::match_name` function and pass the parameters directly to that function. For more information on the options available, see the [documentation of the r2dii.match package](https://rmi-pacta.github.io/r2dii.match/reference/match_name.html). This also covers matching based on unique identifiers, which is the most reliable way to match companies, but requires that both the raw loan books and the ABCD contain such identifiers. -- whether to use a manually prepared sector classification system for matching the loan books to in-scope PACTA sectors, see the [relevant section on matching](https://rmi-pacta.github.io/pacta.multi.loanbook/articles/config_yml.html#matching) in the `vignette("config_yml")`), or not. 
If there is no need to use a manually prepared sector classification file, the sector classification systems provided in `r2dii.data::sector_classifications` can be used, which currently cover the following sector classifications: `r unique(r2dii.data::sector_classifications$code_system)`. If it is not possible to map the loans in your loan books to any of these systems, you can prepare your own mapping file that follows the same structure as the sector classification files in `r2dii.data::sector_classifications` and use the config file to instruct the code to use this file for matching. Notice that this will only be a promising approach, if the classifications you are using are sufficiently granular to map to PACTA sectors without excessive ambiguity. +- whether to use a manually prepared sector classification system for matching the loan books to in-scope PACTA sectors (see the [relevant section on matching](https://rmi-pacta.github.io/pacta.multi.loanbook/articles/config_yml.html#matching) in the `vignette("config_yml")`), or not. If there is no need to use a manually prepared sector classification file, the sector classification systems provided in `r2dii.data::sector_classifications` can be used, which currently cover the following sector classifications: `r unique(r2dii.data::sector_classifications$code_system)`. If it is not possible to map the loans in your loan books to any of these systems, you can prepare your own mapping file that follows the same structure as the sector classification files in `r2dii.data::sector_classifications` and use the config file to instruct the code to use this file for matching. Note that this will only be a promising approach if the classifications you are using are sufficiently granular to map to PACTA sectors without excessive ambiguity. 
### Addressing misclassfied loans From 07c79eb942975d0ce67b57d93f3bb25311450b19 Mon Sep 17 00:00:00 2001 From: CJ Yetman Date: Fri, 6 Dec 2024 14:28:28 +0100 Subject: [PATCH 4/8] spelling fix --- vignettes/cookbook_running_the_analysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/cookbook_running_the_analysis.Rmd b/vignettes/cookbook_running_the_analysis.Rmd index b77ad1dc..c8762614 100644 --- a/vignettes/cookbook_running_the_analysis.Rmd +++ b/vignettes/cookbook_running_the_analysis.Rmd @@ -132,7 +132,7 @@ The `match_loanbooks()` function has a number of options that can be set in the There are two ways to appropriately handle misclassified loans that are identified as in-scope in the raw data set but are then not matched. -1. Correct the classification in the raw loan book and re-run the matching process. If the loan was clearly mis-classified, this may be the most appropriate way to handle the issue. It may be a good idea to record any such changes made in the input data though. The upsdie of this approach is that the loan will now either be matched correctly, as it will be assigned the sector that the company should have and therefore find an entry in the ABCD data set to match against. Or, if there is still no match to be found in the ABCD, the loan will correctly be missing in the appropriate sector and therefore indicate a lower match success rate where it should. +1. Correct the classification in the raw loan book and re-run the matching process. If the loan was clearly mis-classified, this may be the most appropriate way to handle the issue. It may be a good idea to record any such changes made in the input data though. The upside of this approach is that the loan will now either be matched correctly, as it will be assigned the sector that the company should have and therefore find an entry in the ABCD data set to match against. 
Or, if there is still no match to be found in the ABCD, the loan will correctly be missing in the appropriate sector and therefore indicate a lower match success rate where it should. 2. If a manual re-classification of the raw loan book is not possible or desired, the calculation of the match success rate can be corrected by adding a file `loans_to_remove.csv` to the input directory. This file should include the columns `id_loan` and `group_id` to indicate the precise mis-classified loan and the loan book in which it was found. This combination of loan and loan book will then be excluded from the match success calculation. The reason why it is a good idea to either correct mis-classified loans or disregard them in the calculation of the match success rate is that a mis-classified loan cannot possibly be matched in a given sector. Therefore, no amount of work would be sufficient to improve the sector match success rate, because it is calculated against an incorrect baseline. Technically, the user is not forced to correct misclassifications, and there may be a limit to how much time should be spent on this, but it is recommended to at least correct large mis-classified loans. 
From a155770fdc440aadcdbf21caef2c517b1a63f995 Mon Sep 17 00:00:00 2001 From: CJ Yetman Date: Mon, 9 Dec 2024 13:57:16 +0100 Subject: [PATCH 5/8] Update cookbook_running_the_analysis.Rmd --- vignettes/cookbook_running_the_analysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/cookbook_running_the_analysis.Rmd b/vignettes/cookbook_running_the_analysis.Rmd index c8762614..8bc4a1c5 100644 --- a/vignettes/cookbook_running_the_analysis.Rmd +++ b/vignettes/cookbook_running_the_analysis.Rmd @@ -145,7 +145,7 @@ If you want to apply the sector split to the loan books, you should keep all rel The next step is to prioritize the manually verified matched loan books and analyze their coverage, both relative to the raw loan book inputs (the "match success rate") and to the production capacity in the wider economy (the "loan book production coverage"). Prioritizing the loan books means that you will only keep the best identified match for each loan and use that in the following steps of the analysis. -You will probably want to check the status of your loan book and production coverage several times, as it is rare to get to the desired level of matching in one iteration. This means you may want to repeat the previous step (matching the loan books, likely using different parameters for different iterations) and this step (prioritizing the matched loan books and analyzing their match success rate) a number of times to reach the best possible outcome. To prioritize your matched loan books and calculate display the coverage diagnostics, you will use the `prioritise_and_diagnose()` function. This call will store matched prioritized loan book files and coverage diagnostics in a directory that you have indicated as the value corresponding to the key `dir_prioritized_loanbooks_and_diagnostics` in the `config.yml`. 
You can then run the function as follows: +You will probably want to check the status of your loan book and production coverage several times, as it is rare to get to the desired level of matching in one iteration. This means you may want to repeat the previous step (matching the loan books, likely using different parameters for different iterations) and this step (prioritizing the matched loan books and analyzing their match success rate) a number of times to reach the best possible outcome. To prioritize your matched loan books and calculate the coverage diagnostics, you will use the `prioritise_and_diagnose()` function. This call will store matched prioritized loan book files and coverage diagnostics in a directory that you have indicated as the value corresponding to the key `dir_prioritized_loanbooks_and_diagnostics` in the `config.yml`. You can then run the function as follows: ```r pacta.multi.loanbook::prioritise_and_diagnose(config_path) From 60e0480f2015337602f7bfd80837cf51e75fc534 Mon Sep 17 00:00:00 2001 From: CJ Yetman Date: Mon, 9 Dec 2024 14:01:13 +0100 Subject: [PATCH 6/8] Update cookbook_running_the_analysis.Rmd --- vignettes/cookbook_running_the_analysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/cookbook_running_the_analysis.Rmd b/vignettes/cookbook_running_the_analysis.Rmd index 8bc4a1c5..45752382 100644 --- a/vignettes/cookbook_running_the_analysis.Rmd +++ b/vignettes/cookbook_running_the_analysis.Rmd @@ -145,7 +145,7 @@ If you want to apply the sector split to the loan books, you should keep all rel The next step is to prioritize the manually verified matched loan books and analyze their coverage, both relative to the raw loan book inputs (the "match success rate") and to the production capacity in the wider economy (the "loan book production coverage"). Prioritizing the loan books means that you will only keep the best identified match for each loan and use that in the following steps of the analysis. 
-You will probably want to check the status of your loan book and production coverage several times, as it is rare to get to the desired level of matching in one iteration. This means you may want to repeat the previous step (matching the loan books, likely using different parameters for different iterations) and this step (prioritizing the matched loan books and analyzing their match success rate) a number of times to reach the best possible outcome. To prioritize your matched loan books and calculate the coverage diagnostics, you will use the `prioritise_and_diagnose()` function. This call will store matched prioritized loan book files and coverage diagnostics in a directory that you have indicated as the value corresponding to the key `dir_prioritized_loanbooks_and_diagnostics` in the `config.yml`. You can then run the function as follows: +You will probably want to check the status of your loan book and production coverage several times, as it is rare to get to the desired level of matching in one iteration. This means you may want to repeat the previous step (matching the loan books, likely using different parameters for different iterations) and this step (prioritizing the matched loan books and analyzing their match success rate) a number of times to reach the best possible outcome. To prioritize your matched loan books and calculate the coverage diagnostics, you will use the `prioritise_and_diagnose()` function. This call will store matched prioritized loan book files and coverage diagnostics in a directory that you have indicated as the value corresponding to the key `dir_prioritized_loanbooks_and_diagnostics` in the `config.yml`. 
You can run the function as follows: ```r pacta.multi.loanbook::prioritise_and_diagnose(config_path) From fc3487cca7dff04de97bf878c5be393fdd65d3a9 Mon Sep 17 00:00:00 2001 From: CJ Yetman Date: Mon, 9 Dec 2024 14:18:14 +0100 Subject: [PATCH 7/8] Update cookbook_running_the_analysis.Rmd --- vignettes/cookbook_running_the_analysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/cookbook_running_the_analysis.Rmd b/vignettes/cookbook_running_the_analysis.Rmd index 45752382..fa390364 100644 --- a/vignettes/cookbook_running_the_analysis.Rmd +++ b/vignettes/cookbook_running_the_analysis.Rmd @@ -179,7 +179,7 @@ The `analysis()` function has a number of options that can be set in the `config All these options are documented in more detail the [section on project parameters](https://rmi-pacta.github.io/pacta.multi.loanbook/articles/config_yml.html#project_parameters) in the `vignette("config_yml")`. -Usually, it will be interesting to run the analysis for more than one by_group, possibly also for multiple combinations of the other parameters. You will therefore have to run the analysis as many times as there are combinations of interest that you wish to generate results for. +Usually, it will be interesting to run the analysis for more than one `by_group`, possibly also for multiple combinations of the other parameters. You will therefore have to run the analysis as many times as there are combinations of interest that you wish to generate results for. 
**PREVIOUS CHAPTER:** [Preparatory Steps](cookbook_preparatory_steps.html) From 919e36986ab4714ea77174e8360f2bb631e9a841 Mon Sep 17 00:00:00 2001 From: CJ Yetman Date: Mon, 9 Dec 2024 15:35:15 +0100 Subject: [PATCH 8/8] Update vignettes/cookbook_running_the_analysis.Rmd --- vignettes/cookbook_running_the_analysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/cookbook_running_the_analysis.Rmd b/vignettes/cookbook_running_the_analysis.Rmd index 1e8b24fd..b5bad17b 100644 --- a/vignettes/cookbook_running_the_analysis.Rmd +++ b/vignettes/cookbook_running_the_analysis.Rmd @@ -147,7 +147,7 @@ The `analysis()` function has a number of options that can be set in the `config All these options are documented in more detail the [section on project parameters](https://rmi-pacta.github.io/pacta.multi.loanbook/articles/config_yml.html#project_parameters) in the `vignette("config_yml")`. -Usually, it will be interesting to run the analysis for more than one `by_group`, possibly also for multiple combinations of the other parameters. You will therefore have to run the analysis as many times as there are combinations of interest that you wish to generate results for. +Usually, it will be interesting to run the analysis for more than one [`by_group` value](https://rmi-pacta.github.io/pacta.multi.loanbook/articles/config_yml.html#by_group), possibly also for multiple combinations of the other parameters. You will therefore have to run the analysis as many times as there are combinations of interest that you wish to generate results for. **PREVIOUS CHAPTER:** [Preparatory Steps](cookbook_preparatory_steps.html)