Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GTC-2570 Filter out rows not requiring analysis #192

Merged
merged 3 commits into from
Sep 27, 2023

Conversation

manukala6
Copy link
Member

@manukala6 manukala6 commented Sep 18, 2023

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
  • Make sure you are making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Check the commit's or even all commits' message styles matches our requested structure.
  • Check your code additions will fail neither code linting checks nor unit test.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Performs analysis for difference geometries and areas with null GADM values

Issue Number: GTC-2570

What is the new behavior?

  • Ignores rows where location_id = -2
  • Ignores rows where gadm_id = "null"

Does this introduce a breaking change?

  • Yes
  • No

Other information

@@ -48,6 +48,8 @@ object AFiAnalysis extends SummaryAnalysis {
val summaryDF = AFiAnalysis.aggregateResults(
AFiDF
.getFeatureDataFrame(summaryRDD, spark)
.filter($"location_id" =!= -2)
Copy link
Member

@jterry64 jterry64 Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think actually we'd want to filter out -2 before we run the analysis, since the first time a user runs this, -2 will be the same as -1, a huge dissolved polygon. So kind of a waste to compute all that and then toss it away. You could run this same filter but for the input feature DF.

@codecov-commenter
Copy link

Codecov Report

Patch has no changes to coverable lines.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Files Changed Coverage
...org/globalforestwatch/summarystats/afi/AFiDF.scala 0.00%

📢 Thoughts on this report? Let us know!.

@manukala6 manukala6 requested a review from jterry64 September 19, 2023 17:17
@@ -40,5 +40,7 @@ object AFiDF extends SummaryDF {
}
.toDF("id", "error", "dataGroup", "data")
.select($"id.*", $"error.*", $"dataGroup.*", $"data.*")
.filter($"location_id" =!= -2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should've been clearer: we should filter these out at the very beginning, like if AFiCommand class, before we even run the analysis.

@manukala6 manukala6 requested a review from jterry64 September 27, 2023 15:47
Copy link
Member

@jterry64 jterry64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for figuring this out!

@manukala6 manukala6 merged commit ab4e45b into develop Sep 27, 2023
@manukala6 manukala6 deleted the bugfix/afi_ignore_certain_ids branch September 27, 2023 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants