-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Derive auxiliary for selective editing #139
Conversation
dataframe[ | ||
(dataframe[period] == previous_period) & (dataframe[question_no] == 49) | ||
][[imputation_class, construction_link, question_no]] | ||
.drop_duplicates() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should look for duplicates in imputation_class and question_no as a quick check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we have a generic validation, but something similar is done when producing SE outputs
|
||
standardising_factor["imputation_flags_adjustedresponse"] = standardising_factor[ | ||
question_output = pd.merge(standardising_factor, auxiliary_value, on=["reference", "imputation_class", "questioncode"], how="left") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't make sense to merge on reference and imputation_class, one or the other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me, Nice work :) I've made a couple of small comments and one nitpick about using .loc to remove the double indexing, not sure which method is actually better though
@@ -28,7 +30,7 @@ def calculate_predicted_value( | |||
and imputed_value | |||
|
|||
""" | |||
|
|||
# TODO: This has already been combined somewhere updstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function still needed in the pipeline. From what I can see its only called in unit tests
name of column in dataframe containing construction link variable | ||
imputation_class : str | ||
name of column in dataframe containing imputation class, where | ||
there is one contruction link per imputation_class and period |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is one contruction link per imputation_class and period | |
there is one construction link per imputation_class and period |
q40["auxiliary_value"] = q40[frozen_turnover] | ||
|
||
previous_period = period_selected - pd.DateOffset(months=1) | ||
prev_const_link = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Could maybe refactor to use .loc not sure which one is better in the long term?
dataframe.loc[
(dataframe[period] == previous_period) & (dataframe[question_no] == 49),
[imputation_class, construction_link, question_no]
]
dataframe[ | ||
(dataframe[period] == previous_period) & (dataframe[question_no] == 49) | ||
][[imputation_class, construction_link, question_no]] | ||
.drop_duplicates() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we have a generic validation, but something similar is done when producing SE outputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing PR from approved to request changes to avoid it being merged early following todays stand-up
I am continuing this work. Removing this review so that others can approve without me changing the review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, merging :)
Pull Request Title
Derive auxiliary variable for the selective editing output.
The functionality has been written and added to the output but hasn't been tested yet.
Summary
Derive the required auxiliary variables for Q49 for the selective editing output for SPP
Type of Change
Checklists
This pull request meets the following requirements:
Creator Checklist
If you feel some of these conditions do not apply for this pull request, please
add a comment to explain why.
Reviewer Checklist
Additional Information
Please provide any additional information or context that would help the reviewer understand the changes in this pull request.
Related Issues
Link any related issues or pull requests here.