pep_sex_2024 changes made #1110
base: master
@kurus21 Can you remove the input and output folders and confirm?
Remove this file
The file has been removed.
Thanks Kuru. Looks good.
skiprows=7,
skipfooter=102,
header=None)
df.columns = [
Please use df.rename() instead of relying on the column order.
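To illustrate the suggestion, here is a minimal sketch of renaming by label rather than assigning df.columns positionally. 'YEAR' and 'POPEST_FEM' appear elsewhere in this diff; 'POPEST_MALE' is an assumed name for the male counterpart, and RENAME_MAP and standardize are hypothetical helpers, not code from the PR.

```python
import pandas as pd

# Explicit source-to-target mapping. 'YEAR' and 'POPEST_FEM' appear in the
# diff; 'POPEST_MALE' is an ASSUMED name for the male counterpart.
RENAME_MAP = {
    'YEAR': 'Year',
    'POPEST_MALE': 'Count_Person_Male',
    'POPEST_FEM': 'Count_Person_Female',
}


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    # Rename by label rather than position; errors='raise' makes a missing
    # source column fail loudly instead of silently mislabeling data.
    return df.rename(columns=RENAME_MAP, errors='raise')


raw = pd.DataFrame({'YEAR': [2024], 'POPEST_FEM': [110], 'POPEST_MALE': [105]})
# Same data with the columns shuffled, as if the source layout changed.
shuffled = raw[['POPEST_MALE', 'YEAR', 'POPEST_FEM']]

a, b = standardize(raw), standardize(shuffled)
print(a['Count_Person_Male'].tolist(), b['Count_Person_Male'].tolist())
```

Both frames yield the same values under the same target names, which is the point of the review comment. When a sheet is read with header=None the columns are integer positions, so a label-based mapping may require reading the header row (for example through the header argument of pd.read_excel).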
skipfooter=102,
header=None)
df.columns = [
Please use df.rename() here as well.
'White Total', 'White Male', 'White Female', 'NonWhite Total',
'NonWhite Male', 'NonWhite Female'
]
df = df.drop(columns=[
It would be more readable to list the columns of interest to be retained:
df.drop(columns=df.columns.difference(['Count_Person_Male', 'Count_Person_Female']), inplace=True)
Then it can be moved outside the if/else block.
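A minimal sketch of the "list what to keep" approach, wrapped in a hypothetical helper (keep_only) that also checks the retained columns exist, since Index.difference silently ignores names that are absent. The sample frame reuses column names from the diff with invented values.

```python
import pandas as pd

# Columns to retain, as suggested in the review comment.
KEEP = ['Count_Person_Male', 'Count_Person_Female']


def keep_only(df: pd.DataFrame, keep: list) -> pd.DataFrame:
    """Return df restricted to `keep`, failing loudly if any are absent."""
    missing = set(keep) - set(df.columns)
    if missing:
        raise KeyError(f"expected columns not found: {sorted(missing)}")
    # difference() yields every column NOT in `keep`; dropping those leaves
    # the retained columns in their original order.
    return df.drop(columns=df.columns.difference(keep))


df = pd.DataFrame({
    'White Total': [10], 'Count_Person_Male': [4],
    'NonWhite Total': [6], 'Count_Person_Female': [6],
})
print(list(keep_only(df, KEEP).columns))
```

Because the call no longer depends on which branch produced df, it can run once after the if/else.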
# adding geoid, year and measurement method
df['Year'] = year
df.insert(0, 'geo_ID', 'country/USA', True)
df['Measurement_Method'] = 'dcAggregate/CensusPEPSurvey_PartialAggregate'
This seems common to both the if and else branches and can be moved out of the block.
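A minimal sketch of the suggested restructuring: only branch-specific work stays inside the if/else, and the shared steps from the diff run once afterwards. The branch condition (wide_layout), the rename maps and the helper names are hypothetical placeholders, not the PR's actual logic.

```python
import pandas as pd

MEASUREMENT_METHOD = 'dcAggregate/CensusPEPSurvey_PartialAggregate'


def add_common_columns(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Steps shared by both branches, copied from the diff above."""
    df = df.copy()
    df['Year'] = year
    df.insert(0, 'geo_ID', 'country/USA', allow_duplicates=True)
    df['Measurement_Method'] = MEASUREMENT_METHOD
    return df


def process(df: pd.DataFrame, year: int, wide_layout: bool) -> pd.DataFrame:
    # HYPOTHETICAL branch condition and rename maps: only the
    # branch-specific work stays inside the if/else.
    if wide_layout:
        df = df.rename(columns={'M': 'Count_Person_Male',
                                'F': 'Count_Person_Female'})
    else:
        df = df.rename(columns={'MALE': 'Count_Person_Male',
                                'FEMALE': 'Count_Person_Female'})
    # Shared steps run once, after either branch.
    return add_common_columns(df, year)


out = process(pd.DataFrame({'M': [1], 'F': [2]}), 2024, wide_layout=True)
print(list(out.columns))
```

Keeping the shared steps in one function means a change to geo_ID or Measurement_Method only has to be made in one place.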
for col in float_col.columns.values:
df[col] = df[col].astype('int64')
df[col] = df[col].astype("str").str.replace("-1", "")
df.rename(columns={'SEX': 'Year'}, inplace=True)
Why is the column 'SEX' being renamed to 'Year' here and in the functions below?
'POPEST_FEM': 'Count_Person_Female',
'YEAR': 'Year'
})
df = df.drop(columns=[
It may be easier to do df.drop(columns=df.columns.difference([...])).
'Count_Person_Male', 'Count_Person_Female'
]
df = pd.read_excel(file_path, skiprows=5, skipfooter=7, header=None)
df.columns = column_name
Please use df.rename() here too.
'July2022Female',
'July2023Male',
'July2023Female',
'2023Total',
Can we generalize this to 2024 and future years?
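One way to address this is to generate the names from a year range instead of hard-coding them. This is a minimal sketch, assuming the sheet keeps the pattern visible in the diff (a Male and a Female column per July estimate followed by a '<year>Total' column). The part of the list before 'July2022Female' is not shown, so the exact layout is an assumption, and build_column_names is a hypothetical helper.

```python
def build_column_names(first_year: int, last_year: int) -> list:
    """Generate July estimate column names for a range of years.

    ASSUMED layout, extrapolated from the names in the diff: a Male and a
    Female column per July estimate, then a single '<last_year>Total'.
    """
    names = []
    for year in range(first_year, last_year + 1):
        names.extend([f'July{year}Male', f'July{year}Female'])
    names.append(f'{last_year}Total')
    return names


print(build_column_names(2022, 2024))
```

The last year could be taken from the input file name rather than a constant, and the generated names could feed a df.rename() mapping instead of a positional df.columns assignment.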
"sc-est2023-syasex-": _state_2023,
"sc-est2023-agesex-": _state_2023,
"cc-est2023-agesex-": _county_2023,
"cc-est2023-agesex-a": _county_2023
Can we also extend this to handle future years, assuming the same format?
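A minimal sketch of one way to do this: match the file-name prefix with a regular expression that captures the year, then pass that year to a year-agnostic handler. It assumes later files keep the same naming and layout. _state and _county are hypothetical stand-ins for _state_2023 and _county_2023, and the example file names are invented.

```python
import re


# HYPOTHETICAL stand-ins for the _state_2023 / _county_2023 handlers.
def _state(file_name: str, year: int) -> tuple:
    return ('state', year)


def _county(file_name: str, year: int) -> tuple:
    return ('county', year)


# One pattern per file family; the year is captured instead of hard-coded.
# 'cc-est2023-agesex-a...' is covered by the county pattern, since it also
# starts with 'cc-est2023-agesex-'.
_PATTERNS = [
    (re.compile(r'sc-est(\d{4})-(?:syasex|agesex)-'), _state),
    (re.compile(r'cc-est(\d{4})-agesex-'), _county),
]


def dispatch(file_name: str) -> tuple:
    for pattern, handler in _PATTERNS:
        match = pattern.match(file_name)  # match() anchors at the start
        if match:
            return handler(file_name, int(match.group(1)))
    raise ValueError(f'unrecognized file name: {file_name}')


print(dispatch('sc-est2023-syasex-01.csv'))
print(dispatch('cc-est2025-agesex-01.csv'))
```

If a future vintage changes its layout, a year check inside the handler can route it to separate parsing logic without touching the dispatch table.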
No description provided.