Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

75 add option to sum in editing rates #96

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

amynickolls
Copy link
Contributor

@amynickolls amynickolls commented Oct 3, 2024

Users can now choose between multiplying or summing rates when editing these (exit rates, entry rate, and transition rates)

DynamicRateForm
Takes a dataframe, from which it instantiates multiply and add fields
Outputs a dataframe from the save function with multiply and add columns, leaves out rows where both values are empty
Negative numbers not allowed in multiply fields
One one of multiply or add can be used in each row
Displays initial data where previous values have been saved

image

Updates to JSONencoder/decoder to handle dataframes (both multiindex and single index)

Updates to combine_rates in Multinomial to handle new logic: rate * rate_change or + rate_change dependant on column. No longer left joins so users can edit rates at 0 in original prediction.

Updates to templatetags to show field errors in forms.

Unit tests added for DynamicRateForm

@amynickolls amynickolls linked an issue Oct 3, 2024 that may be closed by this pull request
Copy link
Contributor

@MichaelHanksSF MichaelHanksSF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tab1tha Can you please review everything down to and including tests.
@amynickolls Can we discuss utils and views?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tab1tha can you review this html please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tab1tha and again please?

dm_regional_app/utils.py Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amynickolls Again, I'm not quite sure of the rationale behind these changes and why the old method using combine_first() wouldn't have worked. Moreover, it looks to me like the method of comparing data with number_adjustments will always result in the old data being used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine first on paper should have worked but when testing it wasn't prioritising 0's coming in from the new data. I don't understand your comment re: old data being used - we want to keep old data where there isn't new data coming in.

ssda903/multinomial.py Show resolved Hide resolved
dm_regional_app/forms.py Show resolved Hide resolved
ssda903/multinomial.py Outdated Show resolved Hide resolved
dm_regional_app/forms.py Outdated Show resolved Hide resolved
dm_regional_app/forms.py Show resolved Hide resolved
dm_regional_app/forms.py Outdated Show resolved Hide resolved
dm_regional_app/forms.py Show resolved Hide resolved


@register.filter
def convert_df_and_dynamicrateform_to_table(df, form):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about these functions - is the end goal to create tables from the dynamic form?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we end up with the form input integrated into the table - means we can make it look like a multi-index and get the original data in there too

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, and why is the html built here rather than using a template and passing data to fill in the template?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on our call, happy for this to be a refactor in another ticket that we can tackle later 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of us might have made an issue for this in different words

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test suite 🤓

dm_regional_app/tests/test_dynamicrateform.py Outdated Show resolved Hide resolved
dm_regional_app/tests/test_dynamicrateform.py Outdated Show resolved Hide resolved
dm_regional_app/tests/test_dynamicrateform.py Show resolved Hide resolved
Comment on lines 35 to 40
# Check for Series
if "__type__" in obj and obj["__type__"] == "pd.Series":
if obj.get("is_multiindex", False):
index = pd.MultiIndex.from_tuples(obj["index"])
index = pd.MultiIndex.from_tuples(
obj["index"], names=obj.get("index_names")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding - when do Series need to be handled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now only adjusted_proportions and adjusted_costs that are series now - none will actually be multiindex now that the rates will be dataframes. We can chat through this in more detail!

Comment on lines 47 to 51
if obj.get("is_multiindex", False):
index = pd.MultiIndex.from_tuples(
obj["index"], names=obj.get("index_names")
)
return pd.DataFrame(obj["data"], columns=obj["columns"], index=index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entirely possible this is just pandas notation I don't understand 😅 but this conditional looks like if obj is not multiindex, then it is read as MultiIndex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought this but as it was working as intended left it. On further investigation it was working because (I think) the check was just seeing if it could get "is_multiindex" as when I changed the logic to if obj.get("is_multiindex") == True: it stopped working. No idea why it wasn't breaking on single index series/dfs.

On investigating this I'm noticing some weird behaviour - whenever I'm saving to the scenario, anything I've already saved is rerunning through the encoder. Is this expected behaviour?

e.g I'm saving adjusted rates but have previously saved entry rates - this will encode both items? Might be worth discussing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah sounds like it's worth discussing this one too!

ssda903/multinomial.py Show resolved Hide resolved
if "__type__" in obj and obj["__type__"] == "pd.Series":
if obj.get("is_multiindex", False):
index = pd.MultiIndex.from_tuples(obj["index"])
if obj.get("is_multiindex") == True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do

Suggested change
if obj.get("is_multiindex") == True:
if obj.get("is_multiindex"):

but up to you on what you think is more readable!

dm_regional_app/utils.py Outdated Show resolved Hide resolved
Comment on lines +85 to +94
if isinstance(obj, pd.DataFrame):
# Handle MultiIndex DataFrame
if isinstance(obj.index, pd.MultiIndex):
index = [
list(tup) for tup in obj.index
] # Convert MultiIndex to list of lists
index_names = obj.index.names # Get index names
else:
index = obj.index.tolist()
index_names = obj.index.names # Get index names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more of a pythonic question here around separation of concerns.

As python is not a typed language (but we do introduce some via the typing library) when there are type checks within functions (both here and above) it makes me think about if we are trying to do "too much" with one class. I'm curious for your thoughts on having two encoders and decoders - one that handles series and one that handles dataframes. In my mind that would help with clarity i.e. DataFrameAwareJSONEncoder vs SeriesAwareJSONEncoder. And could help with tests as well - more specific tests to target individual functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyramic I would also be curious for your thoughts here.
@amynickolls I don't see this as a blocker to the code merge - just an interesting question that could shape some of our design decisions going forward!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind splitting them - they do handle dates as well which I think would need to be a secondary function of each en/decoder anyway (although I can test this - it's possible that this was mainly an issue in the dictionaries, but assume it would be the same for dates in columns).

Copy link

@cyramic cyramic Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that separating them would make them more maintainable and more easily testable. I'm also thinking that since they are similar structures there may be an argument for maybe a design pattern to do this such that all encoders defined of this type follow a standard structure. Maybe something like a Builder or maybe one of the Factory patterns? That might be overdoing it though, so happy to get push back as I don't want to over-engineer this. What do you think @christina-moore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, I think a Factory/AbstractFactory pattern could be useful here - it doesn't feel like there is quite enough complexity/steps for the Builder, but it would be good to implement the open/closed principle for future updates

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy for this to be pulled into another PR/issue for clarity

dm_regional_app/utils.py Outdated Show resolved Hide resolved
dm_regional_app/utils.py Show resolved Hide resolved
dm_regional_app/views.py Show resolved Hide resolved
Copy link
Contributor

@MichaelHanksSF MichaelHanksSF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with this - thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the change to a function - thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to sum in editing rates
5 participants