generic.concat keep average #739
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -526,8 +526,8 @@ def concat( | |
darray = d.astype(np.float32) | ||
|
||
dfs_o.WriteItemTimeStepNext(0, darray) | ||
end_time = ( | ||
start_time + timedelta(seconds=timestep * dt) | ||
end_time = start_time + timedelta( | ||
seconds=timestep * dt | ||
) # reuse last timestep since there is no EndDateTime attribute in t_axis. | ||
dfs_i.Close() | ||
|
||
|
@@ -552,6 +552,82 @@ def concat( | |
) # get end time from current file | ||
dfs_i.Close() | ||
|
||
if keep == "average": | ||
if i == 0: | ||
# For first file, write all timesteps normally | ||
for timestep in range(n_time_steps): | ||
current_time = start_time + timedelta(seconds=timestep * dt) | ||
|
||
for item in range(n_items): | ||
itemdata = dfs_i.ReadItemTimeStep(int(item + 1), int(timestep)) | ||
darray = itemdata.Data.astype(np.float32) | ||
dfs_o.WriteItemTimeStepNext(0, darray) | ||
end_time = start_time + timedelta(seconds=(n_time_steps - 1) * dt) | ||
first_start_time = start_time # Store the start time of first file | ||
dfs_i.Close() | ||
else: | ||
# For subsequent files, we need to handle overlapping periods | ||
# Calculate overlap in timesteps | ||
if start_time <= end_time: | ||
|
||
overlap_end = int((end_time - start_time).total_seconds() / dt) + 1 | ||
|
||
# Read current file data | ||
for timestep in range(n_time_steps): | ||
current_time = start_time + timedelta(seconds=timestep * dt) | ||
|
||
for item in range(n_items): | ||
itemdata = dfs_i.ReadItemTimeStep( | ||
int(item + 1), int(timestep) | ||
) | ||
current_data = itemdata.Data.astype(np.float32) | ||
|
||
if ( | ||
int(timestep) <= overlap_end | ||
): # Convert back to int for comparison | ||
# In overlapping period | ||
existing_pos = int( | ||
(current_time - first_start_time).total_seconds() | ||
/ dt | ||
- 1 | ||
) | ||
if existing_pos >= 0: | ||
# Read existing data | ||
dfs_o.Flush() | ||
temp_dfs = DfsFileFactory.DfsGenericOpen( | ||
str(outfilename) | ||
) | ||
Comment on lines +596 to +599
This is also kind of crazy. It crashes and complains that it cannot open outfilename, but if I do dfs_o.Flush()
Ok, I already uploaded a failing unit test. |
||
existing_data = temp_dfs.ReadItemTimeStep( | ||
int(item + 1), int(existing_pos) | ||
).Data | ||
temp_dfs.Close() | ||
|
||
# Calculate average | ||
averaged_data = (existing_data + current_data) / 2 | ||
This calculates the average as soon as the dfs2 files overlap. |
||
# Write averaged data | ||
# dfs_o.WriteItemTimeStep( | ||
# item + 1, existing_pos, 0, averaged_data | ||
# ) # This is what we need, but it does not wo | ||
dfs_o.WriteItemTimeStepNext(0, averaged_data) | ||
But this line is wrong, since it is 'appending' at the end.
@ryan-kipawa Is this something that you would like to dig into? |
||
|
||
else: | ||
# After overlap period - write new data | ||
dfs_o.WriteItemTimeStepNext(0, current_data) | ||
|
||
else: | ||
# No overlap - write all timesteps | ||
for timestep in range(n_time_steps): | ||
current_time = start_time + timedelta(seconds=timestep * dt) | ||
for item in range(n_items): | ||
itemdata = dfs_i.ReadItemTimeStep( | ||
int(item + 1), int(timestep) | ||
) | ||
darray = itemdata.Data.astype(np.float32) | ||
dfs_o.WriteItemTimeStepNext(0, darray) | ||
|
||
end_time = start_time + timedelta(seconds=(n_time_steps - 1) * dt) | ||
dfs_i.Close() | ||
|
||
dfs_o.Close() | ||
|
||
|
||
|
Isn't it better to calculate the timestep where the overlap occurs, calculate the average, write that, and then write the rest, instead of writing the entire contents and then trying to overwrite?
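A minimal sketch of the first step of that idea, finding the timestep in the first file where the second file begins. It uses only the standard library and does not touch mikecore; the function name `first_overlap_step` and its parameters are hypothetical, and it assumes both files share the same `dt` and that the second file does not start before the first.

```python
from datetime import datetime, timedelta


def first_overlap_step(
    start1: datetime, n_steps1: int, start2: datetime, dt: float
) -> int:
    """Index in file 1 of the timestep that coincides with file 2's first timestep.

    Returns n_steps1 when file 2 starts after the last timestep of file 1,
    i.e. when there is no overlap. (Hypothetical helper, not part of the PR.)
    """
    offset = (start2 - start1).total_seconds() / dt
    if offset < 0:
        raise ValueError("file 2 must not start before file 1")
    if abs(offset - round(offset)) > 1e-6:
        raise ValueError("file 2 is not aligned with file 1's time axis")
    return min(int(round(offset)), n_steps1)


# File 1: 5 hourly steps from midnight; file 2 starts 3 hours later.
t0 = datetime(2020, 1, 1)
step = first_overlap_step(t0, 5, t0 + timedelta(hours=3), dt=3600.0)
```

With this index known up front, timesteps before it can be written straight from the first file, and only the overlapping ones need to be combined before writing, so nothing already written has to be overwritten.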
Agree. I would suggest an algorithm like this (file1 starts before file2):
Using the alpha notation above, it will be super simple to later add a keep="linear" option that linearly mixes the two in the overlapping period.
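The algorithm referred to in this comment did not survive on this page, so the following is only an illustrative sketch of what an alpha-weighted merge could look like, not the reviewer's actual proposal. It works on in-memory NumPy arrays rather than dfs files. The function name `concat_with_blend`, the meaning of `alpha` (weight given to file 1 in the overlap), and the requirement that file 2 ends no earlier than file 1 are all assumptions.

```python
import numpy as np


def concat_with_blend(data1, data2, overlap_start, alpha=0.5):
    """Concatenate two (time, ...) arrays, blending the overlapping timesteps.

    data2's first timestep is assumed to coincide with data1[overlap_start].
    alpha is the weight on data1 in the overlap: a scalar, or a 1-D array
    with one value per overlapping timestep. (Hypothetical helper.)
    """
    data1 = np.asarray(data1, dtype=np.float32)
    data2 = np.asarray(data2, dtype=np.float32)
    n_overlap = len(data1) - overlap_start
    if n_overlap < 0 or len(data2) < n_overlap:
        raise ValueError("file 2 must start within file 1 and end no earlier")

    alpha = np.asarray(alpha, dtype=np.float32)
    if alpha.ndim == 1:
        # Make a per-timestep weight broadcast over the spatial axes.
        alpha = alpha.reshape((-1,) + (1,) * (data1.ndim - 1))

    head = data1[:overlap_start]  # only in file 1
    mixed = alpha * data1[overlap_start:] + (1 - alpha) * data2[:n_overlap]
    tail = data2[n_overlap:]  # only in file 2
    return np.concatenate([head, mixed, tail], axis=0)


file1 = np.ones((4, 1))        # 4 timesteps, value 1
file2 = np.full((3, 1), 3.0)   # 3 timesteps, value 3, starting at file1[2]
averaged = concat_with_blend(file1, file2, overlap_start=2)
linear = concat_with_blend(file1, file2, 2, alpha=np.linspace(1.0, 0.0, 2))
```

Under these assumptions, `alpha=0.5` corresponds to keep="average", and a ramp from 1 to 0 across the overlap corresponds to a possible keep="linear".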
Don't shoot the messenger: I ChatGPT'd this solution due to my lack of mikecore competence.