generic.concat keep average #739

Draft · wants to merge 4 commits into base: main
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@
[![Full test](https://github.com/DHI/mikeio/actions/workflows/full_test.yml/badge.svg)](https://github.com/DHI/mikeio/actions/workflows/full_test.yml)
[![PyPI version](https://badge.fury.io/py/mikeio.svg)](https://badge.fury.io/py/mikeio)
![OS](https://img.shields.io/badge/OS-Windows%20%7C%20Linux-blue)
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://github.com/DHI/mikeio/blob/main/License.txt)

Read, write and manipulate dfs0, dfs1, dfs2, dfs3, dfsu and mesh files.

1 change: 1 addition & 0 deletions docs/index.qmd
@@ -4,6 +4,7 @@
# MIKE IO: input/output of MIKE files in Python
![Python version](https://img.shields.io/pypi/pyversions/mikeio.svg)
[![PyPI version](https://badge.fury.io/py/mikeio.svg)](https://badge.fury.io/py/mikeio)
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://github.com/DHI/mikeio/blob/main/License.txt)

Read, write and manipulate dfs0, dfs1, dfs2, dfs3, dfsu and mesh files.

80 changes: 78 additions & 2 deletions mikeio/generic.py
@@ -526,8 +526,8 @@ def concat(
                darray = d.astype(np.float32)

                dfs_o.WriteItemTimeStepNext(0, darray)
        end_time = start_time + timedelta(
            seconds=timestep * dt
        )  # reuse last timestep since there is no EndDateTime attribute in t_axis.
        dfs_i.Close()

@@ -552,6 +552,82 @@ def concat(
        )  # get end time from current file
        dfs_i.Close()

if keep == "average":
    if i == 0:
        # For first file, write all timesteps normally
Member: Isn't it better to calculate the timestep where the overlap occurs, calculate the average, write that and then the rest, instead of writing the entire contents and then trying to overwrite?

Member: Agree. I would suggest an algorithm like this (file1 starts before file2):

  • open file1
  • write file1 timesteps until the start of file2
  • open file2
  • write the overlapping steps as alpha*data1 + (1-alpha)*data2 until the end of file1
  • close file1
  • write the rest of file2
  • close file2

Using the alpha notation above, it will be super simple to later add a keep="linear" that linearly mixes the two in the overlapping period.

Collaborator (Author): Don't shoot the messenger, I ChatGPT'ed this solution due to my lack of mikecore competences.
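The reviewer's alpha-blend algorithm can be sketched without any dfs machinery. The following is a minimal NumPy illustration, not mikeio API: `blend_concat` and its arguments are hypothetical names, plain arrays of shape `(n_timesteps, n_values)` stand in for the files' item data, and `alpha=0.5` reproduces `keep="average"`, while a time-varying alpha would give the proposed `keep="linear"`.

```python
import numpy as np

def blend_concat(data1, data2, n_overlap, alpha=0.5):
    """Concatenate two time-aligned arrays whose last/first n_overlap
    timesteps coincide, blending the overlap as
    alpha * data1 + (1 - alpha) * data2."""
    head = data1[: len(data1) - n_overlap]      # file1 before the overlap
    blended = (
        alpha * data1[len(data1) - n_overlap :] + (1 - alpha) * data2[:n_overlap]
    )                                           # overlapping period
    tail = data2[n_overlap:]                    # rest of file2
    return np.concatenate([head, blended, tail])

a = np.array([[50.0], [51.0]])   # file1: two timesteps
b = np.array([[52.0], [53.0]])   # file2: two timesteps, one overlapping
blend_concat(a, b, n_overlap=1)  # → [[50.0], [51.5], [53.0]]
```

With this shape, the write loop never needs to overwrite anything already written to the output file, which sidesteps the Flush/reopen problems discussed below in this thread.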

        for timestep in range(n_time_steps):
            current_time = start_time + timedelta(seconds=timestep * dt)

            for item in range(n_items):
                itemdata = dfs_i.ReadItemTimeStep(int(item + 1), int(timestep))
                darray = itemdata.Data.astype(np.float32)
                dfs_o.WriteItemTimeStepNext(0, darray)
        end_time = start_time + timedelta(seconds=(n_time_steps - 1) * dt)
        first_start_time = start_time  # Store the start time of the first file
        dfs_i.Close()
    else:
        # For subsequent files, we need to handle overlapping periods
        # Calculate overlap in timesteps
        if start_time <= end_time:
            overlap_end = int((end_time - start_time).total_seconds() / dt) + 1

            # Read current file data
            for timestep in range(n_time_steps):
                current_time = start_time + timedelta(seconds=timestep * dt)

                for item in range(n_items):
                    itemdata = dfs_i.ReadItemTimeStep(int(item + 1), int(timestep))
                    current_data = itemdata.Data.astype(np.float32)

                    if int(timestep) <= overlap_end:  # convert back to int for comparison
                        # In overlapping period
                        existing_pos = int(
                            (current_time - first_start_time).total_seconds() / dt - 1
                        )
                        if existing_pos >= 0:
                            # Read existing data
                            dfs_o.Flush()
                            temp_dfs = DfsFileFactory.DfsGenericOpen(str(outfilename))
Comment on lines +596 to +599

Collaborator (Author): This is also kind of crazy. It crashes and complains that it cannot open outfilename, but if I do dfs_o.Flush() again after it crashed, then it works. It's almost like I need to add a try/except and flush twice to make it run.

Collaborator (Author) @daniel-caichac-DHI, Nov 1, 2024: [screenshot] I will make a new failing test on Monday, I have some data where I know the expected output.

Collaborator (Author): Ok, I already uploaded a failing unit test.

                            existing_data = temp_dfs.ReadItemTimeStep(
                                int(item + 1), int(existing_pos)
                            ).Data
                            temp_dfs.Close()

                            # Calculate average
                            averaged_data = (existing_data + current_data) / 2
Collaborator (Author): This is calculating the average as soon as the dfs2 files overlap.

                            # Write averaged data
                            # dfs_o.WriteItemTimeStep(
                            #     item + 1, existing_pos, 0, averaged_data
                            # )  # This is what we need, but it does not work
                            dfs_o.WriteItemTimeStepNext(0, averaged_data)
Collaborator (Author): But this line is wrong, since it is 'appending' at the end. I tried with WriteItemTimeStep but could not make it work.

Member: @ryan-kipawa Is this something that you would like to dig into?


                    else:
                        # After overlap period - write new data
                        dfs_o.WriteItemTimeStepNext(0, current_data)

        else:
            # No overlap - write all timesteps
            for timestep in range(n_time_steps):
                current_time = start_time + timedelta(seconds=timestep * dt)
                for item in range(n_items):
                    itemdata = dfs_i.ReadItemTimeStep(int(item + 1), int(timestep))
                    darray = itemdata.Data.astype(np.float32)
                    dfs_o.WriteItemTimeStepNext(0, darray)

        end_time = start_time + timedelta(seconds=(n_time_steps - 1) * dt)
        dfs_i.Close()

dfs_o.Close()
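The off-by-one trouble around existing_pos above comes down to mapping wall-clock times onto timestep indices in the output. A small, file-free sketch of that mapping (the function name and the example times are illustrative, assuming both files share the same timestep dt in seconds):

```python
from datetime import datetime, timedelta

def overlap_indices(start1, n1, start2, n2, dt):
    """Return (pos_in_file1, pos_in_file2) index pairs for the timesteps
    the two files share, assuming a common timestep dt in seconds."""
    end1 = start1 + timedelta(seconds=(n1 - 1) * dt)
    pairs = []
    for j in range(n2):
        t = start2 + timedelta(seconds=j * dt)
        if t > end1:
            break  # past the end of file1: no more overlap
        i = int(round((t - start1).total_seconds() / dt))
        if 0 <= i < n1:
            pairs.append((i, j))
    return pairs

# file1: 00:00-02:00 hourly (3 steps), file2: 01:00-03:00 hourly (3 steps)
pairs = overlap_indices(datetime(2024, 1, 1, 0), 3,
                        datetime(2024, 1, 1, 1), 3, 3600)
# → [(1, 0), (2, 1)]
```

Computing these index pairs up front would let the writer average the overlapping steps before anything is written, rather than flushing and reopening the output to patch values afterwards.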


11 changes: 11 additions & 0 deletions tests/test_generic.py
@@ -235,6 +235,17 @@ def test_concat_overlapping(tmp_path):
    assert len(ds.time) == 145


def test_concat_average(tmp_path):
    infilename_a = "tests/testdata/generic_concat_file1.dfs2"
    infilename_b = "tests/testdata/generic_concat_file2.dfs2"
    fp = tmp_path / "concat.dfs2"

    mikeio.generic.concat([infilename_a, infilename_b], fp, keep="average")

    ds = mikeio.read(fp)
    assert ds[0][2].values[0][0] == 51.5


def test_concat_files_gap_fail(tmp_path):
    infilename_a = "tests/testdata/tide1.dfs1"
    infilename_b = "tests/testdata/tide4.dfs1"
Binary file added tests/testdata/generic_concat_file1.dfs2
Binary file added tests/testdata/generic_concat_file2.dfs2