
observations contained in train and validation datasets are outside of specified range #31

Open
ryanpmccaffrey opened this issue Apr 3, 2024 · 3 comments

Comments

@ryanpmccaffrey

According to the paper:

We used AIS data from January 01, 2019 to March 10, 2019 and from March 11, 2019 to March 20, 2019 to train the model and tune the hyperparameters, respectively. The test set comprises AIS data from March 21, 2019 to March 31, 2019.

Although the test set (ct_dma_test.pkl) seems to honor this date range, the train and validation sets (ct_dma_train.pkl and ct_dma_valid.pkl) seem to contain a significant number of observations from dates later in the year, outside the date ranges outlined above. Why is that?
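One way to quantify the discrepancy is to count how many observations in each file fall outside the date range the paper gives for that split. The sketch below is a minimal version that rests on several assumptions. It assumes each pickle holds a dict that maps a vessel or track id to a sequence of rows, and that each row stores a Unix timestamp at a fixed column index `ts_col`. It also treats timestamps as UTC. The layout and the column index should be checked against csv2pkl.py. `SPLITS` and `count_out_of_range` are hypothetical names, not part of the repository.

```python
from datetime import datetime, timezone

# Date ranges quoted from the paper. End dates are inclusive in the paper,
# so each upper bound here is midnight (UTC, assumed) of the following day.
SPLITS = {
    "train": (datetime(2019, 1, 1, tzinfo=timezone.utc),
              datetime(2019, 3, 11, tzinfo=timezone.utc)),
    "valid": (datetime(2019, 3, 11, tzinfo=timezone.utc),
              datetime(2019, 3, 21, tzinfo=timezone.utc)),
    "test": (datetime(2019, 3, 21, tzinfo=timezone.utc),
             datetime(2019, 4, 1, tzinfo=timezone.utc)),
}


def count_out_of_range(tracks, start, end, ts_col):
    """Return (rows outside [start, end), total rows).

    tracks: dict mapping a track id to a sequence of rows (assumed layout).
    ts_col: index of the Unix-timestamp column in each row (assumption).
    """
    lo, hi = start.timestamp(), end.timestamp()
    outside = total = 0
    for rows in tracks.values():
        for row in rows:
            total += 1
            if not (lo <= row[ts_col] < hi):
                outside += 1
    return outside, total
```

As an example, load a file with `pickle.load` on `ct_dma_train.pkl`, then call `count_out_of_range(tracks, *SPLITS["train"], ts_col=...)`. The first element of the result is the number of training rows that lie outside January 1 to March 10, 2019.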

@dnguyengithub
Collaborator

Hello, can you provide some examples?

@ryanpmccaffrey
Author

Some snippets:

[5 screenshots; images not preserved]

@dnguyengithub
Collaborator

You're right.

There was likely an error in the preprocessing step, possibly related to the date format (yyyy-mm-dd vs. yyyy-dd-mm). I will need to investigate the issue when I have time.
The preprocessing code is here: https://github.com/CIA-Oceanix/GeoTrackNet/blob/master/data/csv2pkl.py
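The hypothetical helper below shows how a day/month swap could push observations outside their split. It parses the same string twice with Python's `datetime.strptime`, once as yyyy-mm-dd and once as yyyy-dd-mm. It is an illustration only and is not taken from csv2pkl.py. For example, 2019-03-10 falls inside the training range, but read the swapped way it becomes October 3, 2019. A string whose day is greater than 12 cannot be read the swapped way at all.

```python
from datetime import datetime


def _try_parse(date_str, fmt):
    """Parse date_str with fmt, returning None if it is not a valid date."""
    try:
        return datetime.strptime(date_str, fmt)
    except ValueError:
        return None


def parse_both_ways(date_str):
    """Return (reading as yyyy-mm-dd, reading as yyyy-dd-mm).

    Either element is None when that reading is not a valid calendar date,
    e.g. a 'month' field greater than 12.
    """
    return _try_parse(date_str, "%Y-%m-%d"), _try_parse(date_str, "%Y-%d-%m")
```

Only dates whose day is at most 12 can be misread this way. A swap would therefore be expected to relocate some, not all, of the January to March records to other months.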

IMPORTANT: I've just checked, and there are no test tracks in the training set.
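A check for overlap between the training and test files can be sketched as a set intersection on (track id, timestamp) pairs. The function name and the row layout are assumptions, not the repository's code. The layout assumed here is a dict that maps an id to rows, with the timestamp at column `ts_col`. An empty result means that no observation appears in both files.

```python
def shared_observations(train_tracks, test_tracks, ts_col):
    """Return the set of (track id, timestamp) pairs present in both datasets.

    Both arguments are assumed to be dicts mapping a track id to a sequence
    of rows, with a hashable timestamp at index ts_col of each row.
    """
    def keys(tracks):
        return {(tid, row[ts_col]) for tid, rows in tracks.items() for row in rows}

    return keys(train_tracks) & keys(test_tracks)
```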
