
Workaround fuel duplicated model #490

Merged: 2 commits merged into main from kp/workaround-fuel-dup on Apr 25, 2024
Conversation

@koonpeng (Contributor) commented Apr 4, 2024

Bug fix

There is a duplicated model in Fuel which breaks pit crew's cache. To avoid breaking existing caches, I just added an exception for that model.
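For context, a minimal sketch of the kind of exception described above, assuming the workaround special-cases a known duplicated (model, author) pair in the cache check; the constant and helper names below are hypothetical, not the actual PR code:

```python
# Hypothetical illustration only: treat a known duplicated Fuel model as an
# exception so it never triggers the "already cached" handling.
KNOWN_DUPLICATES = set()  # e.g. {("SomeModel", "some_author")} -- placeholder entries

def is_cache_hit(model_name, author_name, old_cache):
    """Return True for genuine cache hits, ignoring known duplicated models."""
    if (model_name, author_name) in KNOWN_DUPLICATES:
        return False  # known duplicate: never treat it as a cache hit
    return (model_name, author_name) in old_cache["model_cache"]
```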


Signed-off-by: Teo Koon Peng <[email protected]>
@luca-della-vedova (Member) commented Apr 4, 2024

I tried a different approach that actually skips the model instead of breaking. It seems to work for me, but admittedly I'm not sure why a duplicate name was made a breaking condition, or whether there is a corner case where we might end up looping endlessly (although that doesn't seem like it should happen):

diff --git a/rmf_building_map_tools/pit_crew/pit_crew.py b/rmf_building_map_tools/pit_crew/pit_crew.py
index 459e81a..56040f6 100644
--- a/rmf_building_map_tools/pit_crew/pit_crew.py
+++ b/rmf_building_map_tools/pit_crew/pit_crew.py
@@ -840,7 +840,7 @@ def build_and_update_cache(cache_file_path=None, write_to_cache=True,
     # RFE: Doing this asynchronously will significantly speed this up.
     # Any solution must guarantee:
     #   - All new models are guaranteed to be added to the cache
-    #   - Loop breaks as soon as we start pulling up already cached models
+    #   - Models that are already cached are not added
     while status == 200 and not break_flag:
         logger.info("Fetching page: %d" % page)
 
@@ -853,12 +853,11 @@ def build_and_update_cache(cache_file_path=None, write_to_cache=True,
                 model_name = model.get("name", "")
                 author_name = model.get("owner", "")
 
-                # If a cached model was found, halt
+                # If a cached model was found, skip
                 if (model_name, author_name) in old_cache['model_cache']:
                     logger.info("Cached model found! "
-                                "Halting Fuel traversal...")
-                    break_flag = True
-                    break
+                                f"Skipping model {model_name} by {author_name}")
+                    continue
                 # Otherwise, add it to the cache
                 else:
                     new_cache_count += 1

Does this make sense? CC @methylDragon

Edit: Oh I see that it would always update the model cache if it doesn't exit early
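For illustration, a self-contained sketch of the skip-instead-of-halt behaviour in the diff above (the `fetch_page` helper and its return shape are assumptions for this sketch, not pit_crew's actual fetching code):

```python
def update_cache_skipping_duplicates(old_cache, fetch_page):
    """Traverse every Fuel page, skipping models that are already cached."""
    new_models = []
    page = 1
    while True:
        status, models = fetch_page(page)  # assumed helper: (HTTP status, list of model dicts)
        if status != 200 or not models:
            break
        for model in models:
            key = (model.get("name", ""), model.get("owner", ""))
            if key in old_cache["model_cache"]:
                continue  # already cached: skip it, but keep traversing the listing
            new_models.append(key)
        page += 1
    return new_models
```

As the edit above notes, this variant always walks every page, which is the trade-off picked up in the later comments.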

@koonpeng (Contributor, Author) commented Apr 4, 2024

The best approach would be to clean up the Fuel DB; that particular duplicated model has 0 downloads so far.

@methylDragon (Collaborator) commented Apr 4, 2024

What I did locally was to make the model name key a three-member tuple of (model name, author name, creation timestamp), sketched below; this fixes the issue since the DB's current duplicated models only differ in their timestamps.

But then that needed some other changes elsewhere that I wasn't confident of pushing in because it might lead to conflicts with preexisting model caches 😬

Best is to just fix the DB though; Ian said that the model-author pair is supposed to be unique.
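A minimal sketch of that three-member keying, assuming the creation timestamp is available on each model record (the field names here are illustrative, not necessarily the Fuel API's actual fields):

```python
from typing import Dict, Tuple

# Illustrative cache key that includes the creation timestamp, so the two
# duplicated (name, owner) entries can coexist in the cache.
CacheKey = Tuple[str, str, str]  # (model_name, author_name, creation_timestamp)

def cache_key(model: Dict[str, str]) -> CacheKey:
    return (
        model.get("name", ""),
        model.get("owner", ""),
        model.get("created_at", ""),  # assumed timestamp field name
    )
```

Existing caches keyed on two-member tuples would not match keys like these, which is the migration concern mentioned above.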

@methylDragon (Collaborator) commented Apr 4, 2024

> I tried a different approach that actually skips the model instead of breaking [...]
>
> Does this make sense? CC @methylDragon
>
> Edit: Oh I see that it would always update the model cache if it doesn't exit early

And yes, exiting early is important, otherwise builds will get unnecessarily extended (and Fuel will get slammed with model list requests).

The models are always presented such that the latest models are shown first, which, when paired with the invariant that model-author pairs are unique, means we can halt accordingly.
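For contrast with the skip variant, a sketch of the early-halt behaviour that relies on the two invariants stated above (newest models listed first, unique model-author pairs); `fetch_page` is again an assumed helper rather than pit_crew's real code:

```python
def update_cache_with_early_halt(old_cache, fetch_page):
    """Stop traversing Fuel as soon as an already-cached model appears."""
    new_models = []
    page = 1
    while True:
        status, models = fetch_page(page)  # assumed helper: (HTTP status, list of model dicts)
        if status != 200 or not models:
            break
        for model in models:
            key = (model.get("name", ""), model.get("owner", ""))
            if key in old_cache["model_cache"]:
                # Everything after this point is older and therefore already cached,
                # so we can stop without scanning the remaining pages.
                return new_models
            new_models.append(key)
        page += 1
    return new_models
```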

@methylDragon (Collaborator) commented

See what I did here: #492

I haven't refined it but the basic idea is there.

@luca-della-vedova merged commit e1d7a3e into main on Apr 25, 2024 (5 checks passed)
@luca-della-vedova deleted the kp/workaround-fuel-dup branch on April 25, 2024 at 07:18
koonpeng added a commit that referenced this pull request Jun 12, 2024
Signed-off-by: Teo Koon Peng <[email protected]>