Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-7646 TestCache concurrency issues #3450

Merged
merged 2 commits into from
Jan 11, 2024

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Jan 10, 2024

  • fix the TestCacheBundle and TestCacheResult caches to be concurrent safe
  • was using concurrent safe maps, but not computing the results safely.

CLDR-7646

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

- fix the TestCacheBundle and TestCacheResult caches to be concurrent safe
@srl295 srl295 requested review from macchiati and btangmu January 10, 2024 21:37
@srl295 srl295 self-assigned this Jan 10, 2024
@srl295
Copy link
Member Author

srl295 commented Jan 10, 2024

@btangmu i have this same question:

    /*
     * TODO: document whether CLDR_TESTCACHE_SIZE is set on production server, and if so to what, and why;
     * evaluate why the fallback 12 for CLDR_TESTCACHE_SIZE is appropriate or too small. Consider not
     * using maximumSize() at all, depending on softValues() instead to garbage collect only when needed.
     */

it's NOT set in production. So we are only caching 12 values at best.

@srl295
Copy link
Member Author

srl295 commented Jan 10, 2024

This PR seems to cause the dashboard concurrency errors to no longer occur.

@srl295 srl295 force-pushed the cldr-7646/testcache-concurrency branch from baa59ed to 6a5d231 Compare January 10, 2024 21:55
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • tools/cldr-code/src/main/java/org/unicode/cldr/test/TestCache.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is there a reasonable test that we can add to detect problems such as these?

@srl295
Copy link
Member Author

srl295 commented Jan 10, 2024

Question: is there a reasonable test that we can add to detect problems such as these?

  • CLDR-17304 capture internalError check stack traces into the system log #3447 makes sure that these internal error stack traces (such as concurrency and other issues) would get logged, so DataDog / server console would pick it up.
  • ConsoleCheck does call TestCache in a threaded way, perhaps not in the right order.
  • SurveyTool has typically run in a single threaded environment, so calls into this would not be contested.
  • That said, we could add a threaded TestCache stress test, that could be a good idea. This change now calls each CheckCLDR within a single thread though, but can read the cache from multiple threads.

@srl295 srl295 merged commit f6aa620 into unicode-org:main Jan 11, 2024
10 checks passed
@srl295 srl295 deleted the cldr-7646/testcache-concurrency branch January 11, 2024 00:15
@btangmu
Copy link
Member

btangmu commented Jan 11, 2024

@btangmu i have this same question:

    /*
     * TODO: document whether CLDR_TESTCACHE_SIZE is set on production server, and if so to what, and why;
     * evaluate why the fallback 12 for CLDR_TESTCACHE_SIZE is appropriate or too small. Consider not
     * using maximumSize() at all, depending on softValues() instead to garbage collect only when needed.
     */

it's NOT set in production. So we are only caching 12 values at best.

@srl295 @macchiati Per git blame I added that comment 2019-10-23! In the meantime I've come to believe that TODO comments should always include ticket numbers, and tickets shouldn't be closed until their TODOs are all done or moved to other open tickets. Anyway, it looks like CLDR_TESTCACHE_SIZE limits the number of TestResultBundle objects that are cached, and each TestResultBundle is for a specific locale (and coverage level, etc). So, max 12 locales have cached test results at any point in time. It's plausible that there are rarely vetters at work on more than 12 locales simultaneously. However, performance may suffer whenever a multiple-locale operation such as Priority Items Summary is run. What should we do? Documentation says "in most circumstances it is better to set a per-cache maximum size instead of using soft references" -- but our code is already using BOTH max size and soft ref. It's tempting to get rid of the max since it's so arbitrary, and trust (hope? verify?) that soft refs will give good results on their own.

@srl295
Copy link
Member Author

srl295 commented Jan 11, 2024

@btangmu you put some important analysis that i missed - this is the number of locales not the number of xpaths. 12 locales is definitely going to be beneficial, 12 xpaths might not be worth doing.

I think I remember something about how soft refs (or was that weak refs?) might not get evicted in a cpu- and io- starved situation, (such as recalculating priority items summary). So there is some risk there. I wouldn't object to removing the limit, if it was tested with priority items summary etc.

@btangmu
Copy link
Member

btangmu commented Jan 11, 2024

I've made a new ticket about CLDR_TESTCACHE_SIZE: https://unicode-org.atlassian.net/browse/CLDR-17308

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants