Use multiple CellDb instances to read from celldb concurrently #1363

fatcat22 · 2024-11-05T09:27:57Z

Background

When running a TON liteserver that we built ourselves, we noticed that as the request volume increases, the response time of some requests (such as GetAccountState) gets slower, and many of them fail with timeout errors.

After investigating, we found that most of the processing time of a single GetAccountState request is spent waiting for CellDb::load_cell to be scheduled. Below are the timing statistics we added, captured for one specific GetAccountState request:

ton-node-1  | [ 2][t27][2024-10-30 06:04:20.001725500][query_stat.cpp:105][!litequery]        query stat counter:1205. perform_getAccountState schedule cost: 1463946μs
ton-node-1  | ValidatorManagerImpl::get_block_data_for_litequery schedule cost: 19310μs
ton-node-1  | LiteQuery::request_mc_block_data cost: 9μs. 
ton-node-1  | ValidatorManagerImpl::get_block_state_for_litequery schedule cost: 19319μs
ton-node-1  | LiteQuery::request_mc_block_state cost: 0μs. 
ton-node-1  | LiteQuery::request_mc_block_data_state cost: 30μs. 
ton-node-1  | ValidatorManagerImpl::get_block_data_from_db schedule cost: 30753μs
ton-node-1  | ValidatorManagerImpl::get_block_handle cost: 1μs. 
ton-node-1  | ValidatorManagerImpl::get_block_handle_for_litequery cost: 1μs. 
ton-node-1  | ValidatorManagerImpl::get_block_data_for_litequery cost: 3μs. 
ton-node-1  | ValidatorManagerImpl::get_shard_state_from_db schedule cost: 30886μs
ton-node-1  | ValidatorManagerImpl::get_block_handle cost: 1μs. 
ton-node-1  | ValidatorManagerImpl::get_block_handle_for_litequery cost: 1μs. 
ton-node-1  | ValidatorManagerImpl::get_block_state_for_litequery cost: 1μs. 
ton-node-1  | RootDb::get_block_data schedule cost: 9480μs
ton-node-1  | RootDb::get_block_state schedule cost: 9547μs
ton-node-1  | ValidatorManagerImpl::get_shard_state_from_db cost: 0μs. 
ton-node-1  | ArchiveManager::get_file schedule cost: 2197μs
ton-node-1  | RootDb::get_block_data cost: 0μs. 
ton-node-1  | CellDb::load_cell schedule cost: 1400909μs
ton-node-1  | RootDb::get_block_state cost: 4μs. 
ton-node-1  | ArchiveSlice::get_file schedule cost: 568μs
ton-node-1  | ArchiveManager::get_file cost: 1μs. 
ton-node-1  | PackageReader reader schedule cost: 20μs
ton-node-1  | ArchiveSlice::get_file cost: 8μs. 
ton-node-1  | LiteQuery::got_mc_block_data schedule cost: 56μs
ton-node-1  | PackageReader::start_up cost: 181μs. 
ton-node-1  | LiteQuery::got_mc_block_data cost: 0μs. 
ton-node-1  | LiteQuery::got_mc_block_state schedule cost: 1516μs
ton-node-1  | CellDb::load_cell cost: 1477μs. 
ton-node-1  | LiteQuery::finish_query cost: 6μs.

We can see that perform_getAccountState took 1463946μs in total, of which 1400909μs (about 96%) was the schedule cost of CellDb::load_cell, while the load itself ran for only 1477μs. So almost all of the time was spent waiting for CellDb::load_cell to be scheduled.
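For readers of the log above, here is a minimal sketch of how the two numbers per function can be told apart. It assumes "schedule cost" means the time between sending a task and the moment it starts running, and "cost" means the time the task itself runs. `TaskTiming` and `to_us` are hypothetical names for illustration and are not taken from query_stat.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Converts a clock duration to whole microseconds.
inline std::int64_t to_us(Clock::duration d) {
  return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}

// Hypothetical per-task timestamps (not the PR's actual statistics type).
struct TaskTiming {
  Clock::time_point enqueued;  // when the task was sent to the actor
  Clock::time_point started;   // when the actor began executing it
  Clock::time_point finished;  // when execution completed

  // Time spent waiting in the actor's queue ("schedule cost" in the log, by assumption).
  std::int64_t schedule_cost_us() const { return to_us(started - enqueued); }

  // Time spent actually running ("cost" in the log, by assumption).
  std::int64_t cost_us() const { return to_us(finished - started); }
};
```

Under this reading, a task with a large schedule cost and a small cost is not slow in itself; it is waiting behind other tasks in the same queue.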

Fix

As we know, tasks sent to the same actor ID are executed one at a time. Since there is only one CellDb actor, all load-cell operations are queued and executed sequentially. With too many load-cell operations waiting in that queue, the CellDb::load_cell schedule cost becomes very high.

So the solution is straightforward: we increased the number of CellDb objects so that CellDb::load_cell calls can execute concurrently.
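The idea can be sketched outside of the TON code base as follows. This is only an illustration using standard C++ threads, not the td::actor API and not the code in this PR: `SerialWorker` is a hypothetical stand-in for one actor with a FIFO mailbox, `ReaderPool` is a hypothetical stand-in for several CellDb readers, and round-robin dispatch is an assumption about how a reader might be chosen.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one actor: one thread draining a FIFO mailbox,
// so tasks sent to the same worker run strictly one after another.
class SerialWorker {
 public:
  SerialWorker() : thread_([this] { run(); }) {}
  ~SerialWorker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    thread_.join();  // returns once the mailbox has been drained
  }
  void send(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      mailbox_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !mailbox_.empty(); });
        if (mailbox_.empty()) return;  // stop requested and nothing left to run
        task = std::move(mailbox_.front());
        mailbox_.pop();
      }
      task();  // executed outside the lock
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> mailbox_;
  bool stop_ = false;
  std::thread thread_;  // declared last so the members above exist before run() starts
};

// Hypothetical pool of readers: each load goes to the next worker in turn
// (round-robin is an assumption), so loads no longer share a single queue.
class ReaderPool {
 public:
  explicit ReaderPool(std::size_t reader_count) {
    if (reader_count == 0) reader_count = 1;  // always keep at least one reader
    for (std::size_t i = 0; i < reader_count; ++i)
      readers_.push_back(std::make_unique<SerialWorker>());
  }
  void load(std::function<void()> task) {
    assert(!readers_.empty());
    std::size_t idx = next_.fetch_add(1, std::memory_order_relaxed) % readers_.size();
    readers_[idx]->send(std::move(task));
  }

 private:
  std::vector<std::unique_ptr<SerialWorker>> readers_;
  std::atomic<std::size_t> next_{0};
};

// Pushes `tasks` dummy loads through a pool and returns how many completed.
int run_dummy_loads(std::size_t reader_count, int tasks) {
  std::atomic<int> done{0};
  {
    ReaderPool pool(reader_count);
    for (int i = 0; i < tasks; ++i) pool.load([&done] { ++done; });
  }  // pool destructor drains every mailbox and joins the threads
  return done.load();
}
```

With `reader_count == 1` this behaves like the single-CellDb case: every load waits behind all earlier loads. With more readers, each queue holds only a fraction of the pending loads, which is the effect we rely on to reduce the schedule cost. This only helps if the underlying storage can serve reads from several threads at once.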

Result

Below are our test results. The test sends 5000 GetAccountState requests simultaneously, records the response time of each request, and then calculates the number of timeout errors and the average response time.

Before optimization:

request count: 5000. failed count: 3292. failed rate: 65.84%. avg cost: 3804ms

After optimization:

request count: 5000. failed count: 0. failed rate: 0.00%. avg cost: 1021ms

We can see that after enabling concurrent CellDb::load_cell calls, the average response time dropped from 3.8 seconds to around 1 second, and the failure rate dropped from 65.84% to 0%.
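For reference, the summary lines above could be computed from per-request samples roughly as in the sketch below. `Sample`, `Summary`, and `summarize` are hypothetical names, not the actual test tool, and the sketch assumes that the average covers every request, including those that timed out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One recorded request (hypothetical layout).
struct Sample {
  double cost_ms;  // measured response time in milliseconds
  bool failed;     // true if the request ended in a timeout error
};

struct Summary {
  std::size_t request_count;
  std::size_t failed_count;
  double failed_rate_pct;  // failed_count / request_count, in percent
  double avg_cost_ms;      // mean over all requests (assumption: failures included)
};

Summary summarize(const std::vector<Sample>& samples) {
  Summary s{samples.size(), 0, 0.0, 0.0};
  if (samples.empty()) return s;  // avoid dividing by zero
  double total_ms = 0.0;
  for (const Sample& x : samples) {
    total_ms += x.cost_ms;
    if (x.failed) ++s.failed_count;
  }
  const double n = static_cast<double>(samples.size());
  s.failed_rate_pct = 100.0 * static_cast<double>(s.failed_count) / n;
  s.avg_cost_ms = total_ms / n;
  return s;
}
```

As a check on the numbers reported above, 3292 failures out of 5000 requests gives 3292 / 5000 = 65.84%.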

EmelyanenkoK · 2024-11-05T09:52:56Z

Nice analysis, we will check

fatcat22 added 5 commits November 4, 2024 18:03

multiple CellDb

1993a66

fix crash and warning

333c0ee

remove move for db_options

7ddbbf6

use threads argument as celldb reader count

625ebf3

set threads count to ValidatorEngine

072ae30
