Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
table: Optimize creation of reader excluding staging for view building
View building from staging creates a reader from scratch (memtable + sstables - staging) for every partition, in order to calculate the diff between new staging data and data in base sstable set, and then pushes the result into the view replicas. perf shows that the reader creation is very expensive: + 12.15% 10.75% reactor-3 scylla [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes + 10.01% 9.99% reactor-3 scylla [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 8.95% 8.94% reactor-3 scylla [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator() + 7.29% 7.28% reactor-3 scylla [.] dht::ring_position_tri_compare + 6.28% 6.27% reactor-3 scylla [.] dht::tri_compare + 4.11% 3.52% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 4.09% 4.07% reactor-3 scylla [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state + 3.46% 0.93% reactor-3 scylla [.] sstables::sstable_run::will_introduce_overlapping + 2.53% 2.53% reactor-3 libstdc++.so.6 [.] std::_Rb_tree_increment + 2.45% 2.45% reactor-3 scylla [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.14% 2.13% reactor-3 scylla [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.07% 2.07% reactor-3 scylla [.] logalloc::region_impl::free + 2.06% 1.91% reactor-3 scylla [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator() + 2.04% 2.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 1.87% 0.00% reactor-3 [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe + 1.86% 0.00% reactor-3 [kernel.kallsyms] [k] do_syscall_64 + 1.39% 1.38% reactor-3 libc.so.6 [.] __memcmp_avx2_movbe + 1.37% 0.92% reactor-3 scylla [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables:: + 1.34% 1.33% reactor-3 scylla [.] logalloc::region_impl::alloc_small + 1.33% 1.33% reactor-3 scylla [.] seastar::memory::small_pool::add_more_objects + 1.30% 0.35% reactor-3 scylla [.] seastar::reactor::do_run + 1.29% 1.29% reactor-3 scylla [.] seastar::memory::allocate + 1.19% 0.05% reactor-3 libc.so.6 [.] syscall + 1.16% 1.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst + 1.07% 0.79% reactor-3 scylla [.] sstables::partitioned_sstable_set::insert That shows some significant amount of work for inserting sstables into the interval map and maintaining the sstable run (which sorts fragments by first key and checks for overlapping). The interval map is known for having issues with L0 sstables, as it will have to be replicated almost to every single interval stored by the map, causing terrible space and time complexity. With enough L0 sstables, it can fall into quadratic behavior. This overhead is fixed by not building a new fresh sstable set when recreating the reader, but rather supplying a predicate to sstable set that will filter out staging sstables when creating either a single-key or range scan reader. This could have another benefit over today's approach which may incorrectly consider a staging sstable as non-staging, if the staging sst wasn't included in the current batch for view building. With this improvement, view building was measured to be 3x faster. from INFO 2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s to INFO 2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s Refs scylladb#14089. Fixes scylladb#14244. Signed-off-by: Raphael S. Carvalho <[email protected]>
- Loading branch information