
rocksdb performance bottom up


# Dismember an Ox as Skillfully as a Butcher – understand the performance of our system step-by-step

1. Performance of our basic components
   1. Local procedure calls (LPC via tasking::enqueue); see the framework-free micro-benchmark sketch after this list
      1. Intra-thread => how many cycles for each local EMPTY task
         - Tianyi: 1 thread, 3.1 million/s
         - Zhenyu: understand why each EMPTY task takes 700+ instructions
      2. Inter-thread => what is the cost of the task queue
         - Tianyi: 2 threads, 0.2 million/s
         - Zhenyu: is this overhead reasonable? What is the raw performance of the synchronization primitives used in the task queues?
   2. Network providers => see whether we can achieve the raw device performance
      1. Local machine
      2. Remote machine
   3. RPC framework with simulated network provider => what is the CPU cost of the rpc stack in rDSN
   4. RPC framework with native/fast network provider => end-to-end rpc performance
      1. Local machine
         - Tianyi: 280k/s
         - Zhenyu: What is the setting? Seems too good compared to 1.1.2
         - Tianyi: 1.1.2 is a blocking tic-tock benchmark in which the two threads alternately enqueue to each other, so its throughput is bounded by per-handoff latency rather than by pipelining
      2. Remote machine
   5. AIO performance => see whether we can achieve the raw device performance; see the raw-I/O baseline sketch after this list
      1. Read/write performance on disk
      2. Read/write performance on SSD
   6. Synchronization primitives?
2. Computation in replication => the gap between this and 1.3 should be small
   - Start only 1 replica server and use the simulated network provider to avoid network communication
   - Use the empty aio provider or a RAM disk to minimize disk operations
   - Use 1 partition and max_replica_degree = 1 to minimize interference from the meta server and cross-thread contention in the replica server
     - Tianyi: qps = 100k for empty requests, 40k for write requests
     - Zhenyu: the slowdown is too large even compared to 1.4, why?
3. Network + computation performance in replication/1 server => the gap between this and 1.4 should be small
   - native network provider on top of step 2
4. IO performance of the mutation log w/ the native aio provider => the gap between this and 1.5 should be small
   1. Write
   2. Replay
5. Network + computation + IO performance in replication/1 server => the gap between this and 1.5 should be small
   - native aio provider on top of step 3
6. Network + computation performance + throttling in replication/1 server
   - large client concurrency on top of step 3
7. Network + computation performance in replication/2 servers => ???
   - replica_count = 2 on top of step 3
8. Network + computation performance in replication/3 servers => ???
   - replica_count = 3 on top of step 3
9. // TODO: more combinations
10. End-to-end using the nativerun tool
11. End-to-end using the fastrun tool
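
The LPC numbers in 1.1 are easier to interpret next to a framework-free baseline. The sketch below is hypothetical and does not use rDSN's tasking::enqueue; it only measures (a) how fast one thread can enqueue and run empty std::function tasks and (b) how fast two threads can hand tasks across a plain mutex + condition_variable queue, which is roughly the "raw performance of the synchronization primitives" question raised in 1.1.2.

```cpp
// lpc_baseline.cc: framework-free baseline for 1.1 (hypothetical, not rDSN code).
// Build: g++ -O2 -std=c++11 lpc_baseline.cc -pthread
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

using task_t = std::function<void()>;

// Analogue of 1.1.1: one thread enqueues and immediately executes EMPTY tasks.
static void intra_thread(long n)
{
    std::deque<task_t> q;
    long done = 0;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        q.emplace_back([&done] { ++done; }); // "EMPTY" task: just bumps a counter
        q.front()();
        q.pop_front();
    }
    std::chrono::duration<double> sec = std::chrono::steady_clock::now() - start;
    std::printf("intra-thread: %.2f M tasks/s\n", done / sec.count() / 1e6);
}

// Analogue of 1.1.2: a producer thread pushes, a consumer thread pops, over one
// shared mutex + condition_variable, to see what the synchronization alone costs.
static void inter_thread(long n)
{
    std::deque<task_t> q;
    std::mutex mu;
    std::condition_variable cv;
    long done = 0;
    auto start = std::chrono::steady_clock::now();
    std::thread consumer([&] {
        for (long i = 0; i < n; ++i) {
            task_t t;
            {
                std::unique_lock<std::mutex> lock(mu);
                cv.wait(lock, [&] { return !q.empty(); });
                t = std::move(q.front());
                q.pop_front();
            }
            t();
        }
    });
    for (long i = 0; i < n; ++i) {
        {
            std::lock_guard<std::mutex> lock(mu);
            q.emplace_back([&done] { ++done; });
        }
        cv.notify_one();
    }
    consumer.join();
    std::chrono::duration<double> sec = std::chrono::steady_clock::now() - start;
    std::printf("inter-thread: %.2f M tasks/s\n", done / sec.count() / 1e6);
}

int main()
{
    intra_thread(10 * 1000 * 1000);
    inter_thread(1 * 1000 * 1000);
    return 0;
}
```

Comparing these two numbers against the 3.1 million/s and 0.2 million/s reported above gives a rough split between the cost of the queue/synchronization itself and the bookkeeping rDSN adds per task.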
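For 1.5 and the mutation-log write test in step 4, the "raw device performance" reference point can be measured with plain POSIX I/O before involving any aio provider. The sketch below is a hypothetical baseline, not the rDSN aio provider: it appends fixed-size blocks to a file and fsyncs after each append, roughly the access pattern of a mutation/write-ahead log; the file name, block size, and iteration count are arbitrary assumptions.

```cpp
// raw_log_write.cc: plain POSIX append + fsync baseline (hypothetical,
// not the rDSN aio provider). Build: g++ -O2 -std=c++11 raw_log_write.cc
#include <chrono>
#include <cstdio>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    const size_t block = 4096;   // assumed append size, tune to the mutation size
    const int count = 10000;     // number of appends to time
    std::vector<char> buf(block, 'x');

    int fd = ::open("raw_log_write.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { std::perror("open"); return 1; }

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) {
        if (::write(fd, buf.data(), block) != (ssize_t)block) {
            std::perror("write");
            return 1;
        }
        ::fsync(fd);             // force durability per append, like a log writer
    }
    std::chrono::duration<double> sec = std::chrono::steady_clock::now() - start;
    ::close(fd);

    std::printf("%.0f appends/s, %.1f MB/s (block = %zu, fsync per append)\n",
                count / sec.count(),
                count * block / sec.count() / (1024.0 * 1024.0),
                block);
    return 0;
}
```

Running the same loop on the disk and on the SSD gives the 1.5.1/1.5.2 reference numbers; if relaxing the per-append fsync changes the result dramatically, batching in the log writer matters more than the choice of aio provider.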

To-be-fixed:

- throttling mechanism (a candidate sketch follows below)
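
The throttling mechanism above is still to be designed; one common option, shown purely as a hypothetical sketch (not existing rDSN code), is a small token-bucket limiter that admits requests up to a configured rate and burst and lets the caller reject or delay the rest. The rate/burst values in the usage comment are placeholders echoing the write qps observed in step 2.

```cpp
// token_bucket.cc: generic token-bucket throttle sketch (hypothetical, not rDSN code).
#include <algorithm>
#include <chrono>
#include <mutex>

class token_bucket
{
public:
    token_bucket(double rate_per_sec, double burst)
        : _rate(rate_per_sec), _burst(burst), _tokens(burst),
          _last(std::chrono::steady_clock::now())
    {
    }

    // Returns true if the request may proceed now, false if the caller should
    // reject it or re-enqueue it for later.
    bool try_acquire(double cost = 1.0)
    {
        std::lock_guard<std::mutex> lock(_mu);
        auto now = std::chrono::steady_clock::now();
        std::chrono::duration<double> elapsed = now - _last;
        _last = now;
        _tokens = std::min(_burst, _tokens + elapsed.count() * _rate);
        if (_tokens < cost)
            return false;
        _tokens -= cost;
        return true;
    }

private:
    std::mutex _mu;
    double _rate;   // refill rate: requests (or bytes) per second
    double _burst;  // bucket capacity: largest burst admitted at once
    double _tokens; // tokens currently available
    std::chrono::steady_clock::time_point _last;
};

// Usage sketch: admit roughly 40k writes/s with a burst of 1k (placeholder numbers).
// static token_bucket write_throttle(40000.0, 1000.0);
// if (!write_throttle.try_acquire()) { /* reject or delay the request */ }
```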