strange behaviors in kvell #9

Open

anoyiuhu opened this issue Sep 3, 2020 · 1 comment

anoyiuhu commented Sep 3, 2020

Hi,

  1. According to scripts/run-aws.sh, YCSB is run several times. On the first run, KVell generates a database (e.g. 100 GB) and then runs the YCSB workload. On subsequent runs, KVell can reuse the database from the previous run and recover it. However, I found that sometimes, after recovering the database, it stops suddenly, which is very confusing. Do you know why this happens?

[Screenshot attached: Screen Shot 2020-09-03 at 10 34 48 AM]

  2. During my test, using 2 disks, 4 workers per disk, and a queue depth of 1, I found that the latency and bandwidth do not match. For example, for ycsb-uniform, the latency is 116 us and the throughput is 409838 req/s. Theoretically, the ideal throughput should be (1/116) * (2 * 4) * 10^6 ≈ 68965 req/s, which is much smaller than 409838. Can you explain this phenomenon?
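
For reference, here is the arithmetic spelled out as a throwaway snippet (the variable names are just for illustration):

```c
#include <stdio.h>

int main(void) {
    /* Numbers from the run described above. */
    double latency_us = 116.0;                      /* measured average latency */
    int nb_disks = 2, workers_per_disk = 4, queue_depth = 1;

    /* With a queue depth of 1, each worker should complete at most
       one request per latency period, so: */
    double ideal_thp =
        (double)(nb_disks * workers_per_disk * queue_depth) * 1e6 / latency_us;

    printf("ideal throughput = %.1f req/s\n", ideal_thp);  /* prints 68965.5 */
    return 0;
}
```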

Best regards
Looking forward to your reply.

BLepers (Owner) commented Sep 3, 2020

Hi,

  1. This has never happened to me. If it happens again, maybe you can gather some info using gdb? I'd be interested in the debug info.

  2. (Edited because I missed that you use a QD of 1)

    If I remember correctly, the latency is computed from the moment a query is inserted into a worker's queue. So there is some degree of "batching" even with a QD of 1, because the queue can contain multiple items. You can try setting MAX_NB_PENDING_CALLBACKS_PER_WORKER to 1; make sure NEVER_EXCEED_QUEUE_DEPTH is set to 1 too (see the configuration sketch after this list).

    Because the latency also includes the time spent waiting in the queue, it also complicates the maximum-bandwidth computation. (Intuitively, if a worker processes 1 request at a time and there is always 1 pending request in the queue, then your throughput is 2x what you would compute based on latency; see the worked example after this list.)

    The latency is also computed only over the first 10M queries (see MAX_STATS in stats.c), so the average might be wrong if your test is long.

    If you want to use the following formula:
    thp = (batch size * number of workers) / avg latency
    then modify the following line https://github.com/BLepers/KVell/blob/master/slabworker.c#L218, replacing 2 by 0 (this will reset the latency measurement at that point in time).
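
For reference, a minimal sketch of the two settings mentioned above. I am assuming here that both are compile-time macros in options.h; adjust if they live elsewhere in your checkout:

```c
/* options.h (sketch; the exact defaults in your checkout may differ) */

/* At most one in-flight callback per worker: no per-worker batching. */
#define MAX_NB_PENDING_CALLBACKS_PER_WORKER 1

/* Never let the number of queued requests exceed the configured queue depth. */
#define NEVER_EXCEED_QUEUE_DEPTH 1
```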
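And a back-of-the-envelope version of the 2x intuition above (the numbers are illustrative, not taken from your run):

```c
#include <stdio.h>

int main(void) {
    /* Illustrative: each request needs s us of actual service time, and
       there is always exactly one request waiting behind the one in flight. */
    double service_us = 58.0;

    /* What the stats report: time waiting in the queue + service time. */
    double measured_latency_us = service_us + service_us;      /* 116 us */

    /* What a worker really sustains vs. what you infer from the latency. */
    double real_thp  = 1e6 / service_us;            /* ~17241 req/s          */
    double naive_thp = 1e6 / measured_latency_us;   /* ~8621 req/s (2x less) */

    printf("real = %.0f req/s, inferred from latency = %.0f req/s\n",
           real_thp, naive_thp);
    return 0;
}
```

With more than one request waiting in the queue, the waiting term grows and the gap goes well beyond 2x, which is consistent with the 409838 vs 68965 difference you measured.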
