Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(runloop/wasm): optimize hash_chain_entity with string.buffer #11304

Merged
merged 3 commits into from
Nov 4, 2023

Conversation

chronolaw
Copy link
Contributor

@chronolaw chronolaw commented Jul 27, 2023

Summary

We already use string.buffer in many fields successfully, especially for replacing string.concat.

Checklist

Full changelog

  • introduce string.buffer
  • remove useless clear_tab concat

Issue reference

Fix #[issue number]

@chronolaw chronolaw force-pushed the refactor/wasm_hash_with_string_buffer branch from f7e7d3a to 460e7cd Compare July 31, 2023 06:38
Copy link
Member

@guanlan guanlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runloop/wsam ->runloop/wasm

@chronolaw chronolaw changed the title refactor(runloop/wsam): optimize hash_chain_entity with string.buffer refactor(runloop/wasm): optimize hash_chain_entity with string.buffer Jul 31, 2023
@chronolaw
Copy link
Contributor Author

runloop/wsam ->runloop/wasm

done, thanks.

@kikito kikito requested a review from flrgh August 1, 2023 17:09
@flrgh
Copy link
Contributor

flrgh commented Aug 7, 2023

I was curious so I hacked up a micro benchmark for this:

https://gist.github.com/flrgh/77263a8cd925aa6259e940fe573d01e5

1k items

$ for i in {1..3}; do resty perf.lua 1000; done
{
  chains_disabled = 164,
  chains_total = 1226,
  config_size_avg = 1153.0548190613,
  config_size_total = 6436352,
  filters_avg = 4.5530179445351,
  filters_disabled = 799,
  filters_total = 5582
}
 table buffer: 0.0049998760223389
string buffer: 0.004000186920166
{
  chains_disabled = 176,
  chains_total = 1234,
  config_size_avg = 1139.9486447932,
  config_size_total = 6392832,
  filters_avg = 4.5445705024311,
  filters_disabled = 857,
  filters_total = 5608
}
 table buffer: 0.0039999485015869
string buffer: 0.0039999485015869
{
  chains_disabled = 192,
  chains_total = 1263,
  config_size_avg = 1172.2390269699,
  config_size_total = 6650112,
  filters_avg = 4.4916864608076,
  filters_disabled = 828,
  filters_total = 5673
}
 table buffer: 0.0049998760223389
string buffer: 0.0049998760223389

10k items

$ for i in {1..3}; do resty perf.lua 10000; done
{
  chains_disabled = 1850,
  chains_total = 12537,
  config_size_avg = 1151.3846044672,
  config_size_total = 64900096,
  filters_avg = 4.4960516870065,
  filters_disabled = 8416,
  filters_total = 56367
}
 table buffer: 0.042999982833862
string buffer: 0.043999910354614
{
  chains_disabled = 1914,
  chains_total = 12534,
  config_size_avg = 1158.0953468456,
  config_size_total = 64908928,
  filters_avg = 4.4716770384554,
  filters_disabled = 8448,
  filters_total = 56048
}
 table buffer: 0.045000076293945
string buffer: 0.046000003814697
{
  chains_disabled = 1885,
  chains_total = 12565,
  config_size_avg = 1156.5595723284,
  config_size_total = 65552640,
  filters_avg = 4.5108635097493,
  filters_disabled = 8496,
  filters_total = 56679
}
 table buffer: 0.055000066757202
string buffer: 0.047000169754028

100k items

$ for i in {1..3}; do resty perf.lua 100000; done
{
  chains_disabled = 18739,
  chains_total = 125177,
  config_size_avg = 1153.3145011088,
  config_size_total = 649568640,
  filters_avg = 4.4993808766786,
  filters_disabled = 84461,
  filters_total = 563219
}
 table buffer: 0.48699998855591
string buffer: 0.5090000629425
{
  chains_disabled = 18889,
  chains_total = 125064,
  config_size_avg = 1152.8903691092,
  config_size_total = 648707200,
  filters_avg = 4.4991284462355,
  filters_disabled = 84659,
  filters_total = 562679
}
 table buffer: 0.48399996757507
string buffer: 0.49199986457825
{
  chains_disabled = 18582,
  chains_total = 125041,
  config_size_avg = 1157.6386823436,
  config_size_total = 651398656,
  filters_avg = 4.5000919698339,
  filters_disabled = 84707,
  filters_total = 562696
}
 table buffer: 0.48399996757507
string buffer: 0.50200009346008

There doesn't seem to be much of a measurable difference. In general I also expect the time spent in this hash function to be dwarfed by time spent iterating over filter chain entities from the db or fetching routes from the cache/db.

I'm not super knowledgeable of luajit internals, so is there something I'm missing? Can you share your own benchmarks?

@chronolaw
Copy link
Contributor Author

Theoretically, string.buffer is faster than table.concat, because it has no allocation of table items.

But in this case, I think lots of time cost is sha256, so we can not see the improvement of performance.

Back to the test result, Keeping the original code may be a good choice.

@chronolaw chronolaw closed this Aug 7, 2023
@chronolaw chronolaw deleted the refactor/wasm_hash_with_string_buffer branch August 7, 2023 23:50
@bungle
Copy link
Member

bungle commented Aug 9, 2023

I was curious so I hacked up a micro benchmark for this:

@flrgh,

Any difference in memory usage / garbage being collected?

I removed the SHA256 and ran the microbenchmark with 1000000 iterations and got this:

table buffer: 20.077999830246
string buffer: 4.1630001068115

There is around 5 times difference in favour for string buffer. So I don't think there is doubt that is it better. I didn't check memory usage and garbage generated, but I think that would also favour string.buffer if checked.

That said, we can always ask "does it matter" when there are other factors dominating. Perhaps not that much in many places, but on hot path and with huge strings perhaps. Does our plugin iterator speed matter if there is a plugin that does network io and makes client wait for that. Sure it matters but not when that plugin is added. Another thing is that our code base is also used as a source for new code. Thus sometimes it is worth to try to spread the better approach, even if it didn't matter that much in that spot.

For example which one of these do you prefer to read:

    local n = 0
    for _, filter in ipairs(chain.filters) do
      buf[n + 1] = filter.name
      buf[n + 2] = tostring(filter.enabled)
      buf[n + 3] = tostring(filter.enabled and sha256(filter.config))
      n = n + 3
    end
    local s = concat(buf, "", 1, n)

vs.

   local filters = chain.filters
   for i = 1, #filters do
     local filter = filters[i]

     buf:put(filter.name)
     buf:put(tostring(filter.enabled))
     buf:put(tostring(filter.enabled and sha256(filter.config)))
   end
   local s = buf:get()

I think I like the string buffer more even in that case. It even allows:

   local filters = chain.filters
   for i = 1, #filters do
     local filter = filters[i]
     buf:put(filter.name):put(tostring(filter.enabled)):put(tostring(filter.enabled and sha256(filter.config)))
   end
   local s = buf:get()

If that is preferred or:

   local filters = chain.filters
   for i = 1, #filters do
     local filter = filters[i]
     buf:putf("%s%s%s", filter.name, tostring(filter.enabled), tostring(filter.enabled and sha256(filter.config)))
   end
   local s = buf:get()

@chronolaw
Copy link
Contributor Author

Thanks for @bungle's comment. If we think string.buffer is necessary I will reopen this PR.

@flrgh
Copy link
Contributor

flrgh commented Aug 11, 2023

I didn't do any real scientific measurements on garbage/memory. I did test each of these with GC disabled and then comparing before/after results of collectgarbage("count"). The results seemed pretty similar between each approach, but I didn't really challenge this finding too much.

If we wanted to increase performance I think things would get better if we swapped the buf:free() for buf:reset() (calling buf:free() only once after we are done rebuilding state), but this would be a larger and somewhat encapsulation-infringing change.

Anyhow, I don't think it's a bad change and wasn't insinuating we should abandon the PR (sorry @chronolaw if I gave this impression); I just want us to remain focused on the end result when it comes to perf changes. If y'all see this as an improvement overall for the code I'm happy for us to revisit it 👍. Code style/quality-wise I am 50/50 on table vs string.buffer; they look about the same to me in this use case, though I'm sure I have some Stockholm syndrome from writing OpenResty lua for so long 😂.

@chronolaw chronolaw restored the refactor/wasm_hash_with_string_buffer branch November 3, 2023 07:49
@chronolaw chronolaw reopened this Nov 3, 2023
@chronolaw chronolaw force-pushed the refactor/wasm_hash_with_string_buffer branch from 460e7cd to 23d47cd Compare November 3, 2023 07:53
@chronolaw
Copy link
Contributor Author

Since we have some new discussion in #11811, I reopen this PR and appreciate any comments.

@flrgh flrgh merged commit bd1ac6a into master Nov 4, 2023
31 of 32 checks passed
@flrgh flrgh deleted the refactor/wasm_hash_with_string_buffer branch November 4, 2023 21:03
windmgc pushed a commit that referenced this pull request Nov 23, 2023
As other PRs did, string.buffer can replace table.concat to get more performance.

Reference: #11304 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants