Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize layout calculations in HashMap #51340

Merged
merged 1 commit into from
Jun 4, 2018

Conversation

Amanieu
Copy link
Member

@Amanieu Amanieu commented Jun 4, 2018

This now produces the same assembly code as the previous implementation.

cc #51163 @nnethercote @gnzlbg @andjo403

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive
Copy link
Collaborator

warning Warning warning

  • These commits modify submodules.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 4, 2018
@Amanieu Amanieu force-pushed the hashmap_layout2 branch from a9aeb0a to 3d8bca8 Compare June 4, 2018 08:22
@Amanieu
Copy link
Member Author

Amanieu commented Jun 4, 2018

(stupid git pull doesn't update submodules by default...)

Playground link for playing around with the generated code: https://play.rust-lang.org/?gist=b233a122e3223db47521f5b0a0a3911c&version=nightly&mode=release

@Amanieu
Copy link
Member Author

Amanieu commented Jun 4, 2018

r? @SimonSapin

@rust-highfive rust-highfive assigned SimonSapin and unassigned kennytm Jun 4, 2018
@Amanieu Amanieu force-pushed the hashmap_layout2 branch 2 times, most recently from 7bf6245 to 0b56145 Compare June 4, 2018 08:26
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[00:03:29]    Compiling std_unicode v0.0.0 (file:///checkout/src/libstd_unicode)
[00:03:30]    Compiling alloc_system v0.0.0 (file:///checkout/src/liballoc_system)
[00:03:30]    Compiling panic_abort v0.0.0 (file:///checkout/src/libpanic_abort)
[00:03:35]    Compiling panic_unwind v0.0.0 (file:///checkout/src/libpanic_unwind)
[00:03:36] error[E0425]: cannot find function `align_of` in this scope
[00:03:36]    --> libstd/collections/hash/table.rs:663:60
[00:03:36]     |
[00:03:36] 663 |         (layout, hashes.size() + hashes.padding_needed_for(align_of::<A>()))
[00:03:36]     |                                                            ^^^^^^^^ not found in this scope
[00:03:36] help: possible candidate is found in another module, you can import it into scope
[00:03:36] 11  | use core::mem::align_of;
[00:03:36]     |
[00:03:36] 
[00:03:36] error[E0412]: cannot find type `A` in this scope
[00:03:36] error[E0412]: cannot find type `A` in this scope
[00:03:36]    --> libstd/collections/hash/table.rs:663:71
[00:03:36]     |
[00:03:36] 663 |         (layout, hashes.size() + hashes.padding_needed_for(align_of::<A>()))
[00:03:36]     |                                                                       ^ did you mean `K`?
[00:03:39] error: aborting due to 2 previous errors
[00:03:39] 
[00:03:39] Some errors occurred: E0412, E0425.
[00:03:39] For more information about an error, try `rustc --explain E0412`.
[00:03:39] For more information about an error, try `rustc --explain E0412`.
[00:03:39] error: Could not compile `std`.
[00:03:39] 
[00:03:39] Caused by:
[00:03:39]   process didn't exit successfully: `/checkout/obj/build/bootstrap/debug/rustc --crate-name std libstd/lib.rs --color always --error-format json --crate-type dylib --crate-type rlib --emit=dep-info,link -C prefer-dynamic -C opt-level=2 --cfg feature="alloc_jemalloc" --cfg feature="backtrace" --cfg feature="jemalloc" --cfg feature="panic-unwind" --cfg feature="panic_unwind" -C metadata=fc6b9a3d7065b2e2 -C extra-filename=-fc6b9a3d7065b2e2 --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps --target x86_64-unknown-linux-gnu -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/release/deps --extern rustc_asan=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/librustc_asan-a72c0fa0b2d5877e.rlib --extern std_unicode=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libstd_unicode-18039a1361938ca9.rlib --extern rustc_msan=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/librustc_msan-ce1ae9dbeb02af61.rlib --extern rustc_lsan=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/librustc_lsan-a2f6e9f263c82267.rlib --extern unwind=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libunwind-bd02867d7573c11e.rlib --extern alloc_system=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/liballoc_system-386e278237d725f5.rlib --extern core=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libcore-e9cdce497aae9e81.rlib --extern panic_abort=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libpanic_abort-5f4d07ea9b3edda4.rlib --extern panic_unwind=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libpanic_unwind-2bd12fd5ac9768f9.rlib --extern rustc_tsan=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/librustc_tsan-ccd411645051f1ef.rlib --extern libc=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/liblibc-223f966213a60acb.rlib --extern alloc_jemalloc=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/liballoc_jemalloc-5e6c2ac297f2c2e3.rlib --extern alloc=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/liballoc-d6cd5f8b78fddf12.rlib --extern compiler_builtins=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/libcompiler_builtins-90a13cda2e54742f.rlib -L native=/checkout/obj/build/x86_64-unknown-linux-gnu/native/libbacktrace/ -L native=/checkout/obj/build/x86_64-unknown-linux-gnu/native/libbacktrace -l static=backtrace -l static=backtrace -l dl -l rt -l pthread -L native=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/build/compiler_builtins-ffd422941bf53e42/out -L native=/checkout/obj/build/x86_64-unknown-linux-gnu/native/jemalloc/lib` (exit code: 101)
[00:03:39] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "--message-format" "json"
[00:03:39] expected success, got: exit code: 101
[00:03:39] thread 'main' panicked at 'cargo must succeed', bootstrap/compile.rs:1091:9
[00:03:39] travis_fold:end:stage0-std

[00:03:39] travis_time:end:stage0-std:start=1528101059815394637,finish=1528101100038715281,duration=40223320644


[00:03:39] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:39] Build completed unsuccessfully in 0:00:41
[00:03:39] make: *** [tidy] Error 1
[00:03:39] Makefile:79: recipe for target 'tidy' failed

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:21f9eaf0
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@Amanieu Amanieu force-pushed the hashmap_layout2 branch from 0b56145 to 7311634 Compare June 4, 2018 08:34
hashes.extend(pairs).map(|(layout, _)| {
// LLVM seems to have trouble properly const-propagating pairs.align(),
// possibly due to the use of NonZeroUsize. This little hack allows it
// to generate optimal code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird, the NonZeroUsize::get(self) function is marked #[inline] and the constructor is a const fn. Did you by chance happen to look at the generated LLVM-IR for calculate_layout ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try the playground link that I posted above. If you remove the map then it generates worse code. What seems to happen is that LLVM will emit an overflowing multiply to calculate pairs.size, and will use the overflow flag to set pairs.align to either 0 (LayoutErr) or the actual alignment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the comment should include the second sentence of that explanation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every caller of calculate_layout only use one of the returned Layout or offset. Now that they’re mostly calculated separately, aren’t we better off having two separate functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this feels like more of a temporary hack to deal with a LLVM bug, rather than a permanent solution. Logically the offset calculation belongs as part of the layout calculation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnethercote I added a reference to #51346 in the comment.

@nnethercote
Copy link
Contributor

Nice! Hopefully this will fix the rustc perf regression. rustc uses hash tables a lot :)

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 4, 2018

Avoiding NonZeroUsize: https://play.rust-lang.org/?gist=419e4d6cee534087573e17943b02b674&version=nightly&mode=release

playground::new:
	push	rax
	mov	rsi, rdi
	mov	ecx, 8
	mov	rax, rsi
	mul	rcx
	mov	rcx, rax
	jo	.LBB3_3
	mov	edx, 16
	mov	rax, rsi
	mul	rdx
	jo	.LBB3_3
	add	rax, rcx
	jb	.LBB3_3
	mov	rax, rcx
	pop	rcx
	ret

.LBB3_3:
	call	core::result::unwrap_failed
	ud2

which is at least shorter (don't know if better) than using NonZeroUsize: https://play.rust-lang.org/?gist=9c5e1054b61976dffc49b2d26c86ad82&version=nightly&mode=release

playground::new:
	mov	ecx, 8
	xor	esi, esi
	mov	rax, rdi
	mul	rcx
	mov	r8, rax
	setno	r9b
	jo	.LBB3_7
	mov	edx, 16
	xor	ecx, ecx
	mov	rax, rdi
	mul	rdx
	setno	dl
	jo	.LBB3_7
	mov	sil, r9b
	shl	rsi, 3
	mov	cl, dl
	shl	rcx, 3
	cmp	rcx, rsi
	cmovae	rsi, rcx
	lea	rdx, [r8 + rcx]
	add	rdx, -1
	neg	rcx
	and	rcx, rdx
	sub	rcx, r8
	add	rcx, r8
	jb	.LBB3_7
	add	rax, rcx
	jb	.LBB3_7
	mov	rdx, rsi
	neg	rdx
	cmp	rax, rdx
	ja	.LBB3_7
	test	rsi, rsi
	je	.LBB3_7
	lea	rax, [rsi + 15]
	and	rax, rsi
	jne	.LBB3_7
	mov	rax, rcx
	ret

.LBB3_7:
	push	rbp
	mov	rbp, rsp
	call	core::result::unwrap_failed
	ud2

However, given that both situations should just generate:

playground::original:
	lea	rax, [8*rdi]
	ret

I don't think that NonZeroUsize is the root of the issue here.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 4, 2018

new is the implementation that is currently in rust right now (added by #51163). The one that you should be looking at is new_optimized which calls unreachable_unchecked if a LayoutErr occurs..

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 4, 2018

@Amanieu that one was as good as the original IIRC (that is, this PR fixes the perf regression).

My point was that maybe it is worth fixing whatever codegen bug is causing the perf regression in the first place because, while we might add these workarounds in std, I don't think users of Layout outside of std will.

You mentioned the regression might be caused by making align NonZeroUsize, but while that has an impact on the generated code, that does not appear to be the cause (and reverting that has little effect). I think it would be worth it to open an issue to at least try to get to the bottom of it.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 4, 2018

@gnzlbg If you use my original playground link and remove the .map then you will see that the generated code is not the same:

playground::new_optimized:
	movl	$16, %edx
	xorl	%ecx, %ecx
	movq	%rdi, %rax
	mulq	%rdx
	setno	%cl
	shlq	$3, %rcx
	leaq	(%rcx,%rdi,8), %rax
	addq	$-1, %rax
	negq	%rcx
	andq	%rax, %rcx
	movq	%rcx, %rax
	retq

playground::original:
	leaq	(,%rdi,8), %rax
	retq

This is the codegen bug that I was talking about, which is caused by NonZeroUsize. This discussion was originally started because you were wondering why the .map call was necessary.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 4, 2018

Can you explain how does that show that it is caused by NonZeroUsize ? In my two playground links I have the code, once using NonZeroUsize (https://play.rust-lang.org/?gist=9c5e1054b61976dffc49b2d26c86ad82&version=nightly&mode=release), and once using usize instead (https://play.rust-lang.org/?gist=419e4d6cee534087573e17943b02b674&version=nightly&mode=release). The assembly code generated for new without the .map looks like the one you show in both cases.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 4, 2018

It's obvious if you look at the generated LLVM IR:

define i64 @_ZN10playground13new_optimized17hf103a5e804c97ad6E(i64 %c) unnamed_addr #2 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %0 = shl i64 %c, 3

  // This computes the alignment result for Layout::repeat.
  // It checks if the multiplication overflows and returns either 0 (LayoutErr) or 8 (self.align).
  %1 = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %c, i64 16) #7
  %2 = extractvalue { i64, i1 } %1, 1
  %spec.select.i15.i = select i1 %2, i64 0, i64 8

  // This calculates the offset and Layout::padding_for in Layout::extend.
  // The codegen bug is here: At this point, the alignment can *only* be 8, not 0
  // since Err(LayoutErr) would have returned early before this point (due to ?).
  // But LLVM fails to constant-propagate this for some reason.
  %3 = add i64 %0, -1
  %4 = add i64 %3, %spec.select.i15.i
  %5 = sub nsw i64 0, %spec.select.i15.i
  %6 = and i64 %4, %5
  ret i64 %6
}

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 4, 2018

Looking only at new_optimized with NonZeroUsize: https://play.rust-lang.org/?gist=e7cd2987b841d62657a4c8d114abea59&version=nightly&mode=release

playground::new_optimized:
	mov	rcx, rdi
	mov	edx, 8
	mov	rax, rcx
	mul	rdx
	mov	rsi, rax
	mov	edx, 16
	xor	edi, edi
	mov	rax, rcx
	mul	rdx
	setno	dil
	lea	rax, [rsi + 8*rdi]
	add	rax, -1
	shl	rdi, 3
	neg	rdi
	and	rdi, rax
	mov	rax, rdi
	ret

and without NonZeroUsize: https://play.rust-lang.org/?gist=25ce1213448b27d155bffd30c4adc831&version=nightly&mode=release

playground::new_optimized:
	lea	rax, [8*rdi]
	ret

It looks like not using NonZeroUsize makes a hell of a difference.

@Mark-Simulacrum
Copy link
Member

@bors p=1 for when this is approved

This now produces the same assembly code as the previous implementation.
@Amanieu Amanieu force-pushed the hashmap_layout2 branch from 7311634 to b69724f Compare June 4, 2018 15:00
@SimonSapin
Copy link
Contributor

Let’s land this to fix the perf regression, but I think it is rather unfortunate an somewhat defeats the point of doing #51163 in the first place.

@bors r+

@bors
Copy link
Contributor

bors commented Jun 4, 2018

📌 Commit b69724f has been approved by SimonSapin

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 4, 2018
@bors
Copy link
Contributor

bors commented Jun 4, 2018

⌛ Testing commit b69724f with merge 41affd0...

bors added a commit that referenced this pull request Jun 4, 2018
Optimize layout calculations in HashMap

This now produces the same assembly code as the previous implementation.

cc #51163 @nnethercote @gnzlbg @andjo403
@bors
Copy link
Contributor

bors commented Jun 4, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: SimonSapin
Pushing 41affd0 to master...

@bors bors merged commit b69724f into rust-lang:master Jun 4, 2018
@nnethercote
Copy link
Contributor

I just did some local measurements with Cachegrind and the numbers are looking much better -- the regression from #51163 looks to be entirely fixed. Thank you!

@nikic
Copy link
Contributor

nikic commented Jun 5, 2018

@Mark-Simulacrum
Copy link
Member

That should be accurate; I'm not sure why we're seeing a difference from @nnethercote's local benchmarks.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 5, 2018

Note that #51226 also significantly improve the performance since it marked several Layout functions as inlinable.

@nnethercote
Copy link
Contributor

My measurements were with Cachegrind rather than perf-stat... that shouldn't make any difference though.

@nnethercote
Copy link
Contributor

Can anyone think of reasons why I would see improvements on my machine, but the rustc-perf machine would not? Are the clang versions guaranteed to be the same?

@Mark-Simulacrum
Copy link
Member

Clang is not installed on the perf collector (though not sure what specific clang you are referring to...); gcc is version 5.4.0.

@nnethercote
Copy link
Contributor

Sorry, I meant LLVM, not clang. Are the LLVM versions guaranteed to be the same?

@Mark-Simulacrum
Copy link
Member

Yes, LLVM should be version equivalent (by default), though the way it's compiled might be different, e.g. using gcc locally vs. clang on Travis. I suppose that could play a role, but it would be decidedly odd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants