Skip to content

Commit

Permalink
Replace std::vector; also fix clang C++11
Browse files Browse the repository at this point in the history
In my application I need to sort a vector of pointers.  I was profiling
timsort and was surprised at how much time was being spent in copy_to_tmp()
This method was implemented as:

    tmp_.clear();
    tmp_.reserve(len);
    GFX_TIMSORT_MOVE_RANGE(begin, begin + len, std::back_inserter(tmp_));

Unfortunately this performs badly on simple types like pointers.  THe
GFX_TIMSORT_MOVE_RANGE() macro itself can reduce to a single memmove()
but using std::back_inserter (which is required for move-only types)
breaks this optimization.  Instead of a nice SSE-optimized memory copy
you end up with an element-by-element loop with a vector capacity check
each time.

As an experiment I did some metaprogramming so that trivially_constructable
types would just do:

    tmp_.assign(begin, begin + len);

This basiscally fixed the problem, but it's still not *quite* perfect, since
the non-trivial case still is doing extra capacity checks that aren't required.
I did a more aggressive fix that replaced the std::vector use entirely with
a custom datastructure (since we don't really need a general purpose vector
here, just a managed array) which bought another couple percent in speed.

First I had to make two prepratory changes, though:

1. The C++11 support was broken on libc++ (which is the default STL for
   most clang installs, including Xcode)  The changes @mattreecebentley
   made to auto-detect C++11 were mostly good but they specifically
   rejected anything with "_LIBCPP_VERSION" set.  Not sure why.

   Rather than one large binary expression I instead used a larger (but
   hopefully easier to understand) ifdef decision tree.  This can also
   be overridden on the command-line with -DGFX_TIMSORT_USE_CXX11=[0|1]
   As before we default to using C++11 constructs unless we see some
   evidence that it won't work.  However we now let modern versions
   of clang/libc++ pass.

2. The parts of the "TimSort" class that don't very based on LessFunction
   are now in their own set of classes such as "TimSortState"

   This is partially just a cleanup I needed to make some template
   metaprogramming less gross.  However, it's a good idea in any case.
   It's not unusual for a program to need to sort a type of data multiple
   ways, which means expanding the "TimSort<RandomAccessIterator,LessFunction>"
   multiple times.  With this change, the compiler can reuse the expansion
   of "TimSortState<RandomAccessIterator>" between them.  If nothing else,
   this should compile faster.

Now with that out of the way, I could get to the meat of the change: replacing
the "std::vector<value_t> tmp_;" with a custom "TimSortMergeSpace<>" type.

This class allocates itself like a vector, but the only "setter" is a "move_in()"
method that replaces its contents via move construction (similar to
what the std::back_inserter loop was doing)  We don't construct elements
before they're needed (even if we allocated them) so it will work even for
non-default-constructable types.  The move-loop is faster than before since
we don't need to re-check for capacity at every insertion.

However, on C++11 we do even better: we use template-specialization to
provide an alternate implementation of this data type for types that
pass std::is_trivially_copyable<>.  The big advantage is that we can just
use std::memcpy() to refill the merge buffer.  The code is also simpler
in general since we don't need to worry about construction/destruction of
the buffer elements.

Since a lot of the overall cost of the timsort algorithm is spent merging,
making this data structure as fast as possible is important.  This change
makes soring randomized sequences about 10% faster when working with
trivially-copyable types.

While I was there I also replaced the "std::vector<run> pending_" with
my own "TimSortRunStack<>" type.  This doesn't have the same importance
for performance, but it's another place where we don't really need the
full STL vector support... just a simple resizable stack.  Since I
was replacing vector I thought it was more consistent to just replace
both.  This also removes the header depenedncy on <vector>

RESULTS: "make bench" on Xcode 10.1, Mac Pro:

  RANDOMIZED SEQUENCE [int] size=100K
    Before: 0.851
    After: 0.745

  RANDOMIZED SEQUENCE [std::string] size=100K
    Before: 5.389
    After: 3.735

The improvement with "int" is due to the vector replacement.  The bigger
improvement with "std::string" is making C++11 work with the libc++ STL so
that move optimizations get applied.
  • Loading branch information
mitchblank committed Jan 15, 2019
1 parent e39ca95 commit 5ceb868
Show file tree
Hide file tree
Showing 3 changed files with 398 additions and 74 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ COMPATIBILITY

This library is compatible with C++98, but if you give compile it with C++11 or later, this library uses `std::move()` instead of value copy and thus you can sort move-only types (see [#9](https://github.com/gfx/cpp-TimSort/pull/9) for details).

You can disable use of `std::move()` by passing the macro '-DDISABLE_STD_MOVE'.
You can disable use of `std::move()` by passing the macro '-DGFX_TIMSORT_USE_CXX11=0'

SEE ALSO
==================
Expand Down
2 changes: 1 addition & 1 deletion test/test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@

#include "timsort.hpp"

#if ENABLE_STD_MOVE
#if GFX_TIMSORT_USE_CXX11
#warning std::move() enabled
#else
#warning std::move() disabled
Expand Down
Loading

0 comments on commit 5ceb868

Please sign in to comment.