Skip to content

Latest commit

 

History

History
340 lines (299 loc) · 16.4 KB

README.md

File metadata and controls

340 lines (299 loc) · 16.4 KB

JAAA is a collection of advanced array-based algorithms for Java 8+. Currently, the following main types of algorithms are included in this library:

  • Sorting
  • Merging
  • Searching
  • Selection
  • Heap Construction and Access
  • Partition

Design Goals

The coding style of this library may seem a little off to the seasoned Java programmer. Some Java veterans may be shocked by the lack of getters, setters, factories and even dependency injection. Yet it is not all chaos and madness, but also some deliberate consideration in order to best achieve the following goals:

  • Performance: Implementations shall beat the benchmark (e.g. OpenJDK) wherever possible.
  • Specialization: Performance shall be independent of the element type. One implementation has to work well for int[] and String[], no Boxing allowed.
  • Abstraction: The notion of an array has to be as abstract as possible. Algorithms have to work for arrays, buffers, MemorySegment, RandomAccessFile, ...
  • Readability: Implementations of individual algorithms are supposed be as close to Pseudocode as Java allows.
  • Modularity: Implementations of advanced algorithms are to be built on top of implementations of simpler algorithms in a way that respects all the other design goals.
  • DRY: One algorithm implemented in one place only.

In order to achieve these goals, the algorithms are implemented in form of default methods of interfaces. The interfaces represent the modules which are composed via inheritance. The array access is achieved via abstract methods which have to be overridden for specific array types.

Access-Based Algorithms

For in-place algorithms, access-style interfaces are used to access arrays. In-place sorting algorithm for example, extend the CompareSwapAccess which has, as the name suggests, two abstract methods: compare(int,int) and swap(int,int) which compare and swap the given indices of an array. The following example uses the KiwiSort algorithm to sort an array in-place:

import com.github.jaaa.permute.Swap;
import com.github.jaaa.sort.KiwiSortAccess;
import java.util.Arrays;

public class Example1 {
  public static void main( String... args ) {
    int[] array = { 14, 11, 19, 2, 10, 14, 8, 15, 22, 8, 0, 22, 5, 21, 12, 2 };
    new KiwiSortAccess() {
      @Override public void   swap( int i, int j ) { Swap.swap(array,i,j); }
      @Override public int compare( int i, int j ) { return Integer.compare(array[i], array[j]); }
    }.kiwiSort(0,array.length);
    System.out.println( Arrays.toString(array) );
    // [0, 2, 2, 5, 8, 8, 10, 11, 12, 14, 14, 15, 19, 21, 22, 22]
  }
}

Note that this array access is incredibly flexible. Let's say, for example, we want to sort a collection of 2d points, where the x-coordinates are stored in an array x, and the y-coordinates in a separate array y. Here is an example of how to use the HeapSort algorithm to sort such a collection of points:

import com.github.jaaa.permute.Swap;
import com.github.jaaa.sort.HeapSortAccess;
import java.util.Arrays;

public class Example2 {
  public static void main( String... args ) {
    int[] x = {  0, 12,  8, 14, 19,  6, 3, 21, 18, 16, 22, 14, 7, 8, 8, 20 },
          y = { 13, 17, 12, 23, 13, 10, 1,  8,  3, 23, 18,  3, 8, 8, 5, 21 };
    new HeapSortAccess() {
      @Override public void swap( int i, int j ) {
        Swap.swap(x,i,j);
        Swap.swap(y,i,j);
      }
      @Override public int compare( int i, int j ) {
        int    c  =         Integer.compare(x[i], x[j]);
        return c != 0 ? c : Integer.compare(y[i], y[j]);
      }
    }.heapSort(0,x.length);
    System.out.println( Arrays.toString(x) );
    System.out.println( Arrays.toString(y) );
    // [ 0, 3,  6, 7, 8, 8,  8, 12, 14, 14, 16, 18, 19, 20, 21, 22]
    // [13, 1, 10, 8, 5, 8, 12, 17,  3, 23, 23,  3, 13, 21,  8, 18]
  }
}

Accessor-Based Algorithms

For algorithms which require temporary arrays as working memory, the access pattern is insufficient. In these cases, accessor-style interfaces are used. In case of sorting "arrays" of type T, the RandomAccessor<T> interface is used which has the following abstract methods:

  • compare(T a, int i, T b, int j): Compares a[i] to b[j].
  • copy(T a, int i, T b, int j): Copies the value of a[i] to b[j].
  • swap(T a, int i, T b, int j): Swaps the entries a[i] and b[j].
  • malloc( int len ): Allocates and returns a new array of type T.

A single sort accessor can be used to sort more than one array. The following example shows how the TimSort algorithm can be used to sort two arrays of type String[]. Note that the TimSort implementation allows for an initial working array to be passed. If no such array is provided, null,0,0 can be passed instead:

import com.github.jaaa.permute.Swap;
import com.github.jaaa.sort.TimSortAccessor;
import java.util.Arrays;

public class Example3 {
  public static void main( String... args ) {
    TimSortAccessor<String[]> acc = new TimSortAccessor<String[]>() {
      @Override public String[] malloc( int len ) { return new String[len]; }
      @Override public void   swap( String[] a, int i, String[] b, int j ) { Swap.swap(a,i, b,j); }
      @Override public void   copy( String[] a, int i, String[] b, int j ) { b[j] = a[i]; }
      @Override public int compare( String[] a, int i, String[] b, int j ) { return a[i].compareTo(b[j]); }
    };
    String[] array1 = { "C", "P", "X", "N", "V", "X", "B", "M", "P", "E", "R", "E", "H", "J", "X", "K" },
             array2 = { "Q", "H", "B", "C", "V", "C", "B", "G", "Q" };
    acc.timSort(array1, 0, array1.length, /*init work mem*/null,0,0);
    acc.timSort(array2, 0, array2.length, /*init work mem*/null,0,0);
    System.out.println( Arrays.toString(array1) );
    System.out.println( Arrays.toString(array2) );
    // [B, C, E, E, H, J, K, M, N, P, P, R, V, X, X, X]
    // [B, B, C, C, G, H, Q, Q, V]
  }
}

Sorting Algorithms

As the examples above demonstrate JAAA offers a variety of in-place and out-of-place sorting algorithms:

Algorithm Comparisons Swaps/Copies In-Place Stable Adaptive Comment
HeapSort O(n*log(n)) O(n*log(n)) Yes No No
InsertionAdaptiveSort O(n*log(n)) O(n²) Yes Yes Yes Great for small arrays
KiwiSort O(n*log(n)) O(n*log(n)) Yes Yes Yes Enhanced WikiSort
QuickSort O(n*log(n)) O(n*log(n)) Yes No No
MergeSort O(n*log(n)) O(n*log(n)) No Yes No
ParallelSkipMergeSort O(n*log(n)) O(n*log(n)) No Yes No
ParallelZenMergeSort O(n*log(n)) O(n*log(n)) No Yes Yes
TimSort O(n*log(n)) O(n*log(n)) No Yes No De-facto standard

Merge Algorithms

Merging algorithms take two sorted sequences of elements and merges them into a single sorted sequence. The most commonly known merging algorithm is the TapeMerge algorithm that is used in most MergeSort implementations. TapeMerge is excellent for pairs of random sequences of roughly equal length. TapeMerge does however lack the ability to adapt to the structure or the length difference of sequence pairs. In these case, other algorithms may perform significantly better. The following stable in-place algorithms are available:

Algorithm Comparisons Swaps Adaptive
ExpMerge O(n) O(n²) Yes
HwangLingMerge O(n) O(n²) No
KiwiMerge O(n) O(n) Yes
TapeMerge O(n) O(n²) No
TimMerge O(n) O(n²) Yes

The following stable out-of-place algorithms are available:

Algorithm Comparisons/Copies Adaptive
ExpMerge O(n) Yes
HwangLingMerge O(n) No
ParallelSkipMergeSort O(n) No
ParallelZenMergeSort O(n) Yes
TapeMerge O(n) No
TimMerge O(n) Yes

The following example demonstrates, how the KiwiMerge algorithm can be used to merge two arrays:

import com.github.jaaa.merge.KiwiMergeAccess;
import com.github.jaaa.permute.Swap;
import java.util.Arrays;

public class Example4 {
  public static void main( String... args ) {
    int[] a = { 0, 2, 12, 15, 17, 18, 19, 19, 22 },
          b = { 0, 2, 12, 14, 16, 17, 22 };
    int[] merged = new int[a.length + b.length];
    System.arraycopy(a,0, merged,0,        a.length);
    System.arraycopy(b,0, merged,a.length, b.length);
    new KiwiMergeAccess() {
      @Override public void swap( int i, int j ) { Swap.swap(merged,i,j); }
      @Override public int compare( int i, int j ) { return Integer.compare(merged[i], merged[j]); }
    }.kiwiMerge(0, a.length, merged.length);
    System.out.println( Arrays.toString(merged) );
    // [0, 0, 2, 2, 12, 12, 14, 15, 16, 17, 17, 18, 19, 19, 22, 22]
  }
}

Search Algorithms

Search algorithms find the index of an element in a sorted array. The most commonly known algorithm is binary search, also known as bisection. Given a sorted array of n elements, binary search requires O(log(n)) operations to find an element.

The JDK comes with a binary search implementation in form of Arrays.binarySearch. The implementation however has some limitations. Let's say for example we want to look to the entry 7 in the array {1,2,3,5,7,7,7,7,7,8,9}. Which index will Arrays.binarySearch return? Turns out it returns the index of the first element it finds. This is not always desirable. For stable insertion it may be preferable if the left-most or right-most index is returned. This can be achieved by using a custom comparator but this ist cumbersome and error-prone. JAAA therefore offers searchL and searchR variants of every search algorithm to expressly look for the left-most or right-most element. If an element is not found by a search, the bit complement of the insertion index (i.e. a negative integer) is returned. See the Arrays.binarySearch documentation for more information. If a search method is used for insertion only, it may be more convenient, if search methods always returned a positive integer index. JAAA offers the searchGap (searchGapL, searchGapR) methods to achieve that.

A lesser known specialized variant of binary search is exponential search. Its perhaps most predominant use is as part of the TimSort algorithm where it is referred to as "galloping search". Given a start index i for the search, exponential search finds the correct index j of an element in O(log|i-j|) operations. That means if a good start index is chosen, exponential search will significantly outperform binary search. JAAA offers different variants of exponential search which choose different start indices for the search:

  • ExpSearch: Accepts an explicit start index as argument.
  • ExpL2RSearch: Always starts search at the left end.
  • ExpR2LSearch: Always starts search at the right end.
  • AkimboSearch: If the searched entry is in the left half, it starts search at the left end, otherwise it starts search at the right end.

The following example shows how an ExpL2RSearchAccessor can be used to find an element from one array in another array:

import com.github.jaaa.search.ExpL2RSearchAccessor;

public class Example5 {
  public static void main( String... args ) {
    int[] array = { 0, 1, 1, 3, 5, 5, 6, 7, 7, 7, 8, 11, 12, 13, 14, 21 },
            key = { 7 };
    ExpL2RSearchAccessor<int[]> acc = (a,i, b,j) -> Integer.compare(a[i], b[j]);
    int index = acc.expL2RSearchL(array,0,array.length, key,0);
    System.out.println(index); // 7
  }
}

Search algorithms can be used without access-/accessor-style interfaces. In those cases a "compass" function is used. For an argument i the compass has to return -1 if the searched element is to the left, 0 if a searched element is found or +1 if the search element is to the right. The following example demonstrates how a compass can be used to find the integer floor of a square root:

import com.github.jaaa.search.ExpL2RSearch;
import java.util.stream.IntStream;
import static java.lang.Math.*;

public class Example6 {
  private static int sqrtFloor( int x ) {
    if( x < 0 ) throw new IllegalArgumentException();
    return ExpL2RSearch.searchGapL(0, Integer.MAX_VALUE, y -> Long.signum((long)y*y - x) );
  }
  public static void main( String... args ) {
    IntStream.rangeClosed(0, Integer.MAX_VALUE).parallel().forEach( x -> {
      assert sqrtFloor(x) == floor( sqrt(x) );
    });
  }
}

Selection Algorithms

The selection problem is perhaps one of the most misunderstood and least known algorithmic array problems. Given an (unsorted) array a and three indices from <= mid <= until, a selection algorithm rearranges the elements in the range [from,until) of a such that the smallest mid-from elements are moved to the left and the until-mid larges elements are moved to the right with a[mid] being the in-between element. In other words: a[from <= i <= mid] <= a[mid] <= a[mid <= j < until]. The selection problem naturally occurs as part of evolutionary algorithms. Computing the median or the percentile of a data series is a selection problem as well.

Every sorting algorithm is a selection algorithm. A common misconception among programmers is that sorting (which requires O(log(n)) operations) is the best possible solution. There are however O(n) selection algorithms which have been known for decades. Another misconception is that these O(n) algorithms are slower than sorting in practice. In reality, most selection algorithms easily beat TimSort for until-from > 10_000, especially if mid is close to from or until. It can also be shown that sorting algorithms like HeapSort, QuickSort or MergeSort can be turned into a selection resulting in significant overhead reductions.

JAAA offers a wide variety of selection algorithms. All implementations are unstable and in-place. Most in-place algorithms could easily be turned into a stable out-of-place variants. The following algorithms are available:

Algorithm Comparisons/Copies Adaptive Comment
QuickSelect O(n) No
HeapSelect O(max(l,r)*log(min(l,r))) Yes l=mid-from+1 r=until-mid
Mom3Select O(n) No Enhanced Median-ofMedians
Mom5Select O(n) No Classic Median-of-Medians

The following example shows how to apply Mom3Select:

import com.github.jaaa.permute.Swap;
import com.github.jaaa.select.Mom3SelectAccess;
import java.util.Arrays;

public class Example7 {
  public static void main( String... args ) {
    int[] array = { 14, 14, 0, 6, 13, 23, 9, 4, 22, 23, 5, 1, 7, 12, 7, 11 };
    new Mom3SelectAccess() {
      @Override public void   swap( int i, int j ) { Swap.swap(array,i,j); }
      @Override public int compare( int i, int j ) { return Integer.compare(array[i], array[j]); }
    }.heapSelect(/*from=*/0, /*mid=*/array.length/2, /*until=*/array.length);
    System.out.println( array[array.length/2] ); // 11
    System.out.println( Arrays.toString(array) );
    // [1, 4, 0, 6, 7, 5, 9, 7, 11, 23, 23, 22, 14, 14, 13, 12]
  }
}