reimplement find_bm() using std::search
Use std::search() to implement find_bm() instead of using a
local implementation of Boyer-Moore.

Avoids integer overflow reported in issue tat#31 and PR tat#31.
Should fix build problem in issue tat#7.

std::search is also faster for the test program in issue tat#31
on a system with an Intel Xeon E-2224 CPU:
- g++ 8.5, find_bm(): 3.16s
- g++ 8.5, std::search: 2.40s
- g++ 13, std::search: 2.16s

The C++17 std::boyer_moore_searcher and
std::boyer_moore_horspool_searcher were also tried; both were
slower than plain std::search on the same test program.
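For reference, the three variants compared above can be sketched as follows. This is a minimal illustration, not the test program from the issue: the helper names `find_plain`, `find_boyer_moore` and `find_horspool` are hypothetical, and the searcher overloads require C++17. All three should return the same position; only their speed differs.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>   // std::boyer_moore_searcher, std::boyer_moore_horspool_searcher (C++17)
#include <string>

// Hypothetical helpers for comparison only; none of these names exist in mimetic.

// Plain std::search, as used by the new find_bm().
inline std::string::const_iterator
find_plain(const std::string& text, const std::string& word)
{
    return std::search(text.begin(), text.end(), word.begin(), word.end());
}

// std::search driven by the C++17 Boyer-Moore searcher.
inline std::string::const_iterator
find_boyer_moore(const std::string& text, const std::string& word)
{
    return std::search(text.begin(), text.end(),
                       std::boyer_moore_searcher(word.begin(), word.end()));
}

// std::search driven by the C++17 Boyer-Moore-Horspool searcher.
inline std::string::const_iterator
find_horspool(const std::string& text, const std::string& word)
{
    return std::search(text.begin(), text.end(),
                       std::boyer_moore_horspool_searcher(word.begin(), word.end()));
}
```

Which variant is fastest depends on the input and the standard library implementation, so the timings above should be read as specific to that test program and machine.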
lukem committed Dec 29, 2023
1 parent ebf95df commit f62a888
Showing 1 changed file with 2 additions and 35 deletions.
37 changes: 2 additions & 35 deletions mimetic/utils.h
@@ -6,6 +6,7 @@
***************************************************************************/
#ifndef _MIMETIC_UTILS_H_
#define _MIMETIC_UTILS_H_
+#include <algorithm>
#include <iostream>
#include <string>
#include <ctype.h>
@@ -43,37 +44,6 @@ int str2int(const std::string& s);
/// returns a string hexadecimal representation of \p n
std::string int2hex(unsigned int n);

-// find_bm specialization for random access iterators
-template<typename Iterator>
-Iterator find_bm(Iterator bit, Iterator eit, const std::string& word, const std::random_access_iterator_tag&)
-{
-    int bLen = word.length();
-    const char* pWord = word.c_str();
-    int i, t, shift[256];
-    unsigned char c;
-
-    for(i = 0; i < 256; ++i)
-        shift[i] = bLen;
-
-    for(i = 0; i < bLen; ++i)
-        shift[ (unsigned char) pWord[i] ] = bLen -i - 1;
-
-    for(i = t = bLen-1; t >= 0; --i, --t)
-    {
-        if((bit + i) >= eit)
-            return eit;
-
-        while((c = *(bit + i)) != pWord[t])
-        {
-            i += std::max(bLen-t, shift[c]);
-            if((bit + i) >= eit) return eit;
-            t = bLen-1;
-        }
-    }
-
-    return bit + i + 1;
-}
-
// boyer-moore find
/**
* find the first occurrence of \p word in (\p bit, \p eit]
@@ -84,12 +54,9 @@ Iterator find_bm(Iterator bit, Iterator eit, const std::string& word, const std:
template<typename Iterator>
Iterator find_bm(Iterator bit, Iterator eit, const std::string& word)
{
-    return find_bm(bit, eit, word,
-        typename std::iterator_traits<Iterator>::iterator_category());
+    return std::search(bit, eit, word.begin(), word.end());
}



} // ns utils

}
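A short usage sketch of the resulting function, under a few assumptions: the `sketch` namespace and the `boundary_offset` helper are hypothetical stand-ins for illustration, and only the body of `find_bm()` is taken from this commit. Because std::search needs only forward iterators, the same call also works on containers such as std::list, which the removed overload (written for random access iterators) did not cover.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>

namespace sketch {  // stand-in namespace for this example, not the library's own

// Same body as the new find_bm() in this commit.
template<typename Iterator>
Iterator find_bm(Iterator bit, Iterator eit, const std::string& word)
{
    return std::search(bit, eit, word.begin(), word.end());
}

} // namespace sketch

// Hypothetical helper: offset of the first occurrence of `boundary` in `body`,
// or body.size() when there is no match (find_bm returns the end iterator).
inline std::size_t boundary_offset(const std::string& body, const std::string& boundary)
{
    auto it = sketch::find_bm(body.begin(), body.end(), boundary);
    return static_cast<std::size_t>(it - body.begin());
}

// Hypothetical helper: the same search over a forward-iterable std::list<char>.
inline std::size_t boundary_offset(const std::list<char>& body, const std::string& boundary)
{
    auto it = sketch::find_bm(body.begin(), body.end(), boundary);
    return static_cast<std::size_t>(std::distance(body.begin(), it));
}
```

As with std::search itself, a failed search is signalled by the end iterator, so callers compare the result against `eit` rather than checking for a sentinel value.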