Heuristics for testing search
Testing Search effectively is challenging, especially given the depth and breadth of content available for Kiwix. This is compounded by the challenges of making Search performant, reliable, and relevant on the vast range of Android devices, and across the range of languages in use. It is therefore hard to specify precisely how we expect search to behave and perform. These heuristics are intended as a starting point: they should generally hold true, yet you're welcome to adapt, reject or ignore those that don't seem useful.
Note: this wiki page remains a work in progress. Contributions are welcome.
- Kiwix will be able to search text-based content in ZIM files available to the app. Some storage locations don't seem to be available in practice e.g. OTG connected storage. We don't expect Kiwix to be able to search content it cannot access directly.
- Searches will be based on the ZIM files currently available on the device at runtime. Users may delete files, add files, replace memory cards, etc. while Kiwix is running and between times when Kiwix is used. When users change what's available while Kiwix is running we expect Kiwix to adapt without needing to be restarted.
- Searches will be possible in the language of the content; users will be able to input characters in that language e.g. in Japanese for Japanese content regardless of what language the device is configured to use.
- Users will not be left with a blank results page. If Search doesn't find any results it will tell the users so.
- Whitespace is allowed and the first character of whitespace between words is significant. Additional whitespace will silently be ignored in terms of search results, e.g. `white space` and `white    space` (several spaces) are considered to be equivalent when searching for results.
- The first character of whitespace at the end of a word in the search box is significant, e.g. `go` may return different results from `go ` (with a trailing space). So `go` would match `good`; `go ` would not match `good`.
- Top online search terms for Wikimedia sources will be found (and matched) when searching the equivalent ZIM file in Kiwix-Android. There may be exceptions for highly topical searches e.g. in response to breaking news.
- As more characters are entered there will be fewer results; as characters are removed from the end of the search term more results will be returned. The numbers will broadly be symmetric, e.g. for a set of search queries `fun` -> `fund` -> `fun`, similar results would be returned for both `fun` queries, and fewer results would be returned for `fund`, as `fun` matches `function` while `fund` does not match `function`.
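The whitespace and prefix heuristics above can be made concrete with a small toy model. This is only a sketch for reasoning about expected behaviour, not how Kiwix implements search: the functions `normalise_query` and `toy_search` and the `TITLES` list are hypothetical, and the matching rule (whole-word match for completed terms, prefix match for the term still being typed) is an assumption drawn from the examples in the list.

```python
import re


def normalise_query(query: str) -> str:
    """Collapse each run of whitespace to a single space.

    Leading whitespace is dropped; a single trailing space is kept because
    the heuristics treat it as significant.
    """
    return re.sub(r"\s+", " ", query.lstrip())


def toy_search(query: str, titles: list) -> list:
    """Hypothetical matcher used only to illustrate the heuristics.

    ASSUMPTION: every completed term must equal a whole word of the title;
    the last term is treated as a prefix unless whitespace follows it.
    """
    normalised = normalise_query(query)
    terms = normalised.lower().split()
    if not terms:
        return []
    trailing_space = normalised.endswith(" ")
    *complete, last = terms
    hits = []
    for title in titles:
        words = title.lower().split()
        matched = all(term in words for term in complete)
        if trailing_space:
            matched = matched and last in words
        else:
            matched = matched and any(word.startswith(last) for word in words)
        if matched:
            hits.append(title)
    return hits


# Hypothetical article titles standing in for ZIM content.
TITLES = ["Go", "Good", "Function", "Fund", "Fun", "White space"]

print(toy_search("white    space", TITLES))  # same hits as "white space"
print(toy_search("go", TITLES))              # prefix match
print(toy_search("go ", TITLES))             # trailing space: whole word only
print(len(toy_search("fun", TITLES)), len(toy_search("fund", TITLES)))
```

A model like this is useful mainly as an oracle for spotting surprises: where the real app disagrees with it, either the heuristic or the app deserves a closer look.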
Possible sources of top online search terms include:
- Top 5000 Searches for 'last week': https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_pages
- Top 25 Searches for 'last week': (likely to have more topical searches) https://en.wikipedia.org/wiki/Wikipedia:Top_25_Report
- Lots of sources, including both the above: https://en.wikipedia.org/wiki/Wikipedia:Statistics
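Once a list of top search terms has been collected from one of these sources, checking them can be automated. The sketch below assumes the terms are already available as plain strings and that some callable performs the actual search; `missing_terms` and `fake_search` are hypothetical names, and `fake_search` is a stand-in for a real search against a ZIM file.

```python
from typing import Callable, Iterable


def missing_terms(terms: Iterable[str], search: Callable[[str], list]) -> list:
    """Return the terms for which `search` found no results."""
    return [term for term in terms if not search(term)]


def fake_search(term: str) -> list:
    """Hypothetical stand-in for a real search call against a ZIM file."""
    index = {
        "albert einstein": ["Albert Einstein"],
        "python": ["Python (programming language)"],
    }
    return index.get(term.lower(), [])


# Terms reported here need investigating; highly topical terms may be
# legitimately absent from an older ZIM file.
print(missing_terms(["Albert Einstein", "Python", "Breaking news item"], fake_search))
```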
Kiwix has been available for many years, as have ZIM files. Over the years the ZIM file format has been extended and modified. So has the way Searching has been implemented. Some of the older software and older ZIM files may behave differently. At some point we may be able to provide a matrix of software and ZIM file formats and how the intersections behave in various ways, including searching and search results. For now, let's remember there are likely to be differences and note these as part of investigating ways to improve search and search results.
Generally, Search should be consistent across all Kiwix apps, servers and utilities. There may be valid reasons for some differences e.g. related to UX expectations, performance, etc.
Kiwix includes various command-line tools: one is `kiwix-search`, another is `zimsearch`. We've decided to pick `kiwix-search` as the reference to test the core search capabilities.
The version I'm currently using is from: http://download.kiwix.org/nightly/2017-12-17/kiwix_tools_linux64_2017-12-17.tar.gz
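Comparing a reference tool against the app could be scripted along the following lines. This is a sketch under stated assumptions: this page does not document the tool's arguments or output, so the argument order (ZIM path, then query) and the one-result-per-line output are assumptions to verify against the tool's own help text. The helper names `run_reference_search` and `compare` are hypothetical.

```python
import subprocess


def run_reference_search(command: list, zim_path: str, query: str) -> list:
    """Run a reference search command and return its non-empty output lines.

    ASSUMPTION: the command accepts the ZIM path followed by the query and
    prints one result per line. Check the real tool before relying on this.
    """
    completed = subprocess.run(
        [*command, zim_path, query],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in completed.stdout.splitlines() if line.strip()]


def compare(reference: list, candidate: list) -> dict:
    """Report results that appear on only one side of the comparison."""
    ref, cand = set(reference), set(candidate)
    return {
        "only_in_reference": sorted(ref - cand),
        "only_in_candidate": sorted(cand - ref),
    }
```

If the assumptions hold, a call might look like `run_reference_search(["kiwix-search"], "wikipedia.zim", "fun")`, where `wikipedia.zim` is a placeholder file name; the returned list can then be passed to `compare` alongside the results observed in the app.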
The following are unknown, at least from my perspective. Hopefully we will be able to clarify the expected/desired/actual behaviours for these soon.
- Whether accents are significant in either the term entered or the content matched.
- Whether commonly paired words such as `white space`, `white-space` and `whitespace` are considered to be equivalent in either the term entered or the content matched.
- Whether common abbreviations will be supported and matched with the unabbreviated form e.g. `WW2` and `World War Two`.
- Whether users can enter special characters or otherwise control the behaviours of the search e.g. in terms of case sensitivity, boolean operations, wildcards, etc.
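One way to explore these unknowns is to generate closely related query variants and compare the results each returns. The helpers below are a minimal sketch using only the Python standard library; the names `strip_accents`, `pairing_variants` and `probe_queries` are hypothetical, and the sketch says nothing about how Kiwix itself treats accents, case or word pairing.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def pairing_variants(first: str, second: str) -> list:
    """Spaced, hyphenated and joined forms of a commonly paired term."""
    return [f"{first} {second}", f"{first}-{second}", f"{first}{second}"]


def probe_queries(term: str) -> list:
    """Distinct variants of `term`: as typed, unaccented, upper and lower case."""
    variants = [term, strip_accents(term), term.upper(), term.lower()]
    return list(dict.fromkeys(variants))  # de-duplicate, keep order


print(pairing_variants("white", "space"))
print(probe_queries("Caf\u00e9"))
```

Running each variant through the same search and diffing the result lists shows whether the differences matter in practice.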