I'm trying to implement full-text search with static hosting, reasonable file sizes, and limited computation. I chose to give each book a unique ID encoded in (say) base 36 (0-9 and A-Z). The ids are sorted in some order (currently date) with reviewed books first because we want to feature those.
The index for a word simply lists the book ids as a string. For example, the index for the word "did" is 3C5475ACAF, with 2-digit book ids encoded in base 16. That string holds five ids (3C, 54, 75, AC, AF), so "did" occurs in 5 books in that tiny 100-book collection.
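To make the encoding concrete, here is a minimal sketch of decoding and encoding such an index string. The function names and the `digits` and `base` parameters are assumptions for illustration, not names from the project.

```typescript
// Split an index string into fixed-width ids and parse each one.
// `digits` and `base` are parameters of this sketch, not project names.
function decodeIndex(index: string, digits: number, base: number): number[] {
  const ids: number[] = [];
  for (let i = 0; i + digits <= index.length; i += digits) {
    ids.push(parseInt(index.slice(i, i + digits), base));
  }
  return ids;
}

// Inverse: write each id as a zero-padded, uppercase, fixed-width string.
function encodeIndex(ids: number[], digits: number, base: number): string {
  return ids
    .map((id) => id.toString(base).toUpperCase().padStart(digits, "0"))
    .join("");
}
```

With the example above, `decodeIndex("3C5475ACAF", 2, 16)` gives `[60, 84, 117, 172, 175]`, and `encodeIndex` turns that list back into the same string. Switching to base 36 would only change the `base` argument.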
- I currently store the words in files with the word as their name. The above index for "did" is in the file index/did. This will, no doubt, cause problems when we try to include languages with non-ASCII (Unicode) characters. Perhaps we should hash the word and use that for the filename?
- I store categories as 4-letter "words" in uppercase. This has to change.
- This coding is inefficient with lots of leading zeros wasting space. Can we easily do better?
- Should we consider using a higher base for the ids and then translating when we go to the URL?
- Is there some better approach? This is the first thing I thought of that allowed incremental results.
- What order should the ids be in? Popularity?
- Which words should not be indexed?
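On the filename question, one possibility is sketched below: hash the UTF-8 bytes of the word with 32-bit FNV-1a and use the hex digest as the filename. The function name `wordToFilename`, the choice of FNV-1a, and the NFC normalization step are all assumptions of this sketch, not decisions made in the project.

```typescript
// FNV-1a (32-bit) over the UTF-8 bytes of the word, returned as 8 hex digits.
// Normalizing to NFC first makes visually identical words hash the same.
function wordToFilename(word: string): string {
  const bytes = new TextEncoder().encode(word.normalize("NFC"));
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}
```

The result is always plain ASCII, so it is safe as a filename and in a URL. A 32-bit hash can collide, so two words could map to the same file; a real implementation would need some way to tell them apart, for example by storing the word inside the file.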
Implements sort-merge algorithms for selecting subsets of books. Assumes the sort has already been done. I chose this approach because I wanted to produce search outputs incrementally.
- Should we change the encoding? This is the only place that would have to change.
- Encoding the first unreviewed in config.json is a hack.
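The sort-merge idea described for this module can be sketched as follows. This is an illustration under assumptions, not the code in BookSet.ts: the `IdStream` interface and both class names are hypothetical, and it assumes fixed-width uppercase ids whose string order matches the sort order of the books.

```typescript
// Hypothetical interface: a stream of book ids in ascending order.
interface IdStream {
  // Return the next id, or undefined when the stream is exhausted.
  next(): string | undefined;
}

// Reads fixed-width ids out of an index string such as "3C5475ACAF".
class StringStream implements IdStream {
  private index: string;
  private digits: number;
  private pos = 0;
  constructor(index: string, digits: number) {
    this.index = index;
    this.digits = digits;
  }
  next(): string | undefined {
    if (this.pos + this.digits > this.index.length) return undefined;
    const id = this.index.slice(this.pos, this.pos + this.digits);
    this.pos += this.digits;
    return id;
  }
}

// Produces only the ids present in both inputs, one at a time.
class Intersection implements IdStream {
  private a: IdStream;
  private b: IdStream;
  constructor(a: IdStream, b: IdStream) {
    this.a = a;
    this.b = b;
  }
  next(): string | undefined {
    let x = this.a.next();
    let y = this.b.next();
    while (x !== undefined && y !== undefined) {
      if (x === y) return x; // match: both sides are already advanced past it
      if (x < y) x = this.a.next(); // advance whichever side is behind
      else y = this.b.next();
    }
    return undefined; // one side ran out, so there are no more matches
  }
}
```

Because each call to `next()` does only as much work as is needed to find one more match, results can be shown as they are found. An `Intersection` is itself an `IdStream`, so searches for more than two words can be built by nesting.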
Simple tests for BookSet.ts.
The books and indexes in the content folder are generated from these templates.
Supporting styles and code for books.
Choose a book to read from a favorites page. Identical to the find page except for the absence of search controls.
Manage favorites page and supporting styles and code.
Book search page and supporting files.
- We should signal the user when a word is not indexed; the search currently ignores such words silently.
- We should signal when we are working. Would help with testing as well.
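One way to report unindexed words is to have the lookup step return them alongside the indexes it found. The sketch below assumes a hypothetical `IndexLoader` function that resolves to null when a word has no index file; none of these names come from the project.

```typescript
// Hypothetical loader: resolves to the index string for a word,
// or null when no index file exists for it.
type IndexLoader = (word: string) => Promise<string | null>;

interface LookupResult {
  indexes: Map<string, string>; // word -> index string
  missing: string[]; // words with no index, to be reported to the user
}

// Load the index for each search word, collecting the ones that are missing
// instead of dropping them silently.
async function lookupWords(
  words: string[],
  load: IndexLoader
): Promise<LookupResult> {
  const indexes = new Map<string, string>();
  const missing: string[] = [];
  for (const word of words) {
    const index = await load(word);
    if (index === null) missing.push(word);
    else indexes.set(word, index);
  }
  return { indexes, missing };
}
```

In the browser the loader could wrap `fetch` and return null for a non-OK response. The page can then display the `missing` list, and the pending promise gives a natural place to show and hide a "working" indicator.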
Template for html page headers.
Icons used on the site.
Home page for the site.
Service worker prototype.
- How to update?
- How much to precache?
- Cache management strategy?
- How to search offline?
Template for the menu.
- I'm using the HTML5 disclosure element, details, to implement the menu. It seems accessible to me. Is it?
Settings page.
- Ugly. Needs help.
Site wide styles.
- Shouldn't this just be included in the page css?
- Should we even have different css per page?
Module for managing speech.
- Any way to manage the list of voices? For English books shouldn't it only display EN voices?
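Filtering the voice list by language could look something like this sketch. It relies only on the `name` and `lang` properties that browser `SpeechSynthesisVoice` objects expose; the `VoiceLike` interface and the function name are assumptions for illustration.

```typescript
// Minimal shape shared with the browser's SpeechSynthesisVoice objects.
interface VoiceLike {
  name: string;
  lang: string; // language tag such as "en-US"
}

// Keep the voices whose primary language subtag matches the book's language.
function voicesForLanguage<V extends VoiceLike>(
  voices: V[],
  bookLang: string
): V[] {
  // Some platforms reportedly use "en_US" rather than "en-US", so accept both.
  const primary = (tag: string) =>
    tag.replace("_", "-").split("-")[0].toLowerCase();
  const wanted = primary(bookLang);
  return voices.filter((v) => primary(v.lang) === wanted);
}
```

In the browser this could be called as `voicesForLanguage(speechSynthesis.getVoices(), "en")` to offer only English voices for an English book.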
Module for persistent state.
- We're using local storage and IndexedDB. Both are scoped to the origin (scheme, host, and port). How to manage multiple editions on the same domain?
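One possible answer for local storage is to prefix every key with an edition name. The sketch below is an assumption about how that might look; `KeyValueStore`, `EditionStore`, and the edition names are hypothetical.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Prefixes every key with an edition name so that several editions
// served from one origin do not overwrite each other's state.
class EditionStore implements KeyValueStore {
  private store: KeyValueStore;
  private prefix: string;
  constructor(store: KeyValueStore, edition: string) {
    this.store = store;
    this.prefix = `${edition}:`;
  }
  getItem(key: string): string | null {
    return this.store.getItem(this.prefix + key);
  }
  setItem(key: string, value: string): void {
    this.store.setItem(this.prefix + key, value);
  }
  removeItem(key: string): void {
    this.store.removeItem(this.prefix + key);
  }
}
```

In the browser, `new EditionStore(localStorage, "kids")` would keep that edition's favorites and settings separate from any other edition. For IndexedDB the analogous step would be to include the edition name in the database name passed to `indexedDB.open`.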
Handle swipe events.