
Alan's Zen Highlighter Concept

Tom Elam edited this page Mar 29, 2021 · 24 revisions

Highlighter Concept

Updates: 3/29/2021

Tom says: There are a lot of tasks embedded in building the Zen Highlighter, so let's break this down.

Key ideas

1. We are not stuck with the search capabilities of the document provider or the browser

Many websites provide keyword or other simple search, but it is not very powerful. What we want is a fine-grained search within a document. Google has "site:xyz.edu", but that is not very fine-grained, and here you already have the document. Google might support some Boolean logic, but it is still limited. You want to be able to do real Boolean logic. We would like to do something like:

`(IN-ONE-SENTENCE "Internet" AND ((AS-IN "programming" "Scheme") OR "Lisp"))`
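The query above combines a sentence-level scope with Boolean operators. Here is a minimal TypeScript sketch of how such a query might be represented and evaluated. Everything in it is an assumption made for illustration: the `QueryNode` tree shape, the naive punctuation-based sentence splitter, and especially `AS-IN`. That operator is crudely approximated as "both words appear in the same sentence", which is not real word-sense matching. Parsing the s-expression text into the tree is not shown. The tree is built by hand.

```typescript
// Hypothetical AST for a Zen Highlighter query (parsing not shown).
type QueryNode =
  | { kind: "term"; word: string }
  | { kind: "and"; args: QueryNode[] }
  | { kind: "or"; args: QueryNode[] }
  | { kind: "not"; arg: QueryNode }
  // ASSUMPTION: AS-IN is approximated as "both words occur in the same
  // sentence"; real word-sense matching would need NLP.
  | { kind: "asIn"; word: string; context: string };

// Naive sentence splitter: a sentence is a run of text ending in . ! or ?
function splitSentences(text: string): string[] {
  const pieces: string[] = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  return pieces.map(s => s.trim()).filter(s => s.length > 0);
}

// Lower-cased words of one sentence.
function tokenize(sentence: string): string[] {
  const words: string[] = sentence.toLowerCase().match(/[a-z0-9]+/g) || [];
  return words;
}

function containsWord(words: string[], word: string): boolean {
  return words.indexOf(word.toLowerCase()) >= 0;
}

// Evaluate a query against the words of a single sentence.
function matchesSentence(node: QueryNode, words: string[]): boolean {
  if (node.kind === "term") return containsWord(words, node.word);
  if (node.kind === "and") return node.args.every(a => matchesSentence(a, words));
  if (node.kind === "or") return node.args.some(a => matchesSentence(a, words));
  if (node.kind === "not") return !matchesSentence(node.arg, words);
  // Remaining case: "asIn"
  return containsWord(words, node.word) && containsWord(words, node.context);
}

// IN-ONE-SENTENCE: the whole query must be satisfied by a single sentence.
function inOneSentence(query: QueryNode, text: string): string[] {
  return splitSentences(text).filter(s => matchesSentence(query, tokenize(s)));
}

// (IN-ONE-SENTENCE "Internet" AND ((AS-IN "programming" "Scheme") OR "Lisp"))
const exampleQuery: QueryNode = {
  kind: "and",
  args: [
    { kind: "term", word: "Internet" },
    {
      kind: "or",
      args: [
        { kind: "asIn", word: "programming", context: "Scheme" },
        { kind: "term", word: "Lisp" },
      ],
    },
  ],
};

const sampleText =
  "Lisp programs can run on the Internet. Programming in Scheme is fun. " +
  "The Internet made Scheme programming popular. Nothing here.";

console.log(inOneSentence(exampleQuery, sampleText));
```

With the sample text, only the first and third sentences match. The second sentence mentions Scheme programming but not the Internet.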

(Did Google take out their Boolean search features?)

Grab an NLP toolkit in Scheme?

-> Tom will look for quick hacks, but in this case we already have the document(s) of interest. The Zen Highlighter should have (or must have) SEMANTICS in its search algorithm.

2. We can search documents from multiple sources at the same time

Alan's sketch shows boxes for several documents.

-> Tom will research accessing the contents of an embedded iframe.

3. We are interested in sentences or paragraphs, not just words

Got it.
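Working with sentences rather than words means the highlighter needs to know where each sentence starts and ends, so that a match can be mapped back onto the page text. Below is a minimal sketch that assumes a naive punctuation-based splitter. It will mis-split abbreviations such as "e.g.". A more robust option might be the built-in `Intl.Segmenter` with `granularity: "sentence"`, or an NLP toolkit. The names `SentenceSpan` and `sentenceSpans` are hypothetical.

```typescript
// A sentence plus its character offsets in the original paragraph text.
interface SentenceSpan {
  text: string;
  start: number; // index of the first character
  end: number; // index just past the last character
}

// Naive splitter: a sentence is a run of text ending in ., ! or ?
// (or whatever trailing text remains at the end of the paragraph).
function sentenceSpans(paragraph: string): SentenceSpan[] {
  const spans: SentenceSpan[] = [];
  const re = /[^.!?]+[.!?]+|[^.!?]+$/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(paragraph)) !== null) {
    const raw = m[0];
    const leading = raw.length - raw.replace(/^\s+/, "").length;
    const text = raw.trim();
    if (text.length === 0) continue; // whitespace-only remainder
    const start = m.index + leading;
    spans.push({ text, start, end: start + text.length });
  }
  return spans;
}

const para = "Zen is small. It highlights sentences!";
for (const s of sentenceSpans(para)) {
  console.log(s.start, s.end, para.slice(s.start, s.end));
}
```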

4. Search understands the DOM structure, not just plain text

Outline on the left of the page.

  • Headings (<h1>, ...)
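An outline on the left can be derived from the heading elements alone. Below is a minimal sketch of the nesting step. It assumes the headings have already been read from the page in document order. The browser query appears only in a comment. The names `HeadingInfo`, `OutlineNode`, `buildOutline` and `renderOutline` are hypothetical.

```typescript
// One heading read from the page. In a browser these might come from
// document.querySelectorAll("h1, h2, h3, h4, h5, h6"), taking the level
// from the tag name (e.g. "H2" -> 2) and the text from textContent.
interface HeadingInfo {
  level: number;
  text: string;
}

interface OutlineNode {
  level: number;
  text: string;
  children: OutlineNode[];
}

// Nest headings by level: each heading becomes a child of the nearest
// preceding heading with a smaller level (skipped levels are tolerated).
function buildOutline(headings: HeadingInfo[]): OutlineNode[] {
  const roots: OutlineNode[] = [];
  const stack: OutlineNode[] = []; // path from a root to the last heading seen
  for (const h of headings) {
    const node: OutlineNode = { level: h.level, text: h.text, children: [] };
    while (stack.length > 0 && stack[stack.length - 1].level >= h.level) {
      stack.pop();
    }
    if (stack.length === 0) roots.push(node);
    else stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return roots;
}

// Render the outline as indented text lines, e.g. for a sidebar.
function renderOutline(nodes: OutlineNode[], depth: number = 0): string[] {
  const lines: string[] = [];
  for (const n of nodes) {
    lines.push(new Array(depth + 1).join("  ") + n.text);
    lines.push(...renderOutline(n.children, depth + 1));
  }
  return lines;
}

const sampleHeadings: HeadingInfo[] = [
  { level: 1, text: "Intro" },
  { level: 2, text: "Motivation" },
  { level: 2, text: "Scope" },
  { level: 3, text: "Out of scope" },
  { level: 1, text: "Design" },
];
console.log(renderOutline(buildOutline(sampleHeadings)).join("\n"));
```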

The search should suggest related publications on websites such as:

  • IEEE
  • AAAI.org (?)

5. Can augment with domain-specific search capabilities, e.g. chemistry, biology, computer science

NLP again. Taxonomies, dictionaries, conventions.

A plug-in capability for Zen.
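A plug-in capability could start as small as a term-expansion hook. Each domain plug-in maps a search term to related terms drawn from its own taxonomy or dictionary, and the search then looks for any of them. The `DomainPlugin` interface, the registry functions and the toy chemistry dictionary below are all hypothetical. This is a sketch of one possible contract, not a design decision.

```typescript
// Hypothetical plug-in contract: expand a search term into related terms
// from a domain taxonomy, dictionary, or naming convention.
interface DomainPlugin {
  domain: string; // e.g. "chemistry", "biology", "computer science"
  expand(term: string): string[];
}

const registeredPlugins: DomainPlugin[] = [];

function registerPlugin(plugin: DomainPlugin): void {
  registeredPlugins.push(plugin);
}

// The original term plus every expansion from plug-ins of the chosen
// domains, without duplicates.
function expandTerm(term: string, domains: string[]): string[] {
  const result: string[] = [term];
  for (const plugin of registeredPlugins) {
    if (domains.indexOf(plugin.domain) < 0) continue;
    for (const t of plugin.expand(term)) {
      if (result.indexOf(t) < 0) result.push(t);
    }
  }
  return result;
}

// Toy chemistry plug-in backed by a tiny hand-written dictionary.
const chemistrySynonyms: Record<string, string[]> = {
  water: ["H2O", "dihydrogen monoxide"],
  salt: ["NaCl", "sodium chloride"],
};

registerPlugin({
  domain: "chemistry",
  expand: term => {
    const key = term.toLowerCase();
    // hasOwnProperty guards against inherited keys such as "constructor"
    return Object.prototype.hasOwnProperty.call(chemistrySynonyms, key) ? chemistrySynonyms[key] : [];
  },
});

console.log(expandTerm("water", ["chemistry"]));
console.log(expandTerm("water", ["biology"]));
```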

Tom says:

#1: Good, but there is a Google operator "site:xyz.com these are my search terms", which most non-technical users probably don't know about. Still, the Zen Highlighter should be dog food for us, not for an undefined, un-"monetized" "crowd in the cloud". The Zen Highlighter should be something that I would want to use (mostly because I'm paying for it).

#2: See my point for key idea #1.

#3: Interesting, but how long would it take to program that? My budget is very limited.

#4: How could this be used?

#5: This sounds too difficult to implement without a lot of foregoing development. I want something useful very soon, based upon easy programming (not visual programming such as Demo #3: Zen visual program editor).


Scenario

Grab some web pages and drop them onto the Zen Highlighter page

Tom has an example of working with HTML5 APIs like the Drag and Drop API.

There is another API that might be interesting:

  • See https://developer.mozilla.org/en-US/docs/Web/API/Gamepad_API/Using_the_Gamepad_API
  • See https://developer.mozilla.org/en-US/docs/Web/API
  • See https://github.com/Mashweb/web-call.cc/wiki/Zen-user-intereactions

Main means of implementing the dragging and dropping (or pasting) of web pages onto the Zen Highlighter page:

  • HTML drag and drop
  • File drag and drop
  • Clipboard
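When a link is dragged from another tab or from the address bar, the drop event's `DataTransfer` usually carries it as `text/uri-list`. That format has one URI per line, and lines starting with `#` are comments (RFC 2483). Below is a minimal sketch of the parsing step. The browser wiring appears only in comments. `parseUriList` and the `zen-drop-zone` id are hypothetical. File drops (`dataTransfer.files`) and the Clipboard API are not covered.

```typescript
// Parse a "text/uri-list" payload: one URI per line (CRLF per RFC 2483,
// plain LF accepted too); lines starting with "#" are comments.
function parseUriList(data: string): string[] {
  return data
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0 && line.charAt(0) !== "#");
}

// Browser wiring (not executed here):
//   const zone = document.getElementById("zen-drop-zone")!; // hypothetical id
//   zone.addEventListener("dragover", e => e.preventDefault()); // allow dropping
//   zone.addEventListener("drop", e => {
//     e.preventDefault();
//     const urls = parseUriList(e.dataTransfer?.getData("text/uri-list") ?? "");
//     urls.forEach(url => console.log("dropped", url));
//   });

console.log(parseUriList("# dragged link\r\nhttps://example.org/a\r\nhttps://example.org/b\r\n"));
```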

Web pages (documents) are displayed as thumbnails with the active one differentiated from the others

  • Scaled?
  • Image snapshot using a web service.
  • Text like page title or <h1> headers.
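For the text variant of a thumbnail, a simple fallback chain picks the first non-empty label among the page title, the first `<h1>` and the URL, then truncates it to fit a small box. The `ZenPage` shape, `thumbnailLabel`, `thumbnailClass` and the `zen-thumb-active` class name are hypothetical. Scaled and snapshot thumbnails are not covered.

```typescript
interface ZenPage {
  url: string;
  title?: string; // document.title, if available
  firstHeading?: string; // text of the first <h1>, if any
}

// Pick the first non-blank candidate and truncate it with "..." if needed.
// Assumes maxLength >= 3.
function thumbnailLabel(page: ZenPage, maxLength: number = 40): string {
  const candidates = [page.title, page.firstHeading, page.url];
  let label = page.url;
  for (const c of candidates) {
    if (c !== undefined && c.trim().length > 0) {
      label = c.trim();
      break;
    }
  }
  return label.length <= maxLength ? label : label.slice(0, maxLength - 3) + "...";
}

// CSS classes for a thumbnail; the active document gets an extra class.
function thumbnailClass(index: number, activeIndex: number): string {
  return index === activeIndex ? "zen-thumb zen-thumb-active" : "zen-thumb";
}

console.log(thumbnailLabel({ url: "https://example.org/x", title: "  ", firstHeading: "Scheme on the Web" }));
console.log(thumbnailClass(1, 1));
```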

Specify complex search criteria

  • Words and logical operators
  • Possibly with stemming and other capabilities behind the scenes (change words based upon tense, nouns, synonyms, other NLP tricks for word matching and sentence understanding)
  • Hit the search button (Maybe also looking for more web pages like this. "What would Google use to find other documents like this?" Word frequencies, etc. Which words are fairly rare?)
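One standard answer to "which words are fairly rare?" is TF-IDF. A word scores high in a document when it is frequent there but appears in few of the other documents. This is a swapped-in textbook technique, not a claim about what Google actually uses. Below is a minimal sketch over the documents already dropped onto the page. It does no stemming and uses no stop-word list. `distinctiveWords` is a hypothetical name. Its top words could seed a "find more documents like this" query.

```typescript
// Lower-cased alphabetic words; no stemming or stop-word removal.
function tokenizeWords(text: string): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z]+/g) || [];
  return words;
}

// Return the k words of corpus[docIndex] with the highest TF-IDF scores.
function distinctiveWords(docIndex: number, corpus: string[], k: number): string[] {
  const docsTokens = corpus.map(tokenizeWords);

  // Document frequency: in how many documents does each word occur?
  // Object.create(null) avoids inherited keys such as "constructor".
  const docFreq: Record<string, number> = Object.create(null);
  for (const docTokens of docsTokens) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const w of docTokens) {
      if (!seen[w]) {
        seen[w] = true;
        docFreq[w] = (docFreq[w] || 0) + 1;
      }
    }
  }

  // Term counts within the chosen document.
  const tokens = docsTokens[docIndex];
  const termCount: Record<string, number> = Object.create(null);
  for (const w of tokens) termCount[w] = (termCount[w] || 0) + 1;

  const scored = Object.keys(termCount).map(w => ({
    word: w,
    // tf * idf; a word found in every document gets idf = log(1) = 0
    score: (termCount[w] / tokens.length) * Math.log(corpus.length / docFreq[w]),
  }));
  // Highest score first; ties broken alphabetically for a stable result.
  scored.sort((a, b) => b.score - a.score || (a.word < b.word ? -1 : 1));
  return scored.slice(0, k).map(s => s.word);
}

const corpus = [
  "The Scheme interpreter runs the Scheme code.",
  "The Internet is big.",
  "The Lisp code is old.",
];
console.log(distinctiveWords(0, corpus, 3));
```

"The" occurs in every document, so it scores zero. "Code" occurs in two documents, so it ranks below words unique to the first one.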

Sentences in the documents that match the criteria are highlighted

Insert tags.
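Inserting tags could be as simple as re-rendering each paragraph with its matching sentences wrapped in `<mark>`, the standard HTML element for highlighted text. Below is a minimal string-based sketch. It assumes the sentences were already split and matched, for example by a query evaluator. The `zen-hl` id prefix is hypothetical. Each mark gets an `id` so it can serve as an anchor, and `tabindex="-1"` so that script can move focus to it. Text is HTML-escaped before it is wrapped.

```typescript
// Escape text so it can be inserted into HTML safely.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one paragraph's sentences as HTML, wrapping each matching sentence
// in <mark> with a numbered id (an anchor target) and tabindex="-1"
// (so script can focus it). Numbering restarts at 0 on every call, so use a
// distinct idPrefix per paragraph to keep ids unique on the page.
function renderHighlighted(sentences: string[], isMatch: (sentence: string) => boolean, idPrefix: string): string {
  let count = 0;
  return sentences
    .map(s => {
      const safe = escapeHtml(s);
      if (!isMatch(s)) return safe;
      const id = idPrefix + "-" + count++;
      return '<mark id="' + id + '" tabindex="-1">' + safe + "</mark>";
    })
    .join(" ");
}

console.log(renderHighlighted(["Lisp is old.", "Tea is hot.", "Lisp & Scheme."], s => s.indexOf("Lisp") >= 0, "zen-hl"));
```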

Next/previous buttons to jump to highlighted sentences

Put anchor tags inside the inserted tags. There is a way to force the browser (tab ordering and _____) to jump to the next form element.
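Given the ids of the inserted highlights, next/previous buttons only need a cursor that wraps around at either end. Below is a minimal sketch. `HighlightNavigator` is a hypothetical name. The browser step appears only as a comment: it focuses the element, which `tabindex="-1"` allows, and scrolls it into view.

```typescript
// Cursor over highlight ids that wraps around at both ends.
class HighlightNavigator {
  private ids: string[];
  private current: number = -1; // -1 means "nothing visited yet"

  constructor(ids: string[]) {
    this.ids = ids;
  }

  next(): string | undefined {
    if (this.ids.length === 0) return undefined;
    this.current = (this.current + 1) % this.ids.length;
    return this.ids[this.current];
  }

  previous(): string | undefined {
    if (this.ids.length === 0) return undefined;
    this.current = this.current <= 0 ? this.ids.length - 1 : this.current - 1;
    return this.ids[this.current];
  }
}

// Browser wiring (not executed here):
//   nextButton.onclick = () => {
//     const id = nav.next();
//     const el = id ? document.getElementById(id) : null;
//     if (el) { el.focus(); el.scrollIntoView({ block: "center" }); }
//   };

const demo = new HighlightNavigator(["zen-hl-0", "zen-hl-1", "zen-hl-2"]);
console.log(demo.next(), demo.next(), demo.previous(), demo.previous());
```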

-> Look at what external web services provide, e.g.:

  • Wolfram Alpha
  • Semantic Scholar
  • (Another website a bit like Semantic Scholar whose name Tom can't immediately recall)

Operations on highlighted sentences

  • Can un-highlight uninteresting sentences
  • Can add the next or previous sentence to highlighting (Grow your selection.)
  • Can highlight sentences manually by clicking on a sentence
  • Can add a comment to a sentence
  • Can click one or more sentences and "get more like this" to expand the search: in the rest of the document, or in the set of documents you have.
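Most of these operations can be modeled on sentence indices before the DOM is touched. A click toggles a sentence, which covers both un-highlighting and manual highlighting. Growing adds the neighbor of a sentence that is already highlighted. Below is a minimal sketch. `SentenceSelection` and its methods are hypothetical names. Comments and "get more like this" are left out.

```typescript
// Highlight state for the sentences of one document, by sentence index.
class SentenceSelection {
  private selected: boolean[] = [];

  constructor(sentenceCount: number, initiallyHighlighted: number[]) {
    for (let i = 0; i < sentenceCount; i++) this.selected.push(false);
    for (const i of initiallyHighlighted) this.setAt(i, true);
  }

  // Ignore indices outside the document.
  private setAt(i: number, value: boolean): void {
    if (i >= 0 && i < this.selected.length) this.selected[i] = value;
  }

  // A click toggles: un-highlights a highlighted sentence, highlights any other.
  toggle(i: number): void {
    this.setAt(i, !this.selected[i]);
  }

  // Grow the selection from highlighted sentence i by one sentence.
  grow(i: number, direction: "previous" | "next"): void {
    if (!this.selected[i]) return;
    this.setAt(direction === "next" ? i + 1 : i - 1, true);
  }

  // Indices of the highlighted sentences, in document order.
  highlighted(): number[] {
    const out: number[] = [];
    this.selected.forEach((on, i) => {
      if (on) out.push(i);
    });
    return out;
  }
}

const sel = new SentenceSelection(5, [2]);
sel.grow(2, "next"); // highlights sentence 3
sel.toggle(0); // manually highlights sentence 0
sel.grow(4, "next"); // no effect: sentence 4 is not highlighted
console.log(sel.highlighted());
```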

-> Tom will make a pointer to a web service and browser plugin that allows you to annotate any web page.
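Whatever annotation service ends up being used, a sentence comment can be stored independently of the page's markup using the W3C Web Annotation Data Model. The comment goes in a `TextualBody`. A `TextQuoteSelector` identifies the sentence by its exact text plus a little prefix and suffix context. Below is a minimal sketch. `annotateSentence` and its default 20-character context length are assumptions. Only the first occurrence of the sentence is anchored.

```typescript
// Shape of a W3C Web Annotation with a TextualBody and a TextQuoteSelector.
interface WebAnnotation {
  "@context": string;
  type: string;
  body: { type: string; value: string };
  target: {
    source: string;
    selector: { type: string; exact: string; prefix: string; suffix: string };
  };
}

// Attach a comment to the first occurrence of `exact` in the page text,
// keeping up to `contextLength` characters of context on each side.
function annotateSentence(source: string, pageText: string, exact: string, comment: string, contextLength: number = 20): WebAnnotation | undefined {
  const at = pageText.indexOf(exact);
  if (at < 0) return undefined;
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    type: "Annotation",
    body: { type: "TextualBody", value: comment },
    target: {
      source,
      selector: {
        type: "TextQuoteSelector",
        exact,
        prefix: pageText.slice(Math.max(0, at - contextLength), at),
        suffix: pageText.slice(at + exact.length, at + exact.length + contextLength),
      },
    },
  };
}

const pageText = "Scheme is a Lisp. It is small. It is elegant.";
console.log(JSON.stringify(annotateSentence("https://example.org/page", pageText, "It is small.", "Nice summary.", 5), null, 2));
```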


Operations on documents

  • Can extract all highlighted sentences. Pull them out and put them into another document.
  • Can do DOM operations like create an outline of the headings and highlight those that contain highlighted sentences
    • Skimming and scanning, such as reading only the first sentence of every paragraph or the first paragraph of every section.
    • Hide everything else.
    • Folding editor or Safari's Reader Mode.
  • For documents that don't use headings, can construct an effective outline based on the DOM structure. Maybe.
  • Can hide and unhide sections. OK.
  • Can pull in linked documents and search them with the same criteria. OK.
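If a document is held as sections of paragraphs of sentences, several of these operations become short loops. These include pulling out the highlighted sentences, listing the headings whose sections contain highlights (for the outline), and producing a skim view of the first sentence of every paragraph. Below is a minimal sketch over a hypothetical `DocSection` shape. A real version would build that shape from the DOM and write the results back.

```typescript
// A document as sections -> paragraphs -> sentences.
interface DocSection {
  heading: string;
  paragraphs: string[][]; // each paragraph is a list of sentences
}

type SentencePredicate = (sentence: string) => boolean;

// All highlighted sentences, in document order (e.g. to paste elsewhere).
function extractHighlighted(sections: DocSection[], isHighlighted: SentencePredicate): string[] {
  const out: string[] = [];
  for (const sec of sections) {
    for (const para of sec.paragraphs) {
      for (const s of para) if (isHighlighted(s)) out.push(s);
    }
  }
  return out;
}

// Headings of sections that contain at least one highlighted sentence.
function headingsWithHighlights(sections: DocSection[], isHighlighted: SentencePredicate): string[] {
  return sections
    .filter(sec => sec.paragraphs.some(para => para.some(isHighlighted)))
    .map(sec => sec.heading);
}

// Skim view: the first sentence of every paragraph.
function skimFirstSentences(sections: DocSection[]): string[] {
  const out: string[] = [];
  for (const sec of sections) {
    for (const para of sec.paragraphs) if (para.length > 0) out.push(para[0]);
  }
  return out;
}

const docSample: DocSection[] = [
  { heading: "Intro", paragraphs: [["Zen highlights text.", "It is small."], ["Nothing here."]] },
  { heading: "Details", paragraphs: [["Plain sentence.", "Zen can skim."]] },
  { heading: "Appendix", paragraphs: [["No match."]] },
];
const mentionsZen: SentencePredicate = s => s.indexOf("Zen") >= 0;

console.log(extractHighlighted(docSample, mentionsZen));
console.log(headingsWithHighlights(docSample, mentionsZen));
console.log(skimFirstSentences(docSample));
```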

Paragraphs

  • Can have all of the same capabilities for paragraphs instead of sentences (and it might even be easier). OK. Also at section level.
  • Can do DOM-aware operations like getting the intro and/or closing paragraph of every section. OK.
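At paragraph and section level the same ideas apply with a coarser unit. Below is a minimal sketch of two of them, using a hypothetical `SectionText` shape that holds one heading plus its paragraph texts. The first function gets the intro and closing paragraph of each section. The second finds paragraphs that contain all of a set of words, using a simple case-insensitive substring test. In a one-paragraph section the intro and the closing paragraph are the same.

```typescript
// One section: its heading text and the text of each paragraph under it.
interface SectionText {
  heading: string;
  paragraphs: string[];
}

interface SectionSummary {
  heading: string;
  intro: string;
  closing: string;
}

// First and last paragraph of every non-empty section.
function introAndClosing(sections: SectionText[]): SectionSummary[] {
  return sections
    .filter(s => s.paragraphs.length > 0)
    .map(s => ({
      heading: s.heading,
      intro: s.paragraphs[0],
      closing: s.paragraphs[s.paragraphs.length - 1],
    }));
}

// Paragraph-level search: paragraphs containing every word
// (case-insensitive substring match, not whole-word matching).
function paragraphsContaining(sections: SectionText[], words: string[]): string[] {
  const out: string[] = [];
  for (const s of sections) {
    for (const p of s.paragraphs) {
      const lower = p.toLowerCase();
      if (words.every(w => lower.indexOf(w.toLowerCase()) >= 0)) out.push(p);
    }
  }
  return out;
}

const sectionsSample: SectionText[] = [
  { heading: "Background", paragraphs: ["Lisp came first.", "Scheme followed.", "Both run on the Internet."] },
  { heading: "Empty", paragraphs: [] },
  { heading: "Summary", paragraphs: ["Scheme is used on the Internet."] },
];
console.log(introAndClosing(sectionsSample));
console.log(paragraphsContaining(sectionsSample, ["scheme", "internet"]));
```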

Where to go from here

Tom says:

The Dojo Toolkit might provide more fodder for ideas. See https://github.com/Mashweb/web-call.cc/wiki/Zen-Widgets:-Their-Creation,-Manipulation,-and-Affordances.

I have begun editing a wiki page that will help us adapt and adopt some of your ideas quickly: Zen user intereactions.

Note that it is easy to "embed" videos in a Zen web page. (See how easy it is to "embed" a YouTube video in a GitHub wiki, for example.) That is actually something I would like to do, because I find prominent people are often interviewed for the news.

I will work on your ideas some more later today, Alan. You have come up with a lot of good ideas. I don't want to shoot them down. Let's see how we can develop them quickly so they can help tech-savvy people like us. ;-)

-Tom