Revamp docs as UBI has evolved. #8800

Open: wants to merge 5 commits into base: main

204 changes: 0 additions & 204 deletions _search-plugins/ubi/data-structures.md

This file was deleted.

14 changes: 8 additions & 6 deletions _search-plugins/ubi/index.md
@@ -11,16 +11,17 @@ redirect_from:
**Introduced 2.15**
{: .label .label-purple }

**References UBI Specification 1.0.0**
**References UBI Specification 1.2.0**
{: .label .label-purple }

User Behavior Insights (UBI) is a plugin that captures client-side events and queries for the purposes of improving search relevance and the user experience.
It is a causal system, linking a user's query to all of their subsequent interactions with your application until they perform another search.
User Behavior Insights (UBI) is a project for capturing client-side events and queries for the purposes of improving search relevance and the user experience.
It is a *causal* system, linking a user's query to all of their subsequent interactions with your application until they perform another search.
This differs from many systems that infer the linking of search to events through *chronological* sequence.
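
The difference between causal and chronological linking can be illustrated with a minimal sketch. It is not part of the UBI plugin or specification: the function names, the sample records, and the use of integer timestamps (standing in for ISO 8601 values) are assumptions made for illustration. The hypothetical scenario is a results page from an earlier search that is still open when a later event is sent.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical sample data; integer timestamps stand in for ISO 8601 values.
queries = [
    {"query_id": "q1", "client_id": "c1", "timestamp": 100},
    {"query_id": "q2", "client_id": "c1", "timestamp": 200},
]
events = [
    {"action_name": "click", "client_id": "c1", "timestamp": 150, "query_id": "q1"},
    # Sent from a results page for q1 that was still open after q2 ran.
    {"action_name": "add_to_cart", "client_id": "c1", "timestamp": 250, "query_id": "q1"},
]


def link_causally(events):
    """Group events by the query_id that each event carries."""
    by_query = defaultdict(list)
    for event in events:
        by_query[event["query_id"]].append(event["action_name"])
    return dict(by_query)


def link_chronologically(queries, events):
    """Infer the query for each event as the latest query that precedes it."""
    ordered = sorted(queries, key=lambda q: q["timestamp"])
    times = [q["timestamp"] for q in ordered]
    by_query = defaultdict(list)
    for event in events:
        idx = bisect_right(times, event["timestamp"]) - 1
        if idx >= 0:  # skip events that precede every known query
            by_query[ordered[idx]["query_id"]].append(event["action_name"])
    return dict(by_query)


causal = link_causally(events)
chronological = link_chronologically(queries, events)
```

In this sketch, the causal grouping attributes both actions to `q1`, while the chronological inference attributes `add_to_cart` to `q2` because `q2` is the most recent query before that event.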

UBI includes the following elements:
* A machine-readable [schema](https://github.com/o19s/ubi) that facilitates interoperability of the UBI specification.
* An OpenSearch [plugin](https://github.com/opensearch-project/user-behavior-insights) that facilitates the storage of client-side events and queries.
* A client-side JavaScript [example reference implementation]({{site.url}}{{site.baseurl}}/search-plugins/ubi/data-structures/) that shows how to capture events and send them to the OpenSearch UBI plugin.
* A client-side JavaScript [reference implementation](https://github.com/opensearch-project/user-behavior-insights/tree/main/ubi-javascript-collector/ubi.js) that shows how to capture searches and events.

<!-- vale off -->

@@ -30,7 +31,8 @@ The UBI documentation is organized into two categories: *Explanation and referen

| Link | Description |
| :--------- | :------- |
| [UBI Request/Response Specification](https://github.com/o19s/ubi/) | The industry-standard schema for UBI requests and responses. The current version references UBI Specification 1.0.0. |
| [UBISearch.dev](https://UBISearch.dev) | The User Behavior Insights community website. |
| [UBI Request/Response Specification](https://github.com/o19s/ubi/) | The industry-standard schema for UBI requests and responses. The current version references UBI Specification 1.2.0. |
| [UBI index schema]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/) | Documentation on the individual OpenSearch query and event stores. |


@@ -39,7 +41,7 @@ The UBI documentation is organized into two categories: *Explanation and referen
| Link | Description |
| :--------- | :------- |
| [UBI plugin](https://github.com/opensearch-project/user-behavior-insights) | How to install and use the UBI plugin. |
| [UBI client data structures]({{site.url}}{{site.baseurl}}/search-plugins/ubi/data-structures/) | Sample JavaScript structures for populating the event store. |
| [UBI JavaScript Collector]({{site.url}}{{site.baseurl}}/search-plugins/ubi/ubi-javascript-collector/) | Clientside JavaScript library to capture events. |
Collaborator

Suggested change
| [UBI JavaScript Collector]({{site.url}}{{site.baseurl}}/search-plugins/ubi/ubi-javascript-collector/) | Clientside JavaScript library to capture events. |
| [UBI JavaScript Collector]({{site.url}}{{site.baseurl}}/search-plugins/ubi/ubi-javascript-collector/) | A client-side JavaScript library for capturing events. |

| [Example UBI query DSL queries]({{site.url}}{{site.baseurl}}/search-plugins/ubi/dsl-queries/) | How to write queries for UBI data in OpenSearch query DSL. |
| [Example UBI SQL queries]({{site.url}}{{site.baseurl}}/search-plugins/ubi/sql-queries/) | How to write analytic queries for UBI data in SQL. |
| [UBI dashboard tutorial]({{site.url}}{{site.baseurl}}/search-plugins/ubi/ubi-dashboard-tutorial/) | How to build a dashboard containing UBI data. |
30 changes: 12 additions & 18 deletions _search-plugins/ubi/schemas.md
@@ -16,8 +16,8 @@ The User Behavior Insights (UBI) data collection process involves tracking and r

For UBI to function properly, the connections between the following fields must be consistently maintained within an application that has UBI enabled:

- [`object_id`](#object_id) represents an ID for whatever object the user receives in response to a query. For example, if you search for books, it might be an ISBN code of a book, such as `978-3-16-148410-0`.
- [`query_id`](#query_id) is a unique ID for the raw query language executed and the `object_id` values of the _hits_ returned by the user's query.
- [`object_id`](#object_id) represents an ID for whatever object the user receives in response to a query. For example, if you search for books, it might be an ISBN number for a book, such as `978-3-16-148410-0`.
Collaborator

Suggested change
- [`object_id`](#object_id) represents an ID for whatever object the user receives in response to a query. For example, if you search for books, it might be an ISBN number for a book, such as `978-3-16-148410-0`.
- [`object_id`](#object_id) represents an ID for whatever object the user receives in response to a query. For example, if you search for books, it might be an ISBN for a book, such as `978-3-16-148410-0`.

- [`query_id`](#query_id) is a unique ID for the raw query language executed and the `object_id` maps to the primary identifier of the _hits_ returned by the user's query.
Collaborator

Could we reword this? Query ID seems to be a unique ID for the raw query, correct? Not the query language?

Suggested change
- [`query_id`](#query_id) is a unique ID for the raw query language executed and the `object_id` maps to the primary identifier of the _hits_ returned by the user's query.
- [`query_id`](#query_id) is a unique ID for the raw query executed, while the `object_id` maps to the primary identifier of the _hits_ returned by the user's query.

- [`client_id`](#client_id) represents a unique query source. This is typically a web browser used by a unique user.
- [`object_id_field`](#object_id_field) specifies the name of the field in your index that provides the `object_id`. For example, if you search for books, the value might be `isbn_code`.
- [`action_name`](#action_name), though not technically an ID, specifies the exact user action (such as `click`, `add_to_cart`, `watch`, `view`, or `purchase`) that was taken (or not taken) for an object with a given `object_id`.
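
As a rough illustration of how the fields in the list above fit together, the following sketch builds one event and checks that it stays consistent with the query that produced it. The helper names `make_event` and `is_consistent`, the sample IDs, and the exact nesting of `object_id` under `event_attributes.object` are assumptions for illustration and are not taken from the UBI plugin.

```python
from datetime import datetime, timezone


def make_event(query_id, client_id, action_name, object_id, object_id_field):
    """Build a minimal event dict (hypothetical helper, not a plugin API)."""
    return {
        "action_name": action_name,
        "query_id": query_id,
        "client_id": client_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_attributes": {
            "object": {
                # Assumed nesting: the object ID sits next to the name of
                # the index field that provides it.
                "object_id": object_id,
                "object_id_field": object_id_field,
            }
        },
    }


def is_consistent(event, query):
    """Check that the event carries the same query_id and client_id as the query."""
    return (
        event["query_id"] == query["query_id"]
        and event["client_id"] == query["client_id"]
    )


# Hypothetical query record and a click on one of its hits.
query = {"query_id": "query-0001", "client_id": "browser-42"}
event = make_event(
    query_id=query["query_id"],
    client_id=query["client_id"],
    action_name="click",
    object_id="978-3-16-148410-0",
    object_id_field="isbn_code",
)
```

A check such as `is_consistent(event, query)` is only one possible way to catch a client that drops or regenerates IDs between the search and the follow-up event.
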
@@ -138,11 +138,11 @@ All underlying query information and results (`object_ids`) are stored in the `u

The `ubi_queries` index [schema](https://github.com/OpenSearch-project/user-behavior-insights/tree/main/src/main/resources/queries-mapping.json) includes the following fields:

- `timestamp` (events and queries): A UNIX timestamp indicating when the query was received.
- `timestamp` (events and queries): A ISO 8601 formatted timestamp indicating when the query was received.
Collaborator

Suggested change
- `timestamp` (events and queries): A ISO 8601 formatted timestamp indicating when the query was received.
- `timestamp` (events and queries): An ISO 8601-formatted timestamp indicating when the query was received.


- `query_id` (events and queries): The unique ID of the query provided by the client or generated automatically. Different queries with the same text generate different `query_id` values.
- `client_id` (events and queries): A user/client ID provided by the client application.
- `query_id` (events and queries): The unique ID of the query provided by the client or generated by the search engine. Different queries with the same text generate different `query_id` values.

- `client_id` (events and queries): A client ID provided by the client application.

- `query_response_objects_ids` (queries): An array of object IDs. An ID can have the same value as the `_id`, but it is meant to be the externally valid ID of a document, item, or product.
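
The following sketch shows one way a record with the fields shown above could be assembled. It is an assumption-laden illustration rather than the plugin's behavior: `make_query_record` is a hypothetical helper, and generating the ID with a random UUID is only one option for the case where the client does not supply a `query_id`.

```python
import uuid
from datetime import datetime, timezone


def make_query_record(client_id, hit_ids, query_id=None):
    """Build a minimal query record (hypothetical helper, not a plugin API)."""
    return {
        # ISO 8601 timestamp in UTC, for example 2018-11-13T20:20:39+00:00.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Use the client-provided ID when present; otherwise generate one.
        "query_id": query_id or str(uuid.uuid4()),
        "client_id": client_id,
        # Externally valid IDs of the returned hits, not necessarily `_id`.
        "query_response_objects_ids": list(hit_ids),
    }


# Two searches from the same client that return the same hits still
# receive different generated query_id values.
first = make_query_record("browser-42", ["978-3-16-148410-0"])
second = make_query_record("browser-42", ["978-3-16-148410-0"])
supplied = make_query_record("browser-42", [], query_id="query-0001")
```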

@@ -169,14 +169,14 @@ The following are the predefined, minimal fields in the `ubi_events` index:
<p id="query_id"> </p>

- `query_id` (size 100): The unique identifier of a query, which is typically a UUID but can be any string.
The `query_id` is either provided by the client or generated at index time by the UBI plugin. The `query_id` values in both the **UBI queries** and **UBI events** indexes must be consistent.
The `query_id` is either provided by the client or generated at query time by the UBI plugin. The `query_id` values in both the **UBI queries** and **UBI events** indexes must be consistent.

<p id="client_id"> </p>

- `client_id`: The client that issues the query. This is typically a web browser used by a unique user.
The `client_id` in both the **UBI queries** and **UBI events** indexes must be consistent.

- `timestamp`: When the event occurred, either in UNIX format or formatted as `2018-11-13T20:20:39+00:00`.
- `timestamp`: When the event occurred, using ISO 8601 format such as `2018-11-13T20:20:39+00:00`.
Collaborator

Suggested change
- `timestamp`: When the event occurred, using ISO 8601 format such as `2018-11-13T20:20:39+00:00`.
- `timestamp`: The time the event occurred in ISO 8601 format, such as `2018-11-13T20:20:39+00:00`.


- `message_type` (size 100): A logical bin for grouping actions (each with an `action_name`). For example, `QUERY` or `CONVERSION`.

@@ -193,18 +193,12 @@ The following are the predefined, minimal fields in the `ubi_events` index:

- `event_attributes.position.ordinal`: Tracks the list position that a user can select (for example, selecting the third element can be described as `event{onClick, results[4]}`).

- `event_attributes.position.{x,y}`: Tracks x and y values defined by the client.

- `event_attributes.position.page_depth`: Tracks the page depth of the results.

- `event_attributes.position.scroll_depth`: Tracks the scroll depth of the page results.

- `event_attributes.position.trail`: A text field that tracks the path/trail that a user took to get to this location.

- `event_attributes.position.xy.{x,y}`: Tracks x and y values defined by the client.

- `event_attributes.object`: Contains identifying information about the object returned by the query (for example, a book, product, or post).
The `object` structure can refer to the object by internal ID or object ID. The `object_id` is the ID that links prior queries to this object. This field comprises the following subfields:

- `event_attributes.object.internal_id`: A unique ID that OpenSearch can use to internally index the object, for example, the `_id` field in the indexes.
- `event_attributes.object.internal_id`: The unique ID that OpenSearch uses to internally index the object, for example, the `_id` field in the indexes.

<p id="object_id">

@@ -214,7 +208,7 @@ The following are the predefined, minimal fields in the `ubi_events` index:

<p id="object_id_field">

- `event_attributes.object.object_id_field`: Indicates the type/class of the object and the name of the search index field that contains the `object_id`.
- `event_attributes.object.object_id_field`: Indicates the type/class of the object and the name of the search index field that contains the `object_id`, such as `ssn`, `isbn`, or `ean`.

- `event_attributes.object.description`: An optional description of the object.
