Commit

Continuing image cleanup
noramullen1 committed Jan 17, 2024
1 parent c150643 commit 904d223
Showing 71 changed files with 31 additions and 37 deletions.
File renamed without changes (17 files)
Binary file removed .gitbook/assets/server-controller-deep-store (1).png
Diff not rendered.
Binary file removed .gitbook/assets/snapshot-msk (1).png
Diff not rendered.
File renamed without changes
2 changes: 1 addition & 1 deletion basics/components/cluster/server.md
@@ -16,7 +16,7 @@ Offline servers are responsible for downloading segments from the segment store,
**Real-time**\
Real-time servers directly ingest from a real-time stream (such as Kafka or EventHubs). Periodically, they make segments of the in-memory ingested data, based on certain thresholds. This segment is then persisted onto the segment store.

![](<../../../.gitbook/assets/RealtimeServer (1).jpg>)
![](<../../../.gitbook/assets/RealtimeServer.jpg>)

Pinot servers are modeled as Helix participants, hosting Pinot tables (referred to as _resources_ in Helix terminology). Segments of a table are modeled as Helix partitions (of a resource). Thus, a Pinot server hosts one or more Helix partitions of one or more Helix resources (_i.e._ one or more segments of one or more tables).

4 changes: 2 additions & 2 deletions basics/components/exploring-pinot.md
@@ -22,7 +22,7 @@ If you want to view the contents of a server, click on its instance name. You'll

To view the _baseballStats_ table, click on its name, which will show the following screen:

![baseballStats Table](<../../.gitbook/assets/view-table-baseball-stats (1) (1).png>)
![baseballStats Table](<../../.gitbook/assets/view-table-baseball-stats.png>)

From this screen, we can edit or delete the table, edit or adjust its schema, and perform several other operations.

@@ -69,7 +69,7 @@ Pinot supports a subset of standard SQL. For more information, see [Pinot Query

The [Pinot Admin UI](http://localhost:9000/help) contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management, including health checks, instance management, schema and table management, and data segment management.

![](<../../.gitbook/assets/Screen Shot 2020-02-28 at 10.00.43 AM.png>)
![](<../../.gitbook/assets/pinot-admin-ui.png>)

Let's check out the tables in this cluster: go to [Table -> List all tables in cluster](http://localhost:9000/help#/Table/listTables), click **Try it out**, and then click **Execute**. We can see the `baseballStats` table listed here. We can also see the exact cURL call made to the controller API.

4 changes: 2 additions & 2 deletions basics/components/table/segment/deep-store.md
@@ -26,13 +26,13 @@ The ingestion job then sends a notification about the new segment to the control

For real-time tables, by default, a segment is first built in memory by the server. It is then uploaded to the lead controller (as part of the Segment Completion Protocol sequence), which writes the segment into the deep store, as shown in the diagram below:

![Server sends segment to Controller, which writes segments into the deep store](<../../../../.gitbook/assets/server-controller-deep-store (1).png>)
![Server sends segment to Controller, which writes segments into the deep store](<../../../../.gitbook/assets/server-controller-deep-store.png>)

Having all segments go through the controller can become a system bottleneck under heavy load, in which case you can use the peer download policy, as described in [Decoupling Controller from the Data Path](../../../../operators/operating-pinot/decoupling-controller-from-the-data-path.md).

When using this configuration, the server will directly write a completed segment to the deep store, as shown in the diagram below:

![Server writing a segment into the deep store](<../../../../.gitbook/assets/server-deep-store (1).png>)
![Server writing a segment into the deep store](<../../../../.gitbook/assets/server-deep-store.png>)

## Configuring the deep store

8 changes: 4 additions & 4 deletions basics/data-import/segment-compaction-on-upserts.md
@@ -42,19 +42,19 @@ Because segment compaction is an expensive operation, we **do not recommend** se

The following example includes a dataset with 24M records and 240K unique keys that have each been duplicated 100 times. After ingesting the data, there are 6 segments (5 completed segments and 1 consuming segment) with a total estimated size of 22.8MB.&#x20;

<figure><img src="../../.gitbook/assets/Screenshot 2023-09-28 at 12.00.05 PM.png" alt=""><figcaption><p>Example dataset</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/example-dataset.png" alt=""><figcaption><p>Example dataset</p></figcaption></figure>

Submitting the query `set skipUpsert=true; select count(*) from transcript_upsert` before compaction returns a count of 24,000,000:

<div align="left">

<figure><img src="../../.gitbook/assets/Screenshot 2023-09-28 at 12.04.07 PM.png" alt="" width="265"><figcaption><p>Results before segment compaction</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/results-before-segment-compaction.png" alt="" width="265"><figcaption><p>Results before segment compaction</p></figcaption></figure>

</div>
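
The same count can also be requested outside the Query Console. The following is a minimal sketch, not the documented procedure: it assumes a broker reachable at `http://localhost:8099` (your quick-start port may differ) and the broker's `/query/sql` endpoint, and the helper name `skip_upsert_count_payload` is purely illustrative.

```shell
# Hypothetical helper: builds the JSON body for a count query that bypasses upsert.
# $1 = table name; the SET statement mirrors the query shown above.
skip_upsert_count_payload() {
  printf '{"sql":"set skipUpsert=true; select count(*) from %s"}\n' "$1"
}

# Assumed broker address; adjust host and port for your cluster.
BROKER_URL="${BROKER_URL:-http://localhost:8099}"

# Print the payload so it can be inspected before sending.
skip_upsert_count_payload transcript_upsert

# Uncomment to send the query to a running broker:
# curl -s -X POST -H 'Content-Type: application/json' \
#   -d "$(skip_upsert_count_payload transcript_upsert)" "$BROKER_URL/query/sql"
```

Keeping the `curl` call commented out lets you check the payload first; dropping the `set skipUpsert=true;` prefix should give the deduplicated count instead.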

After the compaction tasks are complete, the [Minion Task Manager UI](../components/cluster/minion.md#task-manager-ui) reports the following.

<figure><img src="../../.gitbook/assets/Screenshot 2023-09-28 at 12.07.22 PM.png" alt=""><figcaption><p>Minion compaction task completed</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/minion-task-completed.png" alt=""><figcaption><p>Minion compaction task completed</p></figcaption></figure>

Segment compaction generates a task for each segment to compact. Five tasks were generated in this case because 90% of the records (3.6–4.5M records) are considered ready for compaction in the completed segments, exceeding the configured thresholds.&#x20;

@@ -66,7 +66,7 @@ Submitting the query again shows the count matches the set of 240K unique keys.



<figure><img src="../../.gitbook/assets/Screenshot 2023-09-28 at 12.20.05 PM.png" alt=""><figcaption><p>Results after segment compaction</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/results-after-segment-compaction.png" alt=""><figcaption><p>Results after segment compaction</p></figcaption></figure>

Once segment compaction has completed, the total number of segments remains the same and the total estimated size drops to 2.77MB.&#x20;
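
The figures in this example can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in this example (24M records, 240K unique keys, 22.8MB before compaction, 2.77MB after); the variable names are illustrative.

```shell
# Numbers taken from the example in this section.
total_records=24000000
unique_keys=240000
size_before_mb=22.8
size_after_mb=2.77

# How many times each key was ingested.
duplicates_per_key=$((total_records / unique_keys))

# Percentage by which the estimated size shrank after compaction.
size_reduction_pct=$(awk -v b="$size_before_mb" -v a="$size_after_mb" \
  'BEGIN { printf "%.1f", (1 - a / b) * 100 }')

echo "duplicates per key: $duplicates_per_key"
echo "size reduction: ${size_reduction_pct}%"
```

The duplication factor comes out to 100, matching the description of the dataset, and the estimated size shrinks by roughly 88%.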

10 changes: 5 additions & 5 deletions basics/data-import/upsert.md
@@ -638,21 +638,21 @@ You can also run partial upsert demo with the following command
bin/quick-start-partial-upsert-streaming.sh
```

As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to checkout the real-time data.
As soon as data flows into the stream, the Pinot table will consume it and it will be ready for querying. Head over to the Query Console to check out the real-time data.

![Query the upsert table](<../../.gitbook/assets/Screen Shot 2021-06-15 at 10.02.46 AM.png>)
![Query the upsert table](<../../.gitbook/assets/query-upsert-table.png>)

For partial upsert, you can see that only the value of the configured column changes, based on the specified partial upsert strategy.

![Query the partial upsert table](../../.gitbook/assets/screen-shot-2021-07-13-at-12.40.24-pm.png)
![Query the partial upsert table](../../.gitbook/assets/query-partial-upsert-table.png)

An example of partial upsert is shown below: each event\_id remains unique during ingestion, while the value of rsvp\_count is incremented.

![Explain partial upsert table](../../.gitbook/assets/screen-shot-2021-07-13-at-12.41.42-pm.png)
![Explain partial upsert table](../../.gitbook/assets/explain-partial-upsert-table.png)

To see the difference from the non-upsert table, you can use a query option `skipUpsert` to skip the upsert effect in the query result.

![Disable the upsert during query via query option](<../../.gitbook/assets/Screen Shot 2021-06-15 at 10.03.22 AM.png>)
![Disable the upsert during query via query option](<../../disable_upsert_during_query.png>)

### FAQ

6 changes: 3 additions & 3 deletions basics/recipes/github-events-stream.md
@@ -614,18 +614,18 @@ $ kubectl apply -f pinot-github-realtime-events.yml

Browse to the [Query Console](http://localhost:9000/query) to view the data.

![](<../../.gitbook/assets/Screen Shot 2020-03-26 at 6.27.43 PM.png>)
![](<../../.gitbook/assets/events-stream-view-data.png>)

### Visualize with SuperSet

You can use SuperSet to visualize this data. Some of the interesting insights we captured were:

#### List the most active organizations during the lockdown

![](<../../.gitbook/assets/Screen Shot 2020-04-08 at 9.28.57 AM.png>)
![](<../../.gitbook/assets/superset-most-active-organizations-example.png>)

Repositories by number of commits in the Apache organization

![](<../../.gitbook/assets/Screen Shot 2020-04-08 at 9.29.12 AM.png>)
![](<../../.gitbook/assets/superset-repos-with-most-commits-example.png>)

To integrate with SuperSet you can check out the [SuperSet Integrations](../../integrations/superset.md) page.
2 changes: 0 additions & 2 deletions configuration-reference/cluster.md
@@ -15,8 +15,6 @@ These are the properties that can be set at the cluster level.

## Cluster Configs APIs

![](<../.gitbook/assets/Screen Shot 2020-07-01 at 10.29.33 PM.png>)

{% swagger baseUrl="http://<controller>:<port>/cluster/configs" path="" method="get" summary="List All Cluster Configs" %}
{% swagger-description %}
**Description**
2 changes: 1 addition & 1 deletion developers/advanced/v2-multi-stage-query-engine.md
@@ -19,7 +19,7 @@ To learn more about what the multi-stage query engine is, see [Multi-stage query

* To enable the multi-stage query engine, in the Pinot Query Console, select the **Use Multi-Stage Engine** check box.

<figure><img src="../../.gitbook/assets/Screenshot 2023-09-14 at 9.59.22 AM.png" alt=""><figcaption><p>Pinot Query Console with Use Multi Stage Engine enabled</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/pinot-query-console-multi-stage-enabled.png" alt=""><figcaption><p>Pinot Query Console with Use Multi Stage Engine enabled</p></figcaption></figure>

## Programmatically access the multi-stage query engine

2 changes: 1 addition & 1 deletion integrations/presto.md
@@ -68,7 +68,7 @@ Splits: 17 total, 17 done (100.00%)

Meanwhile you can access [Presto Cluster UI](http://localhost:8080/ui/) to see query stats.

![Presto Cluster UI](<../.gitbook/assets/presto-cluster-ui (1) (1).png>)
![Presto Cluster UI](<../.gitbook/assets/presto-cluster-ui.png>)
{% endtab %}
{% endtabs %}

10 changes: 5 additions & 5 deletions operators/operating-pinot/rebalance/rebalance-brokers.md
@@ -12,7 +12,7 @@ These are typically done when downsizing/uplifting a cluster, or replacing nodes

Every broker added to the Pinot cluster has tags associated with it. A group of brokers with the same tag forms a Broker Tenant. By default, a broker in the cluster gets added to the `DefaultTenant`, i.e. it gets tagged as `DefaultTenant_BROKER`. Below is an example of how this tag looks in the znode, as seen in ZooInspector.

![Broker tag](<../../../.gitbook/assets/Screen Shot 2020-09-15 at 11.24.58 AM.png>)
![Broker tag](<../../../.gitbook/assets/zookeeper-browser-broker-tenant.png>)

A Pinot table config has a tenants section, to define the tenant to be used by the table. More details about this in the [Tenants](../../../basics/components/cluster/tenant.md) section.

@@ -28,7 +28,7 @@ A Pinot table config has a tenants section, to define the tenant to be used by t

Using the tenant defined above, a mapping from table name to brokers is created and stored in the `IDEALSTATES/brokerResource`. This mapping can be used by external services that need to pick a broker for querying.&#x20;

![brokerResource IDEALSTATE](<../../../.gitbook/assets/Screen Shot 2020-09-15 at 11.26.12 AM.png>)
![brokerResource IDEALSTATE](<../../../.gitbook/assets/zookeeper-browser-broker-resource.png>)

#### Updating tags

@@ -40,7 +40,7 @@ To update the tags on the broker, use the following API:

`PUT /instances/{instanceName}/updateTags?tags=<comma separated tags>`

![updateTags API](<../../../.gitbook/assets/Screen Shot 2020-09-15 at 11.31.47 AM.png>)
![updateTags API](<../../../.gitbook/assets/update-tags-api.png>)

Example for tagging the broker as per your custom tenant:

@@ -56,7 +56,7 @@ After making any capacity changes to the broker, the brokerResource needs to be

`POST /tables/{tableNameWithType}/rebuildBrokerResourceFromHelixTags`

![rebuildBrokerResource API](<../../../.gitbook/assets/Screen Shot 2020-09-15 at 11.35.29 AM.png>)
![rebuildBrokerResource API](<../../../.gitbook/assets/rebuild-broker-resource-api.png>)

### Drop nodes

Expand All @@ -66,7 +66,7 @@ First, shutdown the broker. Then, use API below to remove the node from the clus

`DELETE /instances/{instanceName}`

![](<../../../.gitbook/assets/Screen Shot 2020-09-15 at 11.38.37 AM.png>)
![](<../../../.gitbook/assets/delete-instances-api.png>)
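
The three controller calls described in this file (update tags, rebuild the broker resource, drop an instance) can be sketched as small URL builders. This is only an illustration under stated assumptions: the controller address `http://localhost:9000`, the helper names, and the instance, tag, and table names in the commented `curl` lines are all placeholders, not values from the Pinot docs.

```shell
# Assumed controller address; adjust for your cluster.
CONTROLLER_URL="${CONTROLLER_URL:-http://localhost:9000}"

# Hypothetical helpers that only build the request URLs listed above.
update_tags_url() {              # $1 = instance name, $2 = comma-separated tags
  printf '%s/instances/%s/updateTags?tags=%s\n' "$CONTROLLER_URL" "$1" "$2"
}
rebuild_broker_resource_url() {  # $1 = table name with type
  printf '%s/tables/%s/rebuildBrokerResourceFromHelixTags\n' "$CONTROLLER_URL" "$1"
}
delete_instance_url() {          # $1 = instance name
  printf '%s/instances/%s\n' "$CONTROLLER_URL" "$1"
}

# Uncomment against a running controller (all names below are placeholders):
# curl -X PUT    "$(update_tags_url Broker_host_8000 customTenant_BROKER)"
# curl -X POST   "$(rebuild_broker_resource_url myTable_OFFLINE)"
# curl -X DELETE "$(delete_instance_url Broker_host_8000)"
```

The order of the commented calls follows the text: retag the broker, rebuild the broker resource for each affected table, and delete an instance only after it has been shut down.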

### Troubleshooting

10 changes: 3 additions & 7 deletions operators/operating-pinot/rebalance/rebalance-servers.md
@@ -19,13 +19,13 @@ These are typically done when downsizing/uplifting a cluster or replacing nodes

#### Tenants and tags

Every server added to the Pinot cluster, has tags associated with it. A group of servers with the same tag forms a Server Tenant.
Every server added to the Pinot cluster has tags associated with it. A group of servers with the same tag forms a server tenant.

By default, a server in the cluster gets added to the `DefaultTenant` i.e. gets tagged as `DefaultTenant_OFFLINE` and `DefaultTenant_REALTIME`.

Below is an example of how this looks in the znode, as seen in ZooInspector.

![](<../../../.gitbook/assets/Screen Shot 2020-09-08 at 2.05.29 PM.png>)
![](<../../../.gitbook/assets/zookeeper-browser-server-tenant.png>)

A Pinot table config has a tenants section, to define the tenant to be used by the table. The Pinot table will use all the servers which belong to the tenant as described in this config. For more details about this, see the [Tenants](../../../basics/components/cluster/tenant.md) section.

@@ -43,12 +43,10 @@ A Pinot table config has a tenants section, to define the tenant to be used by t

_**0.6.0 onwards**_

In order to change the server tags, the following API can be used.
In order to change the server tags, use the following API.

`PUT /instances/{instanceName}/updateTags?tags=<comma separated tags>`

![](<../../../.gitbook/assets/Screen Shot 2020-09-08 at 2.29.44 PM.png>)

_**0.5.0 and prior**_

UpdateTags API is not available in 0.5.0 and prior. Instead, use this API to update the Instance.
@@ -140,8 +138,6 @@ To run a rebalance, use the following API.

`POST /tables/{tableName}/rebalance?type=<OFFLINE/REALTIME>`

![](<../../../.gitbook/assets/Screen Shot 2020-09-08 at 2.53.48 PM.png>)

This API has a lot of parameters to control its behavior. Make sure to go over them and change the defaults as needed.
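
A rebalance request can be sketched as follows. This is a hedged illustration rather than a recommended invocation: the controller address, the helper name `rebalance_url`, and the table name are assumptions, and `dryRun` is used here as one example of the parameters mentioned above, so check the API listing in your Pinot version before relying on it.

```shell
# Assumed controller address; adjust for your cluster.
CONTROLLER_URL="${CONTROLLER_URL:-http://localhost:9000}"

# Hypothetical helper: $1 = table name, $2 = OFFLINE or REALTIME,
# $3 = dryRun flag (defaults to true so nothing moves by accident).
rebalance_url() {
  printf '%s/tables/%s/rebalance?type=%s&dryRun=%s\n' \
    "$CONTROLLER_URL" "$1" "$2" "${3:-true}"
}

# Uncomment against a running controller (myTable is a placeholder):
# curl -X POST "$(rebalance_url myTable OFFLINE)"        # preview only
# curl -X POST "$(rebalance_url myTable OFFLINE false)"  # actually rebalance
```

Defaulting the sketch to a dry run is a deliberate choice: it lets you inspect the proposed assignment before committing to a real rebalance.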

{% hint style="warning" %}
6 changes: 3 additions & 3 deletions users/api/pinot-rest-admin-interface.md
@@ -4,17 +4,17 @@ The [Pinot Admin UI](http://localhost:9000/help) contains all the APIs that you

Note: The controller APIs are primarily for admin tasks. Even though the UI console queries Pinot when running queries from the query console, use the [Broker Query API](https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql) for querying Pinot.

![](<../../.gitbook/assets/Screen Shot 2020-02-28 at 10.00.43 AM.png>)
![](<../../.gitbook/assets/pinot-admin-ui.png>)

Let's check out the tables in this cluster by going to [Table -> List all tables in cluster](http://localhost:9000/help#!/Table/listTableConfigs) and clicking `Try it out!`. We can see the `baseballStats` table listed here. We can also see the exact `curl` call made to the controller API.

![List all tables in cluster](<../../.gitbook/assets/.unused/Screen Shot 2020-02-28 at 10.00.26 AM.png>)
![List all tables in cluster](<../../.gitbook/assets/list-all-tables.png>)

To look at the configuration of this table, go to [Tables -> Get/Enable/Disable/Drop a table](http://localhost:9000/help#!/Table/alterTableStateOrListTableConfig), type `baseballStats` in the table name, and click `Try it out!`.

Let's check out the schemas in the cluster by going to [Schema -> List all schemas in the cluster](http://localhost:9000/help#!/Schema/listSchemaNames) and clicking `Try it out!`. We can see a schema called `baseballStats` in this list.

![List all schemas in the cluster](<../../.gitbook/assets/Screen Shot 2020-02-28 at 10.09.18 AM.png>)
![List all schemas in the cluster](<../../.gitbook/assets/list-all-schemas.png>)

To take a look at the schema, go to [Schema -> Get a schema](http://localhost:9000/help#!/Schema/getSchema), type `baseballStats` in the schema name, and click `Try it out!`.
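
The table and schema lookups above map onto plain `GET` requests against the controller. The sketch below assumes the controller is at `http://localhost:9000` and that tables and schemas are exposed under `/tables` and `/schemas`; the helper name `controller_url` is illustrative, and the `curl` call shown by the Admin UI remains the authoritative form.

```shell
# Assumed controller address; adjust for your cluster.
CONTROLLER_URL="${CONTROLLER_URL:-http://localhost:9000}"

# Hypothetical helper: $1 = resource kind (tables or schemas), $2 = optional name.
# Without a name it builds the "list all" URL, with a name the "get one" URL.
controller_url() {
  if [ -n "${2:-}" ]; then
    printf '%s/%s/%s\n' "$CONTROLLER_URL" "$1" "$2"
  else
    printf '%s/%s\n' "$CONTROLLER_URL" "$1"
  fi
}

# Uncomment against a running controller:
# curl -s -H 'accept: application/json' "$(controller_url tables)"
# curl -s -H 'accept: application/json' "$(controller_url tables baseballStats)"
# curl -s -H 'accept: application/json' "$(controller_url schemas)"
# curl -s -H 'accept: application/json' "$(controller_url schemas baseballStats)"
```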

2 changes: 1 addition & 1 deletion users/user-guide-query/README.md
@@ -6,7 +6,7 @@ description: >-

# Query

<figure><img src="../../.gitbook/assets/Screen Shot 2023-09-13 at 2.42.17 AM.png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/pinot-query-console.png" alt="Pinot Query Console"><figcaption></figcaption></figure>

{% content-ref url="querying-pinot.md" %}
[querying-pinot.md](querying-pinot.md)