-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Integrating SQL/PPL query languages into DSL via the _search API #12434
Comments
approach 1 (recommended)/ seem independent for query languages. which are a low level building block that can be used in option 2 anyways |
Why not modify the client so that it access the plugins/sql endpoint? |
What is the usecase for this approach? |
@vamsi-amazon have you thought introducing the ppl/sql as a another query clause rather than coming with a new concept Query engine? Currently in Opensearch you can define a query type along with how to parse and convert the query into apt Lucene query clause. Wondering if we have explored that option and what is the reason for not choosing that option and rather than building a new concept all together? This will have many advantages:
|
Where is this problem explored? If client library support was improved it would benefit the SQL plugin and all other plugins for OpenSearch. |
How does |
We already support driver in JDBC / ODBC and dbapi. other opensearch clients would need to support the jdbc/odbc spec and enable access via SQL / PPL; this can be done in the fullness of time. Our goals are not just client user access / but also developer access without introducing inter-plugin dependencies. Many of our users still use the dsl and hand craft DSL queries. The proposal is not to move code but maintain code modularity by adding a QueryEngineSpec.
SQL is an independent high-level query language hence 1 doesn't apply ; 2,3 can still be used
no, it doesn't need to work out of the box. SQL drivers JDBC/ODBC/DBAPI will continue to work for the developers and users.
Integrating SQL/PPL into OpenSearch as standard languages enhances its utility and accessibility. For users, it promises compatibility with JDBC/ODBC and DBAPI clients, opening up OpenSearch to a wider audience. All features, including dashboards, will eventually support SQL/PPL by default, increasing usability. For developers, incorporating these features into the core simplifies development, avoids plugin dependencies while ensures backward compatibility, making OpenSearch a more unified platform for querying. This move positions OpenSearch as a leading relevancy-focused SQL engine with advanced capabilities like highlighting and full-text queries. PPL reference manual SQL reference manual developer docs Getting started developer guide Drivers this approach will/
Opensearch client don't need to support SQL / PPL by default- they are supported by the jdbc/odbc spec'ed drivers and dbapi. Since this is an optional clause clients can ignore it. Search pipelines is not a low level feature to implement a fundamental query language. |
This is the part that bugs me. It's not using the standard Search API. We want to access SQL/PPL with JDBC/ODBC clients. Sure. The requests are not Given that we have no interest in For example, could I add a
Then I get back a response like:
Is that something we want to support? What things go into the There's nothing stopping me from adding a
|
This is the right question, for which I think we need some tenets. Search is over documents that are stored in indexes. To me, search is defined by 1) parses a query written in some language, 2) evaluates every stored document against that query, 3) matches or doesn't match the document, 4) produces a score for all documents that match, then 5) sorts results my score and 6) returns them. Is this an acceptable definition @msfroh? If so, in the case of the math example or settings you're missing 2), 3), 4) and 5), so it doesn't fit under search. In the case of SQL I think it fits that definition where the language to express the query is different. |
This is confusing to me. I suppose I don't understand internals. I think there should be a |
Aha! I like this. I think this gets into some of the question of "What is the input used to perform an internal operation versus what is the representation sent over the wire?" that touches on the challenge that @VachaShah has encountered on her Protobuf work, made difficult by the fact that business objects have historically defined their own wire format. I believe the approach in this proposal is "How can we embed a SQL/PPL representation of a search request inside the existing REST This almost feels like we want to support a different Of course, once we're on the cluster, we're forking down completely different paths. The existing |
@dblock I think by your definition the took, _shards, hits, etc. fields in the response would be relevant for any search request, but this proposal explicitly calls them "unrelated parameters". If we truly need to exclude those fields then it feels like we're shoehorning something into the API similar to @froh's contrived |
I think I see a way that we can handle this, albeit in two steps:
That way, we preserve the "SearchResponse returns documents" part that my contrived math example doesn't (though it could send an answer back in a document 😄 ). From a code standpoint, we could split off before trying to parse into a |
Opensearch response today is divided in to roughly 2 parts
we should be able to fill all the response metadata fields; but the response data format I propose be jdbc spec'ed. that is easy/simple to use and understand no matter what aggregation is used.
@dblock wrt how you are defining tenets/ SQL /PPL support almost-all opensearch relevancy features in an easy to use high level language this would be the only / most powerful SQL dialect that support all relevancy features in a SQL / Piped language which is on a search engine agree with @msfroh on the final comment here/ |
@vamsi-amazon Thoughts on updating the proposal above with the information discussed? |
agree with @msfroh on the final comment of support for the schema/datarows concept we should also align on the scope and launch criteria of support SQL in
we should also
|
@dblock Got busy with 2.13 release for last couple of days. Will update the proposal with the information discussed. |
This is truly awesome question and problem statement we should take deeper look if we want to support different query engine. But here are some thoughts I have after reading the conversation:
|
@penghuo you have good examples of how SQL doesn't align well with other queries @navneet1v Does your question boil down to the fact that search needs to be aware of what kind of index it is? I think both comments are just generic questions of what's constant and what's variable (OO abstractions). For example |
Summarizing the recent discussions above and evaluation regarding the integration of SQL into the OpenSearch core. We explored various strategies listed in the updated description/ for incorporating SQL into the OpenSearch core, ultimately recommending against integration in favor of enhancing client capabilities. The initial approaches considered involve directly integrating SQL into the _search endpoint or as a new endpoint within the core, each presenting distinct advantages. However, these strategies also face significant drawbacks, including limited parity in DSL query support, non-uniform response structures, increased complexity in core repo, potential build system integration challenges and significantly increased cost in upfront development with no extra customer value over the current approach. The preferred approach advocates for maintaining the current setup with added transport clients for SQL, presenting a minimal change strategy that ensures compatibility with other plugins. (Preferred): No Core Integration; Enhance Client Capabilities and Add Cohesion in core dashboards features. The preferred approach is aiming to introduce a transport API for SQL and a new transport client library. This method ensures SQL compatibility with other plugins while maintaining the current system's integrity and minimizing changes. “ If the plan is to keep the SQL implementation as either a plugin or module component in the core repository, then moving the code from one repository to another seems more like an administrative decision than a software architecture one. “/paraphrasing @andrross / |
@dblock opensearch-project/OpenSearch-Dashboards#7081 if you have some time to check this one out too? |
@kavilla it looks like this proposal was closed? |
SQL/PPL via DSL in Search API.
1. Problem Statement.
Today, OpenSearch offers support for SQL and PPL query languages through the plugin endpoints
_plugins/sql
and_plugins/ppl
. However, clients using OpenSearch client libraries face limitations, as these libraries do not accommodate with plugin endpoints.To increase adoption with minimal disruption, our proposal introduces new SQL and PPL clauses directly into the SearchRequest body. This approach aims to facilitate the use of these languages through the Search API, streamlining access and integration for users.2. Summary
_search
API (e.g."result_format":"hit_object"
vs"result_format":"datarow"
) to support existing format or datarows. By default we still usehit_object
._search
API. If you send a SQL query and don’t explicitly specifyresult_format
, the format defaults todatarow
.From user perspective, the following example demonstrate SQL vis DSL request and response.
In this doc, we will discuss detailed design, limitations, and development plan.
3. Tenets:
4. Solution
4.1.Search API
4.1.1. Endpoint
DSL is being used in above APIs majorly and also asynchronous search. Validation exception would be thrown whenever a sql block is encountered in above unsupported APIs.
4.1.2.URL Parameters
All URL parameters are not supported.
4.1.3.Request Body
Sample Request Body:
aggs
parameter, you can define any number of aggregations. Each aggregation is defined by its name and one of the types of aggregations that OpenSearch supports. For more information, see Aggregations._source
field in the response.4.1.4.Response
If query type is SQL, response format is datarows. query response include datarows section. datarows section include schema and datarows, for example
4.2.Feature Parity - SQL vs DSL
Currently, SQL does not fully support all DSL query and aggregations. The following table highlight key query features missing support in SQL
and metrics aggregation function missing in SQL
4.3.Performance
SQL queries through the search endpoint should offer performance comparable to that of DSL queries. Users should not experience any degradation in performance. We use OpenSearch benchmark framework to compare DSL query and SQL via Search.
4.4.Client
All OpenSearch client should support SQL query and datarow response. We have three different types of clients.
4.5.Security
Calls to SQL via _search include index names in the request body, so they have the same access policy considerations as the bulk, mget, and msearch operations.
5. Detailed Design.
5.1 Approach 1: Extend SearchPlugin Interface and Integrate with SQL plugin.
In this approach, we will add a new function to the existing SearchPlugin interface to introduce a new construct called
QueryEngineSpec
. Plugins can implement theSearchPlugin
interface to introduce a query engine that takes over the search request to produce theSearchResponse
.QueryEngineSpec
defines the name of the spec, which will be the key in the DSL under which the respective query engine request parameters are enclosed. Once OpenSearch-core receives a request with a clause containing a key defined by aQueryEngineSpec
, OpenSearch-core creates the Query Engine and transfers the request to the plugin via the Query Engine Object.We would be introducing a new field
data_rows_output
in InternalSearchResponse. SQL, PPL Plugins would populate this field and also other meta information oftook
,shards
,timed_out
information. In case of normal DSL query, hits object would be formulated.5.2 Approach 2: Leverage Search Pipelines and introduce new SQL/PPL processors.
In the current state, search pipelines offer the following types of processors:
The Flow framework proposes to include a new type of search processor in Pipelines. This search processor would take a SearchRequest and produce a SearchResponse. We can leverage this new Search Processor type to introduce SQLSearchProcessor and PPLSearchProcessor, which take over the Search Request whenever there is an SQL and PPL block in the search request, respectively. Since an SQL request is a blocking operation, we would build something like processResponseAsync, which takes in a SearchResponse Listener. For including SQL/PPL-related request and response bodies, we could leverage the already existing ext clauses feature in SearchResponse and SearchRequest.
Request Flow
5. Task breakdown
6. Open Questions and Edge Case Scenarios.
explain
query param and send the explain response back in ppl block and sql block.took
,_shards
,hits
, etc.?-1
andnull
, hinting to the customer that those fields are invalid and that the entire response will be in either theext
block orppl
block.7. POC:
The text was updated successfully, but these errors were encountered: