-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elastic search for data aggregations and queries #12
Comments
Dear Eva,
Your request for including the n1ql refers to oneprovider, onezone docker
images or both?
Michal
|
Dear Eva,
Regarding the metadata in rc9, I was informed they were moved to:
custom_metadata
bucket, where id's are the same as for files and have 2 values
(space_id,value), where value is of a form {key:value}, where key is a name
of xattr and the value is it's value. I hope that helps.
Michal
…On Fri, Dec 2, 2016 at 7:53 PM, Michal Orzechowski < ***@***.***> wrote:
Dear Eva,
Your request for including the n1ql refers to oneprovider, onezone docker
images or both?
Michal
|
Dear Michal, thank you very much for your replies. Regarding n1ql it would be needed in oneprovider docker image (where metadata are stored) but if you think it could be useful for elastic search on metadata also on onezone image. I will now investigate the new bucket custom_metadata. Furthermore, I was also experimenting the advanced metadata queries (https://onedata.org/docs/doc/using_onedata/metadata.html#advanced-metadata-queries) using REST API. From which onedata release does it work? Regards, Eva. |
Dear Eva,
Onedata.org is atm. at 3.0.0-rc9.
The newest version is always deployed:
beta.oneda.org - atm. rc11
however the documentation is exactly the same in rc9 and rc11 for that
particular feature:
https://beta.onedata.org/docs/doc/using_onedata/metadata.html#advanced-metadata-queries
So it works for rc9 and on.
I will pass the requirement for n1ql in our upcoming Friday planing
meeting.
Regards,
Michal
…On Mon, Dec 5, 2016 at 10:33 AM, evasciacca ***@***.***> wrote:
Dear Michal,
thank you very much for your replies. Regarding n1ql it would be needed in
oneprovider docker image (where metadata are stored) but if you think it
could be useful for elastic search on metadata also on onezone image.
I will now investigate the new bucket custom_metadata.
Furthermore, I was also experimenting the advanced metadata queries (
https://onedata.org/docs/doc/using_onedata/metadata.html#
advanced-metadata-queries) using REST API. From which onedata release
does it work?
Regards,
Eva.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#12 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAOuppJB-GATMcLdvbCOtfG-wc8Zn_3hks5rE9pMgaJpZM4LABIZ>
.
|
Dear Michal, are there any news regarding the inclusion of the n1ql for next releases? We have updated OneData to release rc11 and we are testing the queries using the indexing functions over the metadata and we have successfully done some queries using them. Now, we would like to create more complex queries including more than one index combination, e.g. to perform this kind of sql query: SELECT FILEPATH/FILEID from Do you know if it is possible and how to do this? Thank you in advance, Eva. |
We've had a discussion on out hipchat support channel and we've come up with examples of such queries, which were successfully used by Eva. To summarize the topic, here are the examples provided: environment SPACE_ID=4dfVO2NTmZJJzrzIYOzUJkXFA8ndZRveThPgSscPbtA
TOKEN=MDAxNWxvY2F00aW9uIG9uZXpvbmUKMDAzYmlkZW500aWZpZXIgYXJSbF9HRWZ6WWpzNlFsR001PTG1QVl9Fb009yN1kzSVlaZXVONUtWRjhrNAowMDFhY2lkIHRpbWUgPCAxNTE4MDE3MjI4CjAwMmZzaWduYXR1cmUgNLd02VRyADxrt12Sk8d01oSYMzFIcsG2IuxjdzNaLky5AK create file curl -k -v --tlsv1.2 -H "X-Auth-Token: $TOKEN" -H "X-CDMI-Specification-Version: 1.1.1" -H "Accept: application/cdmi-object" -H "Content-Type: application/cdmi-object" \
-d '{"metadata": {"ORIGIN_ID": "41", "PROGRAM_ID": "001", "OBSERV_ID": "00001", "RUNS_ID": "000004", "MODES_ID": "R" , "SEQUENCE_NUM": "000", "PACKET_TYPE": "1002", "PROP_ID": "0000000000000001", "DATATYPE": "0000", "TSTART": "430580855", "TSTOP": "430580965" }}' \
-X PUT "https://localhost:8443/cdmi/test/file?metadata" complexindex.js function(meta) {
if(meta['ORIGIN_ID'] && meta['PROGRAM_ID']) {
return [meta['ORIGIN_ID'], meta['PROGRAM_ID']];
}
return null;
} create index curl -v -k --tlsv1.2 -X POST -H "X-Auth-Token: $TOKEN" -H 'Content-type: application/javascript' -d @complexindex.js "https://localhost:8443/api/v3/oneprovider/index?space_id=$SPACE_ID&name=index1" query index INDEX_ID=t2zLyYaR0HPHO8d6wICNYiRDqzc8lGyVVcGUlbepBWo
curl -v -k --tlsv1.2 -Ss -H "X-Auth-Token: $TOKEN" -H 'Content-type: application/javascript' -X GET "https://localhost:8443/api/v3/oneprovider/query-index/$INDEX_ID?key=\[\"41\",\"001\"\]&stale=false" Regards, |
@xorver thank you for providing this short guide. |
Dear Tomasz and Michal, thank you very much, your support was very helpful. I was able to perform some test queries. Now we are waiting for further improvements such as the possibility to delete unused indexes that is foreseen for next OneData releases (maybe rc13). Cheers, Eva. |
Dear Tomasz, we would need to include in the query some information on the creation time of the dataset. Therefore I added to the complexindex.js the meta['ctime'] attribute but it is not working. Is this operation maybe working only on extended attributes and not on internal ones? Thank you, Eva. |
Dear Eva, Those indices work only on user created attributes, not the internal ones (as you suspected). Regards, |
Dear Tomasz, thank you for your prompt replay. Would it be possible to add this possibility as enhancement for next OneData releases. Maybe the fastest way would be to add the internal attributes related to time creation/modification as extended ones (not user modifiable) on data ingestion / modification. Or maybe modify rest API to work with internal attributes... Please let us know if you will make progresses in this direction! Kind regards, Eva. |
Dear Eva, Times are stored in a separate document, as they're changed frequently. We may possibly consider merging them with metadata document, but we've decided not to because the change of access time would provoke a lot of index updates. Moreover, there are other internal attributes that we cannot possibly add (such as size) because they are generated dynamically. Adding only a part of them would be confusing. There are also some compatibility issues that may be hard to overcome during an upgrade. If you're only interested in creation time I would recommend adding it manually after creation. By the way ctime corresponds to attribute change time, not creation, so it may be newer if the file got modified after creation. I'll discuss with my colleagues whether to add automatically to each file xattr called creation_time, maybe it will not be a big deal. Regards, |
Thank you very much. |
Dear Tomasz, also we wold like to understand if it would be possible to use operators such as "not equal", "like" or "order by" and if these are planned to be implemented in next releases. Thank you in advance, Eva. |
Dear Eva, We're limited to the API provided by couchbase (http://docs.couchbase.com/admin/admin/Views/views-querying.html). We create an index directly in couchbase, wrapping your function with ours. Our 'map' function uses your function to get the key (based on metadata) and emits (your_key, file_id). Then every query is simply passed directly into couchbase (you define keys, flags etc, and obtain file_ids in return). Couchbase queries do not support operations: "not equal", "like", "order by". But you may create an index with a function that emits only files that satisfy your requirements, e. g. For emitting only files authored by 'Eva*' that are of version different than '3.0.0': function(meta) {
if(meta['VERSION'] && meta['AUTHOR'])
if(meta['VERSION'] != '3.0.0' && meta['AUTHOR'].match(/Eva.*/)
return 1;
return null;
} Then query it with key=1 or without key at all (it will return all entries). However, for each query pattern you'd need to create a new index, and it does not cover "order by". Regards, |
@evasciacca can we consider this issue as closed or there are still things to be resolved/answered here? |
Dear @groundnuty, it is still not yet very clear how to use the new custom_metadata bucket from n1ql. I am not even able to see it from the couchbase console (see attached screenshot). Could you please provide more information on how to query metadata from the couchbase db to retrive the datasets using more complex queries directly on the metadata bucket? Thank you in advance, |
Dear Eva, As discussed in Catania, for now, we will try to add FileId to each custom_metadata document, so you may query Couchbase directly. It should be available in rc15, I'll write info here when it's done. Tomasz |
Dear @xorver , |
I have an example for you, I've entered the machine with running couchbase, then executed: root@couchbase0-p1:~# /opt/couchbase/bin/cbq
Couchbase query shell connected to http://localhost:8093/ . Type Ctrl-D to exit.
cbq> CREATE PRIMARY INDEX sync ON sync;
{
"requestID": "126f23f8-c507-425f-8932-d672d7ad8209",
"signature": null,
"results": [
],
"status": "success",
"metrics": {
"elapsedTime": "3.763397768s",
"executionTime": "3.763321304s",
"resultCount": 0,
"resultSize": 0
}
}
cbq> SELECT sync.* FROM sync where sync.`<record_type>` = "custom_metadata";
{
"requestID": "97a0abc4-ecad-43b4-8e69-40173e75f926",
"signature": {
"*": "*"
},
"results": [
{
"\u003crecord_type\u003e": "custom_metadata",
"\u003crecord_version\u003e": 1,
"_deleted": "false",
"_sync": {
"history": {
"channels": [
null
],
"parents": [
-1
],
"revs": [
"1-64a3bd48eccc930028116168fb118ae0"
]
},
"recent_sequences": [
142
],
"rev": "1-64a3bd48eccc930028116168fb118ae0",
"sequence": 142,
"time_saved": "2017-05-19T10:15:36.656732653Z"
},
"space_id": "s1",
"value": {
"onedata_json": {
"key": []
},
"onedata_rdf": "",
"xattr_name": "xattr_value"
}
}
],
"status": "success",
"metrics": {
"elapsedTime": "28.257026ms",
"executionTime": "28.126285ms",
"resultCount": 2,
"resultSize": 1933
}
} Metadata is in "value" field of fetched json. In rc15 there will be a file_id in that document also. I guess you may do it using libs provided by couchbase or REST API as well. |
Dear @xorver ,
|
You need to extend the "where" clause just as in typical SQL, e. g. SELECT sync.* FROM sync where sync.`<record_type>` = "custom_metadata" AND sync.value.DATATYPE = "0000"; The syntax: http://www.dofactory.com/sql/where-and-or-not |
Yes I have tried this before but with no success:
|
Try it like this: SELECT sync.* FROM sync WHERE sync.`<record_type>` = "custom_metadata" AND sync.`value`.`name` = "something"; It worked with my setup. |
Great, thank you very much! It works now... so from next release r15 we will automatically find a file_id from which we can retrieve the data file, is it correct? Thank you, Eva. |
Yes, it's correct. |
Dear developers,
We would like to understand how to perform elastic search for data aggregations and queries. We have done some tests on the previous OneData release 3.0.0 beta 7 using the direct connection to CouchBase on the bucket 'xattr'. In release 3.0.0 rc9 we did not find where and how metadata are stored on the CouchBase. It would be very useful for us to include n1ql in the containers so that we could access the n1ql GUI on port 8094 and test the queries. Any news on this topic?
Thank you,
Eva.
The text was updated successfully, but these errors were encountered: