Ch 2: Getting Started
Book data represented as a JSON document:
{
"title":"Effective Java",
"author":"Joshua Bloch",
"release_date":"2001-06-01",
"amazon_rating":4.7,
"best_seller":true,
"prices": {
"usd":9.95,
"gbp":7.95,
"eur":8.95
}
}
Index the document with ID 1 using cURL:
curl -XPUT "http://localhost:9200/books/_doc/1" -H 'Content-Type: application/json' -d'
{
"title":"Effective Java",
"author":"Joshua Bloch",
"release_date":"2001-06-01",
"amazon_rating":4.7,
"best_seller":true,
"prices": {
"usd":9.95,
"gbp":7.95,
"eur":8.95
}
}'
The same request in Kibana's Dev Tools console:
PUT books/_doc/1
{
"title":"Effective Java",
"author":"Joshua Bloch",
"release_date":"2001-06-01",
"amazon_rating":4.7,
"best_seller":true,
"prices": {
"usd":9.95,
"gbp":7.95,
"eur":8.95
}
}
The response to the above request would be:
{
"_index" : "books",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
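The same indexing call can also be prepared from a script. Below is a minimal Python sketch, assuming a node at http://localhost:9200 as in the cURL command above. The helper name build_index_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed base URL, taken from the cURL example; adjust for your cluster.
ES_URL = "http://localhost:9200"

def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

book = {
    "title": "Effective Java",
    "author": "Joshua Bloch",
    "release_date": "2001-06-01",
    "amazon_rating": 4.7,
    "best_seller": True,
    "prices": {"usd": 9.95, "gbp": 7.95, "eur": 8.95},
}

request = build_index_request("books", 1, book)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the URL and body before touching a live cluster.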
Index a document with ID 2:
PUT books/_doc/2
{
"title":"Core Java Volume I - Fundamentals",
"author":"Cay S. Horstmann",
"release_date":"2018-08-27",
"amazon_rating":4.8,
"best_seller":true,
"prices": {
"usd":19.95,
"gbp":17.95,
"eur":18.95
}
}
Index another (third) document with ID 3:
PUT books/_doc/3
{
"title":"Java: A Beginner’s Guide",
"author":"Herbert Schildt",
"release_date":"2018-11-20",
"amazon_rating":4.2,
"best_seller":true,
"prices": {
"usd":19.99,
"gbp":19.99,
"eur":19.99
}
}
Use the _count API to retrieve the number of documents available in the books index:
GET books/_count
This will return the number of books in the books index:
{
"count" : 3,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
Given an ID, we can fetch the document by issuing a GET command:
GET books/_doc/1
This should return the document that we indexed earlier:
{
"_index" : "books",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 3,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "Effective Java",
"author" : "Joshua Bloch",
"release_date" : "2001-06-01",
"amazon_rating" : 4.7,
"best_seller" : true,
"prices" : {
"usd" : 9.95,
"gbp" : 7.95,
"eur" : 8.95
}
}
}
To fetch only the source, without the metadata, issue the command: GET books/_source/1
To fetch multiple documents for a given set of IDs, we use an ids query on the _search endpoint:
GET books/_search
{
"query": {
"ids": {
"values": [1,2,3]
}
}
}
This will return all three documents if available.
We can fetch all documents in one go from the books index using a generic _search:
GET books/_search
This will return all the documents available in the books index. This is equivalent to a match_all search query.
Use a match query to fetch the book(s) written by Joshua:
GET books/_search
{
"query": {
"match": {
"author": "Joshua"
}
}
}
It would return one book written by Joshua:
...
"hits" : [
{
"_index" : "books",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0417082,
"_source" : {
"title" : "Effective Java",
"author" : "Joshua Bloch",
...
}
}
]
Search for a book whose title contains both words by setting the operator to and:
GET books/_search
{
"query": {
"match": {
"title": {
"query": "Effective java",
"operator": "and"
}
}
}
}
You should get one book back, titled "Effective Java", because the and operator requires every query word to appear in the title.
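To see what the operator setting changes, here is a toy Python model of term matching. It assumes a simplified analyzer (lowercase, keep runs of word characters) rather than Elasticsearch's real analysis chain, and the function names analyze and matches are hypothetical.

```python
import re

def analyze(text):
    """Simplified stand-in for an analyzer: lowercase and keep word-character runs."""
    return re.findall(r"\w+", text.lower())

def matches(query, field_value, operator="or"):
    """Return True if the field satisfies the query under the given operator."""
    query_terms = analyze(query)
    field_terms = set(analyze(field_value))
    hits = [term in field_terms for term in query_terms]
    # "or" needs at least one query term present; "and" needs all of them.
    return all(hits) if operator == "and" else any(hits)

titles = [
    "Effective Java",
    "Core Java Volume I - Fundamentals",
    "Java: A Beginner's Guide",
]

results = {
    op: [t for t in titles if matches("Effective java", t, op)]
    for op in ("or", "and")
}
print(results)
```

With the default or operator every title containing "java" qualifies, whereas and narrows the result to the single title that holds both words.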
Execute the following script in the Kibana window (the data is also present in code/datasets/books-kibana-dataset.txt):
POST _bulk
{"index":{"_index":"books","_id":"1"}}
{"title": "Core Java Volume I – Fundamentals","author": "Cay S. Horstmann","edition": 11, "synopsis": "Java reference book that offers a detailed explanation of various features of Core Java, including exception handling, interfaces, and lambda expressions. Significant highlights of the book include simple language, conciseness, and detailed examples.","amazon_rating": 4.6,"release_date": "2018-08-27","tags": ["Programming Languages, Java Programming"]}
{"index":{"_index":"books","_id":"2"}}
{"title": "Effective Java","author": "Joshua Bloch", "edition": 3,"synopsis": "A must-have book for every Java programmer and Java aspirant, Effective Java makes up for an excellent complementary read with other Java books or learning material. The book offers 78 best practices to follow for making the code better.", "amazon_rating": 4.7, "release_date": "2017-12-27", "tags": ["Object Oriented Software Design"]}
{"index":{"_index":"books","_id":"3"}}
{"title": "Java: A Beginner’s Guide", "author": "Herbert Schildt","edition": 8,"synopsis": "One of the most comprehensive books for learning Java. The book offers several hands-on exercises as well as a quiz section at the end of every chapter to let the readers self-evaluate their learning.","amazon_rating": 4.2,"release_date": "2018-11-20","tags": ["Software Design & Engineering", "Internet & Web"]}
{"index":{"_index":"books","_id":"4"}}
{"title": "Java - The Complete Reference","author": "Herbert Schildt","edition": 11,"synopsis": "Convenient Java reference book examining essential portions of the Java API library, Java. The book is full of discussions and apt examples to better Java learning.","amazon_rating": 4.4,"release_date": "2019-03-19","tags": ["Software Design & Engineering", "Internet & Web", "Computer Programming Language & Tool"]}
{"index":{"_index":"books","_id":"5"}}
{"title": "Head First Java","author": "Kathy Sierra and Bert Bates","edition":2, "synopsis": "The most important selling points of Head First Java is its simplicity and super-effective real-life analogies that pertain to the Java programming concepts.","amazon_rating": 4.3,"release_date": "2005-02-18","tags": ["IT Certification Exams", "Object-Oriented Software Design","Design Pattern Programming"]}
{"index":{"_index":"books","_id":"6"}}
{"title": "Java Concurrency in Practice","author": "Brian Goetz with Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, and Doug Lea","edition": 1,"synopsis": "Java Concurrency in Practice is one of the best Java programming books to develop a rich understanding of concurrency and multithreading.","amazon_rating": 4.3,"release_date": "2006-05-09","tags": ["Computer Science Books", "Programming Languages", "Java Programming"]}
{"index":{"_index":"books","_id":"7"}}
{"title": "Test-Driven: TDD and Acceptance TDD for Java Developers","author": "Lasse Koskela","edition": 1,"synopsis": "Test-Driven is an excellent book for learning how to write unique automation testing programs. It is a must-have book for those Java developers that prioritize code quality as well as have a knack for writing unit, integration, and automation tests.","amazon_rating": 4.1,"release_date": "2007-10-22","tags": ["Software Architecture", "Software Design & Engineering", "Java Programming"]}
{"index":{"_index":"books","_id":"8"}}
{"title": "Head First Object-Oriented Analysis Design","author": "Brett D. McLaughlin, Gary Pollice & David West","edition": 1,"synopsis": "Head First is one of the most beautiful finest book series ever written on Java programming language. Another gem in the series is the Head First Object-Oriented Analysis Design.","amazon_rating": 3.9,"release_date": "2014-04-29","tags": ["Introductory & Beginning Programming", "Object-Oriented Software Design", "Java Programming"]}
{"index":{"_index":"books","_id":"9"}}
{"title": "Java Performance: The Definite Guide","author": "Scott Oaks","edition": 1,"synopsis": "Garbage collection, JVM, and performance tuning are some of the most favorable aspects of the Java programming language. It educates readers about maximizing Java threading and synchronization performance features, improve Java-driven database application performance, tackle performance issues","amazon_rating": 4.1,"release_date": "2014-03-04","tags": ["Design Pattern Programming", "Object-Oriented Software Design", "Computer Programming Language & Tool"]}
{"index":{"_index":"books","_id":"10"}}
{"title": "Head First Design Patterns", "author": "Eric Freeman & Elisabeth Robson with Kathy Sierra & Bert Bates","edition": 10,"synopsis": "Head First Design Patterns is one of the leading books to build that particular understanding of the Java programming language." ,"amazon_rating": 4.5,"release_date": "2014-03-04","tags": ["Design Pattern Programming", "Object-Oriented Software Design eTextbooks", "Web Development & Design eTextbooks"]}
This will index 10 books into Elasticsearch.
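The _bulk body is newline-delimited JSON: an action line followed by a source line for each document. A minimal Python sketch of assembling such a body is shown below. The helper name build_bulk_body is hypothetical, and the two sample documents are abbreviated versions of the books above.

```python
import json

def build_bulk_body(index, documents):
    """Return an NDJSON string: one action line, then one source line per document."""
    lines = []
    for doc_id, source in documents:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"

# Abbreviated sample documents (only two fields each), for illustration.
sample = [
    (1, {"title": "Core Java Volume I - Fundamentals", "edition": 11}),
    (2, {"title": "Effective Java", "edition": 3}),
]

bulk_body = build_bulk_body("books", sample)
print(bulk_body)
```

Each JSON object stays on a single line, which is why the dataset above cannot be pretty-printed.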
Execute the query to match "Java" across two fields, "title" and "synopsis":
GET books/_search
{
"_source": {
"includes": ["title","synopsis"]
},
"query": {
"multi_match": {
"query": "Java",
"fields": ["title","synopsis"]
}
}
}
The results will be something like this:
{
...
"hits" : [{
...
"_score" : 0.33537668,
"_source" : {
"title" : "Effective Java",
"synopsis":"A must-have book for every Java…",
...
},{
...
"_score" : 0.30060259,
"_source" : {
"title" : "Head First Java",
"synopsis":"The most important selling points of Head First Java"
...
},
...
}]
}
Boost the title field by a factor of three using the caret notation (title^3):
GET books/_search
{
"_source": {
"includes": ["title","synopsis"]
},
"query": {
"multi_match": {
"query": "Java",
"fields": ["title^3","synopsis"]
}
}
}
The results would be like the following (compare the _score attribute before and after):
{
...
"hits" : [{
...
"_score" : 1.0061301,
"_source" : {
"title" : "Effective Java",
"synopsis":"A must-have book for every Java…",
...
},{
...
"_score" : 0.90180784,
"_source" : {
"title" : "Head First Java",
"synopsis":"The most important selling points of Head First Java"
...
},
...
}]
}
When you compare the scores, the result for Effective Java is 0.33537668 before boosting, but the score rose to 1.0061301 after boosting the title field.
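The jump from 0.33537668 to 1.0061301 is a factor of three, matching the ^3 boost. The Python sketch below is a toy model of multi_match's default best_fields behaviour, which takes the highest per-field score. It assumes the title field produced the winning score, and the synopsis score is a made-up placeholder because the response does not show per-field scores.

```python
def best_fields_score(field_scores, boosts):
    """Toy model of best_fields: the highest boosted per-field score wins."""
    return max(
        score * boosts.get(field, 1.0)
        for field, score in field_scores.items()
    )

# The title score is taken from the response above (assumed to be the best field);
# the synopsis score is a hypothetical placeholder.
field_scores = {"title": 0.33537668, "synopsis": 0.2}

before = best_fields_score(field_scores, boosts={})
after = best_fields_score(field_scores, boosts={"title": 3})
print(before, after)
```

Real scores come from BM25 inside Elasticsearch; the sketch only illustrates how a boost multiplies a field's contribution.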
Searching for books with an exact phrase:
GET books/_search
{
"query": {
"match_phrase": {
"synopsis": "must-have book for every Java programmer"
}
}
}
This query will result in:
"hits" : [{
"_score" : 7.300332,
"_source" : {
"title" : "Effective Java",
"synopsis" : "A must-have book for every Java programmer and Java ..."
}
}]
We can enable highlights in the returned results by adding a highlight object at the root level, alongside the query object, and listing the fields we want highlighted under fields:
GET books/_search
{
"query": {
"match_phrase": {
"synopsis": "must-have book for every Java programmer"
}
},
"highlight": {
"fields": {
"synopsis": {}
}
}
}
This query will return:
"hits" : [{
"_source" : {
...
"title" : "Effective Java",
"synopsis" : "A must-have book for every Java ..."
},
"highlight" : {
"synopsis" : [
"A <em>must</em>-<em>have</em> <em>book</em> <em>for</em> <em>every</em> <em>Java</em> <em>programmer</em> and Java aspirant.."]}
}]
The matches are wrapped in an HTML em tag, indicating that the words are emphasised.
At times, a phrase may be missing a word or two. We can use a match_phrase query with the slop parameter to tolerate this. The following query is missing the word "for" from the phrase:
GET books/_search
{
"query": {
"match_phrase": {
"synopsis": {
"query": "must-have book every Java programmer",
"slop": 1
}
}
}
}
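To build intuition for slop, here is a toy Python model of sloppy phrase matching. It is a simplified sketch, not Lucene's implementation: it assumes a basic analyzer (lowercase, word-character runs) and measures how far term positions drift from a perfect phrase. The function names are hypothetical.

```python
import re
from itertools import product

def tokens(text):
    """Simplified analyzer: lowercase and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())

def phrase_matches(query, field_value, slop=0):
    """Toy phrase match: terms may drift by at most `slop` positions overall."""
    query_terms = tokens(query)
    doc_terms = tokens(field_value)
    # Every position at which each query term occurs in the document.
    positions = [
        [i for i, term in enumerate(doc_terms) if term == wanted]
        for wanted in query_terms
    ]
    if any(not found for found in positions):
        return False
    for combo in product(*positions):
        # Shift each position back by the term's offset in the query;
        # an exact phrase yields identical shifted values.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

synopsis = "A must-have book for every Java programmer and Java aspirant"
query = "must-have book every Java programmer"

print(phrase_matches(query, synopsis, slop=0))
print(phrase_matches(query, synopsis, slop=1))
```

The missing "for" shifts the trailing terms by one position, which is why a slop of 1 is enough for this query.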
Index two more documents to try out a prefix search:
PUT books/_doc/99
{
"title":"Java Collections Deep Dive"
}
PUT books/_doc/100
{
"title":"Java Computing World"
}
Query to fetch all books with a title having “Java co” prefix:
GET books/_search
{
"query": {
"match_phrase_prefix": {
"title": "Java co"
}
}
}
This query will search for all books that have a title like Java concurrency, Java collections, Java computing and so on.
The fuzzy query forgives users' spelling mistakes. The following query returns Java related books in spite of the user incorrectly specifying the search word as 'kava':
GET books/_search
{
"query": {
"fuzzy": {
"title": {
"value": "kava",
"fuzziness": 1
}
}
}
}
You should get hits:
{
...
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
}
...
}
}
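Fuzziness is expressed as an edit distance between the search word and the indexed term. The Python sketch below computes the classic Levenshtein distance to show why 'kava' is within one edit of 'java'. It is a simplified illustration: by default Elasticsearch also counts a swap of two adjacent characters as a single edit, which this sketch does not model.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]

def fuzzy_match(term, candidate, fuzziness=1):
    """Hypothetical helper: True when the candidate is within `fuzziness` edits."""
    return edit_distance(term, candidate) <= fuzziness

print(edit_distance("kava", "java"))
print(fuzzy_match("kava", "java"), fuzzy_match("kava", "head"))
```

Raising fuzziness widens the net, so more terms match at the cost of precision.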
A term query is used to fetch exact matches for a value provided in the search criteria.
Fetching third edition books:
GET books/_search
{
"_source": ["title","edition"],
"query": {
"term": {
"edition": {
"value": 3
}
}
}
}
This query returns all third edition books (we only have one book - Effective Java):
"hits" : [{
...
"_score" : 1.0,
"_source" : {
"title" : "Effective Java",
"edition" : 3,
...
}
}]
A range query to fetch books rated between 4.5 and 5 stars:
GET books/_search
{
"query": {
"range": {
"amazon_rating": {
"gte": 4.5,
"lte": 5
}
}
}
}
The above range query should fetch three books.
The must clause of a bool query returning all books authored by Joshua:
GET books/_search
{
"query": {
"bool": {
"must": [{
"match": {
"author": "Joshua Bloch"
}
}]
}
}
}
The must clause can have multiple leaf queries. For example, the following query finds all books written by Joshua that also match an exact phrase:
GET books/_search
{
"query": {
"bool": {
"must": [{
"match": {
"author": "Joshua Bloch"
}
},
{
"match_phrase": {
"synopsis": "best Java programming books"
}
}]
}
}
}
A bool query with must and must_not clauses in action:
GET books/_search
{
"query": {
"bool": {
"must": [{ "match": { "author": "Joshua" } }],
"must_not": [{ "range": { "amazon_rating": { "lt": 4.7}}}]
}
}
}
A should clause increases the relevancy score when a match is found:
GET books/_search
{
"query": {
"bool": {
"must": [{"match": {"author": "Joshua"}}],
"must_not":[{"range":{"amazon_rating":{"lt":4.7}}}],
"should": [{"match": {"tags": "Software"}}]
}
}
}
A filter clause doesn't affect the relevancy score when a match is found:
GET books/_search
{
"query": {
"bool": {
"must": [{"match": {"author": "Joshua"}}],
"must_not":[{"range":{"amazon_rating":{"lt":4.7}}}],
"should": [{"match": {"tags": "Software"}}],
"filter":[{"range":{"release_date":{"gte": "2015-01-01"}}}]}
}
}
The bool query with an additional filter on the edition field:
GET books/_search
{
"query": {
"bool": {
"must": [{"match": {"author": "Joshua"}}],
"must_not":[{"range":{"amazon_rating":{"lt":4.7}}}],
"should": [{"match": {"tags": "Software"}}],
"filter":[
{"range":{"release_date":{"gte": "2015-01-01"}}},
{"term": {"edition": 3}}
]}
}
}
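The roles of the four bool clauses can be summarised in a toy Python model. This is a sketch under simplifying assumptions: each clause is a plain predicate, and the score is just a count of matching must and should clauses rather than a real relevancy score. The function run_bool and the abbreviated sample books are for illustration only.

```python
def run_bool(docs, must=(), must_not=(), should=(), filters=()):
    """Toy bool query: returns (score, doc) pairs, highest score first."""
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # every must clause is required
        if not all(clause(doc) for clause in filters):
            continue  # every filter clause is required, but never scored
        if any(clause(doc) for clause in must_not):
            continue  # any must_not match excludes the document
        # must and should matches add to the score; filter and must_not do not.
        score = len(must) + sum(1 for clause in should if clause(doc))
        results.append((score, doc))
    return sorted(results, key=lambda pair: pair[0], reverse=True)

# Abbreviated copies of three books from the dataset above.
books = [
    {"title": "Effective Java", "author": "Joshua Bloch", "amazon_rating": 4.7,
     "release_date": "2017-12-27", "edition": 3,
     "tags": ["Object Oriented Software Design"]},
    {"title": "Java Concurrency in Practice",
     "author": "Brian Goetz with Tim Peierls, Joshua Bloch", "amazon_rating": 4.3,
     "release_date": "2006-05-09", "edition": 1, "tags": ["Java Programming"]},
    {"title": "Core Java Volume I - Fundamentals", "author": "Cay S. Horstmann",
     "amazon_rating": 4.6, "release_date": "2018-08-27", "edition": 11,
     "tags": ["Programming Languages, Java Programming"]},
]

hits = run_bool(
    books,
    must=[lambda d: "joshua" in d["author"].lower()],
    must_not=[lambda d: d["amazon_rating"] < 4.7],
    should=[lambda d: any("software" in t.lower() for t in d["tags"])],
    filters=[lambda d: d["release_date"] >= "2015-01-01",
             lambda d: d["edition"] == 3],
)
print([(score, doc["title"]) for score, doc in hits])
```

Because a must clause is present, the should clause is optional here: it can raise a document's score but cannot exclude it.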
Copy the contents of covid-26march2021.txt from datasets (https://github.com/madhusudhankonda/elasticsearch-in-action/blob/main/datasets/covid-26march2021.txt) to Kibana's DevTools. Once copied, execute them using the _bulk API.
Fetching the total number of critical patients:
GET covid/_search
{
"size": 0,
"aggs": {
"critical_patients": {
"sum": {
"field": "critical"
}
}
}
}
This should return:
"aggregations" : {
"critical_patients" : {
"value" : 88090.0
}
}
The query to fetch the highest number of deaths among the 10 countries we have in our data set:
GET covid/_search
{
"size": 0,
"aggs": {
"max_deaths": {
"max": {
"field": "deaths"
}
}
}
}
The result would be:
"aggregations" : {
"max_deaths" : {
"value" : 561142.0
}
}
We can find the minimum (min), average (avg), and others too. But there’s one statistical function that returns all these basic metrics in one go: the stats metric:
GET covid/_search
{
"size": 0,
"aggs": {
"all_stats": {
"stats": {
"field": "deaths"
}
}
}
}
Here’s the snippet of the response:
"aggregations" : {
"all_stats" : {
"count" : 20,
"min" : 30772.0,
"max" : 561142.0,
"avg" : 163689.1,
"sum" : 3273782.0
}
}
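The five numbers in a stats response are related in a simple way, which the following Python sketch reproduces over a plain list. The sample values are made up for illustration and are not taken from the covid dataset. The last line checks that the avg in the response above equals sum divided by count.

```python
def stats(values):
    """Compute the five metrics reported by a stats aggregation (non-empty input assumed)."""
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": total / count,
        "sum": total,
    }

# Made-up sample values, not the real dataset.
deaths = [100.0, 250.0, 400.0]
summary = stats(deaths)
print(summary)

# Cross-check of the response above: sum / count should equal avg.
print(3273782.0 / 20)
```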
The extended_stats metric returns further statistics such as variance and standard deviation:
GET covid/_search
{
"aggs": {
"all_extended_stats": {
"extended_stats": {
"field": "deaths"
}
}
}
}
Fetching the countries by number of critical patients in buckets of 2500:
GET covid/_search
{
"size": 0,
"aggs": {
"critical_patients_as_histogram": {
"histogram": {
"field": "critical",
"interval": 2500
}
}
}
}
The response should be:
"aggregations" : {
"critical_patients_as_histogram" : {
"buckets" : [{
"key" : 0.0,
"doc_count" : 8
},
{
"key" : 2500.0,
"doc_count" : 6
},
{
"key" : 5000.0,
"doc_count" : 0
},
{
"key" : 7500.0,
"doc_count" : 6
}]
}
}
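A histogram assigns each value to the bucket whose key is the value rounded down to a multiple of the interval. The Python sketch below is a toy model of that bucketing, including the empty buckets that appear between the lowest and highest key (as with the 5000.0 bucket above). The sample values are made up and the function name is hypothetical.

```python
import math

def histogram(values, interval):
    """Toy histogram aggregation: key = floor(value / interval) * interval."""
    counts = {}
    for value in values:
        key = math.floor(value / interval) * interval
        counts[key] = counts.get(key, 0) + 1
    # Emit every bucket between the lowest and highest key, including empty ones.
    buckets = []
    key = min(counts)
    highest = max(counts)
    while key <= highest:
        buckets.append({"key": float(key), "doc_count": counts.get(key, 0)})
        key += interval
    return buckets

# Made-up numbers of critical patients, for illustration only.
critical = [300, 1200, 2600, 8100]
for bucket in histogram(critical, 2500):
    print(bucket)
```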
Casualties by custom ranges using the range bucket aggregation. It is applied to the deaths field, with the custom ranges defined under ranges:
GET covid/_search
{
"size": 0,
"aggs": {
"range_countries": {
"range": {
"field": "deaths",
"ranges": [
{"to": 60000},
{"from": 60000,"to": 70000},
{"from": 70000,"to": 80000},
{"from": 80000,"to": 120000}
]
}
}
}
}
This will return:
"aggregations" : {
"range_countries" : {
"buckets" : [{
"key" : "*-60000.0",
"to" : 60000.0,
"doc_count" : 2
},{
"key" : "60000.0-70000.0",
"from" : 60000.0,
"to" : 70000.0,
"doc_count" : 0
},{
"key" : "70000.0-80000.0",
"from" : 70000.0,
"to" : 80000.0,
"doc_count" : 4
},{
"key" : "80000.0-120000.0",
"from" : 80000.0,
"to" : 120000.0,
"doc_count" : 6
}]
}
}
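In a range aggregation the from value is inclusive and the to value is exclusive, so a document falls into at most one of the adjacent buckets defined above. The Python sketch below models this with made-up death counts. The function name range_buckets is hypothetical.

```python
def range_buckets(values, ranges):
    """Toy range aggregation: `from` is inclusive, `to` is exclusive."""
    buckets = []
    for spec in ranges:
        low = spec.get("from", float("-inf"))
        high = spec.get("to", float("inf"))
        count = sum(1 for value in values if low <= value < high)
        # Keys follow the response format above, with * for an open end.
        low_label = str(float(spec["from"])) if "from" in spec else "*"
        high_label = str(float(spec["to"])) if "to" in spec else "*"
        buckets.append({"key": f"{low_label}-{high_label}", "doc_count": count})
    return buckets

ranges = [
    {"to": 60000},
    {"from": 60000, "to": 70000},
    {"from": 70000, "to": 80000},
    {"from": 80000, "to": 120000},
]

# Made-up values; 120000 lies outside every range because `to` is exclusive.
deaths = [50000, 60000, 75000, 119999, 120000]
for bucket in range_buckets(deaths, ranges):
    print(bucket)
```

Values that fall outside all ranges are simply not counted, which is why the doc_count values in a response need not add up to the total number of documents.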