Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ES mapping that fits our use case #3

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

Adiguptakn
Copy link

I have created a mappings file which fits our use case after we push the data into ES.

I have created a mappings file which fits our use case after we push the data into ES.
Copy link
Collaborator

@MMAThijssen MMAThijssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mapping is looking nice. Good use of the type keyword for country, city, email and link. The number of shards and replicas is to be determined later on in the project?

forward43/Mappings.txt Outdated Show resolved Hide resolved
Copy link
Contributor

@andrewsutjahjo andrewsutjahjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a couple of things that might interfere here:
json in and of itself doesn't allow for comments. I'm not too sure how it would interact with the es-python package when adding a body though.

Also this doesn't have a clean way to interface with the rest of the codebase as it's written (I believe) to interact with Kibana.

The cleanest way to get a mapping would be to create a function
def get_mapping() -> dict:
which reads the mappings.txt file and turns it into a dict. That dict can then be picked up by whatever function that ultimately creates the index.

Same goes for the settings file

Comment on lines 29 to 30
"contactPerson.name": {"type": "text"},
"cotactPerson.email": {"type": "keyword"},
Copy link
Contributor

@andrewsutjahjo andrewsutjahjo Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also go for camelcase: contact_person as the rest is in camel case as well.

I'm also not too sure how ES would resolve dots in the naming of keys, but the usual way is:

"contact_person" : {
    "properties" : {
        "name": {"type": "text"},
        "email": {"type": "keyword"},
        "number": {"type": "long"}
    }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elasticsearch translates fields using a dot notation into using the “properties” parameter behind the scenes. So,

      "contact_person.name": {"type": "text"}, 
      "cotact_person.email": {"type": "keyword"},
      "contact_person.number": {"type": "long"} 

would translate to

"contact_person" : {
    "properties" : {
        "name": {"type": "text"},
        "email": {"type": "keyword"},
        "number": {"type": "long"}
    }
}

Comment on lines 9 to 14
{
"settings": {
"number_of_shards" : x, # x is the number of shards
"number_of_replicas" : y # y is the number of replicas
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some optimizations ES has for keywords called normalizers, which you can add into your settings

https://www.elastic.co/guide/en/elasticsearch/reference/current/normalizer.html

@Adiguptakn
Copy link
Author

There's a couple of things that might interfere here:
json in and of itself doesn't allow for comments. I'm not too sure how it would interact with the es-python package when adding a body though.

Also this doesn't have a clean way to interface with the rest of the codebase as it's written (I believe) to interact with Kibana.

The cleanest way to get a mapping would be to create a function
def get_mapping() -> dict:
which reads the mappings.txt file and turns it into a dict. That dict can then be picked up by whatever function that ultimately creates the index.

Same goes for the settings file

Thanks for the review.
This mappings was specifically written to interact with Kibana. I will make the necessary changes and create a python function to read the mapping file and turns it into a dict.

Copy link
Collaborator

@MMAThijssen MMAThijssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add normalisers to the settings. The example Andrew mentioned shows a good example that makes sure that search queries are lowercase and without accent marks.

The format used in settings.txt and mappings.txt does not follow the format assumed by ES:

      "contact_person.name": {"type": "text"}, 
      "cotact_person.email": {"type": "keyword"},
      "contact_person.number": {"type": "long"} 

"type" is missing here.



def get_settings(settingsFile):
'''(file) -> dictionary
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me what file is. Is it settingsFile? The type of that isn't a dictionary but a string (of the path name / file name).

def get_settings(settingsFile):
'''(file) -> dictionary

Propertes for creating the index n ES
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two typos: Properties for creating the index in ES.

Return the contents of the text file with a dictionary.
'''

with open(''+settingsFile) as f:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the method you open the file. What is the purpose of adding the '' in front of settingsFile? It seems redundant.



def get_mappings(mappingsFile):
'''(file) -> dictionary
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. If mappingsFile is meant by file, the type is string. In the code the content of mappingsFile is transformed to a dict.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants