-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES mapping that fits our use case #3
base: master
Are you sure you want to change the base?
Conversation
I have created a mappings file which fits our use case after we push the data into ES.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mapping is looking nice. Good use of the type keyword for country, city, email and link. The number of shards and replicas is to be determined later on in the project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a couple of things that might interfere here:
json in and of itself doesn't allow for comments. I'm not too sure how it would interact with the es-python package when adding a body though.
Also this doesn't have a clean way to interface with the rest of the codebase as it's written (I believe) to interact with Kibana.
The cleanest way to get a mapping would be to create a function
def get_mapping() -> dict:
which reads the mappings.txt
file and turns it into a dict. That dict can then be picked up by whatever function that ultimately creates the index.
Same goes for the settings file
forward43/Mappings.txt
Outdated
"contactPerson.name": {"type": "text"}, | ||
"cotactPerson.email": {"type": "keyword"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also go for camelcase: contact_person
as the rest is in camel case as well.
I'm also not too sure how ES would resolve dots in the naming of keys, but the usual way is:
"contact_person" : {
"properties" : {
"name": {"type": "text"},
"email": {"type": "keyword"},
"number": {"type": "long"}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elasticsearch translates fields using a dot notation into using the “properties” parameter behind the scenes. So,
"contact_person.name": {"type": "text"},
"cotact_person.email": {"type": "keyword"},
"contact_person.number": {"type": "long"}
would translate to
"contact_person" : {
"properties" : {
"name": {"type": "text"},
"email": {"type": "keyword"},
"number": {"type": "long"}
}
}
forward43/Mappings.txt
Outdated
{ | ||
"settings": { | ||
"number_of_shards" : x, # x is the number of shards | ||
"number_of_replicas" : y # y is the number of replicas | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some optimizations ES has for keywords called normalizers, which you can add into your settings
https://www.elastic.co/guide/en/elasticsearch/reference/current/normalizer.html
Thanks for the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add normalisers to the settings. The example Andrew mentioned shows a good example that makes sure that search queries are lowercase and without accent marks.
The format used in settings.txt and mappings.txt does not follow the format assumed by ES:
"contact_person.name": {"type": "text"},
"cotact_person.email": {"type": "keyword"},
"contact_person.number": {"type": "long"}
"type" is missing here.
|
||
|
||
def get_settings(settingsFile): | ||
'''(file) -> dictionary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me what file is. Is it settingsFile? The type of that isn't a dictionary but a string (of the path name / file name).
def get_settings(settingsFile): | ||
'''(file) -> dictionary | ||
|
||
Propertes for creating the index n ES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two typos: Properties for creating the index in ES.
Return the contents of the text file with a dictionary. | ||
''' | ||
|
||
with open(''+settingsFile) as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with the method you open the file. What is the purpose of adding the '' in front of settingsFile? It seems redundant.
|
||
|
||
def get_mappings(mappingsFile): | ||
'''(file) -> dictionary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above. If mappingsFile is meant by file, the type is string. In the code the content of mappingsFile is transformed to a dict.
I have created a mappings file which fits our use case after we push the data into ES.