Added capability to do >10k searches with Elasticsearch #95

Closed
wants to merge 1 commit into from
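For context, Elasticsearch rejects plain `from`/`size` pagination once `from + size` exceeds the index setting `index.max_result_window`, which defaults to 10,000. That limit is the 10k in the title. The following is a minimal sketch of the check, assuming the default window. `needs_deep_pagination` is a hypothetical helper, not code from this PR.

```python
# Default value of the Elasticsearch index setting index.max_result_window
DEFAULT_MAX_RESULT_WINDOW = 10000


def needs_deep_pagination(offset: int, limit: int,
                          max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """Return True if a plain from/size request for this page would be rejected.

    Elasticsearch only accepts from/size requests with from + size <= max_result_window,
    so larger pages need another mechanism, such as the scroll API used in this PR.
    """
    return offset + limit > max_result_window
```

For example, `needs_deep_pagination(9990, 10)` is `False` (exactly 10,000), while `needs_deep_pagination(9995, 10)` is `True`.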
flask/search.py (35 changes: 32 additions & 3 deletions)
@@ -33,7 +33,7 @@ def search_es(es_query):
                         'keywords'
                     ],
                     'operator': 'or',
-                    'fuzziness': 'AUTO',
+                    'fuzziness': 'AUTO'
                 }
             },
             'script_score': {
@@ -77,10 +77,39 @@ def empty_search_es(offset, limit, allowed_graphs):
                 }
             }
         },
-        'from': offset,
         'size': limit
     }
-    return utils.get_es().search(index=utils.get_config()['elasticsearch_index_name'], body=body)
+
+    # Initial ES query; scroll='5s' keeps the scroll context alive for 5 seconds between requests
+    initial_search = utils.get_es().search(index=utils.get_config()['elasticsearch_index_name'], body=body, scroll='5s')
+
+    # Store the initial page of hits
+    parts = initial_search
+
+    # Get the scroll ID for the search
+    scroll_id = initial_search['_scroll_id']
+
+    # Limit the number of collected hits to 30k (the first page counts too)
+    size = len(parts['hits']['hits'])
+
+    # While the scroll search still returns hits and fewer than 30k have been collected:
+    while initial_search['hits']['hits'] and size < 30000:
+        initial_search = utils.get_es().scroll(scroll_id=scroll_id, scroll='5s')
+        # Save the ID of the latest scroll response in case the ID changed
+        scroll_id = initial_search['_scroll_id']
+
+        # Append the new hits to the collected hits
+        parts['hits']['hits'].extend(initial_search['hits']['hits'])
+        # Increment the count of collected hits
+        size = size + len(initial_search['hits']['hits'])
+
+    # Clear the scroll context; pass the ID by keyword (the first positional argument is not scroll_id in every client version)
+    utils.get_es().clear_scroll(scroll_id=scroll_id)
+
+    # Keep only the requested page: hits[offset:offset + limit]
+    parts['hits']['hits'] = parts['hits']['hits'][offset:offset + limit]
+
+    return parts
 
 
 def extract_query(sparql_query):
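The pattern in this change (open a scroll, collect hits batch by batch, clear the scroll, then slice the requested page out in Python) can be sketched as a standalone helper. This is a hedged sketch rather than the PR's code. `scroll_page` and its parameters are hypothetical. `es` is assumed to be an elasticsearch-py 7.x style client with dict-like responses (`search(index=, body=, scroll=)`, `scroll(scroll_id=, scroll=)`, `clear_scroll(scroll_id=)`). Unlike the diff, it stops scrolling once `offset + limit` hits are collected instead of always reading up to 30k, and its `try`/`finally` releases the scroll context even if a request fails.

```python
from typing import Any, Dict, List


def scroll_page(es: Any, index: str, body: Dict[str, Any], offset: int, limit: int,
                max_hits: int = 30000, keep_alive: str = '5s') -> Dict[str, Any]:
    """Return the first search response with its hits replaced by hits[offset:offset + limit].

    Hypothetical helper (assumption: elasticsearch-py 7.x style client, dict-like responses).
    The body should not set 'from', since scrolling always starts at the first hit.
    """
    # Collect only as many hits as the requested page needs, capped at max_hits
    wanted = min(offset + limit, max_hits)
    response = es.search(index=index, body=body, scroll=keep_alive)
    hits: List[Dict[str, Any]] = list(response['hits']['hits'])
    scroll_id = response['_scroll_id']
    try:
        page = response['hits']['hits']
        # Keep scrolling while batches are non-empty and the page is not yet covered
        while page and len(hits) < wanted:
            batch = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # The scroll ID may change between requests; always keep the latest one
            scroll_id = batch['_scroll_id']
            page = batch['hits']['hits']
            hits.extend(page)
    finally:
        # Release the server-side scroll context, even if a request above raised
        es.clear_scroll(scroll_id=scroll_id)
    # Slice out exactly `limit` hits (fewer if the results run out)
    response['hits']['hits'] = hits[offset:offset + limit]
    return response
```

With the helpers from the diff, a call might look like `scroll_page(utils.get_es(), utils.get_config()['elasticsearch_index_name'], body, offset, limit)`. Because the request body's `'size'` sets how many hits each scroll batch returns, a small `limit` means many round trips; a larger batch size in the body while scrolling would reduce them.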