
Commit

Merge pull request #35 from alkemics/cov
pre-commit
alk-lbinet authored Jun 22, 2020
2 parents 818123c + 48614bf commit 63b9ea1
Showing 27 changed files with 67 additions and 77 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/python-3-tests.yml
@@ -15,7 +15,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.5, 3.6, 3.7, 3.8]
+        python-version: [3.6, 3.7, 3.8]
     env:
       PYTHON: ${{ matrix.python-version }}
       OS: 'ubuntu-latest'
@@ -37,6 +37,9 @@ jobs:
           flake8 --count --ignore=W503,W605 --show-source --statistics pandagg
           # on tests, more laxist: allow "missing whitespace after ','" and "line too long"
           flake8 --count --ignore=W503,W605,E231,E501 --show-source --statistics tests
+      - name: Lint with black
+        run: |
+          black --check .
       - name: Test with pytest and generate coverage report
         run:
           pytest --cov=./pandagg --cov-report=xml
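The added `black --check` step makes CI fail when any file would be reformatted, without modifying it. The same gates can be reproduced locally before pushing; a minimal sketch mirroring the workflow steps above (assuming the dev dependencies from `requirements-test.txt` are installed):

```sh
# Mirror the CI lint/format/test steps locally
flake8 --count --ignore=W503,W605 --show-source --statistics pandagg
flake8 --count --ignore=W503,W605,E231,E501 --show-source --statistics tests
black --check .                          # exits non-zero if reformatting is needed
pytest --cov=./pandagg --cov-report=xml  # writes coverage.xml
```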
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
 .*
 !.github
+!.pre-commit-config.yaml
 *.py[co]
 *.egg
 *.egg-info
11 changes: 11 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,11 @@
+repos:
+-   repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.3.0
+    hooks:
+    -   id: check-yaml
+    -   id: end-of-file-fixer
+    -   id: trailing-whitespace
+-   repo: https://github.com/psf/black
+    rev: 19.3b0
+    hooks:
+    -   id: black
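This file is what the "pre-commit" commit message refers to: three whitespace/YAML sanity hooks plus black. A minimal sketch of the usual way to activate and run these hooks (standard pre-commit commands, nothing specific to this diff):

```sh
pip install pre-commit        # also added to requirements-test.txt below
pre-commit install            # register the git hook once per clone
pre-commit run --all-files    # run all four hooks over the whole repository
```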
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -16,7 +16,7 @@ We actively welcome your pull requests.
 5. Make sure your code lints.
 
 ## Any contributions you make will be under the MIT Software License
-In short, when you submit code changes, your submissions are understood to be under the same [MIT License](http://choosealicense.com/licenses/mit/) that covers the project. 
+In short, when you submit code changes, your submissions are understood to be under the same [MIT License](http://choosealicense.com/licenses/mit/) that covers the project.
 Feel free to contact the maintainers if that's a concern.
 
 ## Issues
6 changes: 3 additions & 3 deletions README.md
@@ -1,5 +1,5 @@
 [![PyPI Latest Release](https://img.shields.io/pypi/v/pandagg.svg)](https://pypi.org/project/pandagg/)
-[![License](https://img.shields.io/pypi/l/pandagg.svg)](https://github.com/leonardbinet/pandagg/blob/master/LICENSE)
+[![License](https://img.shields.io/pypi/l/pandagg.svg)](https://github.com/alkemics/pandagg/blob/master/LICENSE)
 ![Python package](https://github.com/alkemics/pandagg/workflows/Python%203%20Tests/badge.svg)
 ![Python package](https://github.com/alkemics/pandagg/workflows/Python%202%20Tests/badge.svg)
 [![Coverage](https://codecov.io/github/alkemics/pandagg/coverage.svg?branch=master)](https://codecov.io/gh/alkemics/pandagg)
@@ -11,8 +11,8 @@
 **pandagg** is a Python package providing a simple interface to manipulate ElasticSearch queries and aggregations. Its goal is to make it
 the easiest possible to explore data indexed in an Elasticsearch cluster.
 
-Some of its interactive features are inspired by [pandas](https://github.com/pandas-dev/pandas) library, hence the name **pandagg** which aims to apply **panda**s to Elasticsearch 
-**agg**regations. 
+Some of its interactive features are inspired by [pandas](https://github.com/pandas-dev/pandas) library, hence the name **pandagg** which aims to apply **panda**s to Elasticsearch
+**agg**regations.
 
 **pandagg** is also greatly inspired by the official high level python client [elasticsearch-dsl](https://github.com/elastic/elasticsearch-dsl-py),
 and is intended to make it more convenient to deal with deeply nested queries and aggregations.
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -29,4 +29,4 @@ generate: clean api-doc build
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
\ No newline at end of file
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
4 changes: 2 additions & 2 deletions docs/README.md
@@ -1,9 +1,9 @@
 ## Sphinx documentation
 
 Documentation stems from 3 sources:
-- automatically generated based on repository sources in `docs/source/reference` directory 
+- automatically generated based on repository sources in `docs/source/reference` directory
 - manually written documentation in all other files of `docs/source` directory
-- a jupyter notebook file generated following procedure in `example/imdb`, then running notebook and exporting 
+- a jupyter notebook file generated following procedure in `example/imdb`, then running notebook and exporting
 html file
 
 #### Procedure
4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -130,7 +130,7 @@
 # (source start file, target name, title,
 #  author, documentclass [howto, manual, or own class]).
 latex_documents = [
-    (master_doc, "pandagg.tex", u"pandagg Documentation", u"Léonard Binet", "manual"),
+    (master_doc, "pandagg.tex", u"pandagg Documentation", u"Léonard Binet", "manual")
 ]
 
 
@@ -155,7 +155,7 @@
         "pandagg",
         "One line description of project.",
         "Miscellaneous",
-    ),
+    )
 ]
 
 
1 change: 0 additions & 1 deletion docs/source/user-guide.rst
@@ -378,4 +378,3 @@ Cluster indices discovery
 *************************
 
 TODO
-
14 changes: 7 additions & 7 deletions examples/imdb/README.md
@@ -8,15 +8,15 @@ In this case, relational databases (SQL) are a good fit to store with consistenc
 Yet indexing some of this data in a optimized search engine will allow more powerful queries.
 
 ## Query requirements
-In this example, we'll suppose most usage/queries requirements will be around the concept of movie (rather than usages 
+In this example, we'll suppose most usage/queries requirements will be around the concept of movie (rather than usages
 focused on fetching actors or directors, even though it will still be possible with this data structure).
 
 The index should provide good performances trying to answer these kind question (non-exhaustive):
 - in which movies this actor played?
 - what movies genres were most popular among decades?
 - which actors have played in best-rated movies, or worst-rated movies?
 - which actors movies directors prefer to cast in their movies?
-- which are best ranked movies of last decade in Action or Documentary genres? 
+- which are best ranked movies of last decade in Action or Documentary genres?
 - ...
 
 
@@ -25,7 +25,7 @@ I exported following SQL tables from MariaDB [following these instructions](http
 
 Relational schema is the following:
 
-![imdb tables](ressources/imdb_ijs.svg) 
+![imdb tables](ressources/imdb_ijs.svg)
 
 ## Index mapping
 
@@ -46,9 +46,9 @@ Movie:
 
 #### Which fields require nesting?
 Since genres contain a single keyword field, in no case we need it to be stored as a nested field.
-On the contrary, actor roles and directors require a nested mapping if we consider applying multiple 
-simultanous query clauses on their sub-fields (for instance search movie in which actor is a woman AND whose role is 
-nurse). 
+On the contrary, actor roles and directors require a nested mapping if we consider applying multiple
+simultanous query clauses on their sub-fields (for instance search movie in which actor is a woman AND whose role is
+nurse).
 More information on distinction between array and nested fields [here](
 https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html).
 
@@ -101,7 +101,7 @@ Note to Elastic, if you have a spare cluster to prepare demo indices on which yo
 operations we could skip this step ;)
 
 #### Dump tables
-Follow instruction on bottom of https://relational.fit.cvut.cz/dataset/IMDb page and dump following tables in a 
+Follow instruction on bottom of https://relational.fit.cvut.cz/dataset/IMDb page and dump following tables in a
 directory:
 - movies.csv
 - movies_genres.csv
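An aside on the nesting rationale above: a nested mapping is what lets the "actor is a woman AND whose role is nurse" example match both clauses inside the same actor object, rather than one actor matching each. A hedged sketch of such a query (the `actors`, `actors.gender` and `actors.role` field names are illustrative, not taken from the actual mapping):

```sh
# Both term clauses must match within the same nested actor document
curl -s -H 'Content-Type: application/json' "$ES_HOST/movies/_search" -d '
{
  "query": {
    "nested": {
      "path": "actors",
      "query": {
        "bool": {
          "must": [
            {"term": {"actors.gender": "F"}},
            {"term": {"actors.role": "nurse"}}
          ]
        }
      }
    }
  }
}'
```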
9 changes: 1 addition & 8 deletions examples/imdb/load.py
@@ -2,14 +2,7 @@
 from os.path import join
 from elasticsearch import Elasticsearch, helpers
 from examples.imdb.conf import ES_HOST, ES_USE_AUTH, ES_PASSWORD, ES_USER, DATA_DIR
-from pandagg.mapping import (
-    Mapping,
-    Keyword,
-    Text,
-    Float,
-    Nested,
-    Integer,
-)
+from pandagg.mapping import Mapping, Keyword, Text, Float, Nested, Integer
 
 index_name = "movies"
 mapping = Mapping(
2 changes: 1 addition & 1 deletion pandagg/interactive/mapping.py
@@ -52,6 +52,6 @@ def _set_agg_property_if_required(self):
     def __call__(self, *args, **kwargs):
         print(
             json.dumps(
-                self._tree.to_dict(), indent=2, sort_keys=True, separators=(",", ": "),
+                self._tree.to_dict(), indent=2, sort_keys=True, separators=(",", ": ")
             )
         )
6 changes: 1 addition & 5 deletions pandagg/tree/aggs/aggs.py
@@ -11,11 +11,7 @@
 from pandagg.tree._tree import Tree
 from pandagg.tree.mapping import Mapping
 
-from pandagg.node.aggs.abstract import (
-    BucketAggNode,
-    AggNode,
-    ShadowRoot,
-)
+from pandagg.node.aggs.abstract import BucketAggNode, AggNode, ShadowRoot
 from pandagg.node.aggs.bucket import Nested, ReverseNested, Terms
 from pandagg.node.aggs.pipeline import BucketSelector, BucketSort
 
4 changes: 2 additions & 2 deletions pandagg/tree/query/abstract.py
@@ -567,13 +567,13 @@ def _compound_update(self, name, new_compound, mode):
             )
             if not existing_param:
                 self.insert(
-                    item=new_compound.subtree(param_node.identifier), parent_id=name,
+                    item=new_compound.subtree(param_node.identifier), parent_id=name
                 )
                 continue
             if mode == REPLACE:
                 self.drop_node(existing_param.identifier)
                 self.insert(
-                    item=new_compound.subtree(param_node.identifier), parent_id=name,
+                    item=new_compound.subtree(param_node.identifier), parent_id=name
                 )
                 continue
             if mode == ADD:
4 changes: 2 additions & 2 deletions pandagg/tree/response.py
@@ -27,7 +27,7 @@ def __init__(self, aggs, index):
         self.__index = index
 
     def _clone_init(self, deep=False):
-        return AggsResponseTree(aggs=self.__aggs.clone(deep=deep), index=self.__index,)
+        return AggsResponseTree(aggs=self.__aggs.clone(deep=deep), index=self.__index)
 
     def parse(self, raw_response):
         """Build response tree from ElasticSearch aggregation response
@@ -70,7 +70,7 @@ def _parse_node_with_children(self, agg_node, raw_response, pid=None):
         self.insert_node(bucket, pid)
         for child in self.__aggs.children(agg_node.name, id_only=False):
             self._parse_node_with_children(
-                agg_node=child, raw_response=raw_value, pid=bucket.identifier,
+                agg_node=child, raw_response=raw_value, pid=bucket.identifier
             )
 
     def bucket_properties(self, bucket, properties=None, end_level=None, depth=None):
2 changes: 1 addition & 1 deletion requirements-test-2.txt
@@ -4,4 +4,4 @@ pytest-cov
 # last mock version compatible with P2 (will drop constraint when removing support for P2)
 mock<=3.0.5
 # idem, last pandas compatible version with P2
-pandas<=0.23.1
\ No newline at end of file
+pandas<=0.23.1
4 changes: 3 additions & 1 deletion requirements-test.txt
@@ -1,5 +1,7 @@
+pre-commit
+black
 flake8
 pytest
 pytest-cov
 mock
-pandas
\ No newline at end of file
+pandas
7 changes: 1 addition & 6 deletions setup.py
@@ -12,12 +12,7 @@
 here = os.path.abspath(os.path.dirname(__file__))
 README = open(os.path.join(here, "README.md")).read()
 
-install_requires = [
-    "six",
-    "future",
-    "lighttree==0.0.8",
-    "elasticsearch>=7.0.0,<8.0.0",
-]
+install_requires = ["six", "future", "lighttree==0.0.8", "elasticsearch>=7.0.0,<8.0.0"]
 
 
 setup(
12 changes: 5 additions & 7 deletions tests/interactive/test_mapping.py
@@ -154,7 +154,7 @@ def test_quick_agg(self):
 
         mapping_tree = Mapping(MAPPING)
         client_bound_mapping = IMapping(
-            mapping_tree, client=client_mock, index="classification_report_index_name",
+            mapping_tree, client=client_mock, index="classification_report_index_name"
         )
 
         workflow_field = client_bound_mapping.workflow
@@ -169,7 +169,7 @@
         )
         self.assertEqual(
             response,
-            [(1, {"doc_count": 25, "key": 1}), (2, {"doc_count": 50, "key": 2}),],
+            [(1, {"doc_count": 25, "key": 1}), (2, {"doc_count": 50, "key": 2})],
         )
         client_mock.search.assert_called_once()
         client_mock.search.assert_called_with(
@@ -188,7 +188,7 @@ def test_quick_agg_nested(self):
         client_mock = Mock(spec=["search"])
         es_response_mock = {
             "_shards": {"failed": 0, "successful": 135, "total": 135},
-            "aggregations": {"local_metrics": {"avg_agg": {"value": 23},},},
+            "aggregations": {"local_metrics": {"avg_agg": {"value": 23}}},
             "hits": {"hits": [], "max_score": 0.0, "total": 300},
             "timed_out": False,
             "took": 30,
@@ -197,7 +197,7 @@
 
         mapping_tree = Mapping(MAPPING)
         client_bound_mapping = IMapping(
-            mapping_tree, client=client_mock, index="classification_report_index_name",
+            mapping_tree, client=client_mock, index="classification_report_index_name"
        )
 
         local_train_support = client_bound_mapping.local_metrics.dataset.support_train
@@ -209,9 +209,7 @@
             raw_output=True,
             query={"term": {"classification_type": "multiclass"}},
         )
-        self.assertEqual(
-            response, [(None, {"value": 23}),],
-        )
+        self.assertEqual(response, [(None, {"value": 23})])
         client_mock.search.assert_called_once()
         client_mock.search.assert_called_with(
             body={
6 changes: 3 additions & 3 deletions tests/node/agg/test_bucket.py
@@ -72,7 +72,7 @@ def test_filter(self):
             buckets,
             [
                 # key -> bucket
-                (None, {"doc_count": 12, "sub_aggs": {}}),
+                (None, {"doc_count": 12, "sub_aggs": {}})
             ],
         )
 
@@ -98,15 +98,15 @@ def test_nested(self):
             buckets,
             [
                 # key -> bucket
-                (None, {"doc_count": 12, "sub_aggs": {}}),
+                (None, {"doc_count": 12, "sub_aggs": {}})
             ],
         )
 
         # test extract bucket value
         self.assertEqual(Nested.extract_bucket_value({"doc_count": 12}), 12)
 
         # test get_filter
-        nested_agg = Nested(name="some_agg", path="nested_path",)
+        nested_agg = Nested(name="some_agg", path="nested_path")
         self.assertEqual(nested_agg.get_filter(None), None)
 
         # test query dict
5 changes: 2 additions & 3 deletions tests/node/query/test_full_text.py
@@ -127,8 +127,7 @@ def test_match_bool_prefix_clause(self):
         q3 = MatchBoolPrefix(message="quick brown f")
         self.assertEqual(q3.body, {"message": {"query": "quick brown f"}})
         self.assertEqual(
-            q3.to_dict(),
-            {"match_bool_prefix": {"message": {"query": "quick brown f"}}},
+            q3.to_dict(), {"match_bool_prefix": {"message": {"query": "quick brown f"}}}
         )
         self.assertEqual(
             q3.line_repr(depth=None),
@@ -155,7 +154,7 @@ def test_match_phrase_clause(self):
         q3 = MatchPhrase(message="this is a test")
         self.assertEqual(q3.body, {"message": {"query": "this is a test"}})
         self.assertEqual(
-            q3.to_dict(), {"match_phrase": {"message": {"query": "this is a test"}}},
+            q3.to_dict(), {"match_phrase": {"message": {"query": "this is a test"}}}
         )
         self.assertEqual(
             q3.line_repr(depth=None),
2 changes: 1 addition & 1 deletion tests/node/query/test_term_level.py
@@ -129,7 +129,7 @@ def test_terms_clause(self):
         self.assertEqual(q.body, body)
         self.assertEqual(q.to_dict(), expected)
         self.assertEqual(
-            q.line_repr(depth=None), 'terms, boost=1, user=["kimchy", "elasticsearch"]',
+            q.line_repr(depth=None), 'terms, boost=1, user=["kimchy", "elasticsearch"]'
         )
 
     def test_terms_set_clause(self):
4 changes: 1 addition & 3 deletions tests/test_discovery.py
@@ -39,9 +39,7 @@ def test_pandagg_wrapper(self, indice_get_mock):
         self.assertTrue(hasattr(indices, "classification_report_one"))
         report_index = indices.classification_report_one
         self.assertIsInstance(report_index, Index)
-        self.assertEqual(
-            report_index.__str__(), "<Index 'classification_report_one'>",
-        )
+        self.assertEqual(report_index.__str__(), "<Index 'classification_report_one'>")
         self.assertEqual(report_index.name, "classification_report_one")
 
         # ensure mapping presence
2 changes: 1 addition & 1 deletion tests/test_response.py
@@ -247,7 +247,7 @@ def test_parse_as_tabular_multiple_roots(self):
             "avg_f1_score": {"value": 0.815},
         }
         index_names, index_values = Aggregations(
-            data=raw_response, aggs=my_agg, index=None, client=None, query=None,
+            data=raw_response, aggs=my_agg, index=None, client=None, query=None
         ).to_tabular(index_orient=True, expand_sep=" || ")
 
         self.assertEqual(index_names, [])
4 changes: 2 additions & 2 deletions tests/test_search.py
@@ -294,7 +294,7 @@ def test_source_on_clone(self):
             {
                 "_source": {"includes": ["foo.bar.*"], "excludes": ["foo.one"]},
                 "query": {
-                    "bool": {"filter": [{"term": {"title": {"value": "python"}}}],}
+                    "bool": {"filter": [{"term": {"title": {"value": "python"}}}]}
                 },
             },
             Search()
@@ -307,7 +307,7 @@
             {
                 "_source": False,
                 "query": {
-                    "bool": {"filter": [{"term": {"title": {"value": "python"}}}],}
+                    "bool": {"filter": [{"term": {"title": {"value": "python"}}}]}
                 },
             },
             Search().source(False).filter("term", title="python").to_dict(),