forked from engarde-dev/engarde
Commit 7705c98 (1 parent: 46028e6)
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 6 changed files with 1,523 additions and 1 deletion.
@@ -0,0 +1,84 @@
Example
=======

Engarde really shines when you have a dataset that regularly receives updates.
We'll work with a dataset of customer preferences on trains, available here_.
This is a static dataset and isn't being updated, but you could imagine that each month the Dutch authorities upload a new month's worth of data.

.. _here: http://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/Train.html

We can start by making some very basic assertions: that the dataset has the correct shape, and that a few columns have the correct dtypes. Assertions are applied as decorators to functions that return a ``DataFrame``.

.. ipython:: python

    import pandas as pd
    import engarde.decorators as ed
    pd.set_option('display.max_rows', 10)

    # Expected dtype for each column we want to check.
    dtypes = dict(
        price1=int,
        price2=int,
        time1=int,
        time2=int,
        change1=int,
        change2=int,
        comfort1=int,
        comfort2=int
    )

    # The checks run on the DataFrame returned by ``unload``.
    @ed.is_shape((-1, 11))
    @ed.has_dtypes(items=dtypes)
    def unload():
        trains = pd.read_csv("data/trains.csv", index_col=0)
        return trains
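
As a rough illustration of the shape assertion above, here is a minimal plain-Python sketch. It assumes that ``-1`` in ``(-1, 11)`` means "any size" for that dimension, which is what the example suggests; ``shape_matches`` is a hypothetical helper and not part of Engarde.

```python
def shape_matches(actual, expected):
    """Return True if `actual` matches `expected`.

    Assumption (not confirmed by the source): -1 in `expected`
    is a wildcard that accepts any size for that dimension.
    """
    if len(actual) != len(expected):
        return False
    return all(want == -1 or got == want
               for got, want in zip(actual, expected))


# Made-up shapes for illustration only.
print(shape_matches((100, 11), (-1, 11)))  # True
print(shape_matches((100, 10), (-1, 11)))  # False
```

With this reading, the check pins the number of columns while letting the number of rows grow as new data arrives.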

One very important part of the design of Engarde is that your code, the code actually
doing the work, shouldn't have to change. I don't want a bunch of asserts cluttering
up the logic of what's happening. This is a perfect case for decorators.

The order of execution here is as follows: ``unload`` returns the ``DataFrame``, ``trains``.
Next, ``ed.has_dtypes`` asserts that ``trains`` has the correct dtypes, as specified with ``dtypes``. Once that assert passes, ``has_dtypes`` passes ``trains`` along to the next check, and so on, until the original caller gets back ``trains``.
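
The execution order described above can be seen with a small stand-alone sketch. The ``check`` decorator factory below is a hypothetical stand-in written for this illustration, not Engarde's implementation, and a list of tuples stands in for the ``DataFrame``.

```python
import functools

calls = []  # records the order in which things run


def check(name, predicate):
    """Hypothetical stand-in for a checking decorator (not Engarde's code)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)  # run the wrapped function first
            calls.append(name)
            assert predicate(result), f"{name} failed"
            return result  # hand the unchanged result to the next check
        return wrapper
    return decorate


@check("is_shape", lambda rows: all(len(r) == 2 for r in rows))
@check("has_dtypes", lambda rows: all(isinstance(v, int) for r in rows for v in r))
def unload():
    calls.append("unload")
    return [(1, 2), (3, 4)]


rows = unload()
print(calls)  # ['unload', 'has_dtypes', 'is_shape']
```

The decorator closest to the function runs its check first, and the caller receives the same object that ``unload`` returned.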

Since people are rational, their first choice is surely going to be better in *at least* one way than their second choice (faster, more comfortable, ...). This is fundamental to our analysis later on, so we'll explicitly state it in our code, and check it in our data.

.. ipython:: python

    def rational(df):
        """
        Check that at least one criterion is better.
        """
        # Boolean Series: True where option 1 beats option 2 on any criterion.
        r = ((df.price1 < df.price2) | (df.time1 < df.time2) |
             (df.change1 < df.change2) | (df.comfort1 > df.comfort2))
        return r

    @ed.is_shape((-1, 11))
    @ed.has_dtypes(items=dtypes)
    @ed.verify_all(rational)
    def unload():
        trains = pd.read_csv("data/trains.csv", index_col=0)
        return trains

    df = unload()
    df.head()

OK, so apparently people aren't rational... We'll fix this problem by ignoring those people (why change your mind when you can change the data?).

.. ipython:: python

    # The check runs on the filtered result, so it now passes.
    @ed.verify_all(rational)
    def drop_silly_people(df):
        # Keep only the rows that satisfy the ``rational`` check.
        return df[rational(df)]

    @ed.is_shape((-1, 11))
    @ed.has_dtypes(items=dtypes)
    def unload():
        trains = pd.read_csv("data/trains.csv", index_col=0)
        return trains

    df = unload().pipe(drop_silly_people)
    df.head()
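
To make the "check fails, filter, check passes" sequence concrete, here is a plain-Python sketch. The ``rational`` and ``verify_all`` functions below are simplified stand-ins that work on a list of dicts, not Engarde's functions, and the two rows are made up rather than taken from the trains dataset.

```python
def rational(row):
    """Row-wise stand-in: option 1 beats option 2 on at least one criterion."""
    return (row["price1"] < row["price2"]
            or row["time1"] < row["time2"]
            or row["change1"] < row["change2"]
            or row["comfort1"] > row["comfort2"])


def verify_all(rows, check):
    """Raise AssertionError unless `check` holds for every row (stand-in)."""
    bad = [i for i, row in enumerate(rows) if not check(row)]
    assert not bad, f"check failed for rows {bad}"
    return rows


# Made-up rows for illustration; the second one fails every comparison.
rows = [
    dict(price1=10, price2=12, time1=60, time2=60,
         change1=0, change2=0, comfort1=1, comfort2=1),
    dict(price1=15, price2=12, time1=70, time2=60,
         change1=1, change2=0, comfort1=0, comfort2=1),
]

try:
    verify_all(rows, rational)
    raw_ok = True
except AssertionError:
    raw_ok = False

# Filtering first makes the same check pass on what remains.
kept = verify_all([row for row in rows if rational(row)], rational)
print(raw_ok, len(kept))  # False 1
```

Running the check after the filter, as ``drop_silly_people`` does, documents the assumption and guards against the filter logic drifting away from it.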
@@ -91,6 +91,7 @@ Contents:
   :maxdepth: 1

   install.rst
   example.rst
   checks.rst

@@ -0,0 +1,27 @@
"""
reST directive for syntax-highlighting ipython interactive sessions.
"""

from sphinx import highlighting
from IPython.lib.lexers import IPyLexer

def setup(app):
    """Setup as a sphinx extension."""

    # This is only a lexer, so adding it below to pygments appears sufficient.
    # But if somebody knows what the right API usage should be to do that via
    # sphinx, by all means fix it here. At least having this setup.py
    # suppresses the sphinx warning we'd get without it.
    pass

# Register the extension as a valid pygments lexer.
# Alternatively, we could register the lexer with pygments instead. This would
# require using setuptools entrypoints: http://pygments.org/docs/plugins

ipy2 = IPyLexer(python3=False)
ipy3 = IPyLexer(python3=True)

highlighting.lexers['ipython'] = ipy2
highlighting.lexers['ipython2'] = ipy2
highlighting.lexers['ipython3'] = ipy3