more examples; more docs

topper-123 · Jul 11, 2015 · 7705c98
1 parent 46028e6

Showing 6 changed files with 1,523 additions and 1 deletion.
diff --git a/docs/conf.py b/docs/conf.py
@@ -20,7 +20,7 @@
 # If extensions (or modules to document with autodoc) are in another directory,
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
-#sys.path.insert(0, os.path.abspath('.'))
+sys.path.insert(0, os.path.abspath('./sphinxext'))
 
 # -- General configuration ------------------------------------------------
 
@@ -33,6 +33,8 @@
 extensions = [
     'sphinx.ext.autodoc',
     'sphinx.ext.mathjax',
+    'ipython_directive',
+    'ipython_console_highlighting',
     'numpydoc'
 ]
 

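Note on the `docs/conf.py` change: a minimal sketch of what the `sys.path.insert` line enables, assuming only the standard library. A module that lives outside the default search path becomes importable once its directory is prepended to `sys.path`. The temporary directory and the `demo_ext` module name are hypothetical stand-ins for `docs/sphinxext` and the extension modules it holds.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for docs/sphinxext: a temp directory holding one module.
ext_dir = tempfile.mkdtemp()
with open(os.path.join(ext_dir, "demo_ext.py"), "w") as fh:
    fh.write("def setup(app):\n    return 'registered'\n")

# Same pattern as conf.py: prepend the absolute path so it is searched first.
sys.path.insert(0, os.path.abspath(ext_dir))
importlib.invalidate_caches()

# The module can now be imported by its bare name, as Sphinx does for
# the entries listed in ``extensions``.
demo_ext = importlib.import_module("demo_ext")
result = demo_ext.setup(app=None)
```

Because the path is inserted at index 0, a module in that directory would shadow any installed module with the same name, which is worth keeping in mind when vendoring extensions.
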
diff --git a/docs/example.rst b/docs/example.rst
@@ -0,0 +1,84 @@
+Example
+=======
+
+Engarde really shines when you have a dataset that regularly receives updates.
+We'll work with a dataset of customer preferences on trains, available here_.
+This is a static dataset and isn't being updated, but you could imagine that each month the Dutch authorities upload a new month's worth of data.
+
+.. _here: http://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/Train.html
+
+We can start by making some very basic assertions: that the dataset is the correct shape, and that a few columns have the correct dtypes. Assertions are written as decorators on functions that return a DataFrame.
+
+.. ipython:: python
+
+   import pandas as pd
+   import engarde.decorators as ed
+
+   pd.set_option('display.max_rows', 10)
+
+   dtypes = dict(
+       price1=int,
+       price2=int,
+       time1=int,
+       time2=int,
+       change1=int,
+       change2=int,
+       comfort1=int,
+       comfort2=int
+   )
+
+   @ed.is_shape((-1, 11))
+   @ed.has_dtypes(items=dtypes)
+   def unload():
+       trains = pd.read_csv("data/trains.csv", index_col=0)
+       return trains
+
+One very important part of the design of Engarde is that your code, the code actually
+doing the work, shouldn't have to change. I don't want a bunch of asserts cluttering
+up the logic of what's happening. This is a perfect case for decorators.
+
+The order of execution is as follows: ``unload`` runs first and returns the ``DataFrame``, ``trains``.
+Next, ``ed.has_dtypes`` asserts that ``trains`` has the correct dtypes, as specified with ``dtypes``. Once that assert passes, ``has_dtypes`` passes ``trains`` along to the next check, ``ed.is_shape``, and so on, until the original caller gets back ``trains``.
+
+Since people are rational, their first choice is surely going to be better in *at least* one way than their second choice (faster, more comfortable, ...). This is fundamental to our analysis later on, so we'll explicitly state it in our code, and check it in our data.
+
+.. ipython:: python
+
+   def rational(df):
+       """
+       Check that at least one criterion is better.
+       """
+       r = ((df.price1 < df.price2) | (df.time1 < df.time2) |
+            (df.change1 < df.change2) | (df.comfort1 > df.comfort2))
+       return r
+
+   @ed.is_shape((-1, 11))
+   @ed.has_dtypes(items=dtypes)
+   @ed.verify_all(rational)
+   def unload():
+       trains = pd.read_csv("data/trains.csv", index_col=0)
+       return trains
+
+   df = unload()
+   df.head()
+
+OK, so apparently people aren't rational... We'll fix this problem by ignoring those people (why change your mind when you can change the data?).
+
+.. ipython:: python
+
+   @ed.verify_all(rational)
+   def drop_silly_people(df):
+       # Keep only the rows that pass the ``rational`` check defined above.
+       keep = rational(df)
+       return df[keep]
+
+
+   @ed.is_shape((-1, 11))
+   @ed.has_dtypes(items=dtypes)
+   def unload():
+       trains = pd.read_csv("data/trains.csv", index_col=0)
+       return trains
+
+   df = unload().pipe(drop_silly_people)
+   df.head()
+
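Note on the decorator order described in `docs/example.rst`: the sketch below illustrates, under stated assumptions, why the check closest to the function runs first. It does not use engarde or pandas. `check` is a hypothetical stand-in for an engarde-style decorator factory, and a plain list of tuples stands in for the ``DataFrame``.

```python
import functools

calls = []  # records the order in which the function and the checks run


def check(name, predicate):
    """Hypothetical stand-in for an engarde-style decorator factory."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Run the wrapped function (or the inner check) first ...
            result = func(*args, **kwargs)
            # ... then assert on its result and pass it along unchanged.
            calls.append(name)
            assert predicate(result), "{} failed".format(name)
            return result
        return wrapper
    return decorator


@check("is_shape", lambda rows: all(len(r) == 2 for r in rows))
@check("has_dtypes", lambda rows: all(isinstance(v, int) for r in rows for v in r))
def unload():
    calls.append("unload")
    return [(1, 2), (3, 4)]


rows = unload()
print(calls)
```

The function body runs first, then the innermost decorator's check, then the outer one, and the caller receives the unmodified return value.
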
diff --git a/docs/index.rst b/docs/index.rst
@@ -91,6 +91,7 @@ Contents:
    :maxdepth: 1
 
    install.rst
+   example.rst
    checks.rst
 
 

diff --git a/docs/sphinxext/ipython_console_highlighting.py b/docs/sphinxext/ipython_console_highlighting.py
@@ -0,0 +1,27 @@
+"""
+reST directive for syntax-highlighting ipython interactive sessions.
+
+"""
+
+from sphinx import highlighting
+from IPython.lib.lexers import IPyLexer
+
+def setup(app):
+    """Setup as a sphinx extension."""
+
+    # This is only a lexer, so adding it below to pygments appears sufficient.
+    # But if somebody knows what the right API usage should be to do that via
+    # sphinx, by all means fix it here.  At least having this setup function
+    # suppresses the sphinx warning we'd get without it.
+    pass
+
+# Register the extension as a valid pygments lexer.
+# Alternatively, we could register the lexer with pygments instead. This would
+# require using setuptools entrypoints: http://pygments.org/docs/plugins
+
+ipy2 = IPyLexer(python3=False)
+ipy3 = IPyLexer(python3=True)
+
+highlighting.lexers['ipython'] = ipy2
+highlighting.lexers['ipython2'] = ipy2
+highlighting.lexers['ipython3'] = ipy3