ENH: support for msgpack serialization/deserialization

DOC: install.rst mention
DOC: added license from msgpack_numpy
PERF: changed Timestamp and DatetimeIndex serialization for speedups
      add vb_suite benchmarks
ENH: added to_msgpack method in generic.py, and default import into pandas
TST: all packers to always be imported, fail on usage with no msgpack installed
DOC: added mentions in release notes, v0.11.1, basics
ENH: provide automatic list if multiple args passed to to_msgpack
DOC: changed docs to 0.12
ENH: iterator support for stream unpacking
Conflicts: RELEASE.rst
ENH: added support for Panel,SparseSeries,SparseDataFrame,SparsePanel,IntIndex,BlockIndex
ENH: handle np.datetime64,np.timedelta64,date,timedelta types
TST: added compression (zlib/blosc) via big hack
DOC: moved back to 0.11.1 docs
BLD: integrated with built-in msgpack
DOC: io.rst fixes
PERF: update vb_suite for packers
TST: fix for test_list_float_complex test?
PERF: prototype for packing faster
PERF: was still using tolist on indices
DOC: v0.13.0.txt and release notes
DOC: release notes
PERF: revamped packers vbench to use packers,csv,pickle,hdf_store,hdf_table
TST: better test comparisons for numpy types
BLD: py3k compat
mlovci · Oct 1, 2013 · d9225fb
1 parent 1501356
commit d9225fb
Showing 11 changed files with 1,196 additions and 11 deletions.
diff --git a/LICENSES/MSGPACK_NUMPY_LICENSE b/LICENSES/MSGPACK_NUMPY_LICENSE
@@ -0,0 +1,33 @@
+.. -*- rst -*-
+
+License
+=======
+
+Copyright (c) 2013, Lev Givon.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright
+  notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above
+  copyright notice, this list of conditions and the following
+  disclaimer in the documentation and/or other materials provided
+  with the distribution.
+* Neither the name of Lev Givon nor the names of any
+  contributors may be used to endorse or promote products derived
+  from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -36,6 +36,7 @@ object.
     * ``read_hdf``
     * ``read_sql``
     * ``read_json``
+    * ``read_msgpack``
     * ``read_html``
     * ``read_stata``
     * ``read_clipboard``
@@ -48,6 +49,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
     * ``to_hdf``
     * ``to_sql``
     * ``to_json``
+    * ``to_msgpack``
     * ``to_html``
     * ``to_stata``
     * ``to_clipboard``
@@ -1732,6 +1734,72 @@ module is installed you can use it as a xlsx writer engine as follows:
 
 .. _io.hdf5:
 
+Serialization
+-------------
+
+msgpack
+~~~~~~~
+
+.. _io.msgpack:
+
+.. versionadded:: 0.11.1
+
+Starting in 0.11.1, pandas supports the ``msgpack`` format for
+object serialization. This is a lightweight, portable binary format, similar
+to binary JSON, that is highly space efficient and provides good performance
+on both writing (serialization) and reading (deserialization).
+
+.. warning::
+
+   This is a very new feature of pandas. We intend to provide certain
+   optimizations in the io of the ``msgpack`` data. We do not intend to change
+   this format (and will remain backward compatible if we do).
+
+.. ipython:: python
+
+   df = DataFrame(np.random.rand(5,2),columns=list('AB'))
+   df.to_msgpack('foo.msg')
+   pd.read_msgpack('foo.msg')
+   s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
+
+You can pass several objects at once; they are packed as a list, and you receive them back on deserialization.
+
+.. ipython:: python
+
+   pd.to_msgpack('foo.msg', df, 'foo', np.array([1,2,3]), s)
+   pd.read_msgpack('foo.msg')
+
+You can pass ``iterator=True`` to iterate over the unpacked results:
+
+.. ipython:: python
+
+   for o in pd.read_msgpack('foo.msg',iterator=True):
+       print(o)
+
+You can pass ``append=True`` to the writer to append to an existing pack:
+
+.. ipython:: python
+
+   df.to_msgpack('foo.msg',append=True)
+   pd.read_msgpack('foo.msg')
+
+Unlike other io methods, ``to_msgpack`` is available both on a per-object basis,
+``df.to_msgpack()``, and as the top-level ``pd.to_msgpack(...)``, where you can
+pack arbitrary collections of Python lists, dicts, and scalars, intermixed with
+pandas objects.
+
+.. ipython:: python
+
+   pd.to_msgpack('foo2.msg', { 'dict' : [ { 'df' : df }, { 'string' : 'foo' }, { 'scalar' : 1. }, { 's' : s } ] })
+   pd.read_msgpack('foo2.msg')
+
+.. ipython:: python
+   :suppress:
+   :okexcept:
+
+   os.remove('foo.msg')
+   os.remove('foo2.msg')
+
 HDF5 (PyTables)
 ---------------
 

diff --git a/doc/source/release.rst b/doc/source/release.rst
@@ -64,17 +64,19 @@ New features
 Experimental Features
 ~~~~~~~~~~~~~~~~~~~~~
 
-- The new :func:`~pandas.eval` function implements expression evaluation using
-  ``numexpr`` behind the scenes. This results in large speedups for complicated
-  expressions involving large DataFrames/Series.
-- :class:`~pandas.DataFrame` has a new :meth:`~pandas.DataFrame.eval` that
-  evaluates an expression in the context of the ``DataFrame``.
-- A :meth:`~pandas.DataFrame.query` method has been added that allows
-  you to select elements of a ``DataFrame`` using a natural query syntax nearly
-  identical to Python syntax.
-- ``pd.eval`` and friends now evaluate operations involving ``datetime64``
-  objects in Python space because ``numexpr`` cannot handle ``NaT`` values
-  (:issue:`4897`).
+  - The new :func:`~pandas.eval` function implements expression evaluation using
+    ``numexpr`` behind the scenes. This results in large speedups for complicated
+    expressions involving large DataFrames/Series.
+  - :class:`~pandas.DataFrame` has a new :meth:`~pandas.DataFrame.eval` that
+    evaluates an expression in the context of the ``DataFrame``.
+  - A :meth:`~pandas.DataFrame.query` method has been added that allows
+    you to select elements of a ``DataFrame`` using a natural query syntax nearly
+    identical to Python syntax.
+  - ``pd.eval`` and friends now evaluate operations involving ``datetime64``
+    objects in Python space because ``numexpr`` cannot handle ``NaT`` values
+    (:issue:`4897`).
+  - Add msgpack support via ``pd.read_msgpack()`` and ``pd.to_msgpack()/df.to_msgpack()`` for serialization
+    of arbitrary pandas (and Python) objects in a lightweight portable binary format (:issue:`686`)
 
 Improvements to existing features
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/doc/source/v0.13.0.txt b/doc/source/v0.13.0.txt
@@ -686,6 +686,35 @@ to unify methods and behaviors. Series formerly subclassed directly from
      s.a = 5
      s
 
+IO Enhancements
+~~~~~~~~~~~~~~~
+
+- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of serialization
+  of arbitrary pandas (and Python) objects in a lightweight portable binary format. :ref:`See the docs<io.msgpack>`
+
+  .. ipython:: python
+
+        df = DataFrame(np.random.rand(5,2),columns=list('AB'))
+        df.to_msgpack('foo.msg')
+        pd.read_msgpack('foo.msg')
+
+        s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
+        pd.to_msgpack('foo.msg', df, s)
+        pd.read_msgpack('foo.msg')
+
+  You can pass ``iterator=True`` to iterate over the unpacked results
+
+  .. ipython:: python
+
+        for o in pd.read_msgpack('foo.msg',iterator=True):
+            print(o)
+
+  .. ipython:: python
+        :suppress:
+        :okexcept:
+
+        os.remove('foo.msg')
+
 Bug Fixes
 ~~~~~~~~~
 

diff --git a/pandas/core/generic.py b/pandas/core/generic.py
@@ -805,6 +805,10 @@ def to_hdf(self, path_or_buf, key, **kwargs):
         from pandas.io import pytables
         return pytables.to_hdf(path_or_buf, key, self, **kwargs)
 
+    def to_msgpack(self, path_or_buf, **kwargs):
+        from pandas.io import packers
+        return packers.to_msgpack(path_or_buf, self, **kwargs)
+
     def to_pickle(self, path):
         """
         Pickle (serialize) object to input file path
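
Note on the hunk above: `to_msgpack` is a thin wrapper that imports the writer module inside the method body and forwards `self` as the object to write. The sketch below illustrates that delegation shape using only the standard library. `Record`, `write_records`, and the JSON-lines encoding are hypothetical stand-ins for illustration, not pandas code.

```python
import io


def write_records(path_or_buf, *objs, **kwargs):
    """Hypothetical module-level writer: one JSON line per object."""
    # Deferred import: the dependency is resolved only when the writer is
    # actually called, so a missing package fails at usage time rather than
    # at import time of the enclosing module.
    import json

    for obj in objs:
        path_or_buf.write(json.dumps(obj.data, **kwargs) + "\n")


class Record(object):
    """Hypothetical stand-in for a pandas object."""

    def __init__(self, data):
        self.data = data

    def to_text(self, path_or_buf, **kwargs):
        # Same shape as to_msgpack in the hunk: pass the target first,
        # then self, then any keyword options untouched.
        return write_records(path_or_buf, self, **kwargs)


buf = io.StringIO()
Record({"a": 1}).to_text(buf, sort_keys=True)
```

Keeping the method this thin means the per-object call and a top-level call that takes several objects share a single code path.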

diff --git a/pandas/io/api.py b/pandas/io/api.py
@@ -11,3 +11,4 @@
 from pandas.io.sql import read_sql
 from pandas.io.stata import read_stata
 from pandas.io.pickle import read_pickle, to_pickle
+from pandas.io.packers import read_msgpack, to_msgpack
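
Note on the `append=True` and `iterator=True` options documented in the io.rst hunk: both follow from treating the file as a stream of independent records. The sketch below shows that idea with `pickle` from the standard library swapped in for msgpack, so it runs without pandas or msgpack installed. `write_objects` and `read_objects` are hypothetical names, and returning a lone object bare rather than in a one-element list is an assumption about the reader's behaviour, not something the diff confirms.

```python
import io
import pickle


def write_objects(buf, *objs, append=False):
    """Write each object as its own record (pickle stands in for msgpack)."""
    if append:
        buf.seek(0, io.SEEK_END)  # keep existing records, add after them
    else:
        buf.seek(0)
        buf.truncate()  # start a fresh pack
    for obj in objs:
        pickle.dump(obj, buf)


def read_objects(buf, iterator=False):
    """Read records back, lazily if iterator=True, otherwise all at once."""

    def records():
        buf.seek(0)
        while True:
            try:
                yield pickle.load(buf)
            except EOFError:  # no more records in the stream
                return

    if iterator:
        return records()
    items = list(records())
    # Assumption: a single record comes back bare, several come back as a list.
    return items[0] if len(items) == 1 else items


buf = io.BytesIO()
write_objects(buf, {"a": 1}, "foo", [1, 2, 3])
write_objects(buf, 4.5, append=True)
for obj in read_objects(buf, iterator=True):
    print(obj)
```

Because every record is self-delimiting, appending needs no rewrite of earlier data, and the iterator can yield objects without loading the whole file first.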