ENH: support for msgpack serialization/deserialization
DOC: install.rst mention

DOC: added license from msgpack_numpy

PERF: changed Timestamp and DatetimeIndex serialization for speedups

      add vb_suite benchmarks

ENH: added to_msgpack method in generic.py, and default import into pandas

TST: all packers to always be imported, fail on usage with no msgpack installed

DOC: added mentions in release notes, v0.11.1, basics

ENH: provide automatic list if multiple args passed to to_msgpack

DOC: changed docs to 0.12

ENH: iterator support for stream unpacking

Conflicts:

	RELEASE.rst

ENH: added support for Panel,SparseSeries,SparseDataFrame,SparsePanel,IntIndex,BlockIndex

ENH: handle np.datetime64,np.timedelta64,date,timedelta types

TST: added compression (zlib/blosc) via big hack

DOC: moved back to 0.11.1 docs

BLD: integrated with built-in msgpack

DOC: io.rst fixes

PERF: update vb_suite for packers

TST: fix for test_list_float_complex test?

PERF: prototype for packing faster

PERF: was still using tolist on indices

DOC: v0.13.0.txt and release notes

DOC: release notes

PERF: revamped packers vbench to use packers,csv,pickle,hdf_store,hdf_table

TST: better test comparisons for numpy types

BLD: py3k compat
jreback committed Oct 1, 2013
1 parent 1501356 commit d9225fb
Showing 11 changed files with 1,196 additions and 11 deletions.
33 changes: 33 additions & 0 deletions LICENSES/MSGPACK_NUMPY_LICENSE
@@ -0,0 +1,33 @@
.. -*- rst -*-
License
=======

Copyright (c) 2013, Lev Givon.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
  copyright notice, this list of conditions and the following
  disclaimer in the documentation and/or other materials provided
  with the distribution.
* Neither the name of Lev Givon nor the names of any
  contributors may be used to endorse or promote products derived
  from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
68 changes: 68 additions & 0 deletions doc/source/io.rst
@@ -36,6 +36,7 @@ object.
* ``read_hdf``
* ``read_sql``
* ``read_json``
* ``read_msgpack``
* ``read_html``
* ``read_stata``
* ``read_clipboard``
@@ -48,6 +49,7 @@ The corresponding ``writer`` functions are object methods that are accessed like
* ``to_hdf``
* ``to_sql``
* ``to_json``
* ``to_msgpack``
* ``to_html``
* ``to_stata``
* ``to_clipboard``
@@ -1732,6 +1734,72 @@ module is installed you can use it as a xlsx writer engine as follows:
.. _io.hdf5:

Serialization
-------------

msgpack
~~~~~~~

.. _io.msgpack:

.. versionadded:: 0.11.1

Starting in 0.11.1, pandas supports the ``msgpack`` format for
object serialization. This is a lightweight, portable binary format, similar
to binary JSON, that is highly space efficient and provides good performance
for both writing (serialization) and reading (deserialization).
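
To make the space-efficiency claim concrete, here is a minimal sketch of how MessagePack's compact forms encode a small document, compared with compact JSON. The ``pack`` helper is hypothetical and hand-rolled for illustration only: it assumes Python 3, covers a small subset of the format (nil, booleans, integers 0 to 127, 64-bit floats, short strings, short lists and dicts), and is not the code pandas uses internally.

```python
import json
import struct


def pack(obj):
    """Encode a small subset of Python values in MessagePack's compact forms.

    Hypothetical helper for illustration; not part of pandas or msgpack.
    """
    if obj is None:
        return b"\xc0"                               # nil
    if obj is True:
        return b"\xc3"                               # true
    if obj is False:
        return b"\xc2"                               # false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                          # positive fixint: value is the byte
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)      # float 64, big-endian
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw    # fixstr: length lives in the tag byte
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        out = bytes([0x80 | len(obj)])               # fixmap
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError("outside the subset handled by this sketch: %r" % (obj,))


doc = {"a": 1, "b": [1, 2, 3]}
packed = pack(doc)
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
print(len(packed), len(as_json))  # 10 bytes packed versus 19 bytes of compact JSON
```

The saving comes from folding type and length into a single tag byte for small values, where JSON spends characters on quotes, brackets, and separators.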

.. warning::

   This is a very new feature of pandas. We intend to provide certain
   optimizations in the io of the ``msgpack`` data. We do not intend this
   format to change (and will be backward compatible if we do).

.. ipython:: python

   df = DataFrame(np.random.rand(5,2),columns=list('AB'))
   df.to_msgpack('foo.msg')
   pd.read_msgpack('foo.msg')
   s = Series(np.random.rand(5),index=date_range('20130101',periods=5))

You can pass several objects at once; they are returned as a list on deserialization.

.. ipython:: python

   pd.to_msgpack('foo.msg', df, 'foo', np.array([1,2,3]), s)
   pd.read_msgpack('foo.msg')

You can pass ``iterator=True`` to iterate over the unpacked results.

.. ipython:: python

   for o in pd.read_msgpack('foo.msg',iterator=True):
       print(o)

You can pass ``append=True`` to the writer to append to an existing pack.

.. ipython:: python

   df.to_msgpack('foo.msg',append=True)
   pd.read_msgpack('foo.msg')

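
The multi-object, ``append=True``, and ``iterator=True`` behaviour described above relies on each serialized object being self-delimiting, so objects can be written back to back and read one at a time. A rough sketch of that pattern follows. It deliberately uses the standard library's ``pickle`` as a stand-in serializer (not msgpack), assumes Python 3, and ``write_objects`` / ``iter_objects`` are hypothetical names rather than pandas functions.

```python
import os
import pickle
import tempfile


def write_objects(path, *objs, append=False):
    """Write each object back to back; append=True keeps what is already there."""
    mode = "ab" if append else "wb"
    with open(path, mode) as fh:
        for obj in objs:
            pickle.dump(obj, fh)  # stand-in for a msgpack pack call


def iter_objects(path):
    """Yield objects one at a time until the stream is exhausted."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return


path = os.path.join(tempfile.mkdtemp(), "foo.bin")
write_objects(path, {"a": 1}, "foo", [1, 2, 3])
write_objects(path, 42, append=True)

unpacked = list(iter_objects(path))
for obj in unpacked:
    print(obj)

os.remove(path)
```

Reading lazily like this means a consumer never has to hold every unpacked object in memory at once, which is presumably the motivation for the iterator option.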
Unlike other io methods, ``to_msgpack`` is available both on a per-object basis,
``df.to_msgpack()``, and as the top-level ``pd.to_msgpack(...)``, which can
pack arbitrary collections of python lists, dicts, and scalars intermixed with
pandas objects.

.. ipython:: python

   pd.to_msgpack('foo2.msg', { 'dict' : [ { 'df' : df }, { 'string' : 'foo' }, { 'scalar' : 1. }, { 's' : s } ] })
   pd.read_msgpack('foo2.msg')

.. ipython:: python
   :suppress:
   :okexcept:

   os.remove('foo.msg')
   os.remove('foo2.msg')

HDF5 (PyTables)
---------------

24 changes: 13 additions & 11 deletions doc/source/release.rst
@@ -64,17 +64,19 @@ New features
Experimental Features
~~~~~~~~~~~~~~~~~~~~~

- The new :func:`~pandas.eval` function implements expression evaluation using
``numexpr`` behind the scenes. This results in large speedups for complicated
expressions involving large DataFrames/Series.
- :class:`~pandas.DataFrame` has a new :meth:`~pandas.DataFrame.eval` that
evaluates an expression in the context of the ``DataFrame``.
- A :meth:`~pandas.DataFrame.query` method has been added that allows
you to select elements of a ``DataFrame`` using a natural query syntax nearly
identical to Python syntax.
- ``pd.eval`` and friends now evaluate operations involving ``datetime64``
objects in Python space because ``numexpr`` cannot handle ``NaT`` values
(:issue:`4897`).
- The new :func:`~pandas.eval` function implements expression evaluation using
``numexpr`` behind the scenes. This results in large speedups for complicated
expressions involving large DataFrames/Series.
- :class:`~pandas.DataFrame` has a new :meth:`~pandas.DataFrame.eval` that
evaluates an expression in the context of the ``DataFrame``.
- A :meth:`~pandas.DataFrame.query` method has been added that allows
you to select elements of a ``DataFrame`` using a natural query syntax nearly
identical to Python syntax.
- ``pd.eval`` and friends now evaluate operations involving ``datetime64``
objects in Python space because ``numexpr`` cannot handle ``NaT`` values
(:issue:`4897`).
- Add msgpack support via ``pd.read_msgpack()`` and ``pd.to_msgpack()/df.to_msgpack()`` for serialization
of arbitrary pandas (and python) objects in a lightweight portable binary format (:issue:`686`)

Improvements to existing features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
29 changes: 29 additions & 0 deletions doc/source/v0.13.0.txt
@@ -686,6 +686,35 @@ to unify methods and behaviors. Series formerly subclassed directly from
s.a = 5
s

IO Enhancements
~~~~~~~~~~~~~~~

- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of serialization
of arbitrary pandas (and python) objects in a lightweight portable binary format. :ref:`See the docs<io.msgpack>`

.. ipython:: python

   df = DataFrame(np.random.rand(5,2),columns=list('AB'))
   df.to_msgpack('foo.msg')
   pd.read_msgpack('foo.msg')

   s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
   pd.to_msgpack('foo.msg', df, s)
   pd.read_msgpack('foo.msg')

You can pass ``iterator=True`` to iterate over the unpacked results.

.. ipython:: python

   for o in pd.read_msgpack('foo.msg',iterator=True):
       print(o)

.. ipython:: python
   :suppress:
   :okexcept:

   os.remove('foo.msg')

Bug Fixes
~~~~~~~~~

4 changes: 4 additions & 0 deletions pandas/core/generic.py
@@ -805,6 +805,10 @@ def to_hdf(self, path_or_buf, key, **kwargs):
        from pandas.io import pytables
        return pytables.to_hdf(path_or_buf, key, self, **kwargs)

    def to_msgpack(self, path_or_buf, **kwargs):
        from pandas.io import packers
        return packers.to_msgpack(path_or_buf, self, **kwargs)

    def to_pickle(self, path):
        """
        Pickle (serialize) object to input file path
1 change: 1 addition & 0 deletions pandas/io/api.py
@@ -11,3 +11,4 @@
from pandas.io.sql import read_sql
from pandas.io.stata import read_stata
from pandas.io.pickle import read_pickle, to_pickle
from pandas.io.packers import read_msgpack, to_msgpack
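
The two code changes above follow a simple pattern: a module-level writer does the work, and the object method is a thin wrapper that passes ``self`` along. The commit message also mentions that multiple arguments are packed as a list automatically. A minimal sketch of that shape follows, with ``json`` standing in for the serializer and ``to_blob`` / ``Record`` as hypothetical names. It is an illustration under those assumptions, not the actual ``pandas.io.packers`` implementation.

```python
import io
import json


def to_blob(buf, *args, **kwargs):
    """Module-level writer: one argument is written as-is, several as a list."""
    payload = args[0] if len(args) == 1 else list(args)
    buf.write(json.dumps(payload, **kwargs))  # json is only a stand-in serializer


class Record(object):
    """Hypothetical container playing the role of a pandas object."""

    def __init__(self, data):
        self.data = data

    def to_blob(self, buf, **kwargs):
        # Thin wrapper: hand this object's payload to the module-level function.
        return to_blob(buf, self.data, **kwargs)


single = io.StringIO()
Record({"a": 1}).to_blob(single)

several = io.StringIO()
to_blob(several, 1, "foo", [1, 2])

print(single.getvalue())
print(several.getvalue())
```

In the diff, the wrapper imports ``packers`` inside the method body rather than at module level; a single-file sketch cannot show that, but deferring the import is a common way to avoid circular imports between core and io modules.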