Deliberately restricted pickling library for Scala

danarmak/safepickle

safepickle

A deliberately restricted pickling library for Scala. It has certain features, and deliberately lacks others, because it was tailor-made for a particular scenario I needed. Most people will prefer general purpose libraries such as scala-pickling, upickle, or rapture, but this one is optimized for the following goals:

  1. Security. Pickled input can be generated by untrusted sources. Unpickling must not instantiate unexpected classes, take unpredictable amounts of space or time, or produce values not of the expected type. The set of pickleable types, and the code that serializes them, is determined at compile time, and runtime reflection is never used.
  2. Certain changes to the definitions of pickled types are guaranteed to be backward and forward compatible, so different versions of the program can communicate, and pickled data can be used for long term storage.
  3. Backward incompatible changes can be managed explicitly, with version numbers and conversion code, allowing new code to read data written by old code, and old code to fail on encountering data written by new code.
  4. Pluggable backends for JSON, BSON and other similar formats.
  5. Pickled classes correspond directly to the pickled form (at least for JSON and BSON), making it easy to write classes to represent data whose main schema definition is written in terms of the pickled format.
  6. Performance: the picklers are thin layers on top of the backend implementations and should not add significant overhead in any scenario.

The library is also small enough to understand and validate by hand, which makes it easy to confirm that its performance is driven by that of the backend used (e.g. Jackson for JSON).

Usage overview

safepickle artifacts are published on Maven Central with the group ID com.fsist and the names safepickle, safepickle-jackson, safepickle-joda-time, safepickle-reactivemongo. The published artifact versions correspond to the release tags in the GitHub project.

The safepickle-* modules are separate because they add dependencies on other libraries - Jackson, Joda-Time and ReactiveMongo, respectively. They define picklers for their types and pickling backends for their JSON and BSON implementations.

To pickle or unpickle a value, you need to choose an implementation of PicklerBackend, such as JacksonPicklerBackend.String, and call its read and write methods. These two methods take an implicit parameter of type Pickler[T], which knows how to pickle and unpickle values of type T.

case class A(i: Int, b: String)
object A {
  implicit val pickler: Pickler[A] = Autogen[A]
}

val backend = JacksonPicklerBackend.String // or any other PicklerBackend
val json = backend.write(A(1, "foo"))
// Result: """{ "i": 1, "b": "foo" }"""

Creating picklers

To support a type T you need to provide an instance of the typeclass Pickler[T]. It's possible to implement this trait manually, but this should be very rarely necessary. In ordinary use, picklers are generated using the Autogen macros.

Using Autogen

A simple usage example

case class C(a: String, b: Int = 2)
object C {
  implicit val pickler = Autogen[C]
}

sealed trait T
object T {
  case class One(a: String) extends T
  case object Two extends T

  implicit val pickler = Autogen.children[T, One | Two.type]
}

Autogen serializes a class by writing the names and values of its primary constructor's parameters. These parameters must be vals, as in a case class. The complete details are listed below.

Autogen.children can be used on any trait or class with descendants, and it doesn't have to be sealed. However, you have to provide an explicit list of the subtypes you wish to support. Because of a long-standing Scala compiler issue, even when the parent type is sealed, you must pass an explicit list of child types if you invoke Autogen.children in the same compilation unit (file) where the parent type is defined.

Providing sub-picklers for Autogen

The Autogen macros will use any implicit picklers that are in scope for pickling referenced types. They will also always use the built-in picklers defined in DefaultPicklers for primitive values (Int, String, etc.) and collections.

If a pickler is not available, they will call Autogen macros recursively for the missing types:

case class A(s: String)
case class B(a: A)
case class C(i: Int)
object C {
  implicit val pickler = ...
}
case class D(b: B, c: C)
object D {
  implicit val pickler = Autogen[D]
}

The pickler generated by Autogen[D] will use the existing C.pickler, because it's available as an implicit on the C companion object. It will call Autogen[B], which will call Autogen[A].

Modifying Autogen's output

There are several annotations that can be placed on class parameters to modify Autogen's behavior:

@Name("foo") will pickle the parameter as if it were named foo.

@WriteDefault will include the parameter and its value in the pickled form, even if its value is equal to the default value of the parameter. The default behavior is to omit such parameters, and use the default value when unpickling.

Debugging with Autogen

The Autogen macros (apply and children) have variants named applyDebug and childrenDebug, which print the generated code as a compiler info message. You can use this to understand what's going wrong if the output of Autogen doesn't compile.

Using ConvertPickler

Sometimes the structure of your class doesn't match how you want the pickled form to look, or Autogen doesn't do exactly what you want. You could implement a Pickler[T] manually, but usually there is an easier way.

Autogen is designed to map JSON-like structures directly to Scala. If you want to achieve a particular pickled structure, you should always be able to describe it using case classes and Autogen. You can then implement a Pickler for your real class T by defining conversions between T and the class representing the pickled form. By convention, this looks as follows:

case class Foo(i: Int)

object Foo {
  case class PickledFoo(i: String) 
  object PickledFoo {
    implicit val pickler = Autogen[PickledFoo]
  }
  
  implicit val pickler : Pickler[Foo] = new ConvertPickler[Foo, PickledFoo] {
    def convertTo(foo: Foo): PickledFoo = PickledFoo(foo.i.toString)
    def convertFrom(pickled: PickledFoo): Foo = Foo(pickled.i.toInt)
  }
}

In this example, the attribute Foo.i is pickled as a String and not an Int.
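ConvertPickler is essentially a typeclass combinator: a pickler for the real type is derived from a pickler for the pickled-form type plus two conversion functions. The plain-Scala sketch below shows that idea in isolation; Codec, stringCodec and convert are hypothetical names invented for this illustration and are not part of safepickle's API.

```scala
// Hypothetical minimal typeclass standing in for Pickler[T]; it encodes to and from a String.
trait Codec[T] {
  def encode(t: T): String
  def decode(s: String): T
}

object Codec {
  // Codec for the type that represents the pickled form (here simply String).
  val stringCodec: Codec[String] = new Codec[String] {
    def encode(t: String): String = "\"" + t + "\"" // no escaping in this sketch
    def decode(s: String): String = s.stripPrefix("\"").stripSuffix("\"")
  }

  // Builds a Codec[T] from a Codec[P] plus conversions T => P and P => T,
  // which is the idea behind ConvertPickler[T, P].
  def convert[T, P](underlying: Codec[P])(to: T => P, from: P => T): Codec[T] =
    new Codec[T] {
      def encode(t: T): String = underlying.encode(to(t))
      def decode(s: String): T = from(underlying.decode(s))
    }
}

case class Foo(i: Int)
object Foo {
  // As in the PickledFoo example, Foo.i is written as a String rather than an Int.
  implicit val codec: Codec[Foo] =
    Codec.convert[Foo, String](Codec.stringCodec)(foo => foo.i.toString, s => Foo(s.toInt))
}
```

The real type never needs to know about its pickled form; only the two conversion functions do, which keeps the Autogen-derived class as the single description of the wire format.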

Picklers for standard types

safepickle provides Pickler implementations (as implicit definitions on the Pickler companion object) for the following "primitive" types: Boolean, String, Null, Int, Long, Float, Double.

It also provides pickler combinators, that is, implicit methods on the Pickler companion object that provide picklers for:

  • Iterable[T] and all its subtypes, including Set[T], as well as Array[T] (which isn't an Iterable). Requires a Pickler[T].
  • All Tuple types. Requires a Pickler for each element type.
  • Map[K, V] and all its subtypes, which are pickled as Arrays, where each key-value pair is pickled as an Array of size 2. Requires a Pickler[K] and a Pickler[V].
  • As a special case, Map[String, V] and its subtypes are pickled as Objects. Requires a Pickler[V].
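To make the two Map layouts concrete, here is a small model that builds them over a hypothetical JSON-like Value type. Value, VStr, VInt, VArr, VObj and MapLayout are illustrative assumptions, not safepickle's backend types or its actual combinators.

```scala
// Hypothetical JSON-like value model, for illustration only.
sealed trait Value
final case class VStr(s: String) extends Value
final case class VInt(i: Int) extends Value
final case class VArr(items: Vector[Value]) extends Value
final case class VObj(fields: Vector[(String, Value)]) extends Value

object MapLayout {
  // General Map[K, V]: an Array of two-element Arrays [key, value].
  def mapAsArray[K, V](m: Map[K, V])(k: K => Value, v: V => Value): Value =
    VArr(m.toVector.map { case (key, value) => VArr(Vector(k(key), v(value))) })

  // Map[String, V]: an Object whose attribute names are the map's keys.
  def stringMapAsObject[V](m: Map[String, V])(v: V => Value): Value =
    VObj(m.toVector.map { case (key, value) => key -> v(value) })
}
```

For example, Map(1 -> "a") becomes [[1, "a"]], while Map("x" -> 1) becomes {"x": 1}.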

Self-referential types

If a type is directly self-referencing, Autogen works correctly:

case class Node(next: Option[Node])
object Node {
  implicit val pickler = Autogen[Node]
}

However, if a type indirectly references itself, using Autogen can either fail to compile or lead to infinite loops at runtime. (There are enough slightly different failure scenarios that I won't describe them in detail.)

The following doesn't work:

sealed trait T
object T {
  implicit val pickler = Autogen.children[T, C]
  case class C(ts: Seq[T]) extends T
}

The solution in all such cases is to use a separate implicit def to declare the pickler, and a private lazy val to actually create it:

sealed trait T
object T {
  implicit def pickler: Pickler[T] = thePickler
  private lazy val thePickler = Autogen.children[T, C]
  case class C(ts: Seq[T]) extends T
}
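The shape of this pattern can be shown without Autogen, using a hypothetical Codec typeclass in place of Pickler. This sketch does not reproduce the Autogen failure modes, which come from macro expansion; it only shows an explicitly typed implicit def that implicit search can find while the lazily created instance is being defined, and a Seq[Expr] sub-codec that is resolved through it at call time. Codec, seqCodec and Expr are assumptions made for the example.

```scala
// Hypothetical typeclass standing in for Pickler[T].
trait Codec[T] { def encode(t: T): String }

object Codec {
  implicit val intCodec: Codec[Int] = new Codec[Int] {
    def encode(i: Int): String = i.toString
  }
  // Combinator: a Codec[Seq[T]] from a Codec[T], similar in spirit to the Iterable pickler combinator.
  implicit def seqCodec[T](implicit elem: Codec[T]): Codec[Seq[T]] = new Codec[Seq[T]] {
    def encode(ts: Seq[T]): String = ts.map(elem.encode).mkString("[", ",", "]")
  }
}

sealed trait Expr
object Expr {
  case class Num(n: Int) extends Expr
  case class Sum(terms: Seq[Expr]) extends Expr // indirect self-reference through Seq[Expr]

  // Explicitly typed implicit def: visible to implicit search inside theCodec's definition.
  implicit def codec: Codec[Expr] = theCodec

  // The instance is created lazily, on first use of codec.
  private lazy val theCodec: Codec[Expr] = new Codec[Expr] {
    def encode(e: Expr): String = e match {
      case Num(n)     => implicitly[Codec[Int]].encode(n)
      // Resolves to Codec.seqCodec(Expr.codec); evaluated when encode runs, after initialization.
      case Sum(terms) => "sum" + implicitly[Codec[Seq[Expr]]].encode(terms)
    }
  }
}
```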

Pickling backends

safepickle supports multiple backends which write data in different formats and using different implementations. It comes with a backend for JSON that uses Jackson and another for BSON that uses ReactiveMongo, and more can be added.

A backend is declared by implementing trait PicklingBackend, which defines the types used (e.g. for JSON it might be a String), and provides a factory for PickleReaders and PickleWriters.

Implementations of trait PickleReader and trait PickleWriter provide low-level access to reading and writing a sequence of primitive values in a particular backend. Users of safepickle normally don't interact directly with these interfaces, except when implementing Picklers manually.
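As a rough illustration of the role of a PickleWriter, the sketch below defines a hypothetical ToyWriter interface with a toy JSON implementation, and a hand-written "pickler" that drives it by emitting a sequence of primitive values. The method names are assumptions made for this example; safepickle's real PickleWriter interface is not reproduced here.

```scala
// Hypothetical low-level writer interface, loosely modeled on the role of PickleWriter.
trait ToyWriter {
  def writeInt(i: Int): Unit
  def writeString(s: String): Unit
  def writeArrayStart(): Unit
  def writeArrayEnd(): Unit
  def result(): String
}

// Toy JSON implementation: writes values in sequence, inserting commas between array items.
final class ToyJsonWriter extends ToyWriter {
  private val sb = new StringBuilder
  // One flag per currently open array: true once that array has at least one item.
  private var hasItems: List[Boolean] = Nil

  private def beforeValue(): Unit = hasItems match {
    case true :: _     => sb.append(',')
    case false :: rest => hasItems = true :: rest
    case Nil           => ()
  }

  def writeInt(i: Int): Unit = { beforeValue(); sb.append(i) }
  def writeString(s: String): Unit = { beforeValue(); sb.append('"').append(s).append('"') } // no escaping
  def writeArrayStart(): Unit = { beforeValue(); sb.append('['); hasItems = false :: hasItems }
  def writeArrayEnd(): Unit = { sb.append(']'); hasItems = hasItems.tail }
  def result(): String = sb.toString
}

// A hand-written "pickler" for (Int, String) that writes it as a two-element array.
object PairWriter {
  def write(pair: (Int, String), w: ToyWriter): Unit = {
    w.writeArrayStart()
    w.writeInt(pair._1)
    w.writeString(pair._2)
    w.writeArrayEnd()
  }
}
```

The pickler only decides which primitives to emit and in what order; formatting details such as separators stay inside the backend.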

Included backends

safepickle includes two PicklerBackend implementations, one for JSON using the Jackson implementation, and one for BSON using ReactiveMongo, an Akka-based Scala driver for Mongo.

ReactiveMongo integration

The safepickle-reactivemongo module includes, in addition to the PicklerBackend and Pickler implementations, a macro called MongoHandler.apply[T]. When an implicit Pickler[T] is available, this macro generates a reactivemongo.bson.BSONHandler[BSONValue, T], which is the typeclass instance ReactiveMongo needs to read and write values of type T to Mongo. The macro uses the pickler with the ReactiveMongo PicklerBackend implementation without modifying the output.

A second variant of the macro, called MongoHandler.document[T], instead generates a reactivemongo.bson.BSONDocumentReader[T] with reactivemongo.bson.BSONDocumentWriter[T], which ReactiveMongo needs to read and write instances of type T as top-level documents in Mongo. If the original Pickler produces a BSONValue that is not a BSONDocument, such as a BSONString or a BSONArray, the macro wraps it in a BSONDocument as the value of the _id field.

Typical use then looks as follows:

case class A(...)
object A {
  implicit val pickler = Autogen[A]
  implicit val mongoHandler = MongoHandler.document[A]
}
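The _id wrapping rule of MongoHandler.document can be modeled in a few lines. Bson, BStr, BArr, BDoc and DocumentWrapping are hypothetical stand-ins for illustration, not ReactiveMongo's BSON classes or the code the macro generates.

```scala
// Hypothetical BSON-like value model, for illustration only.
sealed trait Bson
final case class BStr(s: String) extends Bson
final case class BArr(items: Vector[Bson]) extends Bson
final case class BDoc(fields: Vector[(String, Bson)]) extends Bson

object DocumentWrapping {
  // A top-level value must be a document: documents pass through unchanged,
  // and any other pickled value becomes the _id field of a new document.
  def toDocument(pickled: Bson): BDoc = pickled match {
    case doc: BDoc => doc
    case other     => BDoc(Vector("_id" -> other))
  }
}
```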

Backend type support

Every backend is required to support reading and writing these primitive types:

  • Boolean, String, Null
  • Int, Long, Float, Double
  • Array: a sequence of values of any types
  • Object: a map of strings to values of any types

A particular backend might support more primitives. For instance, BSON supports Binary (i.e. Array[Byte]) as a primitive type, but JSON doesn't. Other types are written as objects or arrays using primitive types.

The detailed behavior of Autogen

Requirements

A class used with Autogen.apply must obey these requirements:

  • The primary constructor (the one that's written as part of the class definition) must have no more than one parameter list.
  • The primary constructor must be public.
  • The primary constructor's parameters must be declared as public vals or vars (using a case class does this by default).
  • The primary constructor's parameter types must have implicit picklers in scope. (Pickler definitions from DefaultPicklers are used automatically and don't have to be explicitly imported.)

Autogenerated pickler behavior

Classes and traits are pickled as follows:

  1. Objects, and classes with zero parameters or without a parameter list, are pickled as Strings whose value is the object's or class's (not fully qualified) name.
  2. Classes are pickled as Objects whose attributes correspond to the values of the class's primary constructor parameters. The parameters are taken from the primary constructor, which is also used when unpickling; other constructors are ignored.
  3. A sealed trait or sealed abstract class is pickled (by Autogen.children) as whichever of its (immediate) descendants is actually present. If that results in an Object, it will have an extra attribute named $type, equal to the class's name. This is known as the type tag, and tells the unpickler which class to instantiate. If pickling results in a String, because the concrete descendant of trait T is an object or a class with zero parameters, then the value of the string serves the same purpose.
  4. A sealed trait (or abstract class) T1, which is extended by another sealed trait T2, will be pickled as follows: an Object with attribute $type = T2 and another attribute $value equal to whatever the concrete value extending T2 is pickled as.
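The following plain-Scala model shows rules 1 and 3 for a small hierarchy mirroring the Autogen.children example: the case class is written as an Object carrying a $type tag, and the case object as a String holding its name. The Json types and TypeTagModel are hypothetical; real picklers are generated by Autogen and write through a backend.

```scala
// Hypothetical JSON-like values, for illustration only (not safepickle's backend types).
sealed trait Json
final case class JStr(s: String) extends Json
final case class JObj(fields: Vector[(String, Json)]) extends Json

// A small sealed hierarchy like the one in the Autogen.children example.
sealed trait T
case class One(a: String) extends T
case object Two extends T

object TypeTagModel {
  // A class becomes an Object with a "$type" tag; an object becomes a String holding its name.
  def pickle(t: T): Json = t match {
    case One(a) => JObj(Vector("$type" -> JStr("One"), "a" -> JStr(a)))
    case Two    => JStr("Two")
  }

  // The tag (or the string value) decides which concrete type to build; anything else is rejected.
  def unpickle(j: Json): Either[String, T] = j match {
    case JStr("Two") => Right(Two)
    case JObj(fields) =>
      val attrs = fields.toMap
      (attrs.get("$type"), attrs.get("a")) match {
        case (Some(JStr("One")), Some(JStr(a))) => Right(One(a))
        case _                                  => Left(s"unrecognized object: $attrs")
      }
    case other => Left(s"unrecognized value: $other")
  }
}
```

Rejecting unknown tags, rather than looking classes up by name, matches the library's goal of never instantiating unexpected classes.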

The following behaviors are designed to allow the compatible changes to class declarations that are listed in another section:

  1. When unpickling, the order of the pickled Object's attributes doesn't have to correspond to the order of the constructor parameters. However, the $version attribute (if present) must come first.
  2. When unpickling, attributes with unexpected names are discarded.
  3. When pickling, if a parameter's value is equal to its declared default value, that parameter is not written. Equality is determined using == (i.e. the equals method). This behavior can be disabled for a particular attribute by adding the @WriteDefault annotation.
  4. When unpickling, if a constructor parameter with a declared default value is missing a pickled value, the default value will be used instead.
  5. Class parameters of type Option[T] are pickled as follows: for Some[T], the T value is written directly; for None, no attribute is written at all. This allows making an existing class parameter optional, which is a common change.
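A minimal model of these rules, using a Map[String, String] in place of a pickled Object. Config and ConfigModel are hypothetical, the writeDefaultRetries flag stands in for the @WriteDefault annotation, and the default value is repeated by hand in the sketch.

```scala
// A class with a defaulted parameter and an optional parameter.
case class Config(name: String, retries: Int = 3, comment: Option[String] = None)

object ConfigModel {
  // Writing: omit a parameter equal to its default (rule 3) unless forced, and omit None (rule 5).
  def write(c: Config, writeDefaultRetries: Boolean = false): Map[String, String] = {
    val base = Map("name" -> c.name)
    val withRetries =
      if (c.retries != 3 || writeDefaultRetries) base + ("retries" -> c.retries.toString) else base
    c.comment.fold(withRetries)(text => withRetries + ("comment" -> text))
  }

  // Reading: attribute order is irrelevant (rule 1), unknown names are ignored (rule 2),
  // a missing defaulted parameter gets its default (rule 4), and a missing Option is None.
  def read(attrs: Map[String, String]): Option[Config] =
    attrs.get("name").map { name =>
      Config(
        name = name,
        retries = attrs.get("retries").map(_.toInt).getOrElse(3), // must match the declared default
        comment = attrs.get("comment")
      )
    }
}
```

Because a missing attribute and a default value are interchangeable in both directions, a reader with or without the retries parameter can consume data written by the other.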

Compatible type changes

As long as all object and class picklers used are created by the Autogen macro, or are written to be compatible with the above rules, the following changes to Scala definitions will be backward and forward compatible, so that different code versions will be able to exchange data.

Because the compatibility is bidirectional, each of these cases implies the reverse transformation is also compatible.

  1. These are all interchangeable: object O, class O and class O(). Also, case and non-case objects and classes are interchangeable.
  2. A class parameter can be added (at any position), if it has a default value declared. A parameter can be removed (from any position), if it previously (always) had a default value declared. When code with the parameter declared unpickles data without it, it uses the default value. When code without the parameter declared unpickles data with it, it ignores it.
  3. The order of parameters can be changed freely.
  4. Any sequence type (Iterable[T] or a subtype of it) can be replaced with any other sequence type with the same member type. E.g., List[Int] can be replaced with Vector[Int]. This includes Sets and Maps, but not Maps whose key type is String, because those are pickled as Objects and not as Arrays. (If a non-Set sequence type is replaced with a Set, when unpickling, duplicate values will be discarded.)
  5. Only in a class parameter type, T can be replaced with Option[T], as long as the non-optional T has always had a default value. (Recall that, for class parameters, Option[T] is written as a T or omitted entirely if the value was None.)
  6. A sequence member type, or a map key or value type, can be replaced with another type, if the two types are compatible according to these rules. For instance, List[List[Int]] can be replaced with Vector[Set[Long]].

Versioning

safepickle supports explicit versioning of type changes which are not backward-compatible by the above rules. If you use this feature, you will need to declare both types in your code, with their associated picklers. New code will be able to read all supported versions, converting old formats transparently to the newest one, so your code outside the data definition always deals with the latest format version. Old code versions will refuse to read newer formats that they do not support.

Suppose you have a case class Foo(s: String) and you want to add a second field i: Int. This is not a backward-compatible change, because there is no declared default value for i, so new code cannot read old data. (Of course this is a contrived example that doesn't try to assign semantic value to the changes.)

You can accomplish this as follows:

case class Foo(s: String, i: Int) // The new format
object Foo {
  implicit val pickler: Pickler[Foo] = Autogen.versioned[Foo, Foo.Old.FooV1]
  
  object Old {  
    case class FooV1(s: String) extends OldVersion[Foo] { // The old format
      def toNewVersion: Foo = Foo(s, s.toInt + 123) // Conversion function
    }
    object FooV1 {
      implicit val pickler: Pickler[FooV1] = Autogen[FooV1]
    }
  }
}

A versioned object will have an extra field named $version in its pickled form. Versioning is not supported for sealed traits.
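The read path of a versioned pickler can be sketched as a dispatch on $version. This is a hypothetical model, not the code Autogen.versioned generates: OldVersion is redefined locally so the block compiles on its own, the value given to the new field is invented, and attribute order (including the rule that $version must come first) is not modeled because a Map is used.

```scala
// Local stand-in for the OldVersion trait used above, so that this sketch compiles on its own.
trait OldVersion[T] { def toNewVersion: T }

case class Foo(s: String, i: Int)                     // current format, version 2
case class FooV1(s: String) extends OldVersion[Foo] { // old format, version 1
  def toNewVersion: Foo = Foo(s, 0)                   // hypothetical value for the new field
}

object VersionedFooModel {
  // Writing always uses the newest format and records its version number.
  def write(foo: Foo): Map[String, String] =
    Map("$version" -> "2", "s" -> foo.s, "i" -> foo.i.toString)

  // Reading dispatches on $version: old data is upgraded, unknown (e.g. newer) versions fail.
  // Assumption: every record carries $version; unversioned data is not modeled here.
  def read(attrs: Map[String, String]): Either[String, Foo] =
    attrs.get("$version") match {
      case Some("2") => Right(Foo(attrs("s"), attrs("i").toInt))
      case Some("1") => Right(FooV1(attrs("s")).toNewVersion)
      case other     => Left(s"unsupported version: $other")
    }
}
```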

Notes:

  • When copying the old version of Foo to FooV1, you can remove all its method definitions and other parts that are ignored by Autogen; only the class structure needs to be defined.
  • You must preserve the original Pickler[Foo] definition as the new Pickler[FooV1]. If it was a custom pickler, you need to keep it.
  • If FooV1 references non-primitive types that have their own picklers, you may want to make local copies of those type definitions and their picklers too. Otherwise, if you later change one of those referenced types without versioning it, or remove it entirely, the pickler for FooV1 will stop working correctly.
  • The name FooV1 isn't important; it doesn't influence the generated pickler or the version number it assigns.

Multiple versioning

When you have more than one old version of the same type, you need to define an upgrade method from each type to the next one (and not to the newest one). Pass the oldest type as the second type parameter to Autogen.versioned, like this:

case class Foo(s: String, i: Int, b: Boolean) 
object Foo {
  implicit val pickler: Pickler[Foo] = Autogen.versioned[Foo, Foo.Old.FooV1]
  
  object Old {  
    case class FooV1(s: String) extends OldVersion[FooV2] // Implementation and pickler omitted
    case class FooV2(s: String, i: Int) extends OldVersion[FooV3] // Implementation and pickler omitted
    case class FooV3(s: String, l: Long) extends OldVersion[Foo] // Implementation and pickler omitted
  }
}

Autogen.versioned will notice that the type FooV1 implements OldVersion[FooV2] and not OldVersion[Foo]. It will walk down the chain of OldVersion implementations, until it finds an OldVersion[Foo].

The version numbers will be assigned automatically: the oldest type (that is, the second parameter of Autogen.versioned) is always at version 1, the next one at version 2, and so on. In this example, Foo will be assigned version 4.
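The upgrade chain amounts to function composition: each old type converts only to its successor, and reading an old version applies the remaining steps in order. The sketch below mirrors the FooV1 to Foo chain with invented conversion bodies and a local OldVersion stand-in; it models the idea only, not how Autogen.versioned walks the chain.

```scala
// Local stand-in for the OldVersion trait, so that this sketch compiles on its own.
trait OldVersion[T] { def toNewVersion: T }

case class Foo(s: String, i: Int, b: Boolean)                   // version 4 (current)
case class FooV3(s: String, l: Long) extends OldVersion[Foo] {  // version 3
  def toNewVersion: Foo = Foo(s, l.toInt, b = false)            // hypothetical conversions
}
case class FooV2(s: String, i: Int) extends OldVersion[FooV3] { // version 2
  def toNewVersion: FooV3 = FooV3(s, i.toLong)
}
case class FooV1(s: String) extends OldVersion[FooV2] {         // version 1 (oldest)
  def toNewVersion: FooV2 = FooV2(s, 0)
}

object FooUpgrades {
  // Each type only knows its immediate successor; composing the steps yields
  // an upgrade path from any old version to the current Foo.
  val fromV3: FooV3 => Foo = _.toNewVersion
  val fromV2: FooV2 => Foo = ((v: FooV2) => v.toNewVersion).andThen(fromV3)
  val fromV1: FooV1 => Foo = ((v: FooV1) => v.toNewVersion).andThen(fromV2)
}
```

Adding a fifth version then only requires one new conversion from the previous newest type, instead of rewriting every old type's conversion.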

Schemas

The Schema type describes the structure of the pickled data written and read by a pickler. For instance, a Schema.SString describes a string.

Every Pickler has a .schema member. Autogen populates it automatically; for picklers you write manually, you must provide the schema yourself.

Schemas are useful to automatically generate data for other programs to consume, such as Json schemas or SQL table definitions.
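To illustrate how a schema might be turned into a JSON Schema, the sketch below defines a small schema ADT and a renderer. Apart from the name SString, the constructors and the toJsonSchema function are hypothetical; this is neither safepickle's Schema type nor the author's experimental generator.

```scala
// Hypothetical schema ADT; only the name SString comes from the text above.
sealed trait Schema
object Schema {
  case object SString extends Schema
  case object SInt extends Schema
  final case class SArray(member: Schema) extends Schema
  final case class SObject(fields: List[(String, Schema)]) extends Schema

  // Render a schema as a simplified JSON Schema document.
  def toJsonSchema(s: Schema): String = s match {
    case SString        => """{"type":"string"}"""
    case SInt           => """{"type":"integer"}"""
    case SArray(member) => s"""{"type":"array","items":${toJsonSchema(member)}}"""
    case SObject(fields) =>
      val props = fields.map { case (name, fs) => s""""$name":${toJsonSchema(fs)}""" }
      val joined = props.mkString(",")
      s"""{"type":"object","properties":{$joined}}"""
  }
}
```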

I am working on a JSON schema generator, but it is still experimental. I hope to publish it here in the future.

Note that schemas are currently not always reliable, due to backend overrides. For instance, the default pickler for Array[Byte] has a Schema saying it writes a String, but the BSON backend overrides this behavior to write a BSONBinary instead. A future version will allow backends to override schemas, or solve the problem in some other way.
