- safepickle
- Usage overview
- Creating picklers
- Pickling backends
- The detailed behavior of Autogen
- Compatible type changes
- Versioning
- Schemas
NOTE: this project is not being maintained anymore, due to lack of time and interest.
A deliberately restricted pickling library for Scala. It has certain features, and deliberately lacks others, because it is tailor-made for a particular scenario I needed. Most people will prefer general purpose libraries such as scala-pickling, upickle, or rapture, but this one is optimized for the following goals:
- Security. Pickled input can be generated by untrusted sources. Unpickling must not instantiate unexpected classes, take unpredictable amounts of space or time, or produce values not of the expected type. The set of pickleable types, and the code that serializes them, is determined at compile time, and runtime reflection is never used.
- Certain changes to the definitions of pickled types are guaranteed to be backward and forward compatible, so different versions of the program can communicate, and pickled data can be used for long term storage.
- Backward incompatible changes can be managed explicitly, with version numbers and conversion code, allowing new code to read data written by old code, and old code to fail on encountering data written by new code.
- Pluggable backends for JSON, BSON and other similar formats.
- Pickled classes correspond directly to the pickled form (at least for JSON and BSON), making it easy to write classes to represent data whose main schema definition is written in terms of the pickled format.
- Performance: the picklers are thin layers on top of the backend implementations and should not contribute to pickling overhead in any scenario.
The library is also small enough to understand and validate by hand, and to make sure its performance is driven by that of the backend used (e.g. Jackson for JSON).
safepickle artifacts are published on Maven Central with the group ID com.fsist and the names safepickle, safepickle-jackson, safepickle-joda-time, and safepickle-reactivemongo. The published artifact versions correspond to the release tags in the GitHub project.
The safepickle-* modules are separate because they add dependencies on other libraries: Jackson, Joda-Time and ReactiveMongo, respectively. They define picklers for their types and pickling backends for their JSON and BSON implementations.
To pickle or unpickle a value, you need to choose an implementation of PicklerBackend, such as JacksonPicklerBackend.String, and call its read and write methods. These two methods take an implicit parameter of type Pickler[T], which knows how to pickle and unpickle values of type T.
case class A(i: Int, b: String)
object A {
  implicit val pickler = Autogen[A]
}
val backend = JacksonPicklerBackend.String
val json = backend.write(A(1, "foo"))
// Result: """{ "i": 1, "b": "foo" }"""
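The way write finds its pickler is ordinary Scala implicit resolution. The following is a minimal, self-contained sketch of that mechanism only; ToyPickler, ToyBackend and Point are hypothetical stand-ins and not part of safepickle, and the pickler for Point is written by hand where the library would use Autogen.

```scala
object TypeclassSketch {
  // Toy typeclass: knows how to turn a T into JSON-like text.
  trait ToyPickler[T] {
    def pickle(value: T): String
  }

  object ToyPickler {
    // Instances on the companion object are found without any import.
    implicit val intPickler: ToyPickler[Int] = new ToyPickler[Int] {
      def pickle(value: Int): String = value.toString
    }
    implicit val stringPickler: ToyPickler[String] = new ToyPickler[String] {
      // No escaping is done here; this is only a sketch.
      def pickle(value: String): String = "\"" + value + "\""
    }
  }

  object ToyBackend {
    // The implicit parameter is resolved by the compiler, not at runtime.
    def write[T](value: T)(implicit pickler: ToyPickler[T]): String =
      pickler.pickle(value)
  }

  case class Point(x: Int, label: String)
  object Point {
    // Hand-written instance, placed on the companion so that write can find it.
    implicit val pickler: ToyPickler[Point] = new ToyPickler[Point] {
      def pickle(p: Point): String =
        "{\"x\":" + ToyBackend.write(p.x) + ",\"label\":" + ToyBackend.write(p.label) + "}"
    }
  }
}
```

Calling TypeclassSketch.ToyBackend.write on a type with no instance in scope is a compile error, which is the property the library relies on for its security goal.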
To support a type T you need to provide an instance of the typeclass Pickler[T]. It's possible to implement this trait manually, but this should very rarely be necessary. In ordinary use, picklers are generated using the Autogen macros.
A simple usage example:
case class C(a: String, b: Int = 2)
object C {
  implicit val pickler = Autogen[C]
}
sealed trait T
object T {
  case class One(a: String) extends T
  case object Two extends T
  implicit val pickler = Autogen.children[T, One | Two.type]
}
Autogen serializes a class by writing the arguments of its primary constructor and their values. These arguments have to be vals, as in a case class. The complete details are listed below.
Autogen.children can be used on any trait or class with descendants, and it doesn't have to be sealed. However, you have to provide an explicit list of the subtypes you wish to support. Because of a long-standing Scala compiler issue, even when the parent type is sealed, if you invoke Autogen.children in the same compilation unit (file) where the parent type is defined, you must pass an explicit list of child types.
The Autogen macros will use any implicit picklers that are in scope for pickling referenced types. They will also always use the built-in picklers defined in DefaultPicklers for primitive values (Int, String, etc.) and collections.
If a pickler is not available, they will call Autogen macros recursively for the missing types:
case class A(s: String)
case class B(a: A)
case class C(i: Int)
object C {
  implicit val pickler = ...
}
case class D(b: B, c: C)
object D {
  implicit val pickler = Autogen[D]
}
The pickler generated by Autogen[D] will use the existing C.pickler, because it's available as an implicit on the C companion object. It will call Autogen[B], which will call Autogen[A].
There are several annotations that can be placed on class parameters to modify Autogen's behavior:
- @Name("foo") will pickle the parameter as if it were named foo.
- @WriteDefault will include the parameter and its value in the pickled form, even if its value is equal to the default value of the parameter. The default behavior is to omit such parameters, and use the default value when unpickling.
The Autogen macros (apply and children) have variants named applyDebug and childrenDebug, which print the generated code as a compiler info message. You can use this to understand what's going wrong if the output of Autogen doesn't compile.
Sometimes the structure of your class doesn't match how you want the pickled form to look, or Autogen doesn't do exactly what you want. You could implement a Pickler[T] manually, but usually there is an easier way.
Autogen is designed to map JSON-like structures directly to Scala. If you want to achieve a particular pickled structure, you should always be able to describe it using case classes and Autogen. You can then implement a Pickler for your real class T by defining conversions between T and the class representing the pickled form. By convention, this looks like this:
case class Foo(i: Int)
object Foo {
  case class PickledFoo(i: String)
  object PickledFoo {
    implicit val pickler = Autogen[PickledFoo]
  }
  implicit val pickler: Pickler[Foo] = new ConvertPickler[Foo, PickledFoo] {
    def convertTo(foo: Foo): PickledFoo = PickledFoo(foo.i.toString)
    def convertFrom(pickled: PickledFoo): Foo = Foo(pickled.i.toInt)
  }
}
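To make the idea behind a conversion-based pickler concrete, here is a small sketch of how such a wrapper can be built on top of an existing instance. It is a toy model, not the library's ConvertPickler: ToyPickler and converted are hypothetical names, and the pickled form is plain text instead of a backend-specific type.

```scala
object ConvertSketch {
  // Toy typeclass with both directions.
  trait ToyPickler[T] {
    def pickle(value: T): String
    def unpickle(text: String): T
  }

  object ToyPickler {
    implicit val stringPickler: ToyPickler[String] = new ToyPickler[String] {
      def pickle(value: String): String = "\"" + value + "\""
      def unpickle(text: String): String = text.stripPrefix("\"").stripSuffix("\"")
    }

    // Builds a pickler for A by converting to and from a type B that already has one.
    def converted[A, B](to: A => B, from: B => A)(implicit underlying: ToyPickler[B]): ToyPickler[A] =
      new ToyPickler[A] {
        def pickle(value: A): String = underlying.pickle(to(value))
        def unpickle(text: String): A = from(underlying.unpickle(text))
      }
  }

  case class Foo(i: Int)
  object Foo {
    // Foo.i is stored as a string, mirroring the example above.
    // toInt throws on malformed input; real code would need to handle that.
    implicit val pickler: ToyPickler[Foo] =
      ToyPickler.converted[Foo, String](foo => foo.i.toString, text => Foo(text.toInt))
  }
}
```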
In this example, the attribute Foo.i is pickled as a String and not an Int.
safepickle provides Pickler implementations (as implicit definitions on the Pickler companion object) for the following "primitive" types: Boolean, String, Null, Int, Long, Float, Double.
It also provides pickler combinators, that is, implicit methods of the Pickler companion object, that provide picklers for:
- Iterable[T] and all sub-types, including Set[T], as well as Array[T] (which isn't an Iterable), and all Tuple types. Requires a Pickler[T].
- Map[K, V] and all sub-types are pickled as arrays, where each key-value pair is pickled as an array of size 2. Requires a Pickler[K] and a Pickler[V].
- As a special case, Map[String, V] and subtypes are pickled as Objects. Requires a Pickler[V].
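The two shapes used for maps can be illustrated with a small JSON-like model. This is only a sketch of the resulting structure: the Json type, render and MapShapes are hypothetical and unrelated to the real backends, and entries are sorted by key purely to make the output deterministic.

```scala
object MapShapeSketch {
  // Minimal JSON-like tree, just enough to show the two shapes.
  sealed trait Json
  case class JString(value: String) extends Json
  case class JInt(value: Int) extends Json
  case class JArray(items: Vector[Json]) extends Json
  case class JObject(fields: Vector[(String, Json)]) extends Json

  def render(json: Json): String = json match {
    case JString(s)      => "\"" + s + "\""
    case JInt(i)         => i.toString
    case JArray(items)   => items.map(render).mkString("[", ",", "]")
    case JObject(fields) =>
      fields.map { case (k, v) => "\"" + k + "\":" + render(v) }.mkString("{", ",", "}")
  }

  // General case: an array of two-element arrays, one per key-value pair.
  def pairs(m: Map[Int, String]): Json =
    JArray(m.toVector.sortBy(_._1).map { case (k, v) => JArray(Vector(JInt(k), JString(v))) })

  // String keys: an object whose attribute names are the keys.
  def stringKeyed(m: Map[String, Int]): Json =
    JObject(m.toVector.sortBy(_._1).map { case (k, v) => k -> JInt(v) })
}
```

With this model, a Map(1 -> "a", 2 -> "b") renders as [[1,"a"],[2,"b"]], while a Map("a" -> 1, "b" -> 2) renders as {"a":1,"b":2}.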
If a type is directly self-referencing, Autogen works correctly:
case class Node(next: Option[Node])
object Node {
  implicit val pickler = Autogen[Node]
}
However, if a type indirectly references itself, using Autogen can either fail to compile or lead to infinite loops at runtime. (There are enough slightly different failure scenarios that I won't describe them in detail.)
The following doesn't work:
sealed trait T
object T {
  implicit val pickler = Autogen.children[T, C]
  case class C(ts: Seq[T]) extends T
}
The solution in all such cases is to use a separate implicit def to declare the pickler, and a private lazy val to actually create it:
sealed trait T
object T {
  implicit def pickler: Pickler[T] = thePickler
  private lazy val thePickler = Autogen.children[T, C]
  case class C(ts: Seq[T]) extends T
}
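The same shape can be tried out without macros. The sketch below uses a toy Show typeclass (hypothetical, not part of safepickle) for an indirectly recursive type: the implicit def has an explicit result type, so it can be referenced while the instance is still being defined, and the lazy val delays construction until first use.

```scala
object RecursiveSketch {
  trait Show[T] {
    def show(value: T): String
  }

  object Show {
    // Combinator: a Show for sequences, given a Show for the elements.
    implicit def seqShow[A](implicit element: Show[A]): Show[Seq[A]] = new Show[Seq[A]] {
      def show(values: Seq[A]): String = values.map(element.show).mkString("[", ",", "]")
    }
  }

  sealed trait Tree
  object Tree {
    // Declared with an explicit type; only forwards to the lazily built instance.
    implicit def treeShow: Show[Tree] = theShow

    private lazy val theShow: Show[Tree] = new Show[Tree] {
      def show(tree: Tree): String = tree match {
        case Leaf             => "Leaf"
        // Show[Seq[Tree]] is resolved through seqShow, which needs treeShow again.
        case Branch(children) => "Branch" + implicitly[Show[Seq[Tree]]].show(children)
      }
    }

    case object Leaf extends Tree
    case class Branch(children: Seq[Tree]) extends Tree
  }
}
```

In this toy the recursive lookup happens inside the show method, after initialization has finished; the macro-generated picklers have more ways to go wrong, which is why the pattern above is recommended for all such cases.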
safepickle supports multiple backends which write data in different formats and using different implementations. It comes with a backend for JSON that uses Jackson and another for BSON that uses ReactiveMongo, and more can be added.
A backend is declared by implementing trait PicklerBackend, which defines the types used (e.g. for JSON it might be a String), and provides a factory for PickleReaders and PickleWriters.
Implementations of trait PickleReader and trait PickleWriter provide low-level access to reading and writing a sequence of primitive values in a particular backend. Users of safepickle normally don't interact directly with these interfaces, except when manually implementing Picklers.
safepickle includes two PicklerBackend implementations, one for JSON using the Jackson implementation, and one for BSON using ReactiveMongo, an Akka-based Scala driver for Mongo.
The safepickle-reactivemongo module includes, in addition to the PicklerBackend and Pickler implementations, a macro called MongoHandler.apply[T]. When an implicit Pickler[T] is available, this macro generates a reactivemongo.bson.BSONHandler[BSONValue, T], which is the typeclass instance ReactiveMongo needs to read and write values of type T to Mongo. The macro uses the pickler with the reactivemongo PicklerBackend implementation without modifying the output.
A second variant of the macro, called MongoHandler.document[T], generates instead a reactivemongo.bson.BSONDocumentReader[T] with reactivemongo.bson.BSONDocumentWriter[T], which are needed for ReactiveMongo to read and write instances of type T as top-level documents in Mongo. If the original Pickler produces a BSONValue which is not a BSONDocument, like a BSONString or a BSONArray, the macro wraps it in a BSONDocument as the value of the _id field.
Typical use then looks as follows:
case class A(...)
object A {
  implicit val pickler = Autogen[A]
  implicit val mongoHandler = MongoHandler.document[A]
}
Every backend is required to support reading and writing these primitive types:
- Boolean, String, Null
- Int, Long, Float, Double
- Array: a sequence of values of any types
- Object: a map of strings to values of any types
A particular backend might support more primitives. For instance, BSON supports Binary (i.e. Array[Byte]) as a primitive type, but JSON doesn't. Other types are written as objects or arrays using primitive types.
A class used with Autogen.apply must obey these requirements:
- The primary constructor (the one that's written as part of the class definition) must have no more than one parameter list.
- The primary constructor must be public.
- The primary constructor's parameters must be declared as public vals or vars (using a case class does this by default).
- The primary constructor's parameter types must have implicit picklers in scope. (Pickler definitions from DefaultPicklers are used automatically and don't have to be explicitly imported.)
Classes and traits are pickled as follows:
- Objects, and classes with zero parameters or without parameter lists, are pickled as strings whose value is the object or class's (non fully qualified) name. The parameters are taken from the primary constructor, which is also used when unpickling; other constructors are ignored.
- Classes are pickled as Objects whose attributes correspond to the values of the class's primary constructor arguments.
- A sealed trait or sealed abstract class is pickled (by Autogen.children) as whichever of its (immediate) descendants is actually present. If that results in an Object, it will have an extra attribute named $type, equal to the class's name. This is known as the type tag, and tells the unpickler which class to instantiate. If pickling results in a String, because the concrete descendant of trait T is an object or a class with zero parameters, then the value of the string serves the same purpose.
- A sealed trait (or abstract class) T1, which is extended by another sealed trait T2, will be pickled as follows: an Object with attribute $type = T2 and another attribute $value equal to whatever the concrete value extending T2 is pickled as.
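The role of the type tag can be sketched with a hand-written pickler for a small sealed hierarchy. Everything here is a toy model: the Pickled tree, Shape, Circle and Empty are hypothetical names, and the code only mirrors the rules above for one class with parameters and one object.

```scala
object TypeTagSketch {
  // Toy pickled form: strings, ints and objects only.
  sealed trait Pickled
  case class PString(value: String) extends Pickled
  case class PInt(value: Int) extends Pickled
  case class PObject(fields: Map[String, Pickled]) extends Pickled

  sealed trait Shape
  case class Circle(radius: Int) extends Shape
  case object Empty extends Shape

  def pickle(shape: Shape): Pickled = shape match {
    // A class with parameters becomes an Object carrying the $type tag.
    case Circle(r) => PObject(Map("$type" -> PString("Circle"), "radius" -> PInt(r)))
    // An object becomes a String holding its name, which doubles as the tag.
    case Empty     => PString("Empty")
  }

  def unpickle(pickled: Pickled): Option[Shape] = pickled match {
    case PString("Empty") => Some(Empty)
    case PObject(fields) =>
      (fields.get("$type"), fields.get("radius")) match {
        case (Some(PString("Circle")), Some(PInt(r))) => Some(Circle(r))
        case _                                        => None // unknown tag or bad shape
      }
    case _ => None
  }
}
```

Only the tags listed in the match can ever produce a value, so input naming some other class is rejected instead of being instantiated.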
The following behaviors are designed to allow the compatible changes to class declarations that are listed in another section:
- When unpickling, the order of the pickled Object's attributes doesn't have to correspond to the order of the constructor parameters. However, the $version attribute (if present) must come first.
- When unpickling, attributes with unexpected names are discarded.
- When pickling, if a parameter's value is equal to its declared default value, that parameter is not written. Equality is determined using == (i.e. the equals method). This behavior can be disabled for a particular attribute by adding the @WriteDefault annotation.
- When unpickling, if a constructor parameter with a declared default value is missing a pickled value, the default value will be used instead.
- Class parameters of type Option[T] are pickled as follows: for Some[T], the T value is written directly; for None, no attribute is written at all. This allows making an existing class parameter optional, which is a common change.
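As a rough illustration of these rules, here is a hand-written equivalent for one class, with the pickled Object modelled as a Map[String, String]. The User class, DefaultAge and both functions are hypothetical; the code generated by Autogen is not shown here and may differ in every detail.

```scala
object DefaultsSketch {
  val DefaultAge = 18

  case class User(name: String, age: Int = DefaultAge, email: Option[String] = None)

  def pickle(user: User): Map[String, String] = {
    val name = Map("name" -> user.name)
    // A value equal to the declared default is not written.
    val age =
      if (user.age == DefaultAge) Map.empty[String, String]
      else Map("age" -> user.age.toString)
    // Some(x) is written as x; None writes no attribute at all.
    val email = user.email.map(e => Map("email" -> e)).getOrElse(Map.empty[String, String])
    name ++ age ++ email
  }

  def unpickle(attributes: Map[String, String]): Option[User] =
    // A missing attribute without a default makes unpickling fail.
    attributes.get("name").map { name =>
      User(
        name,
        // Missing attribute with a default: use the default.
        // The sketch assumes well-formed numbers; toInt throws otherwise.
        attributes.get("age").map(_.toInt).getOrElse(DefaultAge),
        attributes.get("email")
      )
    }
  // Attributes with other names are never looked up, so they are ignored.
}
```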
As long as all object and class picklers used are created by the Autogen macro, or are written to be compatible with the above rules, the following changes to Scala definitions will be backward and forward compatible, so that different code versions will be able to exchange data.
Because the compatibility is bidirectional, each of these cases implies the reverse transformation is also compatible.
- These are all interchangeable: object O, class O and class O(). Also, case and non-case objects and classes are interchangeable.
- A class parameter can be added (at any position), if it has a default value declared. A parameter can be removed (from any position), if it previously (always) had a default value declared. When code with the parameter declared unpickles data without it, it uses the default value. When code without the parameter declared unpickles data with it, it ignores it.
- The order of parameters can be changed freely.
- Any sequence type (Iterable[T] or a subtype of it) can be replaced with any other sequence type with the same member type. E.g., List[Int] can be replaced with Vector[Int]. This includes Sets and Maps, but not Maps whose key type is String, because those are pickled as Objects and not as Arrays. (If a non-Set sequence type is replaced with a Set, when unpickling, duplicate values will be discarded.)
- Only in a class parameter type, T can be replaced with Option[T], as long as the non-optional T has always had a default value. (Recall that, for class parameters, Option[T] is written as a T or omitted entirely if the value was None.)
- A sequence member type, or a map key or value type, can be replaced with another type, if the two types are compatible according to these rules. For instance, List[List[Int]] can be replaced with Vector[Set[Long]].
safepickle supports explicit versioning of type changes which are not backward-compatible by the above rules. If you use this feature, you will need to declare both types in your code, with their associated picklers. New code will be able to read all supported versions, converting old formats transparently to the newest one, so your code outside the data definition always deals with the latest format version. Old code versions will refuse to read newer formats that they do not support.
Suppose you have a case class Foo(s: String) and you want to add a second field i: Int. This is not a backward-compatible change, because there is no declared default value for i, so new code cannot read old data. (Of course this is a contrived example that doesn't try to assign semantic value to the changes.)
You can accomplish this as follows:
case class Foo(s: String, i: Int) // The new format
object Foo {
  implicit val pickler: Pickler[Foo] = Autogen.versioned[Foo, Foo.Old.FooV1]
  object Old {
    case class FooV1(s: String) extends OldVersion[Foo] { // The old format
      def toNewVersion: Foo = Foo(s, s.toInt + 123) // Conversion function
    }
    object FooV1 {
      implicit val pickler: Pickler[FooV1] = Autogen[FooV1]
    }
  }
}
A versioned object will have an extra field named $version in its pickled form. Versioning is not supported for sealed traits.
Notes:
- When copying the old version of Foo to FooV1, you can remove all its method definitions and other parts that are ignored by Autogen; only the class structure needs to be defined.
- You must preserve the original Pickler[Foo] definition as the new Pickler[FooV1]. If it was a custom pickler, you need to keep it.
- If FooV1 references non-primitive types that have their own picklers, you may want to make local copies of those type definitions and their picklers too. Otherwise, if you later change one of those referenced types without versioning it, or remove it entirely, the pickler for FooV1 will stop working correctly.
- The name FooV1 isn't important; it doesn't influence the generated pickler or the version number it assigns.
When you have more than one old version of the same type, you need to define an upgrade method from each type to the successive one (and not to the newest one). Pass the oldest type as the second parameter to Autogen.versioned, like this:
case class Foo(s: String, i: Int, b: Boolean)
object Foo {
  implicit val pickler: Pickler[Foo] = Autogen.versioned[Foo, Foo.Old.FooV1]
  object Old {
    case class FooV1(s: String) extends OldVersion[FooV2] // Implementation and pickler omitted
    case class FooV2(s: String, i: Int) extends OldVersion[FooV3] // Implementation and pickler omitted
    case class FooV3(s: String, l: Long) extends OldVersion[Foo] // Implementation and pickler omitted
  }
}
Autogen.versioned will notice that the type FooV1 implements OldVersion[FooV2] and not OldVersion[Foo]. It will walk down the chain of OldVersion implementations, until it finds an OldVersion[Foo].
The version numbers will be assigned automatically: the oldest type (that is, the second parameter of Autogen.versioned) is always at version 1, the next one at version 2, and so on. In this example, Foo will be assigned version 4.
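The effect of such a chain can be sketched by hand for a shorter, two-step history. This is a toy model under stated assumptions: Upgradable, toNext and read are hypothetical names, attributes are a Map[String, String], and the version number is passed in separately instead of being read from a $version attribute.

```scala
object VersionChainSketch {
  // Toy marker: each old type knows how to upgrade exactly one step.
  trait Upgradable[Next] {
    def toNext: Next
  }

  case class FooV1(s: String) extends Upgradable[FooV2] {
    def toNext: FooV2 = FooV2(s, s.length)
  }
  case class FooV2(s: String, i: Int) extends Upgradable[Foo] {
    def toNext: Foo = Foo(s, i, b = i > 0)
  }
  case class Foo(s: String, i: Int, b: Boolean) // current format, version 3

  // Pick the type matching the version, then upgrade step by step to Foo.
  // Number parsing assumes well-formed input; toInt and toBoolean throw otherwise.
  def read(version: Int, attributes: Map[String, String]): Option[Foo] = version match {
    case 1 =>
      attributes.get("s").map(s => FooV1(s).toNext.toNext)
    case 2 =>
      for {
        s <- attributes.get("s")
        i <- attributes.get("i")
      } yield FooV2(s, i.toInt).toNext
    case 3 =>
      for {
        s <- attributes.get("s")
        i <- attributes.get("i")
        b <- attributes.get("b")
      } yield Foo(s, i.toInt, b.toBoolean)
    case _ =>
      None // data from a newer, unknown version is refused
  }
}
```

Because each type only upgrades to its successor, adding a new version later means writing one new conversion and leaving the older ones untouched.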
The Schema type describes the structure of the pickled data written and read by a pickler. For instance, a Schema.SString describes a string.
Every Pickler has a .schema member populated by Autogen. For picklers you write manually, you must provide the schema yourself.
Schemas are useful to automatically generate data for other programs to consume, such as JSON schemas or SQL table definitions.
I am working on a JSON schema generator, but it is still experimental. I hope to publish it here in the future.
NOTE that currently schemas are not always reliable, due to backend overrides. For instance, the default pickler for Array[Byte] has a Schema saying it writes a String, but the BSON backend overrides this behavior to write a BSONBinary instead. A future version will allow backends to override schemas, or solve the problem in some other way.