Skip to content

Latest commit

 

History

History
288 lines (237 loc) · 10.7 KB

README.md

File metadata and controls

288 lines (237 loc) · 10.7 KB

retable

retable is a Kotlin library to work with tabular data files (csv, excel, open document)

Under the hood it uses Apache commons csv for CSV and Apache POI for Excel / OpenDocument, and expose them with a common and simple API.

retable is still very young, and currently has a very limited set of feature, but more are coming.

Usage (Read)

Hint: see src/test/resources/examples for data files

Basic Usage (CSV)

Reading a CSV file is very straightforward

// opens and use an input stream with std kotlin
File(pathTo("simple_data.csv")).inputStream().use {
    val hello =
        Retable.csv().read(it) // read as csv with default settings
            .records       // access the records sequence
            .map { it["first_name"] + " " + it["last_name"] } // access data by column name (headers in file)
            .first()       // sequence is consumed only on call, so getting first record in a large file is fast
    println(hello)  // prints `Xavier Hanin`
}

Basic Usage (Excel)

Reading an Excel file is equally simple, and except some configuration options (see later) the API is exactly the same as with CSV!

File(pathTo("simple_data.xlsx")).inputStream().use {
    val hello =
            Retable.excel().read(it) // read as excel with default settings
                    .records       // access the records sequence
                    .map { it["first_name"] + " " + it["last_name"] } // access data by column name (headers in file)
                    .first()       // sequence is consumed only on call, so getting first record in a large file is fast
    println(hello)  // prints `Xavier Hanin`
}

Accessing column names

By default retable reads column information from file header, and you can access to these columns

File(pathTo("simple_data.csv")).inputStream().use {
    val retable = Retable.csv().read(it)

    // you can access to the columns
    val colNames = retable.columns.list().map { it.name }.joinToString()

    println(colNames)      // prints `first_name, last_name, age`
}

Accessing information about the record

On each record you have access to some meta information and to the raw data (list of string) of the record

File(pathTo("simple_data.csv")).inputStream().use {
    val record = Retable.csv().read(it)
            .records.first() // get first record

    record.apply {
        // on the record you have access to:
        // - the line number, i.e. the line (in the file) on which the record was found (1 based)
        // - the record number, i.e. the index of this record among all records (1 based)
        // - rawData, the list (as string) of each "cell"
        println("$recordNumber $lineNumber $rawData") // prints `1 2 [Xavier, Hanin, 41]`
    }
}

Defining columns

Much more powerful usage can be obtained by defining the expected columns. Providing their index (1 based), their names, and their type, you can then access to the data of each column on each record in a type safe way.

File(pathTo("simple_data.csv")).inputStream().use {
    val retable = Retable
            .csv(
            // we can also define the expected columns
            object:RetableColumns() {
                // each column is defined as a property on an object
                val FIRST_NAME = string("first_name")
                val LAST_NAME  = string("last_name")
                // note that the column is typed - here the age is expected to be an Int
                val AGE        = int("age")
            })
            .read(it)

    // we will now be able to access data using the predefined columns, we `apply` them for ease of use
    retable.columns.apply {
        val hello = retable.records
                // now we can access the column value on a record by its column
                // note that both the column (AGE) and its type (Int) are known by the compiler
                // the direct access to the column with `AGE` is made possible thanks to the `apply`
                .filter { it[AGE]?:0 > 18 }
                // see how easy it is to access the fields
                .map { "Hello ${it[FIRST_NAME]} ${it[LAST_NAME]}" }
                .joinToString()
        println(hello)          // prints `Hello Xavier Hanin, Hello Victor Hugo`
    }
}

Validation

By default retable validates the format for typed columns (such as int columns). You can also set up additional constraints that are checked by retable while reading the records.

File(pathTo("invalid_data.csv")).inputStream().use {
    val retable = Retable
            .csv(
                    // we can set constraints on the columns
                    object:RetableColumns() {
                        // here we set a constraint on the length is in a given range
                        // the constraint is defined in a block
                        val FIRST_NAME = string("first_name") { length { inRange(3..20) } }
                        // different code style, we set the constraint with a named parameter
                        val LAST_NAME  = string("last_name",
                                constraint = { matches(Regex("[A-Za-z ]+"),
                                                "should only contain alpha and spaces")})
                        // an int column will automatically check the value is an int
                        val AGE        = int("age",
                                // and we can add other constraint too
                                constraint =  { inRange(0..120) })
                    })
            .read(it)

    retable.columns.apply {
        val hello = retable.records
                .filter { it.isValid() } // we can check if a record is valid, and filter it out if we want
                .map { "Hello ${it[FIRST_NAME]} ${it[LAST_NAME]}" }
                .joinToString()
        println(hello)          // prints `Hello Victor Hugo`

        // once the sequence of records has been consumed, the invalid records
        // are available in a list
        val invalid = retable.violations.records
                .map {
                    "line ${it.lineNumber} - ${it[FIRST_NAME]} ${it[LAST_NAME]}\n" +
                    "violations:\n" +
                    it.violations
                             .map { "${it.severity.name} - ${it.message()}" }
                             .joinToString("\n")
                }
                .joinToString("\n")
        println(invalid)    // prints:
                            // ---
                            // line 2 - Xavier Hanin
                            // violations:
                            // ERROR - age 241 should be between 0 and 120
                            // line 3 - A Dalton
                            // violations:
                            // ERROR - first_name "A" length 1 should be between 3 and 20
                            // ERROR - age "TWELVE" should be an integer
                            // ---
    }
}

CSV Options

Set charset

val retable = Retable.csv(options = CSVReadOptions(charset = Charsets.ISO_8859_1)).read(it)

Other options

val retable = Retable.csv(
                columns = RetableColumns.ofNames(listOf("Prénom", "Nom", "Oeuvre")),
                options = CSVReadOptions(
                            delimiter = ';',
                            quote = '`',
                            ignoreEmptyLines = true,
                            trimValues = true,
                            firstRecordAsHeader = false
        )).read(it)

Excel Options

Select Sheet by name

val retable = Retable.excel(options = ExcelReadOptions(sheetName = "my sheet")).read(it)

Select Sheet by index

val retable = Retable.excel(options = ExcelReadOptions(sheetIndex = 2)).read(it)

Common options

val retable = Retable.excel(options = ExcelReadOptions(
                ignoreEmptyLines = true,
                trimValues = true,
                firstRecordAsHeader = false
         )).read(it)

Usage (Write)

Note: Only basic Excel write is supported yet

Basic

        Retable(
                // we define the column names
                RetableColumns.ofNames(listOf("first_name", "last_name", "age"))
            )
            .data(
                // we provide the data to write as a list
                // of List<Any> (the list of values of each row)
                listOf(
                        listOf("John",  "Doe", 23),
                        listOf("Jenny", "Boe", 25)
                )
            )
            // then we can just ask to write data to outputstream
            // in the format we want
            .write(Retable.excel() to File(pathTo("export_data.xlsx")).outputStream())

            /* produces an excel file like this:
                +------------+-----------+-----+
                | first_name | last_name | age |
                +------------+-----------+-----+
                | John       | Doe       |  23 |
                | Jenny      | Boe       |  25 |
                +------------+-----------+-----+
             */

Typed Columns

// we can also define typed columns with arbitrary indexes (or not)
        val columns = object:RetableColumns() {
            val FIRST_NAME = string("first_name", index = 2)
            val LAST_NAME  = string("last_name", index = 1)
            val AGE        = int("age", index = 3)
        }
        Retable(columns)
            .data(
                    // we provide the data to write as a list
                    // of any kind
                    listOf(
                            Person("John", "Doe", 23),
                            Person("Jenny", "Boe", 25)
                    )
            ) {     // with the mapper function to transform them to map <column -> value>
                    mapOf(
                            // columns are easily accessible in this context
                            // (the receiver is the RetableColumns object defined above)
                            FIRST_NAME to it.firstName,
                            LAST_NAME to it.lastName,
                            AGE to it.age
                    )
            }
            // then we can just ask to write data
            .write(Retable.excel(columns) to File(pathTo("export_data_cols.xlsx")).outputStream())

            /* produces an excel file like this:
                +-----------+------------+-----+
                | last_name | first_name | age |
                +-----------+------------+-----+
                | Doe       | John       |  23 |
                | Boe       | Jenny      |  25 |
                +-----------+------------+-----+
             */

Installation

retable has not been released yet, but you can use jitpack to obtain it: https://jitpack.io/#4sh/retable/-SNAPSHOT