Releases: bpolaszek/bentools-etl
4.0-alpha8
What's Changed
- Feat: Improve chaining by @bpolaszek in #30
- Fix: Do not chain if not required by @bpolaszek in #31
- Fix: Narrow transformResult type in TransformEvent by @bpolaszek in #32
- Doc: Improve documentation by @bpolaszek in #33
Full Changelog: 4.0-alpha7...4.0-alpha8
4.0-alpha7
What's Changed
- Feat: Instantiators by @bpolaszek in #28
- Feat: Multiple next tick callbacks by @bpolaszek in #29
Full Changelog: 4.0-alpha6...4.0-alpha7
4.0-alpha6
What's Changed
- Fix: Early flush precedence by @bpolaszek in #26
- Feat: Chain extractors by @bpolaszek in #27
Full Changelog: 4.0-alpha5...4.0-alpha6
4.0-alpha5
What's Changed
- Doc: Improve recipes documentation by @bpolaszek in #22
- Refactor: Make transform result internal stuff by @bpolaszek in #23
- Fix: Chain transformers using generators in the middle by @bpolaszek in #25
- Feat: Next tick by @bpolaszek in #24
Full Changelog: 4.0-alpha4...4.0-alpha5
4.0-alpha4
What's Changed
- Feat: Allow context to be initialized by user by @bpolaszek in #16
- Refactor: Rename recipe main method by @bpolaszek in #17
- Tests: Improve coverage by @bpolaszek in #18
- Feat: chain transformers, chain loaders, conditional loaders by @bpolaszek in #19
- Fix: Silently chain transformers / loaders with the EtlBuilder by @bpolaszek in #20
- Refactor: Fix typo in root namespace 😅 by @bpolaszek in #21
Full Changelog: 4.0-alpha3...4.0-alpha4
4.0-alpha3
What's Changed
- Feat: Transformer can now return single values for a better DX by @bpolaszek in #12
- Fix: Improve object cloning performance by @bpolaszek in #14
- Fix: Clones not being accurate by @bpolaszek in #15
- Refactor: use holding objects instead of passing EtlState by reference by @bpolaszek in #13
Full Changelog: 4.0-alpha2...4.0-alpha3
4.0-alpha2
What's Changed
- Feat: Only 1 final flush by default by @bpolaszek in #11
Full Changelog: 4.0-alpha1...4.0-alpha2
Version 4.0 is on its way!
Hey folks! 👋
It's been more than 4 years since version 3 of bentools/etl was drafted, but it never left alpha stability, mostly for lack of time but also, I have to admit, because of uncertainty about the design directions taken.
Introducing bentools/etl v4
PHP 8 and a lot of projects on my side came in between. I recently needed this library again, and I wanted to keep the good ideas of v3 while removing the bad ones.
So I decided that a stable v3 will never see the light of day. Because lots of classes have been renamed and most of them became immutable, here's a brand new v4.
What's new?
- This version requires PHP 8.2 as a minimum, is 100% covered by tests (this wasn't the case before), and uses PHPStan to ensure type consistency at the highest level. A GitHub Actions CI has also been set up.
- It introduces a new EtlState object, which is instantiated at the beginning of the ETL process and passed through the different steps and event listeners. The EtlExecutor (previously the Etl class) is no longer mutable, since it basically holds the Extractor, the Transformer and the Loader objects, fires events and provides you with the state you need through the EtlState.
- The EtlState is mostly readonly, but you can still call $state->skip() to skip items, $state->stop() to stop the process, $state->flush() to request an early flush, and you can use the $state->context array to pass arbitrary data between the different steps and events during the whole workflow.
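To make the control methods above more concrete, here is a minimal, self-contained sketch. It does not use the library: SketchState and runSketch() are hypothetical stand-ins that only model the four members named above (skip(), stop(), flush() and the context array). The flag properties, the loop, and the choice that stop() discards the current item are assumptions made for illustration, so the real EtlState may behave differently in the details.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for EtlState: only the members named in the text
// are modelled; the boolean flag properties are assumptions of this sketch.
final class SketchState
{
    public bool $skipped = false;
    public bool $stopped = false;
    public bool $flushRequested = false;

    /** @param array<string, mixed> $context */
    public function __construct(public array $context = [])
    {
    }

    public function skip(): void
    {
        $this->skipped = true;
    }

    public function stop(): void
    {
        $this->stopped = true;
    }

    public function flush(): void
    {
        $this->flushRequested = true;
    }
}

/**
 * Toy processing loop (not the library's executor).
 *
 * @param iterable<int> $items
 * @return list<list<int>> One inner list per flush.
 */
function runSketch(iterable $items, callable $transform, SketchState $state): array
{
    $flushes = [];
    $buffer = [];
    foreach ($items as $item) {
        $state->skipped = false;
        $result = $transform($item, $state);
        if ($state->stopped) {
            break; // in this sketch, stopping discards the current item
        }
        if (!$state->skipped) {
            $buffer[] = $result;
        }
        if ($state->flushRequested) {
            $flushes[] = $buffer; // early flush requested by the callback
            $buffer = [];
            $state->flushRequested = false;
        }
    }
    if ($buffer !== []) {
        $flushes[] = $buffer; // final flush of whatever is left
    }

    return $flushes;
}

$state = new SketchState(context: ['seen' => 0]);
$flushes = runSketch([1, 2, 3, 4, 5, 6], function (int $n, SketchState $state): int {
    $state->context['seen']++; // arbitrary data shared across the whole run
    if ($n === 2) {
        $state->skip();
    }
    if ($n === 3) {
        $state->flush();
    }
    if ($n === 6) {
        $state->stop();
    }

    return $n * 10;
}, $state);

var_dump($flushes);                // [[10, 30], [40, 50]]
var_dump($state->context['seen']); // 6
```

Item 2 is skipped, item 3 triggers an early flush of everything buffered so far, and item 6 stops the run before it is loaded, while the context counter survives across every step.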
How does it work?
Here's an example of the new API:
city_english_name,city_local_name,country_iso_code,continent,population
"New York","New York",US,"North America",8537673
"Los Angeles","Los Angeles",US,"North America",39776830
Tokyo,東京,JP,Asia,13929286
...
use Bentools\ETL\EtlConfiguration;
use Bentools\ETL\EtlExecutor;
use Bentools\ETL\EventDispatcher\Event\LoadEvent;
use Bentools\ETL\Extractor\CSVExtractor;
use Bentools\ETL\Loader\JSONLoader;
use Bentools\ETL\Recipe\LoggerRecipe;
use Monolog\Logger;

$etl = (new EtlExecutor(options: new EtlConfiguration(flushEvery: 100)))
    ->extractFrom(new CSVExtractor(options: ['columns' => 'auto']))
    ->transformWith(function (array $city) {
        // Add a URL-friendly slug derived from the English city name
        $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);
        yield $city;
    })
    ->loadInto(new JSONLoader())
    // Print a line for each item as it is loaded
    ->onLoad(fn (LoadEvent $event) => print("Loading city `{$event->item['slug']}`".PHP_EOL))
    ->withRecipe(new LoggerRecipe(new Logger('etl-logs')));

$report = $etl->process(
    source: 'file:///tmp/cities.csv',
    destination: 'file:///tmp/cities.json',
);

var_dump($report->output); // file:///tmp/cities.json

[
    {
        "city_english_name": "New York",
        "city_local_name": "New York",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 8537673,
        "slug": "new-york"
    },
    {
        "city_english_name": "Los Angeles",
        "city_local_name": "Los Angeles",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 39776830,
        "slug": "los-angeles"
    },
    {
        "city_english_name": "Tokyo",
        "city_local_name": "東京",
        "country_iso_code": "JP",
        "continent": "Asia",
        "population": 13929286,
        "slug": "tokyo"
    }
]
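The transformer in the example is a plain generator, so its slug logic can be tried without the library. The sketch below extracts it into a function (addSlug() is a name made up for this illustration) and runs it over three trimmed-down rows. Only the city_english_name key matters here.

```php
<?php

declare(strict_types=1);

/**
 * Same slug logic as the transformer in the example, extracted into a
 * standalone generator function (the function name is hypothetical).
 *
 * @param array<string, mixed> $city
 * @return Generator<int, array<string, mixed>>
 */
function addSlug(array $city): Generator
{
    // Lowercase the English name, then replace spaces with hyphens
    $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);
    yield $city;
}

// Trimmed-down rows: only the column the transformer reads is required
$rows = [
    ['city_english_name' => 'New York', 'country_iso_code' => 'US'],
    ['city_english_name' => 'Los Angeles', 'country_iso_code' => 'US'],
    ['city_english_name' => 'Tokyo', 'country_iso_code' => 'JP'],
];

$slugs = [];
foreach ($rows as $row) {
    // A generator may yield any number of items; here it yields exactly one
    foreach (addSlug($row) as $city) {
        $slugs[] = $city['slug'];
    }
}

echo implode(PHP_EOL, $slugs), PHP_EOL;
// new-york
// los-angeles
// tokyo
```

Since strtolower() works byte by byte, this approach suits the ASCII names in city_english_name. Names with accented characters would need something like mb_strtolower() instead.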
I hope you'll enjoy this release as much as I enjoyed coding it! 😃
3.0-alpha.3
- Etl::process() can now have no args
- Refactored some constructors
- Add early flush feature (loader can now flush on demand from ETL or from the event system, and will know if it's a partial flush or a full flush)
- Handled Doctrine namespace change
- PHP 7.4 support (sorry, we're late)
Init loader hook
- Signature changes on some extractors / loaders (CSV, JSON).
- It's now possible to pass arbitrary arguments to LoaderInterface::init(), which will be processed just before the first item is loaded. These arbitrary, optional arguments are now part of the Etl::process() signature. This allows a single loader to have multiple options and/or targets at runtime, and to reset its state at each ETL process.
- It's possible to hook on the loader.init event with EtlBuilder::onLoaderInit().
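The init hook described above can be pictured with a small, self-contained sketch. It does not use the library's classes: SketchLoader and processSketch() are hypothetical stand-ins, and every method except init() is an assumption made for the illustration. It only shows the idea of forwarding extra arguments from the process call to the loader's init() just before the first item, so that the same loader instance can target something different on each run.

```php
<?php

declare(strict_types=1);

// Hypothetical loader, not the library's LoaderInterface: load() and
// summary() exist only for this sketch.
final class SketchLoader
{
    private ?string $target = null;

    /** @var list<string> */
    private array $lines = [];

    // Receives the arbitrary arguments forwarded from the process call
    public function init(mixed ...$args): void
    {
        $this->target = (string) ($args[0] ?? 'php://memory');
        $this->lines = []; // reset state at the start of each run
    }

    public function load(string $line): void
    {
        $this->lines[] = $line;
    }

    public function summary(): string
    {
        return sprintf('%s <- %d line(s)', $this->target, count($this->lines));
    }
}

/** @param iterable<string> $items */
function processSketch(iterable $items, SketchLoader $loader, mixed ...$initArgs): string
{
    $initialized = false;
    foreach ($items as $item) {
        if (!$initialized) {
            $loader->init(...$initArgs); // just before the first item is loaded
            $initialized = true;
        }
        $loader->load($item);
    }

    return $loader->summary();
}

$loader = new SketchLoader();
$first = processSketch(['a', 'b'], $loader, '/tmp/one.txt');
$second = processSketch(['c'], $loader, '/tmp/two.txt');

echo $first, PHP_EOL;  // /tmp/one.txt <- 2 line(s)
echo $second, PHP_EOL; // /tmp/two.txt <- 1 line(s)
```

The second run reports a single line because init() cleared the state left over from the first run, which is the "reset its state at each ETL process" behaviour the note mentions.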