Added
Added support for writing results into SQLite databases
Added literal client configuration provider
Added display of the total time once an operation is done
Added ability to set the maximum number of crawled URLs
Added a warning when an empty response is encountered
Added ability to mark objects as fetched only once per operation
Added ability to pass encoding options to the JSON file result writer
Added support for relative URLs in ArgumentAdvancerUrlListProvider's template
Added TEST_SERVER_WAIT environment variable to change the default wait time
for the server used in integration tests
Changed
CssSelectorTextMatcher and XpathSelectorTextMatcher have been renamed to
CssSelectorHtmlMatcher and XpathSelectorHtmlMatcher respectively and now
return the original HTML content instead of its textual form, making them
consistent with other matchers such as the regular expression matcher. To
retain the previous behavior, strip the tags further down the line (e.g. in
entities)
RegexTextMatcher has been renamed to RegexHtmlMatcher
The underlying Guzzle instance now always depends on cURL. This ensures that
the widest set of features is available for handling HTTP requests.
Scrawler will now explicitly emit a warning for content types other than XML or (X)HTML
DefaultConfigurationProvider now sets timeouts for Guzzle
JSON_UNESCAPED_UNICODE option is now used by default by the JSON file
result writer
The simple_annotations option for the database result writer is now false
by default. Previously it had to be specified explicitly.
Only the HTTP and HTTPS protocols are now explicitly allowed; URLs with other
protocols are silently ignored
Improved performance of CSS selector matchers
Improved handling of networking errors
Improved log readability
Changed default log verbosity for both the console and the text file to INFO level
PHP_SERVER_PORT environment variable used to set the port of the webserver
used to run integration tests has been renamed to TEST_SERVER_PORT
Fixed
Fixed incorrect detection of visited URLs, which resulted in some addresses
being processed multiple times
Fixed incorrect whitespace trimming in text matchers
Removed
Removed InMemoryResultWriter from the blocks list; it is now only available
during development for running tests