All notable changes to Fili will be documented here. Changes are accumulated as new paragraphs at the top of the current major version. Each change has a link to the pull request that makes the change and to the issue that triggered the pull request if there was one.
- Enforce role based security for incoming API requests
  - Added `RoleBasedTableValidatorRequestMapper` class, which checks whether a user's role satisfies the predicates defined for a logical table.
- HttpResponseMaker header building made extendable
  - Added `buildAndAddResponseHeaders` method in `HttpResponseMaker`, which handles building and adding headers to a response builder. This logic was moved from `createResponseBuilder`.
  - Made the `createResponseBuilder` method protected to open up the class to be more extendable.
- Add factory for building ApiFilter objects
  - `ApiFilter` changed into a simple value object.
  - The `ApiFilter` constructor using a filter clause from the API request moved to the factory as a static `build` method.
  - The `ApiFilter` union method moved to the factory.
- Add interface to FilterOperation for easy extension
  - Changed the existing version of `FilterOperation` to `DefaultFilterOperation` and made `FilterOperation` into an interface.
  - Changed code that depended on the enum to depend on the new interface instead.
- Wrapping DruidInFilterBuilder as default filter builder under a feature flag
  - Added `DEFAULT_IN_FILTER` feature flag.
  - If the `DEFAULT_IN_FILTER` feature flag is enabled, `DruidInFilterBuilder` is used as the default Druid filter builder.
  - If the `DEFAULT_IN_FILTER` feature flag is disabled, `DruidOrFilterBuilder` is used as the default Druid filter builder.
- Enable Druid in-filter in Fili
  - `DefaultDruidFilterBuilder` is renamed to `DruidOrFilterBuilder`.
  - Implemented `DruidInFilterBuilder`, which turns the list of selector filters generated by `DruidOrFilterBuilder` into a single Druid in-filter. The in-filter resolves the timeout issue and can substantially shorten the Druid query, making the query more shareable and readable.
  - Fili now uses `DruidInFilterBuilder` as the default Druid filter builder instead of the old `DruidOrFilterBuilder`, because the new default is terser and packs the query payload tighter, in proportion to the number of filters being applied. A sketch of the two filter shapes follows this entry.
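To make the size difference concrete, here is an illustrative sketch of the two Druid filter payload shapes involved (plain Java strings, not Fili API calls; the dimension and values are made up):

```java
public class DruidFilterShapes {
    // An OR of selector filters repeats "type", "dimension", and "value" for every entry...
    static final String OR_FILTER = "{\"type\":\"or\",\"fields\":["
            + "{\"type\":\"selector\",\"dimension\":\"country\",\"value\":\"US\"},"
            + "{\"type\":\"selector\",\"dimension\":\"country\",\"value\":\"CA\"}]}";

    // ...while the equivalent in-filter names the dimension once and grows by only one value per entry.
    static final String IN_FILTER =
            "{\"type\":\"in\",\"dimension\":\"country\",\"values\":[\"US\",\"CA\"]}";
}
```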
- Move off BaseCompositePhysicalTable inheritance usage
  - Added builder methods to `MetricUnionAvailability` and `PartitionAvailability` to save on needing to add additional table classes.
- Adding withDoFork to LookBackQuery
  - Added `withDoFork` to the `LookBackQuery` class.
- An injection point for customizing the WebLoggingFilter to use during tests
  - Extend `JerseyTestBinder` and override `getLoggingFilter`.
- An injection point for customizing Exception handling
  - Customers can provide their own logic for handling top-level Exceptions in the `DataServlet` by implementing `DataExceptionHandler`, and in any other servlet by implementing `MetadataExceptionHandler`.
- Add support for "case_sensitive" attribute in FragmentSearchQuerySpec
  - Enable `FragmentSearchQuerySpec` to accept an argument for `case_sensitive` so that API users can configure this attribute for JSON serialization through Fili.
- Add specs for InsensitiveContainsSearchQuerySpec & RegexSearchQuerySpec
  - `RegexSearchQuerySpec` and `InsensitiveContainsSearchQuerySpec` have no dedicated test specs. This PR adds tests for them.
- Implement ContainsSearchQuerySpec
  - Adds serialization for `ContainsSearchQuerySpec` so that Fili API users can use it through Fili.
- Add storageStrategy as a field of the DimensionConfig class
  - Adds `getStorageStrategy` to the `DimensionConfig` class.
  - Passes the storage strategy to the `KeyValueStoreDimension` constructor.
- Add more tests to RegisteredLookupMetadataLoadTask
  - Adds tests to make sure the load tasks can update status correctly.
- Add missing tests to `Utils` class.
- Extract logic for getting pagination of dimension rows
  - Extracted the logic in `DimensionsServlet` that gets pagination of dimension rows into a protected function.
- Removed Pagination deprecation
- Removed `DataSourceConstraint` deprecation
- Bumping query id inside withIntervals of LookBackQuery
  - Returning a new `LookBackQuery` with `doFork` set to `true`, which bumps the query id inside the `withIntervals` method.
  - Renamed every occurrence of `doFork` to `incrementQueryId`.
  - Removed `withDoFork` from the `LookBackQuery` class.
- Change to ordered data structures for ApiRequestImpls
  - Changed Set to LinkedHashSet in most ApiRequestImpl getters
  - Changed Set to List in ApiRequests
- Moved ResponseFormatType from singleton enum to interface with enum impl
  - Refactored ResponseFormatType to allow expansion via additional interface implementors
  - Replaced equality-based matching with an 'accepts' model
- Restructured Report Building out of ApiRequest
  - Pushed the UriInfo used to produce `PaginationMapper` into `RequestContext`
  - Renamed 'getPage' to 'paginate' and refactored it off `ApiRequest` and into `PageLinkBuilder`
  - Moved pagination mostly out of individual servlet classes and into `EndpointServlet`
  - Initialized the per-request `Response.ResponseBuilder` in `EndpointServlet` rather than the `ApiRequestImpl` constructor
  - Simplified injected classes that took both `UriInfo` and `ContainerRequestContext` to get the one from inside the other.
  - Pushed the entire container request context to content disposition building code (prelude to yahoo#709)
- Change `BaseCompositePhysicalTable` into a concrete class
  - Currently `BaseCompositePhysicalTable` is an abstract class, though it has all of the functionality for a simple composite physical table. Changed it to a concrete class to allow for simple composite table behavior without requiring an extension.
  - Two simple implementations of `BaseCompositePhysicalTable`, `PartitionCompositeTable` and `MetricUnionCompositeTable`, are now deprecated. Instead, the availabilities for these tables should be created directly and passed in to `BaseCompositePhysicalTable`.
- Change availability behavior on BasePhysicalTable
  - Currently `BasePhysicalTable` overrides `getAvailableIntervals(constraint)` and `getAllAvailableIntervals()`, and defers this behavior to its availability. This PR changes `BasePhysicalTable` to also override `getAvailableIntervals()` and defer to its availability.
- Making custom sketch operations possible in PostAggregations
  - Added `PostAggregation` and `Aggregation` instance checks in the `asSketchEstimate(MetricField field)` method of the `ThetaSketchFieldConverter` class. `FieldAccessorPostAggregation` is called only for `Aggregation` and not for `PostAggregation`.
- Let DimensionApiRequestMapper throw RequestValidationException instead of BadApiRequestException
  - `DimensionApiRequestMapper.apply()` is made to obey the interface contract by throwing `RequestValidationException` instead of `BadApiRequestException`.
- Inject customizable extraction functions to RegisteredLookupDimension
  - Instead of injecting registered lookup names, we inject registered lookup extraction functions for lookup dimensions so that downstream projects can configure all fields of registered lookup extraction functions.
- Abort request when too many Druid filters are generated
  - In order to avoid Druid queries with too many filters on high-cardinality dimensions, Fili sets an upper limit on the number of filters and aborts requests if the limit is exceeded.
- Put `Granularity` interfaces and their implementations in the same package
- Put `*ApiRequest` interfaces and their implementations in the same package
- Avoid casting to generate SimplifiedIntervalList
  - Some downstream projects generated partial intervals as `ArrayList`, which cannot be cast to `SimplifiedIntervalList` in places like `getVolatileIntervalsWithDefault`. The result is a casting exception that crashes downstream applications. Casting is replaced with explicit `SimplifiedIntervalList` object creation (see the sketch following this entry).
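A minimal sketch of the pattern, assuming `SimplifiedIntervalList` exposes a copy constructor taking a collection of Joda `Interval`s (the import path is our assumption):

```java
import java.util.List;

import org.joda.time.Interval;

import com.yahoo.bard.webservice.util.SimplifiedIntervalList;

public class IntervalSimplification {

    public static SimplifiedIntervalList toSimplified(List<Interval> partialIntervals) {
        // Wrong: an ArrayList is not a SimplifiedIntervalList, so a cast throws ClassCastException:
        //     return (SimplifiedIntervalList) partialIntervals;

        // Right: build a new SimplifiedIntervalList from the collection instead.
        return new SimplifiedIntervalList(partialIntervals);
    }
}
```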
- Undeprecated pagination by collection
  - Since we seem to be in no hurry to switch to heavier reliance on streams. (Also renamed paginate and moved it to `PageLinkBuilder`.)
- Deprecate `PartitionCompositeTable` and `MetricUnionCompositeTable`
  - Two simple implementations of `BaseCompositePhysicalTable` (`PartitionCompositeTable` and `MetricUnionCompositeTable`) are now deprecated. Instead, the availabilities for these tables should be created directly and passed in to `BaseCompositePhysicalTable`.
- Fix name change in test logical metrics that breaks downstream tests
  - Changed test logical metric generation to use the `LogicalMetricInfo` constructor that takes both long name and description.
- Fix GroovyTestUtils json parsing
  - Properly handles JSON parsing failures and non-JSON expected strings.
- Fix generate intervals logic when availability is empty
  - The logic that generates intervals when the `CURRENT_MACRO_USES_LATEST` flag is turned on has a bug: the code throws `NoSuchElementException` when the table has no availabilities. This PR fixes the bug by checking whether the availability of the underlying table is empty.
- Correct Druid coordinator URL in Wikipedia example
  - The config value for the Druid coordinator URL was mistyped.
- Upgrade codenarc to recognize unused imports in Groovy
  - There are a number of unused imports sitting in tests, caused by an outdated CodeNarc version. This PR upgrades the version and removes those unused imports.
- Removed deprecated code references
  - Renamed keys from `BardLoggingFilter` properties off a deprecated reference class (this was an artifact from a bad rename)
- Removed constructors and getters with clean replacements
- Stripped the remaining UI/NonUI code
- Cleaned up old schema classes and methods
- Removed orphaned metadata response data factory
- Removed pre-theta sketch code
- Removed deprecated min/max aggregations
- Removed loader code for metrics that don't include dimension dictionary
- Removed `KeyValueStoreDimension`

Release security module for fili data security filters. Created `ChainingRequestMapper`, and a set of mappers for gatekeeping on security roles and whitelisting dimension filters.
Added by @michael-mclawhorn in yahoo#405
Downstream projects now have more flexibility to construct `DataApiRequest` by using an injectable factory. An additional constructor for `DataApiRequestImpl` unpacks the config resources bundle to make it easier to override dictionaries.
Added by @michael-mclawhorn in yahoo#603
Make `FieldAccessorPostAggregation` able to reference post aggregations in addition to aggregations.
Druid allows (but does not protect against ordering) post-aggregation trees referencing columns that are also post-aggregation trees. This makes it possible to send such a query by using a field accessor to reference another query expression. Using this capability may have some risk.
Added by @michael-mclawhorn in yahoo#543
Druid versions released after February 23rd, 2017 added support for HTTP ETags. By including an If-None-Match header along with a Druid query, Druid will compute a hash as the etag, in such a way that each unique response has a corresponding unique etag; the etag is included in the header along with the response. In addition, if a query to Druid includes the If-None-Match header with the etag of that query, Druid will check whether the etag matches the response of the query; if it does, Druid returns an HTTP 304 Not Modified response to indicate that the response is unchanged and matches the etag received in the request header. Otherwise, Druid executes the query and responds normally with a new etag attached to the response header.
This new feature was designed by @garyluoex. For more info, visit @garyluoex's design at yahoo#255
Lucene Search Provider can re-open in a bug-free way and close more cleanly
Added by @garyluoex in yahoo#551 and yahoo#521
Update Fili to accommodate the deprecated `ExtractionFilter` in Druid; use the selector filter with an extraction function instead. Added extraction function on dimensional filters, defaulting to the extraction function on the dimension if it exists.
Added by @garyluoex in yahoo#617
Exposes the `LogInfo` objects stored in the `RequestLog`, via `RequestLog::retrieveAll`, making it easier for customers to implement their own scheme for logging the `RequestLog`.
Added by @archolewa in yahoo#574
Fili now supports checking Druid lookup status as one of its health checks, making it very easy to identify any failed lookups.
Added by @QubitPi in yahoo#620
While backward compatibility is guaranteed, Fili now allows users to rate limit (with a new rate limiter) based on criteria other than the default.
Added by @efronbs in yahoo#591
Druid `TimeFormatExtractionFunction` is added to Fili. API users can interact with Druid using `TimeFormatExtractionFunction` through Fili.
Added by @QubitPi in yahoo#611
In order to allow clients to be notified whether a dimension's values are browsable and searchable, storage strategy metadata is added to dimensions. A browsable and searchable dimension is denoted by `LOADED`, whereas the opposite is denoted by `NONE`. This is very useful for UIs backed by Fili when sending dimension-related queries.
Added by @michael-mclawhorn, @garyluoex and @QubitPi in yahoo#575, yahoo#589, yahoo#558, yahoo#578
Include metrics in logging to allow for better evaluation of the impact of caching for split queries. There used to be only a binary flag (`BardQueryInfo.cached`) that was inconsistently set for split queries. Now 3 new metrics are added:
- Number of split queries satisfied by cache
- Number of split queries actually sent to the fact store. (not satisfied by cache)
- Number of weight-checked queries
Added by @QubitPi in yahoo#537
Logical metrics have more config-richness: not just the metric name, but also the metric long name, description, etc. can be configured. MetricInstance is now created by accepting a LogicalMetricInfo which contains all these fields in addition to the metric name.
Added by @QubitPi in yahoo#492
`LuceneSearchProvider` is able to hot-swap the index: the old index directory is moved to a different location, the new indexes are moved into a directory with the same old name, and the old index directory is deleted from the file system.
`KeyValueStore` is also made to support hot-swapping the key value store location.
Added by @QubitPi in yahoo#522
A metric showing how long Fili has been running is available.
Added by @mpardesh in yahoo#518
`ui_druid_broker` and `non_ui_druid_broker` are no longer used separately. Instead, a single `druid_broker` replaces the two. For backwards compatibility, Fili checks whether `druid_broker` is set; if not, Fili uses `non_ui_druid_broker` and then `ui_druid_broker`.
Added by @mpardesh in yahoo#489
Thanks to everyone who contributed to this release!
@michael-mclawhorn Michael Mclawhorn @garyluoex Gary Luo @archolewa Andrew Cholewa @QubitPi Jiaqi Liu @asifmansoora Asif Mansoor Amanullah @efronbs Ben Efron @deepakb91 Deepak Babu @tarrantzhang Tarrant Zhang @kevinhinterlong Kevin Hinterlong @mpardesh Monica Pardeshi @colemanProjects Neelan Coleman @onlinecco @dejan2609 Dejan Stojadinović
- Added `logicalTableAvailability` to `TableUtils`, which returns the union of intervals for the logical table.
- Added `now` parameter to `generateIntervals`, relative to which time macros will be calculated.
- Added `CURRENT_MACRO_USES_LATEST` flag which, when turned on, uses the first unavailable availability to generate the intervals.
- Add `@FunctionalInterface` annotation to all functional interfaces (see the example following this entry).
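For instance, a sketch using one of Fili's own single-method interfaces (the method name matches how the interface is described elsewhere in this changelog; treat the exact signature as an assumption):

```java
import com.yahoo.bard.webservice.util.SimplifiedIntervalList;

// The annotation is purely informative at runtime, but the compiler now fails
// the build if a second abstract method is ever added to this interface.
@FunctionalInterface
public interface VolatileIntervalsFunction {
    SimplifiedIntervalList getVolatileIntervals();
}
```

It also documents that the type is lambda-friendly, e.g. `VolatileIntervalsFunction none = SimplifiedIntervalList::new;`.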
- Add capability for Fili to check load statuses of Druid lookups.
- Extraction function on selector filter
  - Added extraction function on dimensional filters, defaulting to the extraction function on the dimension if it exists.
- Implement TimeFormatExtractionFunction
  - Enable `TimeFormatExtractionFunction` in Fili so that API users can interact with Druid using `TimeFormatExtractionFunction` through Fili.
- Add tests for all un-tested methods in `DateTimeUtils`.
- Enable checkstyle to detect incorrect package header
  - Fili was able to pass the build with wrong package headers in some source files. This is fixed in this PR by adding the PackageDeclaration checkstyle rule.
  - In addition, the checkstyle version has been bumped to the latest (Nov 2017), which is able to detect more styling errors.
- Add loaded strategy onto tables full view endpoint
  - Add dimension storage strategy to the table full view endpoint
- Add getter for LogicalMetricInfo in MetricInstance
  - There are 3 instance variables inside the `MetricInstance` class, two of which have getters. The one without a getter, `LogicalMetricInfo`, should have one as well, so that subclasses can access it without creating a duplicate `LogicalMetricInfo` of their own.
- Backwards compatible constructor for KeyValueStoreDimension around storage strategy
  - Provide a backwards compatible constructor for existing implementations that don't provide storage strategies.
- Have Table Endpoint Filter Using QueryPlanningConstraint
  - Enable the tables endpoint to filter availabilities based on the availability constraint
- Implement dimension metadata to indicate storage strategy
  - In order to allow clients to be notified whether a dimension's values are browsable and searchable, storage strategy metadata is added to dimensions.
- Add interface layer to each type of API request class. The types of API request under the refactor are `TablesApiRequest`, `DimensionApiRequest`, `SlicesApiRequest`, `MetricsApiRequest`, and `JobsApiRequest`.
- Include metrics in logging to allow for better evaluation of the impact of caching for split queries.
  - Currently there is only a binary flag (`BardQueryInfo.cached`) that is inconsistently set for split queries.
  - Three new metrics are added:
    - Number of split queries satisfied by cache
    - Number of split queries actually sent to the fact store (not satisfied by cache)
    - Number of weight-checked queries
- Evaluate format type from both URI and Accept header
  - Added a new functional interface `ResponseFormatResolver` to coalesce the Accept header format type and the URI format type.
  - Implemented a concrete implementation of `ResponseFormatResolver` in `AbstractBindingFactory`.
- Add Constructor and wither for TableApiRequest
  - Makes TablesApiRequest similar to other ApiRequest classes by adding an all-argument constructor and withers. The all-argument constructor is made private since it is used only by the withers.
- Add CodeNarc to validate Groovy style
  - Checkstyle is great, but it doesn't process Groovy. CodeNarc is Checkstyle for Groovy, so we should totally use it.
- Allow Webservice to Configure Metric Long Name
  - Logical metrics need more config-richness: not just the metric name, but also the metric long name, description, etc. MetricInstance is now created by accepting a LogicalMetricInfo which contains all these fields in addition to the metric name.
- Enable search provider to hot-swap index and key value store to hot-swap store location
  - Added a new default method to the `SearchProvider` interface in order to support hot-swapping the index.
  - Implemented the hot-swapping method of the `SearchProvider` interface in `LuceneSearchProvider`: replace the Lucene index by moving the old index directory to a different location, moving the new indexes to a directory with the same old name, and deleting the old index directory from the file system.
  - Added a new default method to the `KeyValueStore` interface in order to support hot-swapping the key value store location.
- Translate doc, built-in-makers.md, to Chinese
  - Part of the Fili translation effort to increase the popularity of Fili in the Chinese tech industry.
- Add a metric to show how long Fili has been running
- Add `druid_broker` config parameter to replace `ui_druid_broker` and `non_ui_druid_broker`
- Have Tables Endpoint Support (but not use) Additional Query Parameters
  - Make the availability consider the TablesApiRequest by passing it into the getLogicalTableFullView method
  - Move auxiliary methods from `DataApiRequest` to `ApiRequest` in order to make them sharable between `DataApiRequest` and `TableApiRequest`.
- Added security module for fili data security filters
  - Created `ChainingRequestMapper`, and a set of mappers for gatekeeping on security roles and whitelisting dimension filters.
- Add `availableIntervals` field to the tables endpoint by unioning the availability for the logical table, without taking the TablesApiRequest into account.
- Implement EtagCacheRequestHandler
  - Added `EtagCacheRequestHandler` that checks the cache for a matching eTag
  - Added `EtagCacheRequestHandler` to `DruidWorkflow`
  - Made `MemTupleDataCache` take a parametrized metadata type
- Implement EtagCacheResponseProcessor
  - Added `EtagCacheResponseProcessor` that caches the results if appropriate after completing a query, according to the etag value.
- Add dimension dictionary to metric loader
  - Added a two-argument version of the `loadMetricDictionary` default method in the `MetricLoader` interface that allows dimension-dependent metrics by providing a dimension dictionary given by `ConfigurationLoader`
- Corrected generality on with methods
  - Changed `DataApiRequest` methods to not refer to the implementation classes.
- Avoid casting to generate SimplifiedIntervalList
  - Some downstream projects generated partial intervals as `ArrayList`, which cannot be cast to `SimplifiedIntervalList` in places like `getPartialIntervalsWithDefault`. The result is a casting exception that crashes downstream applications. Casting is replaced with explicit `SimplifiedIntervalList` object creation.
- ResponseProcessor is now injectable
  - To add a custom `ResponseProcessor`, implement `ResponseProcessorFactory`, and override `AbstractBinderFactory::buildResponseProcessorFactory` to return your custom `ResponseProcessorFactory.class`. A sketch follows this entry.
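A minimal sketch of that override, assuming (as the entry implies) that the binder method returns a `Class` token; the custom factory name and the generic bound are illustrative, not Fili's exact signatures, and imports are omitted:

```java
public class CustomBinderFactory extends AbstractBinderFactory {

    // Hypothetical custom factory; the generic bound on the return type is an assumption.
    @Override
    protected Class<? extends ResponseProcessorFactory> buildResponseProcessorFactory() {
        return CustomResponseProcessorFactory.class;
    }
}
```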
- Add config to ignore partial/volatile intervals and cache everything in cache V2
  - In cache V2, the user should be able to decide whether partial or volatile data should be cached. This PR adds a config that allows the user to do this.
- Lift required override on deprecated method in MetricLoader
  - Added a default implementation to the deprecated `loadMetricDictionary` in `MetricLoader` so that downstream projects are able to implement the new version without worrying about the deprecated version.
- Added DataApiRequestFactory layer
  - Replaced static construction of DataApiRequest with an injectable factory
  - Created an additional constructor for DataApiRequestImpl which unpacks the config resources bundle to make it easier to override dictionaries.
- Refactored HttpResponseMaker to allow for custom ResponseData implementations
  - Currently, ResponseData is created directly when building a response in the HttpResponseMaker. This creation has been extracted into a factory method, which subclasses of HttpResponseMaker can override.
  - Changed relevant methods and fields from private to protected.
- Move `makeRequest` from test to `JerseyTestBinder`
  - Some tests use the variable name `jerseyTestBinder`; some use `jtb`. They are all renamed to the former for naming conformance
  - Re-indented testing strings for better code formatting
- Moved availabilities-to-metrics construction to MetricUnionCompositeTableDefinition
  - Currently, the availability-to-metrics construction takes place even before the availability is loaded. Hence, the construction is moved to MetricUnionCompositeTableDefinition so that availability is loaded first.
- Better programmatic generation of metadata json in tests
  - Reworked metadata tests to be generated more from strings and to be more pluggable, to support heavier and more expressive testing. This allows for more consistency, as well as making it easier to test more cases.
- Ability to use custom rate limiting schemes
  - Allows users to rate limit based on criteria other than the default criteria.
  - Existing rate limiting code is now located in `DefaultRateLimiter`.
  - Create a new rate limiter by:
    - implementing the `RateLimiter` interface
    - overriding the `buildRateLimiter` method in a concrete implementation of `AbstractBinderFactory` to return the custom `RateLimiter` implementation
  - A default token that uses a callback mechanism is available. `CallbackRateLimitRequestToken` takes an implementation of the callback interface `RateLimitCleanupOnRequestComplete`. When the request is completed, the token calls the `cleanup` method of the callback to handle releasing any resources associated with the in-flight request that the token belongs to. A sketch of the wiring follows this entry.
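A minimal sketch of that wiring, under stated assumptions: the type and method names come from the entry above, but the custom limiter is hypothetical and imports are omitted:

```java
public class CustomBinderFactory extends AbstractBinderFactory {

    // Method named per the entry above; returning a custom implementation
    // replaces the DefaultRateLimiter behavior.
    @Override
    protected RateLimiter buildRateLimiter() {
        return new PerUserRateLimiter();  // illustrative custom RateLimiter implementation
    }
}
```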
- Expose `RequestLog` `LogInfo` objects
  - Exposes the `LogInfo` objects stored in the `RequestLog`, via `RequestLog::retrieveAll`, making it easier for customers to implement their own scheme for logging the `RequestLog`.
- Display corrected case on StorageStrategy serialization
  - The default serialization of an enum is `name()`, which is final and thus cannot be overridden. An API method is added to return the API name of a storage strategy.
- Made StorageStrategy lower case
- Make shareable methods accessible to all types of API requests
  - As non-data endpoints behave more like data endpoints, some methods deserve to be shared among all types of API requests. Methods for the following are made available:
    - parsing and generating `LogicalMetrics`
    - parsing and generating `LogicalTable`
    - computing the union of constrained availabilities of a constrained logical table
- Substitute preflight method wildcard character with explicit allowed methods
  - Modified the ResponseCorsFilter Allowed Methods header to explicitly list allowed methods. Some browsers do not support a wildcard header value.
- Make FieldAccessorPostAggregation able to reference post aggregations in addition to aggregations
  - Druid allows (but does not protect against ordering) post-aggregation trees referencing columns that are also post-aggregation trees. This makes it possible to send such a query by using a field accessor to reference another query expression. Using this capability may have some risk.
- Modify FullResponse JSON objects to contain a flag showing whether a response is new or fetched from the cache.
- Fix wrong default druid url and broken getInnerMostQuery
  - Commented out the wrong default Druid broker URL in the module config that broke old URL config compatibility, and added a check to validate URLs in `DruidClientConfigHelper`
  - Fixed the broken `getInnermostQuery` method in `DruidQuery`
- Rename filter variables and methods in DataApiRequest
  - The method names `getFilter` and `getFilters` can be confusing, as can the `filters` variable
- Decoupled from static dimension lookup building
  - Instead of `ModelUtils`, created an interface for `ExtractionFunctionDimension` and rebased `LookupDimension` and `RegisteredLookupDimension` on that interface. `LookupDimensionToDimensionSpec` now uses only the Extraction interface to decide how to serialize dimensions.
- DruidDimensionLoader is now a more generic DimensionValueLoadTask
  - The `DimensionValueLoadTask` takes in a collection of `DimensionValueLoader`s to allow for non-Druid dimensions to be loaded.
- DruidQuery::getInnerQuery and Datasource::getQuery return Optional
  - Returning `Optional` is more correct for their usage and should protect against unexpected null values.
- Use all available segment metadata in fili-generic-example
  - The fili-generic-example now uses all segment metadata given by Druid instead of just the first one, and also provides it to the metadata service.
- Refactor Response class and implement new serialization logic
  - Defined the interface `ResponseWriter` and its default implementation
  - Refactored the `Response` class, splitting it into `ResponseData` and three implementations of `ResponseWriter`
  - Defined the interface `ResponseWriterSelector` and its default implementation
  - Hooked up the new serialization logic with `HttpResponseMaker` to replace the old one
- LuceneSearchProvider needs to handle nulls
  - The Lucene search provider cannot handle null load values. Treat all null values as empty strings.
- Make AvroDimensionRowParser.parseAvroFileDimensionRows support consumer model
  - In order to do deferred/buffered file reading, created a callback-style method.
- Make HttpResponseMaker injectable and change function signatures related to custom response creation
  - Made `HttpResponseMaker` injectable. `DataServlet` and `JobsServlet` now take `HttpResponseMaker` as an input parameter
  - Added `ApiRequest` to `BuildResponse`, `HttpResponseChannel` and `createResponseBuilder` to enable passing information needed by customizable serialization
  - Removed duplicate parameters such as `UriInfo` that can be derived from `ApiRequest`
- Change id field in DefaultDimensionField to lower case for Navi compatibility
  - Navi's default setting only recognizes a lower-case 'id' key name.
- Fix a bug where table loader uses nested compute-if-absent
  - Nesting `computeIfAbsent` calls on maps can cause a lot of issues in the map internals that lead to weird behavior; the nesting structure is now removed. An illustration follows this entry.
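To illustrate why (a self-contained sketch, unrelated to Fili's actual table loader code): `HashMap.computeIfAbsent` must not modify the map from inside its mapping function, which is exactly what a nested call does.

```java
import java.util.HashMap;
import java.util.Map;

public class NestedComputeIfAbsent {

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();

        // Broken pattern: the inner computeIfAbsent mutates the map while the outer
        // call is still working on its internals. Depending on the JDK version this
        // corrupts the table or throws ConcurrentModificationException.
        //
        //     map.computeIfAbsent("outer", k -> map.computeIfAbsent("inner", k2 -> "value"));

        // Safe pattern: resolve the inner value first, then do a single top-level insert.
        String inner = map.computeIfAbsent("inner", k -> "value");
        map.computeIfAbsent("outer", k -> inner);

        System.out.println(map);  // {inner=value, outer=value}
    }
}
```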
- Convert null avro record value to empty string
  - Made `AvroDimensionRowParser` convert null record values into empty strings to avoid NPEs
- FailedFuture is replaced by CompletedFuture
  - CompletedFuture allows values to be returned when calling `.get` on a future, instead of just throwing an exception (see the JDK sketch following this entry)
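The JDK's `CompletableFuture` shows the same distinction (a sketch in plain Java; Fili's `CompletedFuture` and `FailedFuture` are its own helper classes and are not shown here):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class CompletedVersusFailed {

    public static void main(String[] args) throws Exception {
        // A completed future hands a value back from get()...
        Future<String> completed = CompletableFuture.completedFuture("cached-result");
        System.out.println(completed.get());  // prints "cached-result"

        // ...while a failed future can only throw from get().
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("no result"));
        try {
            failed.get();
        } catch (ExecutionException expected) {
            System.out.println("failed future threw: " + expected.getCause());
        }
    }
}
```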
- Extraction function on selector filter
  - Deprecated `ExtractionFilter` since it is deprecated in Druid; use the selector filter with an extraction function instead
- Rename filter variables and methods in DataApiRequest
  - Deprecated `getFilters` in favor of `getApiFilters`, and `getFilter` in favor of `getDruidFilter`
- Deprecate `ui_druid_broker` and `non_ui_druid_broker` and add `druid_broker`
- Add dimension dictionary to metric loader
  - Deprecated the single-argument version of `loadMetricDictionary` in `MetricLoader`, in favor of the `loadMetricDictionary` with an additional dimension dictionary argument
- Correct exception message & add missing tests
  - Clarified the exception message thrown by `StreamUtils.throwingMerger`
- Fix lookup metadata loader by pulling the RegisteredLookupDimension
  - The Lookup Metadata Health Check always returned healthy even when some Druid registered lookups were failing to load: instead of checking the load statuses of `RegisteredLookupDimension`, `RegisteredLookupMetadataLoadTask` was checking the status of `LookupDimension`. This PR corrects that behavior.
- Fix 'descriptionription' mis-naming in dimension field
  - This was caused by a "desc" -> "description" string replacement. A string handling method has been added that detects "desc" and transforms it to "description"; if the field already comes with "description", no transformation is made.
- We want to cache partial or volatile data when `cache_partial_data` is set to true. This condition was reversed; this PR fixes it.
- Pagination links on the first pages were missing the perPage param. This PR fixes the problem.
- None show clause was not being respected
  - Changed `ResponseData` and `JsonApiResponseWriter` to suppress columns that don't have associated dimension fields.
  - Updated tests to reflect none being hidden.
- Scoped metric dictionaries and the having clause now work together by default
  - Added a new ApiHavingGenerator that builds a temporary metric dictionary from the set of requested metrics (not from the globally scoped metric dictionary), and then uses those to resolve the having clause.
  - Added table-generating functions in BaseTableLoader that effectively allow the customer to provide a different metric dictionary at a lower scope (not the globally scoped metric dictionary) for use when building each table.
- Debug BardQueryInfo to show query split counting
  - The query counter in `BardQueryInfo` did not show up in logging because the counter used to be static, and the JSON serializer does not serialize static fields.
  - This externalizes the state via a getter for serialization.
- Fix intermittent class scanner error on DataSourceConstraint equals
  - Class Scanner Spec was injecting an improper dependent field due to type erasure. Made the field type explicit.
- Fix tests with wrong time offset calculation
  - Time-checking based tests set up the time offset in a wrong way. `timeZoneId.getOffset` is fixed to take the right argument.
- Handle Number Format errors from empty or missing cardinality value
- Fix lucene search provider replace method
  - Reopens the search index
- Fix ConstantMaker make method with LogicalMetricInfo class
  - The ConstantMaker make method needed to be rewritten to use the LogicalMetricInfo class.
- Slices endpoint returns druid name instead of api name
  - The slices endpoint now gives the Druid name instead of the API name for dimensions.
- Prevent NPE in test due to null instance variables in DataApiRequest
  - A particular order of loading `ClassScannerSpec` classes results in a `NullPointerException` and fails tests, because some instance variables of the testing `DataApiRequest` are null. This patch assigns non-null values to those variables.
  - The testing constructor `DataApiRequestImpl()` is now deprecated and will be removed entirely.
- Fix Lucene Cardinality in New KeyValueStores
  - Fixed Lucene to put the correct cardinality value into a new key value store that does not contain the cardinality key
- Log stack trace at error on unexpected DimensionServlet failures
  - DimensionServlet was using debug to log unexpected exceptions and was not printing the stack trace
- Fix datasource name / physical table name mismatch in VolatileDataRequestHandler
  - Fixed fetching from the `PhysicalTableDictionary` using the datasource name; the proper physical table name is now used instead.
- Fix performance bug around feature flag
  - BardFeatureFlag, when used in a tight loop, is very expensive. The underlying map configuration copies the config map on each access.
  - Switched to lazy value evaluation
  - Added a reset contract so changes to feature flags can be directly reverted rather than going through the `SystemConfig` directly
- Fix deploy branch issue where substrings of whitelisted branches could be released
- Fix availability testing utils to be compatible with composite tables
  - Fixed the availability testing util `populatePhysicalTableCacheIntervals` to assign a `TestAvailability` that will serialize correctly, instead of always `StrictAvailability`
  - Fixed the internal representation of `VolatileIntervalsFunction` in `DefaultingVolatileIntervalsService` from `Map<PhysicalTable, VolatileIntervalsFunction>` to `Map<String, VolatileIntervalsFunction>`
- Fix metric and dimension names for wikipedia-example
  - The metrics and dimensions configured in `fili-wikipedia-example` differed from those in Druid, and as a result the queries sent to Druid were incorrect
- Remove testing constructor of *ApiRequestImpl
  - It is better practice to separate testing code from implementation. All testing constructors of the following API requests are removed: `ApiRequestImpl`, `DataApiRequestImpl`, `DimensionsApiRequestImpl`, `MetricsApiRequestImpl`, `SlicesApiRequestImpl`, `TablesApiRequestImpl`
  - Meanwhile, construction of testing API requests is delegated to testing classes, e.g. `TestingDataApiRequestImpl`
- Reverted the druid name change in slices endpoint, instead added factName
  - Reverting PR-419 (yahoo#419) so that `name` still points to the API name, and adding `factName`, which points to the Druid name. `name` was not valid in cases where the dimension is a Lookup dimension, because it was pointing to the base dimension name; that change is reverted, and `druidName` is added as the actual Druid fact name, with `name` being the API name.
- Remove custom immutable collections in favor of Guava
  - `Utils.makeImmutable(...)` was misleading and unneeded, so it has been removed. Use Guava's immutable collections instead (example below).
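For example, a sketch of the Guava replacement (the variable names are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.google.common.collect.ImmutableList;

public class GuavaImmutableExample {

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(Arrays.asList("a", "b"));

        // copyOf makes a defensive, truly immutable copy (and is a no-op copy
        // when the input is already an ImmutableList).
        List<String> frozen = ImmutableList.copyOf(mutable);

        mutable.add("c");              // does not affect the frozen copy
        System.out.println(frozen);    // [a, b]
        // frozen.add("d");            // would throw UnsupportedOperationException
    }
}
```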
- Remove dependency on org.apache.httpcomponents
  - This library was only used in `fili-wikipedia-example` and has been replaced with AsyncHttpClient.
- Replace uses of org.json with the Jackson equivalent
- Remove NO_INTERVALS from SimplifiedIntervalList
  - This shared instance was vulnerable to being changed globally. All calls to it have been replaced with the empty constructor
The main changes in this version are changes to the Table and Schema structure, including a major refactoring of Physical Table. The concept of Availability was split off from Physical Table, allowing Fili to better reason about availability of columns in Data Sources in ways that it couldn't easily do before, like in the case of Unions. As part of this refactor, Fili also gains 1st-class support for queries using the Union data source.
Full description of changes to Tables, Schemas, Physical Tables, Availability, PartialDataHandler, etc. tbd
This was a long and winding journey this cycle, so the changelog is not nearly as tight as we'd like (hopefully we'll come back and consolidate it for this release), but all of the changes are in there. Along the way, we also addressed a number of other small concerns. Here are some of the highlights beyond the main changes around Physical Tables:
Fixes:
- Unicode characters are now properly sent back to Druid
- Druid client now follows redirects
New Capabilities & Enhancements:
- Can sort on `dateTime`
- Can use the Druid query response for final verification of response partiality
- Class Scanner Spec can discover dependencies, making its dynamic equality testing easier to use
- There's an example application that shows how to slurp configuration from an existing Druid instance
- Druid queries return a `Future` instead of `void`, allowing for blocking requests if needed (though use sparingly!)
- Support for extensions defining new Druid query types
Performance upgrades:
- Lazy DruidFilters
- Assorted log level reductions
- Lucene "total results" 50% speedup
Deprecations:
- `DataSource::getDataSources` no longer makes sense, since `UnionDataSource` only supports 1 table now
- `BaseTableLoader::loadPhysicalTable`. Use `loadPhysicalTablesWithDependency` instead
- `LogicalMetricColumn` isn't really a needed concept

Removals:
- `PartialDataHandler::findMissingRequestTimeGrainIntervals`
- `permissive_column_availability_enabled` feature flag, since the new `Availability` infrastructure now handles this
- Lots of things on `PhysicalTable`, since that system was majorly overhauled
- `SegmentMetadataLoader`, which had been deprecated for a while and relies on no-longer-supported Druid features
- Implement DruidPartialDataRequestHandler
  - Implemented `DruidPartialDataRequestHandler` that injects `druid_uncovered_interval_limit` into the Druid query context
  - Appended `DruidPartialDataResponseProcessor` to the current `ResponseProcessor` chain
  - Added `DruidPartialDataRequestHandler` to `DruidWorkflow` between `AsyncDruidRequestHandler` and `CacheV2RequestHandler`, and invoke the `DruidPartialDataRequestHandler` if `druid_uncovered_interval_limit` is greater than 0
- Deprecate Cache V1 and V2 and log a warning wherever they are used in the codebase
  - Added config param `query_response_caching_strategy` that allows any one of the TTL cache, local signature cache, or etag cache to be used as the caching strategy
  - Added `CacheMode` that represents the caching strategy
  - Added `DefaultCacheMode` that represents all available caching strategies
  - Made `AsyncDruidWebServiceImpl::sendRequest` not blow up when getting a 304 status response if the etag cache is on
  - Added an etag header to the response JSON if the etag cache is set to be used
  - Added `FeatureFlag::isSet` to expose whether feature flags have been explicitly configured
- Implement DruidPartialDataResponseProcessor
  - Added `FullResponseProcessor` interface that extends `ResponseProcessor`
  - Added response status code to the JSON response
  - Added `DruidPartialDataResponseProcessor` that checks for any missing data that's not being found
- Add `DataSourceName` concept, removing responsibility from `TableName`
  - `TableName` was serving double duty, and it was causing problems and confusion. Splitting the concepts fixes it.
- Add a `BaseMetadataAvailability` as a parallel to `BaseCompositeAvailability`
  - `ConcreteAvailability` and `PermissiveAvailability` both extend this new base `Availability`
- Constrained Table Support for Table Serialization
  - Added ConstrainedTable, which closes over a physical table and an availability, caching all availability merges.
  - Added a PartialDataHandler method to use `ConstrainedTable`
- Testing: ClassScannerSpec now supports 'discoverable' dependencies
  - Creating a `supplyDependencies` method on a class's spec allows definitions of dependencies for dynamic equality testing
- Moved UnionDataSource to support only single tables
  - `DataSource` now supports the `getDataSource()` operation
- Add new query context for Druid's uncovered interval feature
  - Added a configurable property named "druid_uncovered_interval_limit"
  - Added new response error messages as needed by Partial Data V2
- Merge Druid response header into Druid response body JSON node in AsyncDruidWebServiceImplV2
  - Added configuration to `AsyncDruidWebServiceImpl` so that we can opt in to configuration of the JSON response content. `AsyncDruidWebServiceImpl` takes a strategy for building the JSON from the entire response.
- Add constructor to specify DruidDimensionLoader dimensions directly
- Add `IntervalUtils::getTimeGrain` to determine the grain given an Interval
- Add Permissive Concrete Physical Table Definition
  - Added `PermissiveConcretePhysicalTableDefinition` for defining a `PermissiveConcretePhysicalTable`
- Fix to use physical name instead of logical name to retrieve available interval
  - Added `PhysicalDataSourceConstraint` class to capture the physical names of columns for retrieving available intervals
- `BaseCompositePhysicalTable` provides common operations, such as validating the coarsest ZonedTimeGrain, for composite tables.
- Add reciprocal `satisfies()` relationship complementing `satisfiedBy()` on Granularity
- MetricUnionAvailability and MetricUnionCompositeTable
  - Added `MetricUnionAvailability`, which puts metric columns of different availabilities together, and `MetricUnionCompositeTable`, which puts metric columns of different tables together in a single table.
- Method for finding coarsest ZonedTimeGrain
  - Added a utility method for returning the coarsest `ZonedTimeGrain` from a collection of `ZonedTimeGrain`s. This is useful for constructing composite tables that require the coarsest `ZonedTimeGrain` among a set of tables.
- Should also setConnectTimeout when using setReadTimeout
  - Set connectTimeout on DefaultAsyncHttpClientConfig when building AsyncDruidWebServiceImpl
- CompositePhysicalTable Core Components Refactor
  - Added `ConcretePhysicalTable` and `ConcreteAvailability` to model a table in a Druid datasource and its availability in the new table availability structure
  - Added class variables for `DataSourceMetadataService` and `ConfigurationLoader` into `AbstractBinderFactory` for applications to access
  - Added `loadPhysicalTablesWithDependency` into `BaseTableLoader` to load physical tables with dependencies
- PermissiveAvailability and PermissiveConcretePhysicalTable
  - Added `PermissiveConcretePhysicalTable` and `PermissiveAvailability` to model a table in a Druid datasource and its availability in the new table availability structure. `PermissiveAvailability` differs from `ConcreteAvailability` in the way it returns available intervals: `ConcreteAvailability` returns the available intervals constrained by `DataSourceConstraint` and provides them as an intersection, whereas `PermissiveAvailability` returns them without constraint from `DataSourceConstraint` and provides them as a union. `PermissiveConcretePhysicalTable` differs from `ConcretePhysicalTable` in that the former is backed by `PermissiveAvailability` while the latter is backed by `ConcreteAvailability`.
- Refactor DataSourceMetadataService to fit composite table needs
  - `DataSourceMetadataService` also stores interval data from segment data as an intervals-by-column-name map, and provides the method `getAvailableIntervalsByTable` to retrieve it
- QueryPlanningConstraint and DataSourceConstraint
  - Added `QueryPlanningConstraint` to replace the current interface of Matchers and Resolvers arguments during query planning
  - Added `DataSourceConstraint` to allow implementation of `PartitionedFactTable`'s availability in the near future
- Major refactor for availability and schemas and tables
  - `ImmutableAvailability`: provides an immutable, typed replacement for maps of column availabilities
  - New Table implementations:
    - `BasePhysicalTable`: core implementation
    - `ConcretePhysicalTable`: creates an `ImmutableAvailability`
  - `Schema` implementations:
    - `BaseSchema` has `Columns` and `Granularity`
    - `PhysicalTableSchema` has base plus `ZonedTimeGrain` and name mappings
    - `LogicalTableSchema`: base with builder from table group
    - `ResultSetSchema`: base with transforming with-ers
  - `ApiName`, `TableName`: added static factory from String to Name
  - `ErrorMessageFormat` for errors during the `ResultSetMapper` cycle
- Added default base class for all dimension types
  - Added base classes `DefaultKeyValueStoreDimensionConfig`, `DefaultLookupDimensionConfig` and `DefaultRegisteredLookupDimensionConfig` to create default dimensions.
- dateTime based sort feature for the final ResultSet added
  - We now support dateTime-column-based sort in ASC or DESC order.
  - Added `DateTimeSortMapper` to sort the time buckets, and `DateTimeSortRequestHandler` to inject into the workflow
- dateTime specified as sortable field in sorting clause
  - Added `dateTimeSort` as a class parameter in `DataApiRequest`, so it can be tracked down to decide the resultSet sorting direction.
- Detect unset userPrincipal in Preface log block
  - Logs a warning if no userPrincipal is set on the request (i.e. we don't know who the user is), and sets the `user` field in the `Preface` log block to `NO_USER_PRINCIPAL`.
- Made isOn dynamic on BardFeatureFlag
- Rename `Concrete` to `Strict` for the respective `PhysicalTable` and `Availability`
  - The main difference is in the availability reduction, so make the class name match that.
- Make `PermissiveConcretePhysicalTable` and `ConcretePhysicalTable` extend from a common base
  - Makes the structure match that for composite tables, so they can be reasoned about together.
  - The main difference is in the accepted availabilities, so make the class structure match that.
- Make `MetricUnionAvailability` take a set of `Availability` instead of `PhysicalTable`
  - Since it was just unwrapping anyway, simplifying the dependency and pushing the unwrap upstream makes sense
- Add `DataSourceName` concept, removing responsibility from `TableName`
  - Impacts: `DataSource` & children, `DataSourceMetadataService` & `DataSourceMetadataLoader`, `SegmentIntervalsHashIdGenerator`, `PhysicalTable` & children, `Availability` & children, `ErrorMessageFormat`, `SlicesApiRequest`
- Force `ConcretePhysicalTable` to only take a `ConcreteAvailability`
  - Only a `ConcreteAvailability` makes sense, so let the types enforce it
- Clarify name of built-in static `TableName` comparator
  - Changed to `AS_NAME_COMPARATOR`
- Constrained Table Support for Table Serialization
  - Switched `PartialDataRequestHandler` to use the table from the query rather than the `PhysicalTableDictionary`
  - `DruidQueryBuilder` uses constrained tables to dynamically pick between Union and Table DataSource implementations
  - `PartialDataHandler` now has multiple different entry points depending on pre- or post-constraint conditions
  - `getAvailability` moved to a `ConfigTable` interface, and all configured tables to that interface
  - DataSource implementations bind to `ConstrainedTable`, and only ConstrainedTable is used after table selection
  - `PhysicalTable.getAllAvailableIntervals` explicitly rather than implicitly uses `SimplifiedIntervalList`
  - Bound and default versions of getAvailableIntervals and getAllAvailableIntervals added to the PhysicalTable interface
  - Package-private optimize tests in `DruidQueryBuilder` moved to protected
  - Immutable `NoVolatileIntervalsFunction` class made final
- Moved UnionDataSource to support only single tables
  - `UnionDataSource` now accepts only single tables instead of sets of tables.
  - `DataSource` now supports the `getDataSource()` operation
  - `IntervalUtils.collectBucketedIntervalsNotInIntervalList` moved to `PartialDataHandler`
- The Druid filter is built when requested, NOT at DataApiRequest construction. This will make it easier to write performant `DataApiRequest` mappers.
- Reduce log level of failure to store a result in the asynchronous job store
  - Customers who aren't using the asynchronous infrastructure shouldn't see spurious warnings about a failure to execute one step (which is a no-op for them) in a complex system they aren't using. Until we can revisit how we log and report asynchronous errors, we reduce the log level to `DEBUG` to cut the noise.
- Clean up `BaseDataSourceComponentSpec`
  - Dropped a log from `error` to `trace` when a response comes back as an error
  - Made JSON validation helpers return `boolean` instead of `def`
- Make `BasePhysicalTable` take a more extension-friendly set of `PhysicalTable`s
  - Takes `<? extends PhysicalTable>` instead of just `PhysicalTable`
- Update availabilities for PartitionAvailability
  - Created `BaseCompositeAvailability` for common features
  - Refactored `DataSourceMetadataService` methods to use `SimplifiedIntervalList` to standardize intersections
- Queries to the Druid Web Service now return a Future
  - Queries now return a `Future<Response>` in addition to having method callbacks.
- Refactor Physical Table Definition and Update Table Loader
  - `PhysicalTableDefinition` is now an abstract class; construct using `ConcretePhysicalTableDefinition` instead
  - `PhysicalTableDefinition` now requires a `build` method to be implemented that builds a physical table
  - `BaseTableLoader` now constructs physical tables by calling `PhysicalTableDefinition::build` in `buildPhysicalTablesWithDependency`
  - `BaseTableLoader::buildDimensionSpanningTableGroup` now uses `loadPhysicalTablesWithDependency` instead of the deprecated `loadPhysicalTables`
  - `BaseTableLoader::buildDimensionSpanningTableGroup` no longer takes Druid metrics as arguments; `PhysicalTableDefinition` does instead
- Fix to use physical name instead of logical name to retrieve available interval
  - `ConcreteAvailability::getAllAvailableIntervals` no longer filters out un-configured columns; `PhysicalTable::getAllAvailableIntervals` does instead
  - `Availability::getAvailableIntervals` now takes `PhysicalDataSourceConstraint` instead of `DataSourceConstraint`
  - `Availability` no longer takes a set of columns on the table; only the table needs to know
  - `Availability::getAllAvailableIntervals` now returns a map of column physical name string to interval list, instead of column to interval list
  - `TestDataSourceMetadataService` now takes a map from string to list of intervals, instead of column to list of intervals, in its constructor
- Reduced number of queries sent by `LuceneSearchProvider` by 50% in the common case
  - Before, we were using `IndexSearcher::count` to get the total number of documents, which spawned an entire second query (so two Lucene queries rather than one when requesting the first page of results). We now pull that information from the results of the query directly.
- Allow GranularityComparator to return static instance
  - The implementation of PR #193 suggests a possible improvement on `GranularityComparator`: put the static instance on the `GranularityComparator` class itself, so that everywhere in the system that wants it can just call `GranularityComparator.getInstance()` (sketch below)
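A minimal sketch of that static-instance pattern (`Granularity` is Fili's own interface, import elided; the real comparison logic lives in Fili and is elided here):

```java
import java.util.Comparator;

public class GranularityComparator implements Comparator<Granularity> {

    // Comparators are stateless, so one shared instance is enough for the whole system.
    private static final GranularityComparator INSTANCE = new GranularityComparator();

    public static GranularityComparator getInstance() {
        return INSTANCE;
    }

    @Override
    public int compare(Granularity left, Granularity right) {
        // Fili's real implementation compares granularities by coarseness; elided in this sketch.
        return 0;
    }
}
```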
- Make `TemplateDruidQuery::getMetricField` get the first field instead of any field
  - Previously, order was by luck; now it's by the contract of `findFirst`
- Clean up config loading and add more logs and checks
  - Use the correct logger in `ConfigurationGraph` (was `ConfigResourceLoader`)
  - Add error / logging messages for the module dependency indicator
  - Tweak the loading-resources debug log to read better
  - Tweak the module-found log to read better
  - Convert from `Resource::getFilename` to `Resource::getDescription` when reporting errors in the configuration graph. `getDescription` is more informative, usually holding the whole file path rather than just the terminal segment / file name
- Base TableDataSource serialization on ConcretePhysicalTable fact name
- CompositePhysicalTable Core Components Refactor
  - `TableLoader` now takes an additional constructor argument (`DataSourceMetadataService`) for creating tables
  - `PartialDataHandler::findMissingRequestTimeGrainIntervals` now takes `DataSourceConstraint`
  - Renamed the `buildTableGroup` method to `buildDimensionSpanningTableGroup`
- Restored flexibility about columns for query from DruidResponseParser
  - Immutable schemas prevented custom query types from changing `ResultSetSchema` columns.
  - Columns are now sourced from `DruidResponseParser` and default-implemented on `DruidAggregationQuery`
- Refactor DataSourceMetadataService to fit composite table needs
  - `BasePhysicalTable` now stores the table name as a `TableName` instead of a `String`
  - `SegmentInfo` now stores dimensions and metrics from segment data for constructing the column-to-available-interval map
- QueryPlanningConstraint and DataSourceConstraint
  - `QueryPlanningConstraint` replaces the current interface of Matchers and Resolvers `DataApiRequest` and `TemplateDruidQuery` arguments during query planning
  - Modified the `findMissingTimeGrainIntervals` method in `PartialDataHandler` to take a set of columns instead of `DataApiRequest` and `DruidAggregationQuery`
- Major refactor for availability and schemas and tables
  - `Schema` and `Table` became interfaces
  - `Table` has-a `Schema`
  - `PhysicalTable` extends `Table`; the interface only supports read-only operations
  - `Schema` constructed as immutable; `Column`s no longer bind to `Schema`
    - Removed `addNew*Column` methods
  - `Schema` implementations are now: `BaseSchema`, `PhysicalTableSchema`, `LogicalTableSchema`, `ResultSetSchema`
  - `DimensionLoader` uses `ConcretePhysicalTable`
  - `PhysicalTableDefinition` made some fields private, accepts iterables, returns immutable dimensions
  - `ResultSet` constructor parameter order swapped
  - `ResultSetMapper` now depends on `ResultSetSchema`
  - `TableDataSource` constructor arg narrows: `PhysicalTable` -> `ConcreteTable`
  - `DataApiRequest` constructor arg narrows: `Table` -> `LogicalTable`
  - `DruidQueryBuilder` is now polymorphic on building data source models from the new physical tables
  - `ApiFilter` schema validation moved to `DataApiRequest`
  - Guava version bumped to 21.0
- Added support for extensions defining new Query types
  - `TestDruidWebService` assumes unknown query types behave like `GroupBy`, `TimeSeries`, and `TopN`
  - `ResultSetResponseProcessor` delegates to `DruidResponseProcessor` to build the expected query schema, allowing subclasses to override and extend the schema behavior
- Make HealthCheckFilter reject message nicer
  - The previous message of `reject <url>` wasn't helpful, useful, nor very nice to users, and the message logged was not very useful either. The message has been made nicer (`Service is unhealthy. At least 1 healthcheck is failing`), and the log has been made better as well.
- `RequestLog` timings support the try-with-resources block
  - A block of code can now be timed by wrapping the timed block in a try-with-resources block that starts the timer. Note: this won't work when performing timings across threads or across contexts; those need to be started and stopped manually. See the sketch below.
- Clean up logging and responses in `DimensionCacheLoaderServlet`
  - Switched a number of `error`-level logs to `debug` level to line up with logging guidance when request failures were the result of client error
  - Reduced some `info`-level logs down to `debug`
  - Converted to a 404 when the error was caused by not finding a path element
- Update LogBack version 1.1.7 -> 1.2.3
  - In web applications, logback-classic will automatically install a listener which will stop the logging context and release resources when your web app is reloaded.
  - Logback-classic now searches for the file `logback-test.xml`, then `logback.groovy`, and then `logback.xml`. In previous versions `logback.groovy` was looked up first, which was nonsensical in the presence of `logback-test.xml`
  - `AsyncAppender` no longer drops events when the current thread has its interrupt flag set.
  - Critical parts of the code now use `COWArrayList`, a custom-developed, allocation-free, lock-free, thread-safe implementation of the `List` interface. It is optimized for cases where iterations over the list vastly outnumber modifications on the list. It is based on `CopyOnWriteArrayList` but allows allocation-free iterations over the list.
- Update Metrics version 3.1.2 -> 3.2.2
  - Added support for disabling reporting of metric attributes.
  - Support for setting a custom initial delay for reporters.
  - Support for custom details in the result of a health check.
  - Support for asynchronous health checks
  - Added a listener for health checks.
  - Health checks are reported as unhealthy on exceptions.
  - Added support for Jetty 9.3 and higher.
  - Shutdown health check registry
  - Add support for the default shared health check registry name
- Update SLF4J version 1.7.21 -> 1.7.25
  - When running under Java 9, log4j version 1.2.x is unable to correctly parse the "java.version" system property. Assuming an incorrect Java version, it proceeded to disable its MDC functionality. The slf4j-log4j12 module shipping in this release fixes the issue by tweaking MDC internals by reflection, allowing log4j to run under Java 9.
  - The slf4j-simple module now uses the latest reference to `System.out` or `System.err`.
  - In the slf4j-simple module, added a configuration option to enable/disable caching of the `System.out`/`System.err` target.
- Update Lucene version 5.3.0 -> 6.5.0
  - Added `IndexSearcher::getQueryCache` and `getQueryCachingPolicy`
  - `org.apache.lucene.search.Filter` is now deprecated. You should use `Query` objects instead of Filters, and the `BooleanClause.Occur.FILTER` clause in order to let Lucene know that a `Query` should be used for filtering but not scoring.
  - `MatchAllDocsQuery` now has a dedicated `BulkScorer` for better performance when used as a top-level query.
  - Added an `IndexWriter::getFieldNames` method (experimental) to return all field names as visible from the `IndexWriter`. This would be useful for `IndexWriter::updateDocValues` calls, to prevent calling with non-existent docValues fields
- Revert deprecation of getAvailableInterval with PhysicalDatasourceConstraint
  - The method is needed in order for availability to function correctly; a deeper dive and planning are required to actually deprecate it in favor of a simpler, less confusing design.
- Remove `PhysicalTable::getTableName` and use `getName` instead
  - Having more than one method for the same concept (i.e. what's the name of this physical table) was confusing and not very useful.
- Remove `PhysicalTableDictionary` dependency from `SegmentIntervalHashIdGenerator`
  - Constructors taking the dictionary have been deprecated, since it is not used any more
- Add `DataSourceName` concept, removing responsibility from `TableName`
  - Impacts:
    - `DataSourceMetadataService` & `DataSourceMetadataLoader`
    - `ConcretePhysicalTable`
- Deprecate old static `TableName` comparator
  - Changed to `AS_NAME_COMPARATOR` since it's more descriptive
- Constrained Table Support for Table Serialization
  - Deprecated static empty instance of `SimplifiedIntervalList.NO_INTERVALS`
    - It looks like an immutable singleton, but it's mutable and therefore unsafe. Just make new instances of `SimplifiedIntervalList` instead.
  - Deprecated `PartialDataRequestHandler` constructor using `PhysicalTableDictionary`
- Moved UnionDataSource to support only single tables
  - `DataSource::getDataSources` no longer makes sense, since `UnionDataSource` only supports one table now
-
  - Added `lucene-backward-codecs.jar` as a dependency to restore support for indexes built on earlier instances.
  - Support for indexes will only remain while the current Lucene generation supports them. All Fili users should rebuild indexes on Lucene 6 to avoid later pain.
- Refactor Physical Table Definition and Update Table Loader
  - Deprecated `BaseTableLoader::loadPhysicalTable`. Use `loadPhysicalTablesWithDependency` instead.
- `CompositePhysicalTable` Core Components Refactor
  - Deprecated `BasePhysicalTable::setAvailability` to discourage using it for testing
- `RequestLog::stopMostRecentTimer` has been deprecated
  - This method is part of the infrastructure to support the recently deprecated `RequestLog::switchTiming`.
- `LogicalMetricColumn` doesn't need a 2-arg constructor
  - It's only used in one place, and there's no real need for it because the other constructor does the same thing
- `DimensionColumn`'s 2-arg constructor is only used by a deprecated class
  - When that deprecated class (`LogicalDimensionColumn`) goes away, this constructor will go away as well
- Fix druid partial data and partition table incompatibility
  - Datasource names returned by the partition table now contain only datasources that are actually used in the query
  - Fixed the problem where Druid reports uncovered intervals for partition-table datasources that Fili filtered out
- Fix the generic example for loading multiple tables
  - Loading multiple tables caused it to hang and eventually time out.
  - Also fixed an issue causing all tables to show the same set of dimensions.
- Support for Lucene 5 indexes restored
  - Added `lucene-backward-codecs.jar` as a dependency to restore support for indexes built on earlier instances.
- Specify the character encoding to support unicode characters
  - The default character set used by the back end was mangling Unicode characters.
- Correct empty-string behavior for druid header supplier class config
  - An empty string would have tried to build a custom supplier. Now it doesn't.
- Default the `AsyncDruidWebServiceImpl` to follow redirects
  - It previously defaulted to not following redirects; now it follows them appropriately
- Reenable custom query types in `TestDruidWebService`
- Fixed `SegmentMetadataLoader` Unconfigured Dimension Bug
  - Immutable availability was failing when attempting to bind segment dimension columns not configured in the dimension dictionary.
  - Fixed to filter out irrelevant column names.
- Major refactor for availability and schemas and tables
  - Ordering of fields on serialization could be inconsistent if intermediate stages used `HashSet` or `HashMap`.
  - Several constructors switched to accept `Iterable` and return `LinkedHashSet` to emphasize the importance of ordering and prevent `HashSet` intermediates which disrupt ordering.
- Fix Lookup Dimension Serialization
  - Fixed a bug where a lookup dimension was serialized as a dimension spec in both the outer and inner query
- Correct error message logged when no table schema match is found
- Setting `readTimeout` on `DefaultAsyncHttpClientConfig` when building `AsyncDruidWebServiceImpl`
- Refactor Physical Table Definition and Update Table Loader
  - Removed deprecated `PhysicalTableDefinition` constructor that takes a `ZonelessTimeGrain`. Use `ZonedTimeGrain` instead
  - Removed `BaseTableLoader::buildPhysicalTable`. Table building logic has moved to `PhysicalTableDefinition`
- Move UnionDataSource to support only single tables
  - `DataSource` no longer accepts `Set<Table>` in a constructor
- CompositePhysicalTable Core Components Refactor
  - Removed deprecated method `PartialDataHandler::findMissingRequestTimeGrainIntervals`
  - Removed `permissive_column_availability_enabled` feature flag support and corresponding functionality in `PartialDataHandler`. Permissive availability is instead handled via table configuration, and continued usage of the configuration field generates a warning when Fili starts.
  - Removed `getIntersectSubintervalsForColumns` and `getUnionSubintervalsForColumns` from `PartialDataHandler`. `Availability` now handles these responsibilities.
  - Removed `getIntervalsByColumnName`, `resetColumns` and `hasLogicalMapping` methods in `PhysicalTable`. These methods were either part of the availability infrastructure, which changed completely, or the responsibilities have moved to `PhysicalTableSchema` (in the case of `hasLogicalMapping`).
  - Removed `PartialDataHandler::getAvailability`. `Availability` (on the PhysicalTables) has taken its place.
  - Removed `SegmentMetadataLoader` because the endpoint this relied on had been deprecated in Druid. Use the `DataSourceMetadataLoader` instead.
    - Removed `SegmentMetadataLoaderHealthCheck` as well.
- Major refactor for availability and schemas and tables
  - Removed `ZonedSchema` (all methods moved to child class `ResultSetSchema`)
  - `PhysicalTable` no longer supports mutable availability
    - Removed `addColumn`, `removeColumn`, `getWorkingIntervals`, and `commit`
    - Other mutators no longer exist; availability is immutable
  - Removed `getAvailableIntervals`. `Availability::getAvailableIntervals` replaces it.
  - Removed `DruidResponseParser::buildSchema`. That logic has moved to the `ResultSetSchema` constructor.
  - Removed redundant `buildLogicalTable` methods from `BaseTableLoader`
This patch back-ports a fix for getting Druid to handle international / UTF character sets correctly. It is included in the v0.8.x stable releases.
- Specify the character encoding to support unicode characters
  - The default character set used by the back end was mangling Unicode characters.
This release is a mix of fixes, upgrades, and interface clean-up. The general themes for the changes are around metric configuration, logging and timing, and adding support for tagging dimension fields. Here are some of the highlights, but take a look in the lower sections for more details.
Fixes:
- Deadlock in `LuceneSearchProvider`
- CORS support when using the `RoleBasedAuthFilter`
New Capabilities & Enhancements:
- Dimension field tagging
- Controls around max size of Druid response to cache
- Logging and timing enhancements
Deprecations / Removals:
- `RequestLog::switchTiming` is deprecated due to its difficulty to use correctly
- Metric configuration has a number of deprecations as part of the effort to make configuration easier and less complex
Changes:
- There was a major overhaul of Fili's dependencies to upgrade their versions
- Dimension Field Tagging and Dynamic Dimension Field Serialization
  - Added a new module `fili-navi` for components added to support Navi
  - Added `TaggedDimensionField` and related components in `fili-navi`
- Ability to prevent caching of Druid responses larger than the maximum size supported by the cache
  - Supported for both Cache V1 and V2
  - Controlled with the `bard__druid_max_response_length_to_cache` setting
  - Default value is `MAX_LONG`, so no cache prevention will happen by default
- Log a warning if `SegmentMetadataLoader` tries to load empty segment metadata
  - While not an error condition (e.g. configuration migration), it's unusual and likely shouldn't stay this way long
- More descriptive log message when no physical table is found due to schema mismatch
  - The previous log message was user-facing only, and not as helpful as it could have been
- Logs more fine-grained timings of the request processing workflow
- Added RegisteredLookupDimension and RegisteredLookupExtractionFunction
  - This enables supporting Druid's most recent evolution of the Query Time Lookup feature
-
  - This version is rudimentary. See issue 120 for future plans.
- Added MetricField accessor to the interface of LogicalMetric
  - Previously, accessing the metric field involved three method calls
- Ability for `ClassScanner` to instantiate arrays
  - This allows for more robust testing of classes that make use of arrays in their constructor parameters
-
  - Code to automatically test that a module is correctly configured.
- The druid query posting timer has been removed
  - There wasn't really a good way to stop timing only the posting itself. Since the timer is probably not that useful, it has been removed.
- Dimension Field Tagging and Dynamic Dimension Field Serialization
  - Changed the `fili-core` dimension endpoint `DimensionField` serialization strategy from hard-coded static attributes to dynamic serialization based on the `jackson` serializer
- MetricMaker cleanup and simplification
  - Simplified raw aggregation makers
  - `ConstantMaker` now throws an `IllegalArgumentException` wrapping the raw NumberFormatException on a bad argument
  - `FilteredAggregation` no longer requires a metric name to be passed in. (The aggregation field name is used)
  - `FilteredAggregationMaker` now accepts a metric to the `make` method instead of binding at construction time.
  - `ArithmeticAggregationMaker` default now uses `NoOpResultSetMapper` instead of the rounding mapper. (breaking change)
  - `FilteredAggregationMaker`, `SketchSetOperationMaker` members are now private
- Used Metric Field accessor to simplify maker code
  - Using the metric field accessor simplifies and enables streaminess in maker code
- Fili's name for a PhysicalTable is decoupled from the name of the associated table in Druid
-
  - This should never be a user fault, since that check is much earlier
- Make `SegmentMetadata::equals` `null`-safe
  - It was not properly checking for `null` before and could have exploded
- Default DimensionColumn name to use apiName instead of physicalName
  - Changed `DimensionColumn.java` to use the dimension API name instead of the physical name as its name
  - Modified files dependent on `DimensionColumn.java` and corresponding tests according to the above change
- Remove restriction for single physical dimension to multiple lookup dimensions
  - Changed the physical-to-logical dimension name mapping into a `Map<String, Set<String>>` instead of `Map<String, String>` in `PhysicalTable.java`
- SegmentMetadataLoader includes provided request headers
  - `SegmentMetadataLoader` now sends requests with the provided request headers in `AsyncDruidWebservice`
  - Refactored the `AsyncDruidWebserviceSpec` test and added a test checking that `getJsonData` includes request headers too
- Include physical table name in warning log message for logicalToPhysical mapping
  - Without this name, it's hard to know which table seems to be misconfigured.
- `ResponseValidationException` uses `Response.StatusType` rather than `Response.Status`
  - `Response.StatusType` is the interface that `Response.Status` implements.
  - This will have no impact on current code in Fili that uses `ResponseValidationException`, and it allows customers to inject HTTP codes not included in `Response.Status`.
- Removed the "provided" modifier for SLF4J and Logback dependencies in the Wikipedia example
Unless otherwise noted, all dependency upgrades are for general stability and performance improvement. The called-out changes are only those that are likely of interest to Fili. Any dependency upgrade for which a changelog could not be found has not been linked to one; all other upgrades include a link to the relevant changelog.

WARNING: There is a known dependency conflict between Apache Commons Configuration 1.6 and 1.10. If, after upgrading to the latest Fili, your tests begin to fail with `NoClassDefFoundError`s, it is likely that you are explicitly depending on Apache Commons Configuration 1.6. Removing that dependency or upgrading it to 1.10 should fix the issue.

- Gmaven plugin 1.4 -> 1.5
- Guava 16.0.1 -> 20.0
- Jedis 2.7.2 -> 2.9.0:
  - Geo command support and binary mode support
  - ZADD support
  - Ipv6 and SSL support
  - Other assorted feature and Redis support upgrades
- Redisson 2.2.13 -> 3.1.0:
  - Support for binary stream in and out of Redisson
  - Lots of features for distributed data structure capabilities
  - Can make fire-and-forget style calls in ack-response-only modes
  - Many fixes and improvements for PubSub features
  - Support for command timeouts
  - Fixed bug where connections did not always close when RedisClient shut down
  - Breaking API changes:
    - Moved config classes to own package
    - Moved core classes to api package
    - Moved to Redisson's RFuture instead of netty's Future
- JodaTime 2.8.2 -> 2.9.6:
  - Faster TZ parsing
  - Added `Interval.parseWithOffset`
  - GMT fix for JDK 8u60
  - Fixed Interval overflow bug
  - TZ data update from 2015g to 2016i
- AsyncHttpClient 2.0.2 -> 2.0.24:
  - Custom header separator fix
  - No longer double-wrapping CompletableFuture exceptions
- Apache HttpClient 4.5 -> 4.5.2:
  - Supports handling a redirect response to a POST request
  - Fixed deflate zlib header issue
- RxJava 1.1.5 -> 1.2.2:
  - Deprecate TestObserver in favor of TestSubscriber
- Spymemcached 2.12.0 -> 2.12.1
- org.json 20141113 -> 20160810
- Maven release plugin 2.5 -> 2.5.3:
  - Fixes `release:prepare` not committing pom.xml if not in the git root
  - Fixes version update not updating inter-module dependencies
  - Fixes version update failing when project is not a SNAPSHOT
- Maven antrun plugin 1.7 -> 1.8
- Maven compiler plugin 3.3 -> 3.6.0:
  - Fix for compiler fail in Eclipse
- Maven surefire plugin 2.17 -> 2.19.1:
  - Correct indentation for Groovy's power asserts
- Maven javadoc plugin 2.10.3 -> 2.10.4
- Maven site plugin 3.5 -> 3.6
- SLF4J 1.7.12 -> 1.7.21:
  - Fix to MDC adapter, leaking information to non-child threads
  - Better handling of ill-formatted strings
  - Cleaned up multi-thread consistency for LoggerFactory-based logger initializations
  - Closed a multi-threaded gap where early logs may be lost if they happened while SLF4J was initializing in a multi-threaded application
- Logback 1.1.3 -> 1.1.7:
  - Child threads no longer inherit MDC values
  - AsyncAppender can be configured to never block
  - Fixed issue with variable substitution when the value ends in a colon
- Apache Commons Lang 3.4 -> 3.5
- Apache Commons Configuration 1.6 -> 1.10:
  - Tightened getList's behavior if the list values are non-strings
  - MapConfiguration can be set to not trim values by default
  - CompositeConfiguration can now handle non-BaseConfiguration core configurations
  - `addConfiguration()` overload added to allow correcting inconsistent configuration compositing
- Apache Avro 1.8.0 -> 1.8.1
- Spring Core 4.0.5 -> 4.3.4
- CGLib 3.2.0 -> 3.2.4:
  - Optimizations and regression fixes
- Objenesis 2.2 -> 2.4
- Jersey 2.22 -> 2.24:
  - https://jersey.java.net/release-notes/2.24.html
  - https://jersey.java.net/release-notes/2.23.html
  - `@BeanParam` linking support fix
  - Declarative linking with Maps fixed
  - Async write ordering deadlock fix
- HK2 2.4.0-b31 -> 2.5.0-b05:
  - Necessitated by Jersey upgrade
- JavaX Annotation API 1.2 -> 1.3
- Deprecated DefaultingDictionary usage in DefaultingVolatileIntervalsService
- `RequestLog::switchTiming` has been deprecated
  - `RequestLog::switchTiming` is very context-dependent, and therefore brittle. In particular, adding any additional timers inside code called by a timed block may result in the original timer not stopping properly. All usages of `switchTiming` should be replaced with explicit calls to `RequestLog::startTiming` and `RequestLog::stopTiming`, as sketched below.
- Dimension Field Tagging and Dynamic Dimension Field Serialization
  - Deprecated `DimensionsServlet::getDimensionFieldListSummaryView` and `DimensionsServlet::getDimensionFieldSummaryView` since there is no need for them anymore due to the change in serialization of `DimensionField`
- Default DimensionColumn name to use apiName instead of physicalName
  - Deprecated `TableUtils::getColumnNames(DataApiRequest, DruidAggregationQuery, PhysicalTable)`, which returns dimension physical names, in favor of `TableUtils::getColumnNames(DataApiRequest, DruidAggregationQuery)`, which returns dimension API names
  - Deprecated `DimensionColumn::addNewDimensionColumn(Schema, Dimension, PhysicalTable)` in favor of `DimensionColumn::addNewDimensionColumn(Schema, Dimension)`, which uses the API name instead of the physical name as the column identifier
  - Deprecated `LogicalDimensionColumn` in favor of `DimensionColumn`, since `DimensionColumn` now stores the API name instead of the physical name, so `LogicalDimensionColumn` is no longer needed
- Moved to static implementations for numeric and sketch coercion helper methods
  - Instead of `MetricMaker.getSketchField(String fieldName)`, use `MetricMaker.getSketchField(MetricField field)`
  - Instead of `MetricMaker.getNumericField(String fieldName)`, use `MetricMaker.getNumericField(MetricField field)`
- MetricMaker cleanup and simplification
  - `AggregationAverageMaker` deprecated the conversion method required by the deprecated sketch library
- Metric configuration deprecations
  - Deprecated a superfluous constructor of `FilteredAggregator` with a superfluous argument
  - Deprecated a MetricMaker utility method in favor of using the new field accessor on Metric
- Deprecated MetricMaker.getDependentQuery lookup method in favor of simpler direct access
-
  - There is a chance the `LuceneSearchProvider` will deadlock if one thread is attempting to read a dimension for the first time while another is attempting to load it:
    - Thread A is pushing in new dimension data. It invokes `refreshIndex`, and acquires the write lock.
    - Thread B is reading dimension data. It invokes `getResultsPage`, then `initializeIndexSearcher`, then `reopenIndexSearcher`. It hits the write lock (acquired by Thread A) and blocks.
    - At the end of its computation of `refreshIndex`, Thread A attempts to invoke `reopenIndexSearcher`. However, `reopenIndexSearcher` is `synchronized`, and Thread B is already invoking it.
  - To fix the resulting deadlock, `reopenIndexSearcher` is no longer synchronized. Since threads need to acquire a write lock before doing anything else anyway, the method is still effectively synchronized. See the sketch below.
- Fix and refactor role based filter to allow CORS
  - Fixed `RoleBasedAuthFilter` to bypass `OPTIONS` requests for CORS
  - Discovered a bug where `user_roles` declared but unset still reads as a list with an empty string (included a temporary fix by commenting out the variable declaration)
  - Refactored `RoleBasedAuthFilter` and `RoleBasedAuthFilterSpec` for better testing
- Added missing coverage for `ThetaSketchEstimate` unwrapping in `MetricMaker.getSketchField`
- `DataSource::getNames` now returns Fili identifiers, not fact store identifiers
- Made a few injection points not useless
  - Template types don't get the same subclass goodness that method invocation and dependencies get, so this method did not allow returning a subclass of `DruidQueryBuilder` or of `DruidResponseParser`.
- Made the now-required constructor for ArithmeticMaker with rounding public
This release is focused on general stability, with a number of bugs fixed, and also adds a few small new capabilities and enhancements. Here are some of the highlights, but take a look in the lower sections for more details.
Fixes:
- Dimension keys are now properly case-sensitive
  - Because this is a breaking change, the fix has been wrapped in a feature flag. For now, this defaults to the existing broken behavior, but this will change in a future version, and eventually the fix will be permanent.
- `all`-grain queries are no longer split
- Closed a race condition in the `LuceneSearchProvider` where readers would get an error if an update was in progress
- Correctly interpreting List-type configs from the Environment tier as a true `List`
- Stopped recording synchronous requests in the `ApiJobStore`, which is only intended to hold async requests
New Capabilities & Enhancements:
- Customizable logging format
- X-Request-Id header support, letting clients set a request ID that will be included in the Druid query
- Support for Druid's `In` filter
- Native support for building `DimensionRow`s from AVRO files
- Ability to set headers on Druid requests, letting Fili talk to a secure Druid
- Better error messaging when things go wrong
- Better ability to use custom Druid query types
- Added Dimension Value implementation for PartitionTableDefinition
  - Added `DimensionIdFilter` implementation of `DataSourceFilter`
  - Created `DimensionListPartitionTableDefinition`
- Added `hasAnyRows` to SearchProvider interface
  - `hasAnyRows` allows implementations to optimize queries which only need to identify the existence of matches
- Can Populate Dimension Rows from an AVRO file
  - Added `AvroDimensionRowParser` that parses an AVRO data file into `DimensionRow`s after validating the AVRO schema.
  - Added a functional interface `DimensionFieldMapper` that maps field names. A usage sketch follows this entry.
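  - For illustration, a hedged usage sketch (the parser and mapper shapes here are assumptions based on this entry, not the verbatim Fili API):

    ```java
    // Map each dimension field to the AVRO column it should be read from
    // (identity mapping shown; the exact functional shape is assumed).
    DimensionFieldMapper fieldMapper = field -> field.getName();
    AvroDimensionRowParser parser = new AvroDimensionRowParser(fieldMapper);

    // Parse rows for a configured dimension from an AVRO data file, then
    // push them into the dimension's store ("dimension" is assumed bound).
    Set<DimensionRow> rows = parser.parseAvroFileDimensionRows(dimension, "dimension_rows.avro");
    dimension.addAllDimensionRows(rows);
    ```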
-
  - The in-filter only works with Druid versions 0.9.0 and up.
- Adding slice availability to slices endpoint
  - Slice availability can be used to debug availability issues on Physical tables
- Ability to set headers for requests to Druid
  - The `AsyncDruidWebServiceImpl` now accepts a `Supplier<Map<String, String>>` argument which specifies the headers to add to the Druid data requests. This feature is made configurable through `SystemConfig` in the `AbstractBinderFactory`.
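  - For illustration, a hedged wiring sketch (only the `Supplier<Map<String, String>>` shape comes from this entry; the token helper is hypothetical):

    ```java
    // A supplier evaluated per request, so headers such as auth tokens stay fresh.
    Supplier<Map<String, String>> druidHeaders = () -> {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", "Bearer " + fetchCurrentToken()); // hypothetical helper
        return headers;
    };
    ```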
- Error messages generated during response processing include the request id.
- `DimensionStoreKeyUtils` now supports case sensitive row and column keys
  - Wrapped this config in a feature flag `case_sensitive_keys_enabled`, which is set to `false` by default for backwards compatibility. This flag will be set to `true` in future versions.
-
  - Created new class `GranularityDictionary` and bind `getGranularityDictionary` to it
- CSV attachment name for multi-interval request now contains '__' instead of ','
  - This change allows running a multi-interval request with CSV format using the Chrome browser.
- Improves error messages when querying Druid goes wrong
  - The `ResponseException` now includes a message that prints the `ResponseException`'s internal state (i.e. the druid query and response code) using the error messages `ErrorMessageFormat::FAILED_TO_SEND_QUERY_TO_DRUID` and `ErrorMessageFormat::ERROR_FROM_DRUID`
  - The druid query and status code, reason, and response body are now logged at the error level in the failure and error callbacks in `AsyncDruidWebServiceImpl`
- Fili now supports custom Druid query types
  - `QueryType` has been turned into an interface, backed by an enum `DefaultQueryType`.
    - The default implementations of `DruidResponseParser`, `DruidQueryBuilder`, `WeightEvaluationQuery` and `TestDruidWebService` only support `DefaultQueryType`.
  - `DruidResponseParser` is now injectable by overriding the `AbstractBinderFactory::buildDruidResponseParser` method.
  - `DruidQueryBuilder` is now injectable by overriding the `AbstractBinderFactory::buildDruidQueryBuilder` method.
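  - For illustration, a hedged sketch of the new injection point (the override signature is inferred from the method name above; `MyDruidResponseParser` is hypothetical):

    ```java
    public class MyBinderFactory extends AbstractBinderFactory {
        @Override
        protected DruidResponseParser buildDruidResponseParser() {
            // Return a parser that also understands the custom query types.
            return new MyDruidResponseParser();
        }
    }
    ```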
-
  - For details see: https://commons.apache.org/proper/commons-collections/security-reports.html#Apache_Commons_Collections_Security_Vulnerabilities
  - It should be noted that Fili does not make use of any of the serialization/deserialization capabilities of any classes in the functor package, so the security vulnerability does not affect Fili.
- Clean up build plugins
  - Move some plugin configs up to `pluginManagement`
  - Make `fili-core` publish test javadocs
  - Default the source plugin to target `jar-no-fork` instead of `jar`
  - Default the javadoc plugin to target `javadoc-no-fork` as well as `jar`
  - Move some versions up to `pluginManagement`
  - Remove overly specified (and unused) options in surefire plugin configs
  - Make all projects pull in the `source` plugin
- Corrected bug with Fili sub-module dependency specification
  - Dependency versions are now set via a fixed property at deploy time, rather than relying on `project.version`
- Cleaned up dependencies in pom files
  - Moved version management of dependencies up to the parent Pom's dependency management section
  - Cleaned up the parent Pom's dependency section to include only those dependencies that truly every sub-project should depend on.
  - Cleaned up sub-project Pom dependency sections to handle and better use the dependencies the parent Pom provides
- `DimensionStoreKeyUtils` now supports case sensitive row and column keys
  - Case insensitive row and column keys will be deprecated going forward.
  - Because this is a breaking change, the fix has been wrapped in a feature flag. For now, this defaults to the existing broken behavior, but this will change in a future version, and eventually the fix will be permanent.
  - The feature flag for this is `bard__case_sensitive_keys_enabled`
- All constructors of `ResponseException` that do not take an `ObjectWriter`
  - An `ObjectWriter` is required in order to ensure that the exception correctly serializes its associated Druid query
- Environment comma-separated list variables are now correctly pulled in as a list
  - Before, they were pulled in as a single string containing commas; now environment variables are pulled in the same way as the properties files
  - Added a test of comma-separated list environment variables for when the `FILI_TEST_LIST` environment variable exists
- Druid queries are now serialized correctly when logging `ResponseExceptions`
- Disable Query split for "all" grain
  - Before, if we requested "all" grain with multiple intervals, the `SplitQueryRequestHandler` would incorrectly split the query and we would get multiple buckets in the output. Now, query splitting is disabled for the "all" grain and we correctly get only one bucket in the response.
- Adds read locking to all attempts to read the Lucene index
  - Before, if Fili attempted to read from the Lucene indices (i.e. processing a query with filters) while loading dimension indices, the request would fail and we would get a `LuceneIndexReaderAlreadyClosedException`. Now, the read locks should ensure that the query processing will wait until indexing completes (and vice versa).
-
  - The workflow that updates the job's metadata with `success` was running even when the query was synchronous. That update also caused the ticket to be stored in the `ApiJobStore`.
  - The delay operator didn't stop the "update" workflow from executing, because it viewed an `Observable::onCompleted` call as a message for the purpose of the delay. Since the two observables that the metadata update gated on are empty when the query is synchronous, the "update metadata" workflow was being triggered every time.
  - The delay operator was replaced by `zipWith` as a gating mechanism, as sketched below.
- `JsonSlurper` can now handle sorting lists with mixed-type entries
  - even if the list starts with a string, number, or boolean
- Broken segment metadata with Druid v0.9.1
  - Made `NumberedShardSpec` ignore unexpected properties during deserialization
  - Added tests to `DataSourceMetadataLoaderSpec` to test the v0.9.1 optional field `shardSpec.partitionDimensions` on segment info JSON.
This release focuses on stabilization, especially of the Query Time Lookup (QTL) capabilities, and the Async API and Jobs resource. Here are the highlights of what's in this release:
- A bugfix for the `DruidDimensionLoader`
- A new default `DimensionLoader`
- A bunch more tests and test upgrades
- Filtering and pagination on the Jobs resource
- A `userId` field for default Job resource representations
- Package cleanup for the jobs-related classes
- `always` keyword for the `asyncAfter` parameter now guarantees that a query will be asynchronous
- A test implementation of the `AsynchronousWorkflowsBuilder`: `TestAsynchronousWorkflowsBuilder`
  - Identical to the `DefaultAsynchronousWorkflowsBuilder`, except that it includes hooks to allow outside forces (i.e. Specifications) to add additional subscribers to each workflow.
- Enrich jobs endpoint with filtering functionality (yahoo#26)
  - Jobs endpoint now supports filters
- Enrich the ApiJobStore interface (yahoo#23)
  - `ApiJobStore` interface now supports filtering `JobRows` in the store
  - Added support for filtering JobRows in `HashJobStore`
  - Added `JobRowFilter` to hold filter information
- QueryTimeLookup Functionality Testing
  - Added two tests, `LookupDimensionFilteringDataServletSpec` and `LookupDimensionGroupingDataServletSpec`, to test QTL functionality
-
  - Created `LookupDimensionToDimensionSpec` serializer for `LookupDimension`
  - Created corresponding tests for `LookupDimensionToDimensionSpec` in `LookupDimensionToDimensionSpecSpec`
- Allow configurable headers for Druid data requests
  - Deprecated `AsyncDruidWebServiceImpl(DruidServiceConfig, ObjectMapper)` and `AsyncDruidWebServiceImpl(DruidServiceConfig, AsyncHttpClient, ObjectMapper)` because we added new constructors that take a `Supplier` argument for Druid data request headers.
- QueryTimeLookup Functionality Testing
  - Deprecated `KeyValueDimensionLoader` in favor of `TypeAwareDimensionLoader`
- Removed `physicalName` lookup for metrics in `TableUtils::getColumnNames` to remove spurious warnings
  - Metrics are not mapped like dimensions are: dimensions are aliased per physical table, while metrics are aliased per logical table.
  - A logical metric maps to one or many physical metrics, so the same lookup logic for dimensions and metrics doesn't make sense.
- HashPreResponseStore moved to `test` root directory.
  - The `HashPreResponseStore` is really intended only for testing, and does not have capabilities (i.e. TTL) that are needed for production.
- The `TestBinderFactory` now uses the `TestAsynchronousWorkflowsBuilder`
  - This allows the asynchronous functional tests to add countdown latches to the workflows where necessary, allowing for thread-safe tests.
- Removed `JobsApiRequest::handleBroadcastChannelNotification`
  - That logic does not really belong in the `JobsApiRequest` (which is responsible for modeling a response, not processing it), and has been consolidated into the `JobsServlet`.
- ISSUE-17 Added pagination parameters to `PreResponse`
  - Updated `JobsServlet::handlePreResponseWithError` to update the `ResultSet` object with pagination parameters
- Enrich jobs endpoint with filtering functionality
  - The default job payload generated by `DefaultJobPayloadBuilder` now has a `userId`
- Removed timing component in JobsApiRequestSpec
  - Rather than setting an async timeout and then sleeping, the `JobsApiRequestSpec` test "handleBroadcastChannelNotification returns an empty Observable if a timeout occurs before the notification is received" now verifies that the returned Observable terminates without sending any messages.
- Reorganizes asynchronous package structure
  - The `jobs` package is renamed to `async` and split into the following subpackages:
    - `broadcastchannels` - Everything dealing with broadcast channels
    - `jobs` - Everything related to jobs, broken into subpackages
      - `jobrows` - Everything related to the content of the job metadata
      - `payloads` - Everything related to building the version of the job metadata to send to the user
      - `stores` - Everything related to the databases for job data
    - `preresponses` - Everything related to `PreResponses`, broken into subpackages
      - `stores` - Everything related to the databases for PreResponse data
    - `workflows` - Everything related to the asynchronous workflow
- QueryTimeLookup Functionality Testing
  - `AbstractBinderFactory` now uses `TypeAwareDimensionLoader` instead of `KeyValueStoreDimensionLoader`
- Fix Dimension Serialization Problem with Nested Queries
  - Modified the `DimensionToDefaultDimensionSpec` serializer to serialize a Dimension to its apiName if it's not in the innermost query
  - Added `Util::hasInnerQuery` helper in the serializer package to determine whether a query is the innermost query or not
  - Added tests for `DimensionToDefaultDimensionSpec`
- Preserve collection order of dimensions, dimension fields and metrics
  - `DataApiRequest::generateDimensions` now returns a `LinkedHashSet`
  - `DataApiRequest::generateDimensionFields` now returns a `LinkedHashMap<Dimension, LinkedHashSet<DimensionField>>`
  - `DataApiRequest::withPerDimensionFields` now takes a `LinkedHashSet` as its second argument.
  - `DataApiRequest::getDimensionFields` now returns a `LinkedHashMap<Dimension, LinkedHashSet<DimensionField>>`
  - `Response::Response` now takes a `LinkedHashSet` and `LinkedHashMap<Dimension, LinkedHashSet<DimensionField>>` as its second and third arguments.
  - `ResponseContext::dimensionToDimensionFieldMap` now takes a `LinkedHashMap<Dimension, LinkedHashSet<DimensionField>>`
  - `ResponseContext::getDimensionToDimensionFieldMap` now returns a `LinkedHashMap<Dimension, LinkedHashSet<DimensionField>>`
- `TestDruidWebService::jsonResponse` is now a `Producer<String>`
Producer -
QueryTimeLookup Functionality Testing
- Modified some testing resources (PETS table and corresponding dimensions) to allow better testing on
LookupDimension
s
- Memoize generated values during recursive class-scan class construction
- Fixing the case when the security context is not complete
  - Check for nulls in the `DefaultJobRowBuilder.userIdExtractor` function.
- `DruidDimensionsLoader` doesn't set the dimension's lastUpdated date
  - `DruidDimensionsLoader` now properly sets the `lastUpdated` field after it finishes processing the Druid response