Reduce storage required for indexing - stop writing sp_name, res_type, and sp_updated to hfj_spidx_* tables (hapifhir#5941)

* Reduce storage required for indexing - implementation
volodymyr-korzh authored Jun 20, 2024
1 parent 5799c6b commit 0397b9d
Showing 48 changed files with 1,837 additions and 266 deletions.
@@ -0,0 +1,7 @@
---
type: perf
issue: 5937
title: "A new configuration option, `StorageSettings#setIndexStorageOptimized(boolean)` has been added. If enabled,
the server will not write data to the `SP_NAME`, `RES_TYPE`, `SP_UPDATED` columns for all `HFJ_SPIDX_xxx` tables.
This can help reduce the overall storage size on servers where HFJ_SPIDX tables are expected to have a large
amount of data."
@@ -0,0 +1,25 @@
## Possible migration errors on SQL Server (MSSQL)

* This affects only clients running SQL Server (MSSQL) who have custom indexes on the `HFJ_SPIDX` tables that
include the `sp_name` or `res_type` columns.
* For those clients, the migration that makes the `sp_name` and `res_type` columns nullable on the `HFJ_SPIDX` tables may complete with errors,
because changing the nullability of a column that is part of an index can fail on SQL Server (MSSQL).
* If a client wants to keep the existing indexes and settings, these errors can be ignored. However, if a client wants to enable both the [Index Storage Optimized](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir/jpa/model/entity/StorageSettings.html#setIndexStorageOptimized(boolean))
and [Index Missing Fields](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir/jpa/model/entity/StorageSettings.html#getIndexMissingFields()) settings, manual steps are required to make `sp_name` and `res_type` nullable.

To make the columns nullable in that scenario, execute the steps below:

1. Drop the indexes that include the `sp_name` or `res_type` columns, for example:
```sql
DROP INDEX IDX_SP_TOKEN_REST_TYPE_SP_NAME ON HFJ_SPIDX_TOKEN;
```
2. Change the `sp_name` and `res_type` columns to nullable:

```sql
ALTER TABLE HFJ_SPIDX_TOKEN ALTER COLUMN RES_TYPE varchar(100) NULL;
ALTER TABLE HFJ_SPIDX_TOKEN ALTER COLUMN SP_NAME varchar(100) NULL;
```
3. Additionally, the following index may need to be added to improve search performance:
```sql
CREATE INDEX IDX_SP_TOKEN_MISSING_OPTIMIZED ON HFJ_SPIDX_TOKEN (HASH_IDENTITY, SP_MISSING, RES_ID, PARTITION_ID);
```
@@ -68,3 +68,19 @@ This setting controls whether non-resource (ex: Patient is a resource, MdmLink i
Clients may want to disable this setting for performance reasons as it populates a new set of database tables when enabled.

Setting this property explicitly to false disables the feature: [Non Resource DB History](/apidocs/hapi-fhir-storage/ca/uhn/fhir/jpa/api/config/JpaStorageSettings.html#isNonResourceDbHistoryEnabled())

# Enabling Index Storage Optimization

When this setting is enabled, the server will not write data to the `SP_NAME`, `RES_TYPE`, or `SP_UPDATED` columns of the `HFJ_SPIDX_xxx` tables.

This setting may be enabled on servers where the `HFJ_SPIDX_xxx` tables are expected to hold a large amount of data (millions of rows), in order to reduce the overall storage size.

Setting this property explicitly to true enables the feature: [Index Storage Optimized](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir/jpa/model/entity/StorageSettings.html#setIndexStorageOptimized(boolean))
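
A minimal sketch of enabling the flag on the JPA storage settings bean is shown below. The surrounding configuration class and bean wiring are assumptions and will differ between deployments; only the `setIndexStorageOptimized(true)` call is taken from this change:

```java
import ca.uhn.fhir.jpa.api.config.JpaStorageSettings;

public class StorageConfig {

	// Hypothetical factory method - register the returned settings in your own server configuration
	public JpaStorageSettings storageSettings() {
		JpaStorageSettings settings = new JpaStorageSettings();
		// Stop writing SP_NAME, RES_TYPE and SP_UPDATED to the HFJ_SPIDX_xxx tables
		settings.setIndexStorageOptimized(true);
		return settings;
	}
}
```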

## Limitations

* This setting only applies to newly inserted and updated rows in the `HFJ_SPIDX_xxx` tables. All existing rows will still have values in the `SP_NAME`, `RES_TYPE`, and `SP_UPDATED` columns. Executing the `$reindex` operation will apply the storage optimization to existing data.

* If this setting is enabled along with the [Index Missing Fields](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir/jpa/model/entity/StorageSettings.html#getIndexMissingFields()) setting, the following index may need to be added to the `HFJ_SPIDX_xxx` tables to improve search performance: `(HASH_IDENTITY, SP_MISSING, RES_ID, PARTITION_ID)`.

* This setting should not be enabled in combination with the [Include Partition in Search Hashes](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir/jpa/model/config/PartitionSettings.html#setIncludePartitionInSearchHashes(boolean)) flag, because the partition ID cannot be included in the search hashes when index storage optimization is enabled.
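
As a rough illustration of the last limitation (the setting classes are real, but the wiring shown here is only a sketch): with partitioning enabled, a server configured with both flags fails its startup validation with a `ConfigurationException`.

```java
import ca.uhn.fhir.jpa.api.config.JpaStorageSettings;
import ca.uhn.fhir.jpa.model.config.PartitionSettings;

public class ConflictingSettingsExample {

	public static void main(String[] args) {
		PartitionSettings partitionSettings = new PartitionSettings();
		partitionSettings.setPartitioningEnabled(true);
		// Folds the partition ID into the search hash values...
		partitionSettings.setIncludePartitionInSearchHashes(true);

		JpaStorageSettings storageSettings = new JpaStorageSettings();
		// ...which cannot be combined with optimized index storage: a server
		// configured with both settings is rejected at startup with a ConfigurationException.
		storageSettings.setIndexStorageOptimized(true);
	}
}
```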
@@ -502,7 +502,7 @@ The following columns are common to **all HFJ_SPIDX_xxx tables**.
<td>SP_NAME</td>
<td></td>
<td>String</td>
<td>Nullable</td>
<td>
This is the name of the search parameter being indexed.
</td>
@@ -511,7 +511,7 @@ The following columns are common to **all HFJ_SPIDX_xxx tables**.
<td>RES_TYPE</td>
<td></td>
<td>String</td>
<td>Nullable</td>
<td>
This is the name of the resource being indexed.
</td>
@@ -6,6 +6,6 @@ The [PartitionSettings](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir

The following settings can be enabled:

* **Include Partition in Search Hashes** ([JavaDoc](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir/jpa/model/config/PartitionSettings.html#setIncludePartitionInSearchHashes(boolean))): If this feature is enabled, partition IDs will be factored into [Search Hashes](/hapi-fhir/docs/server_jpa/schema.html#search-hashes). When this flag is not set (the default) and a search requests a specific partition, an additional SQL WHERE predicate is added to the query to explicitly request the given partition ID. When this flag is set, this additional WHERE predicate is not necessary since the partition is factored into the hash value being searched on. Setting this flag avoids the need to manually adjust indexes against the HFJ_SPIDX tables. Note that this flag should **not be used in environments where partitioning is being used for security purposes**, since it is possible for a user to reverse engineer false hash collisions. This setting should also not be enabled in combination with the [Index Storage Optimized](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir/jpa/model/entity/StorageSettings.html#isIndexStorageOptimized()) flag, because the partition ID cannot be included in the search hashes in that case.

* **Cross-Partition Reference Mode**: ([JavaDoc](/hapi-fhir/apidocs/hapi-fhir-jpaserver-model/ca/uhn/fhir/jpa/model/config/PartitionSettings.html#setAllowReferencesAcrossPartitions(ca.uhn.fhir.jpa.model.config.PartitionSettings.CrossPartitionReferenceMode))): This setting controls whether resources in one partition should be allowed to create references to resources in other partitions.
@@ -19,7 +19,9 @@
*/
package ca.uhn.fhir.jpa.config;

import ca.uhn.fhir.context.ConfigurationException;
import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.i18n.Msg;
import ca.uhn.fhir.interceptor.api.IInterceptorBroadcaster;
import ca.uhn.fhir.jpa.api.config.JpaStorageSettings;
import ca.uhn.fhir.jpa.api.dao.DaoRegistry;
@@ -47,6 +49,7 @@
import ca.uhn.fhir.jpa.search.cache.ISearchResultCacheSvc;
import ca.uhn.fhir.rest.server.IPagingProvider;
import ca.uhn.fhir.rest.server.util.ISearchParamRegistry;
import jakarta.annotation.PostConstruct;
import org.hl7.fhir.instance.model.api.IBaseResource;
import org.springframework.beans.factory.BeanFactory;
import org.springframework.beans.factory.annotation.Autowired;
@@ -206,4 +209,15 @@ public SearchContinuationTask createSearchContinuationTask(SearchTaskParameters
exceptionService() // singleton
);
}

@PostConstruct
public void validateConfiguration() {
if (myStorageSettings.isIndexStorageOptimized()
&& myPartitionSettings.isPartitioningEnabled()
&& myPartitionSettings.isIncludePartitionInSearchHashes()) {
throw new ConfigurationException(Msg.code(2525) + "Incorrect configuration. "
+ "StorageSettings#isIndexStorageOptimized and PartitionSettings.isIncludePartitionInSearchHashes "
+ "cannot be enabled at the same time.");
}
}
}
@@ -20,7 +20,9 @@
package ca.uhn.fhir.jpa.dao.index;

import ca.uhn.fhir.jpa.model.entity.BaseResourceIndex;
import ca.uhn.fhir.jpa.model.entity.BaseResourceIndexedSearchParam;
import ca.uhn.fhir.jpa.model.entity.ResourceTable;
import ca.uhn.fhir.jpa.model.entity.StorageSettings;
import ca.uhn.fhir.jpa.searchparam.extractor.ResourceIndexedSearchParams;
import ca.uhn.fhir.jpa.util.AddRemoveCount;
import com.google.common.annotations.VisibleForTesting;
@@ -29,10 +31,12 @@
import jakarta.persistence.PersistenceContextType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
@@ -42,6 +46,9 @@
public class DaoSearchParamSynchronizer {
private static final Logger ourLog = LoggerFactory.getLogger(DaoSearchParamSynchronizer.class);

@Autowired
private StorageSettings myStorageSettings;

@PersistenceContext(type = PersistenceContextType.TRANSACTION)
protected EntityManager myEntityManager;

@@ -68,6 +75,11 @@ public AddRemoveCount synchronizeSearchParamsToDatabase(
return retVal;
}

@VisibleForTesting
public void setStorageSettings(StorageSettings theStorageSettings) {
this.myStorageSettings = theStorageSettings;
}

@VisibleForTesting
public void setEntityManager(EntityManager theEntityManager) {
myEntityManager = theEntityManager;
@@ -115,6 +127,7 @@ private <T extends BaseResourceIndex> void synchronize(
List<T> paramsToRemove = subtract(theExistingParams, newParams);
List<T> paramsToAdd = subtract(newParams, theExistingParams);
tryToReuseIndexEntities(paramsToRemove, paramsToAdd);
updateExistingParamsIfRequired(theExistingParams, paramsToAdd, newParams, paramsToRemove);

for (T next : paramsToRemove) {
if (!myEntityManager.contains(next)) {
@@ -134,6 +147,62 @@ private <T extends BaseResourceIndex> void synchronize(
theAddRemoveCount.addToRemoveCount(paramsToRemove.size());
}

/**
* <p>
* This method updates existing search parameter entities during a
* <code>$reindex</code> or update operation by:
* 1. Marking existing entities for update so that index storage optimization is applied,
* if it is enabled (disabled by default).
* 2. Recovering the <code>SP_NAME</code> and <code>RES_TYPE</code> values of existing entities
* when index storage optimization is disabled (but was enabled previously).
* </p>
* For details, see: {@link StorageSettings#isIndexStorageOptimized()}
*/
private <T extends BaseResourceIndex> void updateExistingParamsIfRequired(
Collection<T> theExistingParams,
List<T> theParamsToAdd,
Collection<T> theNewParams,
List<T> theParamsToRemove) {

theExistingParams.stream()
.filter(BaseResourceIndexedSearchParam.class::isInstance)
.map(BaseResourceIndexedSearchParam.class::cast)
.filter(this::isSearchParameterUpdateRequired)
.filter(sp -> !theParamsToAdd.contains(sp))
.filter(sp -> !theParamsToRemove.contains(sp))
.forEach(sp -> {
// force hibernate to update Search Parameter entity by resetting SP_UPDATED value
sp.setUpdated(new Date());
recoverExistingSearchParameterIfRequired(sp, theNewParams);
theParamsToAdd.add((T) sp);
});
}

/**
* Search parameters need to be updated after the IndexStorageOptimized setting changes.
* If IndexStorageOptimized is disabled (but was enabled previously), this method copies the param name
* and resource type from the freshly extracted search parameter to the existing one.
*/
private <T extends BaseResourceIndex> void recoverExistingSearchParameterIfRequired(
BaseResourceIndexedSearchParam theSearchParamToRecover, Collection<T> theNewParams) {
if (!myStorageSettings.isIndexStorageOptimized()) {
theNewParams.stream()
.filter(BaseResourceIndexedSearchParam.class::isInstance)
.map(BaseResourceIndexedSearchParam.class::cast)
.filter(paramToAdd -> paramToAdd.equals(theSearchParamToRecover))
.findFirst()
.ifPresent(newParam -> {
theSearchParamToRecover.restoreParamName(newParam.getParamName());
theSearchParamToRecover.setResourceType(newParam.getResourceType());
});
}
}

private boolean isSearchParameterUpdateRequired(BaseResourceIndexedSearchParam theSearchParameter) {
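// rewrite the entity whenever its stored optimization state no longer matches the current IndexStorageOptimized setting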
return (myStorageSettings.isIndexStorageOptimized() && !theSearchParameter.isIndexStorageOptimized())
|| (!myStorageSettings.isIndexStorageOptimized() && theSearchParameter.isIndexStorageOptimized());
}

/**
* The logic here is that often times when we update a resource we are dropping
* one index row and adding another. This method tries to reuse rows that would otherwise
@@ -250,6 +250,104 @@ protected void init740() {
.unique(false)
.withColumns("RES_UPDATED", "RES_ID")
.heavyweightSkipByDefault();

// Allow null values in SP_NAME, RES_TYPE columns for all HFJ_SPIDX_* tables. These are marked as failure
// allowed, since SQL Server won't let us change nullability on columns with indexes pointing to them.
{
Builder.BuilderWithTableName spidxCoords = version.onTable("HFJ_SPIDX_COORDS");
spidxCoords
.modifyColumn("20240617.1", "SP_NAME")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
spidxCoords
.modifyColumn("20240617.2", "RES_TYPE")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();

Builder.BuilderWithTableName spidxDate = version.onTable("HFJ_SPIDX_DATE");
spidxDate
.modifyColumn("20240617.3", "SP_NAME")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
spidxDate
.modifyColumn("20240617.4", "RES_TYPE")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();

Builder.BuilderWithTableName spidxNumber = version.onTable("HFJ_SPIDX_NUMBER");
spidxNumber
.modifyColumn("20240617.5", "SP_NAME")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
spidxNumber
.modifyColumn("20240617.6", "RES_TYPE")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();

Builder.BuilderWithTableName spidxQuantity = version.onTable("HFJ_SPIDX_QUANTITY");
spidxQuantity
.modifyColumn("20240617.7", "SP_NAME")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
spidxQuantity
.modifyColumn("20240617.8", "RES_TYPE")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();

Builder.BuilderWithTableName spidxQuantityNorm = version.onTable("HFJ_SPIDX_QUANTITY_NRML");
spidxQuantityNorm
.modifyColumn("20240617.9", "SP_NAME")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
spidxQuantityNorm
.modifyColumn("20240617.10", "RES_TYPE")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();

Builder.BuilderWithTableName spidxString = version.onTable("HFJ_SPIDX_STRING");
spidxString
.modifyColumn("20240617.11", "SP_NAME")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
spidxString
.modifyColumn("20240617.12", "RES_TYPE")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();

Builder.BuilderWithTableName spidxToken = version.onTable("HFJ_SPIDX_TOKEN");
spidxToken
.modifyColumn("20240617.13", "SP_NAME")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
spidxToken
.modifyColumn("20240617.14", "RES_TYPE")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();

Builder.BuilderWithTableName spidxUri = version.onTable("HFJ_SPIDX_URI");
spidxUri.modifyColumn("20240617.15", "SP_NAME")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
spidxUri.modifyColumn("20240617.16", "RES_TYPE")
.nullable()
.withType(ColumnTypeEnum.STRING, 100)
.failureAllowed();
}
}

protected void init720() {
@@ -98,10 +98,19 @@ public Condition createHashIdentityPredicate(String theResourceType, String theP

public Condition createPredicateParamMissingForNonReference(
String theResourceName, String theParamName, Boolean theMissing, RequestPartitionId theRequestPartitionId) {
List<Condition> conditions = new ArrayList<>();
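// With index storage optimization enabled, SP_NAME / RES_TYPE are not written to the index tables,
// so the missing-parameter predicate matches on HASH_IDENTITY instead of the name and type columns.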
if (getStorageSettings().isIndexStorageOptimized()) {
Long hashIdentity = BaseResourceIndexedSearchParam.calculateHashIdentity(
getPartitionSettings(), getRequestPartitionId(), theResourceName, theParamName);
conditions.add(BinaryCondition.equalTo(getColumnHashIdentity(), generatePlaceholder(hashIdentity)));
} else {
conditions.add(BinaryCondition.equalTo(getResourceTypeColumn(), generatePlaceholder(theResourceName)));
conditions.add(BinaryCondition.equalTo(getColumnParamName(), generatePlaceholder(theParamName)));
}
conditions.add(BinaryCondition.equalTo(getMissingColumn(), generatePlaceholder(theMissing)));

ComboCondition condition = ComboCondition.and(conditions.toArray());
return combineWithRequestPartitionIdPredicate(theRequestPartitionId, condition);
}

@@ -4,6 +4,7 @@
import ca.uhn.fhir.jpa.model.entity.BaseResourceIndex;
import ca.uhn.fhir.jpa.model.entity.ResourceIndexedSearchParamNumber;
import ca.uhn.fhir.jpa.model.entity.ResourceTable;
import ca.uhn.fhir.jpa.model.entity.StorageSettings;
import ca.uhn.fhir.jpa.searchparam.extractor.ResourceIndexedSearchParams;
import ca.uhn.fhir.jpa.util.AddRemoveCount;
import jakarta.persistence.EntityManager;
@@ -61,6 +62,7 @@ void setUp() {
THE_SEARCH_PARAM_NUMBER.setResource(resourceTable);

subject.setEntityManager(entityManager);
subject.setStorageSettings(new StorageSettings());
}

@Test