Which connector are you using?
Predicate pushdown is only available for Hive tables via the EMR DynamoDB Hive connector package. The supported data types and operators are documented HERE.
@mimaomao feel free to comment if I missed anything.
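Whichever connector is used, it's worth keeping DynamoDB's own semantics in mind: a Scan's FilterExpression is evaluated after items are read from storage, so a filtered Scan still consumes read capacity for every item examined and only reduces the data returned to the client. A minimal plain-Python illustration of that behavior (no AWS calls; the table contents are made up):

```python
# Conceptual illustration only: DynamoDB evaluates a FilterExpression *after*
# items are read, so read capacity is charged for every item scanned, while
# the filter only trims what is returned over the wire.

def simulated_scan(items, filter_fn=None):
    """Mimic a DynamoDB Scan: all items are 'read' (billed); the filter,
    if present, only reduces the returned result set."""
    read_count = len(items)  # read capacity is consumed for every item
    returned = [i for i in items if filter_fn is None or filter_fn(i)]
    return read_count, returned

# Hypothetical table contents
table = [{"colFilter": v} for v in ("ABC1", "XYZ2", "DEF3", "ABC4")]

# Server-side filter: fewer items returned, same number of items read.
reads, matched = simulated_scan(
    table, lambda i: i["colFilter"].startswith(("ABC", "XYZ")))
# reads covers all 4 items even though only 3 match
```

So even when a predicate does push down, it saves network transfer and client-side work, not read capacity units.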
We are using Apache Spark to connect to DynamoDB via the emr-dynamodb-connector. When a Spark filter is applied on top of a scan, does the query sent to DynamoDB include a FilterExpression for server-side filtering, or does the scan operation load the entire table into a DataFrame first, with the filter applied afterwards?
Here are my PySpark statements:
dynamoDf = spark.read.format("dynamodb") \
    .option("region", "REGION") \
    .option("tableName", "TABLE_NAME") \
    .load()
dynamoDfFilter = dynamoDf.filter((F.col("colFilter").startswith('ABC')) | (F.col("colFilter").startswith('XYZ')))
print(dynamoDfFilter.count())
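To make the distinction in the question concrete, here is a plain-Python sketch (no Spark or AWS involved; the data and function names are hypothetical) contrasting the two execution strategies: filtering client-side after a full scan versus sending the predicate along with the scan request:

```python
# Hypothetical sketch of the two strategies being asked about.
# Without pushdown, every item crosses the network and is filtered locally;
# with pushdown, the predicate rides along with the scan and only matching
# items are shipped back.

def scan_without_pushdown(items, predicate):
    shipped = list(items)  # full table crosses the wire
    return len(shipped), [i for i in shipped if predicate(i)]

def scan_with_pushdown(items, predicate):
    shipped = [i for i in items if predicate(i)]  # filtered server-side
    return len(shipped), shipped

# Hypothetical table contents mirroring the colFilter example above
table = [{"colFilter": v} for v in ("ABC1", "XYZ2", "DEF3", "ABC4")]
pred = lambda i: i["colFilter"].startswith(("ABC", "XYZ"))

no_push_shipped, result_a = scan_without_pushdown(table, pred)
push_shipped, result_b = scan_with_pushdown(table, pred)
# Both strategies return the same rows; only the volume shipped differs.
```

Either way the final count is identical; the question is purely about how much data travels from DynamoDB to the Spark executors before the filter takes effect.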
So, I wanted to know: will these filter conditions be pushed down to DynamoDB for server-side filtering, or will they be applied only after the full scan has loaded the data into "dynamoDf"?