Patch jve #210

andrewlawhh · 2021-04-11T23:16:31Z

While running tests after merging code in from main, I noticed that the expected DAG wasn't being properly computed anymore.

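This PR fixes bugs in the DAG construction where children were not being set properly after pruning the operator tree: `treeToList` now skips already-visited vertices, and `operatorDAGFromPlan` sets children properly.

It also adds descriptive comments to each function and class.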
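The kind of bug described above can be illustrated with a minimal sketch. This is not the JobVerificationEngine code: the `Node` class, the `prune` and `tree_to_list` functions, the operator names, and the rule for which operators are kept are all hypothetical. The idea is that when an operator is pruned, its surviving descendants have to be re-attached to the nearest kept ancestor, and the traversal has to skip vertices it has already visited.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(eq=False)  # identity-based equality, so shared nodes stay distinct
class Node:
    """Hypothetical operator node; not the project's actual class."""
    name: str
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Callable[[Node], bool]) -> List[Node]:
    """Return the kept nodes that stand in for `node` after pruning.

    A kept node has its children rewritten to its nearest kept
    descendants; a pruned node is replaced by those descendants.
    """
    new_children: List[Node] = []
    for child in node.children:
        for kept in prune(child, keep):
            if kept not in new_children:  # avoid duplicate edges
                new_children.append(kept)
    if keep(node):
        node.children = new_children
        return [node]
    return new_children


def tree_to_list(root: Node) -> List[Node]:
    """Depth-first listing that skips vertices already visited."""
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        order.append(node)
        stack.extend(reversed(node.children))
    return order


# Usage: the middle operator is pruned, so the scan becomes a child of the sort.
scan = Node("EncryptedScan")
project = Node("Project", [scan])
sort = Node("EncryptedSort", [project])
[root] = prune(sort, lambda n: n.name.startswith("Encrypted"))
print([n.name for n in tree_to_list(root)])  # ['EncryptedSort', 'EncryptedScan']
```

Without the re-attachment step in `prune`, the sort node would keep pointing at the pruned operator, which is the kind of stale child link this PR describes.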

This PR implements the scalar subquery expression, which is triggered whenever a subquery returns a scalar value. Two main problems needed to be solved. First, the scalar subquery expression has to be matched. Spark implements this by wrapping a SparkPlan within the expression and calling executeCollect, then constructing a literal from the result. That is problematic for us: the value is an intermediate result, so it should not be decrypted by the driver and serialized into an expression. The second problem is therefore supporting an encrypted literal. This PR implements it by serializing the ciphertext into a base64-encoded string and wrapping a Decrypt expression around it. The expression is then evaluated in the enclave and returns a literal. Note that, in order to test our implementation, we also implement a Decrypt expression in Scala. However, it should never be evaluated on the driver side and serialized into a plaintext literal: Decrypt is designated as a Nondeterministic expression, so it will always be evaluated on the workers.
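The encrypted-literal idea can be sketched in a few lines. This is an illustration only, not the PR's Scala or enclave code: the `Literal` and `Decrypt` classes, `encrypted_literal`, and `eval_in_enclave` are hypothetical names, and the XOR routine is a deliberately insecure stand-in for whatever cipher the enclave actually uses.

```python
import base64
from dataclasses import dataclass


def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Placeholder XOR cipher for illustration only. NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))


toy_decrypt = toy_encrypt  # XOR with the same key is its own inverse


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Decrypt:
    """Expression wrapping a literal that holds a base64-encoded ciphertext."""
    child: Literal

    def eval_in_enclave(self, key: bytes) -> Literal:
        # Only this step sees the plaintext; it is assumed to run in the enclave.
        ciphertext = base64.b64decode(self.child.value)
        plaintext = toy_decrypt(ciphertext, key)
        return Literal(int.from_bytes(plaintext, "big", signed=True))


def encrypted_literal(ciphertext: bytes) -> Decrypt:
    """What the driver builds: it handles the ciphertext, never the value."""
    encoded = base64.b64encode(ciphertext).decode("ascii")
    return Decrypt(Literal(encoded))


# Usage: the subquery result stays encrypted until enclave evaluation.
key = b"demo-key"
ciphertext = toy_encrypt((42).to_bytes(8, "big", signed=True), key)
expr = encrypted_literal(ciphertext)
print(expr.eval_in_enclave(key))  # Literal(value=42)
```

The point of the wrapper is that the serialized expression tree contains only a string literal, so nothing on the driver side needs the key.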

* logic decoupling in TPCH.scala for easier benchmarking
* added TPCHBenchmark.scala
* Benchmark.scala rewrite
* done adding all support TPC-H query benchmarks
* changed commandline arguments that benchmark takes
* TPCHBenchmark takes in parameters
* fixed issue with spark conf
* size error handling, --help flag
* add Utils.force, break cluster mode
* comment out logistic regression benchmark
* ensureCached right before temp view created/replaced
* upgrade to 3.0.1
* upgrade to 3.0.1
* 10 scale factor
* persistData
* almost done refactor
* more cleanup
* compiles
* 9 passes
* cleanup
* collect instead of force, sf_none
* remove sf_none
* defaultParallelism
* no removing trailing/leading whitespace
* add sf_med
* hdfs works in local case
* cleanup, added new CLI argument
* added newly supported tpch queries
* function for running all supported tests

This PR is the first of two parts towards making TPC-H 16 work: the other will be implementing `is_distinct` for aggregate operations. `BroadcastNestedLoopJoin` is Spark's "catch all" for non-equi joins. It works by first picking a side to broadcast, then iterating through every possible row combination and checking the non-equi condition against the pair.
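The join strategy described above can be sketched as follows. This is a plain-Python illustration of an inner nested loop join with one side broadcast, not Opaque's enclave implementation; the function name, the row representation as tuples, and the partition layout are assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple


def broadcast_nested_loop_join(
    streamed_partitions: Iterable[List[Row]],
    broadcast_rows: Iterable[Row],
    condition: Callable[[Row, Row], bool],
) -> List[Row]:
    """Inner join: test every (streamed, broadcast) row pair against `condition`."""
    # "Broadcasting" here means every partition sees the full copy of this side.
    broadcast = list(broadcast_rows)
    joined: List[Row] = []
    for partition in streamed_partitions:
        for s in partition:
            for b in broadcast:
                if condition(s, b):  # arbitrary, possibly non-equi, predicate
                    joined.append(s + b)
    return joined


# Usage: a non-equi condition (streamed value strictly less than broadcast value).
left = [[(1,), (5,)], [(3,)]]  # two partitions of the streamed side
right = [(2,), (4,)]           # the broadcast side
print(broadcast_nested_loop_join(left, right, lambda s, b: s[0] < b[0]))
# [(1, 2), (1, 4), (3, 4)]
```

Because every pair is checked, the cost is proportional to the product of the two input sizes, which is why this strategy serves as a fallback rather than a first choice.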

…oject#164)

* Add in TPC-H 21
* Add condition processing in enclave code
* Code clean up
* Enable query 18
* WIP
* Local tests pass
* Apply suggestions from code review

Co-authored-by: octaviansima

* WIP
* Address comments
* q21.sql

Co-authored-by: octaviansima

* matching in strategies.scala set up class thing cleanup added test cases for non-equi left anti join rename to serializeEquiJoinExpression added isEncrypted condition set up keys JoinExpr now has condition rename serialization does not throw compile error for BNLJ split up added condition in ExpressionEvaluation.h zipPartitions cpp put in place typo added func to header two loops in place update tests condition fixed scala loop interchange rows added tags ensure cached == match working comparison decoupling in ExpressionEvaluation save compiles and condition works is printing fix swap outer/inner o_i_match show() has the same result tests pass test cleanup added test cases for different condition BuildLeft works optional keys in scala started C++ passes the operator tests comments, cleanup attempting to do it the ~right~ way comments to distinguish between primary/secondary, operator tests pass cleanup comments, about to begin implementation for distinct agg ops is_distinct added test case serializing with isDistinct is_distinct in ExpressionEvaluation.h removed unused code from join implementation remove RowWriter/Reader in condition evaluation (join) easier test serialization done correct checking in Scala set is set up spaghetti but it finally works function for clearing values condition_eval instead of condition goto comment remove explain from test, need to fix distinct aggregation for >1 partitions started impl of multiple partitions fix added rangepartitionexec that runs partitioning cleanup serialization properly comments, generalization for > 1 distinct function comments about to refactor into logical.Aggregation the new case has distinct in result expressions need to match on distinct removed new case (doesn't make difference?) works

  Upgrade to OE 0.12 (mc2-project#153)

  Update README.md

  Support for scalar subquery (mc2-project#157)

  match remove RangePartitionExec inefficient implementation refined

  Add TPC-H Benchmarks (mc2-project#139)

  complete instead of partial -> final removed traces of join cleanup
* added test case for one distinct one non, reverted comment
* removed C++ level implementation of is_distinct
* PartialMerge in operators.scala
* stage 1: grouping with distinct expressions
* stage 2: WIP
* saving, sorting by group expressions ++ name distinct expressions worked
* stage 1 & 2 printing the expected results
* removed extraneous call to sorted, mc2-project#3 in place but not working
* stage 3 has the final, correct result: refactoring the Aggregate code to not cast aggregate expressions to Partial, PartialMerge, etc will be needed
* refactor done, C++ still printing the correct values
* need to formalize None case in EncryptedAggregateExec.output, but stage 4 passes
* distinct and indistinct passes (git add -u)
* general cleanup, None case looks nicer
* throw error with >1 distinct, add test case for global distinct
* no need for global aggregation case
* single partition passes all aggregate tests, multiple partition doesn't
* works with global sort first
* works with non-global sort first
* cleanup
* cleanup tests
* removed iostream, other nit
* added test case for 13
* None case in isPartial match done properly
* added test cases for sumDistinct
* case-specific namedDistinctExpressions working
* distinct sum is done
* removed comments
* got rid of mode argument
* tests include null values
* partition followed by local sort instead of first global sort
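The last bullet ("partition followed by local sort instead of first global sort") suggests a pattern that can be sketched as below. This is an assumption-laden illustration, not the staged aggregation the commits implement: the `sum_distinct` function, the hash partitioning, and the `(group, value)` row shape are hypothetical, and null values are left out for brevity.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def sum_distinct(
    rows: Iterable[Tuple[Hashable, int]], num_partitions: int = 2
) -> Dict[Hashable, int]:
    """Compute SUM(DISTINCT value) per group via partitioning and a local sort."""
    # Step 1: partition by group key so each group lands in exactly one partition.
    partitions: List[List[Tuple[Hashable, int]]] = [[] for _ in range(num_partitions)]
    for group, value in rows:
        partitions[hash(group) % num_partitions].append((group, value))

    totals: Dict[Hashable, int] = {}
    for part in partitions:
        # Step 2: a local sort makes duplicate (group, value) pairs adjacent.
        part.sort()
        previous = None
        for group, value in part:
            # Step 3: only the first row of each run of duplicates contributes.
            if (group, value) != previous:
                totals[group] = totals.get(group, 0) + value
            previous = (group, value)
    return totals


# Usage: duplicates within a group are counted once.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 5), ("b", 5)]
print(sum_distinct(rows))  # totals: a -> 3, b -> 5 (key order may vary)
```

Since all rows of a group share a partition, no global ordering across partitions is needed to find duplicates, which is the motivation the bullet hints at.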

Andrew Law and others added 30 commits October 1, 2020 18:14

Support for multiple branched CaseWhen

f17d8a8

Interval (mc2-project#116)

366e92c

* add date_add, interval sql still running into issues
* Add Interval SQL support
* uncomment out the other tests
* resolve comments
* change interval equality

Co-authored-by: Eric Feng

Remove partition ID argument from enclaves

c7fcd98

Fix comments

93dbf5e

updates

f357ab2

Merge serialization of ecall string as int

bb4018a

Modifications to integrate crumb, log-mac, and all-outputs_mac, wip

56ace17

Store log mac after each output buffer, add all-outputs-mac to each e…

21bbbfb

…ncryptedblocks wip

Add all_outputs_mac to all EncryptedBlocks once all log_macs have bee…

549566f

…n generated

Almost builds

55ee664

cpp builds

057caec

Use ubyte for all_outputs_mac

db54c44

use Mac for all_outputs_mac

e77f1eb

Hopefully this works for flatbuffers all_outputs_mac mutation, cpp bu…

736b8f6

…ilds

merge

cbb2373

Merge branch 'comp-integrity' of https://github.com/mc2-project/opaque …

0351b5d

…into comp-integrity

Scala builds now too, running into error with union

3002bd3

Stuff builds, error with all outputs mac serialization. this commit u…

dc54741

…ses all_outputs_mac as Mac table

Fixed bug, basic encryption / show works

5be9b7c

All single partition tests pass, multiple partition passes until tpch-9

86fab02

All tests pass except tpch-9 and skew join

8b1a1d1

comment tpch back in

18f45d6

Merge branch 'crumb-path' of http://github.com/chester-leung/opaque i…

123fa1f

…nto comp-integrity

Check same number of ecalls per partition - exception for scanCollect…

bfc06ba

…LastPrimary(?)

First attempt at constructing executed DAG

c818a41

Fix typos

39a4945

Rework graph

c970965

Add log macs to graph nodes

43ccd2e

Construct expected DAG and refactor JobNode.

69fc49e

Refactor construction of executed DAG.

Implement 'paths to sink' for a DAG

35691ff

Andrew Law and others added 30 commits February 9, 2021 14:55

Merge join update

375de7f

Integrate new join

8682f22

Add expected operator for sortexec

c21cb7b

Merge comp-integrity with join update

c1adf85

Merge comp-integrity with join update

9391435

Merge join integration with expected dag update

2b37dab

Remove some print statements

8a93c6c

Migrate from Travis CI to Github Actions (mc2-project#156)

c190aae

Upgrade to OE 0.12 (mc2-project#153)

41ea7b9

Update README.md

29da474

Construct expected DAG from dataframe physical plan

b350992

Refactor collect and add integrity checking helper function to Opaque…

20f4749

…OperatorTest

Float expressions (mc2-project#160)

3c28b5f

This PR adds float normalization expressions [implemented in Spark](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala#L170). TPC-H query 2 also passes.
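For context, Spark's `NormalizeFloatingNumbers` rule canonicalizes NaN and rewrites `-0.0` to `0.0`, so that values which should be treated as equal also have the same representation. A minimal sketch of that normalization for a single double follows; the `normalize_float` and `bits` helpers are hypothetical names and this is not the expression code added in the commit.

```python
import math
import struct


def normalize_float(x: float) -> float:
    """Map every NaN to one canonical NaN and -0.0 to 0.0."""
    if math.isnan(x):
        return float("nan")
    if x == 0.0:  # true for both 0.0 and -0.0
        return 0.0
    return x


def bits(x: float) -> bytes:
    """Big-endian IEEE 754 encoding, used to compare representations."""
    return struct.pack(">d", x)


# Usage: -0.0 and 0.0 compare equal but differ bitwise until normalized.
print(bits(-0.0) == bits(0.0))                                    # False
print(bits(normalize_float(-0.0)) == bits(normalize_float(0.0)))  # True
```

Normalizing before comparing raw bytes matters when rows are grouped or joined on their encoded form rather than through floating-point comparison.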

Remove addExpectedOperator from JobVerificationEngine, add comments

e9b075b

Implement expected DAG construction by doing graph manipulation on da…

dabc178

…taframe field instead of string parsing

Merge

38c9da5

Fix merge errors in the test cases

98bcfdb

Fix merge errors

592ec17

Merge BNLJ into integrity branch

e3e140d

Merge join logic migration into integrity branch

67fd713

Merge join logic migration into integrity branch

29db9e6

Merge distinct aggregation support into integrity branch

886eda8

Fix merge errors

1fb4a5a

fix treeToList to skip visited vertices and operatorDAGFromPlan to pr…

8ba5f75

…operly set children

Add descriptive comments to each function and class

898a1b4
