Patch jve #210

Open
wants to merge 89 commits into
base: comp-integrity

Conversation

andrewlawhh
Collaborator

While running tests after merging in code from main, I noticed that the expected DAG was no longer being computed correctly.

This PR fixes bugs in the DAG construction where children were not being set properly after pruning the operator tree.

It also adds comments.
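
To illustrate the kind of bug described above, here is a minimal sketch of pruning an operator tree while re-linking children. It is not the code from this PR: `OperatorNode`, `prune`, `edges`, the operator names, and the `keep` predicate are all hypothetical. The point it shows is that when a node is dropped, its surviving descendants must be attached to the nearest kept ancestor, otherwise the resulting DAG loses edges.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class OperatorNode:
    """A node in a hypothetical operator tree (illustrative only)."""
    name: str
    children: List["OperatorNode"] = field(default_factory=list)


def prune(node: OperatorNode,
          keep: Callable[[OperatorNode], bool]) -> List[OperatorNode]:
    """Return the pruned subtrees rooted at or below `node`.

    A kept node gets its children rebuilt from the pruned results of its
    original children. A dropped node is replaced by its surviving
    descendants, so they end up attached to the nearest kept ancestor.
    """
    surviving: List[OperatorNode] = []
    for child in node.children:
        surviving.extend(prune(child, keep))
    if keep(node):
        return [OperatorNode(node.name, surviving)]
    return surviving


def edges(roots: List[OperatorNode]) -> List[Tuple[str, str]]:
    """List (parent, child) name pairs of the pruned structure, depth first."""
    result: List[Tuple[str, str]] = []
    for root in roots:
        for child in root.children:
            result.append((root.name, child.name))
        result.extend(edges(root.children))
    return result


# Hypothetical plan: the "Exchange" node is not kept, so after pruning
# "EncryptedFilter" must become a direct child of "EncryptedProject".
plan = OperatorNode("EncryptedProject", [
    OperatorNode("Exchange", [
        OperatorNode("EncryptedFilter", [OperatorNode("EncryptedScan")]),
    ]),
])
pruned = prune(plan, lambda n: n.name.startswith("Encrypted"))
print(edges(pruned))
# [('EncryptedProject', 'EncryptedFilter'), ('EncryptedFilter', 'EncryptedScan')]
```

Rebuilding the child lists bottom-up, rather than mutating them in place, is one way to avoid stale references to pruned nodes.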

Andrew Law and others added 30 commits October 1, 2020 18:14
* add date_add, interval sql still running into issues

* Add Interval SQL support

* uncomment out the other tests

* resolve comments

* change interval equality

Co-authored-by: Eric Feng <[email protected]>
Refactor construction of executed DAG.
Andrew Law and others added 30 commits February 9, 2021 14:55
This PR implements the scalar subquery expression, which is triggered whenever a subquery returns a scalar value. Two main problems needed to be solved.

First, we need to match the scalar subquery expression. Spark implements this by wrapping a SparkPlan within the expression and calling executeCollect, then constructing a literal from the returned value. This is problematic for us: the value is an intermediate result, so it should not be decrypted by the driver and serialized into an expression.

The second problem is therefore supporting an encrypted literal. This PR implements it by serializing the ciphertext into a base64-encoded string and wrapping a Decrypt expression around it. The expression is then evaluated in the enclave and returns a literal. Note that, in order to test our implementation, we also implement a Decrypt expression in Scala. However, it should never be evaluated on the driver side and serialized into a plaintext literal, because Decrypt is designated as a Nondeterministic expression and will therefore always evaluate on the workers.
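
The encrypted-literal idea above can be sketched roughly as follows. This is a conceptual model, not the PR's Scala or enclave code: `Literal`, `Decrypt`, `wrap_encrypted_scalar`, and `eval_in_enclave` are hypothetical names, and `toy_cipher` is an XOR stand-in for real encryption that offers no security. It only shows the shape of the approach: the driver embeds base64 text of the ciphertext, and plaintext appears only where the Decrypt node is evaluated.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A constant in a hypothetical expression tree."""
    value: object


@dataclass(frozen=True)
class Decrypt:
    """Wraps a Literal holding the base64 text of a ciphertext."""
    child: Literal


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR stand-in for real encryption. NOT secure; assumption for the demo."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def wrap_encrypted_scalar(ciphertext: bytes) -> Decrypt:
    """Driver side: embed the ciphertext in the plan without decrypting it."""
    encoded = base64.b64encode(ciphertext).decode("ascii")
    return Decrypt(Literal(encoded))


def eval_in_enclave(expr, key: bytes) -> Literal:
    """Enclave side: the only place where the plaintext is materialized."""
    if isinstance(expr, Literal):
        return expr
    if isinstance(expr, Decrypt):
        child = eval_in_enclave(expr.child, key)
        ciphertext = base64.b64decode(child.value)
        return Literal(toy_cipher(ciphertext, key).decode("utf-8"))
    raise TypeError(f"unsupported expression: {expr!r}")


# The subquery result arrives at the driver already encrypted.
key = b"enclave-only-key"
subquery_result = toy_cipher(b"42", key)

expr = wrap_encrypted_scalar(subquery_result)  # driver never sees "42"
print(eval_in_enclave(expr, key))              # Literal(value='42')
```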
* logic decoupling in TPCH.scala for easier benchmarking

* added TPCHBenchmark.scala

* Benchmark.scala rewrite

* done adding all support TPC-H query benchmarks

* changed commandline arguments that benchmark takes

* TPCHBenchmark takes in parameters

* fixed issue with spark conf

* size error handling, --help flag

* add Utils.force, break cluster mode

* comment out logistic regression benchmark

* ensureCached right before temp view created/replaced

* upgrade to 3.0.1

* upgrade to 3.0.1

* 10 scale factor

* persistData

* almost done refactor

* more cleanup

* compiles

* 9 passes

* cleanup

* collect instead of force, sf_none

* remove sf_none

* defaultParallelism

* no removing trailing/leading whitespace

* add sf_med

* hdfs works in local case

* cleanup, added new CLI argument

* added newly supported tpch queries

* function for running all supported tests
This PR is the first of two parts toward making TPC-H 16 work; the second will implement `is_distinct` for aggregate operations.

`BroadcastNestedLoopJoin` is Spark's "catch all" for non-equi joins. It works by first picking a side to broadcast, then iterating through every possible row combination and checking the non-equi condition against each pair.
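
As a rough illustration of the description above, the following sketch performs an inner nested loop join over plain tuples. It is not Spark's or this project's implementation: `broadcast_nested_loop_join`, the sample rows, and the condition are assumptions made for the example, and only the inner-join case is shown.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple


def broadcast_nested_loop_join(
    streamed_partitions: Iterable[List[Row]],
    broadcast_rows: Iterable[Row],
    condition: Callable[[Row, Row], bool],
) -> Iterator[Row]:
    """Inner join by checking `condition` on every (streamed, broadcast) pair."""
    # Every partition of the streamed side sees a full copy of the
    # broadcast side, which is what makes a shuffle unnecessary.
    broadcast = list(broadcast_rows)
    for partition in streamed_partitions:
        for streamed_row in partition:
            for broadcast_row in broadcast:
                if condition(streamed_row, broadcast_row):
                    yield streamed_row + broadcast_row


# Hypothetical data: (order_id, amount) rows joined against (limit,) rows
# with the non-equi condition amount < limit.
orders = [[(1, 10)], [(2, 30), (3, 15)]]  # two partitions
limits = [(20,)]                          # small side, broadcast

joined = list(broadcast_nested_loop_join(
    orders, limits, lambda order, limit: order[1] < limit[0]))
print(joined)  # [(1, 10, 20), (3, 15, 20)]
```

The cost is proportional to the product of the two sides' sizes, which is why broadcasting the smaller side matters.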
…oject#164)

* Add in TPC-H 21

* Add condition processing in enclave code

* Code clean up

* Enable query 18

* WIP

* Local tests pass

* Apply suggestions from code review

Co-authored-by: octaviansima <[email protected]>

* WIP

* Address comments

* q21.sql

Co-authored-by: octaviansima <[email protected]>
* matching in strategies.scala

set up class thing

cleanup

added test cases for non-equi left anti join

rename to serializeEquiJoinExpression

added isEncrypted condition

set up keys

JoinExpr now has condition

rename

serialization does not throw compile error for BNLJ

split up

added condition in ExpressionEvaluation.h

zipPartitions

cpp put in place

typo

added func to header

two loops in place

update tests

condition

fixed scala loop

interchange rows

added tags

ensure cached

== match working

comparison decoupling in ExpressionEvaluation

save

compiles and condition works

is printing

fix swap outer/inner

o_i_match

show() has the same result

tests pass

test cleanup

added test cases for different condition

BuildLeft works

optional keys in scala

started C++

passes the operator tests

comments, cleanup

attempting to do it the ~right~ way

comments to distinguish between primary/secondary, operator tests pass

cleanup comments, about to begin implementation for distinct agg ops

is_distinct

added test case

serializing with isDistinct

is_distinct in ExpressionEvaluation.h

removed unused code from join implementation

remove RowWriter/Reader in condition evaluation (join)

easier test

serialization done

correct checking in Scala

set is set up

spaghetti but it finally works

function for clearing values

condition_eval instead of condition

goto

comment

remove explain from test, need to fix distinct aggregation for >1 partitions

started impl of multiple partitions fix

added rangepartitionexec that runs

partitioning cleanup

serialization properly

comments, generalization for > 1 distinct function

comments

about to refactor into logical.Aggregation

the new case has distinct in result expressions

need to match on distinct

removed new case (doesn't make difference?)

works

Upgrade to OE 0.12 (mc2-project#153)

Update README.md

Support for scalar subquery (mc2-project#157)

match

remove RangePartitionExec

inefficient implementation refined

Add TPC-H Benchmarks (mc2-project#139)

complete instead of partial -> final

removed traces of join

cleanup

* added test case for one distinct one non, reverted comment

* removed C++ level implementation of is_distinct

* PartialMerge in operators.scala

* stage 1: grouping with distinct expressions

* stage 2: WIP

* saving, sorting by group expressions ++ name distinct expressions worked

* stage 1 & 2 printing the expected results

* removed extraneous call to sorted, mc2-project#3 in place but not working

* stage 3 has the final, correct result: refactoring the Aggregate code to not cast aggregate expressions to Partial, PartialMerge, etc will be needed

* refactor done, C++ still printing the correct values

* need to formalize None case in EncryptedAggregateExec.output, but stage 4 passes

* distinct and indistinct passes (git add -u)

* general cleanup, None case looks nicer

* throw error with >1 distinct, add test case for global distinct

* no need for global aggregation case

* single partition passes all aggregate tests, multiple partition doesn't

* works with global sort first

* works with non-global sort first

* cleanup

* cleanup tests

* removed iostream, other nit

* added test case for 13

* None case in isPartial match done properly

* added test cases for sumDistinct

* case-specific namedDistinctExpressions working

* distinct sum is done

* removed comments

* got rid of mode argument

* tests include null values

* partition followed by local sort instead of first global sort
6 participants