Skip to content
This repository has been archived by the owner on Jun 1, 2021. It is now read-only.

Wip 301 pure op crdts #405

Merged

Conversation

gabrielgiussi
Copy link
Contributor

Some notes:

  • The implmentation is based on Pure Operation-Based Replicated Data Types
  • All CRDTs, except Counter, were replaced by CvRDTPureOp. Counter is commutative so its design is simpler.
  • AWSet is the only implemented with a custom state (i.e. a Set (1)). The state of the rest of the CRDTs is just a Seq[Operation]
  • The stable state of SimpleCRDT (aka a CRDT which state is a Seq[Operation]) is a Seq[Operation] (because the VectorTime is no longer needed). This save us a VectorTime per operation in memory, but it forces to create a Versioned with VectorTime.Zero per operation each time a new operation is delivered, so is open to discussion.
  • I have to remove the implicit ops in each service and replaced by a fixed ops. If this solution is not acceptable I will try again to make it work with implicit, maybe you can give me a hint on that.

(1) An improvement will be to be able to define the Set implementation.

@gabrielgiussi
Copy link
Contributor Author

I've created the class EventuateMultiNodeSpec to facilitate the tests. If you are not ok with it I could rewrite the tests without it.

@krasserm
Copy link
Contributor

krasserm commented Mar 8, 2018

@gabrielgiussi thanks a lot for the PR. Please bear with me that a review might take some time ... Since they actually replace the existing CRDT implementations we will have to make sure that they all pass eventuate-chaos tests. @volkerstampa can you setup the chaos tests for the new implementation? Alternatively, if you want to have them merged earlier so that users can start to experiment with, I propose the old implementation and the new pure op-based to co-exist for a while. WDYT?

@gabrielgiussi
Copy link
Contributor Author

gabrielgiussi commented Mar 8, 2018

we will have to make sure that they all pass eventuate-chaos tests

I could give it a try if you want. I mean add a chaos-test for the AWSet for example.

I propose the old implementation and the new pure op-based to co-exist for a while

I agree, I will reset the old classes as they were and move the new implementations to a package pureop.

@volkerstampa
Copy link
Contributor

Currently the chaos tests are mainly testing eventuate itself and not so much the CRDTs. The only existing test-client uses the CRDT-counter. If we want to chaos-test each CRDT implementation corresponding test-clients would have to be written first. I can assist when it comes to implementing those but atm I cannot take care of the implementation by myself.

@gabrielgiussi It would be great if you could give it a try.

@krasserm
Copy link
Contributor

krasserm commented Mar 9, 2018

@gabrielgiussi thanks for giving it a try! Regarding a separate implementation, what about adding a new module e.g. eventuate-crdt-pure?

@gabrielgiussi
Copy link
Contributor Author

gabrielgiussi commented Mar 17, 2018

@krasserm I'll add a new module, but I did some minor changes to CRDTActor and CRDTManager to implement stability, so the question is, you want me to left [crdt] module as it was and duplicate the CRDTService, CRDTActor and CRDTManager in [crdt-pure]? Or you are ok with some minor changes in [crdt] and make [crdt-pure] depend on it?

@gabrielgiussi
Copy link
Contributor Author

gabrielgiussi commented Mar 17, 2018

I've run the counter scenario with eventuate 0.10 and I've got

Processed 3956 requests in the meantime
All 3 nodes converged to the counter value: 2228
Expected counter value: 2679; actual 2228

It's ok that the expected counter value differs?

@krasserm
Copy link
Contributor

@gabrielgiussi Duplication is ok for the moment. It allows us to do further experiments without interfering with the existing implementation.

@krasserm
Copy link
Contributor

@gabrielgiussi regarding the CRDT chaos runs: I remember that we had a bug in calculating the expected value but I thought it was fixed. That's a bug in eventuate-chaos itself. @volkerstampa have you encountered these issues as well?

@volkerstampa
Copy link
Contributor

volkerstampa commented Mar 19, 2018

As long as the exit code of the test program is 0, the test is green, i.e. all nodes converged to the same value. The reported expected counter value is actually only correct if all commands sent during the test can be processed successfully, however this is not guaranteed under chaos, so the expectation is actually wrong and thus the report is sort of misleading.

There is actually an open issue for this: RBMHTechnology/eventuate-chaos#12

@krasserm
Copy link
Contributor

@volkerstampa thanks for clarifying

@gabrielgiussi
Copy link
Contributor Author

gabrielgiussi commented Apr 2, 2018

This is the best I could do for now. Here is the travis build that test counter, pure counter and awset (pure orset).

Also, I've already moved the implementation to a separate module, copying the code but changing the package to com.rbmhtechnology.eventuate.crdt.pure (added pure).

Copy link
Contributor

@volkerstampa volkerstampa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, but I think I cannot do a solid review atm. As I pointed out before I am not really into the CRDT stuff and it does not look like that I have the time to look into this in the level of detail required for a review.

I pointed out a few formal things regarding changes in the other sources outside of the crdt module.

.travis.yml Outdated
- .travis/test-stream.sh $TRAVIS_SCALA_VERSION
- .travis/test-spark.sh $TRAVIS_SCALA_VERSION
- .travis/test-vertx.sh $TRAVIS_SCALA_VERSION
- find $HOME/.sbt -name "*.lock" | xargs rm
- find $HOME/.ivy2 -name "ivydata-*.properties" | xargs rm
after_success:
- if [ "${TRAVIS_PULL_REQUEST}" = "false" ] && [ "${TRAVIS_BRANCH}" = "master" ]; then .travis/publish-all.sh $TRAVIS_SCALA_VERSION; fi
- sbt ++$TRAVIS_SCALA_VERSION publish
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is changed? We only want to publish from master, right? And spark is not yet on scala 2.12, is it? So we still need the publish-script?!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the travis.yml to publish my own artifacts but I shouldn't have included in this PR.

build.sbt Outdated
@@ -9,12 +9,12 @@ import ProjectDependencies._

version in ThisBuild := "0.10-SNAPSHOT"

organization in ThisBuild := "com.rbmhtechnology"
organization in ThisBuild := "oss.gabrielgiussi"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this needs to be switched back!?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You guess wrong. Martin doesn't told you? 😏
Same as above, I've updated the organization to publish my own artifacts to jfrog so the eventuate-chaos build could download them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You guess wrong. Martin doesn't told you?

Looks like I miss something. Can you please enlighten me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry Martin, I'm not very good with jokes.

These changes shouldn't have been included in the PR, I just changed the organization to publish my own artifacts to jfrog so they will be available to the eventuate-chaos travis build.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have noticed it 😆 I'm currently living in a parallel universe 😉 Thanks for all you effort and good work!

@@ -79,6 +79,9 @@ case class VectorTime(value: Map[String, Long] = Map.empty) {
def <->(that: VectorTime): Boolean =
conc(that)

def ||(that: VectorTime): Boolean =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need another symbolic operator for this? I do not even find a reference to it?!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included || initially because is how is represented in literature but I didn't replace the reference to <->. If you agree with me that || is better I could update the only reference that exists to <->, if not I'll delete it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to stick with the old symbol as || looks too much like a boolean or

import com.rbmhtechnology.eventuate.ReplicationProtocol.ReplicationEndpointInfo
import com.typesafe.config.Config

case class EventuateNodeTest(endpointId: String, connections: Set[String], logs: Set[String], val role: RoleName, customConfig: Option[Config] = None) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val is redundant.


def endpointNodeWithLogs(name: String, logs: Set[String], connections: Set[String], customConfig: Option[Config] = None) = EventuateNodeTest(name, connections, logs, role(name), customConfig)

def setNodesConfig(configs: Set[(RoleName, Config)]) = configs.map { case (roleName, config) => nodeConfig(roleName)(config) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should rather be a foreach instead of a map. And I would propose to add the type annotation : Unit to make that explicit.

if (events.map(_.payload).contains(ValueUpdated(AddOp(randomNr())))) throw IntegrationTestException else super.write(events, partition, clock)
}

// FIXME allow use write-batch-size > 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that still open? Do you still plan to fix this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried without success with

class TestEventLog(id: String) extends LeveldbEventLog(id, "log-test") {
    override def write(events: Seq[DurableEvent], partition: Long, clock: EventLogClock): Unit = {
      val containsStop = events.map(_.payload).exists {
        case ValueUpdated(AddOp(s)) => s.toString.startsWith("stop")
        case _                      => false
      }
      if (!containsStop && events.map(_.payload).contains(ValueUpdated(AddOp(randomNr())))) throw IntegrationTestException else super.write(events, partition, clock)
    }
  }

I can't see what I'm missing, so I could let the FIXME as a reminder or delete it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me both is fine.

.travis.yml Outdated
@@ -6,25 +6,19 @@ cache:
directories:
- $HOME/.ivy2/cache
- $HOME/.sbt/boot/
before_install:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that removed?

@volkerstampa
Copy link
Contributor

@gabrielgiussi The changes to eventuate-chaos in general look ok to me. Not sure if you want to open a PR for this as well?!? But probably only makes sense if this PR is merged ...

Thanks again for all your effort.

@gabrielgiussi
Copy link
Contributor Author

If this PR is eventually merged I will open a PR for eventuate-chaos.

@volkerstampa
Copy link
Contributor

@gabrielgiussi Sorry for the really looong delay. are you still interested in merging this PR? It seems like neither Martin nor me will have time for a thorough review. As the new code is in a separate module the proposal is that the PR can be merged, once the small issues outside of the new module are fixed and you could consider yourself the owner of the new module. WDYT?

Copy link
Contributor

@volkerstampa volkerstampa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there is a compile failure due to the deleted ||: https://travis-ci.org/RBMHTechnology/eventuate/jobs/409323287#L2443

@gabrielgiussi gabrielgiussi force-pushed the wip-301-pure-op-crdts branch from 2fe88fd to 51eea98 Compare July 28, 2018 20:25
@gabrielgiussi
Copy link
Contributor Author

@volkerstampa thanks for taking time to review this PR. If you think it helps to the project I will be happy to merge it and be the owner of the module.

@volkerstampa
Copy link
Contributor

@gabrielgiussi Thanks for the contribution and special thanks for your patience. Please squash all commits and I will merge the PR.

@gabrielgiussi gabrielgiussi force-pushed the wip-301-pure-op-crdts branch from 51eea98 to eb75ec1 Compare July 28, 2018 21:24
@volkerstampa volkerstampa added this to the 0.11 milestone Jul 29, 2018
@volkerstampa volkerstampa merged commit b6a4a14 into RBMHTechnology:master Jul 29, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants