docs: type derivation and function docs overhaul #352

jvanstraten · 2022-10-03T17:33:47Z

I need to just bite the bullet already and submit something for this.

This is the culmination of multiple man-months of work trying to reconcile the con fli cti ng and outd ated specifications and requirements surrounding type derivations, function prototype matching, etc.

Breaking change?

There are two "pseudo-breaking changes" that I'm aware of ("pseudo" because the specification is currently self-contradictory), as well as one pretty pedantic breaking change:

The ? for nullable compound types is placed before the parameter pack instead of after. This complies with the website, but is inconsistent with the grammar hidden in substrait-java. The core extensions use a mixture of both. I've never seen anyone write the ? after the parameter pack except for a few cases in arithmetic_decimal.yaml, and the grammar file was never advertised on the website, so ? before pack seems to be the clear winner here.
I used if x then y else z for the ternary expression, rather than the C-style x ? y : z. As above, this complies with the website and the hidden grammar, but is inconsistent with arithmetic_decimal.yaml. The C-style ternary would also be HIGHLY ambiguous when combined with some of the language features I added, and even without those would probably not result in a grammar that a tool like Bison (LALR1) can parse. (I have to admit though, that I haven't actually checked if the proposed grammar is LALR1, but from experience dealing with Bison I feel like it should be, for what that's worth)
I added a few keywords. That's technically a breaking change, because for example true or assert can now no longer be used as an alternate name for T or something. None of the words are obvious names for bindings aka type parameters though.

I'm fine with calling this a breaking change if it makes someone happy, but considering how 100% of the users I know about have only been using YAMLs as structured documentation it seems a bit overkill.

Features

Support for type variations. The current ANTLR grammar doesn't have syntax for it, and the type syntax parsing page specifies integers to refer to variations, while variations are identified via names in the YAML files.
Support for user-defined type classes. The current ANTLR grammar fundamentally requires type class names to be keywords, and therefore can never support extensions (not without arcane syntax anyway).
Support (and specified behavior) for nullability of "type parameters" aka bindings, for example T?. The current ANTLR grammar does not have a rule for this and the documentation makes no mention of this corner case.
Support for capturing and evaluating nullability. For example, the nullability behavior of a two-input coalesce can now be correctly specified as coalesce(T?, T?n) -> T?n, i.e. the output is nullable iff the second argument is nullable, and the first argument must always be nullable. The added syntax is also expressive enough to specify MIRROR and DECLARED_OUTPUT nullability explicitly, allowing the nullability behavior enum attached to a function to become a desugaring step rather than a bunch of special cases in the derivation interpreter.
Specified behavior of variadic argument slot types better. To the best of my understanding, something(T, STRUCT<T, S>...) is currently at the very least weird if the function is marked inconsistent (S is not bound yet, which seems to mean that each S can be different. But what about T? Shouldn't that refer back to the previous use of T?). The proposed syntax would be something(T, STRUCT<T, ?S>...) if T is consistent and S can match anything, or you can just use ? instead of ?S if you don't use S elsewhere. Same as for nullability behavior, this makes handling consistency a desugaring step instead of a special case for the derivation interpreter.
Support for all "metatypes" (new word) supported by user-defined compound type classes (integers and data types, but also enums, strings, and booleans). In general, a first-class boolean type rather than an afterthought to make relational operators work.
Full support for constraints (static only, i.e. you can constrain the N in VARCHAR<N> but not the allowed values of an i64). Basically for free, too; the constraint system relies on the same logic that's already necessary to match function argument type patterns.
Specified and intuitive operator precedence. The current grammar treats all operators as having the same precedence.
Extended operator set to include <=, >=, and !=.
Actually specified how to execute a derivation program/expression, rather than leaving this to the imagination. In my first few iterations of trying to somewhat maliciously comply with the specification I ended up with what I'm pretty sure would have become a freaking SAT solver if I had continued, especially due to the inclusion of the covers function. The proposed language can be interpreted left-to-right in a single pass.
The meta type system is dynamic, i.e. something like T can be bound to any metatype. This entirely avoids having to do the type analysis suggested by the protos here. It makes them way easier to interpret and no more difficult to write. It is still strongly-typed though (no coercions whatsoever).
While we're on the subject of those protos, no difference between parameterized types and type expressions. Everything is a pattern, both syntactically and semantically, and most patterns work in both matching and evaluation context.
The reference grammar is properly documented with comments, written such that it avoids ANTLR-specific features (such as left-recursion rewriting, fallback to generic recursive descent for grammars that would be ambiguous for more efficient parsing algorithms, precedence overrides, excessive use of fragments, uncommon lexer patterns, etc), and actually referred to by the updated documentation. This should hopefully make it much easier for the next person who tries to implement a complete interpreter for the language, so it hopefully doesn't take them multiple man-months as well.

No idea if this list is exhaustive, I've not really been keeping track.

Anti-features

I unfortunately wasn't able to make everything nicely generalized and intuitive without (serious) breaking changes.

Grammatically, identifiers in pattern context can mean three different things: they can be a binding (previously called type parameter or integer parameter, to be confused with type parameters as in the things inside a type parameter pack), a type class name, or an enum variant. The resolution behavior is well-defined and should be intuitive for users, but is a royal pain in the semantic analysis phase of parsing.
It was ambiguous whether something like T should capture data types including their nullability, or if it should capture only the class/variation/parameter pack part of the quadruple and refer specifically to the non-nullable variant. I somewhat reluctantly ended up going for the latter in order to be intuitively consistent with something like i32 referring to specifically a non-nullable i32 rather than an non-concrete i32 with unspecified nullability, but it would have been So. Much. Cleaner. if non-nullable types had just required a ! suffix or something, rather than non-nullable being the default. Now you need arcane syntax like T?n to capture the class/variation/parameter portion in T and the nullability in n.
Inconsistent bindings are... weird, both syntactically and (at first glance) semantically. However, they are almost always hidden by the MIRROR/DECLARED_OUTPUT and INCONSISTENT variadic consistency desugaring rules though, so unless a user is specifying weird variadic functions or functions with weird nullability semantics, they are unlikely to ever see them.
When you're not used to it yet, the way patterns work under the hood can be pretty weird. For example, this would be a horrible yet legal way to check whether T is an integer type: true = T == i8 || T == i16 || T == i32 || T == i64. There are better ways to write it, for example assert covers(T, i8) || covers(T, i16) || covers(T, i32) || covers(T, i64), and it's grammatically and semantically possible to add syntax for pattern unions later, to make something like assert T matches i8 | i16 | i32 | i64 work in addition, or to allow such patterns to just be specified in the argument patterns directly. To work around the weirdness somewhat, I split the documentation into a user-focused section that just describes the intuition of how they work, and the exact specification of how the interpreter deals with them. Patterns are, however, very powerful once you do get used to them, and (aside from the other anti-features) pretty trivial to implement. For reference, the validator implementation of patterns (including lots of validator-specific logic and boilerplate code for error messages) is fully contained in this file.

Future proofing

It would be trivial to turn type arguments into generic metavalue arguments. Combined with the new enum metatype that came with support for user-defined compound types, this would supersede required enumeration arguments, and allow the value to be used in type derivation. It would also allow static integers, booleans, and strings to be specified (ex. trim(string, N) -> varchar<N>). The only thing missing is the ability to specify any metatype in the protos used to bind arguments.
Type traits. Metafunction syntax is already supported, so I figure it would be easy to allow custom-named metavalues to be attached to data types, and allow these to be retrieved using metafunctions. For example, the integers could be given an is_integer -> true and size -> 8|16|32|64 trait. A function like integer addition could then be written for example as add(T, T) -> assert is_integer(T); T instead of duplicating the function for all integers. The practical upshot of this (besides deduplication) is that if a user defines their own integer types and sets the same trait, the core add function would automatically be extended and usable for those types.
Parameter pack constraints for user-defined compound types could be written in the type derivation language, rather than being hardcoded as they are now. For example, the constraints on DECIMAL cannot currently be expressed with YAML due to the S <= P constraint, while this is trivial to specify for the derivation interpreter (you'd literally just say assert S <= P). In fact, the entire function argument matching semantics could be copypasted to compound type parameter pack matching; the only difference is that they only have the metavalue "arguments" I mentioned before (no value arguments or optional enumeration arguments).
The whole NSTRUCT pseudotype and certainly the protobuf NamedStruct abstraction thereof has never made sense to me; it makes way more sense to me to just use a single struct type with optional names. For the grammar I generalized this even further and just allow every parameter to be given a name, expressed either as an identifier or as a string. This contrasts with the current grammar, which has NSTRUCT as a grammatical special case (which also required NSTRUCT to be a keyword). Thus, if we ever do decide to generalize, the grammar is ready for it.
The grammar includes support for .-based namespace separators that would be part of feat: support for simple extensions dependencies #265.

Future work

I've updated the YAMLs manually for as far as I saw any conflicts, but I have not checked anything in an exhausted or automated way.
I've not yet updated function.proto, parameterized_types.proto, and type_expressions.proto. Instead I've been working on equivalent messages within the validator namespace here (simple_extensions.proto and type_system.proto). The protos probably don't belong there, but I ran into the problem that I'm using a more generalized message type for data types to represent the internals of the validator more accurately, and am currently on the fence about which representation I should be using for data type literals.
The validator already has a full implementation for the proposed language and interpreter, but it's still dead code because I've yet to handle traversal of the YAML files and update the internal representations of extensions.
Someone should obviously update the Java bindings if this is merged. I am, however, decidedly not a Java person, and had somewhat of an allergic reaction to the visitor pattern and introspection boilerplate nonsense going on in the current bindings.

jacques-n · 2022-10-03T21:11:58Z

If we want to go forward with this change, I'd like to see patches for substrait-java, substrait-validator and any other mature libraries that work with this stuff to be available before we merge this. We can't have our sub-repos substantially out-of-date with the spec.

jvanstraten · 2022-10-05T10:40:13Z

I'll try to ignore the unreasonableness of that request and instead focus on trying to work out what prompted it.

This proposal already very intentionally avoids making any changes to the specification, let alone substantial ones. It's a lot of words, but it's ultimately merely a rewrite of what's already there, to actually describe how the system should (or could) work, rather than just throwing a few examples out there and hoping users get it right. I classified this as a docs overhaul PR for a reason, but maybe the "overhaul" part threw you off?

It does add some new syntax, but does so primarily to even be able to define what's going on in a coherent way. If this new syntax is what's bothering you, I can just change the proposal to indicate that the corresponding patterns are not yet supported and exist only as a means to communicate behavior in the specification. This is, in fact, how the current parser in the validator internals already treats these things; it will emit an error if you're using syntax not currently supported by Substrait, even though it understands the syntax perfectly well, because it has to for the internals to be sane.

I'd like to see patches for [...] substrait-validator

The PRs that introduce this functionality internally are substrait-io/substrait-validator#32, substrait-io/substrait-validator#43, substrait-io/substrait-validator#48, and substrait-io/substrait-validator#49. But, again, the validator does not support YAML validation at all yet, so there was no preexisting behavior to patch. substrait-io/substrait-validator#47 is the WIP/tracking PR for supporting validation of type extensions; variations and functions don't have tracking PRs yet. substrait-io/substrait-validator#50 defined the protos for describing extensions that I was talking about, but I'll likely be changing those, still; I wrote them as something to work toward more than anything.

jacques-n

There are several fundamental problems with how you've approached this patch. The behaviors are not unique here and need to be addressed as they are part of a larger pattern that I see as counter to the kind of community that I think we are trying to build. I have no doubt that you approach all your work with good intentions. You're obviously brilliant and passionate and have made a huge set of contributions to the project. Unfortunately, community health is about more than good intentions and good code.

The key challenging behaviors I've seen include:

Generally negative, denigrating & dismissive tone.
Failure to collaborate
Lack of appreciation/respect for domain (related: lack of effort on learning domain).

I don't have the time to go and find the all of the examples of these behaviors. It's been pretty consistent. From you rage-quitting on PRs I spent days reviewing to just being super rude, the pattern is clear to me. As a personal impact, I've often felt like a punching bag for the last several months as you've railed against the problems in the spec, failed to do the legwork to learn some relational algebra and instead chosen to often preface your comments with "I don't know the domain" and then remark how foolish or wrong something is followed by suggesting major refactoring.

If the behavior concepts I listed above are unclear, let me point out some problematic concrete things in this particular PR:

Your comment: "I'll try to ignore the unreasonableness of that request"
Your link to documentation of "conflicting" with no justification of what is conflicting or trying to solve that conflict in as narrow and collaborative way as possible.
Your comment: "This is the culmination of multiple months of work." You shouldn't be working on rewriting the spec for functions unilaterally, especially months at a time.
Your comment: "No idea if this list is exhaustive, I've not really been keeping track." You need to keep track of what you're proposing/changing. (And this really goes back to the more basic premise that none of us should be working on something for months without collaborating.)
The fact that you forked code in the Java repo, modified it and are now proposing it for the main repo while never contributing back changes to the Java repo. This isn't okay. It's awesome that you want to improve the grammar and the specification. How about we start by making the single piece of code that is already trying to work with the function definition works as opposed to fracturing the community into two competing grammars.

I know you're trying to make Substrait better. It's awesome that you're volunteering so much time to the project. I also am the first to agree there are problems with the spec and things need to be improved. What I need is your help on is improving positivity, collaboration and domain appreciation. As it stands, these fundamental challenges make patches like this very hard to review.

CLAassistant · 2022-10-06T23:45:37Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

jvanstraten · 2022-10-07T07:43:33Z

Well, okay, this explains a lot. I appreciate your honesty.

I suppose it's not worth much at this point, but I agree that I was out of line in #333 and apologize again. In general, many of the things you mention about my personality are fair enough. I have a hard time looking past a problem once I've seen it, and get upset if I can't solve it, not to mention when I see people apparently in the process of making things measurably worse. I find it especially aggravating when something is almost consistent but not quite, which is how I see Substrait right now. I also don't believe my opinion (or anyone else's) should matter when making decisions, so I try to argue with facts rather than sugarcoating things. Facts can be cold and can be taken personally, despite me not meaning them that way. Simultaneously, when other people start arguing based on feelings rather than facts, like what happened in #333, I get especially upset, because you can never reason someone out of a position they didn't reason themselves into, and disagreements that cannot be resolved leads to deadlock. There are already a while number of examples of that in the form of open PRs. But I guess I'm making excuses now. I don't know. I hope not, but maybe I'm just an asshole.

I don't intend to be disrespectful of anyone else's field, and I try stay out of discussions when I don't know what I'm talking about. However, you're right, I don't really care for relational algebra personally. I do however care for and have a background in engineering specifications in general, defining domain-specific languages, and compiler construction. I would even argue that the fact that I don't have prior experience with any particular DBMS or whatever makes me uniquely qualified to look at things objectively without biases to what I'm used to, and to point out when things are unclear. I'm sorry you seem to disagree. I would also like to point out that you don't pay me to work on this; I'm not going to study a field I'm not all that interested in just because you want me to. That goes for both relational algebra in general and the Java programming ecosystem. I spent a day trying to get substrait-java to compile in an IDE so I could maybe look at proposing something, but failed completely, and despite you calling it mature, there doesn't seem to be any documentation or (binary) release + dependencies anywhere.

Anyway, while we're making observations about things in general, here's one of mine: we seem to fundamentally disagree on what a specification is.

To me, a specification is a document that unambiguously defines an interface. It always precedes any implementation, except perhaps a reference implementation like a validator. Especially for a vendor-agnostic thing like Substrait, a specification should not favor any particular possible implementation; it should strike a balance between letting different implementations communicate with one another by means of the specified interface, while allowing users the freedom to innovate. Something being unclear or open to interpretation is not acceptable, unless it is explicitly unspecified behavior; when something like that is found, the number one priority should be to fix it. Unless the fix is to make it explicitly unspecified, this is necessarily a breaking change, because something that was previously legal due to not being explicitly illegal now actually is illegal.

To you, I think, a specification is more of a working document. The specification is secondary to the implementations; if two implementations can communicate with each other, it doesn't matter if they actually follow the specification to the letter, because the specification has already done its job at that point. Perhaps the specification should then be reworded to align with the implementations, which I guess would not be a breaking change if you look at it from this perspective, but just a bugfix. It should be brief and easy to read, making use of the background that any user should logically have already in order to do something with the specification; being overly precise is a waste of everyone's time.

I push back against that, perhaps too passionately, because I don't think it's going to work. I think everyone will just end up implementing their own dialect of Substrait, sending a lot of the benefits of having a common representation down the drain. Then, when nothing works together out of the box anyway, people will realize it's a whole lot easier to just make your own IR, and Substrait will either die completely or be just another pseudo-standard. But maybe I'm wrong. On behalf of all the people and companies investing time into making consumers and producers, I certainly hope so.

To stress just how big of a communication issue this is (and why I'm bringing it up), it took me a good ten minutes to even understand what you were talking about with this:

The fact that you forked code in the Java repo, modified it and are now proposing it for the main repo while never contributing back changes to the Java repo.

The way I see it, I am merely proposing an update to the specification here. If the specification would currently have actually specified a grammar at all rather than just saying impl pending, I would have taken that as a baseline instead. Nothing I proposed here had anything to do with substrait-java or any other implementation, because I consider a specification to be independent and distinct from any implementation. In general I reject the notion that an EBNF grammar has anything to do with implementation; EBNF is a means to specify a language, not to implement one, it just happens to also be useful for parser generators like ANTLR. And the grammar isn't even modified from the one in substrait-java, it's a complete rewrite, except I guess the filename because that made my test scripts easier at the time and I saw no reason to name it differently. The only reason I linked to substrait-java and investigated it at all is because I wanted to avoid breaking it as much as I could, for example by continuing to allow $s to be part of identifiers, and because, since it was also written by you, it provided clues as to what the specification is even talking about at times.

It took me this long not because I changed so much, but because it actually was that hard for me to make heads or tails of it and change as little as possible while still having everything be consistent. I want to make sure I actually understand a problem and make sure a proposed solution works before bothering someone else with my walls of text and to avoid breaking things more inadvertently, and I can't do that incrementally when the baseline doesn't make sense to me. Believe me, it would have been a hell of a lot easier to just start over if I truly didn't care about collaboration. The reason I went through the trouble of formulating a complete system, making sure that it actually works in an implementation, and writing extensive documentation for it on my own is so we now actually have a working baseline to discuss, rather than what amounts to little more than a few examples and a Java implementation that throws "not yet implemented" for evaluating all but the most trivial case.

I don't know how to point out what's wrong and inconsistent with the various bits and pieces any more specifically than what I already pointed out in the bullet points in the OP. If none of those strike you as "oh, right, that's indeed pretty broken" then I guess you should just close this PR. If it does but you just don't like my all-at-once method, then that's also fine, but I'm sorry to say that I just don't know how to fix those things incrementally. You'll have to find someone else.

FWIW, I'll be moving on from VD at the end of the year, so I'll in practice be out of your hair in ~11 weeks anyway (I made that decision a while back already for unrelated reasons, and it's no secret, so I might as well throw it out here). I'll still try to finish the validator, but someone else will need to take over maintenance from me after that. As for this repository, I don't think I'll be making any more contributions unless I really have to for work. It's pretty clear that my methods or background are not desirable, and I'm already pushing myself to be more positive than I naturally am, make smaller PRs than I naturally would, and don't intend to study relational algebra and read database docs in any more detail than I already am.

P.S. If this post seems especially rambly or incoherent, I saw your post just before heading home from volunteer work at around 2 AM. I've been writing and editing this since because there was not a snowball's chance in hell I'd fall asleep before answering. It's somehow 9:45 AM now and I'm really starting to fall asleep.

jacques-n · 2022-10-07T16:56:18Z

so I'll in practice be out of your hair...

I'm very saddened by this response. My goal in providing feedback was to improve our collaboration, not end it. You've provided massive positive benefit to the project and it is a bad outcome for you to depart.

we seem to fundamentally disagree on what a specification is..

I don't think that is true. I do think that we disagree on what the Substrait project is and where it is in it's lifecycle. To me, the specification is an important part of what we're trying to create. I think we're striving to recreate the same balance that Arrow has achieved: a specification + one or more reference client implementations.

specification..I don't think it's going to work

I think that it worked well in Arrow and I believe we can do it again here. On this point, we need to be looking for ways to start applying the "reference implementations must exist for proposed specification changes" paradigm.

I don't really care for relational algebra..I'm not going to study a field I'm not all that interested in..

I think this makes early iteration of the project similar to what we did in Arrow especially challenging. We need people focused on filling on the gaps before auditing is going to be productive (as opposed to frustrating). Compounding the challenge of abstract "specification work" is the goal for Substrait to sit at a new domain-specific abstraction level. Without a point of reference, simplification can easily result in difficulty of consumption. Substrait must get to the point where it is usable and consumable by people without domain expertise--but it isn't there yet.

I do however care for and have a background in engineering specifications in general

Your expertise and passion have been greatly appreciated and will be sorely missed. I hope you reconsider.

julianhyde · 2022-10-07T18:42:59Z

I don't see a CoC violation from @jvanstraten here.

It looks as if #333 was a much-needed discussion and got out of hand because of discussion threads. Someone should have been a grown-up and called a zoom meeting before it got so heated.

I've seen a lot of good work from @jvanstraten, acting in a strict-but-fair "editor" role that was much for the benefit of the project. Frankly I assumed that he was a paid employee of Jacques' because no one would have enough spare time to process each request so diligently.

I don't know what is Substrait's governance structure. I hope that he is speaking on behalf of (something like) a PMC. But I fear that he is speaking as a BDFL and if so there is a bigger problem than Jan's tone and positions.

jacques-n · 2022-10-08T00:50:40Z

I don't know what is Substrait's governance structure. I hope that he is speaking on behalf of (something like) a PMC.

If you interested, there is pending proposed governance model that was shared several weeks ago on the mailing list and we've incorporated the feedback of many members of the community. Happy to get your feedback as well.

WRT to my comments here. The members of the SMC have discussed this topic multiple times over the last several months amid multiple instances. There was consensus amongst us that while @jvanstraten has huge positive impact to the project there were several things that need to be improved. A culture of respect and collaboration is especially important early on in the establishment of any community. Because we had not formally adopted the governance document, the discussion was held in private as there was concern that it would be a jarring discussion to have in public without formally adopting the transparent pattern we outline in our proposed governance.

I fear that he is speaking as a BDFL and if so there is a bigger problem than Jan's tone and positions.

I have no interest in a BDFL model. I have a track record of the opposite. Please assume good intentions.

cpcloud · 2022-10-08T12:25:08Z

This was definitely not just Jacques here. I am also not interested in BDFL governance models.

jvanstraten · 2022-10-08T12:59:32Z

I'm sorry, but I honestly find this downright shocking. Several months? Without radical transparency you could also have told me in private, you know. Instead, everyone in VD anyway has been telling me how good of a job I've been doing. And then you put me personally on the spot in the sync meeting for bouncing ideas off of colleagues before spamming the whole community with questionable ideas? I believe kids practice that governance model by punching someone while yelling "no punch back!" and then acting like the victim when they are indeed punched back. Such a healthy dynamic. But clearly I'm the only problem here.

I know I'm not good at collaboration; I find it very difficult not to get carried away with something and when I started this project I was downright phobic of voicing my opinion in public. I voiced these concerns at VD but was placed on the project anyway for various management reasons, and I've been trying my damdnest to improve and act accordingly. Recently this has involved literally spending a workday on each comment with multiple rewrites and edits to tone it down, then asking a colleague to proofread to make sure I'm not out of line, and then editing to tone it down some more. My idle train of thought is nothing but forming arguments for Substrait discussions now and it's really starting to cause problems for my personal life. Apparently this is not and has not been good enough, so I apologize again for my personality hampering your ability to make good use of my "brilliance", and that it's getting in the way of you reviewing my proposals objectively.

I will probably end up regretting this comment as much as I regret #333 but I am beyond my breaking point right now and yet again wasting the nice weather on a day off arguing on my damn phone. If only for my personal mental health, I need to take a good number of steps back here. If that goes as far as ending up out of a job for a few months then so be it. I again apologize for stating facts and my opinion.

westonpace · 2022-10-08T13:10:23Z

@jacques-n is not wrong in suggesting that we want this community to be a place where people can feel comfortable contributing. That doesn't mean they aren't challenged and I agree that cold hard facts are the best way to do this kind of challenging.

I think there are times @jvanstraten 's opinions were interpreted (by more than just @jacques-n) as being more critical than needed. That being said, I am certainly guilty of this myself at times. I was quite hostile I felt during an early discussion on type derivation and more recently got frustrated and gave up during a discussion on random seeds. The entire discussion in #333 was not helped by the fact that I did not do a good job discouraging an internal discussion in VD and it led to a sort of brigading. Certainly there is frustration to go around. We wouldn't need consensus if we didn't have plenty of disagreement and compromise.

I would hope that we, as a community, can get to a place where we feel comfortable pointing out these offenses as they happen, and not letting things bottle up to a point of explosion. Again, I will share the blame of not doing more of this myself, and will aim to step in earlier in the future. A lengthy post calling out patterns of behavior is both hard to receive and hard to recover from. It shouldn't be the first indication that someone is upset and it is unfair to the person that receives it.

Yes, we spoke about this in private. Yes, I agreed that some of @jvanstraten 's comments were over the line but his intentions were good. I should have done a better job bringing up these points I describe above at that time. I do not think this is how we want to handle these things in the future.

westonpace · 2022-10-08T13:16:05Z

Instead, everyone in VD anyway has been telling me how good of a job I've been doing

FWIW, I am extremely grateful for all of the hard work that you've put in and I don't think anyone is saying otherwise. The spec is, undoubtedly, in a better place due to your critiques and contribution.

julianhyde · 2022-10-09T00:50:32Z

Jacques, I used the term BDFL because there was a bossiness to your tone in your first and second comments. The power imbalance between you and Jeroen is palpable.

In the Apache governance model (and AFAICT in the Substrait governance model also) all committers are peers when it comes to technical contributions.

There were times (in other threads) when, it seemed to me, you were justifying technical decisions because you had written the spec, and the spec was final. The spec is always subject to revision, if there is consensus among committers. If the committers (or perhaps SMC) have agreed to freeze the spec, then it is respectful to cite that decision, not to imply that your opinion as a senior committer trumps the opinion of a more junior committer. There is no such thing as committer seniority.

That said, in my work in Calcite I've been in your shoes many times, looking at a complex patch that has not been done quite the way I'd have done it, which took a long time to write, and a considerable time even to review. That process is incredibly stressful to both parties. The best solution I have found is to use inclusive language: "This is important work, and I'd really like to get it merged. The main problems I see are as follows... What do you think? I hear you saying you're uncomfortable working in Java. If we can find someone else to handle that, do you think you could handle the other languages?"

For what it's worth, I read

I'll try to ignore the unreasonableness of that request and instead focus on trying to work out what prompted it.

as a constructive comment, though it was a little poorly phrased. I hear Jeroen expressing his emotional reaction to your comment (which is healthy) and pledging to work on the underlying issues and find compromise.

I looked briefly at the proposed governance model. It looks OK. I have some minor comments, and I'll make them in the doc.

Lastly, long threads with many-paragraph replies are a red flag. It's probable that one or more parties in those threads are under stress for days at a time while the threads play out. These threads place an unacceptable burden on people's mental health, not to mention the huge amount of time it takes to write them.

jvanstraten · 2022-11-01T15:43:05Z

It seems like this thread has run its course, and since I no longer intend to work on this, I'll close this in the interest of cleaning up after myself. I won't delete the branch on my fork in case someone else wants to pick things up from where I left off.

jvanstraten added 5 commits October 3, 2022 16:25

docs: describe meta type system implemented in validator

73f35bf

docs: integrate meta type system with function docs

83d434b

docs: rewrote function introduction, address a few comments

4878dd7

chore: whitespace consistency in SubstraitType.g4

aa992e4

fix: derivation syntax in functions_arithmetic_decimal.yaml

e2c57bc

jacques-n reviewed Oct 6, 2022

View reviewed changes

jvanstraten closed this Nov 1, 2022

This was referenced Nov 1, 2022

fix: various fixes for #252 (write and DDL relations) and #284 (relation references) #288

Closed

Implement validation for simple extensions substrait-io/substrait-validator#3

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: type derivation and function docs overhaul #352

docs: type derivation and function docs overhaul #352

jvanstraten commented Oct 3, 2022

jacques-n commented Oct 3, 2022

jvanstraten commented Oct 5, 2022

jacques-n left a comment

CLAassistant commented Oct 6, 2022

jvanstraten commented Oct 7, 2022

jacques-n commented Oct 7, 2022 •

edited

Loading

julianhyde commented Oct 7, 2022

jacques-n commented Oct 8, 2022 •

edited

Loading

cpcloud commented Oct 8, 2022

jvanstraten commented Oct 8, 2022

westonpace commented Oct 8, 2022

westonpace commented Oct 8, 2022

julianhyde commented Oct 9, 2022

jvanstraten commented Nov 1, 2022

docs: type derivation and function docs overhaul #352

docs: type derivation and function docs overhaul #352

Conversation

jvanstraten commented Oct 3, 2022

Breaking change?

Features

Anti-features

Future proofing

Future work

jacques-n commented Oct 3, 2022

jvanstraten commented Oct 5, 2022

jacques-n left a comment

Choose a reason for hiding this comment

CLAassistant commented Oct 6, 2022

jvanstraten commented Oct 7, 2022

jacques-n commented Oct 7, 2022 • edited Loading

julianhyde commented Oct 7, 2022

jacques-n commented Oct 8, 2022 • edited Loading

cpcloud commented Oct 8, 2022

jvanstraten commented Oct 8, 2022

westonpace commented Oct 8, 2022

westonpace commented Oct 8, 2022

julianhyde commented Oct 9, 2022

jvanstraten commented Nov 1, 2022

jacques-n commented Oct 7, 2022 •

edited

Loading

jacques-n commented Oct 8, 2022 •

edited

Loading