-
Notifications
You must be signed in to change notification settings - Fork 979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FunC v0.5.0: syntax, refactoring, bugfixes, and a testing framework #1026
Conversation
It makes it easier to understand/debug Also, drop some unused enum values from that cases
* fully refactor run_tests.py, make it extensible for the future * an ability to write @compilation_should_fail tests * an ability to launch run_tests.py for a single .fc file * keep run_tests.js in sync with run_tests.py * extract legacy_tests names/hashes to a separate file shared between legacy_tester.py and legacy_tester.js
Seeing function name in debugger makes it much easier to delve into FunC sources
* @fif_codegen to match compiled.fif against an expected pattern * @fif_codegen_avoid to ensure compiled.fif doesn't contain a substring * both in Python and JS run_tests * consider tests/codegen_check_demo.fc for examples
…(...args); }` This will allow to easily implement camelCase wrappers aside stdlib, even without changing hashes of existing contracts. Also, stdlib renamings could be easily performed in the same manner, even with arguments reordered.
@code_hash to match (boc) hash of compiled.fif against expected. While being much less flexible than @fif_codegen, it nevertheless gives a guarantee of bytecode stability on compiler modifications.
…anged In auto-tests, @code_hash controls bytecode stability. In legacy tests, expected hashes are specified in a separate file.
They work alongside Lisp-style ;; and {--}, without any #pragma. Conceptually, a new syntax should be disabled by default and activated using a special compiler option. But now, we don't have an easy way to provide compiler options in func-js, blueprint, etc. Note, that introducing per-file #pragma is a wrong approach here, since if we want to fire human-readable error on using '//' without pragma, lexer should nevertheless work differently. (this could be controlled by a launch option, but see above)
In stdlib, all existing pure functions are asm-implemented. But since we introduced a `pure` keyword applicable to user-defined functions, we need to check that they won't have any side effects (exceptions, globals modification, etc.)
Note, that I have not added all builtin functions. I filtered out strange and actually unused in practice, like "int_at()" and similar, or "run_method0()" and similar. (Probably, they should be dropped off even from builtins) Also, I've modified some stdlib.fc legacy tests just to ensure that a resulting hash doesn't change.
All tests pass: it does not affect hashes (since modifying variables in a single expression was an error)
It changes all hashes, since the compiler needs to manipulate the stack in a different way now.
…_Let Before, #pragma allow-post-modification produced Op::_Let for every tensor entity (which became non-disabled if modification really happened). Although they are stripped off by the compiler and don't affect fif output, they pollute intermediate "AST" representation (ops). Now, Op::_Let is added only if var modification actually happens (which is very uncommon for real-wise code)
Before, such code `if (slices_equal() & status == 1)` was parsed as `if( (slices_equal()&status) == 1 )`. Note, that this change leads to hash changes of some verified contracts, but a new priority is more expected from the user experience.
`get` keyword behaves exactly like `method_id` (auto-calc hash), but it's placed on the left, similar to Tact: `get T name()`. `method_id(n)` is still valid, considering it can't be invoked by name, since a client will compute another hash. It's supposed it will be still used in tests and in low-level code (not to be called externally, but to be called after replacing c3). `get(hash)` is invalid, this keyword does not accept anything.
As it turned out, PSTRING() created a buffer of 128K. If asm_code exceeded this buffer, it was truncated. I've just dropped PSTRING() from there in favor of std::string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#51817
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to have! |
A first-level heading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [ ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [ ]
@@ -296,7 +287,7 @@ void Op::show(std::ostream& os, const std::vector<TmpVar>& vars, std::string pfx | |||
if (noreturn()) { | |||
dis += "<noret> "; | |||
} | |||
if (!is_pure()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [ ]
All changes from PR "FunC v0.5.0": #1026 Instead of developing FunC, we decided to fork it. BTW, the first Tolk release will be v0.6, a metaphor of FunC v0.5 that missed a chance to occur.
All changes from PR "FunC v0.5.0": #1026 Instead of developing FunC, we decided to fork it. BTW, the first Tolk release will be v0.6, a metaphor of FunC v0.5 that missed a chance to occur.
Closing this PR. We decided to leave FunC as is. Instead, we forked it and started developing a new language, named TOLK. |
UPD. We decided not to develop FunC, but to fork it and start developing a new language, named TOLK.
See #1345
FunC v0.5.0 brings a lot of syntax enhancements, as well as internal refactoring encounpled with a brand new testing framework.
A brief changelog (all-in-one)
//
and/*
is now supported (and preferred)impure
has become deprecated, but its antonym keywordpure
is introducedmethod_id
is deprecated, also. It was replaced as too obscure. Now there isget
, written on the left:get int seqno() { ... }
compute-asm-ltr
andallow-post-modification
are deprecated (always on)#pragma remove-unused-functions
& | ^
to more intuitive onesimpure
, replacemethod_id
withget
, etc.Now, let's describe each subject in detail.
Traditional comments
//
and/* */
Prior to v0.5.0, FunC had List-style comments
;;
and{- -}
.Now, support for traditional comments
//
and/* */
has been added.Both old and new comments still work, without any options or pragmas. So, you may just use traditional comments after upgrade, and old code will still be compiled.
stdlib.fc
has tranditional comments now.Though
;;
are still supported,//
become a preferred choice.Q: Why is this feature enabled by default, without any pragma?
A: Conceptually, every new syntax feature should be disabled by default and activated using a special compiler option. But now, we don't have an easy way to provide compiler options in func-js, blueprint, etc. Note, that introducing per-file
#pragma
is a wrong approach here, since if we want to fire human-readable error on using '//' without pragma, lexer should nevertheless work differently. (this could be controlled by a launch option, but see above)Block comments can still be nested, though it's an undocumented feature, since nesting doesn't properly work in most JS highlighters, even in documentation and VS Code.
All real-world contracts I've found aren't broken, so this addition is pretty safe.
Impure by default,
pure
specifierPrior to v0.5.0, there was a keyword
impure
(function specifier). When absent, a function was treated as pure. If its result is unused, its call was deleted by the compiler.Though this behavior was documented, it was very unexpected to new-comers. For instance, various functions that don't return anything (threw an exception on mismatch, for example), were silently deleted. This situation was spoilt by the fact that FunC didn't check and validate function body, allowing impure operations inside pure functions.
Now,
impure
produces a warningpure
is introducedstdlib.fc
are markedpure
(so, any change in behavior will occur only to user-defined functions)pure
is used for a user-defined function, the compiler forbids impure operations in its body (exceptions, globals modification, calling non-pure functions, etc.)Note, that in the future, we may teach FunC to auto-detect pureness. In other words, absence of
pure
doesn't mean "a function is impure": it just means "a function is default", which for now is impure in terms of current capabilities. Automatic checks are much better than to trust a user whether he put "impure" or not. Because mostly he did not, since it's a very unexpected pattern in a programming world.Keyword
get
instead ofmethod_id
Prior to v0.5.0,
method_id
was used for two purposes:int seqno() method_id { ... }
() after_upgrade() method_id(1666) { ... }
In the second case,
after_upgrade()
is unable to be called by its name (from a lite-client, ton scanner, blueprint, etc.), because a client will calculate another hash of a string "after_upgrade". That's why specifying id implicitly is useless for get methods.To separate these two cases and because the keyword
method_id
is too obscure, a new syntax has been added:Keyword
get
is written on the left.get
does not accept an idmethod_id
should be used with an id:method_id(XXX)
, for tests and low-level codeUsing
method_id
without id is still valid (to make old code compile), but produces a warning. Should be replaced withget
.Mixing
get
andforall
makes no sense, but syntactically,get
is expected before anything else.Note, that since all functions are impure by default, get methods also are. They are allowed to call non-pure functions, for instance. It's not a problem when we remind that absence or
pure
does not mean "impure", it means "default". So, get methods are "default", actually.Deprecate pragmas
allow-post-modification
andcompute-asm-ltr
Now using them produces a warning: they are enabled always.
Note, that this change will most likely change Fift output (and a BoC hash) of your code. It's not bad and not good, it's just a fact. Emitting another "bytecode" after FunC/Fift update is expected, the end user does not rely on it. If your contract has already been deployed, its code won't be changed anyway, it's onchain. If your contract is under development, you are not interested in its hash. If you want to check a hash of an existing deployed contract, you should anyway take the exact version of FunC/Fift/stdlib used for deployment.
Also, I've refactored implementation of
allow-post-modification
to avoid producing disabledLET
instructions in an intermediate representation.Auto-inline, potential camelCase and stdlib renamings
Probably, one of the hardest commits in this MR.
Say, you just have the code
which is compiled into
But you (whyever) don't like names "begin_cell" and "end_cell": say, you want to use camelCase.
Now, you can just write a function-wrapper:
which is compiled exactly the same as original:
Note, that
beginCell()
andendCell()
don't even exist in fif output. Their usages were effectively replaced by asm commands, so they don't consume gas. Their code and declaration were not codegenerated, so they don't consume storage. The resulting BoC (and its hash) remains unchanged, so their usage is safe even in existing contracts.A function is treated as a wrapper (and auto-inlined) when its body is just
return anotherF(...)
, it passes all arguments to anotherF (probably, changing their order), return types match, it's not a getter, and so on.Such an ability gives a straight way in the future to safely rename functions in stdlib (to put things in order): just add wrappers over existing stdlib functions, marking original ones as deprecated (to be done later). Example:
If we want, we may introduce camelCase for stdlib:
Changing arguments order also works. Wrappers around built-in functions also work. Modifier methods
~
also work. From the IDE perspective, it just works out of the box.Note, that you don't write
pure
when wrapping a pure function. That's what has been already said: absence of "pure" doesn't mean "impure", it just means "default".Note, that FunC is smart enough in case of corner cases. For example, if a wrapper is used as a 1st class function:
Then
beginCell()
will be codegenerated, andstarter
will contain a reference to its continuation. Nevertheless, direct calls tobeginCell()
will still be inlined.#pragma
remove-unused-functions
Now, FunC can exclude unsed functions from Fift output. Disabled by default, enabled by this pragma.
A function is considered "used" if it has direct calls or indirect references from any other used function. Getters and special functions (
main()
,recv_internal()
, etc.) are used apriori, they are the roots of dfs search.Change priorities of operators
& | ^
Prior to v0.5.0, such code
if (slices_equal() & status == 1)
was parsed asif( (slices_equal()&status) == 1 )
. That was a reason of various errors in real-world contracts, just look at examples:Surprisingly, sometimes old parsing produced the expected result (since
true
is -1, not 1), but when not, it was very hard to find out an error.But anyway. As you can see, authors of this code expected
&
to have a lower priority, but it did not. Moreover, it was a bug: in almost all languages,&
has lower priority.So, we are fixing this bug. But since it's a breaking change, we introduce compilation errors in such cases instead of just silently compiling in a different (though often more expected) way.
Require parenthesis on probably wrong bitwise operators usage
Now, when you try to compile
It leads to an error:
Hence, the code should be rewritten:
Here is an example, where the correct change will be the latter:
While it could be annoying in some cases,
but in general, this case is indistinguishable from the
flags & 0xFF == 0
example.Since FunC doesn't have a
bool
type,& |
are used sometimes as logical ones, but sometimes as bitwise. That's the reason we decided occasions like above to be an error, not a warning. The only way to overcome such error is to use parenthesis, as you may see. In the future, we'll probably addbool
to the language, as well as&& ||
operators, which won't require parenthesis in obvious cases.Keeping in mind that
bool
will probably be implemented in the future, symbol!
is now reserved, identifiers are disallowed to start with it.Resuming a conversation, now using bitwise operators with any comparison operators requires parenthesis. Besides, mixing different operators
arg1 & arg2 | arg3
is also erroneous:I've also added a diagnostic for a common mistake in bitshift operators:
a << 8 + 1
is equivalent toa << 9
, probably unexpected.Keyword
builtin
, its usage in stdlibIn FunC, some functions are implemented at compiler level, not as asm instructions (mostly it's used for optimization, to produce more effective code when constant parameters are passed to that functions). For example,
load_uint()
,null?()
andthrow()
are built-ins.Prior to v0.5.0, such functions either weren't mentioned in stdlib, or were commented out:
Now, all built-in functions are placed in stdlib, marked as
builtin
:Note, that as earlier, they are available if you don't include stdlib.fc. The purpose of the
builtin
keyword is to make stdlib more intelligible.Tremendously enhanced internal framework for testing FunC
Prior to v0.5.0, there was an
auto-tests/tests
folder with several .fc files, having a comment described provided input and expected output. For example:There was a
run_tests.py
which traversed each file in a folder, detected such lines from comments, compiled to fif, and executed every testcase, comparing output.It was okay, it worked, but... This framework was very-very poor. I am speaking not about the amount of tests, but what exactly we can test using such possibilities.
For example, consider functions-wrappers described above:
Even if we don't do anything inside the FunC compiler, all tests for input-output will work :) Because what we really want to test, it that
beginCell()
is not codegenerated (noDECLPROC
and similar)beginCell()
is replaced withNEWC
(notCALLDICT
)None of these could be explained in terms of input-output.
This is only one of the examples, there is much more we may want to test that goes beyond current testing framework.
I have fully rewritten an internal testing framework and added lots of capabilities to it. Let's look though.
@compilation_should_fail
— checks that FunC compilation fails, and it's expected (this is called "negative tests").@stderr
— checks, when compilation fails, that stderr (compilation error) is expected.Example:
@fif_codegen
— checks that contents of compiled.fif matches the expected pattern.@fif_codegen_avoid
— checks that it does not match the pattern.The pattern is a multiline piece of fift code, optionally with "..." meaning "any lines here". It may contain //stack_comments, they will also be checked.
Example:
@code_hash
— checks that hash of compiled output.fif matches the provided value. It's used to "record" code boc hash and to check that it remains the same on compiler modifications. Being much less flexible than@fif_codegen
, it nevertheless gives a guarantee of bytecode stability.Example:
Of course, different tags can be mixed up in a single file: multiple
TESTCASE
, multiple@fif_codegen
, etc.Also, I've rewritten
run_tests.js
to be fully in sync withrun_tests.py
. It means, that now we can test fif codegen, compilation errors and so on for WASM also.Besides, I've slightly worked on
legacy_tests
, extracting names/hashes to a separate file shared between py and js.Chronologically, creating this testing framework was the first thing I've done. All the functionality above has been developed covered by necessary tests.
Moreover, I've downloaded sources of 300 verified contracts from verifier.ton.org and written a tool to launch FunC on a whole database after every commit. That makes me sure that current and future changes in the compiler don't break compilation of "flagship" codebase, and when BoC hashes (Fift output) are changed, I look through to ensure that changes are expected. That codebase lives outside of
ton-blockchain
repository.IDE plugin improvements
Our plugin for JetBrains IDE and VS Code extension also needs to be improved to support such vast amount of changes.
The principal point is to introduce a setting "FunC language level" (the user selects it for a project, like "PHP language level" in PHPStorm). When it's "v0.4.x", everything should work as earlier, to continue developing contracts using an old compiler. When it's "v0.5.x", then:
//
comments should be activated, as well as an inspection to transform;;
into//
impure
deprecation, a quick fix to remove itmethod_id
deprecation, a quick fix to transform intoget
pure
,builtin
Note, that grammar should correspond to the latest version of FunC, and when some keywords (or other syntax) is unsupported due to user's setting, they should be highlighted as errors.
Migration guide: from v0.4.x to v0.5.0
This is a "manual" for users to migrate their code:
// traditional
/* comments */
instead of old Lisp-style. IDE will suggest you to replace existing comments.impure
, since all functions are impure by default. IDE will suggest you to remove existing specifiers.method_id
, useget
keyword on the left:get int seqno() { ... }
. IDE will suggest you to replace existing specifiers.Related pull requests
To documentation:
To func-js:
To highlightjs-func:
To TON verifier:
To VS Code extension:
To JetBrains IDE plugin: in progress
Fixes #971
Fixes #596
Fixes #1022
How to review this pull request?
Chronologically, by commits. Every commit contains a completed feature with green tests. No "fix" and other intermediate commits exist in a branch history.