diff --git a/sphinx/c/index.rst b/sphinx/c/index.rst index 96bfa85f..059c9b9c 100644 --- a/sphinx/c/index.rst +++ b/sphinx/c/index.rst @@ -1,13 +1,10 @@ Gillian-C ========= -.. note:: - Everything this section is true at the time of artifact submission for PLDI 2020 on 2020/02/28. - -Gillian-C is the instantiation of Gillian to the C language (CompCert-C, to be precise). It can be found in the ``Gillian-C`` folder of the repository. +Gillian-C is the instantiation of Gillian to the C language (CompCert-C, to be precise). It can be found in the ``Gillian-C`` folder of the Gillian repository. .. toctree:: :titlesonly: - folder-structure symbolic-testing + pldi/index diff --git a/sphinx/c/folder-structure.rst b/sphinx/c/pldi/folder-structure.rst similarity index 100% rename from sphinx/c/folder-structure.rst rename to sphinx/c/pldi/folder-structure.rst diff --git a/sphinx/c/pldi/index.rst b/sphinx/c/pldi/index.rst new file mode 100644 index 00000000..89a50b82 --- /dev/null +++ b/sphinx/c/pldi/index.rst @@ -0,0 +1,10 @@ +PLDI’20 Artefact Documentation +============================== + +The following is the C artefact documentation for the :doc:`../../publications/gillian-part-1` PLDI'20 submission + +.. toctree:: + :titlesonly: + + folder-structure + symbolic-testing diff --git a/sphinx/c/pldi/symbolic-testing.rst b/sphinx/c/pldi/symbolic-testing.rst new file mode 100644 index 00000000..e934b0ca --- /dev/null +++ b/sphinx/c/pldi/symbolic-testing.rst @@ -0,0 +1,93 @@ +Symbolic Testing +================ + +Symbolic Testing of Collections-C +--------------------------------- + +Fixes +^^^^^ + +Symbolically testing Collections-C let to the following bug-fixing pull requests. They fix previously unknown bugs and usage of undefined behaviours: + +- `Fix buffer overflow `_ (bug) +- `Remove the usage of cc_comp_ptr `_ (undefined behaviour) +- `Test coincidentally passing while they should not `_ (bugs and undefined behaviours) +- `Fix overallocation `_ (bug) +- `Fix hashing function `_ (performance-reducing bug) + +Reproducing the Results +^^^^^^^^^^^^^^^^^^^^^^^ + +For license reason, we do not include the Collections-C code in the Gillian repository. +There is an `external repository `_ that contains the Collections-C code adapted to testing in Gillian-C and Klee. + +In order to clone it, simply run, from the Gillian folder: + +.. code-block:: bash + + cd .. + git clone https://github.com/GillianPlatform/collections-c-for-gillian.git collection- + cd Gillian + +There are two ways of launching the tests: + +- Using the ``bulk-wpst`` command of Gillian-C which has a nicer output (using Rely), but cannot run the tests in parallel. +- Using a bash script that will call ``gillian-c wpst`` as many times as there are files to test, but supports parallel mode (this is the one we used for measures). (NOTE: since then, we removed the option for parallel mode, and plan of having a better implementation in the future) + +Using bulk-wpst +""""""""""""""" + +From the Gillian folder run: + +.. code-block:: bash + + dune exec -- gillian-c bulk-wpst ../collections-c/for-gillian + +You will see every test suites executing one by one. Two tests will fail, this is intended. They represent two of the bugs we've found and are explained `here <#bug-tests>`_. + +Using the bash script +""""""""""""""""""""" + +From the Gillian folder, for each folder you want to test, use: + +.. code-block:: bash + + Gillian-C/scripts/testFolder.sh ../collections-c/for-gillian/folder + +For example, to run the test suite related to singly-linked lists, run: + +.. code-block:: bash + + Gillian-C/scripts/testFolder.sh ../collections-c/for-gillian/slist + +The ``array_test_remove.c`` buffer overflow bug +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This test corresponds to this pull request: `Fix buffer overflow `_. +It is particularly interesting: the original test suite did not catch it. We thought that a concrete test with the right values would catch it, but it didn't. The reason is that it overflowed but did not fail. It is therefore a *security issue*. However, our symbolic memory model cannot overflow, and the bug was caught. + +The ``list_test_zipIterAdd.c`` flawed test +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This test is also interesting but for different reasons. The code it is testing (the ``list_zip_iter_add`` function) does not contain any bug. +However, the test itself did contain a bug but still passed. Here is why: + +The test added two elements (``"h"`` and ``"i"``) in two separate lists at the index 2. It then tested that the elements actually appeared at the second index of their respective lists, in the following way: + +.. code-block:: c + + list_index_of(list1, "h", zero_if_ptr_eq, &index); + CHECK_EQUAL_C_INT(2, index); + + list_index_of(list1, "i", zero_if_ptr_eq, &index); + CHECK_EQUAL_C_INT(2, index); + +However, note that both tests are executed on ``list1``! What happened then is that ``list_index_of`` was not finding ``"i"`` in ``list1`` because it wasn't there, and therefore did not modify ``index``. Since the first check was correct, the value of ``index`` was still ``2`` and the test passed anyway. + +Our symbolic tests however, use symbolic 1-character strings, and assume **the bare minimum about the input values** to make them pass, in order to explore as many possible paths as possible. + +Here, we replaced every one-character strings ``"X"`` with one-character symbolic string ``str_X``. For the test to pass, it should be *enough* for ``str_h`` to be different from every element in ``list1`` and for ``str_i`` to be different from every element in ``list2``. And this is exactly what we assumed. However, we never assume that ``str_i`` has to be different from every element in ``list1`` because it is not necessary for the test to pass. + +However, here, the equality between every element of ``list1`` and ``str_i`` is tested. There is no indication as to the result of this test, so the execution branches. Therefore, there is a path created where ``list_index_of(list1, str_i, zero_if_ptr_eq, &index)`` will assign ``0`` to index, and the test will fail. + +This shows how symbolic testing helps writing *more robust* tests. diff --git a/sphinx/c/symbolic-testing.rst b/sphinx/c/symbolic-testing.rst index afada59b..09ff6845 100644 --- a/sphinx/c/symbolic-testing.rst +++ b/sphinx/c/symbolic-testing.rst @@ -22,14 +22,14 @@ It is possible to declare a symbolic string of fixed size by declaring all of it .. code-block:: c int a = __builtin_annot_intval("symb_int", a); - ASSUME(-128 <= a );ASSUME( a <= 127); + ASSUME(-128 <= a); ASSUME(a <= 127); char c_a = (char) a; char str_a[] = { c_a, '\0' }; Assumptions and Assertions ^^^^^^^^^^^^^^^^^^^^^^^^^^ -For any ``C`` boolean expression ``e``, it is possible to write: +For any C boolean expression ``e``, it is possible to write: .. code-block:: c @@ -42,95 +42,4 @@ A serialized C integer is a list of the form ``{{ "int", x }}`` where ``x`` is a ``ASSUME`` will call the internal GIL ``assume`` in the form ``assume(e = {{ "int", 1 }})`` which means "check that the obtained boolean expression is True", otherwise, cut the branch. ``ASSERT`` does the same as assume but calls the internal GIL ``assert`` instead. -As opposed to `Gillian-JS <../JavaScript/stest#assumptions-and-assertions>`_, we use C expressions directly, and not custom expressions. This benefits is that one does not have to learn a new syntax for writing tests. However, this causes the execution to branch a lot for. ``ASSUME`` will then cut any branch that we do not want. In Gillian-JS, given the complex control flow of JavaScript, there is a lot more branching happening, which can become quite difficult to handle. Also, in JavaScript, the very complex semantics of expressions can lead to behaviours that are not desired by the used, and providing a simpler expression syntax is more straightforward. - -Symbolic Testing of Collections-C ---------------------------------- - -Fixes -^^^^^ - -Symbolically testing Collections-C let to the following bug-fixing pull requests. They fix previously unknown bugs and usage of undefined behaviours: - -- `Fix buffer overflow `_ (bug) -- `Remove the usage of cc_comp_ptr `_ (undefined behaviour) -- `Test coincidentally passing while they should not `_ (bugs and undefined behaviours) -- `Fix overallocation `_ (bug) -- `Fix hashing function `_ (performance-reducing bug) - -Reproducing the Results -^^^^^^^^^^^^^^^^^^^^^^^ - -For license reason, we do not include the Collections-C code in the Gillian repository. -There is an `external repository `_ that contains the Collections-C code adapted to testing in Gillian-C and Klee. - -In order to clone it, simply run, from the Gillian folder: - -.. code-block:: bash - - cd .. - git clone https://github.com/GillianPlatform/collections-c-for-gillian.git collection- - cd Gillian - -There are two ways of launching the tests: - -- Using the ``bulk-wpst`` command of Gillian-C which has a nicer output (using Rely), but cannot run the tests in parallel. -- Using a bash script that will call ``gillian-c wpst`` as many times as there are files to test, but supports parallel mode (this is the one we used for measures). (NOTE: since then, we removed the option for parallel mode, and plan of having a better implementation in the future) - -Using bulk-wpst -""""""""""""""" - -From the Gillian folder run: - -.. code-block:: bash - - dune exec -- gillian-c bulk-wpst ../collections-c/for-gillian - -You will see every test suites executing one by one. Two tests will fail, this is intended. They represent two of the bugs we've found and are explained `here <#bug-tests>`_. - -Using the bash script -""""""""""""""""""""" - -From the Gillian folder, for each folder you want to test, use: - -.. code-block:: bash - - Gillian-C/scripts/testFolder.sh ../collections-c/for-gillian/folder - -For example, to run the test suite related to singly-linked lists, run: - -.. code-block:: bash - - Gillian-C/scripts/testFolder.sh ../collections-c/for-gillian/slist - -The ``array_test_remove.c`` buffer overflow bug -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -This test corresponds to this pull request: `Fix buffer overflow `_. -It is particularly interesting: the original test suite did not catch it. We thought that a concrete test with the right values would catch it, but it didn't. The reason is that it overflowed but did not fail. It is therefore a *security issue*. However, our symbolic memory model cannot overflow, and the bug was caught. - -The ``list_test_zipIterAdd.c`` flawed test -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -This test is also interesting but for different reasons. The code it is testing (the ``list_zip_iter_add`` function) does not contain any bug. -However, the test itself did contain a bug but still passed. Here is why: - -The test added two elements (``"h"`` and ``"i"``) in two separate lists at the index 2. It then tested that the elements actually appeared at the second index of their respective lists, in the following way: - -.. code-block:: c - - list_index_of(list1, "h", zero_if_ptr_eq, &index); - CHECK_EQUAL_C_INT(2, index); - - list_index_of(list1, "i", zero_if_ptr_eq, &index); - CHECK_EQUAL_C_INT(2, index); - -However, note that both tests are executed on ``list1``! What happened then is that ``list_index_of`` was not finding ``"i"`` in ``list1`` because it wasn't there, and therefore did not modify ``index``. Since the first check was correct, the value of ``index`` was still ``2`` and the test passed anyway. - -Our symbolic tests however, use symbolic 1-character strings, and assume **the bare minimum about the input values** to make them pass, in order to explore as many possible paths as possible. - -Here, we replaced every one-character strings ``"X"`` with one-character symbolic string ``str_X``. For the test to pass, it should be *enough* for ``str_h`` to be different from every element in ``list1`` and for ``str_i`` to be different from every element in ``list2``. And this is exactly what we assumed. However, we never assume that ``str_i`` has to be different from every element in ``list1`` because it is not necessary for the test to pass. - -However, here, the equality between every element of ``list1`` and ``str_i`` is tested. There is no indication as to the result of this test, so the execution branches. Therefore, there is a path created where ``list_index_of(list1, str_i, zero_if_ptr_eq, &index)`` will assign ``0`` to index, and the test will fail. - -This shows how symbolic testing helps writing *more robust* tests. +As opposed to Gillian-JS, we use C expressions directly, and not custom expressions. This benefits is that one does not have to learn a new syntax for writing tests. However, this causes the execution to branch a lot for. ``ASSUME`` will then cut any branch that we do not want. In Gillian-JS, given the complex control flow of JavaScript, there is a lot more branching happening, which can become quite difficult to handle. Also, in JavaScript, the very complex semantics of expressions can lead to behaviours that are not desired by the used, and providing a simpler expression syntax is more straightforward. diff --git a/sphinx/js/index.rst b/sphinx/js/index.rst index 5866c7a3..ade988ed 100644 --- a/sphinx/js/index.rst +++ b/sphinx/js/index.rst @@ -1,17 +1,10 @@ Gillian-JS ========== -.. note:: - Everything this section is true at the time of artifact submission for PLDI 2020 on 2020/02/28. - -.. danger:: - Gillian-JS is currently broken (`see here `_). - -Gillian-JS is the instantiation of Gillian to JavaScript (`ECMAScript 5 Strict `_), found in the ``Gillian-JS`` folder of the repository. +Gillian-JS is the instantiation of Gillian to JavaScript (`ECMAScript 5 Strict `_), found in the ``Gillian-JS`` folder of the Gillian repository. .. toctree:: :titlesonly: - folder-structure - js2gil symbolic-testing + pldi/index diff --git a/sphinx/js/folder-structure.rst b/sphinx/js/pldi/folder-structure.rst similarity index 100% rename from sphinx/js/folder-structure.rst rename to sphinx/js/pldi/folder-structure.rst diff --git a/sphinx/js/pldi/index.rst b/sphinx/js/pldi/index.rst new file mode 100644 index 00000000..c130ef90 --- /dev/null +++ b/sphinx/js/pldi/index.rst @@ -0,0 +1,11 @@ +PLDI'20 Artefact Documentation +============================== + +The following is the JavaScript artefact documentation for the :doc:`../../publications/gillian-part-1` PLDI'20 submission + +.. toctree:: + :titlesonly: + + folder-structure + js2gil + symbolic-testing diff --git a/sphinx/js/js2gil.rst b/sphinx/js/pldi/js2gil.rst similarity index 97% rename from sphinx/js/js2gil.rst rename to sphinx/js/pldi/js2gil.rst index f33cb794..a5c6fdf9 100644 --- a/sphinx/js/js2gil.rst +++ b/sphinx/js/pldi/js2gil.rst @@ -17,9 +17,9 @@ Additionally, indirect ``eval`` is not supported, as it is meant to be executed Correctness of JS-2-GIL ----------------------- -The JS-2-GIL compiler can be split into two compilers: JS-2-JSIL, which compiles JavaScript to JSIL, the intermediate representation that we have used in :doc:`../publications/javert` / :doc:`../publications/cosette` / :doc:`../publications/javert-2`; and JSIL-2-GIL, the compiler from JSIL to GIL, the intermediate representation of Gillian. +The JS-2-GIL compiler can be split into two compilers: JS-2-JSIL, which compiles JavaScript to JSIL, the intermediate representation that we have used in :doc:`../../publications/javert` / :doc:`../../publications/cosette` / :doc:`../../publications/javert-2`; and JSIL-2-GIL, the compiler from JSIL to GIL, the intermediate representation of Gillian. -Previously, we have tested JS-2-JSIL against Test262, the JavaScript official test suite (specifically, `this commit from 2016/05/30 `_). As Test262 commit targets ES6, we had to identify the subset of tests that are appropriate for JS-2-JSIL, as explained in detail in :doc:`../publications/javert`. We obtained 8797 applicable tests, of which JS-2-JSIL passes 100%. +Previously, we have tested JS-2-JSIL against Test262, the JavaScript official test suite (specifically, `this commit from 2016/05/30 `_). As Test262 commit targets ES6, we had to identify the subset of tests that are appropriate for JS-2-JSIL, as explained in detail in :doc:`../../publications/javert`. We obtained 8797 applicable tests, of which JS-2-JSIL passes 100%. We have initially tested JS-2-GIL successfully on the same 8797 tests and reported this in the submitted version of the paper. However, these tests were not systematically categorised and we were not able to automate the testing process to our satisfaction using the bulk testing mechanism of Gillian. For this reason, we have chosen to work with the latest version of Test262, forked `here `_, where each test comes with a precise description of its intended outcome. For example, a test that is supposed to fail at parsing time with a JavaScript native syntax error will contain the following in its header: @@ -146,4 +146,4 @@ Detailed Per-Folder Breakdown: Built-ins **Applicable** 2141 42 413 7 7 7 7 33 7 257 27 5 2 2 9 81 5 147 2878 7 7 46 336 5 6748 **Passing** 2141 42 413 7 7 7 7 33 7 257 27 5 2 2 9 81 5 144 2878 7 7 46 336 5 6745 **Failing** 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 3 -================================ ===== ======= ==== ========= ================== ========= ================== ===== ==== ======== ====== ======== ======== ===== ==== ==== === ====== ====== ========== ======== ====== ====== ========= ===== \ No newline at end of file +================================ ===== ======= ==== ========= ================== ========= ================== ===== ==== ======== ====== ======== ======== ===== ==== ==== === ====== ====== ========== ======== ====== ====== ========= ===== diff --git a/sphinx/js/pldi/symbolic-testing.rst b/sphinx/js/pldi/symbolic-testing.rst new file mode 100644 index 00000000..08d8831a --- /dev/null +++ b/sphinx/js/pldi/symbolic-testing.rst @@ -0,0 +1,228 @@ +Symbolic Testing +================ + +Symbolic Testing of Buckets.js +------------------------------ + +We symbolically test Buckets.js, a real-world JavaScript data-structure library, with the goal of obtaining 100% line coverage. The results are presented in the table below, with each row containing: + +- The name of the folder being tested, which also indicates the data structure in question +- The number of tests required for 100% line coverage +- The total number of GIL commands executed by running these tests +- The total testing time (in seconds) + +Testing Results +^^^^^^^^^^^^^^^ + +=================== ====== ============== ========== +Data Structure Tests GIL Commands Time (s) +=================== ====== ============== ========== +**arrays** 9 330,147 2.678 +**bag** 7 1,343,393 5.064 +**bstree** 11 3,751,092 12.507 +**dictionary** 7 401,575 1.833 +**heap** 4 1,492,204 3.411 +**linkedlist** 9 588,714 4.141 +**multidictionary** 6 1,106,650 3.803 +**queue** 6 407,106 2.140 +**priorityqueue** 5 2,312,226 4.121 +**set** 6 2,178,222 4.458 +**stack** 4 306,449 1.625 +**Total** **74** **14,217,778** **45.781** +=================== ====== ============== ========== + +The results are 1.3% slower and the number of executed GIL commands is 0.1% greater than reported in the submitted version---we will update accordingly. This is due to minor changes to the JS-2-GIL compiler and the JS symbolic engine. + +Reproducing the Results +^^^^^^^^^^^^^^^^^^^^^^^ + +Starting from the ``Gillian`` folder, execute the following: + +.. code-block:: bash + + dune build + make js-init-env + cd Gillian-JS/environment + +Then, to reproduce the results for a specific folder from the first column of the above table, execute the following: + +.. code-block:: bash + + ./testCosetteFolder.sh Examples/Cosette/Buckets/ + +In order to obtain the number of executed commands, append the ``count`` parameter to the last command. Therefore, for example, the command to run the tests for the ``queue`` data structure and obtain the number of executed commands is + +.. code-block:: bash + + ./testCosetteFolder.sh Examples/Cosette/Buckets/queue count + +**Note**: The times obtained when counting executed commands will be slower, due to the fact that the tests will be run in single-thread mode. + +Detailed Per-Folder Breakdown: Buckets.js +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= +**arrays** 1 2 3 4 5 6 7 8 9 **Total** +================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= +**Time (s)** 0.259 0.288 0.264 0.264 0.259 0.285 0.258 0.569 0.232 2.678 +**GIL Commands** 33,903 34,675 34,896 42,866 30,483 55,210 34,765 39,532 23,817 330,147 +================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= + +================ ====== ====== ======= ======= ======= ======= ======= ========= +**bag** 1 2 3 4 5 6 7 **Total** +================ ====== ====== ======= ======= ======= ======= ======= ========= +**Time (s)** 0.501 0.453 0.963 0.641 0.577 0.923 1.006 5.064 +**GIL Commands** 99,395 60,935 301,687 208,336 158,635 200,411 313,994 1,343,393 +================ ====== ====== ======= ======= ======= ======= ======= ========= + +================ ======= ========== ====== ======= ======= ======= ======= ======= ======= ======= ======= ========= +**bstree** 1 2 3 4 5 6 7 8 9 10 11 **Total** +================ ======= ========== ====== ======= ======= ======= ======= ======= ======= ======= ======= ========= +**Time (s)** 0.746 2.540 0.684 0.763 1.015 1.028 1.013 1.131 0.762 0.762 2.063 12.507 +**GIL Commands** 123,798 1,254,635 72,637 169,155 192,683 192,683 191,633 390,919 100,266 177,362 885,321 3,751,092 +================ ======= ========== ====== ======= ======= ======= ======= ======= ======= ======= ======= ========= + +================ ====== ====== ====== ====== ====== ====== ====== ========= +**dictionary** 1 2 3 4 5 6 7 **Total** +================ ====== ====== ====== ====== ====== ====== ====== ========= +**Time (s)** 0.275 0.238 0.217 0.352 0.229 0.217 0.305 1.833 +**GIL Commands** 61,161 54,140 44,569 55,033 55,914 41,904 88,854 401,575 +================ ====== ====== ====== ====== ====== ====== ====== ========= + +================ ======= ======= ======= ======= ========= +**heap** 1 2 3 4 **Total** +================ ======= ======= ======= ======= ========= +**Time (s)** 0.517 1.487 0.629 0.778 3.411 +**GIL Commands** 135,140 804,659 169,522 382,883 1,492,204 +================ ======= ======= ======= ======= ========= + +================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= +**linkedlist** 1 2 3 4 5 6 7 8 9 **Total** +================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= +**Time (s)** 0.648 0.577 0.603 0.438 0.293 0.295 0.257 0.718 0.312 4.141 +**GIL Commands** 43,209 57,458 97,728 82,345 63,645 66,093 30,794 97,225 50,217 588,714 +================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= + +=================== ======= ======= ======= ======= ======= ======= ========= +**multidictionary** 1 2 3 4 5 6 **Total** +=================== ======= ======= ======= ======= ======= ======= ========= +**Time (s)** 0.504 0.813 0.566 0.579 0.678 0.663 3.803 +**GIL Commands** 130,145 312,351 166,638 145,627 158,934 192,955 1,106,650 +=================== ======= ======= ======= ======= ======= ======= ========= + +================ ====== ====== ====== ====== ====== ======= ========= +**queue** 1 2 3 4 5 6 **Total** +================ ====== ====== ====== ====== ====== ======= ========= +**Time (s)** 0.332 0.345 0.345 0.249 0.403 0.466 2.140 +**GIL Commands** 71,514 69,962 45,067 36,767 62,624 121,172 407,106 +================ ====== ====== ====== ====== ====== ======= ========= + +================= ======= ======= ======= ======= ========= ========= +**priorityqueue** 1 2 3 4 5 **Total** +================= ======= ======= ======= ======= ========= ========= +**Time (s)** 0.757 0.731 0.449 0.993 1.191 4.121 +**GIL Commands** 399,730 287,433 121,329 450,539 1,053,195 2,312,226 +================= ======= ======= ======= ======= ========= ========= + +================ ====== ======= ========= ======= ====== ======= ========= +**set** 1 2 3 4 5 6 **Total** +================ ====== ======= ========= ======= ====== ======= ========= +**Time (s)** 0.386 0.679 1.743 0.622 0.292 0.736 4.458 +**GIL Commands** 78,959 242,304 1,265,278 232,776 66,700 292,205 2,178,222 +================ ====== ======= ========= ======= ====== ======= ========= + +================ ====== ====== ====== ======= ========= +**stack** 1 2 3 4 **Total** +================ ====== ====== ====== ======= ========= +**Time (s)** 0.343 0.331 0.331 0.620 1.625 +**GIL Commands** 52,233 44,958 55,097 154,161 306,449 +================ ====== ====== ====== ======= ========= + +Reproducing the Buckets.js Bugs found by :doc:`../../publications/cosette` and :doc:`../../publications/javert-2` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Starting from the ``Gillian`` folder, execute the following: + +.. code-block:: bash + + dune build + make js-init-env + cd Gillian-JS/environment + +:doc:`../../publications/cosette` Multi-Dictionary Bug +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In order to reproduce the multi-dictionary bug reported by :doc:`../../publications/cosette`, execute: + +.. code-block:: bash + + ./testCosette.sh Examples/Cosette/Buckets/multidictionary/bug/multidictionary_bug.js + +You will obtain a failing model + +.. code-block:: text + + Assert failed with argument False. + Failing Model: + [ (#x1: #x2) ] + +The bug is caused by the library wrongly treating the case in which we try to remove a key-value pair for a key with no associated values. The code of the test is as follows: + +.. code-block:: js + + var dict = new buckets.MultiDictionary() + + var s = symb_string(s); + var x1 = symb_number(x1); + var x2 = symb_number(x2); + + dict.set(s, x1); + dict.set(s, x2); + + dict.remove(s, x1); + var res = dict.remove(s, x2); + Assert(((not (x1 = x2)) and (res = true)) or ((x1 = x2) and (res = false))); + +The test puts two symbolic numbers, ``x1`` and ``x2`` for the same symbolic key ``s`` into an empty multidictionary, then removes ``x1``, and then removes ``x2`` and registers the value returned by ``remove``. Then, it asserts that that value was ``true`` if the two keys were different, and ``false`` if the two keys were the same. What the failing model says is that, when the two keys are equal, the library, in fact, throws a native JavaScript error (indicated by the argument ``False`` of the failed assert). + +:doc:`../../publications/javert-2` Linked-List Bugs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In order to reproduce the linked-list bugs reported by :doc:`../../publications/javert-2`, execute: + +.. code-block:: bash + + ./testCosette.sh Examples/Cosette/Buckets/linkedlist/bug/linkedlist_bug_1.js + ./testCosette.sh Examples/Cosette/Buckets/linkedlist/bug/linkedlist_bug_2.js + ./testCosette.sh Examples/Cosette/Buckets/linkedlist/bug/linkedlist_bug_3.js + +All of the bugs are causes by the library treating non-integer indexing incorrectly; we explain the bug found by the first test in detail, the remaining two are analogous. For the first test, the failing model is as follows: + +.. code-block:: text + + Assert failed with argument + ((((#x3 == 0) /\ (#x2 == #x1)) \/ + ((#x3 == 1) /\ (#x2 == #x2))) \/ + (((! (#x3 == 0)) /\ (! (#x3 == 1))) /\ (#x2 == undefined))). + Failing Model: + [ (#x2: 4), (#x3: 0.5), (#x1: 3) ] + +The code of the test is as follows: + +.. code-block:: js + + var list = new buckets.LinkedList() + + var x1 = symb_number(x1); + var x2 = symb_number(x2); + var x3 = symb_number(x3); + + list.add(x1) + list.add(x2) + + var res = list.elementAtIndex(x3); + Assert( (((x3 = 0) and (res = x1)) or + ((x3 = 1) and (res = x2))) or + (((not (x3 = 0)) and (not (x3 = 1))) and (res = undefined)) ); + +The test inserts two symbolic numbers, ``x1`` and ``x2``, into an empty linked list, and then indexes the list with a third symbolic number, ``x3``. The expected outcome is that: if ``x3 = 0``, the indexing returns ``x1``; if ``x3 = 1``, the indexing returns ``x2``; and, otherwise, the indexing returns ``undefined``. The failing model, however, says that if ``x3 = 0.5``, the indexing will also return ``x2``. diff --git a/sphinx/js/symbolic-testing.rst b/sphinx/js/symbolic-testing.rst index c0f60ce8..bc879da7 100644 --- a/sphinx/js/symbolic-testing.rst +++ b/sphinx/js/symbolic-testing.rst @@ -4,7 +4,7 @@ Symbolic Testing Writing Symbolic Tests ---------------------- -The whole-program symbolic testing module of Gillian-JS (codenamed Cosette) extends JavaScript with a mechanism for declaring symbolic variables and performing first-order reasoning on them. +The whole-program symbolic testing module of Gillian-JS extends JavaScript with a mechanism for declaring symbolic variables and performing first-order reasoning on them. Declaring Symbolic Variables ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -104,229 +104,3 @@ Since we do not (yet) perform lazy initialisation in symbolic execution, errors Assume(not (typeOf x = Obj)); where ``typeOf`` is the built-in GIL typing operator and ``Obj`` is the built-in GIL object type. In this way, it is guaranteed that ``x`` is not an object (but may still equal ``null``). - -Symbolic Testing of Buckets.js ------------------------------- - -We symbolically test Buckets.js, a real-world JavaScript data-structure library, with the goal of obtaining 100% line coverage. The results are presented in the table below, with each row containing: - -- The name of the folder being tested, which also indicates the data structure in question -- The number of tests required for 100% line coverage -- The total number of GIL commands executed by running these tests -- The total testing time (in seconds) - -Testing Results -^^^^^^^^^^^^^^^ - -=================== ====== ============== ========== -Data Structure Tests GIL Commands Time (s) -=================== ====== ============== ========== -**arrays** 9 330,147 2.678 -**bag** 7 1,343,393 5.064 -**bstree** 11 3,751,092 12.507 -**dictionary** 7 401,575 1.833 -**heap** 4 1,492,204 3.411 -**linkedlist** 9 588,714 4.141 -**multidictionary** 6 1,106,650 3.803 -**queue** 6 407,106 2.140 -**priorityqueue** 5 2,312,226 4.121 -**set** 6 2,178,222 4.458 -**stack** 4 306,449 1.625 -**Total** **74** **14,217,778** **45.781** -=================== ====== ============== ========== - -The results are 1.3% slower and the number of executed GIL commands is 0.1% greater than reported in the submitted version---we will update accordingly. This is due to minor changes to the JS-2-GIL compiler and the JS symbolic engine. - -Reproducing the Results -^^^^^^^^^^^^^^^^^^^^^^^ - -Starting from the ``Gillian`` folder, execute the following: - -.. code-block:: bash - - dune build - make js-init-env - cd Gillian-JS/environment - -Then, to reproduce the results for a specific folder from the first column of the above table, execute the following: - -.. code-block:: bash - - ./testCosetteFolder.sh Examples/Cosette/Buckets/ - -In order to obtain the number of executed commands, append the ``count`` parameter to the last command. Therefore, for example, the command to run the tests for the ``queue`` data structure and obtain the number of executed commands is - -.. code-block:: bash - - ./testCosetteFolder.sh Examples/Cosette/Buckets/queue count - -**Note**: The times obtained when counting executed commands will be slower, due to the fact that the tests will be run in single-thread mode. - -Detailed Per-Folder Breakdown: Buckets.js -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= -**arrays** 1 2 3 4 5 6 7 8 9 **Total** -================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= -**Time (s)** 0.259 0.288 0.264 0.264 0.259 0.285 0.258 0.569 0.232 2.678 -**GIL Commands** 33,903 34,675 34,896 42,866 30,483 55,210 34,765 39,532 23,817 330,147 -================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= - -================ ====== ====== ======= ======= ======= ======= ======= ========= -**bag** 1 2 3 4 5 6 7 **Total** -================ ====== ====== ======= ======= ======= ======= ======= ========= -**Time (s)** 0.501 0.453 0.963 0.641 0.577 0.923 1.006 5.064 -**GIL Commands** 99,395 60,935 301,687 208,336 158,635 200,411 313,994 1,343,393 -================ ====== ====== ======= ======= ======= ======= ======= ========= - -================ ======= ========== ====== ======= ======= ======= ======= ======= ======= ======= ======= ========= -**bstree** 1 2 3 4 5 6 7 8 9 10 11 **Total** -================ ======= ========== ====== ======= ======= ======= ======= ======= ======= ======= ======= ========= -**Time (s)** 0.746 2.540 0.684 0.763 1.015 1.028 1.013 1.131 0.762 0.762 2.063 12.507 -**GIL Commands** 123,798 1,254,635 72,637 169,155 192,683 192,683 191,633 390,919 100,266 177,362 885,321 3,751,092 -================ ======= ========== ====== ======= ======= ======= ======= ======= ======= ======= ======= ========= - -================ ====== ====== ====== ====== ====== ====== ====== ========= -**dictionary** 1 2 3 4 5 6 7 **Total** -================ ====== ====== ====== ====== ====== ====== ====== ========= -**Time (s)** 0.275 0.238 0.217 0.352 0.229 0.217 0.305 1.833 -**GIL Commands** 61,161 54,140 44,569 55,033 55,914 41,904 88,854 401,575 -================ ====== ====== ====== ====== ====== ====== ====== ========= - -================ ======= ======= ======= ======= ========= -**heap** 1 2 3 4 **Total** -================ ======= ======= ======= ======= ========= -**Time (s)** 0.517 1.487 0.629 0.778 3.411 -**GIL Commands** 135,140 804,659 169,522 382,883 1,492,204 -================ ======= ======= ======= ======= ========= - -================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= -**linkedlist** 1 2 3 4 5 6 7 8 9 **Total** -================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= -**Time (s)** 0.648 0.577 0.603 0.438 0.293 0.295 0.257 0.718 0.312 4.141 -**GIL Commands** 43,209 57,458 97,728 82,345 63,645 66,093 30,794 97,225 50,217 588,714 -================ ====== ====== ====== ====== ====== ====== ====== ====== ====== ========= - -=================== ======= ======= ======= ======= ======= ======= ========= -**multidictionary** 1 2 3 4 5 6 **Total** -=================== ======= ======= ======= ======= ======= ======= ========= -**Time (s)** 0.504 0.813 0.566 0.579 0.678 0.663 3.803 -**GIL Commands** 130,145 312,351 166,638 145,627 158,934 192,955 1,106,650 -=================== ======= ======= ======= ======= ======= ======= ========= - -================ ====== ====== ====== ====== ====== ======= ========= -**queue** 1 2 3 4 5 6 **Total** -================ ====== ====== ====== ====== ====== ======= ========= -**Time (s)** 0.332 0.345 0.345 0.249 0.403 0.466 2.140 -**GIL Commands** 71,514 69,962 45,067 36,767 62,624 121,172 407,106 -================ ====== ====== ====== ====== ====== ======= ========= - -================= ======= ======= ======= ======= ========= ========= -**priorityqueue** 1 2 3 4 5 **Total** -================= ======= ======= ======= ======= ========= ========= -**Time (s)** 0.757 0.731 0.449 0.993 1.191 4.121 -**GIL Commands** 399,730 287,433 121,329 450,539 1,053,195 2,312,226 -================= ======= ======= ======= ======= ========= ========= - -================ ====== ======= ========= ======= ====== ======= ========= -**set** 1 2 3 4 5 6 **Total** -================ ====== ======= ========= ======= ====== ======= ========= -**Time (s)** 0.386 0.679 1.743 0.622 0.292 0.736 4.458 -**GIL Commands** 78,959 242,304 1,265,278 232,776 66,700 292,205 2,178,222 -================ ====== ======= ========= ======= ====== ======= ========= - -================ ====== ====== ====== ======= ========= -**stack** 1 2 3 4 **Total** -================ ====== ====== ====== ======= ========= -**Time (s)** 0.343 0.331 0.331 0.620 1.625 -**GIL Commands** 52,233 44,958 55,097 154,161 306,449 -================ ====== ====== ====== ======= ========= - -Reproducing the Buckets.js Bugs found by :doc:`../publications/cosette` and :doc:`../publications/javert-2` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Starting from the ``Gillian`` folder, execute the following: - -.. code-block:: bash - - dune build - make js-init-env - cd Gillian-JS/environment - -:doc:`../publications/cosette` Multi-Dictionary Bug -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -In order to reproduce the multi-dictionary bug reported by :doc:`../publications/cosette`, execute: - -.. code-block:: bash - - ./testCosette.sh Examples/Cosette/Buckets/multidictionary/bug/multidictionary_bug.js - -You will obtain a failing model - -.. code-block:: text - - Assert failed with argument False. - Failing Model: - [ (#x1: #x2) ] - -The bug is caused by the library wrongly treating the case in which we try to remove a key-value pair for a key with no associated values. The code of the test is as follows: - -.. code-block:: js - - var dict = new buckets.MultiDictionary() - - var s = symb_string(s); - var x1 = symb_number(x1); - var x2 = symb_number(x2); - - dict.set(s, x1); - dict.set(s, x2); - - dict.remove(s, x1); - var res = dict.remove(s, x2); - Assert(((not (x1 = x2)) and (res = true)) or ((x1 = x2) and (res = false))); - -The test puts two symbolic numbers, ``x1`` and ``x2`` for the same symbolic key ``s`` into an empty multidictionary, then removes ``x1``, and then removes ``x2`` and registers the value returned by ``remove``. Then, it asserts that that value was ``true`` if the two keys were different, and ``false`` if the two keys were the same. What the failing model says is that, when the two keys are equal, the library, in fact, throws a native JavaScript error (indicated by the argument ``False`` of the failed assert). - -:doc:`../publications/javert-2` Linked-List Bugs -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -In order to reproduce the linked-list bugs reported by :doc:`../publications/javert-2`, execute: - -.. code-block:: bash - - ./testCosette.sh Examples/Cosette/Buckets/linkedlist/bug/linkedlist_bug_1.js - ./testCosette.sh Examples/Cosette/Buckets/linkedlist/bug/linkedlist_bug_2.js - ./testCosette.sh Examples/Cosette/Buckets/linkedlist/bug/linkedlist_bug_3.js - -All of the bugs are causes by the library treating non-integer indexing incorrectly; we explain the bug found by the first test in detail, the remaining two are analogous. For the first test, the failing model is as follows: - -.. code-block:: text - - Assert failed with argument - ((((#x3 == 0) /\ (#x2 == #x1)) \/ - ((#x3 == 1) /\ (#x2 == #x2))) \/ - (((! (#x3 == 0)) /\ (! (#x3 == 1))) /\ (#x2 == undefined))). - Failing Model: - [ (#x2: 4), (#x3: 0.5), (#x1: 3) ] - -The code of the test is as follows: - -.. code-block:: js - - var list = new buckets.LinkedList() - - var x1 = symb_number(x1); - var x2 = symb_number(x2); - var x3 = symb_number(x3); - - list.add(x1) - list.add(x2) - - var res = list.elementAtIndex(x3); - Assert( (((x3 = 0) and (res = x1)) or - ((x3 = 1) and (res = x2))) or - (((not (x3 = 0)) and (not (x3 = 1))) and (res = undefined)) ); - -The test inserts two symbolic numbers, ``x1`` and ``x2``, into an empty linked list, and then indexes the list with a third symbolic number, ``x3``. The expected outcome is that: if ``x3 = 0``, the indexing returns ``x1``; if ``x3 = 1``, the indexing returns ``x2``; and, otherwise, the indexing returns ``undefined``. The failing model, however, says that if ``x3 = 0.5``, the indexing will also return ``x2``. \ No newline at end of file