Skip to content

Commit

Permalink
Kwxm/conformance/improve filename check (#6730)
Browse files Browse the repository at this point in the history
* Fix incorrect filename

* Fix bad syntax in listData test

* WIP

* WIP

* Improve checks in file discovery code and update README

* Rearrange some monadic code
  • Loading branch information
kwxm authored Dec 30, 2024
1 parent 719fc18 commit d061d75
Show file tree
Hide file tree
Showing 2 changed files with 75 additions and 52 deletions.
25 changes: 14 additions & 11 deletions plutus-conformance/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,16 +10,19 @@ This suite tests the latest version of Plutus. Testing of older versions may be

The tests currently cover or will cover the Haskell and Agda implementation of:

- UPLC evaluation
- Typechecking for TPLC, including checking of alpha equivalence. (`tplc-typecheck-test`)
- Untyped Plutus Core (UPLC) evaluation
- Typechecking for Typed Plutus Core (TPLC), including checking of alpha equivalence. (`tplc-typecheck-test`)
- TPLC evaluation
- Erasure of TPLC to UPLC
- Coverage test
<!-- - Costing conformance? -->
- CPU/memory costing of scripts

## Adding/updating test outputs
## Organisation of tests
The tests mostly take the form of golden tests of fairly simple UPLC programs. The input files for the tests are organised in a tree of directories under the [test-cases](https://github.com/IntersectMBO/plutus/tree/master/plutus-conformance/test-cases) directory. If a directory in this tree contains one or more subdirectories then any other files in the directory are ignored and the subdirectories are recursively searched for test cases. If a directory `<name>` has no subdirectories then it is expected to contain a file called `<name>.uplc` and no other files with the `.uplc` extension. The file `<name>.uplc` should contain textual source code for a UPLC program, and the directory should also contain a file called `<name>.uplc.expected` containing the expected output of the program and a file called `<name>.uplc.budget.expected`containing the expected CPU and memory budgets. Any other files (for example `README` files) in the directory are ignored. See the [addInteger-01](https://github.com/IntersectMBO/plutus/tree/master/plutus-conformance/test-cases/uplc/evaluation/builtin/semantics/addInteger/addInteger-01) for an example of the expected format. To avoid difficulties with case-insensitive filesystems no two subdirectories of a test directory should have names which differ only by case (eg `True` and `true`).

To update or add test outputs, use the accept test option of the tests. E.g., to have the test results overwriting the `.expected` files in the Haskell implementation test suite (`haskell-conformance`) , run:
### Adding/updating test outputs

To update or add test outputs, use the accept test option of the tests. E.g., to have the test results overwrite the `.expected` files in the Haskell implementation test suite (`haskell-conformance`) , run:

`cabal test haskell-conformance --test-options=--accept`

Expand Down Expand Up @@ -60,16 +63,16 @@ import UntypedPlutusCore.Core.Type qualified as UPLC

type UplcProg = UPLC.Program Name DefaultUni DefaultFun ()

runUplcEvalTests :: (UplcProg -> Maybe UplcProg) -> IO ()
```
type UplcEvaluatorFun res = UplcProg -> Maybe res

Users can call this function with their own `runners` with the signature:
data UplcEvaluator =
UplcEvaluatorWithoutCosting (UplcEvaluatorFun UplcProg)
| UplcEvaluatorWithCosting (CostModelParams -> UplcEvaluatorFun (UplcProg, ExBudget))

```haskell
runner :: (UplcProg -> Maybe UplcProg)
runUplcEvalTests :: UplcEvaluator -> (FilePath -> Bool) -> (FilePath -> Bool) -> IO ()
```

The runner should evaluate a UPLC program and return a `Maybe UplcProg`. Given a UPLC program, the runner should return the evaluated program. In case of evaluation failure, the runner should return `Nothing`.
Users can call this function with their own `UplcEvaluatorFun`, which should evaluate a UPLC program and return a `Maybe UplcProg`, or a `Maybe (UplcProg, ExBudget)` if the budget tests are to be performed as well. Given a UPLC program, the runner should return the evaluated program. In case of evaluation failure, the runner should return `Nothing`. The two arguments of type `FilePath -> Bool` allow selected evaluation and budget tests (the ones for which the function returns `True`) to be ignored if desired.

<!--
### Type checker
Expand Down
102 changes: 61 additions & 41 deletions plutus-conformance/src/PlutusConformance/Common.hs
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE UndecidableInstances #-}
Expand All @@ -22,7 +23,7 @@ import Data.Maybe (fromJust)
import Data.Text qualified as T
import Data.Text.IO qualified as T
import System.Directory
import System.FilePath (takeBaseName, (<.>), (</>))
import System.FilePath (takeBaseName, takeFileName, (<.>), (</>))
import Test.Tasty (defaultMain, testGroup)
import Test.Tasty.ExpectedFailure (expectFail)
import Test.Tasty.Extras (goldenVsDocM)
Expand Down Expand Up @@ -69,8 +70,16 @@ data UplcEvaluator =
-- there.
| UplcEvaluatorWithCosting (CostModelParams -> UplcEvaluatorFun (UplcProg, ExBudget))

-- | Walk a file tree, making test groups for directories with subdirectories,
-- and test cases for directories without.
{- | Walk a file tree, making test groups for directories with subdirectories, and
test cases for directories without. We expect every test directory to
contain a single `.uplc` file whose name matches that of the directory. For
example, the directory `modInteger-15` should contain `modInteger-15.uplc`,
and that file should contain a textual UPLC program. The directory should
also contain golden files `modInteger-15.uplc.expected`, containing the
expected output of the program, and `modInteger-15.uplc.budget.expected`,
containing the expected execution budget, although these will be created by
the testing machinery if they aren't already present.
-}
discoverTests
:: UplcEvaluator -- ^ The evaluator to be tested.
-> CostModelParams
Expand All @@ -87,45 +96,60 @@ discoverTests
discoverTests eval modelParams evaluationFailureExpected budgetFailureExpected = go
where
go dir = do
let name = takeBaseName dir
children <- listDirectory dir
subdirs <- flip wither children $ \child -> do
let fullPath = dir </> child
isDir <- doesDirectoryExist fullPath
pure $ if isDir then Just fullPath else Nothing
if null subdirs
-- no children, this is a test case directory
then
let tests = case eval of
UplcEvaluatorWithCosting f -> testGroup name
[ testForEval dir name (fmap fst . f modelParams)
, testForBudget dir name (fmap snd . f modelParams)
]
UplcEvaluatorWithoutCosting f -> testForEval dir name f
in pure tests
let name = takeBaseName dir
children <- listDirectory dir
subdirs <- flip wither children $ \child -> do
let fullPath = dir </> child
isDir <- doesDirectoryExist fullPath
pure $ if isDir then Just fullPath else Nothing
if null subdirs
-- no children, this is a test case directory
then do
-- Check that the directory <dir> contains exactly one .uplc file
-- and that it's called <name>.uplc, where <name> is the final path
-- component of <dir>.
uplcFiles <- findByExtension [".uplc"] dir
let expectedInputFile = takeFileName dir <.> ".uplc"
inputFilePath =
case uplcFiles of
[] -> error $ "Input file " ++ expectedInputFile ++ " missing in " <> dir
_:_:_ -> error $ "More than one .uplc file in " <> dir
[file] ->
if takeFileName file /= expectedInputFile
then error $ "Found file " ++ (takeFileName file)
++ " in directory " ++ dir
++ " (expected " ++ expectedInputFile ++ ")"
else file
let tests = case eval of
UplcEvaluatorWithCosting f -> testGroup name
[ testForEval dir inputFilePath (fmap fst . f modelParams)
, testForBudget dir inputFilePath (fmap snd . f modelParams)
]
UplcEvaluatorWithoutCosting f -> testForEval dir inputFilePath f
pure tests
-- has children, so it's a grouping directory
else testGroup name <$> traverse go subdirs
testForEval :: FilePath -> String -> UplcEvaluatorFun UplcProg -> TestTree
testForEval dir name e =
let goldenFilePath = dir </> name <.> "uplc.expected"
testForEval dir inputFilePath e =
let goldenFilePath = inputFilePath <.> "expected"
test = goldenTest
(name ++ " (evaluation)")
(takeFileName inputFilePath ++ " (evaluation)")
-- get the golden test value
(expectedToProg <$> T.readFile goldenFilePath)
-- get the tested value
(getTestedValue e dir)
(getTestedValue e inputFilePath)
(\ x y -> pure $ compareAlphaEq x y) -- comparison function
(updateGoldenFile goldenFilePath) -- update the golden file
in possiblyFailingTest (evaluationFailureExpected dir) test
testForBudget :: FilePath -> String -> UplcEvaluatorFun ExBudget -> TestTree
testForBudget dir name e =
let goldenFilePath = dir </> name <.> "uplc.budget.expected"
testForBudget dir inputFilePath e =
let goldenFilePath = inputFilePath <.> "budget" <.> "expected"
prettyEither (Left l) = pretty l
prettyEither (Right r) = pretty r
test = goldenVsDocM
(name ++ " (budget)")
(takeFileName inputFilePath ++ " (budget)")
goldenFilePath
(prettyEither <$> getTestedValue e dir)
(prettyEither <$> getTestedValue e inputFilePath)
in possiblyFailingTest (budgetFailureExpected dir) test
possiblyFailingTest :: Bool -> TestTree -> TestTree
possiblyFailingTest failureExpected test =
Expand All @@ -146,25 +170,21 @@ expectedToProg txt
Left _ -> Left txt
Right p -> Right $ void p

-- | Get the tested value. The tested value is either the shown parse or evaluation error,
-- | Get the tested value from a file (in this case a textual UPLC source
-- file). The tested value is either the shown parse error or evaluation error,
-- or a `UplcProg`.
getTestedValue ::
UplcEvaluatorFun res
-> FilePath
-> IO (Either T.Text res)
getTestedValue eval dir = do
inputFile <- findByExtension [".uplc"] dir
case inputFile of
[] -> error $ "Input file missing in " <> dir
_:_:_ -> error $ "More than 1 input files in " <> dir
[file] -> do
input <- T.readFile file
case parseTxt input of
Left _ -> pure $ Left shownParseError
Right p -> do
case eval (void p) of
Nothing -> pure $ Left shownEvaluationFailure
Just prog -> pure $ Right prog
getTestedValue eval file = do
input <- T.readFile file
pure $ case parseTxt input of
Left _ -> Left shownParseError
Right p ->
case eval (void p) of
Nothing -> Left shownEvaluationFailure
Just prog -> Right prog

-- | The comparison function used for the golden test.
-- This function checks alpha-equivalence of programs when the output is a program.
Expand Down

1 comment on commit d061d75

@github-actions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Plutus Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.05.

Benchmark suite Current: d061d75 Previous: 719fc18 Ratio
validation-decode-auction_2-5 200.6 μs 189.7 μs 1.06
validation-decode-crowdfunding-success-1 245.4 μs 231.3 μs 1.06
validation-decode-crowdfunding-success-2 246 μs 230.7 μs 1.07
validation-decode-crowdfunding-success-3 246.1 μs 230.7 μs 1.07
validation-decode-currency-1 242.7 μs 227.4 μs 1.07
validation-decode-escrow-redeem_1-1 331.9 μs 310.8 μs 1.07
validation-decode-escrow-redeem_1-2 331.5 μs 310.8 μs 1.07
validation-decode-escrow-redeem_2-1 332.3 μs 312.9 μs 1.06
validation-decode-escrow-redeem_2-2 330.4 μs 313.2 μs 1.05
validation-decode-escrow-redeem_2-3 331.2 μs 312.8 μs 1.06
validation-decode-escrow-refund-1 331 μs 312.4 μs 1.06
validation-decode-future-increase-margin-1 242 μs 227.8 μs 1.06
validation-decode-future-increase-margin-2 331.8 μs 312.2 μs 1.06
validation-decode-future-increase-margin-3 331.7 μs 314.9 μs 1.05
validation-decode-future-increase-margin-4 698.2 μs 664.8 μs 1.05
validation-decode-future-pay-out-1 241.4 μs 228.1 μs 1.06
validation-decode-future-pay-out-3 333 μs 312.2 μs 1.07
validation-decode-future-pay-out-4 695.2 μs 661.4 μs 1.05
validation-decode-future-settle-early-1 242.6 μs 228.6 μs 1.06
validation-decode-future-settle-early-2 332.8 μs 313.6 μs 1.06
validation-decode-future-settle-early-3 331.5 μs 315.4 μs 1.05
validation-decode-game-sm-success_1-2 171.7 μs 161.5 μs 1.06
validation-decode-game-sm-success_1-4 171.6 μs 161.5 μs 1.06
validation-decode-game-sm-success_2-2 172.2 μs 159.3 μs 1.08
validation-decode-game-sm-success_2-3 534.1 μs 507.7 μs 1.05
validation-decode-game-sm-success_2-4 172.2 μs 159.9 μs 1.08
validation-decode-game-sm-success_2-5 534 μs 507 μs 1.05
validation-decode-game-sm-success_2-6 171.7 μs 160.8 μs 1.07
validation-decode-multisig-sm-2 594.6 μs 564.8 μs 1.05
validation-decode-multisig-sm-8 592.4 μs 563.5 μs 1.05
validation-decode-multisig-sm-10 594.2 μs 564.8 μs 1.05
validation-decode-ping-pong-1 502 μs 474.1 μs 1.06
validation-decode-ping-pong_2-1 500.5 μs 474.6 μs 1.05
validation-decode-prism-1 166.8 μs 156.4 μs 1.07
validation-decode-prism-3 244 μs 230 μs 1.06
validation-decode-pubkey-1 170.2 μs 159.9 μs 1.06
validation-decode-stablecoin_1-2 171.9 μs 160.2 μs 1.07
validation-decode-stablecoin_1-4 171.4 μs 160.4 μs 1.07
validation-decode-stablecoin_1-5 882.1 μs 839.2 μs 1.05
validation-decode-stablecoin_1-6 171.7 μs 160.1 μs 1.07
validation-decode-stablecoin_2-1 881.9 μs 837.7 μs 1.05
validation-decode-stablecoin_2-2 171.6 μs 160.4 μs 1.07
validation-decode-stablecoin_2-4 171.7 μs 160.5 μs 1.07
validation-decode-token-account-1 241.9 μs 228.5 μs 1.06
validation-decode-token-account-2 222.5 μs 208.6 μs 1.07
validation-decode-uniswap-1 245.8 μs 231.1 μs 1.06
validation-decode-uniswap-2 241.5 μs 228.8 μs 1.06
validation-decode-uniswap-4 185.5 μs 174.7 μs 1.06
validation-decode-uniswap-6 185.5 μs 176 μs 1.05
validation-decode-vesting-1 332 μs 315.1 μs 1.05

This comment was automatically generated by workflow using github-action-benchmark.

CC: @IntersectMBO/plutus-core

Please sign in to comment.