Kwxm/conformance/improve filename check (#6730)

* Fix incorrect filename * Fix bad syntax in listData test * WIP * WIP * Improve checks in file discovery code and update README * Rearrange some monadic code
IntersectMBO · Dec 30, 2024 · d061d75 · d061d75 · github-actions · Dec 31, 2024
1 parent 719fc18
commit d061d75
Show file tree

Hide file tree

Showing 2 changed files with 75 additions and 52 deletions.
diff --git a/plutus-conformance/README.md b/plutus-conformance/README.md
@@ -10,16 +10,19 @@ This suite tests the latest version of Plutus. Testing of older versions may be
 
 The tests currently cover or will cover the Haskell and Agda implementation of:
 
-- UPLC evaluation
-- Typechecking for TPLC, including checking of alpha equivalence. (`tplc-typecheck-test`)
+- Untyped Plutus Core (UPLC) evaluation
+- Typechecking for Typed Plutus Core (TPLC), including checking of alpha equivalence. (`tplc-typecheck-test`)
 - TPLC evaluation
 - Erasure of TPLC to UPLC
 - Coverage test
-<!-- - Costing conformance? -->
+- CPU/memory costing of scripts
 
-## Adding/updating test outputs
+## Organisation of tests
+The tests mostly take the form of golden tests of fairly simple UPLC programs.  The input files for the tests are organised in a tree of directories under the [test-cases](https://github.com/IntersectMBO/plutus/tree/master/plutus-conformance/test-cases) directory.  If a directory in this tree contains one or more subdirectories then any other files in the directory are ignored and the subdirectories are recursively searched for test cases.  If a directory `<name>` has no subdirectories then it is expected to contain a file called `<name>.uplc` and no other files with the `.uplc` extension. The file `<name>.uplc` should contain textual source code for a  UPLC program, and the directory should also contain a file called `<name>.uplc.expected` containing the expected output of the program and a file called `<name>.uplc.budget.expected`containing the expected CPU and memory budgets.  Any other files (for example `README` files) in the directory are ignored.  See the [addInteger-01](https://github.com/IntersectMBO/plutus/tree/master/plutus-conformance/test-cases/uplc/evaluation/builtin/semantics/addInteger/addInteger-01) for an example of the expected format.  To avoid difficulties with case-insensitive filesystems no two subdirectories of a test directory should have names which differ only by case (eg `True` and `true`).
 
-To update or add test outputs, use the accept test option of the tests. E.g., to have the test results overwriting the `.expected` files in the Haskell implementation test suite (`haskell-conformance`) , run:
+### Adding/updating test outputs
+
+To update or add test outputs, use the accept test option of the tests. E.g., to have the test results overwrite the `.expected` files in the Haskell implementation test suite (`haskell-conformance`) , run:
 
 `cabal test haskell-conformance --test-options=--accept`
 
@@ -60,16 +63,16 @@ import UntypedPlutusCore.Core.Type qualified as UPLC
 
 type UplcProg = UPLC.Program Name DefaultUni DefaultFun ()
 
-runUplcEvalTests :: (UplcProg -> Maybe UplcProg) -> IO ()
-```
+type UplcEvaluatorFun res = UplcProg -> Maybe res
 
-Users can call this function with their own `runners` with the signature:
+data UplcEvaluator =
+  UplcEvaluatorWithoutCosting (UplcEvaluatorFun UplcProg)
+  | UplcEvaluatorWithCosting (CostModelParams -> UplcEvaluatorFun (UplcProg, ExBudget))
 
-```haskell
-runner :: (UplcProg -> Maybe UplcProg)
+runUplcEvalTests :: UplcEvaluator -> (FilePath -> Bool) -> (FilePath -> Bool) -> IO ()
 ```
 
-The runner should evaluate a UPLC program and return a `Maybe UplcProg`. Given a UPLC program, the runner should return the evaluated program. In case of evaluation failure, the runner should return `Nothing`.
+Users can call this function with their own `UplcEvaluatorFun`, which should evaluate a UPLC program and return a `Maybe UplcProg`, or a `Maybe (UplcProg, ExBudget)` if the budget tests are to be performed as well. Given a UPLC program, the runner should return the evaluated program. In case of evaluation failure, the runner should return `Nothing`.  The two arguments of type `FilePath -> Bool` allow selected evaluation and budget tests (the ones for which the function returns `True`) to be ignored if desired. 
 
 <!-- 
 ### Type checker

diff --git a/plutus-conformance/src/PlutusConformance/Common.hs b/plutus-conformance/src/PlutusConformance/Common.hs
@@ -1,3 +1,4 @@
+{-# LANGUAGE BangPatterns         #-}
 {-# LANGUAGE FlexibleInstances    #-}
 {-# LANGUAGE OverloadedStrings    #-}
 {-# LANGUAGE UndecidableInstances #-}
@@ -22,7 +23,7 @@ import Data.Maybe (fromJust)
 import Data.Text qualified as T
 import Data.Text.IO qualified as T
 import System.Directory
-import System.FilePath (takeBaseName, (<.>), (</>))
+import System.FilePath (takeBaseName, takeFileName, (<.>), (</>))
 import Test.Tasty (defaultMain, testGroup)
 import Test.Tasty.ExpectedFailure (expectFail)
 import Test.Tasty.Extras (goldenVsDocM)
@@ -69,8 +70,16 @@ data UplcEvaluator =
   -- there.
   | UplcEvaluatorWithCosting (CostModelParams -> UplcEvaluatorFun (UplcProg, ExBudget))
 
--- | Walk a file tree, making test groups for directories with subdirectories,
--- and test cases for directories without.
+{- | Walk a file tree, making test groups for directories with subdirectories, and
+   test cases for directories without.  We expect every test directory to
+   contain a single `.uplc` file whose name matches that of the directory. For
+   example, the directory `modInteger-15` should contain `modInteger-15.uplc`,
+   and that file should contain a textual UPLC program.  The directory should
+   also contain golden files `modInteger-15.uplc.expected`, containing the
+   expected output of the program, and `modInteger-15.uplc.budget.expected`,
+   containing the expected execution budget, although these will be created by
+   the testing machinery if they aren't already present.
+-}
 discoverTests
   :: UplcEvaluator -- ^ The evaluator to be tested.
   -> CostModelParams
@@ -87,45 +96,60 @@ discoverTests
 discoverTests eval modelParams evaluationFailureExpected budgetFailureExpected = go
   where
     go dir = do
-        let name = takeBaseName dir
-        children <- listDirectory dir
-        subdirs <- flip wither children $ \child -> do
-            let fullPath = dir </> child
-            isDir <- doesDirectoryExist fullPath
-            pure $ if isDir then Just fullPath else Nothing
-        if null subdirs
-        -- no children, this is a test case directory
-        then
-            let tests = case eval of
-                    UplcEvaluatorWithCosting f -> testGroup name
-                        [ testForEval dir name (fmap fst . f modelParams)
-                        , testForBudget dir name (fmap snd . f modelParams)
-                        ]
-                    UplcEvaluatorWithoutCosting f -> testForEval dir name f
-            in pure tests
+      let name = takeBaseName dir
+      children <- listDirectory dir
+      subdirs <- flip wither children $ \child -> do
+        let fullPath = dir </> child
+        isDir <- doesDirectoryExist fullPath
+        pure $ if isDir then Just fullPath else Nothing
+      if null subdirs
+         -- no children, this is a test case directory
+        then do
+          -- Check that the  directory <dir> contains exactly one .uplc file
+          -- and that it's called <name>.uplc, where <name> is the final path
+          -- component of <dir>.
+          uplcFiles <- findByExtension [".uplc"] dir
+          let expectedInputFile = takeFileName dir <.> ".uplc"
+              inputFilePath =
+                case uplcFiles of
+                  [] -> error $ "Input file " ++ expectedInputFile ++ " missing in " <> dir
+                  _:_:_ -> error $ "More than one .uplc file in " <> dir
+                  [file] ->
+                    if takeFileName file /= expectedInputFile
+                    then error $ "Found file " ++ (takeFileName file)
+                               ++ " in directory " ++ dir
+                               ++  " (expected " ++ expectedInputFile ++ ")"
+                    else file
+          let tests = case eval of
+                UplcEvaluatorWithCosting f -> testGroup name
+                      [ testForEval dir inputFilePath (fmap fst . f modelParams)
+                      , testForBudget dir inputFilePath (fmap snd . f modelParams)
+                      ]
+                UplcEvaluatorWithoutCosting f -> testForEval dir inputFilePath f
+          pure tests
         -- has children, so it's a grouping directory
         else testGroup name <$> traverse go subdirs
     testForEval :: FilePath -> String -> UplcEvaluatorFun UplcProg -> TestTree
-    testForEval dir name e =
-        let goldenFilePath = dir </> name <.> "uplc.expected"
+    testForEval dir inputFilePath e =
+        let goldenFilePath = inputFilePath <.> "expected"
             test = goldenTest
-                   (name ++ " (evaluation)")
+                   (takeFileName inputFilePath ++ " (evaluation)")
                    -- get the golden test value
                    (expectedToProg <$> T.readFile goldenFilePath)
                    -- get the tested value
-                   (getTestedValue e dir)
+                   (getTestedValue e inputFilePath)
                    (\ x y -> pure $ compareAlphaEq x y) -- comparison function
                    (updateGoldenFile goldenFilePath) -- update the golden file
         in possiblyFailingTest (evaluationFailureExpected dir) test
     testForBudget :: FilePath -> String -> UplcEvaluatorFun ExBudget -> TestTree
-    testForBudget dir name e =
-        let goldenFilePath = dir </> name <.> "uplc.budget.expected"
+    testForBudget dir inputFilePath e =
+        let goldenFilePath = inputFilePath <.> "budget" <.> "expected"
             prettyEither (Left l)  = pretty l
             prettyEither (Right r) = pretty r
             test =  goldenVsDocM
-                    (name ++ " (budget)")
+                    (takeFileName inputFilePath ++ " (budget)")
                     goldenFilePath
-                    (prettyEither <$> getTestedValue e dir)
+                    (prettyEither <$> getTestedValue e inputFilePath)
         in possiblyFailingTest (budgetFailureExpected dir) test
     possiblyFailingTest :: Bool -> TestTree -> TestTree
     possiblyFailingTest failureExpected test =
@@ -146,25 +170,21 @@ expectedToProg txt
         Left _  -> Left txt
         Right p -> Right $ void p
 
--- | Get the tested value. The tested value is either the shown parse or evaluation error,
+-- | Get the tested value from a file (in this case a textual UPLC source
+-- file). The tested value is either the shown parse error or evaluation error,
 -- or a `UplcProg`.
 getTestedValue ::
     UplcEvaluatorFun res
     -> FilePath
     -> IO (Either T.Text res)
-getTestedValue eval dir = do
-    inputFile <- findByExtension [".uplc"] dir
-    case inputFile of
-        [] -> error $ "Input file missing in " <> dir
-        _:_:_ -> error $ "More than 1 input files in " <> dir
-        [file] -> do
-            input <- T.readFile file
-            case parseTxt input of
-                Left _ -> pure $ Left shownParseError
-                Right p -> do
-                    case eval (void p) of
-                        Nothing   -> pure $ Left shownEvaluationFailure
-                        Just prog -> pure $ Right prog
+getTestedValue eval file = do
+  input <- T.readFile file
+  pure $ case parseTxt input of
+    Left _ -> Left shownParseError
+    Right p ->
+      case eval (void p) of
+        Nothing   -> Left shownEvaluationFailure
+        Just prog -> Right prog
 
 -- | The comparison function used for the golden test.
 -- This function checks alpha-equivalence of programs when the output is a program.
Benchmark suite	Current: `d061d75`	Previous: `719fc18`	Ratio
`validation-decode-auction_2-5`	`200.6` μs	`189.7` μs	`1.06`
`validation-decode-crowdfunding-success-1`	`245.4` μs	`231.3` μs	`1.06`
`validation-decode-crowdfunding-success-2`	`246` μs	`230.7` μs	`1.07`
`validation-decode-crowdfunding-success-3`	`246.1` μs	`230.7` μs	`1.07`
`validation-decode-currency-1`	`242.7` μs	`227.4` μs	`1.07`
`validation-decode-escrow-redeem_1-1`	`331.9` μs	`310.8` μs	`1.07`
`validation-decode-escrow-redeem_1-2`	`331.5` μs	`310.8` μs	`1.07`
`validation-decode-escrow-redeem_2-1`	`332.3` μs	`312.9` μs	`1.06`
`validation-decode-escrow-redeem_2-2`	`330.4` μs	`313.2` μs	`1.05`
`validation-decode-escrow-redeem_2-3`	`331.2` μs	`312.8` μs	`1.06`
`validation-decode-escrow-refund-1`	`331` μs	`312.4` μs	`1.06`
`validation-decode-future-increase-margin-1`	`242` μs	`227.8` μs	`1.06`
`validation-decode-future-increase-margin-2`	`331.8` μs	`312.2` μs	`1.06`
`validation-decode-future-increase-margin-3`	`331.7` μs	`314.9` μs	`1.05`
`validation-decode-future-increase-margin-4`	`698.2` μs	`664.8` μs	`1.05`
`validation-decode-future-pay-out-1`	`241.4` μs	`228.1` μs	`1.06`
`validation-decode-future-pay-out-3`	`333` μs	`312.2` μs	`1.07`
`validation-decode-future-pay-out-4`	`695.2` μs	`661.4` μs	`1.05`
`validation-decode-future-settle-early-1`	`242.6` μs	`228.6` μs	`1.06`
`validation-decode-future-settle-early-2`	`332.8` μs	`313.6` μs	`1.06`
`validation-decode-future-settle-early-3`	`331.5` μs	`315.4` μs	`1.05`
`validation-decode-game-sm-success_1-2`	`171.7` μs	`161.5` μs	`1.06`
`validation-decode-game-sm-success_1-4`	`171.6` μs	`161.5` μs	`1.06`
`validation-decode-game-sm-success_2-2`	`172.2` μs	`159.3` μs	`1.08`
`validation-decode-game-sm-success_2-3`	`534.1` μs	`507.7` μs	`1.05`
`validation-decode-game-sm-success_2-4`	`172.2` μs	`159.9` μs	`1.08`
`validation-decode-game-sm-success_2-5`	`534` μs	`507` μs	`1.05`
`validation-decode-game-sm-success_2-6`	`171.7` μs	`160.8` μs	`1.07`
`validation-decode-multisig-sm-2`	`594.6` μs	`564.8` μs	`1.05`
`validation-decode-multisig-sm-8`	`592.4` μs	`563.5` μs	`1.05`
`validation-decode-multisig-sm-10`	`594.2` μs	`564.8` μs	`1.05`
`validation-decode-ping-pong-1`	`502` μs	`474.1` μs	`1.06`
`validation-decode-ping-pong_2-1`	`500.5` μs	`474.6` μs	`1.05`
`validation-decode-prism-1`	`166.8` μs	`156.4` μs	`1.07`
`validation-decode-prism-3`	`244` μs	`230` μs	`1.06`
`validation-decode-pubkey-1`	`170.2` μs	`159.9` μs	`1.06`
`validation-decode-stablecoin_1-2`	`171.9` μs	`160.2` μs	`1.07`
`validation-decode-stablecoin_1-4`	`171.4` μs	`160.4` μs	`1.07`
`validation-decode-stablecoin_1-5`	`882.1` μs	`839.2` μs	`1.05`
`validation-decode-stablecoin_1-6`	`171.7` μs	`160.1` μs	`1.07`
`validation-decode-stablecoin_2-1`	`881.9` μs	`837.7` μs	`1.05`
`validation-decode-stablecoin_2-2`	`171.6` μs	`160.4` μs	`1.07`
`validation-decode-stablecoin_2-4`	`171.7` μs	`160.5` μs	`1.07`
`validation-decode-token-account-1`	`241.9` μs	`228.5` μs	`1.06`
`validation-decode-token-account-2`	`222.5` μs	`208.6` μs	`1.07`
`validation-decode-uniswap-1`	`245.8` μs	`231.1` μs	`1.06`
`validation-decode-uniswap-2`	`241.5` μs	`228.8` μs	`1.06`
`validation-decode-uniswap-4`	`185.5` μs	`174.7` μs	`1.06`
`validation-decode-uniswap-6`	`185.5` μs	`176` μs	`1.05`
`validation-decode-vesting-1`	`332` μs	`315.1` μs	`1.05`