UPLC textual parser (plutus-core 1.45.0.0): fails to parse large (case (constr 0 ...) (lam ...)) when branch body is ~490 lines #7742

@Unisay

Summary

The megaparsec-based UPLC textual parser in plutus-core 1.45.0.0 fails to parse a syntactically valid (case (constr 0 b1..b8) (lam ...)) expression when the single case branch has a body of approximately 490 lines. The reported error position is misleading: it points to the start of the outer (lam ...), apparently because of megaparsec backtracking.

Reproduction

File: the HTLC scenario from Unisay/scalus-cape-submissions, commit 0846510184d98b054c4405c4d036154444467d9f, file src/htlc/htlc.uplc.

  • Size: 51 700 bytes
  • SHA-256: abd5609f2da11b37cdb71d8314b34d94012bee1f3b2a3b43793a715228818493
  • Program header: (program 1.1.0 ...)

Trigger structure (outer program body, lines 1–5 of the file):

(program 1.1.0 (case (constr 0
      (force (force (builtin chooseList)))
      (force (force (builtin fstPair))) (force (builtin headList))
      (force (builtin ifThenElse)) (force (builtin mkCons))
      (force (force (builtin sndPair))) (force (builtin tailList))
      (force (builtin trace))) (lam __builtin_ChooseList
    (lam __builtin_FstPair
      ...~490 lines of body...
    ))))

How to reproduce:

import UntypedPlutusCore.Parser (parseProgram)
import Data.Text.IO qualified as T

main :: IO ()
main = do
    src <- T.readFile "htlc.uplc"
    case parseProgram src of
        Left err -> print err
        Right _  -> putStrLn "OK"

Error:

test:5:32:
5 |       (force (builtin trace))) (lam __builtin_ChooseList
  |                                ^
  unexpected '('
  expecting ')'

What was ruled out

  • Paren balance: 437 opens = 437 closes. File is well-formed.
  • case/constr support: plutus-core 1.45.0.0 does support case/constr (language version 1.1.0). The fibonacci UPLC produced by Scalus 0.16.0 in the same repo, which also uses (force (case (constr 0 ...) ...)) but with a small body, parses and verifies fine.
  • Constants: removing (con string …) and (con data …) constants does not change the failure.
  • Structure in isolation: a minimal (case (constr 0 b1..b8) (lam x1 (lam x2 ... (lam x8 small-body)))) parses fine.
  • Hidden characters / encoding: no BOM, no CRLF, plain UTF-8.
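
For anyone who wants to repeat the paren-balance check, here is a minimal sketch of how it can be done. It is not the script used for the numbers above; parenStats is a hypothetical helper written for this illustration. It counts every ( and ) in the text, including any that occur inside string constants, so treat the result as a rough sanity check only.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')
import System.Environment (getArgs)

-- | Hypothetical helper: returns (opens, closes, minimum nesting depth).
-- The input looks balanced when opens == closes and the minimum depth is
-- never negative. Parens inside string constants are counted as well.
parenStats :: String -> (Int, Int, Int)
parenStats = finish . foldl' step (0, 0, 0, 0)
  where
    step (!o, !c, !d, !m) ch = case ch of
      '(' -> (o + 1, c, d + 1, m)
      ')' -> (o, c + 1, d - 1, min m (d - 1))
      _   -> (o, c, d, m)
    finish (o, c, _, m) = (o, c, m)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> readFile path >>= print . parenStats
    _      -> putStrLn "usage: paren-stats FILE"
```

Tracking the minimum depth in addition to the two totals catches inputs such as ")(", where the counts match but the nesting is wrong.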

Root cause hypothesis

caseTerm in UntypedPlutusCore/Parser.hs uses many term for the branches. If term starts parsing the (lam __builtin_ChooseList ...) branch and then fails somewhere deep inside the ~490-line body, backtracking at the many term boundary would cause the error to be reported at the outermost ( of the failed branch (5:32) rather than at the actual failure site. The real parse error is lost, and the reported position is misleading.

The underlying issue may be one of:

  1. A try combinator in the wrong place (missing or superfluous) on one of the sub-parsers inside term, causing megaparsec to discard progress and backtrack to the many term boundary.
  2. An error inside the branch body that is not recoverable — the parser cannot finish parsing (lam __builtin_ChooseList BODY) due to some construct inside BODY — but the reported location is wrong.
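
To illustrate how backtracking can move the reported position in general, here is a toy megaparsec sketch. It is not the plutus-core grammar: SExpr, list, sexpr and firstErrorOffset are hypothetical names invented for this example, and it assumes the megaparsec package is available. In megaparsec, a parser that fails after consuming input is only rewound when it runs under try, so the sketch runs the same grammar with each list element wrapped in either id or try and compares the offset of the reported error.

```haskell
module Main where

import qualified Data.List.NonEmpty as NE
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, letterChar, space)

type Parser = Parsec Void String

-- Toy s-expression: an atom (letters only) or a parenthesised list.
data SExpr = Atom String | List [SExpr]
  deriving (Show)

-- Consume trailing white space after a token.
lexeme :: Parser a -> Parser a
lexeme p = p <* space

atom :: Parser SExpr
atom = Atom <$> lexeme (some letterChar)

-- 'wrap' decides how each list element is run: 'id' or 'try'.
list :: (Parser SExpr -> Parser SExpr) -> Parser SExpr
list wrap =
  List <$> (lexeme (char '(') *> many (wrap (sexpr wrap)) <* lexeme (char ')'))

sexpr :: (Parser SExpr -> Parser SExpr) -> Parser SExpr
sexpr wrap = atom <|> list wrap

-- Offset (0-based) of the first reported error, if parsing fails.
firstErrorOffset :: (Parser SExpr -> Parser SExpr) -> String -> Maybe Int
firstErrorOffset wrap input =
  case parse (sexpr wrap <* eof) "toy" input of
    Left bundle -> Just (errorOffset (NE.head (bundleErrors bundle)))
    Right _     -> Nothing

-- The digit is not a valid atom, so the real failure is at offset 9.
sample :: String
sample = "(a (b (c 1)))"

main :: IO ()
main = do
  print (firstErrorOffset id sample)   -- elements run without backtracking
  print (firstErrorOffset try sample)  -- elements run under try
```

By my reading of megaparsec's semantics, the id variant reports the failure at offset 9 (the digit), while the try variant reports offset 3, the opening ( of the first nested list. In the try variant each failing element is rewound, many ends early, and the closing-paren parser then fails at the start of that element. That is the same shape as the 5:32 report above, but whether term in plutus-core behaves this way is still only a hypothesis.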

Impact

Any UPLC file generated by Scalus 0.16.0's toUplcOptimized() for contracts with 8+ forced builtins (e.g. HTLC, other Cardano V3 validators) will fail to parse under plutus-core ^>=1.45. The workaround is to use toUplc() (no CaseConstrApply optimization), which produces larger but parseable UPLC.

This blocks UPLC-CAPE benchmark submissions for complex Scalus 0.16.0 contracts.

Environment

  • plutus-core: 1.45.0.0
  • Parser: UntypedPlutusCore.Parser.parseProgram (megaparsec-based)
  • UPLC language version: 1.1.0 (Plutus V3 / Conway)
  • Confirmed via cabal repl in UPLC-CAPE repo
