Execution order in MediaWiki was made so that it is not significant; this will enve be more critical for Wikifunctions, which must be purely function, and that will run in random order, over possibly lot of backend servers running asynchronously (the first that replies fills the cache so that further invokation is avoided). Lazy evaluation is critical for Wikifunctions to succeed. Various things have been fixed in MediaWiki to make sure that it can behave as a purely functional language, allowing parallelization. There are still things to do and there a some third-part Mediawiki extensions that still depend on sequential evaluation. For a full lazy evaluation it should not be needed to fully process arguments, except as needed from the head (optionally from the tail): this requires virtuallization of string values, so that we don't need the full expansion of the wikicode, but instead can parse it lazily and partially from left-to-right, just as needed to take the correct decision in branches, and then eliminate unparsed parts so that we don't even need to evaluate them.
This is possible in PHP, as well as in Lua, or even in Javascript, by using a string interface and avoiding using functions like "strlen" as much as possible, that requires a full expansion of its parameter string (for example if we use a code that matches only prefixes or suffixes, we jsut need to expand as many initial or final nodes as needed). Internally the "string-like" object would actually be stored as a tree of nodes, as long as they are not expanded, or as a flat list of string nodes (not necessarily concatenated to avoid costly copies and reallocations) if they are expanded. Even the TidyHTML step is not required to take a single physical string argument, where it can as well read characters from a flat list of string nodes (many of them being possibly identical and not takign extra memory, except for the node itself in the tree or list, represented in an integer-indexed table. This means that nowhere in the parsing, expansion and generation of the HTML we could really need to have the whole page in memory in large string buffers, and we can parallelize all steps of parsing/expansion/tidyHTML so that they behave as though they were operating linearily, and incrementally. This would boost the performance, even on a single server thread.
If we really use lazy evaluation, the number of nodes to evaluate would be very frequently reduced (notably for wiki pages that expand a lot of shared templates or parser functions with conditional results (#ifxxx, #switch) or using only part of their parameter (such as "substring" functions from the start or end of the text). The in-memory cache for lazy evaluation can be the tree of nodes itself, whose evaluation and expansion is partial, and that can also be valuated using delegates running asynchronously in parallel, possibly on multiple server/evaluator instances.