Runtime static global checker for Lua 5.4.
This sounds a bit like a oxymoron, but to be specific:
bcstrict checks for accesses to unexpected globals within a chunk without executing it, by inspecting its bytecode.
bcstrict is intended to be executed by running Lua code on itself, at startup time, without explicit user(/author) intervention.
If called early, this looks kind of like perl's use strict 'vars'
. More so than strict.lua, in any case. If you squint.
-- check this file
require "bcstrict"()
-- allow access via _G and nothing else
require "bcstrict"{_G=1}
-- no direct global access at all
require "bcstrict"{}
local _G = _ENV
--[[ .. do things ... ]]
-- opportunistic checking
do
local ok, strict = pcall(require, "bcstrict")
if ok then strict() end
end
-- check some other chunk
local bcstrict = require "bcstrict"
local chunk = assert(loadfile "other.lua")
bcstrict(_ENV, chunk)
-- prevent usage anywhere else
package.loaded.bcstrict = function () end
The technique used by bcstrict is generally applicable; it was not hard to port from 5.3 to 5.4, and I don't expect it to be tremendously difficult to port to future versions of Lua, though it does hard-code some bytecode and dump format guts. Earlier versions would additionally require a replacement table.pack
, which is used to infer bytecode-relevant platform details (endianness, integer encodes wihch would otherwise also have to be hard-coded.
As far as I know, the representation of precompiled chunks is guaranteed not to change within a Lua version (x.y, e.g. 5.4) and always breaks between versions. This version of bcstrict is written and tested with Lua 5.4.3, so it should be compatible with all 5.4.z releases.
No effort goes into making this run on any Lua version other than the one whose bytecode it parses, but Lua 5.4 is syntactically backward-compatible enough with almost all 5.y code to at least compile it well enough to run bcstrict as a static analyzer.
There is also an older version of this code which targets 5.3, which can be found in the branches.
You must call the function returned by require "bcstrict"
! Since require avoids loading a module more than once, but there may be multiple files which need to be checked, each user of bcstrict has to actually run it.
Due to the design constraint of being implemented by parsing dumped bytecode, bcstrict has a slightly interesting concept of a global access: a get or set to a field of an upvalue which is, or can be traced up to, the first (and only!) upvalue of a chunk is forbidden if they key used does not exist the environment provided (or _ENV) when bcstrict is called.
It doesn't track any other variables. In particular, it won't catch "globals" that access a declared local _ENV, and it will complain even when you access fields of _ENV explicitly, e.g.:
-- OK
require "bcstrict"()
local _ENV = _ENV
print(not_defined)
-- not OK
require "bcstrict"()
print(_ENV["not_defined"])
In addition, bcstrict does nothing useful when called on non-chunk functions. Because global access compiles as an upvalue table access, it is fundamentally impossible to figure out which, if any, of a non-chunk function's upvalues is the top-level _ENV. For example, the inner functions returned by the following snippets compile to identical bytecode, but close over different variables.
local a
function f(b)
return function ()
a.c = d + b.e
end
end
local a
function g(b)
return function ()
c = a.d + b.e
end
end
local a, b
function h(_ENV)
return function ()
a.c = b.d + e
end
end
Debug information could be used to identify _ENV, if available; however, as the last example shows, it will also flag intentionally redefined local _ENV.
Lua's default behavior of silently accepting access to undefined (misspelled, out-of-scope, &c.) variables is hilariously error-prone and literally my #1 source of bugs while writing this damn module. There are three or so well-known ways of combatting this issue:
Careful testing. Look, if it works for you...
Set a metatable on the global environment table. Often good enough, but has side-effects which may make it unsuitable for libraries. More critically, however, this approach won't catch errors that only occur on code paths that didn't occur during testing.
Some sort of static analyzer. Probably luacheck. This works pretty well ... but you have to run it as a separate step.
This is an attempt to capture the benefits of static analysis (at least in the scope of preventing undeclared variable accesses) with minimal user overhead.