Commit Diff


commit - 0d927a36f0b8d71f12717eb1a7327cfc42a6a51c
commit + a5dd3271b298ba46f50a8a47603c7c84ff736be3
blob - 5c41b700cf04b2fe115da766ad718a50117efaae
blob + cd2d8a469ba69377072affd63bf2521f9d02713d
--- CHANGELOG.md
+++ CHANGELOG.md
@@ -14,3 +14,4 @@ and this project adheres to [Semantic Versioning](http
 - Integration with libFuzzer's `FuzzedDataProvider`.
 - libFuzzer custom mutator for Lua.
 - Examples with tests.
+- Documentation with use cases, API, etc.
blob - a45083a78cd0f2a92a2d050ed1c058a14c25f93d
blob + 9637f4b31828dc11026c2a3911dfd7fa23e41345
--- README.md
+++ README.md
@@ -82,6 +82,10 @@ To gather baseline coverage, the fuzzing engine execut
 and the generated corpus, to ensure that no errors occurred and to understand
 the code coverage the existing corpus already provides.
 
+## Documentation
+
+See [documentation](docs/index.md).
+
 ## License
 
 Copyright © 2022-2023 [Sergey Bronnikov][bronevichok-url].
blob - /dev/null
blob + dfc6a046bddd1ac47adcd3141dbd85d6963419e5 (mode 644)
--- /dev/null
+++ docs/api.md
@@ -0,0 +1,97 @@
+## API
+
+### Fuzzing functions
+
+The `luzer` module provides a function `Fuzz()`.
+
+`Fuzz(test_one_input, custom_mutator, args)` starts the fuzzer. This function
+does not return.
+
+The function accepts the following arguments:
+
+- `test_one_input` is the fuzzer's entry point (equivalent to `TestOneInput`).
+  It must be a function that takes a single string argument; the fuzzer invokes
+  it repeatedly, each time with a new input string.
+- `custom_mutator` defines a custom mutator function (equivalent to
+  `LLVMFuzzerCustomMutator`). Default is `nil`.
+- `args` is a table with the process arguments to pass to the fuzzer. The field
+  `corpus` specifies a path to a directory with a seed corpus; see the
+  [libFuzzer documentation][libfuzzer-options-url] for a list of other options.
+
+It may be desirable to reject some inputs, i.e. to not add them to the corpus.
+For example, when fuzzing an API consisting of parsing and other logic, one may
+want to allow only those inputs into the corpus that parse successfully. If the
+fuzz target returns `-1` on a given input, `luzer` will not add that input to
+the corpus, regardless of what coverage it triggers.
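+
+The rejection rule above can be sketched as follows. This is a minimal
+illustration, not code from `luzer` itself: the `parse` helper is hypothetical
+and uses `tonumber` as a stand-in for a real parser.
+
+```lua
+-- Hypothetical stand-in for a real parser: returns a number,
+-- or nil when the input does not parse.
+local function parse(buf)
+    return tonumber(buf)
+end
+
+local function TestOneInput(buf)
+    local value = parse(buf)
+    if value == nil then
+        -- The input did not parse: ask the fuzzer not to add it to the corpus.
+        return -1
+    end
+    -- The input parsed: exercise the logic under test with the parsed value.
+    local _ = value * 2
+end
+```
+
+`TestOneInput` is then passed to `luzer.Fuzz()` in the same way as in the other
+examples.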
+
+### Structure-Aware Fuzzing
+
+`luzer` is based on a coverage-guided, mutation-based fuzzer (libFuzzer). It has
+the advantage of not requiring any grammar definition for generating inputs,
+making its setup easier. The disadvantage is that it is harder for the
+fuzzer to generate inputs for code that parses complex data types. Often the
+inputs will be rejected early, resulting in low coverage. To address this,
+`luzer` offers `FuzzedDataProvider` and two functions to customize the
+mutation strategy, which is especially useful when fuzzing functions that
+require structured input.
+
+Often, a raw byte string is not a convenient input for the code being fuzzed.
+Similar to libFuzzer, `luzer` provides a `FuzzedDataProvider` that can simplify the
+task of creating a fuzz target by translating the raw input bytes received from
+the fuzzer into useful primitive Lua types.
+
+You can construct the `FuzzedDataProvider` with:
+
+```lua
+local fdp = luzer.FuzzedDataProvider(input_bytes)
+```
+
+The `FuzzedDataProvider` then supports the following functions:
+
+- `consume_string(max_length)` - consume a string with length in the range `[0,
+  max_length]`. When it runs out of input data, returns what remains of the input.
+- `consume_strings(max_length, count)` - consume a list of `count` strings with
+  length in the range `[0, max_length]`.
+- `consume_integer(min, max)` - consume a signed integer in the range
+  `[min, max]`.
+- `consume_integers(min, max, count)` - consume a list of `count` integers in the
+  range `[min, max]`.
+- `consume_number(min, max)` - consume a floating-point value in the range
+  `[min, max]`.
+- `consume_numbers(min, max, count)` - consume a list of `count` floats in the
+  range `[min, max]`. If there's no input data left, returns `min`. Note that
+  `min` must be less than or equal to `max`.
+- `consume_boolean()` - consume either `true` or `false`, or `false` when no
+  data remains.
+- `consume_booleans(count)` - consume a list of `count` booleans.
+- `consume_probability()` - consume a floating-point value in the range `[0, 1]`.
+  If there's no input data left, always returns 0.
+- `remaining_bytes()` - returns the number of unconsumed bytes in the fuzzer
+  input.
+
+Examples:
+
+```lua
+> luzer = require("luzer")
+> fdp = luzer.FuzzedDataProvider(string.rep("A", 10^9))
+> fdp:consume_boolean()
+true
+> fdp:consume_string(2, 10)
+AAAAAAAAA
+```
+
+Learn more about grammar-based fuzzing in the
+[documentation](grammar_based_fuzzing.md).
+
+### Custom mutator
+
+- `LLVMFuzzerCustomMutator(data, max_size, seed)` - optional user-provided
+  custom mutator. Mutates raw data in [`data`, `data` + size of `data`) in place.
+  Returns the new size, which is not greater than `max_size`. Given the same
+  `seed`, it produces the same mutation.
+- `LLVMFuzzerCustomCrossOver(data1, data2, max_size, seed)` - optional
+  user-provided custom cross-over function. Combines pieces of `data1` and
+  `data2` together into `out`. Returns the new size, which is not greater than
+  `max_size`. Should produce the same mutation given the same `seed`.
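+
+The determinism requirement (the same `seed` yields the same mutation) and the
+`max_size` bound can be illustrated with a minimal pure-Lua sketch. The function
+name `mutate` and its string-in, string-out convention are assumptions made for
+illustration; they are not the mutator interface that `luzer` expects.
+
+```lua
+-- Park-Miller "minimal standard" generator: the result depends only on the
+-- seed, not on global state such as math.random.
+local function next_state(state)
+    return (state * 16807) % 2147483647
+end
+
+-- Hypothetical mutator: overwrites one byte of `data` at a seed-derived
+-- position and returns a string of at most `max_size` bytes.
+local function mutate(data, max_size, seed)
+    local state = seed % 2147483647
+    if state == 0 then state = 1 end
+    state = next_state(state)
+    if #data == 0 then
+        -- Nothing to overwrite: produce a single byte if there is room.
+        if max_size < 1 then return "" end
+        return string.char(state % 256)
+    end
+    local pos = state % #data + 1
+    state = next_state(state)
+    local byte = string.char(state % 256)
+    local out = data:sub(1, pos - 1) .. byte .. data:sub(pos + 1)
+    -- Never return more than max_size bytes.
+    return out:sub(1, max_size)
+end
+```
+
+Calling `mutate("hello", 16, 7)` twice returns the same string both times,
+because no state outside the arguments is used.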
+
+[libfuzzer-options-url]: https://llvm.org/docs/LibFuzzer.html#options
blob - /dev/null
blob + 79b1a31cf71df268966e68b43c0057df776f28cb (mode 644)
--- /dev/null
+++ docs/grammar_based_fuzzing.md
@@ -0,0 +1,46 @@
+## Grammar-Based Fuzzing
+
+`luzer` has no special support for grammar-based fuzzing. The projects listed
+below can help with generating grammar-aware inputs.
+
+### LGen
+
+LGen, the Lua Language Generator, is a sentence (test data) generator that is
+based on a syntax description and uses coverage criteria to restrict the set of
+generated sentences. The generator takes as input a grammar described in a
+notation based on Extended BNF (EBNF) and returns a set of sentences of the
+language corresponding to this grammar.
+
+- URL: https://bitbucket.org/chentz/lgen/src/master/
+- URL: https://bitbucket.org/chentz/lgen/src/master/GenerationEngine/Grammars/
+- URL: http://lgen.wikidot.com/repgrammar
+
+### LPeg
+
+The `re` module supports a somewhat conventional regex syntax for pattern usage
+within LPeg.
+
+- URL: http://www.inf.puc-rio.br/~roberto/lpeg/re.html
+
+`parser-gen` is a Lua parser generator that makes it possible to describe
+grammars in a PEG syntax. The tool parses a given input using a provided
+grammar and, if the matching is successful, produces an AST as output with the
+captured values using LPeg. If the matching fails, labelled errors can be used
+in the grammar to indicate the failure position, and recovery grammars are
+generated to continue parsing the input using LPegLabel. The tool can also
+automatically generate error labels and recovery grammars for LL(1) grammars.
+
+- URL: https://github.com/vsbenas/parser-gen
+
+Parsing common data formats via LPeg (e-mail, JSON, IPv4 and IPv6 addresses,
+INI, strftime, URL).
+
+- URL: https://github.com/spc476/LPeg-Parsers
+- URL: https://github.com/daurnimator/lpeg_patterns
+
+### References
+
+- [libFuzzer Tutorial][libfuzzer-tutorial-url]
+- [How To Split A Fuzzer-Generated Input Into Several ][split-inputs-url]
+
+[libfuzzer-tutorial-url]: https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md
+[split-inputs-url]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md
blob - /dev/null
blob + 01e8e738122b9121a3a0a85489aa641a78b0397a (mode 644)
--- /dev/null
+++ docs/index.md
@@ -0,0 +1,7 @@
+# luzer documentation
+
+* [Usage](usage.md)
+* [Test management](test_management.md)
+* [API](api.md)
+* [Grammar-based fuzzing](grammar_based_fuzzing.md)
+* [Contributing](../CONTRIBUTING.md)
blob - /dev/null
blob + 49b8a431d59764e87e84d5a04fac89de5cf500e0 (mode 644)
--- /dev/null
+++ docs/test_management.md
@@ -0,0 +1,103 @@
+## Test management
+
+luzer-based tests can be organized in two ways: as standalone fuzz targets,
+as shown in the Quickstart section, or integrated into a test framework.
+
+### Using a test framework integration
+
+To use fuzzing in your normal development workflow, a tight integration with
+the Busted test framework is provided. This coupling allows you to execute
+fuzz tests alongside your normal unit tests and seamlessly detect problems on
+your local machine or in your CI, so you can check that bugs, once found, stay
+resolved.
+
+Furthermore, the Busted integration enables great IDE support, so that
+individual inputs can be run or even debugged, similar to what you would expect
+from normal Busted tests.
+
+A fuzz test in [Busted][busted-url] looks similar to the following example:
+
+```lua
+local luzer = require("luzer")
+
+local function TestOneInput(buf)
+    local b = {}
+    buf:gsub(".", function(c) table.insert(b, c) end)
+    if b[1] == 'c' then
+        if b[2] == 'r' then
+            if b[3] == 'a' then
+                if b[4] == 's' then
+                    if b[5] == 'h' then
+                        assert(nil)
+                    end
+                end
+            end
+        end
+    end
+end
+
+describe("arithmetic functions", function()
+    it("sum of numbers", function()
+        luzer.Fuzz(TestOneInput)
+    end)
+end)
+```
+
+To run the tests, execute the following command: `busted spec/sum_spec.lua`.
+
+```sh
+$ busted spec/sum_spec.lua
+●
+1 success / 0 failures / 0 errors / 0 pending : 0.001857 seconds
+```
+
+### Using standalone fuzz targets
+
+Create a fuzz target invoking your code:
+
+```lua
+local luzer = require("luzer")
+
+local function TestOneInput(buf)
+    local fdp = luzer.FuzzedDataProvider(buf)
+    local str = fdp:consume_string(3)
+
+    local b = {}
+    str:gsub(".", function(c) table.insert(b, c) end)
+    local count = 0
+    if b[1] == "l" then count = count + 1 end
+    if b[2] == "u" then count = count + 1 end
+    if b[3] == "a" then count = count + 1 end
+
+    if count == 3 then assert(nil) end
+end
+
+luzer.Fuzz(TestOneInput)
+```
+
+Start the fuzzer using the fuzz target:
+
+```
+$ luajit examples/example_basic.lua
+INFO: Running with entropic power schedule (0xFF, 100).
+INFO: Seed: 1557779137
+INFO: Loaded 1 modules   (151 inline 8-bit counters): 151 [0x7f0640e706e3, 0x7f0640e7077a),
+INFO: Loaded 1 PC tables (151 PCs): 151 [0x7f0640e70780,0x7f0640e710f0),
+INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
+INFO: A corpus is not provided, starting from an empty corpus
+#2	INITED cov: 17 ft: 18 corp: 1/1b exec/s: 0 rss: 26Mb
+#32	NEW    cov: 17 ft: 24 corp: 2/4b lim: 4 exec/s: 0 rss: 26Mb L: 3/3 MS: 5 ShuffleBytes-ShuffleBytes-CopyPart-ChangeByte-CMP- DE: "\x00\x00"-
+...
+```
+
+While fuzzing is in progress, the fuzzing engine generates new inputs and runs
+them against the provided fuzz target. By default, it continues to run until a
+failing input is found, or the user cancels the process (e.g. with `Ctrl+C`).
+
+[busted-url]: https://lunarmodules.github.io/busted/
blob - /dev/null
blob + 58cf7306abd85deee0d792aa61d1b6c3f87b2f4d (mode 644)
--- /dev/null
+++ docs/usage.md
@@ -0,0 +1,141 @@
+## Usage
+
+### Fuzzing targets
+
+In general, `luzer` can be used to write fuzzing tests for Lua functions.
+However, the steps may depend on how the function under test is implemented.
+Let's consider three cases:
+
+- Fuzzing a Lua function implemented in Lua
+- Fuzzing a Lua function implemented using the Lua C API
+- Fuzzing a shared library via FFI
+
+#### Fuzzing a module written in Lua
+
+Let's create a fuzzing test for the parser of Lua source code used in the
+`luacheck` module.
+
+Set up the target module using `luarocks`:
+
+```sh
+$ luarocks install --local luacheck
+```
+
+Create a file `luacheck_parser_parse.lua` with the fuzzing target:
+
+```lua
+local parser = require("luacheck.parser")
+local decoder = require("luacheck.decoder")
+local luzer = require("luzer")
+
+local function TestOneInput(buf)
+    parser.parse(decoder.decode(buf))
+end
+
+luzer.Fuzz(TestOneInput, nil, {})
+```
+
+Execute the test with PUC-Rio Lua:
+
+```
+$ lua luacheck_parser_parse.lua
+```
+
+#### Fuzzing a function implemented in Lua C
+
+Lua functions can be implemented using the so-called Lua C API. Functions built
+into the Lua runtime and external modules written in C/C++ are examples. Learn
+more about the Lua C API in chapter
+[24 – An Overview of the C API][programming-in-lua-24] of the "Programming in
+Lua" book.
+
+Set up the module using `luarocks`:
+
+```sh
+$ luarocks install --tree modules --lua-version 5.1 lua-cjson CC="clang" CFLAGS="-ggdb -fPIC -fsanitize=address" LDFLAGS="-fsanitize=address"
+
+Installing https://luarocks.org/lua-cjson-2.1.0.6-1.src.rock
+
+lua-cjson 2.1.0.6-1 depends on lua >= 5.1 (5.1-1 provided by VM)
+clang -ggdb -fPIC -fsanitize=address -I/usr/include/lua5.1 -c lua_cjson.c -o lua_cjson.o
+clang -ggdb -fPIC -fsanitize=address -I/usr/include/lua5.1 -c strbuf.c -o strbuf.o
+clang -ggdb -fPIC -fsanitize=address -I/usr/include/lua5.1 -c fpconv.c -o fpconv.o
+gcc -shared -o cjson.so lua_cjson.o strbuf.o fpconv.o
+No existing manifest. Attempting to rebuild...
+lua-cjson 2.1.0.6-1 is now installed in /home/sergeyb/sources/luzer/build/modules (license: MIT)
+```
+
+Set up the environment and execute the test:
+
+```sh
+$ export LUA_PATH="$LUA_PATH;modules/lib/lua/5.1/?.lua"
+$ export LUA_CPATH="$LUA_CPATH;modules/lib/lua/5.1/?.so;./?.so"
+$ mkdir cjson-corpus
+$ echo -n "{}" > cjson-corpus/sample
+$ luajit luzer_example_json.lua
+```
+
+The same approach can be used for fuzzing the Lua runtime itself. Consider the
+fuzzing target below. Learn more about the function `loadstring()` in chapter
+[8 – Compilation, Execution, and Errors][programming-in-lua-8] of the
+"Programming in Lua" book.
+
+```lua
+local luzer = require("luzer")
+
+local function TestOneInput(buf)
+    assert(loadstring(buf)) ()
+end
+
+local args = {
+    max_total_time = 60,
+    print_final_stats = 1,
+}
+luzer.Fuzz(TestOneInput, nil, args)
+```
+
+Run the fuzzing target with an instrumented Lua runtime.
+
+#### Fuzzing a shared library via FFI
+
+FFI libraries allow seamless integration of Lua with C/C++ libraries. LuaJIT
+has a built-in [FFI library][ffi-library-url] that allows calling external C
+functions and using C data structures from pure Lua code. An FFI library
+therefore allows using `luzer` for fuzzing shared libraries.
+
+The example `examples/example_zlib.lua` demonstrates a test for the zlib library
+using FFI. For better results, it is recommended to build zlib with sanitizers.
+
+Run fuzzing target:
+
+```sh
+$ lua examples/example_zlib.lua
+```
+
+### Using custom mutators written in Lua
+
+`luzer` allows [custom mutators][libfuzzer-mutators-url] to be written in Lua 5.1
+(including LuaJIT), 5.2, 5.3 or 5.4.
+
+The environment variable `LIBFUZZER_LUA_SCRIPT` can be set to the path to the
+Lua mutator script. The default path is `./mutator.lua`.
+
+To run the Lua example, use:
+
+```sh
+LIBFUZZER_LUA_SCRIPT=./mutator.lua example_compressed
+```
+
+All you need to do on the C/C++ side is add the `mutator.c` or `crossover.c`
+file as a compilation unit.
+
+Then write a Lua script that does what you would like the fuzzer to do; the
+`mutator.lua` script may serve as a starting point. Then run your fuzzing as
+shown in the examples above.
+
+[libfuzzer-mutators-url]: https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md
+[ffi-library-url]: https://luajit.org/ext_ffi.html
+[programming-in-lua-8]: https://www.lua.org/pil/8.html
+[programming-in-lua-24]: https://www.lua.org/pil/24.html
+[atheris-native-extensions]: https://github.com/google/atheris/blob/master/native_extension_fuzzing.md
+[atheris-native-extensions-video]: https://www.youtube.com/watch?v=oM-7lt43-GA