commit a5dd3271b298ba46f50a8a47603c7c84ff736be3
from: Sergey Bronnikov
via: Sergey Bronnikov
date: Wed Feb 08 15:52:59 2023 UTC

docs: add an initial version

commit - 0d927a36f0b8d71f12717eb1a7327cfc42a6a51c
commit + a5dd3271b298ba46f50a8a47603c7c84ff736be3
blob - 5c41b700cf04b2fe115da766ad718a50117efaae
blob + cd2d8a469ba69377072affd63bf2521f9d02713d
--- CHANGELOG.md
+++ CHANGELOG.md
@@ -14,3 +14,4 @@ and this project adheres to [Semantic Versioning](http
 - Integration with libFuzzer's `FuzzedDataProvider`.
 - libFuzzer custom mutator for Lua.
 - Examples with tests.
+- Documentation with use cases, API etc.
blob - a45083a78cd0f2a92a2d050ed1c058a14c25f93d
blob + 9637f4b31828dc11026c2a3911dfd7fa23e41345
--- README.md
+++ README.md
@@ -82,6 +82,10 @@ To gather baseline coverage, the fuzzing engine execut
 and the generated corpus, to ensure that no errors occurred and to understand
 the code coverage the existing corpus already provides.
 
+## Documentation
+
+See [documentation](docs/index.md).
+
 ## License
 
 Copyright © 2022-2023 [Sergey Bronnikov][bronevichok-url].
blob - /dev/null
blob + dfc6a046bddd1ac47adcd3141dbd85d6963419e5 (mode 644)
--- /dev/null
+++ docs/api.md
@@ -0,0 +1,97 @@
+## API
+
+### Fuzzing functions
+
+The `luzer` module provides a function `Fuzz()`.
+
+`Fuzz(test_one_input, custom_mutator, args)` starts the fuzzer. This function
+does not return.
+
+The function accepts the following arguments:
+
+- `test_one_input` is the fuzzer's entry point (equivalent to `TestOneInput`):
+  a function that must take a single string argument. It will be invoked
+  repeatedly with a single string container.
+- `custom_mutator` defines a custom mutator function (equivalent to
+  `LLVMFuzzerCustomMutator`). Default is `nil`.
+- `args` is a table with arguments: the process arguments to pass to the
+  fuzzer. The `corpus` field specifies a path to a directory with a seed corpus;
+  see the list of other options in the [libFuzzer documentation][libfuzzer-options-url].
+
+It may be desirable to reject some inputs, i.e. to not add them to the corpus.
+For example, when fuzzing an API consisting of parsing and other logic, one may
+want to allow only those inputs into the corpus that parse successfully. If the
+fuzz target returns `-1` on a given input, `luzer` will not add that input to
+the corpus, regardless of what coverage it triggers.
+
+### Structure-Aware Fuzzing
+
+`luzer` is based on a coverage-guided mutation-based fuzzer (libFuzzer). It has
+the advantage of not requiring any grammar definition for generating inputs,
+making its setup easier. The disadvantage is that it will be harder for the
+fuzzer to generate inputs for code that parses complex data types. Often the
+inputs will be rejected early, resulting in low coverage. To solve this
+issue, `luzer` offers `FuzzedDataProvider` and two functions to customize the
+mutation strategy, which is especially useful when fuzzing functions that
+require structured input.
+
+Often, a raw byte string is not a convenient input to the code being fuzzed.
+Similar to libFuzzer, `luzer` provides a `FuzzedDataProvider` that can simplify
+the task of creating a fuzz target by translating the raw input bytes received
+from the fuzzer into useful primitive Lua types.
+
+You can construct the `FuzzedDataProvider` with:
+
+```lua
+local fdp = luzer.FuzzedDataProvider(input_bytes)
+```
+
+The `FuzzedDataProvider` then supports the following functions:
+
+- `consume_string(max_length)` - consume a string with length in the range `[0,
+  max_length]`. When it runs out of input data, returns what remains of the input.
+- `consume_strings(max_length, count)` - consume a list of `count` strings with
+  length in the range `[0, max_length]`.
+- `consume_integer(min, max)` - consume a signed integer in the range
+  `[min, max]`.
+- `consume_integers(min, max, count)` - consume a list of `count` integers in the
+  range `[min, max]`.
+- `consume_number(min, max)` - consume a floating-point value in the range
+  `[min, max]`.
+- `consume_numbers(min, max, count)` - consume a list of `count` floats in the
+  range `[min, max]`. If there's no input data left, returns `min`. Note that
+  `min` must be less than or equal to `max`.
+- `consume_boolean()` - consume either `true` or `false`, or `false` when no
+  data remains.
+- `consume_booleans(count)` - consume a list of `count` booleans.
+- `consume_probability()` - consume a floating-point value in the range `[0, 1]`.
+  If there's no input data left, always returns 0.
+- `remaining_bytes()` - returns the number of unconsumed bytes in the fuzzer
+  input.
+
+Examples:
+
+```lua
+> luzer = require("luzer")
+> fdp = luzer.FuzzedDataProvider(string.rep("A", 10^9))
+> fdp:consume_boolean()
+true
+> fdp:consume_string(2, 10)
+AAAAAAAAA
+```
+
+Learn more about grammar-based fuzzing in the
+[documentation](grammar_based_fuzzing.md).
+
+### Custom mutator
+
+- `LLVMFuzzerCustomMutator(data, max_size, seed)` - optional user-provided
+  custom mutator. Mutates raw data in [`data`, `data` + size of `data`) in place.
+  Returns the new size, which is not greater than `max_size`. Given the same
+  `seed`, produces the same mutation.
+- `LLVMFuzzerCustomCrossOver(data1, data2, max_size, seed)` - optional
+  user-provided custom cross-over function. Combines pieces of `data1` & `data2`
+  together into `out`. Returns the new size, which is not greater than `max_size`.
+  Should produce the same mutation given the same `seed`.
+
+[libfuzzer-options-url]: https://llvm.org/docs/LibFuzzer.html#options
blob - /dev/null
blob + 79b1a31cf71df268966e68b43c0057df776f28cb (mode 644)
--- /dev/null
+++ docs/grammar_based_fuzzing.md
@@ -0,0 +1,46 @@
+## Grammar-Based Fuzzing
+
+`luzer` has no special support for grammar-based fuzzing. The projects
+listed below could help with generating grammar-aware inputs.
+
+### LGen
+
+LGen, the Lua Language Generator, is a sentence (test data) generator based on
+a syntax description that uses coverage criteria to restrict the set of
+generated sentences. This generator takes as input a grammar described in a
+notation based on Extended BNF (EBNF) and returns a set of sentences of the
+language corresponding to this grammar.
+
+- URL: https://bitbucket.org/chentz/lgen/src/master/
+- URL: https://bitbucket.org/chentz/lgen/src/master/GenerationEngine/Grammars/
+- URL: http://lgen.wikidot.com/repgrammar
+
+### LPeg
+
+The `re` module supports a somewhat conventional regex syntax for pattern usage
+within LPeg.
+
+- URL: http://www.inf.puc-rio.br/~roberto/lpeg/re.html
+
+A Lua parser generator that makes it possible to describe grammars in a PEG
+syntax. The tool will parse a given input using a provided grammar and, if the
+matching is successful, produce an AST as an output with the captured values
+using LPeg. If the matching fails, labelled errors can be used in the grammar
+to indicate failure position, and recovery grammars are generated to continue
+parsing the input using LpegLabel. The tool can also automatically generate
+error labels and recovery grammars for LL(1) grammars.
+- URL: https://github.com/vsbenas/parser-gen
+
+Parsing common data formats via LPeg (e-mail, JSON, IPv4 and IPv6 addresses,
+INI, strftime, URL).
+
+- URL: https://github.com/spc476/LPeg-Parsers
+- URL: https://github.com/daurnimator/lpeg_patterns
+
+### References
+
+- [libFuzzer Tutorial][libfuzzer-tutorial-url]
+- [How To Split A Fuzzer-Generated Input Into Several ][split-inputs-url]
+
+[libfuzzer-tutorial-url]: https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md
+[split-inputs-url]: https://github.com/google/fuzzing/blob/master/docs/split-inputs.md
blob - /dev/null
blob + 01e8e738122b9121a3a0a85489aa641a78b0397a (mode 644)
--- /dev/null
+++ docs/index.md
@@ -0,0 +1,7 @@
+# luzer documentation
+
+* [Usage](usage.md)
+* [Test management](test_management.md)
+* [API](api.md)
+* [Grammar-based fuzzing](grammar_based_fuzzing.md)
+* [Contributing](../CONTRIBUTING.md)
blob - /dev/null
blob + 49b8a431d59764e87e84d5a04fac89de5cf500e0 (mode 644)
--- /dev/null
+++ docs/test_management.md
@@ -0,0 +1,103 @@
+## Test management
+
+luzer-based tests can be organized in two ways: as standalone fuzz targets,
+as shown in the Quickstart section, or integrated into a test framework.
+
+### Using a test framework integration
+
+To use fuzzing in your normal development workflow, a tight integration with
+the Busted test framework is provided. This coupling lets you execute fuzz
+tests alongside your normal unit tests and seamlessly detect problems on
+your local machine or in your CI, enabling you to check that found bugs stay
+resolved forever.
+
+Furthermore, the Busted integration enables great IDE support, so that
+individual inputs can be run or even debugged, similar to what you would expect
+from normal Busted tests.
+
+A fuzz test in [Busted][busted-url] looks similar to the following example:
+
+```lua
+local luzer = require("luzer")
+
+local function TestOneInput(buf)
+    local b = {}
+    buf:gsub(".", function(c) table.insert(b, c) end)
+    if b[1] == 'c' then
+        if b[2] == 'r' then
+            if b[3] == 'a' then
+                if b[4] == 's' then
+                    if b[5] == 'h' then
+                        assert(nil)
+                    end
+                end
+            end
+        end
+    end
+end
+
+describe("arithmetic functions", function()
+    it("sum of numbers", function()
+        luzer.Fuzz(TestOneInput)
+    end)
+end)
+```
+
+To run the tests, execute the following command: `busted spec/sum_spec.lua`.
+
+```sh
+$ busted spec/sum_spec.lua
+●
+1 success / 0 failures / 0 errors / 0 pending : 0.001857 seconds
+```
+
+### Using standalone fuzz targets
+
+To use fuzzing in your normal development workflow, a tight integration with
+the Busted test framework is provided. This coupling lets you execute fuzz
+tests alongside your normal unit tests and seamlessly detect problems on
+your local machine or in your CI, enabling you to check that found bugs stay
+resolved forever.
+
+Create a fuzz target invoking your code:
+
+```lua
+local luzer = require("luzer")
+
+local function TestOneInput(buf)
+    local fdp = luzer.FuzzedDataProvider(buf)
+    local str = fdp:consume_string(3)
+
+    local b = {}
+    str:gsub(".", function(c) table.insert(b, c) end)
+    local count = 0
+    if b[1] == "l" then count = count + 1 end
+    if b[2] == "u" then count = count + 1 end
+    if b[3] == "a" then count = count + 1 end
+
+    if count == 3 then assert(nil) end
+end
+
+luzer.Fuzz(TestOneInput)
+```
+
+Start the fuzzer using the fuzz target:
+
+```
+$ luajit examples/example_basic.lua
+INFO: Running with entropic power schedule (0xFF, 100).
+INFO: Seed: 1557779137
+INFO: Loaded 1 modules (151 inline 8-bit counters): 151 [0x7f0640e706e3, 0x7f0640e7077a),
+INFO: Loaded 1 PC tables (151 PCs): 151 [0x7f0640e70780,0x7f0640e710f0),
+INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
+INFO: A corpus is not provided, starting from an empty corpus
+#2 INITED cov: 17 ft: 18 corp: 1/1b exec/s: 0 rss: 26Mb
+#32 NEW cov: 17 ft: 24 corp: 2/4b lim: 4 exec/s: 0 rss: 26Mb L: 3/3 MS: 5 ShuffleBytes-ShuffleBytes-CopyPart-ChangeByte-CMP- DE: "\x00\x00"-
+...
+```
+
+While fuzzing is in progress, the fuzzing engine generates new inputs and runs
+them against the provided fuzz target. By default, it continues to run until a
+failing input is found, or the user cancels the process (e.g. with `Ctrl+C`).
+
+[busted-url]: https://lunarmodules.github.io/busted/
blob - /dev/null
blob + 58cf7306abd85deee0d792aa61d1b6c3f87b2f4d (mode 644)
--- /dev/null
+++ docs/usage.md
@@ -0,0 +1,141 @@
+## Usage
+
+### Fuzzing targets
+
+In general, `luzer` lets you write fuzzing tests for Lua functions.
+However, the steps may depend on the implementation of the function under
+test. Let's consider three cases:
+
+- Fuzzing a Lua function implemented in Lua
+- Fuzzing a Lua function implemented using the Lua C API
+- Fuzzing a shared library via FFI
+
+#### Fuzzing a module written in Lua
+
+Let's create a fuzzing test for the parser of Lua source code used in the
+`luacheck` module.
+
+Set up a target module using `luarocks`:
+
+```sh
+$ luarocks install --local luacheck
+```
+
+Create a file `luacheck_parser_parse.lua` with a fuzzing target:
+
+```lua
+local parser = require("luacheck.parser")
+local decoder = require("luacheck.decoder")
+local luzer = require("luzer")
+
+local function TestOneInput(buf)
+    parser.parse(decoder.decode(buf))
+end
+
+luzer.Fuzz(TestOneInput, nil, {})
+```
+
+Execute the test with PUC-Rio Lua:
+
+```
+$ lua luacheck_parser_parse.lua
+```
+
+#### Fuzzing a function implemented in Lua C
+
+Lua functions can be implemented using the so-called Lua C API. Functions built
+into the Lua runtime and external modules written in C/C++ are such examples.
+Learn more about the Lua C API in chapter ["24 – An Overview of the C
+API"][programming-in-lua-24] of the "Programming in Lua" book.
+
+Set up the module using `luarocks`:
+
+```sh
+$ luarocks install --tree modules --lua-version 5.1 lua-cjson CC="clang" CFLAGS="-ggdb -fPIC -fsanitize=address" LDFLAGS="-fsanitize=address"
+
+Installing https://luarocks.org/lua-cjson-2.1.0.6-1.src.rock
+
+lua-cjson 2.1.0.6-1 depends on lua >= 5.1 (5.1-1 provided by VM)
+clang -ggdb -fPIC -fsanitize=address -I/usr/include/lua5.1 -c lua_cjson.c -o lua_cjson.o
+clang -ggdb -fPIC -fsanitize=address -I/usr/include/lua5.1 -c strbuf.c -o strbuf.o
+clang -ggdb -fPIC -fsanitize=address -I/usr/include/lua5.1 -c fpconv.c -o fpconv.o
+gcc -shared -o cjson.so lua_cjson.o strbuf.o fpconv.o
+No existing manifest. Attempting to rebuild...
+lua-cjson 2.1.0.6-1 is now installed in /home/sergeyb/sources/luzer/build/modules (license: MIT)
+```
+
+Set up the environment and execute the test:
+
+```sh
+$ export LUA_PATH="$LUA_PATH;modules/lib/lua/5.1/?.lua"
+$ export LUA_CPATH="$LUA_CPATH;modules/lib/lua/5.1/?.so;./?.so"
+$ mkdir cjson-corpus
+$ echo -n "{}" > cjson-corpus/sample
+$ luajit luzer_example_json.lua
+```
+
+The same approach can be used for fuzzing the Lua runtime itself.
+Let's consider a fuzzing target. Learn more about the `loadstring()` function in chapter [8 – Compilation,
+Execution, and Errors][programming-in-lua-8] of the "Programming in Lua" book.
+
+```lua
+local luzer = require("luzer")
+
+local function TestOneInput(buf)
+    assert(loadstring(buf)) ()
+end
+
+local args = {
+    max_total_time = 60,
+    print_final_stats = 1,
+}
+luzer.Fuzz(TestOneInput, nil, args)
+```
+
+Run the fuzzing target with an instrumented Lua runtime.
+
+#### Fuzzing a shared library via FFI
+
+FFI libraries for Lua allow seamless integration with C/C++ libraries.
+LuaJIT has a built-in [FFI library][ffi-library-url] that allows calling
+external C functions and using C data structures from pure Lua code.
+The FFI library makes it possible to use `luzer` for fuzzing shared libraries.
+
+Example `examples/example_zlib.lua` demonstrates a test for the ZLib library
+using FFI. For better results it is recommended to build ZLib with sanitizers.
+
+Run the fuzzing target:
+
+```sh
+$ lua examples/example_zlib.lua
+```
+
+### Using custom mutators written in Lua
+
+`luzer` allows [custom mutators][libfuzzer-mutators-url] to be written in Lua 5.1
+(including LuaJIT), 5.2, 5.3 or 5.4.
+
+The environment variable `LIBFUZZER_LUA_SCRIPT` can be set to the path to the
+Lua mutator script. The default path is `./mutator.lua`.
+
+To run the Lua example, use
+
+```sh
+LIBFUZZER_LUA_SCRIPT=./mutator.lua example_compressed
+```
+
+All you need to do on the C/C++ side is to add the `mutator.c` or `crossover.c`
+file as a compilation unit.
+
+Then write a Lua script that does what you would like the fuzzer to do; you
+might want to use the `mutator.lua` script. The environment variable
+`LIBFUZZER_LUA_SCRIPT` can be set to the path to the Lua mutator
+script. The default path is `./mutator.lua`. Then just run your fuzzing as
+shown in the examples above.
+
+[libfuzzer-mutators-url]: https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md
+[ffi-library-url]: https://luajit.org/ext_ffi.html
+[programming-in-lua-8]: https://www.lua.org/pil/8.html
+[programming-in-lua-24]: https://www.lua.org/pil/24.html
+[atheris-native-extensions]: https://github.com/google/atheris/blob/master/native_extension_fuzzing.md
+[atheris-native-extensions-video]: https://www.youtube.com/watch?v=oM-7lt43-GA