b0rk@mastodon.social

have any of you used a fuzzer for debugging? what specific fuzzing tool did you use?

dcrosta@hachyderm.io

@b0rk Not a fuzzer, per se, but I have used hypothesis in Python testing, and it definitely finds bugs and edge cases I wasn't considering when coding or writing test cases...

hypothesis.readthedocs.io/en/l

ogrisel@sigmoid.social

@b0rk if I remember correctly @vstinner used fuzzing quite extensively to find bugs in CPython and related libraries a few years ago.

b0rk@mastodon.social

@dcrosta yeah I think it makes sense to include property-based testing, thanks!

sam@decarboxy.chat

@b0rk I once used AFL very successfully for finding weird edge case bugs in some input file parsing code that I never would have found on my own

jstepien@mastodon.social

@b0rk Some years ago I fuzzed a small C project using lcamtuf.coredump.cx/afl/. Loved it. It needed just a couple of seconds to find some glaring arithmetic underflows in my offset math.

hikhvar@norden.social

@b0rk never for debugging as in "I know there is a bug, I have to fix that". But often to find bugs in the first place.

gordonguthrie@mastodon.scot

@b0rk always wrote my own - you want to generate a lot of 'near correct' and 'should be correct but you're a dumbass' inputs, and raw random feeds give you lots of <monkeys writing Shakespeare>

(also I'm not sure there were fuzzing tools back then either)
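Here is a minimal Python sketch of the "near correct" idea. It starts from one valid seed input and applies a few random byte flips, insertions, and deletions, so most generated inputs stay close to valid instead of being pure noise. The `key=value;...` record format, `parse_record`, and its deliberate missing-field bug are all hypothetical, made up for illustration.

```python
import random

def parse_record(data: bytes) -> dict:
    """Hypothetical toy parser for records like b"name=alice;age=30"."""
    fields = {}
    for part in data.split(b";"):
        key, sep, value = part.partition(b"=")
        if not sep or not key:
            raise ValueError(f"malformed field: {part!r}")
        fields[key.decode("ascii")] = value.decode("ascii")
    # Deliberate bug: assumes "age" is always present instead of validating it.
    fields["age"] = int(fields["age"])
    return fields

def mutate(seed: bytes, rng: random.Random, max_edits: int = 3) -> bytes:
    """Apply a few random byte-level edits so the result stays 'near correct'."""
    data = bytearray(seed)
    for _ in range(rng.randint(1, max_edits)):
        op = rng.choice(("flip", "insert", "delete"))
        pos = rng.randrange(len(data) + 1)
        if op == "insert" or not data:
            data.insert(pos, rng.randrange(256))
        elif op == "flip":
            data[min(pos, len(data) - 1)] ^= 1 << rng.randrange(8)
        else:
            del data[min(pos, len(data) - 1)]
    return bytes(data)

def fuzz(seed: bytes, iterations: int = 10_000, rng_seed: int = 0) -> list:
    """Collect (input, exception) pairs for anything other than a clean ValueError."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except ValueError:
            pass  # rejecting malformed input cleanly is the expected behaviour
        except Exception as exc:  # anything else is a bug worth looking at
            crashes.append((candidate, exc))
    return crashes
```

Use a seed like `b"name=alice;age=30"`. Mutations that damage the `age` key or the `;` separator surface the `KeyError` quickly, and clean `ValueError` rejections are ignored. A real mutation fuzzer such as AFL also uses coverage feedback to decide which mutants to keep, which this sketch does not.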

nasser@friend.camp

@b0rk i used quickcheck in clojure to test some compiler stuff a while back! it was effective but not worth the effort of maintaining it for my particular use case in the end.

github.com/clojure/test.check

b0rk@mastodon.social

@hikhvar do you have examples of tools you used? or did you write your own?

mistersql@mastodon.social

@b0rk python hypothesis. Give it python code and it generates unit tests that feed every possible sort of data into the functions to see if they blow up or not.

hypothesis.readthedocs.io/en/l

AndrewO@rls.social

@b0rk @dcrosta Also in the property-based/generative testing camp: I've used JS fast-check on an XState state machine before realizing XState has its own model based testing library. 🤦‍♂️ It was a fun exercise though...

github.com/dubzzz/fast-check

xstate.js.org/docs/packages/xs

b0rk@mastodon.social

also, I'm trying to make a list of data formatting tools (like hexdump, xxd, jq, and graphviz) -- I feel like there are some I'm missing

b0rk@mastodon.social

@hikhvar oh cool, I didn't know that go had it in the stdlib now!

arthegall@mastodon.roundpond.net

@b0rk consider some libraries, like python's `tabulate` or `rich`, that are language specific?

april@macaw.social

@b0rk there's a neat tool, "dasel", that works like jq across multiple different formats.

hikhvar@norden.social

@b0rk for Go it was gofuzz. But in recent Go versions this is part of the Stdlib

davidr@hachyderm.io

@b0rk gnuplot, xmllint

bmp@hachyderm.io

@b0rk I'm not sure how close this is to what you're aiming for, but jc (kellyjonbrazil.github.io/jc/) might fit the bill: it converts a *huge* number of bespoke formats to JSON to allow processing with jq/nu/etc.

aadriasola@ruby.social

@b0rk yq for yaml

evilstevie@mastod1.ddns.net

@b0rk honestly I throw LibreCalc/Excel in that pile too. with careful delimiting and sometimes an additional stage of search/replace (sed ftw!) to get there. great for monster log files

chrismartin@mastodon.social

@b0rk I found Pup recently github.com/ericchiang/pup. It is described as Jq for HTML although I haven’t had a chance to try it myself yet.

tshirtman@mas.to

@b0rk once i was working on a program that handled multitouch interactions on a screen, and we had some bugs that were impossible to reproduce, caused by touches/noise. We had a recorder/replayer for touches, but hadn't captured the bugs, so i wrote some code to generate many, many points in random locations on the screen, in the recorder's file format. Once i discovered a bug with it, i would manually bisect the file until i had a short sequence of touches reproducing it, and then start the analysis.

schiermi@frankfurt.social

@b0rk xmlstarlet

wader@fosstodon.org

@b0rk github.com/wader/fq but im a bit biased :)

MichaelT@ruby.social

@b0rk I have used od quite a bit: man7.org/linux/man-pages/man1/

dojoe@chaos.social

@b0rk Calc/Excel, no kidding. They're great for quickly graphing the progression of trace data, latency histograms, etc.
Trace -> grep/rg -> plaintext import or just ctrl-shift-alt-v -> (optional) massage data -> graph assistant.
Turning data into images is a powerful tool.

shfo@mastodon.social

@b0rk `cargo fuzz` in Rust is pretty easy and has caught a bunch of issues in parsers I've written.

andyprice@mastodon.social

@b0rk That's a pretty broad category, I expect half of the tools in util-linux alone would fit into it.

fcodvpt@framapiaf.org

@b0rk You may like datascienceatthecommandline.co from @jeroenjanssens

b0rk@mastodon.social

@shfo that’s cool, i’m learning a bunch of languages have built in fuzzers!!

NicolaiBuchwitz@mastodon.social

@b0rk github.com/jpmens/jo written by @jpmens

plexus@toot.cat

@b0rk jet would fit the bill, but it's not used much outside the Clojure ecosystem

bobthomson70@mastodon.social

@b0rk do sed and awk count?

b0rk@mastodon.social

@bobthomson70 maybe yeah!

janl@chaos.social

@b0rk tr

powersoffour@mastodon.social

@b0rk I use ksv for kaitai-struct described data. Not sure it fits the use case since it requires a format declaration, but it's pretty great! And the web IDE that implements it (ide.kaitai.io/) is very useful too.

gvwilson@mastodon.social

@b0rk Have you seen Zeller et al's fuzzingbook.org/ ?

fclc@mast.hpc.social

@b0rk Can't forget the classics:

pipe into | base64, base32 etc.

grep, or if you feel like calling out/making some peeps remember the ancient times, make a separate entry for grep vs egrep

b0rk@mastodon.social

@gvwilson no, thank you!

kimvanwyk@fosstodon.org

@b0rk rich-cli handles a variety of data formats very nicely. csvkit is also excellent.

cscott@kolektiva.social

@b0rk M-x hexl-mode

pythoneer@cyberplace.social

@b0rk strings is a good companion to hexdump/xxd for those use cases where you just want to extract and list text strings from a binary
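To show roughly what `strings` does, here is a Python sketch that pulls out runs of printable ASCII at least four bytes long (four is the default minimum length in GNU `strings`). This sketch also counts tab as text. The real tool has more options, such as other character encodings, which this ignores.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (plus tab) that are at least min_len bytes long."""
    # Character class: tab, and space (0x20) through tilde (0x7e).
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

On a file, this would be `extract_strings(open(path, "rb").read())`. Binary noise and short runs like `ab` are skipped, and longer readable runs such as `PATH=/usr/bin` are kept.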

cscott@kolektiva.social

@b0rk i used the SPIN model checker for some particularly nasty multithreaded code. Model checkers are fuzzers, if you include the case where literally every possible input is tested. :)

spinroot.com/spin/whatispin.ht
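SPIN checks models written in its own language, Promela, so the Python below is not SPIN. It is just a tiny sketch of the underlying idea of exhaustive exploration. It enumerates every interleaving of two hypothetical threads that each read a shared counter and then write back `read + 1`, and records which final values are reachable.

```python
from itertools import combinations

STEPS_PER_THREAD = 2  # each hypothetical thread: step 0 = read, step 1 = write

def run(schedule: tuple[int, ...]) -> int:
    """Execute one interleaving; schedule[i] is the thread that takes step i."""
    counter = 0
    local = {0: 0, 1: 0}  # each thread's private copy of the value it read
    pc = {0: 0, 1: 0}     # index of the next step for each thread
    for tid in schedule:
        if pc[tid] == 0:
            local[tid] = counter        # read the shared counter
        else:
            counter = local[tid] + 1    # write back read + 1
        pc[tid] += 1
    return counter

def all_schedules() -> list[tuple[int, ...]]:
    """Every interleaving: choose which time slots belong to thread 0."""
    total = 2 * STEPS_PER_THREAD
    return [
        tuple(0 if slot in slots else 1 for slot in range(total))
        for slots in combinations(range(total), STEPS_PER_THREAD)
    ]

def explore() -> dict[int, list[tuple[int, ...]]]:
    """Map each reachable final counter value to the schedules that reach it."""
    outcomes: dict[int, list[tuple[int, ...]]] = {}
    for schedule in all_schedules():
        outcomes.setdefault(run(schedule), []).append(schedule)
    return outcomes
```

Running `explore()` shows that 4 of the 6 interleavings end with the counter at 1 instead of 2, which is the classic lost update. Real model checkers such as SPIN use techniques like partial-order reduction to avoid this kind of brute-force blowup.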

doy@recurse.social

@b0rk yes! i used afl and crates.io/crates/quickcheck to track down some tricky issues in my terminal parsing library

b0rk@mastodon.social

@doy nice thank you!

janriemer@floss.social

@b0rk

qsv - CSVs sliced, diced & analyzed

github.com/jqnatividad/qsv

hq - jq, but for HTML

github.com/orf/hq

ajwk@mastodon.social

@b0rk Miller (mlr)?

github.com/johnkerl/miller

reubeno@hachyderm.io

@b0rk it's more complex but i'm a fan of kaitai.io for visualizing binary files/structures; the github-hosted format library has parsers available for some well known file formats

gabeguz@bsd.network

@b0rk `sort`, `uniq`, `sed` are 3 that I use frequently. oh, and `grep -v`

NicolasRinaudo@functional.cafe

@b0rk @dcrosta what distinction do you draw between fuzzing and PBT? I was under the impression that fuzzing was just the “does not crash” property.

I’ve used (and written) various PBT frameworks to great success, but will readily admit there’s a learning curve, and it takes a while to start writing useful properties that don’t basically reimplement the system under test
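A small sketch of that distinction, hand-rolled with Python's `random` module rather than a PBT framework, so there is no shrinking. The run-length encoder is a hypothetical system under test. Merely calling `rle_encode` on random input is the fuzzing-style "does not crash" check. The round-trip and "adjacent runs differ" assertions are stronger properties that still avoid reimplementing the encoder.

```python
import random

def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode a string into (character, count) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in runs)

def random_string(rng: random.Random) -> str:
    # A tiny alphabet makes repeated characters (and so longer runs) likely.
    return "".join(rng.choice("ab ") for _ in range(rng.randint(0, 20)))

def check_properties(trials: int = 1_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        s = random_string(rng)
        runs = rle_encode(s)  # reaching the next line is the "does not crash" property
        # Round trip: decoding the encoding gives back the original input.
        assert rle_decode(runs) == s, s
        # Structural property: neighbouring runs never repeat a character.
        assert all(a[0] != b[0] for a, b in zip(runs, runs[1:])), s
```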

pelavarre@social.vivaldi.net

"have any of you used a fuzzer for debugging? what specific fuzzing tool did you use?"
- @b0rk

Disrupt the smooth flow of data in flight or at rest!

honestly, that's my most basic fuzzer: cut my Internet cable, or switch from Wi-Fi to mobile data, or drop from 5 GHz Wi-Fi down to 2.4 GHz Wi-Fi; delete a file, remove a dir; the next tool up is poking NULLs into tables

# bots go boom

boredzo@mastodon.social

@b0rk I've been getting a lot of good use out of Synalyze It! lately. It's a hex editor with the ability to give it structure definitions (which it calls “grammars”) with which it can parse the bytes and show you the structures' values.

[Image: Screenshot of two documents open in Synalyze It!, with two grammars also open. One document is a disk image that's been assigned the “HFS on-disk format” grammar, and is showing values from the volume header, such as the volume name. The other is pasted data with the “HFS header node” grammar, showing values from the header node of this volume's catalog tree.]

mudge@ruby.social

@b0rk have you seen Andrew Gallant’s CSV command line toolkit xsv? github.com/BurntSushi/xsv

robryk@qoto.org

@b0rk @dcrosta

On that note fuzzing-for-equivalence is similar: check that two functions are equivalent by fuzzing something that runs both and crashes on different results.

It is a subset of property testing, and the subset of property testing that's easy to implement as a fuzz target is larger than this, but I've found fuzzing-for-equivalence to be useful, and a good way to think about property-testing-like things done in fuzz targets.
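A minimal Python sketch of fuzzing for equivalence. It uses plain random inputs rather than a coverage-guided fuzzer. An obviously correct population count (`bin(n).count("1")`) is compared against Kernighan's bit trick, and any disagreement raises. The two functions are illustrative stand-ins for "reference" and "optimized" code.

```python
import random

def popcount_reference(n: int) -> int:
    """Obviously correct version: count the '1' characters in the binary form."""
    return bin(n).count("1")

def popcount_kernighan(n: int) -> int:
    """Version under test: each iteration clears the lowest set bit."""
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count

def fuzz_equivalence(iterations: int = 10_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(iterations):
        n = rng.getrandbits(64)
        expected, actual = popcount_reference(n), popcount_kernighan(n)
        if expected != actual:
            # The "crashes on different results" step.
            raise AssertionError(f"mismatch for {n:#x}: {expected} != {actual}")
```

In an actual fuzz target, the `rng.getrandbits(64)` line would be replaced by decoding the fuzzer-provided bytes into an integer, and the raised error is the crash the fuzzer reports.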

robryk@qoto.org

@b0rk

objdump?

xenodium@indieweb.social

@b0rk I found xxd -i (generate C header) pretty magical when I wanted to embed a binary in a compiled executable.
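For comparison, here is a Python sketch that produces output in the same spirit as `xxd -i`: a byte array plus a length variable. The 12-bytes-per-line layout and the `_len` suffix are meant to resemble xxd's output, but this is not guaranteed to match it byte for byte.

```python
def to_c_array(data: bytes, name: str, per_line: int = 12) -> str:
    """Render bytes as a C array definition plus a length variable."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {name}[] = {{\n{body}\n}};\n"
        f"unsigned int {name}_len = {len(data)};\n"
    )
```

For example, `to_c_array(b"hi\n", "greeting")` yields an array holding `0x68, 0x69, 0x0a` and `unsigned int greeting_len = 3;`. The result can be written to a `.h` file and included from C.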

codebrewer@mastodon.social

@b0rk perhaps csvkit and datamash

SeanMP@mastodon.social

@b0rk Less formatting, more selection and collating, but RecordStream is fantastically useful when you need to slice, map, reduce, etc., a bunch of records. Learning curve is somewhat steep unfortunately, but I've found it useful to have in my back pocket.

github.com/benbernard/RecordSt

aykevl@mastodon.social

@b0rk not necessarily a fuzzer but I've reimplemented a few nontrivial algorithms in a different way and in those cases I simply generate random inputs and verify that both algorithms produce the same output - or in the case of floats, don't deviate too much. Put it in a loop and let it run a few minutes to be sure there is no bug (with a high degree of probability). This tends to catch bugs very quickly if they exist.

I guess that counts as property testing.
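The same technique sketched in Python, assuming population variance as the "nontrivial algorithm". Welford's single-pass update is checked against the straightforward two-pass formula on random inputs, with `math.isclose` standing in for "don't deviate too much". The tolerances are arbitrary choices for this example.

```python
import math
import random

def variance_two_pass(xs: list[float]) -> float:
    """Population variance: compute the mean first, then average squared deviations."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def variance_welford(xs: list[float]) -> float:
    """Population variance via Welford's single-pass running update."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / len(xs)

def compare(trials: int = 2_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(-1e3, 1e3) for _ in range(rng.randint(1, 50))]
        a, b = variance_two_pass(xs), variance_welford(xs)
        # Floats: require "close enough", not bit-for-bit equality.
        assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9), (xs, a, b)
```

Raising `trials`, or replacing the loop with `while True`, gives the "let it run a few minutes" version.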

jurgenhaas@fosstodon.org

@b0rk
Visidata is great for many formats.

visidata.org/

wmspringer@hachyderm.io

@b0rk You mean, explaining the problem to your cat?

pulkomandy@mastodon.tetaneutral.net

@b0rk pyplot, wireshark (also for non-network uses, for example it can decode chunks in png files)

Somewhat unrelated: tig, gitk, git blame. Do you plan a section about studying the history of the code to find when a problem appeared? Both in practical terms (git bisect) and at a higher level (how do we make sure this type of bug doesn't happen again?)

trs@metasocial.com

@b0rk visidata.org

AinarG@mastodon.online

@b0rk, others have already mentioned the Octal Dumper aka od(1). I'll add that it is worth including simply for the legend/fact that the reason the keyword "do" is paired with "done" in the Unix shell (as opposed to following the "if"-"fi" and "case"-"esac" pattern) is that od(1) was already a thing and actively used. Its options are quite odd (heh), but it's still in POSIX, and there are people who prefer it to xxd and hexdump.

kramse@social.kramse.org

@b0rk zeek.org has a zeek-cut tool that works on their own TSV based formats.

They include a header in each file, so the tool can output only the columns you need, selected by name:

$ cat dns.log | zeek-cut id.orig_h query answers

I haven't seen that approach elsewhere, and have only used it with their software products
docs.zeek.org/en/master/log-fo
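A rough Python sketch of the select-by-name idea. It assumes a Zeek-style layout: tab-separated rows, metadata lines starting with `#`, and a `#fields` line that names the columns. It is not zeek-cut itself, and it ignores details such as the separator declared in the header.

```python
from collections.abc import Iterable

def cut_by_name(lines: Iterable[str], wanted: list[str]) -> list[list[str]]:
    """Select columns by name from a Zeek-style TSV log with a '#fields' header."""
    index = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            names = line.split("\t")[1:]  # drop the "#fields" label itself
            index = [names.index(name) for name in wanted]
        elif not line or line.startswith("#"):
            continue  # other metadata (#separator, #types, ...) and blank lines
        elif index is not None:
            cols = line.split("\t")
            rows.append([cols[i] for i in index])
    return rows
```

Passing an open file (or `sys.stdin`) as `lines` and `["id.orig_h", "query", "answers"]` as `wanted` mirrors the command above. An unknown column name raises `ValueError` from `list.index`.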

bazzargh@hachyderm.io

@b0rk I have used a related thing github.com/MozillaSecurity/lit (and wrote scripts based on ddmin). When you hit bugs with a fuzzer you can use a test case reducer like this to isolate the input that caused it - but it works on bugs found in the wild too, in that case web pages/js that crash the browser.
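For reference, a compact Python sketch of the complement-only simplification of Zeller and Hildebrandt's ddmin (not Lithium's implementation). It repeatedly tries deleting chunks of the input and keeps any smaller input that still fails. When nothing can be removed, it moves to finer chunks. `fails` is a caller-supplied predicate, and the `x`/`y` example below is hypothetical.

```python
def ddmin(data, fails):
    """Shrink `data` (a str or list) while `fails(data)` stays True."""
    if not fails(data):
        raise ValueError("the starting input must reproduce the failure")
    n = 2  # number of chunks the input is currently split into
    while len(data) >= 2:
        chunk = len(data) // n
        reduced = False
        for start in range(0, len(data), chunk):
            # Try the complement: the input with one chunk deleted.
            candidate = data[:start] + data[start + chunk:]
            if fails(candidate):
                data, n, reduced = candidate, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(data):
                break  # every single-element deletion failed to reproduce: 1-minimal
            n = min(n * 2, len(data))  # retry with finer chunks
    return data
```

For example, `ddmin("abxcdefyg", lambda s: "x" in s and "y" in s)` shrinks to `"xy"`. The result is 1-minimal: removing any single remaining element makes the failure go away. This automates the kind of manual bisecting described earlier in the thread.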
