I collect a few GB/day of log files from my ADS-B antennae. Storing and processing these is mostly automated, so gzip has been adequate.
Looking to improve on that, I've tested two newer methods, LZMA (xz) and ZSTD. Both support multithreading with -T0, so they should be able to outperform the baseline gzip.
All tests were done on an Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz while the machine was also in other, random use.
Results could be made more accurate by running echo 1 > /proc/sys/vm/drop_caches before each test and by reducing background I/O.
flugcat-20210314.log:
2713597522 bytes
zstd:
zstd flugcat-20210314.log -o flugcat-20210314.log.3.zst 6.09s user 0.66s system 114% cpu 5.881 total
243135780 bytes
unzstd flugcat-20210314.log.3.zst 2.24s user 1.58s system 99% cpu 3.833 total
zstd -T0 -9:
zstd -T0 -9 flugcat-20210314.log -o flugcat-20210314.log.9.zst 39.54s user 0.99s system 339% cpu 11.948 total
190202201 bytes
unzstd flugcat-20210314.log.9.zst 1.87s user 1.52s system 80% cpu 4.201 total
zstd -T0 -19 (max without --ultra):
zstd -T0 -19 flugcat-20210314.log -o flugcat-20210314.log.19.zst 2442.36s user 3.44s system 352% cpu 11:34.70 total
142548948 bytes
unzstd flugcat-20210314.log.19.zst 1.81s user 1.49s system 99% cpu 3.306 total
xz:
xz flugcat-20210314.log 971.35s user 2.50s system 99% cpu 16:15.41 total
141154572 bytes
unxz flugcat-20210314.log.xz 11.77s user 1.52s system 41% cpu 31.900 total
xz -T0:
xz -T0 flugcat-20210314.log 1107.33s user 2.52s system 343% cpu 5:23.36 total
142435176 bytes
unxz flugcat-20210314.log.xz 11.58s user 1.71s system 62% cpu 21.346 total
gzip:
gzip flugcat-20210314.log 37.42s user 0.98s system 99% cpu 38.495 total
217945296 bytes
gunzip flugcat-20210314.log.gz 8.84s user 1.26s system 58% cpu 17.378 total
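To make the sizes above easier to compare, each compressed size can be expressed as a percentage of the original 2713597522 bytes. This is just a minimal sketch; pct is a helper name I made up for illustration, and the byte counts are the ones listed above.

```shell
# pct COMPRESSED ORIGINAL: print the compressed size as a percentage of the original.
# (pct is a hypothetical helper, not part of any of the tools tested here.)
pct() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.2f\n", 100 * c / o }'
}

orig=2713597522
pct 243135780 "$orig"   # zstd (default level)
pct 190202201 "$orig"   # zstd -T0 -9
pct 142548948 "$orig"   # zstd -T0 -19
pct 141154572 "$orig"   # xz
pct 142435176 "$orig"   # xz -T0
pct 217945296 "$orig"   # gzip
```

Dividing by the original size rather than by the gzip size keeps all six results on the same scale.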
Grep through the messages and pull out the ones with flight info: MSG,1.
uncompressed:
✗ time grep ^MSG,1 flugcat-20210314.log | awk -F , '{ print $5","$11 }' |sort |uniq > /dev/null
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox} ^MSG,1 2.55s user 0.42s system 87% cpu 3.404 total
awk -F , '{ print $5","$11 }' 0.88s user 0.03s system 26% cpu 3.400 total
sort 0.53s user 0.04s system 16% cpu 3.518 total
uniq > /dev/null 0.03s user 0.00s system 0% cpu 3.516 total
zstd:
✗ time zstdgrep ^MSG,1 flugcat-20210314.log.zst | awk -F , '{ print $5","$11 }' |sort |uniq > /dev/null
zstdgrep ^MSG,1 flugcat-20210314.log.zst 4.60s user 1.06s system 156% cpu 3.630 total
awk -F , '{ print $5","$11 }' 0.90s user 0.10s system 27% cpu 3.629 total
sort 0.54s user 0.03s system 15% cpu 3.749 total
uniq > /dev/null 0.03s user 0.00s system 0% cpu 3.748 total
xz:
✗ time xzgrep ^MSG,1 flugcat-20210314.log.xz | awk -F , '{ print $5","$11 }' |sort |uniq > /dev/null
xzgrep ^MSG,1 flugcat-20210314.log.xz 15.11s user 2.13s system 128% cpu 13.392 total
awk -F , '{ print $5","$11 }' 0.97s user 0.10s system 7% cpu 13.391 total
sort 0.53s user 0.05s system 4% cpu 13.522 total
uniq > /dev/null 0.03s user 0.00s system 0% cpu 13.522 total
gzip:
✗ time zgrep ^MSG,1 flugcat-20210314.log.gz | awk -F , '{ print $5","$11 }' |sort |uniq > /dev/null
zgrep ^MSG,1 flugcat-20210314.log.gz 12.64s user 1.88s system 140% cpu 10.363 total
awk -F , '{ print $5","$11 }' 1.03s user 0.06s system 10% cpu 10.363 total
sort 0.56s user 0.02s system 5% cpu 10.484 total
uniq > /dev/null 0.02s user 0.01s system 0% cpu 10.483 total
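The grep, awk, sort and uniq stages above can probably be folded into fewer processes. The sketch below is an untimed variant, not something I benchmarked: it assumes the log lines are comma-separated with the record type in field 1 and the message type in field 2, and flights is a hypothetical function name.

```shell
# flights: read the CSV log on stdin and print the unique "field5,field11" pairs
# taken from MSG,1 lines. awk does the filtering that grep did, and sort -u
# replaces the separate sort | uniq stages.
flights() {
    awk -F, '$1 == "MSG" && $2 == "1" { print $5 "," $11 }' | sort -u
}

# Usage: decompress to stdout, then filter.
# zstd -dc flugcat-20210314.log.zst | flights > /dev/null
```

Unlike grep ^MSG,1 this matches the second field exactly, so it would not pick up a message type that merely starts with 1.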
ZSTD works really well, and I agree with Arch Linux's decision to use it for their packaging.
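Personally, I'll be using -T0 -9. The result is about 13% smaller than the gzip file (roughly 1% of the original size saved), and it is still reasonably quick to compress: about 12 s versus 38 s for gzip.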
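For the automated part, the chosen settings could be wrapped up roughly like this. It is only a sketch: compress_log is a hypothetical name, and whether to delete the original afterwards is left to the caller.

```shell
# compress_log FILE: compress FILE to FILE.zst with -T0 -9, then verify the
# archive with zstd -t. The original file is kept.
# (compress_log is a hypothetical helper name.)
compress_log() {
    zstd -q -T0 -9 "$1" -o "$1.zst" && zstd -q -t "$1.zst"
}

# Usage:
# compress_log flugcat-20210314.log && rm flugcat-20210314.log
```

Running the integrity test before removing the log costs a few seconds of decompression, which the timings above suggest is cheap for zstd.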