Dan's Musings

CI/CD Package Compression Could Be So Much Better

Pretty much every package format on the planet -- pip, rpm, deb, what have you -- is built upon some form of archive format, some form of compression format, or both. The files are put in an archive, and the archive is then compressed.

But in CI/CD pipelines, projects can generate several dozen of these packages per day. It's useful to keep track of several of them, perhaps builds from different git branches. But storing all these files can add up to a lot of disk space.

This situation could be vastly improved if the various packaging systems didn't compress each package archive on its own. It's just like the lesson learned in video compression. Don't compress each frame by itself. Rather, share compression information across frames. In the same way, rather than compressing each package file by itself, share compression across package files.
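To make that concrete, here is a minimal Python sketch of the effect. The two "builds" are made-up byte strings (hypothetical stand-ins for real packages) that share most of their content, and zlib is used only because it ships with Python; it is not a claim about what any real packaging system does.

```python
import hashlib
import zlib


def pseudo_random_bytes(seed, size):
    """Deterministic, effectively incompressible filler (stand-in for real file content)."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{seed}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


# Two hypothetical builds that share most of their content.
# The shared part is kept under 32 KiB so it fits in DEFLATE's window.
shared = pseudo_random_bytes("shared-library", 20_000)
build_a = shared + pseudo_random_bytes("branch-a", 1_000)
build_b = shared + pseudo_random_bytes("branch-b", 1_000)

# Each package compressed by itself, the way most formats do it today.
separate = len(zlib.compress(build_a, 9)) + len(zlib.compress(build_b, 9))

# Both packages compressed as one stream, so the second copy of the
# shared bytes can be encoded as back-references to the first copy.
together = len(zlib.compress(build_a + build_b, 9))

print(f"compressed separately: {separate} bytes")
print(f"compressed together:   {together} bytes")
```

The shared bytes only get paid for once when the two builds go through the same compressor, which is the whole point. Real packages are far bigger than zlib's window, so a real system would need something with a longer memory, but the principle is the same.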

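Consider if the archive format were simply tar with no compression. The tar format is just file headers and file data padded to 512-byte blocks. If you put these tar archives of the CI/CD builds on a filesystem like ZFS or btrfs, block-level deduplication plus transparent compression could give you crazy good space savings across different files in different archives, at least wherever unchanged data lines up on the same block boundaries.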
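Here is a rough Python sketch of why uncompressed tar is friendly to this kind of sharing. It builds two small tar archives in memory and counts how many 512-byte blocks they have in common. The file names and contents are invented for illustration, and treating 512 bytes as the sharing unit is an assumption made to match tar's layout: real filesystems work with larger blocks or records, so actual savings depend on how the data happens to align.

```python
import hashlib
import io
import tarfile

BLOCK = 512  # tar pads headers and file data to this size


def pseudo_random_bytes(seed, size):
    """Deterministic filler bytes standing in for real build outputs."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{seed}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def make_tar(files):
    """Build an uncompressed tar archive in memory from a name -> bytes dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so unchanged files get identical headers
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def dedup_stats(archives):
    """Return (total_blocks, unique_blocks) over all archives, by block hash."""
    hashes = [
        hashlib.sha256(a[i:i + BLOCK]).digest()
        for a in archives
        for i in range(0, len(a), BLOCK)
    ]
    return len(hashes), len(set(hashes))


# Two hypothetical builds: only the small middle file differs, and it grows
# enough in the second build to shift the file after it by one block.
lib = pseudo_random_bytes("lib", 8_000)
data = pseudo_random_bytes("data", 6_000)
tar_1 = make_tar({"app/lib.bin": lib, "app/main.py": b"print('v1')\n", "app/data.bin": data})
tar_2 = make_tar({"app/lib.bin": lib, "app/main.py": b"#" * 700, "app/data.bin": data})

total, unique = dedup_stats([tar_1, tar_2])
print(f"{total} blocks stored naively, {unique} unique blocks after sharing")
```

Because everything in a tar file sits on a 512-byte boundary, the unchanged files still match block for block even after a neighbouring file changes size. If the same two archives were gzipped first, a one-line change would scramble everything after it and nothing would line up.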

I'm kicking around the idea of designing my own package format, and hopefully I'll get to use this idea in that design someday.