
Fast appending of files to a tar archive is impossible

As it turns out, tar is very slow at appending files to an existing tarball. I'm talking about the following options in particular:

-r – append files to the end of an archive
-u – only append files newer than copy in archive

Logically, for -u to work, tar has to perform a linear search through the archive. The bigger the archive, the slower the search. Moreover, if you append in a loop, the search is repeated once per iteration. I would advise using -u only in the most exceptional cases. Try to avoid this:

# -u: slow, inefficient way of tarring multiple files
for file in $(ls -A)
do
    tar -uf tarball.tar "$file"    # scans the whole archive before appending the file
done

You might think that the -r option lets tar jump straight to the end of the archive by looking up the end position in the archive's index. It doesn't. The tar format is designed without an index, so tar has to read through the existing archive to find where it ends.
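To make the "no index" point concrete, here is a minimal sketch that inspects the tail of a freshly created archive. The file names are hypothetical and the demo runs in a throwaway directory. It relies on the fact that a tar archive is a sequence of 512-byte blocks terminated by at least two all-zero blocks; nothing like a table of contents is stored at the end.

```shell
# Sketch: look at the end of a small archive (demo.txt / demo.tar are hypothetical names)
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello\n' > demo.txt
tar -cf demo.tar demo.txt

# The archive size is a whole number of 512-byte blocks.
size=$(($(wc -c < demo.tar)))
echo "size: $size bytes, remainder mod 512: $((size % 512))"

# Count the non-zero bytes in the last two blocks (1024 bytes).
# The end-of-archive marker is just zero padding, not an index.
nonzero=$(($(tail -c 1024 demo.tar | tr -d '\0' | wc -c)))
echo "non-zero bytes in the last 1024: $nonzero"
```

Since the end is marked only by zero blocks, a tool that wants to append has to walk the member headers from the start until it reaches that marker.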

# -r: also slow and inefficient
for file in $(ls -A)
do
    tar -rf tarball.tar "$file"    # reads through the whole archive to find its end, then appends
done
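If appending truly cannot be avoided, a milder variant is to hand all the file names to a single tar invocation, so the existing archive is read once rather than once per file. The following is only a sketch with hypothetical file names in a throwaway directory; it does not make appending fast, it just removes the per-file rescans.

```shell
# Sketch: append several files with ONE tar call instead of one call per file
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\n' > a.txt
printf 'b\n' > b.txt
printf 'c\n' > 'c with space.txt'

tar -cf tarball.tar a.txt                      # the "existing" archive
tar -rf tarball.tar b.txt 'c with space.txt'   # end of archive is located once, both files are appended
tar -tf tarball.tar                            # list the members to check the result
```

Quoting the names also avoids the word-splitting problems that come with looping over the output of ls.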

Tar does support several archive formats, but I did not find them well documented. From a brief overview, --format=gnu (the GNU tar default) looks like one of the most featureful, alongside the newer --format=posix (pax), and neither of them has an index. I no longer understand why tar is even used. Despite that, below is a workaround that packs any number of files in a single pass. I recommend never using the append function with the tar format. Instead, work out what you are going to archive, prepare the necessary files, and archive them all at once.

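# Faster approach for tarring multiple files: no appending
ls -A > /tmp/list.txt                  # '>' overwrites the list; '>>' would pile up duplicates on reruns
tar -cT /tmp/list.txt -f backup.tar    # single pass: read the names from the list and create the archive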
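The same idea can be sketched without a temporary list file by piping NUL-separated names straight into tar. This assumes GNU tar (for --null with -T -) and a find that supports -mindepth, -maxdepth and -print0; the directories and file names below are hypothetical and created only for the demo.

```shell
# Sketch: build the file list and the archive in one pipeline (GNU tar assumed)
src=$(mktemp -d)    # hypothetical source directory
out=$(mktemp -d)    # the archive lives elsewhere, so it never ends up in its own list
printf 'x\n' > "$src/plain.txt"
printf 'y\n' > "$src/.hidden"
printf 'z\n' > "$src/name with spaces.txt"

cd "$src" || exit 1
# -mindepth 1 -maxdepth 1 yields the same top-level entries as `ls -A`;
# -print0 with --null keeps names with spaces or newlines intact.
find . -mindepth 1 -maxdepth 1 -print0 |
    tar -cf "$out/backup.tar" --null -T -
tar -tf "$out/backup.tar"    # list the members to check the result
```

Everything is written in one pass, so nothing has to be searched or appended afterwards.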