How to compress data for archival

From RCSWiki
= Using '''tar''' command =
The '''tar''' command ''archives'' files: it packs many files into a single ''archive'' file, with '''no compression''' by default.
Sometimes that is all you need: a single file simplifies file management, creating it is fast, and the archive is suitable for streaming to
an archival storage device, such as tape.
Nowadays, however, it is usually desirable to compress the archived data as well.
<code>tar</code> can do this by using '''external compression programs''', such as <code>gzip</code> or <code>bzip2</code>.
<code>gzip</code> (GNU Zip) provides reasonable compression and is relatively fast.
<code>bzip2</code> compresses better but is significantly slower. In many cases, <code>gzip</code> is good enough.
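
As a minimal sketch of how the compression program is selected, the script below archives the same directory without compression, with <code>gzip</code> (<code>-z</code>) and with <code>bzip2</code> (<code>-j</code>). The sample files and the temporary working directory are made up for illustration, and the <code>bzip2</code> step is skipped if that program is not installed.

```shell
#!/bin/sh
# Sketch only: the sample data below is hypothetical, generated for illustration.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir my_data
i=1
while [ "$i" -le 200 ]; do
    printf 'sample line %s\n' "$i" > "my_data/file_$i.txt"
    i=$((i + 1))
done

tar -cf  archive.tar    my_data    # plain archive, no compression
tar -czf archive.tar.gz my_data    # -z: compress through gzip

# -j: compress through bzip2, only if bzip2 is available on this system
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf archive.tar.bz2 my_data
fi

# Compare the resulting sizes
ls -l archive.tar*
```

The only difference between the three commands is the single option letter that tells <code>tar</code> which external program to run.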


== One file ==


== One directory ==
For this example we use a directory, <code>my_data</code>, that contains '''6172 files totalling 165 MB''' of data.
Archiving it without compression takes 8.5 sec and produces a 150 MB file:
<pre>
# 8.5 sec
$ tar -cvf archive.tar my_data
....
$ ls -lh archive.tar
-rw-r--r-- 1 drozmano drozmano 150M Jul  4 08:42 archive.tar
</pre>
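On disk, every file occupies a whole number of storage blocks, so the last block of each file is usually only partly filled.
This is why the archive is smaller than the total size of the original files:
inside the archive the files are stored back to back, padded only to <code>tar</code>'s own 512-byte records, which are typically much smaller than a file system block.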
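
The block-padding effect can be checked with <code>du</code>. The sketch below assumes GNU <code>du</code> (for the <code>--apparent-size</code> option) and uses made-up one-byte files in a temporary directory; the actual numbers depend on the file system.

```shell
#!/bin/sh
# Sketch only: assumes GNU du; the tiny sample files are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir my_data
i=1
while [ "$i" -le 100 ]; do
    printf 'x' > "my_data/tiny_$i"     # 1-byte files, each still allocated whole blocks
    i=$((i + 1))
done

on_disk=$(du -sk my_data | cut -f1)                   # KiB allocated on disk
apparent=$(du -sk --apparent-size my_data | cut -f1)  # KiB of actual content

tar -cf archive.tar my_data
archive_kib=$(( ($(wc -c < archive.tar) + 1023) / 1024 ))  # archive size, rounded up to KiB

echo "on disk: ${on_disk} KiB, apparent: ${apparent} KiB, archive: ${archive_kib} KiB"
```

The more small files a directory holds, the larger the gap between the allocated size and the size of the archive tends to be.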


Archiving with <code>gzip</code> compression is actually '''faster''' (5.9 sec), because less data has to be written to the compressed archive:
<pre>
# 5.9 sec
$ tar -czvf archive.tar.gz my_data
....
$ ls -lh archive.tar.gz
-rw-r--r--  1 drozmano drozmano  33M Jul  4 08:46 archive.tar.gz
</pre>
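
To repeat this kind of comparison on your own data, a rough sketch such as the following can be used. The <code>measure</code> helper is hypothetical, the sample data is generated only for illustration, and <code>date +%s</code> gives whole-second resolution only, so small directories will mostly report 0 s.

```shell
#!/bin/sh
# Sketch only: "measure" is a hypothetical helper, sample data is made up.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir my_data
i=1
while [ "$i" -le 500 ]; do
    printf 'sample line %s\n' "$i" > "my_data/file_$i.txt"
    i=$((i + 1))
done

# measure OUTPUT COMMAND...: run COMMAND, then print elapsed seconds and OUTPUT size
measure() {
    out=$1
    shift
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "$out: $((end - start)) s, $(wc -c < "$out") bytes"
}

measure archive.tar    tar -cf  archive.tar    my_data
measure archive.tar.gz tar -czf archive.tar.gz my_data
```

Replace <code>my_data</code> with a real directory to get meaningful timings; the result also depends on how fast the storage is compared to the CPU.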


== Several files ==

Revision as of 14:57, 4 July 2022