How to compress data for archival
Latest revision as of 20:28, 21 September 2023
Using the tar command
The tar command is used to archive files, that is, to put many files into one single archive file, with no compression. In some cases this is the goal: one file simplifies file management, creating it is fast, and it is suitable for streaming to an archival storage device, such as tape storage.
Nowadays, however, some kind of compression of the archived data is usually desired. tar can do this by using external compression programs, such as gzip or bzip2. gzip (GNU Zip) provides reasonable compression and is relatively fast. bzip2 compresses better, but is significantly slower. In many cases, gzip is good enough.
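To make the role of the external compression program concrete, the sketch below builds the same compressed archive in two ways: by piping tar's output through gzip by hand, and by letting tar start gzip itself with the z option. The directory sample_dir and its files are made up for this illustration and are not part of the guide's data.

```shell
#!/bin/sh
# Work in a throwaway directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data (not the my_data directory used in this guide).
mkdir sample_dir
printf 'hello archive\n' > sample_dir/a.txt
printf 'more data\n' > sample_dir/b.txt

# "f -" sends the archive to standard output; gzip compresses the stream.
tar cf - sample_dir | gzip > piped.tar.gz

# The same in one step: the z option makes tar run the data through gzip.
tar czf direct.tar.gz sample_dir

# Both archives should list the same members.
gzip -dc piped.tar.gz | tar tf -
tar tzf direct.tar.gz
```

The one-step form is usually more convenient; the pipeline form is mainly useful for seeing what happens behind the option.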
One directory
For this example we use a directory my_data that contains 165 MB of data in 6172 files.
To create an archive, the cvf options are used, which stand for Create, Verbose, and File. These options instruct tar to create a new archive file and to print the names of the files as they are added to the archive.
Archival without compression takes 8.5 sec and produces a file of 150MB in size.
$ tar cvf archive.tar my_data
....
$ ls -lh archive.tar
-rw-r--r-- 1 drozmano drozmano 150M Jul 4 08:42 archive.tar
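The effect of the individual option letters can be tried on a small scale. The following sketch uses a made-up directory, sample_dir, and also shows the t option, which lists the members of an existing archive without extracting them. Where the verbose output is printed (standard output or standard error) differs between tar implementations.

```shell
#!/bin/sh
# Work in a throwaway directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data for the demonstration.
mkdir sample_dir
printf 'first file\n' > sample_dir/a.txt
printf 'second file\n' > sample_dir/b.txt

# c = create, f = name of the archive file; without v nothing is printed.
tar cf quiet.tar sample_dir

# Adding v prints the name of each member as it is added.
tar cvf verbose.tar sample_dir

# t lists the members of an existing archive.
tar tf quiet.tar
```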
Each individual file occupies a whole number of storage blocks on disk, which is why the archive (150 MB) is smaller than the total size of the original files (165 MB). In the archive, padding to fill the last storage block is needed only once.
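The difference between the amount of data in the files and the space they occupy on disk can be observed directly. The sketch below creates many small files in a made-up directory, sample_dir, and compares the total number of content bytes with the disk usage reported by du. The actual disk usage depends on the file system and its block size, so no particular value should be expected.

```shell
#!/bin/sh
# Work in a throwaway directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: 50 files of 101 bytes each
# (100 zero-padded digits plus a newline).
mkdir sample_dir
i=1
while [ "$i" -le 50 ]; do
  printf '%0100d\n' "$i" > "sample_dir/f_$i.txt"
  i=$((i + 1))
done

# Total number of bytes of actual file content.
content_bytes=$(cat sample_dir/* | wc -c)

# Space allocated on disk in KiB; every file takes whole blocks.
disk_kib=$(du -sk sample_dir | cut -f1)

echo "content: $((content_bytes + 0)) bytes"
echo "on disk: $disk_kib KiB"
```

On many file systems the second number is considerably larger than the first, because each small file still takes up at least one block.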
Archival with gzip compression is requested by using the z option, and it is actually faster (5.9 sec) because less data needs to be written to the compressed archive.
The archive size is 33MB, that is, 20% of the original size.
$ tar czvf archive.tar.gz my_data
....
$ ls -lh archive.tar.gz
-rw-r--r-- 1 drozmano drozmano 33M Jul 4 08:46 archive.tar.gz
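Before relying on a compressed archive, it can be reassuring to check that it restores to the same data. The following is a minimal round-trip sketch, using a made-up directory sample_dir and a scratch directory restore; the x option extracts, and -C selects the directory to extract into.

```shell
#!/bin/sh
# Work in a throwaway directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data for the demonstration.
mkdir sample_dir
printf 'some results\n' > sample_dir/results.txt
printf 'some notes\n' > sample_dir/notes.txt

# Create a gzip-compressed archive.
tar czf archive.tar.gz sample_dir

# Extract it into a separate directory.
mkdir restore
tar xzf archive.tar.gz -C restore

# Compare the restored copy with the original, file by file.
diff -r sample_dir restore/sample_dir && echo "round trip OK"
```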
Archival with bzip2 compression is requested by the j option, and takes about twice as long as archival without compression, 16.8 sec.
The archive size is 26MB, that is, 16% of the original size.
$ tar cjvf archive.tar.bz2 my_data
....
$ ls -lh archive.tar.bz2
-rw-r--r-- 1 drozmano drozmano 26M Jul 4 08:58 archive.tar.bz2
Archival with the newer XZ compression is requested by the J option, and takes the longest, almost 4 times as long as archival without compression, 30.7 sec.
The archive size is the smallest, 22MB, which is 13% of the original size.
$ tar cJvf archive.tar.xz my_data
....
$ ls -lh archive.tar.xz
-rw-r--r-- 1 drozmano drozmano 22M Jul 4 09:02 archive.tar.xz
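A comparison like the one above can be repeated on one's own data with a short loop. The script below is only a sketch: it generates a small made-up directory, sample_dir, so that it can run anywhere, skips compressors that are not installed, and measures time in whole seconds. With such a small data set the times will be close to zero; for meaningful numbers, replace sample_dir with a real directory.

```shell
#!/bin/sh
# Work in a throwaway directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data; repetitive text compresses unusually well.
mkdir sample_dir
i=1
while [ "$i" -le 50 ]; do
  awk -v n="$i" 'BEGIN { for (k = 0; k < 200; k++) print "sample line", n, k }' \
    > "sample_dir/file_$i.txt"
  i=$((i + 1))
done

# Baseline: archive without compression.
tar cf archive.tar sample_dir
echo "none: $(($(wc -c < archive.tar) + 0)) bytes"

# Each entry is program:tar-option:file-extension.
for entry in "gzip:z:gz" "bzip2:j:bz2" "xz:J:xz"; do
  prog=${entry%%:*}
  rest=${entry#*:}
  opt=${rest%%:*}
  ext=${rest#*:}
  if command -v "$prog" >/dev/null 2>&1; then
    start=$(date +%s)
    tar "c${opt}f" "archive.tar.$ext" sample_dir
    end=$(date +%s)
    size=$(wc -c < "archive.tar.$ext")
    echo "$prog: $((end - start)) s, $((size + 0)) bytes"
  else
    echo "$prog: not installed, skipped"
  fi
done
```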
One file
The same options can be used to compress just one file.
For example, suppose there is an uncompressed archive, archive.tar, that we want to compress.
Here are the possible commands:
$ ls -lh
-rw-r--r-- 1 drozmano drozmano 150M Jul 4 08:57 archive.tar
$ tar czvf archive.tar.gz archive.tar
....
$ tar cjvf archive.tar.bz2 archive.tar
....
$ tar cJvf archive.tar.xz archive.tar
....
$ ls -lh
-rw-r--r-- 1 drozmano drozmano 150M Jul 4 08:57 archive.tar
-rw-r--r-- 1 drozmano drozmano 26M Jul 4 09:16 archive.tar.bz2
-rw-r--r-- 1 drozmano drozmano 33M Jul 4 09:15 archive.tar.gz
-rw-r--r-- 1 drozmano drozmano 22M Jul 4 09:17 archive.tar.xz
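Note that the tar commands above store archive.tar as a member inside a new compressed tar archive. If a plain compressed copy of the file is preferred instead, the compression program can be run on the file directly. The sketch below shows this for gzip, using a small made-up archive; bzip2 and xz accept the same -c and -d options.

```shell
#!/bin/sh
# Work in a throwaway directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data packed into an uncompressed archive.
mkdir sample_dir
printf 'some data\n' > sample_dir/a.txt
tar cf archive.tar sample_dir

# -c writes the compressed data to standard output,
# so the original archive.tar is left in place.
gzip -c archive.tar > archive.tar.gz

# Decompress to standard output and compare with the original.
gzip -dc archive.tar.gz | cmp - archive.tar && echo "contents identical"
```

A file compressed this way can be read by tar directly, for example with tar tzf archive.tar.gz.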