Extract files


tar --extract

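When extracting with GNU tar (tar --extract, or tar -x for short), you don't need to specify a compression option for the following extensions, because tar detects the compression format from the archive itself: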

  • .tar (uncompressed tar files)
  • .tar.gz (gzip-compressed tar files)
  • .tar.xz (xz-compressed tar files)
  • .tar.Z (compress-compressed tar files)
tar --extract -f file.tar
tar --extract -f file.tar.gz
tar --extract -f file.tar.xz
tar --extract -f file.tar.Z
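
To see the auto-detection at work, the sketch below builds two throwaway archives in a temporary directory and extracts both with the identical command. It assumes GNU tar; the names demo, hello.txt, plain.tar and packed.tar.gz are made up for the illustration.

```shell
#!/bin/bash
set -eu

# Work in a scratch directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build one uncompressed and one gzip-compressed archive of the same tree.
mkdir demo
echo "hello" > demo/hello.txt
tar --create -f plain.tar demo
tar --create --gzip -f packed.tar.gz demo
rm -r demo

# The same extract command handles both; no -z is needed for the .tar.gz.
mkdir out1 out2
tar --extract -f plain.tar -C out1
tar --extract -f packed.tar.gz -C out2

cat out1/demo/hello.txt out2/demo/hello.txt
```

Compression options are still required when creating an archive, which is why the sketch passes --gzip to tar --create.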

Painless file extraction on Linux

#!/bin/bash
# Extract an archive, choosing the tool from the file extension.

if [ $# -eq 0 ]; then
    read -r -p "filename> " filename
else
    filename=$1
fi

if [ ! -f "$filename" ]; then
    echo "No such file: $filename" >&2
    exit 1    # "exit $?" would return the status of echo, which is 0
fi

# Quote "$filename" everywhere so names with spaces work.
case "$filename" in
    *.tar)      tar xvf "$filename";;
    *.tar.bz2)  tar xvjf "$filename";;
    *.tbz)      tar xvjf "$filename";;
    *.tbz2)     tar xvjf "$filename";;
    *.tgz)      tar xvzf "$filename";;
    *.tar.gz)   tar xvzf "$filename";;
    *.gz)       gunzip -v "$filename";;
    *.bz2)      bunzip2 -v "$filename";;
    *.zip)      unzip "$filename";;    # "unzip -v" only lists the contents
    *.Z)        uncompress -v "$filename";;
    *)          echo "No extract option for $filename" >&2
                exit 1;;
esac

Extract tar.gz or zip to a specified directory

tar xzvf XXXX.tar.gz -C DIRECTORY
# Quoting the target as "~/Downloads" or '~/Downloads' gives an error,
# because the shell does not expand a quoted ~. Leave the tilde unquoted
# or use "$HOME/Downloads" instead.
#
# $ tar xzvf ~/Downloads/inSilicoDb_2.7.0.tar.gz -C "~/Downloads"
# tar: ~/Downloads: Cannot open: No such file or directory
# tar: Error is not recoverable: exiting now
# $ tar xzvf ~/Downloads/inSilicoDb_2.7.0.tar.gz -C '~/Downloads'
# tar: ~/Downloads: Cannot open: No such file or directory
# tar: Error is not recoverable: exiting now

unzip XXX.zip -d DIRECTORY
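
The "Cannot open" errors above appear because the shell only expands an unquoted ~, so tar receives the literal string ~/Downloads. A minimal sketch of a safe pattern follows, assuming GNU tar: keep the path in a quoted variable and create the target directory first, since tar -C does not create it. All file and directory names here are hypothetical.

```shell
#!/bin/bash
set -eu

# Scratch directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a small sample archive (hypothetical names).
mkdir pkg
echo "sample" > pkg/DESCRIPTION
tar czf pkg.tar.gz pkg
rm -r pkg

# A target path containing a space, to show why quoting is still wanted.
dest="$workdir/extracted here"
mkdir -p "$dest"    # tar -C fails if the directory does not exist

# Quoting a variable is safe; only a quoted literal ~ is a problem.
tar xzf pkg.tar.gz -C "$dest"

ls "$dest/pkg"
```

For a directory under the home directory, "$HOME/Downloads" behaves the same way as the quoted variable in the sketch.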

Extract gz file but keep the original gz file

gunzip -c x.txt.gz > x.txt

gunzip -c writes the decompressed data to stdout and leaves x.txt.gz in place; the redirection saves that output as x.txt.
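
As a quick check that the compressed file survives, the sketch below compresses a sample file, decompresses it through stdout, and confirms both files are present. The file contents are invented for the demonstration.

```shell
#!/bin/bash
set -eu

# Scratch directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create a sample file and compress it; gzip replaces x.txt with x.txt.gz.
printf 'line 1\nline 2\n' > x.txt
gzip x.txt

# Decompress to stdout and redirect; x.txt.gz is left untouched.
gunzip -c x.txt.gz > x.txt

# Both the original archive and the extracted copy now exist.
ls x.txt x.txt.gz
```

If the installed gzip is version 1.6 or newer, gunzip -k x.txt.gz should give the same result without the redirection.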

Extract .xz file

xz -d archive.xz

Extract tar.xz file

The bottom line is that the 'z' option is not needed for tar.xz files: 'z' selects gzip only and does not work on an xz archive. When extracting, GNU tar detects the compression by itself, so the same command also works for tar.gz files. The 'f' option names the archive file. The 'x' option must always be given, because tar both creates and extracts archives and has no default operation.

tar xf archive.tar.xz
tar xf archive.tar.gz

Extract tar.bz2 file

tar -xjvf archive.tar.bz2  # same as for a tar.gz file, but with j (bzip2) in place of z (gzip)

How To Extract and Decompress a .bz2/.tbz2 File

See this article from cyberciti.biz.

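bzip2 -d your-filename-here.bz2       # decompress; the .bz2 file is removed
# OR
bzip2 -d -v your-filename-here.bz2    # same, with verbose output
# OR
bzip2 -d -k your-filename-here.bz2    # decompress but keep the .bz2 file
# OR
bunzip2 filename.bz2                  # equivalent to bzip2 -d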

rar files

How to Extract (Open) a RAR File in Linux

sudo apt install unrar
unrar x myfile.rar
# the x command extracts with full paths, keeping the directory structure
# (the e command would extract everything into one flat directory)

10 Basic Encryption Terms Everyone Should Know and Understand

https://www.makeuseof.com/tag/encryption-terms/

How to Encrypt and Decrypt Files and Directories

How to install and use 7zip file archiver

Compare zip, tar.xz, tar.gz, 7z

Ranked by compression ratio (from best to worst): 7z > tar.xz > tar.gz > zip.

For example, consider the download sizes of qt-everywhere-opensource-src-5.5.0 from http://download.qt.io/official_releases/qt/5.5/5.5.0/single/

  • 7z 297M
  • tar.xz 305M
  • tar.gz 436M
  • zip 540M

Extract one file from tar.gz

Extract a file called etc/default/sysstat from config.tar.gz tarball:

$ tar -zxvf config.tar.gz etc/default/sysstat

Note that a new directory etc/default will be created under the current directory if it does not exist.
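
The member name given on the command line has to match the path stored in the archive exactly, so it helps to list the archive first. The sketch below builds a stand-in for config.tar.gz (its contents are invented here), lists it, and then extracts the single file.

```shell
#!/bin/bash
set -eu

# Scratch directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a stand-in for config.tar.gz with two files in it.
mkdir -p etc/default
echo 'ENABLED="true"' > etc/default/sysstat
echo 'myhost' > etc/hostname
tar -czf config.tar.gz etc
rm -r etc

# List the members to find the exact path stored in the archive.
tar -tzf config.tar.gz

# Extract only that member; etc/default/ is recreated as needed.
tar -zxvf config.tar.gz etc/default/sysstat
```

If the listing shows paths with a leading ./ (as happens when the archive was created from "."), the same prefix must be used when naming the member to extract.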

Wildcard based extracting

You can also extract those files that match a specific globbing pattern (wildcards). For example, to extract from cbz.tar all files that begin with pic, no matter their directory prefix, you could type:

$ tar -xf cbz.tar --wildcards --no-anchored 'pic*'

To extract all PHP files, enter:

$ tar -xf cbz.tar --wildcards --no-anchored '*.php'
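
A small self-contained run of the first command is sketched below. It assumes GNU tar (the --wildcards and --no-anchored options are GNU extensions), and the directory layout inside cbz.tar is invented for the demonstration.

```shell
#!/bin/bash
set -eu

# Scratch directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a sample cbz.tar with matching and non-matching files.
mkdir -p comics/vol1 comics/vol2 site
touch comics/vol1/pic01.png comics/vol2/pic02.png
touch comics/vol2/cover.png site/index.php
tar -cf cbz.tar comics site
rm -r comics site

# Quote the pattern so that tar, not the shell, expands it.
tar -xf cbz.tar --wildcards --no-anchored 'pic*'

# Show what was extracted: only the two pic* files and their directories.
find . -type f ! -name cbz.tar | sort
```

Only the directories needed to hold the matching files are recreated; cover.png and the site directory are skipped.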

Remove leading directory components on extraction with tar
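
One way to do this with GNU tar is the --strip-components option, which drops the given number of leading path elements from each member name. The sketch below is an illustration only; the archive project-1.0.tar.gz and its contents are hypothetical.

```shell
#!/bin/bash
set -eu

# Scratch directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a sample archive whose members all live under project-1.0/.
mkdir -p project-1.0/src
echo "readme" > project-1.0/README
echo "int main(void) { return 0; }" > project-1.0/src/main.c
tar -czf project-1.0.tar.gz project-1.0
rm -r project-1.0

# Drop the leading "project-1.0/" component and extract into project/.
mkdir project
tar -xzf project-1.0.tar.gz -C project --strip-components=1

find project -type f | sort
```

The files end up as project/README and project/src/main.c instead of project/project-1.0/README and so on.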

AVFS and Archivemount

If we want to extract only certain files from a tarball/archive, it is more efficient to use a virtual filesystem like AVFS. Note that for a large archive, extracting even a single file from the top-level directory is terribly slow if we use the tar command directly.

Before we install the utility, let's look at the package dependencies of AVFS and Archivemount.

$ apt-cache showpkg archivemount
Package: archivemount
Versions: 
0.8.1-1 (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages)
 Description Language: 
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages
                  MD5: d6302be9f06a91afa32326ab175e2086
 Description Language: en
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_i18n_Translation-en
                  MD5: d6302be9f06a91afa32326ab175e2086


Reverse Depends: 
  archivemount:i386,archivemount
Dependencies: 
0.8.1-1 - libarchive13 (0 (null)) libc6 (2 2.4) libfuse2 (2 2.8.1) fuse (2 2.8.5-2) archivemount:i386 (0 (null)) 
Provides: 
0.8.1-1 - 
Reverse Provides: 
brb@T3600 ~ $ apt-cache showpkg avfs
Package: avfs
Versions: 
1.0.1-2 (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages) (/var/lib/dpkg/status)
 Description Language: 
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages
                  MD5: bce08fbc36fd7b8e3c454f36f0daf699
 Description Language: en
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_i18n_Translation-en
                  MD5: bce08fbc36fd7b8e3c454f36f0daf699


Reverse Depends: 
  avfs:i386,avfs
  worker,avfs
Dependencies: 
1.0.1-2 - libc6 (2 2.14) libfuse2 (2 2.8.1) fuse (0 (null)) unzip (0 (null)) zip (0 (null)) arj (0 (null)) lha (0 (null))
 zoo (0 (null)) rpm (0 (null)) p7zip (16 (null)) p7zip-full (0 (null)) cdparanoia (0 (null)) 
wget (0 (null)) avfs:i386 (0 (null)) 
Provides: 
1.0.1-2 - 
Reverse Provides:

Install it now.

sudo apt-get install avfs
mountavfs
# Assume MyFile.tar.gz exists in the current directory
ls ~/.avfs/$PWD/MyFile.tar.gz#       
# Alternatively, browse the content in Nautilus, but you need to add a trailing # character by hand to the path 
# (Ctrl-L to access the address bar).
...
cat ~/.avfs/$PWD/MyFile.tar.gz#/README
# another tarball
ls ~/.avfs/$PWD/MyFile2.tar.gz#       
umountavfs

For some reason, avfs sometimes does not work. In the case below, Ubuntu's Archive Manager could still open the archive, so the file may simply be too large for avfs.

brb@T3600 ~/Downloads $ time ls ~/.avfs/$PWD/Homo_sapiens_UCSC_hg19.tar.gz#/
ls: cannot access /home/brb/.avfs//home/brb/Downloads/Homo_sapiens_UCSC_hg19.tar.gz#/nown	exact	1	SingleClassTriAllelic,InconsistentAlleles	2	1000GENOMES,SSMP,	2	A,T,	22.000000,2274.000: Input/output error
ls: cannot access /home/brb/.avfs//home/brb/Downloads/Homo_sapiens_UCSC_hg19.tar.gz#/chr12	25482890	rs544684287	G	A	0	.	molType=genomic;class=single
chr12	25482914	rs558575390	T	G	0	.	m: Input/output error
000,?0.999500,0.000500,??797?chr3?27877637?27877638?rs1478557?0?+?G?G?A
4?rs555100828?0?+?T?T?C
76?chr2?103777623?103777624?rs181283085?0?+?A?A?A
chr12?25482890?rs544684287?G?A?0?.?molType=genomic;class=single?chr12?25482914?rs558575390?T?G?0?.?m
G?A
Homo_sapiens
nown?exact?1?SingleClassTriAllelic,InconsistentAlleles?2?1000GENOMES,SSMP,?2?A,T,?22.000000,2274.000
README.txt
T?C

real	25m51.340s
user	0m0.000s
sys	0m0.003s
brb@T3600 ~/Downloads $ ls ~/.avfs/$PWD/annovar.latest.tar.gz#/
annovar

For archivemount, see Cool User File Systems: ArchiveMount

mkdir -p mntDir                  # the mount point must exist
archivemount files.tgz mntDir
umount mntDir                    # or, as a regular user: fusermount -u mntDir