Curl


curl vs wget

sudo apt-get install curl

For example, the Download link at the National Geographic Travel Photo Contest 2014 works with curl but not with wget: curl with the -o option succeeds, but wget with -o does not in this case. Note that curl also has a -O (capital O) option, which writes the output to a local file named like the remote file.

curl \
 http://travel.nationalgeographic.com/u/TvyamNb-BivtNwcoxtkc5xGBuGkIMh_nj4UJHQKuoXEsSpOVjL0t9P0vY7CvlbxSYeJUAZrEdZUAnSJk2-sJd-XIwQ_nYA/ \
 -o owl.jpg
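
A minimal sketch of the -O form (the URL here is hypothetical; curl names the file after the last path segment, in this case owl.jpg):

# -O writes to ./owl.jpg, taking the name from the URL
curl -O http://example.com/images/owl.jpg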

Should I Use Curl Or Wget? and curl vs Wget

  • The main benefit of using the wget command is that it can be used to recursively download files.
  • The curl command lets you use URL globbing (wildcard-style ranges) to specify the URLs you wish to retrieve, as sketched after this list. curl also supports many more protocols than wget, which handles only HTTP, HTTPS, and FTP.
  • The wget command can recover when a download fails whereas the curl command cannot.
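
A minimal sketch of curl's URL globbing (the host and file names are hypothetical):

# Fetch file1.txt through file5.txt in one command; #1 expands to the matched range value
curl "http://example.com/file[1-5].txt" -o "file_#1.txt"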

Actually, curl can resume downloads too (with the -C - option). But not every FTP server supports resuming. The following examples show that the resume option in wget (-c) and curl (-C -) works when downloading a file from the NCBI FTP server but not from the Illumina FTP server.

$ wget -c ftp://igenome:[email protected]/Drosophila_melanogaster/Ensembl/BDGP6/Drosophila_melanogaster_Ensembl_BDGP6.tar.gz
--2017-04-13 10:46:16--  ftp://igenome:*password*@ussd-ftp.illumina.com/Drosophila_melanogaster/Ensembl/BDGP6/Drosophila_melanogaster_Ensembl_BDGP6.tar.gz
           => ‘Drosophila_melanogaster_Ensembl_BDGP6.tar.gz’
Resolving ussd-ftp.illumina.com (ussd-ftp.illumina.com)... 66.192.10.36
Connecting to ussd-ftp.illumina.com (ussd-ftp.illumina.com)|66.192.10.36|:21... connected.
Logging in as igenome ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /Drosophila_melanogaster/Ensembl/BDGP6 ... done.
==> SIZE Drosophila_melanogaster_Ensembl_BDGP6.tar.gz ... 762893718
==> PASV ... done.    ==> REST 1706053 ... 
REST failed, starting from scratch.
 
==> RETR Drosophila_melanogaster_Ensembl_BDGP6.tar.gz ... done.
Length: 762893718 (728M), 761187665 (726M) remaining (unauthoritative)
 
 0% [                                                                                                                   ] 374,832     79.7KB/s  eta 2h 35m ^C
 
$ curl -L -O -C - ftp://igenome:[email protected]/Drosophila_melanogaster/Ensembl/BDGP6/Drosophila_melanogaster_Ensembl_BDGP6.tar.gz
** Resuming transfer from byte position 1706053
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  727M    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
curl: (31) Couldn't use REST

$ wget -c ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
--2017-04-13 10:52:02--  ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
           => ‘common_all_20160601.vcf.gz’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 2607:f220:41e:250::7, 130.14.250.10
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|2607:f220:41e:250::7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /snp/organisms/human_9606_b147_GRCh37p13/VCF ... done.
==> SIZE common_all_20160601.vcf.gz ... 1023469198
==> EPSV ... done.    ==> RETR common_all_20160601.vcf.gz ... done.
Length: 1023469198 (976M) (unauthoritative)
 
24% [===========================>                                                                                       ] 255,800,120 55.2MB/s  eta 15s    ^C
 
$ wget -c ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
--2017-04-13 10:52:11--  ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
           => ‘common_all_20160601.vcf.gz’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 2607:f220:41e:250::7, 130.14.250.10
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|2607:f220:41e:250::7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /snp/organisms/human_9606_b147_GRCh37p13/VCF ... done.
==> SIZE common_all_20160601.vcf.gz ... 1023469198
==> EPSV ... done.    ==> REST 267759996 ... done.    
==> RETR common_all_20160601.vcf.gz ... done.
Length: 1023469198 (976M), 755709202 (721M) remaining (unauthoritative)
 
47% [++++++++++++++++++++++++++++++========================>                                                            ] 491,152,032 50.6MB/s  eta 12s    ^C

$ curl -L -O -C - ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 65  976M   65  639M    0     0  83.7M      0  0:00:11  0:00:07  0:00:04 90.4M^C

curl man page, supported protocols

https://curl.haxx.se/docs/manpage.html

wget overwrites the existing file

Use the -N or --timestamping option to turn on time-stamping, so wget does not re-retrieve a file unless the remote copy is newer than the local one.
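
For example (the URL is hypothetical):

# Skipped if the local copy is at least as new as the remote one
wget -N https://example.com/data.csv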

wget and username/password

http://www.cyberciti.biz/faq/wget-command-with-username-password/
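
A minimal sketch (the credentials and URLs are hypothetical); note that passwords given on the command line are visible in the shell history and process list:

# HTTP or FTP authentication via options
wget --user=alice --password='secret' https://example.com/protected/file.zip
# Or embed the credentials in an FTP URL
wget ftp://alice:[email protected]/file.zip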

Download and un-tar (extract) in one step

If we want to avoid saving a temporary file, we can use a single piped statement.

curl http://download.osgeo.org/geos/geos-3.5.0.tar.bz2 | tar xvj
# OR
wget http://download.osgeo.org/geos/geos-3.5.0.tar.bz2 -O - | tar jx

# For a .gz file (placeholder URL)
wget -O - ftp://server/path/file.gz | gunzip -c > gunzip.out

See shellhacks.com. The magic part is the wget option "-O -", which writes the document to standard output instead of to a file.

The "-c" in gunzip is to have gzip output to the console. PS. it seems not necessary to use the "-c" option.

Download and execute the script in one step

See Execute bash script from URL. Note that the "-s" option runs curl in silent mode.

curl -s https://server/path/script.sh | sudo sh

curl -s http://server/path/script.sh | sudo bash /dev/stdin arg1 arg2

sudo -v && wget -nv -O- https://download.calibre-ebook.com/linux-installer.sh | sudo sh /dev/stdin

curl and POST request
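
A couple of minimal sketches (the URL and payloads are hypothetical):

# Form-encoded POST; -d implies the POST method
curl -d 'name=alice&age=30' https://example.com/api/users
# JSON POST with an explicit Content-Type header
curl -H 'Content-Type: application/json' -d '{"name": "alice"}' https://example.com/api/users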

curl and proxy

How to use curl command with proxy username/password on Linux/ Unix
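
A minimal sketch (the proxy host, port, and credentials are hypothetical):

# -x/--proxy sets the proxy; -U/--proxy-user supplies the proxy credentials
curl -x http://proxy.example.com:3128 -U alice:secret https://example.com/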

Website performance

httpstat – A Curl Statistics Tool to Check Website Performance
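
httpstat presents curl's timing variables; you can also read them directly with curl's -w option (the URL is hypothetical):

# Print DNS, connect, time-to-first-byte, and total times
curl -s -o /dev/null -w 'DNS: %{time_namelookup}s connect: %{time_connect}s TTFB: %{time_starttransfer}s total: %{time_total}s\n' https://example.com/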

wget/curl a file with correct name when redirected

wget --trust-server-names <url>   # name the file after the final redirected URL
# Or
wget --content-disposition <url>  # honor the Content-Disposition header
# Or
curl -JLO <url>                   # -J: use server-suggested name, -L: follow redirects, -O: write to a file

wget to download a folder

https://stackoverflow.com/questions/8755229/how-to-download-all-files-but-not-html-from-a-website-using-wget

# -A: accept only these suffixes, -m: mirror (recursion + time-stamping),
# -p: page requisites, -E: adjust .html extensions, -k: convert links for
# local viewing, -K: back up files before converting, -np: do not ascend to the parent directory
wget -A pdf,jpg,PDF,JPG -m -p -E -k -K -np http://site/path/

wget to download a website

To download a copy of a complete web site, use the recursive option ('-r'). By default it will go up to five levels deep; you can change the depth with the '-l' option.

All files linked to in the documents are downloaded to enable complete offline viewing (the '-p' and '--convert-links' options). Instead of having the progress messages displayed on standard output, you can save them to a log file with the -o option.

wget -p --convert-links -r -l2 linux.about.com -o logfile
wget -p --convert-links -r -l1 https://csgillespie.github.io/efficientR # create csgillespie/efficientR

Internet application: wttr.in, check weather from console/terminal
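
For example:

curl wttr.in           # weather for your current (geo-IP) location
curl wttr.in/Tokyo     # weather for a specific city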

Internet application: cheat.sh

See man -> Cheat.sh.
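
For example:

curl cheat.sh/tar      # cheat sheet for the tar command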