Curl
Latest revision as of 11:33, 10 February 2024
curl vs wget
- http://daniel.haxx.se/docs/curl-vs-wget.html
- How to Use curl to Download Files From the Linux Command Line
sudo apt-get install curl
For example, the Download link at the National Geographic Travel Photo Contest 2014 works with curl but not with wget. With curl we can name the output file with the -o option. wget's -o means something different (it names the log file); wget's output-file option is -O. With curl we can also use the -O (capital O) option, which writes the output to a local file named like the remote file.
curl \
  http://travel.nationalgeographic.com/u/TvyamNb-BivtNwcoxtkc5xGBuGkIMh_nj4UJHQKuoXEsSpOVjL0t9P0vY7CvlbxSYeJUAZrEdZUAnSJk2-sJd-XIwQ_nYA/ \
  -o owl.jpg
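A minimal sketch of the difference between -o and -O. It uses a local file:// URL so that it runs offline; the directory and file names are made up for illustration. With a real http(s) link the two options behave the same way.

```shell
# Offline sketch: a local file:// URL stands in for the real download link.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'placeholder image data\n' > "$src/photo.jpg"

# -o NAME: write the download to a file name we choose
curl -sS -o "$dst/owl.jpg" "file://$src/photo.jpg"

# -O: write it to a file named like the remote file (photo.jpg) in the
#     current directory, hence the cd inside a subshell
(cd "$dst" && curl -sS -O "file://$src/photo.jpg")

ls "$dst"   # owl.jpg and photo.jpg
```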
Should I Use Curl Or Wget? and curl vs Wget
- The main benefit of using the wget command is that it can be used to recursively download files.
- The curl command lets you use wildcard-style patterns ("URL globbing", e.g. [1-10] or {a,b,c}) to specify the URLs you wish to retrieve. curl also supports more protocols than wget (HTTP, HTTPS, FTP) does.
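A small sketch of curl's URL globbing. The files part1.txt to part3.txt are hypothetical and are served from a local file:// directory so the example runs offline. The quotes keep the shell from touching the brackets, so curl expands them itself, and "#1" in the output name is replaced by the current value of the first glob.

```shell
src=$(mktemp -d)
dst=$(mktemp -d)
for i in 1 2 3; do printf 'part %s\n' "$i" > "$src/part$i.txt"; done

# [1-3] is expanded by curl (not the shell); #1 becomes 1, 2, 3 in the file names
curl -sS "file://$src/part[1-3].txt" -o "$dst/copy_#1.txt"

# Lists work too, e.g. "https://example.com/{index,about}.html"
```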
The wget command can recover when a download fails whereas the curl command cannot.
Actually curl supports resuming an interrupted download too (wget -c, curl -C -). But not all FTP servers support resuming. The following examples show that the resume option in wget/curl works when downloading a file from the NCBI FTP server but not from the Illumina FTP server.
$ wget -c ftp://igenome:*password*@ussd-ftp.illumina.com/Drosophila_melanogaster/Ensembl/BDGP6/Drosophila_melanogaster_Ensembl_BDGP6.tar.gz
--2017-04-13 10:46:16--  ftp://igenome:*password*@ussd-ftp.illumina.com/Drosophila_melanogaster/Ensembl/BDGP6/Drosophila_melanogaster_Ensembl_BDGP6.tar.gz
           => ‘Drosophila_melanogaster_Ensembl_BDGP6.tar.gz’
Resolving ussd-ftp.illumina.com (ussd-ftp.illumina.com)... 66.192.10.36
Connecting to ussd-ftp.illumina.com (ussd-ftp.illumina.com)|66.192.10.36|:21... connected.
Logging in as igenome ... Logged in!
==> SYST ... done.
==> PWD ... done.
==> TYPE I ... done.
==> CWD (1) /Drosophila_melanogaster/Ensembl/BDGP6 ... done.
==> SIZE Drosophila_melanogaster_Ensembl_BDGP6.tar.gz ... 762893718
==> PASV ... done.
==> REST 1706053 ...
REST failed, starting from scratch.
==> RETR Drosophila_melanogaster_Ensembl_BDGP6.tar.gz ... done.
Length: 762893718 (728M), 761187665 (726M) remaining (unauthoritative)

 0% [                                        ] 374,832     79.7KB/s  eta 2h 35m ^C
$ curl -L -O -C - ftp://igenome:*password*@ussd-ftp.illumina.com/Drosophila_melanogaster/Ensembl/BDGP6/Drosophila_melanogaster_Ensembl_BDGP6.tar.gz
** Resuming transfer from byte position 1706053
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  727M    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
curl: (31) Couldn't use REST

$ wget -c ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
--2017-04-13 10:52:02--  ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
           => ‘common_all_20160601.vcf.gz’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 2607:f220:41e:250::7, 130.14.250.10
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|2607:f220:41e:250::7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.
==> PWD ... done.
==> TYPE I ... done.
==> CWD (1) /snp/organisms/human_9606_b147_GRCh37p13/VCF ... done.
==> SIZE common_all_20160601.vcf.gz ... 1023469198
==> EPSV ... done.
==> RETR common_all_20160601.vcf.gz ... done.
Length: 1023469198 (976M) (unauthoritative)

24% [===========================>                           ] 255,800,120 55.2MB/s  eta 15s ^C
$ wget -c ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
--2017-04-13 10:52:11--  ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
           => ‘common_all_20160601.vcf.gz’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 2607:f220:41e:250::7, 130.14.250.10
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|2607:f220:41e:250::7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.
==> PWD ... done.
==> TYPE I ... done.
==> CWD (1) /snp/organisms/human_9606_b147_GRCh37p13/VCF ... done.
==> SIZE common_all_20160601.vcf.gz ... 1023469198
==> EPSV ... done.
==> REST 267759996 ... done.
==> RETR common_all_20160601.vcf.gz ... done.
Length: 1023469198 (976M), 755709202 (721M) remaining (unauthoritative)

47% [++++++++++++++++++++++++++++++========================>  ] 491,152,032 50.6MB/s  eta 12s ^C
$ curl -L -O -C - ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 65  976M   65  639M    0     0  83.7M      0  0:00:11  0:00:07  0:00:04 90.4M^C
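Since resuming depends on the server, a simple retry loop around curl -C - can pick up where the previous attempt stopped. This is a minimal sketch: the function name resume_download and the retry policy are illustrative, not part of any tool. Against a server that refuses REST (like the Illumina example above) it will still fail every attempt.

```shell
# Hypothetical helper. Usage: resume_download URL [MAX_TRIES]
resume_download() {
  url=$1
  max_tries=${2:-5}
  attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    # -C - : resume from the size of the partial local file, if there is one
    # -O   : save under the remote file name, in the current directory
    # -L   : follow redirects; -f: treat HTTP errors as failures
    # -sS  : no progress meter, but still print errors
    if curl -fsSL -C - -O "$url"; then
      return 0
    fi
    echo "resume_download: attempt $attempt of $max_tries failed" >&2
    attempt=$((attempt + 1))
    sleep 2
  done
  return 1
}
```

For example: resume_download ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/VCF/common_all_20160601.vcf.gz 10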
curl man page, supported protocols
https://curl.haxx.se/docs/manpage.html
curl complete guide
wget overwrites the existing file
Use the -N or --timestamping option to turn on time-stamping. Don't re-retrieve files unless newer than local.
wget to specify the output directory
Use the -P prefix or --directory-prefix=prefix option. For example, wget URL -P /tmp or wget URL -P /tmp/ .
Hide progress bar output
curl hide progress bar output on Linux/Unix shell scripts
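In scripts, curl's -s (silent) hides the progress meter but, on its own, also hides error messages; adding -S (--show-error) brings the errors back. A minimal sketch follows; quiet_fetch is a hypothetical helper name, and the file:// URLs in the check are only there so it runs offline. For wget, the corresponding options are -q (quiet) or -nv (non-verbose).

```shell
# Hypothetical helper. Usage: quiet_fetch URL OUTPUT_FILE
# -s hides the progress meter; -S keeps error messages on stderr.
quiet_fetch() {
  curl -sS -o "$2" "$1"
}
```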
wget and username/password
http://www.cyberciti.biz/faq/wget-command-with-username-password/
Download and Un-tar(Extract) in One Step
If we want to avoid saving a temporary file, we can use a single piped command.
curl http://download.osgeo.org/geos/geos-3.5.0.tar.bz2 | tar xvj
# OR
wget http://download.osgeo.org/geos/geos-3.5.0.tar.bz2 -O - | tar jx

# For .gz file
wget -O - ftp://ftp.directory/file.gz | gunzip -c > gunzip.out
See shellhacks.com. Note that the magic part is the wget option "-O -": it writes the document to standard output instead of to a file.
The "-c" option makes gunzip write to standard output (the console). It is not actually needed here: when gunzip reads from a pipe, it writes to standard output anyway.
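The one-step download-and-extract idea can be wrapped in a small function that picks the tar decompression flag from the file suffix, which avoids mismatches such as using z on a .tar.bz2 file. This is a sketch: fetch_and_untar is a hypothetical name, and the suffix table only covers common cases (J needs xz installed).

```shell
# Hypothetical helper. Usage: fetch_and_untar URL DEST_DIR
fetch_and_untar() {
  url=$1
  dest=$2
  mkdir -p "$dest"
  # Choose the decompression flag from the file name
  case $url in
    *.tar.gz|*.tgz) flag=z ;;
    *.tar.bz2)      flag=j ;;
    *.tar.xz)       flag=J ;;
    *)              flag= ;;
  esac
  # -f: do not pipe an HTTP error page into tar; -L: follow redirects
  # "f -" tells tar to read the archive from standard input
  curl -fsSL "$url" | tar -x${flag}f - -C "$dest"
}
```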
Download and execute the script in one step
See Execute bash script from URL. Note that the "-s" option in curl means silent mode.
curl -s https://server/path/script.sh | sudo sh
curl -s http://server/path/script.sh | sudo bash /dev/stdin arg1 arg2
sudo -v && wget -nv -O- https://download.calibre-ebook.com/linux-installer.sh | sudo sh /dev/stdin
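Piping straight into sh runs whatever arrives, even a half-downloaded script. A more cautious sketch saves the script to a temporary file first, so it can be read before it runs and a failed download (curl -f) is never executed. run_remote_script is a hypothetical name; sudo is left out and can be added when the script really needs root.

```shell
# Hypothetical helper. Usage: run_remote_script URL [ARG...]
run_remote_script() {
  script_url=$1
  shift
  script_file=$(mktemp) || return 1
  # -f: fail on HTTP errors instead of saving an error page as the "script"
  if ! curl -fsSL -o "$script_file" "$script_url"; then
    rm -f "$script_file"
    return 1
  fi
  # This is the place to inspect the script, e.g. less "$script_file"
  sh "$script_file" "$@"
  status=$?
  rm -f "$script_file"
  return "$status"
}
```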
Download and install binary software using sudo
One example (Calibre):
sudo -v && wget -nv -O- https://raw.githubusercontent.com/kovidgoyal/calibre/master/setup/linux-installer.py | \
  sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"
Note that in wget the option "-O-" means writing to standard output (so the file from the URL is NOT written to the disk) and "-nv" means non-verbose output (--no-verbose).
If the option "-O-" is not used, it is better to add the "-N" option so that wget overwrites an existing file (when the remote one is newer) instead of saving a new copy such as file.1.
Another example is adding the GPG key.
# https://docs.docker.com/install/linux/docker-ce/ubuntu/
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
See the Logging and Download options in wget's manual.
-O file --output-document=file The documents will not be written to the appropriate files, but all will be concatenated together and written to file. If - is used as file, documents will be printed to standard output, disabling link conversion. (Use ./- to print to a file literally named -.)
curl and POST request
- http://superuser.com/questions/149329/what-is-the-curl-command-line-syntax-to-do-a-post-request
- https://learn.adafruit.com/raspberry-pi-physical-dashboard?view=all (the original post I saw)
- http://conqueringthecommandline.com/book/curl
curl and proxy
How to use curl command with proxy username/password on Linux/ Unix
Website performance
httpstat – A Curl Statistics Tool to Check Website Performance
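curl can report similar timings on its own through its --write-out (-w) variables. A rough sketch follows; curl_timing is a hypothetical name and the chosen fields are just one selection (time to name lookup, connect, first byte, total, and bytes downloaded).

```shell
# Hypothetical helper. Usage: curl_timing URL
# The body is discarded (-o /dev/null); only the timing line is printed.
curl_timing() {
  curl -sS -o /dev/null \
    -w 'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total} bytes=%{size_download}\n' \
    "$1"
}
```

For example: curl_timing https://wttr.in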
wget/curl a file with correct name when redirected
wget --trust-server-names <url>
# Or
wget --content-disposition <url>
# Or
curl -JLO <url>
wget to download a folder
wget -A pdf,jpg,PDF,JPG -m -p -E -k -K -np http://site/path/
wget -r ftp://server-address.com/directory
Download a website
- 6 Tools to Download an Entire Website for Offline Reading
- ArchiveBox - Open-source self-hosted web archiving.
wget
- http://linux.about.com/od/commands/a/Example-Uses-Of-The-Command-Wget.htm
- https://www.gnu.org/software/wget/manual/wget.html
- 11 Best Free Website Downloader Software For Windows
- How To Download A Website With Wget The Right Way
- Downloading an Entire Web Site with wget from linuxjournal
- How to Use wget, the Ultimate Command Line Downloading Tool. Windows users can use Cygwin or Windows 10’s Ubuntu’s Bash shell.
- How to ignore specific type of files to download in wget?
To download a copy of a complete web site, use the recursive option ('-r'). By default it will go up to five levels deep. You can change the default level by using the '-l' option.
To download all files linked to in the documents and enable complete offline viewing, add the '-p' and '--convert-links' options. Instead of displaying the progress messages on the terminal, you can save them to a log file with the -o option.
wget -p --convert-links -r -l2 linux.about.com -o logfile
wget -p --convert-links -r -l1 https://csgillespie.github.io/efficientR  # create csgillespie/efficientR
wget -p --convert-links -r -l2 --reject WMA,doc,mp4,ppt,pdf,zip,exe,vcf https://xxx.xxx  # Exclude certain file types. Takes only 10 sec, for example.
2 Ways to Download Files From Linux Terminal
wget -m --convert-links --page-requisites website_address
- -m (--mirror): turns on recursion and time-stamping with infinite recursion depth
- --convert-links: links are converted so that internal links point to the downloaded resources instead of the web
- --page-requisites: downloads additional things like style sheets so that the pages look better offline
Note
- The index.html file downloaded using the above command still differs from the one on the website (in its hyperlinks). It seems this has nothing to do with the --convert-links or -m options.
- We can use wget to download the original index.html and place it in the downloaded website folder. The downloaded index.html file then looks perfect in the browser, so we can use this approach to fix the index.html file. (Cf. it seems it does not work if we place the index.html file inside a folder downloaded using HTTrack.)
- --no-parent (-np) if you want to avoid downloading folders and files above the current level.
- The links in css/html files will be changed, so they are not the same as the original.
HTTrack Website Copier
- https://www.httrack.com/
- WebHTTrack Website Copier! Install it with
  sudo apt install webhttrack
  On Ubuntu, the app is a web application (http://HOSTNAME:8080). Typing 'webhttrack' launches it in our default browser. See Grabbing Websites with WebHTTrack in Linux-magazine.
- When I ran it on Ubuntu, it started an HTTP server on port 8080, and the interface is in a browser. After the download finishes, we can browse the mirrored website using the same HTTP server.
- Use this tip to exclude some file types we don't need to download. This can save lots of time if we have big files from *.zip, *.ZIP, *.vcf, *.WMA, *.tar, *.tar.gz, *.ova, *.mp4, *.exe, *.jar, *.ogg, *.pdf, *.ppt.
Steps
- Select an existing project or create a new project. Click Next.
- Action: Update existing download. Add a URL. Click "Set Options...".
- Click "Scan Rules" and enter the following (one long line) in the box. Click OK.
-*.ova -*.doc -*.mp4 -*.ppt -*.pdf -*.WMA -*.zip -*.exe +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/*
- Click Next.
- Click Start.
Note
- It looks like we can modify the local directory name, so we can keep a timestamped backup.
- It seems that even if I choose 'update', it still downloads all files again?
- It seems it will change *.htm file names to *.html.
httrack: command-line program
- http://www.httrack.com/html/fcguide.html, http://www.httrack.com/html/httrack.man.html
- WebHTTrack Website Copier
httrack http://www.documentfoundation.org -* +*.htm* +*.pdf -O /home/floeff/websites
- Create a Local Copy of a Website with HTTrack
- How to copy website using HTTrack
- How to use HTTrack on Mac Terminal
Save Web Pages As Single HTML Files With Monolith
Save Web Pages As Single HTML Files For Offline Use With Monolith (Console)
Internet application: wttr.in, check weather from console/terminal
- https://github.com/chubin/wttr.in. https://wttr.in is located in Germany.
- The weather and visualization backend is wego (both look very similar). The weather data is based on forecast.io which leads to darksky.net.
- Display Weather Forecast In Your Terminal With Wttr.in
$ curl wttr.in
$ curl wttr.in/?m          # check to use the metric system
$ curl wttr.in/washington
$ curl wttr.in/olney       # not sure about which olney
$ curl wttr.in/~olney      # show the exact location at the bottom
$ curl wttr.in/taipei
- 10 Ways to Check the Weather From Your Linux Desktop
Internet application: cheat.sh
See man -> Cheat.sh.
Cookies
My automatic NYT crossword downloading script
Files downloaded from a browser and wget
The same file downloaded through a browser and through the wget command has a different file size and behavior.
$ ls -lh biotrip*.gz
-rw-r--r-- 1 brb brb 198M May 15 09:11 biotrip_0.1.0_may19.tar.gz
-rw-rw-r-- 1 brb brb 195M May 14 16:57 biotrip_0.1.0.tar.gz
$ file biotrip_0.1.0_may19.tar.gz   # downloaded from a browser (chrome browser, Mac or Linux)
biotrip_0.1.0_may19.tar.gz: POSIX tar archive
$ file biotrip_0.1.0.tar.gz         # downloaded from the wget command
biotrip_0.1.0.tar.gz: gzip compressed data, from HPFS filesystem (OS/2, NT)
$ tar xzvf biotrip_0.1.0_may19.tar.gz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
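The browser copy is already a plain tar archive even though its name ends in .tar.gz, so tar's z flag fails on it. (One possible cause, not verified here, is the server sending the file with a Content-Encoding: gzip header, which browsers decode on the fly.) A defensive sketch checks whether the file really is gzip data before choosing the tar flags; extract_tarball is a hypothetical name.

```shell
# Hypothetical helper. Usage: extract_tarball ARCHIVE [DEST_DIR]
# gzip -t exits 0 only for valid gzip data, so it decides which tar flags to use.
extract_tarball() {
  archive=$1
  dest=${2:-.}
  mkdir -p "$dest"
  if gzip -t "$archive" 2>/dev/null; then
    tar -xzf "$archive" -C "$dest"   # really gzip-compressed (like the wget copy)
  else
    tar -xf "$archive" -C "$dest"    # plain tar despite the name (like the browser copy)
  fi
}
```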