PDF: Difference between revisions
Latest revision as of 09:59, 22 October 2024
Issues
- In one instance, Adobe Acrobat was able to present a heatmap correctly, while Mac's Preview and Linux's Evince produced a plot with multiple vertical lines.
Ubuntu PDF viewer
Linux PDF Viewer: Best 15 PDF Readers Reviewed for Linux Users
- Okular (install through app store, annotation function, trim margins/selection) Best
- How to use annotations, How to annotate documents using Okular. Click Tools -> Review to display the annotation tools.
- Click 'Reviews' icon to see a list of annotations that were made.
- Adobe Reader
- Qoppa PDF Studio
- Foxit Reader (installed to ~/opt/foxitsoftware/foxitreader by default). It freezes my Pop!_OS 20.04 system.
- MuPDF (lightweight, seems no thumbnail option, no GUI interface)
- XPDF
- Qpdfview
- GNU GV
- Zathura
- Atril Document Reader
- ePDF Viewer
- Calibre
- Google Drive
- Master PDF Editor
Change the default viewer
Right-click (pdf) -> Properties -> Open With -> Okular (or anything) -> Set as default.
PDF reader
The default viewer, Evince, seems slow when I view an issue of ODROID Magazine.
MuPDF is good at speed. Okular is good at annotation.
I installed and tried MuPDF (github source code). It seems faster, and I don't see blank pages when I view an issue of ODROID Magazine. In terms of speed, mupdf >> xpdf >> okular >> Evince.
To make it the default program for opening PDF files, right-click a PDF file and select Properties. Go to the Open With tab and choose your viewer.
sudo apt-get install mupdf
Keyboard shortcuts for mupdf (see man mupdf or http://mupdf.com/docs/manual). Note that these are case-sensitive.
W - fit to width
H - fit to height
L - rotate page left (counter-clockwise)
R - rotate page right (clockwise)
12g - go to page 12
>,< - go to the next or previous page
+,- - zoom in or out
/ - search for text
n,N - find the next or previous search result
h,j,k,l - scroll page left, down, up, or right
Tip: to copy text, select it with the right mouse button, then press Ctrl+C. This does not seem to work every time.
Other pdf viewer choices are
- acroread
- Allows custom colors for the page background and document text.
- The custom colors work well on a MacBook Pro (2880 x 1440). Background color #494949 and text color #494949.
- xpdf. old-fashioned. slow.
- evince. slow.
- okular (KDE/Qt application)
- Annotation tools such as the highlighter are under Tools > Review (F6).
- The background color can be changed. The 'invert colors' option works, but the result is not good on a Dell U2312HM. Other options such as 'dark & light colors' let us set individual colors for the background (say #494949) and the text.
- Not as fast as mupdf. It can open a variety of ebook formats.
- It should work on macOS, but KDE needs to be installed.
- Can show file properties, e.g. Page Size (e.g. 50x36 in), Creator (e.g. PowerPoint), Producer (e.g. Mac OS X Quartz PDFContext), PDF version (e.g. 1.3).
- kpdf
- gv
- qpdfview. slow. Used by Raspbian (June 2018).
- Foxit or PDF-XChange Viewer (needs Wine)
Browsers
- Why You Don't Need Adobe Reader (And What to Use Instead)
- Use Adobe Acrobat chrome extension to edit PDF files (Aug 2021)
PDF crop
6 Best PDF Page Cropping Tools For Linux
krop
It is easy to use and works fine on Ubuntu 20.04 (I am using 0.5.1 though the current version is 0.6.0).
http://arminstraub.com/software/krop
Install manually
$ sudo apt install python3-poppler-qt5 python3-pypdf2 python3-pip
$ pip3 install https://github.com/arminstraub/krop/archive/v0.6.0.tar.gz --user
Successfully built krop
Installing collected packages: krop
WARNING: The script krop is installed in '/home/brb/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
I can add ~/.local/bin to the global PATH. But how do I make it available through Activities?
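One possible way to handle both points, as a sketch under assumptions: the helper names add_to_path_once and write_desktop_entry are hypothetical, ~/.profile is assumed to be read at login, and the desktop is assumed to pick up launchers from ~/.local/share/applications (the usual freedesktop.org location). I have not checked this against krop's own packaging.

```shell
# Hypothetical helper: append a PATH line to a profile file only if it is missing.
add_to_path_once() {
  dir="$1"
  profile="$2"
  line="export PATH=\"$dir:\$PATH\""
  # -F: fixed string, -x: match the whole line, -q: quiet
  grep -Fxq "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}

# Hypothetical helper: write a minimal desktop entry so a launcher can list the program.
write_desktop_entry() {
  name="$1"
  exec_path="$2"
  apps_dir="$3"
  mkdir -p "$apps_dir"
  cat > "$apps_dir/$name.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=$name
Exec=$exec_path %f
Terminal=false
Categories=Office;
EOF
}
```

Usage would look like add_to_path_once "$HOME/.local/bin" "$HOME/.profile" followed by write_desktop_entry krop "$HOME/.local/bin/krop" "$HOME/.local/share/applications". Logging out and back in may be needed before the entry shows up.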
pdfcrop
pdfcrop (briss is better)
https://askubuntu.com/questions/124692/command-line-tool-to-crop-pdf-files
sudo apt-get install texlive-extra-utils
pdfcrop input.pdf output.pdf # no margins; works but seems too tight
pdfcrop --margins 5 input.pdf output.pdf # crop pdf but keep 5 bp on each side of the page
pdfcrop --margins '5 10 20 30' input.pdf output.pdf # left, top, right and bottom margins of 5, 10, 20, and 30 bp
# To actually crop something away, use negative values in the --margins argument.
# For example, to crop 50 bp from the left, top, right, and bottom (in this order):
pdfcrop --margins '-50 -50 -50 -50' input.pdf output.pdf
One problem I found (for newer PDFs with metadata) is that --margins first removes the entire margin and only then applies the adjustment. This can cause some pages to be chopped.
briss
This Java program gives me better control over cropping.
- Download the file briss-0.9.tar.gz (8.7 MB) and extract it
- Run java -jar briss-0.9.jar
- Load the pdf file. It will ask which pages to exclude from merging (this function does not work). Click 'Cancel' to continue.
- It will automatically create two rectangle areas: one for odd (left) pages and the other for even (right) pages
- Now we work on the left page first. Enlarge the selection to suit our need. Then right click & choose 'Select/Deselect rectangle' (a dash line will be added to the edges of the rectangle) and then 'Copy rectangles'.
- Work on the right page. Right click and choose 'Delete rectangle'. Then 'Paste rectangles'.
- Now we can click 'Action -> Preview' to preview the result. If we are satisfied with the result, we can click 'Action -> Crop PDF'. Done.
Adobe Acrobat
Crop a page using the Crop tool
- Tools > Edit PDF
- Crop Pages
- Drag a rectangle on the page
- Double-click inside the cropping rectangle
- set the page range or select All under Page Range.
- OK to crop the page or pages.
pdftk
Extract pages
pdftk oldfile.pdf cat 3-8 output newfile.pdf
pdftk oldfile.pdf cat 5 9 11 output newfile.pdf
Remove certain pages
https://www.linux.com/learn/manipulating-pdfs-pdf-toolkit
sudo apt install pdftk
# remove pages 10 to 25 from a PDF file
pdftk myDocument.pdf cat 1-9 26-end output removedPages.pdf
# remove the last page
pdftk infile.pdf cat 1-r2 output outfile.pdf
# remove the last 2 pages
pdftk infile.pdf cat 1-r3 output outfile.pdf
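The commands above all work by keeping the complement of the pages to drop. A small sketch of that arithmetic follows; the function name pdftk_keep_ranges is hypothetical, it only prints the range arguments rather than calling pdftk, and it assumes the removed block does not include the final page (for that case the r-notation shown above is the better fit).

```shell
# Hypothetical helper: print the pdftk "cat" ranges that keep everything
# except pages first..last (1-based, inclusive).
pdftk_keep_ranges() {
  first="$1"
  last="$2"
  ranges=""
  # Pages before the removed block, if any.
  if [ "$first" -gt 1 ]; then
    ranges="1-$((first - 1))"
  fi
  # Pages after the removed block; "end" is pdftk's keyword for the last page.
  ranges="${ranges:+$ranges }$((last + 1))-end"
  printf '%s\n' "$ranges"
}
```

It could then be used as pdftk myDocument.pdf cat $(pdftk_keep_ranges 10 25) output removedPages.pdf.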
Rotate using pdftk
First I convert jpg files to pdf using ImageMagick.
convert *.jpg INPUT.pdf
Then I install pdftk and follow this to do a rotation.
$ sudo apt install snapd
$ sudo snap install pdftk
# Suppose I want to rotate pages 1 to 2.
$ /snap/bin/pdftk INPUT.pdf rotate 1-2west output OUTPUT.pdf
PDF Mix Tool
- https://gitlab.com/scarpetta/pdfmixtool
- PDF Mix Tool 1.0 Released With Overhauled Interface, PDF Metadata Editing And Qt6 Support
Open pdf in mobile (iOS/Android) browsers
- iOS browsers can preview PDF files.
- Android browsers cannot preview PDF files (tested on Chrome, Brave, Opera).
- The only setup I tested successfully is the Firefox Nightly build + Android PDF.js (made by Manuel Reimer). PS. Regular Firefox on Android does not allow people to install arbitrary add-ons, so we need to install the Firefox Nightly build. We also need to follow the instructions to create an AMO account. ... Firefox settings -> Advanced -> Custom Add-on collection. Enter the ID (5149765) and collection name (android). Now go back to Advanced -> Add-ons. We have a list of add-ons to install from. Click '+' to enable the individual add-ons we like. How does the Firefox browser link to my AMO account?
- kiwi browser. Haven’t tried yet.
PDF highlight and annotation
Evince
How to Annotate PDFs in Linux (Beginner's Guide)
Okular
Install Okular by
sudo apt-get install okular
To highlight a line, press F6 (Tools -> Review) to turn on the annotation toolbar (it is shown on the left-hand side of the document). You can then click
- the 4th icon to highlight a line (it may not select exactly the text we want, but when it works the result is nice)
- the last icon to draw an ellipse or a rectangle (to change from an ellipse to a rectangle, click Settings -> Configure Okular... -> Annotation)
Another method is to use a Windows program and run it using Wine. See the discussion here.
Android & iOS
Xodo Free. Cross platform.
How to convert pdf to image on Linux command line
How to convert pdf to image on Linux command line. I got an error when I used the convert command; see ImageMagick security policy 'PDF' blocking conversion for a solution by editing the file </etc/ImageMagick-6/policy.xml> on my Ubuntu 20.04.
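The policy edit mentioned above can be sketched as a one-line substitution. This assumes the blocking rule has the form <policy domain="coder" rights="none" pattern="PDF" /> and that changing rights to "read|write" is the intended fix; the function name allow_pdf_coder is hypothetical. A .bak copy of the file is kept.

```shell
# Hypothetical helper: relax the PDF coder rule in an ImageMagick policy file.
# A backup is written next to the file with a .bak suffix.
allow_pdf_coder() {
  policy_file="$1"
  sed -i.bak 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' "$policy_file"
}
```

On the system file this has to run as root, for example from a root shell: allow_pdf_coder /etc/ImageMagick-6/policy.xml. Rules for other patterns (PS, EPS, ...) are left untouched.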
Convert PDF to Word
The 7 Best PDF-to-Word Converters for Linux. Adobe reader, ONLYOFFICE, Sejda PDF Desktop, PDF24, Smallpdf, Okular, PDF Studio.
Merge multiple pdf files into one pdf file
https://stackoverflow.com/questions/2507766/merge-convert-multiple-pdf-files-into-one-pdf
pdfunite in-1.pdf in-2.pdf in-n.pdf out.pdf
Arrange, merge, split, rotate, crop
PDFArranger: Merge, Split, Rotate, Crop Or Rearrange PDF Documents (PDF-Shuffler Fork)
https://github.com/jeromerobert/pdfarranger
Editing
- Download Master PDF Editor 4 For Linux (Free To Use Version)
- Xournal, Handwritten Notes And PDF Annotation Tool Xournal++ Update Brings New Floating Toolbox
- PDF Arranger 1.7.0 Released With New Features And Enhancements Jan 2021
- Merging and preserving bookmarks
- Edit PDFs on Linux with these open source tools
TOC/table of contents
- How to create clickable table of contents in a PDF?
- Preview Tip: Making a linked Table of Contents
- PDFOutliner
Insert links in a document
How to Create Internal Links in PDFs with Adobe Acrobat
Print scale
Print > Scale > slce to page
Print multiple pages per sheet: pdfnup
The program is similar to psnup.
sudo apt install texlive-extra-utils
Search
sudo apt install pdfgrep # or brew install on macOS
pdfgrep 'pattern' *.pdf
Extract tables from pdf
Split view
It is useful if we want to compare two pages side by side.
- Use split-window view from Adobe reader.
- How to compare two PDF documents side by side from foxit (Windows, mac, Linux).
- Using browsers
Optimize for mobile device
Compress PDF files
The Best Ways to Compress PDFs for Free
Security
- Can PDFs contain viruses? by Adobe
- VirusTotal
- Configure Acrobat not to launch non-PDF attachments with external applications.
- Adjust or disable JavaScript in Acrobat to further protect against vulnerabilities.
- Use Adobe cloud storage for your PDF storage (OneDrive also scans files on upload)
- The 6 Best Free Online Virus Scanners of 2023
- VirusTotal
- MetaDefender Cloud
- Avira
- Jotti's Malware Scan
- Kaspersky VirusDesk
- FortiGuard Online Scanner
- Here are some steps you can take to check if a PDF file is clean and free of viruses:
- Scan the file: Before opening the PDF file, you can use your antivirus software to scan it for viruses. Right-click on the file and select the option to scan it with your antivirus program.
- Use an online virus scanner: There are several online virus scanners that allow you to upload a file and scan it for viruses. Some popular options include VirusTotal, Jotti's Malware Scan, and Metadefender.
- How a pdf file can have a virus
- A PDF file can contain a virus or other malicious code in several ways. One way is by exploiting vulnerabilities in the software used to view the PDF file. For example, if there is a security flaw in a PDF reader, a specially crafted PDF file could exploit that flaw to execute malicious code on the user's computer.
- Another way a PDF file can contain a virus is through embedded scripts or macros. PDF files can contain JavaScript code or other types of scripts that can be executed when the file is opened. If the script contains malicious code, it could infect the user's computer with a virus.
- Finally, a PDF file could contain an embedded file, such as an executable program, that is automatically launched when the PDF is opened. If the embedded file is malicious, it could infect the user's computer with a virus.
- Does Adobe Reader have built-in protection to detect problems in a PDF file? Adobe Reader has built-in security features to help protect your computer from malicious PDF files. These features include:
- Sandboxing: Adobe Reader uses a technique called sandboxing to isolate the PDF reader from the rest of the system. This means that even if a malicious PDF file is able to exploit a vulnerability in the reader, it will be contained within the sandbox and unable to harm the rest of the system.
- Protected View: When you open a PDF file from an untrusted source, such as the internet or an email attachment, Adobe Reader can open it in Protected View. This is a read-only mode that restricts access to certain features and prevents potentially dangerous content from being executed.
- Enhanced Security: Adobe Reader has an Enhanced Security feature that can be enabled to provide additional protection against malicious PDF files. This feature restricts certain actions, such as accessing the internet or running JavaScript, that could be used by a malicious PDF file to harm your computer.
- Printing a PDF file to a new PDF file using a printer setting can help to remove some potentially malicious content from the original file. This process is sometimes referred to as "refrying" a PDF. When you print a PDF to a new PDF, the content is essentially flattened and any interactive elements, such as form fields, multimedia, or JavaScript, are removed. This can help to mitigate some security risks associated with opening a potentially malicious PDF file.
- However, it's important to note that this process is not foolproof and may not remove all potentially malicious content from the original file. Additionally, refrying a PDF can result in a loss of quality or changes to the layout and formatting of the document.
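A rough first look for the embedded scripts, auto-launched content, and embedded files described above is to count a few PDF name tokens in the raw file. This is only a sketch: the function name pdf_keyword_counts and the token list are my own choices, tokens hidden inside compressed streams are missed, and a nonzero count also occurs in harmless files (forms, for example). It does not replace an antivirus scan.

```shell
# Hypothetical helper: count PDF name tokens that often accompany active content.
# It scans raw bytes only, so tokens inside compressed streams are not seen.
pdf_keyword_counts() {
  pdf_file="$1"
  for key in /JavaScript /JS /OpenAction /Launch /EmbeddedFile; do
    # -a: treat binary data as text; -o: print every match on its own line.
    count=$( { grep -a -o -e "$key" "$pdf_file" || true; } | wc -l | tr -d ' ')
    printf '%s %s\n' "$key" "$count"
  done
}
```

Calling pdf_keyword_counts suspicious.pdf prints one "token count" pair per line; a file where every count is 0 has none of these tokens in plain sight.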
Adobe reader
Close the Tools pane in Acrobat Reader DC
Sign/signature
- How to Sign a PDF: 6 Ways to Secure Electronic Signatures
- service.cancer.gov -> Search pdf signature -> Digitally Sign a Document in Adobe Acrobat or Reader
- Open the file in Adobe Acrobat. Click the Certificates icon on the RHS (may need to use More Tools to add the icon to the side bar). A new "Certificates" bar should appear at the top below the tool bar. Click "Digital Sign" icon and follow the instruction on the screen to do the rest (select an area, select a digital ID, continue, sign, save, replace, PIN, OK).
OCR
Export to Excel. Acrobat PDF converter automatically extracts and formats the data into editable text thanks to optical character recognition (OCR).
OCR
- OCRmyPDF. OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
- 9 Best Free and Open Source OCR Tools
Print a text file with line numbers
How To Add Line Numbers To Text Files On Linux.
When printing R source code, we should keep only the code starting at the function definition, because that is how RStudio displays it.
# 1. Using 'nl' command
$ nl -b a file.txt
# 2. Using 'cat' command
$ cat -n file.txt
# 3. Using 'awk' command
$ awk 'BEGIN{i=1} /.*/{printf "%d. %s\n",i,$0; i++}' file.txt
# 4. Using 'sed' command
$ sed '/./=' file.txt | sed '/./N; s/\n/ /'
# 5. Using 'less' command
$ less -N file.txt
Turn web pages into PDF
Command line in Linux
How to Convert a Web Page to a PDF File or Images in Linux. wkhtmltopdf
How to turn web pages into PDFs using 'google-chrome'
- How to Create PDF of Webpage Using Google Chrome Headless. pagedown::chrome_print()
google-chrome --headless --disable-gpu --print-to-pdf=file1.pdf http://www.example.com/
- How to turn web pages into PDFs with Puppeteer and NodeJS
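For more than one page, the headless command above can be wrapped in a loop. A minimal sketch, assuming a text file with one URL per line; the function name print_urls_to_pdf, the page-N.pdf naming, and the CHROME override variable are all hypothetical.

```shell
# Hypothetical helper: print each URL listed in a file (one per line) to
# page-1.pdf, page-2.pdf, ... in out_dir, using the headless flags shown above.
# CHROME may be set to another browser binary; google-chrome is the default.
print_urls_to_pdf() {
  list_file="$1"
  out_dir="$2"
  n=0
  mkdir -p "$out_dir"
  while IFS= read -r url; do
    # Skip blank lines.
    [ -n "$url" ] || continue
    n=$((n + 1))
    # stdin is redirected so the browser cannot consume the URL list.
    "${CHROME:-google-chrome}" --headless --disable-gpu \
      --print-to-pdf="$out_dir/page-$n.pdf" "$url" < /dev/null
  done < "$list_file"
}
```

For example, print_urls_to_pdf urls.txt ./pdfs would write one PDF per non-empty line of urls.txt.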
Print website to PDF online
- The problem with using 'google-chrome' is that I don't have much control over the output.
- https://webtopdf.com/ (it works great when I test this page. It turns the website into a 44-page PDF, though it lost the table of contents). It has lots of options like zoom, margins, ...
- To preserve the table of contents, I checked Auto Bookmark.
- Convert Pages to PDF to Access Blocked Websites
Rmd to PDF
Printable PDF
- Printable Annual Checklist
- Blank Monthly Calendar Print Out