PDF: Difference between revisions

Latest revision as of 09:59, 22 October 2024

Issues

  • In one instance, Adobe Acrobat was able to present a heatmap correctly, while Mac's Preview and Linux's Evince produced a plot with multiple vertical lines.

Ubuntu PDF viewer

Linux PDF Viewer: Best 15 PDF Readers Reviewed for Linux Users

  • Okular (install through app store, annotation function, trim margins/selection) Best
  • Adobe Reader
  • Qoppa PDF Studio
  • Foxit Reader (By default it will be installed to ~/opt/foxitsoftware/foxitreader). It freezes my Pop_OS 20.04.
  • MuPDF (lightweight, seems no thumbnail option, no GUI interface)
  • XPDF
  • Qpdfview
  • GNU GV
  • Zathura
  • Atril Document Reader
  • ePDF Viewer
  • Calibre
  • Google Drive
  • Master PDF Editor

Change the default viewer

Right-click the PDF file -> Properties -> Open With -> Okular (or any other viewer) -> Set as default.
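The same change can probably be made from a terminal with xdg-mime. This is only a sketch: the function name is made up for this note, and the .desktop file ID of the viewer has to be looked up on your own system (for example under /usr/share/applications), so it is passed in as an argument rather than hard-coded.

```shell
# Sketch: make a viewer the default handler for PDF files.
# set_default_pdf_viewer is a hypothetical helper name, not an existing tool.
set_default_pdf_viewer() {
    # $1: desktop file ID of the viewer, e.g. whatever your Okular package installs
    xdg-mime default "$1" application/pdf
}

# Usage:
#   set_default_pdf_viewer <viewer>.desktop
#   xdg-mime query default application/pdf    # shows the current default
```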

PDF reader

The default viewer, Evince, seems slow when I view an issue of ODROID Magazine.

MuPDF is good at speed. Okular is good at annotation.

I installed and tried MuPDF (github source code). It seems faster and I don't see blank pages when I view one odroid magazine. In terms of speed, mupdf >> xpdf >> okular >> Evince.

To make it the default program for opening PDF files, right-click a PDF file and select Properties. Go to the Open With tab and choose your viewer.

sudo apt-get install mupdf

Keyboard shortcuts for mupdf (man mupdf) or http://mupdf.com/docs/manual. Note these are case-sensitive.

W    - fit to width
H    - fit to height
L    - rotate page left (counter-clockwise)
R    - rotate page right (clockwise)
12g  - go to page 12
.,,  - go to the next or previous page
>,<  - skip forward or back 10 pages
+,-  - zoom in or out
/    - search for text
n,N  - Find the next or previous search result.
h,j,k,l - Scroll page left, down, up, or right.

Tip: to copy text, select it with the right mouse button, then press Ctrl+c. It does not seem to work every time :(

Other pdf viewer choices are

  • acroread
    • Allows custom colors for the page background and the document text.
    • The custom colors work well on a MacBook Pro (2880 x 1440). Background color #494949 and text color #494949.
  • xpdf. old-fashioned. slow.
  • evince. slow.
  • okular (KDE/Qt application)
    • Annotation tool such as highlighter is under Tools > Review (F6).
    • Allows changing the background color. It works, but the result of the 'invert colors' option is not good on a Dell U2312HM. We can try another option such as 'dark & light colors', where we can set the individual colors for the background (say #494949) and the text.
    • Not as fast as mupdf. It can open a variety of ebook formats.
    • It should work on macOS, but KDE needs to be installed.
    • Able to show file properties, e.g. Page Size (e.g. 50x36 in), Creator (e.g. PowerPoint), Producer (e.g. Mac OS X Quartz PDFContext), PDF version (e.g. 1.3)
  • kpdf
  • gv
  • qpdfview. slow. Used by Raspbian (June 2018).
  • Foxit or PDF-XChange Viewer (needs Wine)

Browsers

PDF crop

6 Best PDF Page Cropping Tools For Linux

krop

It is easy to use and works fine on Ubuntu 20.04 (I am using 0.5.1 though the current version is 0.6.0).

http://arminstraub.com/software/krop

Install manually

$ sudo apt install python3-poppler-qt5 python3-pypdf2 python3-pip
$ pip3 install https://github.com/arminstraub/krop/archive/v0.6.0.tar.gz --user
Successfully built krop
Installing collected packages: krop
  WARNING: The script krop is installed in '/home/brb/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

I can add ~/.local/bin to the global PATH. But how do I make it available through Activities?
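One possible answer to the Activities question: GNOME (and most other desktops) list applications from .desktop files, including per-user ones in ~/.local/share/applications. The sketch below writes such a launcher. It assumes krop really ended up in ~/.local/bin as in the pip warning above; the function name and the launcher fields other than Exec are my own choices.

```shell
# Sketch: register a per-user launcher for krop.
# make_krop_launcher is a hypothetical helper name.
make_krop_launcher() {
    # $1: directory that receives the launcher (default: per-user applications dir)
    local dir="${1:-$HOME/.local/share/applications}"
    mkdir -p "$dir"
    # Unquoted heredoc, so $HOME is expanded when the file is written.
    cat > "$dir/krop.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=krop
Comment=Crop PDF pages
Exec=$HOME/.local/bin/krop
Terminal=false
Categories=Graphics;
EOF
}

# Usage:
#   make_krop_launcher
#   echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.profile   # for the shell PATH
```

The launcher uses an absolute path in Exec, so it does not depend on PATH being set for the desktop session.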

pdfcrop

pdfcrop (briss is better)

https://askubuntu.com/questions/124692/command-line-tool-to-crop-pdf-files

sudo apt-get install texlive-extra-utils

pdfcrop input.pdf output.pdf  # no margins, works but seems too tight

pdfcrop --margins 5 input.pdf output.pdf   # crop pdf but keep 5 bp from each side of page

pdfcrop --margins '5 10 20 30' input.pdf output.pdf  
#  left, top, right and bottom margins of 5, 10, 20, and 30 bp 

# To actually crop something away, use negative values for --margins.
# For example, to crop 50 bp from the left, top, right, and bottom (in this order):
pdfcrop --margins '-50 -50 -50 -50' input.pdf output.pdf

One problem I found (for newer PDFs with metadata) is that --margins first removes the entire margin and only then applies the adjustment. This can cause some page content to be cut off.

briss

briss

This Java program gives me better control over cropping.

  1. Download the file briss-0.9.tar.gz (8.7 MB) and extract it
  2. Run java -jar briss-0.9.jar
  3. Load the PDF file. It will ask which pages to exclude from merging (this function does not work). Click 'Cancel' to continue.
  4. It will automatically create two rectangles: one for odd (left) pages and the other for even (right) pages.
  5. Work on the left page first. Enlarge the selection as needed. Then right-click and choose 'Select/Deselect rectangle' (a dashed line will be added to the edges of the rectangle) and then 'Copy rectangles'.
  6. Work on the right page. Right click and choose 'Delete rectangle'. Then 'Paste rectangles'.
  7. Now we can click 'Action -> Preview' to preview the result. If we are satisfied with the result, we can click 'Action -> Crop PDF'. Done.

Adobe Acrobat

Crop a page using the Crop tool

  1. Tools > Edit PDF
  2. Crop Pages
  3. Drag a rectangle on the page
  4. Double-click inside the cropping rectangle
  5. Set the page range or select All under Page Range.
  6. OK to crop the page or pages.

pdftk

Extract pages

pdftk oldfile.pdf cat 3-8 output newfile.pdf

pdftk oldfile.pdf cat 5 9 11 output newfile.pdf

Remove certain pages

https://www.linux.com/learn/manipulating-pdfs-pdf-toolkit

sudo apt install pdftk

# remove pages 10 to 25 from a PDF file
pdftk myDocument.pdf cat 1-9 26-end output removedPages.pdf

# remove the last page
pdftk infile.pdf cat 1-r2 output outfile.pdf

# remove the last 2 pages
pdftk infile.pdf cat 1-r3 output outfile.pdf
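The two "remove the last page(s)" commands above follow one pattern: pdftk counts from the end as r1 (last page), r2, and so on, so dropping the last N pages means keeping 1 through r(N+1). A small sketch of that arithmetic; the function name is made up for this note.

```shell
# Sketch: build the pdftk page range that keeps everything except the last N pages.
# keep_all_but_last is a hypothetical helper name.
keep_all_but_last() {
    # $1: number of trailing pages to drop (N >= 1)
    echo "1-r$(( $1 + 1 ))"
}

# Usage (file names are placeholders):
#   pdftk infile.pdf cat "$(keep_all_but_last 2)" output outfile.pdf
```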

Rotate using pdftk

First I convert the jpg files to a PDF using ImageMagick.

convert *.jpg INPUT.pdf

Then I install pdftk and follow this to do a rotation.

$ sudo apt install snapd
$ sudo snap install pdftk 

# Suppose I want to rotate pages 1 to 2.
$ /snap/bin/pdftk INPUT.pdf rotate 1-2west output OUTPUT.pdf


PDF Mix Tool

Open pdf in mobile (iOS/Android) browsers

  • iOS browsers can preview PDF files.
  • Android browsers cannot preview PDF files (tested on Chrome, Brave, Opera).
    • The only way I tested successfully is Firefox Nightly + Android PDF.js (made by Manuel Reimer). PS. Regular Firefox on Android does not allow installing arbitrary add-ons, so we need the Firefox Nightly build. We also need to follow the instructions to create an AMO account. ... Firefox settings -> Advanced -> Custom Add-on collection. Enter the ID (5149765) and the collection name (android). Now go back to Advanced -> Add-ons, where there is a list of add-ons to install from. Click '+' to enable the individual add-ons we like. How does the Firefox browser link to my AMO account?
    • Kiwi Browser. I haven't tried it yet.

PDF highlight and annotation

Evince

How to Annotate PDFs in Linux (Beginner's Guide)

Okular

Install Okular by

sudo apt-get install okular

To highlight a line, press F6 (Tools -> Review) to turn on the annotation toolbar (it is shown on the left-hand side of the document). You can then click

  1. the 4th icon to highlight a line (it may not select exactly the text we want, but when it works the result is nice)
  2. the last icon to draw an ellipse or a rectangle (to switch from an ellipse to a rectangle, click Settings -> Configure Okular... -> Annotations)

Another method is to use a Windows program and run it using Wine. See the discussion here.

Android & iOS

Xodo Free. Cross platform.

How to convert pdf to image on Linux command line

How to convert pdf to image on Linux command line. I got an error when I used the convert command; see ImageMagick security policy 'PDF' blocking conversion for a solution that edits the file /etc/ImageMagick-6/policy.xml on my Ubuntu 20.04.
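As far as I understand the linked answer, the fix is to change the rights of the PDF coder from "none" to "read|write" in policy.xml. The sketch below prints an edited copy instead of touching the system file. It assumes the blocking line looks exactly like the one quoted in the comment; the function name is made up for this note. Note that the policy was put there as a safeguard, so relaxing it is a trade-off.

```shell
# Sketch: print a copy of an ImageMagick policy file with the PDF coder re-enabled.
# Assumption: the blocking line has the form
#   <policy domain="coder" rights="none" pattern="PDF" />
# allow_pdf_coder is a hypothetical helper name.
allow_pdf_coder() {
    # $1: path to policy.xml; the result goes to stdout, the file itself is untouched
    sed 's#rights="none" pattern="PDF"#rights="read|write" pattern="PDF"#' "$1"
}

# Usage:
#   allow_pdf_coder /etc/ImageMagick-6/policy.xml > /tmp/policy.xml
#   sudo cp /tmp/policy.xml /etc/ImageMagick-6/policy.xml
#
# An alternative that does not go through ImageMagick (poppler-utils):
#   pdftoppm -png -r 150 input.pdf page     # writes one PNG per page, named page-*.png
```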

Convert PDF to Word

The 7 Best PDF-to-Word Converters for Linux: Adobe Reader, ONLYOFFICE, Sejda PDF Desktop, PDF24, Smallpdf, Okular, PDF Studio.

Merge multiple pdf files into one pdf file

https://stackoverflow.com/questions/2507766/merge-convert-multiple-pdf-files-into-one-pdf

pdfunite in-1.pdf in-2.pdf in-n.pdf out.pdf

Arrange, merge, split, rotate, crop

PDFArranger: Merge, Split, Rotate, Crop Or Rearrange PDF Documents (PDF-Shuffler Fork)

https://github.com/jeromerobert/pdfarranger

Editing

TOC/table of contents

Insert links in a document

How to Create Internal Links in PDFs with Adobe Acrobat

Print scale

Print > Scale > scale to page

Print multiple pages per sheet: pdfnup

The program is similar to psnup.

sudo apt install texlive-extra-utils
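A usage sketch, since only the install step is noted above. On some newer TeX Live releases the pdfnup wrapper is no longer shipped, so this calls pdfjam (from the same package) directly. The function name and the 2x1 landscape layout are just my choices for illustration.

```shell
# Sketch: put two pages side by side on each landscape sheet.
# two_up is a hypothetical helper name.
two_up() {
    # $1: input PDF, $2: output PDF
    pdfjam --nup 2x1 --landscape --outfile "$2" "$1"
}

# Usage (file names are placeholders):
#   two_up input.pdf input-2up.pdf
```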

Search

How can I grep in PDF files?

sudo apt install pdfgrep # or brew install on macOS

pdfgrep 'pattern' *.pdf

Extract tables from pdf

Split view

It is useful if we want to compare two pages side by side.

Optimize for mobile device

k2pdfopt

Compress PDF files

The Best Ways to Compress PDFs for Free

Security

  • Can PDFs contain viruses, by Adobe
    • VirusTotal
    • Configure Acrobat not to launch non-PDF attachments with external applications.
    • Adjust or disable JavaScript in Acrobat to further protect against vulnerabilities.
    • Use Adobe cloud storage for your PDF storage (OneDrive also scans files on upload)
  • Here are some steps you can take to check if a PDF file is clean and free of viruses:
    • Scan the file: Before opening the PDF file, you can use your antivirus software to scan it for viruses. Right-click on the file and select the option to scan it with your antivirus program.
    • Use an online virus scanner: There are several online virus scanners that allow you to upload a file and scan it for viruses. Some popular options include VirusTotal, Jotti's Malware Scan, and Metadefender.
  • How a PDF file can have a virus
    • A PDF file can contain a virus or other malicious code in several ways. One way is by exploiting vulnerabilities in the software used to view the PDF file. For example, if there is a security flaw in a PDF reader, a specially crafted PDF file could exploit that flaw to execute malicious code on the user's computer.
    • Another way a PDF file can contain a virus is through embedded scripts or macros. PDF files can contain JavaScript code or other types of scripts that can be executed when the file is opened. If the script contains malicious code, it could infect the user's computer with a virus.
    • Finally, a PDF file could contain an embedded file, such as an executable program, that is automatically launched when the PDF is opened. If the embedded file is malicious, it could infect the user's computer with a virus.
  • Does Adobe Reader have built-in protection to detect problems in a PDF file? Adobe Reader has built-in security features to help protect your computer from malicious PDF files. These features include:
    • Sandboxing: Adobe Reader uses a technique called sandboxing to isolate the PDF reader from the rest of the system. This means that even if a malicious PDF file is able to exploit a vulnerability in the reader, it will be contained within the sandbox and unable to harm the rest of the system.
    • Protected View: When you open a PDF file from an untrusted source, such as the internet or an email attachment, Adobe Reader can open it in Protected View. This is a read-only mode that restricts access to certain features and prevents potentially dangerous content from being executed.
    • Enhanced Security: Adobe Reader has an Enhanced Security feature that can be enabled to provide additional protection against malicious PDF files. This feature restricts certain actions, such as accessing the internet or running JavaScript, that could be used by a malicious PDF file to harm your computer.
  • Printing a PDF file to a new PDF file using a printer setting can help to remove some potentially malicious content from the original file. This process is sometimes referred to as "refrying" a PDF. When you print a PDF to a new PDF, the content is essentially flattened and any interactive elements, such as form fields, multimedia, or JavaScript, are removed. This can help to mitigate some security risks associated with opening a potentially malicious PDF file.
    • However, it's important to note that this process is not foolproof and may not remove all potentially malicious content from the original file. Additionally, refrying a PDF can result in a loss of quality or changes to the layout and formatting of the document.

Adobe Reader

Close the Tools pane in Acrobat Reader DC

Sign/signature

  • How to Sign a PDF: 6 Ways to Secure Electronic Signatures
  • service.cancer.gov -> Search pdf signature -> Digitally Sign a Document in Adobe Acrobat or Reader
  • Open the file in Adobe Acrobat. Click the Certificates icon on the right-hand side (you may need to use More Tools to add the icon to the sidebar). A new "Certificates" bar should appear at the top, below the toolbar. Click the "Digitally Sign" icon and follow the instructions on the screen to do the rest (select an area, select a digital ID, continue, sign, save, replace, PIN, OK).

OCR

Export to Excel. Acrobat PDF converter automatically extracts and formats the data into editable text thanks to optical character recognition (OCR).

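OCR

  • OCRmyPDF. OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
  • 9 Best Free and Open Source OCR Tools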

Print a text file with line numbers

How To Add Line Numbers To Text Files On Linux.

When printing R source code, we should keep only the code starting from the function definition, because that is how RStudio displays it.

# 1. Using 'nl' command
$ nl -b a  file.txt

# 2. Using 'cat' command
$ cat -n file.txt 

# 3. Using 'awk' command
$ awk 'BEGIN{i=1} /.*/{printf "%d. %s\n",i,$0; i++}' file.txt 

# 4. Using 'sed' command
$ sed '/./=' file.txt | sed '/./N; s/\n/ /'

# 5. Using 'less' command
$ less -N file.txt 
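One detail behind method 1: plain nl numbers only non-empty lines by default, which is why the -b a option is used there. The sketch below wraps it so that a line range can be printed with numbers that still match the original file. The function name and the range arguments are made up for this note.

```shell
# Sketch: number every line of a file (blank lines included) and print a range.
# number_lines is a hypothetical helper name.
number_lines() {
    # $1: file, $2: first line (default 1), $3: last line (default: end of file)
    first=${2:-1}
    last=${3:-'$'}
    nl -b a "$1" | sed -n "${first},${last}p"
}

# Usage (file name is a placeholder):
#   number_lines file.txt          # whole file
#   number_lines file.txt 10 20    # lines 10 to 20, numbered as in the file
```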

Turn web pages into PDF

Command line in Linux

How to Convert a Web Page to a PDF File or Images in Linux. wkhtmltopdf

How to turn web pages into PDFs using 'google-chrome'
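For reference, a minimal sketch of the two command-line routes named above. The function name and the CHROME_BIN variable are made up for this note (the variable only lets you point at chromium or another Chrome binary); the URL and file names are placeholders.

```shell
# Sketch: save a web page as a PDF with headless Chrome.
# web_to_pdf and CHROME_BIN are hypothetical names introduced for this example;
# the binary defaults to the google-chrome command mentioned above.
web_to_pdf() {
    # $1: URL, $2: output PDF
    "${CHROME_BIN:-google-chrome}" --headless --print-to-pdf="$2" "$1"
}

# Usage:
#   web_to_pdf https://example.com example.pdf
# wkhtmltopdf takes the same two arguments directly:
#   wkhtmltopdf https://example.com example.pdf
```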

Print website to PDF online

  • The problem with using 'google-chrome' is that I don't have much control over the output.
  • https://webtopdf.com/ (it works great when I test this page. It turns the website into a 44-page PDF, though the table of contents is lost). It has lots of options such as zoom, margins, ...
    • To preserve the table of contents, I checked Auto Bookmark.
  • Convert Pages to PDF to Access Blocked Websites

Rmd to PDF

Using {pagedown} in Docker

Printable PDF

  • Printable Annual Checklist
  • Blank Monthly Calendar Print Out