PDF: Difference between revisions

From 太極

Revision as of 12:45, 4 February 2023

Ubuntu PDF viewer

Linux PDF Viewer: Best 15 PDF Readers Reviewed for Linux Users

  • Okular (my pick as the best; install through the app store; has annotation and trim margins/selection functions)
  • Adobe Reader
  • Qoppa PDF Studio
  • Foxit Reader (installed to ~/opt/foxitsoftware/foxitreader by default). It freezes my Pop!_OS 20.04 system.
  • MuPDF (lightweight; no thumbnail option as far as I can tell, and no menus or toolbars)
  • XPDF
  • Qpdfview
  • GNU GV
  • Zathura
  • Atril Document Reader
  • ePDF Viewer
  • Calibre
  • Google Drive
  • Master PDF Editor

Change the default viewer

Right-click the PDF -> Properties -> Open With -> Okular (or any other viewer) -> Set as default.

PDF reader

The default viewer, Evince, seems slow when I try to view an issue of ODROID Magazine.

MuPDF is good at speed. Okular is good at annotation.

I installed and tried MuPDF (source code on GitHub). It seems faster, and I don't see blank pages when I view an issue of ODROID Magazine. In terms of speed, mupdf >> xpdf >> okular >> Evince.

To make it the default program for opening PDF files, right-click the file and select Properties. Go to the Open With tab and choose your file viewer.

sudo apt-get install mupdf

Keyboard shortcuts for mupdf are listed in the man page (man mupdf) and at http://mupdf.com/docs/manual. Note that they are case-sensitive.

W    - fit to width
H    - fit to height
L    - rotate page left (counter-clockwise)
R    - rotate page right (clockwise)
12g  - go to page 12
. ,  - go to the next or previous page
>,<  - skip forward or back 10 pages
+,-  - zoom in or out
/    - search for text
n,N  - find the next or previous search result
h,j,k,l - scroll page left, down, up, or right

Tip: to copy text, select it with the right mouse button, then press Ctrl+c. This does not seem to work every time.

Other pdf viewer choices are

  • acroread
    • Allows custom colors for the page background and document text.
    • The custom colors work well on a MacBook Pro (2880 x 1440). Background color #494949 and text color #494949.
  • xpdf. old-fashioned. slow.
  • evince. slow.
  • okular (KDE/Qt application)
    • Annotation tools such as the highlighter are under Tools > Review (F6).
    • Allows changing the background color. The 'invert colors' option works, but the result is not good on a Dell U2312HM. We can try another option such as 'dark & light colors', where the individual colors for the background (say #494949) and text can be set.
    • Not as fast as mupdf. It can open a variety of ebook formats.
    • It should work on macOS, but KDE needs to be installed.
    • Can show file properties, e.g. Page Size (e.g. 50x36 in), Creator (e.g. PowerPoint), Producer (e.g. Mac OS X Quartz PDFContext), PDF version (e.g. 1.3)
  • kpdf
  • gv
  • qpdfview. slow. Used by Raspbian (June 2018).
  • Foxit or PDF-XChange Viewer (needs Wine)

Browsers

PDF crop

6 Best PDF Page Cropping Tools For Linux

krop

It is easy to use and works fine on Ubuntu 20.04 (I am using 0.5.1 though the current version is 0.6.0).

http://arminstraub.com/software/krop

Install manually

$ sudo apt install python3-poppler-qt5 python3-pypdf2 python3-pip
$ pip3 install https://github.com/arminstraub/krop/archive/v0.6.0.tar.gz --user
Successfully built krop
Installing collected packages: krop
  WARNING: The script krop is installed in '/home/brb/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

I can add ~/.local/bin to the global PATH. But how do I make it available through Activities?
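
One possible answer to the Activities question, as a sketch: GNOME fills the Activities overview from .desktop files, and per-user entries live in ~/.local/share/applications. The helper name write_krop_launcher and the Name, Comment and Categories values are made up for illustration. The Exec path assumes the pip install location reported in the warning above, and using an absolute path there means the launcher does not depend on PATH.

```shell
#!/bin/sh
# Sketch: create a per-user launcher so krop shows up in the desktop's app list.
# write_krop_launcher is a hypothetical helper name, not part of krop.
write_krop_launcher() {
    # $1: directory for the .desktop file (defaults to the per-user location)
    app_dir="${1:-${XDG_DATA_HOME:-$HOME/.local/share}/applications}"
    mkdir -p "$app_dir" || return 1
    # Unquoted EOF so that $HOME is expanded into an absolute Exec path.
    cat > "$app_dir/krop.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=krop
Comment=Crop PDF pages
Exec=$HOME/.local/bin/krop %f
Terminal=false
MimeType=application/pdf;
Categories=Office;
EOF
}
```

Calling write_krop_launcher with no argument writes the entry to the per-user applications directory. It may take a logout and login before the entry appears.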

pdfcrop

pdfcrop (briss is better)

https://askubuntu.com/questions/124692/command-line-tool-to-crop-pdf-files

sudo apt-get install texlive-extra-utils

pdfcrop input.pdf output.pdf  # no margins, works but seems too tight

pdfcrop --margins 5 input.pdf output.pdf   # crop pdf but keep 5 bp from each side of page

pdfcrop --margins '5 10 20 30' input.pdf output.pdf
# left, top, right and bottom margins of 5, 10, 20, and 30 bp

# To actually crop something away, pass negative values to --margins.
# For example, to crop 50 bp from the left, top, right, and bottom (in this order):
pdfcrop --margins '-50 -50 -50 -50' input.pdf output.pdf

One problem I found (for newer PDFs with metadata) is that --margins first removes the entire margin and only then applies the adjustment. This can cause parts of some pages to be cut off.
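
Since pdfcrop measures margins in bp (big points, 72 per inch), it can help to convert from millimetres first. The following is a small sketch of that arithmetic. The helper name mm_to_bp is made up for illustration, and the script only prints the resulting pdfcrop command rather than running it.

```shell
#!/bin/sh
# pdfcrop measures --margins in bp ("big points"): 72 bp = 1 inch = 25.4 mm.
# mm_to_bp is a hypothetical helper name used only for this sketch.
mm_to_bp() {
    # Convert millimetres to bp, rounded to a whole number.
    awk -v mm="$1" 'BEGIN { printf "%.0f\n", mm * 72 / 25.4 }'
}

# Build a --margins argument that leaves about 5 mm on every side.
m=$(mm_to_bp 5)
margins="$m $m $m $m"
echo "pdfcrop --margins '$margins' input.pdf output.pdf"
```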

briss

briss

This Java program gives me better control over cropping.

  1. Download the file briss-0.9.tar.gz (8.7 MB) and extract it
  2. Run java -jar briss-0.9.jar
  3. Load the PDF file. It will ask which pages to exclude from merging (this function does not work). Click 'Cancel' to continue.
  4. It will automatically create two rectangular areas: one for odd (left) pages and the other for even (right) pages.
  5. Work on the left page first. Enlarge the selection to suit our needs. Then right-click and choose 'Select/Deselect rectangle' (a dashed line will be added to the edges of the rectangle) and then 'Copy rectangles'.
  6. Work on the right page. Right click and choose 'Delete rectangle'. Then 'Paste rectangles'.
  7. Now we can click 'Action -> Preview' to preview the result. If we are satisfied with the result, we can click 'Action -> Crop PDF'. Done.

Adobe Acrobat

Crop a page using the Crop tool

  1. Tools > Edit PDF
  2. Crop Pages
  3. Drag a rectangle on the page
  4. Double-click inside the cropping rectangle
  5. Set the page range, or select All under Page Range.
  6. Click OK to crop the page or pages.

pdftk

Extract pages

pdftk oldfile.pdf cat 3-8 output newfile.pdf

pdftk oldfile.pdf cat 5 9 11 output newfile.pdf

Remove certain pages

https://www.linux.com/learn/manipulating-pdfs-pdf-toolkit

sudo apt install pdftk

# remove pages 10 to 25 from a PDF file
pdftk myDocument.pdf cat 1-9 26-end output removedPages.pdf

# remove the last page
pdftk infile.pdf cat 1-r2 output outfile.pdf

# remove the last 2 pages
pdftk infile.pdf cat 1-r3 output outfile.pdf
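
The "rK" notation counts pages from the end (r1 is the last page), so dropping the last N pages means keeping 1 through r(N+1). Below is a sketch of that arithmetic. The helper names drop_last and drop_range are made up, they only build the page-range text, and the script prints the pdftk commands instead of running them. It assumes the removed pages are not at the very end of the document (use drop_last for that case) and that N is smaller than the page count.

```shell
#!/bin/sh
# Hypothetical helpers that only build pdftk "cat" page ranges; they do not run pdftk.

# Range that keeps everything except the last N pages (N >= 1).
# pdftk's "rK" counts from the end, so r1 is the last page.
drop_last() {
    echo "1-r$(( $1 + 1 ))"
}

# Ranges that keep everything except pages FIRST..LAST.
drop_range() {
    first=$1
    last=$2
    if [ "$first" -gt 1 ]; then
        echo "1-$(( first - 1 )) $(( last + 1 ))-end"
    else
        echo "$(( last + 1 ))-end"
    fi
}

# Print the same commands as above, with the ranges computed.
echo "pdftk infile.pdf cat $(drop_last 2) output outfile.pdf"
echo "pdftk myDocument.pdf cat $(drop_range 10 25) output removedPages.pdf"
```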

Rotate using pdftk

First I convert the jpg files to a PDF using ImageMagick.

convert *.jpg INPUT.pdf

Then I install pdftk (here via snap) and use it to do the rotation.

$ sudo apt install snapd
$ sudo snap install pdftk 

# Suppose I want to rotate pages 1 through 2.
$ /snap/bin/pdftk INPUT.pdf rotate 1-2west output OUTPUT.pdf
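
As far as I understand pdftk's rotation keywords, the compass names set an absolute page rotation (north = 0, east = 90, south = 180, west = 270 degrees), while left, right and down rotate relative to the current orientation. The sketch below maps degrees to the compass keyword. The helper name rotation_keyword is made up, and the script only prints the command.

```shell
#!/bin/sh
# Map an absolute rotation in degrees to pdftk's compass keyword.
# rotation_keyword is a hypothetical helper name for this sketch.
rotation_keyword() {
    case "$1" in
        0)   echo north ;;
        90)  echo east ;;
        180) echo south ;;
        270) echo west ;;
        *)   echo "unsupported rotation: $1" >&2; return 1 ;;
    esac
}

# Set pages 1-2 to 270 degrees, matching the "1-2west" command above.
echo "pdftk INPUT.pdf rotate 1-2$(rotation_keyword 270) output OUTPUT.pdf"
```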


PDF Mix Tool

Open pdf in mobile (iOS/Android) browsers

  • iOS browsers can preview PDF files.
  • Android browsers cannot preview PDF files (tested on Chrome, Brave, Opera).
    • The only way I tested successfully is Firefox Nightly + Android PDF.js (made by Manuel Reimer). PS: regular Firefox on Android does not allow installing arbitrary add-ons, so we need to install the Firefox Nightly build. We also need to follow the instructions to create an AMO account. ... Firefox settings -> Advanced -> Custom Add-on collection. Enter the ID (5149765) and the collection name (android). Now go back to Advanced -> Add-ons, which shows a list of add-ons to install from. Click '+' to enable the individual add-ons we like. Open question: how does the Firefox browser link to my AMO account?
    • Kiwi Browser. I haven't tried it yet.

PDF highlight and annotation

Install Okular by

sudo apt-get install okular

To highlight a line, press F6 (Tools -> Review) to turn on the annotation toolbar (it is shown on the left-hand side of the document). You can then click

  1. the 4th icon to highlight a line (it may not select exactly the text we want, but when it works the result is nice)
  2. the last icon to draw an ellipse or a rectangle (to change from an ellipse to a rectangle, click Settings -> Configure Okular... -> Annotations)

Another method is to use a windows program and run it using Wine. See the discussion here.

Android & iOS

Xodo Free. Cross platform.

How to convert pdf to image on Linux command line

How to convert pdf to image on Linux command line. I got an error when I used the convert command; see "ImageMagick security policy 'PDF' blocking conversion" for a solution, which is to edit the file /etc/ImageMagick-6/policy.xml on my Ubuntu 20.04.
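
A sketch of the policy edit, assuming the blocking rule is the usual single line with rights="none" directly before pattern="PDF" (I have not checked every ImageMagick version). The helper name allow_pdf_coder is made up. It works as a filter on a copy of the file, so nothing under /etc is touched by the demo.

```shell
#!/bin/sh
# Filter: read a policy.xml on stdin, write a copy that permits the PDF coder.
# allow_pdf_coder is a hypothetical helper name; the rule layout it matches
# (rights="none" directly before pattern="PDF") is an assumption about the file.
allow_pdf_coder() {
    sed -E 's/rights="none"([[:space:]]+pattern="PDF")/rights="read|write"\1/'
}

# Demonstrate on a sample rule instead of the real file.
printf '%s\n' '  <policy domain="coder" rights="none" pattern="PDF" />' | allow_pdf_coder
```

To apply it for real, one option is to run `allow_pdf_coder < /etc/ImageMagick-6/policy.xml > /tmp/policy.xml`, review the result, and copy it back with sudo. The restriction is a security policy, so it is worth loosening only on machines that process trusted PDFs.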

Merge multiple pdf files into one pdf file

https://stackoverflow.com/questions/2507766/merge-convert-multiple-pdf-files-into-one-pdf

pdfunite in-1.pdf in-2.pdf in-n.pdf out.pdf
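
When there are many input files, a plain shell glob puts in-10.pdf before in-2.pdf. The sketch below lists the files in natural order so they can be handed to pdfunite in sequence. The helper name list_pdfs_sorted is made up, sort -V is a GNU coreutils option, and the demo uses empty placeholder files, so pdfunite itself is not run.

```shell
#!/bin/sh
# List the PDFs in a directory in natural (version) order, one per line.
# list_pdfs_sorted is a hypothetical helper; sort -V is a GNU coreutils option.
list_pdfs_sorted() {
    for f in "$1"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDFs.
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -V
}

# Demo with empty placeholder files: in-2.pdf is listed before in-10.pdf.
demo=$(mktemp -d)
touch "$demo/in-10.pdf" "$demo/in-2.pdf" "$demo/in-1.pdf"
list_pdfs_sorted "$demo"
rm -r "$demo"
```

With real files whose names contain no spaces or newlines, the list could be passed along as `pdfunite $(list_pdfs_sorted chapters) out.pdf`, where chapters is a placeholder directory name.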

Arrange, merge, split, rotate, crop

PDFArranger: Merge, Split, Rotate, Crop Or Rearrange PDF Documents (PDF-Shuffler Fork)

https://github.com/jeromerobert/pdfarranger

Editing

TOC/table of contents

Insert links in a document

How to Create Internal Links in PDFs with Adobe Acrobat

Print scale

Print > Scale > Scale to page

Print multiple pages per sheet: pdfnup

The program is similar to psnup.

sudo apt install texlive-extra-utils

Search

How can I grep in PDF files?

sudo apt install pdfgrep # or brew install on macOS

pdfgrep 'pattern' *.pdf

Extract tables from pdf

  • Extracting tables from PDFs (see the R page)
  • 6 Websites to Extract a Table From a PDF (https://www.makeuseof.com/websites-to-extract-table-from-pdf/)

Split view

It is useful if we want to compare two pages side by side.

Optimize for mobile device

k2pdfopt

Compress PDF files

The Best Ways to Compress PDFs for Free

Adobe reader

Close the Tools pane in Acrobat Reader DC

Sign/signature

  • How to Sign a PDF: 6 Ways to Secure Electronic Signatures
  • service.cancer.gov -> Search pdf signature -> Digitally Sign a Document in Adobe Acrobat or Reader
  • Open the file in Adobe Acrobat. Click the Certificates icon on the right-hand side (you may need to use More Tools to add the icon to the side bar). A new "Certificates" bar should appear at the top, below the toolbar. Click the "Digitally Sign" icon and follow the on-screen instructions to do the rest (select an area, select a digital ID, continue, sign, save, replace, PIN, OK).

OCR

Export to Excel. Acrobat PDF converter automatically extracts and formats the data into editable text thanks to optical character recognition (OCR).

Print a text file with line numbers

How To Add Line Numbers To Text Files On Linux.

When printing out R source code, we should keep only the code starting from the function definition, because that is how RStudio displays it.

# 1. Using 'nl' command
$ nl -b a  file.txt

# 2. Using 'cat' command
$ cat -n file.txt 

# 3. Using 'awk' command
$ awk 'BEGIN{i=1} /.*/{printf "%d. %s\n",i,$0; i++}' file.txt 

# 4. Using 'sed' command
$ sed '/./=' file.txt | sed '/./N; s/\n/ /'

# 5. Using 'less' command
$ less -N file.txt 
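
These methods do not all treat blank lines the same way: 'nl -b a' and 'cat -n' number every line, while the sed pipeline only numbers non-empty lines. For a printout whose numbers should match an editor, numbering every line is probably what we want. A minimal sketch follows; the helper name number_lines and the 4-character column width are arbitrary choices.

```shell
#!/bin/sh
# number_lines is a hypothetical helper: right-aligned numbers, blank lines included.
number_lines() {
    # NR is awk's running line count; "$@" lets the helper read files or stdin.
    awk '{ printf "%4d  %s\n", NR, $0 }' "$@"
}

# Sample input with an empty second line.
printf 'f <- function(x) {\n\n  x + 1\n}\n' | number_lines
```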

Turn web pages into PDF

Command line in Linux

How to Convert a Web Page to a PDF File or Images in Linux. wkhtmltopdf

How to turn web pages into PDFs using 'google-chrome'

Print website to PDF online

  • The problem with using 'google-chrome' is that I don't have much control over the output.
  • https://webtopdf.com/ (it works great when I test this page. It turns the website into a 44-page PDF, though the table of contents is lost). It has lots of options like zoom, margins, ...
    • To preserve the table of contents, I checked Auto Bookmark.
  • Convert Pages to PDF to Access Blocked Websites

Rmd to PDF

Using {pagedown} in Docker

Printable PDF

  • Printable Annual Checklist
  • Blank Monthly Calendar Print Out