(Re)generate: (find-pdf-like-intro)
Source code:  (find-eev "eev-intro.el" "find-pdf-like-intro")
More intros:  (find-eev-quick-intro)
              (find-eev-intro)
              (find-here-links-intro)
              (find-refining-intro)
              (find-eepitch-intro)
This buffer is _temporary_ and _editable_.
It is meant as both a tutorial and a sandbox.



Note: you will need a basic understanding of eepitch and
code-c-d to understand parts of this intro. See:

  (find-eev-quick-intro "6.1. The main key: <F8>")
  (find-eev-quick-intro "9. Shorter hyperlinks")
  (find-eev-quick-intro "9.1. `code-c-d'")





1. PDF-like documents

Let's introduce a bit of (improvised!) terminology: we will say
that a document is "PDF-like" when it is in a format like PDF,
PostScript, DVI or DJVU - i.e., divided into pages. Emacs has a
standard mode for viewing PDF-like documents,

  (find-enode "Document View")

but we will see a more eev-like way of pointing to pages of
PDF-like documents.

2. Preparation

We need to start by downloading a PDF file to use in our
examples. If you run this e-script

* (eepitch-shell)
* (eepitch-kill)
* (eepitch-shell)
cd
wget -nc http://anggtwu.net/TannerLectures/Coetzee99.pdf

you will download a local copy of J.M. Coetzee's "The Lives of
Animals" into your home directory. To check that the PDF has been
downloaded, use:

  (find-fline "~/")
  (find-fline "~/" "Coetzee99.pdf")
  (find-sh0 "ls -l ~/Coetzee99.pdf")

Eev also implements another way, called "psne", to download
local copies of files from the internet. "Psne-ing" a URL like

  http://anggtwu.net/TannerLectures/Coetzee99.pdf

downloads it to a local file with a name like:

         $S/http/anggtwu.net/TannerLectures/Coetzee99.pdf
  ~/snarf/http/anggtwu.net/TannerLectures/Coetzee99.pdf

that is _much_ longer than just "~/Coetzee99.pdf"; this has the
advantage of preserving more information about the URL from which
the file came, but sometimes these longer names feel clumsy.
Psne-ing is discussed in a more advanced tutorial:

  (find-psne-intro)

In this tutorial we will use the home directory and the shorter
file name.

3. Hyperlinks to PDF files

If you have xpdf installed then this sexp

  (find-pdf-page "~/Coetzee99.pdf")

should work as a "hyperlink to the PDF": it calls xpdf as an
external program - like we did with browsers in the main
tutorial -

  (find-eev-quick-intro "3.1. Non-elisp hyperlinks")
  (find-eev-quick-intro "3.1. Non-elisp hyperlinks" "find-firefox")

to display the PDF file that we downloaded.

The main keys of xpdf are:

  q         quit xpdf
  PageDown  scroll down/go to next page
  PageUp    scroll up/go to previous page
  arrows    scroll within the current page
  +         zoom in one step
  -         zoom out one step
  0         set zoom to 125%
  alt-f     toggle full-screen; use twice to fit document to page

Note that `q' "goes back to Emacs".

If you have the program pdftotext installed - hint: "apt-get
install poppler-utils"! - then you can also display PDFs in
another way. This sexp

  (find-pdf-text "~/Coetzee99.pdf")

works as a "hyperlink to the _text_ of the PDF": it extracts the
text from the PDF using the program "pdftotext" and displays that
in an Emacs buffer.

4. Hyperlinks to pages of PDF files

It is possible to create hyperlinks that point to a specific page
in a PDF file. Compare what happens when you run these sexps:

  (find-pdf-page "~/Coetzee99.pdf")
  (find-pdf-page "~/Coetzee99.pdf" 1)
  (find-pdf-page "~/Coetzee99.pdf" 1 "The Lives of Animals")
  (find-pdf-page "~/Coetzee99.pdf" 3)
  (find-pdf-page "~/Coetzee99.pdf" 3 "LECTURE I")
  (find-pdf-page "~/Coetzee99.pdf" 3 "LECTURE I" "[113]")

The top three sexps open the PDF at page 1 - the default. The
bottom three sexps open it at page 3. The arguments after the
number are ignored by Emacs - they are there to make these links
more expressive for humans.

The hyperlinks to the text of a PDF interpret the numeric
argument as a page number and the following arguments as strings
to search for. Try:

  (find-pdf-text "~/Coetzee99.pdf" 1)
  (find-pdf-text "~/Coetzee99.pdf" 1 "The Lives of Animals")
  (find-pdf-text "~/Coetzee99.pdf" 3)
  (find-pdf-text "~/Coetzee99.pdf" 3 "LECTURE I")
  (find-pdf-text "~/Coetzee99.pdf" 3 "LECTURE I" "[113]")

For more information about these string arguments, see:

  (find-refining-intro "1. Pos-spec-lists")

A pair of sexps like this, in which both point to the same
position of a PDF,

  (find-pdf-page "~/Coetzee99.pdf" 3 "LECTURE I" "[113]")
  (find-pdf-text "~/Coetzee99.pdf" 3 "LECTURE I" "[113]")

will be called a `find-pdf'-pair.

[Video links:]
  (find-eev2020video "4:52" "`find-pdf-page' calls an external program")
  (find-eev2020video "5:26" "`find-pdf-text' converts the PDF to text and")

5. A convention on page numbers

The `(+ -110 113)'s in

  (find-livesofanimalspage (+ -110 113) "LECTURE I.")
  (find-livesofanimalstext (+ -110 113) "LECTURE I.")

are a bit mysterious at first sight. (These two short hyperlinks
only work after running the `code-pdf-page' and `code-pdf-text'
sexps of section 7.)

We are accessing a PDF that is an excerpt of a book. The third
page of the PDF has a "[113]" at its footer to indicate that it
is page 113 of the book. Let's use the terms _page number_ and
_page label_ to distinguish the two numberings: in this case, the
page whose page number is 3 is the page whose page label is 113.
These two sexps

  (find-livesofanimalspage (+ -110 113))
  (find-livesofanimalspage 3)

are equivalent, but the first one is more human-friendly: the 113
is a page label, and the -110 is an adjustment (we call it the
"offset") that converts the 113 that humans prefer to see into
the 3 that xpdf needs to receive.
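
The conversion between page labels and page numbers is plain
arithmetic. Here is a minimal sketch of it; the functions
`my-label-to-page' and `my-page-to-label' are hypothetical helpers
written for this illustration and are not part of eev:

```elisp
;; Hypothetical helpers, NOT part of eev: they only illustrate
;; the arithmetic behind the "offset" convention.
(defun my-label-to-page (offset label)
  "Convert the page LABEL printed in the book into a PDF page number."
  (+ offset label))

(defun my-page-to-label (offset page)
  "Convert a PDF PAGE number back into the label printed in the book."
  (- page offset))

(my-label-to-page -110 113)   ; the page labeled 113 is page 3 of the PDF
(my-page-to-label -110 3)     ; page 3 of the PDF is the page labeled 113
```

Writing the offset explicitly, as in "(+ -110 113)", keeps both
numbers visible in the hyperlink itself.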

6. How the external programs are called

Both `find-pdf-page' and `find-pdf-text' invoke external programs
- but how, exactly? Let's take a look at a hack that shows this.
If you prepend an "ee-" to `find-pdf-page' and `find-pdf-text'
sexps, like in:

  (ee-find-pdf-page "~/Coetzee99.pdf")
  (ee-find-pdf-page "~/Coetzee99.pdf" 3)
  (ee-find-pdf-text "~/Coetzee99.pdf")
  (ee-find-pdf-text "~/Coetzee99.pdf" 3)

you will get sexps that stop just before invoking the external
programs - they just show how these external programs _would be_
invoked, i.e., they show the options that would be passed to
them. The results of the sexps above will be lists like these:

  ("xpdf" "-fullscreen" "~/Coetzee99.pdf")
  ("xpdf" "-fullscreen" "~/Coetzee99.pdf" "3")
  ("pdftotext" "-layout" "-enc" "Latin1" "~/Coetzee99.pdf" "-")
  ("pdftotext" "-layout" "-enc" "Latin1" "~/Coetzee99.pdf" "-")

Note that `ee-find-pdf-text' does not pass the argument "3" to
"pdftotext". A sexp like

  (find-pdf-text "~/Coetzee99.pdf" 3)

first produces the conversion to text of the full PDF, and then
finds page 3 in it by counting formfeeds, as described here:

  (find-enode "Pages" "formfeed")
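
The idea of "finding a page by counting formfeeds" can be sketched
in a few lines of plain elisp. This is only an illustration under
the assumption that pages are separated by single formfeed
characters; `my-goto-text-page' is a hypothetical name, and this is
not the code that eev actually uses:

```elisp
;; Hypothetical helper, NOT part of eev: go to page N of a text
;; whose pages are separated by formfeeds (the character "\f").
(defun my-goto-text-page (n)
  "Move point to the start of page N and return the new position."
  (goto-char (point-min))
  ;; Page N starts right after the (N-1)th formfeed.
  (when (> n 1)
    (search-forward "\f" nil nil (1- n)))
  (point))

;; Try it on a small fake "converted PDF" with three pages:
(with-temp-buffer
  (insert "page one\fpage two\fpage three")
  (my-goto-text-page 3)
  (buffer-substring (point) (point-max)))   ; returns "page three"
```

If the text has fewer than N pages then `search-forward' signals
an error instead of moving point.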

7. Shorter hyperlinks to PDF files

If you run these sexps

  (code-pdf-page "livesofanimals" "~/Coetzee99.pdf")
  (code-pdf-text "livesofanimals" "~/Coetzee99.pdf" -110)

they will define the functions `find-livesofanimalspage' and
`find-livesofanimalstext', and then these hyperlinks should work:

  (find-livesofanimalspage)
  (find-livesofanimalstext)
  (find-livesofanimalspage (+ -110 113))
  (find-livesofanimalstext (+ -110 113))
  (find-livesofanimalspage (+ -110 113) "LECTURE I.")
  (find-livesofanimalstext (+ -110 113) "LECTURE I.")
  (find-livesofanimalspage (+ -110 127) "wrong thoughts")
  (find-livesofanimalstext (+ -110 127) "wrong thoughts")
  (find-livesofanimalspage (+ -110 132) "into the place of their victims")
  (find-livesofanimalstext (+ -110 132) "into the place of their victims")
  (find-livesofanimalspage (+ -110 133) "To write that book I had to think")
  (find-livesofanimalstext (+ -110 133) "To write that book I had to think")
  (find-livesofanimalspage (+ -110 134) "woke up haggard in the mornings")
  (find-livesofanimalstext (+ -110 134) "woke up haggard in the mornings")
  (find-livesofanimalspage (+ -110 143) "Babies have no self-consciousness")
  (find-livesofanimalstext (+ -110 143) "Babies have no self-consciousness")
  (find-livesofanimalspage (+ -110 145) "squirrel doing its thinking")
  (find-livesofanimalstext (+ -110 145) "squirrel doing its thinking")
  (find-livesofanimalspage (+ -110 147) "Rilke's panther")
  (find-livesofanimalstext (+ -110 147) "Rilke's panther")
  (find-livesofanimalspage (+ -110 162) "a grasp of the meaning")
  (find-livesofanimalstext (+ -110 162) "a grasp of the meaning")
  (find-livesofanimalspage (+ -110 164) "last common ground")
  (find-livesofanimalstext (+ -110 164) "last common ground")

Hyperlinks like

  (find-livesofanimalspage (+ -110 113) "LECTURE I.")
  (find-livesofanimalstext (+ -110 113) "LECTURE I.")

behave roughly as abbreviations for:

  (find-pdf-page "~/Coetzee99.pdf" (+ -110 113) "LECTURE I.")
  (find-pdf-text "~/Coetzee99.pdf" (+ -110 113) "LECTURE I.")

[Video links:]
  (find-eev2020video "10:22" "1.3. Shorter hyperlinks to PDFs and videos")
  (find-eev2020video "10:45" "`code-pdf-page' creates short hyperlink functions")
  (find-eev2020video "11:38" "let's try...")
  (find-eev2020video "11:55" "`find-fongspivatext'")

8. `find-pdf'-pairs

Let's introduce some terminology. Remember that we call a pair of
sexps like

  (find-pdf-page "~/Coetzee99.pdf" (+ -110 113) "LECTURE I.")
  (find-pdf-text "~/Coetzee99.pdf" (+ -110 113) "LECTURE I.")

a "`find-pdf'-pair"; a pair like

  (find-livesofanimalspage (+ -110 113) "LECTURE I.")
  (find-livesofanimalstext (+ -110 113) "LECTURE I.")

will be called a "short `find-pdf'-pair", as in:

  (find-eev-quick-intro "9. Shorter hyperlinks")

and a pair like

  (code-pdf-page "livesofanimals" "~/Coetzee99.pdf")
  (code-pdf-text "livesofanimals" "~/Coetzee99.pdf" -110)

will be called a `code-pdf'-pair. The "livesofanimals" will be
called the _stem_. The "-110" will be called the _offset_.

9. Generating three pairs

Eev has a high-level function that generates at once, for a
single PDF file, a `find-pdf'-pair, a `code-pdf'-pair, and a
short `find-pdf'-pair. To see what it produces, try:

  (find-code-pdf-links "~/Coetzee99.pdf")
  (find-code-pdf-links "~/Coetzee99.pdf" "livesofanimals")

The second link above produces a temporary buffer containing
this:

  ;; (find-pdf-page "~/Coetzee99.pdf")
  ;; (find-pdf-text "~/Coetzee99.pdf")
  (code-pdf-page "livesofanimals" "~/Coetzee99.pdf")
  (code-pdf-text "livesofanimals" "~/Coetzee99.pdf")
  ;; (find-livesofanimalspage)
  ;; (find-livesofanimalstext)

`find-code-pdf-links' is somewhat similar to `find-latex-links',
in this respect:

  (find-eev-quick-intro "7.5. `find-latex-links'" "change the \"{stem}\"")

If you run just

  (find-code-pdf-links "~/Coetzee99.pdf")

it will generate a buffer that has "{c}"s in several places and
that follows the convention that "the first line regenerates the
buffer". If you replace the "{c}" in the top sexp with
"livesofanimals" and type `M-e' the buffer will be recreated with
each "{c}" replaced by "livesofanimals".

The user-friendly way to run `find-code-pdf-links' is by typing
`M-h M-p' in Dired mode. If you want to generate the three pairs
for a file "~/foo/bar/story.pdf" then visit the directory
"~/foo/bar/", put the cursor on the line that lists the file
"story.pdf", and type `M-h M-p'. Try it with our test file:

  (find-fline "~/" "Coetzee99.pdf")

10. Generating a pair with the page number

If you type `M-h M-p' and you're not in Dired mode then `M-h M-p'
will try to generate a short `find-pdf'-pair pointing to the
current position in the current page of the current PDF file
(converted to text). The function bound to `M-h M-p' tries to
guess four things: the stem, the offset, the page number, and the
string to be used as a pos-spec. Let's see first a situation
where everything works. Run the four sexps below and type
`M-h M-p':

  (code-pdf-page "livesofanimals" "~/Coetzee99.pdf")
  (code-pdf-text "livesofanimals" "~/Coetzee99.pdf" -110)
  (kill-new "wrong thoughts")
  (find-livesofanimalstext (+ -110 127) "wrong thoughts")

You will get an elisp hyperlinks buffer whose middle links are
four short `find-pdf'-pairs, all pointing to the current page:

  # (find-livesofanimalspage 17)
  # (find-livesofanimalstext 17)
  # (find-livesofanimalspage (+ -110 127))
  # (find-livesofanimalstext (+ -110 127))

  # (find-livesofanimalspage 17 "wrong thoughts")
  # (find-livesofanimalstext 17 "wrong thoughts")
  # (find-livesofanimalspage (+ -110 127) "wrong thoughts")
  # (find-livesofanimalstext (+ -110 127) "wrong thoughts")

The second and the fourth pairs use "(+ -110 127)" instead of
"17" as the page number; the third and the fourth pairs point to
the string "wrong thoughts" in the page.

11. How `M-h M-p' guesses everything

The method that `M-h M-p' uses to guess the stem, the offset, the
page and the pos-spec is so error-prone and gives unexpected
results so often that it's worth describing it in detail.

  1. The stem is taken from the global variable `ee-page-c'.

  2. Every call to a function like `find-xxxtext' sets
     `ee-page-c' to "xxx" - for example, a call to
     `find-livesofanimalstext' sets `ee-page-c' to
     "livesofanimals". So `ee-page-c' usually holds the stem of
     the last function of the form `find-xxxtext' that was run.

  3. The offset is taken from the global variable
     `ee-page-offset'.

  4. A call to, say, `find-livesofanimalstext', sets
     `ee-page-offset' to the offset that was declared here:

       (code-pdf-text "livesofanimals" "~/Coetzee99.pdf" -110)

     So `ee-page-offset' usually holds the offset of the last
     function of the form `find-xxxtext' that was run.

  5. The page number is obtained by counting the number of
     formfeeds between the beginning of the buffer and the
     current position. If there are 16 formfeeds then the current
     page is 17.

  6. The pos-spec - "wrong thoughts" in the example - is the
     string at the top of the kill ring. See:

       (find-refining-intro "2. Refining hyperlinks" "kill-new")

If you want to see an example where `M-h M-p' guesses everything
wrong you can type `M-h M-p' here... as we're not in Dired mode
`M-h M-p' will think that we're in the conversion to text of
"livesofanimals", in page 1, and it will generate hyperlinks to
that page of the book!
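
The page-counting rule in item 5 can be sketched in plain elisp.
This is only an illustration of the rule, not the code that eev
runs; `my-current-text-page' is a hypothetical name:

```elisp
;; Hypothetical helper, NOT part of eev: compute the current page
;; as "number of formfeeds before point, plus one".
(defun my-current-text-page ()
  "Return the page number at point, counting formfeeds before point."
  (save-excursion
    (let ((pos (point))
          (formfeeds 0))
      (goto-char (point-min))
      ;; Count only the formfeeds that end at or before POS.
      (while (search-forward "\f" pos t)
        (setq formfeeds (1+ formfeeds)))
      (1+ formfeeds))))

;; Try it on a small fake "converted PDF" with three pages:
(with-temp-buffer
  (insert "page one\fpage two\fpage three")
  (goto-char (point-min))
  (search-forward "two")
  (my-current-text-page))   ; returns 2
```

In a buffer without formfeeds, like this intro, the count is zero
and the result is page 1, which matches the wrong guess described
above.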

12. Other ways to generate `code-pdf'-pairs

The easiest way is with `M-h M-e'. See:

  (find-audiovideo-intro "4.1. `find-extra-file-links'" "M-h M-e")

There is also `M-P', that is a "wrapping function" that
transforms the current line, like `M-B' - see:

  (find-eev-quick-intro "8.4. Creating e-script blocks" "M-B")

`M-P' parses the current line as a short string and a file name,
and then deletes the current line and inserts in its place a
block of five lines containing a `code-pdf'-pair and some
comments. Try:

  (eek "<down> M-P  ;; eewrap-pdflike")
livesofanimals ~/Coetzee99.pdf