REPLs in strange places: Lua, LaTeX, LPeg, LPegRex, TikZ (subtitles)
The main page about this video is here.
Its subtitles in Lua are here.
The rest of this page contains a conversion of the subtitles in Lua
to a slightly more readable format.
00:00 Hi! My name is Eduardo Ochs
00:02 and the title of this talk is: REPLs
00:05 in strange places - Lua, LaTeX, LPeg, LPegRex,
00:09 and TikZ. I'm the author of an Emacs
00:12 package called eev, and this is a talk
00:15 at the EmacsConf 2023, that is happening in
00:18 December 2023, at the internets.
00:20 And this is one of the
00:23 examples of diagrams that we are
00:25 going to see - let me show how I generate
00:27 it... one second,
00:30 I have to use a smaller font here...
00:34 this is a file called ParseTree2.lua...
00:39 let me go back to this block of tests again...
00:42 and now if I run
00:44 this...
00:46 we get these outputs here at the
00:51 right, and then in this line here it
00:53 generates a PDF, and if I type f8 here it
00:58 shows the PDF in the lower right window.
01:03 Let me start by explaining
01:06 briefly what is eev.
01:08 First: it is something that
01:12 appeared by accident in the mid-90s - I
01:14 explained this story in my
01:16 presentation at the EmacsConf 2019...
01:20 it's a package... it's an Emacs
01:23 package that is part of ELPA... it has at
01:26 least 10 users - those are the ones
01:30 that I know by name...
01:32 eev means `emacs-execute-verbosely'...
01:37 eev is something that treats eval-last-sexp
01:41 as the central feature of Emacs...
01:43 eev blurs the distinction between
01:45 programmers and users, and it replaces
01:48 the slogan "users should not be forced to
01:51 see Lisp", that is something that Richard
01:54 Stallman told me once, by "users should see
01:57 Lisp instead of buttons" and "new users
02:00 should see Lisp in the first 5 minutes"...
02:03 I'm going to show
02:05 some examples of that soon.
02:07 Eev uses code in comments a lot,
02:11 and also tests in comments...
02:13 I changed my way of presenting it
02:16 and it became very REPL-centric
02:18 in the last few years, in the
02:21 sense that I start by explaining its
02:24 main features by its support for REPLs...
02:28 eev supposes
02:30 that we want to keep
02:32 executable notes of everything - I'm also
02:35 going to show examples of this in a
02:37 second... eev has lots of "videos for
02:40 people who hate videos", and it tries to
02:43 do everything with very little magic and
02:46 without black boxes - I'm going to explain
02:48 many of these things very soon.
02:51 This is a figure that I'm going
02:55 to show in detail soon, that is
02:58 about something important about Lua...
03:01 the font is very bad now, so let me
03:04 change the font... the figure is this one...
03:07 and...
03:09 what most people do when they
03:12 visit a file with something
03:14 interesting on it is that they just go
03:16 there and they set a bookmark there, or
03:18 they put the position in a register...
03:21 but I prefer to keep
03:26 links to everything that is interesting
03:29 as elisp hyperlinks. So, for example, this is
03:31 an elisp hyperlink to a file, that goes
03:34 to this anchor here, and to this string
03:37 after this anchor... this is a variant
03:41 that opens that file in the window
03:44 at the right -
03:46 here... and this is
03:50 a sexp that changes the font. I
03:53 have a command with a very short name
03:56 that does that, but I
03:58 prefer to keep that as a one-liner.
04:03 About the videos... we can see
04:07 the list of first-class videos of eev
04:09 by executing this, M-x find-1stclassvideos,
04:13 or by running this alias here, M-x 1c...
04:16 and then what we see is this...
04:19 the first sexp here
04:24 regenerates this buffer - so we can make a
04:27 mess here and then run this and the
04:29 original buffer is regenerated again in
04:32 a clean way...
04:33 each of these things here
04:38 opens a buffer with information about
04:40 a video... let me take a specific
04:43 example here... this video here is about
04:48 one of the ancestors of this talk, that
04:51 is a library that I wrote
04:54 for creating diagrams in LaTeX using
04:58 a package called Pict2e using REPLs...
05:04 anyway...
05:05 the thing is that if we
05:09 run a sexp like this one and we don't
05:12 have a local copy of the video eev
05:15 will try to download a local copy -
05:17 and instead of doing that by asking
05:20 something like "do you want me
05:21 to download the local copy? Blah
05:23 blah blah blah blah..." it simply opens a
05:27 buffer like this, I mean, if we don't
05:30 have a local copy yet it will open a
05:33 buffer like this one, in which these
05:37 things here in comments are links to the
05:40 documentation... I mean, this thing here
05:44 explains the idea of local copies
05:46 of files from the internet...
05:48 there are more details here, and here...
05:52 and this is a script that we
05:57 can execute line by line, so instead of
06:00 this script being hidden behind the
06:03 button that we just press after a
06:06 question like "Do you want me to do
06:08 something blah blah blah? Yes or no?"
06:11 the script is visible here and we can
06:13 execute it step by step... it creates a
06:17 terminal with a shell here in the
06:20 right window, and when we type f8 in
06:24 one of these lines here the lines are
06:27 sent... (...) so this is going
06:31 to download a copy of the video... the
06:34 wget says that I already have a copy of
06:36 the video and its subtitles... and so on.
06:40 And after getting a copy of the video
06:43 we can run this sexp here and it displays
06:47 the video.
06:51 I said that eev has lots of
06:55 "videos for people who hate videos", and
06:58 the idea is that very few
07:00 people are going to watch the videos in
07:02 real time... and most of the people that
07:05 I know - or: most of the people that
07:07 are interested in eev in some
07:10 way... they are going to watch just
07:13 small sections of the video, and most of
07:16 the time they're just going to read the
07:17 subtitles of the video. So, for each
07:20 one of the videos we have a page
07:24 about the video... let me see if I
07:27 have internet here... yes. This is a
07:31 page...
07:34 and usually these pages have a link
07:37 to another page that
07:40 has all the subtitles of the
07:43 video... uh, wherever... in this one
07:46 it's not so visible...
07:48 but anyway, there are several
07:50 ways of accessing the subtitles of the
07:52 video, and one of the ways is by running
07:56 this sexp here,
07:58 that opens a file in Lua that is
08:01 what I use to generate the
08:04 subtitles.
08:05 Anyway... by the way, these things... each
08:09 one of these things here is a hyperlink
08:12 to a position of the video, so if I type
08:15 this the right way it goes to that
08:19 position. Anyway, let me go back...
08:22 also, the tutorials of eev... the
08:27 "intros" of eev, that start with "find-" and
08:31 end with "-intro", they have lots of blocks
08:34 that say "[Video links:]", like this one, and
08:38 these blocks have links to positions
08:41 in videos, and if we don't have a local
08:44 copy of the video yet the thing shows
08:47 us a script that lets us download the
08:49 local copy.
08:51 Anyway, I said that I was going
08:54 to explain what I mean by "magic" and
08:58 "black boxes".
09:01 This is something that I've been
09:03 trying to explain for a long time, and I
09:04 think that I got a very good explanation
09:07 about that in a video that I made
09:09 about something called eev-wconfig, that
09:12 is a tool for configuring eev on
09:16 Windows without "magic" - without buttons
09:18 that do things without explaining what
09:22 they're doing.
09:23 This is a part of the subtitles
09:26 of the video, let me read that...
09:29 eev-wconfig is an attempt to solve the
09:32 problem of how to install these things
09:34 on Windows both without magic and with
09:37 very little
09:38 magic. Remember this slogan: "any
09:41 sufficiently advanced technology is
09:44 indistinguishable from
09:46 magic". Here in this video I'm going to
09:48 use the term magic as a shorthand
09:52 for sufficiently advanced technology,
09:55 that is something that is complex and
09:57 non-obvious and that is
10:00 indistinguishable from magic in the
10:02 sense of being almost impossible to
10:04 understand. And I'm also going to use
10:07 the term "black box" as a near-synonym for
10:10 magic, and sometimes the term
10:13 "black box" is more convenient even though
10:15 it's a bit longer - it has more
10:17 letters - because when I use the term
10:20 black box it invites us to use
10:22 expressions like "opening the black box",
10:25 and I'm going to use that
10:26 expression a lot.
10:34 Now let me try to explain what is...
10:37 sorry, let me change the font...
10:43 what is Lua. Lua is a minimalistic
10:47 language, in the sense of
10:50 "batteries not included"... it uses
10:53 associative tables for most of its data
10:56 structures...
10:58 and it is so minimalistic
11:00 that its default print function, when
11:03 we tell... when we create an associative
11:07 table and we ask it to print...
11:10 when we ask "print" to print an
11:13 associative table it just prints the
11:15 address of the table. Here are some
11:18 examples... here is a table, and when we
11:22 ask "print" to print it it just says
11:24 that it's the table at this address here.
11:27 So, one of the things that most
11:30 people do when they start using Lua is
11:32 that either they download a package with
11:35 a pretty-printing function or they write
11:37 their own pretty-printing functions. My
11:39 own pretty-printing function is called
11:41 PP, with upper case letters, and it works
11:45 like this...
11:46 and it prints associative tables
11:50 in a way like this. It says that for
11:53 the key 1 the value associated to
11:56 it is 2, for the key 2 the value is
11:58 3, and for the key 3 the value is 5.
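(A minimal sketch of the point above, not the PP function from the talk: `pp_sketch` is a hypothetical name, and the output format is only an assumption modeled on the description. It sorts keys by their string form so that the output is stable, and it does not handle tables that contain themselves.)

```lua
-- Hypothetical helper, not the author's PP: returns a readable string
-- for a value, listing each key of a table with its associated value.
local function pp_sketch(value)
  if type(value) ~= "table" then
    if type(value) == "string" then return string.format("%q", value) end
    return tostring(value)
  end
  local keys = {}
  for key in pairs(value) do keys[#keys + 1] = key end
  -- Sort by string form so mixed key types can be compared.
  table.sort(keys, function (a, b) return tostring(a) < tostring(b) end)
  local parts = {}
  for _, key in ipairs(keys) do
    parts[#parts + 1] = pp_sketch(key) .. "=" .. pp_sketch(value[key])
  end
  return "{" .. table.concat(parts, ", ") .. "}"
end

local primes = {2, 3, 5}
print(primes)             -- something like "table: 0x..." (the address varies)
print(pp_sketch(primes))  -- {1=2, 2=3, 3=5}
```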
12:07 When I started using Lua one of my
12:11 favorite languages was also a language
12:13 that used associative tables a lot -
12:15 it was called Icon...
12:17 and I had to write my own
12:20 pretty-printing functions for Icon, so
12:24 I just had to port my pretty-printing
12:27 functions to Lua... and my first
12:29 version looked something like this... it
12:32 just had some global functions... lots
12:35 of them, actually...
12:38 and after a while I rewrote it, and I
12:42 rewrote it again, and again, and again, and
12:44 this is one of the versions of that, which
12:47 is not even the default at this
12:49 point...
12:51 "Tos" is for "to string"...
12:55 and this is a demo...
12:59 it's very modular, so it's easy to replace
13:01 parts of it, or to toggle flags... and this
13:05 is an example. If I try to print the
13:08 table of methods for a certain
13:10 class... I will need a smaller font...
13:14 it prints the table like this, with the
13:16 names of the methods and then links to
13:19 the source code of the functions...
13:22 these links only make sense in Emacs and
13:24 in eev...
13:26 and when we run a link like this one...
13:29 it shows the source code in the
13:32 window at the right. So, for some
13:35 functions the source code is three lines,
13:37 for other ones it's one line... and
13:41 whatever. Anyway, let me go
13:44 back... Lua can be used in many different
13:47 styles... most people hate other people's
13:50 styles... when I started using it in the
13:54 year 2000 I learned most of the basic
13:57 language in a single day - it was very
13:59 similar to things that I was already
14:01 using... and then I rewrote the mini-
14:06 language that I was using to
14:08 generate the HTML for my pages
14:13 in Lua... actually I had to rewrite it
14:16 many times, but the first version I
14:18 certainly did in my first weeks or first
14:21 months using Lua...
14:23 In the beginning I was just using
14:27 it for writing programs that either
14:30 didn't take any input at all - because
14:32 the input was already in the source file -
14:35 or that worked as Unix programs,
14:39 that would read files
14:42 and process these files in some way
14:44 and output something.
14:47 I mentioned the "basic language" here...
14:51 I only learned how to use closures,
14:54 metatables, and coroutines many years later...
14:59 in the beginning, when I started using Lua,
15:01 it didn't have a package manager...
15:03 it appeared later, it is called
15:05 Luarocks... it has had this package
15:09 manager for several years, most
15:13 of the rocks for Luarocks are poorly
15:15 documented and hacker-unfriendly,
15:18 so you can't rely just on the
15:21 documentation and you can't rely just on the
15:23 source code, because, I mean... if you are
15:26 a genius of course you can, but for
15:29 people who are either lazy, or dumb, or
15:32 whatever, like me, or unfocused...
15:34 the source code is hard to
15:37 understand and hard to tinker with.
15:40 Some rocks are excellent. The
15:43 best rocks are well documented
15:45 but they are hacker-unfriendly
15:47 in a sense that I hope that
15:50 I'll be able to explain soon.
15:52 The best rocks use local
15:56 variables and metatables a lot -
15:59 so if you are beginner
16:02 learning Lua you're not going to
16:04 understand what their source code does...
16:06 they use lots of dirty tricks.
16:09 Let me talk a bit about object
16:12 orientation in Lua. It can be done in
16:15 many ways...
16:16 the main book about Lua, called
16:19 "Programming in Lua", by one of the authors
16:21 of the language, Roberto Ierusalimschy,
16:23 presents several ways of doing
16:26 object orientation in Lua... I hated all
16:30 of these ways - and also the ways that I
16:32 tried from the rocks.
16:34 And then I wrote my own way
16:38 of doing object orientation in Lua... it's
16:40 very minimalistic, it's in this file here,
16:43 eoo.lua... the main code is just these five
16:48 lines here...
16:49 and here's an example of how it works.
16:55 Here we define the class Vector,
16:59 with some metamethods...
17:01 this metamethod here will tell Lua
17:04 what to do when the
17:08 user asks to add two vectors, this one
17:12 here tells Lua what to do when the user
17:15 asks Lua to convert a vector to a string,
17:18 and... whatever, this one is
17:21 something that I'm going to explain in a
17:24 second. So, here we create a vector with
17:27 these coordinates, 3 and 4... here we create
17:30 another Vector... if we "print" here then Lua
17:33 uses this function here, in the __tostring...
17:37 if we add the two vectors it uses this
17:39 function here, in the __add metamethod, and
17:43 if we run the method :norm...
17:46 it is defined here, in the table __index.
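(A minimal sketch of the mechanism described above, using plain metatables. It does not reproduce eoo.lua: `new_vector` is a hypothetical constructor, and the second vector's coordinates are made up for illustration.)

```lua
-- Hypothetical sketch: a Vector "class" built directly on a metatable.
local Vector = {}

-- __index holds the methods, so v:norm() is looked up here.
Vector.__index = {
  norm = function (v) return math.sqrt(v[1]^2 + v[2]^2) end,
}

-- __add tells Lua what to do for v + w.
Vector.__add = function (a, b)
  return setmetatable({a[1] + b[1], a[2] + b[2]}, Vector)
end

-- __tostring tells Lua how to convert a vector to a string (used by print).
Vector.__tostring = function (v)
  return "(" .. v[1] .. "," .. v[2] .. ")"
end

local function new_vector(x, y)
  return setmetatable({x, y}, Vector)
end

local v = new_vector(3, 4)
local w = new_vector(20, 30)
print(v)         -- (3,4)
print(v + w)     -- (23,34)
print(v:norm())  -- the Euclidean norm of (3,4)
```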
17:52 Anyway...
18:00 Even with this thing being so small I used
18:02 to forget how its innards worked all
18:04 the time. Actually I always forget how
18:08 things work and I have to remember them
18:09 somehow... and I have to have
18:12 tricks for remembering, and tricks for
18:16 summarizing things, and diagrams, and so
18:19 on. And every time that I forgot how this
18:22 thing worked I went back to the
18:24 source code, and then I looked at the
18:26 diagrams... or, of course, in the
18:29 first times I had to draw the diagrams...
18:32 and I ran the examples, and of course
18:35 in the beginning I thought that the code
18:36 was clear and my examples were very
18:38 brief, and so I had to rewrite the
18:40 examples many times until they became,
18:45 let's say...
18:47 perfect.
18:49 I was saying that Lua can be used in
18:52 many ways, and in my way of using Lua - in
18:56 my favorite way - everything can be
18:58 inspected and modified from REPLs,
19:02 like we can do in Emacs and in Smalltalk,
19:05 or sort of. So, in my
19:08 favorite way of using Lua there's no
19:10 security at all, everything can be
19:12 changed at all times.
19:16 Of course most people hate that...
19:19 My init file has lots of classes... by the
19:23 way, instead of keeping many small files
19:26 with many things I put lots of stuff
19:28 in just one big init file.
19:31 My init file has lots of classes,
19:35 and lots of global functions, and
19:37 lots of cruft - and people hate that,
19:40 of course. This is an example...
19:43 this is the index at the top
19:46 of my init file,
19:49 the classes start here, and then
19:54 we have some functions, and
19:58 then we have functions that load
20:01 certain packages, and then we have... cruft.
20:04 Whatever.
20:05 Most people think that my style
20:07 of using Lua is dirty, and dangerous...
20:10 and they wouldn't touch my Lua code
20:12 with a 10-foot pole... but most of the
20:16 things that I'm going to present here in
20:18 this presentation are ideas that should
20:20 be easy to port to other environments
20:23 and other languages, especially the
20:25 diagrams... so the code is not so important.
20:30 Now let me talk a bit about LuaLaTeX,
20:33 that is LaTeX with a Lua interpreter
20:36 embedded inside, and two ways
20:40 of generating pictures in LaTeX: TikZ,
20:43 that is very famous, and Pict2e, that is not
20:46 very famous and that is very low level...
20:49 and I think that not many people use it.
20:52 I said before that when I
20:56 learned Lua I realized that it was
20:59 very good for writing little
21:01 languages. I was doing my PhD at the
21:04 time and typesetting the diagrams for
21:08 my PhD thesis was very boring, so
21:11 one of the things that I did was that I
21:13 created a little language for typesetting
21:16 the diagrams for me. It was
21:19 called Dednat because initially
21:22 it only generated diagrams for
21:25 Natural Deduction, and then it had
21:27 several versions...
21:28 these are the slides for my
21:32 presentation about Dednat6... "Dednat6 is
21:35 an extensible semi-preprocessor for
21:38 LuaLaTeX that understands diagrams in
21:41 ASCII art"... in the sense that when I have
21:46 a .tex file that has this, and when
21:50 Dednat6 is loaded,
21:52 when I give the right commands
21:54 Dednat6 interprets this block here as
21:59 something that defines this
22:01 diagram... oops, sorry, it interprets this
22:07 diagram here, this diagram in
22:10 comments here, as something that defines
22:13 a diagram called foo... a deduction called
22:16 foo, and it generates this code here...
22:20 so that we can just invoke
22:23 the definition of the
22:27 deduction by typing \ded{foo}.
22:31 And Dednat6 also
22:34 supports another language for typesetting
22:37 bidimensional diagrams with
22:40 arrows and stuff for category theory and
22:42 blah blah blah... the specifications of
22:45 these diagrams look like this...
22:47 here is a... sorry, here is a very good
22:54 example, this is a huge diagram...
22:58 sorry, one second...
22:59 so, the source code that generates
23:02 this diagram here is just this thing at
23:04 the left, so it's very visual... we can
23:09 typeset the diagram in ASCII art here and
23:12 then in this part here we tell how
23:15 the nodes are to be joined, which
23:18 arrows have to have annotations, and
23:20 so on...
23:21 and this language is extensible in
23:24 the sense that... uh, where's that...
23:32 here: comments that start with "%:"
23:37 are interpreted as
23:41 definitions for tree diagrams,
23:43 lines that start with "%D"
23:50 define 2D diagrams with arrows and
23:53 stuff, and lines that start with "%L"
23:56 contain blocks of Lua code
23:59 that we can use to extend the interpreter
24:02 on-the-fly...
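(A minimal sketch of the idea of dispatching on line prefixes, as described above. This is not Dednat6's actual code: `kinds` and `classify` are hypothetical names, and the sketch only labels a line instead of processing it.)

```lua
-- Hypothetical sketch: label a line of a .tex file by its two-character
-- prefix, following the three prefixes mentioned in the talk.
local kinds = {
  ["%:"] = "tree",        -- definitions for tree diagrams
  ["%D"] = "2D diagram",  -- 2D diagrams with arrows
  ["%L"] = "Lua code",    -- Lua code that extends the interpreter
}

local function classify(line)
  local prefix = string.sub(line, 1, 2)
  return kinds[prefix] or "plain TeX"
end

print(classify("%L print(1 + 2)"))    -- Lua code
print(classify("\\begin{document}"))  -- plain TeX
```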
24:05 anyway, here are some recent
24:08 examples of diagrams that I used
24:13 Dednat6 to typeset... this diagram
24:18 here was generated by this
24:21 specification here...
24:23 and this diagram here with the
24:28 curved arrows was generated by this
24:31 specification here.
24:34 So, Dednat6 was very easy to extend,
24:38 and at some point I started to use it
24:41 to generate diagrams using Pict2e -
24:44 mainly for the classes that I give
24:47 at the University... I teach mathematics and
24:49 whatever... in a bad place. Whatever...
24:55 Let me show an animation... here is a
24:59 diagram that I generated with Dednat6,
25:03 and it is a flip book animation, like... we
25:06 type PgUp and PgDn and we go
25:09 to the next page of the book and to the
25:11 previous page of the book...
25:13 and here is the source code that generates
25:15 that. This source code is not very visual,
25:19 so it's quite clumsy to edit that
25:21 diagram directly in the .tex file like
25:26 that...
25:27 These diagrams were inspired
25:29 by something called Manim, that...
25:34 I forgot the name of the guy, but
25:36 it's a guy that makes many videos about
25:38 Mathematics, and he created this library
25:41 called Manim for generating his
25:43 animations, and other people adapted
25:46 his library to make it more accessible...
25:51 I tried to learn it, but
25:54 each animation, even an animation
25:57 that has very few frames... each
25:59 animation took ages to render, so it
26:02 wasn't fun... and animations in PDFs can
26:05 be rendered in seconds. So these
26:09 things were fun for me, because my laptop
26:12 is very very slow, and Manim was not fun.
26:19 Anyway, writing code like this
26:23 inside a .tex file was not very
26:26 fun because it was hard to
26:30 debug... so in 2022 I started to play
26:35 with ways of generating these
26:39 diagrams from REPLs, and I found a
26:42 way for Pict2e and a way for TikZ...
26:46 each one of these ways became a video...
26:49 if you go to the list of first-class
26:52 videos of eev you're going to see
26:55 that there's a video about Pict2e here
26:57 and a video about TikZ...
27:01 here you have some information
27:04 like length, an explanation, etc...
27:08 and here are the pages for these videos.
27:12 My page about the video about Pict2e
27:15 looks like this, it has some diagrams...
27:17 whatever... and this one is much
27:19 nicer, and a lot of people
27:24 watched that video... I mean, I think
27:28 that 250 people watched it - for me that's
27:32 a million of people...
27:34 and this video is about how to
27:38 extract diagrams from the manual... from
27:41 the TikZ manual and how to run those
27:45 examples in a REPL and modify
27:48 them bit by bit... this is a
27:53 screenshot... but let me go back.
27:58 At that point these things were just
28:00 prototypes, the code was not very nice...
28:03 and in this year I wrote... I was able
28:08 to unify those two ways of generating PDFs,
28:12 the one for TikZ and the one for Pict2e,
28:16 and I unified them with many other
28:19 things that generated diagrams.
28:21 The basis of these things is
28:24 something called Show2.lua... I'm not going
28:28 to show its details now, but its
28:34 extension that generates TikZ code
28:38 is just this, so we can specify a
28:42 diagram with just a block like this,
28:45 and then uh if we
28:49 run :show00() it returns a string
28:54 that is just the body... the inner
28:57 body of the .tex file, if we run this we
29:00 see the whole .tex file, and if we run
29:03 this we save the .tex file and we
29:06 compile the .tex file to generate a PDF...
29:08 and if we run this we show the PDF in
29:12 the lower right window.
29:15 And that's the same thing for all
29:18 my recent programs that generate
29:20 PDFs - they are all
29:23 integrated... here is the one that...
29:26 the basis for all my modules that generate
29:29 diagrams with Pict2e...
29:31 its demos are not very interesting,
29:34 so let me show some demos of
29:37 extensions that do interesting things...
29:42 so, this is a diagram that I created
29:45 by editing it in a REPL...
29:47 I create several Pict objects here...
29:51 and if I execute this it
29:55 compiles an object, generates a PDF, and
29:59 if I tap this... here is the PDF.
30:02 And if I just ask Lua to
30:07 display what is "pux", here,
30:10 it shows the source code in Pict2e
30:14 of the diagram... and the
30:18 nice thing is that it is indented, so
30:21 it's easy to debug the Pict2e code.
30:23 If anyone is interested the
30:26 module that does the tricks for
30:29 indentation is very easy to understand...
30:31 it has lots of tests and test blocks,
30:34 and I think that its data
30:37 structures are easy to understand.
30:40 Anyway... here is another
30:49 example. The :show() is
30:52 here... it generates a 3D diagram.
31:02 Now let me talk about parsers and
31:06 REPLs in VERY strange places... I mean,
31:09 using REPLs to build parsers step by step
31:13 and replacing parts by more complex
31:19 parts. So, I said that Lua is very
31:24 minimalistic, and everybody knows that
31:27 implementations of regular expressions
31:29 are big and complex...
31:32 so, instead of coming with
31:35 full regular expressions Lua comes with
31:37 something called "patterns" and a
31:41 library function called "string.match".
31:44 Here is
31:47 a copy of the part of the manual that
31:50 explains the syntax... a part of the
31:52 syntax of patterns... here's how
31:57 string.match is described in the
31:59 manual - it's just this... "looks for
32:03 the first match of pattern in the string
32:05 as blah blah blah"... and then we have to
32:08 go to the other section of the manual
32:10 that explains patterns.
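(A minimal sketch of `string.match` with a Lua pattern. The subject string and the pattern are made up for illustration; `%d` is the standard character class for digits, and each pair of parentheses returns one captured substring.)

```lua
-- string.match returns the captures of the first match, or nil.
local subject = "2023-12-03"
local year, month, day = string.match(subject, "(%d+)-(%d+)-(%d+)")
print(year, month, day)   -- the three captured substrings, as strings

-- When nothing matches, the result is nil.
print(string.match("no digits here", "%d+"))  -- nil
```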
32:15 Lua patterns are so simple,
32:19 so limited, that they don't even
32:22 have the alternation operator...
32:27 here is how it is described in the
32:30 elisp manual -
32:32 backslash-pipe specifies
32:36 an alternative, blah blah blah.
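(A minimal sketch of what the missing alternation means in practice. In a Lua pattern the character "|" is just a literal character. `match_any` is a hypothetical helper that tries several patterns in order; this is only an approximation of alternation, because a regular expression would look for the leftmost match among all alternatives.)

```lua
-- "|" has no special meaning in Lua patterns: it matches itself.
print(string.match("dog", "cat|dog"))      -- nil
print(string.match("cat|dog", "cat|dog"))  -- cat|dog

-- Hypothetical helper: return the first match among several patterns,
-- trying them in the order given.
local function match_any(subject, patterns)
  for _, pattern in ipairs(patterns) do
    local result = string.match(subject, pattern)
    if result then return result end
  end
  return nil
end

print(match_any("a dog", {"cat", "dog"}))  -- dog
```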
32:38 When we want to build more
32:43 complex... regular expressions,
32:46 patterns, grammars, etc... we have to use
32:49 an external library for that... no,
32:53 sorry, a library that is external
32:56 but that was written by one of the
32:58 authors of Lua itself. This library
33:02 is called Lpeg, and its manual says...
33:07 "Lpeg is a new pattern matching library for
33:09 Lua based on Parsing Expression Grammars
33:14 (PEGs)". The manual is very terse, I
33:18 found it incredibly hard to read... it
33:22 doesn't have any diagrams - it has some
33:24 examples, though... and the Lua Wiki
33:29 has a big page called Lpeg Tutorial
33:34 with lots of examples...
33:36 but it also doesn't have
33:39 diagrams and I found some things
33:41 incredibly hard to understand.
33:43 For example, this is something that is in
33:46 the manual of Lpeg that I saw and I
33:48 thought: "Wow, great! This makes all sense
33:51 and is going to be very useful!"...
33:53 it's a way to build
33:54 grammars that can be recursive,
33:57 and they sort of can encode BNF
34:01 grammars... we just have to translate the
34:03 BNF a bit to get rid of some
34:06 recursions and to translate them to
34:08 something else.
34:09 And the manual also has some things
34:13 that I thought: "Oh, no! I don't have any
34:15 idea of what this thing does"... and in fact
34:18 I saw these things for the first
34:20 time more than 10 years ago and they
34:22 only started to make sense one year ago.
34:28 One example is group captures.
34:32 Lpeg also comes with a
34:35 module called the Re module... let me
34:38 pronounce as it in Portuguese - the Re
34:40 module... its manual says: "The Re
34:45 module (provided by the file re.lua in the
34:48 distribution) supports a somewhat conventional
34:51 regular expression syntax for pattern usage
34:54 within lpeg"... and
34:58 this is a quick reference... this
35:04 thing is very brief, it has some nice
35:06 examples but it's hard to understand anyway...
35:09 and here are some comments about
35:13 my attempts to learn Re.lua. This is
35:18 a class... in this case it's a very small
35:20 class... this file implements a :pm()
35:25 method - I'm going to show examples of
35:28 other :pm() methods very soon - so, this is
35:32 a :pm() method for Re.lua that lets us
35:35 compare the syntax of Lua patterns, Lpeg,
35:39 and Re... let's see this example here... so,
35:45 if we run this it loads my version of
35:50 lpeg... no, sorry, my version of lpegrex...
35:53 and it shows that when we apply
35:57 the :pm() method to this Lua pattern, this
36:01 lpeg pattern, and this Re pattern
36:05 they all give the same results. So we can
36:08 use this thing... this kind of thing here
36:10 to show how to translate from Lua
36:14 patterns, that are familiar because
36:16 they're similar to regular expressions,
36:19 only weaker...
36:20 to lpeg, that is super weird
36:24 and to Re, that is not so weird.
36:28 Anyway, the comment says that in 2012
36:34 I had a project that needed a
36:37 precedence parser that could parse
36:39 arithmetical expressions with the right
36:44 precedences... and at that point I was
36:47 still struggling with pure lpeg, and I
36:49 couldn't do much with it, so I tried to
36:52 learn Re.lua instead, and I wrote this old
36:55 class here...
36:58 that allowed me to use a preprocessor
37:00 on patterns for Lua. And the thing is that
37:03 with this preprocessor I could
37:05 specify precedence grammars using this
37:07 thing here, that worked, but was super
37:11 clumsy... and I gave up after a few attempts.
37:16 And in 2022 I heard about something
37:19 called lpegrex,
37:22 that was a... a kind of extension of Re,
37:29 and it was much more powerful than re.lua,
37:32 but after a while I realized that it
37:35 had the same defects as re.lua...
37:37 and let me explain that, because
37:38 it has all to do with the things about
37:44 black boxes and magic that I told in the
37:48 beginning. Both... I mean, sorry, neither
37:52 re.lua nor lpegrex had some features that
37:57 I needed... they didn't let us explore...
38:01 sorry, they received a pattern that was
38:03 specified as a string, and they converted
38:06 that into an lpeg pattern, but they didn't
38:09 let us explore the lpeg patterns
38:12 that they generated...
38:13 their code was written in a way
38:17 that was REPL-unfriendly - I
38:21 couldn't modify parts of the code
38:24 bit by bit in a REPL and try to change
38:28 the code without changing the
38:30 original file... the code was very
38:34 hard to explore, to hack, and to extend -
38:36 in my opinion... the documentation was not
38:39 very clear... and I sent one or two messages
38:43 to the developer of lpegrex and...
38:48 he was too busy to help me. He
38:51 answered it very briefly, and, uh, to be
38:54 honest I felt... rejected. I felt that I
38:57 wasn't doing anything interesting...
38:58 whatever, whatever...
39:02 So, in 2022 I was trying to learn lpegrex
39:09 because I was thinking that it would
39:11 solve my problems - but it didn't...
39:14 it didn't have the features that I needed,
39:16 it was hard to extend, hard to explore,
39:19 and hard to debug, and I
39:23 decided to rewrite it in a more
39:27 hacker-friendly way - in the sense that...
39:31 it was modular, and I could replace any
39:33 part of the module from a REPL...
39:35 my version of it was called ELpeg1.lua...
39:43 and I decided that in my version I
39:47 wouldn't have the part that
39:50 receives a grammar specified as a string
39:53 and converts that to lpeg... I would
39:55 just have the backend part, that are the
39:58 functions in lpeg that let us specify
40:03 powerful grammars.
40:09 Let me go back. Let me explain a
40:13 bit about lpeg... Lua has
40:17 coercions: the + expects to receive
40:21 true numbers, and if one of its arguments,
40:24 or both of them, are strings, it converts
40:27 the string... the strings to numbers, so in
40:30 this case here, 2+"3",
40:32 it returns the number 5,
40:36 and this is the concatenation
40:39 operator... it expects to receive
40:42 strings, so in this case it will
40:45 convert the number 2 to the string "2",
40:47 and the concatenation of these two
40:50 things will be 23... oops, sorry, "23"
40:53 as a string.
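(A minimal sketch of the two coercions just described, in plain Lua. Note that the concatenation needs spaces around `..` after a numeral, because `2..` would be read as a malformed number.)

```lua
-- Arithmetic coerces strings to numbers.
local sum = 2 + "3"
print(sum, type(sum))        -- 5   number

-- Concatenation coerces numbers to strings.
local joined = 2 .. "3"
print(joined, type(joined))  -- 23  string
```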
40:55 Lpeg also has some coercions.
40:58 I usually set these
41:02 globals to let me write my grammars
41:06 in a very compact way, so instead
41:10 of lpeg.B, lpeg.C, etc I use these globals,
41:15 like uppercase B, uppercase C, and so on...
41:18 and with these globals I can write
41:21 things like this: C(1)*"_"...
41:28 and lpeg knows that lpeg.C...
41:33 it sort of expands this to lpeg.C,
41:37 but lpeg.C expects to receive
41:42 an lpeg pattern, and 1 is not yet an
41:44 lpeg pattern, so it is coerced into an
41:48 lpeg pattern by calling lpeg.P,
41:52 so this short thing here becomes
41:57 equivalent to lpeg.C(lpeg.P(1)), and the
42:04 multiplication, when at least one of its
42:07 arguments is an lpeg pattern... it expects
42:10 to receive two lpeg patterns, and in
42:13 this case the one at the right is
42:15 just a string, so it is coerced to an lpeg
42:18 pattern by using lpeg.P.
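
A small sketch of the coercion just described, assuming the LPeg module is installed and loadable as `lpeg`. The talk uses globals such as `C`; this sketch uses locals instead, and the subject string "a_b" is made up for illustration.

```lua
local lpeg = require "lpeg"
local C, P = lpeg.C, lpeg.P

-- Short form: the number 1 and the string "_" are coerced with lpeg.P.
local short = C(1) * "_"

-- The same pattern with every coercion written out.
local long = lpeg.C(lpeg.P(1)) * lpeg.P("_")

-- Both match one character (captured) followed by an underscore.
local r1 = short:match("a_b")
local r2 = long:match("a_b")
print(r1, r2)
```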
42:22 With this idea we can sort of
42:25 understand the comparison here. I mean,
42:28 let me run it again... this first part is
42:32 very similar to a regular expression
42:34 here at the left...
42:35 and when we apply this... Lua pattern
42:40 to this subject here the result
42:49 is this thing here, this thing, this
42:52 thing and this thing... I'm going to
42:54 call each one of these results
42:57 "captures", so each of these things
43:00 between parentheses "captures" a substring
43:03 of the original string and these
43:06 captured substrings are returned in a
43:08 certain order. Here is how to express the
43:11 same thing in lpeg...
43:13 it's very cryptic but it's a
43:17 good way to understand some basic
43:20 operators of lpeg, I mean we can look at
43:24 the manual and understand
43:26 what C, S and R do, and also
43:35 exponentiation... and this strange thing
43:38 here receives this string here, runs
43:41 a function that I have defined, that
43:43 converts it to an object of a certain
43:46 class, and that class
43:47 represents Re patterns, so this thing
43:51 is treated as a pattern for re.lua,
43:54 and it is matched against the string,
43:57 and it returns the same thing as the
43:59 other one.
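
The patterns shown on screen are not in the transcript, so the following is only a hypothetical comparison in the same spirit: one Lua pattern, one lpeg pattern built with C, S, R and exponentiation, and one pattern for re.lua, all returning the same two captures. It assumes LPeg and its `re` module are installed; the subject string is invented.

```lua
local lpeg = require "lpeg"
local re   = require "re"
local C, S, R = lpeg.C, lpeg.S, lpeg.R

local subject = "abcabc42"

-- Lua pattern: each pair of parentheses is one capture.
local lua_caps = { string.match(subject, "^([abc]+)(%d+)") }

-- lpeg: S"abc" is a set, R"09" is a range, ^1 means "one or more".
local lpeg_pat  = C(S("abc")^1) * C(R("09")^1)
local lpeg_caps = { lpeg_pat:match(subject) }

-- re.lua: curly braces mark simple captures.
local re_caps = { re.match(subject, "{[abc]+} {[0-9]+}") }

print(lua_caps[1],  lua_caps[2])
print(lpeg_caps[1], lpeg_caps[2])
print(re_caps[1],   re_caps[2])
```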
44:01 Also, this thing here also has a
44:05 comparison with lpegrex, but these
44:08 patterns are very trivial, they
44:10 don't do anything very strange...
44:13 so let's go back and see what
44:15 kinds of very strange things there are.
44:18 Here is the page of lpegrex at github,
44:24 here's the documentation...
44:29 it's relatively brief,
44:32 it explains lpegrex as being an
44:35 extension of Re.lua, so it explains
44:40 mainly the additional features... here is a
44:43 quick reference that explains only the
44:45 additional features...
44:46 some of these things
44:50 I was able to understand
44:53 by struggling a lot, and some I wasn't
44:58 able to even by spending several evenings
45:02 trying to build examples...
45:07 and this is something very nice. Lpegrex
45:12 comes with some example parsers... and
45:16 here is a parser that parses the Lua
45:19 grammar - I mean, this is the grammar
45:22 for Lua 5.4 at the end of the
45:27 reference manual... it's just this... this
45:31 is in a kind of BNF, and this is the BNF
45:34 translated
45:36 to the language of lpegrex, so this
45:40 thing uses many constructions that are
45:43 in re.lua and some extra constructions that
45:47 are described here... and with these
45:51 examples I was able to understand
45:54 some of the...
45:56 of these things here that are
45:58 described here in the quick
46:00 reference - but not all.
46:05 So, I wasn't able to use lpegrex
46:10 by itself, because some things didn't
46:14 make much sense, and I decided to
46:16 reimplement it in my own style,
46:19 because that would be a way to map...
46:24 to at the very least map what I had
46:27 understood and what I didn't, learn
46:30 one feature at a time, do comparisons, and
46:32 so on.
46:34 Here I pointed to two features of lpeg...
46:38 in one I said "Oh, great! This thing can
46:41 be used to define grammars, even
46:44 recursive grammars", and so on...
46:47 and this is an "Oh, no!" feature - one
46:50 thing that didn't make any sense at all...
46:54 group captures. One thing that I did to
46:56 understand group captures was to
47:00 represent them as diagrams. Of course in
47:02 the beginning I was drawing these
47:04 diagrams by hand, but then I realized
47:08 that I could use the bits of lpeg
47:10 that I already knew to build a grammar
47:14 that would parse a little language and
47:17 generate these diagrams in LaTeX, and I was
47:21 able to make this.
47:23 In this diagram here
47:26 this thing above the arrow is Lua code...
47:30 a piece of Lua code that
47:33 specifies an lpeg pattern... this
47:37 thing here at the top is the string that
47:40 is being matched, and the things below
47:43 the underbraces are the captures that
47:46 each thing... sorry, that each thing
47:51 captures.
47:53 For example, this underbrace here
47:58 corresponds to this pattern here,
48:00 that parses a single character but
48:02 doesn't return any captures, this thing
48:05 here parses a single "b" and doesn't
48:09 return any captures, this thing here
48:11 parses a single character and captures
48:14 it, and this thing here parses the
48:17 character "d" and captures it... and this
48:21 other thing here transforms this
48:24 pattern into another pattern...
48:27 returns first a capture with all
48:33 the string that was parsed by this
48:35 pattern here, and then all the captures
48:37 returned by this thing here before
48:40 the ":".
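
A sketch of a pattern with the same shape as the one in the diagram, reconstructed from the description only (the exact code on screen is not in the transcript, and the subject "abcd" is an assumption). It assumes LPeg is installed.

```lua
local lpeg = require "lpeg"
local C, P = lpeg.C, lpeg.P

local inner = P(1)       -- any single character, no capture
            * P("b")     -- a literal "b", no capture
            * C(1)       -- any single character, captured
            * C(P("d"))  -- a literal "d", captured

-- Wrapping in C returns the whole matched substring first,
-- followed by the captures made inside.
local whole = C(inner)

local i1, i2     = inner:match("abcd")
local w1, w2, w3 = whole:match("abcd")
print(i1, i2)
print(w1, w2, w3)
```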
48:42 So, this was a way to build
48:46 concrete examples for things that the
48:49 lpeg manual was explaining in a very terse
48:52 way, and it worked for me - some things
48:55 that were very
48:57 mysterious started to make sense, and I
48:59 started to have intelligent questions
49:02 to ask in the mailing list.
49:08 And with that I was able to
49:11 understand what are group captures,
49:14 and group captures that receive a name...
49:20 Well, let me explain what this does.
49:23 This thing here captures... sorry, parses
49:27 the empty string and returns this as a
49:29 constant... so, this is something that
49:32 doesn't exist in regular expressions...
49:35 it parses nothing and
49:38 returns this as a capture... then this
49:41 thing here returns these two
49:44 constants here, and parses the empty
49:47 string, and this thing here converts
49:51 the results of this thing here into a
49:54 group capture, and stores it in the label
50:00 "d"... and then here's another constant
50:03 capture. And I realized that these things
50:05 here were similar to how Lua
50:09 specifies building lists...
50:14 when we build... sorry, tables. When
50:17 we build a table, and we say that the
50:19 first element of the table is here, this
50:21 element is put at the end of the table...
50:24 and when after that we say d=42...
50:27 we are putting the 42
50:31 in the slot whose key is "d".
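
A sketch of this analogy, assuming LPeg is installed; the values are invented. The first part compares a Lua table constructor with constant captures and a named group capture collected by `lpeg.Ct`. The second part shows a named group holding two values, retrieved with a back capture (`lpeg.Cb`).

```lua
local lpeg = require "lpeg"
local Cc, Cg, Ct, Cb = lpeg.Cc, lpeg.Cg, lpeg.Ct, lpeg.Cb

-- A plain Lua table constructor: positional items plus a keyed slot.
local t = { "a", d = 42, "b" }

-- The lpeg counterpart: every piece matches the empty string.
local pat = Ct(Cc("a") * Cg(Cc(42), "d") * Cc("b"))
local u = pat:match("")
print(u[1], u.d, u[2])

-- A named group can hold several values; Cb("d") produces all of them.
local multi = Cg(Cc("x", "y"), "d") * Cb("d")
local m1, m2 = multi:match("")
print(m1, m2)
```

Inside `lpeg.Ct` only the first value of a named group is stored under its key, which is one way the analogy with table constructors is imperfect.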
50:35 This was happening with lpeg captures,
50:38 but there was something very strange...
50:40 these group captures could hold
50:46 more than one capture - more than one
50:49 value... so there was something between
50:52 lists and tables. I started to use this
50:56 notation to...
50:59 explain in my notation what they
51:04 were doing... many things started
51:07 to make sense, many mysterious
51:10 sentences in the manual started to
51:12 make sense... but some didn't...
51:15 but at least I was able to send
51:20 some intelligent questions to the
51:22 mailing list, and the author of Lua and
51:26 lpeg answered some of them...
51:28 he was not very happy about my
51:31 questions - he... told me that those
51:35 diagrams were a waste of time, the
51:38 manual was perfectly clear, and so on...
51:41 whatever - but I was able to...
51:45 so, it was weird, but I was able to
51:49 understand lots of things from his
51:52 answers. This is a copy of one of
51:56 my messages, then there's another one,
51:58 another one, some of them had diagrams...
52:01 then he complained about these diagrams,
52:04 he said that these things here, that look
52:07 like table constructors, "do not exist"...
52:14 whatever... anyway, once I understood
52:17 group captures many features
52:20 were very easy to understand
52:23 and I started to be able to use lpeg to
52:26 build some very interesting things...
52:29 I was able to reproduce some
52:33 of the features that I saw in lpegrex -
52:36 remember that this... where is that?
52:41 this is the syntax of Lua... here -
52:45 I was able to understand
52:49 how these things here were translated to
52:52 lpeg code... to lpeg patterns
52:56 by using group captures in a certain
52:58 way... I was able to implement them
53:00 in ELpeg1.lua...
53:05 and after some time I was able to use
53:09 ELpeg1.lua to build grammars that
53:13 were able to parse
53:16 arithmetical expressions with the
53:18 right precedence... and here's an example
53:21 in which I built the grammar step by step...
53:23 and I test the current grammar, and I
53:26 replace a bit, and then I test the new
53:28 grammar and so on...
53:30 and you can see that the result is
53:34 always a tree that is drawn in a
53:37 nice two dimensional way...
53:41 At this point these powers here
53:46 are returned as a list,
53:49 as an operation "pow"
53:53 with several arguments, here... and then
53:57 I apply a kind of parsing combinator,
54:01 here... that transforms these trees into
54:04 other trees and with these combinators
54:08 here I can specify that the "^" is
54:12 associative in a certain direction...
54:15 that the "/" is associative in
54:17 another direction... the "-" uses
54:20 the same direction as the "/",
54:23 and so on... and they have the
54:25 right precedences.
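
A plain-Lua sketch of the idea of folding a flat list of operands into a nested tree in one direction or the other. This is not the ELpeg1.lua code: `foldl`, `foldr` and `show` are hypothetical helpers, and the choice of right association for "^" and left association for "/" follows the usual arithmetic convention, which the transcript does not state explicitly.

```lua
-- Fold {a, b, c} to the left:  ((a op b) op c)
local function foldl(op, args)
  local acc = args[1]
  for i = 2, #args do acc = { op, acc, args[i] } end
  return acc
end

-- Fold {a, b, c} to the right: (a op (b op c))
local function foldr(op, args)
  local acc = args[#args]
  for i = #args - 1, 1, -1 do acc = { op, args[i], acc } end
  return acc
end

-- Render a tree {op, left, right} as a parenthesized string.
local function show(t)
  if type(t) ~= "table" then return tostring(t) end
  return "(" .. show(t[2]) .. " " .. t[1] .. " " .. show(t[3]) .. ")"
end

local pow_tree = show(foldr("^", { 2, 3, 4 }))
local div_tree = show(foldl("/", { 8, 4, 2 }))
print(pow_tree)
print(div_tree)
```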
54:27 So, here are the tests...
54:31 here is my file ELpeg1.lua... it has
54:37 several classes, each class has tests
54:41 after it...
54:43 I was able to implement something
54:46 that lpegrex has, that is called
54:50 "keywords", that is very useful for parsing
54:54 programs in programming languages...
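
A minimal sketch of what a keyword pattern usually has to do, assuming LPeg is installed. It is neither the lpegrex nor the ELpeg1.lua implementation: `keyword` and `idchar` are hypothetical names. The point is that a keyword such as "if" must not be followed by another identifier character.

```lua
local lpeg = require "lpeg"
local P, R = lpeg.P, lpeg.R

-- Characters that may continue an identifier.
local idchar = R("az", "AZ", "09") + P("_")

-- Match the word only when no identifier character follows it.
local function keyword(word)
  return P(word) * -idchar
end

local kw_if = keyword("if")
print(kw_if:match("if x"))   -- matches; returns the position after "if"
print(kw_if:match("iffy"))   -- fails: "if" is only a prefix of an identifier
```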
54:57 I was able to implement something
54:59 similar to the debugger... to the
55:04 lpeg debugger lpeg uses... I was
55:09 frustrated by some limitations of
55:11 the lpeg debugger, and I implemented
55:14 my own that is, uh... much better!...
55:20 Let me show something else... I was
55:24 able to translate a good part of the
55:27 Lua parser, here, to ELpeg1.lua... I haven't
55:34 finished yet, but I have most of the
55:38 translation here...
55:41 and after having all that I was able to
55:46 build other grammars very quickly...
55:50 writing new parsers finally became fun.
55:54 And here's one example that I showed in the
56:00 beginning.
56:03 If I remember correctly...
56:07 I took a figure from the Wikipedia...
56:10 I don't have its link now...
56:12 but I specified a grammar that parses
56:17 exactly the example that appears
56:19 in the Wikipedia...
56:22 so, with my grammar, considering that
56:25 the top level entry is "Stmt", when I
56:29 parse this string here
56:32 the result is this tree...
56:37 and I can do some operations on that,
56:41 I can define how this thing is to be
56:44 converted into LaTeX,
56:46 I can define other operations
56:49 that convert trees into other trees, and
56:53 here are some tests of these operations...
56:58 This is what I showed in the beginning...
57:00 I'm not going to explain all the details
57:02 of this thing now...
57:05 this :show() converts this thing
57:09 into LaTeX in the way specified by these
57:13 instructions here, that say that...
57:18 well, whatever...
57:25 and here's the result - the LaTeXed result...
57:35 and these diagrams here are generated by
57:42 this file here, that defines a simple
57:46 grammar that parses this thing here,
57:49 and then LaTeXes it in a certain way, and
57:52 also tests to check if this code here...
57:56 this Lua code that generates an lpeg grammar...
58:01 parses this subject here and
58:06 returns the expected result...
58:11 So: this is the code that I
58:14 wanted to show. I wanted to show many
58:17 more things but I wasn't able to prepare
58:20 them before the conference... and I hope
58:23 that soon - for some value of "soon" -
58:27 I'll be able to create REPL-based
58:30 tutorials for lpeg, Re, and ELpeg1.lua...
58:31 where lpeg is something very famous,
58:35 Re is a module of lpeg...
58:39 I could also do something like this
58:42 for lpegrex... and ELpeg1.lua is
58:47 the thing that I wrote, the one that
58:52 has tests in comments, and the tests
58:57 usually generate trees, and sometimes
58:59 they generate TeX code.
59:01 Yeah, so that's it! I wanted to
59:05 present much more but I wasn't able to
59:07 prepare it... so: sorry, thanks, bye! =)