BlogMe3 - an extensible language for generating HTML
(2013feb10: this is a mess! The rewrite of blogme3 into something much cleaner (BlogMe4) is essentially ready, but I have not yet checked if all my 250+ .blogme files would work with it...)
1. Basic concepts
1.1. Parsing (and pos and subj)
Some of the most fundamental functions in the code of BlogMe are "parsers". They all try to parse a pattern in the "subject string" stored in the global variable "subj", starting at the position stored in the global variable "pos" (the names "subj" and "pos" come from Icon).
On success these patterns advance "pos" and return some non-nil value; on failure they keep "pos" unchanged, and return nil. Let's fix some terminology. Consider the grammar below; we will refer to these "patterns" by the names of the "non-terminal symbols", at the left of the "::=" signs.
All the "*"s and "+"s above are greedy. Parsers for all these symbols except "bigword" and "rest" can be implemented using just {lua patterns}; the code is {here}. As a curiosity, note that "parseblock" could be implemented with Lua's "balanced pattern" operator, as "%b[]" - but instead of doing that we use a table that tells for each "[" or "]" in "subj" where is the corresponding "]" or "[". The code is {here} and {here}.
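The parser convention and the bracket table described above can be sketched as follows. This is only an illustration, not the actual blogme3 code: the helper names (parsepat, setsubj, matchbrackets, bracketpos, parsenormalchars) and the exact character classes are assumptions, chosen to match the description in the text.

```lua
-- Global parser state, as in the text.
subj, pos = "", 1
bracketpos = {}     -- hypothetical name: position of "[" <-> matching "]"

-- For each "[" or "]" in s, record where its partner is.
-- Unbalanced brackets are simply left out of the table.
function matchbrackets(s)
  local other, stack = {}, {}
  for i = 1, #s do
    local c = string.sub(s, i, i)
    if c == "[" then
      stack[#stack + 1] = i
    elseif c == "]" and #stack > 0 then
      local open = table.remove(stack)
      other[open], other[i] = i, open
    end
  end
  return other
end

function setsubj(s)
  subj, pos, bracketpos = s, 1, matchbrackets(s)
end

-- Try a Lua pattern at pos. On success advance pos and return the
-- matched text; on failure keep pos unchanged and return nil.
function parsepat(pat)
  local b, e = string.find(subj, "^" .. pat, pos)
  if not b then return nil end
  pos = e + 1
  return string.sub(subj, b, e)
end

function parsespaces()      return parsepat("[ \t\n]+") end
function parsenormalchars() return parsepat("[^ \t\n%[%]]+") end

-- Parse a whole block using the bracket table instead of "%b[]".
function parseblock()
  local close = string.sub(subj, pos, pos) == "[" and bracketpos[pos]
  if not close then return nil end
  local block = string.sub(subj, pos, close)
  pos = close + 1
  return block
end

setsubj("foo [a [b] c] bar")
print(parsenormalchars())   --> foo
print(parseblock())         --> nil   (pos is at a space, and stays there)
parsespaces()
print(parseblock())         --> [a [b] c]
```

One possible reason to prefer the table over "%b[]" is that it is computed once per subject string and can then be consulted from either end of a bracket pair.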
1.2. Evaluation
Parsing, of course, is not enough - what really matters is that certain "expressions" can be "evaluated". For example, if we evaluate the following string as a "vblock" ("vblock" stands for the "value of a block"),
we get:
Let's follow in detail what is happening here. To evaluate a block, we first parse its "head word" (HREF, in this case), and then we parse the "argument list" for that word; how to parse the argument list depends on the word, as in Lisp and Forth (we will see the details very soon). Then we call the "blogme code" associated with HREF, with those arguments. In the case of HREF its blogme code is stored in a global Lua function with the same name, and so this code is called as:
which returns:
Note that to generate the second argument, "foo5bar ploc", we had to evaluate the block "[+ 2 3]"; its value was the result of calling the blogme code for "+" with arguments "2" and "3".
1.3. Argument parsers
Just as the blogme code for "HREF" was stored in Lua's table of globals (_G), the code for the argument parser for "HREF" was stored in _A["HREF"]. In the case of HREF, the argument parser function is "readvvrest", whose definition is:
The "readers" are like the "parsers", but they run "parsespaces()" before them, and when they "fail" they do not move pos back to before the spaces, and they return the empty string instead of nil.
Now here are the exact rules for evaluating a block (in pseudocode, and without any error-checking):
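As a rough, runnable illustration of these rules, here is a simplified sketch. It is not the actual blogme3 code: all function bodies, the helper names parsepat, parsevpieces, parsevword and parsevrest, and the sample input string are assumptions; only the overall scheme (head word, then _A for the arguments, then _B or _G for the code, and readers that skip spaces and return "") comes from the text.

```lua
subj, pos = "", 1
_A, _B = {}, {}     -- argument parsers / blogme code, indexed by head word

-- Match a Lua pattern at pos: advance on success, return nil on failure.
function parsepat(pat)
  local b, e = string.find(subj, "^" .. pat, pos)
  if not b then return nil end
  pos = e + 1
  return string.sub(subj, b, e)
end
function parsespaces() return parsepat("[ \t\n]+") end
function parseword()   return parsepat("[^ \t\n%[%]]+") end

-- Evaluate the block that starts at pos (no error checking at all).
function parsevblock()
  if string.sub(subj, pos, pos) ~= "[" then return nil end
  pos = pos + 1                        -- skip the "["
  parsespaces()
  local head = parseword()
  local code = _B[head] or _G[head]    -- _B is consulted before _G
  local result = code(_A[head]())      -- parse the arguments, then call
  parsespaces()
  pos = pos + 1                        -- skip the "]"
  return result
end

-- Gather text pieces and block values; a lone piece is returned untouched.
function parsevpieces(textpat)
  local pieces = {}
  while true do
    local piece = parsepat(textpat) or parsevblock()
    if piece == nil then break end
    pieces[#pieces + 1] = piece
  end
  if #pieces <= 1 then return pieces[1] end
  for i = 1, #pieces do pieces[i] = tostring(pieces[i]) end
  return table.concat(pieces)
end
function parsevword() return parsevpieces("[^ \t\n%[%]]+") end
function parsevrest() return parsevpieces("[^%[%]]+") end

-- Readers: skip spaces first, and return "" instead of nil.
function readvword() parsespaces(); return parsevword() or "" end
function readvrest() parsespaces(); return parsevrest() or "" end
function readvvrest()
  local a = readvword()
  local b = readvrest()
  return a, b
end

-- Two sample words: HREF lives in _G, "+" lives in _B.
function HREF(url, text) return '<a href="' .. url .. '">' .. text .. "</a>" end
_A["HREF"] = readvvrest
_B["+"] = function (a, b) return tonumber(a) + tonumber(b) end
_A["+"] = function ()
  local a = readvword()
  local b = readvword()
  return a, b
end

subj, pos = "[HREF http://foo/bar foo[+ 2 3]bar ploc]", 1
print(parsevblock())   --> <a href="http://foo/bar">foo5bar ploc</a>
```

In this sketch a word or rest made of a single block keeps that block's value as it is (a number here, but it could be a table); conversion to a string only happens when several pieces have to be concatenated.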
Note that the table _B is checked before _G - this is to allow us to have blogme words with the same names as Lua functions, but whose blogme code is different from the Lua function with the same name.
When we evaluate a vword or a vrest we may have to concatenate several partial results - some from parsing "words" or "normalchars", some from parsing "blocks" - to form the final result. The convention (the code is {here}) is that when we only have one partial result coming from a block, then it is not transformed in any way - this lets us have blocks that return, say, Lua tables. For example, with the right (obvious) definitions for "print", "expr:", and "+", this
would print the same output as:
1.4. The core and the angg files
1.5. Invoking blogme3.lua
If we are at /tmp, and there's a file /tmp/blogme whose contents are
and we invoke blogme3.lua with arguments "-o foo.html -i foo.blogme", we will see something like this,
Let's understand what happened. The first thing that blogme3.lua does is to extract from arg[0] the directory where blogme3.lua resides, and add it to the path (the code is here); then it loads some files, with
then it processes the command-line arguments. For each recognized command-line argument there is an entry in the table _O - defined in options.lua - that describes how to process that option; for example, for "-i" we have this:
The loop that processes the options is this simple recursion, in blogme3.lua:
The argument following "-o" is the name of the output file; as we shall see (in sec ___), some setup actions can only be performed after "-o" - for example, all definitions that depend on the base directory for relative links.
The option "-i" treats the argument following it as the name of an input file to be evaluated "in the normal way"; the contents of foo.blogme are evaluated as a "vrest", and the result of this is discarded (that's why the "Blah" at the end of foo.blogme disappeared!), but the contents of the global variable _output are written to the output file, whose name is the global variable _. If either outfile or the outcontents were empty we would get an error - but htmlize treated its first argument ("Foo bar") as the title of the html page ("[J Foo bar]" -> "Foo bar"), used the "rest" of its arguments ("[P A paragraph]") as the body of the html, wrapped that within html headers, and stored the result in outcontents.
The word "lua:" (...)
A more precise description
The core of Blogme is made of a parser that recognizes a very simple language, and an interpreter coupled to the parser; as the parser goes on processing the input text the interpreter takes the outputs of the parser and interprets these outputs immediately. This core engine should be thought of as having layers. At the base, a (formal) grammar; then functions that parse and recognize constructs from that grammar; then functions that take what the parser reads, assemble that into commands and arguments for those commands, and execute those commands. I think that the best way to describe Blogme is to describe these three layers and the implementation of the top two layers - the grammar layer doesn't correspond to any code.
Looking at the actual code of the core is very important; the core is not a black box at all - the variables are made to be read and changed by user scripts, and most functions are intended to be replaced by the user eventually, either by less simplistic versions with more features, or, sometimes, by functions only thinly connected to the original ones.
2. Ancestors
I know that it sounds pretentious to say that, but it's true... Blogme descends from three important "extensible" programming languages - Forth, Lisp, and Tcl - and from several
The design of Blogme was inspired mainly by - or borrows ideas from - Forth, Lisp, and Tcl.
2.1. Forth
This is a Forth program that prints "
Forth reads one word at a time and executes it immediately
(sometimes it "compiles" the word instead of running it, but we
can ignore this now). `
So - the Forth interpreter (actually the "outer interpreter" in
Forth's jargon; the "inner interpreter" is the one that executes
bytecodes) reads the word `
2.2. Lisp
In Lisp all data structures are built from "atoms"
(numbers, strings, symbols) and "conses"; a list like
the "
the "
2.3. Tcl
In Tcl the main data structure is the string, and Tcl
doesn't even have the distinction that Lisp has between atoms and
conses - in Tcl numbers, lists, trees and program code are just
strings that can be parsed in certain ways. Tcl has an evaluation
strategy, given by 11 rules, that describes how to "expand", or
"substitute", the parts of the program that are inside
Here are some examples of Tcl code:
2.4. TH
Blogme descends from a "language" for generating HTML that I
implemented on top of Tcl in 1999; it was called TH. The crucial
feature of Tcl on which TH depended was that in
but it wasn't hard to construct slightly longer TH scripts in which
a part of the "body of the page" - the second argument to htmlize -
would become, say, an ASCII diagram that would be formatted as a
3. The source files
3.1. brackets.lua: the parsers (_A and _B)
3.2. definers.lua: def and DEF (_AA)
3.3. escripts.lua: htmlizelines (_E)
3.4. elisp.lua: makesexphtml (_EHELP, _EBASE, etc)
3.5. dooptions (_O)
(2007apr18: Hey! The rest of this page refers to BlogMe2, which is obsolete... I just finished rewriting it (-> BlogMe3), but I haven't had the time yet to htmlize its docs...)
(2005sep28: I wrote this page in a hurry by htmlizing two of blogme's documentation files, README and INTERNALS, which are not very clean...)
See also the entry about BlogMe in my page about little languages.
4. Introduction
The "language" that blogme2.lua accepts is extensible and can deal with input having a lot of explicit mark-up, like this,
and conceivably also with input with a lot of implicit mark-up and with control structures, like these examples (which haven't been implemented yet):
BlogMe also supports executing blocks of Lua code on-the-fly, like this:
4.1. How the language works
BlogMe's language has only one special syntactical construct, "
To "evaluate" an expression like
we only parse its "head" - "
4.2. How
    HREF(_A["HREF"]())
_A["HREF"] returns a function, vargs2, that uses the rest to produce arguments for HREF. Running vargs2() in that situation returns
    "http://foo/bar", "a link"
and HREF is called as HREF("http://foo/bar", "a link").
So, to define HREF as a head all we would need to do ("would" because it's already defined) is:
    HREF = function (url, text)
        return "<a href=\""..url.."\">"..text.."</a>"
      end
    _A["HREF"] = vargs2
def
Defining new heads is so common - and writing out the full Lua code for a new head, as above, is so boring - that there are several tools to help us with that. I will explain only one of them, "def":
    def [[ HREF 2 url,text "<a href=\"$url\">$text</a>" ]]
"def" is a Lua function taking one argument, a string; it splits that string into its first three "words" (delimited by blanks) and a "rest"; here is its definition:
    restspecs = {
      ["1"]  = vargs1,   ["2"]  = vargs2,   ["3"]  = vargs3,   ["4"]  = vargs4,
      ["1L"] = vargs1_a, ["2L"] = vargs2_a, ["3L"] = vargs3_a, ["4L"] = vargs4_a
    }
    def = function (str)
        -- split str into name, restspec, arglist and the remaining body
        local _, __, name, restspec, arglist, body =
          string.find (str, "^%s*([^%s]+)%s+([^%s]+)%s+([^%s]+)%s(.*)")
        _G[name] = lambda(arglist, undollar(body))
        _A[name] = restspecs[restspec] or _G[restspec]
                   or error("Bad restspec: "..name)
      end
The first "word" ("name") is the name of the head that we're defining; the second "word" ("restspec") determines the _GETARGS function for that head, and it may be either a special string (one of the ones registered in the table "restspecs") or the name of a global function.
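The definition of "def" relies on two helpers, "lambda" and "undollar", whose code is not shown on this page. The sketch below gives one plausible way they could work, so that the HREF example can be run; both helper bodies are assumptions, "vargs2" is only a placeholder, and "restspecs" is cut down to a single entry.

```lua
-- loadstring in Lua 5.1, load in Lua 5.2 and later.
local loadcode = loadstring or load

-- Assumed behaviour: turn each "$name" inside a Lua string literal into
-- a concatenation with the variable `name`.
function undollar(body)
  return (string.gsub(body, "%$([%a_][%w_]*)", '"..%1.."'))
end

-- Assumed behaviour: build a function from an argument list and an
-- expression, both given as strings of Lua source.
function lambda(arglist, body)
  local src = "return function (" .. arglist .. ") return " .. body .. " end"
  return assert(loadcode(src))()
end

_A = {}
vargs2 = function () end          -- placeholder for the real argument parser
restspecs = { ["2"] = vargs2 }    -- cut-down version of the table above

-- Same logic as the def shown above.
def = function (str)
    local _, __, name, restspec, arglist, body =
      string.find(str, "^%s*([^%s]+)%s+([^%s]+)%s+([^%s]+)%s(.*)")
    _G[name] = lambda(arglist, undollar(body))
    _A[name] = restspecs[restspec] or _G[restspec]
               or error("Bad restspec: "..name)
  end

def [[ HREF 2 url,text "<a href=\"$url\">$text</a>" ]]

print(undollar('"<a href=\\"$url\\">$text</a>"'))
--> "<a href=\""..url.."\">"..text.."</a>"
print(HREF("http://foo/bar", "a link"))
--> <a href="http://foo/bar">a link</a>
```

With this reading, the body given to "def" is an ordinary Lua expression (here a string literal), and the "$" notation is just shorthand for breaking out of the string to concatenate a variable.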
_G: Lua's table of globals (rmt)
_W: blogme words
_P: low-level parsers
_A: argument-parsing functions for blogme words
_AA: abbreviations for argument-parsing functions (see `def')
_V: blogme variables (see "$" and `withvars')
(Source code: the function `run_head', at the end of blogme2-inner.lua.)
Let's examine an example. When blogme processes:
    [HREF http://foo bar]
it expands it to:
    <a href="http://foo">bar</a>
When the blogme evaluator processes a bracketed expression it first obtains the first "word" of the brexp (called the "head" of the brexp), which in this case is "HREF"; then it parses and evaluates the "arguments" of the brexp, and invokes the function associated with the word "HREF" using those arguments. Different words may have different ways of parsing and evaluating their arguments; this is like the distinction in Lisp between functions and special forms, and like special words such as LIT in Forth. Here are the hairy details: if HREF is defined by
    HREF = function (url, str)
        return "<a href=\""..url.."\">"..str.."</a>"
      end
    _W["HREF"] = HREF
    _A["HREF"] = vargs2
then the "value" of [HREF http://foo bar] will be the same as the value returned by HREF("http://foo", "bar"), because
    _W["HREF"](_A["HREF"]())
will be the same as:
    HREF(vargs2())
When vargs2 is run the parser is just after the end of the word "HREF" in the brexp, and running vargs2() there parses the rest of the brexp and returns two strings, "http://foo" and "bar".
See: (info "(elisp)Function Forms")
and: (info "(elisp)Special Forms")
(Corresponding source code: most of blogme2-inner.lua.)
Blogme has a number of low-level parsers, each one identified by a string (a "blogme pattern"); the (informal) "syntax" of those blogme patterns was vaguely inspired by Lua5's syntax for patterns. In the table below "BP" stands for "blogme pattern".
BP   | Long name/meaning    | Corresponding Lua pattern
-----+----------------------+--------------------------
"%s" | space char           | "[ \t\n]"
"%w" | word char            | "[^%[%]]"
"%c" | normal char          | "[^ \t\n%[%]]"
"%B" | bracketed expression | "%b[]"
"%W" | bigword              | "(%w*%b[]*)*" (but not the empty string!)
The low-level parsing functions of blogme are of two kinds (levels):
See: (info "(bison)Semantic Values")
These low-level parsing functions are stored in the table `_P', indexed by their "blogme patterns". They use the global variables `subj', `pos', `b', `e', and `val'.
An example: running _P["%w+"]() tries to parse a (non-empty) series of word chars starting at pos; running _P["%w+:string"]() does the same, but in case of success the semantic value is stored into `val' as a string -- the comment ":string" in the name of the pattern indicates that this is a "parse and process" function, and tells something about how the semantic value is built.
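The difference between the two kinds of entries in `_P' can be sketched like this. It is a guess at the shape of the code, not a copy of blogme2-inner.lua: that `b' and `e' hold the limits of the last match, and that the functions return true on success, are assumptions; the Lua pattern for word chars is the one from the table above.

```lua
subj, pos = "", 1
b, e, val = nil, nil, nil
_P = {}

-- "Parse only": recognize a non-empty run of word chars at pos.
-- On success record the match limits and advance pos.
_P["%w+"] = function ()
  local mb, me = string.find(subj, "^[^%[%]]+", pos)
  if not mb then return nil end
  b, e, pos = mb, me, me + 1
  return true
end

-- "Parse and process": same recognition, but on success the semantic
-- value is also stored in val, as a string.
_P["%w+:string"] = function ()
  if not _P["%w+"]() then return nil end
  val = string.sub(subj, b, e)
  return true
end

subj, pos = "hello world[X]", 1
print(_P["%w+:string"](), val, pos)   --> true   hello world   12
print(_P["%w+"](), pos)               --> nil    12
```

The second call fails because pos is at a "[", and as the convention requires it leaves pos where it was.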
(*): Blogme patterns containing a semicolon (";") violate the convention that says that patterns that fail do not advance pos. Parsing "A;B" means first parsing "A", not caring if it succeeds or fails, discarding its semantic value (if any), then parsing "B", and returning the result of parsing "B". If "A" succeeds but "B" fails then "A;B" will fail, but pos will have been advanced to the end of "A". "A" is usually "%s*".
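A small sketch of the "A;B" behaviour, using hypothetical combinators "pat" and "seq" that are not part of blogme, makes the exception concrete: when B fails, the whole thing fails, but the spaces consumed by A stay consumed.

```lua
subj, pos = "", 1

-- Hypothetical helper: a parser for a Lua pattern anchored at pos.
function pat(p)
  return function ()
    local b, e = string.find(subj, "^" .. p, pos)
    if not b then return nil end
    pos = e + 1
    return string.sub(subj, b, e)
  end
end

-- "A;B": run A, ignore its outcome and its value, then return what B returns.
function seq(A, B)
  return function ()
    A()
    return B()
  end
end

-- Something like "%s*;%c+": optional spaces, then a run of normal chars.
spaces_then_word = seq(pat("[ \t\n]*"), pat("[^ \t\n%[%]]+"))

subj, pos = "  foo", 1
print(spaces_then_word(), pos)   --> foo   6

subj, pos = "   [x]", 1
print(spaces_then_word(), pos)   --> nil   4
```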
(To do: write this stuff, organize.)
Files:
There is no .tar.gz yet (coming soon!).
Lua seems to be quite popular in the M$-Windows world, but I haven't used W$ for anything significant since 1994 and I can't help with W$-related questions. If you want to try BlogMe on W$ then please consider writing something about your experience to help the people coming after you.
A BlogMe mode for emacs and a way to switch modes quickly (with M-m).
A note on usage (see the corresponding source code):
    blogme2.lua -o foo.html -i foo.blogme
This behaves in a way that is a bit unexpected: what gets written to foo.html is not the result of "expanding" the contents of foo.blogme - it's the contents of the variable blogme_output. The function (or "blogme word") htmlize sets this variable. Its source code is here.
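The behaviour described above can be mimicked in a few lines. This is a toy model only, borrowing the idea of an option table `_O' and a recursive option loop from the blogme3 description earlier on this page: "outfile", "written", "evalfile" and the bodies of "htmlize" and "dooptions" are stand-ins invented for the example, and no real files are read or written.

```lua
blogme_output = nil   -- set by htmlize, as described in the text
outfile = nil         -- hypothetical: the argument given to "-o"
written = {}          -- hypothetical stand-in for the file system

-- Stand-in for the real htmlize: it only stores the page it would build.
function htmlize(title, body)
  blogme_output = "<html><head><title>" .. title .. "</title></head>"
               .. "<body>" .. body .. "</body></html>"
end

-- Stand-in for "evaluate the input file": pretend that foo.blogme calls
-- htmlize and then expands to some leftover text.
function evalfile(fname)
  htmlize("Foo bar", "<p>A paragraph</p>")
  return "Blah"
end

-- One handler per option; each consumes its arguments and recurses.
_O = {}
_O["-o"] = function (fname, ...)
  outfile = fname
  return dooptions(...)
end
_O["-i"] = function (fname, ...)
  evalfile(fname)                     -- the expansion itself is discarded
  written[outfile] = blogme_output    -- what is "written" is blogme_output
  return dooptions(...)
end

function dooptions(opt, ...)
  if opt == nil then return end
  local handler = _O[opt] or error("Unrecognized option: " .. tostring(opt))
  return handler(...)
end

dooptions("-o", "foo.html", "-i", "foo.blogme")
print(written["foo.html"])
```

The "Blah" returned by the evaluation never reaches the output, which is the surprising part: only what htmlize stored does.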
History: BlogMe is the result of many years playing with little languages; see this page. BlogMe borrowed many ideas from Forth, Tcl and Lisp.
How to get in touch with the author.
A test (2007apr26):
    #*
    # (find-localcfile "foo/")
    # (find-localcfile "foo/" "ignored")
    # (find-localc "foo/" "should be ignored?")
    # (find-localc "foo/bar" "becomes a tag")
    # (find-localcw3m "foo/bar.html#tag" "ignored")
    # (find-remotecfile "foo/")
    # (find-remotecfile "foo/" "ignored")
    # (find-remotec "foo/" "should be ignored?")
    # (find-remotec "foo/bar" "becomes a tag")
    # (find-remotecw3m "foo/bar.html#tag" "ignored")