Crim 0.01 - a Forth-like language
Copyleft (C) Eduardo Ochs, 2000
This is the README file.
2000jul19


Introduction
------------

Crim is a very loosely-defined language based on Forth. Forth is
already extremely extensible and one can change its syntax completely
with relative ease; Crim is intended to let people do the same with
even less effort. One can test entirely different inner interpreters
in Crim just by adding new "modes" to its current inner interpreter,
and the Crim tools used to implement words like Forth's LIT or <.">,
which take immediate data, should be very easy to extend to complex
parsers.

The simple ideas behind Crim's way of dealing with immediate data --
the streams stack plus the RSR words -- were the fruit of many weeks
of very hard thinking in the middle of '95, and to me (I'm biased, of
course:-) they still sound like the Forth counterpart of one of those
mathematical theorems that establish deep connections in a simple
way, like the Curry-Howard isomorphism, which, by the way, is as
marginal in Mathematics as Forth is marginal in Computer Science...

Five years after getting to those ideas I'm still convinced that they
are the way to go, and now that I've got some insights from Darrell
Johnson's Perpol (http://www.boswa.com/misc/) about using Nasm to do
the boring stuff I'm resurrecting the whole thing and releasing an
implementation that does some more interesting things, like calling C
functions.

Note: the Forth inner interpreter's three modes are not generally
named; here I'll call them "head", "forth", and "assembler". See the
other docs in this directory.


Compiling and running Crim
--------------------------

You must have Tcl, Make and Nasm. Just unpack everything and run
"make"; then running "demo0", "demo1", etc. will give you lots of
debugging output from the demos. If you want to run any of them
without debugging info, give it a numeric argument of 0, like:
"demo2 0".

The most interesting demo is demo2, as it is about calling C
functions. It just prints "Hello There" in two lines, but the really
interesting part is its bytecode, which you can see in demo2.lst. The
engine that runs it is demo2.engine.c; both are generated from
demo2.tf by tclstuff, as described below.

Note that this is still a hacker's version -- there's no interactive
mode. This is not (yet) a Forth... But, on the other hand, the code
is very simple and I hope that it shows the ideas clearly.


The .tf files
-------------

Urgh.

The .tf ("Tcl-Forth") files are processed by tclstuff to generate a C
file and a nasm file, which are then compiled and linked together to
generate an executable; a .tf file contains a Crim "program" that is
to be executed by the Crim "engine".

The syntax of the .tf files is somewhat horrible. At this moment Crim
doesn't have any definite syntax, just some ideas about its bytecode
-- and even the details of the bytecode change when we change the set
of instructions with one-byte forms or when we change the
engine-xxx.c, for example to add more primitives or more modes. I'm
using nasm to generate the array (well, sort of...) that contains the
Crim bytecode that the engine will execute.

IT IS MUCH BETTER TO EXAMINE THE BYTECODE BY INSPECTING THE .lst FILE
GENERATED BY NASM THAN TO TRY TO UNDERSTAND EVERYTHING BY LOOKING AT
THE .tf FILES. THE .tf FILES ARE JUST HACKS!!! YOU HAVE BEEN
WARNED!!!

Having said that, we can proceed to the technicalities. Tclstuff is
usually invoked like this:

  cat foobar.tf | tclstuff foobar

This will produce a "foobar.asm" and maybe a "foobar.engine.c" (if
the "engine" variable is set; read on).

Here's a short description of the .tf file syntax. Lines beginning
with certain strings are processed in special ways:

* Lines that start with "#" are ignored.

* Lines that start with "asm" are sent to stdout immediately.
  The nasm code is also sent to stdout as it is produced, and so the
  lines starting with "asm" will usually end up in the nasm code.

* Lines that start with "tcl" are evaluated by Tcl (technical
  details: in the toplevel, with "uplevel #0 $restofline"). Some
  examples of usage:

  * "tcl parray a_code" - Show the contents of the a_code array.
    This is useful to understand how tclstuff works.

  * "tcl set engine bletch.c" - If your .tf file has a line like
    this then tclstuff will produce a C file after writing out all
    the nasm code; its name will be derived from the name of the .tf
    file -- foobar.tf generates foobar.engine.c, for example -- and
    it will contain some #defines and array definitions (more
    technically: the result of a "[join $c_defs "\n"]") followed by
    a copy of bletch.c. The resulting foobar.engine.c is a valid C
    file (or at least it's meant to be!); bletch.c on its own
    usually lacks some definitions.

Other lines are processed in a Forthish fashion, one "word" at a
time; as usual in the Forth world, words are delimited by whitespace.
The only predefined words are the "tick words" listed below. Those
ending with two ticks ("''") gobble the two words coming after them;
those ending with a single tick ("'") gobble one word. The
double-tick words are like their single-tick friends but they also
define synonyms with better-behaved names that C and nasm can accept.
Quick descriptions:

  tick word:  used to define:                          "X' E" defines:

  HPRIM'      a primitive head, like DOCOL (":")       H_E E
  FPRIM'      a primitive Forth word, like DUP         F_E E
  SFPRIM'     a primitive word with a one-byte         F_E SF_E E
              ("short") form, like EXIT (";")
  FIPPRIM'    a Forth-IP primitive. When the engine    FIP_E E
              tries to execute Forth code in the
              address covered by a FIP it executes
              the FIP primitive instead. FIPs are
              often pushed on the return stack when
              we want to call words and have them
              do something special when they finish.
  ' or F'     a Forth word whose head is located at    F_E ADR_E E
              the current ("HERE") address.
  S'          same, but the word will also have a      F_E ADR_E SF_E E
              short form and will usually be called
              using the short form.

These tick words define words that can be used later in a .tf file;
the action associated with each of the defined words is always like
"output $a_code($word) to the nasm file"; I haven't yet extended
tclstuff to support other actions for defined words. A line like
"tcl parray a_code" in a .tf file will show which words have been
defined up to that point, and for each one the corresponding value of
$a_code($word); this can be useful for understanding tclstuff.
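
The line-processing rules above can be modelled in a few lines. What
follows is only a rough sketch in Python, not tclstuff itself (which
is a Tcl script), and it rests on several assumptions: the function
name process_tf, the eval_tcl callback and the "<code for E>"
placeholder strings are all made up for illustration; the "asm" and
"tcl" prefixes are assumed to be dropped from the line; an undefined
word is assumed to be an error; and the double-tick words are left
out because the text above does not say in which order their two
arguments come. The real contents of a_code are best seen with "tcl
parray a_code".

```python
# Hypothetical model of how a .tf file is processed line by line.
# Nothing here is taken from the tclstuff source.

# Name prefixes defined by each single-tick word for an argument E,
# following the "X' E defines" column of the table ("" stands for E).
TICK_PREFIXES = {
    "HPRIM'":   ["H_", ""],
    "FPRIM'":   ["F_", ""],
    "SFPRIM'":  ["F_", "SF_", ""],
    "FIPPRIM'": ["FIP_", ""],
    "'":        ["F_", "ADR_", ""],
    "F'":       ["F_", "ADR_", ""],
    "S'":       ["F_", "ADR_", "SF_", ""],
}


def process_tf(lines, eval_tcl=None):
    """Return (output lines, a_code dict, list of defined names)."""
    out, a_code, symbols = [], {}, []
    for line in lines:
        if line.startswith("#"):
            continue                          # comment line: ignored
        if line.startswith("asm"):
            out.append(line[3:].lstrip())     # assumption: prefix dropped
            continue
        if line.startswith("tcl"):
            if eval_tcl is not None:          # stand-in for "uplevel #0"
                eval_tcl(line[3:].lstrip())
            continue
        words = line.split()                  # whitespace-delimited words
        i = 0
        while i < len(words):
            word = words[i]
            if word in TICK_PREFIXES:         # tick word: gobble one word
                if i + 1 >= len(words):
                    raise ValueError(f"{word} needs a word after it")
                name = words[i + 1]
                symbols.extend(p + name for p in TICK_PREFIXES[word])
                a_code[name] = f"<code for {name}>"   # placeholder text
                i += 2
            elif word in a_code:              # defined word: emit a_code
                out.append(a_code[word])
                i += 1
            else:                             # assumption: this is an error
                raise ValueError(f"undefined word: {word}")
    return out, a_code, symbols


if __name__ == "__main__":
    demo = ["# a comment", "asm section .data",
            "FPRIM' DUP SFPRIM' EXIT", "DUP EXIT"]
    for text in process_tf(demo)[0]:
        print(text)
```

The one thing the sketch is meant to show is that a defined word has
a single possible action, namely emitting its a_code entry, and that
the tick words are the only way new entries appear.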