Computer Graphics Workshop '97 Lecture Notes

1/29/97

Today's topics
Why scripting languages are useful

Why do people use scripting languages rather than C?

Scripting languages usually allow more run-time extensibility than C programs do. For example, new procedures and data types can usually be defined in a scripting language while the base program is still running, whereas in C you would have to recompile and restart the program to add a new data type to it.

Many scripting languages are interpreted at the program text level, which means the user can type in new code at runtime. This is useful for rapidly prototyping new programs, because there is almost no delay between typing in part of a new program and seeing it run. In C and other compiled languages, the compile-and-link step usually destroys this interactivity.

In addition to interactivity and extensibility, most scripting languages have the advantage that the same program runs without modification on more than one type of computer. Combined with the above features, this means that executable code can be sent from one computer to another and run there. (Example: Java applets.) This is useful for portability and for exploring more dynamic styles of programming.

The Programming Language Exploration pages describe many scripting languages.

How they are implemented

Many (most?) extension languages are implemented in C. Examples include many currently available Scheme implementations, Java, Python, and Perl. The general structure of the program implementing a scripting language looks like this:

  1. Read in program text from the standard input.
  2. Convert it into an intermediate format. (Parse tree, bytecodes, ...)
  3. Step through the intermediate program code, executing each instruction.
  4. If/when done, return to step 1.
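The four steps above can be sketched in C for a toy postfix (RPN) calculator language. This is a minimal illustration under assumptions, not how any particular interpreter is written: the names `Instr`, `compile_line`, `run`, and `repl` are hypothetical, step 2 translates each line into a small array of bytecodes, and step 3 steps through those bytecodes with an operand stack.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bytecode for a toy postfix (RPN) calculator language. */
typedef enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL } Opcode;
typedef struct { Opcode op; long arg; } Instr;

#define MAX_CODE  64
#define MAX_STACK 64

/* Step 2: convert one line of program text into bytecodes.
   Returns the instruction count, or -1 on a syntax error or overflow. */
int compile_line(const char *src, Instr *out, int max)
{
    int n = 0;
    assert(src != NULL && out != NULL);
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (n == max) return -1;
        if (isdigit((unsigned char)*src)) {
            char *end;
            out[n].op = OP_PUSH;
            out[n].arg = strtol(src, &end, 10);
            src = end;
        } else {
            switch (*src) {
            case '+': out[n].op = OP_ADD; break;
            case '-': out[n].op = OP_SUB; break;
            case '*': out[n].op = OP_MUL; break;
            default:  return -1;
            }
            out[n].arg = 0;
            src++;
        }
        n++;
    }
    return n;
}

/* Step 3: step through the bytecodes using an operand stack.
   Returns 0 and stores the final value in *result, or -1 on error. */
int run(const Instr *code, int n, long *result)
{
    long stack[MAX_STACK];
    int sp = 0;
    for (int pc = 0; pc < n; pc++) {
        if (code[pc].op == OP_PUSH) {
            if (sp == MAX_STACK) return -1;
            stack[sp++] = code[pc].arg;
            continue;
        }
        if (sp < 2) return -1;           /* operator needs two operands */
        long b = stack[--sp];
        long a = stack[--sp];
        switch (code[pc].op) {
        case OP_ADD: stack[sp++] = a + b; break;
        case OP_SUB: stack[sp++] = a - b; break;
        case OP_MUL: stack[sp++] = a * b; break;
        default:     return -1;
        }
    }
    if (sp != 1) return -1;              /* program must leave one value */
    *result = stack[0];
    return 0;
}

/* Steps 1 and 4: read a line, compile it, run it, and repeat until EOF. */
void repl(FILE *in, FILE *out)
{
    char line[256];
    Instr code[MAX_CODE];
    while (fgets(line, sizeof line, in) != NULL) {
        long value;
        int n = compile_line(line, code, MAX_CODE);
        if (n == 0) continue;            /* skip blank lines */
        if (n > 0 && run(code, n, &value) == 0)
            fprintf(out, "%ld\n", value);
        else
            fprintf(out, "error\n");
    }
}
```

Calling `repl(stdin, stdout)` from `main` and typing `2 5 + 3 *` would print 21. Keeping compilation and execution in separate functions mirrors the distinction between steps 2 and 3: the compiled bytecodes could just as well be saved and executed later.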

Some scripting languages convert the program text into the intermediate format as a separate preprocessing step. (For example, Java source code is compiled into bytecodes ahead of time, and the interpreter then executes those bytecodes.)

Some scripting languages allow your C program to maintain control of the main loop and call the scripting language to evaluate new code. This type of language is called an embeddable language. Others (like Java) take control of the main loop, and allow you to add new commands to the language (see below). These are usually called extension languages. Many embeddable languages are also extensible.

Recall that scripting languages allow the user to define new data types at runtime. How are these data types, and objects of these types, represented? Typically the scripting language defines an "opaque data type", which in C is usually just a pointer (or a pointer-sized value), to represent objects in the scripting language. To get information about an object, the C program must call a procedure that asks the scripting language's environment about the object in question. This layer of indirection is what makes runtime extensibility possible: because the C code never depends on an object's internal layout, new types can be added without recompiling it.
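To make the idea of an opaque data type concrete, here is a minimal C sketch. The names `Handle`, `ObjType`, `make_int`, `make_string`, `obj_type`, `obj_int_value`, and `obj_string_value` are hypothetical, and memory management (which a real interpreter would handle with its garbage collector) is ignored. Client code would normally see only the declarations in the first part, typically from a header; the struct layout in the second part is private to the interpreter and can change without breaking callers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- What client C code sees (normally in a header file) --- */
typedef struct Object *Handle;           /* opaque: fields not visible */
typedef enum { TYPE_INT, TYPE_STRING } ObjType;

Handle      make_int(long value);
Handle      make_string(const char *text);
ObjType     obj_type(Handle h);
long        obj_int_value(Handle h);     /* only valid for TYPE_INT */
const char *obj_string_value(Handle h);  /* only valid for TYPE_STRING */

/* --- Inside the interpreter: the layout is an internal detail --- */
struct Object {
    ObjType type;
    union { long i; char *s; } u;
};

Handle make_int(long value)
{
    Handle h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->type = TYPE_INT;
    h->u.i = value;
    return h;
}

Handle make_string(const char *text)
{
    Handle h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->type = TYPE_STRING;
    h->u.s = malloc(strlen(text) + 1);
    if (h->u.s == NULL) { free(h); return NULL; }
    strcpy(h->u.s, text);
    return h;
}

/* Query procedures: the only way client code learns about an object. */
ObjType obj_type(Handle h) { return h->type; }

long obj_int_value(Handle h)
{
    assert(h->type == TYPE_INT);
    return h->u.i;
}

const char *obj_string_value(Handle h)
{
    assert(h->type == TYPE_STRING);
    return h->u.s;
}
```

Because callers go through `obj_type` and the accessors, the interpreter could add a new entry to `ObjType` or reorganize `struct Object` and only its own code would need to change.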

For example, in SCM, every Scheme object, including numbers, strings, and cons cells, is represented in C as a value of type SCM. An SCM representing a cons cell can be taken apart using the C macros CAR and CDR.
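The following C sketch shows how a single word-sized type can stand for every Scheme object. It is modeled loosely on the approach described for SCM but is not SCM's source: the type `Value`, the function `cons`, and the macros `MAKINUM`, `INUM`, `INUMP`, `CONSP`, `CAR`, `CDR`, and `NIL` are defined here for illustration only. It assumes small integers are stored directly in the word with the low bit set, and that cons cells are heap pointers whose low bit is 0 because `malloc` returns suitably aligned memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Every Scheme object is represented by one machine word. */
typedef uintptr_t Value;

/* The empty list. */
#define NIL         ((Value)0)

/* Immediate integers: value shifted left one bit, low bit set to 1.
   (INUM relies on arithmetic right shift, as mainstream compilers do.) */
#define MAKINUM(n)  ((Value)(((uintptr_t)(intptr_t)(n) << 1) | 1u))
#define INUM(v)     ((intptr_t)(v) >> 1)
#define INUMP(v)    (((v) & 1u) != 0)

/* Cons cells live on the heap; a Value that is neither an integer nor
   NIL is treated as a pointer to one. */
typedef struct { Value car, cdr; } Cell;
#define CONSP(v)    (!INUMP(v) && (v) != NIL)
#define CAR(v)      (((Cell *)(v))->car)
#define CDR(v)      (((Cell *)(v))->cdr)

Value cons(Value car, Value cdr)
{
    Cell *c = malloc(sizeof *c);
    if (c == NULL) abort();      /* a real interpreter would run its GC */
    c->car = car;
    c->cdr = cdr;
    return (Value)c;
}

/* Sum a proper list of integers using only the macros above. */
long sum_list(Value list)
{
    long total = 0;
    for (; CONSP(list); list = CDR(list)) {
        assert(INUMP(CAR(list)));
        total += (long)INUM(CAR(list));
    }
    return total;
}
```

With this layout, C code that receives a `Value` can test its type and decompose it with cheap macros, without calling back into the interpreter for every access.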

Interfaces to C and C++

Extension languages have some mechanism for allowing the programmer to write a C function and create a new command for it in the interpreter. Then when a program written in the scripting language calls this command, the interpreter looks it up in its list of commands, sees it is implemented as a C function, and calls that function with the arguments that were passed in from the scripting language.

A hypothetical example:

> (my-c-function 2 5)

This would be parsed by the interpreter, which would find that the command "my-c-function" is actually implemented by the C procedure "foo". The interpreter would then call foo with one or more arguments representing the values passed in from the extension language (here, 2 and 5). Foo would convert these values from the extension language's format, if necessary, compute its result, convert that result into the extension language's format, and return it to the extension language.
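The steps that foo goes through can be sketched in C. Everything here is an assumption for illustration: the tagged `Value` struct stands in for the extension language's object type, the interpreter is assumed to pass an argument count plus an array of `Value`s, and, because the text leaves foo's computation unspecified, this version simply multiplies its two integer arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation of extension-language values. */
typedef enum { V_INT, V_STRING, V_ERROR } Tag;
typedef struct {
    Tag tag;
    long i;            /* payload when tag == V_INT */
    const char *text;  /* payload when tag == V_STRING or V_ERROR */
} Value;

static Value make_int(long i)
{
    Value v = { V_INT, i, NULL };
    return v;
}

static Value make_error(const char *message)
{
    Value v = { V_ERROR, 0, message };
    return v;
}

/* The C procedure behind the extension-language command "my-c-function". */
Value foo(int argc, const Value *argv)
{
    /* Convert (and check) the arguments from the extension language. */
    if (argc != 2)
        return make_error("my-c-function: expected 2 arguments");
    if (argv[0].tag != V_INT || argv[1].tag != V_INT)
        return make_error("my-c-function: arguments must be integers");
    long a = argv[0].i;
    long b = argv[1].i;

    /* Compute the result in ordinary C. */
    long product = a * b;

    /* Convert the result back into the extension language's format. */
    return make_int(product);
}
```

Under these assumptions, `(my-c-function 2 5)` would reach foo as two `V_INT` values and come back as a `V_INT` holding 10, while a call with the wrong number or type of arguments returns a `V_ERROR` value that the interpreter could report to the user.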

Note the distinction from a procedure written in the scripting language itself. Functions implemented in C generally provide the basic building blocks (primitives) of the scripting language. Higher-level procedures written in the scripting language then tie these basic commands together to form programs.

How do you add new commands so they are available in this way? You tell the interpreter which function to call when it encounters a given command name. This function must match a signature fixed by the interpreter (for example, taking an opaque data type as its first and only argument, which represents the list of arguments passed from the extension language). In effect, you are registering a callback with the interpreter, in the same way you did with Inventor's sensors and draggers.
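Registering a command amounts to storing a (name, function pointer) pair that the interpreter consults when it evaluates a call. Below is a minimal C sketch; `Value`, `struct Obj`, `CommandFn`, `define_command`, `call_command`, and `sum_args` are all hypothetical names. Following the signature described above, every command receives a single opaque argument, here a linked list of argument objects. Registering a name that already exists replaces the old callback, which is one way a command can be redefined while the program runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object type: an integer that can also be chained into
   a list (the interpreter would normally keep this layout private). */
typedef struct Obj *Value;
struct Obj { long num; Value next; };

/* The signature every command must match: one Value holding the
   argument list in, one Value out. */
typedef Value (*CommandFn)(Value args);

#define MAX_COMMANDS 64
static struct { const char *name; CommandFn fn; } table[MAX_COMMANDS];
static int n_commands = 0;

/* Register (or re-register) a callback; returns 0 on success.
   The name is stored by pointer, so it must outlive the table. */
int define_command(const char *name, CommandFn fn)
{
    for (int i = 0; i < n_commands; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].fn = fn;            /* redefine an existing command */
            return 0;
        }
    }
    if (n_commands == MAX_COMMANDS) return -1;
    table[n_commands].name = name;
    table[n_commands].fn = fn;
    n_commands++;
    return 0;
}

/* What the interpreter does on seeing (name arg ...): look the name up
   and call the registered C function. NULL means "no such command". */
Value call_command(const char *name, Value args)
{
    for (int i = 0; i < n_commands; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].fn(args);
    return NULL;
}

/* Example callback: sums the integers in its argument list. The result
   lives in a static object, which is fine for a sketch but not reentrant. */
static Value sum_args(Value args)
{
    static struct Obj result;
    result.num = 0;
    result.next = NULL;
    for (Value a = args; a != NULL; a = a->next)
        result.num += a->num;
    return &result;
}
```

A linear search keeps the sketch short; an interpreter with many commands would more likely use a hash table keyed on the command name.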

Automatic interface generation

The above process of registering a new command assumes the implementing procedure matches a certain signature. What if we already have a large library of C or C++ functions, and want to hook them up to an extension language?

We can write a set of "glue" procedures, or "stubs", which match the signature the interpreter expects. After implementing a few of these stubs, it quickly becomes apparent that they all share the same structure. (This structure is the same across various extension languages, from SCM to Java. Note the similarity with how "foo" works, above.)

  1. Convert the arguments passed from the extension language into C-compatible data structures. (For example, convert Scheme vectors or Java arrays into C arrays.) This may include checking that the number of arguments passed was correct, and in dynamically typed languages like Scheme, that the type of each argument was correct, signalling an error where appropriate.
  2. Call the C function that this glue procedure was designed to interface to. (This implies that each C function in the library needs its own glue procedure.)
  3. Take the result of the above function call, convert it into the extension language's format, and return it to the interpreter.
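The three steps above can be sketched as one C glue procedure. The names are hypothetical: `dot` stands in for an existing library function that computes the dot product of two C arrays, `Value` is a made-up tagged representation of extension-language reals and vectors, and `dot_stub` is the glue procedure with the signature the interpreter is assumed to expect (an argument count plus an array of `Value`s).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical extension-language values: reals, vectors, errors. */
typedef enum { V_REAL, V_VECTOR, V_ERROR } Tag;
typedef struct Value {
    Tag tag;
    double real;              /* V_REAL */
    const struct Value *elts; /* V_VECTOR: element array */
    size_t len;               /* V_VECTOR: element count */
    const char *msg;          /* V_ERROR */
} Value;

static Value make_real(double d)       { Value v = { V_REAL, d, NULL, 0, NULL }; return v; }
static Value make_error(const char *m) { Value v = { V_ERROR, 0.0, NULL, 0, m }; return v; }

/* The existing C library function we want to expose (hypothetical). */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += a[i] * b[i];
    return sum;
}

/* Copy a vector of reals into a new C array; NULL if an element is not
   a real or allocation fails. */
static double *to_c_array(Value v)
{
    double *out = malloc((v.len ? v.len : 1) * sizeof *out);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < v.len; i++) {
        if (v.elts[i].tag != V_REAL) { free(out); return NULL; }
        out[i] = v.elts[i].real;
    }
    return out;
}

/* Glue procedure with the signature the interpreter expects. */
Value dot_stub(int argc, const Value *argv)
{
    /* 1. Check the arguments and convert them to C data structures. */
    if (argc != 2) return make_error("dot: expected 2 arguments");
    if (argv[0].tag != V_VECTOR || argv[1].tag != V_VECTOR)
        return make_error("dot: arguments must be vectors");
    if (argv[0].len != argv[1].len)
        return make_error("dot: vectors differ in length");
    double *a = to_c_array(argv[0]);
    double *b = to_c_array(argv[1]);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return make_error("dot: could not convert vector elements");
    }

    /* 2. Call the wrapped C function. */
    double result = dot(a, b, argv[0].len);
    free(a);
    free(b);

    /* 3. Convert the result back for the interpreter. */
    return make_real(result);
}
```

Only the names `dot`, the argument checks, and the result conversion are specific to this one function; the rest of the stub would look the same for any other library function, which is why writing such stubs by hand quickly becomes repetitive.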

Implementing these glue procedures is not difficult once you understand the structure a particular extension language requires. In fact, implementing them becomes tedious rather quickly, because most of the code in them is the same. For that reason many extension languages provide stub generators, which read in descriptions of functions in some format and output the corresponding stub functions automatically. Most of these stub generators either operate only on C procedures, or output only stub skeletons which must be filled in manually (like javah).
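At its core a stub generator is a program that prints C source code. The toy C sketch below assumes a made-up description format (a `FuncDesc` record giving a function's name and its number of `long` parameters) instead of parsing real header files, and the generated code calls runtime helpers `Value`, `make_error`, `value_to_long`, and `long_to_value` that are hypothetical and not defined here. It handles only functions that take and return `long`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Made-up description format: a C function taking `nargs` longs and
   returning a long. A real generator would parse header files instead. */
typedef struct { const char *name; int nargs; } FuncDesc;

/* Append formatted text to buf, tracking the would-be length in *len. */
static void emit(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    if (*len < size) {
        int n = vsnprintf(buf + *len, size - *len, fmt, ap);
        if (n > 0) *len += (size_t)n;
    }
    va_end(ap);
}

/* Write the source of one glue procedure into buf.
   Returns 0 on success, -1 if buf was too small (output truncated). */
int gen_stub(const FuncDesc *f, char *buf, size_t size)
{
    size_t len = 0;
    emit(buf, size, &len, "Value %s_stub(int argc, const Value *argv)\n{\n",
         f->name);
    /* Step 1: argument-count check (value_to_long would check types). */
    emit(buf, size, &len,
         "    if (argc != %d) return make_error(\"%s: wrong arg count\");\n",
         f->nargs, f->name);
    /* Steps 2 and 3: call the function and convert its result. */
    emit(buf, size, &len, "    return long_to_value(%s(", f->name);
    for (int i = 0; i < f->nargs; i++)
        emit(buf, size, &len, "%svalue_to_long(argv[%d])", i ? ", " : "", i);
    emit(buf, size, &len, "));\n}\n");
    return len < size ? 0 : -1;
}
```

For the description `{ "add", 2 }`, the generated stub checks `argc` and returns `long_to_value(add(value_to_long(argv[0]), value_to_long(argv[1])))`. Supporting C++ classes, as the next paragraph describes, requires a generator that understands class declarations and method calls rather than just a flat list of functions.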

Open Inventor is a C++ class library, SCM had no automatic stub generator, and writing the stub functions by hand was impractical, so Header2Scheme was developed. This program descends a directory tree of C++ header files and outputs glue procedures for each class it finds. It was used to generate the Scheme interface to Open Inventor automatically (aside from a few exceptions which had to be hand-coded).

These glue procedures work exactly as described above, with one exception: it was necessary to provide some sort of dynamic system for making method calls on C++ objects. Think about how methods differ from pure C functions: methods are conceptually "bound" to a particular object (in C++ they are actually scoped within a class), while functions live in the global scope. It would have been unworkable to require the user to remember which methods were associated with which class and to make calls by typing (className::methodName object args...). Therefore the "->" operator was introduced, which checked the passed object's class to see whether the requested method existed, and called it if so. This allowed the imitation of C++ syntax. The wrapper function "send" allowed Scheme and C++ objects to be treated in the same way, using the message-passing paradigm.
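The run-time lookup performed by "->" can be sketched in plain C: each object points to its class, each class holds a table of named methods and a link to its superclass, and the dispatcher searches that chain by name. This is not Header2Scheme's generated code; `Class`, `Method`, `Object`, `find_method`, `dispatch`, and the toy `Shape`/`Sphere` classes are hypothetical, and every method is assumed to take and return a single `double` to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Object Object;

/* For simplicity every method takes and returns one double. */
typedef double (*MethodFn)(Object *self, double arg);
typedef struct { const char *name; MethodFn fn; } Method;

typedef struct Class {
    const char *name;
    const struct Class *super;   /* NULL for a root class */
    const Method *methods;       /* terminated by { NULL, NULL } */
} Class;

struct Object { const Class *cls; double size; };

/* Look for a method in the object's class, then in its superclasses. */
static MethodFn find_method(const Class *cls, const char *name)
{
    for (; cls != NULL; cls = cls->super)
        for (const Method *m = cls->methods; m->name != NULL; m++)
            if (strcmp(m->name, name) == 0)
                return m->fn;
    return NULL;
}

/* Rough analogue of the "->" operator: returns 0 and stores the method's
   result in *result, or -1 if the object's class has no such method. */
int dispatch(Object *obj, const char *name, double arg, double *result)
{
    assert(obj != NULL && obj->cls != NULL);
    MethodFn fn = find_method(obj->cls, name);
    if (fn == NULL) return -1;
    *result = fn(obj, arg);
    return 0;
}

/* Two toy classes: a base class "Shape" and a subclass "Sphere". */
static double get_size(Object *self, double unused)
{
    (void)unused;
    return self->size;
}

static double set_radius(Object *self, double r)
{
    self->size = r;
    return r;
}

static const Method shape_methods[]  = { { "getSize", get_size }, { NULL, NULL } };
static const Method sphere_methods[] = { { "setRadius", set_radius }, { NULL, NULL } };
static const Class shape_class  = { "Shape", NULL, shape_methods };
static const Class sphere_class = { "Sphere", &shape_class, sphere_methods };
```

Searching the superclass chain is what lets a method defined on a base class be called on a subclass object, mirroring C++ inheritance; bindings for real C++ classes must additionally cope with overloaded methods and differing argument types.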

Example applications

There are several examples of applications which use scripting languages to provide user extensibility. Photobook, an image database program, has basic database access and retrieval functions written in C, and a user interface written in Tcl/Tk. The user can write drivers which do various analyses on the database (like how well it's performing) just by writing a new Tcl script.

ALIVE had two scripting language interfaces: Tcl and Scheme. The Tcl interface was higher-level, allowing the user to load in new creatures at run time. The Scheme interface provided lower level access to the internals of the creatures, and was designed to allow new creatures to be written entirely in Scheme.

Netscape is currently based around JavaScript, an interpreted language for scripting web documents. Each time a document is loaded, it is represented as a tree of JavaScript objects, with objects representing the document and its components (like paragraphs, lists, pictures, Java applets, and embedded plugins). JavaScript allows small pieces of interpreted code to be embedded in web pages.



$Id: index.html,v 1.1 1997/01/24 17:50:06 kbrussel Exp $