" /> Francis Cheng: February 2008 Archives

« January 2008 | Main | March 2008 »

February 28, 2008

Type Parameters in ECMAScript 4th Edition

The ECMAScript 4th Edition draft specification includes something called type parameters, also known as parameterized types or generics. This feature is useful for adding type checking to collection classes. For example, let's say you have a List class that you have customized to your liking.

You like it so much, in fact, that you use it in many different situations. The only problem is that in ActionScript 3.0, you can't easily use strong typing to specify that a specific instance of List should contain only a certain data type. You could extend List, I supppose, and create a ListOfIntegers class, a ListOfStrings class, etc., but that can get hard to manage if you use it for more than just a few different types.

The type parameters feature, however, makes strong typing of collections much easier. If you saw my previous post about Vectors, you've already seen type parameters in action because the Vector class requires them. But let's rewrite the List class using a type parameter:

What we've done is define List such that it is now a "list of some type", where the type is represented by the parameter "T". In other words, the class List.<T> is "generic" in the sense that it is not tied down to a specific data type, at least until you specify a type when you create an instance of the class. This means that we can use the parameter "T" inside our class definition as if it were a type. You can see in the declaration of the variable head that we used a type annotation of "T". When we later create an instance of the List.<T> class, the compiler will substitute the type we specify for every occurrence of "T".

Let's say I have a class called Employee and I want to create an instance of my List class that stores only instances of the Employee class. Here's what it would look like:

So you're probably wondering what's up with the funny syntax for type parameters. How did we end up with the syntax className.<Type> for type parameters? Well, the working group went through a long discussion about it, which you can read on the type parameters proposal discussion page at the ecmascript.org site.

ES4 Type Parameters are Unbounded

One thing you may have noticed about the List.<T> class is that T can be any type. We passed in the type "Employee", but we could have passed in int, string, or any other built-in or custom type. There's no way in ES4 to limit, or in other words create bounds for, the types that you can pass in to for "T". Some languages with generics allow you to constrain the type parameter by setting up type boundaries. In other words, some languages allow you to specify a restricted set of types that can be used as typed arguments. For example, in Java you can define a class that takes a type parameter that must extend another class, so you could say class List<T extends Employee>, to get a List class that takes only subclasses of Employee. The ES4 spec does not require implementations to provide bounded type parameters.

ES4 Type Parameters are Invariant

Best way to explain this is through an example. Let's take that Employee class we talked about earlier and extend it with a new subclass called TechWriter. Clearly, TechWriter is a subclass of Employee, and can be safely used anywhere that Employee is used. But does that mean that List.<TechWriter> is a subtype of List.<Employee>? It turns out the answer is no. The ramifications of this can be seen in the following example, in which the function reprimand() takes a parameterized List as its parameter:

Calling the reprimand() function with a value of type List.<TechWriter> causes an error because the function is expecting the type List.<Employee>. Other languages use something called variance annotations to solve this problem. In this case, the error would go away if the subtype relationship of the two lists matched the subtype relationship of the types Employee and TechWriter. Such variance is called covariance. Java, for example, allows you to use wildcards to indicate that you want to allow covariance, something like <? extends Employee> would do it if you used that in the declaration of reprimand().

The type parameters proposal does leave open the possibility that variance annotations could be added in a later edition of ECMAScript. The proposal points to a journal article that proposes a variance annotation style that is simpler than Java's. If ECMAScript goes that route, you would be able to rewrite the function to support covariant type parameters with just one extra symbol (note the + symbol in the declaration of reprimand()):

Personally, I like the brevity of this style of variance annotation and hope that they adopt this for ECMAScript 5. The proposal lists a workaround, however, that purportedly mitigates the need for variance annotations. I have to say, though, that the proposed alternative is little intense for my taste, but hey, I'm just a tech writer. You can have a look at it and see if you like it better than using a variance annotation. Look for the code example that defines a class called Container.<T> on the type parameters proposal page under the section titled Variance Roadmap.

Here are some useful links I came across in researching type parameters. These deal mostly with generics in Java, C++ and C#:

February 19, 2008

ECMAScript 4th edition Reference Implementation M2

Last week the ES4 working group released a new version of the ES4 reference implementation (RI). The new build, M2 (for Milestone 2), replaces the M1 build that was posted back in November 2007. Today, one of the most active implementors of the RI, Graydon Hoare of Mozilla, posted a detailed "punch list" of all the proposed features that are either included or not included in the latest version of the RI.

Here's a little background about the RI for those of you who haven't been following the progress of the ES4 working group. Most of the working group members found the pseudo-code used in the ES3 specification hard to work with in that it was both hard to get the specification correct and hard to implement. As pseudocode, it was also hard to identify bugs. Not wishing to propogate these same problems into the next edition of the specification, the working group decided to create an reference implementation that serves several purposes. First, it provides an initial proof of concept that the new edition of the language can indeed be implemented. Second, it provides a much more precise way to specify what the language is supposed to do. Third, it makes it much easier to identify bugs in the specification.

After considering a few languages, the working group settled on Standard ML '97 and use a compiler/programming environment called Standard ML of New Jersey, or SML/NJ for short. That need not concern you, however, if you just want to play around with some of the new features because Dave Herman has created a set of binaries for Windows, Mac OSX and Linux that you can download from the ecmascript.org downloads area.

Here is Graydon's punch list of features that are in the "M2" build of the RI (from his original post to ES4-Discuss):

Implemented, may have bugs:

  • classes and interfaces
  • namespaces
  • pragmas
  • let, const, let-const
  • iterators
  • enumerability control
  • type expressions / definitions / annotations
  • runtime type checks ("standard mode")
  • nullability
  • destructuring assignment
  • slice syntax
  • hashcode
  • catchalls
  • map & vector
  • date & time improvements
  • meta objects
  • static generics
  • string trim
  • typeof
  • globals
  • expression closures
  • name objects
  • type operators (is / to / cast / wrap)

Implemented and partly working, but still in flux / work to do:

  • inheritance checking
  • strict mode
  • type parameters
  • structural types
  • numbers & decimal
  • getters & setters (structural part is incomplete)
  • packages

Partially implemented / not yet working:

  • program units
  • generic function
  • updates to unicode
  • updates to regexps

Unimplemented:

  • generators
  • tail calls
  • triple quotes
  • stack inspection
  • reformed with
  • resurrected eval (eval exists but may be wrong)
  • help for the argument object
  • "this function" / "this generator"

In case you missed it above, here is a link to the downloads area of the ecmascript.org site if you want to download binary versions of the RI (The source code is also available on that page):

February 13, 2008

Vectors in ECMAScript 4

A new built-in class named Vector is proposed for ECMAScript edition 4. This class is similar to the Array class, but is designed for better performance, efficiency and error checking. Some interesting aspects of the Vector class:

  • vectors are dense;
  • vectors do bounds checking;
  • vectors can be fixed length;
  • vectors have type parameters;
  • vectors have the same methods as arrays.

I'll take a closer look at each of these bullet points, but first, here's how the Vector class and constructor declaration will probably look in ActionScript, where 'T' represents a data type:

The Vector class is is defined with the final attribute right now, which means that you cannot create subclasses. This could change, however, before the specification is finalized. The length parameter has a type annotation of uint, which means that the maximum vector size is 232-1 and the largest index value you can use is 232-2.

Vectors are dense

If you are familiar with ActionScript arrays, you know that they can have "holes" in them. For example, let's say you have an array that has three elements and the length property is 3. That's a "dense" array with no holes. However, I can write to an arbitrary index number and the array will automatically "grow" to accommodate me. For example, if I write to index 10, the length of the array will grow to 11, but the array will have a hole in it between index 3 and 9, inclusive. Or, you could look at it as having 7 holes, with each undefined element considered a hole. Having holes like this can really slow down the process of iterating through the array. Vectors, by contrast, are always dense. You cannot have holes in a vector. Every element of a vector is always defined.

Vectors do bounds checking

So what happens if I have a vector with three elements and I try to assign a value to index 10, as I did with an array in the previous paragraph? The short answer is that you get a runtime error (a RangeError to be precise). In fact, this happens even you just try to read from that index, much less write to that index.

There is a little flexibility built in, however, for those who want to change the size of the vector. First, you can "grow" a vector by assigning to the very next available index. For example, if I have a vector with three elements (index positions 0, 1, and 2), I can directly assign a fourth element to index position 3. Second, I can alter the value of the length property to grow or shrink the size of the vector. For example, if I change the length to 11, I'll increase the size of my vector so that it contains 11 elements. This is somewhat similar to how the length property works for arrays, but the significant difference is that with vectors, all 11 of the elements are defined. Both of these techniques, however, work only if the fixed property is set to false.

Vectors can be fixed length

You may have noticed that the second parameter to the Vector constructor function is a boolean named fixed. The default value is false, but you can set this to true either at construction time or anytime thereafter using the fixed property of the Vector class. Whenever fixed is set to true, any attempt to change the size of the vector will generate a runtime error, whether the attempt is made by assigning a value to the index number that is equal to the value of length or by directly changing the value of the length property.

Vectors have type parameters

You can use type parameters to designate that you want a "mono-typed" vector, which means that the vector can contain values of only one specific type. For example, you can define a vector that holds only integers and has a fixed length of 7 by declaring:

The resulting intVector vector will be constrained to values of type int.

Vectors have the same methods as Arrays

The current plan is for the Vector class to have as many of the Array methods as possible. Vector will contain not only the traditional methods such as push(), pop(), slice(), sort(), etc., but also some of the newer methods such as every(), filter(), indexOf(), some(), etc. A complete list should be available soon on ecmascript.org.

This only scratches the surface of the new Vector class. Rest assured that there will be more to come about this new built-in class as we move forward with the draft specification. Here are some relevant links on the ecmascript.org site:

February 7, 2008

Logical Assignment Operators

Thought I'd switch gears this week and talk about Logical Assignment Operators (||=, &&=), a feature that is proposed for ECMAScript 4th edition, but is already implemented in ActionScript 3.0. To understand the rationale behind the Logical Assignment Operators, you need to understand how the logical operators in ECMAScript and ActionScript work. Most people think of these operators in purely boolean terms. In other words, if you have two boolean variables, conditionA and conditionB, you can use the logical operators to calculate logical results. For example, if one condition is true and the other is false, the logical AND operator returns a value of false because both conditions have to be true in order for logical AND to return true. The logical OR operator, on the other hand, returns true in this case because it takes only one condition to be true for the logical OR operator to return true:

The logical operators are far more interesting than this, however, in that they are not limited to Boolean values. You can use other values as operands to the logical operators. For example, let's say you want to check whether values have been assigned to a pair of string variables. The result is true only if both variables contain string values.

But it gets even more interesting because the logical operators don't necessarily return a Boolean value. What's really going on is that the logical operator returns the value of one operand or the other. For example, the logical AND operator examines the first operand and if that operand is either false or can be converted to false, the operator returns the first operand and never even looks at the second operand. If the first operand turns out to be true, the second operand is returned. We can see this in action if we just change the type annotation of the variable bothStringsSet in the previous example to type String:

This is null because the value of the first operand, str1, converts to false, so the logical AND operator returns the value of str1, which is null

This is also null because even though the value of the first operand now converts to true, this just means that the logical AND operator returns the value of the second operand, str2, which is still null.

The logical AND operator does the same thing here as it did in the previous code snippet--it returns the value of the second operand because str1 converts to true, but we get a different value as a result because the second operand now has the value bar. That's not a very practical example, so let's look at something that makes a little more sense.

Many programmers use the logical OR operator to assign default values. The logical OR operator is the exact opposite of the logical AND operator. In other words, the logical OR operator returns the first operand if it is either true or can be converted to true, and otherwise returns the second operand. What does this have to do with assigning default values? Well, you only want to assign a default value if a variable is undefined, so you could write the following code to assign a default value:

Alternatively, you could use the logical OR operator to serve the same purpose:

If myVar is undefined, the logical OR operator considers that as false, so it returns the value of the second operand, "default". If myVar has a value assigned already, the logical OR operator considers that as true and just re-assigns the value of myVar to myVar. It's a matter of personal taste. Some argue that using the if statement is easier to read. Nevertheless, there are some programs that use the logical operator technique, so at the very least it's handy to know about it.

Finally, we get to the "assignment" part of the logical assignment operators. For those of you who find the logical OR technique for assigning default values distasteful because it's hard to read, you may want to stop reading now. I say this because a common technique among Ruby and Perl programmers is to use an even more abbreviated form of this technique that takes advantage of the logical OR assignment operator. In other words, the following code:

can be rewritten using the logical OR assignment operator:

There you have what I believe to be the most common use of the logical OR assignment operator. Examples of the logical AND assignment operator are less common, but one use-case could be to increment a variable, but only when that variable is greater than zero. For example, the following code increments myNum only if it is not zero:

As parting words, I should say that I'm not advocating the use of this technique, but I do think it's worth understanding because you may come across code that uses this technique.

February 1, 2008

Proper Tail Calls and State Machines

It turns out that Proper Tail Calls (PTC) are handy if you want to create a simple "state machine" or "finite state machine" (See my previous post if you don't know what PTC is). A state machine is a way to create a model for the way an object behaves. For example, let's create a state machine model for a United States traffic light. First I'll use a design pattern to create one, then I'll show you a completely different approach using PTC. Here are the components of the model:

  • Finite number of states. There are three states in our example: red, yellow, and green.
  • Input. This is an event-driven model, so the input is an event that signifies a light change.
  • Finite number of transitions. There are three transitions in our example: red to green, green to yellow, and yellow to red.

There are many different ways to implement a state machine. You can use a series of if/else statements or a switch clause that controls the transitions from state to state, but that approach doesn't scale well. Instead, we'll use the state design pattern, originally presented by Gamma, et. al. in their famous "Design Patterns" book.

Traffic Light Using the State Pattern

First we start with a class that Gamma calls the "context", which in our case is a class I'll call TrafficLight. This class is sort of the "master" class that we'll use to represent the state machine. Second, we'll create an interface, let's call it IState, that contains methods that represent all the behaviors in our state machine. Well, in our case there's only one behavior, which is changeColor(). Third, we'll implement this interface with classes that represent each of our states. We'll have three classes: RedLight, YellowLight, and GreenLight, all of which will implement the IState interface.

For simplicity's sake, I'm just making my variables public and I'm ignoring the need for event listeners and a timer. Also, I'm including only the RedLight class because the YellowLight and GreenLight classes are essentially the same:

TrafficLight Class

IState Interface

RedLight Class

That's a lot of structure for a simple model, but the payoff is that you have a model that's easy to extend. For example, we could easily adapt this state machine to handle British traffic lights that have an extra state where red and yellow both appear. All it would take is a new RedYellowLight class and minor changes to the TrafficLight and RedLight classes. Now let's take a different approach that uses PTC.

Traffic Light with PTC

When I mean different, I mean that we'll leave the nicely structured world of class-based programming and enter the wild world of dynamic programming. So turn off Strict mode and let's not worry about type annotations. If we have PTC at our disposal, we can use it to design a quick and easy state machine that uses a function to represent each state. It can't get much simpler than that. When it is time to transition to a new state we make a tail call to the appropriate function.

I added a variable named cycles to control the number of times the program runs. What's interesting is that if I set cycles to anything over 252, I get a stack overflow error. I'm only decrementing in the redLight function, so that means each decrement of cycles represents three functions placed on the stack. So the error appears after a little more 750 of those functions are placed onto the stack. Clearly, you wouldn't want to use this technique in ActionScript today, but if PTC becomes part of ActionScript, the stack overflow error would not occur because each tail call function would replace the preceding call on the stack. Of course, this is not necessarily a better way to implement a state machine, but it is an alternative that certainly has an appealing simplicity to it. It may not scale as well as the design pattern technique, but may come in handy for implementing simple games.

Here are some links to resources about state machines: