PLPV

This is the first of a series of posts based on experiences at POPL (Principles of Programming Languages), 2012, in Philadelphia PA.

PLPV (Programming Languages meet Program Verification) is a workshop co-located with POPL, that focuses on ways to reduce the burden of program verification by exploiting properties of the underlying programming language (e.g., types, contracts).

Alan Jeffrey (Bell Labs) gave the first talk of the morning, digging into an intriguing connection between linear temporal logic (LTL) and functional reactive programming (FRP) via the Curry-Howard isomorphism. The basic idea is that because signals in FRP are time-indexed values, their types should be time-indexed as well: but time-indexed types can be read as LTL formulas, following the “types as propositions” slogan. One interesting observation was that the “until” modality in LTL does not have a natural programming interpretation, but its dual, the “constrains” modality, does. (Recall that, for any formulae A, B: A “constrains” B holds at time S whenever, for all times T >= S, if A holds in the interval [S,T) then B holds at T.) In particular, whereas the “constrains” modality forms the basis of compositional reasoning about rely/guarantee properties of concurrent systems, it is reflected in programming as (decoupled) causal functions on signals: functions whose outputs at time T depend on the history of inputs up to (but not including) T. Alan used these insights to show that FRP primitives such as switching and looping can be encoded in a fairly direct manner via LTL types. In the ensuing Q/A session, it was suggested that “until” may be interpreted in programming as “progress,” and that decidability results for LTL could be leveraged for type checking FRP. Overall, super exciting stuff: this will require some further meditation to internalize and apply!
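
In symbols (my own transcription of the definition above, writing A ▷ B for “A constrains B”):

$$(A \mathrel{\triangleright} B)(S) \;\triangleq\; \forall T \ge S.\ \big(\forall U \in [S, T).\ A(U)\big) \Rightarrow B(T)$$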

Over a coffee break, I had a nice conversation with Chris Hawblitzel (MSR Redmond) about our ongoing work on ActionScript, and he was quite happy about the potential for impact in this work. We shared our experiences working with product teams at Adobe and Microsoft.

Next, Benjamin Pierce (UPenn) gave a keynote on CRASH/SAFE, a new collaborative project on building a computer with information-flow safety built in at all levels: hardware, operating system, programming language. In particular, Benjamin went over some interesting design tidbits that showcased what can be conceived and achieved when all legacy constraints are removed: the power and challenges of applying the state-of-the-art with a clean slate.

Over lunch, I was pleasantly surprised to find my undergrad classmate, Akash Lal (who is now at MSR India, and works on concurrency); and I had a long conversation on language design work with Ravi Chugh (who has recently joined Mozilla, and works on type systems). We were joined by Pat Rondon (UCSD, who is the principal driver of the very cool “liquid types” line of work) and Jeff Vaughn (UCLA, who has been working on the now much larger Android security project that we started at UMD).

I skipped the session immediately after lunch to catch up on emails (and some sleep, which, for some reason, always eludes me the first night of a hotel stay).

The last session of the day featured another cool talk on “dependent interoperability”: the basic idea being that it should be possible to mix programs in simply-typed and dependently-typed sublanguages by mediating value-passing at the boundaries, via “boundary terms” that translate values up and down as required, and whose types can be derived by looking at the corresponding constructors for the data types of those values. For example, a library for dependently typed lists would require passing lengths around, and it should be possible to have a simply-typed client use this library by managing lengths on the side. Overall, this seems to mirror hybrid / gradual typing, but is sufficiently intriguing to deserve a deeper look.

POPL begins tomorrow! Eagerly looking forward to a lot of learning throughout the rest of the week. (Our type inference paper will be presented on Friday.)

The V8 Myth: Why JavaScript is not a Worthy Competitor

Disclaimer: The following opinion is personal; it does not in any way reflect the opinion of Adobe.

UPDATE (01/18/2012): There have been some misunderstandings around the purpose of this post. The purpose is not to undermine JavaScript; it is to simply point out why ActionScript can do better given the compilation/execution model it inherently has.

UPDATE (01/18/2012): Some benchmark numbers posted below. More to come.

If you found yourself gaping in shocked disbelief at the premise in the title of this post, congratulations! You are the target audience for this post.

JavaScript programs are untyped, (relatively) small programs that are shipped/loaded as source code, and then compiled and run on the fly. In contrast, ActionScript programs are typed, (relatively) large programs that are compiled to byte code, shipped/loaded as byte code, and then run on the fly.

There is something inherently wrong in an argument based on the claim that JavaScript can do all of its heavy-lifting work post-load time and still do as well as, if not better than, a language that has the opportunity to do much of that heavy-lifting work pre-load time. What is wrong in such an argument is the reverence for, the dependence on, and the submission to, magic. Unfortunately, sooner or later, we all get to hear that Santa does not exist: the question is, can we grow up sooner?

Dynamic analysis is a great complement to static analysis: unfortunately, it is not a replacement. An ActionScript program that has been optimized to death by an AOT compiler can, almost trivially, beat a JavaScript program that is optimized on the fly by the JIT compiler. The only way out would be to let the JIT compiler work till death, but that is not an option! Checkmate.

So yes, you will hear about all the great things in V8 (JavaScript VM), including type inference. The fact is, there is no way a JIT compiler can afford to do the kind of modular analysis that an algorithm implemented in an AOT compiler can. To prove this unbelievably obnoxious claim, I compared the performance of the untyped JavaScript benchmarks in the Tamarin (ActionScript VM) performance suite by running them, on the one hand, through V8, and, on the other hand, through the type inference algorithm we recently prototyped in Falcon (ActionScript compiler), followed by Tamarin.

The ActionScript toolchain beat the JavaScript toolchain by around 3x.

(Note that a side-by-side comparison was never possible before we implemented type inference in Falcon; effectively, we automatically regenerated fully typed versions of the untyped JavaScript benchmarks, letting the performance benefits enjoyed by typed ActionScript programs carry over to untyped ActionScript programs.)

So, let us stop worrying about JavaScript, and aim higher. As we focus on gaming, ActionScript programs will require significantly better optimizations for performance. ActionScript has just the right DNA mix for success, and it will become the 21st-century language it was always meant to be.

Sample numbers for Tamarin’s test/performance/jsbench benchmarks (running times in seconds at the maximum data size in each case):

  • SOR: 23.1 (V8), 4.6 (Falcon + type inference + AVM)
  • LUFact: 138.7 (V8),  23.3 (Falcon + type inference + AVM)
  • HeapSort: 10.0 (V8),  14.7 (Falcon + type inference + AVM)
  • FFT: 75.6 (V8),  31.1 (Falcon + type inference + AVM)
  • Crypt: 25.1 (V8),  7.4 (Falcon + type inference + AVM)


Type Inference for ActionScript

A distinguishing feature of ActionScript is that its type system combines static and dynamic typing: an idea that is popularly known as “gradual typing.” Gradual typing is a fancy name for a rather simple, familiar idea. The idea is to facilitate the gradual evolution of “scripts” to “programs,” where scripts describe dynamically typed, volatile code written for prototyping and testing, and programs describe statically typed, mature code written for scaling and maintaining. In other words, gradually typed languages encourage the mixing of statically-typed and dynamically-typed code fragments, with dynamic checks and casts enforcing the soundness of typing at the boundaries between those fragments. The key promise of a gradually typed language is that the statically typed fragments continue to benefit from performance optimizations and other guarantees typically enjoyed by programs in statically typed languages, while still being able to interact with the dynamically typed fragments. Unfortunately, while ActionScript goes part of the way towards gradual typing bliss, it falls short of keeping any such promise.

For example, consider the following code:

function f():Number { ... }
var x = f(); // var x:* = f()
var y:Number = x++;

Today, this code runs as follows (well, almost: there are some tricks played in the JIT, but we will revisit their limitations in another post). The result of f(), typed Number, is passed into the dynamically typed variable x. (The missing type for x is interpreted as the dynamic type, *: something that we will revisit below.) Since the content of x must match its (dynamic) type, the result is “boxed” at run time: the static type Number is converted to a dynamic type tag, the tag is attached to the result, and the result is (literally) packed in a box to store into x. Next, the value of x flows into a variable y with static type Number, and x is incremented. At this point, the dynamic type tag attached to the content of x is checked to be Number, the boxed value is “unboxed,” the value is stored into y, and the value is incremented and boxed again to store it back into x.

Horrifying, isn’t it?

And now, imagine what so many unnecessary checks and casts do to loops.
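
To make this concrete, consider a hypothetical loop (g here is just a made-up stand-in for any function returning a Number): sum and i carry no type annotations, so they are treated as the dynamic type *, and the boxing, tag checking, and unboxing described above are repeated on every single iteration.

function g(n:Number):Number { return n * 2; }
var sum = 0;                          // sum gets type *, so every update to it is boxed
for (var i = 0; i < 100000; i++) {    // i gets type *, so it is checked and re-boxed each iteration
    sum = sum + g(i);                 // tag check, unbox, add, re-box, every time around the loop
}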

To be fair, dynamic types have their uses. No matter how expressive your type system is (think polymorphic types, logical types, algebraic types, dependent types, …) you can never express all invariants of your program with types and still hope to check them statically: laws of physics will haunt you (the “Turing halting problem,” anyway). Dynamic types allow you to get around the limitations of the type system, whatever they may be.

However, dynamic types should be the fallback, not the default! In particular, it is an unnecessary and costly mistake to interpret missing types as dynamic types. Often, it is (relatively) easy for the compiler to infer that a missing type is in fact a static type. For example, the code above can be rewritten automatically to the following code, involving no dynamic checks and casts:

function f():Number { ... }
var x:Number = f(); // previously: var x = f(), i.e., var x:* = f()
var y:Number = x++;

This code runs much faster!

(Of course, “real-world” examples can be far more complicated than the above example, but the point remains.)

Bottom line, it is not the job of the programmer to tell the compiler whether some code fragment can be typed statically or dynamically. At best, it is an unnecessary hassle; at worst, it is an impediment to optimizations.

So, what can we do to fix this situation?

Last summer, we put our heads together to think hard, and by mid-summer, we had come up with a revolutionary type inference algorithm for ActionScript. (This research will be published in this year’s POPL; check out the paper!) The algorithm relies on a modular analysis of data flows through source code, where the data may (of course) include functions and objects. The distinguishing feature of the algorithm is that it is provably backward-compatible: existing programs can be aggressively optimized by type inference without the slightest fear of breaking their run-time behaviors! (To guarantee this property, the algorithm relies on several technical novelties, including “kind-based closure computation” and “type-directed escape analysis”: the details can be found in the paper.) Furthermore, the algorithm is provably efficient, which means it can not only be implemented in the ActionScript compiler, but also made available in the ActionScript IDE (Flash Builder)!
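
As a hypothetical illustration (my example, not one from the paper) of the kind of data flow such an analysis can track: the only value that ever reaches the unannotated parameter below is a Number, so the missing types can be recovered as static types instead of defaulting to the dynamic type *.

function f():Number { return 42; }
function g(a) { return a + 1; }   // a carries no annotation...
var z = g(f());                   // ...but the only value flowing into a is a Number,
                                  // so a and z can both be given type Number rather than *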

Over Christmas, I implemented a prototype of our type inference algorithm in Falcon, the next-generation compiler for ActionScript, and ran our dynamically typed benchmarks through the pipeline. Our algorithm could automatically recover static types for those benchmarks, giving us around 2x-5x speedups without even touching the VM! This is, of course, only the beginning: byte code optimizations, language improvements, and a lot more will continue to bump up performance in the coming year, and I will discuss some of those ideas in later posts. It also makes sense to investigate adapting and integrating our algorithm in the JIT, so that opportunities for performance optimizations are found not only in recompiled source code, but also in existing byte code.

These are exciting times.

Security analysis of Flash applications?

While my primary involvement at Adobe happens to be around the ActionScript language, I also have some background in security, and recently I have been thinking about channeling some of that into designing and implementing tools for secure programming in ActionScript. ActionScript programs are compiled to run as Flash applications (on the web) or as AIR applications (on desktop and mobile devices); as such, they run on platforms with security models, and their security on those platforms is controlled by various security mechanisms, both at the language level and at the platform level.

As on most platforms, there are several security controls that an application developer needs to grok in order to churn out something that is actually secure (more on that later), and these controls have various nuanced security implications. Application developers who target these platforms may range from sophisticated power users who have a deep understanding of web, computer, and mobile security (including environment-specific assumptions, attacks, and defenses) to “novice” graphic artists who have neither the time nor the patience (and rightly so) to appreciate any cyber-security issue beyond what might be driven by common sense. For the latter users, security controls perhaps need to be automated. And even for the former users, it may be unreasonable to expect thorough and precise knowledge of the semantics of every available security control, let alone the effects they may have (desired or not) when combined in code. Yet, we must assume that attackers will always be sophisticated enough to identify any security vulnerabilities in the code, and exploit them in the worst possible manner.

So what can we do to improve this situation? Rigorous documentation can help. Illustrative tutorials can help some more. But, ultimately, we need to recognize that programmers (as humans) are not very good at combinatorial reasoning…machines are. In other words, programmers need automated tools to help them identify security vulnerabilities and eliminate them in their code. Usually, there is a gap between their high-level security intentions and the low-level security mechanisms that they need to maneuver to satisfy those intentions. Filling that gap is the job of experts; but since not all programmers are experts or have access to experts or even know that such expertise is necessary, the job has to be done by tools designed and implemented by experts…once and for all.

So I met with Peleus Uhley (platform security strategist) today, to discuss some ideas and plans along these lines. I also ended up meeting several others in the security engineering team, including: Dimitri DeFigueiredo (security researcher), who remarked that having language (static or runtime) support for immutability would be very nice from a secure-programming perspective; and Brad Arkin (senior director of product security and privacy), who shared some cool thoughts on improving the perception of security around Flash and AIR. Overall, I pushed the vision of implementing some static security analysis in our authoring tools (Flash Builder / Flash Pro) and eventually baking it into the compiler. Higher-level security intentions become security annotations (manual or auto-generated), and lower-level enforcement of those intentions with the available security controls becomes the job of a specialized static analysis-plus-synthesis tool that correctly understands what security really means on Flash and AIR, and how the security controls are implemented. And understanding what security means on those platforms is really a matter of specification: perhaps something like authorized information flow. To be investigated.
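
To make the shape of this concrete, here is a purely hypothetical sketch; the [Confidential] metadata tag and the fetchSessionToken helper do not exist, and are only meant to suggest the kind of source-level intention such a tool could consume and enforce:

[Confidential]                                 // hypothetical annotation: this value must not flow to untrusted sinks
var sessionToken:String = fetchSessionToken(); // fetchSessionToken is a made-up helper
// The analysis would then verify (or synthesize the right platform controls to ensure)
// that sessionToken never reaches, say, an ExternalInterface call or a cross-domain request.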

STOP

This is the fourth of a series of posts based on experiences at POPL (Principles of Programming Languages), 2011, in Austin TX. (Here’s the first, second, and third.)

STOP (Script To Program Evolution) is a special interest workshop, colocated with POPL.

The day began with an invited talk by John Field (IBM Research) on Thorn, a new language for distributed computation with features for concurrency (actors with message passing) and gradual typing (like types).

I talked further about our POPL work, and gave a demo of our implementation. The response, as in POPL, was very encouraging: the overall consensus seemed to be that this was a really cool, “out-of-the-box” idea that would be great to continue work on.

Some students presented really cool work of particular interest to me. Tim Disney (UC Santa Cruz) talked about gradual information flow. It seems to me that this work can be extended to DCC (Abadi et al., POPL 1999), and thus may be more generally applicable to program analysis. Something to dig into when I have time. Ravi Chugh (UC San Diego) talked about polymorphic type inference with run-time logs. This work seemed to be along the lines of our POPL work, with some differences: we did not try to infer polymorphic types, but the types we infer are “principal” (most general). Frankly, I do not think that their iterative inference algorithm based on contextual information has meaningful foundations, but I have to think more about this to be sure.

Phil Wadler expressed the need for evidence on whether type inference really matters for scripting languages. I had some thoughts around this that I wanted to throw out for discussion, but we ran out of time. I think that we need to be clear about what benefits types bring to programmers, and at what costs. Indeed, why would programmers want to evolve scripts to programs? Types improve robustness and performance. My stand is that to sell gradual typing, we need to sell performance, perhaps more so than robustness, since performance is a measurable benefit that the programmer can weigh against the cost of annotating or rewriting parts of the program.

Furthermore, should type inference coexist with gradual type checking? And what should the absence of type annotations mean: “dynamic” or “unknown”? (Remember Phil Wadler’s comment on the importance of syntax!)

After a quick lunch and a round of goodbyes, I rushed to the airport. There, I ran into Cormac Flanagan. Unfortunately, we were on different flights back. Picking up on a previous chat at POPL, we briefly talked some more on dynamic type inference and gradual typing; it seems they have made some interesting progress on semantic models for contracts, which may generalize some of this stuff. Seriously fun area to collaborate on in the future.

POPL (Day 3)

This is the third of a series of posts based on experiences at POPL (Principles of Programming Languages), 2011, in Austin TX. (Here’s the first and the second.)

The day began with a retrospective on Robin Milner, who passed away a few months back after a life of fundamental work on verification, languages, and concurrency. The session featured invited talks by Bob Harper (CMU), Alan Jeffrey (Bell Labs), Andy Gordon (MSR Redmond), Peter Sewell (University of Cambridge), and John Harrison (Intel).

I skipped the next session as well as the final session of the day in the interests of socializing, since this was after all the final day of POPL, and furthermore, several people were leaving before the end of the day to attend the PLDI PC meeting in Chicago.

The session in between was on compilation, which featured some cool talks: on a convex characterization of loop transformations, and tractable optimization algorithms over this expressive space; and on modular instruction selection. But the most interesting talk for me was on the essence of compiling with traces (by folks at UC Los Angeles). I followed up by wondering whether their theory said anything interesting about type specialization, which is an important application of trace-based JIT compilation: in particular, whether it subsumes or can coexist usefully with type inference. I briefly talked with Jens Palsberg (UC Los Angeles) about this, and thought about it further on my flight back; it would certainly be interesting to dive deeper into this at some point.

In the evening, some of us went to a pub, drinking beer and playing pool; this was followed by a sumptuous BBQ dinner. At some point (triggered by some random conversation), it occurred to me that there may be a relationship between dynamic software updating and concurrency control. Something worth exploring.

POPL (Day 2)

This is the second of a series of posts based on experiences at POPL (Principles of Programming Languages), 2011, in Austin TX. (Here’s the first.)

I’d stayed up most of the night to prepare my talk, so I skipped the first session of the day to catch up on some sleep. The next session was on separation logic, which had some cool talks: on a separation logic that allows compositional reasoning about atomicity for programs that manipulate concurrent data structures (by folks at Northeastern University); and on modular reasoning for deterministic parallelism (by folks at Cambridge).

Lunch followed, with some pleasant conversations with Stephen Freund (Williams College), Todd Millstein (UC Los Angeles), and Sorin Lerner (UC San Diego).

I sat through the next session fine-tuning my slides, and in the process managing to miss some interesting talks which I have to catch later: on amortized resource analysis; and on symmetric lenses.

The final session of the day was on types (with the final talk being mine). First up was a talk on dynamic multirole session types; Phil Wadler pointed out in his follow-up that an implementation of this system may not allow as much distribution as claimed. Next up was a talk on practical affine types, which showed how to infer some of the associated proof constructs via subtyping (similar to how one moves from sum and product types to union and intersection types). Phil Wadler stressed the importance of syntax in making something “practical”; in particular, he wondered whether the annotations on arrows that affine rules require were indeed practical. On the surface, this seems to be a trivial complaint; but there is definitely something to it, something worth remembering.

Finally, I talked about dynamic type inference of static types. A lot of anticipation had already built up around this talk; we had received top reviews for this paper, and apparently some other groups had been independently trying to solve this problem. The key idea of our paper is to infer types by running tests and logging subtyping constraints along observed paths; what makes it interesting is that we can prove a soundness theorem that is qualified by the coverage of those tests, and we are able to show that the coverage needs to be only linear in the number of functions in a program (although it may be exponential in the maximum number of paths in a function). Such a dynamic analysis technique is especially useful for dynamic languages, where (in our experience and that of several others) static analysis is too imprecise. The cool thing about our technique is that while we are precise, we are not “too precise”: our analysis can extract enough abstract information to let us generalize types and predict type errors.
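
As a purely hypothetical illustration of the idea (mine, not an example from the paper): running one test on the untyped function below exercises one path, and along that path an instrumented run logs subtyping constraints, roughly as sketched in the comments; solving the constraints then yields a static type for h.

function h(p) { return p.length + 1; }   // no annotations anywhere
h("abc");                                 // this observed run would log, roughly:
                                          //   String <: T_p               (the test argument flows into p)
                                          //   T_p must provide a length   (the property access on this path)
                                          //   the result feeds into +     (so it must be numeric here)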

Fortunately, the talk generated a huge amount of interest. Mitch Wand and Matthias Felleisen (Northeastern University) followed up, and I was kept busy long after the talk by others. Sam Tobin-Hochstadt (Northeastern University) and Madan Musuvathi expressed interest in the implementation (which involves wrapping); Sam pointed out the “values-as-proxies” ECMAScript proposal for comparison, and Madan was on the lookout for similar tricks for his concurrency work. Fritz Henglein (University of Copenhagen) pointed out that some of his work from the 90s may be relevant, where he had done the “converse”: static inference of dynamic types. Dave Herman (Mozilla) and I chatted at length on our similar roles at different companies, and discussed the importance of collaborating in the future.

The day ended with Mooly Sagiv (Tel Aviv University) describing the workings of the POPL 2011 PC (which everybody agreed had done a remarkable job), and Phil Wadler presenting the most influential paper award (from POPL 2001) to Samin Ishtiaq and Peter O’Hearn for their work on bunched implications as an assertion language for mutable data structures (a.k.a. separation logic). Peter O’Hearn, in his acceptance speech, recalled that this paper was ready almost half a year before the POPL deadline, but they decided to wait because the best work in programming languages deserves to go to POPL.

POPL (Day 1)

This is the first of a series of posts based on experiences at POPL (Principles of Programming Languages), 2011, in Austin TX.

Tom Ball (MSR Redmond) launched proceedings with his usual flair and some cool slides on POPL 2011 records.

The first session of the day was an invited talk by Xavier Leroy (INRIA Rocquencourt). I’ve known Xavier by face and reputation since the summer of 2002, when I was an intern in the Cristal group (working with Gérard Huet). Xavier talked about how his and others’ work on certified compilers over the years has ensured that the verification of realistic compilers is now well within reach. The next frontier is certified static analyzers, on which he provided some hope of progress. Finally, he called for more verification work around floating point (FP); of particular interest was a list of unsound compiler optimizations around FP, which Alex Aiken (Stanford) extended in his follow-up with the curious fact that FP addition is non-commutative in C. Overall, a thoroughly educational talk, littered with memorable quotes:

“Old pots make tasty soups.”
“It makes me nervous to fly an airplane since I know they are designed using floating-point arithmetic.”

The next session was on memory models, which featured two excellent talks (by folks at University of Cambridge): on certified compilation of C under an exposed x86 memory model; and on semantics and proofs for C++ concurrency.

I skipped the next session; part of the typical POPL experience is being drawn into extended conversations with old and new friends on “stuff.” I was particularly surprised to find my undergraduate advisor on the loose.

The final session of the day was on types. Phil Wadler (University of Edinburgh) gave a hugely entertaining talk (in his inimitable style) on blame for the polymorphic lambda calculus. Dimitrios Vytiniotis (MSR Cambridge) showed a way of resolving the tension between generative type abstraction and type-level computation; Karl Crary (CMU) in his follow-up seemed to suggest that there might be a cleaner way.

In the evening, Tom Ball and co. did live jazz, much beer and scotch flowed, and the night ended with an interesting dinner, amidst some very lively conversations with Samin Ishtiaq (MSR Cambridge), Ranjit Jhala (UC San Diego), Cormac Flanagan (UC Santa Cruz), Madan Musuvathi (MSR Redmond), Eran Yahav (IBM Research NY and the Technion, Haifa), and many others.

A round of introductions

We have some exceptionally talented people working in our Languages and Runtimes group.

Basil Hosmer (language design, type systems, concurrency), a freakishly clever veteran behind several products at the company (previously at Macromedia).

Adam Welc (concurrency, parallelism), a prolific author on transactional memory (previously at Intel Labs).

Simon Wilkinson (VM, concurrency, parallelism), a dexterous hacker of runtimes (previously at the University of Manchester).

Mathew Zaleski (JIT, interpreter), a seasoned expert on dynamic compilation (previously at IBM and the University of Toronto).

And finally, Bernd Mathiske (VM, language design), our manager and the dude who hired us all (previously at Sun Labs).

Furthermore, we collaborate with several people in the ActionScript Engineering team, including Lars Hansen, Jeff Dyer, and Edwin Smith.

In particular, Basil Hosmer, Lars Hansen, Jeff Dyer, and I together are the ActionScript Language Design Team, which means that we are responsible for the definition of the language and the design of future versions of the language.

Setting the context

Our aim is to evolve ActionScript into a safe and efficient programming language.

There’s a lot to do.

ActionScript Language Specification
(Because you can’t improve something you don’t understand.)
ActionScript is the core scripting language in which Flash applications are developed. While its roots lie in JavaScript, there are several features in ActionScript that go beyond JavaScript (e.g., class-based inheritance, namespaces, strict-mode compilation, application domains). Unfortunately, the only “specification” of ActionScript has been its implementation, which means that some aspects of the language remain arcane, and that bugs in the implementation become de facto features, causing backward-compatibility problems when they are fixed. We are working on a rigorous, definitive specification of the syntax and semantics of ActionScript programs.

Gradual Type Inference for ActionScript
(Because if you have something, it should really count.)
ActionScript enforces a combination of static and dynamic checks on programs, but it is unclear what such checks guarantee. Furthermore, there is no type inference in ActionScript. We are working towards a new gradual type system for ActionScript with the following main design intentions: (1) reconciling type inference with static types (annotations) and the dynamic type (*) via the notion of “unsafe” code regions; (2) providing meanings to types via a combination of static and dynamic checks that provably restrict runtime errors to unsafe code regions; and (3) optimizing code by eliminating dynamic checks wherever they are subsumed by static checks.