Author Archive: Larry Masinter

Forking Standards and Document Licensing

There’s been quite a bit of controversy in the web standards community over the copyright licensing terms of standards specifications, and whether those terms should allow “forking”: allowing anyone to create their own specification, using the words of the original, without notice or explicit permission. Luis Villa, David Baron, and Robin Berjon have written eloquently about this topic.

While a variety of arguments in favor of open licensing of documents have been made, what seems to be missing is a clear separation of the goals from the methods of accomplishing them.

Developing a Policy on Forking

Some kinds of forking are healthy; others are harmful. The “right to fork” may indeed constitute a safeguard against standards groups going awry, just as it does for open source software. The case for letting the market decide rather than arguing in committee is strong. Forking to define something new and better or different is tolerable, because the market can decide between competing standards. However, there are two primary negative consequences of forking that we need to guard against:

  1. Unnecessary proliferation of standards (“The nice thing about standards is there are so many to choose from”). That is, when someone is designing a system and there are several ways to implement something, the question becomes which one to use. If different component or subsystem designers choose different standards, it is harder to put together new systems that combine them. (For example, it is a problem that Russia’s train tracks are a different gauge than European train tracks.) Admittedly, it is hard to decide which forks are “necessary”.
  2. Confusion over which fork of the standard is intended. Forking where the new specification keeps the same name and/or uses the same code extension points without differentiation is harmful, because it increases the risk of incompatibility. A “standard” provides a standard definition of a term, and when a fork doesn’t rename or recode, there can be two or more competing definitions for the same term. This situation is especially difficult because the designers of one subsystem might have started with one standard and the designers of another subsystem with another, and think the two subsystems will be compatible, when in fact they are merely called the same thing.

The arguments in favor of forking concentrate on asserting that allowing for (1) is a necessary evil, and that the market will correct by choosing one standard over another. However, little has been done to address (2). There are two kinds of confusion:

  1. Humans: when acquiring or building a module to work with others, people use standards as the description of the interfaces the module needs. If there are two versions of the same specification, they might not know which one was meant.
  2. Automation: many interfaces use look-up tables and extension points. If an interface is forked, the same identifier can’t be used to indicate two different protocols (see the sketch below).
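
To make the automation problem concrete, here is a minimal sketch in Python (with hypothetical identifiers, not drawn from any real registry) of why an extension-point look-up table cannot tolerate two definitions behind one identifier:

    # An extension-point registry maps an identifier (think: a MIME type
    # or an encoding label) to exactly one definition.
    registry = {}

    def register(identifier, definition):
        if identifier in registry and registry[identifier] != definition:
            # Two forked specifications claim the same identifier;
            # software receiving "identifier" on the wire cannot tell
            # which definition the sender meant.
            raise ValueError("conflicting definitions for " + repr(identifier))
        registry[identifier] = definition

    register("application/example", "definition from the original standard")
    # A fork that renames its extension point stays distinguishable:
    register("application/example-fork", "definition from the fork")
    # A fork that reuses the identifier does not:
    try:
        register("application/example", "incompatible definition from the fork")
    except ValueError as err:
        print(err)

Renaming on fork keeps the table unambiguous; reusing the identifier makes the conflict undetectable to software that has only one of the two definitions.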

The property of being a “standard” is not inheritable; any derivative work of a standard must go through the standardization process itself to be called a Standard.

Encouraging wide citation of forking policy

The extended discussions over copyright and document licensing in W3C seem somewhat misdirected. Copyright by itself is a weak tool for preventing unwanted behavior, especially since standards groups are rarely in a position to enforce copyright claims.

While some might consider trademark and patent rights as other means of discouraging (harmful) forking, these “rights” mechanisms were not designed to solve the forking problem for standards. More practically, “enforcement” of appropriate behavior will depend primarily on community action to accept or reject implementors who don’t play by expected norms. At the same time, we need to make sure the trailblazers are not at risk.

Copyright can be used to help establish expected norms

To make this work, it is important to work toward a community consensus on what constitutes acceptable and unacceptable forking, and publish it; for example, a W3C Recommendation “Forking W3C Specifications” might include some of the points raised above. Even when standards specifications are made available with a license that allows forking (e.g. the Creative Commons CC-by license), the license statement could also be accompanied by a notice that pointed to the policy on forking.

Of course this wouldn’t legally prevent individuals and groups from making forks, but hopefully would discourage harmful misuse, while still encouraging innovation.
Dave McAllister, Director of Open Source
Larry Masinter, Principal Scientist

Updating HTTP

Much of the excitement about advancing the Web has been around HTML5 (the fifth version of the HyperText Markup Language) and its associated specifications; these describe the appearance and interactive behavior of the Web.

The HyperText Transfer Protocol (HTTP) is the network protocol used to manage the transfer of HTML and other content, both for browsers and for the many applications built on HTTP. There has been significant progress in updating the HTTP standard.
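
As a reminder of what the protocol actually does, here is a minimal sketch of the request/response exchange that underlies every page fetch, written with Python’s standard library (the host name is illustrative):

    # One HTTP/1.1 exchange: the client sends a GET request for a
    # resource, and the server answers with a status line, headers
    # describing the content, and the content itself.
    import http.client

    conn = http.client.HTTPConnection("example.com")
    conn.request("GET", "/")
    response = conn.getresponse()

    print(response.status, response.reason)      # e.g., 200 OK
    print(response.getheader("Content-Type"))    # e.g., text/html; charset=UTF-8
    html = response.read()                       # the HTML content itself
    conn.close()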

The third edition of HTTP/1.1 is nearing completion by the HTTPbis working group of the IETF. This edition is a major rewrite of the specification to fix errors, clarify ambiguities, document compatibility requirements, and prepare for future standards. It represents years of editing and consensus building by Adobe Senior Principal Scientist Roy T. Fielding, along with fellow editors Julian Reschke, Yves Lafon, and Mark Nottingham. The six proposed RFCs define the protocol’s major orthogonal features in separate documents, thereby improving its readability and focus for specific implementations and forming the basis for the next step in HTTP evolution.

Now with HTTP/1.1 almost behind us, the Web community has started work on HTTP/2.0 in earnest, with a focus on improved performance, interactivity, and use of network resources.

Progress on HTTP/2.0 has been rapid; a recent HTTPbis meeting at Adobe’s offices in Hamburg made significant advances on the compression method and on interoperability testing of early implementations. For more details, I’ve written on why HTTP/2.0 is needed, as well as sounding some HTTP/2.0 concerns.

Larry Masinter
Principal Scientist

Linking and the Law

This article contains some thoughts based on the article “Publishing and Linking on the Web” co-authored with Jeni Tennison and Dan Appelquist for the W3C TAG.

If you type a Web address into your browser, you will most likely be taken to a Web page consisting of text and images. Sometimes you may be taken to a game where you can pretend to be a race car driver or throw stones at pigs, but in most cases you will get a Web page. From the information on the page you may be able to access related material by simply clicking. This capability is what makes the Web the Web.

If you are creating a Web page, you can use material from other sources in different ways. You can provide a link to the material (an HTML <a> element the reader must click), or you can embed it (include or transclude it) within your material, for example with an <img> element that pulls the referenced image directly into your page. These two ways of using material that is not authored by you are quite different, and they are treated differently by courts.

Here is a page from Wikipedia that includes the picture of a whale from another web site:
[Image: “Blue whale cropped large”, the whale photograph embedded in the Wikipedia page]

The above page is from http://en.wikipedia.org/wiki/Blue_whale. If you click on the image, Wikipedia tells you where the image came from and that it is in the public domain “because it contains materials that originally came from the U.S. National Oceanic and Atmospheric Administration, taken or made as part of an employee’s official duties.”

With embedding, you see the embedded content on the page. Linking, on the other hand, requires a user action: the user must click on the link to be taken to another Web page. There are advantages to inclusion vs. linking. If you include material, that material is not going to change out from under you, whereas material at the end of a link may change. In the worst case, it could be replaced by malware.

In recent years there has been a rash of legal cases relating to linking and embedding. There was, for example, the case of a student residing in the UK who faced possible extradition to the United States for posting links, on a Web site that is not US-based and not primarily intended for US users, to material that the US considers to be copyrighted. This case also raises the question of jurisdiction (more on that later).

The general principle at play here is the notion of agency. If you link to something, you are less responsible for its being available than if you include it; and if you transclude something, you are less responsible than if you include it (that is, transclude a copy you made). Most of the questions concern whether you are responsible for making available information that people don’t want shared (bomb making, pornography, copyright infringement). If you do decide to embed, the material should be attributed and, unless it is a brief quote, requires permission; otherwise, you may be held responsible for copyright violation.

Linking is generally allowed — the argument has been made in several places that restricting linking is like interfering with free speech: a hyperlink is nothing more than a reference or footnote, and the ability to refer to a document is a fundamental right of free speech. There have been a few cases in the U.S. that implied the act of linking without permission is legally actionable, but these have been overturned.

Still, you need to be careful. The words accompanying a link can express an opinion — for example, the HTML code

    Joe’s Bar has <a href="http://joes.bar/menu.html">great food</a>

links “great food” to the bar’s menu — but some opinions may be construed as defamatory or libellous.


And then again, the material you link to may be so inflammatory that even minor responsibility might be risky; it’s best not to link to Nazi propaganda, child pornography or “How to Make a Bomb”. Web media has been very effective in political campaigns, but if you link to political material it may be judged to be seditious by some governments, and you may be held responsible.


Restricting Links

Even though linking, in general, does not violate copyright, some sites may want to restrict linking to all or part of their content.

This Digital Reader article ridicules an Irish newspaper for trying to charge for merely giving directions on where to find information, but the request for payment is understandable. If you are a newspaper that invests in creating original content, you would like to monetize your investment. The New York Times allows readers only a certain number of free articles per month. The Wall Street Journal requires you to subscribe. Other news media have similar policies. So, a link may tell you where to find a book, but the library may charge a fee or be accessible only via membership.

Incidentally, the links to The Digital Reader article that reported the Irish newspaper’s policy have since ceased to work. Thus, while a link may not violate copyright, publishers have the right to restrict linking and may impose a number of conditions, such as pay barriers or age verification, that must be satisfied before a link can be followed.


Restricting Deep Linking

Many web sites restrict deep linking, i.e., links to pages other than the top page, because deep links can bypass advertising or the site’s legal Terms and Conditions, or because a deep link may leave the source of the material unclear. Legal Terms and Conditions are often used to restrict deep linking, but such terms are difficult to enforce, and there are simple technical mechanisms that are more effective; one is sketched below.
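
The article doesn’t name a particular mechanism, but one common approach is to check the HTTP Referer header and turn away requests for inner pages that did not arrive via the site’s own pages. A minimal, illustrative sketch in Python (the site address and port are assumptions):

    # Hypothetical deep-link filter: redirect requests for inner pages
    # to the top page unless the Referer header shows the visitor came
    # from this site.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    SITE_PREFIX = "http://example.com/"  # illustrative site address

    class DeepLinkFilter(BaseHTTPRequestHandler):
        def do_GET(self):
            referer = self.headers.get("Referer", "")
            if self.path != "/" and not referer.startswith(SITE_PREFIX):
                self.send_response(302)            # turn the deep link away
                self.send_header("Location", "/")
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"page content\n")

    if __name__ == "__main__":
        HTTPServer(("", 8000), DeepLinkFilter).serve_forever()

Since the Referer header can be absent or forged, this is a deterrent rather than a guarantee; still, it is more effective than legal terms alone.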


Jurisdiction

The World Wide Web is truly an international phenomenon and, as we have discussed, linking has been compared to freedom of speech. But there are limits to freedom of speech and, as we discuss above, some uses of external material may lead to legal action. If I live in the U.S. and host a web site in a Scandinavian country that has links to offensive material, where could I be prosecuted? If I host a website in a country that does not have a bilateral copyright agreement with the U.S., and the website includes swaths of U.S. copyrighted material, can I be prosecuted? If so, where? For certain kinds of international disputes, there are agreements that such disputes will be settled by mediation or arbitration. Perhaps we need to formalize a similar capability for the Web.


Linking to material that did not originate with you is an essential feature of the Web and one that gives it much of its power. In general, linking to other material, as opposed to including or transcluding it, is safe and carries little risk; but, as we explain above, it still pays to be careful.

Ashok Malhotra
Standards Professional, Oracle

Larry Masinter
Principal Scientist, Adobe


Testing: The Third Pillar of Standards

Recently, a series of “Test the Web Forward” events have been scheduled to promote getting the community involved in building test cases for important Web standards. A few months ago, I participated in “Test the Web Forward/Paris”. The next event, “Test the Web Forward/Sydney”, is scheduled for February 8th and 9th in Sydney, Australia. These events, held in various cities around the world, are open to everyone who is passionate about Web standards, and bring together developers and standards experts.

Why is testing important? When we think about “standards,” we usually think about the two initial components: (1) specifications — written descriptions of how the standards work, and (2) implementations — software that implements the specifications. A suite of test cases is the third pillar: the essential link between specifications and implementations.
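
As a minimal sketch of the idea: a test case turns one sentence of a specification into executable form. Real “Test the Web Forward” cases are written against browsers; here, purely for illustration, the implementation under test is Python’s own URL library:

    # The specification (RFC 3986) requires a space in a URL path to be
    # percent-encoded as "%20". The test checks one implementation
    # against that requirement, tying spec text to running code.
    import unittest
    from urllib.parse import quote

    class UrlEncodingTest(unittest.TestCase):
        def test_space_is_percent_encoded(self):
            self.assertEqual(quote("/a path"), "/a%20path")

    if __name__ == "__main__":
        unittest.main()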

When it comes to standards and standardization, what people care about is compatibility — the ability to use components from multiple sources with the expectation that those components will work together. This holds for all kinds of software standards, whether Application Program Interfaces (APIs), rules for communicating over the network (protocols), computer languages, or the smaller component pieces (protocol elements) used by any of those.

On the Web, the APIs are frequently JavaScript, the protocol is often HTTP, and the languages include HTML, CSS, and JavaScript. URLs, host names, encoding labels, and MIME types are protocol elements.

The “Create the Web” tour demonstrated the relationship between specification and implementation. “Test the Web Forward” brings in test cases to ensure that the promise of compatibility isn’t empty. Building the global information infrastructure requires a focus not only on new developments, but on compatibility, reliability, performance, and security. The challenge of testing is that the technology is complex, the specifications are new, and the testing needed is extensive.

I encourage everyone who is passionate about the Web and Web standards to attend the “Test the Web Forward” event in Sydney or other related events. Get involved and help make the Web a more interoperable place.

Larry Masinter
Principal Scientist

Governance and standards: Publishing and Linking on the Web

Governance is the process by which society defines expectations, grants power, or verifies performance, through laws, regulations, or other means. Societies govern communication, for example, to support copyright or privacy, or to help manage defamatory or illegal material. As the Internet becomes increasingly central to the way people communicate, it is also increasingly subject to governance.

Unfortunately, a number of problems commonly arise when dealing with governance of the Internet.

Regulations often don’t match the technology. Ordinarily, we use analogies to talk about technology; for example, we talk about “publishing a page”, but the actual process of putting up a web page is very different from physical publishing by making and distributing printed paper. So a rule like “It’s okay to read this page, but you can’t make a copy of it” doesn’t acknowledge that, in order to read a page, the bits that make up the page must necessarily be copied to the reader’s computer.

Different goals conflict. Law enforcement might require that a site owner keep records of everyone who posts information, in order to be able to track down those who post illegal or defamatory material, while, at the same time, privacy regulation might insist that the same site owner not keep records.

The Internet is global, but governance is local. Laws, regulations, and social values are geographically based, but the Internet has no simple boundaries, and the values, regulations, and laws of different jurisdictions are inconsistent and often conflicting. Is it possible for a single web site to conform to everyone’s norms?

Technology standards can help reduce some of the difficulties by providing appropriate terminology, guidance and standards. For example, W3C standards for accessibility have helped reduce some of the unnecessary variability between accessibility guidelines in various countries. In another example, many countries have created regulations and laws that reference common standards for digital signatures of documents, which in turn helps extend the applications that can be supported by electronic communication.

Recently, as a member of the Technical Architecture Group of the World Wide Web Consortium, I’ve been helping produce a First Public Working Draft of a new document called Publishing and Linking on the Web.

This is the first step of getting community consensus on the document and any recommendations. Your thoughts are welcome! Please review the document, share it, discuss it, make comments. Only through discussion can we develop a common understanding of the alignment of technology and values, and help standards groups, policy makers, and those building new Internet content and services.


Internationalized Resource Identifiers: Standards Progress

The idea of a Uniform Resource Locator (URL) is a key Web innovation: the “hyper” of hypertext. URLs function as a combined locator (how to find it) and identifier (how to name it) for reference to other Internet resources within documents (using hypertext, such as the HyperText Markup Language [HTML]), email, and a variety of other Internet protocols (e.g., the HyperText Transfer Protocol [HTTP]).
 
URLs were designed to be portable and easily transcribed at a time when most computers had very limited support for character sets. As a result, the characters allowed in a URL are limited to a subset of safe characters that are always available, much like identifiers in most programming languages: the ASCII letters, digits, and a few punctuation characters. However, unlike identifiers in programming languages, URLs are frequently made visible to users. Web users see and type URLs, and it is common for people to use URLs in advertising, written communication, and spoken announcements.
 
Since most of the world uses languages written with characters not allowed in URLs, there has been considerable interest in developing a kind of URL that allows other (“non-ASCII”) characters drawn from Unicode — the standard for representing the characters of the world’s languages. This new identifier is called an Internationalized Resource Identifier (IRI); its syntax is a superset of the existing URL syntax, based on the idea that some systems might still be URL-only while others might allow IRIs.
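
To see how the two syntaxes relate in practice, here is a minimal sketch of mapping an IRI down to an ASCII-only URL, using Python’s standard library (the Japanese example address is illustrative): host labels are converted with IDNA (“Punycode”), and other non-ASCII characters are UTF-8 encoded and then percent-encoded.

    # An IRI may use Unicode in both the host and the path; the mapping
    # below produces the equivalent ASCII-only URL.
    from urllib.parse import quote

    iri_host = "例え.テスト"       # an internationalized domain name
    iri_path = "/引き割り.html"    # a path with non-ASCII characters

    uri_host = iri_host.encode("idna").decode("ascii")   # IDNA/Punycode
    uri_path = quote(iri_path, safe="/")                 # UTF-8 + percent-encoding

    print("http://" + uri_host + uri_path)
    # http://xn--r8jz45g.xn--zckzah/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html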
 
This was pretty good in theory, but in practice there have been a number of problems. For example, having multiple ways of writing the same identifier can cause security and reliability problems if implementations aren’t uniform. The standard, rather than converging, has come under pressure and diverged because of the wide variety of implementations.
 
Work continues to try to bring the concerned implementors together to work out the details and ensure that there is a single standard for IRIs in browsers, email, HTML, plain text, and other contexts. Specifications are developed in the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). Adobe’s Larry Masinter and Roy Fielding continue to work on the related standards as editors, specification authors, and reviewers.

As with most standards, the overall concept is simple; it’s the details that are difficult, given that any changes to the core addressing standards for the Web have significant implications for security, reliability, and compatibility with existing deployed systems.