Dublin Core archaeology

Triggered by a question from Stefane Fermigier, I did some digging in the archives of the Dublin Core Metadata Initiative (DCMI).

The question was why the Dublin Core terms for different dates use different naming patterns. For example, the date when a document was created is expressed simply with created, but the dates when one was submitted for review or accepted use an apparent variant of Hungarian notation, i.e. dateSubmitted and dateAccepted.

It turns out that the original set of Dublin Core metadata elements from 1998 only contained a single generic date element that was vaguely defined as follows:

  • date – A date associated with the creation or availability of the resource.

Then, a few years later in 2000, the standards group introduced the concept of “qualifiers” for “refining” the definition of a more generic element. The following date qualifiers were specified:

  • created – Date of creation of the resource.
  • valid – Date (often a range) of validity of a resource.
  • available – Date (often a range) that the resource will become or did become available.
  • issued – Date of formal issuance (e.g., publication) of the resource.
  • modified – Date on which the resource was changed.

Another two years down the line, in 2002, the standards group adopted a nice change and status tracking mechanism and assigned “all legacy Elements and Element Refinements the status of Recommended”. That makes tracking any further changes pretty easy.

For example, later that year two new date refinements, submitted and accepted, were proposed. However, when accepting those proposals, the standards group adjusted the terms to dateSubmitted and dateAccepted, producing the following two definitions:

  • dateSubmitted – Date of submission of the resource.
  • dateAccepted – Date of acceptance of the resource.

In the end the question of why exactly a different naming pattern was used remains unclear, but the record shows that the decision to change the names was explicitly made by the standards group ten years ago. Someone closer to the DCMI group might still remember the details of that decision.

JAAS authentication and OSGi

I was looking at how to best do JAAS-based authentication in an OSGi environment, but didn’t really find much useful material, so I’m sharing my findings here in the hope that others will jump in and add anything I may have missed.

Basically what I want to achieve is being able to run the following code unmodified in an OSGi bundle, and have the login() call access the set of JAAS authentication services that are currently available in the OSGi environment. I should be able to deploy and undeploy such authentication services without any changes to this code or the configuration of the containing bundle.

LoginContext context = new LoginContext(...);
context.login();
try {
    ...; // do something
} finally {
    context.logout();
}

So far the best thing I’ve found is the JAAS support that Guillaume Nodet described a few years ago. If I understand correctly, the relevant code now lives in Apache Karaf, even though Apache Felix also mentions it and Guillaume’s original post refers to Apache ServiceMix. I’ve given up trying to identify which Maven dependency I should use to get this code.

However, the trouble I see with the ProxyLoginModule class, which seems to be the core piece of glue in the Karaf JAAS support, is that it requires the login() call in the client code to explicitly pass the name of the bundle and of the contained LoginModule class to be used for authentication. That breaks my expectation of zero code or configuration changes in the client bundle when adding or removing authentication services. It also looks like only a single authentication service can be used at a time.
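
Based on my reading of the code, the client-side setup would have to look roughly like the following sketch. The option keys, the bundle id and the module class name below are all assumptions on my part, not verified Karaf API:

import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;

// Rough sketch only: the option keys and the ProxyLoginModule class name are
// assumptions based on my reading of the Karaf JAAS support, not verified API.
Map<String, Object> options = new HashMap<String, Object>();
options.put("org.apache.karaf.jaas.bundle", "42");                        // bundle that contains the login module
options.put("org.apache.karaf.jaas.module", "com.example.MyLoginModule"); // login module class within that bundle

AppConfigurationEntry entry = new AppConfigurationEntry(
        "org.apache.karaf.jaas.boot.ProxyLoginModule",
        LoginModuleControlFlag.REQUIRED,
        options);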

A more promising solution is described in a presentation apparently given by Stefan Vladov at the OSGi Community Event 2011. However, I couldn’t find any references to actual running code that implements that solution.
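
One way such a solution could work (this is just my own sketch, not necessarily what the presentation proposes) is a whiteboard-style pattern where each authentication bundle publishes its LoginModule implementation as an OSGi service, and an OSGi-aware JAAS configuration picks up whatever login modules are currently registered. MyLoginModule and the "jaas.realm" service property below are illustrative assumptions:

import java.util.Hashtable;
import javax.security.auth.spi.LoginModule;
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

// Illustrative sketch only: MyLoginModule stands for the bundle's own
// LoginModule implementation, and "jaas.realm" is an assumed property name.
public class MyAuthActivator implements BundleActivator {

    public void start(BundleContext context) {
        Hashtable<String, Object> props = new Hashtable<String, Object>();
        props.put("jaas.realm", "default");
        // Publish the login module so that an OSGi-aware JAAS configuration
        // could discover it without any changes in the client bundle.
        context.registerService(LoginModule.class, new MyLoginModule(), props);
    }

    public void stop(BundleContext context) {
        // The service registration is released automatically when the bundle stops.
    }
}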

Please share any relevant pointers or other information in the comments below!

Maven release builds with Jenkins and Git

We have a Jenkins continuous integration server that among other things allows us to run Maven release builds centrally using the M2 release plugin for Jenkins.

This setup worked fine with Subversion, but needed some tweaking after our recent switch to Git and github:enterprise. Here’s what I did to make it work:

  • The release plugin needs write access to the upstream repository, so I had to configure Jenkins to use an ssh key associated with a real account instead of a deploy key that only gives read access.
  • To tie the release commits to the Jenkins server, I configured the global “user.name” and “user.email” settings of the Jenkins account to “Jenkins” and “jenkins@…”.
  • Finally, I hit a “ref HEAD is not a symbolic ref” error caused by Jenkins using a detached HEAD by default. A quick web search uncovered a solution described by Stephen Connolly in a related CloudBees user forum thread: setting the “Checkout/merge to local branch (optional)” option under the advanced Git settings on the Jenkins build configuration screen.

With that setup in place, we can again cut new releases with just a single click of the “Schedule Maven Release Build” button. Nice!

Apache Jackrabbit 2.3 is out

Today we announced the release of Apache Jackrabbit 2.3.0. It’s the result of over nine months of development since the Jackrabbit 2.2 release, and contains changes related to over a hundred new features, improvements and bug fixes. See the release notes for the full details.

Before you rush in and upgrade all your production systems to Jackrabbit 2.3, note that this release has explicitly been marked as unstable. In fact, all 2.3.x releases will be unstable development releases cut directly from trunk. A stable 2.4 maintenance branch will be created in a few months for production-ready releases. See the Jackrabbit roadmap for more details about our new unstable/stable release strategy.

Apache Tika issues over time

Apache Tika is one of the open source projects I actively work on. Tika is just gearing up for a new release, so I wanted to look at where we are in terms of open vs. resolved issues. The report over the entire lifetime of Tika looks pretty nice:

The red line shows the number of new issues created, and the green one the number of resolved issues. There are a few small bumps and plateaus along the way, but the overall trajectories are looking very healthy with plenty of new issues coming in and most of them getting resolved at a good rate. Assuming no big surprises, we’ll be seeing TIKA-1000 filed sometime next year.

Talking about the repository

Last week I was busy presenting Apache Jackrabbit and our commercial work on top of it. It started on Tuesday evening, when I presented the Apache Jackrabbit project at the Swiss Open Source Awards ceremony in Zürich.

Jackrabbit had been nominated for the award in the community category (other categories were business case and youth). Even though the award went to another project, the award ceremony and the included Webtuesday presentation on MongoDB were pretty nice. I had three minutes to present the Apache Jackrabbit project to the audience, so I put together the following short graphical overview to support the presentation.

Next in my schedule was the .adaptTo(Berlin) meetup on Thursday and Friday. The technical meetup was organized by pro!vision GmbH in cooperation with Adobe and focused on the Apache Sling project and related technologies, most notably our CQ5 (now WEM) product built on top of it. The meetup was well organized (thanks, pro!vision!) and I really liked the no-nonsense tone of the event, with plenty of in-depth presentations by people who really knew their stuff. See Gabriel’s photos for a glimpse of the action.

I contributed two talks about the JCR content repository. The first was about repository performance, a topic that’s not too well covered in the available documentation. I tried to pack quite a bit of information into the presentation, which is shown below.

My second presentation was a somewhat higher-level talk about the changes we’ve been making to the deployment and management architecture of the repository. The core repository still remains the same, but it can now be deployed as an OSGi bundle and managed through JMX, as explained in the presentation below.


For more good stuff, see also all the other presentations from the .adaptTo(Berlin) meetup.

Taking control of com.adobe on Maven Central

The com.adobe space on Maven Central has so far been used by various third parties (including myself, from before I joined Adobe) to make Adobe releases available to Maven clients.

We want to have better control over that space and to make it easier for Adobe projects to publish their releases on Maven Central, which is why I’ve requested a repository that we can use for this. We still need to figure out the internal processes by which new releases can be posted there, but at least the technical bits are already being taken care of.

Checksum: an interface for monitoring streams in Java

Consider a case where you want to monitor the data passing through a stream. Typically you’d subclass the FilterInputStream or FilterOutputStream class for that, but sometimes it would be more convenient to implement an interface instead.

The CheckedInputStream and CheckedOutputStream utility classes in the java.util.zip package can be used for this. They act as stream decorators that send all passing bytes to a given Checksum instance. The Checksum interface is primarily designed for calculating and accessing checksums, as is done in the CRC32 and Adler32 implementations, but you could just as well use the interface for other kinds of stream tracking.

I came up with this trick when looking for a way to implement a minimal watchdog timer that can monitor activity in both input and output streams. The code was for a special bootstrap class loader where I needed to minimize the number of separate implementation classes, which is why the Checksum interface and the existing stream decorator classes came in so handy!
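
To make the idea a bit more concrete, here’s a minimal sketch of what such an activity-tracking Checksum could look like. The ActivityTracker name and the wrap() helper are just illustrative, not the actual bootstrap code:

import java.io.InputStream;
import java.util.zip.CheckedInputStream;
import java.util.zip.Checksum;

// Minimal sketch (illustrative name, not the original code): a Checksum that
// computes no checksum at all, but records the time of the last byte seen so
// that a watchdog can detect stalled streams.
class ActivityTracker implements Checksum {

    private volatile long lastActivity = System.currentTimeMillis();

    public void update(int b) {
        lastActivity = System.currentTimeMillis();
    }

    public void update(byte[] b, int off, int len) {
        lastActivity = System.currentTimeMillis();
    }

    public long getValue() {
        return lastActivity; // time of the most recent stream activity
    }

    public void reset() {
        lastActivity = System.currentTimeMillis();
    }

    // Decorates the given stream so that all reads update this tracker.
    public InputStream wrap(InputStream stream) {
        return new CheckedInputStream(stream, this);
    }
}

A CheckedOutputStream can be wrapped around the output side in the same way, so a single tracker instance can watch both directions of a connection.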