ARK (Archival Resource Key)

An ARK is a URL created to allow persistent, long-term access to information objects. ARKs can identify objects of any type: digital documents, databases, images, software, and websites, as well as physical objects (books, bones, statues, etc.) and even intangible objects (chemicals, diseases, vocabulary terms, performances).

ARKs support persistent identification, which is necessary and useful because both the protocols used to access objects (such as http and ftp) and the sites that host the objects are subject to change. An ARK contains parts that are impervious to such changes and parts that are flexible enough to support technological changes/improvements. The idea is to create a stable "name" or reference that can be permanently associated with that specific object.

ARK Anatomy

An ARK is represented by a sequence of characters that contains the label, "ark:", optionally preceded by the protocol name ("http://") and hostname that begins every URL. That first part of the URL, or the "Name Mapping Authority" (NMA), is mutable and replaceable, as neither the web server itself nor the current web protocols are expected to last longer than the identified objects. The immutable, globally unique identifier follows the "ark:" label. This includes a "Name Assigning Authority Number" (NAAN) identifying the naming organization, followed by the name that it assigns to the object.

Here is a diagrammed example:

         http://example.org/ark:/13030/654xz321/s3/f8.05v.tiff
         \________________/ \__/ \___/ \______/ \____________/
           (replaceable)     |     |      |       Qualifier
                |       ARK Label  |      |    (NMA-supported)
                |                  |      |
   Name Mapping Authority (NMA)    |    Name (NAA-assigned)
                                   |
                        Name Assigning Authority Number (NAAN)

The ARK syntax can be summarized as

    [http://NMA/]ark:/NAAN/Name[Qualifier]

The NMA part, which makes the ARK actionable (clickable in a web browser), is in brackets to indicate that it is optional and replaceable. ARKs are intended to work with objects that last longer than the organizations that provide services for them, so a change of provider should not affect the object's identity. A different provider hosting the object would simply replace the NMA to reflect the new "home" of the object. For example,

    http://bnf.fr/ark:/13030/tf5p30086k

might become

    http://portico.org/ark:/13030/tf5p30086k

Note that the ark:/NAAN/Name remains the same.
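
The anatomy above can be sketched in a few lines of Python. This is an illustration only, not part of the ARK specification: the names ARK_PATTERN, Ark, and parse_ark are hypothetical, and the pattern assumes a numeric NAAN and a Name that ends at the first "/" or "." (everything after that is treated as the Qualifier).

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for [http://NMA/]ark:/NAAN/Name[Qualifier].
# Assumptions: the NAAN is all digits, and the Name runs up to the
# first "/" or "."; the remainder is the Qualifier.
ARK_PATTERN = re.compile(
    r"^(?:https?://(?P<nma>[^/]+)/)?"
    r"ark:/(?P<naan>\d+)/(?P<name>[^/.?]+)(?P<qualifier>[^?]*)$"
)


class Ark(NamedTuple):
    nma: Optional[str]   # replaceable hostname, or None if absent
    naan: str            # Name Assigning Authority Number
    name: str            # name assigned by the NAA
    qualifier: str       # NMA-supported qualifier, possibly empty

    def core(self) -> str:
        """Return the part that never changes: ark:/NAAN/Name[Qualifier]."""
        return f"ark:/{self.naan}/{self.name}{self.qualifier}"

    def with_nma(self, host: str) -> str:
        """Return an actionable URL served by a different NMA."""
        return f"http://{host}/{self.core()}"


def parse_ark(text: str) -> Ark:
    match = ARK_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an ARK: {text!r}")
    return Ark(match["nma"], match["naan"], match["name"], match["qualifier"])


ark = parse_ark("http://bnf.fr/ark:/13030/tf5p30086k")
print(ark.with_nma("portico.org"))
```

Replacing the NMA changes only the hostname; the value returned by core() is identical before and after the move.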

The NAAN part, following the "ark:" label, uniquely identifies the organization that assigned the Name part. Often the initial access provider (the first NMA) coincides with the original namer (represented by the NAAN); however, access may be provided by one or more different entities instead of, or in addition to, the original naming authority. The NAAN used above, 13030, represents the California Digital Library.

A sampling of other NAANs registered for ARK assignment includes

12025    National Library of Medicine
13030    California Digital Library
13960    Internet Archive
27927    Portico/Ithaka Electronic-Archiving Initiative
12148    Bibliothèque nationale de France
78319    Google
64269    Digital Curation Centre
67531    University of North Texas
62624    New York University
15230    Rutgers University
88435    Princeton University
61001    University of Chicago
78428    University of Washington
13038    World Intellectual Property Organization
80444    Northwest Digital Archives
25593    Emory University
25031    University of Kansas
17101    Centre for Ecology & Hydrology, UK
65323    University of Calgary
52327    Bibliothèque et Archives Nationales du Québec
39331    National Library of Hungary
26677    Library and Archives Canada
20775    University of California San Diego
29114    University of California San Francisco
28722    University of California Berkeley
21198    University of California Los Angeles

Generating ARKs

Any institution may obtain a NAAN and begin assigning ARKs. Because long-term identifiers often look like random strings of letters and digits, institutions typically generate (or "mint", in ARK parlance) and track identifiers with software. To mint ARKs, you may use any software that can produce identifiers conforming to the ARK specification. CDL uses the open-source "noid" (nice opaque identifiers, rhymes with "employed") software, which creates minters and accepts commands that operate them. The noid software documentation explains how to use noid not only to mint identifiers but also to serve as an institution's "identifier resolver".
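
To make the idea of minting and tracking concrete, here is a minimal sketch of a random minter. It is not noid and does not reproduce noid's commands or templates; the Minter class, its parameters, and the choice of a vowel-free alphabet are assumptions made for illustration.

```python
import secrets

# Assumption: digits plus consonants, with no vowels, so that minted
# names are unlikely to spell recognizable words.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


class Minter:
    """Hypothetical in-memory minter; a real one needs durable storage."""

    def __init__(self, naan, length=8):
        self.naan = naan
        self.length = length
        self.issued = set()  # names already handed out

    def mint(self):
        """Return a new ARK whose Name has never been issued by this minter."""
        while True:
            name = "".join(
                secrets.choice(ALPHABET) for _ in range(self.length)
            )
            if name not in self.issued:
                self.issued.add(name)
                return f"ark:/{self.naan}/{name}"


minter = Minter("13030")
print(minter.mint())
```

Tracking issued names in a set is enough for a demonstration, but the record of what has been minted must survive restarts if no ARK is ever to be reassigned.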

Once minted and publicized as being associated with a specific object, the ARK becomes a stable, unique, and compact reference that can be included in metadata records, databases, redirection tables, etc. It is often useful to assign ARKs well before the institution has decided on its level of commitment to an object, because doing so is easier than later changing an object identifier that has been in long-established use by the time of that decision.

Please contact the CDL if you are interested in generating and using ARKs for your information objects.

ARKs in Action

An ARK provides services beyond those of an ordinary URL. Instead of connecting to one thing, an ARK should connect to three things:

  • the object itself,
  • a brief metadata record if you append a single question mark to the ARK, and
  • a maintenance commitment from the current server when you append two question marks.

In a web browser, for example, if you enter

    http://ark.cdlib.org/ark:/13030/tf5p30086k?

it returns a brief machine- and eye-readable metadata record, such as

    erc:
    who:   (:unav) unavailable
    what:  Truckee River, below Truckee Station, looking towards Eastern
            Summit. -- Photographer's number: 222 -- Photographer's series:
            Central Pacific Railroad, California.
    when:  (:unav) unavailable
    where: http://ark.cdlib.org/ark:/13030/tf5p30086k

It is a side-benefit of ARKs that an object's metadata doesn't need an identifier different from that for the object, which cuts in half the number of identifiers that need to be generated and managed.
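
The three access forms and the record layout shown above can be handled with a short script. This is a sketch under stated assumptions: the function names inflections and parse_erc are hypothetical, no network request is made, and the parser assumes only what the sample shows, namely "label: value" lines with indented continuation lines.

```python
def inflections(ark_url):
    """Build the object, metadata ("?"), and commitment ("??") forms."""
    base = ark_url.rstrip("?")
    return {
        "object": base,
        "metadata": base + "?",
        "commitment": base + "??",
    }


def parse_erc(text):
    """Parse 'label: value' lines; indented lines continue the previous value."""
    record = {}
    label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and label is not None:
            # Continuation line: append to the value being built.
            record[label] += " " + line.strip()
            continue
        label, _, value = line.partition(":")
        label = label.strip()
        record[label] = value.strip()
    return record


SAMPLE = (
    "erc:\n"
    "who:   (:unav) unavailable\n"
    "what:  Truckee River, below Truckee Station, looking towards Eastern\n"
    "        Summit. -- Photographer's number: 222 -- Photographer's series:\n"
    "        Central Pacific Railroad, California.\n"
    "when:  (:unav) unavailable\n"
    "where: http://ark.cdlib.org/ark:/13030/tf5p30086k\n"
)

record = parse_erc(SAMPLE)
print(record["what"])
print(inflections(record["where"])["metadata"])
```

Because the "where" element carries the ARK itself, the metadata record can be traced back to its object without a second identifier.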

Assignment and Support Policy Statements

The CDL assigns identifiers within the ARK domain under the NAAN 13030 and according to the following principles:

  • No ARK shall be re-assigned; that is, once an ARK-to-object association has been made public, that association shall be considered unique into the indefinite future.
  • To help them age and travel well, the Name part of CDL-assigned ARKs shall contain no widely recognizable semantic information (to the extent possible).
  • CDL-assigned ARKs shall be generated with a terminal check character that guards them against single-character errors and transposition errors.
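
A terminal check character of the kind named in the last principle can be sketched as follows. The scheme is modelled on the check-character algorithm described in the noid documentation: each character of "NAAN/Name" is given its index in a 29-character alphabet (characters outside the alphabet, such as "/", count as zero), multiplied by its position, and the sum is reduced modulo 29. Whether this matches CDL's production minter in every detail is an assumption, and the function names are hypothetical.

```python
# 29 characters: digits plus consonants. A prime alphabet size is what
# lets a position-weighted sum catch single-character and adjacent
# transposition errors in identifiers shorter than 29 characters.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def check_char(identifier):
    """Compute the check character for a string such as '13030/xf93gt2'."""
    total = 0
    for position, char in enumerate(identifier, start=1):
        value = ALPHABET.find(char)          # -1 if not in the alphabet
        total += position * (value if value >= 0 else 0)
    return ALPHABET[total % len(ALPHABET)]


def is_valid(identifier):
    """True if the last character is the check character of the rest."""
    return len(identifier) > 1 and check_char(identifier[:-1]) == identifier[-1]


print(check_char("13030/xf93gt2"))      # q
print(is_valid("13030/tf5p30086k"))     # True
print(is_valid("13030/tf5p30068k"))     # False: "86" transposed to "68"
```

The check is applied to the NAAN and Name together, so an error in either part is caught.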

Institutions that generate ARKs may want to follow similar principles or develop their own assignment policies.

Similarly, but in the role of an NMA and not an NAA, institutions will want to develop service commitment statements for their objects.

In developing such statements, it is useful to recognize first, that managing a digital object may require altering it as appropriate to ensure its stability, and second, that the declared level of commitment may change as the requirements and policies for persistence become better understood over time, and as the institution implements procedures and guidelines for maintaining the objects that it manages.

References

  1. Towards Electronic Persistence Using ARK Identifiers: ARK motivation and overview. July 2003. [PDF]
  2. The ARK Persistent Identifier Scheme: the complete ARK specification. [HTML] [TXT]
  3. Noid (Nice Opaque Identifier) Minting and Binding Tool: overview and technical specification [PDF], and latest noid software release (download).