An ARK is a URL created to allow persistent, long-term access to information objects. ARKs can identify objects of any type: digital documents, databases, images, software, and websites, as well as physical objects (books, bones, statues, etc.) and even intangible objects (chemicals, diseases, vocabulary terms, performances).
ARKs and other persistent identifiers are necessary and useful because both the protocols used to access objects (such as http and ftp) and the sites that host the objects are subject to change. An ARK contains parts that are impervious to such changes and parts that are flexible enough to support technological changes/improvements. The idea is to create a stable "name" or reference that can be permanently associated with that specific object.
An ARK is represented by a sequence of characters that contains the label, "ark:", optionally preceded by the protocol name ("http://") and hostname that begins every URL. That first part of the URL, or the "Name Mapping Authority" (NMA), is mutable and replaceable, as neither the web server itself nor the current web protocols are expected to last longer than the identified objects. The immutable, globally unique identifier follows the "ark:" label. This includes a "Name Assigning Authority Number" (NAAN) identifying the naming organization, followed by the name that it assigns to the object.
Here is a diagrammed example:
http://foobar.zaf.org/ark:/13030/654xz321/s3/f8.05v.tiff
\___________________/ \__/ \___/ \______/ \____________/
    (replaceable)       |    |      |        Qualifier
          |        ARK Label |      |    (NMA-supported)
          |                  |      |
Name Mapping Authority (NMA) |  Name (NAA-assigned)
                             |
           Name Assigning Authority Number (NAAN)
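The ARK syntax can be summarized as
[http://NMA/]ark:/NAAN/Name[Qualifier]
The NMA part is in brackets because it is optional and replaceable. For example, if a different provider takes over access to the object, the ARK
http://bnf.fr/ark:/13030/tf5p30086k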
might become
http://portico.org/ark:/13030/tf5p30086k
Note that the
ark:/NAAN/Name
remains the same.
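The decomposition above can be sketched in a few lines of Python. This is a minimal illustration, not a complete implementation of the ARK specification: the regular expression, the `Ark` type, and the `parse_ark` and `rehost` helpers are all hypothetical names introduced here, and the pattern assumes that the Name ends at the first "/" or "." and that everything after it is the Qualifier.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for [http://NMA/]ark:/NAAN/Name[Qualifier].
# Assumption: the Name stops at the first "/" or "."; the rest is the Qualifier.
ARK_PATTERN = re.compile(
    r"^(?:(?P<nma>https?://[^/]+)/)?"   # optional, replaceable NMA
    r"ark:/(?P<naan>\d+)/"              # ARK label and NAAN
    r"(?P<name>[^/.]+)"                 # Name assigned by the NAA
    r"(?P<qualifier>[/.].*)?$"          # optional Qualifier
)


class Ark(NamedTuple):
    nma: Optional[str]
    naan: str
    name: str
    qualifier: str

    @property
    def core(self) -> str:
        """The immutable part of the identifier: ark:/NAAN/Name."""
        return f"ark:/{self.naan}/{self.name}"


def parse_ark(text: str) -> Ark:
    """Split an ARK string into its NMA, NAAN, Name, and Qualifier."""
    match = ARK_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognizable ARK: {text!r}")
    return Ark(match["nma"], match["naan"], match["name"],
               match["qualifier"] or "")


def rehost(text: str, new_nma: str) -> str:
    """Return the same ARK served from a different NMA."""
    ark = parse_ark(text)
    return f"{new_nma.rstrip('/')}/{ark.core}{ark.qualifier}"
```

With these helpers, `rehost("http://bnf.fr/ark:/13030/tf5p30086k", "http://portico.org")` returns `http://portico.org/ark:/13030/tf5p30086k`, and `parse_ark` reports the same `core` value, `ark:/13030/tf5p30086k`, for both URLs.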
The NAAN part, following the "ark:" label, uniquely identifies the organization that assigned the Name part. Often the initial access provider (the first NMA) coincides with the original namer (represented by the NAAN). However, access may be provided by one or more different entities instead of, or in addition to, the original naming authority. The NAAN used above, 13030, represents the California Digital Library.
A sampling of other NAANs registered for ARK assignment includes
12025 National Library of Medicine
13030 California Digital Library
13960 Internet Archive
27927 Portico/Ithaka Electronic-Archiving Initiative
12148 National Library of France
78319 Google
64269 Digital Curation Centre
67531 University of North Texas
62624 New York University
15230 Rutgers University
88435 Princeton University
61001 University of Chicago
78428 University of Washington
13038 World Intellectual Property Organization
20775 University of California San Diego
29114 University of California San Francisco
28722 University of California Berkeley
21198 University of California Los Angeles
Any institution may obtain a NAAN and begin assigning ARKs. Because long-term identifiers often look like random strings of letters and digits, institutions typically generate (or "mint", in ARK parlance) and track identifiers with software. To mint ARKs, you may use any software that can produce identifiers conforming to the ARK specification. CDL uses the open-source "noid" (nice opaque identifiers, rhymes with "employed") software, which creates minters and accepts commands that operate them. The noid software documentation explains how to use noid not only to mint identifiers but also to serve as an institution's "identifier resolver".
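To make the idea of minting and tracking concrete, here is a minimal sketch of a minter in Python. It is not the noid software and does not reproduce its algorithm or commands. The `Minter` class, the alphabet (digits and consonants only, so that names are unlikely to spell words), the name length, and the NAAN value 99999 are all assumptions chosen for illustration.

```python
import secrets

# Assumed alphabet: digits plus consonants, with no vowels and no letter "l".
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


class Minter:
    """Generates opaque names under one NAAN and remembers what it issued."""

    def __init__(self, naan: str, length: int = 10) -> None:
        self.naan = naan
        self.length = length
        self.issued = set()  # names handed out so far

    def mint(self) -> str:
        """Return a new ARK whose name has not been issued by this minter."""
        while True:
            name = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
            if name not in self.issued:
                self.issued.add(name)
                return f"ark:/{self.naan}/{name}"


if __name__ == "__main__":
    minter = Minter("99999")  # placeholder NAAN, not a real assignment
    print(minter.mint())
```

A production minter would also need to persist the set of issued names so that uniqueness survives restarts; this sketch keeps it in memory only.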
Once minted and publicized as being associated with a specific object, the ARK becomes a stable, unique, and compact reference that can be included in metadata records, databases, redirection tables, etc. It is often useful to generate and assign ARKs well before the institution has decided on its level of commitment to an object, because assigning an ARK early is easier than later changing an original object identifier that may have been in long-established use by the time of that decision.
Please contact the CDL if you are interested in generating and using ARKs for your information objects.
An ARK provides extra services above and beyond those of an ordinary URL. Instead of connecting to one thing, an ARK should connect to three things:
When a question mark is appended to it, the ARK returns a brief machine- and eye-readable metadata record, such as the following for http://ark.cdlib.org/ark:/13030/tf5p30086k?
erc:
who: (:unav) unavailable
what: Truckee River, below Truckee Station, looking towards Eastern
Summit. -- Photographer's number: 222 -- Photographer's series:
Central Pacific Railroad, California.
when: (:unav) unavailable
where: http://ark.cdlib.org/ark:/13030/tf5p30086k
It is a side-benefit of ARKs that an object's metadata doesn't need an
identifier different from that for the object, which cuts in half the
number of identifiers that need to be generated and managed.
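A record like the one above is simple enough to read with a few lines of code. The following Python sketch is illustrative only: `metadata_url` and `parse_erc` are hypothetical helpers, and the parser assumes that each element is a "label: value" line and that continuation lines begin with whitespace. No network request is made.

```python
def metadata_url(ark_url: str) -> str:
    """Form the metadata request by appending a single '?' to the ARK."""
    return ark_url.rstrip("?") + "?"


def parse_erc(text: str) -> dict:
    """Parse 'label: value' lines; indented lines continue the previous value."""
    record = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and last_key is not None:
            record[last_key] += " " + line.strip()
            continue
        key, _, value = line.partition(":")  # split at the first colon only
        last_key = key.strip()
        record[last_key] = value.strip()
    return record


SAMPLE = """erc:
who: (:unav) unavailable
what: Truckee River, below Truckee Station, looking towards Eastern
    Summit. -- Photographer's number: 222 -- Photographer's series:
    Central Pacific Railroad, California.
when: (:unav) unavailable
where: http://ark.cdlib.org/ark:/13030/tf5p30086k
"""

if __name__ == "__main__":
    print(metadata_url("http://ark.cdlib.org/ark:/13030/tf5p30086k"))
    for label, value in parse_erc(SAMPLE).items():
        print(f"{label}: {value}")
```

Note that the "where" element parses to the same ARK that was used to request the record, which reflects the point that the metadata needs no identifier of its own.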
The CDL assigns identifiers within the ARK domain under the NAAN 13030 and according to the following principles:
Institutions that generate ARKs may want to follow similar principles or develop their own assignment policies.
Similarly, but in the role of an NMA and not an NAA, institutions will want to develop service commitment statements for their objects.
In developing such statements, it is useful to recognize first, that managing a digital object may require altering it as appropriate to ensure its stability, and second, that the declared level of commitment may change as the requirements and policies for persistence become better understood over time, and as the institution implements procedures and guidelines for maintaining the objects that it manages.