Inside CDL

Copyright Data Elements for the CDL

Copyright and Digital Materials

One of the chief characteristics of digital works is the ease with which they can be reused, in whole or in part. Reuse rights depend on the copyright status of the item as well as on the nature of the intended use. It is incumbent on users to assess the copyright status of an item and to determine whether the intended use is permitted by law or whether permission must be obtained. To make this assessment, users need certain information about the work. This information may be implicit in the context of the archive or collection in which the item resides, but digitization often strips away that context, and the connection to the original collection may weaken over time. It is therefore necessary to bundle with the digital item the information that allows the user to make the best possible assessment of its copyright status. However, recording copyright-related information is not part of the tradition of library cataloging.

It is a matter of service for the digital library to provide the user with all available information relating to the copyright status of the work. We say “available” information because the library or archive itself may not have detailed information about the rights. Even so, being clear about what is not known about an item is informative in its own right. If the most authoritative source of this information, the archive, lacks key rights-related information, the user can decide whether further research is required to fulfill the “due diligence” that the law requires.

The Data Elements of Copyright

The copyright status of an item depends on dates such as its creation, its publication, and the death of its author, and therefore changes as time passes. What digital materials need, then, is not an assessment of current copyright status but a record of the data elements needed to make that assessment at any future date.
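To illustrate why the elements, rather than a verdict, should be stored, here is a minimal Python sketch of one simplified U.S. rule: an unpublished work by a personal author whose death year is known is protected for the author's life plus 70 years, through the end of that calendar year, so it enters the public domain on January 1 of the 71st year after the author's death. The function name is hypothetical and the sketch deliberately handles only this one rule. The recorded death year never changes, but the answer does.

```python
def unpublished_status(author_death_year: int, as_of_year: int) -> str:
    """Simplified U.S. rule for an unpublished work by a personal author.

    Protection lasts for the author's life plus 70 years, through the end of
    the calendar year, so the work enters the public domain on January 1 of
    author_death_year + 71. No other kind of work is handled here.
    """
    first_public_domain_year = author_death_year + 71
    if as_of_year >= first_public_domain_year:
        return "public domain"
    return "in copyright"


# The same recorded data element yields different answers over time.
for year in (2020, 2021):
    print(year, unpublished_status(author_death_year=1950, as_of_year=year))
# 2020 in copyright
# 2021 public domain
```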

In the traditional publishing world, the only recorded data element relating to copyright is the copyright notice itself, which gives the year of first publication and the name of the copyright owner. A notice has not been required on works published on or after March 1, 1989, and since January 1, 1978 every original work fixed in a tangible medium has been protected by copyright from the moment of its creation, whether published or not. Unpublished and ephemeral works from all eras usually carry no copyright marking. For these works, copyright status must be inferred from other data elements, such as the date of creation or publication, the name of the author, or the country of publication. Copyright law refers to these and other data elements as criteria for determining copyright status, but the law itself contains no distinct list of data elements.

There are two sources of data elements that pertain to the copyright of works and relate to copyright law: the first is the Copyright Office itself, especially the forms used to register works with that office [1]; the second is the algorithm that some practitioners have devised for determining whether an item is in the public domain [2]. The latter includes key decision points for determining copyright status, such as whether the item is published or unpublished. Neither source fully reflects copyright law, and we should not expect to match the law in all of its complexity, especially its many exceptions. In addition, we want to record where information about an item is unknown, because this helps users decide on their own next steps for further research into the item's status.
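How unknown values interact with the decision points can be sketched in Python. This is a minimal illustration under stated assumptions: the record is a plain dictionary whose keys (`published`, `date_created`, `author_type`, and so on) are hypothetical names rather than any published schema, a missing key or `None` is treated as "unknown", and only the first few decision points are modeled. Instead of returning a verdict, the function lists the unknown elements that block an assessment, which is what a user needs in order to plan further research.

```python
from typing import Any, Optional

UNKNOWN = None  # hypothetical convention: None (or a missing key) means unknown


def elements_to_research(record: dict[str, Any]) -> list[str]:
    """Walk a few simplified decision points and list the unknown elements.

    The element names are hypothetical keys, not a published schema.
    """
    published: Optional[bool] = record.get("published")
    if published is UNKNOWN:
        # The first decision point is unresolved, so nothing below applies yet.
        return ["published"]
    if published:
        needed = ["date_published", "country_of_publication", "copyright_notice"]
    else:
        needed = ["date_created", "author_type"]
        if record.get("author_type") == "personal":
            # Only personal authors have a death date that affects the term.
            needed.append("author_death_date")
    return [name for name in needed if record.get(name) is UNKNOWN]
```

For example, an unpublished letter by a known person with a known creation date but no recorded death date yields `["author_death_date"]`, pointing the user at exactly the gap to fill.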

Combining these sources yields the following tree of data elements:

  • Unpublished Work
    • Date created
      • Exact (known)
      • Approximate (unknown, but can be placed within a range or time frame)
      • Unknown
    • Author(s)
      • Personal name
        • Death date
          • Exact (known)
          • Approximate (unknown, but can be placed within a range or time frame)
          • Unknown
      • Corporate name
      • Anonymous work
      • Unknown
    • Copyright owner
      • Author
      • Other (name)
      • Unknown, but possible contact provided
      • Unknown
  • Published Work
    • Publisher name
    • Date published
      • Exact (known)
      • Approximate (unknown, but can be placed within a range or time frame)
      • Unknown
    • Date renewed
      • Exact (known)
      • Approximate (unknown, but can be placed within a range or time frame)
      • Unknown
    • Country or countries of publication
    • Copyright notice (from the work)
      • Notice as it appears on work
      • No notice on work
      • Unknown if there is a notice
    • Author(s)
      • Personal name
        • Death date
          • Exact (known)
          • Approximate (unknown, but can be placed within a range or time frame)
          • Unknown
      • Corporate name
      • Anonymous work
      • Unknown
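One way to make the tree concrete is a set of Python dataclasses. This is a sketch, not a proposed metadata format: the class and field names are hypothetical, dates are reduced to years, and each date carries an explicit precision so that "approximate" and "unknown" are recorded as values rather than left blank.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Precision(Enum):
    EXACT = "exact"
    APPROXIMATE = "approximate"  # can be placed within a range or time frame
    UNKNOWN = "unknown"


@dataclass
class DateElement:
    precision: Precision = Precision.UNKNOWN
    earliest_year: Optional[int] = None  # same as latest_year when exact
    latest_year: Optional[int] = None


@dataclass
class Author:
    kind: str  # "personal", "corporate", "anonymous", or "unknown"
    name: Optional[str] = None
    # A death date applies to personal names only.
    death_date: DateElement = field(default_factory=DateElement)


@dataclass
class UnpublishedWork:
    date_created: DateElement
    authors: list[Author]
    # "author", "other", "unknown_with_contact", or "unknown"
    copyright_owner_kind: str = "unknown"
    copyright_owner_name: Optional[str] = None  # used when the kind is "other"


@dataclass
class PublishedWork:
    date_published: DateElement
    authors: list[Author]
    publisher: Optional[str] = None
    date_renewed: DateElement = field(default_factory=DateElement)
    countries: list[str] = field(default_factory=list)
    notice_status: str = "unknown"  # "present", "absent", or "unknown"
    notice_text: Optional[str] = None  # notice as it appears on the work


# Hypothetical example: a letter datable only to 1905-1910, by a person
# whose death date has not been found.
letter = UnpublishedWork(
    date_created=DateElement(Precision.APPROXIMATE, 1905, 1910),
    authors=[Author("personal", "Jane Doe")],
)
```

Making "unknown" the default for every element means that a field nobody filled in reads as unknown, never as a silent empty value.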

 

There are two sources of information about a digital item: the item itself (or its original, in the case of material digitized by the library) and the results of research. Research could include consulting bibliographic resources and the files of the Copyright Office, contacting the author or the institution, and so on. Users will benefit from knowing the extent of the research into the rights, especially if many metadata elements are listed as "unknown." We therefore need two more elements:

  • Information taken from the piece only (so any element listed as unknown was not stated on the piece itself).
  • Research undertaken (with a brief description of what steps were taken and sources consulted).
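A minimal Python sketch of these two elements, with hypothetical class, field, and method names, shows how they change what an "unknown" value tells the user.

```python
from dataclasses import dataclass, field


@dataclass
class RightsProvenance:
    """Hypothetical record of how the copyright elements were gathered."""

    from_piece_only: bool
    research_steps: list[str] = field(default_factory=list)

    def meaning_of_unknown(self) -> str:
        """Explain to a user what an element recorded as unknown implies."""
        if self.from_piece_only:
            return "not stated on the piece; no other sources were consulted"
        if self.research_steps:
            return "not found after: " + "; ".join(self.research_steps)
        return "unknown; the extent of research was not recorded"


checked_piece = RightsProvenance(from_piece_only=True)
researched = RightsProvenance(
    from_piece_only=False,
    research_steps=["searched Copyright Office records", "wrote to the author's estate"],
)
```

The same "unknown" death date means "look harder" in the first case and "already searched these sources" in the second.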

The copyright data elements are a major part of the service component for digital works, but they also need to be accompanied by contact information for users who need to carry their rights research further. The contact may be the archive that holds the original, which is a logical place to conduct additional research; the institution where the initial rights assessment was done, which can report how much research has already gone into the determination of rights; or the actual rights holder, who can be asked for permission to reuse the work. This adds another data element (or set of data elements, depending on how it is defined in a metadata format) to our list:

  • Contact information
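Because the text names three kinds of contact, each serving a different next step, the contact element can carry a role. The following Python sketch is an assumption about one possible structure: the role names, the `address` field, and the purpose-to-role mapping are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ContactRole(Enum):
    HOLDING_ARCHIVE = "archive that holds the original"
    ASSESSING_INSTITUTION = "institution that made the rights assessment"
    RIGHTS_HOLDER = "rights holder"


# Hypothetical mapping from a user's next step to the useful kinds of contact.
ROLES_FOR_PURPOSE = {
    "research": {ContactRole.HOLDING_ARCHIVE, ContactRole.ASSESSING_INSTITUTION},
    "permission": {ContactRole.RIGHTS_HOLDER},
}


@dataclass
class RightsContact:
    role: ContactRole
    name: str
    address: str  # postal, email, or web address; format left open


def contacts_for(purpose: str, contacts: list[RightsContact]) -> list[RightsContact]:
    """Return the contacts suited to a purpose ("research" or "permission").

    Any other purpose raises KeyError.
    """
    wanted = ROLES_FOR_PURPOSE[purpose]
    return [c for c in contacts if c.role in wanted]
```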

This list of data elements does not specify how they would be expressed in metadata; that requires additional work. That work has to include an analysis of the interaction with descriptive cataloging elements such as author, date of publication, and publisher. Although some of the elements seem redundant, there are cases where the elements needed for copyright differ from those used for description, such as when the copyright holder is not the author, so some of these data elements may need to be recorded both as description and as copyright information.
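The overlap with descriptive cataloging can be illustrated with a small Python check that keeps only the copyright elements the descriptive record does not already hold. The field names and the mapping between copyright and descriptive fields are hypothetical; a real mapping would come from the metadata analysis described above.

```python
# Hypothetical mapping from copyright elements to descriptive fields that
# may hold the same value.
FIELD_MAP = {
    "copyright_owner": "author",
    "date_published": "date",
    "publisher": "publisher",
}


def copyright_only_values(descriptive: dict[str, str], copyright_info: dict[str, str]) -> dict[str, str]:
    """Return copyright elements that the descriptive record does not already hold.

    An element is redundant only when its mapped descriptive field has the
    same value; unmapped or differing elements must be recorded separately.
    """
    separate = {}
    for element, value in copyright_info.items():
        mapped = FIELD_MAP.get(element)
        if mapped is None or descriptive.get(mapped) != value:
            separate[element] = value
    return separate


description = {"author": "Jane Doe", "date": "1925", "publisher": "Acme Press"}
rights = {"copyright_owner": "Acme Press", "date_published": "1925", "notice": "absent"}
print(copyright_only_values(description, rights))
# {'copyright_owner': 'Acme Press', 'notice': 'absent'}
```

Here the publication date is redundant, but the copyright owner differs from the author and must be recorded in its own right.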

There also needs to be an analysis of how these data elements can be coordinated with the digital library workflow, so that the information is captured and recorded at the appropriate step, as efficiently as possible.

Conclusion

In most digital library projects it is not reasonable to expect to research the rights of all, or even most, of the items being processed, so copyright information will often be incomplete. But it is essential to begin recording all known information at the time each item enters the digital library, because recovering that information later, if it is possible at all, will be very expensive. We must not pass this expense along to our users, not least because a lack of information could keep materials from being used for fear of copyright infringement. That outcome is in direct opposition to the goals of the digital library.


[1] The forms are linked from the Copyright Office page.

[2] The data elements here were derived mainly from Peter Hirtle's Copyright Term and the Public Domain in the United States and Mary Minow's Library Digitization Projects and Copyright. Other good sources are Laura Gasaway's When Works Pass into the Public Domain and the Colorado Digital Heritage Project's copyright resources.
