Since 2009, the Publishing Group has collaborated with the digital scholarly archive JSTOR, using its Data for Research (DfR) tool to harvest scholarly content and train our automatic classification system to assign terms from an academic disciplines taxonomy to our publications.
The Publishing Group faced the challenge of classifying almost 30,000 legacy items during eScholarship’s fall 2009 redesign. Using the KEA tool, the group automatically classified all of this content, assigning each item one or more terms from a controlled vocabulary of academic disciplines.
To address the limitations of a manually created, and therefore small, training and testing corpus of 800 articles, the Publishing Group began collaborating with JSTOR, using its DfR tool to harvest scholarly content already associated with key terms and with JSTOR’s own discipline categories, which map closely to those of eScholarship. Using DfR, the Publishing Group downloaded up to 10 articles per category for training and testing, which improved the accuracy of the automatic term assignment and increased confidence in it.
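As an illustration only, the per-category sampling described above can be sketched in a few lines of Python. This is not the Publishing Group's actual workflow and uses neither the DfR nor the KEA interface: the function name `sample_per_category`, the `(article_id, category)` input shape, and the 20% test split are all assumptions made for the example. Only the cap of 10 articles per category comes from the text.

```python
import random
from collections import defaultdict


def sample_per_category(articles, cap=10, test_fraction=0.2, seed=0):
    """Keep at most `cap` articles per category and split them into train/test.

    `articles` is an iterable of (article_id, category) pairs (hypothetical shape).
    Returns two lists of (article_id, category) pairs: (train, test).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group article ids under their discipline category.
    by_category = defaultdict(list)
    for article_id, category in articles:
        by_category[category].append(article_id)

    train, test = [], []
    for category in sorted(by_category):
        ids = by_category[category]
        rng.shuffle(ids)
        kept = ids[:cap]  # cap the sample, e.g. up to 10 per category
        n_test = int(len(kept) * test_fraction)  # assumed split, rounds down
        test.extend((a, category) for a in kept[:n_test])
        train.extend((a, category) for a in kept[n_test:])
    return train, test
```

With 25 articles in one category and 3 in another, this keeps 10 and 3 respectively, so small categories contribute everything they have while large ones cannot dominate the corpus.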
Links for more information: