Keyword in Context Pictures (KWIC Pics)
KWIC Pics is a display technology that provides visual clues to aid in information searches. After a user of the eScholarship site performs a search, a results page is displayed. By hovering the cursor over matching keywords on the results page, the user can view an excerpted image of the actual PDF article page containing the keywords – providing an early glimpse of the document.
From a technical perspective, hovering the cursor over a highlighted keyword activates a series of events that are largely invisible to the user:
- The server renders only the part of the PDF page that contains the keyword and returns it as an image. To do so, it must first interpret the PDF file, find the right page, and turn it into image data (or “pixels”). PDF files are highly variable and eccentric, so eScholarship relies on a robust and well-tested PDF library called Poppler, which is also used within many PDF image display tools for Linux.
- The eScholarship server then modifies blocks of pixels to add yellow background and red foreground highlighting to the matching keywords, using text coordinates stored in the XTF full-text index.
- Next, the pixel data is compressed for quick transfer. If only a small number of colors or only shades of gray are present, PNG compression is chosen; for full-color images, JPEG compression is used instead. This choice logic ensures accurate color reproduction while reducing bandwidth (and thus transfer time) as much as possible.
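The highlighting and compression-choice steps above can be sketched in a few lines of Python. This is only an illustration, not the eScholarship implementation: the function names, the brightness threshold used to tell text from background, and the 256-color cutoff for PNG are all assumptions made for the example, and the image is modeled as a plain list of rows of RGB tuples rather than real rendered output.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

YELLOW: Pixel = (255, 255, 0)
RED: Pixel = (255, 0, 0)


def highlight_box(pixels: List[List[Pixel]],
                  box: Tuple[int, int, int, int],
                  dark_threshold: int = 128) -> None:
    """Recolor one keyword's bounding box in place.

    box is (left, top, right, bottom) in pixel coordinates, with right and
    bottom exclusive. Dark pixels are assumed to be text and become red;
    all other pixels in the box become yellow. The threshold is hypothetical.
    """
    left, top, right, bottom = box
    for y in range(top, bottom):
        for x in range(left, right):
            r, g, b = pixels[y][x]
            brightness = (r + g + b) // 3
            pixels[y][x] = RED if brightness < dark_threshold else YELLOW


def choose_format(pixels: List[List[Pixel]], max_png_colors: int = 256) -> str:
    """Return "PNG" for grayscale or few-color images, otherwise "JPEG".

    The cutoff of 256 distinct colors is an assumed value for illustration.
    """
    colors = set()
    all_gray = True
    for row in pixels:
        for pixel in row:
            r, g, b = pixel
            if not (r == g == b):
                all_gray = False
            colors.add(pixel)
    if all_gray or len(colors) <= max_png_colors:
        return "PNG"
    return "JPEG"


# A tiny 4x2 white "page" with one black text pixel at (x=1, y=0).
page = [[(255, 255, 255)] * 4 for _ in range(2)]
page[0][1] = (0, 0, 0)

highlight_box(page, (0, 0, 2, 1))
print(page[0][:2])          # [(255, 255, 0), (255, 0, 0)]
print(choose_format(page))  # PNG
```

In this toy example the highlighted image contains only three colors, so the lossless PNG path is taken; a photographic excerpt with thousands of distinct colors would fall through to JPEG.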
Rendering PDFs as Images
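As a rough illustration of the rendering step, the sketch below builds a call to `pdftoppm`, the command-line renderer that ships with Poppler, to turn a cropped region of one PDF page into a PNG file. The text above says only that eScholarship uses the Poppler library; whether it goes through this command-line tool or the library API directly is not stated, so treat this as a stand-in. The file name, page number, crop box, and resolution are hypothetical values.

```python
import subprocess
from typing import List, Tuple


def pdftoppm_command(pdf_path: str, page: int,
                     crop: Tuple[int, int, int, int],
                     out_prefix: str, dpi: int = 100) -> List[str]:
    """Build a pdftoppm command that renders one cropped page region.

    crop is (x, y, width, height) in pixels at the chosen resolution.
    """
    x, y, width, height = crop
    return [
        "pdftoppm",
        "-f", str(page), "-l", str(page),   # first and last page: same page
        "-r", str(dpi),                     # resolution in DPI
        "-x", str(x), "-y", str(y),         # top-left corner of the crop
        "-W", str(width), "-H", str(height),
        "-png", "-singlefile",              # writes <out_prefix>.png
        pdf_path, out_prefix,
    ]


def render_excerpt(pdf_path: str, page: int,
                   crop: Tuple[int, int, int, int],
                   out_prefix: str, dpi: int = 100) -> str:
    """Run pdftoppm (must be installed) and return the output file name."""
    cmd = pdftoppm_command(pdf_path, page, crop, out_prefix, dpi)
    subprocess.run(cmd, check=True)
    return out_prefix + ".png"


# Show the command for a hypothetical article without executing it.
print(" ".join(pdftoppm_command("article.pdf", 3, (50, 200, 400, 120), "excerpt")))
```

Calling `render_excerpt` with the same arguments would write `excerpt.png`, provided Poppler's utilities are installed and `article.pdf` exists.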
Links for more information:
- Poppler PDF library
- XTF: eXtensible Text Framework
- PNG: Portable Network Graphics image format
- JPEG: Joint Photographic Experts Group image format