Linked data: the saviour of libraries in the internet age?

Another day, another depressing article about the future of libraries in the UK. I felt myself becoming predictably frustrated by the usual ‘libraries are glorified waiting rooms for the unemployed’ and ‘everything’s on the internet anyway’ comments.

I also found myself trying to come up with ways to do something about it. Don’t get me wrong, I like a good whinge as much as the next man, but whinging only sustains me for so long. Where possible I like to find practical solutions to life’s problems. The issue of mass library closures in the UK might seem too much for one librarian to solve—especially a student librarian on the other side of the world with absolutely no influence in UK politics. But I won’t let that put me off.

Consider the following: Google is our first port of call in any modern information search, right? When we want to know something, we google it. That’s fine. Who determines what appears in search results? Google’s super-secret Algorithm, harnessing an army of spiders to index most corners of the Web. How do web admins try and get their sites to appear higher in search results? Either the dark art of search engine optimisation (SEO), which is essentially a game of cat-and-mouse with the Algorithm, or the fine art of boutique metadata, which is embedded in a Web page’s <meta> tags and used to lure spiders.
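For the uninitiated, that boutique metadata is nothing exotic; it amounts to a few lines in a page’s <head>, along the lines of this entirely invented example:

    <!-- invented example, not a real catalogue page -->
    <head>
      <title>Great Expectations / Charles Dickens</title>
      <meta name="description"
            content="Borrow Great Expectations by Charles Dickens from your local library.">
    </head>

The description tag is one of the things search engines draw on for the grey blurb under a result, which will come up again shortly.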

Despite falling patronage and the ubiquity of online information retrieval, libraries are absolutely rubbish at SEO. When people google book or magazine titles (to give but one example), libraries’ OPACs aren’t appearing in search results. People looking for recreational reading material are libraries’ target audience, and yet we’re essentially invisible to them.

Even if I accept the premise that ‘everything’s on the internet’ (hint: no), how do people think content ends up on the internet in the first place? People put things online. Librarians could put things online if their systems supported them. Librarians could quite easily feed the internet and reclaim their long-lost status as information providers in a literal sense.

The ancient ILS used by my workplace is an aggravating example of this lack of support. If our ILS were a person it would be a thirteen-year-old high schooler, skulking around the YA section and hoping nobody notices it’s not doing much work. Our OPAC, for reasons I really don’t understand, has a robots.txt warding off Google and other web crawlers. The Web doesn’t notice it and patrons don’t either. It doesn’t help that MARC is an inherently web-unfriendly metadata standard; Google doesn’t know or care what a 650 field is, and it’s not about to start learning.
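I won’t reproduce ours here, but the classic keep-everyone-out robots.txt needs only two lines:

    # tells every crawler to stay away from everything
    User-agent: *
    Disallow: /

Well-behaved spiders, Google’s included, take the hint and index nothing.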

(Screenshot below obscures the name of my workplace in the interests of self-preservation)

[Screenshot of the OPAC’s robots.txt]

Down with this sort of thing.

Perhaps in recognition of this problem, vendor products such as SirsiDynix’s BLUEcloud Visibility promise to convert MARC records into BIBFRAME linked data and make a library’s OPAC more appealing to web crawlers. I have no idea whether this actually works (though I’m dying to find out). For time-poor librarians and cash-strapped consortia, an off-the-shelf solution would have numerous benefits.
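I don’t know exactly what BLUEcloud Visibility spits out, but the general idea of crawler-friendly linked data is structured markup (schema.org, say) embedded in the catalogue page. A minimal, made-up sketch:

    <!-- a made-up example of schema.org markup, not the product's actual output -->
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Book",
      "name": "Great Expectations",
      "author": { "@type": "Person", "name": "Charles Dickens" }
    }
    </script>

A crawler can make sense of that without knowing or caring what a 650 field is.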

But even the Google screenshot included in the article, featuring a suitably enhanced OPAC, has its problems. Firstly, the big eye-catching infobox to the right makes no mention of the library, but includes links to Scribd and Kobo, who have paid for such prominence. Secondly, while the OPAC appears at the top of the search results, the blurb in grey text includes boring bibliographical information instead of an eye-catching abstract, or even something like ‘Borrow “Great Expectations” at your local library today!’. Surely I’m not the only one who notices things like this…?

I’m keen to do a lot more research in this area to determine whether the promise of linked data will make library collections discoverable for today’s users and bring people back to libraries. I know I can’t fix the ILS. I can’t re-catalogue every item we have. I can’t even make a script do this for me. For now, research is the most practical thing I can do to help solve this problem. Perhaps one day I’ll be able to do more.

Further reading

Fujikawa, G. (2015). The ILS and Linked Data: a White Paper. Emeryville, CA: Innovative Interfaces. Retrieved from https://www.iii.com/sites/default/files/Linked-Data-White-Paper-August-2015.pdf

Papadakis, I., et al. (2015). Linked Data URIs and Libraries: The Story So Far. D-Lib Magazine, 21(5/6). Retrieved from http://dlib.org/dlib/may15/papadakis/05papadakis.html

Schilling, V. (2012). Transforming Library Metadata into Linked Library Data: Introduction and Review of Linked Data for the Library Community, 2003–2011. ALCTS Research Topics in Cataloguing and Classification. Retrieved from http://www.ala.org/alcts/resources/org/cat/research/linked-data

Acquisitions Battle: Library v. Archives

I get a real kick out of spending other people’s money. Acquisitions take up a lot of my time at work these days; I’m responsible for, among many other things, acquiring works by local authors and material about the history and culture of our city. Naturally, our budget is minuscule. Every dollar has to be spent wisely, and if we can acquire something for free, we go for it. 

The other day I was surprised to receive a couple of short films produced by one of the local community service organisations, featuring a few locals talking about their lives. I had no real reason to be surprised; after all, I’d asked them to send me a copy. What did surprise me was that the two films were on USB flash drives, one for each film. Somehow I’d been expecting a DVD. Another community service organisation had graciously sent me a DVD of one of their recent film projects, and I suppose I hadn’t considered that not all organisations distribute their AV material the same way.

Our immediate problem was deciding whether the USB flash drives constituted library or archive material. While we make a point of collecting both, the line is somewhat blurry; some material accepted as part of a manuscript deposit occasionally duplicates library closed stack holdings, and vice versa. Generally speaking, if an item has been formally published it goes into the library collection and is catalogued in the usual way. If it hasn’t been published, it’s treated as archive material: it goes through appraisal, copyright clearance and so on, and has a finding aid created for it.

Is a USB flash drive considered ‘published’ material? One drive had the (newer) film’s title printed on the side, clearly indicating an intent to distribute. The other drive was a generic one and held the older film. Neither had an ISBN or other barcode, and could only be obtained by directly contacting the community organisation that produced them. The films themselves had been uploaded to YouTube by the community organisation, also suggesting that the participants had consented to their recordings being widely disseminated.

Because the DVD we received had been (almost automatically) treated as library material and given to our long-suffering cataloguer, I began to wonder whether the USB drives should be treated the same way. After all, if we had received DVDs instead of flash media, I wouldn’t have thought twice about adding them to the library stack. 

However, I ultimately decided, in consultation with my superior, to add the USB flash drives to our archival collections. The lack of an ISBN or any kind of commercial packaging was a factor, but the decider was the realisation that write-protecting flash drives is close to impossible. Even if we were to add the drives to our library stack and only allow them to be used in the building, we would have no way of knowing whether someone had tampered with a drive while using it. A professionally produced DVD, by contrast, is a read-only medium, which I think we would feel better about having in the library collection.

The major downside to classifying the drives as archive material is that it means a lot more work for us. Naturally, I hadn’t thought to request a deed of gift or copyright clearances from the community organisation, so we’ll have to chase that up. If they in turn didn’t ask the participants to sign anything (which is unlikely but possible), that will also create some difficulties. And of course, at some point I’ll have to copy the contents of the drives to our rudimentary digital preservation setup. I’ve wound up being responsible for that too, but that’s a story for another time. 

I’m a document hipster. I only write in sustainable plaintext

No-one really needs three different word processing programs.

Yet that’s the situation I’m currently in. My six-year-old MacBook Pro is on its last legs and I’m desperately trying to free up as much storage space and processing power as possible, meaning a bit of spring cleaning is in order. Unfortunately for me, I’ve amassed text-based documents in (among others) .docx, .pages and .odt formats. Office for Mac has only recently added support for OpenDocument formats, and I’m reluctant to get rid of LibreOffice, the program that created my .odt files in the first place.

From a digital preservation perspective, my Documents folder is a mess. Converting all of these into more sustainable formats will, I’ve decided, be Future Alissa’s problem. But that doesn’t mean I need to keep living an unhealthy document lifestyle.

Instead, I’ve decided to try out one of the more intriguing lessons on The Programming Historian: ‘Sustainable Authorship in Plain Text using Pandoc and Markdown’. Any document that I would normally write in Pages will instead be written in a plaintext editor using Markdown and typeset in .docx or .pdf using Pandoc. I’ve been using Markdown for a while to write these blog posts, but Pandoc is a new experience.

Briefly, Markdown is a lightweight markup language intended to be human-readable in a way HTML isn’t. Pandoc is a command-line program that converts one markup format into another, such as Markdown to .docx (which at heart is an XML format). The primary benefit is that the manuscript (a plaintext .md file) will never need specialised word-processing software to be read, and will remain intelligible to human eyes. Additional information that would otherwise be incorporated into a .docx or .pages file, such as bibliographic data and citation stylesheets, is saved separately. These files are also plaintext and easily human-readable.
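To illustrate the ‘human-readable’ bit, a made-up sample: a heading, some emphasis and a link in Markdown look like this

    # A heading

    Some *emphasised* text and a [link](https://example.com).

while the HTML that Pandoc can generate from it looks something like this

    <h1>A heading</h1>
    <p>Some <em>emphasised</em> text and a <a href="https://example.com">link</a>.</p>

One of these is pleasant to type and read in any text editor; the other isn’t.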

There are plenty of reasons to kick the word processor habit (neatly summarised in this blog post by W. Caleb McDaniel). Personally, I spend way too much time mucking around with formatting before I even begin to type. A plain-text typing environment has no such distractions, allowing me to concentrate on content. If I need to bold or italicise something, for example, I can do that in Markdown without interrupting my sentence flow.

You’d be forgiven for asking, ‘Why bother with all this, when there are easier options?’ Certainly it’s a challenge for those unfamiliar with the command line. There’s also a lot this method won’t handle: complex tables, mail merge, interactive elements, et cetera. And yes, there are plenty of other distraction-free apps out there. In the long run, however, I’m looking forward to three things:
1) a more fruitful and painless typing experience
2) not wasting hours of my life converting documents from one format to another (yes, this has been known to take me hours) and
3) improving my command-line and markup skills.

What I did, briefly

After installing Pandoc and following the Programming Historian’s instructions (though I chose to forgo LaTeX, and hence .pdf conversion, for want of disk space), I created a little test .md file, incorporating images, links and footnotes, in a rather nice desktop plaintext editor called Atom.

[Screenshot: the test .md file in Atom]
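For anyone who doesn’t feel like squinting at a screenshot, here’s the flavour of the thing. It isn’t my actual test file and the file names are invented, but it has the same ingredients: a metadata header (a Pandoc extension rather than standard Markdown), a link, an image and a footnote.

    ---
    title: A test document
    author: Alissa
    ---

    Here is a paragraph with a [link](https://programminghistorian.org) and an image:

    ![A test image](test-image.png)

    It also has a footnote.[^1]

    [^1]: Which lives down here, out of the way.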

I then ran a Pandoc command in Terminal to convert the .md file to a .docx file. Disappointingly, the program did not return anything to suggest it had been successful. A quick $ ls, however, revealed the new file.
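It was something along these lines (the file names are mine):

    $ pandoc test.md -o test.docx   # output format inferred from the extension

Pandoc works out the output format from the extension of the file you ask for, and like most well-mannered command-line tools it prints nothing at all when everything goes to plan.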

[Screenshot: running Pandoc in Terminal]

I also converted the .md manuscript into .odt and .html, just to see what might happen and if there were any differences.
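Again, only the output extension changes (same made-up file names):

    $ pandoc test.md -o test.odt
    $ pandoc test.md -o test.html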

How it ended up

As it turned out, the .docx and .odt conversions were missing the footnotes and .html was missing the header (which is not standard Markdown, but rather a Pandoc extension), meaning that none of the target formats included 100% of the Markdown content. Considering I had done absolutely no styling, the .docx was surprisingly eye-catching.

[Screenshot: the .docx output in Word]

I don’t know why parts went missing from each target file, but I plan to investigate before using Pandoc more extensively for research work. Despite not quite getting all the output I was promised, I wasn’t dissuaded from using Markdown and Pandoc for my long-form writing. The tutorial goes into some depth on footnotes and bibliographies, which I didn’t have time to test and which might well solve my problem.
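If I’ve read the tutorial correctly, the bibliography side of things is handled by keeping references in a separate plaintext file and passing a couple of extra options to Pandoc, something like this (the file names are placeholders, and I haven’t actually run it yet):

    # project.bib and some-style.csl are placeholders for a real bibliography and citation style
    $ pandoc test.md -o test.docx --filter pandoc-citeproc \
        --bibliography=project.bib --csl=some-style.csl

That experiment is on the list for next time.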

Ironically, a copy of Matthew Kirschenbaum’s Tracked Changes, a history of word processors and their effect on the art of writing, arrived at the post office while I was compiling this article. In a way, adopting Markdown and Pandoc is an effort to get back to those, uh, halcyon days of formatting-free word processing. Hopefully when I re-examine my Documents folder in a few years’ time, it will be full of plaintext files!