Sick of hearing about linked data? You’re not alone

‘This looks a little bit complicated’ … you don’t say… #lodlam #lasum2016 @lissertations 8 Dec 2016

I’m not attending ALIA Information Online this year, largely because the program was broadly similar to NDFNZ (which I attended last year) and I couldn’t justify the time off work. Instead I’m trying to tune into #online17 on Twitter, in between dealing with mountains of work and various personal crises.

As usual, there’s a lot of talk about linked data. Pithy pronouncements on linked data. Snazzy slides on linked data. Trumpeting tweets about linked data.

You know what?

I’m sick of hearing about linked data. I’m sick of talking about linked data. I’m fed up to the back teeth with linked data plans, proposals, theories, suggestions, exhortations, the lot. I’ve had it. I’ve had enough.

What will it take to make linked data actually happen?

Well, for one thing, ‘linked data’ could mean all sorts of things. Bibframe, that much-vaunted replacement for everyone’s favourite 1960s data structure MARC, is surely years away. RDF and its query language SPARQL are here right now, but the learning curve is steep and interoperability with legacy library data and systems is difficult to achieve. Whatever OCLC is working on has the potential to monopolise and commercialise the entire project. If people use ‘linked data’ to mean ‘indexed by Google’, well, there’s already a term for that. It’s called SEO, or ‘search engine optimisation’, and marketing types are quite good at it. (I have written on this topic before, for those interested.)

Furthermore, linked data is impossible to implement on an individual level. Making linked data happen in a given library service, including—

  • modifying one’s ILS to play nicely with linked data
  • training your cataloguing and metadata staff (should you have any) on writing linked data
  • ensuring your vendors are willing to provide linked data
  • teaching your floor staff about interpreting linked data
  • convincing your bureaucracy to pay for linked data and
  • educating the public on what the hell linked data is

—requires the involvement of dozens of people and is far above my pay grade. Most of those people can be relied upon to care very little, or not at all, about metadata of any kind. Without rigorous description and metadata standards, not to mention work on vocabularies and authority control, our linked data won’t be worth a square inch of screen real estate. The renewed focus on library customer service relies on staff knowing what materials and services their library offers. This is impossible without good metadata, which in turn is impossible without good staff. I can’t do it alone, and I shouldn’t have to.

Here, the library data ecosystem is so tightly wrapped around the MARC structure that I don’t know if any one entity will ever break free. Libraries demand MARC records because their software requires them. Their software requires MARC records because vendors wrote it that way. Vendors wrote the software that way because libraries demand it. It’s a vicious cycle, and one that vendors currently have little incentive to break.

I was overjoyed to hear recently of the Oslo Public Library’s decision a few years ago to ditch MARC completely and catalogue in RDF using the Koha open-source ILS. They decided there was no virtue in waiting for a standard that may never come, and decided to Make Linked Data Happen on their own. The level of resultant original cataloguing is quite high, but tools like MARC2RDF might ameliorate that to an extent. Somehow, I can’t see my workplace making a similar decision. It’d be awesome if we did, though.

I don’t yet know what will make linked data happen for the rest of us. I feel like we’ve spent years convincing traditionally-minded librarians of the virtues of linked data with precious little to show for it. We’re having the same conversations over and over. Making the same pronouncements. The same slides. The same tweets. All for something that our users will hopefully never notice. Because if we do our jobs right and somehow pull off the biggest advancement in library description since the invention of MARC, our users will have no reason to notice—discovery of library resources will be intuitive at last.

Now that would be something worth talking about.

Linked data: the saviour of libraries in the internet age?

Another day, another depressing article about the future of libraries in the UK. I felt myself becoming predictably frustrated by the usual ‘libraries are glorified waiting rooms for the unemployed’ and ‘everything’s on the internet anyway’ comments.

I also found myself trying to come up with ways to do something about it. Don’t get me wrong, I like a good whinge as much as the next man, but whinging only sustains me for so long. Where possible I like to find practical solutions to life’s problems. The issue of mass library closures in the UK might seem too much for one librarian to solve—especially a student librarian on the other side of the world with absolutely no influence in UK politics. But I won’t let that put me off.

Consider the following: Google is our first port of call in any modern information search, right? When we want to know something, we google it. That’s fine. Who determines what appears in search results? Google’s super-secret Algorithm, harnessing an army of spiders to index most corners of the Web. How do web admins try and get their sites to appear higher in search results? Either the dark art of search engine optimisation (SEO), which is essentially a game of cat-and-mouse with the Algorithm, or the fine art of boutique metadata, which is embedded in a Web page’s <meta> tags and used to lure spiders.

Despite falling patronage and the ubiquity of online information retrieval, libraries are absolutely rubbish at SEO. When people google book or magazine titles (to give but one example), libraries’ OPACs aren’t appearing in search results. People looking for recreational reading material are libraries’ target audience, and yet we’re essentially invisible to them.

Even if I accept the premise that ‘everything’s on the internet’ (hint: no), how do people think content ends up on the internet in the first place? People put things online. Librarians could put things online if their systems supported them. Librarians could quite easily feed the internet and reclaim their long-lost status as information providers in a literal sense.

The ancient ILS used by my workplace is an aggravating example of this lack of support. If our ILS were a person it would be a thirteen-year-old high schooler, skulking around the YA section and hoping nobody notices it’s not doing much work. Our OPAC, for reasons I really don’t understand, has a robots.txt warding off Google and other web crawlers. The Web doesn’t notice it and patrons don’t either. It doesn’t help that MARC is an inherently web-unfriendly metadata standard; Google doesn’t know or care what a 650 field is, and it’s not about to start learning.
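For the curious, here is a minimal sketch of how a robots.txt like ours turns crawlers away, using the robots.txt parser in Python's standard library. The rules and the OPAC address below are made up for illustration; I am assuming a blanket ‘disallow everything’ file, which may not match what any particular ILS actually ships with.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption, not copied from a real OPAC):
# it tells every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the named crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up catalogue record address, for illustration only.
record_url = "https://opac.example.org/record/12345"

print(can_crawl(ROBOTS_TXT, "Googlebot", record_url))  # blanket disallow
print(can_crawl("", "Googlebot", record_url))          # no rules at all
```

With the blanket rule in place a well-behaved crawler never looks at a single catalogue record, no matter how good the metadata behind it is.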

(Screenshot below obscures the name of my workplace in the interests of self-preservation)

[Screenshot]

Down with this sort of thing.

Perhaps in recognition of this problem, vendor products such as SirsiDynix’s Bluecloud Visibility promise to convert MARC records to linked data in Bibframe and make a library’s OPAC more appealing to web crawlers. I have no idea if this actually works or not (though I’m dying to find out). For time-poor librarians and cash-strapped consortia, an off-the-shelf solution would have numerous benefits.
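I don't know how any vendor product does its conversion, so what follows is only a rough sketch of the general idea: take a few MARC fields and re-express them in a vocabulary that web crawlers already understand. I have swapped in schema.org JSON-LD rather than Bibframe because it is simpler to show. The record is a hand-simplified stand-in (real MARC has indicators, punctuation and repeatable subfields that this ignores), the subject headings are invented, and the function names are my own.

```python
import json

# Simplified stand-in for a MARC record: tag -> list of {subfield code: value}.
# The 650 subject headings here are illustrative assumptions.
marc_record = {
    "100": [{"a": "Dickens, Charles"}],
    "245": [{"a": "Great expectations"}],
    "650": [{"a": "Orphans"}, {"a": "Young men"}],
}


def first_subfield(record, tag, code="a"):
    """Return the first value of the given subfield in the given tag, if any."""
    for field in record.get(tag, []):
        if code in field:
            return field[code]
    return None


def marc_to_jsonld(record):
    """Map title (245), main author (100) and subjects (650) to schema.org."""
    doc = {"@context": "https://schema.org", "@type": "Book"}
    title = first_subfield(record, "245")
    if title:
        doc["name"] = title
    author = first_subfield(record, "100")
    if author:
        doc["author"] = {"@type": "Person", "name": author}
    subjects = [f["a"] for f in record.get("650", []) if "a" in f]
    if subjects:
        doc["about"] = [{"@type": "Thing", "name": s} for s in subjects]
    return doc


def as_script_tag(doc):
    """Wrap the JSON-LD so it can be embedded in an OPAC record page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(as_script_tag(marc_to_jsonld(marc_record)))
```

The point is less the mapping itself than where the output ends up: in the HTML of the record page, where a crawler can read it without knowing or caring what a 650 field is.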

But even the included Google screenshot in the article, featuring a suitably enhanced OPAC, has its problems. Firstly, the big eye-catching infobox to the right makes no mention of the library, but includes links to Scribd and Kobo, who have paid for such prominence. Secondly, while the OPAC appears at the top of the search results, the blurb in grey text includes boring bibliographical information instead of an eye-catching abstract, or even something like ‘Borrow “Great Expectations” at your local library today!’. Surely I’m not the only one who notices things like this…?

I’m keen to do a lot more research in this area to determine whether the promise of linked data will make library collections discoverable for today’s users and bring people back to libraries. I know I can’t fix the ILS. I can’t re-catalogue every item we have. I can’t even make a script do this for me. For now, research is the most practical thing I can do to help solve this problem. Perhaps one day I’ll be able to do more.

Further reading

Fujikawa, G. (2015). The ILS and Linked Data: a White Paper. Emeryville, CA: Innovative Interfaces. Retrieved from https://www.iii.com/sites/default/files/Linked-Data-White-Paper-August-2015.pdf

Papadakis, I. et al. (2015). Linked Data URIs and Libraries: The Story So Far. D-Lib 21(5-6), May-June 2015. Retrieved from http://dlib.org/dlib/may15/papadakis/05papadakis.html

Schilling, V. (2012). Transforming Library Metadata into Linked Library Data: Introduction and Review of Linked Data for the Library Community, 2003–2011. ALCTS Research Topics in Cataloguing and Classification. Retrieved from http://www.ala.org/alcts/resources/org/cat/research/linked-data

Acquisitions Battle: Library v. Archives

I get a real kick out of spending other people’s money. Acquisitions take up a lot of my time at work these days; I’m responsible for, among many other things, acquiring works by local authors and material about the history and culture of our city. Naturally, our budget is minuscule. Every dollar has to be spent wisely, and if we can acquire something for free, we go for it. 

The other day I was surprised to receive a couple of short films produced by one of the local community service organisations, featuring a few locals talking about their lives. I had no real reason to be surprised; after all, I’d asked them to send me a copy. What did surprise me was that the two films were on USB flash drives, one for each film. Somehow I’d been expecting a DVD. Another community service org had graciously sent me a DVD of one of their recent film projects, and I suppose I hadn’t considered the fact that not all organisations distributed their AV material the same way.

Our immediate problem was deciding whether or not the USB flash drives constituted library or archive material. While we make a point of collecting both, the line is somewhat blurry; some material accepted as part of a manuscript deposit occasionally duplicates library closed stack holdings and vice versa. Generally speaking, if an item has been formally published it goes into the library collection and is catalogued in the usual way. If it hasn’t been published it is treated as archive material and is subject to appraisal, copyright clearance etc. and has a finding aid created for it. 

Is a USB flash drive considered ‘published’ material? One drive had the (newer) film’s title printed on the side, clearly indicating an intent to distribute. The other drive was a generic one and held the older film. Neither had an ISBN or other barcode, and both could only be obtained by directly contacting the community organisation that produced them. The films themselves had been uploaded to YouTube by the community organisation, also suggesting that the participants had consented to their recordings being widely disseminated.

Because the DVD we received had been (almost automatically) treated as library material and given to our long-suffering cataloguer, I began to wonder whether the USB drives should be treated the same way. After all, if we had received DVDs instead of flash media, I wouldn’t have thought twice about adding them to the library stack. 

However, I ultimately decided, in consultation with my superior, to add the USB flash drives to our archival collections. The lack of an ISBN or any kind of commercial packaging was a factor, but the decider was the realisation that write-protecting flash drives is close to impossible. Even if we were to add the drives to our library stack and only permit users to use them in the building, we would have no way of knowing whether someone was tampering with the drive while they used it. A professionally-produced DVD is a read-only medium, which I think we would feel better about having in the library collection.

The major downside to classifying the drives as archive material is that it means a lot more work for us. Naturally, I hadn’t thought to request a deed of gift or copyright clearances from the community organisation, so we’ll have to chase that up. If they in turn didn’t ask the participants to sign anything (which is unlikely but possible), that will also create some difficulties. And of course, at some point I’ll have to copy the contents of the drives to our rudimentary digital preservation setup. I’ve wound up being responsible for that too, but that’s a story for another time. 

Tuesday: how it could revolutionise the Dewey Decimal System

I keep meaning to write this post when it’s not Tuesday. I also keep meaning to revolutionise library classification, but it’s slipped down my to-do list a few notches. Between looking for a new job, organising an overseas trip, writing a conference proposal and studying my last three MIS subjects, I’ve had a fair bit on. Happily, however, I’ve managed to find a spare hour for this most important discussion. Never mind the fact that library cataloguers and researchers have spent entire careers on this topic, I’m an Enthusiastic New Professional™ and I can accomplish anything! [citation needed]

The inspiration for this post came from Hugh Rundle’s hilarious @lib_papers Twitter bot. It spits out nonsensical fake conference paper titles which, if you squint hard enough, could almost be real. Fortunately, however, I have the self-awareness to never style myself as an ‘entreprevational full-stack cybrarian’.

Now, to business. Plenty of authors before me have written on how terrible DDC is. It’s an antiquated, anglocentric, angst-inducing mess of a classification system. It assigns whole numbers to arcane topics and relegates vast areas of inquiry to lengthy strings (e.g. the etymology of classical Greek is awarded 482, but climate change, arguably one of the gravest issues of our time, is assigned 363.7387). It demands books on similar subjects be located far away from each other for reasons known only to a nineteenth-century white American man with a misogynist streak and a penchant for spelling reform.

DDC is so awful that growing numbers of libraries (mostly public) are choosing to do away with Dewey altogether. By ‘genrefying’ their collections, librarians and technical services staff are reclaiming their shelf order and reasserting their right to shelve a book where they see fit, not where ~Dewey~ sees fit. I’ve read many a report on the outcomes of genrefication, particularly in fiction collections and in schools, and so far I’ve been very impressed.

My first exposure to genrefication came with a visit to the (then temporary) City of Perth Library as part of a CSU study trip. (I don’t live in Perth, in case you were wondering how I had never visited the city library there.) Like any good mid-degree LIS student angling for a career in technical services, I was suitably horrified by the library’s decision to sort their print collections by genre. On reflection, however, I think the idea outraged me only because it was completely foreign. I was so thoroughly immersed in the Dewey-centric narrative promulgated by library schools everywhere that I had never considered the idea that classification could be done differently.

Certain stripes of librarians take classification really seriously. Perhaps too seriously. And I say this as someone who genuinely enjoys cataloguing. As long as a patron has a reasonable chance of finding a given book on a shelf, armed either with OPAC search results or an ability to read directional signs, and as long as that book is located adjacent to other books on similar topics and/or in a reasonably intuitive place, who gives a shit what call number it’s got?

This is not to say that I support eradicating call numbers entirely. I don’t. I believe that we as librarians owe it to the public to come up with a system that doesn’t completely suck.

There is absolutely no need for library users to have to learn such a convoluted and inconsistent system. In Dewey’s day, libraries were typically closed-stack affairs anyway — the only people who had any need to learn the classification system were the library staff, for whom the idea of ‘browsability’ was not an issue. In an age where bookshops are organised by genre and video rental shops (R.I.P.) were similarly classified, why is it anathema for libraries, especially public and school libraries, to arrange their wares in a similar manner?

Dewey is easier for librarians, not for patrons. Dewey means technical services staff don’t have to classify every item from scratch if they don’t want to or can’t. Ostensibly, Dewey also means that any book on a given topic will have roughly the same call number anywhere Dewey is used. Yet I’ve come across numerous examples in the course of my work, in a library which uses Dewey for its modest physical collection, where the same item was given wildly different call numbers depending on the cataloguer. I found one edition of The Best Australian Science Writing, a monograph in annual series, in 500 and another in 800. Learning the implementation of Dewey in one library does not guarantee it will be the same elsewhere.
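Spotting this kind of drift doesn't need anything fancy. Below is a small sketch, assuming you can export your holdings as rows of series title, edition and call number. The rows are invented for illustration (only the 500 versus 800 split comes from my own shelves), and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical export: (series title, edition label, Dewey call number).
# Edition labels and the second title are made up for illustration.
holdings = [
    ("The Best Australian Science Writing", "edition A", "500"),
    ("The Best Australian Science Writing", "edition B", "800"),
    ("A Hypothetical Local History", "edition A", "994.7"),
    ("A Hypothetical Local History", "edition B", "994.71"),
]


def main_class(call_number: str) -> str:
    """Reduce a Dewey number such as '363.7387' to its main class, '300'."""
    return call_number.strip()[0] + "00"


def find_inconsistent(rows):
    """Return titles whose editions sit in more than one Dewey main class."""
    classes = defaultdict(set)
    for title, _edition, call_number in rows:
        classes[title].add(main_class(call_number))
    return {title: found for title, found in classes.items() if len(found) > 1}


for title, found in find_inconsistent(holdings).items():
    print(f"{title}: shelved in {', '.join(sorted(found))}")
```

Comparing only the main class is deliberately crude. It ignores small differences after the decimal point and flags only the cases where two editions of the same thing end up in different parts of the building.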

Alarmingly, I’ve reached almost 800 words and have yet to present any kind of workable alternative to Dewey. I know there’s one out there, though. In the coming weeks and months I intend to devote some of my spare brainpower to the idea, once I’ve finished all the other things I noted above. But the @lib_papers bot has, amusingly, almost come full circle. I look forward to one day genuinely presenting a paper on how Tuesday will help revolutionise DDC. Further thoughts on that will, alas, have to wait for another Tuesday.