So what’s next?: five things I learned at #GOGLAM

Yesterday I had the great privilege of attending the GO GLAM miniconf, held under the auspices of the Linux Australia conference. Hosted by the fabulous Bonnie Wildie and the indefatigable Hugh Rundle, GO GLAM brought the power and promise of open-source software to the GLAM sector. This miniconf has been run a couple of times before but this was my first visit. It was pretty damn good. I’m glad I started scribbling down notes, otherwise it would all be this massive blend of exhausted awesome.

The day began with an opening keynote by some guy called Cory Doctorow, but he wasn’t very interesting so I didn’t pay much attention. He did talk a lot about self-determination, and he did use the phrase ‘seizing the means of computation’ that I definitely want on a t-shirt, but there was a big ethics-of-care-sized gap at the centre of his keynote. I found myself wishing someone would use the words ‘self-determination’ and ‘social responsibility’ in the same talk.

Good tech platforms can exist, if we care enough to build them. As it happened, GO GLAM’s first speakers, a group of five mostly francophone and mostly Indigenous artists and coders from what is now eastern Canada, wound up doing almost exactly this. Natakanu, meaning ‘visit each other’ in the Innu language, is an ‘Indigenous-led, open source, peer to peer software project’, enabling First Nations communities to share art, data, files and stories without state surveillance, invasive tech platforms or an internet connection. I can’t express how brilliant this project is. I’m still so deeply awed and impressed by what this team have built.

Demo gif of the Natakanu client. Image courtesy Mauve Signweaver

Two things leapt out at me during this electrifying talk—that Natakanu is thoughtful, and that it is valuable. It consciously reflects First Nations knowledge cultures, echoing traditions of oral history, and exemplifying an ‘approach of de-colonized cyberspace’. Files are shared with ‘circles’, where everyone in a circle is assumed to be a trusted party, but each member of that circle can choose (or not) to share something further. Building a collective memory is made easier with Natakanu, but the responsibility of doing so continues to rest with those who use it.

Natakanu embodies—and makes space for—First Nations sovereignties, values and ethics of care. It’s technology by people, for people. It’s a precious thing, because our communities are precious, too. The Natakanu platform reflects what these communities care about. Western tech platforms care about other things, like shouting at the top of your lungs to ten billion other people in an agora and algorithmically distorting individuals’ sense of reality. We implicitly accept these values by continuing to use these platforms. Our tech doesn’t care about us. We could build better tech, if we knew how and chose to. (There’s a reason I’ve been consciously trying to spend less time on Twitter and more time on Mastodon.) But more on computational literacy a little later.

A few people mentioned in the Q&A afterwards how they’d love to bring Natakanu to Indigenous Australian communities. I don’t doubt their intentions are good (and Hugh touched on this in the recap at the end of the day), but in my (white-ass) view the better thing is to empower communities here to build their own things that work for them. A key aspect of Reconciliation in this country is developing a sense of cultural humility: recognising when your whitefella expertise might be valuable and offering it, knowing when to quietly get out of the way, and knowing which decisions are actually yours to make. Or, as speaker Mauve Signweaver put it, ‘instead of telling them “tell us what you need and we’ll make it for you”, saying “Tell us what you need and we’ll help you make it”’.

I can’t wait to rewatch this talk and catch up on some parts I know I missed. It was absolutely the highlight of the entire miniconf. I couldn’t believe they were first-time speakers! Can they do the keynote next year?

Metadata and systems might not last forever, but we can still try. I think it’s safe to say many attendees were very taken with Arkisto, the ‘open-source, standards-based framework for digital preservation’ presented by Mike Lynch. It’s a philosophical yet pragmatic solution to describing, packaging and contextualising research data. Arkisto’s framework appears particularly useful for rescuing and re-housing data from abandoned or obsolete platforms (such as an Omeka instance where the grant money has run out and the site is at risk of deletion).

Arkisto describes objects with RO-Crate (Research Object Crate, a metadata format built on Schema.org) and stores them in the Oxford Common File Layout (OCFL), a standardised directory layout that keeps content and metadata together on disk. It’s actively not a software platform, and it’s not a replacement for traditional digipres activities like checksums. It’s a bit like applying the philosophy of static site generators to research data management: a minimalist, long-term, sustainably-minded approach that manages data in line with the FAIR principles. It also recognises that researchers have short-term incentives not to adequately describe or contextualise their research data (no matter how much librarians exhort them to) and tries to make doing the right thing easier for them.
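For the visually minded: an OCFL object is nothing more exotic than a carefully arranged directory tree with fixity baked in. Here’s a minimal sketch of how a single Arkisto-style object might sit on disk, with hypothetical file names (the real spec has more moving parts, such as per-version copies of the inventory):

```
storage-root/
├── 0=ocfl_1.0                    # 'namaste' file declaring an OCFL storage root
└── our-object/
    ├── 0=ocfl_object_1.0         # declares an OCFL object
    ├── inventory.json            # versions and file digests (fixity lives here)
    ├── inventory.json.sha512     # digest of the inventory itself
    └── v1/
        └── content/
            ├── ro-crate-metadata.json   # RO-Crate description of the dataset
            └── data/
                └── interview-01.wav     # the payload
```

Everything is plain files and plain JSON, which is rather the point: no database, no server, nothing to keep running.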

The new PARADISEC catalogue includes Arkisto and an associated web interface, Oni, as part of its tech stack. I was very taken with the catalogue’s principle of ‘graceful degradation’—even if the search function ceases to operate, browsing and viewing items will still work. As a former web archivist I was heartened to see them holding this more limited functionality in mind, an astute recognition that all heritage, be it virtual, environmental or built, will eventually decay. So much of my web archiving work involved desperately patching dynamic websites into something that bore a passing resemblance to what they had once been. We might not always be able to save the infrastructure, but one hopes we can more often save the art, the data, the files, the stories. (Which reminds me, I’ve had Curated Decay on my to-read shelf for far too long.)

I shouldn’t have needed reminding of this, but sometimes I forget that metadata doesn’t begin and end with the library sector. It was a thrill to hear someone in a related field speaking my language! I wanna hang out with these people more often now.

Generosity resides in all of us. My first impressions of Hugh Rundle’s talk were somewhat unfavourable—he only spent a couple of minutes talking about the bones of his project, a Library Map of every public library in Australia, and instead dedicated the bulk of his time to complaining about the poor quality of open datasets. Despite having had several sneak previews I was rather hoping to see more of the map itself, including its ‘white fragility mode’ and the prevalence of fine-free libraries across the country. Instead I felt a bit deflated by the persistent snark. Hugh was the only speaker to explicitly reference the miniconf’s fuller title of ‘Generous and Open GLAM’. But this felt like an ungenerous talk. Why did it bother me?

Perhaps it’s because Hugh is a close friend of mine, and I expected him to be as kind and generous about the failings of Data Vic as he is about my own. I’m not sure I held other speakers to that high a standard, but I don’t think anyone else was quite as mean about their data sources. I also hadn’t eaten a proper breakfast, so maybe I was just hangry, and I ought to give Hugh the benefit of the doubt. After all, he had a lot on his plate. I intend to rewatch this talk when the recordings come out, to see if I feel the same way about it on a full stomach. I hope I feel differently. The Library Map really is a great piece of software, and I don’t think Hugh quite did it justice.

Homepage of Hugh Rundle’s Library Map

Omg pull requests make sense now. Liz Stokes is absolutely delightful, and her talk ‘Once more, with feeling!’ was no exception. Her trademark cheerfulness, gentleness and generosity shone in this talk, where she explored what makes a comfortable learning environment for tech newbies, and demonstrated just such an environment by teaching us how GitHub pull requests worked. How did she know that I desperately needed to know this?! Pull requests had just never made sense to me—until that afternoon. You ‘fork’ a repository by copying it to your space, then make changes you think the original repo would benefit from, then leave a little note explaining what you did and ‘request’ that the original owner ‘pull’ your changes back into the repo. A pull request! Amazing! A spotlight shone upon my brain and angels trumpeted from the heavens. This made my whole day. Hark, the gift of knowledge!
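For future reference (mine, mostly), the whole dance looks something like this on the command line. All names here are hypothetical, and the fork button and the pull request itself live on the GitHub website:

```
# 1. Fork the repo on GitHub (the 'Fork' button), then clone YOUR copy
git clone https://github.com/mylogin/some-repo.git
cd some-repo

# 2. Make your improvements on a branch
git checkout -b fix-readme-typo
git add README.md
git commit -m "Fix typo in README"

# 3. Push the branch back up to your fork...
git push origin fix-readme-typo

# 4. ...then open a pull request on GitHub, with a little note
#    explaining what you did and why
```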

Liz also touched on the value of learning how to ‘think computationally’, a skill I have come to deeply appreciate as I progress in my technical library career. I’ve attended multiple VALA Tech Camps (including as a presenter), I’ve done all sorts of workshops and webinars, I’ve tried learning to code umpteen times (and just the other day bought Julia Evans’ SQL zine Become a SELECT Star! because I think I’ll shortly need it for work), but nowhere did I ever formally learn the basics of computational thinking. Computers don’t think like humans do, and in order to tell computers what we want, we have to learn to speak their language. But so much learn-to-code instruction attempts to teach the language without the grammar.

I don’t have a computer science background—I have an undergraduate degree in classics, and am suddenly reminded of the innovative Traditional Grammar course that I took at ANU many years ago. Most students come to Classical Studies with little knowledge of grammar in any language; instead of throwing them headfirst into the intricacies of the ancient languages, they learn about the grammars of English, Latin and Ancient Greek first and together. This gives students a solid grounding of the mechanics of language, setting them up for success in future years. Programming languages need a course like Traditional Grammar. Just as classicists learn to think like Romans, prospective coders need to be explicitly taught how to think like computers. A kind of basic computational literacy course.

(Of all the things I thought I’d get out of the day, I didn’t expect a newfound admiration of Professor Elizabeth Minchin to be one of them.)

Online confs are awesome! I’m somewhat late to the online conference party, so GO GLAM was my first experience of an exclusively online conference. I’ve watched a handful of livestreams before, but it just isn’t the same. A bit like reading a photocopied book. I don’t think I had any particular expectations of LCA; I figured I’d sat in on enough Zoom webinars, it’d be a bit like that, right? Wrong. The LCA audio-visual and conference tech stack was an absolute thing of beauty. Everything looked a million bucks, everything was simple and easy to use. It was a far more active watching experience than simply tucking into a livestream—the chat box on the right-hand side, plus the breakout Q&A areas, helped me feel as if I were truly part of the action. I didn’t usually have a lot to say past ‘That was awesome!’ but it was far less intimidating than raising my hand at an in-person Q&A or cold-tweeting a speaker after the fact.

As someone who is deeply introverted, probably neurodivergent and extremely online, virtual conferences like GO GLAM are so much more accessible than their real-life counterparts. I didn’t have to travel, get up early, put on my People Face™, spend hours in a bright and noisy conference hall, eat mediocre food, make painful small talk, take awkward pictures of slides and furiously live-tweet at the same time, massively exhaust myself and make a mad dash for the exit. Instead I could have a nap, grab another pot of tea, turn the lights down, share links in the chat, clap with emojis, watch people make great connections, take neat and tidy screenshots of slides, squeeze in a spot of Hammock Time and still be feeling excited by it all at the end of the day.

I’m sure people will want to return to some form of physical conferencing in the fullness of time, but I fervently hope that online conferencing becomes the new norm. This infrastructure exists, it costs a lot less than you think (certainly less than venue hire and catering), and it makes conferences accessible to people for whom the old normal just wasn’t working. Please don’t leave us behind when the world comes back.

Using web archives for document supply: a case study

Today I was asked to help with a curly document supply request. A distance student was looking for a particular article, which my colleagues had been unable to locate. Usually we think of document supply as resource sharing, but today was really more about resource finding. It’s also similar to reference queries about how to find journal articles, which we get all the time.

It wound up being so difficult—and interesting!—that I thought others might like to know how it was done. This is also partly so that if my colleagues decide they want me to present a training session on this, I’ve already got the notes written up… teehee.

The request

The details I received looked like this:

Journal Title: Risk & Regulation
Publisher: CARR LSE
Volume / Issue: Issue 3
Part Date: Spring 2002
Call Number:

Title: Japan: Land of the Rising Audit?
Article Author: Michael Power
Pages: 10 ff

My colleagues initially thought this was a book chapter request, but the book they’d found didn’t quite match these details, at which point they roped me into the search.

Catalogue search

Step 1: Search our local catalogue. This is standard for all document supply requests—you’d be surprised how often people ask for things we already have. I consider it a learning and teaching opportunity (and sometimes also a reminder that print books and serials still exist). In this instance, we didn’t have anything with this title in our catalogue.

Step 2: Search Libraries Australia, the national union catalogue. If another Australian library held this serial, we would request it on the patron’s behalf through the Libraries Australia Document Delivery (LADD) system, of which most Australian libraries are members. I didn’t have an ISSN, so I had to go on title alone.

Good news: there is a record in LA for this serial, so I could confirm it exists. Bad news: no library in Australia holds it. (Records without holdings are common in LA, as many libraries use it as an acquisitions tool.)

Step 3: Search SUNCAT, the UK serials union catalogue. I realised later that I didn’t really need this step: the only extra info SUNCAT had that LA didn’t was a list of UK institutions holding copies. (Which I obviously couldn’t get at.) However, it wasn’t until this point that I noticed the note stating ‘Also available via the internet.’ Which got me thinking—is this an OA online journal? It would explain the lack of local holdings if it was just on the internet…

Web archive search

Step 4: Google the journal title. Yes, Google, like a real librarian.

There is a distinct possibility that I own this particular shirt

Turns out the journal Risk & Regulation is indeed published free and online by the London School of Economics, AND they have back issues online! … going back to 2003. The one I need is from 2002, because of course it is.

Step 5: Search the UK Web Archive. Knowing the journal was a) a UK title and b) online at some point, I then turned to web archives to find a copy. I searched on the article title, it being more distinctive than the journal title, and also because a more specific search would get me results faster. This brought me to an archived LSE news page from 2002.

The LSE news page provided a link to the journal page—but the UK Web Archive hadn’t preserved it! Argh!

Step 6: Search the Wayback Machine. All was not lost, however. Because I was now armed with a dead URL that had once linked to the journal page I needed, I could go straight to the Wayback Machine, part of the Internet Archive, and simply plug in the URL to find archived copies of that page. The Wayback Machine recently launched a keyword search functionality, but it’s still a work in progress. My experience suggests this site functions best when you know exactly where to look.
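If you’ve never used it this way before: the Wayback Machine will take the original URL appended straight onto its own address (a made-up URL here, since I won’t reproduce the real one):

```
# Every capture of a page, displayed on a calendar:
https://web.archive.org/web/*/http://example.org/riskandregulation/issue3

# Or the capture closest to a given date (here, sometime in 2002):
https://web.archive.org/web/2002/http://example.org/riskandregulation/issue3
```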

I had to fiddle around with the URL slightly, but I eventually got to the journal landing page. Remembering that I needed issue 3 from Spring 2002, I clicked on the link to the relevant PDF—also archived!—and quickly located the article.

Step 7: Email the article to the student and give them the good news. They thanked me and asked how I found it, so I gave them a shorter version of the above in the hope they might find it useful in future. I made sure to reassure them that this kind of search is genuinely difficult, that there’s often no single place to look (they had wondered what search terms they ought to have used), and that if they were ever stuck in future they should just ask a librarian—it’s what we’re for. 🙂

Conclusions

Web archives aren’t usually the search target of choice for reference and document supply staff, but they are an absolute goldmine of public information, particularly for older online serials that may have vanished from the live web. Many researchers (and librarians, for that matter) don’t know much about web archives, if anything, so cases like this are a great way to introduce people to these incredible resources.

This was also a bit of a proud moment for me, I won’t lie. It’s so good to have moments like this every now and again—it helps me demonstrate there’s still a place for professional document hunters.

A #digipres reading list for the total beginner

This is part of an occasional series, “Digital Preservation For the Rest of Us”.

Sorry, Kassi, I know I said I’d post this days ago!

If you’re a digital preservation beginner, you might be looking for a great resource to help you catch up on where the sector is at. This brief post includes a few choice books and other resources for digipres beginners. They’re in no particular order, and are totally my own opinions.

For the complete beginner, it’s hard to go past the Digital Preservation Handbook, hosted by the Digital Preservation Coalition. It provides accessible, non-technical introductions to the topic, as well as plenty of videos, task lists and links to other resources. Have a read of the ‘Digital Preservation Briefing’ if you need a gentle introduction.

For a holistic view of digital preservation, I can’t go past The Theory and Craft of Digital Preservation by Trevor Owens. The preprint is on LISSA right now, with the monograph due out in early 2018. It does a magnificent job of explaining not just the nuts and bolts of digipres, but the underlying philosophy and theory that informs our practice. I’ve been recommending this since the day the preprint went up, and I fully expect this will be a widely-used textbook for students in the field.

If you’re near a print library or repository of some kind, you probably want a few things from this pile:

In particular, I recommend Practical Digital Preservation: a how-to guide for organizations of any size by Adrian Brown (full of firm, practical advice), Is Digital Different? edited by Michael Moss, Barbara Endicott-Popovsky and Marc J. Dupuis (hint: yes) and, if you’re new to archives and preservation in general, Archives: principles and practices by Laura Millar (I have the 1st edition, but I hear the 2nd is even better).

Due out in March next year is the third edition of Preserving Digital Materials by Ross Harvey and Jaye Weatherburn. Both authors are Australian (woo!), and the book promises to be a one-stop shop for digital preservation practitioners. I’ll definitely be getting a copy when it comes out.

Re-collection: Art, New Media and Social Memory by Richard Rinehart and Jon Ippolito examines the topic from a curatorial perspective, which may be more accessible to those with museum or gallery backgrounds. I admit I haven’t read this myself, so I’m recommending it sight unseen, but the authors definitely know their stuff.

Finally, for a light-hearted look at the access side of digital preservation, have a look at ‘Accessing born-digital content: a look at the challenges of born-digital content in our collection’ by the NLA’s Gareth Kay. It’s a nice illustration of why digital preservation matters—works will be lost forever if they’re not preserved!

I hope this list is a useful one! Let me know if I missed any good resources 🙂

Digital archiving for journalists and writers

This post is part of an occasional series, “Digital Preservation For the Rest of Us”.

Don’t let it happen to you. (Picture courtesy Pixabay.com, CC-0)

Background

Ever heard the saying ‘the internet is forever’? Well, I’ve got good news and bad news. The internet does retain a staggeringly huge amount of information, but it doesn’t always last.

In the last couple of days we’ve heard about the abrupt shutdown of news organisations DNAinfo and Gothamist, with the sites being summarily yanked off the internet. Within hours, people realised that if those sites were gone for good, journalists and other contributors would have no way of verifying their work history, and years of valuable local journalism could be lost.

This followed the ABC’s recent decision to remove a few years’ worth of At the Movies videos, part of a transition plan for the websites of programs that have ceased broadcasting. Researchers were horrified by the idea that the ABC could simply ‘erase history’ by removing content from the public internet. Many commented on the avalanche of link rot the ABC had created.

While the At the Movies website was archived by the NLA’s Pandora service, the videos themselves were not archived (presumably for space and technical reasons). The ABC have also publicly stated they intend to move older video content from past shows to a better online archive. Compare that with Gothamist, which has found itself at the mercy of the Internet Archive and cached Google search results. A fair amount of content had been saved to the Internet Archive, but there are likely still gaps. It also highlighted how many people weren’t keeping personal archives of their work.

Key lessons

The internet is not your archive. I can’t emphasise this enough. The public internet is not—and was never designed to be—a permanent archive. Websites can be put up or taken down at a moment’s notice. Just because something is online right now doesn’t mean it will still be online tomorrow, or next week, or next year. We can’t expect corporations and private organisations to archive their published work in perpetuity and have it be the only copy. That’s what libraries and archives are for. (Libraries around the world undertake national web archiving programs, including the NLA and the Library of Congress, but they can’t collect everything, and most can only collect material published or produced in their own country.)

You cannot rely on others to archive your work. You will need to do this yourself. The best way to capture content in perpetuity, whether it’s physical or virtual, is with a mix of public and private archiving. That is, with archival tools and collecting policies controlled by public entities, by private entities, and by you personally. If one fails, the other two should persist. If all three fail, you’ve probably got bigger things to worry about.

How to archive your online articles

Here’s a selection of free tools to help you capture and archive your digital content.

  • Save to Evernote. Evernote is a free cloud-based notes app for every platform you’d care to name. It’s good for notetaking, but the killer feature is its Web Clipper extension, which scrapes web pages and saves them straight to a note. I use this religiously to keep all my internet detritus in one place, but you can use it to save copies of your online work.
  • Add to the Internet Archive. The Internet Archive, perhaps the best-known digital archive, incorporates the Wayback Machine, a privately-run web archiving service that has been hoovering up the web since 1996. You can add individual pages to the Archive in several ways, including by copying and pasting a URL into this page (see the sketch after this list), or by using a clipping extension (available for Chrome, Safari and Firefox, with apps available for iOS and Android). The extension will also detect dead pages or 404s and offer to take you to an archived version of that page, which is incredibly useful.
The Internet Archive web clipper. (Screenshot via Chrome clipper)
  • Create a personal web archive with Webrecorder. Webrecorder is an amazing web archiving tool built by Rhizome. You navigate to the pages you wish to save, creating a personalised set of archived pages. You can then download this set to your computer, view it with the accompanying Webrecorder desktop app, and—this is the best bit—the pages behave exactly as they did when you saved them! Video, animations, dynamic pages—they all work (this isn’t always the case with the Wayback Machine). Great for multimedia artists and people who wish to browse their archived work in its natural habitat.
  • Use Save My News. Save My News, a nifty little service brought to you by Ben Welsh, combines the cloud storage of the Internet Archive with the handy custom lists of Evernote or Webrecorder. Simply log in with Twitter, copy and paste a URL, and bam! Instantly saved in the Wayback Machine, neatly arranged in a list for your reference. So simple, even your dog could do it.
The Save My News interface. (Screenshot via http://www.savemy.news/)
  • Print articles to PDF. In a browser, simply choose to print your page (Ctrl-P / Command-P). Select the printer “Save as PDF” and choose where to save the file, creating a neat PDF copy of your work. Be aware that some articles may not look quite the same if you choose to print, and interactive features won’t translate well to a static format.
  • Print to actual paper, if you’re into that kind of thing. If you’re not entirely convinced by all these new-fangled digital storage options, there’s always paper. Obviously your work will lose all those interactive features like scrolling and clicking, and the stylesheets might not come out right, but your paper copies may well outlast your hard drive.
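As promised above, a final note on that second bullet: the Wayback Machine’s ‘Save Page Now’ feature lives at a predictable address, so you can trigger a capture of your work without installing anything (made-up article URL, obviously):

```
# Paste this into your browser's address bar, with your own URL after /save/
https://web.archive.org/save/https://example.org/2017/my-best-article
```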

Please feel free to share this post with anyone you think could use a personal archive of their own. Happy saving!

Disrespect des fonds! ✊ (or, Five things I learned from the NSLA digipres forum)

This week I went to the NSLA forum on day-to-day digital collecting and preservation, which began auspiciously enough.

The forum was an illuminating experience. I got a lot out of the event, including useful tips and programs I can incorporate into my workflow, and took so many notes I ran out of notebook! The below are my personal thoughts and observations of the event, which do not represent my employer (shout at me, not at them).

Reality isn’t keeping up with my user expectations and professional aspirations. When I first landed a library job (not the job I have now), I harboured grand dreams of preserving digital artefacts on a workplace’s asset management system, creating intricate descriptions of said digital artefacts, and excitedly sharing this knowledge with library users. I wound up being a shelver, but that’s not the point. The point is that I’m still dreaming. I keep thinking libraries are far more advanced, digitally speaking, than where we actually are. Librarians, as a profession, struggle to accept the idea that society has moved on without us. Digital preservation is seemingly no exception.

It was refreshing to hear at this forum that people were once scared of digital. Scared for their jobs. Scared of new, ~uncontrolled~ sources of information. Scared by the idea of reimagining and reinventing their place within libraries and their library’s place within society. Plenty of people still think like this, but you’ll never hear them admit it.

Please don’t get me wrong—there’s a lot of innovation in this sector, incredible work by passionate people with limited resources. I was very impressed by several presentations showcasing new, systemic ways of appraising, preserving and delivering digital content. I just… kinda thought we had them already. Are my expectations too high, or are our standards too low?

Linear archival theory is doing the digital world, and our attempts to capture it, a great disservice. Archival theory is built on the foundational ideas of ‘original order’, ‘provenance’ and ‘respect des fonds’ (i.e. an appreciation of a record’s context and intended purpose). Now, I’m not an archivist, nor do I play one on television. But it isn’t hard to see where, in a digital world, these core archival concepts might start to fall down a bit.

Archivists (and librarians, for the most part) are used to thinking in linear terms. Boxed collections are measured in linear metres of shelf space, our finding aids are (by and large) designed to be read from top to bottom, and a manuscript item can only be in one folder at once. Linear thinking. Paper-based thinking. Ordered thinking.

Our digital universe doesn’t work like this. Disks can be read in any order. Hypertext lets us explore information in many dimensions. We have become random-access thinkers and, by extension, random-access hoarders. Archival concepts must accommodate these ways of thinking—not ‘disordered’, just ordered in other ways. We were invited to ‘disrespect des fonds’, and I think it’s a smashing idea. It’s time to think differently. To accommodate non-linear ideas of what constitutes ‘original order’ and what digital and intellectual context may shape the fonds of the future. Spatial thinking. Byte-based thinking. Still ordered thinking.

Jefferson Bailey wrote a wonderfully in-depth essay on disrespecting the fonds in 2013, and I was reminded of it several times during this forum. It’s well worth a read.

Systems can’t do digital preservation. Only you can. My workplace doesn’t have the luxury of a digital preservation system (yet) and our current digipres practice is extremely haphazard and conducted on a needs basis by… me. Eek. There’s no denying a system that takes care of basic fixity and AIP arrangement would make my life a lot easier. But that system still wouldn’t do my job for me. Systems can’t select or appraise. They can’t negotiate rights agreements with donors or keep themselves well fed with storage space. They don’t have an appreciation of strategic priorities or the nuances of analytical metadata (subject headings and the like). That’s what I’m for. It’s important not to lose sight of the role of humans in what is (for those with the means) an increasingly automated process.

It’s also crucial for small- and medium-sized memory organisations, which will never have the resources enjoyed by NSLA members, to know that they don’t need a fancy system to preserve their digital heritage. So much digital preservation discussion is conducted in arcane, highly technical language, intelligible only to a small subset of information professionals. For digipres to gain any traction, it needs to be accessible to less technical librarians, and even to non-professional library workers. I want the volunteers at the Woop Woop Historical Society, whose tech knowledge may extend only to sending emails and posting pics of the grandchildren on Facebook, to understand the basics of digipres and be able to implement them. Distilling our communal knowledge down to this level promises to be almost as difficult as the process of preservation itself. But it’s vital work, and it can’t wait.

I have a lot of skills, knowledge and enthusiasm to bring to digital preservation. I didn’t present at the forum on account of a) a bad case of imposter syndrome and b) my workplace not having a whole lot to report in this area. I am also still a MIS student (yes! still!), am in a role where digipres is not explicitly part of my job description, and was almost certainly the youngest person in the room. All of those things worked together to convince me that I didn’t have anything worth saying.

However, I realised during the talks and discussions that far from being “just” a student, or “just” a local history librarian, or “just” a young’un, I actually have a lot to bring to the table:

  • I understand the broad lifecycle of digital preservation, from file creation to donation to fixity to ingest to preservation to access, and spend a lot of time contemplating the philosophy of what we do
  • I can catalogue, which I wasn’t expecting to be all that relevant to digipres, but it sounds like digitally-literate cataloguers are a rare breed, and
  • I can also learn quickly and methodically, such as last week when I successfully (and independently!) imaged and preserved a CD with BitCurator, for use by some student researchers. I learned how to do this via someone else’s notes from last year’s NSLA Digital Skills event, which I didn’t attend on account of being a shelver elsewhere.

Moreover, I’d like to think I know how much I don’t know; that is, there’s so much more for us as digipres practitioners to discover and to learn from each other, and we can’t afford to assume we know it all. The forum helped me gain a little self-esteem and reassured me that Australian digipres isn’t already full of people who have all the answers.

We can’t wait for everyone to get comfortable. Optical media won’t stop rotting while we learn how to deal with it. Film stocks won’t stop drowning in their own vinegar while we figure out what to do. Obscure file formats won’t give up their secrets of their own volition while we’re trying to nut them out. These problems are only going to get worse, irrespective of how quickly we as practitioners get our heads around them. Many of us are still grappling with digital preservation. Grappling. We’re still at the beginner stage.

There’s a very fine line between making people feel bad about the speed and scale of their own digipres programs, or about their personal knowledge, and encouraging them to keep looking to the horizon and recognise how far we all have to go. I say all this not to shame people, as I too am a beginner, but to express a broader worry about our ability as library employees to recognise and respond to digital change. By the sounds of it, some of our institutions are better at this than others.

In any case, I’d better get to work. I still need that floppy drive I’ve been dreaming about.

Further reading

Jefferson Bailey, Disrespect des Fonds: Rethinking Arrangement and Description in Born-Digital Archives (2013 article in Archive Journal)

Trevor Owens, Theory and Craft of Digital Preservation (preprint: monograph coming 2018)

There will be no GLAM 3017, because we will all be dead

I try not to think about where humanity might be in a thousand years. Based on our current trajectory, the most likely answer is ‘extinct’. Our current rate of consumption and pollution is not sustainable for anywhere near that length of time. When resources run out, there will inevitably be fierce wars over what little is left. Civilisation will end one of two ways: with a bang, or a whimper.

When we are all gone, we will leave behind an unfathomable amount of stuff. From priceless treasures representing the pinnacle of humanity, through the personal possessions and records of ordinary people, to mountains of rubbish and items of no assigned value. All of this stuff will begin to degrade. Bespoke climate-controlled environments will no longer protect precious materials; our natural environment will likely not be conducive to long-term preservation, either. It is inevitable that great works will be lost.

I’ve had Abby Smith Rumsey’s When We Are No More on my to-read pile for several months. I won’t get it read anytime soon, sadly, but her book touches on similar themes. Rumsey appears more optimistic than me; her book explores how people a thousand years from now will remember the early 21st century. I can’t help but admire her belief that humanity will exist at all.

This is a pessimistic worldview, to be sure. After all, modern capitalism is predicated on people buying stuff, which is in turn predicated on the constant production of stuff. Increasingly this ‘stuff’ is made from non-renewable materials, and sooner or later those materials will run out. Capitalism presents no incentive to preserve our scarce resources, because if a resource remains in the ground then less (or no) money can be made from it. The only real hope of changing this state of affairs lies in revolution, and that won’t be popular.

If, by some miracle, homo sapiens survives to 3017, it will not be a pleasant world. With the exhaustion of mineral resources will come a need to recycle or perish. If our choice becomes book-burning or starvation (we’ve all seen that scene in The Day After Tomorrow, right?), I doubt many would pick the latter. Technology will not save us. Our electronic memory will be irretrievable, our physical memory decayed if not destroyed. Perhaps our surviving collective descendants will despair at our modern habits of storing vast amounts of information on fragile pieces of metal and plastic, which require significant infrastructure to be accessed and read. A book (which, to be fair, we are also producing plenty of) requires nothing but a pair of functioning eyeballs.

I’d really like to believe that our species will survive, but nothing so far has convinced me. Knowledge and memory—and the externalisation thereof—are uniquely human traits. Without people to inhabit library buildings, without people to read books, without people to create and disseminate knowledge… our planet will be truly devoid of memory.

Then again, we live in a time of information abundance, and look where it’s gotten us. Perhaps we’re reaping what we sow.

I’m a document hipster. I only write in sustainable plaintext

No-one really needs three different word processing programs.

Yet that’s the situation I’m currently in. My six-year-old MacBook Pro is on its last legs and I’m desperately trying to eke out as much free storage space and processing power as possible, meaning a bit of spring cleaning is in order. Unfortunately for me, I’ve amassed text-based documents in (among others) .docx, .pages and .odt formats. Office for Mac has only recently added support for Open Document formats and I’m reluctant to get rid of LibreOffice, the originating program.

From a digital preservation perspective, my Documents folder is a mess. Converting all of these into more sustainable formats will, I’ve decided, be Future Alissa’s problem. But that doesn’t mean I need to keep living an unhealthy document lifestyle.

Instead, I’ve decided to try out one of the more intriguing lessons on The Programming Historian: ‘Sustainable Authorship in Plain Text using Pandoc and Markdown’. Any document that I would normally write in Pages will instead be written in a plaintext editor using Markdown and typeset in .docx or .pdf using Pandoc. I’ve been using Markdown for a while to write these blog posts, but Pandoc is a new experience.

Briefly, Markdown is a text markup language that is intended to be human-readable in a way HTML isn’t. Pandoc is a command-line program to convert one markup format into another, such as HTML to .docx (which at heart is an XML format). The primary benefit is that the manuscript (which is a plaintext .md file), will never need specialised word processing software to read and will remain intelligible to human eyes. Additional information that would otherwise be incorporated into a .docx or .pages file, such as bibliographic data and footnote stylesheets, is saved separately. These are also plaintext and easily human-readable.
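To illustrate with a trivial made-up manuscript: everything a word processor would bury in zipped XML sits here in plain sight.

```markdown
# My Manuscript

A paragraph with *italics*, **bold** and a [link](https://example.org).

- A bulleted list
- with a footnote[^1]

[^1]: Footnotes like this one are a Pandoc extension, not core Markdown.
```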

There are plenty of reasons to kick the word processor habit (neatly summarised in this blog post by W. Caleb McDaniel). Personally, I spend way too much time mucking around with formatting before I even begin to type. A plain-text typing environment has no such distractions, allowing me to concentrate on content. If I need to bold or italicise something, for example, I can do that in Markdown without interrupting my sentence flow.

You’d be forgiven for asking, ‘Why bother with all this, when there are easier options?’ Certainly it’s a challenge for those unfamiliar with the command line. There’s also a lot this method won’t include—complex tables, mail merge, interactive elements, et cetera. And yes, there are plenty of other distraction-free apps out there. In the long run, however, I’m looking forward to three things:
1) a more fruitful and painless typing experience
2) not wasting hours of my life converting documents from one format to another (yes, this has been known to take me hours) and
3) improving my command-line and markup skills.

What I did, briefly

After installing Pandoc, and following the Programming Historian’s instructions (though I chose to forego LaTeX and hence .pdf conversion for want of disk space), I created a nice little test .md file, incorporating images, links and footnotes, in a nice desktop plaintext editor called Atom.

The test .md file in Atom

I then ran a Pandoc command in Terminal to convert the .md file to a .docx file. Disappointingly, the program did not return anything to suggest it had been successful. A quick $ ls, however, revealed the new file.

The Terminal session
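For anyone playing along at home, the incantation was something like this (file names mine); Pandoc infers the output format from the file extension:

```
pandoc test.md -o test.docx
ls    # pandoc prints nothing on success, in true Unix fashion
```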

I also converted the .md manuscript into .odt and .html, just to see what might happen and if there were any differences.

How it ended up

As it turned out, the .docx and .odt conversions were missing the footnotes, and the .html was missing the title header (which is not standard Markdown, but rather a Pandoc extension), meaning that none of the target formats included 100% of the Markdown content. Considering I had done absolutely no styling, the .docx was surprisingly eye-catching.

The .docx output in Word
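In hindsight I have a guess about the HTML, at least (unverified, so season to taste): by default Pandoc writes a bare HTML fragment, and you have to ask for a standalone document to get a full page, title header included.

```
pandoc test.md -o test.html       # fragment only: no <head>, no title header
pandoc test.md -s -o test.html    # -s/--standalone wraps it in a complete page
```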

I don’t know why the footnotes went missing, but I plan to investigate before using Pandoc more extensively for research work. Despite not quite getting all the output I was promised, I wasn’t dissuaded from using Markdown and Pandoc for my long-form writing. The tutorial goes into some depth on footnotes and bibliographies, which I didn’t have time to test and which might well solve my problem.

Ironically, a copy of Matthew Kirschenbaum’s Tracked Changes, a history of word processors and their effect on the art of writing, arrived at the post office while I was compiling this article. In a way, adopting Markdown and Pandoc is an effort to get back to those, uh, halcyon days of formatting-free word processing. Hopefully when I re-examine my Documents folder in a few years’ time, it will be full of plaintext files!

Dear five-year-old me: you’ll never leave school

When I was five, my teacher went around my kindergarten class asking each of us what we wanted to be when we grew up. Most of the girls, as I recall, wanted to be hairdressers. Instead I proudly proclaimed that I wanted to be the first woman on the moon. Never mind the fact my eyesight is terrible and I get motion sickness on everything that moves. I was obsessed with space and I wanted to be an astronaut.

I’m pretty sure I got laughed out of class. My mum believed in me, though.

Twenty years later, I’m comfortable with my decision not to pursue a career in astronomy. Instead, I’m a few short months away from a professional qualification in librarianship. Yet I’m increasingly pessimistic about what that qualification will do for my career prospects. Sure, an MIS will adequately prepare me for a career in cataloguing or other technical services (in the library sense of the term). But recently I’ve found my interests heading more in the direction of systems librarianship, online information provision and digital preservation. And I’m no longer convinced an MIS alone will get me a job in those fields.

Undoubtedly some of this pessimism springs from the fact I’m currently between jobs. I’m in no position to be picky about what I accept, and I’m very aware that as a new professional I’m expected to spend some time in bottom-rung jobs, grinding, until someone retires and everyone levels up. Plenty of people have their degrees and work in non-LIS fields. At least I still have a few months before I graduate.

Recently I’ve spent a fair bit of time reading Bill LeFurgy’s insightful 2011 blog post ‘What skills does a digital librarian or archivist need?‘ and browsing the websites of various digital preservation thinktanks. Combined with some valuable insight from followers on Twitter (for which many thanks!), I’ve begun mulling over what sorts of attributes I ought to have in order to make it in the digital GLAM sphere.

  • Appreciation of library and archival principles — I’m looking at my copy of Laura Millar’s Archives: principles and practices right now and I know I’d never be a good archivist without it. A solid grounding in theory tells me that digital archiving still adheres to many of the ground rules of paper or physical archiving. This kind of thing is library school bread and butter.

  • Quickly learn new skills — this is a given in a profession fighting for its very existence. Every year more workflows move online, more material is added to (and removed from) the web, more file formats and media types are created. As new ways of research, outreach and preservation are invented, staff need to not just ‘keep up’ but actively be on top of new developments in the field. Perhaps even doing the developing themselves!

  • Be able to code in Python/PHP/Ruby/HTML/SQL/etc etc — this is where LIS programs on their own tend to fall down. Countless job adverts note their preference for a candidate who can code, but LIS students from non-STEM backgrounds (of which I am one) are likely to graduate with an awareness of current technology but no concrete coding skills. Web development is an elective at CSU, which I opted not to take on account of I can already write HTML and CSS reasonably well, but students are left to develop more technical skills on their own. I’m thrilled to have recently discovered The Programming Historian, which blends programming skills with cultural heritage corpora to make digital humanities accessible to all. People don’t go to library school to learn to code, but the world is increasingly expecting library students to acquire these skills.

  • Bridge the digital divide — by which I mean digital archivists need to be able not just to immerse themselves in this strange new digital world, but relate it back to archive users and researchers who may not be technologically literate. Self-service information provision will not be the answer for all users; some people will still need the assistance of a professional to find what they need. Sustaining the human face of digital memory institutions is essential if we still want to have jobs in ten years.

While writing this post I came across A Snapshot of a 21st-Century Librarian, a fascinating account of a research librarian’s work in an academic library. Pointedly, she mentioned taking graduate classes even as a tenure-track librarian to keep up with the changes in her field. I can easily see myself taking a similar path — whatever the MIS hasn’t taught me, I’ll need to learn elsewhere. I do, however, feel like I have a lot of catching-up to do. Five-year-old me would have been aghast at the idea of never leaving school, but then again, five-year-old me had no conception of what a digital archivist is, much less the idea that I could one day become one. Being an astronaut would have looked like a pretty safe bet.