A #digipres reading list for the total beginner

This is part of an occasional series, “Digital Preservation For the Rest of Us”.

Sorry, Kassi, I know I said I’d post this days ago!

If you’re a digital preservation beginner, you might be looking for a great resource to help you catch up on where the sector is at. This brief post will include a few choice books and other resources for digipres beginners. They’re in no particular order, and are totally my own opinions.

For the complete beginner, it’s hard to go past the Digital Preservation Handbook, hosted by the Digital Preservation Coalition. It provides plenty of accessible, non-technical introductions to the topic, as well as videos, task lists and links to other resources. Have a read of the ‘Digital Preservation Briefing’ if you need a gentle introduction.

For a holistic view of digital preservation, I can’t go past The Theory and Craft of Digital Preservation by Trevor Owens. The preprint is on LISSA right now, with the monograph due out in early 2018. It does a magnificent job of explaining not just the nuts and bolts of digipres, but the underlying philosophy and theory that informs our practice. I’ve been recommending this since the day the preprint went up, and I fully expect this will be a widely-used textbook for students in the field.

If you’re near a print library or repository of some kind, you probably want a few things from this pile:

In particular, I recommend Practical Digital Preservation: a how-to guide for organizations of any size by Adrian Brown (full of firm, practical advice), Is Digital Different? edited by Michael Moss, Barbara Endicott-Popovsky and Marc J. Dupuis (hint: yes) and, if you’re new to archives and preservation in general, Archives: principles and practices by Laura Millar (I have the 1st edition, but I hear the 2nd is even better).

Due out in March next year is the third edition of Preserving Digital Materials by Ross Harvey and Jaye Weatherburn. Both authors are Australian (woo!), and the book promises to be a one-stop shop for digital preservation practitioners. I’ll definitely be getting a copy of this when it comes out.

Re-collection: Art, New Media and Social Memory by Richard Rinehart and Jon Ippolito examines the topic from a curatorial perspective, which may be more accessible to those with museum or gallery backgrounds. I admit I haven’t read this myself, so I’m recommending it sight unseen, but the authors definitely know their stuff.

Finally, for a light-hearted look at the access side of digital preservation, have a look at ‘Accessing born-digital content: a look at the challenges of born-digital content in our collection’ by the NLA’s Gareth Kay. It’s a nice illustration of why digital preservation matters—works will be lost forever if they’re not preserved!

I hope this list is a useful one! Let me know if I missed any good resources 🙂

Digital archiving for journalists and writers

This post is part of an occasional series, “Digital Preservation For the Rest of Us”.

Don’t let it happen to you. (Picture courtesy Pixabay.com, CC-0)

Background

Ever heard the saying ‘the internet is forever’? Well, I’ve got good news and bad news. The internet does retain a staggeringly huge amount of information, but it doesn’t always last.

In the last couple of days we’ve heard about the abrupt shutdown of news organisations DNAinfo and Gothamist, with the sites being summarily yanked off the internet. Within hours, people realised that if those sites were gone for good, journalists and other contributors would have no way of verifying their work history, and years of valuable local journalism could be lost.

It followed the ABC’s recent decision to remove a few years’ worth of At the Movies videos as part of a transition of older websites for programs that have ceased broadcasting. Researchers were horrified by the idea that the ABC could simply ‘erase history’ by removing content from the public internet. Many commented on the avalanche of link rot the ABC had created.

While the At the Movies website was archived by the NLA’s Pandora service, the videos themselves were not archived (presumably for space and technical reasons). The ABC have also publicly stated they intend to move older video content from past shows to a better online archive. Compare that with Gothamist, which has found itself at the mercy of the Internet Archive and cached Google search results. A fair amount of content had been saved to the Internet Archive, but there are likely still gaps. It also highlighted how many people weren’t keeping personal archives of their work.

Key lessons

The internet is not your archive. I can’t emphasise this enough. The public internet is not, and was never designed to be, a permanent archive. Websites can be put up or taken down at a moment’s notice. Just because something is online right now doesn’t mean it will still be online tomorrow, or next week, or next year. We can’t expect corporations and private organisations to archive their published work in perpetuity and have it be the only copy. That’s what libraries and archives are for. (Libraries around the world, including the NLA and the Library of Congress, undertake national web archiving programs, but they can’t collect everything, and most can only collect material published or produced in their country.)

You cannot rely on others to archive your work. You will need to do this yourself. The best way to capture content in perpetuity, whether it’s physical or virtual, is with a mix of public and private archiving. That is, with archival tools and collecting policies controlled by public entities, by private entities, and by you personally. If one fails, the other two should persist. If all three fail, you’ve probably got bigger things to worry about.

How to archive your online articles

Here’s a selection of free tools to help you capture and archive your digital content.

  • Save to Evernote. Evernote is a free cloud-based notes app for every platform you’d care to name. It’s good for notetaking, but the killer feature is its Web Clipper extension, the ability to scrape web pages and save them straight to a note. I use this religiously to keep all my internet detritus in one place, but you can use this to save copies of your online work.
  • Add to the Internet Archive. The Internet Archive, perhaps the most well-known digital archive, incorporates the Wayback Machine, a privately-run web archiving service hoovering up the web since 1996. You can add individual pages to the Archive in several ways, including by copying and pasting a URL into this page, or by using a clipping extension (available for Chrome, Safari and Firefox, with apps available for iOS and Android). The extension will also detect dead pages or 404s and offer to take you to an archived version of that page, which is an incredibly useful tool.
The Internet Archive web clipper. (Screenshot via Chrome clipper)
  • Create a personal web archive with Webrecorder. Webrecorder is an amazing web archiving tool built by Rhizome. You can navigate to the pages you wish to save, creating a personalised set of archived pages. You can then download this set to your computer, view it with the accompanying Webrecorder desktop app, and (this is the best bit) the pages behave exactly as they did when you saved them! Video, animations and dynamic pages all work (this isn’t always the case with the Wayback Machine). Great for multimedia artists and people who wish to browse their archived work in its natural habitat.
  • Use Save My News. Save My News, a nifty little service brought to you by Ben Welsh, combines the cloud storage of the Internet Archive with the handy custom lists of Evernote or Webrecorder. Simply login with Twitter, copy and paste a URL, and bam! Instantly saved in the Wayback Machine, neatly arranged in a list for your reference. So simple, even your dog could do it.
The Save My News interface. (Screenshot via http://www.savemy.news/)
  • Print articles to PDF. In a browser, simply choose to print your page (Ctrl-P / Command-P). Select the printer “Save as PDF” and choose where to save the file, creating a neat PDF copy of your work. Be aware that some articles may not look quite the same if you choose to print, and interactive features won’t translate well to a static format.
  • Print to actual paper, if you’re into that kind of thing. If you’re not entirely convinced by all these new-fangled digital storage options, there’s always paper. Obviously your work will lose all those interactive features like scrolling and clicking, and the stylesheets might not come out right, but your paper copies may well outlast your hard drive.
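
If you have a long list of articles, pasting each URL into the Wayback Machine by hand gets tedious. The following is a minimal sketch, not an official client: it assumes the Wayback Machine's "Save Page Now" feature accepts a page address appended to `https://web.archive.org/save/`, and the function names and example article URLs are made up for illustration. Running the script only prints the save addresses; `request_capture` is there if you want to actually submit one.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed "Save Page Now" prefix; check the Internet Archive's own help pages
# before relying on it for anything important.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(article_url: str) -> str:
    """Build the address that asks the Wayback Machine to capture one page."""
    # Keep normal URL punctuation as-is, but escape spaces and similar characters.
    return SAVE_ENDPOINT + quote(article_url.strip(), safe=":/?&=%#~+,;@")


def request_capture(article_url: str, timeout: float = 60.0) -> int:
    """Submit one capture request and return the HTTP status code."""
    request = Request(
        save_page_now_url(article_url),
        headers={"User-Agent": "personal-archive-sketch"},  # hypothetical label
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical article addresses; replace these with your own work.
    my_articles = [
        "https://example.com/2017/11/my-first-story",
        "https://example.com/2017/11/my-second-story",
    ]
    for url in my_articles:
        print(save_page_now_url(url))
```

Keeping the list of your own URLs in a plain text file alongside the script also gives you a simple record of what you have published, independent of any one service.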

Please feel free to share this post with anyone you think could use a personal archive of their own. Happy saving!

Disrespect des fonds! ✊🏻 (or, Five things I learned from the NSLA digipres forum)

This week I went to the NSLA forum on day-to-day digital collecting and preservation, which began auspiciously enough:

The forum was an illuminating experience. I got a lot out of the event, including useful tips and programs I can incorporate into my workflow, and took so many notes I ran out of notebook! What follows are my personal thoughts and observations of the event, which do not represent my employer (shout at me, not at them).

Reality isn’t keeping up with my user expectations and professional aspirations. When I first landed a library job (not the job I have now), I harboured grand dreams of preserving digital artefacts on a workplace’s asset management system, creating intricate descriptions of said digital artefacts, and excitedly sharing this knowledge with library users. I wound up being a shelver, but that’s not the point. The point is that I’m still dreaming. I keep thinking libraries are far more advanced, digitally speaking, than where we actually are. Librarians, as a profession, struggle to accept the idea that society has moved on without us. Digital preservation is seemingly no exception.

It was refreshing to hear at this forum that people were once scared of digital. Scared for their jobs. Scared of new, ~uncontrolled~ sources of information. Scared by the idea of reimagining and reinventing their place within libraries and their library’s place within society. Plenty of people still think like this, but you’ll never hear them admit it.

Please don’t get me wrong—there’s a lot of innovation in this sector, incredible work by passionate people with limited resources. I was very impressed by several presentations showcasing new, systemic ways of appraising, preserving and delivering digital content. I just… kinda thought we had them already. Are my expectations too high, or are our standards too low?

Linear archival theory is doing the digital world, and our attempts to capture it, a great disservice. Archival theory is built on the foundational ideas of ‘original order’, ‘provenance’ and ‘respect des fonds’ (i.e. an appreciation of a record’s context and intended purpose). Now, I’m not an archivist, nor do I play one on television. But it isn’t hard to see where, in a digital world, these core archival concepts might start to fall down a bit.

Archivists (and librarians, for the most part) are used to thinking in linear terms. Boxed collections are measured in linear metres of shelf space, our finding aids are (by and large) designed to be read from top to bottom, and a manuscript item can only be in one folder at once. Linear thinking. Paper-based thinking. Ordered thinking.

Our digital universe doesn’t work like this. Disks can be read in any order. Hypertext lets us explore information in many dimensions. We have become random-access thinkers and, by extension, random-access hoarders. Archival concepts must accommodate these ways of thinking—not ‘disordered’, just ordered in other ways. We were invited to ‘disrespect des fonds’, and I think it’s a smashing idea. It’s time to think differently. To accommodate non-linear ideas of what constitutes ‘original order’ and what digital and intellectual context may shape the fonds of the future. Spatial thinking. Byte-based thinking. Still ordered thinking.

Jefferson Bailey wrote a wonderfully in-depth essay on disrespecting the fonds in 2013, and I was reminded of it several times during this forum. It’s well worth a read.

Systems can’t do digital preservation. Only you can. My workplace don’t have the luxury of a digital preservation system (yet) and our current digipres practice is extremely haphazard and conducted on a needs basis by… me. Eek. There’s no denying a system that takes care of basic fixity and AIP arrangement would make my life a lot easier. But that system still wouldn’t do my job for me. Systems can’t select or appraise. They can’t negotiate rights agreements with donors or keep themselves well fed with storage space. They don’t have an appreciation of strategic priorities or nuances of analytical metadata (subject headings and the like). That’s what I’m for. It’s important not to lose sight of the role of humans in what is (for those with the means) an increasingly automated process.

It’s also crucial for small- and medium-sized memory organisations, who will never have the resources enjoyed by NSLA members, to know that they don’t need a fancy system to preserve their digital heritage. So much digital preservation discussion is conducted in arcane, highly technical language, intelligible only to a small subset of information professionals. In order for digipres to gain any traction, it needs to be accessible by less skilled librarians, and even by non-professional library workers. I want the volunteers at the Woop Woop Historical Society, whose tech knowledge may extend only to sending emails and posting pics of the grandchildren on Facebook, to have an understanding of the basics of digipres and to be able to implement them. Distilling our communal knowledge down to this level promises to be almost as difficult as the process of preservation itself. But it’s vital work, and it can’t wait.
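
To give a sense of how little machinery basic fixity needs, here is a minimal sketch using only Python's standard library. It is an illustration under stated assumptions, not a description of any NSLA member's system: the function names are hypothetical, and SHA-256 is simply one commonly used checksum. The idea is to record a checksum for every file in a folder, keep that record somewhere safe, and compare against it later to spot files that have changed or gone missing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of one file, read in chunks to spare memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to the folder) to its checksum."""
    return {
        item.relative_to(folder).as_posix(): sha256_of(item)
        for item in sorted(folder.rglob("*"))
        if item.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files from an earlier manifest that are now missing or different."""
    current = build_manifest(folder)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

A volunteer would run `build_manifest` once when material arrives, save the result, and run `changed_files` every so often afterwards. It does not replace appraisal, description or good storage, which remain jobs for people.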

I have a lot of skills, knowledge and enthusiasm to bring to digital preservation. I didn’t present at the forum on account of a) a bad case of imposter syndrome and b) my workplace not having a whole lot to report in this area. I am also still an MIS student (yes! still!), am in a role where digipres is not explicitly part of my job description, and was almost certainly the youngest person in the room. All of those things worked together to convince me that I didn’t have anything worth saying.

However, I realised during the talks and discussions that far from being “just” a student, or “just” a local history librarian, or “just” a young’un, I actually have a lot to bring to the table:

  • I understand the broad lifecycle of digital preservation, from file creation to donation to fixity to ingest to preservation to access, and spend a lot of time contemplating the philosophy of what we do
  • I can catalogue, which I wasn’t expecting to be all that relevant to digipres, but it sounds like digitally-literate cataloguers are a rare breed, and
  • I can also learn quickly and methodically, such as last week when I successfully (and independently!) imaged and preserved a CD with BitCurator, for use by some student researchers. I learned how to do this via someone else’s notes from last year’s NSLA Digital Skills event, which I didn’t attend on account of being a shelver elsewhere.

Moreover, I’d like to think I know how much I don’t know; that is, there’s so much more for us as digipres practitioners to discover and to learn from each other, and we can’t afford to think that we know it all. The forum helped me gain a little self-esteem, and reassured me that Australian digipres isn’t already full of people who have all the answers.

We can’t wait for everyone to get comfortable. Optical media won’t stop rotting while we learn how to deal with it. Film stocks won’t stop drowning in their own vinegar while we figure out what to do. Obscure file formats won’t give up their secrets of their own volition while we’re trying to nut them out. These problems are only going to get worse, irrespective of how quickly we as practitioners get our heads around them. Many of us are still grappling with digital preservation. Grappling. We’re still at the beginner stage.

There’s a very fine line between making people feel bad about the speed and scale of their own digipres programs, or about their personal knowledge, and encouraging them to keep looking to the horizon and recognise how far we all have to go. I say all this not to shame people, as I too am a beginner, but to express a broader worry about our ability as library employees to recognise and respond to digital change. By the sounds of it, some of our institutions are better at this than others.

In any case, I’d better get to work. I still need that floppy drive I’ve been dreaming about.

Further reading

Jefferson Bailey, Disrespect des Fonds: Rethinking Arrangement and Description in Born-Digital Archives (2013 article in Archive Journal)

Trevor Owens, Theory and Craft of Digital Preservation (preprint: monograph coming 2018)

There will be no GLAM 3017, because we will all be dead

I try not to think about where humanity might be in a thousand years. Based on our current trajectory, the most likely answer is ‘extinct’. Our current rate of consumption and pollution is not sustainable for anywhere near that length of time. When resources run out, there will inevitably be fierce wars over what little is left. Civilisation will end one of two ways: with a bang, or a whimper.

When we are all gone, we will leave behind an unfathomable amount of stuff. Priceless treasures representing the pinnacle of humanity, through personal possessions and records of ordinary people, to mountains of rubbish and items of no assigned value. All of this stuff will begin to degrade. Bespoke climate-controlled environments will no longer protect precious materials; our natural environment will likely not be conducive to long-term preservation, either. It is inevitable great works will be lost.

I’ve had Abby Smith Rumsey’s When We Are No More on my to-read pile for several months. I won’t get it read anytime soon, sadly, but her book touches on similar themes. Rumsey appears more optimistic than me; her book explores how people a thousand years from now will remember the early 21st century. I can’t help but admire her belief that humanity will exist at all.

This is a pessimistic worldview, to be sure. After all, modern capitalism is predicated on people buying stuff, which is in turn predicated on the constant production of stuff. Increasingly this ‘stuff’ is made from non-renewable materials, and sooner or later those materials will run out. Capitalism presents no incentive to preserve our scarce resources, because if a resource remains in the ground then less (or no) money can be made from it. The only real hope of changing this state of affairs lies in revolution, and that won’t be popular.

If, by some miracle, Homo sapiens survives to 3017, it will not be a pleasant world. With the exhaustion of mineral resources will come a need to recycle or perish. If our choice becomes book-burning or starvation (we’ve all seen that scene in The Day After Tomorrow, right?), I doubt many would pick the latter. Technology will not save us. Our electronic memory will be irretrievable, our physical memory decayed if not destroyed. Perhaps our surviving collective descendants will despair at our modern habits of storing vast amounts of information on fragile pieces of metal and plastic, which require significant infrastructure to be accessed and read. A book (which, to be fair, we are also producing plenty of) requires nothing but a pair of functioning eyeballs.

I’d really like to believe that our species will survive, but nothing so far has convinced me. Knowledge and memory—and the externalisation thereof—are uniquely human traits. Without people to inhabit library buildings, without people to read books, without people to create and disseminate knowledge… our planet will be truly devoid.

Then again, we live in a time of information abundance, and look where it’s gotten us. Perhaps we’re reaping what we sow.

I’m a document hipster. I only write in sustainable plaintext

No-one really needs three different word processing programs.

Yet that’s the situation I’m currently in. My six-year-old MacBook Pro is on its last legs and I’m desperately trying to eke out as much free storage space and processing power as possible, meaning a bit of spring cleaning is in order. Unfortunately for me, I’ve amassed text-based documents in (among others) .docx, .pages and .odt formats. Office for Mac has only recently added support for Open Document formats and I’m reluctant to get rid of LibreOffice, the originating program.

From a digital preservation perspective, my Documents folder is a mess. Converting all of these into more sustainable formats will, I’ve decided, be Future Alissa’s problem. But that doesn’t mean I need to keep living an unhealthy document lifestyle.

Instead, I’ve decided to try out one of the more intriguing lessons on The Programming Historian: ‘Sustainable Authorship in Plain Text using Pandoc and Markdown’. Any document that I would normally write in Pages will instead be written in a plaintext editor using Markdown and typeset in .docx or .pdf using Pandoc. I’ve been using Markdown for a while to write these blog posts, but Pandoc is a new experience.

Briefly, Markdown is a text markup language that is intended to be human-readable in a way HTML isn’t. Pandoc is a command-line program to convert one markup format into another, such as HTML to .docx (which at heart is an XML format). The primary benefit is that the manuscript (which is a plaintext .md file), will never need specialised word processing software to read and will remain intelligible to human eyes. Additional information that would otherwise be incorporated into a .docx or .pages file, such as bibliographic data and footnote stylesheets, is saved separately. These are also plaintext and easily human-readable.

There are plenty of reasons to kick the word processor habit (neatly summarised in this blog post by W. Caleb McDaniel). Personally, I spend way too much time mucking around with formatting before I even begin to type. A plain-text typing environment has no such distractions, allowing me to concentrate on content. If I need to bold or italicise something, for example, I can do that in Markdown without interrupting my sentence flow.

You’d be forgiven for asking, ‘Why bother with all this, when there are easier options?’ Certainly it’s a challenge for those unfamiliar with the command line. There’s also a lot this method won’t include–complex tables, mail merge, interactive elements, et cetera. And yes, there are plenty of other distraction-free apps out there. In the long run, however, I’m looking forward to three things:
1) a more fruitful and painless typing experience
2) not wasting hours of my life converting documents from one format to another (yes, this has been known to take me hours) and
3) improving my command-line and markup skills.

What I did, briefly

After installing Pandoc, and following the Programming Historian’s instructions (though I chose to forgo LaTeX and hence .pdf conversion for want of disk space), I created a little test .md file, incorporating images, links and footnotes, in a nice desktop plaintext editor called Atom.

Atom code
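
For readers who can't see the screenshot, the sketch below writes a comparable test manuscript. It is not the file used in this post: the title, author, link and image path are placeholders. It assumes Pandoc's flavour of Markdown, where three lines starting with `%` form a title block and `[^1]` marks a footnote.

```python
from pathlib import Path

# A placeholder manuscript exercising a title block, emphasis, a link,
# an image and a footnote. All names and paths here are invented.
SAMPLE_MANUSCRIPT = """\
% A test manuscript
% A. N. Author
% 2017

# A heading

Some *italic* and **bold** text, with a [link](https://example.com)
and a footnote.[^1]

![A placeholder image](images/placeholder.png)

[^1]: The footnote text lives at the bottom of the file.
"""


def write_sample(path: Path) -> Path:
    """Write the sample manuscript as UTF-8 plain text and return its path."""
    path.write_text(SAMPLE_MANUSCRIPT, encoding="utf-8")
    return path


if __name__ == "__main__":
    print(write_sample(Path("manuscript.md")))
```

Because the result is plain text, any editor can open it, which is the whole appeal of the approach.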

I then ran a Pandoc command in Terminal to convert the .md file to a .docx file. Disappointingly, the program did not return anything to suggest it had been successful. A quick $ ls, however, revealed the new file.

Terminal

I also converted the .md manuscript into .odt and .html, just to see what might happen and if there were any differences.
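
The three conversions can also be scripted, which helps when there are many files. This is a rough sketch rather than a record of the exact commands run for this post: the file name `manuscript.md` and the function names are hypothetical. It relies on Pandoc's `-o` option, which picks the output format from the file extension, and optionally `--standalone`, which asks for a complete document rather than a fragment. Because Pandoc stays silent on success, the sketch checks the exit code and confirms the output file exists.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: Path, target_extension: str, standalone: bool = False) -> list[str]:
    """Build the Pandoc command line for one conversion without running it."""
    target = source.with_suffix("." + target_extension.lstrip("."))
    command = ["pandoc", str(source), "-o", str(target)]
    if standalone:
        # Request a full document (with header and footer), not a fragment.
        command.append("--standalone")
    return command


def convert(source: Path, target_extension: str, standalone: bool = False) -> Path:
    """Run Pandoc and return the path of the converted file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on the PATH")
    command = pandoc_command(source, target_extension, standalone)
    # check=True raises CalledProcessError if Pandoc exits with an error.
    subprocess.run(command, check=True)
    target = Path(command[3])
    if not target.exists():
        raise FileNotFoundError(f"pandoc finished but {target} was not created")
    return target


if __name__ == "__main__":
    # Hypothetical manuscript name; prints the commands instead of running them.
    for extension in ("docx", "odt", "html"):
        print(" ".join(pandoc_command(Path("manuscript.md"), extension)))
```

Printing the commands first is a cheap way to see what will run before committing to a batch conversion.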

How it ended up

As it turned out, the .docx and .odt conversions were missing the footnotes and .html was missing the header (which is not standard Markdown, but rather a Pandoc extension), meaning that none of the target formats included 100% of the Markdown content. Considering I had done absolutely no styling, the .docx was surprisingly eye-catching.

MS Word output

I don’t know why parts were missing from each target file, but I plan to investigate before using Pandoc more extensively for research work. Despite not quite getting all the output I was promised, I wasn’t dissuaded from using Markdown and Pandoc for my long-form writing. The tutorial goes into some depth on footnotes and bibliographies, which I didn’t have time to test and which might well solve my problem.

Ironically, a copy of Matthew Kirschenbaum’s Tracked Changes, a history of word processors and their effect on the art of writing, arrived at the post office while I was compiling this article. In a way, adopting Markdown and Pandoc is an effort to get back to those, uh, halcyon days of formatting-free word processing. Hopefully when I re-examine my Documents folder in a few years’ time, it will be full of plaintext files!

Dear five-year-old me: you’ll never leave school

When I was five, my teacher went around my kindergarten class asking each of us what we wanted to be when we grew up. Most of the girls, as I recall, wanted to be hairdressers. Instead I proudly proclaimed that I wanted to be the first woman on the moon. Never mind the fact my eyesight is terrible and I get motion sickness on everything that moves. I was obsessed with space and I wanted to be an astronaut.

I’m pretty sure I got laughed out of class. My mum believed in me, though.

Twenty years later, I’m comfortable with my decision not to pursue a career in astronomy. Instead, I’m a few short months away from a professional qualification in librarianship. Yet I’m increasingly pessimistic about what that qualification will do for my career prospects. Sure, an MIS will adequately prepare me for a career in cataloguing or other technical services (in the library sense of the term). But recently I’ve found my interests heading more in the direction of systems librarianship, online information provision and digital preservation. And I’m no longer convinced an MIS alone will get me a job in those fields.

Undoubtedly some of this pessimism springs from the fact I’m currently between jobs. I’m in no position to be picky about what I accept, and I’m very aware that as a new professional I’m expected to spend some time in bottom-rung jobs, grinding, until someone retires and everyone levels up. Plenty of people have their degrees and work in non-LIS fields. At least I still have a few months before I graduate.

Recently I’ve spent a fair bit of time reading Bill LeFurgy’s insightful 2011 blog post ‘What skills does a digital librarian or archivist need?’ and browsing the websites of various digital preservation thinktanks. Combined with some valuable insight from followers on Twitter (for which many thanks!), I’ve begun mulling over what sorts of attributes I ought to have in order to make it in the digital GLAM sphere.

  • Appreciation of library and archival principles — I’m looking at my copy of Laura Millar’s ‘Archives: principles and practices‘ right now and I know I’d never be a good archivist without it. With a solid grounding in theory and framework I know that digital archiving still adheres to many of the ground rules for paper or physical archiving. This kind of thing is library school bread and butter.

  • Quickly learn new skills — this is a given in a profession fighting for its very existence. Every year more workflows move online, more material is added to (and removed from) the web, more file formats and media types are created. As new ways of research, outreach and preservation are invented, staff need to not just ‘keep up’ but actively be on top of new developments in the field. Perhaps even doing the developing themselves!

  • Be able to code in Python/PHP/Ruby/HTML/SQL/etc etc — this is where LIS programs on their own tend to fall down. Countless job adverts note their preference for a candidate who can code, but LIS students from non-STEM backgrounds (of which I am one) are likely to graduate with an awareness of current technology but no concrete coding skills. Web development is an elective at CSU, which I opted not to take on account of I can already write HTML and CSS reasonably well, but students are left to develop more technical skills on their own. I’m thrilled to have recently discovered The Programming Historian, which blends programming skills with cultural heritage corpora to make digital humanities accessible to all. People don’t go to library school to learn to code, but the world is increasingly expecting library students to acquire these skills.

  • Bridge the digital divide — by which I mean digital archivists need to be able not just to immerse themselves in this strange new digital world, but relate it back to archive users and researchers who may not be technologically literate. Self-service information provision will not be the answer for all users; some people will still need the assistance of a professional to find what they need. Sustaining the human face of digital memory institutions is essential if we still want to have jobs in ten years.

While writing this post I came across A Snapshot of a 21st-Century Librarian, a fascinating account of a research librarian’s work in an academic library. Pointedly, she mentioned taking graduate classes even as a tenure-track librarian to keep up with the changes in her field. I can easily see myself taking a similar path — whatever the MIS hasn’t taught me, I’ll need to learn elsewhere. I do, however, feel like I have a lot of catching-up to do. Five-year-old me would have been aghast at the idea of never leaving school, but then again, five-year-old me had no conception of what a digital archivist is, much less the idea that I could one day become one. Being an astronaut would have looked like a pretty safe bet.