If you’re reading this then you, like me, probably have too many books. As a professor in Hawaii I really suffer from this problem: my privileged position in the university means that free, used, and discarded books just keep flowing in, and space is at a premium for non-millionaires living in Honolulu. Over the years I’ve tried various solutions to this problem: more bookshelves, judicious culling of non-essential books, and so forth. And now I’m trying a new solution: destructive book scanning.
The idea is simple: you mail off your books to a company (I’ve tried only 2 vendors and like bookscan.us the best). They slice off the spine of the book, scan the individual pages, and send you a PDF. Then they pulp the book.
I’ve also tried doing this manually: you can cut the cover off of a book with scissors, and then remove the spine with a paper cutter. Many office photocopiers (including my own) can now scan two-sided documents. Alternatively, you can farm out some of this work to grad students, TAs, office assistants, or a print shop (which will de-spine your book for a buck). However, bookscan charges US$1 for the first 300 pages of a book and US$1 for every 200 pages after that, so most books will cost less than three dollars to scan. With destructive scanning so cheap, it’s often a lot easier to just mail the book off to a company.
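If you want to estimate what a box of books will cost before you mail it, the pricing above is easy to turn into a few lines of Python. This is only a rough sketch: the function name `scan_cost_usd` is my own invention, and I am assuming that a partly filled 200-page block is billed as a full one, which is a guess on my part and worth checking against the vendor’s current price list.

```python
import math


def scan_cost_usd(pages: int) -> int:
    """Estimate the scanning cost of one book in whole US dollars."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    base_pages = 300   # pages covered by the first US$1
    extra_block = 200  # each further block of pages adds US$1
    if pages <= base_pages:
        return 1
    # Assumption: a partial block is rounded up and billed as a full block.
    return 1 + math.ceil((pages - base_pages) / extra_block)


# Print the estimate for a few typical page counts.
for pages in (180, 300, 420, 650):
    print(pages, "pages ->", scan_cost_usd(pages), "USD")
```

Under that rounding assumption, anything up to 500 pages comes in at two dollars or less, and only books longer than 500 pages reach three dollars.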
At first, the act of physically destroying a book can be pretty stomach-churning, especially for academics. But let’s face it: most of the books in our libraries are paperbacks that are widely available. Caught as we are in the bubble between an era of overproduction and a digital age where physical books will be unusual and valued artifacts, we are in the Golden Age of Digitization. Honestly: it’s not like there isn’t a copy of The Nuer in your library. And you probably haven’t read it lately. Why not pay a couple of bucks to dematerialize it and store it on your hard drive for the next time you need to thumb through it?
In fact, the act of choosing which books to murder is an excellent object lesson in establishing priorities. Mostly, it makes you realize how fruitless your attempts to store the long tail of academic knowledge in your office really are: I have dozens and dozens of books that I might need ‘some day’ for vaguely-imagined projects or interests. Sometimes asking yourself if you want to pay two dollars to mail a book and have it digitized makes you realize that you wouldn’t pay ten cents for it, and in that case, you probably don’t need to have it in your library.
Digitizing books also makes you realize how important it is to curate your own personal library. We book hoarders have a strong desire to acquire everything in sight. But there’s something sickening about loading up your computer with masses of random-ass books that you don’t really care about and will never read. Digitization is cheap, but it is just costly enough that it keeps you filtering the bottom end of your preferences.
There are still lots of books I don’t digitize. These are the ones that are in my library because it has an archival function: the rare scholarly books on PNG, old 90s fanzines, and things like that. Beaming my library up has made me realize how important it is to preserve these books, and forced me to decide just how many preservable books I have in my library.
Moving on to the pragmatics of digitization, I get my digitized books dumped into a Dropbox folder the second they are done. They are in PDF format, in pretty good quality — even the illustrations turn out reasonably well. Of course, I am not an illustrations person, so if I was an art historian or something I’d probably want something much more specialized.
You can get the scanning company to OCR your books for you, but in my experience the extra cost is not worth it. OCR is one of those things where you get what you pay for, and Acrobat will do the job as well as whatever they use, so you might as well do it yourself. Unfortunately, it turns out that the job is not very good, although it depends very much on the quality of the original. Now when I see a discarded book and take a look at it I think “hmmm how will this OCR”. Overall, though, the OCR produces text that is not easily and fluidly readable on its own. You can search it, more or less. In general, I run the OCR and convert the page images to JBIG2 format. The result is two PDFs: one is the archival original scan, and the second is a smaller reading copy that I can annotate on my iPad.
One downside of this procedure is that it doesn’t produce a document that can be used in any common ebook format. The OCR just isn’t good enough, and that’s not something that is going to change for average consumers like myself. These digitized books are not ready to read on the average smart phone. Often I end up with a choice between a free paper copy of a book that will cost three dollars to digitize, and a five dollar Kindle copy of that same book. Sometimes it’s worth buying the ebook just to be able to read the book on your phone.
This procedure takes time: media mail is the only sane way to send lots of books through the post, but it’s slow. Also, the company takes a while to turn around orders, especially large ones. So it is best to think of digitization as a slower process that is constantly going on in the background. You mail off a bunch of books you might someday want to read, and a month or two later they suddenly appear on your computer. I try not to look at what they are, so when I scroll through them later on I can be pleasantly surprised that someone has picked such a lovely set of books for me to read whenever I like.
To store the books I use a special program on the Macintosh called the ‘finder’. It runs constantly in the background on all machines running OS X, and allows you to create ‘folders’, which is where I store the PDFs. Each folder has an author’s name and all that author’s books go in there. Fancy, huh? I am sure that there are incredibly complicated ways to file these PDFs but I want to spend my free time reading, not filing, so I’ve chosen a simple and efficient method to organize them.
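If even dragging files into folders feels like too much filing, the one-folder-per-author scheme can be automated. What follows is a minimal sketch and not something I actually run: it assumes that each scan is named `Author - Title.pdf`, which is a naming convention I made up for this example, and the function name `file_by_author` is equally hypothetical.

```python
from pathlib import Path
import shutil


def file_by_author(inbox: Path, library: Path) -> list[Path]:
    """Move 'Author - Title.pdf' files from inbox into library/Author/."""
    moved = []
    for pdf in sorted(inbox.glob("*.pdf")):
        # Assumed naming convention: the author comes before the first " - ".
        author, sep, _title = pdf.stem.partition(" - ")
        if not sep or not author.strip():
            continue  # name does not match the pattern, so leave the file alone
        target_dir = library / author.strip()
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / pdf.name
        if target.exists():
            continue  # never overwrite a book that is already filed
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

You would call it with two folders of your own choosing, for example the Dropbox folder the scans arrive in and the folder that holds your library. Files that don’t match the naming pattern stay where they are, so nothing gets filed under a wrong guess.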
So far I’ve scanned about 100 books. These days I just have an open cardboard box in my office, and every time I get a ‘someday’ book I toss it in the box until the box is full, then I tape it up and mail it off. It ends up being sixty to seventy dollars a box to mail and scan them, which usually happens once or twice a month. It’s noticeable on the budget, but it’s less than I spend on coffee every week. Overall I’m very happy to have a growing collection of PDFs on my computer that I get to read whenever and wherever I want. Having them digitized has gotten me to read more than I have before. There are improvements to be made in scanning and OCR, but I’m sure those will come. At the moment I’m content to look for more free discarded books lurking unloved in the corridors of my department…