Destructive Scanning for Fun and Profit

If you’re reading this then you, like me, probably have too many books. As a professor in Hawaii I really suffer from this problem: in my privileged position at the university, free, used, and discarded books just keep flowing in, and space is at a premium for non-millionaires living in Honolulu. Over the years I’ve tried various solutions to this problem: more bookshelves, judicious culling of non-essential books, and so forth. And now I’m trying a new solution: destructive book scanning.

The idea is simple: you mail off your books to a company (I’ve tried only 2 vendors and like bookscan.us the best). They slice off the spine of the book, scan the individual pages, and send you a PDF. Then they pulp the book.

I’ve also tried doing this manually: you can cut the cover off of a book with scissors, and then remove the spine with a paper cutter. Many office photocopiers (including my own) can now scan two-sided documents. Alternatively, you can farm out some of this work to grad students, TAs, office assistants, or a print shop (which will de-spine your book for a buck). However, bookscan charges US$1 for the first 300 pages of a book and US$1 for every 200 pages after that, so most books will cost less than three dollars to scan. With destructive scanning so cheap, it’s often a lot easier to just mail the book off to a company.
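To make the pricing concrete, here is a rough sketch of the arithmetic in Python. It assumes that a partial block of 200 pages is billed as a full block, which is my guess at how the rounding works rather than anything the company states; `scan_cost` is just a name made up for this example.

```python
import math

BASE_PAGES = 300   # pages covered by the first dollar
BLOCK_PAGES = 200  # each further block of this size costs another dollar


def scan_cost(pages):
    """Estimated scanning cost in US dollars for one book.

    Assumption: a partial block of extra pages is rounded up to a full block.
    """
    if pages <= 0:
        return 0
    if pages <= BASE_PAGES:
        return 1
    extra_blocks = math.ceil((pages - BASE_PAGES) / BLOCK_PAGES)
    return 1 + extra_blocks


# Print the estimated cost for a few typical page counts.
for pages in (180, 300, 420, 650):
    print(pages, "pages:", scan_cost(pages), "dollars")
```

Under that assumption, anything up to 500 pages comes in at two dollars or less, which covers most of what ends up in my box.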

At first, the act of physically destroying a book can be pretty stomach-churning, especially for academics. But let’s face it: most of the books in our libraries are paperbacks that are widely available. Caught as we are in the bubble between an era of overproduction and a digital age where physical books will be unusual and valued artifacts, we are in the Golden Age of Digitization. Honestly: it’s not like there isn’t a copy of The Nuer in your library. And you probably haven’t read it lately. Why not pay a couple of bucks to dematerialize it and store it on your hard drive for the next time you need to thumb through it?

In fact, the act of choosing which books to murder is an excellent object lesson in establishing priorities. Mostly, it makes you realize how fruitless your attempts to store the long tail of academic knowledge in your office really are: I have dozens and dozens of books that I might need ‘some day’ for vaguely-imagined projects or interests. Sometimes asking yourself if you want to pay two dollars to mail a book and have it digitized makes you realize that you wouldn’t pay ten cents for it, and in that case you probably don’t need to have it in your library.

Digitizing books also makes you realize how important it is to curate your own personal library. We book hoarders have a strong desire to acquire everything in sight. But there’s something sickening about loading up your computer with masses of random-ass books that you don’t really care about and will never read. Digitization is cheap, but it is just costly enough that it keeps you filtering the bottom end of your preferences.

There are still lots of books I don’t digitize. These are the ones that are in my library because it has an archival function: the rare scholarly books on PNG, old 90s fanzines, and things like that. Beaming my library up has made me realize how important it is to preserve these books, and forced me to decide just how many preservable books I have in my library.

Moving on to the pragmatics of digitization, I get my digitized books dumped into a Dropbox folder the second they are done. They are in PDF format, in pretty good quality; even the illustrations turn out reasonably well. Of course, I am not an illustrations person, so if I were an art historian or something I’d probably want something much more specialized.

You can get scanners to OCR your books for you, but in my experience the extra cost is not worth it. OCR is one of those things where you get what you pay for, and Acrobat will do the job as well as whatever they use, so you might as well do it yourself. Unfortunately, it turns out that the job Acrobat does is not very good, although it depends very much on the quality of the original. Now when I see a discarded book and take a look at it I think “hmmm, how will this OCR?” Overall, though, the OCR produces text that is not easily and fluidly readable on its own. You can search it, more or less. In general, I OCR the scan and convert the page images to JBIG2 format. The result is two PDFs: one is the archival original scan, and the other is a smaller reading copy that I can annotate on my iPad.

One downside of this procedure is that it doesn’t produce a document that can be used in any common ebook format. The OCR just isn’t good enough, and that’s not something that is going to change for average consumers like myself. These digitized books are not ready to read on the average smartphone. Often I end up with a choice between a free paper copy of a book that will cost three dollars to digitize and a five-dollar Kindle copy of that same book. Sometimes it’s worth buying the ebook just because you figure being able to read the book on your phone is worth the extra money.

This procedure takes time: media mail is the only sane way to send lots of books through the post, but it’s slow. Also, the company takes a while to turn around orders, especially large ones. So it is best to think of digitization as a slower process that is constantly going on in the background. You mail off a bunch of books you might someday want to read, and a month or two later they suddenly appear on your computer. I try not to look at what they are, so when I scroll through them later on I can be pleasantly surprised that someone has picked such a lovely set of books for me to read whenever I like.

To store the books I use a special program on the Macintosh called the ‘Finder’. It runs constantly in the background on all machines running OS X, and allows you to create ‘folders’, which is where I store the PDFs. Each folder has an author’s name and all that author’s books go in there. Fancy, huh? I am sure that there are incredibly complicated ways to file these PDFs but I want to spend my free time reading, not filing, so I’ve chosen a simple and efficient method to organize them.
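If dragging files around by hand ever gets old, the same one-folder-per-author scheme could be automated with a few lines of Python. This is only a sketch under a big assumption: that each incoming PDF is named `Author - Title.pdf`. That naming pattern, the function name `file_by_author`, and the folder names in the usage note are all invented for illustration, not something the scanning company provides.

```python
from pathlib import Path
import shutil


def file_by_author(inbox, library):
    """Move 'Author - Title.pdf' files from inbox into library/Author/.

    Assumption: file names follow the hypothetical 'Author - Title.pdf'
    pattern. Files that do not match are left where they are.
    """
    moved = []
    for pdf in sorted(Path(inbox).glob("*.pdf")):
        author, separator, title = pdf.stem.partition(" - ")
        if not separator:
            continue  # name does not fit the assumed pattern; skip it
        author_dir = Path(library) / author.strip()
        author_dir.mkdir(parents=True, exist_ok=True)
        destination = author_dir / (title.strip() + ".pdf")
        if destination.exists():
            continue  # never overwrite a book that is already filed
        shutil.move(str(pdf), str(destination))
        moved.append(destination)
    return moved
```

Calling something like `file_by_author("Dropbox/scans", "Books")` (both paths hypothetical) would then leave a `Books` folder with one subfolder per author, which is exactly what I build by hand.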

So far I’ve scanned about 100 books. These days I just have an open cardboard box in my office, and every time I get a ‘someday’ book I toss it in the box until the box is full, then I tape it up and mail it off. It ends up being sixty to seventy dollars a box to mail and scan them, which usually happens once or twice a month. It’s noticeable on the budget, but it’s less than I spend on coffee every week. Overall I’m very happy to have a growing collection of PDFs on my computer that I get to read whenever and wherever I want. Having them digitized has gotten me to read more than I have before. There are improvements to be made in scanning and OCR, but I’m sure those will come. At the moment I’m content to look for more free discarded books lurking unloved in the corridors of my department…

Alex Golub is an associate professor of anthropology at the University of Hawai‘i at Mānoa. His book Leviathans at The Gold Mine has been published by Duke University Press. You can contact him at rex@savageminds.org

12 thoughts on “Destructive Scanning for Fun and Profit”

  1. Ah, but someday the format won’t be readable, etc. etc., though perhaps digital archivists will figure out how to provide emulation apps for everyone to use. Here’s my solution to the space problem, one I used to dream about as a military brat condemned to move every three years: one or several bookmobiles! You just back the relevant one up to a convenient aperture of your house, and voila! Or: Look at Stewart Brand’s book How Buildings Learn, and learn how he built the Whole Earth Catalog inside a shipping container (now also the fave stash of the Internet Archive, as well).

  2. There are some smallish communities of people trying to put together DIY plans for non-destructive scanners, similar to the ones Google uses in their Google Books digitization. More labor-intensive, but it appears the price and complexity of building such a thing are getting into the range where a normal person could do it. Maybe non-destructive scanning will be available as a paid service in the future as well.

  3. Folks interested in destructive scanning might want to look at the horrifyingly wry fictionalization of the process in Vernor Vinge’s 2006 science fiction novel, Rainbows End, which includes the UCSD Library as a major character.

  4. To reply to Pat’s points — the format only has to stick around as long as I do, and PDF is a robust and well-supported format at least for the scale I’m working at. It would be cool to have a van labeled “A-H” on the side… if parking were cheap in Honolulu…

    There are DIY cradle scanners out there… as well as ones you can just purchase (if you have that sort of dough). They are great, but the problem is that at the end of it you haven’t destroyed the book and it is still taking up space — for me, ditching the book is the point!

    As far as I know this is all perfectly legal since all I am doing is format-switching the books I own. Unfortunately, I can’t throw them up on a course website, which would be sharing that would make everyone’s life better and is hence illegal.

    Speaking of Vinge, I wonder whether our initial fears of format obsolescence weren’t misplaced and the situation won’t be more like that in A Deepness In The Sky, where everything obsolete is there under strata and strata of more recent stuff, it’s just that only the infinitely old people know where to find it…

  5. I initiate destructive (and other) scanning as a librarian. Some of my other duties include reviewing faculty book collections. I’d simply emphasize that some books (not the paperback 15th ed. of whatever) are in fact valuable as artifacts. Not all will announce it with gilt edges or fancy bindings (though few readers consider the importance of condition in valuing books). I’d just urge a bit of mindfulness as you crate up your libraries for eventual pulping…there is obvious value in transforming these objects into virtual, weightless, pure intellectual content. Less obvious may be that some of these objects provide an invaluable window into their creation, use and cultural impact.

  6. I am very concerned about the legality of book scanning…

    If someone were to share their dropbox folder with me, then I would certainly not reciprocate by sharing my dropbox folder with them. And I would not buy that person a drink at the AAA.

  7. Dan, I hear you on this one. I actually hang on to good copies of, say, Doug Oliver’s “Studies in the Anthropology of Bougainville” which is an old number of Papers of the Peabody Museum, even though these sorts of things are being digitized by museums. Anything printed _in_ Papua New Guinea stays physical. But my remaindered cloth copy of “What Hath God Wrought” is now happily made up of zeroes and ones. I’m trusting I’m enough of a bibliophile to separate the wheat from the chaff on this one.

    One thing that strikes me in the decade and a half or so I’ve had money to buy books is the way the old paperbacks of social science classics have disappeared — all those compact editions of William Robertson Smith and Durkheim that were printed for baby boomers to buy in their Intro to Soc classes. I remember a time when I thought they would be ubiquitous forever. Someone should do a small show in a library exhibition space of ‘the golden age of social science education through its paperbacks’.

  8. We use Scan Tailor to clean pdf files that have been badly scanned. It’s a free program that deskews and splits pages, and cleans darkened edges of the digital pages.

    You need to export the pdf file that you want to fix into tiff image files using Adobe. You can then work with the tiff files in Scan Tailor and ultimately re-combine the cleaned tiff files into a single cleaned pdf. It usually does a fantastic job. (We have found it will not work well with pdfs so poorly scanned that the pages are warped or with pdfs where dark edges cover parts of text.)

  9. Thank you for sharing this post. Just last week I was doing some serious deep cleaning of my office and ended up with a medium-sized box of books that I didn’t know what to do with. I kind of wanted them, and I kind of wanted them to disappear. Your post has given me the answer. I can keep them *and* make them disappear for a very reasonable price. While I am still a bit ambivalent about destroying the books, like all good American consumers at least I can outsource the dirty work. I am looking forward to giving it a try.

  10. I moved to Indonesia for fieldwork last December with several papers outstanding, but couldn’t bring all the books I needed. I have a high-speed document scanner (Fujitsu ScanSnap), but it only processes loose sheets. I disbound my entire anthropology library with a surgical scalpel, scanned, PDFed, and OCRed them. The books remain with my parents, in storage, held together with binder clips. It was amazing, however, to discover how strong the taboo against altering books is. Many friends and colleagues were upset that I was “destroying books,” even though they were *my books*, and by scanning them, I’ve ensured that I and others will be able to access these books (many now rare or expensive) long into the future.
