[This is the final post in a three-part series on archiving and sharing fieldwork data.]
Lisa Cliggett: How can we archive all this data?
Two years ago, I worked with Lisa Cliggett on an NSF-sponsored project to curate 60 years of anthropology projects in the Gwembe Tonga region of Zambia, a complex pilot project that involved anthropologists, campus IT, librarians, and a gullible library school student then willing to work for free (me!). We experimented with ways to curate Lisa’s field records in a digital library using Greenstone and Drupal. Our goal was a small teaching archive that undergraduates could use to better understand the processes involved in fieldwork, something that could be built into a larger archive over time.
This comes from Cliggett’s long-standing interest in preserving qualitative research. She has argued that there is a profound risk of data loss if we, as anthropologists, don’t find ways to share our data and archive our fieldnotes. Her own experience shows why thinking carefully through our archiving practices is important:
“As early as 1999, a colleague and I experimented with digitizing. . . a portion of Elizabeth Colson’s field notes in order to explore possibilities for creating a fully digital qualitative database. . . We saved files in an OCR format, storing them on “the standard” of the time – a 3.5 inch floppy disk. . . Now, 13 years later, we have a shoebox of 3.5 inch disks with files saved in 1990s proprietary software. Surely we could find technicians to free those files from their fossilized form, but it would require determination, time, and funding” (Cliggett 2013, p. 6).
So there’s a tension running throughout these last few posts: Dr. Bernson’s paper documentation could easily be lost, and Kristin Ghodsee’s sensitive research materials shouldn’t yet be openly shared—yet Lisa Cliggett’s earliest attempts to preserve historic field records also didn’t result in secure and accessible digital files.
Tips on preserving and sharing ethnographic source materials
This final set of tips, then, relates to how we can best document and share our field materials with other researchers, and what we should share. That includes the limits we might place on sensitive information and how we could later make that information accessible to other scholars or to the descendants of original participants. Some suggestions:
Choose durable formats. Save your digital records in “open” file formats that are not owned by any particular corporation. This ensures that your files can be accessed by future scholars. For instance, storing in rich text (.rtf) instead of Word files (.doc) makes documents easier to analyze in Atlas.ti or NVIVO, as well as accessible to future researchers even if Microsoft goes out of business.
Use coding software with care. Most commercial qualitative coding software, such as MaxQDA, NVIVO or Atlas.ti, does not let you export your coding system into an open format that can be archived or imported into other programs. This is a huge concern, because if we can’t share our coding with future researchers, our perceptions and context for our notes may not be available. Before licensing any of this software, I recommend that you talk with vendors and ask that they allow the ability to fully export your codes in an open format like XML, one that can be imported to another program or stored in a long-term archive.
Code in open formats. Given that commercial coding software does not yet support data sharing, your easiest option may be to code within the text itself, using #hashtags or other in-text notations that could be read in any software or printout.
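To show why in-text codes travel well, here is a minimal sketch in Python of how anyone could later pull the codes back out of a plain-text note. The hashtag convention, the function name `index_codes`, and the sample note are all assumptions made up for illustration, not a standard or an established tool.

```python
import re
from collections import defaultdict

# Assumed convention: a code is '#' followed by a letter, then letters,
# digits, underscores, or hyphens. Adjust to your own notation.
HASHTAG = re.compile(r"(?<!\w)#([A-Za-z][\w-]*)")


def index_codes(note_text):
    """Map each in-text code to the numbers of the paragraphs where it appears."""
    index = defaultdict(list)
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in note_text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        for code in sorted(set(HASHTAG.findall(paragraph))):
            index[code.lower()].append(number)
    return dict(index)


# Invented sample note, for illustration only.
sample_note = (
    "Visited the market with my host. Several vendors sell on credit. "
    "#economy #kinship\n\n"
    "Long conversation about naming a newborn. #naming #kinship"
)

code_index = index_codes(sample_note)
for code, paragraph_numbers in sorted(code_index.items()):
    print(code, paragraph_numbers)
```

Because the codes live in the text, the same note stays readable on paper, and the index can be rebuilt with any tool that handles plain text.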
Get informed consent for archiving. If doing formal interviews, you can include language on an IRB consent form that lets participants indicate if they are willing to have anonymized versions of their interview stored in a secure data archive like Michigan’s ICPSR. Click here for a sample informed consent sheet that has participants choose whether to have their interview anonymized and shared with future researchers. Such consent is best given for clear records like one-off interviews or surveys.
Remove direct identifiers. If you are archiving a subset of your research to be accessed by other scholars or students, remove “direct identifiers” (name, location, family ties) from the text. Michigan’s ICPSR data archive is the best developed social science digital archive, and it requires that you strip identifying data from interviews before depositing them. Microsoft Word’s “find and replace” may be your friend here; have a student or colleague look over the materials as well.
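If you have many files, the same “find and replace” idea can be scripted. The sketch below is only an illustration in Python: the names, the placeholders, and the function `replace_identifiers` are invented for this example, and it catches only the exact strings you list, so a human read-through is still needed for nicknames, misspellings, and indirect clues such as family ties.

```python
import re


def replace_identifiers(text, replacements):
    """Replace each listed identifier (whole words, any case) with its placeholder."""
    # Longest identifiers first, so a full name is handled before a first name.
    for name in sorted(replacements, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        placeholder = replacements[name]
        # A function is used so the placeholder is inserted literally.
        text = pattern.sub(lambda match, p=placeholder: p, text)
    return text


# Hypothetical identifiers and placeholders, made up for this example.
replacements = {
    "Jane Doe": "[PARTICIPANT-01]",
    "Jane": "[PARTICIPANT-01]",
    "Riverton": "[VILLAGE-A]",
}

sample = "Jane Doe said her sister lives in Riverton. Jane's cousin agreed."
cleaned = replace_identifiers(sample, replacements)
print(cleaned)
```

Keep the table of names and placeholders separate from the cleaned files, and store it as securely as the original notes, since it is the key that re-identifies everyone.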
Store identifying data in a restricted archive. If you have historical or contextual reasons for wanting to keep ‘direct identifiers’ within a set of field documents, you may be able to archive ‘restricted data’ with ICPSR. This would require that later researchers get IRB approval before accessing and using your field data.
Embargo sensitive data. Are the above two points making you nervous? Me too, and that’s why I’m working in this area. Qualitative data archives are still very experimental; we can’t always share current videos, images, or texts. Our records, being deeply implicated in community and people’s lives, have enough details to easily identify others, even with changed names or places. Many ethnographic source notes should be embargoed, limiting access for 50 or 100 years. This balances the usefulness of our records to future scholars against the risks of current exposure.
Document your field documents. Because funders like the NSF are often the ones asking us to manage qualitative records, their grants should cover the costs of ‘documenting’ any project data that you plan to share. Student assistants can be tasked to add ‘metadata’ (tags, codes, context) to each document. Use of standard labels (a “controlled vocabulary”) for place, language, or authors can help make your project easier to find in a larger database or archive.
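As a rough illustration of what a student assistant might produce, here is a small Python sketch that builds one metadata record per document and rejects labels that are not in the project’s controlled vocabulary. The field names, the vocabulary terms, the file name, and the function `build_record` are all assumptions for the example; a real project would adopt whatever schema and term lists its archive recommends.

```python
import json

# Hypothetical controlled vocabulary, for illustration only.
CONTROLLED_VOCABULARY = {
    "place": {"Zambia", "Kazakhstan", "Mongolia"},
    "language": {"Tonga", "Kazakh", "Mongolian", "English"},
}


def build_record(filename, place, language, tags):
    """Return a metadata record, rejecting terms outside the controlled vocabulary."""
    for field, value in (("place", place), ("language", language)):
        if value not in CONTROLLED_VOCABULARY[field]:
            raise ValueError(f"{value!r} is not an approved {field} term")
    return {
        "file": filename,
        "place": place,
        "language": language,
        "tags": sorted(tags),  # free-form tags, sorted for consistency
    }


record = build_record(
    "interview_014.rtf", "Zambia", "Tonga", ["migration", "kinship"]
)
# JSON is an open, plain-text format that can sit next to the document itself.
print(json.dumps(record, indent=2, ensure_ascii=False))
```

The point of the check is consistency: a typo or a variant spelling of a place name is caught when the record is created, not years later when someone searches the archive and misses the file.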
Create finding aids. Let others know what’s out there. In libraries and archives, a finding aid is a sort of abstract for a set of records, listing their topics, regions, persons, or content. For instance, I’ve collected notes and interviews on topics like:
- Multi-level marketing in Central Asia
- Kazakh and Kyrgyz names and naming practices
- Democratic elections in Mongolia
- Missionaries in Central Asia
Sharing this kind of information, on both published and unpublished topics, could let other ethnographers know which prior projects hold materials that might be available for reuse.
Consider data reuse contracts. Much as non-disclosure contracts can make it clear that field assistants shouldn’t write up results without you, a reuse contract can clarify the terms under which you share your notes with other researchers. This could include your right to check results for identifying information, or the need for other researchers to abide by certain ethical standards before building on your work.
Support the AAA data registry. The AAA is already working with archivists and librarians to build an Anthropological Data Registry, which currently hosts information about 52 anthropological datasets and archival collections. This is based on an older CoPAR list of where physical fieldnotes are archived. If you know of any other physical or online archives of prior anthropological research materials, share that in the data registry!
Talk to a research librarian. If this is overwhelming or threatening, don’t despair! These are complicated issues that librarians and anthropologists are working on together. Send a quick note to your librarian or archivist now, while you’re thinking about it. Ask to talk about archiving or data sharing options at your institution. Librarians are attuned to these kinds of concerns, and can help you or find someone else who can.
All in all, I hope this is inspiring you to look at some of your field documents and see how you could archive or share them. And once again, if you’ve experimented in any of these areas, do share your experiences or interests in the comments.
Celia, thank you for this. Lots of great advice here. May I ask a specific, personal question? Notes from my dissertation fieldwork in 1969 were manually typed on very thin paper. The notes are faded and the paper is fragile. Do you know of services that could scan the pages and recover the text?
Hi John! It’s a great question. My first thought is that if your notes are unbound, you could gently scan them on a flatbed scanner and then use OCR text recognition software (plus image editing software, depending on how faded the pages are) to recover the text. I’d also ask if you’re eventually hoping to archive your physical papers anywhere, or if you foresee using just the digital scans going forward. I don’t know of specific services offhand, but will ask my colleagues for advice & get back to you. Feel free to send me an email if you have more questions!
A special word of thanks to Celia—and to Savage Minds. Celia and I have had a very fruitful exchange via email, adding details like the possibility of scans that produce 400-600 dpi TIFFs that can be converted to grayscale before heightening contrast in Photoshop. Wouldn’t have happened without Savage Minds.