Got Data? A Guide to Data Preservation in the Information Age
Berman
data preservation archiving legacy
@article{berman:cacm-2008,
title={Got Data? A Guide to Data Preservation in the Information Age},
author={Francine Berman},
journal={Communications of the {ACM}},
volume={51},
number={12},
month={December},
year={2008}
}
"If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy."
- But no one ever notices infrastructure until it fails or doesn't exist, despite it being complex and expensive
- Atkins report; US NSF 2003 Final Report of the Blue Ribbon Advisory Panel on Cyberinfrastructure
Keep data manageable, accessible, available, and secure in useful, usable, cost-effective, unremarkable infrastructure
More digital data is being created than there is storage to host it
- We can't keep all data; we must appraise it and archive only the priorities
More policies and regulations are requiring access, stewardship, and/or preservation of digital data
Storage costs for digital data are decreasing
- But electrical and human power requirements and bills are not going down
- Data centers are consuming larger portions of budgets
There is increasing commercialization of digital data storage and services
Greater levels of trust, monitoring, replication, and accounting are required for archiving material in the public interest
- Census, official records, critical scientific, and other irreplaceable data
- Minimize likelihood of loss or damage and ensure longevity
- Probably best done by trusted entities without a profit motive?
Value means many things to many people/users; it can be organized into a hierarchy, the Data Pyramid
- Bottom is individual/local users whose data is limited in interest---personal photos, documents
  - Probably can be supported by commercial services
- Middle is communities and groups ranging in size and focus
- Top is government, large agencies, etc., with widespread relevance
  - Probably best supported by libraries, museums, archives, government and academic agencies & institutions
Ten guidelines for data stewardship
- Make a plan
- Be aware of data costs and include them in the IT budget
- Associate metadata with data
- Make multiple copies of valuable data
- Plan for migrating data to new media, formats, etc. well ahead of time, before the current solution is obsolete
- Plan for transitions, e.g., turning over to other groups
- Determine level of trust/reliability required
- Tailor plans for preservation and expected use
- Pay attention to security
- Know regulations
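Two of the guidelines above (associate metadata with data, make multiple copies) lend themselves to a small sketch. This is not from Berman's article: the function names, the `.meta.json` sidecar layout, and the use of SHA-256 checksums to detect damaged copies are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive(source: Path, replica_dirs: list[Path], description: str) -> dict:
    """Copy source into each replica directory with a metadata sidecar.

    The sidecar format is hypothetical, not a standard.
    """
    metadata = {
        "name": source.name,
        "description": description,
        "sha256": sha256_of(source),
        "size_bytes": source.stat().st_size,
    }
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, directory / source.name)
        sidecar = directory / (source.name + ".meta.json")
        sidecar.write_text(json.dumps(metadata, indent=2))
    return metadata


def damaged_replicas(name: str, replica_dirs: list[Path]) -> list[Path]:
    """Return replica directories whose copy is missing or fails its checksum."""
    bad = []
    for directory in replica_dirs:
        copy = directory / name
        sidecar = directory / (name + ".meta.json")
        if not copy.exists() or not sidecar.exists():
            bad.append(directory)
            continue
        expected = json.loads(sidecar.read_text())["sha256"]
        if sha256_of(copy) != expected:
            bad.append(directory)
    return bad
```

Running `damaged_replicas` on a schedule is one way to act on the "determine level of trust/reliability required" guideline. A real archive would also keep replicas on independent media or sites, which this sketch does not model.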
Of note:
- Branscomb pyramid of computing users