Berman-CACM 2008

Got Data? A Guide to Data Preservation in the Information Age

Berman

data preservation archiving legacy

@article{berman:cacm-2008,
  title={Got Data?  A Guide to Data Preservation in the Information Age},
  author={Francine Berman},
  journal={Communications of the {ACM}},
  volume={51},
  number={12},
  month={December},
  year={2008}
}

"If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy."

  • But no one ever notices infrastructure until it fails or doesn't exist, despite it being complex and expensive
  • See the Atkins report: the 2003 Final Report of the US NSF Blue Ribbon Advisory Panel on Cyberinfrastructure

Keep data manageable, accessible, available, and secure in infrastructure that is useful, usable, cost-effective, and unremarkable

More digital data is being created than there is storage to host it

  • We can't keep all data; we must appraise it and archive only the priorities

More policies and regulations are requiring access, stewardship, and/or preservation of digital data

Storage costs for digital data are decreasing

  • But electrical and human power requirements and bills are not going down
  • Data centers are consuming larger portions of budgets

There is increasing commercialization of digital data storage and services

Greater levels of trust, monitoring, replication, and accounting are required for archiving material in the public interest

  • Census, official records, critical scientific, and other irreplaceable data
  • Minimize likelihood of loss or damage and ensure longevity
  • Probably best done by trusted entities without a profit motive?

Value means many things to many people/users; data can be organized by value into a hierarchy, the Data Pyramid

  • Bottom is individual/local users whose data is limited in interest (personal photos, documents)
    • Probably can be supported by commercial services
  • Middle is communities and groups ranging in size, focus
  • Top is government, large agencies, etc., with widespread relevance
    • Probably best supported by libraries, museums, archives, government and academic agencies & institutions

Ten guidelines for data stewardship

  • Make a plan
  • Be aware of data costs and include them in the IT budget
  • Associate metadata with data
  • Make multiple copies of valuable data
  • Plan to migrate data to new media, formats, etc. well ahead of time, before the current solution becomes obsolete
  • Plan for transitions, e.g., turning over to other groups
  • Determine level of trust/reliability required
  • Tailor plans for preservation and expected use
  • Pay attention to security
  • Know regulations
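
Two of these guidelines (associate metadata with data, make multiple copies) lend themselves to a small sketch. The Python below is not from the paper; it is one possible illustration, assuming a single file, local directories standing in for independent storage sites, and a JSON sidecar file for metadata. The function names `preserve` and `verify` and the `.meta.json` suffix are hypothetical choices made for this example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve(source: Path, replica_dirs: list[Path], description: str) -> dict:
    """Copy `source` into each replica directory, next to a metadata sidecar.

    The sidecar layout is an assumption for this sketch, not a standard.
    """
    metadata = {
        "filename": source.name,
        "description": description,
        "size_bytes": source.stat().st_size,
        "sha256": sha256_of(source),
    }
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, directory / source.name)
        sidecar = directory / (source.name + ".meta.json")
        sidecar.write_text(json.dumps(metadata, indent=2))
    return metadata


def verify(replica_dirs: list[Path], filename: str) -> dict[str, bool]:
    """Report, per replica directory, whether the copy matches its recorded checksum."""
    results = {}
    for directory in replica_dirs:
        copy = directory / filename
        sidecar = directory / (filename + ".meta.json")
        if not copy.exists() or not sidecar.exists():
            # A missing copy or missing metadata counts as a failed replica.
            results[str(directory)] = False
            continue
        recorded = json.loads(sidecar.read_text())["sha256"]
        results[str(directory)] = sha256_of(copy) == recorded
    return results
```

Running `verify` on a schedule would flag a damaged or missing replica while other good copies still exist, which is the point of keeping more than one. A real archive would also need the replicas on genuinely separate media or sites, which this sketch does not attempt.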

Of note:

  • Branscomb pyramid of computing users
Page last modified on June 02, 2009, at 11:32 AM