Technical Standards: Backups
Ratified by Data Custodians: 06-15-2006

This document supports the Contingency Planning policy, which requires data stewards to develop process(es) for creating and storing backup copies of critical data under their management, and requires data custodians, data administrators and data users to use those processes. These process(es) should be based on the sensitivity, volatility and value of the data, as well as the difficulty of reproducing it if or when needed.

The data subject to this standard, including which data and information are deemed ‘critical’ (i.e. confidential data, registered confidential data, and other data considered to be of institutional value), will be determined by the data stewards.

Data Backup Standard:

  1. Critical data (as determined by the data steward) should be backed up.
  2. Backup data should be stored at a location that is physically separate from the location where the data is created and used.
  3. It should be possible to retrieve and successfully restore backup data. Verification, through restoration of backed-up data, should be performed on a regular basis.
  4. Procedures for backing up critical data and for testing those procedures should be documented. At a minimum, such procedures should include the following for each type of data:
    • A definition of the specific data to be backed up
    • The type(s) of backup to be used (e.g. full backup, incremental backup, etc.)
    • The frequency and time of data backup
    • The number of generations of backed up data that are to be maintained (both on site and off site)
    • Responsibility for data backup
    • The storage site(s) for the backups
    • The storage media to be used
    • Any requirements concerning the data backup archives (retention policies should be based on State and Federal requirements)
    • Transport modes
    • Recovery of backed up data
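Departments that keep these procedures in electronic form may find it useful to record the minimum items as structured fields so that gaps are easy to spot during review. The following is a minimal sketch under that assumption; the class name `BackupProcedure`, its field names and the `missing_items` helper are hypothetical illustrations, not part of any University system or form.

```python
from dataclasses import dataclass, fields

@dataclass
class BackupProcedure:
    """Hypothetical record of the minimum items documented for one type of data."""
    data_definition: str = ""
    backup_types: str = ""          # e.g. "weekly full, daily incremental"
    frequency_and_time: str = ""
    generations_on_site: int = 0    # a count of 0 is treated as "not yet specified"
    generations_off_site: int = 0
    responsible_parties: str = ""
    storage_sites: str = ""
    storage_media: str = ""
    archive_requirements: str = ""  # including State and Federal retention rules
    transport_modes: str = ""
    recovery_procedure: str = ""

def missing_items(proc: BackupProcedure) -> list[str]:
    """Return the names of required items that are still empty (or zero)."""
    return [f.name for f in fields(proc) if not getattr(proc, f.name)]

draft = BackupProcedure(
    data_definition="Departmental file server shares",
    backup_types="weekly full, daily incremental",
)
print(missing_items(draft))  # lists every item not yet filled in
```

A documentation review could, for example, flag any procedure for which `missing_items` returns a non-empty list.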

Factors used in developing the backup strategy:

  • What data (including software, operating system, system data, application data, log data) is to be backed up?
  • What are the availability requirements for the system? Consider the maximum downtime the University can tolerate without each set of data before it becomes necessary to resort to backup copies.
  • What would be the effort required for data reconstruction without data backup? What would be the sources from which the data could be reconstructed if backups were not available?
  • What is the volume of data that is required to be backed up? This will help in the selection of the appropriate storage medium.
  • What is the volume of data that is modified over a certain time period? You may use any unit of time (e.g. hour, week, month, year). Specify whether the contents of existing files change or whether new files are generated.
  • When do data modifications take place? Is the data modified daily, weekly, or at other intervals? If data is modified only at the end of the month, then a backup is useful only immediately after the data has been modified.
  • Are there any retention or deletion requirements, based on State or Federal requirements, affecting the data?
  • What are the confidentiality requirements of the individual data blocks needing backup? Note that the confidentiality requirement of a file also applies to any backup copy of it. A backup containing data with different confidentiality requirements must meet the highest of those requirements.
  • What are the integrity (assurance that data is not modified while in storage) requirements for the backups?
  • Who is expected to carry out the data backups and what is the competency of those individuals to carry out the backups?

Stipulating Data Backup Procedures:

  • Defining what is to be backed up:
    • All data and software essential to the continued operation of University functions and all data that the University is required to maintain must be backed up.
    • In backing up information, all supporting material (e.g. programs, control files, and operating system software) required to process the information must also be backed up, although not necessarily with the same frequency as the data.
    • The data steward will determine what information must be backed up, in what form, and how often, in consultation with the data custodian and the technical staff that are responsible for the specific data.
    • Email should be backed up on separate media from the rest of the system backup.
    • In general, derived data (i.e. data calculated from a raw data source) should be backed up only if restoring it is more efficient than recreating it from the original source.
    • The key files that require backing up on office workstations are those created by the user (e.g. word-processing documents, spreadsheets). Application software for which the University has a site license, or which is part of the computer’s standard image (such as Microsoft Office), can generally be replaced. However, a backup copy should be made of any specialized or University-created software.

  • Type of data backup: There are various procedures and technologies available for backing up data.
    • Full data backup: With this procedure, all data requiring backup are stored on an additional data medium, regardless of whether the files have changed since the last backup. Therefore, this method requires a high storage capacity. Its advantage is simple and quick restoration, because all relevant files can be extracted from the latest full data backup. Its disadvantage is that, if full data backups are carried out infrequently, extensive changes made to files since the last backup may require considerable rework to bring restored data up to date.
    • Incremental data backup: In contrast to full data backup, this procedure stores only the files that have changed since the last (incremental or full) backup. This saves storage capacity and shortens the time required for the data backup. However, restoration generally takes longer, as the relevant files must be extracted from backups made at different stages. Incremental data backups always build on a full data backup and should be interspersed periodically with full data backups. During restoration, the latest full backup is restored first as the basis and is then updated, in order, from each subsequent incremental backup to bring the data to its most current backed-up state.
    • Differential data backup: This procedure stores only the files that have changed since the last full data backup. A differential backup requires more capacity on the backup medium than an incremental backup, but the files can be restored more quickly and easily: the latest full data backup, followed by the most recent differential backup, suffices to restore the data.
    • Image backup: This procedure backs up the physical sectors of the hard disk rather than the individual files on it. This is a full backup which allows very quick restoration of hard disks of the same type. This is very effective for disaster recovery, but not useful for restoration of individual files.
    • Tape Cloning: The process of making additional copies of backup media to address potential failures or loss and to allow storage in multiple locations.
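The restoration rules for the backup types above determine which backup sets must be on hand for a recovery. The sketch below illustrates them, assuming the common convention that an incremental backup captures changes since the last backup of any type and a differential backup captures changes since the last full backup. The names `Backup` and `restore_chain` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Backup:
    day: int   # when the backup was taken (day number)
    kind: str  # "full", "incremental" or "differential"

def restore_chain(backups: list[Backup], target_day: int) -> list[Backup]:
    """Return the backups needed, in restore order, to recover the state as of target_day."""
    usable = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the target day")
    chain = [fulls[-1]]  # the latest full backup is always the base
    diffs = [b for b in usable if b.kind == "differential" and b.day > chain[-1].day]
    if diffs:
        chain.append(diffs[-1])  # the latest differential covers everything since the full
    # Every incremental after the last restored backup must be applied, in order.
    chain += [b for b in usable if b.kind == "incremental" and b.day > chain[-1].day]
    return chain

week_inc = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 7)]
week_diff = [Backup(0, "full")] + [Backup(d, "differential") for d in range(1, 7)]
print([b.day for b in restore_chain(week_inc, 6)])   # [0, 1, 2, 3, 4, 5, 6]
print([b.day for b in restore_chain(week_diff, 6)])  # [0, 6]
```

With daily incrementals, restoring to the end of the week requires every backup taken that week; with daily differentials, only the full backup and the latest differential are needed, at the cost of larger differential backups.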
  • Choosing the correct type of backup: The backup system chosen should be based on operational efficiency, timeliness and effectiveness.
    • If the quantity of the modified data is similar to the quantity of the original data volume, then there is no cost advantage to doing incremental backups, and full backups should be done. However, if the quantity of the modified data is much smaller than the quantity of the original data, then there is a considerable cost savings to incremental backups.
    • Only full backups should be considered if an application requires backup of the entire database at certain intervals (e.g. end-of-month processing; end of grade processing; etc.)
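The cost comparison above can be made concrete with a rough storage estimate. The sketch below assumes one backup per day over a seven-day cycle, a constant daily change volume, and that each incremental stores only that day's changes; the function name and the figures in the example are hypothetical.

```python
def storage_per_cycle(full_gb: float, changed_gb_per_day: float, days: int = 7) -> dict[str, float]:
    """Approximate backup storage consumed by one cycle of daily backups.

    Hypothetical model: a full backup copies the whole data set; an incremental
    copies only the data changed that day, and the change volume is the same every day.
    """
    daily_full = full_gb * days
    full_then_incremental = full_gb + changed_gb_per_day * (days - 1)
    return {"daily full": daily_full, "full + incremental": full_then_incremental}

# Small daily changes: incrementals save most of the storage.
print(storage_per_cycle(500, 10))   # {'daily full': 3500, 'full + incremental': 560}
# Changes close to the full volume: the saving largely disappears.
print(storage_per_cycle(500, 450))  # {'daily full': 3500, 'full + incremental': 3200}
```

Storage is only one part of the cost; the longer restoration time of incremental backups should be weighed against these savings.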
  • Frequency and time of data backup:
    • The frequency with which backups should be taken is dependent on the volatility of the data in the system. If information is updated once a month then monthly backups are sufficient. Systems which are updated daily should be backed up daily.
    • The interval between data backups should be selected so that the restoration time for the data changed within this period is shorter than the maximum permissible downtime. This is also a function of the quantity of data.
    • If data are changed extensively (e.g. a payroll run, a new software version, or other system changes), or if the entire database must be available as of certain points in time, it is advisable to carry out a full data backup immediately afterwards. Both regular and event-driven backup intervals need to be stipulated.
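The rule that the restoration time for data changed within a backup interval should stay below the maximum permissible downtime can be turned into an upper bound on the interval. The sketch below assumes a simple linear model: a fixed time to restore the last backup, plus a time proportional to the volume changed since then, which must be re-created (e.g. re-entered from source documents). The parameter names and example figures are hypothetical.

```python
def max_backup_interval_hours(
    max_downtime_h: float,
    base_restore_h: float,
    change_rate_gb_per_h: float,
    recovery_rate_gb_per_h: float,
) -> float:
    """Longest backup interval (hours) that keeps recovery within the allowed downtime.

    Hypothetical linear model:
      recovery time = base_restore_h + (change_rate * interval) / recovery_rate
    where recovery_rate is how fast data changed since the last backup can be re-created.
    """
    slack_h = max_downtime_h - base_restore_h
    if slack_h <= 0:
        # Restoring the last backup alone already uses up the permitted downtime,
        # so no backup interval satisfies the requirement.
        return 0.0
    return slack_h * recovery_rate_gb_per_h / change_rate_gb_per_h

# 24 h permissible downtime, 4 h to restore the last backup,
# 2 GB changed per hour, changed data re-creatable at 5 GB per hour.
print(max_backup_interval_hours(24, 4, 2, 5))  # 50.0
```

Under these example figures, backing up at least every 50 hours (in practice, daily) would meet the requirement; a result of 0.0 signals that faster restoration media or procedures are needed regardless of backup frequency.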
  • Number of generations:
    • Multiple generations of operating system, application and data backups should be maintained in both on-site and off-site storage facilities.
    • The higher the data availability or integrity requirements, the greater the number of generations required to minimize the time needed to recover from a loss of integrity. If file loss or integrity infringements cannot be detected until very late, additional quarterly or annual backups are recommended.
    • If an extensive database can be reconstructed without backups (e.g. from its original sources), that means of reconstruction can be counted as an additional ‘pseudo generation’.
    • The higher the volume of data, the higher the costs of maintaining a generation due to the increased storage requirement. High volumes of data may therefore restrict the number of generations that is reasonable due to cost.
    • The higher the modification volume, the shorter the intervals between the generations should be in order to achieve close updating of files and minimum restoration effort.
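One way to reason about how many generations are enough when integrity problems may go unnoticed is sketched below. It assumes backups at a fixed interval and a known worst-case delay before corruption is detected; under that assumption, every backup taken during the detection delay may already contain the corrupted data, so at least one older generation must still be retained. The function name is hypothetical.

```python
import math

def generations_needed(detection_delay_days: float, backup_interval_days: float) -> int:
    """Minimum generations to retain so at least one clean backup predates undetected corruption.

    Worst case: the corruption occurs just before a scheduled backup, so up to
    ceil(delay / interval) of the most recent backups may contain it.
    """
    possibly_corrupted = math.ceil(detection_delay_days / backup_interval_days)
    return possibly_corrupted + 1

print(generations_needed(10, 1))  # daily backups, 10-day detection delay -> 11
print(generations_needed(10, 7))  # weekly backups, 10-day detection delay -> 3
```

This count addresses integrity only; additional generations may still be needed to guard against media failure, and each added generation increases storage cost in proportion to the data volume.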
  • Data Backup Procedures: Two parameters should be specified for the backup procedure: the degree of automation (manual vs. automatic) and the storage location (centralized vs. decentralized).
    • In manual data backup, there is manual triggering of the backup procedures. The advantage is that the operator can individually select the interval of data backup in accordance with the work schedule. The disadvantage is that the effectiveness of the data backup is dependent on the discipline and motivation of the operator. Automatic data backups are triggered by a program at certain intervals. The advantage is that the backup schedule is not dependent on the discipline and reliability of an operator. The disadvantage is that there is a cost associated with automation and the schedule needs to be monitored and revised to include any non-standard updates and/or changes to the work schedule.
    • In central data backups, the data are stored on, and the backups performed by, a central IT system administered by a small set of trained administrators. This procedure allows for more economical use of data media. The disadvantage is that there is added exposure of confidential data, and confidential and non-confidential information may be combined, requiring more stringent security controls for handling the backups. Decentralized data backups are performed by IT users or administrators without the data being transferred to a central IT system. The advantage is that IT users can control the information flow and data media, especially in the case of confidential data. The disadvantage is that the consistency of data backup depends on the reliability and skill level of the IT user. Sloppy procedures can result in data exposure or loss.
    • For non-networked PCs, backups of application data are usually performed manually by IT users as a full backup.
    • For LANs (Local Area Networks) with connected PCs, data backup can be carried out by having the PC user back up his/her application data on a central network server (either manually or automatically), after which the LAN administrator backs up these data centrally. (See below).
  • Storage medium: There are several considerations in determining the appropriate media for backups.
    • The time it takes to identify the data media needed for a restoration and make them available to the system is a consideration in choosing the appropriate media. Cassettes in a robotic library system can be made available for restoration within minutes; stored tapes may take longer to identify and then transport.
    • The actual time required for restoring the data depends on the average time needed to access the data on the storage medium, the rate of data transfer, and the number of files involved. Hard disks allow access to certain files in a few milliseconds, whereas magnetic tapes must first be wound to the correct position.
    • Data media with a low storage capacity are impractical for backing up large volumes of data, as repeatedly swapping media is time-consuming and error-prone. As data volumes increase, economical tape media such as magnetic tapes or data cartridges are generally used.
    • The cost of data backup (cost of read/write devices, data media and time required for operations) should be commensurate with the importance of the backup and less than the total cost of restoration without backup. The life and reliability of the data media should also be taken into consideration.
    • The higher the availability requirements, the faster the required access to data media for backup purposes, and the shorter the required time for re-importing the relevant data from the backup data media. In addition, when high availability is an issue, a compatible and fully operational reading device (e.g. tape drive, CD or DVD drive) must be obtainable at short notice so that the data media can still be read for restoration even if a reading device fails.
    • Where retention schedules call for deletion or erasure of data at specific times, the selected storage medium must allow this deletion. Data media on which deletion is impossible or difficult (e.g. tapes, CDs, DVDs) should be avoided in this case.
    • Where confidentiality and integrity are issues and encrypted data backup is not possible, then consideration should be given to data media whose design and transport characteristics would allow their storage in locked vaults.
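The time factors described above for storage media can be combined into a rough restore-time estimate. The sketch below assumes decimal units (1 GB = 1000 MB), a sustained transfer rate, and a fixed average access time per file; the function name, parameters and example figures are hypothetical and do not describe any particular device.

```python
def estimated_restore_hours(
    data_gb: float,
    transfer_mb_per_s: float,
    media_retrieval_h: float = 0.0,
    access_s_per_file: float = 0.0,
    num_files: int = 0,
) -> float:
    """Rough restore time: locate/mount media + per-file access + bulk transfer."""
    transfer_h = data_gb * 1000 / transfer_mb_per_s / 3600  # GB -> MB, seconds -> hours
    access_h = access_s_per_file * num_files / 3600          # e.g. tape winding/positioning
    return media_retrieval_h + access_h + transfer_h

# 360 GB at 100 MB/s takes about an hour to transfer; the difference lies in how
# long it takes to get the media into a drive.
print(estimated_restore_hours(360, 100, media_retrieval_h=0.05))  # cassette already in a library
print(estimated_restore_hours(360, 100, media_retrieval_h=4))     # tapes fetched from storage
```

Comparing such estimates against the maximum permissible downtime helps show whether a given medium and storage arrangement can meet the availability requirement.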
  • Responsibility for data backup
    • Each data backup process should have at least one primary and one secondary person in charge, both of whom are committed to adhering to the specific data backup process established.
    • Within their department, deans, directors and department heads may develop policies about what data can be stored on an individual employee’s machine (workstations, laptops, PDAs, other portable devices) versus what data must be stored on the department’s server, provided these policies are not in conflict with the policies of the data stewards.
    • Departmental LAN administrators are responsible for backing up their LAN servers and workstations and are required to implement a tested and auditable process that will allow for recovery from power or hardware failure, data and/or network problems, and physical disaster.
    • Employees are responsible for following the procedures of the data stewards and their dean, director or department head with respect to backing up the data on their individual workstation and/or portable device and adhering to confidentiality policies for that data.
  • Storage site: In general, data backups should be stored in two locations: on-site, to have readily available current data in machine-readable form in the production area in the event that operating data is lost, damaged or corrupted; and off-site, to additionally protect against loss of the primary site and the on-site data. There are no "best practices" for defining the distance of the off-site storage area from the on-site storage area. In general, the distance requirements will vary based on the region, the specific threats associated with a particular location and any regulatory compliance requirements (e.g. for HIPAA data, the requirements specify that off-site means 25-50 miles away). Off-site backups should be sufficiently distant from the on-site storage area to prevent a single destructive event from destroying all copies of the data.
    • The higher the availability requirements, the quicker the need to obtain the data backup media. Therefore, in addition to storing backups in a secure location sufficiently distant from the production processing system, consideration should be given to storing additional backup copies in the immediate vicinity of the IT system.
    • The higher the data confidentiality and integrity requirements, the more important it is to prevent data media from being manipulated. The storage site selected along with the appropriate infrastructure and organizational measures must ensure that the security measures for the backup resources are at least as stringent as the protection required of the primary resources.
    • With increasing data volumes, the security of the storage site increases in importance.
    • Software backups should not be stored in the same location as the original media.
  • Requirements concerning the data backup archive: Because backup media concentrate data in one place, the confidentiality and integrity protection of the backed-up data must be at least as high as that of the original data. Consequently, appropriate IT security measures (e.g. access control) are required for data media stored in a central archive.
    • The higher the availability requirements, the quicker the required access to relevant data media. If manual inventory-keeping does not fulfill the availability requirements, then automatic access systems should be used.
    • The data quantity decisively determines the number of data media to be stored. Large data volumes require correspondingly large storage capacities of the data archive.
    • If records retention and disposition schedules need to be maintained, the data backup archive must be organized appropriately and equipped with the required erasure devices. Please refer to the Records Management Program at the University of Connecticut (http://www.lib.uconn.edu/online/research/speclib/ASC/recordsmgmt/policies.htm) and the Record Retention Schedules for State Agencies (http://www.cslib.org/retstate.htm) to determine the length of time required to retain specific record series. Note that in some cases, permission must be obtained before disposing of certain record series.
    • The higher the data confidentiality and integrity requirements are, the more important it is to prevent data media from being manipulated. In general, the access control necessary for this can only be achieved by proper infrastructure and organization-related measures.
    • Any disks, tapes or other media used to store confidential information must be disposed of in a manner that ensures the data is not recoverable. See Device and Media Control policy and Procedures for Removing (Wiping) Data from a Computer Prior to Re-Deployment, Surplus or Disposal.
  • Transport Modes: Data are transferred during any backup process. The following must be observed in such situations, irrespective of whether data are being transferred through a network or line, or whether data media are being dispatched to an archive.
    • The higher the availability requirements, the more quickly data need to be obtained for restoration. This is to be considered during the selection of the transmission medium or transport mode.
    • If data required for restoration are to be transferred through a network, network capacity must accommodate the data availability requirement. In addition, end-to-end security of the transmission path must be ensured for confidential data.
    • The higher the data confidentiality and integrity requirements are, the more important it is to prevent data from being intercepted, copied or manipulated by unauthorized persons during transport. Encryption or cryptographic measures against manipulation must be considered for such data transmissions. Secure containers and routes must be selected for physical transport, and the degree and usefulness of encryption procedures should also be evaluated here.
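Where backup files are transmitted over a network or shipped on media, one common cryptographic measure against manipulation is a keyed message authentication code computed before transport and checked on arrival. The sketch below uses HMAC-SHA256 from the Python standard library. It detects modification but does not provide confidentiality, for which the data would also need to be encrypted. The function names are hypothetical, and the key is assumed to be exchanged and stored separately from the backup, under the key-management provisions required for recovery.

```python
import hashlib
import hmac

CHUNK = 1024 * 1024  # read large backup files in 1 MiB pieces

def sign_backup(path: str, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the backup file before it is transported."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            mac.update(chunk)
    return mac.hexdigest()

def verify_backup(path: str, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag on arrival and compare it in constant time."""
    return hmac.compare_digest(sign_backup(path, key), expected_tag)
```

The sending site would record the tag (for example, in the transport log), and the receiving site would run `verify_backup` before accepting the media; a `False` result means the backup should not be relied on.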
  • Retention Considerations: As part of the data backup strategy, one must consider the retention requirements for the specific data backups. Contact the University of Connecticut Archivist at the Thomas J. Dodd Research Center (Unit 1205) for information and assistance concerning specific data.
    • An example of a possible schedule is as follows:
      • A full system backup will be performed weekly. Weekly backups will be saved for a full month.
      • The last full weekly backup of the month will be saved as a monthly backup. The other weekly backup media will be erased or destroyed (see Device and Media Control policy and procedures) within two months from the original backup date.
      • Monthly backups will be saved for one year, at which time the media will be erased, destroyed or reused.
      • Differential or incremental backups will be performed daily. Daily backups will be retained for two weeks. Daily backup media will be thoroughly erased or destroyed within one month of the original backup date.
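The example schedule can be expressed as a rule that decides whether a given backup is still within its retention period. The sketch below assumes that weekly full backups are taken on the same weekday each week, that a "month" is 31 days and a year 365 days, and that "expire" means the media are due for erasure or destruction under the Device and Media Control procedures. The function names are hypothetical.

```python
from datetime import date, timedelta

def is_last_weekly_of_month(taken: date) -> bool:
    """True if the next weekly backup (7 days later) falls in a different month."""
    return (taken + timedelta(days=7)).month != taken.month

def retention_action(kind: str, taken: date, today: date) -> str:
    """Return 'keep' or 'expire' for a backup under the example schedule."""
    age_days = (today - taken).days
    if kind == "daily":  # differential or incremental
        return "keep" if age_days <= 14 else "expire"
    if kind == "weekly":  # full backup
        if is_last_weekly_of_month(taken):
            # Promoted to the monthly backup and kept for one year.
            return "keep" if age_days <= 365 else "expire"
        return "keep" if age_days <= 31 else "expire"
    raise ValueError(f"unknown backup kind: {kind!r}")

today = date(2024, 3, 1)
print(retention_action("daily", date(2024, 2, 10), today))   # expire (20 days old)
print(retention_action("weekly", date(2024, 1, 10), today))  # expire (not the month's last weekly)
print(retention_action("weekly", date(2024, 1, 31), today))  # keep (monthly backup)
```

Running such a rule against a backup inventory produces a list of media due for erasure, which can then be checked against the applicable records retention schedules before disposal.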
  • Recovery of Backup Data:
    • Backup documentation including identification of critical data, programs, documentation and support items necessary to perform essential tasks during a recovery process should be maintained, reviewed and updated periodically to account for new technology, business changes, and migration of applications to alternative platforms.
    • Documentation of the restoration process should include procedures for the recovery from single-system or application failures or loss as well as a total center or department disaster scenario. If encryption of the data has been used, the backup procedures must include provisions for key management to ensure that backup data will not be lost through the accidental loss or unavailability of an encryption key.
    • Recovery procedures should be tested and the tests documented on a periodic basis, but no less than annually. Testing ensures that the data can be recovered and that staff are familiar with the procedures.
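Periodic restore tests are more informative when the restored data are checked rather than merely opened. The sketch below compares SHA-256 digests of every file in a source directory against a test restoration and reports missing, changed and unexpected files. It assumes the source has not changed since the backup was taken (in practice, compare against a snapshot or against digests recorded at backup time); the function names are hypothetical.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024  # hash large files in 1 MiB pieces

def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(CHUNK), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests

def compare_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Report files missing from, differing in, or extra in the restored copy."""
    src, dst = file_digests(source), file_digests(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
        "extra": sorted(dst.keys() - src.keys()),
    }
```

Empty lists in all three categories, kept with the test record, provide documented evidence that the restoration succeeded.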