Technical Standards: Backups
Ratified by Data Custodians: 06-15-2006
This document is in support of the Contingency Planning policy
which requires data stewards to develop process(es) for creating
and storing backup copies of critical data under their management
and for data custodians, data administrators and data users to
use those processes. These process(es) should be based on the
sensitivity, volatility, and value of the data as well as the difficulty
of reproducing it if/when needed.
The data stewards will determine the data subject to this standard,
including which data and information are deemed critical (i.e.
confidential data, registered confidential data, and other data
considered to be of institutional value).
Data Backup Standard:
- Critical data (as determined by the data steward) should
be backed up.
- Backup data should be stored at a location that is physically
different from its original creation and usage location.
- It should be possible to retrieve and restore backup data
successfully. Verification, through restoration of
backed-up data, should be performed on a regular basis.
- Procedures for backing up critical data and the testing of
the procedures should be documented. Such procedures should
include as a minimum for each type of data:
- A definition of the specific data to be backed up
- The type(s) of backup to be used (e.g. full backup, incremental
backup, etc.)
- The frequency and time of data backup
- The number of generations of backed up data that are to be
maintained (both on site and off site)
- Responsibility for data backup
- The storage site(s) for the backups
- The storage media to be used
- Any requirements concerning the data backup archives.
(Retention policies should be based on State and Federal policies).
- Transport modes
- Recovery of backed up data
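The minimum contents listed above can be captured as a simple record so that an incomplete procedure is easy to spot. The following is a minimal sketch only; the class name, field names, and the `missing_items` helper are hypothetical and are not part of this standard.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class BackupProcedure:
    """Documented backup procedure for one type of data (hypothetical field names)."""
    data_definition: str = ""       # the specific data to be backed up
    backup_types: str = ""          # e.g. "weekly full, daily incremental"
    frequency_and_time: str = ""    # how often and when backups run
    generations: str = ""           # generations kept on site and off site
    responsibility: str = ""        # who is responsible for the backup
    storage_sites: str = ""         # where the backups are stored
    storage_media: str = ""         # the media used
    archive_requirements: str = ""  # including retention requirements
    transport_modes: str = ""       # how data or media are moved
    recovery: str = ""              # how backed up data is recovered


def missing_items(procedure: BackupProcedure) -> List[str]:
    """Return the names of the required items that have been left blank."""
    return [f.name for f in fields(procedure)
            if not getattr(procedure, f.name).strip()]
```

A procedure for which `missing_items` returns an empty list addresses every item in the list above; it does not indicate that the content of each item is adequate.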
Factors used in developing the backup strategy:
- What data (including software, operating system, system
data, application data, log data) is to be backed up?
- What are the availability requirements for the system?
Consider the maximum permissible downtime, that is, how long the
University can manage without each set of data being available
and without resorting to backup copies.
- What would be the effort required for data reconstruction
without data backup? What would be the sources from which the
data could be reconstructed if backups were not available?
- What is the volume of data that is required to be backed up?
This will help in the selection of the appropriate storage medium.
- What is the volume of data that is modified over a certain
time period? You may use any unit of time (e.g. hour, week,
month, year). Specify whether the contents of existing files
change or whether new files are generated.
- When are data modifications taking place? Is the data being
modified daily, weekly, or at other intervals? If data is modified
only at the end of the month, then a backup is useful only
immediately after the data has been modified.
- Are there any retention or deletion requirements, based on
State or Federal requirements, affecting the data?
- What are the confidentiality requirements of the individual
data blocks needing backup? Note that the confidentiality
requirement of a file also applies to any backup copy. A backup
containing data with different confidentiality requirements must
adhere to the highest level of confidentiality.
- What are the integrity (assurance that data is not modified
while in storage) requirements for the backups?
- Who is expected to carry out the data backups and what is
the competency of those individuals to carry out the backups?
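The rule that a backup containing data with different confidentiality requirements must adhere to the highest level can be illustrated as follows. This is a sketch under stated assumptions: the level names `PUBLIC` and `INTERNAL`, the ordering of the levels, and the function name are hypothetical; only "confidential" and "registered confidential" are terms used in this standard.

```python
from enum import IntEnum
from typing import Iterable


class Confidentiality(IntEnum):
    """Hypothetical ordering of levels; a higher value is more restrictive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    REGISTERED_CONFIDENTIAL = 3


def backup_confidentiality(levels: Iterable[Confidentiality]) -> Confidentiality:
    """Return the level a backup must be handled at, given the levels of its contents.

    An empty backup is treated as PUBLIC, which is an assumption of this sketch.
    """
    return max(levels, default=Confidentiality.PUBLIC)
```

For example, a backup that combines `INTERNAL` and `REGISTERED_CONFIDENTIAL` files would be handled as `REGISTERED_CONFIDENTIAL`.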
Stipulating Data Backup Procedures:
- Defining what is to be backed up:
- All data and software essential to the continued operation
of University functions and all data that the University is
required to maintain must be backed up.
- In backing up information, all supporting material (e.g.
programs, control files, and operating system software)
required to process the information must also be backed
up, although not necessarily with the same frequency as the data.
- The data steward will determine what information must be
backed up, in what form, and how often, in consultation with the
data custodian and the technical staff that are responsible for the
specific data.
- Email should be backed up on separate media from the rest
of the system backup.
- In general, derived data (i.e. data calculated from a raw data
source) should be backed up only if restoring it is more efficient
than recreating it from the original source.
- The key files that require backing up on office workstations
are those created by the user (e.g. word-processing documents,
spreadsheets, etc.). Application software for which the University
has a site license or which is part of the computer's image, such
as Microsoft Office, can generally be replaced.
However, a backup copy should be made of any specialized or
University-created software.
- Type of data backup: There are various procedures
and technologies available for backing up data.
- Full data backup: With this procedure, all data
requiring backup are stored on an additional data medium
without consideration as to whether the files have been changed
since the last backup. Therefore, this method requires a high
storage capacity. Its advantage is the simple and quick
restoration of data due to the fact that all relevant and necessary
files can be extracted from the latest full data backup. If full data
backups are carried out infrequently, extensive changes to a file
can result in major updating requirements.
- Incremental data backup: In contrast to full data
backup, this procedure simply stores the files which have been
changed since the last (incremental or full) backup. This saves
storage capacity and shortens the time required for the data
backup. However, the restoration time for data is generally high,
as the relevant files must be extracted from backups made at
different stages. Incremental data backups are always based on
full data backups and should be interspersed periodically by full
data backups. During restoration, the latest full backup is restored
as the basis which is then updated from subsequent, incremental
backups to effectively restore to the most current state of the
backed-up data.
- Differential data backup: This procedure stores only
the files that have been changed since the last full data backup. A
differential backup requires more capacity on the backup medium
than an incremental backup but the files can be restored more
quickly and easily. For restoration of data, the latest full data backup,
followed by the most recent differential backup, will suffice to
restore the data.
- Image backup: This procedure backs up the physical
sectors of the hard disk rather than the individual files on it. This is
a full backup which allows very quick restoration of hard disks of
the same type. This is very effective for disaster recovery, but
not useful for restoration of individual files.
- Tape Cloning: The process of making additional
copies of backup media to address potential failures or loss and
to allow storage in multiple locations.
- Choosing the correct type of backup: The backup
system chosen should be based on operational efficiency,
timeliness and effectiveness.
- If the quantity of the modified data is similar to the
quantity of the original data volume, then there is no cost
advantage to doing incremental backups, and full backups
should be done. However, if the quantity of the modified data is
much smaller than the quantity of the original data, then
incremental backups offer considerable cost savings.
- Only full backups should be considered if an application requires
backup of the entire database at certain intervals (e.g.
end-of-month processing; end of grade processing; etc.)
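The restoration sequences described above for incremental and differential backups can be sketched as follows. This is an illustration only, assuming a backup history listed oldest first and a schedule that uses either incremental or differential backups between full backups, not both; the function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# (kind, label), where kind is "full", "incremental" or "differential".
Backup = Tuple[str, str]


def restore_chain(history: List[Backup]) -> List[Backup]:
    """Return the backups to restore, in order, to reach the most current state.

    `history` must be ordered oldest first.
    """
    full_positions = [i for i, (kind, _) in enumerate(history) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup available to restore from")
    last_full = full_positions[-1]
    later = history[last_full + 1:]

    differentials = [b for b in later if b[0] == "differential"]
    incrementals = [b for b in later if b[0] == "incremental"]
    if differentials and incrementals:
        raise ValueError("mixed schedules are outside the scope of this sketch")

    if differentials:
        # The latest full backup plus the most recent differential suffices.
        return [history[last_full], differentials[-1]]
    # The latest full backup, then every subsequent incremental in order.
    return [history[last_full]] + incrementals
```

With a Sunday full backup followed by daily backups, an incremental schedule requires every daily backup since Sunday, whereas a differential schedule requires only the Sunday backup and the latest daily backup.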
- Frequency and time of data backup:
- The frequency with which backups should be taken is
dependent on the volatility of the data in the system. If
information is updated once a month then monthly backups are
sufficient. Systems which are updated daily should be backed up
daily.
- The interval between data backups should be selected so
that the restoration time for the data changed within this period
is shorter than the maximum permissible downtime. This is also
a function of the quantity of data.
- If data are changed to a large extent (e.g. program
sequence for salary payments or different software version or
system changes) or the entire database needs to be made
available at certain points in time, it is advisable to carry out a
full data backup immediately afterwards. Regular, as well as
event-dependent intervals, need to be stipulated.
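The relationship between the backup interval and the maximum permissible downtime can be illustrated with a simple calculation. This sketch assumes a linear model in which data changes at a constant rate and is restored at a constant rate; the function and parameter names are hypothetical, and real systems would also need to account for the time to restore the underlying full backup.

```python
def max_backup_interval_hours(change_rate_gb_per_hour: float,
                              restore_rate_gb_per_hour: float,
                              max_downtime_hours: float) -> float:
    """Longest interval for which the data changed within the interval can
    still be restored inside the maximum permissible downtime.

    Changed volume   = change_rate * interval
    Restoration time = changed volume / restore_rate
    Requiring restoration time <= max_downtime gives
    interval <= max_downtime * restore_rate / change_rate.
    """
    if change_rate_gb_per_hour <= 0 or restore_rate_gb_per_hour <= 0:
        raise ValueError("rates must be positive")
    if max_downtime_hours < 0:
        raise ValueError("downtime must not be negative")
    return max_downtime_hours * restore_rate_gb_per_hour / change_rate_gb_per_hour
```

For example, with 2 GB of changes per hour, a restore rate of 100 GB per hour, and a permissible downtime of 4 hours, this model gives an upper bound of 200 hours between backups. The volatility guidance above (for example, daily backups for systems updated daily) still applies.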
- Number of generations:
- Multiple generations of operating system, application and
data backups should be maintained in both on-site and off-site
storage facilities.
- The higher the data availability or integrity requirements,
the greater the number of generations required to minimize
the time needed to recover from a loss of integrity. If file loss
or integrity infringements cannot be detected until very late,
additional quarterly or annual backups are recommended.
- If the database is extensive but can be reconstructed without
backups, it can be considered as an additional pseudo generation.
- The higher the volume of data, the higher the costs of
maintaining a generation due to the increased storage requirement.
High volumes of data may therefore restrict the number of
generations that is reasonable due to cost.
- The higher the modification volume, the shorter the intervals
between the generations should be in order to achieve close
updating of files and minimum restoration effort.
- Data Backup Procedures: Two parameters should
be specified for the backup procedure: the degree of automation
(manual vs. automatic) and the storage location (centralized
vs. decentralized).
- In manual data backup, the backup procedures are triggered
manually. The advantage is that the operator
can individually select the interval of data backup in accordance
with the work schedule. The disadvantage is that the effectiveness
of the data backup is dependent on the discipline and motivation
of the operator. Automatic data backups are triggered
by a program at certain intervals. The advantage is that the
backup schedule is not dependent on the discipline and reliability
of an operator. The disadvantage is that there is a cost
associated with automation and the schedule needs to be
monitored and revised to include any non-standard updates
and/or changes to the work schedule.
- In central data backups, the data are stored and the
backups are performed on a central IT system by a small set of
trained administrators. This procedure allows for more economical
usage of data media. The disadvantage is that there is added
exposure to confidential data, and confidential and non-confidential
information may be combined, requiring more stringent security
controls for handling the backups. Decentralized data backups are performed
by IT users or administrators without being transferred to a central
IT system. The advantage is that IT users can control the information
flow and data media, especially in the case of confidential data. The
disadvantage is that the consistency of data backup depends on the
reliability and skill level of the IT user. Sloppy procedures can result
in data exposure or loss.
- For non-networked PCs, backups of application data are usually
performed manually by IT users as a full backup.
- For LANs (Local Area Networks) with connected PCs, data backup
can be carried out by having the PC user back up his/her application
data on a central network server (either manually or automatically),
after which the LAN administrator backs up these data centrally.
(See below).
- Storage medium: There are several considerations
in determining the appropriate media for backups.
- The amount of time it takes to identify the data media
necessary for backup and make them available to the system
is a consideration in choosing the appropriate media. Cassettes
in a robot-system can be made available for restoration within
a matter of minutes; stored tapes may take longer to identify
and then transport.
- The actual time required for restoring the data depends on
the average time needed to access the data on the storage
medium, the rate of data transfer, and the number of files
involved. Hard disks allow access to certain files in a few
milliseconds, whereas magnetic tapes must first be wound
to the correct position.
- Data media with a low storage capacity prevent large
volumes of data from being backed up effectively, as their
repeated interchange is time-consuming and susceptible to
errors. With an increasing data volume, use is generally made
of economical tape media such as magnetic tapes or data
cartridges.
- The cost of data backup (cost of read/write devices, data
media and time required for operations) should be
commensurate with the importance of the backup and less than
the total cost of restoration without backup. The life and reliability
of the data media should also be taken into consideration.
- The higher the availability requirements, the faster the required
access to data media for backup purposes, and the shorter the
required time for re-importing the relevant data from the backup
data media. In addition, when high availability is an issue, a
compatible and fully operational reading device (e.g. tape drive,
CD, DVD) must be obtainable on short notice to ensure that the
data media are still usable for restoration even if a reading
device fails.
- In the case where retention schedules call for deletion/erasure
of data at specific times, the selected storage medium must allow
this deletion. Data media for which deletion is impossible or
difficult (e.g. tape drive, CD, DVD) should be avoided here.
- Where confidentiality and integrity are issues and encrypted
data backup is not possible, then consideration should be given
to data media whose design and transport characteristics would
allow their storage in locked vaults.
- Responsibility for data backup:
- Each data backup process should have at least one primary
and one secondary person in charge of the process who is
committed to adherence to the specific data backup process
established.
- Within their department, deans, directors and department
heads may develop policies about what data can be stored on
an individual employee's machine (workstations, laptops, PDAs,
other portable devices) versus what data must be stored on the
department's server, provided these policies are not in conflict
with the policies of the data stewards.
- Departmental LAN administrators are responsible for
backing up their LAN servers and workstations and are required
to implement a tested and auditable process that will allow for
recovery from power or hardware failure, data and/or network
problems, and physical disaster.
- Employees are responsible for following the procedures of
the data stewards and their dean, director or department head
with respect to backing up the data on their individual workstation
and/or portable device and adhering to confidentiality policies for
that data.
- Storage site: In general data backups should be
stored in two locations: on-site to have readily available current
data in machine-readable form in the production area in the
event that operating data is lost, damaged or corrupted, and
off-site to additionally provide protection against loss to the
primary site and on-site data. There are no
"best practices" for defining the distance of the
off-site storage area from the on-site storage area. In general,
the distance requirements will vary based on the region, the
specific threats associated with a particular location and any
regulatory compliance requirements (e.g. for HIPAA data, the
requirements specify that off-site means 25-50 miles away).
Off-site backups should be sufficiently distant from the on-site
storage area in order to prevent a single destructive event
from destroying all copies of the data.
- The higher the availability requirements, the quicker the
need to obtain the data backup media. Therefore, in addition
to storing backups in a secure location sufficiently distant
from the production processing system, consideration should
be given to storing additional backup copies in the immediate
vicinity of the IT system.
- The higher the data confidentiality and integrity
requirements, the more important it is to prevent data media
from being manipulated. The storage site selected along with
the appropriate infrastructure and organizational measures
must ensure that the security measures for the backup resources
are at least as stringent as the protection required of the primary
resources.
- With increasing data volumes, the security of the storage
site increases in importance.
- Software backups should not be stored in the same location
as the original media.
- Requirements concerning the data backup archive:
Due to the concentration of data on backup data media, the
degree of confidentiality and integrity of the backed up data
must be at least as high as that of the original data.
Consequently appropriate IT security measures (e.g. access
control, etc.) are required for data media stored in a central
archive.
- The higher the availability requirements, the quicker the
required access to relevant data media. If manual inventory-keeping
does not fulfill the availability requirements, then automatic
access systems should be used.
- The data quantity decisively determines the number of
data media to be stored. Large data volumes require
correspondingly large storage capacities of the data archive.
- If records retention and disposition schedules need to be
maintained, the data backup archive must be organized
appropriately and equipped with the required erasure devices.
Please refer to the Records Management Program at the
University of Connecticut
(http://www.lib.uconn.edu/online/research/
speclib/ASC/recordsmgmt/policies.htm) and the Record
Retention Schedules for State Agencies
(http://www.cslib.org/retstate.htm) to determine the
length of time required to retain specific record series. Note
that in some cases, permission must be obtained before
disposing of certain record series.
- The higher the data confidentiality and integrity requirements
are, the more important it is to prevent data media from being
manipulated. In general, the access control necessary for this can
only be achieved by proper infrastructure and organization-related
measures.
- Any disks, tapes or other media used to store confidential
information must be disposed of in a manner that ensures the
data is not recoverable. See Device and Media Control policy
and Procedures for Removing (Wiping) Data from a Computer
Prior to Re-Deployment, Surplus or Disposal.
- Transport Modes: Data are transferred during any
backup process. The following must be observed in such situations,
irrespective of whether data are being transferred through a
network or line, or whether data media are being dispatched
to an archive.
- The higher the availability requirements, the more quickly
data need to be obtained for restoration. This is to be considered
during the selection of the transmission medium or transport mode.
- If data required for restoration are to be transferred through
a network, network capacity must accommodate the data availability
requirement. In addition, end-to-end security of the transmission
path must be ensured for confidential data.
- The higher the data confidentiality and integrity requirements
are, the more important it is to prevent data from being
intercepted, copied or manipulated by unauthorized persons during
transport. Encryption or cryptographic measures against manipulation
must be considered for such data transmissions. Secure containers
and routes must be selected for physical transport, and the degree
and usefulness of encryption procedures should also be evaluated
here.
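One cryptographic measure against manipulation during transport is a keyed message authentication code (HMAC), computed before the data leaves the source and checked on arrival. The following is a minimal sketch using the Python standard library; the function names are hypothetical, and the choice of HMAC-SHA-256 is an assumption of this example rather than a requirement of this standard. An HMAC detects modification only; it does not keep the data confidential, so encryption would still need to be considered separately.

```python
import hashlib
import hmac


def sign_backup(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA-256 tag over the backup data before transport."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_backup(data: bytes, key: bytes, tag: str) -> bool:
    """Check on arrival that the data still matches the tag computed at the source."""
    expected = sign_backup(data, key)
    # compare_digest compares in constant time to avoid leaking information.
    return hmac.compare_digest(expected, tag)
```

The key must be protected and managed with the same care as an encryption key, since anyone holding it can produce valid tags.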
- Retention Considerations: As part of the data
backup strategy, one must consider the retention requirements
for the specific data backups. Contact the University of
Connecticut Archivist at the Thomas J. Dodd Research
Center (Unit 1205) for information and assistance
concerning specific data.
- An example of a possible schedule is as follows:
- A full system backup will be performed weekly. Weekly
backups will be saved for a full month.
- The last full weekly backup of the month will be saved as
a monthly backup. The other weekly backup media will be erased
or destroyed (see Device and Media Control policy and procedures)
within two months from the original backup date.
- Monthly backups will be saved for one year, at which time the
media will be erased, destroyed or reused.
- Differential or Incremental backups will be performed daily.
Daily backups will be retained for two weeks. Daily backup media
will be thoroughly erased or destroyed within one month from the
original backup date.
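The example schedule above can be expressed as a small set of retention rules. This is a sketch only, assuming that weekly backups run on the same weekday each week, that "a full month" is approximated as 31 days and "one year" as 365 days; all names are hypothetical.

```python
from datetime import date, timedelta

# Retention periods taken from the example schedule (approximations noted above).
RETENTION = {
    "daily": timedelta(days=14),     # two weeks
    "weekly": timedelta(days=31),    # a full month (approximation)
    "monthly": timedelta(days=365),  # one year (approximation)
}


def weekly_kind(backup_date: date) -> str:
    """Classify a weekly full backup: the last one of the month becomes 'monthly'."""
    next_week = backup_date + timedelta(days=7)
    return "monthly" if next_week.month != backup_date.month else "weekly"


def retain_until(kind: str, backup_date: date) -> date:
    """Return the date through which a backup of the given kind is retained."""
    return backup_date + RETENTION[kind]


def is_expired(kind: str, backup_date: date, today: date) -> bool:
    """True once the retention period for the backup has passed."""
    return today > retain_until(kind, backup_date)
```

Expired media would then be erased or destroyed within the time limits given in the schedule.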
- Recovery of Backup Data:
- Backup documentation including identification of critical data,
programs, documentation and support items necessary to perform
essential tasks during a recovery process should be maintained,
reviewed and updated periodically to account for new technology,
business changes, and migration of applications to alternative
platforms.
- Documentation of the restoration process should include
procedures for the recovery from single-system or application
failures or loss as well as a total center or department disaster
scenario. If encryption of the data has been used, the backup
procedures must include provisions for key management to
ensure that backup data will not be lost through the accidental
loss or unavailability of an encryption key.
- Recovery procedures should be tested and the tests
documented on a periodic basis, but no less than annually.
Testing ensures that the data can be recovered and that
staff are familiar with the procedures.
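One way to confirm during a recovery test that restored data matches what was backed up is to record a checksum at backup time and compare it after restoration. The following is a minimal sketch, assuming that a SHA-256 digest was recorded when the backup was made; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(restored: Path, recorded_digest: str) -> bool:
    """Return True if the restored file matches the digest recorded at backup time."""
    return sha256_of(restored) == recorded_digest
```

The result of each such check can be kept with the test documentation as evidence that the recovery test was performed.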