Handbook for Digital Projects:
A Management Tool for Preservation and Access
Technical Primer
Steven Puglia
National Archives and Records Administration
Introduction
This chapter introduces readers to the technical terminology and concepts of the digitization process. Specifically, it provides basic technical information related to digitizing library collections, archival holdings, and other materials from cultural institutions. As an overview, the chapter does not go into the technical detail needed to actually perform digitization. Instead, it is intended for those who manage activities or work on other aspects of digitizing projects. Some readers may find this chapter a bit basic and others a bit complex, but it tries to strike a helpful balance between the two.
The Digital Image
Digitization converts an image into a series of picture elements or pixels, little squares that are either black or white (binary), a specific shade of gray (grayscale), or a specific color. Each pixel is represented by a single binary digit or a series of binary digits, each either a 1 or a 0. The pixels are arranged in a two-dimensional matrix called a bitmap. This is referred to as a raster image. If you zoom in on a raster-based digital image, you will see the image is composed of a series of rows and columns of square pixels. A raster image is relatively analogous to traditional photographs, which are composed of image-forming grains or clumps of either silver or dyes. Where possible, this chapter compares aspects of digital technology to traditional photographic technology as a point of reference.
![]()
Vector image files are a different type of computer image distinct from raster images. Many computer programs (drawing, illustration, 3D modeling/rendering, computer-aided design/computer-aided manufacturing, and architectural design) use vectors -- arrows of direction, points, and lines -- that define shapes, as compared to the individual picture elements used to represent a raster image. This chapter will not discuss them further.
The Digitization Process
Digitization is the process of converting an analog signal into a digital signal, known as an A/D (analog to digital) conversion. For raster images, an analog voltage signal (from any of several types of imaging sensors), proportional to the amount of light reflected or transmitted by an item being digitized, is divided into discrete numeric values. The number of possible values is determined by the bit depth of each pixel.
Common Digital Imaging Sensors (image detectors)
One important part of digitizing is the type of imaging sensor used. These image detectors can be compared to the film used in photography. The common digital imaging sensors are:
- CCD: Charge-coupled devices, or CCDs, used in both flatbed scanners and digital cameras
- PMT: Photomultiplier tubes, or PMTs, used in drum scanners
- CMOS: Complementary metal oxide semiconductors, or CMOS chips, used in low-end flatbed scanners and low-end digital cameras
The most common sensor used in scanners is the charge-coupled device or CCD, used in all types of scanners and digital cameras. The photomultiplier tube, or PMT, is used only in drum scanners for the graphic arts or prepress market, i.e., printing and publishing. More recently, complementary metal oxide semiconductor or CMOS sensors have been introduced as a low-cost alternative to CCDs. CMOS chips are manufactured in the same way as standard computer chips and are therefore less expensive to manufacture. Eventually, CMOS chips could replace the CCD as the predominant sensor in the marketplace, but currently, due to certain technical deficiencies, they do not produce the same image quality as CCDs and cannot match their resolution. At this time, CMOS chips are used only in low-end digital cameras and scanners.
CCDs are produced in a variety of designs and shapes. A single row of CCD sensors (or photo diodes) arranged in a straight line is referred to as a line array. Line arrays are used commonly in flatbed scanners. Area arrays comprise a two-dimensional set of rows and columns of light sensors. Area arrays are used commonly in digital cameras.
In both scanners and digital cameras, a lens or set of lenses is used to focus an image onto the sensor -- a CCD, PMT, or CMOS chip. Sometimes people refer to digital imaging as lens-less. This is not true. Without a lens to focus the light, digital images would be blurry, just as in photography. All scanners have lenses. In most cases, the design of the scanner hides the lens(es) within the body of the scanner.
Scanners also have built-in light sources to illuminate the items being scanned. The light is either reflected (as with documents or photographic prints) or transmitted (as with microfilm, photographic negatives, or color transparencies) by the item being scanned, and the image is focussed by the lens(es) onto the imaging sensor. In the case of CCDs, the light falls onto the little light sensors or photo diodes on the CCD. These sites or diodes generate an electrical current, or voltage. The amount of voltage generated is proportional to the amount of light hitting the individual sensor. The brighter the light, the higher the voltage that is given off by the site on the CCD.
Analog to Digital Conversion
The analog electrical signal generated by the sensor is processed via the analog to digital conversion. The electrical signal generated at each light sensor or photo diode is divided into discrete numerical values that are proportional to the amount of light reflected from or transmitted by the item being scanned. The total number of discrete numerical values possible is determined by the sampling bit depth, while the specific numeric value for an individual pixel is based on the specific amount of light reflected or transmitted from that point on the original image.
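The conversion described above can be illustrated with a minimal sketch. It assumes an idealized sensor whose voltage runs linearly from 0 to a hypothetical `full_scale` value; the function name and parameters are illustrative only, and real scanners add calibration and noise handling that are not shown here.

```python
def quantize(voltage, full_scale, bit_depth):
    """Map an analog voltage in [0, full_scale] to one of 2**bit_depth codes."""
    levels = 2 ** bit_depth
    # Clamp so out-of-range signals land on the lowest or highest code.
    clamped = min(max(voltage, 0.0), full_scale)
    # Scale to the code range and round to the nearest whole level.
    code = int(clamped / full_scale * (levels - 1) + 0.5)
    return min(code, levels - 1)


if __name__ == "__main__":
    for volts in (0.0, 0.25, 0.5, 1.0):
        print(volts, "->", quantize(volts, 1.0, 8), "(8-bit)",
              quantize(volts, 1.0, 1), "(1-bit)")
```

The same voltage yields a different numeric value at each bit depth: the bit depth sets how many codes are available, and the amount of light sets which code a given pixel receives.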
Also, once a digital image has been created and stored in any media, there is a corresponding digital to analog conversion that allows the computer to present the image in a human readable form on either a display or printer. Displaying an image on a computer monitor or printing the image on a printer are both examples of an analog representation of a digital image.
Basic Image Measures
There are three important measures of every static digital image:
- Resolution. The number of dots, or pixels (picture elements), used to represent an image. This is always given as a measure of linear or area density (e.g., 300 dots/inch).
- Pixel Bit Depth. This measure defines the number of shades that can actually be represented by the amount of information saved for each pixel. These can range from 1 bit/pixel for binary (fax type) images to 24 bits per pixel in high quality color images.
- Color. There are many ways to represent, compress, and distribute color images. Suffice it to say that the smaller the image file size, the less accurately it renders the original image.
1. Resolution
Resolution, or spatial frequency, is the number of times an image is sampled during the scanning process. Resolution -- the number of pixels in an image -- can be described in a number of ways:
DPI-- dots per inch
PPI-- pixels per inch
LPI-- lines per inch, used for halftones
The scanning resolution and the resolution of digital image files are most appropriately referred to as pixels per inch or PPI. Dots per inch or DPI is considered a printing term and is most appropriate when referring to the resolution at which a computer printer produces a print. However, DPI is a more generic term and is more commonly used than PPI. LPI is a term that refers to a halftone screen value. A halftone screen converts an image into a series of dots that can be reproduced on a computer printer or a printing press; continuous tone digital images are converted to halftone images when they are printed on most types of computer printers, including ink-jet and laser printers. Some printers print continuous tone images and do not convert the image.
1.a. Pixel Array
The pixel array is the number of pixels along each dimension of an image, expressed as rows and columns. As an example, an 8"x10" photograph scanned at 300 ppi produces a file with a pixel array of 2400 x 3000 pixels.
Generally, lines per inch (LPI) is a term used for halftones (for reproduction on a printing press) and is not used for continuous-tone images. However, "lines" or rows of pixels is a term used within the photographic industry as a common shorthand for the number of pixels across the long dimension of digital images of photographs. Since photographs come in many different formats and sizes (ranging from small negatives to large prints), it is hard to refer to pixels per inch (PPI) of resolution when producing digital images of the same size because the PPI will vary depending on the size of the photographic original.
An 8"x10" print scanned at 300 ppi produces a file that is 2400 x 3000 pixels.
A 4"x5" negative scanned at 600 ppi produces a file that is 2400 x 3000 pixels.
A 35mm negative scanned at 2100 ppi produces a file that is 2000 x 3000 pixels.
Each of these files is referred to as a 3000-line file, and all sizes are prior to applying any type of compression.
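The arithmetic behind these examples can be sketched as follows. The function names are illustrative, and the 24 x 36 mm frame size used for the 35mm negative is an assumption (the text gives only the rounded result).

```python
def pixel_array(width_in, height_in, ppi):
    """Pixel dimensions of a scan: physical size in inches times pixels per inch."""
    return round(width_in * ppi), round(height_in * ppi)


def ppi_for_lines(long_side_in, lines):
    """Scanning resolution needed to put `lines` pixels along the long side."""
    return lines / long_side_in


if __name__ == "__main__":
    print(pixel_array(8, 10, 300))   # 8"x10" print
    print(pixel_array(4, 5, 600))    # 4"x5" negative
    # Assumed 35mm frame of 24 x 36 mm, converted to inches.
    print(pixel_array(24 / 25.4, 36 / 25.4, 2100))
    print(ppi_for_lines(10, 3000), ppi_for_lines(5, 3000))
```

Under the assumed frame size the 35mm scan comes to 1984 x 2976 pixels, which the text rounds to 2000 x 3000.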
1.b. Resolution - True vs. Interpolated
Optical (true) resolution is the inherent resolution of the scanner based on the size of the imaging sensor and the magnification of the optical system. Interpolated resolution is synthetic or calculated resolution. Interpolation is a mathematical process that is used to increase or decrease the resolution of an image. This can be done during or after scanning. Higher optical (true) resolution in a scanner will provide better image quality than interpolated resolution. It is recommended that you obtain a scanner with as high an optical resolution as is affordable, not just in terms of the price of the scanner, but also in terms of the cost to store each image file.
Interpolation can be as simple as changing the optical resolution to a lower value for display purposes, or as complex as detecting and rescreening halftone areas of a document. Some interpolation algorithms work better than others. Generally, more expensive image processing software has better algorithms. However, there are exceptions because more and more of the interpolation algorithms are being built into the scanner hardware. Most interpolation algorithms represent a trade-off between image quality, speed, and image file size. It is highly recommended that you test the actual documents to be scanned with the actual scanners and image enhancement algorithms to be used. It is often a good idea to buy a more expensive scanner or to add an image processing software package -- one that has better image processing algorithms. Even though most users may use only a small number of all the features available, high quality images will at one point or another need to be processed with image processing algorithms.
1.c. Digitizing Resolution
Digitizing resolution can be divided into two generic categories. Reproduction resolution is the resolution needed to provide a desired image quality for a specific type of output device. Preservation resolution is the level of resolution that reproduces all the information in the original image or document. Using photographs as an example, these levels could be:
- Reproduction--screen resolution, which is a minimum of 600 x 400 pixels, or print resolution, which is usually 300 dpi to 600 dpi.
- Preservation--match the original (examples for color negative or color transparency). This is a theoretical resolution limit based on the resolution and granularity of the original film and the resolution of the lens used to take the photograph. The actual desired digital resolution will vary depending on the photographic film (and the developer used), the original camera lens, the significant feature size that is desired to be reproduced, and the quality of the scanner used for digitizing.
- 3,000 to 4,000 lines for 35mm
- 10,000 to 16,000 lines for 4"x5"
- 20,000 to 32,000 lines for 8"x10"
The following are estimates for file sizes for preservation quality scans of photographs--negatives or transparencies--based on the lower resolution limits cited above.
![]()
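As a rough illustration of how such estimates can be derived, the sketch below computes uncompressed raster sizes from pixel dimensions and bit depth. It is not a reconstruction of the original table: the short-side pixel counts assume 2:3 and 4:5 aspect ratios, and the 24-bit RGB depth is likewise an assumption.

```python
def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Size of the raw image raster in bytes, before any compression."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole byte


if __name__ == "__main__":
    # (label, short side, long side) using the lower "lines" limits above.
    # Short-side values are assumed from 2:3 and 4:5 aspect ratios.
    examples = [
        ("35mm, 3,000 lines", 2000, 3000),
        ('4"x5", 10,000 lines', 8000, 10000),
        ('8"x10", 20,000 lines', 16000, 20000),
    ]
    for label, short_side, long_side in examples:
        size = uncompressed_bytes(short_side, long_side, 24)
        print(f"{label}: {size / 1_000_000:,.0f} million bytes as 24-bit RGB")
```

A 2000 x 3000 pixel file at 24 bits per pixel comes to 18 million bytes; file headers and any compression would change the size actually stored.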
1.d. General or Minimum Digitizing Requirements for Facilitating Reproduction and Access
Cornell recommends 600 ppi for 1-bit scanning or 400 ppi for 8-bit scanning of printed type to achieve preservation quality scanning. Other general recommendations for reproduction are:
- Textual records
  - 200 to 600 ppi for 1-bit
  - 200 to 400 ppi for 8-bit grayscale
  - 200 to 300 ppi for 24-bit color
- Photographs
  - 3000 to 5000 lines for 8-bit grayscale
  - 3000 to 5000 lines for 24-bit color
- Maps/Plans/Oversized
  - 200 to 300 ppi for 8-bit grayscale
  - 200 to 300 ppi for 24-bit color
As computers become faster and memory becomes cheaper, the recommendations for scanning resolution are likely to increase. Today, projects are selecting higher scanning resolution than older digitizing projects.
2. Pixel Bit Depth
Computers work on a binary system; each bit of data is either a 1 or a 0. Each pixel in a raster image is represented by a string of binary digits. The number of digits is known as the bit depth. A 1-bit pixel is represented by one binary digit, either a 1 or a 0. A 2-bit pixel is represented by two binary digits in one of four combinations -- 00, 01, 10, or 11. The bit depth determines the number of possible combinations of 1s and 0s for that number of binary digits and therefore the number of gray shades or color shades that can be represented by each pixel, as illustrated by the following formula.
$\text{Number of shades} = 2^{X}$, where $X$ is the bit depth
1 bit = 2 shades (a single binary digit, either a 1 or a 0 -- black or white)
2 bits = 4 shades (two binary digits form four possible combinations -- black, dark gray, light gray, and white)
3 bits = 8 shades (three binary digits form 8 possible combinations)
4 bits = 16 shades (four binary digits form 16 possible combinations)
5 bits = 32 shades (five binary digits form 32 possible combinations)
6 bits = 64 shades (six binary digits form 64 possible combinations)
7 bits = 128 shades (seven binary digits form 128 possible combinations)
8 bits = 256 shades (eight binary digits form 256 possible combinations)
10 bits = 1,024 shades (ten binary digits form 1,024 possible combinations)
12 bits = 4,096 shades (twelve binary digits form 4,096 possible combinations)
14 bits = 16,384 shades (fourteen binary digits form 16,384 possible combinations)
16 bits = 65,536 shades (sixteen binary digits form 65,536 possible combinations)
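The list above follows directly from the formula, as this short sketch shows. The function names are illustrative only.

```python
from itertools import product


def shades(bit_depth):
    """Number of distinct values a pixel of the given bit depth can hold."""
    return 2 ** bit_depth


def combinations(bit_depth):
    """Every possible string of 1s and 0s of the given length."""
    return ["".join(bits) for bits in product("01", repeat=bit_depth)]


if __name__ == "__main__":
    print(combinations(2))
    for bits in (1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16):
        print(f"{bits} bits = {shades(bits):,} shades")
```

Each added bit doubles the number of shades, which is why the step from 8 bits to 16 bits raises the count from 256 to 65,536.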
Bit Depth Illustrations
The following are current standard bit depths for image files.
- 1-bit black-and-white
- 8-bit grayscale 256 shades of gray
- 8-bit color 256 colors
- 24-bit RGB* approximately 16.8 million colors, three 8-bit channels

*See next section
The bit depth influences the representation of images. Obviously, at 1 bit there are only black or white values and no gray shading. Texture and other subtle shading values are not reproduced. At 2 bits, four shades are reproduced -- black, white, and two intermediate shades of gray. At 4 bits, 16 shades are reproduced, and the background texture of a document will be rendered. At 6-bit grayscale, 64 shades of gray, the digital image approximates typical human perceptual response. Psychometric studies have determined that most people can distinguish approximately 64 shades of gray. Years ago, when computer scientists were establishing conventions for digital imaging, computer memory was expensive and CPU speed was slow. It was an easy decision to limit grayscale image files to 8 bits to save storage space, since the 256 shades reproduced exceed human perception. However, 8-bit grayscale's rendering of 256 shades is limited compared to photographic materials and can present problems when the contrast and brightness of digital images need to be adjusted. The use of 8-bit grayscale image files, and corresponding 24-bit RGB color image files (three color channels of 8-bit information), was a reasonable compromise and, in many cases, still is.
3. Color
Color Systems
Several different systems are used to represent color images. The most common are RGB (additive color system), CMYK (subtractive color system), and the CIE-L*A*B* color space, a mathematical modeling of color.
RGB The additive color system combines variations of red, green, and blue (RGB) to form white. This method is used in the design of televisions, computer monitors, and film recorders. Think of an RGB color image as three separate images superimposed one over the other. The superimposition is done mathematically. Basically, an RGB image consists of three 8-bit grayscale images or channels; one channel represents the red information, a second channel represents the green information, and the third channel represents the blue information. The computer mathematically combines the three channels for each pixel to determine the final color. A 24-bit RGB color digital image file consists of three channels each with 8-bits of data (3 channels x 8-bits = 24-bits).
![]()
CMYK The subtractive color system combines variations of cyan, magenta, and yellow to form black. This method is used in the graphic arts printing process and with computer printers. Often, the printer uses a fourth ink, black, to increase the range of densities that can be reproduced. Four-color printers use cyan, magenta, yellow, and black (CMYK). Almost all color photographic materials have been based on subtractive color, utilizing varying amounts of cyan, magenta, and yellow dye. Most computer printers use four colors, although there are now printers on the market that have six colors; a light magenta and a light cyan have been added to improve the image quality when printing photographic images. A 32-bit CMYK color digital image file consists of four channels, each with 8-bits of data (4 channels x 8-bits = 32-bits).
![]()
CIE-L*A*B* The CIE-L*A*B* color space is a mathematical model of color that divides the color into lightness (L), which can be thought of as the grayscale information, red (+A) to green (-A) information, and yellow (+B) to blue (-B) information. The L*A*B* color space is referred to as a device-independent color space. It is not linked to a specific type of output device like a computer monitor (RGB) or a computer printer (CMYK). A 24-bit L*A*B* color digital image file consists of three channels, each with 8-bits (3 channels x 8-bits = 24-bits).
Recommendations
Most scanners utilize RGB scanning, although some do convert the images to CMYK or L*A*B* images. For most digital imaging projects, it is recommended to save color images as RGB files, not as CMYK files. The L*A*B* color space could be used, but fewer software applications are able to interpret and use the L*A*B* files at this time. Since it is always possible to convert RGB files for output, CMYK image files should only be used for printing. The overriding objective in preservation is to save the most information that is economically possible, using methods that can be reversed if required.
Increasingly, scanners and software are able to handle high-bit image files. This means rather than having 8-bits per color channel, the files may have 10-bits, 12-bits or 16-bits per color channel. An RGB color image that has 16-bits per channel is a 48-bit color image file (3 channels x 16-bits = 48-bits).
Comparison
![]()
As can be seen from the chart, as the bit depth is increased, the number of shades and the number of colors that can be reproduced increases dramatically. Photographic materials are able to render effectively several thousand shades. The equivalent bit depth for digital imaging is at least 12-bits per channel.
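The growth in the number of representable colors can be worked out with a short sketch. It simply multiplies channels by bits per channel and raises 2 to that power; the function names are illustrative, and the result counts possible numeric combinations, which is an upper bound on the colors a device can actually display.

```python
def bits_per_pixel(channels, bits_per_channel):
    """Total bits stored for each pixel across all color channels."""
    return channels * bits_per_channel


def value_combinations(channels, bits_per_channel):
    """Number of distinct numeric values one pixel can hold."""
    return 2 ** bits_per_pixel(channels, bits_per_channel)


if __name__ == "__main__":
    for depth in (8, 10, 12, 16):
        total = bits_per_pixel(3, depth)
        print(f"RGB at {depth} bits/channel = {total}-bit file, "
              f"{2 ** depth:,} shades per channel, "
              f"{value_combinations(3, depth):,} combinations")
```

At 8 bits per channel an RGB pixel has 16,777,216 possible values; at 12 bits per channel each channel alone carries 4,096 shades, in line with the several thousand shades attributed to photographic materials.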
Color Gamut
A color gamut is the range of colors that a system, such as a computer monitor or printer, can reproduce. Color gamuts are illustrated graphically to compare different color spaces, color systems, or devices. Wide gamut RGB and L*A*B* color spaces can render a greater range of colors and generally require the use of higher bit depths to achieve a wide color gamut. The CMYK color system has a limited color gamut and can reproduce a correspondingly limited range of colors; this is another reason not to use CMYK files for master image files.
Color Palettes
Color palettes are discrete sets of defined colors used by computers to represent 8-bit or 256-color images. The Windows and Macintosh operating systems use different sets of colors for 8-bit color images. The rendition of the image changes depending on which type of computer the image is viewed on. One approach for 8-bit color file formats -- such as GIF files intended to be distributed via the World Wide Web -- is to use a Web-safe palette. A Web-safe palette uses the 212 to 216 colors common to the Windows and Macintosh palettes, so the image should look the same on either type of computer. Another option is an adaptive palette, where the 256 colors used for the palette are based on the specific colors in a specific image. In most cases using an adaptive palette will make an 8-bit color image look much more like the original 24-bit color image, compared to using a Windows, Macintosh, or Web-safe palette.
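A minimal sketch of a Web-safe palette follows. It assumes the commonly cited 216-color cube built from six evenly spaced levels per channel; the text gives a range of 212 to 216 shared colors, so treat the exact membership as an assumption. The function names are illustrative.

```python
# Six evenly spaced levels per channel: 0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF.
LEVELS = (0, 51, 102, 153, 204, 255)


def web_safe_palette():
    """All red/green/blue combinations of the six levels (6 x 6 x 6 colors)."""
    return [(r, g, b) for r in LEVELS for g in LEVELS for b in LEVELS]


def nearest_web_safe(rgb):
    """Snap each 8-bit channel to the closest of the six palette levels."""
    return tuple(51 * round(channel / 51) for channel in rgb)


if __name__ == "__main__":
    print(len(web_safe_palette()))
    print(nearest_web_safe((200, 30, 120)))
```

Snapping every pixel to a fixed palette in this way is what makes the result predictable across systems, and also why an adaptive palette chosen from the image itself usually looks closer to the 24-bit original.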
Color Imbalance
Color imbalances happen when neutral values are not rendered with equal levels of red, green, and blue in an RGB image file. As an example, a white highlight in a digital image will shift to a color when the tones are clipped in one or two of the color channels.
Accuracy of Color
Managing the accuracy of color rendition for digital images is complex, involving the adjustment and calibration of computer monitors, the adjustment of scanner controls, the correction or enhancement of images using image processing software, the adjustment and calibration of output devices, and the use of color management software. This software transforms images between different color spaces to correct for differences in the color gamuts of scanners, monitors, and output devices. Apple's ColorSync and Windows ICM are examples of color management software that have been incorporated into the operating systems of computers.
Measuring Digital Values
Not all scanner or image processing controls work as well as expected, so it is often necessary to measure digital values -- either RGB levels for color images or % black for grayscale images. Most image processing software applications have a control that allows a user to measure the digital values for a single pixel or a group of pixels in an image, such as the Eyedropper in Adobe Photoshop. It is important to set the options for the Eyedropper to the appropriate setting before measuring values. All digital images have noise (random pixels of the incorrect shade or color) that makes measuring individual pixels problematic. It is recommended to set the Eyedropper to the setting that averages a set of 5 x 5 pixels (a square of 25 pixels). This will average out the variation due to noise.
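The benefit of averaging can be seen in a small sketch. It is not a description of how any particular software implements its sampling tool; the image is a plain list of rows of 8-bit values, and the function name is illustrative.

```python
def average_sample(image, row, col, size=5):
    """Mean value of the size x size block of pixels centered on (row, col)."""
    half = size // 2
    values = [
        image[r][c]
        for r in range(row - half, row + half + 1)
        for c in range(col - half, col + half + 1)
        # Skip positions that fall outside the image near its edges.
        if 0 <= r < len(image) and 0 <= c < len(image[0])
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    # A flat mid-gray patch with a single noisy pixel in the middle.
    patch = [[128] * 7 for _ in range(7)]
    patch[3][3] = 255
    print("single pixel:", average_sample(patch, 3, 3, size=1))
    print("5 x 5 average:", average_sample(patch, 3, 3, size=5))
```

Reading the single noisy pixel reports 255, while the 5 x 5 average reports a value close to the true shade of 128.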
Digital Image Processing
Oversampling
As previously noted, digital images have bit depths of 1-bit per pixel for black and white images (common for document imaging), 8-bits per pixel grayscale for continuous tone images, and 24-bits per pixel for color images. Generally, scanners will sample at bit depths higher than these, and then the bit depth is reduced for the final image. This is known as oversampling. Scanners are designed to oversample to improve image quality by reducing noise (random pixels of the wrong shade), and extending the effective tonal scale of the scanner (initially measuring more shades than are used in the final image). This allows a larger density range to be represented without loss of detail -- a problem when scanning color slides or transparencies and other very dense originals. Document scanners will sample at 8-bits to produce a 1-bit image, and a grayscale scanner will sample at 10-bits or 12-bits to produce an 8-bit image.
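The final reduction step can be sketched as a simple shift that keeps the most significant bits. This is an assumption for illustration: actual scanners may apply tone curves or averaging before reducing the bit depth, and the function name is hypothetical.

```python
def reduce_bit_depth(value, from_bits, to_bits):
    """Keep the most significant to_bits of a from_bits sample."""
    if to_bits > from_bits:
        raise ValueError("target bit depth must not exceed source bit depth")
    return value >> (from_bits - to_bits)


if __name__ == "__main__":
    # Sixteen neighboring 12-bit readings collapse into one 8-bit shade.
    readings = range(2048, 2064)
    print({reduce_bit_depth(r, 12, 8) for r in readings})
    print(reduce_bit_depth(4095, 12, 8), reduce_bit_depth(1023, 10, 8))
```

Because sixteen adjacent 12-bit readings map to the same 8-bit value, small random variations in the raw sample tend to disappear in the final image.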
Image Processing Filters
Image processing filters -- mathematical formulas that change the appearance of digital images -- can be applied to improve the appearance of images and to assist with resizing images. Commonly, sharpening filters are used to enhance the appearance of digital image files. The need for sharpening is inversely proportional to the resolution of the digital image: lower resolution or smaller digital images tend to need more sharpening, and higher resolution or larger digital images tend to need less sharpening. Many people advocate not sharpening master image files, due to concern that the enhancement cannot be undone in the future. The sharpening filter that behaves most like a photographic process is the unsharp mask. This term comes from the graphic arts industry practice of using a reverse toned mask that is slightly out of focus to increase the visual sharpness of images. It is possible to over-sharpen an image: over-sharpening with an unsharp mask filter will create light halos around sharp edges within images.
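The idea behind the unsharp mask can be sketched on a single row of pixels: blur the row, then add back the difference between the original and the blurred copy. This sketch assumes a simple 3-pixel box blur in place of the softer blur used by imaging software, and the `amount` parameter is illustrative.

```python
def box_blur(row):
    """3-pixel moving average; the end pixels are repeated to pad the row."""
    padded = [row[0]] + list(row) + [row[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(row))]


def unsharp_mask(row, amount=1.0):
    """Sharpen by adding amount x (original - blurred) to each pixel."""
    sharpened = []
    for original, soft in zip(row, box_blur(row)):
        value = original + amount * (original - soft)
        # Keep the result inside the 8-bit range.
        sharpened.append(min(max(round(value), 0), 255))
    return sharpened


if __name__ == "__main__":
    edge = [50, 50, 50, 200, 200, 200]   # a dark-to-light edge
    print(unsharp_mask(edge, amount=0.5))
    print(unsharp_mask(edge, amount=1.0))
```

On the sample edge, the pixels on either side of the transition are pushed darker and lighter than their neighbors; with a larger `amount` the lighter overshoot is the halo that over-sharpening produces.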
Another filter commonly used when resizing images is the blur filter. Slightly blurring an image creates additional shading along sharply defined edges in an image, which can allow the interpolation software to do a better job when the image is resized. Most images have to be sharpened after resizing, whether or not a blur filter is applied.
Just as with interpolation algorithms, some image-processing filter algorithms will do a better job in terms of image quality than other algorithms, while others might work faster. Again, generally the filters in more expensive image processing software will tend to do a better job with image quality compared to the filters in less expensive software.
Histogram
A common image-processing tool is the histogram, found in most image processing software packages. The histogram is a graphic representation of the distribution of gray shades in an image. The height of each vertical line is proportional to the number of pixels that are of that shade -- the taller the line the more pixels of that shade. Also, the histogram can give indications of certain types of image defects, such as loss of tones in the shadows (dark values or shades) or the highlights (light values or shades) of an image. The histogram illustrates and helps our understanding of the concept of thresholding.
![]()
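A histogram is straightforward to compute: count how many pixels hold each possible shade. The sketch below assumes an 8-bit grayscale image stored as a list of rows; the function name is illustrative.

```python
def histogram(image, levels=256):
    """Count of pixels at each shade; index 0 is the first shade, 255 the last."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


if __name__ == "__main__":
    image = [
        [0, 0, 255],
        [128, 128, 128],
    ]
    counts = histogram(image)
    for shade, count in enumerate(counts):
        if count:
            print(f"shade {shade:3d}: {'#' * count}")
```

Tall bars at either end of the histogram are the kind of pile-up that can signal lost tones in the shadows or highlights.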
Thresholding
Thresholding is a technique used in image processing to convert gray shades to either black or white. All shades lighter than a selected value are rendered as white and all darker shades are rendered as black. Depending on the value selected for the threshold, the representation of the same image can be altered dramatically. Most 1-bit scanners actually sample at 8 bits, but then a threshold value is used to convert the 8-bit image to a 1-bit image.
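A minimal sketch of thresholding follows. It assumes the common 8-bit convention in which 0 is black and 255 is white, so raising the threshold turns more pixels black; the function name and the output coding (0 for black, 1 for white) are illustrative.

```python
def threshold(image, cutoff):
    """Convert 8-bit gray values to 1-bit: 0 (black) below cutoff, else 1 (white)."""
    return [[0 if value < cutoff else 1 for value in row] for row in image]


if __name__ == "__main__":
    # One row running from dark type to light paper.
    row = [[30, 90, 140, 200, 250]]
    print(threshold(row, 100))
    print(threshold(row, 180))
```

The same row yields two different 1-bit results: at the higher cutoff the mid-gray pixel joins the black group, which is how faint background shading ends up as black pixels.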
When are there problems using 1-bit digitization and thresholding? In cases of Thermofax, Verifax, or carbon copy processes where the paper darkens as it ages and the type fades, it is very difficult to reproduce the image with a 1-bit scan regardless of the threshold level. At lower threshold values the characters appear incomplete. As the threshold value is increased, the characters will quickly fill in (e.g., the letter "o" becomes a very large dot) and only the context within the word or sentence provides an idea of the character. Further increasing the level of the threshold will cause pixels representing shading in the background to turn black, an effect that is known as speckle. There are software programs for 1-bit scanning that are designed to despeckle an image. The software tries to remove extraneous black pixels in the image. Unfortunately, this doesn't always work the way you want. Parameters for despeckling can be adjusted, based on the size of the speckle you want to remove, but as the size of the speckle to be removed is increased, it will start removing periods, dots of "i"s, and other necessary punctuation.
![]()
![]()
Dithering
When using low bit depth images, it is possible to simulate a greater number of shades with fewer shades. This process is known as dithering. The key is to redistribute pixels according to a mathematical formula to produce synthetic shades of gray based on the arrangement of these pixels and the way the eye perceives them. There are different formulas for dithering, and some work better than others. If an 8-bit grayscale image is converted to a 3-bit image without dithering, broad areas of similar shades will be rendered as a single shade. In photographic terms, this effect is known as posterization. In digital images, this effect is sometimes referred to as banding, particularly when it appears across broad shade gradients, such as skies in photographs. When a 24-bit color image is converted to an 8-bit color image, the 8-bit file can be dithered. Dithering and an adaptive grayscale palette can be used to provide a very accurate rendition of an image with bit depths as low as 4-bits or 16 shades.
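The difference between plain bit-depth reduction and dithering can be sketched as follows. The text does not name a particular formula; this example uses Floyd-Steinberg error diffusion, one widely known choice, and assumes an 8-bit grayscale image stored as a list of rows. Function names are illustrative.

```python
def quantize_level(value, bits):
    """Snap an 8-bit value to the nearest of 2**bits evenly spaced shades."""
    steps = 2 ** bits - 1
    return round(round(value / 255 * steps) * 255 / steps)


def posterize(image, bits):
    """Reduce bit depth with no dithering: each pixel is snapped on its own."""
    return [[quantize_level(v, bits) for v in row] for row in image]


def floyd_steinberg(image, bits):
    """Reduce bit depth, spreading each pixel's rounding error to its neighbors."""
    height, width = len(image), len(image[0])
    work = [[float(v) for v in row] for row in image]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = quantize_level(min(max(old, 0.0), 255.0), bits)
            out[y][x] = new
            err = old - new
            # Standard Floyd-Steinberg weights: 7/16, 3/16, 5/16, 1/16.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


if __name__ == "__main__":
    flat_gray = [[100] * 16 for _ in range(16)]
    plain = posterize(flat_gray, 1)
    dithered = floyd_steinberg(flat_gray, 1)
    print("posterized shades:", {v for row in plain for v in row})
    print("dithered mean:", sum(map(sum, dithered)) / 256)
```

Without dithering the whole patch collapses to a single shade, while the dithered version mixes black and white pixels so that their average stays close to the original gray.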
Tonal Controls
Each image processing software application has different controls for adjusting the tones and color balance of digital images. In Adobe Photoshop, one of the most common image processing software packages, the preferred controls for tonal adjustments and color correction are Levels and Curves. Other controls are available in Photoshop, such as Contrast and Brightness and Color Balance, but they are global corrections that influence all tones of the image. Levels and Curves offer greater control with less risk of losing information while adjusting images.
![]()
Tonal Scale Comparisons
The bit depth of a digital image has a big influence on how accurately an original document, photograph, or book is rendered in terms of the tones of the original. The tone reproduction for a 1-bit digital image is somewhat similar to the tonal response of microfilm -- high contrast and most suitable for rendering clean, printed type. An 8-bit grayscale image is more similar to continuous tone black-and-white photographic films used in still photography -- lower contrast and able to render a greater range of tones.
![]()
One way to compare tonal responses is to look at the characteristic curves for photographic films and a similar graph for the digitization response. A characteristic curve is a graphic representation of the response of a film to both exposure and development. The horizontal axis is exposure; as you move to the right on the x-axis, exposure increases. The vertical axis is density. A typical characteristic curve for microfilm shows that as exposure increases, there is a proportionally large increase in density. The rate of increase in density compared to exposure, which is the slope of the line, is the contrast. Microfilm is a high-contrast photographic film with a limited range of tones that can be distinguished. This is somewhat similar to 1-bit (bitonal) digital images. In bitonal images, all tonal values on the original that are lighter than a selected tone will be rendered white, and all tonal values darker than the selected tone will be rendered black. The point at which the tones shift from white to black is the threshold value. The 1-bit digitization response is similar to microfilm. However, despite being high contrast, microfilm does have a range of shades, unlike 1-bit images, which only have black or white values (all gray shades are eliminated).
![]()
If you look at the characteristic curve of a continuous-tone black-and-white still photography film, the curve looks different because the contrast is lower and, for most of these films, the length of the characteristic curve is longer. Both of these properties mean that a still photography film can render more shades, equivalent to scanning at a higher bit depth, and correspondingly can distinguish more shades. An 8-bit grayscale image has a response that is relatively similar to continuous-tone photographic films. However, an 8-bit image has a maximum of only 256 shades or levels. Most photographic films can effectively distinguish thousands of shades.
Clipping
Clipping happens when tones at the light or dark end of the range are rendered as pure white or pure black, so that the image detail they carried is lost. Once the tones have been clipped, it is not possible to get them back. It is important to adjust scanner controls to minimize clipping during scanning, and then to avoid clipping when using the tonal and color adjustment controls of image processing software.
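The following sketch shows why clipped tones cannot be recovered. It assumes 8-bit values (0 to 255) and a simple brightness offset, which is a stand-in for the tonal controls of real scanning or image processing software; the function name is hypothetical.

```python
def adjust_brightness(pixels, offset):
    """Add an offset to 8-bit values, clipping the result to 0..255."""
    return [min(255, max(0, value + offset)) for value in pixels]


highlights = [200, 220, 240, 250]   # four distinct light tones

brightened = adjust_brightness(highlights, 60)
restored = adjust_brightness(brightened, -60)

print(brightened)  # [255, 255, 255, 255]
print(restored)    # [195, 195, 195, 195]
```

After brightening, all four tones are pure white. Darkening the result by the same amount produces four identical grays rather than the original four distinct tones, because the differences between them were discarded at the moment of clipping.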
Digital Image File Structure
The digital data that represent a complete image are contained within a computer file. The string of binary digits is arranged into an organized structure that allows the computer and software to interpret the data and recreate the image. A digital image file has several major parts.
Simple image file structure (from "Structures and Metrics for Image Storage and Interchange," JEI, Journal of Electronic Imaging, co-published by SPIE [the International Society for Optical Engineering] and IS&T [the Society for Imaging Science and Technology], April 1993):
- Header
  - File identifier
  - Image specification
- Image Data
  - Look-up table
  - Image raster
- Footer
  - File terminator
The file header identifies the digital file for the computer and includes an image specification indicating the file format. The image data section, which in some cases includes a look-up table, follows the header. The look-up table is a defined set of colors or shades of gray that tells the computer how to represent the image on a computer monitor. The image raster is the string of 1s and 0s that represents the individual pixels of the bitmap image. The final part of the file, the footer, tells the computer that the entire file has been opened or downloaded.
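To make the header, image data, and footer layout concrete, here is a minimal sketch of a made-up image file written and read back in Python. The format is entirely hypothetical: the identifier `TOYI`, the terminator `EOF!`, and the field order are invented for illustration and do not correspond to TIFF, GIF, or any other real format.

```python
import struct

MAGIC = b"TOYI"                    # hypothetical file identifier
TERMINATOR = b"EOF!"               # hypothetical file terminator
HEADER = struct.Struct(">4sHHB")   # identifier, width, height, bits per pixel


def write_toy_image(width, height, palette, raster):
    """Pack header, look-up table, raster, and footer into one byte string."""
    header = HEADER.pack(MAGIC, width, height, 8)
    lut = bytes([len(palette)]) + bytes(palette)   # entry count, then gray levels
    return header + lut + bytes(raster) + TERMINATOR


def read_toy_image(data):
    """Parse the byte string back into its parts, checking both ends."""
    magic, width, height, bits = HEADER.unpack_from(data, 0)
    if magic != MAGIC or not data.endswith(TERMINATOR):
        raise ValueError("not a complete toy image file")
    offset = HEADER.size
    count = data[offset]
    palette = list(data[offset + 1: offset + 1 + count])
    start = offset + 1 + count
    raster = list(data[start: start + width * height])
    return width, height, bits, palette, raster


# A 3 x 2 image whose raster values are indexes into a 3-entry gray look-up table.
data = write_toy_image(3, 2, [0, 128, 255], [0, 1, 2, 2, 1, 0])
width, height, bits, palette, raster = read_toy_image(data)
print([palette[index] for index in raster])  # [0, 128, 255, 255, 128, 0]
```

The reader relies on the header to know how to interpret everything that follows, and on the terminator to detect a file that was cut short, which mirrors the roles described above.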
Data and File Compression
Data and file compression is the process of reducing, through various means, the amount of data to be stored or transmitted. There are two broad categories of compression: lossless compression allows reconstruction of a file that is identical to the original, while lossy compression discards a certain amount of the original information during the compression process. Compression algorithms include:
- LZW (Lempel-Ziv-Welch) -- lossless
- JPEG (Joint Photographic Experts Group) -- lossy
- MPEG (Moving Picture Experts Group) -- lossy
- Wavelet -- lossy
- Fractal -- lossy
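The difference between the two categories can be illustrated with Python's standard library. Two substitutions are made here and should be read as assumptions: the lossless example uses `zlib` (the DEFLATE algorithm) as a stand-in because LZW is not in the standard library, and the lossy example is a toy that simply discards the low-order bits of each value, which is far cruder than JPEG or wavelet compression.

```python
import zlib

# One row of 8-bit gray values: a long light background with a short dark stroke.
row = bytes([250] * 90 + [20] * 10)

# Lossless: zlib uses DEFLATE (a stand-in here, not LZW). The round trip is exact.
packed = zlib.compress(row)
assert zlib.decompress(packed) == row
print(len(row), len(packed))   # original size, then the smaller compressed size


# Lossy (toy): keep only the top 4 bits of each value, discarding fine tonal detail.
def quantize(values, keep_bits=4):
    shift = 8 - keep_bits
    return bytes((v >> shift) << shift for v in values)


tones = bytes([200, 201, 202, 203])
print(list(quantize(tones)))  # [192, 192, 192, 192]
```

The lossless round trip returns every byte unchanged. The lossy step turns four slightly different tones into one, and no later processing can tell them apart again.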
Reformatting Comparison
Original documents with clean, printed type or text (with high inherent contrast between the type/text and the background, and sharply defined characters) can be reproduced using 1-bit scanning in the digital environment. This is comparable to traditional microfilm. If documents have low inherent contrast between the type/text and the background, if characters have diffuse edges (such as carbon copies or other types of copy processes), or if there are photographs, then you should digitize at 8 bits (256 shades of gray) at a minimum, or use a low- to medium-contrast, continuous-tone photographic film. Finally, for color graphics, color text, or color photographs, you need to capture the color information in addition to shading. At a minimum, digitization should be done as 24-bit RGB (about 16.7 million colors) scanning; you can also use color photographic film.
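The guidance above can be restated as a small decision helper. This is only a sketch of the minimums described in this section, not a complete selection procedure; the function and parameter names are hypothetical.

```python
def minimum_capture_mode(has_color, has_photographs, high_contrast, sharp_characters):
    """Return the minimum digital capture mode suggested by the text above."""
    if has_color:
        return "24-bit RGB"
    if has_photographs or not high_contrast or not sharp_characters:
        return "8-bit grayscale"
    return "1-bit bitonal"


# Clean printed text with sharp characters and high contrast.
print(minimum_capture_mode(False, False, True, True))    # 1-bit bitonal

# A carbon copy: no color or photographs, but diffuse character edges.
print(minimum_capture_mode(False, False, True, False))   # 8-bit grayscale

# A color photograph.
print(minimum_capture_mode(True, True, False, False))    # 24-bit RGB
```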
________
Contributing to this chapter was Don Willis, whose earlier role in preservation is cited in Chapter VII: Case Studies--Working with Microfilm.
Besser, Howard and Jennifer Trant. Introduction to Imaging: Issues in Constructing an Image Database. Getty Art History Information Program, Santa Monica, CA, 1995. [Online] http://www.getty.edu/gri/standard/introimages/
Frey, Franziska and James Reilly. Digital Imaging for Photographic Collections: Foundations for Technical Standards. Image Permanence Institute, Rochester Institute of Technology, Rochester, NY, 1999. [Online] http://www.rit.edu/~661www1/sub_pages/8page3g.htm
A Guide to Digital Photography: Theory and Basics. Agfa Educational Publishing, Randolph, MA. [Online] http://www.agfaHome.com/publications/
An Introduction to Digital Scanning. Agfa Educational Publishing, Randolph, MA. [Online] http://www.agfaHome.com/publications/
Kenney, Anne and Oya Rieger. Moving Theory Into Practice: Digital Imaging for Libraries and Archives. Research Libraries Group, Mountain View, CA, 2000. [Online] http://www.rlg.org/preserv/mtip-order.html
Kodak Digital Learning Center. [Online] http://www.kodak.com/US/en/digital/dlc/book3/
RLG DigiNews on-line newsletter. [Online] http://www.rlg.org/preserv/diginews/
The Secrets of Color Management. Agfa Educational Publishing, Randolph, MA. [Online] http://www.agfaHome.com/publications/
Northeast Document Conservation Center