Handbook for Digital Projects:
A Management Tool for Preservation and Access



III

Considerations for Project Management

Stephen Chapman
Harvard University Library

Librarians and archivists are experts at project management. They routinely process groups of materials in selection, processing, cataloging, and preservation workflows. Digital projects, however, create new challenges. Perhaps the most difficult challenge is establishing clear boundaries, particularly stopping points. Managers of several noteworthy projects have written about their experiences in creating collections that require constant modification to keep pace with improvements in technology (Thomas, 1998).

This is not to say that digital conversion projects cannot be well planned in advance and successfully managed to conclusion. Many questions and challenges can be anticipated, and much of the workflow can be structured as batch activities with predictable outcomes. The purpose of this chapter is to give managers a clear understanding of the decisions that are typically under their control so they can form effective strategies to design, fund, and manage digitization projects.

Setting Goals


The best-managed conversion projects have clear goals. Brainstorming, the first phase of project management, is the time to talk about outcomes. "Starting at the end" is an effective way to ensure smooth beginnings. Too often there is a tendency to dive right into questions of technology -- e.g., which scanner should I buy? -- before articulating the purposes that digital reformatting must serve. Setting goals is a process of thinking about things from several angles before writing project plans. What are the possible outcomes for the collections? What are the potential benefits to users, to collection managers, and to the institution? What is a reasonable price -- in time and money -- to invest in new procedures, systems, and services? Is self-publishing a good idea, or are partnerships (with other institutions or even publishers) a better course to follow? Is this the right time to begin digitizing collections?

Good management is largely an act of communication. If the people who work on the project understand the desired outcomes, they will provide better services; they will be aware of their individual contribution and how it relates to what others are doing; they will know why they are digitizing collections (the vision thing); and, perhaps most importantly, they will be better at recognizing when things go wrong.

The starting-at-the-end approach refers to focusing on outcomes before analyzing source materials or evaluating conversion processes. As described in the following sections, outcomes generally fall into three categories: collections, digital reproductions, and institutional benefits. Before writing a project plan and budget, bring together all of the stakeholders who have an interest in these issues and establish priorities that everyone can accept.

The Collections

When one speaks of preservation and access as project goals, there is a certain transitive quality to the statement. Digital conversion projects are undertaken on behalf of original collections. (Original is used here to refer to any source material for scanning, regardless of its format.)

A popular rationale for investing in digital collections is that the surrogates will reduce, if not eliminate, the physical handling that threatens fragile or unique materials (Noerr, 1999). This sounds sensible, but beware of the responsibility that comes with advancing this logic. Remember that digital collections do not make themselves, and consider that a collection is likely to be handled more during conversion than at any other time during its life in an institution. Digitizing for preservation, then, applies not only to outcomes, but also to the handling guidelines that will be mandated for the conversion process. Remember, too, that increased care and handling generally translate to increased cost.

Once materials have been selected for conversion, one should articulate the specific physical outcomes desired for the source materials. Whenever the originals are to be removed from circulation -- either by change in policy, transfer to offsite storage, or, more rarely, disposal -- imaging requirements will be high. As noted in Chapter VII Section 1, "Working with Printed Text and Manuscripts," high quality does not necessarily mean high cost, but quality control, authentication of files and their sources, and other issues become more critical in cases where original materials cannot be easily retrieved or consulted.

In all projects, whether digitization is to serve access goals, preservation goals, or both, consider the following questions:

Other questions address access policies and cataloging.

Stating the goals for the original collections first will make it easier to narrow the wide range of choices of scanning technologies and methodologies. With rapidly deteriorating source materials -- such as newspapers, brittle books and journals, notebooks, and scrapbooks -- a hybrid approach to conversion might be desirable. These undertakings demand planning for two, or even three, workflows: creating digital surrogates for access, creating microfilm for preservation, and, if necessary, rehousing or otherwise treating the originals.

The Digital Reproductions

There is not a one-size-fits-all approach to scanning because there are many types of source materials, diverse audiences with a wide range of interests, and an ever-expanding choice of digital formats. The most diligent student of technology -- even as it relates to the field of digital libraries -- will not be able to keep up with new or emerging products. Even when people are knowledgeable about digital formats, it is wise to prepare for a discussion about various strategies. Do not assume there will be ready consensus on what is best.

Ultimately the project manager, not the technology manufacturer or distributor, must be the one to judge whether a given system will do the job that is needed. Librarians and archivists, rather than engineers, have the skills to describe in practical terms what the digital reproductions are supposed to do. To paraphrase Michael Ester, President of Luna Imaging, Inc., formulating the rationale for digitizing a collection relies upon curators' abilities to exercise their own good judgment. Technology can then be assessed according to project objectives rather than vice versa (Ester, 1997).

There are two schools of thought about developing specifications for digital reformatting. One advocates closely assessing the source materials, then relating the attributes of the digital reproductions to those of the originals. This practice is sometimes referred to as benchmarking (Kenney, 1999). The other recommends that attributes of digital reproductions be related to those of the hardware and software systems that will display or process them. As an example, consider working with printed originals, such as papyri. Scanning to create a high-quality print may not necessarily satisfy the requirement to magnify details on screen at 10:1.

Characterizing functional requirements from the user's point of view can make the job of defining technical specifications much easier. What resources are available to the audience(s) you intend to serve? Answering this question is particularly vexing when you want digital collections to persist. What assumptions does one make about the systems people will have ten to twenty years from now? When thinking about all of the ways that technology can be used to enhance access to collections, consider:

Benchmarking, by contrast, considers the interests of the collections' creators (original artists and publishers) and custodians to be as important as today's users. In this approach, the attributes of the source materials that need to be conveyed in the digital reproductions (either pictorially, in textual metadata, or both) are:

By adopting the user's and the owner's perspectives, the project manager will be in a better position to articulate project goals to staff and/or the vendors who offer systems and services. Successful working relationships can be established when representatives from cultural institutions can describe the functional requirements for the digital reproductions; representatives from industry can then respond with offerings of what technology can do -- they may even be motivated to create new systems.

Perhaps the most important goal at this point of a planning exercise is to answer the following question: Can you state functional requirements that can only be fulfilled by digital reproductions? If not, reformat your collections with an appropriate analog process (National Endowment for the Humanities, 1999).

One final note about defining requirements for the digital reproductions: A lifespan, even if only approximated, should be assigned to the electronic editions to help define technical requirements for conversion as well as the overall project budget. With analog formats, we can take for granted that the institution will bear the ongoing costs to store, catalog, and provide access to the reproductions. The overhead for storage facilities and supporting technologies such as circulation systems, photocopiers, and microform readers is considered to be affordable.

With digital formats, interventions will be comparatively frequent, and maintenance can be defined anywhere on the scale of simple copying (to new media and/or new formats) to budgeting for wholesale digital-to-digital conversion in order to maintain a standard level of service. It is one thing to preserve content, another to preserve a level of service. All this is to say that longevity is not a physical attribute of digital reproductions, but an assigned lifespan that is backed up by the recognition that today's decisions regarding digital quality and functionality will need to be supported by tomorrow's managers and portions of their operational budgets.

Benefits to the Institution

In recent years, many organizations have invested in digital projects with an eye toward realizing institutional benefits, as well as enhancing access to their collections. Oxford University, for example, categorizes digitization projects according to four objectives: Access, Infrastructure, Preservation, and Feasibility (Lee, 1999).

Research libraries in particular have been interested in feasibility and infrastructure projects for several years. These are important parts of a collective effort to test and disseminate tools, procedures, and methodologies. Managers in organizations of all sizes are often interested in monitoring processes of first-time digitization projects in order to conduct cost-benefit analyses. The experience gained by doing projects in-house helps organizations understand the overhead not only in creating digital collections, but also in maintaining and delivering them.

The following quotes from those with real-world experience in managing digital projects illustrate how different institutional goals can lead to different philosophies about creating electronic collections (or vice versa):

As we evaluate new reformatting technologies, we can 'keep it simple' by working on large quantities of material with few problems before working on smaller quantities of material with difficult problems (Waters, 1999).

If an electronic scholarly project can't fail and doesn't produce new ignorance, then it isn't worth a damn (Unsworth, 1999).

In the former case, the KISS principle applies, and the logic is that solving small problems helps institutions prepare for tackling larger ones. In the latter case, the bigger problems are more appealing, as the certain failure will itself represent a meaningful stride towards developing expertise.

Experience can produce tangible benefits as well. These include:

Project Planning: Creating a Plan of Work and Budget


Setting goals is the first phase of a project, the thinking or brainstorming phase. A good manager knows when to make the transition to planning, the second phase.

If a department or institution were to conduct only a single project -- and provide all necessary funding -- then it might be possible to skip planning and proceed directly to the work itself. The time invested in writing planning documents, however, will pay off during production. These documents are also fundamental stepping stones that lead from the first project to the second and third. If published, they also can serve as guideposts for other institutions planning digital conversion projects (Library of Congress, 1999). Examples of planning documents include:

From the internal perspective, these early management documents may be the most important products to emerge from a project. Some of the documents, such as RFPs or contracts, will have a direct impact on product quality. The plan of work, by contrast, will have a direct impact on the processes to initiate, undertake, and complete the project.

Several elements are essential to the plan of work, regardless of the nature of the source materials or demands of the core audience(s) to be served. Specific answers to the five questions below help to ensure that fewer problems will be encountered when the work begins.

(1) Who will do the work?

Practically speaking, this question comes first because many of the tasks will have to be carried out by people already in the organization. When it comes to staffing, perhaps it is more accurate to survey the organization and ask, "Who is available to do this type of work?" or "Who has the right skills to learn to participate in a digital conversion project?"

The second phase of charting out the staffing picture is to determine how many new full-time equivalent (FTE) positions will be required. Always assume that somebody will need to be hired to get the job done. No matter how small or simple the project appears, a good rule of thumb is "the job is always more than one person can do." Medium-scale projects require several departments to work together. Large projects require coordination among multiple agencies, institutions, service bureaus, and publishing partners.

Large projects not only have multiple positions but also several people with appropriate expertise in each job category. Small projects, by contrast, will not require a dozen full-time employees, but someone will have to assume these roles if the work is to be executed with reasonable levels of responsibility. (Naturally, several of these jobs can be subcontracted.) Each of the following roles, or tasks, is too important to be excluded from a project that seeks to convert materials, maintain them for any reasonable length of time, and make them accessible via computer networks (or even CD-ROMs).

Project Staff -- Roles

For each of the project staff roles, decide where training is needed, who will provide it, when it should (or must) occur, and how much it will cost.

(2) What systems will need to be used or developed during the project?

In this context, the term systems refers to software, hardware, and the good old-fashioned brick-and-mortar facilities needed to store media. Although highly flexible, digital products are physical objects that must be located somewhere. It is important to specify before work begins where the digital objects will be stored, how long they must reside there in readable and accessible form, and who will be responsible for them.

Software and hardware requirements will vary, but the number of systems will be proportionate to the number of processes and tasks specified to be under local control. In other words, the capabilities of the local infrastructure define the limits of the work that can be done in-house.

Consider the medium to long-term consequences that will result from the hardware and software decisions you are inclined to make on behalf of the short-term needs of the project. Will you be willing to build throw-away systems? Will it be acceptable to abandon custom applications when a programmer leaves? Or will commercial solutions be required?

(3) What are the technical specifications for the image files and metadata?

Digital images and associated metadata (in a number of categories) comprise the raw stuff of image databases. If consistencies in searching and presentation are desired, then it is essential to mandate technical specifications for data elements, image formats, and access protocols. These specifications become even more important when interoperability with other collections is desired.

Chapter VII describes some of the practices and specifications employed to date. If exact specifications cannot be determined before actually scanning or cataloging materials, then the project plan should at least state the options under consideration.

(4) How much will the project cost?

Empirical evidence gathered from one's own collections is more convincing than anecdotal reports from other projects. One of the best ways to forecast project costs is to create a representative sample of the materials selected for conversion. In many cases, a half dozen items will be sufficient. If scanning will be outsourced, then the project budget should be finalized only after a sample has been put through an entire workflow -- scanning, processing, metadata creation (including full text), and quality control -- and the results have been inspected and approved by the appropriate stakeholders in the project. Many vendors are willing to provide this service as part of their response to the RFI or RFP in order to compete for a contract.

All activities conducted in-house should be accounted for as project costs or cost share. An advance walk-through of the proposed workflows can quickly reveal how well the manager has envisioned the process from brainstorming map to reality. Surprises can occur. It may take a considerable amount of time to retrieve materials from storage and pack them for shipment to a vendor; this time must be doubled, of course, to account for the return phase. The digital masters that you intended to be inspected during a 100% quality control check may take five minutes to open on the computer you have available for this task. Catalog records that seemed adequate upon initial cursory review require clean-up or additional information. Try to identify in advance where production bottlenecks can occur and make sure that the levels of budget and staffing in the plan of work allow room for such contingencies.

Finally, consider the impact of a timeline on all of the project costs, particularly staffing when salaries and benefits must be budgeted for fixed periods.
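
The forecasting advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample costs, the collection size, and the contingency rate are all hypothetical. The only figure taken from the text is the five minutes it may take to open one digital master during a 100% quality control check.

```python
from collections.abc import Sequence
from statistics import mean


def forecast_from_sample(sample_unit_costs: Sequence[float],
                         total_items: int,
                         contingency: float = 0.0) -> float:
    """Extrapolate a project cost from the per-item costs of a small sample.

    `contingency` is a fraction (0.15 means 15%) added to leave room for
    bottlenecks such as retrieval, packing, and catalog clean-up.
    """
    if not sample_unit_costs:
        raise ValueError("at least one sampled item is required")
    unit_cost = mean(sample_unit_costs)
    return unit_cost * total_items * (1 + contingency)


def qc_hours(total_items: int,
             minutes_per_item: float,
             inspection_rate: float = 1.0) -> float:
    """Staff hours needed to inspect a share of the files (1.0 means 100%)."""
    return total_items * inspection_rate * minutes_per_item / 60


# Hypothetical sample: six items run through the whole workflow,
# with the observed cost of each item in dollars.
sample = [1.80, 2.10, 1.95, 2.40, 2.05, 1.70]
estimate = forecast_from_sample(sample, total_items=5_000, contingency=0.15)
print(f"Forecast for 5,000 items with 15% contingency: ${estimate:,.2f}")

# If each digital master takes five minutes to open and inspect,
# a 100% check of 5,000 files is a large staffing commitment.
print(f"QC time at 100% inspection: {qc_hours(5_000, 5):,.1f} hours")
```

Running the sample through both functions before the budget is finalized makes it easier to see whether quality control, rather than scanning, is the likely production bottleneck.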

(5) Who will own and manage the digital products that will be produced?

This question applies to staffing, workflow, and the budget. As noted in a New York Times article in April 1999, questions regarding storage "cannot be resolved without considering the question of ownership." In the same article, Ann Okerson of Yale University observed, "I don't know how you can preserve something you don't own" (Hafner, 1999). The Costs section at the end of this chapter shows the financial impact of taking on the responsibilities to store and deliver digital collections. Because of the high costs of infrastructure to manage and distribute digital objects, perhaps it is not surprising to see that a number of partnerships between university libraries and publishers have emerged. (Early American Fiction published by Chadwyck-Healey and the University of Virginia Library is a representative example.) If an institution desires to own and distribute the digital reproductions it creates for any length of time, then it will be important to articulate these goals in a project plan, to purchase the systems and staffing necessary to manage them, and to ensure that either the institution or the project's funder(s) will fully support these components as well.

Project Implementation: Managing Workflow


The third and final phase of project management is implementation. Virtually all digital conversion projects require several workflows to be charted and managed. One exception might be an in-house keying project with light encoding. Projects will be completed sooner if the tasks are orchestrated in parallel or overlapping rather than linear workflows. Cost may be the bottom line in the project budget and other planning documents, but time is what must be accounted for in managing the actual work of converting collections.

The following activities typically are segregated into separate workflows. Separate individuals or departments might end up undertaking each activity.

Work would proceed more or less chronologically as listed above if materials were not segregated into batches during production. Selecting the appropriate size for each batch and following its progress carefully from start to end is the manager's principal responsibility. Gathering and reporting production statistics, problem logs, feedback from staff, and expenditures are all indicators of effective management.

When considering digital conversion from a hands-on perspective, it is easy to appreciate the efficiencies of working with batches of similar materials whenever possible. This is true for cataloging as well as scanning. If the materials themselves cannot be grouped in like categories, then work will often be structured in a series of steps, where technicians focus on specific tasks for meaningful, uninterrupted periods. With the appropriate configuration of a project facility, workers can be given the opportunity to break the repetitive cycle of one task (such as scanning) by moving to another (such as quality control and metadata creation). This practice facilitates high production and helps ensure consistent quality.
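
As a rough illustration of batch-based production tracking, the Python sketch below splits a list of items into fixed-size batches, moves each batch through a sequence of stages, and reports how many items sit at each stage. Everything in it is an assumption made for the example: the stage names (borrowed loosely from the scanning, processing, metadata creation, and quality control steps mentioned earlier in this chapter), the class and function names, and the batch size.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stage sequence; a real project would substitute its own.
STAGES = ("scanning", "processing", "metadata", "quality_control", "done")


@dataclass
class Batch:
    """A group of similar items that moves through the workflow together."""
    batch_id: int
    items: list[str]
    stage_index: int = 0
    problems: list[str] = field(default_factory=list)  # simple problem log

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def advance(self) -> None:
        """Move the batch to the next stage, stopping at the final one."""
        if self.stage_index < len(STAGES) - 1:
            self.stage_index += 1


def make_batches(item_ids: list[str], batch_size: int) -> list[Batch]:
    """Split items into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        Batch(number + 1, item_ids[start:start + batch_size])
        for number, start in enumerate(range(0, len(item_ids), batch_size))
    ]


def production_report(batches: list[Batch]) -> dict[str, int]:
    """Count how many items are currently at each stage."""
    counts: Counter[str] = Counter()
    for batch in batches:
        counts[batch.stage] += len(batch.items)
    return dict(counts)


items = [f"item-{n}" for n in range(1, 11)]
batches = make_batches(items, batch_size=4)
batches[0].advance()  # the first batch leaves scanning
batches[0].problems.append("two pages skewed; rescan requested")
print(production_report(batches))
```

Because batches advance independently, one batch can be in quality control while another is still being scanned, which is the overlapping rather than linear workflow this section recommends.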

Guidelines and Best Practices for Management

One often hears about the need for best practices or guidelines for digital conversion. In the area of project management, the first measure of best practice is likely to be one of the ends justifying the means. If digital reproductions are well received and have been made in a timely and cost-effective fashion, the project will naturally be considered a success. Another measure, particularly from peer institutions and practitioners, might be in the quality of the documentation gathered throughout the project. Documenting the rationale, methodologies, systems, staffing models, costs, and most importantly, the lessons learned from a project helps the broader community (i.e., within the institution, funders, and other practitioners) benefit from the experience gained in a single project. The project manager has done his or her job well if the people who worked on it had a satisfying experience and if the future manager(s) of the digital collection can easily interpret why things were created in a particular way and what needs to be done to maintain, or even to improve, these first-generation digital objects.

Costs


Costs are difficult to generalize due to the wide spectrum of digital processes and products. Even when source materials and digital reproductions are comparable, investments can vary considerably in activities such as project planning and management, as well as in the infrastructure to store and deliver digital objects.

Conversion Costs

Conversion is a bounded project activity, regularly outsourced to specialists. Production scanning, OCR, text markup, and digital photography costs are relatively predictable. Trends over the last several years suggest that text and image conversion costs will remain stable or increase slightly -- not decrease -- although product quality may be improving in several areas. The base numbers provided below should be construed as realistic starting points for budgeting. Increases over base are approximations of the impact of the combined variables introduced by the nature of the source materials as well as the technical specifications for the digital objects.

Product: page images
Base price: $0.25/page
Meaningful cost factors:
- size (page dimensions)
- format (paper < microfilm)
- binding (removed < intact)
- bit depth (b/w < grayscale < color)
- metadata (descriptive and structural)
Increase over base: 2-6X (1-bit); 4-25X (8-24 bit)

Product: full text
Base price: $0.50/page*
Meaningful cost factors: same as for page images above, plus
- required level of accuracy
- extent of markup ("lite" to full SGML)
Increase over base: 6X+ (keying); 2X+ (markup)

Product: images
Base price: $3.00/image
Meaningful cost factors:
- size (dimensions of originals)
- handling requirements
- tone/color reproduction requirements
- metadata
Increase over base: 2-20X (pictorial)

* includes cost of page images
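
To show how the base prices and increases above might be turned into a budget range, here is a minimal Python sketch. It assumes that an increase such as "2-6X" means the final unit price falls between two and six times the base price; the class, constant, and function names are hypothetical, and the full-text rows are left out because the table gives them open-ended increases ("6X +", "2X +").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """One row of the conversion cost table."""
    name: str
    base_price: float       # US dollars per page or per image
    low_multiplier: float   # smallest increase over base in the table
    high_multiplier: float  # largest increase over base in the table


# Values copied from the table; the two bit-depth ranges for page
# images are modeled as separate entries.
PAGE_IMAGES_1BIT = Product("page images (1-bit)", 0.25, 2, 6)
PAGE_IMAGES_8_24BIT = Product("page images (8-24 bit)", 0.25, 4, 25)
PICTORIAL_IMAGES = Product("images (pictorial)", 3.00, 2, 20)


def budget_range(product: Product, units: int) -> tuple[float, float, float]:
    """Return (base, low, high) conversion cost in dollars for `units` items."""
    base = product.base_price * units
    return base, base * product.low_multiplier, base * product.high_multiplier


# Hypothetical project size: 10,000 pages.
for product in (PAGE_IMAGES_1BIT, PAGE_IMAGES_8_24BIT):
    base, low, high = budget_range(product, 10_000)
    print(f"{product.name}: base ${base:,.0f}, range ${low:,.0f} to ${high:,.0f}")
```

A range this wide is a reminder that the source materials and technical specifications, not the base price, drive most of the conversion budget.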

Full Project Costs

Underscoring the point that conversion to digital is only one of the steps leading to digital delivery, the Internet Library of Early Journals Final Report states that the cost "per indexed page image accessible on the Internet" is approximately seven times higher than the unit cost of scanning and uncorrected OCR (see Note below). Since libraries and archives are actively integrating digital technologies into acquisition, cataloging, systems, and even preservation departments, perhaps it is legitimate to consider these activities as costs that live outside of a conversion project. Nevertheless, it is important to recognize that analog-to-digital publishing (including distribution) requires significant investments -- JSTOR and the Library of Congress National Digital Library Program provide two noteworthy, if large-scale, examples -- to develop and integrate new systems, services, and expertise. Commercial publishers are willing to provide these services, but the terms of such agreements must be reviewed carefully to balance interests of project budgeting with those of collection ownership and control.

Note: "Internet Library of Early Journals (January 1996 - August 1998), A project in the eLib programme, Final Report." March 1999. [http://www.bodley.ox.ac.uk/ilej/papers/fr1999/] (October 24, 1999). See paragraph 80: "The total of £458,000 [approx. $757,395 USD] represents an expenditure of £4.21 [approx. $6.96 USD] per indexed page image accessible on the Internet. This estimate of expenditure does not take into account the costs of the contribution of the IT and library infrastructures of the four Institutions." See also page E14.
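
The figures quoted in the Note can be cross-checked with simple arithmetic. The short Python sketch below derives the approximate number of indexed page images, the exchange rate implied by the bracketed dollar amounts, and the conversion-only unit cost. The last value rests on an assumption: that "approximately seven times higher" means a ratio of about 7 to 1. The variable names are illustrative only.

```python
# Figures quoted from paragraph 80 of the ILEJ Final Report.
TOTAL_GBP = 458_000
COST_PER_PAGE_GBP = 4.21
TOTAL_USD = 757_395

# Assumption: full cost per page is about 7 times the unit cost of
# scanning and uncorrected OCR.
FULL_TO_CONVERSION_RATIO = 7

# Number of indexed page images implied by the two sterling figures.
indexed_pages = TOTAL_GBP / COST_PER_PAGE_GBP

# Exchange rate implied by the bracketed dollar conversion of the total.
usd_per_gbp = TOTAL_USD / TOTAL_GBP

# Implied unit cost of scanning plus uncorrected OCR alone.
conversion_unit_cost_gbp = COST_PER_PAGE_GBP / FULL_TO_CONVERSION_RATIO

print(f"Indexed page images: about {indexed_pages:,.0f}")
print(f"Implied exchange rate: {usd_per_gbp:.4f} USD per GBP")
print(f"Implied conversion-only cost: about £{conversion_unit_cost_gbp:.2f} per page")
```

Under these assumptions the project delivered on the order of 109,000 page images, and conversion alone accounts for roughly 60 pence of the £4.21 spent per page.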

Sources


Borghuis, Marthyn, et al. TULIP Final Report. Elsevier Science, 1996. See Appendix X, "Checklist of aspects to be considered for the implementation of a 'digital library,'" 337-44.

Chapman, Stephen and Anne R. Kenney. "Digital Conversion of Research Library Materials: A Case for Full Informational Capture." D-Lib Magazine (October 1996). See Table 4. [http://www.dlib.org/dlib/october96/cornell/10chapman.html] (November 4, 1999).

Ester, Michael. Digital Image Collections: Issues and Practice, pp. 10-12. Washington, DC: Commission on Preservation and Access, 1996.

Hafner, Katie. "Books to Bytes: The Electronic Archive, Research Libraries Grapple With the Difficult Task of Preserving the Digital Present." The New York Times on the Web. April 8, 1999. [http://www.nytimes.com/library/tech/99/04/circuits/articles/08arch.html] (November 4, 1999).

Internet Library of Early Journals (January 1996-August 1998), A project in the eLib programme, Final Report. March 1999. [http://www.bodley.ox.ac.uk/ilej/papers/fr1999/] (October 24, 1999).

Kenney, Anne R. "Digital to Microfilm Conversion: A Demonstration Project, 1994-1996, Final Report to the National Endowment for the Humanities, PS-20781-94." Also see other publications about the methodology used by the Cornell University Department of Preservation and Conservation. [http://www.library.cornell.edu/preservation/pub.htm] (October 24, 1999).

Lee, Stuart D. Scoping the Future of the University of Oxford's Digital Library Collections, funded by the Andrew W. Mellon Foundation, Final Report. Appendix D, 1. Oxford University, August 1999.

The Library of Congress, for example, has published many useful "Background Papers and Technical Information," available on-line, during the course of creating American Memory collections. See the Technical Operations Documentation and White Papers, n.d. [http://memory.loc.gov/ammem/ftpfiles.html] (October 24, 1999).

The National Endowment for the Humanities Division of Preservation and Access, for example, accepts applications for digital conversion projects, but holds these applications to the standard that ". . . digitization [must] significantly improve access to the collection and the ways in which [it] may be used for scholarship, education, or public programming." See "Considerations for Reviewers," August 1999.

Noerr, Peter. The Digital Library Tool Kit. Sun Microsystems, Inc. (April 1998): 21. [http://webdoc.gwdg.de/ebook/aw/1999/sun/noerrfinal.pdf] (October 24, 1999).

Puglia, Steven. "The Costs of Digital Imaging Projects." RLG DigiNews. 3:5 (October 15, 1999). [http://www.rlg.org/preserv/diginews/diginews3-5.html#feature] (November 4, 1999).

Thomas, Timothy. "Physical Review Online Archives (PROLA): An Image Archive for the Journal Physical Review," D-Lib Magazine (June 1998). [http://www.dlib.org/dlib/june98/06thomas.html] (October 24, 1999). Mr. Thomas reports that "an electronic archive is by no means static. [It] requires constant modification to keep up with the current high rate of technical change."

Unsworth, John. "The Importance of Failure," Journal of Electronic Publishing 3:2 (December 1997). [http://www.press.umich.edu/jep/03-02/unsworth.html] (October 24, 1999).

Waters, Donald J. Electronic Technologies and Preservation. Washington, DC: Commission on Preservation and Access, 1992. [http://www.clir.org/pubs/reports/waters/waters2.html] (October 24, 1999).

 

 


Northeast Document Conservation Center
100 Brickstone Square
Andover, MA 01810-1494
Telephone: (978) 470-1010
Fax: (978) 475-6021
               http://www.nedcc.org


Last Modified: May 13, 2003

Copyright 2000. Northeast Document Conservation Center