Lawrence W. Wright1*, Jay B. Lichter2, John Reinitz1, Mark A. Shifman1, Kenneth K. Kidd3, Perry L. Miller1
1Center for Medical Informatics and Department of Anesthesiology
2Department of Genetics
3Departments of Genetics, Psychiatry and Biology
*To whom reprint requests should be sent
Yale University School of Medicine
New Haven, CT 06520-8009
USA
Abstract
Building a map of restriction sites from double digest gel data can be a complex and frustrating task, especially when many DNA fragments are detected or when the gel results are ambiguous. "Double Digester" is an interactive, graphical computer program which helps researchers understand and resolve such data. It explicitly represents the experimental data, the associated uncertainties, the researcher's hypotheses, and possible map interpretations. Alternative solutions are frequently possible, and the differences between them may help determine which additional experiments might resolve ambiguities. Initial use has confirmed the benefits of this approach, and has suggested ways in which it can be refined and extended. Double Digester meets the need for a practical tool to help build restriction maps, and also illustrates how a computer-based tool can confront experimental uncertainty in an integrated fashion.
Introduction
Double digest restriction mapping is one of the simplest and most widely used techniques available to molecular biologists. Two restriction enzymes are used, separately and together, to digest a cloned DNA fragment. The resulting fragments are run on an electrophoretic gel along with one or more standards whose fragment lengths are known. The digested fragment sizes are estimated by comparing their positions to those of the standards. The map of enzyme cutting sites is inferred by determining how such fragments could have been produced.
Building restriction maps in this way is a fundamental step in characterizing most newly cloned fragments of DNA. Bench scientists typically spend from 20 minutes to many hours constructing a usable map from a photograph of a gel. Although the double digest problem has been extensively studied, we know of no computational tool for solving it that is widely used. To fill that gap, we propose a new approach to how such a tool should work.
If double digest data were precise and complete, reliably identifying all fragments and estimating their sizes within narrow limits, the correct map of enzyme cutting sites could readily be found for most clones and pairs of enzymes. Unfortunately, double digest data and the map-building process present many different types of uncertainty, and allowing realistic levels for all of them can make it difficult or infeasible to calculate all possible maps. Even when the results are calculated, the set of possible maps may prove too numerous and diverse to be useful. Yet without computer assistance, it is often tempting to accept the first map that is found to fit reasonably with the data, even though this may not be the true map.
Over the years, researchers have developed several different algorithms, with many heuristic refinements, to allow the construction of such maps both by hand and by computer (surveyed in Zehetner and Lehrach, 1986; Zehetner et al., 1987; Lander, 1989). In creating the computer program "Double Digester," we have sought to build on and extend this work to provide a practical tool for bench scientists, following three central criteria.
1. Uncertainty in the raw data and map-building process must be a central focus, with intuitive representations and realistic defaults. This includes uncertainty about the exact size of each fragment, small fragments which may have gone undetected, discrepancies in fragment counts and molecular weights which can result from two fragments appearing as a single band ("doublets") or from partial digestion, unresolved ambiguities in possible map interpretations, and identifying cases where the data are too ambiguous to resolve a map.
2. User understanding and judgement are integral to dealing with uncertain and ambiguous experimental data. No sophisticated algorithms can substitute for an experimenter's understanding of the precise character of the data and the extent to which simplifying assumptions are reasonable. An interactive, graphical interface is crucial to enable users to easily enter data and guide the analysis. Fragment sizes, undetected fragments and other parameters need to be readily available for users to inspect and modify. User judgement is also essential in deciding when to test hypotheses concerning, for example, bands which might be doublets or bands in different lanes which are likely to be the same piece of DNA.
3. Map interpretations must be correct and complete. Users may choose to make simplifying assumptions, ruling out possibilities they consider unlikely, but such assumptions and related algorithmic oversights (discussed in Ho, 1990 and Ho et al., 1990) should not be built into the underlying map-building process.
Our approach takes advantage of the combined strengths of the computer and the human researcher. The computer manages the complexities of processing uncertain and potentially ambiguous data, but the experimenter is left in control and can flexibly test various hypotheses. We believe that this type of integrated approach to uncertainty provides a powerful paradigm for allowing the computer to help in the interpretation of experimental gene mapping data.
System and methods
Double Digester is programmed in Common Lisp, using the Common Lisp Interface Manager (CLIM) version 1.1 to create a user interface portable across Macintosh, X Window System, Microsoft Windows and other platforms. Development is done on a Sun Sparcstation 2 running Lucid Common Lisp 4.0.2 under SunOS 4.1.2 with OpenWindows 3.0, and on a MacIIfx running Macintosh Common Lisp 2.0 under System 7.1. The distributed application is tested under SunOS 4.1.3 and on various Macintoshes running earlier versions of Systems 6 and 7.
Double Digester runs on color, grayscale and monochrome displays, and automatically rescales all fonts and graphics to suit virtually any size screen. Disk and memory requirements are more demanding: the Macintosh version needs 2.8 megabytes of disk space and at least 5 megabytes of memory to run well, while the Sun version uses 12 megabytes on disk and runs best with at least 24 megabytes of installed memory and 40 megabytes of swap space.
Implementation
The best way to introduce this integrated approach to experimental uncertainty is by taking the reader through an example of how Double Digester performs its analyses, seen from the experimenter's perspective. The underlying methodology is discussed in the next section.
Data entry and critiquing
Figure 1 shows data from a lambda phage clone, omitting the large bands for each arm. The initial fragment size estimates are displayed in Double Digester's "Digest Data" window as bands on a logarithmically scaled gel, with the gel standard in the leftmost lane. The appearance of the window reflects that of the gel over its log-linear range (Sambrook et al., 1989). This allows users to work directly from a photo of a gel, inserting bands with a mouse against a displayed standard, and also allows visual inspection for correctness. Alternatively, estimated sizes can be typed interactively or read from text files. The likely range of fragment size error is indicated by vertical bars at the ends of each fragment band, using split sub-lanes where ranges overlap. All size estimates and defaults can be changed by clicking on the "Error" button (to change the ranges globally), on the lane headers (to change ranges for a single lane) or on the individual fragments (to change the size or range of one fragment).
Beneath each lane appear totals for the estimated actual, minimum and maximum sizes of the fragments in that lane. Below these are shown the program's current settings for the default error level in estimating fragment sizes, whether to use a tighter (same gel) basis for comparing fragment sizes, the size below which small fragments might have gone undetected, and a limit on how liberally to allow for such fragments when constructing maps.
After fragment sizes are entered, the data are reviewed for possible problems in the number and sizes of fragments in each lane. In this example, Double Digester offers a two-part critique. First, it warns that the minimum sizes for fragments in the EcoRI lane total 30 bp more than the maximum sizes in the double digest lane, and suggests checking for doublets or partial-digestion products. Second, it notes that linear DNA would normally have one more fragment in the double digest lane, or one fewer in single digest lanes, and that, while this could be explained by either coincidences in cutting sites or unseen small fragments, it might be worth reviewing the reported data. Undetected small fragments in the double digest lane could explain both discrepancies, and no evidence was seen for doublets or partially digested DNA.
Defining constraints
The experimenter often has additional information about some fragments. For example, hybridization results might indicate that a particular double digest fragment is contained in a particular single digest fragment. The experimenter may also be willing to assert certain hypotheses, such as that two fragments are very likely the same or that one fragment is definitely bigger or smaller than another, whatever their exact sizes. Such constraints can be easily added or dropped, and help make Double Digester's analysis more focused and efficient.
In this example, no other data were available, but the HindIII 1.4 kb fragment appeared identical to that in the double digest lane, as did the 1.45 kb and 1.85 kb fragments in the EcoRI and double digest lanes. These three likely identities were entered as constraints. Other possible constraints include "not identical," "bigger than," "smaller than," "part of," "not part of," "contains" and "does not contain."
Constructing maps
"Calculate Maps" can be selected at any time to calculate all maps consistent with the current data and constraints. Maps are ranked and presented according to the average "goodness of fit" between single and double digest fragments: 0% means that all aligned fragments match perfectly in size, while 100% means that the maximum size allowed on one side of each match equals the minimum size allowed on the other.
Double Digester can display each summary map in three formats, as shown in Figure 1. The linear text format shows one column for each double digest fragment, in map order, with each fragment indicated by its length. Overlapping single digest fragments are shown above and below each double digest fragment, and lines between the columns indicate enzyme cutting sites. In addition, both linear and circular drawings are available (either can be suppressed).
If small unseen fragments have been hypothesized, they are indicated on the linear drawing. Three are shown in Figure 1, and the mouse cursor is pointing to one of them, so details about that map region are displayed next to the map and at the top of the screen.
Local uncertainties may remain, where the order of adjacent fragments cannot be determined, or where whole sections of the map are interchangeable due to map sites cut by both enzymes. Double Digester condenses such ambiguities into a single "summary map," but notes them on the screen as shown in Figure 1:
1. A text line at the top summarizes the order ambiguities. Two coincidences, where cutting sites for both enzymes are too close to distinguish, delineate groups of fragments which could equally well have their order reversed internally or as groups. The second ambiguity is a group of two EcoRI fragments whose order within the HindIII-7.75 fragment is reversible.
2. Also on the text-format map, the interchangeable EcoRI fragments are grouped by a horizontal bar joining the EcoRI cutting sites.
3. On the linear graphical map, the mouse cursor is pointing to the region containing the interchangeable EcoRI fragments, so the detailed information displayed above the linear map includes a line describing the local order ambiguity.
Deciding on additional experiments
For this phage clone data, only one possible summary map was found, containing both local order ambiguities and three hypothesized small fragments. These results reinforced the earlier observations of discrepancies between the lanes in fragment counts and total genetic material. A second gel was run, and revealed two new HindIII fragments and three new double digest fragments. Figure 2 shows the new data, with two new identity constraints for two pairs of newly found fragments which appeared identical.
Five maps are now possible. The alignments of the larger fragments shown in the earlier summary map are all preserved in all five new maps, and their order is now unambiguous. The remaining minor ambiguities were judged to be unimportant; had they mattered, the five maps would have been compared to decide on a strategy for resolving them.
Algorithm
A substantial literature exists discussing the computationally difficult nature of what is widely called "the double digest problem," and describing a variety of possible algorithms for providing solutions. Viewed theoretically, the problem is NP-complete, meaning that its computational complexity could overwhelm available computing resources given a sufficiently large number of fragments to analyze (Lander, 1989). There are no rigorous analyses available of the computational demands posed by real experimental data. Nevertheless, simplifying assumptions have often been used to allow practical computation.
Double Digester attempts to resolve this problem by combining fast, incremental and robust methods for building restriction maps with an interactive approach to dealing with the major types of uncertainty. Test runs with real data, ranging up to dozens of fragments with realistic estimates of error, consistently provide results within a few seconds, or at most minutes, either giving a ranked list of all possible maps, or indicating that such maps are almost certainly too numerous and diverse to provide any useful information.
Three basic approaches have been used to tackle the double digest problem. The first is to test the possible permutations of the single digest fragments, comparing the double digest fragments these permutations would produce against those actually observed. This was first proposed and implemented by Pearson (1982), and has been used occasionally since including some simulated annealing variants (Goldstein and Waterman, 1987; Grigorjev and Mironov, 1990). This approach does not require preset bounds on possible fragment sizes, but its ability to find correct maps falls off quickly with errors in fragment size, and it also suffers from mushrooming combinatorics as the number of fragments grows (Lander, 1989).
The second approach attempts to build out from one point on a map, using a generate-and-test strategy to extend and test partial maps using fragments from all three lanes simultaneously. This approach was first used by Stefik (1978), with several later variations (Durand and Bregegere, 1984; Polner et al., 1984; Allison and Yee, 1988; Dix and Kieronska, 1988). It requires fixed size limits to be able to rule out partial maps. If realistic error levels are propagated and all possible alternatives are pursued, this approach should give results similar to the approach described next, with no clear advantages.
The third approach also uses upper and lower bounds on the size of each fragment, but first calculates all ways in which each single digest fragment can be built from the double digest fragments, and then uses these possible match sets to build maps. This approach was first outlined by Fitch et al. (1983), and has been used by several researchers since with various modifications (Nolan et al., 1984; Bellon, 1988; Krawczak, 1988; Tuffery et al., 1988). It is much more robust to measurement errors than the first approach (Lander, 1989), allows efficient computation and provides an intuitive framework for analyzing the data.
Double Digester follows this third approach, merging into each stage routines to deal with data uncertainties and map ambiguities in a realistic way.
1. Minimum and maximum size limits for each observed fragment are set by default to reflect the characteristics of standard agarose gels, but are easily changed.
2. The fragment data for all lanes are compared, looking for indications of problems such as doublets and partial digestion, without imposing any assumptions.
3. Calculations of how each single digest fragment can be built from the double digest fragments include the possibility of undetected small fragments, and also reflect the closer comparability of size estimates for nearby fragments run on the same gel. All possibilities for each fragment are calculated, and those later eliminated are stored with an explanation for possible review. Cases where there are likely to be far too many possible maps to be useful are trapped at the earliest stage possible, and users are given the choice of continuing or stopping to constrain the possibilities.
4. Summary maps indicate the remaining ambiguities in the ordering of fragments, and are ranked according to overall "goodness of fit."
The details of how Double Digester handles each of these stages are set out below.
Estimating fragment size uncertainty
DNA fragments tend to migrate through agarose gels at rates inversely proportional to the logarithm of the number of base pairs (Sambrook et al., 1989), and Double Digester plots all size data on a simple log-linear scale. This relationship breaks down, however, for both very large and very small fragments: mobility is lower at the top of the gel, and more irregular at the bottom. In its default calculations of sizing error, Double Digester models this decreasing discrimination in the formula
log10 L = log10(3,000) - (0.03)d + (2.2 x 10- 9)d5 - (5.0 x 10- 10)d7
where L is size in base pairs and d is proportional to the distance from the 3 kb center of the log-linear range of the gel. This equation is essentially log-linear between about 1 kb and 8 kb. Double Digester uses the inverse of this equation to determine size values for deviations in d at any given point on the gel. Empirical estimates from experienced bench scientists were used to obtain deviations in d which roughly correspond to various standard-deviation likelihoods that the actual size will prove to be within a particular range. Full details are available on request.
The default level for fragment size error is set to approximate two standard deviations. While only a rough approximation, experience to date has shown it to be a useful one, much closer to reality than, for example, the common approach of using flat percentages for all fragment sizes. The default size limits for the 3.8 kb EcoRI fragment in our example are 3,572 and 4,044 bp (about 6% variation), while a 12 kb fragment would have limits of 10,294 bp (14% smaller) and 14,536 bp (21% larger).
Double Digester also allows setting error ranges in terms of percentages or base pairs, for use with other types of gels or in other special conditions.
Reviewing fragment data for problems
Double Digester examines every set of digest data for indications that doublets, partially digested fragments, undetected small fragments or other problems might exist. When possible problems are identified, Double Digester reports them with an explanation of likely causes.
For a double digest with N1 and N2 fragments in the single digest lanes and Nd fragments in the double digest lane, it should normally be true that Nd = N1 + N2 for circular molecules, and Nd = N1 + N2 - 1 for linear molecules. Two related factors tend to lower the value of Nd:
1. Each coincidence reduces Nd by one. A coincidence is effectively anywhere that sites for each enzyme are close enough for the resulting fragment to go undetected, something which is not uncommon in normal double digest experiments.
2. Undetected small fragments tend to reduce Nd, as any occurring in one of the single digest lanes should also occur, either cut or not by the other enzyme, in the double digest lane.
A larger-than-expected value for Nd, on the other hand, very likely indicates some error in interpreting observable bands. Any discrepancies in fragment counts are reported to the user.
Size totals for the double digest lane are also somewhat more likely to be depressed, due to the problem of unseen small fragments. This effect can easily be swamped, however, by problems estimating the size of larger fragments, and it is normal to find some divergence in lane totals even with very good data. Double Digester therefore reports only those cases where the minimum sizes estimated for one lane exceed the maximum sizes estimated for another.
Fragment matching
In calculating matches of single digest to double digest fragments, all possibilities are calculated where any overlap exists in allowed sizes. Thus, Double Digester's default error range allows the 7.75 kb HindIII fragment to be matched with any set of double digest fragments whose maximum sizes total at least 7,101 bp, and whose minimum sizes do not exceed 8,572 bp. This basic algorithm is modified to reflect two special considerations.
1. If the possibility of small unseen fragments is allowed, small double digest fragments may be hypothesized up to the specified size and number to reach the target match size. Thus, using initial defaults, the 1.45 kb double digest fragment is considered a possible match for the 1.85 kb EcoRI fragment by hypothesizing an unseen 0.4 kb companion.
2. If the fragments were run on the same gel, a narrower range of possible sizes is allowed for matching between similarly sized fragments because some of the sources of error would tend to distort both in the same direction. At most, this adjustment equals half the sizing error allowed for each double digest fragment, and is indicated by two-tiered range bars when "Used same gel" is set to "Yes," as in Figure 1.
The possible matches for each single digest fragment are then tested in several steps, weeding out all but those which can actually be used to build one or more complete maps. If an initial possibility is ruled out, it is not simply discarded. The match and reason for rejection are stored, and can be examined by interested users. There are four steps in analyzing the set of possible matches for each single digest fragment.
1. User constraints are applied first.
2. Several tests for compatibility are applied, such as that if only the 1.4 kb double digest fragment matches the 1.4 kb HindIII fragment, it cannot also match another HindIII fragment.
3. For each enzyme separately, all consistent sets of match possibilities are calculated, such that all double digest fragments appear once and only once in each set.
4. Finally, all possible complete maps are constructed (where a consistent solution is found involving both enzymes), using the single enzyme sets from step 3.
For steps 2-4 above, the number of possible combinations is compared with predefined thresholds. If a threshold is exceeded, the user is warned that the problem may strain available computing resources, and that there are anyway likely to be so many maps that the results would not be useful. Exhaustive computations on test data have consistently confirmed the latter part of this warning. The user can choose to let calculation continue, or stop to add constraints.
Map ranking and presentation
When possible matches between single and double digest fragments are initially calculated, the difference in estimated sizes as a percentage of the maximum allowed is stored, providing a measure of the "goodness of fit" between the fragments. For each map, these percentages are averaged to provide a score for the map as a whole, and maps are presented to the user in order, from best fit to worst. The user can also print all maps, prefaced by a summary of the input data, the error bounds assumed in the analysis, any constraints which were defined, and how the double digest fragments might match each single digest fragment.
Each map provides a unique way of matching up all detected fragments, but can contain two types of local order uncertainty:
1. Where two or more fragments of one enzyme are entirely contained by a single fragment of the other enzyme, the order of those fragments is undetermined.
2. Where two or more "coincidences" occur, the order of map segments thus delimited is undetermined. With multiple cut sites per enzyme, and reasonable settings for size error and for small unseen fragments, this problem is more common than has often been assumed.
Discussion
Double Digester has two purposes: to provide a practical tool to help build restriction maps, and to explore how computer software can deal flexibly and interactively with the many uncertainties inherent in genetic physical mapping data. Double digest restriction data can be extremely difficult to interpret, yet most researchers still do so without the aid of any of the algorithms or programs previously developed. This situation suggested a fundamental mismatch between the nature of the problem and the approaches used to address it. Computational complexity, and the related problems that programs often ran very slowly or gave incorrect results, did not seem adequate explanation, particularly when interpretation without a computer was clearly both extremely slow and frequently inaccurate.
Full and realistic representation of experimental uncertainty, and user control over the values used and hypotheses tested, appeared to be the missing ingredients. Where data clearly determine an unambiguous map, a solution can be quickly provided. Where this is not so, there are no magic formulas which can replace an experimenter's judgement in deciding which values are more or less reliable, and which hypotheses are worth testing. The computer should quickly and accurately calculate the implications of any given set of assumptions, but leave the user in control of which paths to pursue. A wide variety of methods proved necessary to achieve this:
1. Uncertainty in estimating fragment sizes is presented graphically, with realistic defaults which can be changed easily using several different metrics.
2. The possibility of undetected small fragments is made an explicit part of the problem.
3. Discrepancies in fragment counts and total material per lane are reported, with possible explanations; the user can then decide whether to make changes or proceed.
4. Information and hypotheses about relationships between fragments can also be easily entered, tested and possibly dropped.
5. Users are quickly warned when computation is likely to be excessive, and the eventual map interpretations too numerous and diverse to prove useful. They can decide at each stage whether to proceed, or to modify their data and hypotheses first.
6. Details of all match possibilities at each stage of the analysis are available for inspection, with an indication of either which maps they are found in or why they have been rejected.
7. Summary maps condense and point out local ambiguities, where the positions of neighboring fragments are undetermined or whole sections could be interchanged. This can greatly reduce the number of maps produced, while highlighting issues which remain unresolved.
8. All maps consistent with current settings are presented, ranked according to the average goodness of fit of the constituent fragment alignments.
9. Data, assumptions and maps are all displayed graphically, in ways intended to maximize user understanding and control. Additional details are normally available by placing the mouse cursor over displayed objects, and most data and assumptions can be changed simply by clicking on them and selecting new values.
There are three main areas where Double Digester could be improved and extended as a practical tool for everyday lab use.
1. Data models and handling could be extended and improved, including more types of gels, user-modifiable error formulas and integration with laboratory databases.
2. Map constraints and analysis could reflect more naturally and fully the often rich set of other data which is available. Some desirable features would be direct entry of vector data and the results of hybridization and other experiments, merging data and results from other digests, and merging double digest restriction maps with other partial or completed maps.
3. Reimplementation in a different programming language and graphics system could reduce Double Digester's memory and disk space demands while increasing the responsiveness of the interface. Common Lisp is excellent for exploring issues of data representation and user interface, but problematic for creating agile end-user applications.
Double Digester is nevertheless ready to be put to the test of wider use within the research community. We are making it freely available to interested researchers, to test its performance in diverse laboratory settings and to get feedback on how we might refine and extend the approach. All versions, documentation and downloading instructions are available via anonymous ftp from ftp.cs.yale.edu (128.36.0.36), in the "/pub/double_digester" directory.
In summary, no sophisticated mathematical technique or other "magic pill" can make the problem of uncertainty in double digest analysis disappear. Instead, the problem of experimental uncertainty should be approached at several levels, using a variety of techniques in an integrated fashion. The computer handles uncertainty explicitly in several ways in its internal analyses. The user explores the implications of the uncertainty in a robust interactive fashion using the graphical interface. In this way, the researcher and the computer work together to find a solution to the problem.
Acknowledgements
This work is supported in part by NIH Grants R01 HG00175 and R01 HG00365 from the National Center for Human Genome Research, by NIH Grant T15 LM07056 from the National Library of Medicine, and by Grant MH39239 from the National Institute of Mental Health.
References
Allison,L. and Yee,C.N. (1988) Restriction mapping is in separation theory. Comput. Applic. Biosci., 4, 97-101.
Bellon,B. (1988) Construction of restriction maps. Comput. Applic. Biosci., 4, 111-115.
Dix,T.I. and Kieronska,D.H. (1988) Errors between sites in restriction site mapping. Comput. Applic. Biosci., 4, 117-124.
Durand,R. and Bregegere,F. (1984) An efficient program to construct restriction maps from experimental data with realistic error levels. Nucleic Acids Res., 12, 703-716.
Fitch,W.M., Smith,T.F. and Ralph,W.W. (1983) Mapping the order of DNA restriction fragments. Gene, 22, 19-29.
Goldstein,L. and Waterman,M.S. (1987) Mapping DNA by stochastic relaxation. Adv. Appl. Math., 8, 194-207.
Grigorjev,A.V. and Mironov,A.A. (1990) Mapping DNA by stochastic relaxation: a new approach to fragment sizes. Comput. Applic. Biosci., 6, 107-111.
Ho,S.T.S. (1990) Restriction site mapping. M.S. thesis, Monash University.
Ho,S.T.S., Allison,L.. and Yee,C.N. (1990) Restriction site mapping for three or more enzymes. Comput. Applic. Biosci., 6, 195-204.
Krawczak,M. (1988) Algorithms for the restriction-site mapping of DNA molecules. Proc. Natl. Acad. Sci. USA, 85, 7298-7301.
Lander,E.S. (1989) Analysis with restriction enzymes. In Waterman, M. (ed.),Mathematical Methods for DNA Sequences. CRC Press, Boca Raton, pp.35-51.
Nolan,G.P., Maina,C.V. and Szalay,A.A. (1984) Plasmid mapping computer program. Nucleic Acids Res., 12, 717-729.
Pearson,W.R. (1982) Automatic construction of restriction site maps. Nucleic Acids Res., 10, 217-227.
Polner,G., Dorgai,L. and Orosz,L. (1984) PMAP, PMAPS: DNA physical map constructing programs. Nucleic Acids Res., 12, 227-236.
Sambrook,J., Fritsch,E.F. and Maniatis,T. (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
Stefik,M. (1978) Inferring DNA structures from segmentation data. Artif. Intell., 11, 85-114.
Tuffery,P., Dessen,P., Mugnier,C. and Hazout,S. (1988) Restriction map construction using complete sentences compatibility algorithm. Comput. Applic. Biosci., 4, 103-110.
Zehetner,G., Frischauf,A. and Lehrach,H. (1987) Approaches to restriction map determination. In Bishop,M.J. and Rawlings,C.J. (eds.), Nucleic Acid and Protein Sequence Analysis: A Practical Approach. IRL Press, Oxford, pp.147-164.
Zehetner,G. and Lehrach,H. (1986) Approaches to restriction map determination. Nucleic Acids Res., 14, 335-349.
Figure legends
Fig. 1. Double Digester interface screen after analyzing the example phage clone data. The data are displayed graphically in the "Digest Data" window. The "Constraints" window shows three identity constraints that have been entered, postulating that pairs of fragments seen in different lanes are the same. The "Maps" window shows a single summary map, containing some local order ambiguity and size mismatches allowed by hypothesizing unseen small fragments; all three available map formats are shown.
The buttons across the top of each window usually produce dialog boxes or menus of choices. Many other displayed objects, such as gel lane headers and fragments, also act as buttons which each produce a menu of options related to that object. Buttons are shown in italics when they are inactive; thus, the "Prev" and "Next" map selection buttons in the top left part of the "Maps" window are both italicized, because only one summary map was found. The top window shows additional information about the part of the map pointed to by the mouse.
Fig. 2. The analysis of the same phage clone as in Fig. 1, after adding data on five small fragments detected on a second gel. A unique order has been determined for all of the originally detected fragments, but five variants are possible in placing the new small fragments. The gel standard lane and circular map drawing have been dropped.