MAXDIP is an application to estimate a reciprocal recombination (cross-over) rate parameter (rho) and a gene conversion rate parameter (f) from population variation data. We apply an operational definition of gene conversion that is consistent with a number of genetic mechanisms whereby daughter chromosomes contain small tracts from a different parental chromosome.
MAXDIP uses unphased diploid polymorphism data, tolerates missing data, and can utilize information on the ancestral state of alleles, if that information is available. Sites with three or more alleles will be ignored. At present, samples of up to 50 diploids can be used. MAXDIP uses a maximum composite likelihood method described in Hudson (2001) and extended to include gene conversion in Frisse et al. (2001). An infinite-sites constant-population-size neutral model is assumed. The gene conversion model is that of Wiuf and Hein (2000), and assumes that the tract length is geometrically distributed.
The program can be run in one of two modes. In the first mode, the user provides a range of rho values and, optionally, a mean gene conversion tract length and a single value of the gene conversion rate parameter f. The program then computes the estimated composite likelihood of the data at a discrete set of rho values in that range. In the second mode, the user provides an initial rho value and, optionally, a mean gene conversion tract length and a range of f values. The program then finds the value of rho that maximizes the composite likelihood of the data for each value of f, as well as the (rho, f) pair that gives the maximum composite likelihood over the specified set of f values. Ranges for both rho and f are specified by entering the minimum and maximum values of the parameter and a step size. Multiple data sets, with different sample sizes and numbers of sites, can be analyzed together by simply concatenating the files.
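The logic of the second mode can be sketched in a few lines of Python. This is an illustration only, not MAXDIP's code: `profile_over_f`, `toy_log_cl`, and the candidate lists are hypothetical, the toy surface is not a real composite likelihood, and a plain scan over candidate rho values stands in for whatever optimizer MAXDIP actually uses.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[float, float, float]  # (f, best rho, composite log-likelihood)


def profile_over_f(
    log_cl: Callable[[float, float], float],
    f_values: Iterable[float],
    rho_candidates: Iterable[float],
) -> Tuple[List[Triple], Triple]:
    """For each f, pick the candidate rho with the highest log_cl(rho, f).

    Returns the per-f results and the overall best (f, rho, value) triple.
    """
    rho_list = list(rho_candidates)
    per_f: List[Triple] = []
    for f in f_values:
        best_rho = max(rho_list, key=lambda rho: log_cl(rho, f))
        per_f.append((f, best_rho, log_cl(best_rho, f)))
    overall = max(per_f, key=lambda triple: triple[2])
    return per_f, overall


def toy_log_cl(rho: float, f: float) -> float:
    """Hypothetical stand-in surface peaking at rho = 0.002, f = 2."""
    return -((rho - 0.002) * 1000.0) ** 2 - (f - 2.0) ** 2


per_f, best = profile_over_f(toy_log_cl, [0.0, 2.0, 4.0], [0.001, 0.002, 0.003])
```

Here `per_f` plays the role of the per-f maximizing rho values, and `best` the role of the reported (rho, f) pair.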
It should be emphasized that the method is a composite likelihood method, not a maximum likelihood method. For example, approximate confidence intervals obtained by treating these composite likelihoods as if they were true likelihoods will almost certainly grossly underestimate the size of such intervals. Also, it should be noted that when recombination rates are heterogeneous, either within or between loci, estimates of f using MAXDIP may be biased.
Source code for MAXDIP (a version utilizing command line arguments) is available here.
|Input File Formats|
|MAXDIP accepts nucleotide polymorphism data in the following formats:|
rho equals 4Nr, where N is the effective population size and r is the reciprocal recombination rate between adjacent base pairs.
|Initial value of rho:|
MAXDIP limits its search for the rho value that maximizes the composite likelihood to within two orders of magnitude of the initial rho value. If MAXDIP returns a rho value that is near 100 times (or near 1/100 of) the initial value, the search has probably stopped at the edge of this range, and the program should be re-run with a larger (or, respectively, smaller) initial rho value.
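A quick check for this situation might look like the sketch below. It assumes the search window is symmetric (initial rho divided by 100 up to initial rho times 100) and that "near" means within 5% of a limit. Both the helper names and the tolerance are assumptions, not part of MAXDIP.

```python
from typing import Tuple


def search_bounds(initial_rho: float, factor: float = 100.0) -> Tuple[float, float]:
    """Return the (lower, upper) rho limits implied by the initial value."""
    if initial_rho <= 0:
        raise ValueError("initial_rho must be positive")
    return initial_rho / factor, initial_rho * factor


def near_search_limit(estimate: float, initial_rho: float, rel_tol: float = 0.05) -> bool:
    """True if the estimate lies within rel_tol of either search limit.

    rel_tol is an arbitrary choice made for this illustration.
    """
    lower, upper = search_bounds(initial_rho)
    return estimate <= lower * (1.0 + rel_tol) or estimate >= upper * (1.0 - rel_tol)
```

If `near_search_limit` returns `True`, re-running with an initial rho closer to the returned estimate is the natural next step.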
|Minimum rho value:|
This is the lower end of rho values for which MAXDIP will compute the composite likelihood.
|Maximum rho value:|
This is the upper end of rho values for which MAXDIP will compute the composite likelihood.
|Step size of rho:|
This is the increment that MAXDIP will apply to the rho values when computing the likelihoods between the minimum and maximum rho values.
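As a rough illustration of how a minimum, maximum, and step size translate into the set of values evaluated, consider the sketch below. The function name is hypothetical, and whether MAXDIP includes the maximum itself when the step divides the range evenly is an assumption made here. The same enumeration would apply to a range of f values.

```python
from typing import List


def parameter_grid(minimum: float, maximum: float, step: float) -> List[float]:
    """Enumerate minimum, minimum + step, ... without exceeding maximum.

    Values are computed from an integer index to avoid accumulating
    floating-point error; the small epsilon guards against a range that
    is an exact multiple of step being rounded down.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if maximum < minimum:
        raise ValueError("maximum must not be smaller than minimum")
    n_steps = int((maximum - minimum) / step + 1e-9)
    return [minimum + i * step for i in range(n_steps + 1)]


# Illustrative range only: rho from 0 to 0.01 in steps of 0.0025.
rho_values = parameter_grid(0.0, 0.01, 0.0025)
```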
|Gene Conversion Parameters|
f is the gene conversion parameter. It is defined as the ratio g/r, where g is the probability per generation that a gamete has a gene conversion tract that starts at a specified site (g is the same as in Wiuf and Hein, 2000). Thus, f is the ratio of the gene conversion rate to the reciprocal recombination rate per base pair.
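The relationship between the parameters can be made concrete with a small worked example. The numerical values of N, r, and g below are invented for illustration, and the helper names are hypothetical. The only facts used are rho = 4Nr and f = g/r, from which the population-scaled conversion rate 4Ng equals f times rho.

```python
def rho_per_bp(effective_size: float, r: float) -> float:
    """rho = 4 * N * r for adjacent base pairs."""
    return 4.0 * effective_size * r


def conversion_ratio(g: float, r: float) -> float:
    """f = g / r: gene conversion initiation rate relative to crossing over."""
    return g / r


def scaled_conversion_rate(rho: float, f: float) -> float:
    """4 * N * g, which equals f * rho because f = g / r and rho = 4 * N * r."""
    return f * rho


# Made-up inputs: N = 10,000, r = 1e-8 and g = 2e-8 per base pair per generation.
N, r, g = 10_000, 1e-8, 2e-8
rho = rho_per_bp(N, r)
f = conversion_ratio(g, r)
four_n_g = scaled_conversion_rate(rho, f)
```

With these inputs, gene conversion events start at a site twice as often as crossovers occur between adjacent sites, so f is 2.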
|Minimum value of f:|
This specifies the lower end of the range of f values.
|Maximum value of f:|
This specifies the upper end of the range of f values.
|Step size of f:|
This specifies the size of the intervals between the f values in the range.
|Mean gene conversion tract length:|
This is the mean of a geometric distribution of tract length (equal to 1/q in the notation of Wiuf and Hein, 2000).
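To make the tract-length assumption concrete, the sketch below draws tract lengths from a geometric distribution on 1, 2, 3, ... with success probability q, so that the mean is 1/q. The function name and the mean of 500 bp are illustrative assumptions. MAXDIP itself does not simulate tracts; the sketch only shows what the distribution looks like.

```python
import math
import random


def sample_tract_length(mean_length: float, rng: random.Random) -> int:
    """Draw one tract length (in bp) from a geometric distribution with mean 1/q."""
    if mean_length < 1.0:
        raise ValueError("mean_length must be at least 1")
    q = 1.0 / mean_length
    if q >= 1.0:
        return 1  # every tract is exactly one base pair long
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    # Inverse-transform sampling for P(L = k) = q * (1 - q) ** (k - 1).
    return int(math.log(u) / math.log(1.0 - q)) + 1


rng = random.Random(0)  # fixed seed so the illustration is reproducible
lengths = [sample_tract_length(500.0, rng) for _ in range(20000)]
sample_mean = sum(lengths) / len(lengths)
```

The sample mean should land close to the requested mean, while individual tracts vary widely around it.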
|Settings for Reading Input File|
|Ignore ancestral data (if present):|
If 'Yes' is selected, the program will assume all ancestral states are unknown. If the file does not contain the ancestor line (or if all the ancestral alleles are labelled missing) this button will have no effect.
|Parse individual IDs for populations:|
If 'Yes' is selected, the input file will be parsed into separate populations based on the leading alphabetic characters of the individuals' IDs. MAXDIP will then return rho estimates for each population in the data set. For example, if a file contains individual IDs CA23, AA100, and CAU54, MAXDIP will return separate rho estimates for populations CA, AA, and CAU.
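The grouping rule in the example above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than documented behavior: IDs with no leading letters are given an empty label, and labels are compared exactly as written.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

_LEADING_LETTERS = re.compile(r"[A-Za-z]+")


def population_label(individual_id: str) -> str:
    """Return the leading alphabetic characters of an ID ('' if there are none)."""
    match = _LEADING_LETTERS.match(individual_id)
    return match.group(0) if match else ""


def group_by_population(ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group individual IDs by their population label, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for individual_id in ids:
        groups[population_label(individual_id)].append(individual_id)
    return dict(groups)


groups = group_by_population(["CA23", "AA100", "CAU54", "CA7"])
```

Note that `CA` and `CAU` are treated as distinct populations because the whole alphabetic prefix is used, not just a common stem.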
|Estimate rho for total sample:|
If MAXDIP identifies more than one sample in the data (see Parse individual IDs for populations) and 'Yes' is selected, MAXDIP will provide a rho estimate for a sample that includes all individuals in the file, in addition to providing rho estimates for all subsamples.
|Parse alleles in case-sensitive mode:|
Unless 'Yes' is selected, MAXDIP will assume upper and lowercase letters represent the same allele.
|Missing data symbols:|
Those symbols checked will be read as representing missing data (i.e., the allele is unspecified). The user can also enter an arbitrary string of characters to represent missing data.
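The combined effect of the case-sensitivity and missing-data settings can be pictured with a small normalization step. This is a sketch under assumptions: the function name is hypothetical, the symbols `?` and `N` are only example choices for missing data, and testing the symbol against the missing-data set before case folding is a guess about ordering, not documented behavior.

```python
from typing import Iterable, Optional


def normalize_allele(
    symbol: str,
    missing_symbols: Iterable[str],
    case_sensitive: bool = False,
) -> Optional[str]:
    """Return None for missing data, otherwise the allele symbol.

    Unless case_sensitive is True, upper and lowercase letters are
    folded together so that 'a' and 'A' count as the same allele.
    """
    if symbol in set(missing_symbols):
        return None
    return symbol if case_sensitive else symbol.upper()


calls = [normalize_allele(s, {"?", "N"}) for s in ["a", "A", "?", "t", "N"]]
```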
|Allele frequency threshold:|
This is the threshold frequency that the minor allele (or the derived allele, if ancestral data are present) must meet for a polymorphic site to be included in the MAXDIP analysis.
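A site filter along these lines might look like the sketch below. It is not MAXDIP's implementation: the function name is hypothetical, frequencies are computed among non-missing calls only, and "met" is read as "greater than or equal to". All three are assumptions. The rule that sites with three or more alleles are ignored comes from the description of the method above.

```python
from collections import Counter
from typing import Optional, Sequence


def site_passes(
    alleles: Sequence[Optional[str]],
    threshold: float,
    ancestral: Optional[str] = None,
) -> bool:
    """Decide whether a site is kept, given allele calls (None = missing)."""
    counts = Counter(a for a in alleles if a is not None)
    if len(counts) != 2:
        return False  # monomorphic, or three or more alleles
    total = sum(counts.values())
    if ancestral in counts:
        # Ancestral state known: use the frequency of the derived allele.
        derived = next(a for a in counts if a != ancestral)
        frequency = counts[derived] / total
    else:
        # Ancestral state unknown: use the minor allele frequency.
        frequency = min(counts.values()) / total
    return frequency >= threshold
```

For example, a site with seven `A` calls and one `T` call has a minor allele frequency of 1/8, so it passes a threshold of 0.1 but not a threshold of 0.2.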
|MCLE of rho:|
Value of rho which maximizes the composite likelihood of the data.
Hudson, R. R. Two-locus sampling distributions and their application. Genetics 159:1805-17. (2001)
Wiuf, C., and Hein, J. The coalescent with gene conversion. Genetics 155:451-62. (2000)
|Send E-mail inquiries to: David Witonsky|
|Supported by the National Institute of General Medical Sciences (GM61393-S1)|