MAXDIP

Overview

MAXDIP is an application to estimate a reciprocal recombination (cross-over) rate parameter (rho) and a gene conversion rate parameter (f) from population variation data. We apply an operational definition of gene conversion that is consistent with a number of genetic mechanisms whereby daughter chromosomes contain small tracts from a different parental chromosome.

MAXDIP uses unphased diploid polymorphism data, tolerates missing data, and can utilize information on the ancestral state of alleles, if that information is available. Sites with three or more alleles will be ignored. At present, samples of up to 50 diploids can be used. MAXDIP uses a maximum composite likelihood method described in Hudson (2001) and extended to include gene conversion in Frisse et al. (2001). An infinite-sites constant-population-size neutral model is assumed. The gene conversion model is that of Wiuf and Hein (2000), and assumes that the tract length is geometrically distributed.

The program can be run in one of two modes. In the first mode, the user provides a range of rho values and, optionally, a mean gene conversion tract length and a single value of the gene conversion rate parameter f. The program then computes the estimated composite likelihood of the data for a discrete set of rho values in the range. In the second mode, the user provides an initial rho value and, optionally, a mean gene conversion tract length and a range of f values. The program then finds the value of rho that maximizes the composite likelihood of the data for each value of f, as well as the (rho, f) pair that yields the maximum composite likelihood over the specified set of f values. Ranges for both rho and f are specified by entering minimum and maximum values of the parameter and a step size. Multiple data sets, with different sample sizes and numbers of sites, can be analyzed simultaneously by concatenating the files.
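As a rough illustration of the second mode, the sketch below loops over a set of f values, picks the rho that maximizes a composite log-likelihood for each f, and then reports the best (rho, f) pair. The function toy_composite_loglik is a made-up stand-in and not MAXDIP's likelihood, and the simple grid search over rho is an assumption made for brevity; it is not a description of MAXDIP's own optimizer.

```python
import math

def toy_composite_loglik(rho, f):
    """Hypothetical smooth surface used only for illustration; NOT MAXDIP's likelihood."""
    target = 0.001 / (1.0 + f / 10.0)
    return -(math.log(rho) - math.log(target)) ** 2 - 0.1 * (f - 2.0) ** 2

def profile_over_f(loglik, rho_grid, f_values):
    """For each f, find the rho on the grid with the highest log-likelihood.

    Returns the per-f rows (f, best_rho, best_loglik) and the overall best row.
    """
    rows = []
    for f in f_values:
        best_rho = max(rho_grid, key=lambda rho: loglik(rho, f))
        rows.append((f, best_rho, loglik(best_rho, f)))
    overall = max(rows, key=lambda row: row[2])
    return rows, overall

rho_grid = [0.0001 * k for k in range(1, 51)]   # illustrative rho values
f_values = [0.0, 1.0, 2.0, 3.0, 4.0]            # illustrative f values
rows, overall = profile_over_f(toy_composite_loglik, rho_grid, f_values)

for f, rho, value in rows:
    print(f"f={f:.1f}  best rho={rho:.4f}  loglik={value:.4f}")
print("overall best (rho, f):", (overall[1], overall[0]))
```

Each row corresponds to one line of the per-f output described above, and the final line corresponds to the overall (rho, f) pair.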

It should be emphasized that the method is a composite likelihood method, not a true maximum likelihood method. In particular, approximate confidence intervals obtained by treating these composite likelihoods as if they were true likelihoods will almost certainly be far too narrow. Note also that when recombination rates are heterogeneous, either within or between loci, estimates of f obtained with MAXDIP may be biased.

Source code for MAXDIP (a version that uses command-line arguments) is available.

MAXDIP citations are listed at the end of this page.
 
    Instructions:
  • Follow steps 1-4 below. In Step 2, choose either Mode A or Mode B.
  • Please make sure to upload a file and not a directory.
  • Click on parameter links for more information.
  • The results will be e-mailed to the address provided below.
 
1. Upload a polymorphism data file. (File size is limited to 1 MB.)
Choose file:
File format:
File contains ancestral data: ancestral ID 
 
2. Select either mode A or B and enter the appropriate parameter values.
Mode A: Enter a range of rho values and, optionally, a value for f and a mean gene conversion tract length. MAXDIP will output the estimated composite likelihood for each value of rho in the range. 
Minimum value of rho:
Maximum value of rho:
Step size of rho:
-----------------------------Optional-----------------------------  
Value of f:
Mean gene conversion tract length:
---------------------------------------------------------------------  
Mode B: Enter an initial rho value and, optionally, a range of f values and a mean gene conversion tract length. MAXDIP will output the rho value that maximizes the composite likelihood for each value of f in the range. 
Initial value of rho:
-----------------------------Optional-----------------------------  
Minimum value of f:
Maximum value of f:
Step size of f:
Mean gene conversion tract length:
---------------------------------------------------------------------  
 
3. Select optional settings for reading input file.
Ignore ancestral data if present?  
Parse individual IDs for populations?
Estimate rho for total sample?
Parse alleles in case-sensitive mode?
Missing data symbols: ? N n - other: 
Allele frequency threshold: %
  
4. Enter and confirm an e-mail address to which results should be sent.
E-mail address:
Confirm e-mail address:
   
     
     
Input File Formats
 
MAXDIP accepts nucleotide polymorphism data in the following formats: Click the above links to get descriptions and examples of each format.
 
Parameter Definitions
 
rho:
rho equals 4Nr, where N is the effective population size and r is the reciprocal recombination rate between adjacent base pairs.
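A small worked example of this definition follows. The values N = 10,000 and r = 1e-8 are illustrative only, and the helper rho_between_sites assumes a uniform recombination rate so that per-base-pair rates add up along the sequence; both function names are hypothetical.

```python
def rho_per_bp(effective_size, r_per_bp):
    """Population-scaled rate between adjacent base pairs: rho = 4 * N * r."""
    return 4.0 * effective_size * r_per_bp

def rho_between_sites(rho_bp, position_a, position_b):
    """Scaled rate between two sites, assuming a uniform rate along the sequence."""
    return rho_bp * abs(position_b - position_a)

rho = rho_per_bp(10_000, 1e-8)                 # illustrative N and r
rho_1kb = rho_between_sites(rho, 100, 1100)    # two sites 1000 bp apart
print(rho, rho_1kb)
```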
 
Initial value of rho:
MAXDIP limits its search for the rho value that maximizes the composite likelihood to within two orders of magnitude of the initial rho value, that is, from 1/100 of the initial value to 100 times the initial value. If MAXDIP returns a rho value near either end of this range, the program should be re-run with a larger initial rho value (if the estimate is near the upper end) or a smaller one (if it is near the lower end).
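The check described here can be sketched as follows. The sketch assumes that "two orders of magnitude" applies in both directions from the initial value, and the 5% tolerance used to decide what counts as "near" a boundary is an arbitrary choice for illustration; neither function is part of MAXDIP.

```python
def search_window(initial_rho, factor=100.0):
    """Lower and upper limits of the search, two orders of magnitude each way."""
    return initial_rho / factor, initial_rho * factor

def near_boundary(estimate, initial_rho, factor=100.0, rel_tol=0.05):
    """Return 'upper' or 'lower' if the estimate sits close to a limit, else None.

    rel_tol is a hypothetical tolerance, not a MAXDIP setting.
    """
    low, high = search_window(initial_rho, factor)
    if estimate >= high * (1.0 - rel_tol):
        return "upper"   # suggests re-running with a larger initial rho
    if estimate <= low * (1.0 + rel_tol):
        return "lower"   # suggests re-running with a smaller initial rho
    return None

print(search_window(0.001))
print(near_boundary(0.099, 0.001))
```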
 
Minimum rho value:
This is the lower end of rho values for which MAXDIP will compute the composite likelihood.
 
Maximum rho value:
This is the upper end of rho values for which MAXDIP will compute the composite likelihood.
 
Step size of rho:
This is the increment that MAXDIP will apply to the rho values when computing the likelihoods between the minimum and maximum rho values.
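For illustration, a minimum, maximum, and step size could be expanded into a list of parameter values as in the sketch below. Whether MAXDIP includes the maximum when it is not an exact multiple of the step away from the minimum is not stated here; this sketch simply stops at the last value that does not exceed the maximum. The function name parameter_grid is hypothetical.

```python
import math

def parameter_grid(minimum, maximum, step):
    """Values minimum, minimum + step, ... that do not exceed maximum."""
    if step <= 0:
        raise ValueError("step must be positive")
    if maximum < minimum:
        raise ValueError("maximum must not be smaller than minimum")
    # The small epsilon guards against floating-point round-off in the division.
    n_steps = int(math.floor((maximum - minimum) / step + 1e-9))
    return [minimum + k * step for k in range(n_steps + 1)]

grid = parameter_grid(0.0, 0.01, 0.002)
print(grid)
```

The same helper applies to a range of f values, since both ranges are specified in the same way.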
 
Gene Conversion Parameters
 
f:
f is the gene conversion parameter. It is defined as the ratio g/r, where g is the probability per generation that a gamete has a gene conversion tract that starts at a specified site (g is the same as in Wiuf and Hein, 2000) and r is the reciprocal recombination rate between adjacent base pairs. Thus, f is the ratio of the gene conversion rate to the reciprocal recombination rate per base pair.
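A short numerical example of this ratio is given below. The rates are illustrative only. The last helper uses the identity 4Ng = f * rho, which follows directly from f = g/r and rho = 4Nr; the function names are hypothetical.

```python
def conversion_ratio(g, r):
    """f = g / r: gene conversion initiation rate relative to crossover rate."""
    if r <= 0:
        raise ValueError("r must be positive")
    return g / r

def scaled_conversion_rate(f, rho):
    """Population-scaled conversion initiation rate, 4Ng = f * rho."""
    return f * rho

f = conversion_ratio(2e-8, 1e-8)                  # illustrative g and r
scaled_g = scaled_conversion_rate(f, 4e-4)        # illustrative rho
print(f, scaled_g)
```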
 
Minimum value of f:
This specifies the lower end of the range of f values.
 
Maximum value of f:
This specifies the upper end of the range of f values.
 
Step size of f:
This specifies the size of the intervals between the f values in the range.
 
Mean gene conversion tract length:
This is the mean of a geometric distribution of tract length (equal to 1/q in the notation of Wiuf and Hein, 2000).
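To make the relationship between q and the mean concrete, the sketch below evaluates a geometric distribution with parameter q = 1/(mean tract length). It assumes tract lengths are counted from 1 base pair, which is the convention under which a geometric distribution with parameter q has mean 1/q; the mean of 500 bp is an illustrative value, and the function names are hypothetical.

```python
def tract_length_pmf(length, mean_length):
    """P(tract length == length) for a geometric distribution with mean mean_length."""
    if length < 1:
        return 0.0
    q = 1.0 / mean_length
    return q * (1.0 - q) ** (length - 1)

def tract_at_least(length, mean_length):
    """P(tract length >= length) = (1 - q) ** (length - 1)."""
    q = 1.0 / mean_length
    return (1.0 - q) ** (max(length, 1) - 1)

mean_length = 500.0   # illustrative mean tract length in base pairs
# Numerical check that the distribution has the requested mean.
numeric_mean = sum(k * tract_length_pmf(k, mean_length) for k in range(1, 20001))
print(numeric_mean, tract_at_least(1000, mean_length))
```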
 
Settings for Reading Input File
 
Ignore ancestral data (if present):
If 'Yes' is selected, the program will assume all ancestral states are unknown. If the file does not contain the ancestor line (or if all the ancestral alleles are labelled missing) this button will have no effect.
 
Parse individual IDs for populations:
If 'Yes' is selected, the input file will be parsed into separate populations based on the leading alphabetic characters of the individuals' IDs. MAXDIP will then return rho estimates for each population in the data set. For example, if a file contains individual IDs CA23, AA100, and CAU54, MAXDIP will return separate rho estimates for populations CA, AA, and CAU.
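The grouping rule can be sketched with a regular expression that captures the leading alphabetic characters of each ID. This is an illustration of the rule as described, not MAXDIP's own parser; in particular, placing IDs with no leading letters under an empty label is an assumption, and the function names are hypothetical.

```python
import re
from collections import defaultdict

def population_label(individual_id):
    """Leading alphabetic characters of an ID, e.g. 'CAU54' -> 'CAU'."""
    match = re.match(r"[A-Za-z]+", individual_id)
    return match.group(0) if match else ""   # assumption: no letters -> empty label

def group_by_population(individual_ids):
    """Map each population label to the list of IDs that carry it."""
    groups = defaultdict(list)
    for individual_id in individual_ids:
        groups[population_label(individual_id)].append(individual_id)
    return dict(groups)

groups = group_by_population(["CA23", "AA100", "CAU54", "CA7"])
print(groups)
```

Note that CA and CAU are treated as different populations, because the whole leading run of letters is used as the label.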
 
Estimate rho for total sample:
If MAXDIP identifies more than one sample in the data (see Parse individual IDs for populations) and 'Yes' is selected, MAXDIP will provide a rho estimate for a sample that includes all individuals in the file, in addition to providing rho estimates for all subsamples.
 
Parse alleles in case-sensitive mode:
Unless 'Yes' is selected, MAXDIP will assume upper and lowercase letters represent the same allele.
 
Missing data symbols:
Those symbols checked will be read as representing missing data (i.e., the allele is unspecified). The user can also enter an arbitrary string of characters to represent missing data.
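A minimal sketch of how allele symbols might be normalized under these two settings is shown below. The default set of missing symbols mirrors the checkboxes on the form; the function name and the order of the checks (missing-data test first, then case folding) are assumptions, not a description of MAXDIP's internals.

```python
DEFAULT_MISSING = frozenset({"?", "N", "n", "-"})

def normalize_allele(symbol, missing_symbols=DEFAULT_MISSING, case_sensitive=False):
    """Return None for missing data, otherwise the allele symbol.

    In case-insensitive mode, upper- and lowercase letters are folded together.
    """
    if symbol in missing_symbols:
        return None
    return symbol if case_sensitive else symbol.upper()

site = ["a", "A", "g", "?", "N"]
print([normalize_allele(s) for s in site])
print([normalize_allele(s, case_sensitive=True) for s in site])
```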
 
Allele frequency threshold:
This is the minimum frequency that the minor allele (or the derived allele, if ancestral data are present) must reach for a polymorphic site to be included in the MAXDIP analysis.
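The filter can be sketched as follows. The sketch assumes that frequencies are computed among non-missing alleles only, that the threshold is given in percent and is met when the frequency is greater than or equal to it, and that sites that are not biallelic are skipped (sites with three or more alleles are ignored by MAXDIP). The function name passes_threshold is hypothetical.

```python
from collections import Counter

def passes_threshold(alleles, threshold_percent, ancestral=None):
    """Decide whether a site meets the allele frequency threshold.

    alleles: allele symbols for one site, with None marking missing data.
    ancestral: ancestral allele if known; then the derived allele frequency is used.
    """
    counts = Counter(a for a in alleles if a is not None)
    if len(counts) != 2:
        return False                      # monomorphic, or three or more alleles
    total = sum(counts.values())
    if ancestral in counts:
        derived = next(a for a in counts if a != ancestral)
        frequency = counts[derived] / total
    else:
        frequency = min(counts.values()) / total   # minor allele frequency
    return 100.0 * frequency >= threshold_percent

site = ["A"] * 9 + ["G"] + [None, None]   # minor allele G at 10% of called alleles
print(passes_threshold(site, 5), passes_threshold(site, 20))
```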
 
Output Definitions
 
MCLE of rho:
Value of rho which maximizes the composite likelihood of the data.
 
 
MAXDIP Citations

Frisse L., Hudson R. R., Bartoszewicz A., Wall J. D., Donfack J., and Di Rienzo A. Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69:831-843. (2001)

Hudson R. R. Two-locus sampling distributions and their application. Genetics 159:1805-1817. (2001)

Wiuf C. and Hein J. The coalescent with gene conversion. Genetics 155:451-462. (2000)

 
 
Contact Information
Send E-mail inquiries to: David Witonsky
 
Supported by the National Institute of General Medical Sciences (GM61393-S1)