RECSLIDER

Overview

RECSLIDER is a program that calculates the population recombination (cross-over) rate parameter rho (4Nr) per bp from population variation data over a "sliding window". RECSLIDER calculates rho using the program MAXDIP, but it does so for overlapping subsets, or windows, of the data, so that variations in the recombination rate across a surveyed region can be measured. The windows of RECSLIDER are defined in terms of a fixed number of polymorphic sites, instead of a fixed number of base pairs, but the estimate of rho for each window is normalized by the size of the window, thus giving a rho per base pair result.
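The windowing scheme described above can be illustrated with a minimal Python sketch. It is not RECSLIDER's own code: the function names, the one-site step between windows, the use of the distance between the first and last site as the window size in bp, and the use of the median site position as the window position are all assumptions made for illustration. The rho estimate itself would come from MAXDIP and is only a placeholder number here.

```python
from statistics import median

def site_windows(positions, n_sites, step=1):
    """Yield (median_position, span_bp, window) for each run of n_sites
    consecutive polymorphic sites. The step of one site is an assumption."""
    if n_sites < 2:
        raise ValueError("a window needs at least two polymorphic sites")
    positions = sorted(positions)
    for start in range(0, len(positions) - n_sites + 1, step):
        window = positions[start:start + n_sites]
        # Assumed definition of window size in bp: last site minus first site.
        span_bp = window[-1] - window[0]
        yield median(window), span_bp, window

def rho_per_bp(rho_window, span_bp):
    """Normalize a whole-window rho estimate by the window size in bp."""
    if span_bp <= 0:
        raise ValueError("window span must be positive")
    return rho_window / span_bp

# Hypothetical positions of five polymorphic sites, windows of three sites.
for mid, span, window in site_windows([10, 40, 100, 150, 400], 3):
    print(mid, span, window)
```

Because each window holds a fixed number of sites, its length in bp varies with local site density, which is why the per-window estimate is divided by the span.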

RECSLIDER can be used in either one of two modes. In the first mode (Steps 1-2 below), the user specifies only a minimum window size and RECSLIDER performs the sliding window analysis just once. The resulting output gives rho per bp estimates as a function of median window position. The results are returned to the user in a tab-delimited text file, which can later be imported into a program like Excel to generate sliding window plots.

Alternatively, the user can specify both a minimum window size and a maximum window size (Step 2). RECSLIDER will then perform the sliding window analysis iteratively over all window sizes in this range. This second method is useful for identifying possible "hotspots" or "coldspots" of recombination, because RECSLIDER will also search the results for the largest window in which the recombination rate is either greater than or less than a user-specified rho cutoff (Step 3 below). This type of analysis can also be run with the option to test the statistical significance of the hotspot/coldspot results (Step 4 below). By performing coalescent simulations, based on user-specified parameters, RECSLIDER will estimate the probability that a window of the discovered size or greater has a rho value equal to at least (at most) the hotspot (coldspot) cutoff. It is strongly recommended that users first read Wall et al. (2003) before performing a hotspot (coldspot) search. Rho cutoffs should be specified before the analyses, or else the estimated significance levels may be anti-conservative.
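As a rough illustration of the significance estimate described above, the probability can be approximated by the fraction of simulated replicates whose largest qualifying window is at least as large as the one found in the real data. The sketch below assumes that each replicate has already been reduced to a single number (the size of its largest window meeting the cutoff, or 0 if none did); the function name and this summary format are hypothetical, and RECSLIDER's exact calculation may differ.

```python
def empirical_p_value(observed_size, simulated_max_sizes):
    """Fraction of replicates whose largest qualifying window is at least
    as large as the observed one (a plain proportion, no correction)."""
    if not simulated_max_sizes:
        raise ValueError("need at least one simulated replicate")
    hits = sum(1 for size in simulated_max_sizes if size >= observed_size)
    return hits / len(simulated_max_sizes)

# Hypothetical summary of ten replicates; 0 means no window met the cutoff.
replicates = [0, 5, 12, 20, 3, 0, 0, 8, 0, 15]
print(empirical_p_value(12, replicates))
```

With only a small number of replicates the estimate is coarse, since the smallest non-zero value it can take is one divided by the number of replicates.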

RECSLIDER uses unphased diploid polymorphism data (e.g., from autosomal loci) and tolerates missing data. Sites with three or more alleles will be ignored. At present, samples of up to 50 diploids with up to 2000 polymorphic sites can be used. Click here for data formatting instructions. The current implementation is based on a recombination model with cross-overs only, but a future version will incorporate gene conversion.

 
    Instructions:
  • Follow Steps 1-5 below.
  • Please make sure to choose a file and not a directory.
  • For a more complete description of input parameters, click on the parameter links.
  • Clicking on 'Data set file' will open a new window with a sample input data file.
  • Results will be e-mailed to the address provided.
 
1. Upload a polymorphism data file. (Maximum file size 1 MB.)
Data set file:
File format:
 
2. Enter the parameters for performing the sliding window analysis. Leave 'maximum window size' blank if you wish to analyze only a single window size.
Initial rho estimate:
Minimum sliding window size:
Maximum sliding window size:
 
3. Do you also wish to search within the range of window sizes for a hotspot/coldspot?
Yes No
If yes, enter a coldspot and/or hotspot rho cutoff.
Coldspot rho cutoff:
Hotspot rho cutoff:
 
4. If you selected 'Yes' for Step 3, do you wish to estimate the significance levels of your coldspot/hotspot? This could significantly increase the processing time, depending on the number of replicates you choose.
Yes No
If yes, enter the parameters for the simulations.
Surveyed sequence length:
Theta:
Rho:
Number of replicates:
 
5. Enter an address to which results should be e-mailed and click 'Submit'.
E-mail address:
Confirm e-mail address:
   
     
     
Parameter definitions
 
Initial rho estimate:
RECSLIDER searches for a local maximum of the composite likelihood, starting with the specified initial value of rho, which should be entered as a per base pair value. RECSLIDER will not search for a maximum beyond 100 times the initial value of rho. If RECSLIDER returns an estimate near this upper limit, a warning message will appear in the output file, indicating that the program should be re-run with a larger initial value of rho.
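The search limit can be checked by hand before re-running an analysis. The short sketch below computes the ceiling implied by an initial value and flags estimates that sit close to it. The factor of 100 comes from the description above; the 5% tolerance used to decide what counts as "near" is purely an assumption, as are the function names.

```python
SEARCH_MULTIPLIER = 100  # stated above: no search beyond 100x the initial rho

def search_ceiling(initial_rho):
    """Largest per-bp rho value the search will consider."""
    return SEARCH_MULTIPLIER * initial_rho

def near_ceiling(estimate, initial_rho, rel_tol=0.05):
    """True if the estimate lies within rel_tol of the ceiling.
    The tolerance is an illustrative assumption, not RECSLIDER's rule."""
    return estimate >= (1 - rel_tol) * search_ceiling(initial_rho)

# Hypothetical values: initial rho of 0.0005 per bp, estimate of 0.049 per bp.
print(near_ceiling(0.049, 0.0005))
```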
 
Minimum window size:
This is the initial value for the number of segregating sites in each sliding window. This value cannot be left blank.
 
Maximum window size:
This is the final value for the number of segregating sites in each sliding window. Starting from the minimum window size, RECSLIDER will perform the sliding window analysis for each window size, incrementing the window size by 1 until the maximum window size is reached. Leave this value blank if you wish to perform the sliding window analysis for a single window size.
 
Coldspot rho cutoff:
The program will search for the largest subset in the range of window sizes for which the rho value is less than or equal to the coldspot rho cutoff. This should be entered as a per base pair value. (See Wall et al. 2003 for details on how to set cutoff values.)
 
Hotspot rho cutoff:
The program will search for the largest subset in the range of window sizes for which the rho value is greater than or equal to the hotspot rho cutoff. This should be entered as a per base pair value. (See Wall et al. 2003 for details on how to set cutoff values.)
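The cutoff search for hotspots and coldspots can be sketched as a filter followed by a maximum over window size. The code below is an illustration only: it assumes the sliding window results are available as (window size, median position, rho per bp) tuples, that "largest" refers to the number of segregating sites in the window, and that ties are resolved by taking the first qualifying row. None of these details are confirmed by the description above.

```python
import operator

def largest_window(results, cutoff, kind):
    """Return the row with the largest window size whose rho per bp is
    >= cutoff (kind="hotspot") or <= cutoff (kind="coldspot"), else None."""
    compare = {"hotspot": operator.ge, "coldspot": operator.le}[kind]
    qualifying = [row for row in results if compare(row[2], cutoff)]
    if not qualifying:
        return None
    # max() keeps the first row among equally sized windows.
    return max(qualifying, key=lambda row: row[0])

# Hypothetical results: (window size in sites, median position, rho per bp).
results = [
    (5, 1200, 0.0004),
    (5, 1800, 0.0031),
    (6, 1500, 0.0025),
    (7, 1600, 0.0009),
]
print(largest_window(results, 0.002, "hotspot"))
print(largest_window(results, 0.001, "coldspot"))
```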
 
Surveyed sequence length:
Used in the simulations, this specifies the length in base pairs of the entire surveyed region.
 
Theta:
Population mutation parameter (4Nμ, where N is the effective population size and μ is the mutation rate per generation) per base pair.
 
Rho:
Population recombination parameter (4Nr, where N is the effective population size and r is the crossing over rate per generation) per base pair.
 
Number of replicates (up to 1000):
Number of independent coalescent simulations run to estimate significance levels (maximum of 1000).
 
Output definitions
 
The sliding window results will be returned as a tab-delimited text file attached to an e-mail. The first column of the file shows the sliding window size. The second column shows the median nucleotide position of each window, based on the numbering system used in the input data file. The third column shows the estimated value of rho per bp for each window. The fourth column is a message from RECSLIDER, indicating either that the rho value represents a hotspot/coldspot or that a warning has been generated by MAXDIP.
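As an alternative to importing the file into a spreadsheet, the four columns can be read with a few lines of Python. This is a sketch under assumptions: the dictionary keys are invented names, the message text in the sample is made up, and lines whose first three fields are not numeric (for example a possible header line) are simply skipped because the exact file layout is not specified here.

```python
import csv
import io

def read_recslider_output(handle):
    """Parse tab-delimited rows: window size, median position, rho per bp,
    and an optional message column."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 3:
            continue  # blank or incomplete line
        try:
            size = int(fields[0])
            position = float(fields[1])
            rho = float(fields[2])
        except ValueError:
            continue  # non-numeric line, e.g. a header if one is present
        message = fields[3].strip() if len(fields) > 3 else ""
        rows.append({"window_size": size, "median_position": position,
                     "rho_per_bp": rho, "message": message})
    return rows

# Hypothetical two-line sample; the message wording is not RECSLIDER's.
sample = io.StringIO("5\t1200\t0.0004\t\n5\t1800.5\t0.0031\thotspot\n")
for row in read_recslider_output(sample):
    print(row)
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`, as the `csv` module recommends.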
 
 
RECSLIDER citations:

Hudson R. R.
Two-locus Sampling Distributions and Their Application
Genetics 159: 1805-1817. (2001)

Frisse L., Hudson R. R., Bartoszewicz A., Wall J. D., Donfack J., and Di Rienzo A.
Gene Conversion and Different Population Histories May Explain the Contrast Between Polymorphism and Linkage Disequilibrium Levels
Am. J. Hum. Genet. 69: 831-843. (2001)

Wall J. D., Frisse L. A., Hudson R. R., and Di Rienzo A.
Comparative Linkage-Disequilibrium Analysis of the β-Globin Hotspot in Primates
Am. J. Hum. Genet. 73: 1330-1340. (2003)

 
 
Contact information:
Send E-mail inquiries to: David Witonsky
 
Supported by the National Institute of General Medical Sciences (GM61393-S1)