RECSLIDER |
| Overview
RECSLIDER is a program that calculates the population recombination (cross-over) rate parameter rho (4Nr) per bp from population variation data over a "sliding window". RECSLIDER calculates rho using the program MAXDIP, but it does so for overlapping subsets, or windows, of the data, so that variation in the recombination rate across a surveyed region can be measured. The windows are defined in terms of a fixed number of polymorphic sites rather than a fixed number of base pairs, but the estimate of rho for each window is normalized by the size of the window, giving a rho per base pair result.

RECSLIDER can be used in one of two ways. In the first (Step 1 below), the user specifies only a minimum window size and RECSLIDER performs the sliding window analysis just once. The resulting output gives rho per bp estimates as a function of median window position. The results are returned to the user in a tab-delimited text file, which can later be imported into a program like Excel to generate sliding window plots.

Alternatively, the user can specify both a minimum window size and a maximum window size. RECSLIDER will then perform the sliding window analysis iteratively over all window sizes in this range. This second method is useful for identifying possible "hotspots" or "coldspots" of recombination, because RECSLIDER will also search the results for the largest window in which the recombination rate is either greater than or less than a user-specified rho cutoff (Step 2 below).

This type of analysis can also be run with the option to test the statistical significance of the hotspot/coldspot results (Step 3 below). By performing coalescent simulations based on user-specified parameters, RECSLIDER will estimate the probability that a window of the discovered size or greater has a rho value equal to at least (at most) the hotspot (coldspot) cutoff. It is strongly recommended that users first read Wall et al. (2003) before performing a hotspot (coldspot) search. Rho cutoffs should be specified before the analyses, or else the estimated significance levels may be anti-conservative.

RECSLIDER uses unphased diploid polymorphism data (e.g., from autosomal loci) and tolerates missing data. Sites with three or more alleles will be ignored. At present, samples of up to 50 diploids with up to 2000 polymorphic sites can be used. Click here for data formatting instructions. The current implementation is based on a recombination model with cross-overs only, but a future version will incorporate gene conversion.
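
The window construction described above can be illustrated with a short Python sketch. It is not RECSLIDER's code: it assumes that windows advance by one polymorphic site at a time, that "window size" in base pairs is the span from the first to the last site in the window, and that the median position is the median of the site positions. The function names and the example positions are hypothetical, and the rho estimate itself (the MAXDIP step) is not reproduced.

```python
from statistics import median


def sliding_windows(positions, n_sites):
    """Yield one summary dict per window of n_sites consecutive polymorphic sites."""
    positions = sorted(positions)
    for i in range(len(positions) - n_sites + 1):
        window = positions[i:i + n_sites]
        yield {
            # Assumption: median of the site positions inside the window.
            "median_pos": median(window),
            # Assumption: window size in bp is the first-to-last site span.
            "span_bp": window[-1] - window[0] + 1,
        }


def rho_per_bp(rho_window, span_bp):
    """Normalize a whole-window rho estimate to a per-base-pair value."""
    return rho_window / span_bp


# Hypothetical positions of six polymorphic sites, analysed in windows of 4 sites.
sites = [120, 340, 410, 980, 1500, 1720]
windows = list(sliding_windows(sites, 4))
```

Because the windows hold a fixed number of sites, their spans in base pairs differ from window to window, which is why the per-window estimate has to be normalized before windows can be compared.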
| Parameter definitions | ||||
| Initial rho estimate: RECSLIDER searches for a local maximum of the composite likelihood, starting from the specified initial value of rho, which should be entered as a per base pair value. RECSLIDER will not search for a maximum beyond 100 times the initial value of rho. If RECSLIDER returns an estimate near this upper bound, a warning message will appear in the output file, indicating that the program should be re-run with a larger initial value of rho. | ||||
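
As a rough illustration of the search bound, the Python sketch below flags an estimate that lies close to 100 times the initial value. How close counts as "near" is not stated here, so the 5% tolerance is purely an assumption, and the function name is hypothetical.

```python
def near_search_bound(estimate, initial_rho, rel_tol=0.05):
    """Return True if the estimate is within rel_tol of the upper search bound."""
    upper = 100 * initial_rho  # the search never goes beyond this value
    # Assumption: "near" means within rel_tol (5% by default) of the bound.
    return estimate >= upper * (1 - rel_tol)


# Hypothetical values: initial rho of 0.001 per bp gives an upper bound of 0.1.
flag_high = near_search_bound(0.099, 0.001)
flag_ok = near_search_bound(0.01, 0.001)
```

If such a check is true for a window, the natural response is the one given above: re-run with a larger initial value so that the maximum is no longer truncated by the bound.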
| Minimum window size:
This is the initial value for the number of segregating sites in each sliding window. This value cannot be left blank. | ||||
| Maximum window size:
This is the final value for the number of segregating sites in each sliding window. Starting from the minimum window size, RECSLIDER will perform the sliding window analysis for each window size, incrementing the window size by 1 until the maximum window size is reached. Leave this value blank if you wish to perform the sliding window analysis for a single window size. | ||||
| Coldspot rho cutoff: The program will search for the largest subset in the range of window sizes for which the rho value is less than or equal to the coldspot rho cutoff. This should be entered as a per base pair value. (See Wall et al. 2003 for details on how to set cutoff values.) | ||||
| Hotspot rho cutoff: The program will search for the largest subset in the range of window sizes for which the rho value is greater than or equal to the hotspot rho cutoff. This should be entered as a per base pair value. (See Wall et al. 2003 for details on how to set cutoff values.) | ||||
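
The coldspot and hotspot searches can be sketched in Python as a filter over the sliding window results. This is only an illustration under stated assumptions: "largest" is taken to mean the largest number of segregating sites, ties are resolved by keeping the first window encountered, and the function name and example rows are hypothetical.

```python
def largest_window(rows, cutoff, kind):
    """Return the largest window meeting the cutoff, or None if there is none.

    rows: iterable of (window_size, median_pos, rho_per_bp) tuples.
    kind: "hotspot" (rho >= cutoff) or "coldspot" (rho <= cutoff).
    """
    if kind == "hotspot":
        hits = [r for r in rows if r[2] >= cutoff]
    elif kind == "coldspot":
        hits = [r for r in rows if r[2] <= cutoff]
    else:
        raise ValueError("kind must be 'hotspot' or 'coldspot'")
    # Assumption: window size is compared as a count of segregating sites.
    return max(hits, key=lambda r: r[0], default=None)


# Hypothetical sliding window results for window sizes 5 to 7.
rows = [
    (5, 1000, 0.0002),
    (5, 2000, 0.0040),
    (6, 1500, 0.0035),
    (7, 1800, 0.0009),
]
hot = largest_window(rows, 0.003, "hotspot")
cold = largest_window(rows, 0.0005, "coldspot")
```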
| Surveyed sequence length: Used in the simulations, this specifies the length in base pairs of the entire surveyed region. | ||||
| Theta: Population mutation parameter (4Nµ, where N is the effective population size and µ is the mutation rate per generation) per base pair. | ||||
| Rho: Population recombination parameter (4Nr, where N is the effective population size and r is the crossing-over rate per generation) per base pair. | ||||
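
As a worked example of the two definitions, the Python sketch below computes per base pair values of theta and rho and scales rho to a whole region. The effective population size, the per base pair rates, and the sequence length are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def scaled_parameter(n_e, rate_per_bp):
    """Return 4 * N * rate, the population-scaled parameter per base pair."""
    return 4 * n_e * rate_per_bp


# Hypothetical inputs: N = 10,000; mu = 2.5e-8 and r = 1e-8 per bp per generation.
theta_per_bp = scaled_parameter(10_000, 2.5e-8)  # about 0.001
rho_per_bp = scaled_parameter(10_000, 1e-8)      # about 0.0004

# A per-bp value multiplied by a length in bp gives the value for that region.
region_length_bp = 20_000  # hypothetical surveyed sequence length
rho_region = rho_per_bp * region_length_bp       # about 8
```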
| Number of replicates (up to 1000): Number of independent coalescent simulations run to estimate significance levels (maximum of 1000). | ||||
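
One way to picture how the replicates yield a significance level is as an empirical proportion. The Python sketch below assumes that each simulated replicate is summarized by the size of its largest window meeting the cutoff (0 if none does) and that the reported probability is the fraction of replicates matching or exceeding the observed size. This is an assumption about the procedure, not a description of RECSLIDER's code, and all names and values are hypothetical.

```python
def empirical_p(observed_size, simulated_max_sizes):
    """Fraction of replicates whose largest qualifying window is >= observed_size."""
    n = len(simulated_max_sizes)
    if n == 0:
        raise ValueError("at least one replicate is required")
    hits = sum(1 for size in simulated_max_sizes if size >= observed_size)
    return hits / n


# Hypothetical summary of 10 replicates; 0 means no window met the cutoff.
sim_sizes = [0, 0, 5, 9, 0, 8, 3, 0, 0, 0]
p_value = empirical_p(8, sim_sizes)
```

With a cap of 1000 replicates, the smallest nonzero proportion such an estimate can take is 1/1000.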
| Output definitions | ||||
| The sliding window results will be returned as a tab-delimited text file attached to an e-mail. The first column of the file shows the sliding window size. The second column shows the median nucleotide position of each window, based on the numbering system used in the input data file. The third column shows the estimated value of rho per bp for each window. The fourth column is a message from RECSLIDER, indicating either that the rho value represents a hotspot/coldspot or that a warning has been generated by MAXDIP. | ||||
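
For users who prefer a script to a spreadsheet, the four-column layout can be read with Python's standard `csv` module, as in the sketch below. It assumes the file has no header row and that the fourth column may be empty; the sample rows and message strings are invented for illustration and are not actual RECSLIDER output.

```python
import csv
import io

# Hypothetical file contents: size, median position, rho per bp, message.
sample = (
    "5\t1000\t0.0002\tcoldspot\n"
    "5\t2000\t0.0040\thotspot\n"
    "6\t1500\t0.0011\t\n"
)


def read_results(handle):
    """Parse tab-delimited rows into (size, median_pos, rho_per_bp, message)."""
    rows = []
    for rec in csv.reader(handle, delimiter="\t"):
        if len(rec) < 3:
            continue  # skip blank or malformed lines
        size, pos, rho = rec[:3]
        message = rec[3] if len(rec) > 3 else ""
        # Assumption: no header row, so the first three fields are numeric.
        rows.append((int(size), float(pos), float(rho), message))
    return rows


rows = read_results(io.StringIO(sample))
```

To read a saved attachment instead of the in-memory sample, pass an open file object, for example `open("results.txt", newline="")`, where the filename is a placeholder.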
| RECSLIDER citations:
Richard R. Hudson
| ||||
| Contact information: | ||||
| Send E-mail inquiries to: David Witonsky | ||||
| Supported by the National Institute of General Medical Sciences (GM61393-S1) | ||||