SLIDER

Overview

SLIDER is an application for computing a variety of summary statistics of population genetic data over a "sliding window". In a sliding window analysis, these statistics are calculated for a small frame, or window, of the data. The window incrementally advances across the surveyed region and, at each new position, the reported statistics are calculated for the polymorphism data contained within the window. In this way, variation of each statistic across the surveyed region can be measured. This type of analysis allows one to investigate how patterns of variation change across a surveyed genomic segment; it is typically applied to the detection of local signatures of natural selection or the identification of local changes in mutation or recombination rates (e.g., hot spots of mutation or recombination).

The statistics that SLIDER calculates fall into three classes: statistics for a single population sample (e.g., nucleotide diversity per base pair); statistics for a single population sample, with an outgroup to provide information on the ancestral state of the alleles (e.g., Fay and Wu's H statistic); and statistics for multiple population samples (FST). The full list of statistics is given below.

A surveyed region in SLIDER can be one continuous segment of DNA or several noncontiguous, nonoverlapping segments (representing, for example, different coding or noncoding regions of a gene). Each segment is defined by a start position and a stop position. Any positional numbering system can be used, as long as the numbers are positive integers and every segment's stop position is greater than its start position. A window in SLIDER advances along a segment until the window's end position moves beyond the segment's last position. If another segment lies beyond the current segment, the window spans the gap between the two segments, provided that the gap percentage criterion is met (see the Gap percentage parameter). When a gap is spanned, the segments on either side of the gap are treated as if they were contiguous. If the criterion is not met, the sliding window will "jump" the gap and continue from the first position of the next segment.

SLIDER can be configured so that its window size is defined either in terms of a fixed number of base pairs, a fixed number of segregating sites, the length of each surveyed segment (dynamic sizing), or the total length of the surveyed segments. A more complete description of SLIDER's four window types is given below.

SLIDER can accept polymorphism data in several different file formats. The numbering of site positions within the file must be consistent with the numbering used to define the segments. An individual's ID or a site's allele can be represented by any string of characters that does not include white space (spaces, tabs, etc.). Missing data can also be represented by a user-defined string of characters not containing white space or by any of a default set of symbols (see SLIDER's optional settings). SLIDER has very specific ways of handling missing data. For a detailed description of how SLIDER handles missing data for a given statistic, click on the statistic's link.

 
    Instructions:
  • Follow Steps 1-5 below.
  • Please make sure to choose a file and not a directory. Maximum file size is 1 megabyte.
  • For a more complete description of input parameters, click on the parameter links.
  • Results will be e-mailed to the address provided below.
 
1. Upload a file containing polymorphism data.
Choose file:
File format:
File contains ancestral data: ancestral ID  
 
2. Enter the segments of the surveyed region in the box below.
 
Segments are non-overlapping ranges of nucleotide positions. Each segment should be entered as a start position followed by a stop position, separated by white space or a hyphen. Enter only one segment per line. Segments can be listed in any order. At least one segment must be entered.
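The entry rules above can be sketched as a small parser. This is only an illustration of the stated rules (one segment per line, start and stop separated by white space or a hyphen, positive integers, stop greater than start, no overlaps); the function name parse_segments is hypothetical and this is not SLIDER's own code.

```python
import re

def parse_segments(text):
    """Parse segments entered one per line as 'start stop' or 'start-stop'.

    Returns a list of (start, stop) tuples sorted by start position.
    """
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        match = re.fullmatch(r"(\d+)(?:\s*-\s*|\s+)(\d+)", line)
        if match is None:
            raise ValueError(f"cannot parse segment: {line!r}")
        start, stop = int(match.group(1)), int(match.group(2))
        if start < 1 or stop <= start:
            raise ValueError(f"invalid segment bounds: {line!r}")
        segments.append((start, stop))
    if not segments:
        raise ValueError("at least one segment must be entered")
    # Segments may be listed in any order, so sort before checking overlaps.
    segments.sort()
    for (_, prev_stop), (next_start, _) in zip(segments, segments[1:]):
        if next_start <= prev_stop:
            raise ValueError("segments must not overlap")
    return segments
```

For example, the two lines "1200-2400" and "1 1000" would be returned as [(1, 1000), (1200, 2400)].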
   
3. Choose the class of statistics you wish SLIDER to report. Then select or deselect the statistics in that class that you want calculated.
I. Single Population
S:
π:
Var(π):
Watterson's θ:
Tajima's D:
Fu and Li's D*:
Haplotype number:
Haplotype diversity:
Mean Heterozygosity:
II. Single Population with Outgroup
Fay and Wu's H:
Fu and Li's D:
Frequency spectrum:
Average Divergence:
III. Multiple Population
Fst:
 
4. (Optional) Change SLIDER's default settings.
window defined by:
window size: (bp or segregating sites)
window increment: bp
window gap percentage: %
minimum allele frequency: %
parse individual ID's for population:
parse alleles in case-sensitive mode:
missing data symbols: ? N n - other: 
job description:
 
5. Enter and confirm an e-mail address, and click the 'Submit' button. Results will be returned in e-mail attachments (see below).
 
E-mail address:
Confirm e-mail address:
 
     
     
File formats
SLIDER accepts nucleotide polymorphism data in several file formats. Click a format's link to get a description and an example of that format.
Parameter definitions
 
Window:
A SLIDER window is a small frame of DNA nucleotide positions that is defined by a begin position and an end position. There are four different ways in which a "window" can be defined in SLIDER: by "number of base pairs", by "number of segregating sites", by "dynamic sizing", and by "entire region" (the total surveyed length).
  • Number of base pairs: Each window contains the same number of base pair positions lying within the segments of the surveyed region. The begin position of the first window is the start position of the first segment in the surveyed region. The window advances by adding a fixed window increment to the begin position of the previous window, and this sum becomes the begin position of the next window, with two exceptions: if the position falls between two segments of the surveyed region, the begin position is set to the start position of the next segment; if the position falls beyond the stop position of the last segment, the window advancement stops. The end position of each window is found by adding one less than the fixed window size to the window's begin position. If this position falls between two segments of the surveyed region, then the window will either span the gap between the two segments or jump to the next segment, depending on whether or not the gap percentage criterion is met.
  • Number of segregating sites: Each window contains the same number of segregating sites. Because the number of segregating sites is fixed, the window size (in base pairs) is variable. The begin position of the first window is the start position of the first segment in the surveyed region. The end position of the first window is the position halfway between the nth and the (n + 1)st segregating sites, where n is the number of segregating sites in the window. The windows advance by 1 segregating site, so that the begin position of the next window is the position halfway between the positions of the first and second segregating sites of the previous window, and the end position advances in a similar manner. If the begin position falls between two segments of the surveyed region, then the start position of the next segment becomes the new window begin position. If the end position falls between two segments, then ...
  • Dynamic sizing: Each window is a complete segment of the surveyed region. The begin and end positions of a window correspond to the start and stop positions of a segment.
  • Entire region: A single window contains all the segments of the surveyed region. The window's begin and end positions are the start position of the first segment and the stop position of the last segment, respectively.
 
Sliding window size:
When the sliding window is defined by the number of base pairs, the window size should be given as the number of base pairs to be included in each window (the default value is 1000). The selected SLIDER statistics will be calculated for each window of this length that falls within the surveyed region. When the sliding window is defined by the number of segregating sites, the window size should be given as the number of segregating sites to be included in each window (the default value is 5).
 
Sliding window increment:
This value is used to increment the begin position of the sliding window. When the sliding window is defined by the number of base pairs, the window increment should be given in terms of the number of base pairs to advance the window (the default value is 100 bp). When the sliding window is defined by the number of segregating sites, the window increment is set to be 1 segregating site.
 
Gap percentage:
When the window is defined in terms of the number of base pairs, the window will span a gap between two surveyed segments only if the gap length is less than the gap percentage multiplied by the window size. When a gap is spanned by a window, the segments on either side of the gap are treated as if they were contiguous. For example, consider a surveyed region that is covered by two segments with site positions 1-1000 and 1200-2400, and thus has an intervening gap of 199 base pairs (positions 1001-1199). With the default settings (window size = 1000 bp, window increment = 100 bp, gap percentage = 50%), SLIDER's first window would begin at position 1 and end at position 1000. Because the window increment is 100 bp, SLIDER's window would advance to begin at position 101; however, it could not end at position 1100, since only 900 positions remain in the first segment and the rest of the window falls beyond the gap. Since 199 < 50% × 1000, the gap percentage criterion is met, and the second window would span the gap to end at position 1299 (the window includes the 1000 base pair positions 101-1000 and 1200-1299). On the other hand, if the gap percentage were set to, say, 10%, the criterion would not be met (199 > 10% × 1000), and the second window would instead jump to the start position of the second segment, and thus begin at position 1200 and end at position 2199. A window will span multiple gaps if the gap percentage criterion is met for the sum of the gap lengths.
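The worked example above can be reproduced with a short sketch of base-pair windowing. The function names (bp_windows and its helpers) are hypothetical, and the handling of details the text does not spell out (for instance, stopping as soon as a full-sized window no longer fits) is an assumption rather than a description of SLIDER's actual implementation.

```python
def _snap_to_segment(segments, pos):
    """Move pos to the next segment start if it lies in a gap; None if past the end."""
    for start, stop in segments:
        if pos <= stop:
            return max(pos, start)
    return None

def _next_segment_start(segments, pos):
    """Start of the first segment that begins after pos, or None."""
    for start, _ in segments:
        if start > pos:
            return start
    return None

def _window_end(segments, begin, size):
    """Return (end, spanned_gap_bp) for a window holding `size` in-segment
    positions from `begin`, or None if the surveyed region runs out."""
    remaining, gap_total, prev_stop = size, 0, None
    for start, stop in segments:
        if stop < begin:
            continue
        lo = max(start, begin)
        if prev_stop is not None:
            gap_total += start - prev_stop - 1
        available = stop - lo + 1
        if available >= remaining:
            return lo + remaining - 1, gap_total
        remaining -= available
        prev_stop = stop
    return None

def bp_windows(segments, size=1000, increment=100, gap_pct=50.0):
    """List (begin, end) windows; gaps are spanned only if their summed
    length is less than gap_pct percent of the window size."""
    segments = sorted(segments)
    max_gap = gap_pct / 100.0 * size
    windows = []
    begin = segments[0][0]
    while True:
        begin = _snap_to_segment(segments, begin)
        if begin is None:
            break
        found = _window_end(segments, begin, size)
        if found is None:
            break  # assumption: partial windows are not reported
        end, gap_total = found
        if gap_total > 0 and gap_total >= max_gap:
            # Criterion not met: jump to the start of the next segment.
            begin = _next_segment_start(segments, begin)
            if begin is None:
                break
            continue
        windows.append((begin, end))
        begin += increment
    return windows
```

With segments [(1, 1000), (1200, 2400)] and the default settings, the second window returned is (101, 1299); with gap_pct=10 it is (1200, 2199), matching the example.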
 
Treat missing data conservatively:
If 'True' is selected, haplotypes containing missing alleles are assumed to be completely unknown and are not included in the haplotype counts. If 'False' is selected, a haplotype with one or more missing alleles will be counted as the most commonly occurring known haplotype to which it could match.
 
Missing allele symbols:
Unless these are unchecked, the program assumes the symbols '?', 'N', 'n', and '-' (dash) represent missing alleles. You also have the option of entering your own string of characters to represent missing data.
 
Parse alleles in case-sensitive mode:
If the same letter is used in both upper and lower case to denote different alleles (e.g., 'A' and 'a'), select 'Yes'. Otherwise, the program will be case insensitive.
 
Minimum allele frequency:
This is the threshold frequency that the minor allele (or the derived allele, if ancestral data are present) must meet in order for a polymorphic site to be included in the analysis.
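A minimal sketch of such a frequency filter is shown here. Several details are assumptions, not documented SLIDER behavior: frequencies are computed over chromosomes with known alleles only, the threshold is inclusive, and for sites with more than two alleles the "minor" frequency is taken as everything other than the most common allele. The function name passes_min_frequency is hypothetical.

```python
from collections import Counter

DEFAULT_MISSING = ("?", "N", "n", "-")

def passes_min_frequency(alleles, min_freq_pct, ancestral=None,
                         missing=DEFAULT_MISSING):
    """Return True if a site meets the minimum allele frequency (in percent)."""
    known = [a for a in alleles if a not in missing]
    counts = Counter(known)
    if len(counts) < 2:
        return False  # monomorphic (or all missing): not a polymorphic site
    total = len(known)
    if ancestral is not None:
        # Derived allele count: every known allele that is not ancestral.
        count = total - counts.get(ancestral, 0)
    else:
        # Minor allele count: everything except the most common allele.
        count = total - max(counts.values())
    return 100.0 * count / total >= min_freq_pct
```

For a site with nine 'A' alleles and one 'T', the minor allele frequency is 10%, so the site would be kept at a 10% threshold and dropped at 20%.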
 
Parse individual ID's for populations:
If 'Yes' is selected, SLIDER will use the leading alphabetic characters of the individual ID's to determine the different sample populations within the data. For example, the individual names CA23, AA100, and CAU54 will create populations CA, AA, and CAU, respectively. SLIDER results will then be returned for each subpopulation, as well as for the total population.
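The ID parsing rule can be illustrated with a short sketch. It assumes "alphabetic" means the ASCII letters A-Z and a-z, and it places IDs without leading letters in a population named by the empty string; both choices, and the function names, are assumptions.

```python
import re
from collections import defaultdict

def population_of(individual_id):
    """Return the leading alphabetic characters of an individual ID."""
    match = re.match(r"[A-Za-z]+", individual_id)
    return match.group(0) if match else ""

def group_by_population(individual_ids):
    """Map each population label to the list of IDs assigned to it."""
    groups = defaultdict(list)
    for individual_id in individual_ids:
        groups[population_of(individual_id)].append(individual_id)
    return dict(groups)
```

Applied to CA23, AA100, and CAU54, this yields the populations CA, AA, and CAU, as in the example above.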
 
Job description:
The user can use this space to place information describing the SLIDER job. The job description will be written in the body of the email containing the SLIDER results. Any text is acceptable.
 
Single Population Statistics
S (number of polymorphisms):
S is the number of polymorphic sites that are contained within the window. Under SLIDER's default settings, S is the number of biallelic sites.
 
π (nucleotide diversity):
Given that there are typically a different number of unknown alleles for each site within a window, SLIDER separately calculates the nucleotide diversity (πj) for each site using only those chromosomes for which the allele is known. For a single polymorphic site with K different alleles and missing data, the nucleotide diversity is

where xji is the number of chromosomes having allele ai at site j. If there are n chromosomes in the sample and m of these have missing data at site j, then ∑i xji = n - m. If the site is biallelic and has no missing data, the above formula for πj reduces to
where pj and qj are the allelic frequencies xj1/n and xj2/n. The nucleotide diversity (Π) for a window containing S polymorphic sites is then
If L is the size of the window in base pairs, the nucleotide diversity per base pair is π = Π/L.
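As an illustration of the per-site approach, the sketch below uses the standard unbiased estimator πj = [nj/(nj - 1)] × (1 - ∑i (xji/nj)²), where nj = n - m is the number of chromosomes with a known allele at site j, and sums πj over the polymorphic sites to obtain Π. This form is consistent with the definitions above (for a biallelic site without missing data it becomes 2n pj qj/(n - 1)), but whether SLIDER applies the nj/(nj - 1) correction is an assumption, and the function names are hypothetical.

```python
from collections import Counter

DEFAULT_MISSING = ("?", "N", "n", "-")

def site_diversity(alleles, missing=DEFAULT_MISSING):
    """Nucleotide diversity at one site, using only chromosomes with known alleles."""
    known = [a for a in alleles if a not in missing]
    n_known = len(known)
    if n_known < 2:
        return 0.0  # fewer than two known alleles: no pairwise comparison possible
    sum_sq = sum((count / n_known) ** 2 for count in Counter(known).values())
    return n_known / (n_known - 1) * (1.0 - sum_sq)

def window_diversity(sites, window_bp):
    """Return (Pi, pi per base pair) for a window.

    sites: one list of alleles per polymorphic site in the window.
    window_bp: window size L in base pairs.
    """
    total = sum(site_diversity(site) for site in sites)
    return total, total / window_bp
```

For a site with alleles A, A, T, T, four of the six pairwise comparisons differ, and site_diversity returns 4/6.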
 
Variance of π:
Under an infinite sites model and the assumption of random mating, E(π) = θ and
where θ = 4Neu, Ne is the effective size of the population, and u is the mutation rate per base pair per generation.
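For reference, the classical result of Tajima (1983) for the variance of the mean number of pairwise differences among n sequences, assuming no recombination within the region, is Var = [(n + 1)/(3(n - 1))] θ + [2(n² + n + 3)/(9n(n - 1))] θ², with θ expressed for the whole region rather than per base pair. The sketch below evaluates that expression; that SLIDER reports exactly this quantity is an assumption, and the function name is hypothetical.

```python
def variance_of_pi(n, theta):
    """Tajima's (1983) variance of the mean pairwise difference.

    n: number of chromosomes sampled (n >= 2).
    theta: 4*Ne*u summed over the region (not per base pair).
    """
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    linear = (n + 1) / (3 * (n - 1))
    quadratic = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    return linear * theta + quadratic * theta ** 2
```

As a sanity check, for n = 2 the expression reduces to θ + θ², the variance of the number of differences between two sequences.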
 
Watterson's θW:
With n chromosomes in the sample and S segregating sites in the window, θW is given by

\[ \theta_W = \frac{S}{a_n} \]

where

\[ a_n = \sum_{i=1}^{n-1} \frac{1}{i} \]

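A minimal sketch of Watterson's estimator in its standard form, θW = S/a_n with a_n = 1 + 1/2 + ... + 1/(n - 1), is given here. It returns θW for the window as a whole; whether SLIDER additionally reports a per-base-pair value, and how it adjusts n for missing data, is not covered. The function names are hypothetical.

```python
def harmonic_a(n):
    """a_n: sum of 1/i for i = 1 .. n-1."""
    return sum(1.0 / i for i in range(1, n))

def watterson_theta(s, n):
    """Watterson's theta for S segregating sites among n chromosomes."""
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    return s / harmonic_a(n)
```

For example, with S = 16 and n = 10, a_n is about 2.829 and θW is about 5.66.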
 
Tajima's D:
Using the above definitions of Π and θW, Tajima's D is given by
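The sketch below computes Tajima's D in its usual published form (Tajima 1989): D = (Π - θW) / sqrt(e1 S + e2 S(S - 1)), where θW = S/a1 and the constants e1 and e2 depend only on the sample size n. It assumes a single n for all sites in the window; how SLIDER handles sites with differing amounts of missing data is not addressed, and the function name is hypothetical.

```python
import math

def tajimas_d(pi_window, s, n):
    """Tajima's D from Pi (summed over the window), S, and sample size n.

    Returns None when D is undefined (no segregating sites, or n < 4,
    for which the variance term is zero).
    """
    if s == 0 or n < 4:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    theta_w = s / a1
    return (pi_window - theta_w) / math.sqrt(variance)
```

A negative D indicates an excess of low-frequency variants relative to the neutral expectation, and a positive D indicates an excess of intermediate-frequency variants.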
 
Fu and Li's D*:
 
Haplotype number:
 
Haplotype diversity:
 
Average divergence:
 
Single Population Statistics w/Outgroup
Fu and Li's D:
 
Fay and Wu's H:
 
Multiple Population Statistics
Fst:
 
Output
 
After the submit button has been clicked, SLIDER will respond with either a confirmation page or an error page. If the input file is successfully processed and a valid e-mail address is used, the sliding window results will be returned in one or more (depending on the number of populations analyzed) e-mail attachments. A SLIDER run typically takes no more than a minute. SLIDER results files are in text format and tab-delimited, so that they can be easily imported into most spreadsheet programs. Each line of a SLIDER results file will show the window's begin position, the window's end position, and the window's size in base pairs, followed by the selected statistics for the window.
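Because the results files are tab-delimited text, they can also be read programmatically. The sketch below assumes that every line holds data (no header row) and that the caller supplies the names of the selected statistics in column order; both are assumptions about the file layout, and the function name and sample values are purely illustrative.

```python
import csv
import io

def read_results(handle, stat_names):
    """Read a tab-delimited results file into a list of dictionaries.

    The first three columns are taken as the window's begin position,
    end position, and size in base pairs; remaining columns are paired
    with stat_names and kept as strings.
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        begin, end, size_bp = (int(value) for value in fields[:3])
        row = {"begin": begin, "end": end, "size_bp": size_bp}
        row.update(zip(stat_names, fields[3:]))
        rows.append(row)
    return rows

# Illustrative input only; these numbers are made up, not SLIDER output.
sample = io.StringIO("1\t1000\t1000\t12\t0.0031\n101\t1299\t1000\t9\t0.0024\n")
windows = read_results(sample, ["S", "pi"])
```

Statistic values are left as strings so that any non-numeric entries in a results file are passed through unchanged.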
 
 
 
 
Contact information:
Send E-mail inquiries to: David Witonsky
 
Supported by the National Institute of General Medical Sciences (GM61393-S1)