SLIDER |
||||
| Overview
SLIDER is an application for computing a variety of summary statistics of population genetic data over a "sliding window". In a sliding window analysis, the statistics are calculated for a small frame, or window, of the data. The window advances incrementally across the surveyed region and, at each new position, the selected statistics are calculated for the polymorphism data contained within the window, so that the variation of each statistic across the surveyed region can be measured. This type of analysis is typically applied to the detection of local signatures of natural selection or the identification of local changes in mutation or recombination rates (e.g., hot spots of mutation or recombination).
The statistics that SLIDER calculates fall into three classes: statistics for a single population sample (e.g., nucleotide diversity per base pair); statistics for a single population sample with an outgroup that provides information on the ancestral state of the alleles (e.g., Fay and Wu's H statistic); and statistics for multiple population samples (FST). The full list of statistics is given below.
A surveyed region in SLIDER can be one continuous segment of DNA or several noncontiguous, nonoverlapping segments (representing, for example, different coding or noncoding regions of a gene). Each segment is defined by a start position and a stop position. Any positional numbering system can be used, as long as the numbers are positive integers and every segment's stop position is greater than its start position.
A window in SLIDER advances along a segment until the window's end position moves beyond the segment's last position. If another segment lies beyond the current segment, the window spans the gap between the two segments, provided that the gap percentage criterion is met (see Gap percentage). When a gap is spanned, the segments on either side of the gap are treated as if they were contiguous. If the criterion is not met, the sliding window will "jump" the gap and continue from the first position of the next segment.
SLIDER can be configured so that its window size is defined in terms of a fixed number of base pairs, a fixed number of segregating sites, the length of each surveyed segment (dynamic sizing), or the total length of the surveyed segments. A more complete description of SLIDER's four window types is given below.
SLIDER can accept polymorphism data in several different file formats. The numbering of site positions within the file must be consistent with the numbering used to define the segments. An individual's ID or a site's allele can be represented by any string of characters that does not include white space (spaces, tabs, etc.). Missing data can be represented either by a user-defined string of characters not containing white space or by any of a default set of symbols (see SLIDER's optional settings). SLIDER has very specific ways of handling missing data. For a detailed description of how SLIDER handles missing data for a given statistic, click on the statistic's link.
| ||||
|
||||
| File formats | ||||
| SLIDER accepts nucleotide polymorphism data in the following formats: Click the above links to get descriptions and examples of each format. | ||||
| Parameter definitions | ||||
| Window: A SLIDER window is a small frame of DNA nucleotide positions that is defined by a begin position and an end position. There are four different ways in which a "window" can be defined in SLIDER: by "number of base pairs", by "number of polymorphic sites", by "dynamic sizing", and by "total surveyed length".
| ||||
| Sliding window size: When the sliding window is defined by the number of base pairs, the window size should be given as the number of base pairs to be included in each window (the default value is 1000). The selected SLIDER statistics will be calculated for each window of this length that falls within the surveyed region. When the sliding window is defined by the number of segregating sites, the window size should be given as the number of segregating sites to be included in each window (the default value is 5). | ||||
| Sliding window increment:
This value is used to increment the begin position of the sliding window. When the sliding window is defined by the number of base pairs, the window increment should be given in terms of the number of base pairs to advance the window (the default value is 100 bp). When the sliding window is defined by the number of segregating sites, the window increment is set to be 1 segregating site. | ||||
| Gap percentage:
When the window is defined in terms of the number of base pairs, the window will span a gap between two surveyed segments only if the gap length is less than the gap percentage multiplied by the window size. When a gap is spanned by a window, the segments on either side of the gap are treated as if they were contiguous. For example, consider a surveyed region that is covered by two segments with site positions 1-1000 and 1200-2400, and thus has an intervening gap (positions 1001-1199) of 199 base pairs. With the default settings (window size = 1000 bp, window increment = 100 bp, gap percentage = 50%), SLIDER's first window would begin at position 1 and end at position 1000. Because the window increment is 100 bp, the second window would begin at position 101; however, it could not end at position 1100, since this position falls in the gap between the two segments. Instead, since 199 < 50% × 1000 = 500 and the gap percentage criterion is met, the second window would span the gap to end at position 1299 (the window includes the 1000 base pair positions 101-1000 and 1200-1299). On the other hand, if the gap percentage were set to, say, 10%, the criterion would not be met (199 ≥ 10% × 1000 = 100), and the second window would instead jump to the start position of the second segment, and thus begin at position 1200 and end at position 2199. A window will span multiple gaps if the gap percentage criterion is met for the sum of the gap lengths. | ||||
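The gap percentage rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not SLIDER's own code: the function name `place_window`, the representation of segments as `(start, stop)` tuples, and the choice to jump to the segment immediately after the gap that breaks the criterion are all assumptions made for the example.

```python
def place_window(segments, begin, size, gap_fraction):
    """Return (begin, end) for a window covering `size` surveyed base pairs.

    segments     -- sorted, nonoverlapping (start, stop) tuples (hypothetical layout)
    begin        -- requested begin position; must lie inside a segment
    size         -- window size in base pairs
    gap_fraction -- gap percentage expressed as a fraction (50% -> 0.5)

    Returns None when the surveyed region ends before the window is full.
    """
    # Find the segment that contains the begin position.
    idx = None
    for i, (start, stop) in enumerate(segments):
        if start <= begin <= stop:
            idx = i
            break
    if idx is None:
        raise ValueError("begin position is not inside any segment")

    remaining = size      # surveyed base pairs still needed
    pos = begin           # first position not yet counted
    gap_total = 0         # summed length of all gaps spanned so far
    while True:
        start, stop = segments[idx]
        available = stop - pos + 1
        if remaining <= available:
            return begin, pos + remaining - 1
        remaining -= available
        if idx + 1 == len(segments):
            return None   # no further segment to continue into
        next_start = segments[idx + 1][0]
        gap_total += next_start - stop - 1
        if gap_total >= gap_fraction * size:
            # Criterion not met: restart the window at the next segment (assumed behavior).
            begin, pos = next_start, next_start
            remaining, gap_total = size, 0
        else:
            pos = next_start
        idx += 1


segments = [(1, 1000), (1200, 2400)]
print(place_window(segments, 101, 1000, 0.5))
print(place_window(segments, 101, 1000, 0.1))
```

With the two segments from the example, the first call spans the 199 bp gap and the second jumps it, matching the positions worked out in the text.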
| Treat missing data conservatively:
If 'True' is selected, haplotypes containing missing alleles are assumed to be completely unknown and are not included in the haplotype counts. If 'False' is selected, a haplotype with one or more missing alleles will be counted as the most commonly occurring known haplotype to which it could match. | ||||
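The two modes can be illustrated with a small Python sketch. It follows the description above but is not SLIDER's implementation: the function name `count_haplotypes`, the tuple representation of haplotypes, the tie-breaking order, and the decision to drop an incomplete haplotype that matches no known haplotype are assumptions made for the example.

```python
from collections import Counter

# Default missing-allele symbols listed in this documentation.
MISSING = frozenset({"?", "N", "n", "-"})


def count_haplotypes(haplotypes, conservative=True, missing=MISSING):
    """Count haplotypes given as tuples of allele strings (hypothetical layout)."""
    complete = [h for h in haplotypes if not any(a in missing for a in h)]
    counts = Counter(complete)
    if conservative:
        # 'True': haplotypes with any missing allele are left out entirely.
        return counts

    known = Counter(complete)  # frequencies among fully known haplotypes only
    for hap in haplotypes:
        if not any(a in missing for a in hap):
            continue
        # Known haplotypes that agree with `hap` at every non-missing site.
        matches = [k for k in known
                   if all(a in missing or a == b for a, b in zip(hap, k))]
        if matches:
            # 'False': assign to the most common compatible known haplotype.
            # Ties go to the first one encountered (an assumption).
            best = max(matches, key=lambda k: known[k])
            counts[best] += 1
        # With no compatible known haplotype, the entry is dropped (an assumption).
    return counts


sample = [("A", "C"), ("A", "C"), ("G", "C"), ("A", "?"), ("?", "T")]
print(count_haplotypes(sample, conservative=True))
print(count_haplotypes(sample, conservative=False))
```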
| Missing allele symbols: Unless these are unchecked, the program assumes the symbols '?', 'N', 'n', and '-' (dash) represent missing alleles. You also have the option of entering your own string of characters to represent missing data. | ||||
| Parse alleles in case-sensitive mode: If the same letter is used in uppercase and lowercase to denote different alleles (e.g., 'A' and 'a'), select 'Yes'. Otherwise, the program will be case insensitive. | ||||
| Minimum allele frequency: This is the threshold frequency that the minor allele (or the derived allele, if ancestral data are present) must reach for a polymorphic site to be included in the analysis. | ||||
| Parse individual ID's for populations: If 'Yes' is selected, SLIDER will use the leading alphabetic characters of the individual IDs to determine the different sample populations within the data. For example, the individual names CA23, AA100, and CAU54 will create populations CA, AA, and CAU, respectively. SLIDER results will then be returned for each subpopulation, as well as for the total population. | ||||
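The ID parsing rule can be sketched as follows. The function name `group_by_population` and the handling of IDs that have no leading letters (grouped under an empty label) are assumptions for this example, not documented SLIDER behavior.

```python
import re
from collections import defaultdict


def group_by_population(individual_ids):
    """Group IDs by their leading alphabetic characters (hypothetical helper)."""
    groups = defaultdict(list)
    for ind in individual_ids:
        match = re.match(r"[A-Za-z]+", ind)
        # IDs with no leading letters fall under the empty label (an assumption).
        label = match.group(0) if match else ""
        groups[label].append(ind)
    return dict(groups)


print(group_by_population(["CA23", "AA100", "CAU54", "CA7"]))
```

Note that CA and CAU are treated as distinct populations, because the whole run of leading letters is used as the label.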
| Job description: The user can use this space to place information describing the SLIDER job. The job description will be written in the body of the email containing the SLIDER results. Any text is acceptable. | ||||
| Single Population Statistics | ||||
| S (number of polymorphisms): S is the number of polymorphic sites that are contained within the window. Under SLIDER's default settings, S is the number of biallelic sites. | ||||
| π (nucleotide diversity): Because the number of unknown alleles typically differs from site to site within a window, SLIDER calculates the nucleotide diversity (π_j) separately for each site j, using only those chromosomes for which the allele is known. For a single polymorphic site with K different alleles and missing data, the nucleotide diversity is ![]() where x_ji is the number of chromosomes having allele a_i at site j (i = 1, ..., K). If there are n chromosomes in the sample and m of these have missing data at site j, then ∑_i x_ji = n − m. If the site is biallelic and has no missing data, the above formula for π_j reduces to
| ||||
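As an illustration of a per-site calculation that uses only known alleles, the sketch below applies the standard unbiased estimator π_j = (n_j / (n_j − 1)) × (1 − ∑_i (x_ji / n_j)²), with n_j = n − m. That this is the exact expression SLIDER uses is an assumption; the function names and the default missing symbols passed in are likewise chosen for the example.

```python
from collections import Counter

MISSING = frozenset({"?", "N", "n", "-"})


def site_diversity(alleles, missing=MISSING):
    """Per-site nucleotide diversity from the chromosomes with a known allele.

    Uses the standard unbiased estimator (assumed, not confirmed for SLIDER).
    """
    known = [a for a in alleles if a not in missing]
    n_known = len(known)                  # n - m in the notation above
    if n_known < 2:
        return 0.0                        # no pairwise comparison is possible
    counts = Counter(known)               # x_ji for each allele a_i
    homozygosity = sum((x / n_known) ** 2 for x in counts.values())
    return n_known / (n_known - 1) * (1.0 - homozygosity)


def window_diversity(sites, missing=MISSING):
    """Sum of per-site diversities over the sites in a window."""
    return sum(site_diversity(site, missing) for site in sites)


print(site_diversity(["A", "A", "T", "T"]))
print(site_diversity(["A", "A", "T", "?"]))
```

For a biallelic site with no missing data and x copies of one allele, this estimator equals 2x(n − x) / (n(n − 1)).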
| Variance of π: Under an infinite sites model and the assumption of random mating, E(π) = θ and
| ||||
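For reference, the standard result for the variance of π in a sample of n chromosomes, under the infinite sites model with random mating and no recombination (Tajima 1983), is shown below. It is given here on the assumption that this is the expression the entry refers to.

```latex
\operatorname{Var}(\pi) \;=\; \frac{n+1}{3(n-1)}\,\theta \;+\; \frac{2\,(n^{2}+n+3)}{9n(n-1)}\,\theta^{2}
```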
| Watterson's θW: With n chromosomes in the sample and S segregating sites in the window, θW is given by
where
| ||||
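A minimal sketch of the standard Watterson estimator, θW = S / a_n with a_n = 1 + 1/2 + ... + 1/(n − 1), is given below. The function names are chosen for this example. Whether SLIDER additionally scales the value by window length, or adjusts n for missing data, is not stated here and is not assumed.

```python
def harmonic_sum(n):
    """a_n = sum of 1/i for i = 1 .. n-1, for a sample of n chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(num_segregating, n):
    """Standard Watterson estimator for S segregating sites and n chromosomes."""
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    return num_segregating / harmonic_sum(n)


print(watterson_theta(11, 4))
```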
| Tajima's D: Using the above definitions of π and θW, Tajima's D is given by
| ||||
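The sketch below computes Tajima's D in its standard form (Tajima 1989), D = (π − θW) / sqrt(e1·S + e2·S·(S − 1)), where π is the window total (the sum of per-site diversities) and n is the number of chromosomes. The function name and the choice to return `None` when S = 0 are assumptions for this example; how SLIDER adjusts the calculation for missing data is not covered.

```python
import math


def tajimas_d(pi, num_segregating, n):
    """Tajima's D from window-total pi, S segregating sites, and n chromosomes."""
    S = num_segregating
    if S == 0 or n < 3:
        return None                       # D is undefined here (assumed handling)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    theta_w = S / a1                      # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


print(tajimas_d(3.888889, 16, 10))
```

A negative value indicates that π is smaller than θW, that is, an excess of low-frequency variants relative to neutral expectations.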
| Fu and Li's D*: | ||||
| Haplotype number: | ||||
| Haplotype diversity: | ||||
| Average divergence: | ||||
| Single Population Statistics w/Outgroup | ||||
| Fu and Li's D: | ||||
| Fay and Wu's H: | ||||
| Multiple Population Statistics | ||||
| Fst: | ||||
| Output | ||||
| After the submit button has been clicked, SLIDER will respond with either a confirmation page or an error page. If the input file is successfully processed and a valid e-mail address is used, the sliding window results will be returned in one or more (depending on the number of populations analyzed) e-mail attachments. A SLIDER run typically takes no more than a minute. SLIDER results files are in text format and tab-delimited, so that they can be easily imported into most spreadsheet programs. Each line of a SLIDER results file will show the window's begin position, the window's end position, and the window's size in base pairs, followed by the selected statistics for the window. | ||||
| Contact information: | ||||
| Send E-mail inquiries to: David Witonsky | ||||
| Supported by the National Institute of General Medical Sciences (GM61393-S1) | ||||