Pseudophased File Format
A polymorphism data file of this type with N individuals and M sites has the
following format:

    N_dips     M_sites
               pos_1       pos_2       pos_3       pos_4       ...  pos_M
    dip_id_1a  allele1a_1  allele1a_2  allele1a_3  allele1a_4  ...  allele1a_M
    dip_id_1b  allele1b_1  allele1b_2  allele1b_3  allele1b_4  ...  allele1b_M
    dip_id_2a  allele2a_1  allele2a_2  allele2a_3  allele2a_4  ...  allele2a_M
    dip_id_2b  allele2b_1  allele2b_2  allele2b_3  allele2b_4  ...  allele2b_M
    dip_id_3a  allele3a_1  allele3a_2  allele3a_3  allele3a_4  ...  allele3a_M
    dip_id_3b  allele3b_1  allele3b_2  allele3b_3  allele3b_4  ...  allele3b_M
    .          .           .           .           .           ...  .
    .          .           .           .           .           ...  .
    .          .           .           .           .           ...  .
    dip_id_Na  alleleNa_1  alleleNa_2  alleleNa_3  alleleNa_4  ...  alleleNa_M
    dip_id_Nb  alleleNb_1  alleleNb_2  alleleNb_3  alleleNb_4  ...  alleleNb_M
    anc_id     alleleA_1   alleleA_2   alleleA_3   alleleA_4   ...  alleleA_M

Format Notes
- The file must be plain text.
One easy way to produce it is to create the file in
MS Excel and save it as 'Text (Tab delimited)'.
- The file must begin with a line providing the number of diploids surveyed (N_dips)
and the number of sites for which data
will be included (M_sites). These numbers are separated by a whitespace character (space or tab).
- The next line must provide the positions of each site. The positions must be integer values,
and the number of positions must equal the value of M_sites. This line must begin
with a whitespace character (e.g., space or tab), and the site positions must be separated by whitespace
characters.
- Each pair of lines that follows represents the diploid genotype of one individual. The data
are coded as pseudohaplotypes, i.e., at each site, which of the individual's two alleles appears on
the first line and which on the second is arbitrary. Individual IDs can be any string of characters
that does not include whitespace. The number of alleles listed on each line must equal the number
of sites surveyed (M_sites). Alleles can be any string of characters that does not contain
whitespace.
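- An additional single line can be added after the last diploid to indicate the ancestral states of the alleles.
Like the other data lines, it consists of an ID (anc_id in the layout above) followed by M_sites alleles.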
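
The rules above can be checked mechanically. The following Python sketch is one possible reader for the format, not the parser used by any particular program. The names `PseudophasedData`, `parse_pseudophased`, and `EXAMPLE` are hypothetical, the sample data are invented for illustration, and the sketch assumes that blank lines can be ignored and that an ancestral line, when present, is the last line of the file.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Row = Tuple[str, List[str]]  # (ID, alleles), one entry per data line


@dataclass
class PseudophasedData:
    positions: List[int]
    haplotypes: List[Row]     # two consecutive rows per diploid
    ancestral: Optional[Row]  # None when no ancestral line is present


def parse_pseudophased(text: str) -> PseudophasedData:
    # Assumption: blank lines carry no information and are skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and a positions line")

    # Line 1: N_dips and M_sites, separated by whitespace.
    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("first line must contain N_dips and M_sites")
    n_dips, m_sites = int(header[0]), int(header[1])

    # Line 2: begins with whitespace, then M_sites integer positions.
    if not lines[1][:1].isspace():
        raise ValueError("positions line must begin with whitespace")
    positions = [int(tok) for tok in lines[1].split()]
    if len(positions) != m_sites:
        raise ValueError("number of positions must equal M_sites")

    # Remaining lines: an ID followed by M_sites alleles.
    rows: List[Row] = []
    for ln in lines[2:]:
        tokens = ln.split()
        row_id, alleles = tokens[0], tokens[1:]
        if len(alleles) != m_sites:
            raise ValueError(f"{row_id}: expected {m_sites} alleles")
        rows.append((row_id, alleles))

    # Two lines per diploid, plus an optional ancestral line at the end.
    if len(rows) == 2 * n_dips:
        ancestral = None
    elif len(rows) == 2 * n_dips + 1:
        ancestral = rows.pop()
    else:
        raise ValueError("data line count does not match N_dips")
    return PseudophasedData(positions, rows, ancestral)


# Invented sample: 2 diploids, 3 sites, tab-delimited, with an ancestral line.
EXAMPLE = (
    "2\t3\n"
    "\t101\t250\t377\n"
    "ind1a\tA\tG\tT\n"
    "ind1b\tA\tC\tT\n"
    "ind2a\tG\tG\tT\n"
    "ind2b\tA\tC\tC\n"
    "anc\tA\tG\tT\n"
)

data = parse_pseudophased(EXAMPLE)
```

Because alleles are kept as opaque strings, the sketch accepts any allele coding that contains no whitespace, in line with the notes above.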