Pseudophased File Format

A polymorphism data file in this format, describing N diploid individuals surveyed at M sites, has the following layout:

N_dips M_sites
  pos_1 pos_2 pos_3 pos_4 ... pos_M
dip_id_1a allele1a_1 allele1a_2 allele1a_3 allele1a_4 ... allele1a_M
dip_id_1b allele1b_1 allele1b_2 allele1b_3 allele1b_4 ... allele1b_M
dip_id_2a allele2a_1 allele2a_2 allele2a_3 allele2a_4 ... allele2a_M
dip_id_2b allele2b_1 allele2b_2 allele2b_3 allele2b_4 ... allele2b_M
dip_id_3a allele3a_1 allele3a_2 allele3a_3 allele3a_4 ... allele3a_M
dip_id_3b allele3b_1 allele3b_2 allele3b_3 allele3b_4 ... allele3b_M
. . . . . ...
. . . . . ...
. . . . . ...
dip_id_Na alleleNa_1 alleleNa_2 alleleNa_3 alleleNa_4 ... alleleNa_M
dip_id_Nb alleleNb_1 alleleNb_2 alleleNb_3 alleleNb_4 ... alleleNb_M
anc_id alleleA_1 alleleA_2 alleleA_3 alleleA_4 ... alleleA_M
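
The layout above can also be generated programmatically instead of by hand or in Excel. The following is a minimal Python sketch, not part of any existing tool: `write_pseudophased` and `Row` are hypothetical names, each diploid is passed as two explicit (ID, alleles) rows so that no particular ID convention is assumed, and tabs are used as separators.

```python
from typing import List, Optional, Sequence, Tuple

# One line of the file: (line ID, alleles at sites 1..M_sites).
Row = Tuple[str, Sequence[str]]


def write_pseudophased(
    positions: Sequence[int],
    diploids: Sequence[Tuple[Row, Row]],
    ancestral: Optional[Row] = None,
) -> str:
    """Render data in the pseudophased layout (hypothetical helper)."""
    m_sites = len(positions)
    for p in positions:
        if not isinstance(p, int):
            raise TypeError(f"site positions must be integers, got {p!r}")

    def render(row: Row) -> str:
        row_id, alleles = row
        # IDs and alleles must be non-empty and contain no whitespace.
        for token in [row_id, *alleles]:
            if not token or any(ch.isspace() for ch in token):
                raise ValueError(f"invalid token {token!r} in row {row_id!r}")
        if len(alleles) != m_sites:
            raise ValueError(
                f"row {row_id!r}: expected {m_sites} alleles, got {len(alleles)}"
            )
        return "\t".join([row_id, *alleles])

    lines: List[str] = [f"{len(diploids)}\t{m_sites}"]
    # The positions line must begin with a whitespace character.
    lines.append("\t" + "\t".join(str(p) for p in positions))
    for first, second in diploids:
        lines.append(render(first))
        lines.append(render(second))
    if ancestral is not None:
        lines.append(render(ancestral))
    return "\n".join(lines) + "\n"
```

Tabs match Excel's tab-delimited export; single spaces would work equally well, since any whitespace separates fields.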

Format Notes
  • The file must be plain text. An easy way to produce it is to build the table in MS Excel and save it as 'Text (Tab delimited)'.
  • The file must begin with a line providing the number of diploids surveyed (N_dips) and the number of sites for which data will be included (M_sites). These numbers are separated by a whitespace character (space or tab).
  • The second line must give the position of each site. Positions must be integer values, and the number of positions must equal M_sites. This line must begin with a whitespace character (e.g., space or tab), and the site positions must be separated by whitespace characters.
  • Each subsequent pair of lines gives the diploid genotype of one individual. The data are coded as pseudohaplotypes: at each site, which of the individual's two alleles appears on the first line and which on the second is arbitrary, so the two lines are not phased haplotypes. Individual IDs can be any string of characters that contains no whitespace. Each line must list exactly M_sites alleles, and alleles can likewise be any string of characters without whitespace.
  • An optional final line can be added to give the ancestral state at each site. Like the individual lines, it consists of an ID followed by M_sites alleles.
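
The rules above can be checked mechanically. Below is a minimal Python sketch of a reader and validator; `parse_pseudophased`, `PseudophasedData` and `unordered_genotypes` are hypothetical names, not part of any tool. It assumes that blank lines may be skipped and that a single extra line after the 2 × N_dips individual lines is the ancestral line; the notes do not say how the tool itself treats either case. The `unordered_genotypes` helper illustrates the pseudohaplotype rule: because allele placement between the two lines is arbitrary, only the unordered allele pair at each site is meaningful.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One line of the file: (ID, alleles at sites 1..M_sites).
Row = Tuple[str, List[str]]


@dataclass
class PseudophasedData:
    positions: List[int]
    diploids: List[Tuple[Row, Row]]  # two pseudohaplotype lines per individual
    ancestral: Optional[Row] = None


def parse_pseudophased(text: str) -> PseudophasedData:
    # Assumption: blank or whitespace-only lines are ignored.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("missing header line or positions line")

    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("first line must contain exactly N_dips and M_sites")
    n_dips, m_sites = int(header[0]), int(header[1])

    pos_line = lines[1]
    if not pos_line[0].isspace():
        raise ValueError("positions line must begin with whitespace")
    positions = [int(tok) for tok in pos_line.split()]  # integers only
    if len(positions) != m_sites:
        raise ValueError(f"expected {m_sites} positions, got {len(positions)}")

    rows: List[Row] = []
    for ln in lines[2:]:
        tokens = ln.split()
        if len(tokens) != m_sites + 1:
            raise ValueError(f"line {tokens[0]!r}: expected {m_sites} alleles")
        rows.append((tokens[0], tokens[1:]))

    n_rows = 2 * n_dips
    if len(rows) not in (n_rows, n_rows + 1):
        raise ValueError(f"expected {n_rows} individual lines (+1 ancestral)")
    diploids = [(rows[i], rows[i + 1]) for i in range(0, n_rows, 2)]
    # Assumption: one extra line after the individuals is the ancestral line.
    ancestral = rows[n_rows] if len(rows) == n_rows + 1 else None
    return PseudophasedData(positions, diploids, ancestral)


def unordered_genotypes(data: PseudophasedData) -> List[List[Tuple[str, ...]]]:
    """Per diploid, the genotype at each site as a sorted allele pair."""
    result = []
    for (_, hap_a), (_, hap_b) in data.diploids:
        result.append([tuple(sorted(pair)) for pair in zip(hap_a, hap_b)])
    return result
```

Because `str.splitlines` treats `\r\n` as a line break, files saved by Excel on Windows parse the same way as Unix-style files.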