Unphased File Format
IUPAC Coded

A polymorphism data file of this type with N diploid individuals and M sites has the following format:
 

N_dips M_sites         
  pos_1 pos_2  pos_3 pos_4 ...pos_M
dip_id_1 allele1_1  allele1_2 allele1_3 allele1_4 ...allele1_M
dip_id_2 allele2_1  allele2_2 allele2_3 allele2_4 ...allele2_M
dip_id_3 allele3_1  allele3_2 allele3_3 allele3_4 ...allele3_M
. .  . . . ....
. .  . . . ....
. .  . . . ....
dip_id_N alleleN_1  alleleN_2 alleleN_3 alleleN_4 ...alleleN_M
anc_id alleleA_1  alleleA_2 alleleA_3 alleleA_4 ...alleleA_M

Format Notes
  • This format is similar to Dick's file format, except that a single line represents each individual's genotype. An individual's genotype at a site is represented using the IUPAC-IUB codes for nucleotides:
    Symbol  Genotype
    A       A/A
    G       G/G
    C       C/C
    T       T/T
    R       A/G
    M       A/C
    W       A/T
    Y       C/T
    S       C/G
    K       G/T
  • An optional line of ancestral data (the anc_id line in the layout above) can be included after the last individual. The ancestral states should be represented by the symbols 'A', 'G', 'C', and 'T'.
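
The layout and symbol table above can be read with a short script. The following Python is only a minimal sketch, not a reference parser. It assumes that fields are separated by whitespace, that each site is a single symbol in its own field, that positions can be kept as strings, and that any symbol outside the table is an error (the format description does not say how missing data is coded). The names IUPAC_GENOTYPES and parse_unphased and the sample data are hypothetical.

```python
# Genotype encoded by each IUPAC-IUB symbol, as listed in the table above.
IUPAC_GENOTYPES = {
    "A": ("A", "A"), "G": ("G", "G"), "C": ("C", "C"), "T": ("T", "T"),
    "R": ("A", "G"), "M": ("A", "C"), "W": ("A", "T"),
    "Y": ("C", "T"), "S": ("C", "G"), "K": ("G", "T"),
}


def parse_unphased(text):
    """Parse an IUPAC-coded unphased file held in a string.

    Returns (positions, genotypes, ancestral). Assumes whitespace-separated
    fields with one symbol per site (an assumption, not stated in the spec).
    """
    # Split every non-blank line into whitespace-separated fields.
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_dips, n_sites = int(lines[0][0]), int(lines[0][1])

    positions = lines[1]
    if len(positions) != n_sites:
        raise ValueError("expected %d positions, found %d" % (n_sites, len(positions)))

    genotypes = {}
    for fields in lines[2:2 + n_dips]:
        dip_id, symbols = fields[0], fields[1:]
        if len(symbols) != n_sites:
            raise ValueError("%s: expected %d sites" % (dip_id, n_sites))
        for s in symbols:
            if s not in IUPAC_GENOTYPES:
                raise ValueError("%s: unknown symbol %r" % (dip_id, s))
        genotypes[dip_id] = [IUPAC_GENOTYPES[s] for s in symbols]
    if len(genotypes) != n_dips:
        raise ValueError("expected %d distinct individuals" % n_dips)

    # Optional ancestral line: plain A, G, C, T only.
    ancestral = None
    rest = lines[2 + n_dips:]
    if rest:
        anc_id, states = rest[0][0], rest[0][1:]
        if len(states) != n_sites or any(s not in "AGCT" for s in states):
            raise ValueError("ancestral line must hold %d symbols from A, G, C, T" % n_sites)
        ancestral = (anc_id, states)

    return positions, genotypes, ancestral


# Hypothetical two-individual, three-site file with an ancestral line.
sample = """2 3
  10 25 40
ind1 A R T
ind2 G Y K
anc A C T
"""
positions, genotypes, ancestral = parse_unphased(sample)
print(genotypes["ind1"])  # [('A', 'A'), ('A', 'G'), ('T', 'T')]
```

Each genotype is returned as a pair of alleles in the order given by the table. Because the data is unphased, that order carries no information about which chromosome an allele sits on.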