Prephappy

[Download]
[What it does]
[Examples]
[How to install it]
[Syntax]
[Options]

What it does

Prephappy is a tool that helps prepare input files for the haplotype reconstruction program HAPPY. It is a Perl program invoked from the command line. Given a set of alleles files and corresponding ped files (e.g., chr1.alleles, chr2.alleles, chr1.ped, chr2.ped), it will:

  • Check that the numbers of markers in the ped and alleles files match.
  • Check all individuals have the same number of markers.
  • Check that the alleles reported for each marker in the ped are consistent with those allowed by the alleles file.
  • Check for zero cM distances between markers (understandably not tolerated by HAPPY).
  • Reorder markers by their cM position within each chromosome.
  • Convert the deprecated ND values for missing data to NA.
  • Rename markers to R-formula-friendly strings, replacing each of the characters “()-*/^” with “.”.
  • Combine multiple files that specify different markers for the same chromosome.
  • Remove specified markers from the data.
  • Pad with NA the genotypes of subjects that have missing chromosomes or missing markers.
  • Overwrite existing cM distances in the alleles files with new cM distances specified in a separate table.
  • Incorporate a model of missing genotype data into the happy alleles file (assuming this is not already included).
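
To make a few of these steps concrete, here is a minimal shell sketch of the marker renaming, the reordering by cM, and the zero-distance check. It is not Prephappy's own code: the function names are hypothetical, and the input is assumed to be a simplified two-column listing (marker name, cM position) rather than a real alleles file.

```shell
# ASSUMPTION: input is one "marker cM" pair per line, separated by whitespace.
# This is an illustrative format, not the HAPPY alleles format.

# Replace each of the characters ( ) - * / ^ in the marker name with "."
sanitize_marker_names() {
    awk '{ gsub("[()*/^-]", ".", $1); print }'
}

# Order the markers by cM position (numeric sort on column 2)
sort_by_cm() {
    LC_ALL=C sort -k2,2n
}

# Given input already sorted by cM, print each adjacent pair of markers
# that share the same position (a zero cM distance)
report_zero_distances() {
    awk 'NR > 1 && ($2 + 0) == prev_cm { print prev_name, $1 }
         { prev_name = $1; prev_cm = $2 + 0 }'
}

# Usage:
#   printf 'rs2(b) 5.0\nrs1-a 1.5\nrs3 5\n' \
#       | sanitize_marker_names | sort_by_cm | report_zero_distances
#   # prints: rs2.b. rs3
```

The positions are compared as numbers, so 5 and 5.0 count as the same position even though they are written differently.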

After these checks and corrections, Prephappy will write a set of “cleaned-up” files to a user-specified directory (by default a new subdirectory PREPHAPPY_GENOTYPES/).

Small print: Prephappy does not aspire to be interesting, merely useful. That said, Prephappy comes with absolutely no guarantees and users run and rely on it at their own risk.

Examples

prephappy.pl --alleles chr1.alleles --ped chr1.ped
Check consistency and parity of chr1.alleles and chr1.ped, and write cleaned-up versions to a new subdirectory PREPHAPPY_GENOTYPES/.
prephappy.pl --alleles chr1.alleles --ped chr1.ped --skipmarkers 'rs314321,rs489233'
As above, but skip markers rs314321 and rs489233 in the checking process and omit them from the cleaned-up files.
prephappy.pl --alleles old.chr1.alleles,old.chr2.alleles --ped old.chr1.ped,old.chr2.ped
Process files for two chromosomes. There should be no spaces between the filenames.
prephappy.pl --alleles "old.*.alleles" --ped "old.*.ped"
Process all files matching the patterns old.*.alleles and old.*.ped. Prephappy assumes that the files matched by the two patterns are in the same order, so that each alleles file lines up with its ped file.
prephappy.pl --alleles="old.*.alleles",new.chr3.alleles --ped="old.*.ped",new.chr3.ped
As above, but also combine in the data from files new.chr3.alleles and new.chr3.ped.
prephappy.pl --alleles chr1.alleles --ped chr1.ped --map my.map,my_added.map --mapfile_cM_column my_cM_pos
Process files using cM positions defined in my.map and my_added.map (under the column heading my_cM_pos) in place of those in the alleles files.
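
A map file for the last example could be checked before running Prephappy along the following lines. This is a minimal sketch, not part of Prephappy: the function name check_map_header is hypothetical, and it assumes the headings appear unquoted on the first line of a whitespace-delimited file, as described under the --map option.

```shell
# Check that a map file's header line has the columns "marker", "chr", "bp"
# and the named cM column. Prints "ok" or a list of the missing columns.
# ASSUMPTION: headings are unquoted and sit on the first line of the file.
check_map_header() {
    map_file=$1
    cm_column=$2
    head -n 1 "$map_file" | awk -v cm="$cm_column" '
        {
            # Record every heading present on the header line
            for (i = 1; i <= NF; i++) seen[$i] = 1
            n = split("marker chr bp " cm, required, " ")
            missing = ""
            for (j = 1; j <= n; j++)
                if (!(required[j] in seen)) missing = missing " " required[j]
            if (missing != "") { print "missing columns:" missing; exit 1 }
            print "ok"
        }'
}

# Usage, with the file and column names from the example above:
#   check_map_header my.map my_cM_pos
```

Checking the header first is cheap and catches a mistyped column name before any genotype files are processed.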

How to install it

These instructions assume that you are installing Prephappy on Linux or Darwin/MacOSX, that you already have a working installation of Perl v5.10.0 or higher, and that you are familiar with the UNIX command line or are in a position to ask favors from someone who is. It may be possible to run Prephappy on Windows or other operating systems, but the author has not had occasion to try this.

  1. Download the latest prephappy.tgz file from here and put it in the directory you would like to run it from (called dir in the steps below).
  2. Open a UNIX shell and go to dir.
  3. Unzip and unarchive by typing tar -zxf prephappy.tgz
  4. Set an environment variable PREPHAPPY_LIBS to dir/prephappy/libs/. For example, in your .bash_profile add the line export PREPHAPPY_LIBS=dir/prephappy/libs/.
  5. Set an alias to the prephappy program so you can run it from anywhere. For example, in your .bash_profile add the line alias prephappy=dir/prephappy/prephappy.pl.
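
Taken together, steps 4 and 5 might look like this in a .bash_profile. This is only a sketch: PREPHAPPY_DIR is a hypothetical helper variable, and $HOME/tools is an assumed location standing in for dir.

```shell
# ASSUMPTION: prephappy.tgz was unpacked in $HOME/tools (this plays the role of "dir").
# PREPHAPPY_DIR is a hypothetical helper variable, not something Prephappy reads.
PREPHAPPY_DIR="$HOME/tools"

# Step 4: tell Prephappy where its libraries live
export PREPHAPPY_LIBS="$PREPHAPPY_DIR/prephappy/libs/"

# Step 5: make the program runnable from any directory
alias prephappy="$PREPHAPPY_DIR/prephappy/prephappy.pl"
```

After opening a new shell (or sourcing the profile), typing prephappy should run the script from any directory.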

Syntax

prephappy --alleles=chr1.alleles,chr2.alleles --ped=chr1.ped,chr2.ped
        [ --add_noise_prob 0 ]
        [ --founder_probs string ]
        [ --map string ]
        [ --mapfile_cM_column string ]
        [ --outdir PREPHAPPY_GENOTYPES/ ]
        [ --mismatch2na 0 ]
        [ --padsubjects 0 ]
        [ --ped_delimiter '\s+' ]
        [ --skipmarkers string ]

Options

--add_noise_prob
A model of genotyping error. Specifies the probability $\theta$ that, at each
locus, the allele call is a random draw from the set of available allele
types rather than a draw from p(allele|founder). This models genotyping error such
that, in a population of equiprobable founders with biallelic genotypes, the
rate of miscalled genotypes is approximately $\theta$ (more precisely,
$\theta - \theta^2/4$). If founders are not equiprobable, this option also requires --founder_probs.
--founder_probs
Optionally specify prior founder probabilities as a comma-separated list.
Only relevant for some options (e.g., --add_noise_prob). Default is
equiprobable founders.
--alleles
Alleles files, specified as a comma-separated list and/or as file patterns (see Examples).
--map
Specifies one or more map files to be included in the checking process. The files should be whitespace-delimited (e.g., tab-delimited) and have columns under the headings “marker”, “chr”, and “bp”; they may also contain other columns.
--mapfile_cM_column
Specifies the column in the map file that holds the centiMorgan position; those cM positions are then used in place of the ones in the alleles file(s).
--mismatches2na=1
Warns of inconsistent alleles and replaces them with NA.
--padsubjects=1
Sets the genotypes of missing subjects to NA.
--ped_delimiter
Specifies the pattern that separates columns in the ped file (e.g., '\s' or '\t').
--skipmarkers
Skips the markers given as a comma-separated list (e.g., 'rs314321,rs489233')
or listed in a named file containing a whitespace-separated list.
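
The correction term quoted under --add_noise_prob can be recovered under one reading of the model. What follows is a sketch under stated assumptions, not a derivation taken from the Prephappy source. Assume a biallelic marker, that a random draw picks each of the two allele types with probability 1/2 (so a noisy call is wrong half the time), and that the two allele calls making up a genotype are affected independently. Each allele is then miscalled with probability $\theta/2$, and a genotype is miscalled when at least one of its alleles is:

```latex
\begin{align*}
P(\text{allele miscalled})   &= \frac{\theta}{2}, \\
P(\text{genotype miscalled}) &= 1 - \left(1 - \frac{\theta}{2}\right)^{2}
                              = \theta - \frac{\theta^{2}}{4}.
\end{align*}
```

For example, $\theta = 0.1$ gives $0.1 - 0.0025 = 0.0975$, which is close to $\theta$ itself.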