BayesDiallel is a Gibbs-sampler program designed to fit, image, and give confidence intervals for intricately modeled inheritance in diallel data.
Contents
- About BayesDiallel
- Some current limitations
- Download current version
- Requirements and installation
- Input files
- Statistical “significance”
- To do list
- References
About BayesDiallel
Facts about BayesDiallel:
- BayesDiallel is an R package utility suite.
- BayesDiallel is used for modelling the inheritance structure of F1 crosses from distinct strains.
- BayesDiallel assumes a priori that strain effects are independent and Gaussian, though empirical Bayes theory justifies this shrinkage even for populations selected differently.
- BayesDiallel intends to capture additive, maternal, inbreeding, and combination-specific effects, as well as sex interaction with all effects.
- BayesDiallel supports additional fixed effects (treatment, diet, etc.) and random effects (cage, season, technician…) as additional model variables.
- The models and methods of BayesDiallel are described in more detail in Lenarcic et al (2012) and Crowley et al (2014).
For a brief tutorial see the vignette BayesDiallel_Users_Vignette. For constructive comments and suggestions, email the authors Alan Lenarcic or William Valdar.
Some current limitations
BayesDiallel is limited to Bayesian and H-likelihood methods for fitting structured likelihoods.
Gibbs sampling produces a set of random draws from the Bayesian posterior which can be used in inference for many hypotheses on a dataset, though processing of MCMC “coda”-package objects remains difficult. Many utilities are included in BayesDiallel, including plotting, saving, summary, posterior $H^2$ estimation etc. to aid first time Gibbs sampler users in using these MCMC chains.
Diallel data do not inherently point to a genetic locus unless strains are chosen specifically to target fine loci. Diallel analysis is therefore useful first for testing first-stage F1 “macro-genetic” effects that are potentially epigenetic or multi-locus, and for establishing heritability of a phenotype between distinguishable sub-populations. When a locus is targeted and studied carefully, F1 choices that carefully establish the presence of that locus can be used to confirm that F1 combinations of two or more strains inherit through additive, dominance, sex-specific, etc. effects at that locus.
BayesDiallel currently requires package “R.oo” object structures, which are being deprecated by the R-foundation in favor of “R5” reference classes. The implementation of these classes (R environments) is very similar to “R.oo”, but the conversion from R.oo will require a heavy rewrite. These “pass-by-reference” object structures are necessary for the many object-oriented utilities available for BayesDiallel. However, R users accustomed to S3/S4 classes will need some time to use and understand fully object-oriented pass-by-reference objects. As R-memory is never a completely stable system, object bugs and quirks are to be expected. Although a “Copy()” cast should always produce an independent copy of an object, saving an object to two names has unstable results in terms of which data members are actually shared between the two aliases.
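The difference between an alias and an independent copy can be illustrated with plain R environments, which have the same pass-by-reference semantics. This is a minimal sketch in base R only: it does not use R.oo or any BayesDiallel object, the names `chain_a`, `chain_b`, `chain_c`, and `copy_env` are hypothetical, and real R.oo objects may behave differently.

```r
# Base-R illustration of pass-by-reference semantics using environments.
# Nothing here is part of R.oo or BayesDiallel; all names are hypothetical.

chain_a <- new.env()
chain_a$n_draws <- 100

# Assigning an environment to a second name creates an alias, not a copy.
chain_b <- chain_a
chain_b$n_draws <- 500
alias_value <- chain_a$n_draws   # the change made via chain_b is visible here

# An explicit (shallow) copy produces an independent environment.
copy_env <- function(env) {
  list2env(as.list(env, all.names = TRUE), envir = new.env())
}
chain_c <- copy_env(chain_a)
chain_c$n_draws <- 1
copy_value <- chain_a$n_draws    # chain_a is unaffected by changes to chain_c
```

With S3/S4 objects the assignment `chain_b <- chain_a` would behave like a copy; with reference-style objects it does not, which is why an explicit copy step matters.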
Download current version
The BayesDiallel R package is available in several different zipped formats. Click on the required format to download. We also provide an accompanying package BayesSpike, which is required for calculating model inclusion probabilities and Bayes factors (as in Figures 2 and 5 of Lenarcic et al, 2012), but is unnecessary for estimating effects, heritability, etc.
- BayesDiallel (last update 2016-10-24): [BayesDiallel_0.982.tar.gz] [BayesDiallel_0.982.zip]
- BayesSpike (last update 2016-10-24): [BayesSpike_0.500.tar.gz] [BayesSpike_0.500.zip]
Requirements and Installation
Installing R package BayesDiallel
The R package R/BayesDiallel currently requires R packages “R.oo”, “coda”, and “corpcor”,
as well as the R software. Under Mac OS X, it may also require the GNU Fortran compiler, which is available from the tools subdirectory of the R for Mac downloads site.
The pending companion package “R/BayesSpike” is necessary for Gibbs selection functions, but is currently in development for publication. Contact Valdarlab for details.
Installation is supported through the R CMD INSTALL terminal interface, outside of the RGui.
Please attempt to install “R/BayesSpike” using the directions below, using a Unix/Mac Terminal or Windows Command Prompt and the R CMD INSTALL method, before contacting Valdarlab about installation problems with unsupported methods.
Terminal installation directions follow:
- Download the latest version of BayesDiallel from the download section.
- Unix users (and all users with full R-install capabilities, that is, RTools/RFortran installed on Windows/Mac machines): open a UNIX shell and go to the directory containing the tar.gz file.
- Unix users: type
R CMD INSTALL --clean BayesDiallel_XXX.tar.gz
for XXX the current package version.
- Windows users should open a Command Prompt window, and can download the pre-compiled “zip” file and run
R CMD INSTALL BayesDiallel_XXX.zip
for XXX the current package version.
  - The Command Prompt can be launched through the Windows Start menu, by default in the Accessories folder.
- Mac OS X users should open a Terminal window, and can download the pre-compiled “tgz” file and run
R CMD INSTALL BayesDiallel_XXX.tgz
for XXX the current package version.
Note: Install “R.oo”, “corpcor”, and “coda” inside R first, using
install.packages("R.oo"); install.packages("coda"); install.packages("corpcor")
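The dependency step in the note above can also be scripted so that only missing packages are installed. This is a minimal sketch under stated assumptions: `missing_packages` is a hypothetical helper, not part of BayesDiallel, and the `interactive()` guard is an illustrative choice that stops a batch script from triggering a download.

```r
# Install the three CRAN dependencies named above only when they are missing.
# `missing_packages` is a hypothetical helper, not part of BayesDiallel.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) when a package is not installed.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

deps <- c("R.oo", "coda", "corpcor")
to_install <- missing_packages(deps)

# Only download in an interactive session; remove the guard if unattended
# installation is wanted.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

After this runs, the `R CMD INSTALL` step for BayesDiallel itself is still performed from the terminal as described above.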
After Installation
Once installed, the BayesDiallel users vignette (here) is helpful as an introduction. Also, typing at the R prompt
help(BayesDiallel)
provides entry into R-package-style documentation for the many functions available in BayesDiallel. The chief R function is DiallelAnalyzer(), help for which is available in R by typing
help(DiallelAnalyzer)
- Once in R,
help(PiximusData)
describes the example Jackson Labs Collaborative Cross founders data and its structure.
- Help on the models file is contained in
help(ModelsFile)
Input files
Input requires phenotype files, which can be either data.frame matrices or .txt/.csv files giving the parentage (mother.strain/father.strain) of each F1 test subject, as well as quantitative phenotype data (there are only limited utilities for censored data), sex data (for partially censored sex, some Bayesian imputation is possible), and all additional user-supplied fixed/random effects. A “ModelsFile” should also be supplied, as this will give an example of each of the diallel inheritance structure models one wants BayesDiallel::DiallelAnalyzer to consider. The complete model “alls” (that is, all inheritance methods, interacted with sex) is the default. BayesDiallel will fit many models at once and store them together within the output object. A “PriorsFile” is required to tweak priors.
Phenotype file
The phenotype file should be a tab-delimited or csv file whose columns list mother.strain/father.strain, sex, and phenotype information for every F1 test subject. The included example “Piximus.data” appears as:
mother.strain.name  father.strain.name  is.female  mouse.name     PctFat  BoneMineralContent  BoneMineralDensity  LeanTissueMass  MouseLength.NoseToAnus  MouseWeight  TotalTissueMass
129                 129                 TRUE       CCF1F01_92297  13.91   0.31                0.07                8               21.89                   21.89        9.71
129                 129                 TRUE       CCF1F02_92298  19.88   0.26                0.07                7.1             24.33                   24.33        9.63
129                 129                 TRUE       CCF1F03_92299  19.56   0.27                0.07                7.95            22.25                   22.25        10.54
129                 129                 TRUE       CCF1F04_92300  23.81   0.25                0.06                7.65            22.57                   22.57        10.5
129                 AJ                  TRUE       CAF1F01_87462  14.8    0.29                0.06                8.4             22.91                   22.91        10.42
129                 AJ                  TRUE       CAF1F05_89443  18.2    0.31                0.08                9.05            25.08                   25.08        11.54
B6                  B6                  FALSE      BBF1M04_93779  15.28   0.27                0.06                11.2            29.09                   29.09        13.59
B6                  B6                  FALSE      BBF1M05_93780  19.9    0.27                0.06                11.7            31.81                   31.81        14.88
B6                  CAST                TRUE       BFF1F01_92287  14.55   0.28                0.07                7.8             20.83                   20.83        9.53
.......................
Here the “is.female” column represents the sex of the subject as a TRUE/FALSE variable. The first two columns are the mother.strain and the father.strain, respectively. The unique strains examined here are colloquial codes for the Collaborative Cross founders (“129S1”, “AJ”, “C57/JB6”, “NOD”, “NZO”, “PWK”, “CAST”, “WSB”). Please use one unique string for each strain. Phenotype measurements are quantitative. Missing phenotypes will be dropped from analysis, though for left/right censored data some phenotype imputation is described in help(SetupMissingYIndices). Phenotypes should be pre-transformed to the desired scale.
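Before analysis it can help to read the phenotype file and check it against the layout described above. The following is a minimal sketch in base R only. The checks and variable names are illustrative assumptions rather than BayesDiallel requirements, the table is a four-row, one-phenotype excerpt of the example, and the NA in the last row is inserted purely to illustrate a missing phenotype.

```r
# Read a small phenotype table shaped like the Piximus example and run basic
# sanity checks. Base R only; names and checks are illustrative assumptions.
# The NA in the last row is inserted here to show a missing phenotype.
pheno_text <- "mother.strain.name\tfather.strain.name\tis.female\tmouse.name\tPctFat
129\t129\tTRUE\tCCF1F01_92297\t13.91
129\tAJ\tTRUE\tCAF1F01_87462\t14.8
B6\tB6\tFALSE\tBBF1M04_93779\t15.28
B6\tCAST\tTRUE\tBFF1F01_92287\tNA"

# Reading strain columns as character keeps codes such as 129 as strings.
pheno <- read.delim(
  text = pheno_text,
  colClasses = c("character", "character", "logical", "character", "numeric"),
  stringsAsFactors = FALSE
)

# The parentage and sex columns must be present.
required <- c("mother.strain.name", "father.strain.name", "is.female")
stopifnot(all(required %in% names(pheno)))

# One unique string per strain: collect the codes used on either side.
strains <- sort(unique(c(pheno$mother.strain.name, pheno$father.strain.name)))

# Rows with a missing phenotype would be dropped from the analysis.
complete <- pheno[!is.na(pheno$PctFat), ]

# Cross-tabulate mother by father to see which diallel cells are filled.
cells <- table(
  mother = factor(complete$mother.strain.name, levels = strains),
  father = factor(complete$father.strain.name, levels = strains)
)
```

For a file on disk, the `text =` argument would be replaced by the file path; empty cells in `cells` show which crosses have no usable observations for that phenotype.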
Models File
A models file should be a multi-line text file where several string choices communicate to DiallelAnalyzer() which inbreeding/cross-specific effects to include in the model. All of the different models described in the Models File will be fit in order by the DiallelAnalyzer() call, resulting in a FullDiallelAnalyze object containing each model’s information in an AllDiallelObs R-list element. An example structure of a models file follows:
fulls
fullu
fulls, df_6
BSabm
BSasbsms
B,v,w,w_s,b,m,ms,mu
Mu, strain, gender, gender-mother, gender-inbred, symmetric-cross, inbred
Mu, inbred, inbred-strain, strain, gender-inbred
Here “fulls” denotes a full sexed model, and “fullu” refers to a full model with no sex-specific differentiation. The suffix “, df_#” refers to using a long-tailed Student’s t-distributed noise structure with degrees of freedom encoded by “#”.
Further information on coding submodels is located within the Vignette.
Priors File
A Priors File should be a multi-line matrix file communicating desired prior distribution values for the $\tau^2$ group dispersion values of the many model groups. Typical diffuse prior information of strength 0.2 degrees of freedom has weak influence on the fitted model if the number of strains compared is 4 or more.
      aj   dominancej  genderj  symcrossjk  asymcrossjkDkj  motherj
m     0.2  0.2         0.2      0.2         0.2             0.2
df    0.2  0.2         0.2      0.2         0.2             0.2
Prob  0.5  0.5         0.5      0.5         0.5             0.5
Here row “m” refers to the assumed prior mean for this $\tau^2$ parameter, and “df” gives the degrees of freedom of an inverse-chi-squared prior. “Prob”, which is only used with BayesSpike, is an additional optional row that gives prior probabilities of activation, that is, a score in [0,1] denoting how likely one believes it is that a set of parameters belongs among the non-zero coefficients of the whole model.
Statistical “significance”
BayesDiallel embodies a Bayesian statistical approach and is not designed to generate p-values, as would be found in frequentist-style significance testing. Nonetheless, as a summary measure it can be useful to state how much posterior estimates of interesting parameters diverge from zero, specifically, how much weight in the posterior is concentrated away from zero. Operationally, the posterior credibility intervals may be considered analogous (but not identical) to confidence intervals. When 95% credibility intervals do not intersect, one can establish that two competing strain effects are noticeably different. The Gibbs paradigm suggests that confidence for any statistic of the posterior, calculable upon multivariate draws from the posterior distribution, can be estimated by investigating the distribution of that test statistic in the Gibbs samples. If 95% of the posterior samples are out of reach of the null hypothesis, one can reject that null hypothesis at a 5% credibility level.
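The summaries described above can be computed directly from a vector of posterior draws. This is a minimal sketch in base R under stated assumptions: the draws are simulated stand-ins for Gibbs samples of two hypothetical strain effects, the variable names are illustrative, and no BayesDiallel function is called.

```r
# Summarise how far a posterior sits from zero using Gibbs-style draws.
# The draws are simulated stand-ins; in practice they would come from a chain.
set.seed(1)
draws_a <- rnorm(5000, mean = 1.0, sd = 0.3)   # hypothetical strain effect A
draws_b <- rnorm(5000, mean = 0.2, sd = 0.3)   # hypothetical strain effect B

# Equal-tailed 95% credibility interval for effect A.
ci_a <- quantile(draws_a, probs = c(0.025, 0.975))

# Posterior weight on the positive side of zero for effect A.
p_positive_a <- mean(draws_a > 0)

# Any statistic of the draws has its own posterior: here the contrast A - B,
# computed draw by draw.
contrast <- draws_a - draws_b
ci_contrast <- quantile(contrast, probs = c(0.025, 0.975))
p_a_exceeds_b <- mean(contrast > 0)
```

Working with the draw-by-draw contrast, rather than comparing two separate intervals, keeps any posterior correlation between the two effects in the summary when the draws come from the same chain.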
Although credibility serves as an alternate measure to true “reject-the-null confidence”, for flat/close-to-flat unbiased priors, the credibility intervals demonstrate efficient but robust error widths suitable for decision making, matching good credibility issues in settings where credibility can be established. In fact, Bayes posterior decision theory made with careful understanding of the loss-function, even in multiple-model situations where a null model and thus credibility is difficult to establish, is a straightforward and well-established method of choosing a decision that replicably minimizes expected loss over many experiments.
To do list
Suggestions to Alan Lenarcic or William Valdar. Here is a current and incomplete list (in no particular order):
- Hierarchical matching model for treated/control pre/post studies.
- Multivariate phenotypes
References
Lenarcic A, Svenson K, Churchill GA, Valdar W (2012) A general Bayesian approach to analyzing diallel crosses of inbred strains. Genetics 190:413-435.
Crowley JJ*, Kim Y*, Lenarcic AB*, Quackenbush CR, Barrick C, Adkins DE, Shaw GS, Miller DR, Pardo Manuel de Villena F, Sullivan PF, Valdar W (2014) Genetics of adverse reactions to haloperidol in a mouse diallel: A drug-placebo experiment and Bayesian causal analysis. Genetics 196(1):321-47.