# BayesDiallel

BayesDiallel is a Gibbs-sampler program designed to fit, visualize, and give confidence intervals for intricately modeled inheritance in diallel data.

## Contents

1. BayesDiallel is an R package and utility suite.
2. BayesDiallel models the inheritance structure of F1 crosses between distinct strains.
3. BayesDiallel assumes a priori independent Gaussian distributions for strain effects; Empirical Bayes theory justifies this shrinkage even for populations selected differently.
4. BayesDiallel aims to capture additive, maternal, inbreeding, and combination-specific effects,
as well as the interaction of sex with all of these effects.
5. BayesDiallel supports additional fixed effects (treatment, diet, etc.) and random effects
(cage, season, technician, etc.) as additional model variables.
6. The models and methods of BayesDiallel are described in more detail in Lenarcic et al. (2012) and Crowley et al. (2014).

For a brief tutorial see the vignette BayesDiallel_Users_Vignette. For constructive comments and suggestions, email the authors Alan Lenarcic or William Valdar.

## Some current limitations

BayesDiallel is limited to Bayesian and H-likelihood methods for fitting structured likelihoods.
Gibbs sampling produces a set of random draws from the Bayesian posterior, which can be used for inference on many hypotheses about a dataset, though processing MCMC “coda”-package objects remains difficult. BayesDiallel includes many utilities (plotting, saving, summary, posterior $H^2$ estimation, etc.) to help first-time Gibbs sampler users work with these MCMC chains.

Diallel data do not inherently identify a genetic locus unless strains are chosen specifically to target particular loci. Diallel analysis is therefore most useful as a first-stage test of F1 “macro-genetic” effects that are potentially epigenetic or multi-locus, and for establishing heritability of a phenotype between distinguishable sub-populations. When a locus is targeted and the F1 crosses are chosen carefully to establish its presence, the analysis can be used to confirm that F1 combinations of two or more strains inherit through additive, dominance, sex-specific, etc. effects at that locus.

BayesDiallel currently requires package “R.oo” object structures, which are being deprecated by
the R-foundation in favor of “R5” reference classes. The R5 implementation (R environments)
is very similar to that of “R.oo”, but converting from R.oo will require a heavy rewrite. These
“pass-by-reference” object structures are necessary for the many object-oriented utilities
available in BayesDiallel. However, R users accustomed to S3/S4 classes will need some time to get used to fully object-oriented, pass-by-reference objects. As R memory is never a completely stable system, object bugs and quirks are to be expected. Although a “Copy()” cast should always produce an independent copy of an object, assigning an object to two names has unstable results in terms of which data members are actually shared between the two aliases.
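To make the aliasing concern concrete, the following base-R sketch uses a plain environment as a stand-in for a pass-by-reference object. It does not use R.oo or any BayesDiallel object; the names `obj`, `alias`, and `copy_env` are hypothetical and only illustrate how reference-style objects generally behave in R.

```r
# Ordinary R values are copied when modified through a second name.
v1 <- c(a = 1)
v2 <- v1
v2["a"] <- 99
value_semantics_ok <- v1[["a"]] == 1       # v1 is untouched

# Environments are NOT copied on assignment: both names point at one object.
obj <- new.env()
obj$chains <- 1:3
alias <- obj
alias$chains <- 4:6
shared_after_alias <- identical(obj$chains, 4:6)    # change is visible via obj

# Hypothetical helper: an explicit (shallow) copy into a fresh environment.
copy_env <- function(e) {
  list2env(as.list(e, all.names = TRUE), envir = new.env())
}
independent <- copy_env(obj)
independent$chains <- 7:9
unchanged_after_copy <- identical(obj$chains, 4:6)  # obj keeps its own data
```

The practical upshot is the one stated above: assigning a reference object to a second name creates an alias, so an explicit copy is the safer choice when two independent objects are wanted.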

The BayesDiallel R package is available in several different zipped formats. Click on the required format to download. We also provide an accompanying package BayesSpike, which is required for calculating model inclusion probabilities and Bayes factors (as in Figures 2 and 5 of Lenarcic et al, 2012), but is unnecessary for estimating effects, heritability, etc.

## Requirements and Installation

### Installing R package BayesDiallel

The R package R/BayesDiallel currently requires R packages “R.oo”, “coda”, and “corpcor”,
as well as the R software. Under Mac OS X, it may also require the GNU Fortran compiler, which is available from the tools subdirectory of the R for Mac downloads site.

Pending companion package “R/BayesSpike” is necessary for Gibbs selection functions,
but is currently in development for publication. Contact Valdarlab for details.

Installation is supported through the `R CMD INSTALL` terminal interface, outside of the RGui.
Please attempt to install the package using the directions below, from a Unix/Mac Terminal or Windows Command Prompt with `R CMD INSTALL`, before contacting Valdarlab
about installation problems with unsupported methods.

Terminal installation directions follow:

1. Unix users (and all Windows/Mac users with full R build tools, RTools/RFortran, installed): open a UNIX shell and go to the directory containing the tar.gz file.
2. Unix users: type `R CMD INSTALL --clean BayesDiallel_XXX.tar.gz`, where XXX is the current package version.
3. Windows users: open a Command Prompt window, download the pre-compiled “zip” file, and run `R CMD INSTALL BayesDiallel_XXX.zip`, where XXX is the current package version.
    1. The Command Prompt can be launched through the Windows Start menu; by default it is in the Accessories folder.
4. Mac OS X users: open a Terminal window, download the pre-compiled “tgz” file, and run `R CMD INSTALL BayesDiallel_XXX.tgz`, where XXX is the current package version.

Note: Install “R.oo”, “corpcor”, and “coda” inside R first, using install.packages("R.oo"); install.packages("coda"); install.packages("corpcor").
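Before running `R CMD INSTALL`, it can help to check from within R which of the three required packages are still missing. This is a minimal sketch using only base R; the variable names are illustrative, and the install call is left commented out so that nothing is downloaded unintentionally.

```r
# Packages that BayesDiallel depends on, as listed above.
required <- c("R.oo", "coda", "corpcor")

# Compare against what is already installed in the current library paths.
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```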

### After Installation

Once the package is installed, the BayesDiallel users vignette is a helpful introduction. Typing `help(BayesDiallel)` at the R prompt also provides entry into the R-package-style documentation for the many functions available in BayesDiallel.

The chief R function is `DiallelAnalyzer()`; help for it is available in R by typing
`help(DiallelAnalyzer)`.

1. Once in R, `help(PiximusData)` describes the example Jackson Labs Collaborative Cross founders data and its structure.
2. Help on the models file is contained in `help(ModelsFile)`.

## Input files

Input requires phenotype data, which can be either data.frame matrices or .txt/.csv files giving the parentage (mother.strain/father.strain) of each F1 test subject, quantitative phenotype data (there are only limited utilities for censored data), sex data (for partially censored sex, some Bayesian imputation is possible), and any additional user-supplied fixed/random effects. A “ModelsFile” should also be supplied, giving an example of each of the diallel inheritance structure models one wants BayesDiallel::DiallelAnalyzer to consider. The complete model, “alls” (that is, all inheritance methods, interacted with sex), is the default. BayesDiallel will fit many models at once and store them together within the output object. A “PriorsFile” is required to tweak priors.

### Phenotype file

The phenotype file should be a tab-delimited or csv file whose columns list mother.strain/father.strain, sex, and phenotype information for every F1 test subject. The included example “Piximus.data” appears as:

    mother.strain.name  father.strain.name  is.female  mouse.name     PctFat  BoneMineralContent  BoneMineralDensity  LeanTissueMass  MouseLength.NoseToAnus  MouseWeight  TotalTissueMass
    129                 129                 TRUE       CCF1F01_92297  13.91   0.31                0.07                8               21.89                   21.89        9.71
    129                 129                 TRUE       CCF1F02_92298  19.88   0.26                0.07                7.1             24.33                   24.33        9.63
    129                 129                 TRUE       CCF1F03_92299  19.56   0.27                0.07                7.95            22.25                   22.25        10.54
    129                 129                 TRUE       CCF1F04_92300  23.81   0.25                0.06                7.65            22.57                   22.57        10.5
    129                 AJ                  TRUE       CAF1F01_87462  14.8    0.29                0.06                8.4             22.91                   22.91        10.42
    129                 AJ                  TRUE       CAF1F05_89443  18.2    0.31                0.08                9.05            25.08                   25.08        11.54
    B6                  B6                  FALSE      BBF1M04_93779  15.28   0.27                0.06                11.2            29.09                   29.09        13.59
    B6                  B6                  FALSE      BBF1M05_93780  19.9    0.27                0.06                11.7            31.81                   31.81        14.88
    B6                  CAST                TRUE       BFF1F01_92287  14.55   0.28                0.07                7.8             20.83                   20.83        9.53
    .......................

Here the “is.female” column represents the sex of the subject as a TRUE/FALSE variable. The first two columns are the mother.strain and the father.strain, respectively. The unique strains examined are colloquial codes for the Collaborative Cross founders (“129S1”, “AJ”, “C57/JB6”, “NOD”, “NZO”, “PWK”, “CAST”, “WSB”). Please use one unique string for each strain. Phenotype measurements are quantitative. Missing phenotypes will be dropped from analysis, though for left/right censored data some phenotype imputation is described in `help(SetupMissingYIndices)`. Phenotypes should be pre-transformed to the desired scale.
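As a rough illustration of this layout, the sketch below builds a small phenotype table in base R from a few of the example rows, writes it as a tab-delimited file, and reads it back. It does not call any BayesDiallel function; the temporary file path and the choice to keep only the PctFat phenotype are assumptions made for brevity.

```r
# A few rows of the example data, with a single phenotype column.
pheno <- data.frame(
  mother.strain.name = c("129", "129", "B6", "B6"),
  father.strain.name = c("129", "AJ", "B6", "CAST"),
  is.female          = c(TRUE, TRUE, FALSE, TRUE),
  mouse.name         = c("CCF1F01_92297", "CAF1F01_87462",
                         "BBF1M04_93779", "BFF1F01_92287"),
  PctFat             = c(13.91, 14.80, 15.28, 14.55),
  stringsAsFactors   = FALSE
)

# Write a tab-delimited file with a header row and no row names.
pheno_path <- tempfile(fileext = ".txt")
write.table(pheno, pheno_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back. Explicit colClasses ensure that strain codes such as "129"
# are always treated as strings and that is.female is logical.
pheno_back <- read.delim(
  pheno_path,
  colClasses = c("character", "character", "logical", "character", "numeric")
)

# Tabulate mother x father to see which crosses are present.
cross_counts <- table(pheno_back$mother.strain.name,
                      pheno_back$father.strain.name)
```

Tabulating mother by father strain in this way is a quick check that each strain is spelled with one unique string and that the intended crosses are all present.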

### Models file

A models file should be a multi-line text file in which several string choices tell `DiallelAnalyzer()` which inbreeding/cross-specific effects to include in the model. All of the models described in the models file will be fit in order by the `DiallelAnalyzer()`
call, resulting in a FullDiallelAnalyze object containing each model’s information in an AllDiallelObs R-list element. An example structure of a models file follows:

    fulls
    fullu
    fulls, df_6
    BSabm
    BSasbsms
    B,v,w,w_s,b,m,ms,mu
    Mu, strain, gender, gender-mother, gender-inbred, symmetric-cross, inbred
    Mu, inbred, inbred-strain, strain, gender-inbred

“fulls” denotes a full sexed model, and “fullu” refers to a full model with no sex-specific differentiation. The suffix “, df_#” requests a long-tailed Student’s t-distributed noise structure with degrees of freedom given by #.

Further information on coding submodels is located within the Vignette.
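The following base-R sketch writes a small models file and shows how the “, df_#” suffix can be read off a model line. It assumes one model per line, as in the example above; `parse_df` is a hypothetical helper written for this illustration and is not how BayesDiallel itself parses the file.

```r
# Three model lines taken from the example above.
model_lines <- c("fulls", "fullu", "fulls, df_6")

# Write one model per line to a temporary models file.
models_path <- tempfile(fileext = ".txt")
writeLines(model_lines, models_path)

# Hypothetical helper: return the degrees of freedom encoded by "df_#",
# or NA when the line carries no such suffix.
parse_df <- function(line) {
  hit <- regmatches(line, regexpr("df_[0-9.]+", line))
  if (length(hit) == 0) NA_real_ else as.numeric(sub("df_", "", hit))
}

dfs <- vapply(readLines(models_path), parse_df, numeric(1), USE.NAMES = FALSE)
```

In this sketch only the third line carries a t-distribution suffix, so it is the only entry of `dfs` that is not `NA`.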

### Priors file

A priors file should be a multi-line matrix file giving the desired prior distribution values for the $\tau^2$ group dispersion parameters of the many model groups. Typical diffuse prior information of strength 0.2 degrees of freedom has weak influence on the fitted model if the number of strains compared is 4 or more.

          aj   dominancej  genderj  symcrossjk  asymcrossjkDkj  motherj
    m     0.2  0.2         0.2      0.2         0.2             0.2
    df    0.2  0.2         0.2      0.2         0.2             0.2
    Prob  0.5  0.5         0.5      0.5         0.5             0.5

Here row “m” refers to the assumed prior mean for this $\tau^2$ parameter, and “df” gives the degrees of freedom of an inverse-chi-squared prior. “Prob”, which is used only with BayesSpike, is an optional additional row giving prior probabilities of activation, that is, a score in [0,1] denoting how likely one believes it is that a set of parameters has non-zero coefficients in the whole model.
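One way to produce such a matrix is to build it in R and write it out. The sketch below uses only base R and the group names and values from the example above; the tab delimiter and the temporary file path are assumptions, so check the package help for the exact on-disk format that BayesDiallel expects.

```r
# Model groups (columns) as named in the example priors file.
groups <- c("aj", "dominancej", "genderj",
            "symcrossjk", "asymcrossjkDkj", "motherj")

# Rows: prior mean (m), degrees of freedom (df), activation probability (Prob).
priors <- rbind(
  m    = rep(0.2, length(groups)),
  df   = rep(0.2, length(groups)),
  Prob = rep(0.5, length(groups))
)
colnames(priors) <- groups

# Sanity checks: df must be positive, Prob must lie in [0, 1].
stopifnot(all(priors["df", ] > 0))
stopifnot(all(priors["Prob", ] >= 0 & priors["Prob", ] <= 1))

# Write with row and column names (delimiter is an assumption).
priors_path <- tempfile(fileext = ".txt")
write.table(priors, priors_path, sep = "\t", quote = FALSE)
```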

## Statistical “significance”

BayesDiallel embodies a Bayesian statistical approach and is not designed to generate p-values, as would be found in frequentist-style significance testing. Nonetheless, as a summary measure it can be useful to state how much posterior estimates of interesting parameters diverge from zero, specifically, how much weight in the posterior is concentrated away from zero. Operationally, the posterior credibility intervals may be considered analogous (but not identical) to confidence intervals. When 95% credibility intervals do not intersect, one can establish that two competing strain effects are noticeably different. The Gibbs paradigm suggests that confidence for any statistic of the posterior, calculable from multivariate draws from the posterior distribution, can be estimated by investigating the distribution of that test statistic in the Gibbs samples. If 95% of the posterior samples are out of reach of the null hypothesis, one can reject that null hypothesis at a 5% credibility level.
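The summaries described above can be sketched in a few lines of base R. The draws here are simulated stand-ins, not BayesDiallel output, and the names `draws_a` and `draws_b` are hypothetical; with real Gibbs chains, both vectors would be taken from the same iterations so that differences are computed draw by draw.

```r
set.seed(1)

# Simulated stand-ins for posterior draws of two strain effects.
draws_a <- rnorm(5000, mean = 1.5, sd = 0.3)
draws_b <- rnorm(5000, mean = 0.2, sd = 0.3)

# 95% equal-tailed credibility intervals for each effect.
ci_a <- quantile(draws_a, c(0.025, 0.975))
ci_b <- quantile(draws_b, c(0.025, 0.975))

# Share of posterior weight on the positive side of zero.
p_pos_a <- mean(draws_a > 0)

# Any statistic of the draws can be summarized the same way,
# for example the difference between the two effects.
diff_draws <- draws_a - draws_b
ci_diff <- quantile(diff_draws, c(0.025, 0.975))
excludes_zero <- ci_diff[[1]] > 0 || ci_diff[[2]] < 0
```

Summarizing the per-draw difference directly, rather than comparing two separate intervals, uses the joint posterior and is therefore the more direct check of whether two effects differ.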

Although credibility serves as an alternative measure to true “reject-the-null confidence”,
for flat or close-to-flat unbiased priors the credibility intervals demonstrate efficient but robust error widths suitable for decision making, matching good confidence intervals in settings where confidence can be established. In fact, Bayes posterior decision theory, applied with a careful understanding of the loss function, is a straightforward and well-established method of choosing a decision that replicably minimizes expected loss over many experiments, even in multiple-model situations where a null model, and thus confidence, is difficult to establish.

## To do list

Send suggestions to Alan Lenarcic or William Valdar. Here is the current, incomplete list (in no particular order):

- Hierarchical matching model for treated/control pre/post studies.
- Multivariate phenotypes.

## References

Lenarcic A, Svenson K, Churchill GA, Valdar W (2012) A general Bayesian approach to analyzing diallel crosses of inbred strains. Genetics 190:413-435.

Crowley JJ*, Kim Y*, Lenarcic AB*, Quackenbush CR, Barrick C, Adkins DE, Shaw GS, Miller DR, Pardo Manuel de Villena F, Sullivan PF, Valdar W (2014) Genetics of adverse reactions to haloperidol in a mouse diallel: A drug-placebo experiment and Bayesian causal analysis. Genetics 196(1):321-347.