Introduction
A Bayesian network (BN) describes the joint probability distribution of a set of random variables through their conditional dependencies and independencies, represented as a graphical model (a directed acyclic graph) (Scutari and Denis 2014). This provides an approach to decipher the probabilistic dependencies among variables of interest and has been applied in plant and animal breeding (Morota et al. 2012; Yu et al. 2019).
Recently, we published a paper that leverages BNs to understand the interdependencies among rice phenotypes (Yu et al. 2019), and I thought a worked example would be helpful for people who are interested in BNs. In the examples below, I used the bnlearn package (Scutari 2010) to infer the BN of 7 rice phenotypes. The example rice data are from Rice Diversity Panel 1, which measured Na+, K+, and Na+:K+ content in the roots and shoots (Zhao et al. 2011). Since all 7 phenotypes follow a Gaussian distribution, I specifically fitted a Gaussian Bayesian network (GBN). In addition, I present two approaches to infer the GBN: a purely data-driven approach and a combination of expert prior knowledge with data. The example rice dataset is available here.
Load R packages.
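The package-loading chunk did not survive extraction. Judging from the functions called below, a minimal set would be the following (the exact list is an assumption):
# Packages assumed from the calls used in this post
library(bnlearn) # structure learning: tabu(), boot.strength(), graphviz.plot()
library(gaston)  # genotype import, QC, and GRM construction
# graphviz.plot() additionally requires the Bioconductor package Rgraphviz;
# the MTM package is attached inside MTM_func() below.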
Data import
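The import chunks themselves were lost in extraction; the outputs below suggest a flow like this sketch (file names are placeholders, not the author's):
# Phenotypes: 7 salt-stress traits; drop accessions with missing records
pheno <- read.csv("RiceDiversity_pheno.csv", row.names = 1)
dim(pheno) # 385 7
pheno <- na.omit(pheno)
dim(pheno) # 366 7
# Genotypes: PLINK .bed/.bim/.fam files read with gaston
geno <- read.bed.matrix("sativas413")
dim(geno)  # 413 36901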
## [1] 385 7
## [1] 366 7
## Reading sativas413.fam
## Reading sativas413.bim
## Reading sativas413.bed
## ped stats and snps stats have been set.
## 'p' has been set.
## 'mu' and 'sigma' have been set.
## [1] 413 36901
## Subset accessions with both genotype and phenotype records.
geno@ped$id <- paste0("NSFTV_", geno@ped$id)
geno <- geno[geno@ped$id %in% row.names(pheno), ]
dim(geno) # 363 36901
## [1] 363 36901
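The matching chunk for the phenotypes was not shown; a reconstruction consistent with the outputs below:
# Subset and order phenotypes by the matched accessions, then verify alignment
pheno <- pheno[geno@ped$id, ]
dim(pheno) # 363 7
table(rownames(pheno) == geno@ped$id) # expect all TRUE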
## [1] 363 7
##
## TRUE
## 363
Construct genomic relationship matrix (GRM)
# Quality control of genotype file with MAF & Callrate
geno <- select.snps(geno, c(maf > 0.05 & callrate > 0.9))
dim(geno) # 363 29499
## [1] 363 29499
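The GRM chunk was also omitted; with gaston it is a one-liner (a sketch, assuming the standardized-genotype default of GRM()):
# Genomic relationship matrix from the QC-ed markers
grm <- GRM(geno)
dim(grm) # 363 363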
## [1] 363 363
Fit a multivariate mixed model to get estimated breeding values (EBV)
# Define a wrapper around MTM() from the MTM package
MTM_func <- function(Y, G, nTrait, nIter, burnIn, thin, prefix) {
  library(MTM)
  set.seed(007) # reproducible results
  MTM(Y = Y,
      K = list(list(K = G,
                    COV = list(type = 'UN', df0 = nTrait, S0 = diag(nTrait)))),
      resCov = list(type = 'UN', S0 = diag(nTrait), df0 = nTrait),
      nIter = nIter, burnIn = burnIn, thin = thin, saveAt = prefix)
}
Y <- scale(pheno, center = TRUE, scale = TRUE)
G <- grm
nTrait <- ncol(Y)
MTM_fit <- MTM_func( Y = Y, G = G, nTrait = nTrait, nIter = 1200, burnIn = 200,
thin = 5, prefix = 'MTM_fit')
## Check Gibbs samples
list.files(pattern = 'MTM_fit')
## Retrieve estimates
str(MTM_fit)
BV <- MTM_fit$K[[1]]$U # estimated breeding values
dim(BV) # 363 7
colnames(BV) <- colnames(pheno)
BV <- as.data.frame(BV)
Structure learning using EBV
#-------------------------
# Predefined function
#-------------------------
## Check strength and direction in BN
check_boot <- function(boot, Strength_Thres, Direction_Thres) {
boot[(boot$strength >= Strength_Thres) & (boot$direction > Direction_Thres),]
}
Structure learning: data-driven approach
# Score-based algorithm: Tabu Search
tabu_simple <- tabu(BV)
graphviz.plot(tabu_simple, main = "Tabu_data-driven", shape = "ellipse",
layout = "dot")
# Model Averaging with 500 bootstrap samples to account for uncertainty
set.seed(007)
boot_tabu <- boot.strength(BV, algorithm = "tabu", R = 500)
# Check strength (>= 0.8) and direction (> 0.5)
check_boot(boot_tabu, 0.8, 0.5)
## from to strength direction
## 1 Na.K.Shoot Na.Shoot 1.000 0.9470000
## 2 Na.K.Shoot K.Shoot.Salt 1.000 0.5360000
## 12 Na.Shoot K.Root.Salt 0.896 0.8649554
## 14 K.Shoot.Salt Na.Shoot 0.976 0.9303279
## 15 K.Shoot.Salt K.Shoot.Control 1.000 0.9420000
## 28 Na.K.Root K.Shoot.Control 0.946 0.9704017
## 29 Na.K.Root Na.Root 1.000 0.9960000
## 30 Na.K.Root K.Root.Salt 1.000 0.9960000
## 40 K.Root.Salt K.Shoot.Control 0.988 0.7419028
## 42 K.Root.Salt Na.Root 1.000 0.7510000
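To collapse the bootstrapped arc strengths into a single consensus structure, bnlearn offers averaged.network(); a short sketch (the threshold mirrors the strength cutoff above):
# Model-averaged network keeping arcs with strength >= 0.8
avg_tabu <- averaged.network(boot_tabu, threshold = 0.8)
graphviz.plot(avg_tabu, main = "Tabu_averaged", shape = "ellipse", layout = "dot")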
Structure learning: expert prior knowledge + data
Generally, the inference of a BN is expected to depend purely on the data. In some contexts, however, prior knowledge about the network structure is available, and we can incorporate it into the structure learning process. Edges can be forced into or excluded from the network with the whitelist and blacklist arguments, respectively.
# Score-based algorithm: Tabu Search
tabu_simple <- tabu(BV)
graphviz.plot(tabu_simple, main = "Tabu", shape = "ellipse", layout = "dot")
# Incorporate subjective prior
whtlist <- c(from = 'Na.K.Root', to = 'Na.K.Shoot')
blklist <- data.frame(from = c("K.Shoot.Salt", "K.Shoot.Control" ),
to = c("K.Shoot.Control", "K.Shoot.Salt"))
# Data + subjective prior
tabu_blk <- tabu(BV, whitelist = whtlist, blacklist = blklist)
graphviz.plot(tabu_blk, main = "Tabu_data-and-prior", shape = "ellipse",
layout = "dot")
Remove the dependencies in the EBV with Cholesky decomposition
Theoretically, BN learning algorithms assume independence among samples. In the multivariate analysis, the genomic relationship matrix introduces dependencies into the EBV, which potentially act as a confounder. We can transform the EBV to eliminate these dependencies. Decompose \(\mathbf{G}\) into its Cholesky factors \(\mathbf{G} = \mathbf{L} \mathbf{L'}\), where \(\mathbf{L}\) is an \(n \times n\) lower triangular matrix. For a single trait, we can remove the dependency from the breeding values \(\mathbf{u}\) and obtain the adjusted breeding values \(\mathbf{u^*} = \mathbf{L^{-1}} \mathbf{u}\). When we have \(t\) traits, the stacked vector \(\mathbf{u}\) has dimension \((n \times t) \times 1\) and \(\mathbf{u^*} = \mathbf{M^{-1}} \mathbf{u}\), where \(\mathbf{M^{-1}} = \mathbf{I}_{t \times t} \otimes \mathbf{L^{-1}}\).
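The chunk that computes BV_adj was not shown; a minimal sketch following the derivation above (it assumes grm is positive definite so that chol() succeeds):
# chol() returns the upper triangular factor R with G = R'R, so L = t(R)
L <- t(chol(grm))
# Premultiplying the n x t EBV matrix by L^{-1} applies
# M^{-1} = I_t (Kronecker) L^{-1} to the stacked EBV vector, trait by trait
BV_adj <- as.data.frame(solve(L) %*% as.matrix(BV))
colnames(BV_adj) <- colnames(BV)
dim(BV_adj) # 363 7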
Structure learning using adjusted EBV
Structure learning: data-driven approach
# Score-based algorithm: Tabu Search
tabu_simple <- tabu(BV_adj)
graphviz.plot(tabu_simple, main = "Tabu", shape = "ellipse", layout = "dot")
# Model Averaging with 500 bootstrap samples to account for uncertainty
set.seed(007)
boot_tabu <- boot.strength(BV_adj, algorithm = "tabu", R = 500)
# Check strength (>= 0.8) and direction (> 0.5)
check_boot(boot_tabu, 0.8, 0.5)
## from to strength direction
## 1 Na.K.Shoot Na.Shoot 1.000 0.8300000
## 2 Na.K.Shoot K.Shoot.Salt 1.000 0.6400000
## 6 Na.K.Shoot K.Root.Salt 0.950 0.5105263
## 14 K.Shoot.Salt Na.Shoot 0.954 0.7442348
## 15 K.Shoot.Salt K.Shoot.Control 1.000 0.9790000
## 35 Na.Root Na.K.Root 1.000 0.6970000
## 40 K.Root.Salt K.Shoot.Control 0.956 0.9895397
## 41 K.Root.Salt Na.K.Root 1.000 0.7730000
## 42 K.Root.Salt Na.Root 1.000 0.6320000
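To see how the Cholesky adjustment changed the consensus structure, bnlearn's compare() contrasts two networks over the same nodes (a sketch; avg_tabu is the averaged network from the earlier unadjusted-EBV sketch):
# Averaged network from the adjusted EBV, compared with the unadjusted one
avg_tabu_adj <- averaged.network(boot_tabu, threshold = 0.8)
compare(target = avg_tabu, current = avg_tabu_adj) # counts of tp/fp/fn arcs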
Structure learning: expert prior knowledge + data
# Score-based algorithm: Tabu Search
tabu_simple <- tabu(BV_adj)
graphviz.plot(tabu_simple, main = "Tabu", shape = "ellipse", layout = "dot")
# Incorporate subjective prior
whtlist <- c(from = 'Na.K.Root', to = 'Na.K.Shoot')
blklist <- data.frame(from = c("K.Shoot.Salt", "K.Shoot.Control" ),
to = c("K.Shoot.Control", "K.Shoot.Salt"))
# Data + subjective prior
tabu_blk <- tabu(BV_adj, whitelist = whtlist, blacklist = blklist)
graphviz.plot(tabu_blk, main = "Tabu_data-and-prior", shape = "ellipse",
layout = "dot")
Available BN learning algorithms in the bnlearn package
In the illustrative examples above, I used the score-based Tabu Search algorithm for structure learning. Alternatively, you can use other algorithms by replacing tabu in tabu() and algorithm = "tabu" with one of the algorithms listed below; a short example follows the list. Details can be found in the bnlearn help pages.
- Score-based structure learning algorithms:
- hc: Hill Climbing (HC)
- tabu: Tabu Search (Tabu)
- Constraint-based structure learning algorithms:
- pc.stable: the first practical application of the IC algorithm (the stable version)
- gs: Grow-Shrink (GS)
- iamb: Incremental Association Markov Blanket (IAMB)
- fast.iamb: Fast Incremental Association (Fast-IAMB)
- inter.iamb: Interleaved Incremental Association (Inter-IAMB)
- mmpc: Max-Min Parents & Children (MMPC)
- si.hiton.pc: Semi-Interleaved Hiton-PC (SI-HITON-PC)
- Hybrid structure learning algorithms:
- mmhc: Max-Min Hill Climbing (MMHC)
- rsmax2: General 2-Phase Restricted Maximization (RSMAX2)
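For example, switching from Tabu Search to Hill Climbing only changes the function name and the algorithm argument (illustrative; any of the algorithms above can be substituted the same way):
# Hill Climbing instead of Tabu Search
hc_simple <- hc(BV_adj)
graphviz.plot(hc_simple, main = "HC", shape = "ellipse", layout = "dot")
set.seed(007)
boot_hc <- boot.strength(BV_adj, algorithm = "hc", R = 500)
check_boot(boot_hc, 0.8, 0.5)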
References
Morota, G, BD Valente, GJM Rosa, KA Weigel, and D Gianola. 2012. "An Assessment of Linkage Disequilibrium in Holstein Cattle Using a Bayesian Network." Journal of Animal Breeding and Genetics 129 (6): 474–87.
Scutari, Marco. 2010. “Learning Bayesian Networks with the Bnlearn R Package.” Journal of Statistical Software, Articles 35 (3): 1–22. https://doi.org/10.18637/jss.v035.i03.
Scutari, Marco, and Jean-Baptiste Denis. 2014. Bayesian Networks: With Examples in R. Chapman; Hall/CRC.
Yu, Haipeng, Malachy T Campbell, Qi Zhang, Harkamal Walia, and Gota Morota. 2019. "Genomic Bayesian Confirmatory Factor Analysis and Bayesian Network to Characterize a Wide Spectrum of Rice Phenotypes." G3: Genes, Genomes, Genetics, g3–400154.
Zhao, Keyan, Chih-Wei Tung, Georgia C Eizenga, Mark H Wright, M Liakat Ali, Adam H Price, Gareth J Norton, et al. 2011. "Genome-Wide Association Mapping Reveals a Rich Genetic Architecture of Complex Traits in Oryza Sativa." Nature Communications 2: 467.