Gaussian Bayesian Network

Haipeng Yu

Update: 2019-11-26

Introduction

A Bayesian network (BN) describes the joint probability distribution of a set of random variables through their conditional dependencies and independencies, represented as a graphical model in the form of a directed acyclic graph (Scutari and Denis 2014). This provides an approach to decipher the probabilistic dependencies among variables of interest and has been applied in plant and animal breeding (Morota et al. 2012; Yu et al. 2019).

Recently, we published a paper that leverages a Bayesian network (BN) to understand the interdependencies among rice phenotypes (Yu et al. 2019). I figured a worked example would be helpful for people who are interested in BNs. In the examples below, I use the bnlearn package (Scutari 2010) to infer a BN of 7 rice phenotypes. The rice data come from Rice Diversity Panel 1, which measures Na+, K+, and Na+:K+ content in the roots and shoots (Zhao et al. 2011). Since all 7 phenotypes follow a Gaussian distribution, I specifically fit a Gaussian Bayesian network (GBN). In addition, I present two different approaches to infer the GBN: a purely data-driven approach and a combination of expert prior knowledge with data. The example rice dataset is available here.

Structure learning using EBV

Structure learning: data-driven approach
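
The output below is an arc strength table from bootstrap resampling: strength is the fraction of bootstrap networks containing the arc, and direction is the fraction of those networks in which the arc points from the "from" node to the "to" node. Here is a minimal sketch of how such a table can be produced with bnlearn, assuming ebv is a data frame holding the 7 columns of EBV (estimated breeding values); the object name, the number of bootstrap replicates, and the reporting thresholds are illustrative choices, not necessarily those used in the paper.

library(bnlearn)

# Learn a network on each of 500 bootstrap samples with Tabu Search and
# record how often every arc (and its direction) is recovered.
set.seed(100)
bs <- boot.strength(ebv, R = 500, algorithm = "tabu")

# Report arcs present in at least 85% of the bootstrap networks with a
# consistent direction in more than half of them.
bs[bs$strength >= 0.85 & bs$direction > 0.5, ]

# Model averaging: build a consensus network from the strong arcs.
avg <- averaged.network(bs, threshold = 0.85)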

##            from              to strength direction
## 1    Na.K.Shoot        Na.Shoot    1.000 0.9470000
## 2    Na.K.Shoot    K.Shoot.Salt    1.000 0.5360000
## 12     Na.Shoot     K.Root.Salt    0.896 0.8649554
## 14 K.Shoot.Salt        Na.Shoot    0.976 0.9303279
## 15 K.Shoot.Salt K.Shoot.Control    1.000 0.9420000
## 28    Na.K.Root K.Shoot.Control    0.946 0.9704017
## 29    Na.K.Root         Na.Root    1.000 0.9960000
## 30    Na.K.Root     K.Root.Salt    1.000 0.9960000
## 40  K.Root.Salt K.Shoot.Control    0.988 0.7419028
## 42  K.Root.Salt         Na.Root    1.000 0.7510000

Structure learning: expert prior knowledge + data

Generally, the inference of a BN is expected to depend purely on the data. In some contexts, however, prior knowledge about the network structure is available, and we can incorporate it into the structure learning process. Specific edges can be forced into or excluded from the network with the whitelist and blacklist arguments, respectively.
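
Below is a minimal sketch of this approach, again with Tabu Search; the specific edges forced and forbidden here are purely illustrative and are not taken from the paper. Both whitelist and blacklist expect a two-column data frame (or matrix) of from/to pairs.

# Force the arc Na.K.Shoot -> Na.Shoot and forbid Na.Root -> K.Shoot.Control
# (illustrative choices only).
wl <- data.frame(from = "Na.K.Shoot", to = "Na.Shoot")
bl <- data.frame(from = "Na.Root", to = "K.Shoot.Control")

# Single-network learning with the prior constraints.
bn.prior <- tabu(ebv, whitelist = wl, blacklist = bl)

# The same constraints can be passed to the bootstrap via algorithm.args.
bs.prior <- boot.strength(ebv, R = 500, algorithm = "tabu",
                          algorithm.args = list(whitelist = wl, blacklist = bl))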

Remove the dependence in EBV with Cholesky Decomposition

Theoretically, BN learning algorithms assume sample independence. In a multivariate analysis, the genomic relationship matrix introduces dependencies into the EBV, which potentially act as a confounder. We can transform the EBV to obtain adjusted EBV in which these dependencies are eliminated. Decompose \(\mathbf{G}\) into its Cholesky factors, \(\mathbf{G} = \mathbf{L} \mathbf{L'}\), where \(\mathbf{L}\) is an \(n \times n\) lower triangular matrix. For a single trait, we can remove the dependency from the breeding values \(\mathbf{u}\) and obtain the adjusted breeding values \(\mathbf{u^*}\) by \(\mathbf{u^*} = \mathbf{L^{-1}} \mathbf{u}\); since \(Var(\mathbf{u}) = \mathbf{G}\sigma^2_u\), it follows that \(Var(\mathbf{u^*}) = \mathbf{L^{-1}} \mathbf{G} \mathbf{L'^{-1}} \sigma^2_u = \mathbf{I}\sigma^2_u\), so the adjusted values are uncorrelated. When we have \(t\) traits, the dimension of \(\mathbf{u}\) becomes \((n \times t) \times 1\) and \(\mathbf{u^*} = \mathbf{M^{-1}} \mathbf{u}\), where \(\mathbf{M^{-1}} = \mathbf{I_{t \times t}} \otimes \mathbf{L^{-1}}\).
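
A minimal sketch of this adjustment in R, assuming G is the n x n genomic relationship matrix and u is an n x t matrix of EBV with one column per trait (both names are hypothetical):

# chol() returns the upper triangular factor R with G = R'R,
# so the lower Cholesky factor L is its transpose.
L <- t(chol(G))

# Single trait: u* = L^{-1} u via forward substitution (cheaper than solve()).
u.star <- forwardsolve(L, u[, 1])

# t traits: stack the columns of u into an (n*t) x 1 vector and apply
# M^{-1} = I_t (Kronecker product) L^{-1}, i.e. L^{-1} trait by trait.
n.traits   <- ncol(u)
M.inv      <- diag(n.traits) %x% solve(L)
u.star.all <- matrix(M.inv %*% as.vector(u), ncol = n.traits)

In practice the Kronecker product never needs to be formed explicitly: forwardsolve(L, u) applies \(\mathbf{L^{-1}}\) to every trait column at once and gives the same result.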

Structure learning using adjusted EBV

Structure learning: data-driven approach
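
The table below comes from rerunning the same bootstrap procedure on the adjusted EBV; a sketch under the same assumptions as above, with ebv.adj denoting a data frame of the Cholesky-adjusted values (hypothetical name):

set.seed(100)
bs.adj <- boot.strength(ebv.adj, R = 500, algorithm = "tabu")
bs.adj[bs.adj$strength >= 0.85 & bs.adj$direction > 0.5, ]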

##            from              to strength direction
## 1    Na.K.Shoot        Na.Shoot    1.000 0.8300000
## 2    Na.K.Shoot    K.Shoot.Salt    1.000 0.6400000
## 6    Na.K.Shoot     K.Root.Salt    0.950 0.5105263
## 14 K.Shoot.Salt        Na.Shoot    0.954 0.7442348
## 15 K.Shoot.Salt K.Shoot.Control    1.000 0.9790000
## 35      Na.Root       Na.K.Root    1.000 0.6970000
## 40  K.Root.Salt K.Shoot.Control    0.956 0.9895397
## 41  K.Root.Salt       Na.K.Root    1.000 0.7730000
## 42  K.Root.Salt         Na.Root    1.000 0.6320000

Available BN learning algorithms in bnlearn package

In the illustrative examples above, I used the score-based Tabu Search algorithm for structure learning. Alternatively, you can use other algorithms by replacing tabu in the tabu() call and in algorithm = "tabu" with one of the following (see the sketch after this list). The details can be found in the bnlearn help pages.

  • Score-based structure learning algorithms:
    • hc: Hill Climbing (HC)
    • tabu: Tabu Search (Tabu)
  • Constraint-based structure learning algorithms:
    • pc.stable: the first practical application of the IC algorithm (the stable version)
    • gs: Grow-Shrink (GS)
    • iamb: Incremental Association Markov Blanket (IAMB)
    • fast.iamb: Fast Incremental Association (Fast-IAMB)
    • inter.iamb: Interleaved Incremental Association (Inter-IAMB)
    • mmpc: Max-Min Parents & Children (MMPC)
    • si.hiton.pc: Semi-Interleaved Hiton-PC (SI-HITON-PC)
  • Hybrid structure learning algorithms:
    • mmhc: Max-Min Hill Climbing (MMHC)
    • rsmax2: General 2-Phase Restricted Maximization (RSMAX2)
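
All of these learners share the same basic calling convention, so switching algorithms is a one-line change. A brief sketch (ebv.adj as above; note that constraint-based algorithms may return a partially directed graph when some arc directions cannot be determined from the data):

bn.tabu <- tabu(ebv.adj)    # score-based: Tabu Search (used above)
bn.hc   <- hc(ebv.adj)      # score-based: Hill Climbing
bn.gs   <- gs(ebv.adj)      # constraint-based: Grow-Shrink
bn.mmhc <- mmhc(ebv.adj)    # hybrid: Max-Min Hill Climbing

# Compare two learned structures arc by arc.
compare(bn.tabu, bn.hc)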

References

Morota, G., B. D. Valente, G. J. M. Rosa, K. A. Weigel, and D. Gianola. 2012. “An Assessment of Linkage Disequilibrium in Holstein Cattle Using a Bayesian Network.” Journal of Animal Breeding and Genetics 129 (6): 474–87.

Scutari, Marco. 2010. “Learning Bayesian Networks with the Bnlearn R Package.” Journal of Statistical Software, Articles 35 (3): 1–22. https://doi.org/10.18637/jss.v035.i03.

Scutari, Marco, and Jean-Baptiste Denis. 2014. Bayesian Networks: With Examples in R. Chapman & Hall/CRC.

Yu, Haipeng, Malachy T. Campbell, Qi Zhang, Harkamal Walia, and Gota Morota. 2019. “Genomic Bayesian Confirmatory Factor Analysis and Bayesian Network to Characterize a Wide Spectrum of Rice Phenotypes.” G3: Genes, Genomes, Genetics, g3–400154.

Zhao, Keyan, Chih-Wei Tung, Georgia C. Eizenga, Mark H. Wright, M. Liakat Ali, Adam H. Price, Gareth J. Norton, et al. 2011. “Genome-Wide Association Mapping Reveals a Rich Genetic Architecture of Complex Traits in Oryza Sativa.” Nature Communications 2: 467.