
Bayesian confirmatory factor analysis and Bayesian network for high-dimensional phenotypic data

GWAS Workshop @VT

Haipeng Yu
https://haipengu.github.io/
Gota Morota
http://morotalab.org/

2019/6/26

1 / 25

Motivation: High dimensional and diverse phenotypes

2 / 25

How to handle a large number of phenotypes?

More and more phenotypes are being generated across time and space

Challenges:

  • high dimensional phenotypes
  • diverse phenotypes
  • how to make sense of and interpret these data
  • multi-trait linear mixed model is computationally challenging

Objective:

  • leverage Bayesian confirmatory factor analysis and Bayesian network to characterize a wide spectrum of rice phenotypes
3 / 25

Bayesian confirmatory factor analysis

Assume observed phenotypes are derived from underlying latent variables

\begin{align*} \mathbf{T} = \boldsymbol{\Lambda} \mathbf{F} + \mathbf{s} \end{align*}

  • \mathbf{T} is the t \times n matrix of observed phenotypes (413 accessions)

  • \boldsymbol{\Lambda} is the t \times q factor loading matrix

  • \mathbf{F} is the q \times n matrix of latent variables

  • \mathbf{s} is the t \times n matrix of specific effects

Variance-covariance model:

\begin{align*} \text{var}(\mathbf{T}) = \boldsymbol{\Lambda} \boldsymbol{\Phi} \boldsymbol{\Lambda}' + \boldsymbol{\Psi} \end{align*}

  • \boldsymbol{\Phi} is the variance-covariance matrix of the latent variables

  • \boldsymbol{\Psi} is the variance-covariance matrix of the specific effects
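The variance model follows directly from the measurement equation, assuming the latent variables and the specific effects are uncorrelated:

\begin{align*} \text{var}(\mathbf{T}) = \text{var}(\boldsymbol{\Lambda}\mathbf{F} + \mathbf{s}) = \boldsymbol{\Lambda}\,\text{var}(\mathbf{F})\,\boldsymbol{\Lambda}' + \text{var}(\mathbf{s}) = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}' + \boldsymbol{\Psi} \end{align*}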

4 / 25

Prior distributions

\begin{align*} \text{var}(\mathbf{T}) = \boldsymbol{\Lambda} \boldsymbol{\Phi} \boldsymbol{\Lambda}' + \boldsymbol{\Psi} \end{align*}

  • Factor loading matrix: \boldsymbol{\Lambda} \sim \mathcal{N}(0, 0.01)

  • Variance of latent variables: \boldsymbol{\Phi} \sim \mathcal{W}^{-1}(\mathbf{I}_{66}, 7)

  • Variance of specific effects: \boldsymbol{\Psi} \sim \Gamma^{-1}(1, 0.5)

5 / 25

Define 6 latent variables from 48 phenotypes

  1. Grain Morphology (Grm, 11)
    • Seed length (Sl), Seed width (Sw), Seed volume (Sv), etc
  2. Morphology (Mrp, 14)
    • Flag leaf length (Fll), Flag leaf width (Flw), etc
  3. Flowering Time (Flt, 7)
    • Flowering time in Arkansas (Fla), Flowering time in Aberdeen (Flb), etc
  4. Ionic components of salt stress (Iss, 6)
    • Na shoot (Nas), K shoot salt (Kss), etc
  5. Yield (Yid, 5)
    • Panicle number per plant (Pnu), Panicle length (Pal), etc
  6. Morphological salt response (Msr, 5)
    • Shoot BM ratio (Sbr), Root BM ratio (Rbr), etc
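One way to fit a measurement model of this form in R is the blavaan package; a minimal sketch (an illustration, not necessarily the exact software behind these slides), assuming the 48 phenotypes are columns of a data frame pheno named by the abbreviations above, with only the indicators spelled out on this slide listed:

library(blavaan)

# lavaan-style syntax: each latent variable is measured by its phenotypes
# (the remaining indicators of each factor would be added in the same way)
model <- '
  Grm =~ Sl + Sw + Sv
  Mrp =~ Fll + Flw
  Flt =~ Fla + Flb
  Iss =~ Nas + Kss
  Yid =~ Pnu + Pal
  Msr =~ Sbr + Rbr
'

fit <- bcfa(model, data = pheno, n.chains = 3)  # Bayesian CFA via MCMC
summary(fit)                                    # posterior factor loadings, etc.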
6 / 25

Study the genetics of each latent variable

7 / 25

Multivariate analysis

Bayesian genomic best linear unbiased prediction

- separate genetic effects from noise (44K SNPs)

\begin{align*} \mathbf{F} = \boldsymbol{\mu} + \mathbf{Xb} + \mathbf{Zu} + \boldsymbol{\epsilon} \end{align*}

  • \mathbf{F}: Vector of factor scores
  • \mathbf{X}: Incidence matrix for fixed effects
  • \mathbf{Z}: Incidence matrix for additive genetic effects
  • \mathbf{b}: Vector of fixed effects
  • \mathbf{u}: Vector of additive genetic effects
  • \boldsymbol{\epsilon}: Vector of residuals
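The additive genetic effects \mathbf{u} are tied to the 44K SNPs through a genomic relationship matrix \mathbf{G}. A minimal base R sketch of a VanRaden-type \mathbf{G}, assuming a 413 x 44K genotype matrix M coded 0/1/2 (M itself is not shown in the slides):

p <- colMeans(M) / 2                         # allele frequency of each SNP
W <- sweep(M, 2, 2 * p)                      # center each marker column
G <- tcrossprod(W) / (2 * sum(p * (1 - p)))  # G = WW' / (2 * sum p(1 - p))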
8 / 25

Prior distributions

\begin{align*} \mathbf{F} = \boldsymbol{\mu} + \mathbf{Xb} + \mathbf{Zu} + \boldsymbol{\epsilon} \end{align*}


  • \boldsymbol{\mu} and \mathbf{b} were assigned flat priors

  • \boldsymbol{\Sigma_{u}} and \boldsymbol{\Sigma_{e}} are the genetic and residual variance-covariance matrices among latent variables

\begin{align*} \boldsymbol{\Sigma_{u}}, \boldsymbol{\Sigma_{e}} \sim \mathcal{W}^{-1}(\mathbf{I}_{66}, 6) \end{align*}
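For completeness, the random terms are typically assumed to follow (with \mathbf{G} the genomic relationship matrix built from the 44K SNPs, and the Kronecker ordering depending on how \mathbf{F} is vectorized)

\begin{align*} \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma_{u}} \otimes \mathbf{G}), \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma_{e}} \otimes \mathbf{I}) \end{align*}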

9 / 25

Bayesian Network

Probabilistic directed acyclic graphical model (DAG)

- interrelationships (Edges) among latent variables (Nodes)
- genetic selection for breeding requires causal assumptions
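Concretely, a DAG over the six latent variables encodes a factorization of their joint distribution into local conditional distributions:

\begin{align*} p(\text{Grm}, \text{Mrp}, \text{Flt}, \text{Iss}, \text{Yid}, \text{Msr}) = \prod_{i=1}^{6} p(F_i \mid \text{pa}(F_i)) \end{align*}

where \text{pa}(F_i) denotes the parent nodes of latent variable F_i in the graph.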
10 / 25

Constraint-based learning

11 / 25

Score-based learning

12 / 25

Examples of algorithms

  • Score-based algorithm

    • Hill climbing
    • Tabu search
  • Hybrid algorithm

    • Max-Min Hill Climbing algorithm
    • General 2-Phase Restricted Maximization algorithm
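These correspond to structure-learning functions in the R package bnlearn; a minimal sketch, assuming the genetic scores for the six latent variables are columns of a numeric data frame fs:

library(bnlearn)

dag_hc    <- hc(fs)      # hill climbing (score-based)
dag_tabu  <- tabu(fs)    # tabu search (score-based)
dag_mmhc  <- mmhc(fs)    # max-min hill climbing (hybrid)
dag_rsmax <- rsmax2(fs)  # general 2-phase restricted maximization (hybrid)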
13 / 25

Standardized factor loadings

14 / 25

Standardized factor loadings

15 / 25

Genetic correlations among latent variables

16 / 25

Hill Climbing algorithm

17 / 25

Tabu algorithm

18 / 25

Max-Min Hill Climbing algorithm

19 / 25

General 2-Phase Restricted Maximization algorithm

20 / 25

Consensus Bayesian network
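One common way to obtain a consensus network with bnlearn is bootstrap resampling followed by arc-strength averaging; a sketch, again assuming the data frame fs of latent-variable scores (the algorithm and the 0.85 threshold are illustrative choices):

arcs_boot <- boot.strength(fs, R = 500, algorithm = "tabu")  # bootstrap arc strengths
consensus <- averaged.network(arcs_boot, threshold = 0.85)   # keep well-supported arcs
strength.plot(consensus, arcs_boot)                          # plot (requires Rgraphviz)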

21 / 25

FA vs. PCA

  • What is the main difference between principal component analysis (PCA) and factor analysis (FA)?

  • Confirmatory factor analysis (CFA) vs. Exploratory factor analysis (EFA)

22 / 25

CFA vs. EFA

23 / 25

Summary

  • Bayesian confirmatory factor analysis allows us to work at the level of latent variables

  • Bayesian networks can be applied to predict the potential influence of external interventions or selection on target traits

  • They provide greater insight than pairwise association measures among multiple phenotypes

  • It is possible to dissect genetic signals from high-dimensional phenotypes if we focus on underlying patterns in big data

24 / 25

Useful links

25 / 25
