Type: | Package |
Title: | Directed Acyclic Graph HMM with TAN Structured Emissions |
Version: | 0.1.1 |
Maintainer: | Prajwal Bende <prajwal.bende@gmail.com> |
Description: | Hidden Markov models (HMMs) are a formal foundation for making probabilistic models of linear sequence. They provide a conceptual toolkit for building complex models just by drawing an intuitive picture. They are at the heart of a diverse range of programs, including genefinding, profile searches, multiple sequence alignment and regulatory site identification. HMMs are the Legos of computational sequence analysis. In graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path, or equivalently a connected acyclic undirected graph. Tree represents the nodes connected by edges. It is a non-linear data structure. A poly-tree is simply a directed acyclic graph whose underlying undirected graph is a tree. The model proposed in this package is the same as an HMM but where the states are linked via a polytree structure rather than a simple path. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2.0.0)] |
Encoding: | UTF-8 |
Imports: | gtools, future, matrixStats, PRROC, bnlearn, bnclassify |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | no |
Packaged: | 2025-07-15 16:01:46 UTC; prajw |
Author: | Prajwal Bende [aut, cre], Russ Greiner [ths], Pouria Ramazi [ths] |
Repository: | CRAN |
Date/Publication: | 2025-07-18 15:20:29 UTC |
Infer the backward probabilities for all the nodes of the dagHMM
Description
backward
calculates the backward probabilities for all the nodes
Usage
backward(hmm, observation, bt_seq, kn_states = NULL)
Arguments
hmm |
hmm Object of class List given as output by |
observation |
Dataframe containing the discritized character values of only covariates at each node. Column names of dataframe should be same as the covariate names. Missing values should be denoted by "NA". |
bt_seq |
A vector denoting the order of nodes in which the DAG should be traversed in backward direction(from leaves to roots). Output of |
kn_states |
(Optional) A (L * 2) dataframe where L is the number of training nodes where state values are known. First column should be the node number and the second column being the corresponding known state values of the nodes |
Details
The backward probability for state X and observation at node k is defined as the probability of observing the sequence of observations e_k+1, ... ,e_n under the condition that the state at node k is X.
That is:b[X,k] := Prob(E_k+1 = e_k+1, ... , E_n = e_n | X_k = X)
where E_1...E_n = e_1...e_n
is the sequence of observed emissions and X_k
is a random variable that represents the state at node k
Value
(2 * N) matrix denoting the backward probabilites at each node of the dag, where "N" is the total number of nodes in the dag
See Also
Examples
library(bnlearn)
tmat = matrix(c(0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0),
5,5, byrow= TRUE ) #for "X" (5 nodes) shaped dag
states = c("P","N") #"P" represent cases(or positive) and "N" represent controls(or negative)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
hmmA = initHMM(States=states, dagmat= tmat, net=bnet, observation=obsvA)
bt_sq = bwd_seq_gen(hmmA)
kn_st = data.frame(node=c(3),state=c("P"),stringsAsFactors = FALSE)
#state at node 3 is known to be "P"
BackwardProbs = backward(hmm=hmmA,observation=obsvA,bt_seq=bt_sq,kn_states=kn_st)
Inferring the parameters of a dag Hidden Markov Model via the Baum-Welch algorithm
Description
For an initial Hidden Markov Model (HMM) with some assumed initial parameters and a given set of observations at all the nodes of the dag, the Baum-Welch algorithm infers optimal parameters to the HMM. Since the Baum-Welch algorithm is a variant of the Expectation-Maximisation algorithm, the algorithm converges to a local solution which might not be the global optimum. Note that if you give the training and validation data, the function will message out AUC and AUPR values after every iteration. Also, validation data must contain more than one instance of either of the possible states
Usage
baumWelch(
hmm,
observation,
kn_states = NULL,
kn_verify = NULL,
maxIterations = 50,
delta = 1e-05,
pseudoCount = 1e-100
)
Arguments
hmm |
hmm Object of class List given as output by |
observation |
Dataframe containing the discritized character values of only covariates at each node. Column names of dataframe should be same as the covariate names. Missing values should be denoted by "NA". |
kn_states |
(Optional) A (L * 2) dataframe where L is the number of training nodes where state values are known. First column should be the node number and the second column being the corresponding known state values of the nodes |
kn_verify |
(Optional) A (L * 2) dataframe where L is the number of validation nodes where state values are known. First column should be the node number and the second column being the corresponding known state values of the nodes |
maxIterations |
(Optional) The maximum number of iterations in the Baum-Welch algorithm. Default is 50 |
delta |
(Optional) Additional termination condition, if the transition and emission matrices converge, before reaching the maximum number of iterations ( |
pseudoCount |
(Optional) Adding this amount of pseudo counts in the estimation-step of the Baum-Welch algorithm. Default is 1e-100 (Don't keep it zero to avoid numerical errors) |
Value
List of three elements, first being the infered HMM whose representation is equivalent to the representation in initHMM
, second being a list of statistics of algorithm and third being the final state probability distribution at all nodes.
See Also
Examples
library(bnlearn)
tmat = matrix(c(0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0),
5,5, byrow= TRUE ) #for "X" (5 nodes) shaped dag
states = c("P","N") #"P" represent cases(or positive) and "N" represent controls(or negative)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates.
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
hmmA = initHMM(States=states, dagmat= tmat, net=bnet, observation=obsvA)
kn_st = data.frame(node=c(2),state=c("P"),stringsAsFactors = FALSE)
#state at node 2 is known to be "P"
kn_vr = data.frame(node=c(3,4,5),state=c("P","N","P"),stringsAsFactors = FALSE)
#state at node 3,4,5 are "P","N","P" respectively
learntHMM= baumWelch(hmm=hmmA,observation=obsvA,kn_states=kn_st, kn_verify=kn_vr)
Implementation of the Baum Welch Algorithm as a special case of EM algorithm
Description
baumWelch
recursively calls this function to give a final estimate of parameters for dag HMM
Uses Parallel Processing to speed up calculations for large data. Should not be used directly.
Usage
baumWelchRecursion(hmm, observation, t_seq, kn_states = NULL, kn_verify = NULL)
Arguments
hmm |
hmm Object of class List given as output by |
observation |
Dataframe containing the discritized character values of only covariates at each node. Column names of dataframe should be same as the covariate names. Missing values should be denoted by "NA". |
t_seq |
list of forward and backward DAG traversal sequence (in that order) as given output by |
kn_states |
(Optional) A (L * 2) dataframe where L is the number of training nodes where state values are known. First column should be the node number and the second column being the corresponding known state values of the nodes |
kn_verify |
(Optional) A (L * 2) dataframe where L is the number of validation nodes where state values are known. First column should be the node number and the second column being the corresponding known state values of the nodes |
Value
List containing estimated Transition and Emission probability matrices
See Also
Examples
library(bnlearn)
tmat = matrix(c(0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0),
5,5, byrow= TRUE ) #for "X" (5 nodes) shaped dag
states = c("P","N") #"P" represent cases(or positive) and "N" represent controls(or negative)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
hmmA = initHMM(States=states, dagmat= tmat, net=bnet, observation=obsvA)
kn_st = data.frame(node=c(2),state=c("P"),stringsAsFactors = FALSE)
#state at node 2 is known to be "P"
kn_vr = data.frame(node=c(3,4,5),state=c("P","N","P"),stringsAsFactors = FALSE)
#state at node 3,4,5 are "P","N","P" respectively
t_seq=list(fwd_seq_gen(hmmA),bwd_seq_gen(hmmA))
newparam= baumWelchRecursion(hmm=hmmA,observation=obsvA,t_seq=t_seq,
kn_states=kn_st, kn_verify=kn_vr)
Calculate the order in which nodes in the dag should be traversed during the backward pass(leaves to roots)
Description
dag is a complex graphical model where we can have multiple parents and multiple children for a node. Hence the order in which the dag should be tranversed becomes significant. Backward algorithm is a dynamic programming problem where to calculate the values at a node, we need the values of the children nodes beforehand, which need to be traversed before this node. This algorithm outputs a possible(not unique) order of the traversal of nodes ensuring that the childrens are traversed first before the parents
Usage
bwd_seq_gen(hmm, nlevel = 100)
Arguments
hmm |
hmm Object of class List given as output by |
nlevel |
No. of levels in the dag, if known. Default is 100 |
Value
Vector of length "D", where "D" is the number of nodes in the dag
See Also
Examples
library(bnlearn)
tmat = matrix(c(0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0),
5,5, byrow= TRUE ) #for "X" (5 nodes) shaped dag
states = c("P","N") #"P" represent cases(or positive) and "N" represent controls(or negative)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
hmmA = initHMM(States=states, dagmat= tmat, net=bnet, observation=obsvA)
bt_sq = bwd_seq_gen(hmmA)
Calculating the probability of occurance of particular values of covariates at a node given the value of target.
Description
Calculating the probability of occurance of particular values of covariates at a node given the value of target.
Usage
calc_emis(state, obsv, probs, pars)
Arguments
state |
character value of state variable at a particular node. |
obsv |
character vector of values of covariates at that node. |
probs |
emission probability distribution of the covariates in TAN structure. |
pars |
integer vector denoting the parents of the nodes(other than root) in the TAN structure. |
Value
probability of occurance of particular values of covariates at a node given the value of target.
Infer the forward probabilities for all the nodes of the dagHMM
Description
forward
calculates the forward probabilities for all the nodes
Usage
forward(hmm, observation, ft_seq, kn_states = NULL)
Arguments
hmm |
hmm Object of class List given as output by |
observation |
Dataframe containing the discritized character values of only covariates at each node. Column names of dataframe should be same as the covariate names. Missing values should be denoted by "NA". |
ft_seq |
A vector denoting the order of nodes in which the DAG should be traversed in forward direction(from roots to leaves). Output of |
kn_states |
(Optional) A (L * 2) dataframe where L is the number of training nodes where state values are known. First column should be the node number and the second column being the corresponding known state values of the nodes |
Details
The forward probability for state X up to observation at node k is defined as the probability of observing the sequence of observations e_1,..,e_k given that the state at node k is X.
That is:f[X,k] := Prob( X_k = X | E_1 = e_1,.., E_k = e_k)
where E_1...E_n = e_1...e_n
is the sequence of observed emissions and X_k
is a random variable that represents the state at node k
Value
(2 * N) matrix denoting the forward probabilites at each node of the dag, where "N" is the total number of nodes in the dag
See Also
Examples
library(bnlearn)
tmat = matrix(c(0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0),
5,5, byrow= TRUE ) #for "X" (5 nodes) shaped dag
states = c("P","N") #"P" represent cases(or positive) and "N" represent controls(or negative)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
hmmA = initHMM(States=states, dagmat= tmat, net=bnet, observation=obsvA)
ft_sq = fwd_seq_gen(hmmA)
kn_st = data.frame(node=c(3),state=c("P"),stringsAsFactors = FALSE)
#state at node 3 is known to be "P"
ForwardProbs = forward(hmm=hmmA,observation=obsvA,ft_seq=ft_sq,kn_states=kn_st)
Calculate the order in which nodes in the dag should be traversed during the forward pass(roots to leaves)
Description
dag is a complex graphical model where we can have multiple parents and multiple children for a node. Hence the order in which the dag should be tranversed becomes significant. Forward algorithm is a dynamic programming problem where to calculate the values at a node, we need the values of the parent nodes beforehand, which need to be traversed before this node. This algorithm outputs a possible(not unique) order of the traversal of nodes ensuring that the parents are traversed first before the children.
Usage
fwd_seq_gen(hmm, nlevel = 100)
Arguments
hmm |
hmm Object of class List given as output by |
nlevel |
No. of levels in the dag, if known. Default is 100 |
Value
Vector of length "D", where "D" is the number of nodes in the dag
See Also
Examples
library(bnlearn)
tmat = matrix(c(0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0),
5,5, byrow= TRUE ) #for "X" (5 nodes) shaped dag
states = c("P","N") #"P" represent cases(or positive) and "N" represent controls(or negative)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
hmmA = initHMM(States=states, dagmat= tmat, net=bnet, observation=obsvA)
ft_sq = fwd_seq_gen(hmmA)
Generating the inital emission probability distribution of the covariates in TAN structure.
Description
Generating the inital emission probability distribution of the covariates in TAN structure.
Usage
gen_emis(net, observation, sym)
Arguments
net |
Object of type 'bn' provided as output by model2network showing the TAN structure between target variable and covariates. |
observation |
Dataframe containing the discritized character values of only covariates at each node. Column names of dataframe should be same as the covariate names. Missing values should be denoted by "NA". |
sym |
Character vector of possible values of target variable |
Value
Inital emission probability distribution of the covariates in TAN structure
Examples
library(bnlearn)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
target_value= c("P","N")
prob= gen_emis(net=bnet,observation=obsvA,sym=target_value)
Initializing dagHMM with given parameters
Description
Initializing dagHMM with given parameters
Usage
initHMM(
States,
dagmat,
net = NULL,
observation,
startProbs = NULL,
transProbs = NULL,
leak_param = 0
)
Arguments
States |
A (2 * 1) vector with first element being discrete state value for the cases(or positive) and second element being discrete state value for the controls(or negative) for given dagHMM |
dagmat |
Adjacent Symmetry Matrix that describes the topology of the dag |
net |
Object of type 'bn' provided as output by model2network showing the TAN structure between target variable and covariates. |
observation |
Dataframe containing the discritized character values of covariates at each node. If "net" is not given, dataframe should also contain the column for target variable (as the last column) so as to learn the structure. Column names of dataframe should be same as the covariate names. Missing values should be denoted by "NA". |
startProbs |
(Optional) (2 * 1) vector containing starting probabilities for the states. Default is equally probable states |
transProbs |
(Optional) (2 * 2) matrix containing transition probabilities for the states. |
leak_param |
(Optional) Leak parameter used in Noisy-OR algorithm used in |
Value
List describing the parameters of dagHMM(pi, alpha, beta, dagmat, net)
Examples
library(bnlearn)
tmat = matrix(c(0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0),
5,5, byrow= TRUE ) #for "X" (5 nodes) shaped dag
states = c("P","N") #"P" represent cases(or positive) and "N" represent controls(or negative)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates.
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
hmmA = initHMM(States=states, dagmat= tmat, net=bnet, observation=obsvA)
obsvB=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")),
A=c("P","N","P","P","N"))
hmmB = initHMM(States=states, dagmat= tmat, net=NULL, observation=obsvB)
Calculating the probability of transition from multiple nodes to given node in the dag
Description
Calculating the probability of transition from multiple nodes to given node in the dag
Usage
noisy_or(hmm, prev_state, cur_state)
Arguments
hmm |
Object of class List given as output by |
prev_state |
vector containing state variable values for the previous nodes |
cur_state |
character denoting the state variable value for current node |
Value
The Noisy_OR probability for the transition
Examples
library(bnlearn)
tmat = matrix(c(0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0),
5,5, byrow= TRUE ) #for "X" (5 nodes) shaped dag
states = c("P","N") #"P" represent cases(or positive) and "N" represent controls(or negative)
bnet = model2network("[A][C|A:B][D|A:C][B|A]") #A is the target variable while
#B, C and D are covariates.
obsvA=data.frame(list(B=c("L","H","H","L","L"),C=c("H","H","L","L","H"),D=c("L","L","L","H","H")))
hmmA = initHMM(States=states, dagmat= tmat, net=bnet, observation=obsvA)
Transprob = noisy_or(hmm=hmmA,prev_state=c("P","N"),cur_state="P") #for transition from P & N
#simultaneously to P