--- title: "Example workflow using FUSE" author: "Susanna Holmstrom" date: "2026-02-05" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Example workflow using FUSE} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE) ``` ```{r} library(methFuse) ``` ### Reading in data We start by reading in the data, which consist of the tables K0 and K1. Both have CpG sites as rows and samples as columns, and - K0 contains the unmethylated counts - K1 contains the methylated counts This is a dummy data set consisting of manipulated counts of the 100 000 first CpG sites in chromosome 20. ```{r} k0_file <- system.file("examples/k0.tsv.gz", package = "methFuse") k1_file <- system.file("examples/k1.tsv.gz", package = "methFuse") K0 <- read.table(k0_file, header = TRUE) K1 <- read.table(k1_file, header = TRUE) head(K0) ``` ### Apply FUSE as a pipeline The whole FUSE pipeline can be applied using the function 'fuse.segment()'. It performs the following steps: 1. clustering 2. cutting clustering tree optimally 3. summarizing segments The function takes as input the count matrices K0 and K1 and the chromosome and position information for each CpG site, and outputs a summary table and a data frame with betas. ```{r} segment_result <- fuse.segment( as.matrix(K0), as.matrix(K1), chr = sub("\\..*$", "", rownames(K0)), pos = as.integer(sub("^.*\\.", "", rownames(K0))), method = "BIC" ) head(segment_result$summary) ``` ```{r} head(segment_result$betas_per_segment) ``` ```{r, fig.height=10, fig.width=8} plot(segment_result, segments_to_plot = 1:nrow(segment_result$summary)) plot(segment_result, segments_to_plot = 50:60) ``` ## Using BSseq or methrix The 'fuse.segment()' function accepts as input also a BSseq or a methrix object. ```{r bsseq-example, eval = requireNamespace("bsseq", quietly = TRUE)} library(bsseq) set.seed(1) M <- matrix(c(sample(0:10, 1000, TRUE), sample(50:60, 1000, TRUE), sample(0:10, 1000, TRUE), sample(20:30, 1000, TRUE)) , nrow = 1000, byrow = T) Cov <- M + matrix(sample(50:60, 4000, TRUE), nrow = 1000) bs <- BSseq( chr = rep("chr1", 1000), pos = seq_len(1000), M = M, Cov = Cov, sampleNames = colnames(M) ) res <- fuse.segment(bs) head(res$summary) ``` ```{r methrix-example, eval = requireNamespace("methrix", quietly = TRUE)} library(methrix) data(methrix_data, package = "methrix") res <- fuse.segment(methrix_data) head(res$summary) ``` ## Apply FUSE through separate steps If the intermediate outputs of the method are relevant, then the pipeline can also be applied by calling each of the functions 'fuse.cluster()', 'number.of.clusters()', 'fuse.cut.tree()', and 'fuse.summary()' separately. ### 1. Cluster In the first step, 'fuse.cluster()' is applied on the count matrices. This performs a hierarchical clustering of closest neighbors, and outputs a clustering tree of class 'hclust'. ```{r} tree <- fuse.cluster(as.matrix(K0), as.matrix(K1), chr = sub("\\..*$", "", rownames(K0)), pos = as.integer(sub("^.*\\.", "", rownames(K0)))) names(tree) head(tree$merge) head(tree$height) ``` The clustering tree contains the following elements: - 'merge': the labels of the merged points - 'height': The total log-likelihood for forming this merge - 'order': Same order as original, since clustering is done in 1D - 'labels': Labels of form 'chr.pos' - 'call': 'fuse.cluster(K0, K1)' - 'method': 'fuse' - 'dist.method': 'fuse' ### 2. Cutting the tree In order to cut the clustering tree, the optimal number of segments needs to be calculated. For this we have the function 'number.of.clusters()', which performs model selection by minimizing either the Bayesian Information Criterion (BIC, default), or the Akaike Information Criterion (AIC). ```{r} optimal_num_of_segments <- number.of.clusters(tree, ncol(K0), 'BIC') optimal_num_of_segments ``` The tree can then be cut using 'fuse.cut.tree()', (or 'cutree' from package 'stats'). ```{r} segments <- fuse.cut.tree(tree, optimal_num_of_segments) head(segments, 100) ``` ### 3. Summarizing segments The segments can be summarized in a table with the function 'fuse.summary()'. ```{r} result <- fuse.summary(as.matrix(K0), as.matrix(K1), chr = sub("\\..*$", "", rownames(K0)), pos = as.numeric(sub("^.*\\.", "", rownames(K0))), segments) head(result$summary) ``` ```{r} head(result$betas_per_segment) ``` ### Verification The outputs are identical. ```{r} identical(segment_result$summary, result$summary) ``` ```{r} identical(segment_result$betas_per_segment, result$betas_per_segment) ``` ## Plotting Let's visualize a small piece of the result: ```{r, fig.width=8, fig.height=3} plot(result, segments_to_plot = 1:50) ```