Tree versions of diffcyt functions
Source: R/diffcyt_buildTree.R, R/diffcyt_calcMediansByTreeMarker.R, R/diffcyt_calcTreeCounts.R, and 2 more
diffcyt_workflow.Rd
A collection of functions from the diffcyt package
have been generalized to work with data provided in a tree structure. The
tree represents increasingly coarse clustering of the cells, and the leaves
are the clusters from an initial, high-resolution clustering generated by
diffcyt. Note that diffcyt represents data using
SummarizedExperiment objects with cells in rows and features in
columns.
Usage
buildTree(d_se, dist_method = "euclidean", hclust_method = "average")
calcMediansByTreeMarker(d_se, tree)
calcTreeCounts(d_se, tree)
calcTreeMedians(d_se, tree, message = FALSE)
Arguments
- d_se
A SummarizedExperiment object, with cells as rows and features as columns. This should be the output from generateClusters. The colData is assumed to contain a factor named marker_class.
- dist_method
The distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given. Please refer to method in dist for more information.
- hclust_method
The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). Please refer to method in hclust for more information.
- tree
A phylo object from buildTree.
- message
A logical scalar indicating whether progress messages should be printed.
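The dist_method and hclust_method arguments take the same values as the method arguments of dist and hclust in base R. The following is a minimal sketch, on simulated data, of how the two settings combine when clustering per-cluster marker medians. The names expr, cluster_id, med and hc are hypothetical, and the sketch is an illustration of the idea, not the code that buildTree runs.

```r
## Sketch only: simulated data and base R, not the package's implementation.
set.seed(1)
n_cells <- 500
n_markers <- 4

# Hypothetical expression matrix: cells in rows, markers in columns
expr <- matrix(rnorm(n_cells * n_markers), ncol = n_markers,
               dimnames = list(NULL, paste0("marker", seq_len(n_markers))))

# Hypothetical initial clustering: 10 clusters of 50 cells each
cluster_id <- factor(rep(seq_len(10), length.out = n_cells))

# Median of every marker within every cluster (clusters in rows)
med <- t(vapply(
    split(seq_len(n_cells), cluster_id),
    function(i) apply(expr[i, , drop = FALSE], 2, median),
    numeric(n_markers)
))

# dist_method corresponds to dist(method = ...),
# hclust_method corresponds to hclust(method = ...)
hc <- hclust(dist(med, method = "euclidean"), method = "average")

# The dendrogram has one leaf per initial cluster. Assuming the ape package
# is installed, ape::as.phylo(hc) would convert it to a phylo object.
```

Changing "euclidean" or "average" in the last call to any of the values listed above changes how the initial clusters are grouped, while the leaves stay the same.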
Value
For buildTree, a phylo object representing the hierarchical clustering of the initial high-resolution clusters.
For calcTreeCounts, a TreeSummarizedExperiment object, with clusters (nodes of the tree) in rows, samples in columns and abundances (counts) in an assay.
For calcMediansByTreeMarker, a TreeSummarizedExperiment object with clusters (nodes of the tree) in rows and markers in columns. The marker expression values are in the assay.
For calcTreeMedians, a TreeSummarizedExperiment object with median marker expression for each cluster (node of the tree) and each sample. Each node is represented as a separate assay, with the number of columns equal to the number of samples. The metadata slot contains variables id_type_markers and id_state_markers.
Details
The data object is assumed to contain a factor marker_class in the
column meta-data (see prepareData), which indicates
the protein marker class for each column of data ("type",
"state", or "none").
Variables id_type_markers and id_state_markers are saved in
the metadata slot of the output object. These can be used to
identify the 'cell type' and 'cell state' markers in the list of
assays in the output TreeSummarizedExperiment object,
which is useful in later steps of the 'diffcyt' pipeline.
buildTree applies hierarchical clustering to build a tree starting from the high-resolution clustering created by generateClusters. The function calculates the median abundance for each (ID type) marker and cluster, and uses this data to further aggregate the initial clusters using hierarchical clustering.
calcTreeCounts calculates the number of cells per cluster-sample combination (referred to as cluster cell 'counts', 'abundances', or 'frequencies'). This is a tree version of calcCounts.
calcMediansByTreeMarker calculates the median value for each cluster-marker combination. A cluster is represented by a node on the tree. This is a tree version of calcMediansByClusterMarker.
calcTreeMedians calculates the median expression for each cluster-sample-marker combination. This is a tree version of calcMedians.
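As an illustration of what cluster-sample counts at the nodes of a tree can look like, here is a small base-R sketch on simulated labels. It assumes that the count of an internal node is the sum of the counts of the leaves beneath it. That is a natural reading of "tree version of calcCounts", but it is not stated on this page. The names cluster_id, sample_id and node_c1_c2 are hypothetical.

```r
## Sketch only: simulated labels and base R, not the package's implementation.
cluster_id <- factor(rep(c("c1", "c2", "c3", "c4"), times = c(30, 20, 40, 10)))
sample_id <- factor(rep(c("sample1", "sample2"), length.out = 100))

# Leaf-level counts: clusters in rows, samples in columns
leaf_counts <- unclass(table(cluster_id, sample_id))

# Hypothetical internal node merging leaves c1 and c2
# (assumption: node count = sum over its descendant leaves)
node_counts <- colSums(leaf_counts[c("c1", "c2"), , drop = FALSE])

# Nodes-by-samples matrix holding the leaves and the internal node
tree_counts <- rbind(leaf_counts, node_c1_c2 = node_counts)
```

The resulting matrix has the same layout as described for calcTreeCounts (nodes in rows, samples in columns), although the real function returns a TreeSummarizedExperiment object and not a plain matrix.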
Examples
## For a complete workflow example demonstrating each step in the 'diffcyt'
## pipeline, please see the diffcyt vignette.
suppressPackageStartupMessages({
    library(diffcyt)
    library(TreeSummarizedExperiment)
})
## Helper function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
    d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
    colnames(d) <- paste0("marker", sprintf("%02d", seq_len(ncol)))
    d
}
## Create random data (without differential signal)
set.seed(123)
d_input <- list(
    sample1 = d_random(), sample2 = d_random(),
    sample3 = d_random(), sample4 = d_random()
)
experiment_info <- data.frame(
    sample_id = factor(paste0("sample", seq_len(4))),
    group_id = factor(c("group1", "group1", "group2", "group2"))
)
marker_info <- data.frame(
    channel_name = paste0("channel", sprintf("%03d", seq_len(20))),
    marker_name = paste0("marker", sprintf("%02d", seq_len(20))),
    marker_class = factor(c(rep("type", 10), rep("state", 10)),
                          levels = c("type", "state", "none"))
)
# Prepare data
d_se <- diffcyt::prepareData(d_input, experiment_info, marker_info)
# Transform data
d_se <- diffcyt::transformData(d_se)
# Generate clusters
d_se <- diffcyt::generateClusters(d_se)
#> FlowSOM clustering completed in 0.3 seconds
# Build a tree
tr <- buildTree(d_se)
## Calculate abundances for nodes in each sample
d_counts_tree <- calcTreeCounts(d_se = d_se, tree = tr)
## Calculate medians (by cluster and marker)
d_medians_by_cluster_marker <-
    calcMediansByTreeMarker(d_se = d_se, tree = tr)
#> Warning: Multiple nodes are found to have the same label.
#> Warning: Multiple nodes are found to have the same label.
#> Warning: Multiple nodes are found to have the same label.
## Calculate medians (by cluster, sample and marker)
d_medians_tree <- calcTreeMedians(d_se = d_se, tree = tr)
#> Warning: Multiple nodes are found to have the same label.
#> Warning: Multiple nodes are found to have the same label.
#> Warning: Multiple nodes are found to have the same label.