Test for differential state of entities using functions from the
edgeR package. This adapts
edgerWrp to accept input as a
SummarizedExperiment (SE) object
instead of matrix. Each assay should correspond to data
for one node of the tree. Samples are in columns and
features are in rows. The sample information is
in colData. The tree that stores the hierarchical relation between
the assays is provided via the argument tree.
Usage
runDS(
SE,
tree,
option = c("glm", "glmQL"),
design = NULL,
contrast = NULL,
filter_min_count = 1,
filter_min_total_count = 15,
filter_large_n = 10,
filter_min_prop = 1,
min_cells = 10,
normalize = TRUE,
normalize_method = "TMM",
group_column = "group_id",
design_terms = "group_id",
message = TRUE,
...
)Arguments
- SE
A
SummarizedExperimentobject, typically generated byaggDS.- tree
A
phyloobject. Each assay ofSEstores data for one node of the tree.- option
Either
"glm"or"glmQL". If"glm",glmFitandglmLRTare used; otherwise,glmQLFitandglmQLFTestare used. Details about the difference between two options are in the help page ofglmQLFit.- design
A numeric design matrix. If
NULL, all columns of the sample annotation will be used to create the design matrix.- contrast
A numeric vector specifying one contrast of the linear model coefficients to be tested equal to zero. Its length must equal to the number of columns of design. If
NULL, the last coefficient will be tested equal to zero.- filter_min_count
A numeric value, passed to min.count of
filterByExpr.- filter_min_total_count
A numeric value, passed to min.total.count of
filterByExpr.- filter_large_n
A numeric value, passed to large.n of
filterByExpr.- filter_min_prop
A numeric value, passed to min.prop of
filterByExpr.- min_cells
A numeric scalar specifying the minimal number of cells in a node required to include a node in the analysis. The information about the number of cells per node and sample should be available in
metadata(SE)$n_cells. A node is retained if at least half of the samples have at leastmin_cellscells belonging to the node.- normalize
A logical scalar indicating whether to estimate normalization factors (using
calcNormFactors).- normalize_method
Normalization method to be used. See
calcNormFactorsfor more details.- group_column
The name of the column in the sample annotation providing group labels for samples. This annotation is used for filtering.
- design_terms
The names of columns from the sample annotation that will be used to generate the design matrix. This is ignored if design is provided.
- message
A logical scalar, indicating whether progress messages should be printed.
- ...
More arguments to pass to
glmFit(option = "glm"orglmQLFit(option = "glmQL").
Value
A list with entries edgeR_results, tree, and nodes_drop.
- edgeR_results
A list. Each of the elements contains the output of
glmQLFTestorglmLRTfor one node, depending on the specifiedoption.- tree
The hierarchical structure of entities that was stored in the input
SE.- nodes_drop
A vector storing the alias node labels of entities that are filtered before analysis due to low counts.
Examples
suppressPackageStartupMessages({
library(TreeSummarizedExperiment)
})
## Load example data
ds_tse <- readRDS(system.file("extdata", "ds_sim_20_500_8de.rds",
package = "treeclimbR"))
ds_se <- aggDS(TSE = ds_tse, assay = "counts", sample_id = "sample_id",
group_id = "group", cluster_id = "cluster_id", FUN = sum)
#> Warning: Multiple nodes are found to have the same label.
## Information about the number of cells is provided in the metadata
S4Vectors::metadata(ds_se)$n_cells
#> 4 1 2 3
#> alias_1 10 8 14 12
#> alias_2 17 12 10 14
#> alias_3 9 9 18 11
#> alias_4 14 13 14 7
#> alias_5 11 9 13 14
#> alias_6 15 14 13 12
#> alias_7 16 18 16 13
#> alias_8 17 13 15 7
#> alias_9 7 12 13 13
#> alias_10 12 11 12 12
#> alias_11 128 119 138 115
#> alias_12 116 108 126 103
#> alias_13 36 29 42 37
#> alias_14 26 21 28 25
#> alias_15 80 79 84 66
#> alias_16 73 67 71 53
#> alias_17 40 36 40 33
#> alias_18 25 22 27 21
#> alias_19 33 31 31 20
ds_res <- runDS(SE = ds_se, tree = colTree(ds_tse), option = "glmQL",
group_column = "group", contrast = c(0, 1),
filter_min_count = 0, filter_min_total_count = 1,
design = model.matrix(~ group, data = colData(ds_se)),
filter_min_prop = 0, min_cells = 5, message = FALSE)
## Top differential features (across nodes)
nodeResult(ds_res, type = "DS")
#> logFC logCPM F PValue FDR node feature
#> 3 1.647224 16.31850 2702.109 2.402920e-38 9.131097e-36 11 3
#> 1...2 1.623036 16.28457 2411.255 2.264792e-37 4.303105e-35 11 1
#> 6...3 -1.566550 16.27571 2177.354 1.684458e-36 2.133646e-34 11 6
#> 5...4 -1.528560 16.26806 2042.835 5.892588e-36 5.597959e-34 11 5
#> 6...5 -1.633382 16.19005 1671.077 3.012255e-34 2.289314e-32 13 6
#> 5...6 -1.577632 16.18364 1610.875 6.168542e-34 3.906743e-32 13 5
#> 2 1.535873 16.20796 1350.855 1.904533e-32 1.033890e-30 13 2
#> 4 -1.613073 16.25077 1258.532 7.530189e-32 3.576840e-30 13 4
#> 1...9 1.535774 16.23117 1224.192 1.287592e-31 5.436501e-30 13 1
#> 1...10 1.581489 16.32083 1147.625 4.496677e-31 1.708737e-29 17 1