Test for differential state of entities using functions from the
edgeR
package. This adapts
edgerWrp
to accept input as a
SummarizedExperiment
(SE) object
instead of matrix
. Each assay
should correspond to data
for one node of the tree. Samples are in columns and
features are in rows. The sample information is
in colData
. The tree that stores the hierarchical relation between
the assays
is provided via the argument tree
.
Usage
runDS(
SE,
tree,
option = c("glm", "glmQL"),
design = NULL,
contrast = NULL,
filter_min_count = 1,
filter_min_total_count = 15,
filter_large_n = 10,
filter_min_prop = 1,
min_cells = 10,
normalize = TRUE,
normalize_method = "TMM",
group_column = "group_id",
design_terms = "group_id",
message = TRUE,
...
)
Arguments
- SE
A
SummarizedExperiment
object, typically generated byaggDS
.- tree
A
phylo
object. Each assay ofSE
stores data for one node of the tree.- option
Either
"glm"
or"glmQL"
. If"glm"
,glmFit
andglmLRT
are used; otherwise,glmQLFit
andglmQLFTest
are used. Details about the difference between two options are in the help page ofglmQLFit
.- design
A numeric design matrix. If
NULL
, all columns of the sample annotation will be used to create the design matrix.- contrast
A numeric vector specifying one contrast of the linear model coefficients to be tested equal to zero. Its length must equal to the number of columns of design. If
NULL
, the last coefficient will be tested equal to zero.- filter_min_count
A numeric value, passed to min.count of
filterByExpr
.- filter_min_total_count
A numeric value, passed to min.total.count of
filterByExpr
.- filter_large_n
A numeric value, passed to large.n of
filterByExpr
.- filter_min_prop
A numeric value, passed to min.prop of
filterByExpr
.- min_cells
A numeric scalar specifying the minimal number of cells in a node required to include a node in the analysis. The information about the number of cells per node and sample should be available in
metadata(SE)$n_cells
. A node is retained if at least half of the samples have at leastmin_cells
cells belonging to the node.- normalize
A logical scalar indicating whether to estimate normalization factors (using
calcNormFactors
).- normalize_method
Normalization method to be used. See
calcNormFactors
for more details.- group_column
The name of the column in the sample annotation providing group labels for samples. This annotation is used for filtering.
- design_terms
The names of columns from the sample annotation that will be used to generate the design matrix. This is ignored if design is provided.
- message
A logical scalar, indicating whether progress messages should be printed.
- ...
More arguments to pass to
glmFit
(option = "glm"
orglmQLFit
(option = "glmQL"
).
Value
A list with entries edgeR_results, tree, and nodes_drop.
- edgeR_results
A list. Each of the elements contains the output of
glmQLFTest
orglmLRT
for one node, depending on the specifiedoption
.- tree
The hierarchical structure of entities that was stored in the input
SE
.- nodes_drop
A vector storing the alias node labels of entities that are filtered before analysis due to low counts.
Examples
suppressPackageStartupMessages({
library(TreeSummarizedExperiment)
})
## Load example data
ds_tse <- readRDS(system.file("extdata", "ds_sim_20_500_8de.rds",
package = "treeclimbR"))
ds_se <- aggDS(TSE = ds_tse, assay = "counts", sample_id = "sample_id",
group_id = "group", cluster_id = "cluster_id", FUN = sum)
#> Warning: Multiple nodes are found to have the same label.
## Information about the number of cells is provided in the metadata
S4Vectors::metadata(ds_se)$n_cells
#> 4 1 2 3
#> alias_1 10 8 14 12
#> alias_2 17 12 10 14
#> alias_3 9 9 18 11
#> alias_4 14 13 14 7
#> alias_5 11 9 13 14
#> alias_6 15 14 13 12
#> alias_7 16 18 16 13
#> alias_8 17 13 15 7
#> alias_9 7 12 13 13
#> alias_10 12 11 12 12
#> alias_11 128 119 138 115
#> alias_12 116 108 126 103
#> alias_13 36 29 42 37
#> alias_14 26 21 28 25
#> alias_15 80 79 84 66
#> alias_16 73 67 71 53
#> alias_17 40 36 40 33
#> alias_18 25 22 27 21
#> alias_19 33 31 31 20
ds_res <- runDS(SE = ds_se, tree = colTree(ds_tse), option = "glmQL",
group_column = "group", contrast = c(0, 1),
filter_min_count = 0, filter_min_total_count = 1,
design = model.matrix(~ group, data = colData(ds_se)),
filter_min_prop = 0, min_cells = 5, message = FALSE)
#> 0 nodes are ignored, as they don't contain at least 5 cells in at least half of the samples.
## Top differential features (across nodes)
nodeResult(ds_res, type = "DS")
#> logFC logCPM F PValue FDR node feature
#> 3 1.647224 16.31850 2702.109 2.402920e-38 9.131097e-36 11 3
#> 1...2 1.623036 16.28457 2411.255 2.264792e-37 4.303105e-35 11 1
#> 6...3 -1.566550 16.27571 2177.354 1.684458e-36 2.133646e-34 11 6
#> 5...4 -1.528560 16.26806 2042.835 5.892588e-36 5.597959e-34 11 5
#> 6...5 -1.633382 16.19005 1671.077 3.012255e-34 2.289314e-32 13 6
#> 5...6 -1.577632 16.18364 1610.875 6.168542e-34 3.906743e-32 13 5
#> 2 1.535873 16.20796 1350.855 1.904533e-32 1.033890e-30 13 2
#> 4 -1.613073 16.25077 1258.532 7.530189e-32 3.576840e-30 13 4
#> 1...9 1.535774 16.23117 1224.192 1.287592e-31 5.436501e-30 13 1
#> 1...10 1.581489 16.32083 1147.625 4.496677e-31 1.708737e-29 17 1