Test for differential state of entities using functions from the
edgeR
package. This adapts edgerWrp
to accept
input as a
SummarizedExperiment
(SE) object
instead of matrix
. Each assay
should correspond to data
for one node of the tree. Samples are in columns and
features are in rows. The sample information is
in colData
. The tree that stores the hierarchical relation between
the assays
is provided via the argument tree
.
runDS(
SE,
tree,
option = c("glm", "glmQL"),
design = NULL,
contrast = NULL,
filter_min_count = 1,
filter_min_total_count = 15,
filter_large_n = 10,
filter_min_prop = 1,
min_cells = 10,
normalize = TRUE,
normalize_method = "TMM",
group_column = "group_id",
design_terms = "group_id",
message = TRUE,
...
)
A SummarizedExperiment
object, typically generated by
aggDS
.
A phylo
object. Each assay of SE
stores data for one node of the tree.
Either "glm"
or "glmQL"
. If "glm"
,
glmFit
and glmLRT
are used;
otherwise, glmQLFit
and
glmQLFTest
are used. Details about the difference
between two options are in the help page of
glmQLFit
.
A numeric design matrix. If NULL
, all columns of
the sample annotation will be used to create the design matrix.
A numeric vector specifying one contrast of
the linear model coefficients to be tested equal to zero. Its length
must equal to the number of columns of design. If NULL
, the last
coefficient will be tested equal to zero.
A numeric value, passed to min.count of
filterByExpr
.
A numeric value, passed to
min.total.count of filterByExpr
.
A numeric value, passed to large.n of
filterByExpr
.
A numeric value, passed to min.prop of
filterByExpr
.
A numeric scalar specifying the minimal number of cells
in a node required to include a node in the analysis. The information
about the number of cells per node and sample should be available in
metadata(SE)$n_cells
. A node is retained if at least half of the
samples have at least min_cells
cells belonging to the node.
A logical scalar indicating whether to estimate
normalization factors (using calcNormFactors
).
Normalization method to be used. See
calcNormFactors
for more details.
The name of the column in the sample annotation providing group labels for samples. This annotation is used for filtering.
The names of columns from the sample annotation that will be used to generate the design matrix. This is ignored if design is provided.
A logical scalar, indicating whether progress messages should be printed.
More arguments to pass to glmFit
(option = "glm"
or glmQLFit
(option = "glmQL"
).
A list with entries edgeR_results, tree, and nodes_drop.
A list. Each of the elements contains the output of
glmQLFTest
or
glmLRT
for one node, depending on the specified
option
.
The hierarchical structure of entities that was stored in the
input SE
.
A vector storing the alias node labels of entities that are filtered before analysis due to low counts.
suppressPackageStartupMessages({
library(TreeSummarizedExperiment)
})
## Load example data
ds_tse <- readRDS(system.file("extdata", "ds_sim_20_500_8de.rds",
package = "treeclimbR"))
ds_se <- aggDS(TSE = ds_tse, assay = "counts", sample_id = "sample_id",
group_id = "group", cluster_id = "cluster_id", FUN = sum)
#> Warning: Multiple nodes are found to have the same label.
## Information about the number of cells is provided in the metadata
S4Vectors::metadata(ds_se)$n_cells
#> 4 1 2 3
#> alias_1 10 8 14 12
#> alias_2 17 12 10 14
#> alias_3 9 9 18 11
#> alias_4 14 13 14 7
#> alias_5 11 9 13 14
#> alias_6 15 14 13 12
#> alias_7 16 18 16 13
#> alias_8 17 13 15 7
#> alias_9 7 12 13 13
#> alias_10 12 11 12 12
#> alias_11 128 119 138 115
#> alias_12 116 108 126 103
#> alias_13 36 29 42 37
#> alias_14 26 21 28 25
#> alias_15 80 79 84 66
#> alias_16 73 67 71 53
#> alias_17 40 36 40 33
#> alias_18 25 22 27 21
#> alias_19 33 31 31 20
ds_res <- runDS(SE = ds_se, tree = colTree(ds_tse), option = "glmQL",
group_column = "group", contrast = c(0, 1),
filter_min_count = 0, filter_min_total_count = 1,
design = model.matrix(~ group, data = colData(ds_se)),
filter_min_prop = 0, min_cells = 5, message = FALSE)
#> 0 nodes are ignored, as they don't contain at least 5 cells in at least half of the samples.
## Top differential features (across nodes)
nodeResult(ds_res, type = "DS")
#> logFC logCPM F PValue FDR node feature
#> 3 1.646829 16.31850 2562.620 6.829409e-38 2.595176e-35 11 3
#> 1...2 1.623029 16.28457 2288.515 6.329879e-37 1.202677e-34 11 1
#> 6...3 -1.566619 16.27571 2061.142 4.946102e-36 6.265063e-34 11 6
#> 5...4 -1.529440 16.26806 1939.114 1.637568e-35 1.555689e-33 11 5
#> 6...5 -1.633327 16.19005 1681.250 2.675315e-34 2.033239e-32 13 6
#> 5...6 -1.577228 16.18364 1620.074 5.519353e-34 3.495590e-32 13 5
#> 2 1.537199 16.20796 1360.382 1.661333e-32 9.018666e-31 13 2
#> 4 -1.612205 16.25077 1273.402 5.995628e-32 2.847923e-30 13 4
#> 1...9 1.534617 16.23117 1231.883 1.140403e-31 4.815033e-30 13 1
#> 1...10 1.582595 16.32083 1059.818 2.092353e-30 7.950943e-29 17 1