Test for differential state of entities using functions from the edgeR package. This adapts edgerWrp to accept input as a SummarizedExperiment (SE) object instead of matrix. Each assay should correspond to data for one node of the tree. Samples are in columns and features are in rows. The sample information is in colData. The tree that stores the hierarchical relation between the assays is provided via the argument tree.

runDS(
  SE,
  tree,
  option = c("glm", "glmQL"),
  design = NULL,
  contrast = NULL,
  filter_min_count = 1,
  filter_min_total_count = 15,
  filter_large_n = 10,
  filter_min_prop = 1,
  min_cells = 10,
  normalize = TRUE,
  normalize_method = "TMM",
  group_column = "group_id",
  design_terms = "group_id",
  message = TRUE,
  ...
)

Arguments

SE

A SummarizedExperiment object, typically generated by aggDS.

tree

A phylo object. Each assay of SE stores data for one node of the tree.

option

Either "glm" or "glmQL". If "glm", glmFit and glmLRT are used; otherwise, glmQLFit and glmQLFTest are used. Details about the difference between two options are in the help page of glmQLFit.

design

A numeric design matrix. If NULL, all columns of the sample annotation will be used to create the design matrix.

contrast

A numeric vector specifying one contrast of the linear model coefficients to be tested equal to zero. Its length must equal to the number of columns of design. If NULL, the last coefficient will be tested equal to zero.

filter_min_count

A numeric value, passed to min.count of filterByExpr.

filter_min_total_count

A numeric value, passed to min.total.count of filterByExpr.

filter_large_n

A numeric value, passed to large.n of filterByExpr.

filter_min_prop

A numeric value, passed to min.prop of filterByExpr.

min_cells

A numeric scalar specifying the minimal number of cells in a node required to include a node in the analysis. The information about the number of cells per node and sample should be available in metadata(SE)$n_cells. A node is retained if at least half of the samples have at least min_cells cells belonging to the node.

normalize

A logical scalar indicating whether to estimate normalization factors (using calcNormFactors).

normalize_method

Normalization method to be used. See calcNormFactors for more details.

group_column

The name of the column in the sample annotation providing group labels for samples. This annotation is used for filtering.

design_terms

The names of columns from the sample annotation that will be used to generate the design matrix. This is ignored if design is provided.

message

A logical scalar, indicating whether progress messages should be printed.

...

More arguments to pass to glmFit (option = "glm" or glmQLFit (option = "glmQL").

Value

A list with entries edgeR_results, tree, and nodes_drop.

edgeR_results

A list. Each of the elements contains the output of glmQLFTest or glmLRT for one node, depending on the specified option.

tree

The hierarchical structure of entities that was stored in the input SE.

nodes_drop

A vector storing the alias node labels of entities that are filtered before analysis due to low counts.

Author

Ruizhu Huang

Examples

suppressPackageStartupMessages({
    library(TreeSummarizedExperiment)
})
## Load example data
ds_tse <- readRDS(system.file("extdata", "ds_sim_20_500_8de.rds",
                              package = "treeclimbR"))
ds_se <- aggDS(TSE = ds_tse, assay = "counts", sample_id = "sample_id",
               group_id = "group", cluster_id = "cluster_id", FUN = sum)
#> Warning: Multiple nodes are found to have the same label.
## Information about the number of cells is provided in the metadata
S4Vectors::metadata(ds_se)$n_cells
#>            4   1   2   3
#> alias_1   10   8  14  12
#> alias_2   17  12  10  14
#> alias_3    9   9  18  11
#> alias_4   14  13  14   7
#> alias_5   11   9  13  14
#> alias_6   15  14  13  12
#> alias_7   16  18  16  13
#> alias_8   17  13  15   7
#> alias_9    7  12  13  13
#> alias_10  12  11  12  12
#> alias_11 128 119 138 115
#> alias_12 116 108 126 103
#> alias_13  36  29  42  37
#> alias_14  26  21  28  25
#> alias_15  80  79  84  66
#> alias_16  73  67  71  53
#> alias_17  40  36  40  33
#> alias_18  25  22  27  21
#> alias_19  33  31  31  20

ds_res <- runDS(SE = ds_se, tree = colTree(ds_tse), option = "glmQL",
                group_column = "group", contrast = c(0, 1),
                filter_min_count = 0, filter_min_total_count = 1,
                design = model.matrix(~ group, data = colData(ds_se)),
                filter_min_prop = 0, min_cells = 5, message = FALSE)
#> 0 nodes are ignored, as they don't contain at least 5 cells in at least half of the samples.
## Top differential features (across nodes)
nodeResult(ds_res, type = "DS")
#>            logFC   logCPM        F       PValue          FDR node feature
#> 3       1.646829 16.31850 2562.620 6.829409e-38 2.595176e-35   11       3
#> 1...2   1.623029 16.28457 2288.515 6.329879e-37 1.202677e-34   11       1
#> 6...3  -1.566619 16.27571 2061.142 4.946102e-36 6.265063e-34   11       6
#> 5...4  -1.529440 16.26806 1939.114 1.637568e-35 1.555689e-33   11       5
#> 6...5  -1.633327 16.19005 1681.250 2.675315e-34 2.033239e-32   13       6
#> 5...6  -1.577228 16.18364 1620.074 5.519353e-34 3.495590e-32   13       5
#> 2       1.537199 16.20796 1360.382 1.661333e-32 9.018666e-31   13       2
#> 4      -1.612205 16.25077 1273.402 5.995628e-32 2.847923e-30   13       4
#> 1...9   1.534617 16.23117 1231.883 1.140403e-31 4.815033e-30   13       1
#> 1...10  1.582595 16.32083 1059.818 2.092353e-30 7.950943e-29   17       1