Generate a .Rmd file containing code to perform differential expression analysis with length normalized counts + SVA + limma — lengthNorm.sva.limma.createRmd • compcodeR

A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) by applying a length normalizing transformation, followed by a surrogate variable analysis (SVA), and then a differential expression analysis with limma. The code is written to a .Rmd file. This function is generally not called by the user, the main interface for performing differential expression analysis is the runDiffExp function.

lengthNorm.sva.limma.createRmd(
  data.path,
  result.path,
  codefile,
  norm.method,
  extra.design.covariates = NULL,
  length.normalization = "RPKM",
  data.transformation = "log2",
  trend = FALSE,
  n.sv = "auto"
)

Arguments

data.path: The path to a .rds file containing the phyloCompData object that will be used for the differential expression analysis.
result.path: The path to the file where the result object will be saved.
codefile: The path to the file where the code will be written.
norm.method: The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the calcNormFactors of the edgeR package. Possible values are "TMM", "RLE", "upperquartile" and "none"
extra.design.covariates: A vector containing the names of extra control variables to be passed to the design matrix of limma. All the covariates need to be a column of the sample.annotations data frame from the phyloCompData object, with a matching column name. The covariates can be a numeric vector, or a factor. Note that "condition" factor column is always included, and should not be added here. See Details.
length.normalization: one of "none" (no length correction), "TPM", or "RPKM" (default). See details.
data.transformation: one of "log2", "asin(sqrt)" or "sqrt". Data transformation to apply to the normalized data.
trend: should an intensity-trend be allowed for the prior variance? Default to FALSE.
n.sv: The number of surrogate variables to estimate (see sva). Default to "auto": will be estimated with num.sv.

Value

The function generates a .Rmd file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr package.

Details

For more information about the methods and the interpretation of the parameters, see the sva and limma packages and the corresponding publications.

See the details section of lengthNorm.limma.createRmd for details on the normalization and the extra design covariates.

References

Smyth GK (2005): Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397-420

Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075.

Law, C.W., Chen, Y., Shi, W. et al. (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15, R29.

Musser, JM, Wagner, GP. (2015): Character trees from transcriptome data: Origin and individuation of morphological characters and the so‐called “species signal”. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 588– 604.

Leek JT, Johnson WE, Parker HS, Jaffe AE, and Storey JD. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics DOI:10.1093/bioinformatics/bts034

Author

Charlotte Soneson, Paul Bastide, Mélina Gallopin

Examples

try(
if (require(limma) && require(sva)) {
tmpdir <- normalizePath(tempdir(), winslash = "/")
## Simulate data
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000, 
                                    samples.per.cond = 5, n.diffexp = 100, 
                                    id.species = factor(1:10),
                                    lengths.relmeans = rpois(1000, 1000),
                                    lengths.dispersions = rgamma(1000, 1, 1),
                                    output.file = file.path(tmpdir, "mydata.rds"))
## Add covariates
## Model fitted is count.matrix ~ condition + test_factor + test_reg
sample.annotations(mydata.obj)$test_factor <- factor(rep(1:2, each = 5))
sample.annotations(mydata.obj)$test_reg <- rnorm(10, 0, 1)
saveRDS(mydata.obj, file.path(tmpdir, "mydata.rds"))
## Diff Exp
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "lengthNorm.sva.limma", 
           Rmdfunction = "lengthNorm.sva.limma.createRmd", 
           output.directory = tmpdir, norm.method = "TMM",
           extra.design.covariates = c("test_factor", "test_reg"))
})
#> Loading required package: sva
#> Loading required package: mgcv
#> Loading required package: nlme
#> 
#> Attaching package: ‘nlme’
#> The following object is masked from ‘package:IRanges’:
#> 
#>     collapse
#> This is mgcv 1.9-3. For overview type 'help("mgcv-package")'.
#> Loading required package: genefilter
#> 
#> Attaching package: ‘genefilter’
#> The following objects are masked from ‘package:ROC’:
#> 
#>     AUC, pAUC
#> The following objects are masked from ‘package:MatrixGenerics’:
#> 
#>     rowSds, rowVars
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     rowSds, rowVars
#> Error in sample.annotations(mydata.obj)$test_factor <- factor(rep(1:2,  : 
#>   could not find function "sample.annotations"