R/runComparison.R
runComparison.Rd
The main function for performing comparisons among differential expression methods and generating a report in HTML format. It is assumed that all differential expression results have been generated in advance (using e.g. the function runDiffExp) and that the result compData object for each data set and each differential expression method is saved separately in files with the extension .rds. Note that the function can also be called via the runComparisonGUI function, which lets the user set parameters and select input files using a graphical user interface.
runComparison(
file.table,
parameters,
output.directory,
check.table = TRUE,
out.width = NULL,
save.result.table = FALSE,
knit.results = TRUE
)
file.table
A data frame with at least a column input.files, potentially also columns named datasets, nbr.samples, repl and de.methods.
parameters
A list containing parameters for the comparison study. The following entries are supported, and used by different comparison methods:
incl.nbr.samples
An array with sample sizes (number of samples per condition) to consider in the comparison. If set to NULL, all sample sizes will be included.
incl.dataset
A dataset name (corresponding to the dataset slot of the results or data objects), indicating the dataset that will be used for the comparison. Only one dataset can be chosen.
incl.replicates
An array with replicate numbers to consider in the comparison. If set to NULL, all replicates will be included.
incl.de.methods
An array with differential expression methods to be compared. If set to NULL, all differential expression methods will be included.
fdr.threshold
The adjusted p-value threshold for FDR calculations. Default 0.05.
tpr.threshold
The adjusted p-value threshold for TPR calculations. Default 0.05.
mcc.threshold
The adjusted p-value threshold for MCC calculations. Default 0.05.
typeI.threshold
The nominal p-value threshold for type I error calculations. Default 0.05.
fdc.maxvar
The maximal number of variables to include in false discovery curve plots. Default 1500.
overlap.threshold
The adjusted p-value threshold for overlap analysis. Default 0.05.
fracsign.threshold
The adjusted p-value threshold for calculation of the fraction/number of genes called significant. Default 0.05.
nbrtpfp.threshold
The adjusted p-value threshold for calculation of the number of TP, FP, TN, FN genes. Default 0.05.
ma.threshold
The adjusted p-value threshold for coloring genes in MA plots. Default 0.05.
signal.measure
Either 'mean' or 'snr', determining how to define the signal strength for a gene which is expressed in only one condition.
upper.limits, lower.limits
Lists that can be used to manually set the upper and lower plot limits for boxplots of fdr, tpr, auc, mcc, fracsign, nbrtpfp and typeIerror.
comparisons
Array containing the comparison methods to be applied. The entries must be chosen among the following abbreviations:
"auc" - Compute the area under the ROC curve.
"mcc" - Compute the Matthews correlation coefficient.
"tpr" - Compute the true positive rate at a given adjusted p-value threshold (tpr.threshold).
"fdr" - Compute the false discovery rate at a given adjusted p-value threshold (fdr.threshold).
"fdrvsexpr" - Compute the false discovery rate as a function of the expression level.
"typeIerror" - Compute the type I error rate at a given nominal p-value threshold (typeI.threshold).
"fracsign" - Compute the fraction of genes called significant at a given adjusted p-value threshold (fracsign.threshold).
"nbrsign" - Compute the number of genes called significant at a given adjusted p-value threshold (fracsign.threshold).
"nbrtpfp" - Compute the number of true positives, false positives, true negatives and false negatives at a given adjusted p-value threshold (nbrtpfp.threshold).
"maplot" - Construct MA plots, depicting the average expression level and the log fold change for the genes and indicating the genes called differentially expressed at a given adjusted p-value threshold (ma.threshold).
"fdcurvesall" - Construct false discovery curves for each of the included replicates.
"fdcurvesone" - Construct false discovery curves for a single replicate only.
"rocall" - Construct ROC curves for each of the included replicates.
"rocone" - Construct ROC curves for a single replicate only.
"overlap" - Compute the overlap between collections of genes called differentially expressed by the different methods at a given adjusted p-value threshold (overlap.threshold).
"sorensen" - Compute the Sorensen index, quantifying the overlap between collections of genes called differentially expressed by the different methods, at a given adjusted p-value threshold (overlap.threshold).
"correlation" - Compute the Spearman correlation between gene scores assigned by different methods.
"scorevsoutlier" - Visualize the distribution of the gene scores as a function of the number of outlier counts introduced for the genes.
"scorevsexpr" - Visualize the gene scores as a function of the average expression level of the genes.
"scorevssignal" - Visualize the gene score as a function of the 'signal strength' (see the signal.measure parameter above) for genes that are expressed in only one condition.
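To make the threshold-based measures concrete, the following sketch shows how the FDR, TPR and MCC follow from a vector of adjusted p-values and a known truth vector at a given threshold. The toy values and variable names are made up for illustration, the strict inequality is an assumption, and this is not the package's internal implementation.

```r
# Toy data (made up): adjusted p-values and true differential expression status
adj.p <- c(0.001, 0.01, 0.03, 0.04, 0.20, 0.50, 0.70, 0.90)
is.de <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
threshold <- 0.05  # plays the role of fdr.threshold / tpr.threshold / mcc.threshold

# Genes called significant (assumption: strictly below the threshold)
called <- adj.p < threshold

# Confusion matrix entries
tp <- sum(called & is.de)    # true positives
fp <- sum(called & !is.de)   # false positives
tn <- sum(!called & !is.de)  # true negatives
fn <- sum(!called & is.de)   # false negatives

fdr <- fp / max(tp + fp, 1)  # false discovery rate (0 if nothing is called)
tpr <- tp / (tp + fn)        # true positive rate
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # Matthews correlation coefficient
```

With these toy values, three of the four called genes are truly differentially expressed, so the FDR is 0.25, the TPR is 0.75 and the MCC is 0.5.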
output.directory
The directory where the results should be written. The subdirectory structure will be created automatically. If the directory already exists, it will be overwritten.
check.table
Logical, should the input table be checked for consistency? Default TRUE.
out.width
The width of the figures in the final report. Will be passed on to knitr when the HTML is generated.
save.result.table
Logical, should the intermediate result table be saved for future use? Default FALSE.
knit.results
Logical, should the Rmd be generated and knitted? Default TRUE. If FALSE, no comparison report is generated, and only the intermediate result table is saved (if save.result.table=TRUE).
If knit.results=TRUE, the function will create a comparison report, named compcodeR_report<timestamp>.html, in the output.directory. It will also create subfolders named compcodeR_code and compcodeR_figure, where the code used to perform the differential expression analysis and the figures contained in the report, respectively, will be stored. Note that if these directories already exist, they will be overwritten.
If save.result.table=TRUE, the function will also create a file named compcodeR_result_table_<timestamp>.rds in the output.directory, containing the result table.
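Because the output file names contain a timestamp, it can be convenient to locate them by pattern afterwards. The sketch below uses only base R; the directory and the placeholder file names are hypothetical stand-ins created for illustration (the exact timestamp format is not specified here).

```r
# Stand-in output directory with empty placeholder files (for illustration only)
output.directory <- file.path(tempdir(), "comparison_output")
dir.create(output.directory, showWarnings = FALSE)
invisible(file.create(file.path(output.directory,
                                c("compcodeR_report_example.html",
                                  "compcodeR_result_table_example.rds"))))

# Find the HTML report(s) and the saved result table(s) by file name pattern
reports <- list.files(output.directory,
                      pattern = "^compcodeR_report.*\\.html$",
                      full.names = TRUE)
result.tables <- list.files(output.directory,
                            pattern = "^compcodeR_result_table_.*\\.rds$",
                            full.names = TRUE)

# A real result table could then be loaded with readRDS(result.tables[1])
```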
The input to runComparison is a data frame with at least a column named input.files, containing paths to .rds files containing result objects (of the class compData), such as those generated by runDiffExp. Other columns that can be included in the data frame are datasets, nbr.samples, repl and de.methods. They have to match the information contained in the corresponding result objects. If these columns are not present, they will be added to the data frame automatically.
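As a minimal sketch of this table layout, the following base R code builds an input table that includes some of the optional columns and checks the paths before they are passed on. The directory, file names and column values are hypothetical placeholders; in practice they must match the content of the corresponding result objects.

```r
# Hypothetical directory and file names, used for illustration only
res.dir <- file.path(tempdir(), "results")
file.table <- data.frame(
  input.files = file.path(res.dir, c("mydata_voom.limma.rds",
                                     "mydata_edgeR.exact.rds")),
  datasets    = c("mydata", "mydata"),  # must match the dataset name in the result objects
  nbr.samples = c(5, 5),                # samples per condition
  repl        = c(1, 1),                # replicate number
  stringsAsFactors = FALSE
)
# de.methods is left out here, so it would be filled in automatically

# Check that all paths end in .rds and collect those that do not exist on disk
is.rds <- grepl("\\.rds$", file.table$input.files)
missing.files <- file.table$input.files[!file.exists(file.table$input.files)]
```

Checking missing.files before calling runComparison gives an early, readable list of any result files that still need to be generated.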
tmpdir <- normalizePath(tempdir(), winslash = "/")
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
samples.per.cond = 5, n.diffexp = 100,
output.file = file.path(tmpdir, "mydata.rds"))
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "voom.limma",
Rmdfunction = "voom.limma.createRmd", output.directory = tmpdir,
norm.method = "TMM")
#>
#>
#> processing file: /private/var/folders/xz/lz5thm6s3vb1vdkk77vmkgbr0000gn/T/RtmpDwlx4a/tempcode21d046b1670c.Rmd
#> 1/2
#> 2/2 [unnamed-chunk-1]
#> output file: /private/var/folders/xz/lz5thm6s3vb1vdkk77vmkgbr0000gn/T/RtmpDwlx4a/tempcode21d046b1670c.md
#> [1] TRUE
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "edgeR.exact",
Rmdfunction = "edgeR.exact.createRmd", output.directory = tmpdir,
norm.method = "TMM",
trend.method = "movingave", disp.type = "tagwise")
#>
#>
#> processing file: /private/var/folders/xz/lz5thm6s3vb1vdkk77vmkgbr0000gn/T/RtmpDwlx4a/tempcode21d02baf3853.Rmd
#> 1/2
#> 2/2 [unnamed-chunk-1]
#> output file: /private/var/folders/xz/lz5thm6s3vb1vdkk77vmkgbr0000gn/T/RtmpDwlx4a/tempcode21d02baf3853.md
#> [1] TRUE
file.table <- data.frame(input.files = file.path(tmpdir,
c("mydata_voom.limma.rds", "mydata_edgeR.exact.rds")),
stringsAsFactors = FALSE)
parameters <- list(incl.nbr.samples = 5, incl.replicates = 1, incl.dataset = "mydata",
incl.de.methods = NULL,
fdr.threshold = 0.05, tpr.threshold = 0.05, typeI.threshold = 0.05,
ma.threshold = 0.05, fdc.maxvar = 1500, overlap.threshold = 0.05,
fracsign.threshold = 0.05, mcc.threshold = 0.05,
nbrtpfp.threshold = 0.05,
comparisons = c("auc", "fdr", "tpr", "maplot", "correlation"))
if (interactive()) {
runComparison(file.table = file.table, parameters = parameters, output.directory = tmpdir)
}
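Since the comparisons entry only accepts the abbreviations listed in the parameter description, it can be useful to check a requested set before starting a long run. The helper below is a hypothetical sketch in base R and is not part of compcodeR; the vector of valid abbreviations is copied from the list documented above.

```r
# Abbreviations listed in the description of the comparisons entry
valid.comparisons <- c("auc", "mcc", "tpr", "fdr", "fdrvsexpr", "typeIerror",
                       "fracsign", "nbrsign", "nbrtpfp", "maplot",
                       "fdcurvesall", "fdcurvesone", "rocall", "rocone",
                       "overlap", "sorensen", "correlation",
                       "scorevsoutlier", "scorevsexpr", "scorevssignal")

# Hypothetical helper (not part of compcodeR): return the entries that are
# not among the documented abbreviations
check_comparisons <- function(comparisons) {
  setdiff(comparisons, valid.comparisons)
}

requested <- c("auc", "fdr", "tpr", "maplot", "correlation")
unknown <- check_comparisons(requested)
if (length(unknown) > 0) {
  warning("Unknown comparison abbreviations: ", paste(unknown, collapse = ", "))
}
```

A misspelled entry such as "roc" would be returned by check_comparisons, whereas the requested set above passes without a warning.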