Scale transcript coverages by abundance estimate

This function takes a list of predicted transcript coverage profiles (from predictTxCoverage) and corresponding transcript abundance estimates (from, e.g., Salmon, kallisto, StringTie, RSEM, or any other transcript abundance estimation software), and scales the coverage profiles by the abundance estimates in order to determine the predicted number of reads covering each position in the transcript. In addition, it extracts the predicted number of reads spanning each junction in each transcript, and sums up these for each annotated junction, across all transcripts containing the junction.

scaleTxCoverages(txCoverageProfiles, txQuants, tx2gene, strandSpecific,
  methodName, verbose = FALSE)

Arguments

txCoverageProfiles: A list of predicted transcript coverage profiles (output from predictTxCoverage).
txQuants: A data.frame with estimated transcript abundances. Must contain columns named transcript, TPM and count, all other columns will be ignored.
tx2gene: A data.frame with transcript-to-gene mapping. Must have at least two columns, named tx and gene.
strandSpecific: Logical, whether or not the data is strand-specific. Determines whether or not the strand of a junction should be taken into account when the coverages are summed up across transcripts.
methodName: Name to use for the abundance estimation method in the output table.
verbose: Logical, whether to print progress messages.

Value

A list with two elements:

junctionPredCovs:: A tibble with the predicted coverage for each junction, scaled by the estimated transcript abundances and summarized across all transcripts.
txQuants:: A tibble with estimated transcript abundances, including information about whether or not the coverage profile could be predicted ('covOK') or if a uniform coverage was imposed ('covError' or 'covNA').

References

Soneson C, Love MI, Patro R, Hussain S, Malhotra D, Robinson MD: A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs. bioRxiv doi:10.1101/378539 (2018)

Author

Charlotte Soneson

Examples

if (FALSE) { # \dontrun{
gtf <- system.file("extdata/Homo_sapiens.GRCh38.90.chr22.gtf.gz",
                   package = "jcc")
bam <- system.file("extdata/reads.chr22.bam", package = "jcc")
biasMod <- fitAlpineBiasModel(gtf = gtf, bam = bam,
                              organism = "Homo_sapiens",
                              genome = Hsapiens, genomeVersion = "GRCh38",
                              version = 90, minLength = 230,
                              maxLength = 7000, minCount = 10,
                              maxCount = 10000, subsample = TRUE,
                              nbrSubsample = 30, seed = 1, minSize = NULL,
                              maxSize = 220, verbose = TRUE)
tx2gene <- readRDS(system.file("extdata/tx2gene.sub.rds", package = "jcc"))
predCovProfiles <- predictTxCoverage(biasModel = biasMod$biasModel,
                                     exonsByTx = biasMod$exonsByTx,
                                     bam = bam, tx2gene = tx2gene,
                                     genome = Hsapiens,
                                     genes = c("ENSG00000070371",
                                               "ENSG00000093010"),
                                     nCores = 1, verbose = TRUE)
txQuants <- readRDS(system.file("extdata/quant.sub.rds", package = "jcc"))
txsc <- scaleTxCoverages(txCoverageProfiles = predCovProfiles,
                         txQuants = txQuants, tx2gene = tx2gene,
                         strandSpecific = TRUE, methodName = "Salmon",
                         verbose = TRUE)
} # }