Fit fragment bias model with alpine — fitAlpineBiasModel • jcc

This function provides a wrapper around some of the functions from the alpine package. Given a gtf file and a bam file with reads aligned to the genome, it will find single-isoform genes (with lengths and expression levels within given ranges) and use the observed read coverages to fit a fragment bias model.

fitAlpineBiasModel(gtf, bam, organism, genome, genomeVersion, version,
  minLength = 600, maxLength = 7000, minCount = 500, maxCount = 10000,
  subsample = TRUE, nbrSubsample = 200, seed = 1, minSize = NULL,
  maxSize = NULL, verbose = FALSE)

Arguments

gtf: Path to gtf file with genomic features. Preferably in Ensembl format.
bam: Path to bam file with read alignments to the genome.
organism: The organism (e.g., 'Homo_sapiens'). This argument will be passed to ensembldb::ensDbFromGtf.
genome: A BSgenome object.
genomeVersion: Genome version (e.g., 'GRCh38'). This argument will be passed to ensembldb::ensDbFromGtf.
version: The version of the reference annotation (e.g., 90). This argument will be passed to ensembldb::ensDbFromGtf.
minLength, maxLength: Minimum and maximum length of single-isoform genes used to fit fragment bias model.
minCount, maxCount: Minimum and maximum read coverage of single-isoform genes used to fit fragment bias model.
subsample: Whether to subsample the set of single-isoform genes satisfying the minLength, maxLength, minCount and maxCount criteria before fitting the fragment bias model.
nbrSubsample: If subsample is TRUE, the number of genes to subsample.
seed: If subsample is TRUE, the random seed to use to ensure reproducibility.
minSize, maxSize: Smallest and largest fragment size to consider. One or both of these can be NULL, in which case it is estimated as the 2.5 or 97.5 percentile, respectively, of estimated fragment sizes in the provided data.
verbose: Logical, whether to print progress messages.

Value

A list with three elements:

biasModel:: The fitted fragment bias model.
exonsByTx:: A GRangesList object with exons grouped by transcript.
transcripts:: A GRanges object with all the reference transcripts.

References

Soneson C, Love MI, Patro R, Hussain S, Malhotra D, Robinson MD: A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs. bioRxiv doi:10.1101/378539 (2018).

Love MI, Hogenesch JB, Irizarry RA: Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnology 34(12):1287-1291 (2016).

Author

Charlotte Soneson, Michael I Love

Examples

if (FALSE) { # \dontrun{
gtf <- system.file("extdata/Homo_sapiens.GRCh38.90.chr22.gtf.gz",
                   package = "jcc")
bam <- system.file("extdata/reads.chr22.bam", package = "jcc")
biasMod <- fitAlpineBiasModel(gtf = gtf, bam = bam,
                              organism = "Homo_sapiens",
                              genome = Hsapiens, genomeVersion = "GRCh38",
                              version = 90, minLength = 230,
                              maxLength = 7000, minCount = 10,
                              maxCount = 10000, subsample = TRUE,
                              nbrSubsample = 30, seed = 1, minSize = NULL,
                              maxSize = 220, verbose = TRUE)
} # }