This function provides a wrapper around some of the functions from the alpine package. Given a gtf file and a bam file with reads aligned to the genome, it will find single-isoform genes (with lengths and expression levels within given ranges) and use the observed read coverages to fit a fragment bias model.

fitAlpineBiasModel(gtf, bam, organism, genome, genomeVersion, version,
  minLength = 600, maxLength = 7000, minCount = 500, maxCount = 10000,
  subsample = TRUE, nbrSubsample = 200, seed = 1, minSize = NULL,
  maxSize = NULL, verbose = FALSE)

Arguments

gtf

Path to gtf file with genomic features. Preferably in Ensembl format.

bam

Path to bam file with read alignments to the genome.

organism

The organism (e.g., 'Homo_sapiens'). This argument will be passed to ensembldb::ensDbFromGtf.

genome

A BSgenome object.

genomeVersion

Genome version (e.g., 'GRCh38'). This argument will be passed to ensembldb::ensDbFromGtf.

version

The version of the reference annotation (e.g., 90). This argument will be passed to ensembldb::ensDbFromGtf.

minLength, maxLength

Minimum and maximum length of single-isoform genes used to fit fragment bias model.

minCount, maxCount

Minimum and maximum read coverage of single-isoform genes used to fit fragment bias model.

subsample

Whether to subsample the set of single-isoform genes satisfying the minLength, maxLength, minCount and maxCount criteria before fitting the fragment bias model.

nbrSubsample

If subsample is TRUE, the number of genes to subsample.

seed

If subsample is TRUE, the random seed to use to ensure reproducibility.

minSize, maxSize

Smallest and largest fragment size to consider. One or both of these can be NULL, in which case it is estimated as the 2.5 or 97.5 percentile, respectively, of estimated fragment sizes in the provided data.

verbose

Logical, whether to print progress messages.

Value

A list with three elements:

biasModel:

The fitted fragment bias model.

exonsByTx:

A GRangesList object with exons grouped by transcript.

transcripts:

A GRanges object with all the reference transcripts.

References

Soneson C, Love MI, Patro R, Hussain S, Malhotra D, Robinson MD: A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs. bioRxiv doi:10.1101/378539 (2018).

Love MI, Hogenesch JB, Irizarry RA: Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnology 34(12):1287-1291 (2016).

Author

Charlotte Soneson, Michael I Love

Examples

if (FALSE) { # \dontrun{
gtf <- system.file("extdata/Homo_sapiens.GRCh38.90.chr22.gtf.gz",
                   package = "jcc")
bam <- system.file("extdata/reads.chr22.bam", package = "jcc")
biasMod <- fitAlpineBiasModel(gtf = gtf, bam = bam,
                              organism = "Homo_sapiens",
                              genome = Hsapiens, genomeVersion = "GRCh38",
                              version = 90, minLength = 230,
                              maxLength = 7000, minCount = 10,
                              maxCount = 10000, subsample = TRUE,
                              nbrSubsample = 30, seed = 1, minSize = NULL,
                              maxSize = 220, verbose = TRUE)
} # }