Abstract

alevinQC reads output files from alevin and generates summary reports.

Introduction

The purpose of the alevinQC package is to generate a summary QC report based on the output of an alevin (Srivastava et al. 2019) run. The QC report can be generated as a html or pdf file, or launched as a shiny application.

Installation

alevinQC can be installed using the BiocManager CRAN package.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("alevinQC")

After installation, load the package into the R session.

Note that in order to process output from Salmon v0.14 or later, you need Alevin v1.1 or later.

Assumed output directory structure

For more information about running alevin, we refer to the documentation. When invoked, alevin generates several output files in the specified output directory. alevinQC assumes that this structure is retained, and will return an error if it isn’t - thus, it is not recommended to move or rename the output files from alevin. alevinQC assumes that the following files (in the indicated structure) are available in the provided baseDir (note that currently, in order to generate the full set of files, alevin must be invoked with the --dumpFeatures flag).

For alevin versions before 0.14:

baseDir
  |--alevin
  |    |--featureDump.txt
  |    |--filtered_cb_frequency.txt
  |    |--MappedUmi.txt
  |    |--quants_mat_cols.txt
  |    |--quants_mat_rows.txt
  |    |--quants_mat.gz
  |    |--raw_cb_frequency.txt
  |    |--whitelist.txt
  |--aux_info
  |    |--meta_info.json
  |--cmd_info.json

For alevin version 0.14 and later:

baseDir
  |--alevin
  |    |--featureDump.txt
  |    |--raw_cb_frequency.txt
  |    |--whitelist.txt  (depending on how alevin was run)
  |--aux_info
  |    |--meta_info.json
  |    |--alevin_meta_info.json
  |--cmd_info.json

Check that all required alevin files are available

The report generation functions (see below) will check that all the required files are available in the provided base directory. However, you can also call the function checkAlevinInputFiles() to run the check manually. If one or more files are missing, the function will raise an error indicating the missing file(s).

baseDir <- system.file("extdata/alevin_example_v0.14", package = "alevinQC")
checkAlevinInputFiles(baseDir = baseDir)
#> [1] "v0.14"

Generate QC report

The alevinQCReport() function generates the QC report from the alevin output. Depending on the file extension of the outputFile argument, and the value of outputFormat, the function can generate either an html report or a pdf report.

outputDir <- tempdir()
alevinQCReport(baseDir = baseDir, sampleId = "testSample", 
               outputFile = "alevinReport.html", 
               outputFormat = "html_document",
               outputDir = outputDir, forceOverwrite = TRUE)

Create shiny app

In addition to static reports, alevinQC can also generate a shiny application, containing the same summary figures as the pdf and html reports.

app <- alevinQCShiny(baseDir = baseDir, sampleId = "testSample")

Once created, the app can be launched using the runApp() function from the shiny package.

shiny::runApp(app)

It is possible to export the data used internally by the interactive application (in effect, the output from the internal call to readAlevinQC() or readAlevinFryQC()). To enable such export, first generate the app object as in the example above, and then assign the call to shiny::runApp() to a variable to capture the output. For example:

if (interactive()) {
    out <- shiny::runApp(app)
}

To activate the export, make sure to click the button ‘Close app’ in the top right corner in order to close the application (don’t just close the window). This will take you back to your R session, where the variable out will be populated with the data used in the app.

Generate individual plots

The individual plots included in the QC reports can also be independently generated. To do so, we must first read the alevin output into an R object.

alevin <- readAlevinQC(baseDir = baseDir)

The resulting list contains four entries:

  • cbTable: a data.frame with various inferred characteristics of the individual cell barcodes.
  • summaryTables: a list of data.frames with summary information about the full data set, the initial set of whitelisted cells and the final set of whitelisted cells, respectively.
  • versionTable: a matrix with information about the invokation of alevin.
  • type: a character scalar indicating how alevinQC interpreted the alevin output directory.
head(alevin$cbTable)
#>                 CB originalFreq ranking collapsedFreq nbrMappedUMI
#> 1 GACTGCGAGGGCATGT       121577       1        123419       104128
#> 2 GGTGCGTAGGCTACGA       110467       2        111987        93608
#> 3 ATGAGGGAGTAGTGCG       106446       3        108173        88481
#> 4 ACTGTCCTCATGCTCC       104794       4        106085        81879
#> 5 CGAACATTCTGATACG       104616       5        106072        84395
#> 6 ACTGTCCCATATGGTC        99208       6        100776        81066
#>   totalUMICount mappingRate dedupRate  MeanByMax nbrGenesAboveZero
#> 1         73312    0.843695  0.295943 0.00735194              7512
#> 2         66002    0.835883  0.294911 0.00783094              7522
#> 3         62196    0.817958  0.297069 0.00832595              7081
#> 4         57082    0.771824  0.302849 0.00619664              6956
#> 5         58547    0.795639  0.306274 0.00743685              7347
#> 6         56534    0.804418  0.302618 0.00947029              6841
#>   nbrGenesAboveMean ArborescenceCount inFinalWhiteList inFirstWhiteList
#> 1              1237           1.42034             TRUE             TRUE
#> 2              1238           1.41826             TRUE             TRUE
#> 3              1151           1.42262             TRUE             TRUE
#> 4               957           1.43441             TRUE             TRUE
#> 5              1238           1.44149             TRUE             TRUE
#> 6              1068           1.43393             TRUE             TRUE
knitr::kable(alevin$summaryTables$fullDataset)
Total number of processed reads 7197662
Number of reads with Ns 35362
Number of reads with valid cell barcode (no Ns) 7162300
Number of mapped reads 4869156
Percent mapped (of all reads) 67.65%
Number of noisy CB reads 1003624
Number of noisy UMI reads 266
Total number of observed cell barcodes 188613
knitr::kable(alevin$summaryTables$initialWhitelist)
Number of barcodes (initial whitelist) 100
Number of barcodes with quantification (initial whitelist) 100
Fraction reads in barcodes (initial whitelist) 84.64%
Mean number of reads per cell (initial whitelist) 60620
Median number of reads per cell (initial whitelist) 58132
Mean number of detected genes per cell (initial whitelist) 5163
Median number of detected genes per cell (initial whitelist) 5268
Mean UMI count per cell (initial whitelist) 33274
Median UMI count per cell (initial whitelist) 31353
knitr::kable(alevin$summaryTables$finalWhitelist)
Number of barcodes (final whitelist) 95
Number of barcodes with quantification (final whitelist) 95
Fraction reads in barcodes (final whitelist) 82.39%
Mean number of reads per cell (final whitelist) 62118
Median number of reads per cell (final whitelist) 58725
Mean number of detected genes per cell (final whitelist) 5260
Median number of detected genes per cell (final whitelist) 5343
Mean UMI count per cell (final whitelist) 34091
Median UMI count per cell (final whitelist) 32028
knitr::kable(alevin$versionTable)
Start time Thu May 30 13:06:55 2019
Salmon version 0.14.0
Index /mnt/scratch5/avi/alevin/data/mohu/salmon_index
R1file /mnt/scratch5/avi/alevin/data/10x/v2/mohu/100/all_bcs.fq
R2file /mnt/scratch5/avi/alevin/data/10x/v2/mohu/100/all_reads.fq
tgMap /mnt/scratch5/avi/alevin/data/mohu/gtf/txp2gene.tsv
Library type ISR

The plots can now be generated using the dedicated plotting functions provided with alevinQC (see the help file for the respective function for more information).

plotAlevinKneeRaw(alevin$cbTable)

plotAlevinQuant(alevin$cbTable)

plotAlevinKneeNbrGenes(alevin$cbTable)

Session info

sessionInfo()
#> R Under development (unstable) (2024-10-31 r87283)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: UTC
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] alevinQC_1.23.0  BiocStyle_2.35.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] tximport_1.35.0      sass_0.4.9           utf8_1.2.4          
#>  [4] generics_0.1.3       tidyr_1.3.1          digest_0.6.37       
#>  [7] magrittr_2.0.3       evaluate_1.0.1       grid_4.5.0          
#> [10] RColorBrewer_1.1-3   bookdown_0.41        fastmap_1.2.0       
#> [13] plyr_1.8.9           jsonlite_1.8.9       promises_1.3.0      
#> [16] BiocManager_1.30.25  GGally_2.2.1         purrr_1.0.2         
#> [19] fansi_1.0.6          crosstalk_1.2.1      scales_1.3.0        
#> [22] shinydashboard_0.7.2 textshaping_0.4.0    jquerylib_0.1.4     
#> [25] cli_3.6.3            shiny_1.9.1          rlang_1.1.4         
#> [28] cowplot_1.1.3        munsell_0.5.1        withr_3.0.2         
#> [31] cachem_1.1.0         yaml_2.3.10          tools_4.5.0         
#> [34] dplyr_1.1.4          colorspace_2.1-1     ggplot2_3.5.1       
#> [37] httpuv_1.6.15        DT_0.33              ggstats_0.7.0       
#> [40] mime_0.12            vctrs_0.6.5          R6_2.5.1            
#> [43] lifecycle_1.0.4      fs_1.6.5             htmlwidgets_1.6.4   
#> [46] fontawesome_0.5.2    ragg_1.3.3           pkgconfig_2.0.3     
#> [49] desc_1.4.3           later_1.3.2          pkgdown_2.1.1.9000  
#> [52] pillar_1.9.0         bslib_0.8.0          gtable_0.3.6        
#> [55] glue_1.8.0           Rcpp_1.0.13          systemfonts_1.1.0   
#> [58] highr_0.11           xfun_0.49            tibble_3.2.1        
#> [61] tidyselect_1.2.1     knitr_1.48           farver_2.1.2        
#> [64] xtable_1.8-4         rjson_0.2.23         htmltools_0.5.8.1   
#> [67] labeling_0.4.3       rmarkdown_2.28       compiler_4.5.0

References

Srivastava, Avi, Laraib Malik, Tom Sean Smith, Ian Sudbery, and Rob Patro. 2019. “Alevin Efficiently Estimates Accurate Gene Abundances from dscRNA-seq Data.” Genome Biology 20: 65.