This vignette illustrates the basic principles of performing RNA velocity analysis in R, using the velociraptor package, which provides a wrapper around the scVelo python package.
We will use an example data set from Hermann et al. (2018), for which spliced and unspliced counts are available via the scRNAseq package. See here for a record of how the data was processed to get to the SingleCellExperiment
object from the raw FASTQ files. Of course, in practice you will typically need to assemble the SingleCellExperiment
object based on the output from a quantification tool. It is worth noting that there are many R packages that are useful for reading raw data from various quantification tools directly into a SummarizedExperiment
or SingleCellExperiment
object; e.g. fishpond, tximeta, BUSpaRse and DropletUtils.
We start by loading the data set and performing some initial “standard” analysis steps, such as normalizing the data, finding highly variable genes, and performing dimensionality reduction.
suppressPackageStartupMessages({
library(scRNAseq)
})
## Load data set
<- HermannSpermatogenesisData()
hermann #> snapshotDate(): 2021-10-19
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
## Add a 'counts' assay for easier downstream analysis
## (here equal to the 'spliced' assay)
assay(hermann, "counts") <- assay(hermann, "spliced")
hermann#> class: SingleCellExperiment
#> dim: 54448 2325
#> metadata(0):
#> assays(3): spliced unspliced counts
#> rownames(54448): ENSMUSG00000102693.1 ENSMUSG00000064842.1 ...
#> ENSMUSG00000064369.1 ENSMUSG00000064372.1
#> rowData names(0):
#> colnames(2325): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... ATCCACCCACCACCAG
#> ATTGGTGGTTACCGAT
#> colData names(1): celltype
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
suppressPackageStartupMessages({
library(scuttle)
library(scater)
library(scran)
})
## Calculate log-normalized counts
<- scuttle::logNormCounts(hermann)
hermann
## Find highly variable genes
<- scran::modelGeneVar(hermann)
dec #> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values
<- scran::getTopHVGs(dec, n = 2000)
top.hvgs
## Perform dimensionality reduction
set.seed(100)
<- scater::runPCA(hermann, subset_row = top.hvgs)
hermann <- scater::runTSNE(hermann, dimred = "PCA")
hermann
## Plot tSNE representation
::plotTSNE(hermann, colour_by = "celltype") scater
In this tutorial, we will use the velociraptor R/Bioconductor package to run the RNA velocity analysis. This package provides a wrapper around the scVelo python package, and lets you run most of the essential analysis steps directly from your R session. However, in other situations you may want to use scVelo or other python packages directly, despite performing the initial analysis in R. A large number of tools from the python-based single-cell ecosystem are based around the so callled AnnData
format, which is a type of python object that can hold largely similar types of information as SingleCellExperiment
or Seurat
objects. In order to use our preprocessed data in such python tools, we first need to convert our SingleCellExperiment
to AnnData
format. The zellkonverter package provides a convenient way to convert between these two formats. To illustrate how it works, let us convert the SingleCellExperiment
object to AnnData
format and save it to a file that can be directly read into python.
suppressPackageStartupMessages({
library(zellkonverter)
})::writeH5AD(hermann, file = "Hermann_AnnData.h5ad",
zellkonverterX_name = "counts")
In a python session, we can now read this object back (assuming that the anndata
python module is installed), and continue with any downstream analysis. Note that the ‘main’ assay (specified by the X_name
argument to writeH5AD()
is stored in a slot named X
in the AnnData
object, and all other assays are added as additional layers. Reduced dimension representations (obsm
) and cell annotations (obs
) are also converted. More information about the AnnData
format can be found in the documentation.
## This chunk is run in a python session
import anndata
= anndata.read("Hermann_AnnData.h5ad")
adata
adata#> AnnData object with n_obs × n_vars = 2325 × 54448
#> obs: 'celltype', 'sizeFactor'
#> uns: 'X_name'
#> obsm: 'PCA', 'TSNE'
#> layers: 'logcounts', 'spliced', 'unspliced'
zellkonverter can also be used to read an .h5ad
file (e.g., that was saved from python) into an R session:
<- zellkonverter::readH5AD("Hermann_AnnData.h5ad")
sce #> ℹ Using stored X_name value 'counts'
sce#> class: SingleCellExperiment
#> dim: 54448 2325
#> metadata(0):
#> assays(4): counts logcounts spliced unspliced
#> rownames(54448): ENSMUSG00000102693.1 ENSMUSG00000064842.1 ...
#> ENSMUSG00000064369.1 ENSMUSG00000064372.1
#> rowData names(0):
#> colnames(2325): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... ATCCACCCACCACCAG
#> ATTGGTGGTTACCGAT
#> colData names(2): celltype sizeFactor
#> reducedDimNames(2): PCA TSNE
#> mainExpName: NULL
#> altExpNames(0):
Next, we will perform the RNA velocity estimation. In velociraptor, this is done using the scvelo()
function, which provides a wrapper around the functionality of scVelo. Take a moment to study the documentation of scvelo()
- you will see for example that all the modes of scVelo (steady-state, stochastic, dynamical) can be used. In addition, the argument use.theirs
(default FALSE
) can be set to TRUE
to perform gene filtering and normalization using scVelo. If set to FALSE
, these steps will be performed in R, which can be useful in order to increase consistency with the rest of an R-based pipeline.
For our example, we will use the dynamical mode. It takes a bit longer to run than the steady state option, but typically gives more reliable results.
suppressPackageStartupMessages({
library(velociraptor)
})<- velociraptor::scvelo(hermann, subset.row = top.hvgs,
velo.out assay.X = "counts",
mode = "dynamical")
The output of scvelo()
is another SingleCellExperiment
object.
velo.out#> class: SingleCellExperiment
#> dim: 2000 2325
#> metadata(5): neighbors recover_dynamics velocity_params velocity_graph
#> velocity_graph_neg
#> assays(10): X spliced ... velocity velocity_u
#> rownames(2000): ENSMUSG00000038015.6 ENSMUSG00000022501.6 ...
#> ENSMUSG00000048787.13 ENSMUSG00000032232.14
#> rowData names(18): fit_r2 fit_alpha ... velocity_genes varm
#> colnames(2325): CCCATACTCCGAAGAG AATCCAGTCATCTGCC ... ATCCACCCACCACCAG
#> ATTGGTGGTTACCGAT
#> colData names(8): velocity_self_transition root_cells ...
#> velocity_confidence velocity_confidence_transition
#> reducedDimNames(1): X_pca
#> mainExpName: NULL
#> altExpNames(0):
The scvelo()
function performs the full RNA velocity estimation analysis, and in the process calls several functions from scVelo. Wd can follow the executed steps via the messages printed in the console.
The first step is to estimate moments (mean and uncentered variance) of the abundances for each cell. The moments are calculated across a small set of neighboring cells (in the PCA space), for increased stability (think of it as a kind of smoothing, or imputation). Two layers, Ms and Mu, which are the first order moments (means) for spliced and unspliced abundances, are added to the data object.
## 30 nearest neighbors (note that the nearest neighbor to each cell is the
## cell itself)
head(metadata(velo.out)$neighbors$indices)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] 0 108 162 57 207 53 145 124 109 678 297 45 349 1
#> [2,] 1 101 3 49 186 119 76 58 53 86 68 742 50 297
#> [3,] 2 387 50 15 3 285 78 187 259 470 34 154 119 1
#> [4,] 3 50 34 15 84 101 9 1 259 2 285 14 53 78
#> [5,] 4 11 267 226 114 42 172 249 442 355 51 94 317 97
#> [6,] 5 173 83 95 8 326 205 55 318 223 381 176 295 388
#> [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
#> [1,] 58 68 86 9 84 37 93 3 531 365 187 101
#> [2,] 365 91 600 124 93 45 109 207 2 15 108 162
#> [3,] 990 410 390 109 681 91 548 68 359 186 426 182
#> [4,] 93 91 387 49 123 86 45 109 496 119 154 76
#> [5,] 719 616 423 321 16 543 126 112 128 465 48 276
#> [6,] 21 253 418 304 227 375 369 88 272 762 431 773
#> [,27] [,28] [,29] [,30]
#> [1,] 91 135 475 49
#> [2,] 154 145 393 475
#> [3,] 496 9 76 84
#> [4,] 58 186 37 990
#> [5,] 13 685 706 417
#> [6,] 342 415 453 339
## Note Ms and Mu assays
assayNames(velo.out)
#> [1] "X" "spliced" "unspliced" "Ms" "Mu"
#> [6] "fit_t" "fit_tau" "fit_tau_" "velocity" "velocity_u"
This step fits the model and infers transcription rates, splicing rates and degradation rates for each gene, as well as cell-specific latent times and transcriptional states. scVelo uses an EM algorithm for the estimation, iterating between the E-step (where each observation is assigned a time and a state (induction, repression, or steady state)) and the M-step (where the rate parameters are estimated), as outlined in the figure below from Bergen et al. (2020).
This step is only required for the dynamical model. It adds several columns to rowData(hermann)
(see https://scvelo.readthedocs.io/DynamicalModeling.html), including:
fit_r2
). Note that this can be negative, if the obtained fit is worse than just using a straight line at the mean. This is used to determine which genes are used for the downstream analysis and projection of velocities into a low-dimensional space.fit_alpha
)fit_beta
)fit_gamma
)fit_t_
)fit_likelihood
), averaged across all cells. The likelihood value for a gene and a cell indicates how well the cell is described by the learned phase trajectory.head(rowData(velo.out)[rowData(velo.out)$velocity_genes, ])
#> DataFrame with 6 rows and 18 columns
#> fit_r2 fit_alpha fit_beta fit_gamma fit_t_
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> ENSMUSG00000028693.15 0.100405 60.95574 73.767529 1.142003 17.94801
#> ENSMUSG00000087279.10 0.268690 9.27810 14.083324 0.180749 14.30887
#> ENSMUSG00000118396.2 0.696183 28.32358 0.801604 0.527865 5.17467
#> ENSMUSG00000051113.9 0.279714 24.88695 14.699699 0.333210 11.93588
#> ENSMUSG00000032601.13 0.573506 5.83376 11.455458 0.210135 14.25576
#> ENSMUSG00000024532.13 0.351638 14.16604 21.884510 0.402229 6.60905
#> fit_scaling fit_std_u fit_std_s fit_likelihood fit_u0
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> ENSMUSG00000028693.15 0.0183809 0.189800 17.46369 0.594941 0
#> ENSMUSG00000087279.10 0.0199706 0.181182 12.86018 0.236457 0
#> ENSMUSG00000118396.2 0.4247320 10.735452 11.75326 0.295935 0
#> ENSMUSG00000051113.9 0.0317420 0.412822 22.53213 0.404600 0
#> ENSMUSG00000032601.13 0.0351809 0.193684 8.38194 0.307726 0
#> ENSMUSG00000024532.13 0.0298769 0.193273 11.20748 0.366759 0
#> fit_s0 fit_pval_steady fit_steady_u fit_steady_s
#> <numeric> <numeric> <numeric> <numeric>
#> ENSMUSG00000028693.15 0 0.498334 0.520035 49.0103
#> ENSMUSG00000087279.10 0 0.499027 0.611843 27.5735
#> ENSMUSG00000118396.2 0 0.498511 27.093951 32.0842
#> ENSMUSG00000051113.9 0 0.499705 1.237777 65.3373
#> ENSMUSG00000032601.13 0 0.496976 0.482520 18.6365
#> ENSMUSG00000024532.13 0 0.498405 0.421512 27.8004
#> fit_variance fit_alignment_scaling velocity_genes
#> <numeric> <numeric> <logical>
#> ENSMUSG00000028693.15 0.155117 1.22703 TRUE
#> ENSMUSG00000087279.10 0.826343 4.53596 TRUE
#> ENSMUSG00000118396.2 0.330992 1.25563 TRUE
#> ENSMUSG00000051113.9 0.266072 3.18262 TRUE
#> ENSMUSG00000032601.13 0.554023 3.59790 TRUE
#> ENSMUSG00000024532.13 0.296963 2.15763 TRUE
#> varm
#> <DataFrame>
#> ENSMUSG00000028693.15 757.029:656.900:459.097:...
#> ENSMUSG00000087279.10 869.516:882.060:878.076:...
#> ENSMUSG00000118396.2 669.123:520.053:524.619:...
#> ENSMUSG00000051113.9 760.118:749.991:676.861:...
#> ENSMUSG00000032601.13 516.835:519.185:520.944:...
#> ENSMUSG00000024532.13 554.709:556.350:481.511:...
Once the kinetic rate parameters are estimated, the actual velocities are estimated based on these. This adds a velocity layer to the data object, and the velocity_genes
column in the row data. This column indicates whether the fit for a gene is considered ‘good enough’ for downstream use. Specifically, it requires fit_r2
, fit_likelihood
and fit_gamma
to exceed certain thresholds, fit_scaling
to be within a certain range, and that both the unspliced and spliced mean values are nonzero (see the scVelo code for more details).
<- rowData(velo.out)
rd <- quantile(rd$fit_scaling[!is.na(rd$fit_scaling)], probs = c(0.1, 0.9))
perc table(crit = rd$fit_r2 > 0.01 &
$fit_gamma > 0.01 &
rd$fit_likelihood > 0.001 &
rd$fit_scaling > min(perc[1], 0.03) &
rd$fit_scaling < max(perc[2], 3) &
rdrowSums(assay(velo.out, "Ms")) > 0 &
rowSums(assay(velo.out, "Mu")) > 0,
vel = rd$velocity_genes, useNA = "ifany")
#> vel
#> crit FALSE TRUE
#> FALSE 1167 0
#> TRUE 0 561
#> <NA> 272 0
At this point, we have estimated the velocities - these are vectors in a \(K\)-dimensional space, where \(K\) is the number of retained genes. In order to use these velocities for downstream applications, such as estimating the future state of an individual cell or generating low-dimensional visualizations, we next estimate a so called velocity graph. To this end, scVelo calculates cosine similarities between the velocity vector for each cell and the displacement vector from that cell to each other (neighboring) cell: \[\pi_{ij}=cos\angle(\delta_{ij}, v_i),\]where \(\delta_{ij}=s_j-s_i\) is the displacement vector from cell \(i\) to cell \(j\) in gene expression space, and \(v_i\) is the velocity vector of cell \(i\). The cosine similarity takes values between -1 and +1, and a large value indicates that cell \(i\) has a high probability of transitioning towards cell \(j\) (since its velocity vector points towards cell \(j\)). The velocity graph is stored in the uns
slot of the data object (and subsequently propagated to the metadata
of the returned SingleCellExperiment
), and is represented by a sparse \(N\times N\) matrix (where \(N\) is the number of cells). There is also a velocity_graph_neg
, which is a matrix of the same size as velocity_graph
, containing the negative cosine similarities.
## Velocity graph
dim(metadata(velo.out)$velocity_graph)
#> [1] 2325 2325
metadata(velo.out)$velocity_graph[1:5, 1:5]
#> 5 x 5 sparse Matrix of class "dgTMatrix"
#>
#> [1,] . . . . .
#> [2,] 0.11193331 . . 0.01055382 .
#> [3,] 0.05770175 0.01050411 . . .
#> [4,] 0.10750668 0.04704920 0.03657299 . .
#> [5,] . . . . .
It is often of interest to obtain an ordering of the cells along a trajectory. scVelo provides two different approaches for this: pseudotime and latent time. The velocity pseudotime is obtained via a diffusion random walk on the velocity graph, and measures how many steps (on average) it takes to reach a given cell from one of the root cells. The root cells are obtained from the transition matrix. The latent time is obtained from the transcriptional dynamics fit, by relating gene-specific times (position along the phase curve) to a “universal” latent time, shared across genes. Usually, the latent time captures the actual time better than pseudotime.
head(colData(velo.out))
#> DataFrame with 6 rows and 8 columns
#> velocity_self_transition root_cells end_points
#> <numeric> <numeric> <numeric>
#> CCCATACTCCGAAGAG 0.431352 9.43462e-05 1.38203e-07
#> AATCCAGTCATCTGCC 0.356828 1.06404e-04 1.29249e-07
#> GACTGCGGTTTGACTG 0.283698 1.45943e-04 6.18349e-08
#> ACACCAATCTTCGGTC 0.392824 1.37776e-04 7.88960e-08
#> TGACAACAGGACAGAA 0.213529 1.14266e-05 4.23266e-05
#> TTGGAACAGGCGTACA 0.387442 7.31517e-05 1.43643e-04
#> velocity_pseudotime latent_time velocity_length
#> <numeric> <numeric> <numeric>
#> CCCATACTCCGAAGAG 0.01160549 0.507783 36.38
#> AATCCAGTCATCTGCC 0.02222421 0.513260 36.40
#> GACTGCGGTTTGACTG 0.02293368 0.522630 35.59
#> ACACCAATCTTCGGTC 0.00721252 0.504956 34.84
#> TGACAACAGGACAGAA 0.55268830 0.306849 41.24
#> TTGGAACAGGCGTACA 0.59105021 0.480577 41.25
#> velocity_confidence velocity_confidence_transition
#> <numeric> <numeric>
#> CCCATACTCCGAAGAG 0.946991 0.0712617
#> AATCCAGTCATCTGCC 0.945846 0.1497353
#> GACTGCGGTTTGACTG 0.951023 0.2038010
#> ACACCAATCTTCGGTC 0.946307 0.0961184
#> TGACAACAGGACAGAA 0.964340 0.3313537
#> TTGGAACAGGCGTACA 0.972804 0.1511034
Note that the results of scvelo()
are not added to the input SingleCellExperiment
object, since the two objects can have different dimensions (in particular note the difference in the number of genes). Thus, in order to use information from the velocity calculations together with information from the original object, we need to copy slots from one object to the other. Here, we copy the reduced dimension representations from the original SingleCellExperiment
object to the output of scvelo()
, and use the latter to construct tSNE representations colored by various parameters.
stopifnot(all(colnames(hermann) == colnames(velo.out)))
reducedDims(velo.out) <- reducedDims(hermann)
$celltype <- hermann$celltype
velo.out::plot_grid(
cowplot::plotTSNE(velo.out, colour_by = "celltype"),
scater::plotTSNE(velo.out, colour_by = "velocity_pseudotime"),
scater::plotTSNE(velo.out, colour_by = "latent_time"),
scaterncol = 1, align = "v"
)
As we already saw above, the rowData
and colData
of velo.out
contain the values of various parameters estimated by scvelo()
:
head(rowData(velo.out))
#> DataFrame with 6 rows and 18 columns
#> fit_r2 fit_alpha fit_beta fit_gamma fit_t_
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> ENSMUSG00000038015.6 0.7308033 NaN NaN NaN NaN
#> ENSMUSG00000022501.6 -0.0100850 NaN NaN NaN NaN
#> ENSMUSG00000043050.8 0.2894763 NaN NaN NaN NaN
#> ENSMUSG00000026182.6 -0.0453711 NaN NaN NaN NaN
#> ENSMUSG00000090137.8 0.0746567 NaN NaN NaN NaN
#> ENSMUSG00000074435.10 -0.6170902 NaN NaN NaN NaN
#> fit_scaling fit_std_u fit_std_s fit_likelihood fit_u0
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> ENSMUSG00000038015.6 NaN NaN NaN NaN NaN
#> ENSMUSG00000022501.6 NaN NaN NaN NaN NaN
#> ENSMUSG00000043050.8 NaN NaN NaN NaN NaN
#> ENSMUSG00000026182.6 NaN NaN NaN NaN NaN
#> ENSMUSG00000090137.8 NaN NaN NaN NaN NaN
#> ENSMUSG00000074435.10 NaN NaN NaN NaN NaN
#> fit_s0 fit_pval_steady fit_steady_u fit_steady_s
#> <numeric> <numeric> <numeric> <numeric>
#> ENSMUSG00000038015.6 NaN NaN NaN NaN
#> ENSMUSG00000022501.6 NaN NaN NaN NaN
#> ENSMUSG00000043050.8 NaN NaN NaN NaN
#> ENSMUSG00000026182.6 NaN NaN NaN NaN
#> ENSMUSG00000090137.8 NaN NaN NaN NaN
#> ENSMUSG00000074435.10 NaN NaN NaN NaN
#> fit_variance fit_alignment_scaling velocity_genes
#> <numeric> <numeric> <logical>
#> ENSMUSG00000038015.6 NaN NaN FALSE
#> ENSMUSG00000022501.6 NaN NaN FALSE
#> ENSMUSG00000043050.8 NaN NaN FALSE
#> ENSMUSG00000026182.6 NaN NaN FALSE
#> ENSMUSG00000090137.8 NaN NaN FALSE
#> ENSMUSG00000074435.10 NaN NaN FALSE
#> varm
#> <DataFrame>
#> ENSMUSG00000038015.6 NaN:NaN:NaN:...
#> ENSMUSG00000022501.6 NaN:NaN:NaN:...
#> ENSMUSG00000043050.8 NaN:NaN:NaN:...
#> ENSMUSG00000026182.6 NaN:NaN:NaN:...
#> ENSMUSG00000090137.8 NaN:NaN:NaN:...
#> ENSMUSG00000074435.10 NaN:NaN:NaN:...
head(colData(velo.out))
#> DataFrame with 6 rows and 9 columns
#> velocity_self_transition root_cells end_points
#> <numeric> <numeric> <numeric>
#> CCCATACTCCGAAGAG 0.431352 9.43462e-05 1.38203e-07
#> AATCCAGTCATCTGCC 0.356828 1.06404e-04 1.29249e-07
#> GACTGCGGTTTGACTG 0.283698 1.45943e-04 6.18349e-08
#> ACACCAATCTTCGGTC 0.392824 1.37776e-04 7.88960e-08
#> TGACAACAGGACAGAA 0.213529 1.14266e-05 4.23266e-05
#> TTGGAACAGGCGTACA 0.387442 7.31517e-05 1.43643e-04
#> velocity_pseudotime latent_time velocity_length
#> <numeric> <numeric> <numeric>
#> CCCATACTCCGAAGAG 0.01160549 0.507783 36.38
#> AATCCAGTCATCTGCC 0.02222421 0.513260 36.40
#> GACTGCGGTTTGACTG 0.02293368 0.522630 35.59
#> ACACCAATCTTCGGTC 0.00721252 0.504956 34.84
#> TGACAACAGGACAGAA 0.55268830 0.306849 41.24
#> TTGGAACAGGCGTACA 0.59105021 0.480577 41.25
#> velocity_confidence velocity_confidence_transition
#> <numeric> <numeric>
#> CCCATACTCCGAAGAG 0.946991 0.0712617
#> AATCCAGTCATCTGCC 0.945846 0.1497353
#> GACTGCGGTTTGACTG 0.951023 0.2038010
#> ACACCAATCTTCGGTC 0.946307 0.0961184
#> TGACAACAGGACAGAA 0.964340 0.3313537
#> TTGGAACAGGCGTACA 0.972804 0.1511034
#> celltype
#> <character>
#> CCCATACTCCGAAGAG DIplotene/Secondary ..
#> AATCCAGTCATCTGCC DIplotene/Secondary ..
#> GACTGCGGTTTGACTG DIplotene/Secondary ..
#> ACACCAATCTTCGGTC DIplotene/Secondary ..
#> TGACAACAGGACAGAA Mid Round spermatids
#> TTGGAACAGGCGTACA Mid Round spermatids
More information about the interpretation of these parameters can be found in the scVelo documentation. Let us, for example, plot the distribution of transcription, splicing and degradation rates for all velocity genes with fit likelihood larger than 0.1:
suppressPackageStartupMessages({
library(dplyr)
})<- as.data.frame(rowData(velo.out)) %>%
plotdf ::filter(fit_likelihood > 0.1 & velocity_genes)
dplyrdim(plotdf)
#> [1] 521 38
::plot_grid(
cowplotggplot(plotdf, aes(x = fit_alpha)) + geom_histogram(bins = 50) +
theme_minimal() + scale_x_log10(),
ggplot(plotdf, aes(x = fit_beta)) + geom_histogram(bins = 50) +
theme_minimal() + scale_x_log10(),
ggplot(plotdf, aes(x = fit_gamma)) + geom_histogram(bins = 50) +
theme_minimal() + scale_x_log10(),
nrow = 1
)
The velociraptor package also provides plotting functions to visualize the RNA velocity results. For example, we can embed the estimated velocities into a provided low-dimensional space. In order to visualize the velocities in a lower-dimensional embedding, we convert the cosine similarities in the velocity graph to transition probabilities of cell-to-cell transitions by applying an exponential kernel: \[\tilde{\pi}_{ij}=\frac{1}{z_i}exp(\frac{\pi_{ij}}{\sigma_i^2}).\] The \(z_i\) are normalization factors and \(\sigma_i\) is an adaptive kernel width parameter. These transition probabilities are used to project the velocities into a low-dimensional embedding. This is achieved by weighting the normalized displacement vectors from a cell to all other cells in the low-dimensional space by the transition probabilities for cell, and taking the resulting weighted average as the low-dimensional velocity vector. More precisely, if \[\tilde{\delta}_{ij}=\frac{\tilde{s}_j-\tilde{s}_i}{\|\tilde{s}_j-\tilde{s}_i\|}\]are the normalized displacement vectors in the low-dimensional embedding, the embedded velocity is estimated by \[\tilde{v}_i=\sum_{j\neq i}\left(\tilde{\pi}_{ij} - \frac{1}{n}\right)\tilde{\delta}_{ij}.\] The \(1/n\) term is included to make the projected velocity zero when the transition probabilities represent a uniform distribution.
suppressPackageStartupMessages({
library(ggplot2)
})<- velociraptor::embedVelocity(reducedDim(velo.out, "TSNE"), velo.out)
embedded <- velociraptor::gridVectors(reducedDim(velo.out, "TSNE"), embedded)
grid.df plotTSNE(velo.out, colour_by = "velocity_pseudotime") +
geom_segment(data = grid.df,
mapping = aes(x = start.1, y = start.2,
xend = end.1, yend = end.2),
arrow = arrow(length = unit(0.05, "inches")))
Next, we will look at individual genes via their phase portraits. This is often useful in order to understand how the velocities are affected/supported by specific genes. We can of course plot genes that we are already familiar with and that we know are related to the process of interest. We can also extract genes with particularly strong influence on the velocity results. ‘Driver genes,’ which display a strong dynamic behavior, can for example be detected via their high likelihoods in the dynamic model. Here, we select the six genes with the largest likelihood value for the fit.
<- rownames(as.data.frame(rowData(velo.out)) %>%
toplh ::arrange(desc(fit_likelihood)) %>%
dplyrhead())
plotVelocity(velo.out, use.dimred = "TSNE", genes = toplh[1:3],
color_by = "celltype")
plotVelocity(velo.out, use.dimred = "TSNE", genes = toplh[4:6],
color_by = "celltype")
Finally, we will illustrate two summary statistics returned by scVelo:
::plot_grid(
cowplot::plotTSNE(velo.out, colour_by = "velocity_length"),
scater::plotTSNE(velo.out, colour_by = "velocity_confidence"),
scaterncol = 1, align = "v"
)