data_transform.Rmd
Data transformation in IBRAP incorporates multiple steps from several packages:

- Counts normalisation, which adjusts the counts so that they approximate a normal distribution, ready for downstream statistical analyses.
- Highly variable gene selection. Most genes in the dataset show little variation; they create a false closeness between cells and waste computational resources. Highly variable genes are therefore identified, ranked in decreasing order of variability, and subset to a cut-off (default = 1500).
- Scaling, which is a prerequisite for reduction methods such as PCA. Scaling redistributes the data onto a common scale, typically centred on zero with unit variance.
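The three steps can be sketched in a few lines of plain Python. This is a conceptual illustration only, not IBRAP's implementation: the toy counts, the function names (`normalise`, `top_variable_genes`, `scale`) and the `target_sum` value are all assumptions made for the example, and plain variance stands in for the more refined dispersion measures that the real packages use.

```python
import math
from statistics import mean, pvariance

# Toy counts matrix (made-up values): rows are cells, columns are genes.
counts = [
    [10, 0, 3, 5],
    [20, 1, 0, 5],
    [5, 0, 9, 5],
]

def normalise(counts, target_sum=10_000):
    """Rescale each cell to the same total count, then apply log(1 + x)."""
    out = []
    for cell in counts:
        total = sum(cell)
        out.append([math.log1p(x / total * target_sum) for x in cell])
    return out

def top_variable_genes(matrix, n_top):
    """Return the column indices of the n_top genes, by decreasing variance."""
    n_genes = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: variances[j], reverse=True)
    return ranked[:n_top]

def scale(matrix):
    """Centre each gene on zero and divide by its standard deviation."""
    n_genes = len(matrix[0])
    cols = []
    for j in range(n_genes):
        col = [row[j] for row in matrix]
        mu, sd = mean(col), math.sqrt(pvariance(col))
        cols.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    # Transpose back so that rows are cells again.
    return [list(row) for row in zip(*cols)]

normalised = normalise(counts)
hvg = top_variable_genes(normalised, n_top=2)      # the cut-off; IBRAP defaults to 1500
subset = [[row[j] for j in hvg] for row in normalised]
scaled = scale(subset)
```

Only the selected genes are scaled here, which keeps the scaled matrix small; whether to scale all genes or just the variable ones is a design choice that differs between pipelines.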
Scanpy
The Scanpy workflow can apply either a log1 or a log2 transformation, and the choice alters the result. By default, Scanpy uses log1. Next, Seurat v2, Seurat v3, or Cell Ranger style gene selection can be applied. Finally, the data is scaled.
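To see why the choice of logarithm matters, the sketch below compares the two transformations on a few made-up normalised values. It assumes that "log1" means the natural logarithm of 1 + x and that "log2" means the base-2 logarithm of 1 + x; both readings are assumptions, not taken from the IBRAP source.

```python
import math

# Made-up normalised expression values for a single gene.
normalised = [0.0, 1.0, 9.0, 99.0]

natural = [math.log1p(x) for x in normalised]    # ln(1 + x)
base2 = [math.log2(x + 1) for x in normalised]   # log2(1 + x)

# For every non-zero value the two results differ by the constant factor ln(2).
ratios = [n / b for n, b in zip(natural[1:], base2[1:])]
```

Because the two outputs differ by a constant factor, any threshold applied to the unscaled values (for example a mean or dispersion cut-off during gene selection) sees different numbers depending on the logarithm used.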
SCTransform
Seurat's developers originally used a fairly basic normalisation method. More recently, they released SCTransform, which uses generalised linear modelling, specifically regularised negative binomial regression, to model the data. The algorithm normally performs normalisation, gene selection, and scaling in a single function.
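As a rough guide to what the regression produces, the model described in the SCTransform publication can be written as below. The symbols are introduced here for illustration: $x_{gc}$ is the observed count of gene $g$ in cell $c$, $m_c$ is the total count of cell $c$, $\mu_{gc}$ is the fitted mean, and $\theta_g$ is the negative binomial dispersion of gene $g$.

$$
\log \mu_{gc} = \beta_{0g} + \beta_{1g} \log_{10} m_c,
\qquad
z_{gc} = \frac{x_{gc} - \mu_{gc}}{\sqrt{\mu_{gc} + \mu_{gc}^{2} / \theta_g}}
$$

The Pearson residuals $z_{gc}$ take the place of the normalised values: a gene whose counts are fully explained by sequencing depth has residuals close to zero, while genes with large residual variance are candidates for the highly variable set.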
Scran
Scran pools similar cells together to produce a scaling factor that is applied to the pools, which enables a more cell type-specific normalisation. A similar variance analysis then identifies highly variable genes, and Seurat's scaling function is applied.
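The pooling idea can be illustrated with a deliberately simplified sketch. It computes one factor per pool from library sizes and divides the counts by it. This is not scran's algorithm, which goes further and deconvolves pooled factors into per-cell size factors. The counts, the pool labels and the function names (`pool_size_factors`, `apply_factors`) are hypothetical.

```python
# Toy counts (made-up values): rows are cells, columns are genes.
counts = [
    [4, 0, 6],
    [8, 0, 12],
    [2, 18, 0],
    [4, 36, 0],
]
# Hypothetical pool label for each cell, e.g. from a quick clustering step.
pools = ["a", "a", "b", "b"]

def pool_size_factors(counts, pools):
    """One factor per pool: the pool's mean library size over the global mean."""
    lib_sizes = [sum(cell) for cell in counts]
    global_mean = sum(lib_sizes) / len(lib_sizes)
    factors = {}
    for pool in set(pools):
        sizes = [s for s, p in zip(lib_sizes, pools) if p == pool]
        factors[pool] = (sum(sizes) / len(sizes)) / global_mean
    return factors

def apply_factors(counts, pools, factors):
    """Divide every cell's counts by the factor of the pool it belongs to."""
    return [[x / factors[p] for x in cell] for cell, p in zip(counts, pools)]

factors = pool_size_factors(counts, pools)
normalised = apply_factors(counts, pools, factors)
```

After division, both pools have the same mean library size, so differences in overall count depth between cell groups no longer dominate the comparison.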
Each of the methods above provides a different approach to normalising end-biased data. However, full-length data is still presented in scRNA-seq studies.