Review Article

A Survey of Flow Cytometry Data Analysis Methods

Table 1

Summary of survey (M: manual; Y: yes; E: embedded in gating; U: unsupervised; S: supervised; “—”: not supported, not implemented, not applicable; “ ”: same as above). Note that this table does not report Quality Assessment, Normalization, and Feature Extraction components.

Paper | Outlier removal | Automated gating | Labelling | Interpretation (classification/comparison of samples)
Automated gating sub-columns: Method | Supervised/Unsupervised | Multidimensional | Automated # of clusters

[45]Logical and cleaning morphological operators applied to the corresponding image representation of FCM dataLogical operation on image representation of FCM data followed by thickeningUBased on location and abundance of populations
Majority operator applied to the image representation of FCM data followed by Sobel edge detectionU
Zero-degree B-Spline smoother applied to the 2-dimensional FCM data followed by break point detectionU
Gath-Geva fuzzy clusteringU

[30]Embedded in clustering (cluster membership weights can be used to exclude outliers)Gaussian Mixture ModelsUYM
Y
(using BIC)

[46]Embedded in clustering (cluster membership weights can be used to exclude outliers)t-Mixture ModelsUYM
Y
(using BIC)

[47]Embedded in clustering (excluding events that are far from the centers of the Gaussian functions using a predefined cutoff value)Mahalanobis distance from centroids of multivariate Gaussian functions used for classification taskSE

[48]Multilayer perceptron (MLP)SYE

[49]Building templates for automated gating by using a cluster-finding algorithm (Becton Dickinson’s (BD) snap-to gate algorithm)UE (initially set by operator)

[50]DKLL (an extension of the k-means algorithm to allow for non-spherical clusters)UY
Fuzzy k-means based on adaptive distanceUY
Fuzzy k-means based on maximum likelihoodUY
Fuzzy k-means based on minimum total volumeUY
Fuzzy k-means based on sum of all normalized determinantsUY

[51]MMComplete linkage hierarchical clustering

[52]Comparing sample to a reference sample by probability binning algorithm

[53] k-meansUYHistogram feature guided
Partition index guided

[17]Frequency difference gating approach (defines one or more gates that contain statistically significantly more events in the test sample than in the control sample)1UY

[54]MLPSYE
Learning vector quantization (LVQ)SYE
Radial basis function (RBF)SYE
Asymmetric RBFSYE
Classification by modeling each class with Gaussian distributionsSYE
k-nearest neighbour methodSYE
Kohonen’s self-organizing map (SOM)UYM

[55]Static gates applied to dataUE (initially set by operator)CLASSIF1 approach [56, 57]

[36]Building templates for automated gating by using a cluster-finding algorithm (BD Snap-to gate algorithm)UE (initially set by operator)

[43]2MMFunctional linear discriminant analysis

[58]Building templates for automated gating by using a cluster-finding algorithm (BD’s snap-to gate algorithm)UE (initially set by operator)

[59]Gaussian Mixture ModelsUMM

[60]MMAverage-linkage hierarchical clustering

[61]MMClassification based on a semantic network of knowledge base through a hierarchical tree (if-then rule mechanism)

[62] k-meansUYM
Calculating modes of density function (calculated by kernel density estimation) followed by nearest neighbour heuristicUYM
Gaussian mixture models using Markov chain Monte Carlo (MCMC)UYM

[63]Building templates for automated gating by using a cluster-finding algorithm (BD’s snap-to gate algorithm)UE (initially set by operator)

[64]Automated gating using BD Simulset softwareMCorrelation tests using Spearman’s method

[65]Image representation of randomly selected events from a group of flow data followed by smoothing, regional maxima detection and watershed algorithm to define the gates to apply to all the dataU

[66]SOMUM
Cluster analysis with Winlist (Verity Software House, USA)U

[67]Static gates applied to data and self-adjusting gates (details not mentioned) for lymphocytes, monocytes, and granulocytesUE (initially set by operator)CLASSIF1 approach [56, 57]

[68]Fcom tool (an analysis tool in Winlist (Verity Software House, USA))MAverage-linkage hierarchical clustering

[69]Static gates applied to data and self-adjusting gates for lymphocytes, monocytes, and granulocytesUE (initially set by operator)CLASSIF1 approach [56, 57]

[70]MM“Professor Fidelio” (a heuristic classification system that reasons on the basis of defined diagnostic patterns [71])

[72] k-means followed by Murphy’s cluster joining algorithm based on standard deviation of the data [73]UM
k-means followed by a cluster joining algorithm based on modified spread of the data and modified distance between two clusters [72]UM
Preclustering a subset of the data by k-means and assigning unclustered events to the closest cluster center followed by a cluster joining algorithm based on modified spread of the data and modified distance between two clusters [72]UM

[73]E (excluding the events that were more than a set number of standard deviations away from the centroids of the clusters) k-means followed by Murphy’s cluster joining algorithm based on standard deviation of the dataUYM

[74]MLPSYE

[75]RBFSYE

[76]MLPSYE
SOMUYM
E (excluding the events that were more than a set number of standard deviations away from the centroids of the clusters) k-meansUYM

[77]No gating—mean fluorescent intensities of antibodies were used for next stage of analysisMLP

[78]RBFSYE

[40]Histogram of one parameter of FCM data followed by MLP

[79]Classification and regression trees (CARTs)SYE

[80]Support vector machine (SVM)SYE
RBFSYE

[81]RBF using radially symmetric basis function (based on Euclidean distance)SYE
RBF using more general arbitrarily oriented ellipsoidal basis functions (based on Mahalanobis distance)SYE

[82]Gaussian mixture model clusteringUY

[83]Embedded in clustering (excluding events that are far from the centers of the Gaussian functions using a predefined cutoff value)Mahalanobis distance from the centroids of multivariate Gaussian functions used for classification taskSE

[84]MClassification based on a shrunken centroids approach [85]
Hierarchical clustering

[41]MKernel density estimation followed by calculating differences between patients by Kullback-Leibler divergence to form a similarity matrix and then dimensionality reduction by multidimensional scaling for 2-dimensional visualization

[86]RBFSYE

[87]MHierarchical clustering
Principal component analysis (PCA) for dimensionality reduction and visualization to see if classes are separable by looking at the first few principal components

1 Closely related to the probability binning algorithm introduced in [52].
2 This study uses the quality assessment strategy introduced in [42], which is based on comparing the density, ECDF (empirical cumulative distribution function), box plots, and two types of bivariate plots of similar samples.
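
As an illustration of the "embedded" outlier removal that Table 1 lists for k-means based gating ([73], [76]), the following is a minimal sketch of clustering followed by exclusion of events lying more than a set number of standard deviations from their cluster centroid. It is not the implementation used in any of the surveyed papers: it works on a single parameter for brevity, the function names (assign, kmeans_1d, flag_inliers), the starting centroids, the cutoff max_sd, and the toy event values are all assumptions made for this example, and the cluster joining step described in [72, 73] is omitted.

```python
from statistics import mean, pstdev


def assign(values, centroids):
    """Return, for each value, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]


def kmeans_1d(values, init_centroids, max_iter=50):
    """Plain Lloyd-style k-means on scalar values from given starting centroids."""
    centroids = list(init_centroids)
    for _ in range(max_iter):
        labels = assign(values, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(mean(members) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


def flag_inliers(values, centroids, labels, max_sd=2.0):
    """True for events within max_sd cluster standard deviations of their centroid."""
    spread = []
    for j in range(len(centroids)):
        members = [v for v, lab in zip(values, labels) if lab == j]
        spread.append(pstdev(members) if members else 0.0)
    return [abs(v - centroids[lab]) <= max_sd * spread[lab]
            for v, lab in zip(values, labels)]


# Hypothetical single-parameter events: two populations and one stray event.
events = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 3.0, 5.0, 5.1, 4.9, 5.2, 4.8, 5.0]
centroids, labels = kmeans_1d(events, init_centroids=[0.0, 6.0])
keep = flag_inliers(events, centroids, labels, max_sd=2.0)
excluded = [v for v, ok in zip(events, keep) if not ok]
print(excluded)
```

In this sketch the cutoff is expressed in units of each cluster's own standard deviation, so a tight population tolerates smaller absolute deviations than a diffuse one. A multidimensional version would replace the absolute difference with a suitable distance, for example the Mahalanobis distance listed for [47] and [83].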