Table 1.
Comparison of Clustering Accuracy between the K-Means Algorithm and the t-Mixture Model in Making Deterministic Genotype Calls[Note]
Algorithm/Model | % Miscalls, Low Ambiguity | % Miscalls, Medium Ambiguity | % Miscalls, High Ambiguity |
K-means | 9.59 | 8.82 | 8.82 |
t-mixture | .03 | .30 | .72 |
Note. For comparison purposes, we generated 100 data sets for each of the low-, medium-, and high-ambiguity scenarios. In each data set, a Gaussian mixture model was used to generate 100 data points forming three genotype clusters. For each algorithm, the percentage of miscalls was defined as (number of miscalls / total number of genotype calls) × 100.
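The miscall percentage defined in the note can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `miscall_percentage`, the genotype labels "AA", "AB", and "BB", and the toy data are all hypothetical. The sketch also assumes that each cluster has already been mapped to a genotype label and that calls are pooled across the 100 data sets of 100 points each, neither of which the note states.

```python
import random


def miscall_percentage(true_genotypes, called_genotypes):
    """Return 100 * (number of miscalls / total genotype calls)."""
    if len(true_genotypes) != len(called_genotypes):
        raise ValueError("both sequences must have the same length")
    if len(true_genotypes) == 0:
        raise ValueError("at least one genotype call is required")
    # A miscall is any call that differs from the true genotype.
    miscalls = sum(t != c for t, c in zip(true_genotypes, called_genotypes))
    return 100.0 * miscalls / len(true_genotypes)


# Toy data (assumption): 100 data sets x 100 points, pooled into one list.
rng = random.Random(0)
true_calls = [rng.choice(["AA", "AB", "BB"]) for _ in range(100 * 100)]

# Start from perfect calls, then deliberately corrupt the first three.
called = list(true_calls)
for i in range(3):
    called[i] = "AB" if called[i] != "AB" else "AA"

pct = miscall_percentage(true_calls, called)
print(f"{pct:.2f}% miscalls")
```

Because every data set has the same size here, pooling all calls gives the same figure as averaging the per-data-set percentages.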