Longitudinal Data Clustering Methods: A Systematic Review
Abstract
Over the last few decades, researchers in many fields have introduced methods for discovering groups that share the same trend in longitudinal data. Clustering is an unsupervised learning approach that groups longitudinal trajectories algorithmically according to a chosen criterion. This study reviews the methods available for clustering longitudinal data, which fall into two general categories: non-parametric methods and model-based methods. PubMed, SCOPUS, ISI, Ovid, and Google Scholar were searched for the period 2000 to 2021. According to our systematic review, non-parametric k-means clustering with Euclidean distance emerges as a leading approach for clustering longitudinal data. By providing an overview of the studies in this field, this review can serve researchers as a toolbox for generating ideas and for choosing an appropriate method for the classification and analysis of longitudinal data.
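As a rough illustration of the approach the abstract singles out, the sketch below runs plain k-means with Euclidean distance on a small set of trajectories. It is not taken from the review or from any specific package: the function name `kmeans_trajectories`, the toy data, and the assumption that every subject is measured at the same time points with no missing values are all hypothetical choices made for the example.

```python
import numpy as np

def kmeans_trajectories(X, k, n_iter=100, seed=0):
    """Cluster the rows of X (subjects x time points) with plain k-means.

    Assumes complete trajectories observed at common time points.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct, randomly chosen trajectories.
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Euclidean distance from every trajectory to every centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Each centre becomes the mean trajectory of its cluster;
        # an empty cluster keeps its previous centre.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Hypothetical toy data: three rising trajectories and three stable,
# high-level trajectories, each observed at five visits.
t = np.arange(5.0)
rising = [t + off for off in (0.0, 0.2, -0.2)]
stable = [np.full(5, 10.0 + off) for off in (0.0, 0.2, -0.2)]
X = np.vstack(rising + stable)

labels, centres = kmeans_trajectories(X, k=2)
print(labels)
print(centres)
```

Because Euclidean distance compares trajectories visit by visit, this kind of clustering responds mainly to differences in level; grouping trajectories by shape would need a different distance or a model-based method.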
Issue | Vol 9 No 4 (2023)
Section | Articles
DOI | https://doi.org/10.18502/jbe.v9i4.16666
Keywords | clustering; longitudinal data; non-parametric methods; model-based methods
Rights and permissions | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.