
Longitudinal data clustering methods: A Systematic Review

Abstract

Over the last few decades, researchers in many fields have introduced methods for discovering groups that share the same trend in longitudinal data. Clustering is an unsupervised learning approach that groups longitudinal data algorithmically according to a chosen criterion. This study reviews methods for clustering longitudinal data, which fall into two general categories: non-parametric methods and model-based methods. PubMed, SCOPUS, ISI, Ovid, and Google Scholar were searched for the period 2000 to 2021. According to our systematic review, non-parametric k-means clustering with Euclidean distance emerges as a leading approach for clustering longitudinal data. By giving an overview of the studies in this field, this review can serve researchers as a toolbox for generating ideas and for choosing an appropriate method to classify and analyze longitudinal data.
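To make the highlighted approach concrete, the following is a minimal sketch of k-means with Euclidean distance applied to trajectories, assuming every subject is measured at the same time points with no missing values. The function name `kmeans_trajectories`, its parameters, and the simulated data are hypothetical illustrations, not code from any of the reviewed studies or packages.

```python
import numpy as np

def kmeans_trajectories(X, k, n_iter=100, seed=0):
    """Cluster rows of X (one trajectory per row, shared time points) with k-means."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct trajectories chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Euclidean distance from every trajectory to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean trajectory of its members;
        # keep the previous centroid if a cluster happens to be empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Simulated data (assumption): 20 rising and 20 falling trajectories over 5 visits.
rng = np.random.default_rng(1)
t = np.arange(5)
rising = 1.0 * t + rng.normal(0, 0.3, size=(20, 5))
falling = 4.0 - 1.0 * t + rng.normal(0, 0.3, size=(20, 5))
X = np.vstack([rising, falling])

labels, centroids = kmeans_trajectories(X, k=2)
```

Here each row of `centroids` is the mean trajectory of one cluster. Because Euclidean distance compares values visit by visit, this sketch groups trajectories by level at each time point and would need adapting for irregular visit times or missing measurements.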

Issue Vol 9 No 4 (2023)
Section Articles
DOI https://doi.org/10.18502/jbe.v9i4.16666
Keywords
clustering; longitudinal data; non-parametric methods; model-based methods

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
1. Dehghani tafti A, Jahani Y, Jambarsang S, Bahrampour A. Longitudinal data clustering methods: A Systematic Review. JBE. 2023;9(4):396-411.