Ant Colony Inspired Clustering in Biomedical Data Processing Miroslav Bursa and Lenka Lhotska Dept. of Cybernetics, Gerstner Laboratory, BioDat Research Group Czech Technical University in Prague, Technicka 2, Prague 6, Czech Republic Phone: +420 224 357 666, Fax: +420 224 357 224 email: [bursam,lhotska]@fel.cvut.cz ABSTRACT: This paper presents an overview of the methods inspired by the behavior of real ants in nature and which are currently in the focus of current research. The behavior of ant colonies shows many interesting properties that have been used in static and dynamic combinatorial problem-solving tasks (mostly since 1990). Also some applications to data clustering have been proposed. The paper addresses mainly the problem of ant-inspired clustering in the process of long-term electrocardiogram and electroencephalogram processing. KEYWORDS: Ant Colony Optimization, Ant Colony Clustering, Unsupervised Clustering, Data Mining, Electrocardiogram, Electroencephalogram, Sleep Analysis, Holter, Swarm Intelligence, Nature Inspired Methods INTRODUCTION The paper concentrates on the field of use of artificial intelligence techniques and their use in biomedical data processing. It aims at the clustering techniques inspired by natural processes, mainly ant colonies. After the introduction into the data and state-of-the-art of ant-colony-inspired metaheuristic, more thorough overview of ant-colony inspired clustering metaheuristic is presented, together with the ACO_DTree [22] method, developed by the author, which is based on the autocatalytic collective behavior of real insect colonies. Over the basic algorithm it involves techniques to increase robustness and performance of the method. Furthermore, basic methods for determining validity of the clustering are mentioned. ANT COLONY INSPIRED METHODS The inspiration of the ant inspired algorithms comes from the foraging behavior of real ant colonies which has been observed in nature and which has been studied by biology scientists. An ant colony operates without any central or hierarchical control to direct the work. No ant is able to asses the global needs of the colony: the capacities of the individuals are limited. There is abundant evidence, throughout physics, the social sciences, and biology, that such simple behavior of the individuals can lead to predictable patterns in the behavior of the groups. The idea of ant colony technique for optimization was introduced in the early 1990s by M. Dorigo. DATA MINING AND CLUSTERING Nowadays many data-mining algorithms with still growing number of modifications exist. Such modifications aim at speeding up the data mining process, increase its robustness and stability. But even with rapidly increasing computational power of modern computers, the analysis of huge databases is very expensive (in terms of computer time and/or memory – and therefore also financially). This is why scientists instantly create, develop and evaluate novel methods to analyze and process these data. In contrary to classical methods, nature-inspired methods offer many aspects, which can increase speed and robustness of classical methods. Natural clustering techniques inspired by nature also exist (such as self-organizing maps, neural networks, evolutionary algorithms, etc.). Data analysis procedures can be classified as either exploratory or confirmatory, based on the availability of appropriate models for the data source. A key element in both types of procedures is the grouping, or classification of measurements based on either goodness-of-fit to a postulated model, or natural groupings (clustering) revealed through analysis. The latter case is appropriate in the case of ECG processing. This is where data clustering can significantly help to reduce the amount of work which has to be performed by the expert (cardiologist). BIOMEDICAL DATA This section describes data (biological signals), which have been used in this study: Electrocardiogram (EEG) and Electroencephalogram (EEG). Electrocardiogram (ECG) recording is still one of the most important non-invasive diagnostic techniques used in patient diagnostics process. The recent research shows that also depressive patients can be diagnosed using the ECG processing (heart rate variability) [1]. Electroencephalogram (EEG) is an electrical recording of brain activity. It can be used in sleep stage classification. Both these recordings produce a huge amount of data. Thus efficient processing algorithms must be used. And this is the field where the nature inspired methods can be successfully used. ELECTROCARDIOGRAM An electrocardiogram processing consists of the following processes: signal pre-processing, signal transfer and/or storage, digital signal processing and feature extraction, clustering of the similar data (mainly in long-term recordings), signal interpretation (classification) and expert validation. In the majority of the process the ant-colony inspired methods can be used with more or less success. Feature extraction From the ECG signal, the following eight features have been automatically extracted (see [2]): amplitudes of Q, R, S, positive T and negative T wave, amplitude ratio of Q/R, R/S and R/T waves. For processing, the features have been normalized. These features are usable for other clustering purposes and for beat clustering. In [3], certain description of the data can also be found (together with some basic anonymous description of the patients). For the sake of simplicity, only two distinct classes have been used: normal cardiac action and abnormal cardiac action. The classification into more classes is nearly impossible due to lack of the data (mainly abnormal heart action signal) in some signals. This way (by using only PVC (Premature Ventricular Contraction) beat as abnormal heart actions) more records from the MIT-BIH database can be correctly processed. Along with the features a signal window of 401 samples has been stored (containing the beat which was used for feature extraction) thus making it available to perform a DTW comparison and template matching. ELECTROENCEPHALOGRAM All recordings used in this work contain eight EEG channels (these are FP1, FP2, T3, T4, C3, C4, O1, O2), Electrooculogram (EOG), Electromyogram (EMG), Respiratory channel (PNG) and Electrocardiogram (ECG). All the data have been annotated by an expert into four classes (wake, quiet sleep, active sleep, movement artifact). For accurate classification it is necessary to determine and/or calculate the most informative features. In previous studies a method based on power spectral density (PSD) has been applied to each EEG channel. Features derived from EOG, EMG, ECG and PNG signals have been also used. The most informative one is the measure of regularity of respiration in PNG signal. The methods, which have been used for feature extraction, are in detail described in [4]. ANT COLONY INSPIRED METHODS Ants have inspired many researches by their optimization which runs with no direct governing. It is a form of distributed and highly cooperative process. Like other cooperative systems in nature, the ants are subject to ongoing research and the ant-colony inspired methods are the state-of-the-art of novel optimization methods. The most popular is Ant Colony Optimization (ACO) introduced by Marco Dorigo [5], and his colleagues [6] in the early 1990's. Their work has been based on the observation of real ant colonies. Review can be found in [25]. Ants belong to social insect category. It means that the ant would be unable to survive on its own. But the colony as whole is very productive and can survive in the cruel reality. The source of inspiration for ant metaheuristic is their so called foraging behavior and particularly their ability to find the shortest path between food sources and their nest. When ants search for food, they initially perform a random-walk search in their local neighborhood. As the ants move, they deposit a chemical substance called pheromone on their path. This substance is perceived by other ants, helping them to make decision on their next move. This way a solution is constructed in an incremental fashion. The amount of the pheromone laid is proportional to the number of ants that used the route. Pheromone also evaporates in time. Pheromone evaporation helps to avoid local minima and allows dynamic adaptation when the problem (route, food source) changes. As soon as an ant finds a food source, it evaluates the quantity and the quality of the food and carries some of it back to the nest. During the return trip, the quantity of pheromone that an ant leaves on the ground may depend on the quantity and quality of the food. The pheromone trails will guide other ants to the food source. This kind of indirect communication (via pheromone exchange on the trail) known as stigmergy [7] allows the species to find shortest path between the nest and the food source. ANT COLONY OPTIMIZATION FOR DISCRETE COMBINATORIAL PROBLEMS The method of Ant Colony Optimization metaheuristic technique (M. Dorigo) is a model of the ant behavior, which is used for combinatorial problems [24]. The method is inspired by the way the real ants construct path using chemical substance called pheromone. A modification of Ant Colony Optimization can also be used for dynamic optimization such as network routing [8]. This group of algorithms has been used to solve many combinatorial problems: Quadratic Assignment Problem, Traveling Salesman Problem, and Vehicle Routing Problem: This branch addresses the NP problems, static and dynamic combinatorial optimization problems and distributed problems [25]. ANT COLONY OPTIMIZATION FOR CONTINUOUS DOMAIN Method for optimization in continuous space uses probabilistic density function with Gaussian kernel, representing the spatial distribution of pheromone. The method presented by M. Socha [9] is the most related to the ant-inspired techniques. ANT COLONY INSPIRED METHODS FOR CLUSTERING Several species of ant workers have been reported to form piles of corpses (cemeteries) to clean up their nests. This aggregation phenomenon is caused by attraction between dead items mediated by the ant workers. The workers deposit (with higher probability) the items in the region with higher similarity (when more similar items are present within the range of perception). For example, the Messor sancta ants organize dead corpses into clusters; brood sorting has been studied in ant colony of Leptothorax unifasciatus. This approach has been modelled [10] and [11] to perform a clustering of data. The approach (as all the clustering methods) is very sensitive to the similarity measure used (e. g. Euclidean distance, etc.) and the range of agent perception. Note, that no pheromone is used in this method. Also some methods using pheromone exist, namely A2CA [12]. Another approach can be seen the work of J. Handl in [13] (an ATTA algorithm), which introduces modified neighborhood function (penalizing high dissimilarities), shortterm memory with look-ahead (jumping ants), increasing radius of ant perception, time-dependent modulation of the neighborhood function (replacing the scaling parameter). The work also introduces modified threshold function for picking and dropping the data. The work is followed by the work of Tan et al. [14] which removes the ant metaphor from the method and presents a deterministic version of ant-clustering algorithm. Basic ant inspired clustering algorithm The Ant Colony inspired clustering works as follows. First the data vectors are randomly scattered onto a twodimensional grid (usually a toroidal one) (which is an analogy of the real world of an ant in the nature). Ants (also called agents) are then randomly placed onto the two-dimensional grid. In each iteration step an ant searches its neighborhood and computes a probability of picking up a vector (if the ant is unloaded and steps onto a vector) or of dropping down a vector (if the ant is loaded and steps onto a free grid element). The ant moves randomly. Intuitively the search process can be speed up by traversing the data vectors or guided by pheromone placed onto the grid. Although it may seem that the method described based on the Lumer-Faieta experiment [10] is the only ant-colony inspired method which is suitable (and usable) in data clustering, many researches have contributed methods based on Ant System and Ant Colony optimization. This method is definitely closest to its natural counterpart. Its drawback is that it has a number of parameters that must be set empirically and can produce more clusters than optimal number. The task is not to model the natural process in its every detail, but rather to pick up the most robust and usable concepts. Other algorithms The proposed algorithms for clustering which are inspired by ant colonies behavior can be divided in a few groups: The first approach is to somehow improve classical clustering methods, such as k-means or SOM methods (Kohonen network): see [15], resp. [16]. Such concepts improve the traditional methods by the robust and efficient methods of Ant Colony Optimization. They mainly keep the traditional methods from being stuck in local minima. Tsai et al. [17] proposed a novel clustering method called ant colony optimization with different favor algorithm which performed better than the fast SOM K-means and generic K-means algorithm. The second approach is to perform the clustering in some kind of the graph structure. It can be modeled in the following way: the data vectors represent vertices and the edges contain pheromone values [18]. This kind of methods reports also good results in changing environment. Its drawback is obviously the memory consumption. Therefore only limited number of edges (with pheromones) is held in memory. Thus, this kind of algorithms is does not scale very well. Another approach is the approach when the data are regarded as ants itself. The genome of the ant is based on the data vector. The ants interact as they do in nature: they accept individuals with similar genetic information (which produces similar odour) [19]. And they reject individuals with different genome. This way, a structure of clusters is obtained. We can also find some other modifications, among others fuzzy modifications [20]. Weng et al. [21] proposed a time series segmentation algorithm based on ACO algorithm. Handl et al. [13] developed ant based clustering method by incorporating adaptive, heterogeneous ants, a time-dependent transporting activity and a method that transforms the spatial embedding produced by the algorithm into an explicit partitioning. This is not the exhaustive list of all techniques, other approaches can be found in literature. ACO_DTree method Detailed description of the method can be found in [22], the method is based on the MAX-MIN Ant System algorithm [23]. The proposed method works as follows: first a population of random solutions is created. In each step, the population is populated with new solutions and further improved. Then the population is evaluated and only the specified number of solutions is conserved. Then, a certain amount of pheromone is deposited in proportion to the quality of best individuals and pheromone evaporation is performed. The method can be described by the following code: Generate initial population of solutions Repeat until stopping criterion is reached Create new solutions (pheromone driven) Evolve population (mutation) Evaluate population (determine classif. error) Select best individuals Pheromone evaporation Daemon actions (pheromone laying, param. adapt.) End Repeat The pheromone matrix used in the method conforms to full graph (graph with all possible edges) where nodes represent feature indexes and edges contain pheromone and represent transition from one feature to another. First a root node with random index and value is created. Then, recursively for each possible sub-node (with certain probability), next feature is probabilistically chosen from the graph with preference increasing proportionally to pheromone on the edges. This process is iterated up to maximum level of the tree. Only the best solutions (ants) are allowed to deposit pheromone into the matrix. The amount deposited is determined by the quality of the solution. The trees itself are further optimized using a local search technique, namely the Particle Swarm Optimization method. It is a population approach inspired by the behavior of bird flocks or fish schools. However, any kind of local search method can be used. Based on the results of the preliminary experiments, population size and number of new solutions added has been fixed to 8 and 4 solutions respectively. These parameters actually increase and decrease the number of solutions generated over time. Similar effect can be obtained by setting maximum number of iterations. Elitist ratio (number of best solutions which can deposit pheromone) has been also fixed the value of ½ of the population (with minimum of 5). The optimized parameters tuned are: maximum tree depth and number of iterations. In order to avoid premature convergence and maintain diversity in the population of solutions, adaptive techniques have been used. First, the 0.05;1.05 pheromone amount on the edge is limited and can be in the range , the evaporation rate and lay rate is adaptively changed to maintain an average pheromone value over the whole pheromone matrix (if the average pheromone drops by 10 %, the pheromone lay rate is increased, similar policy is applied to the pheromone evaporate rate; both the values are bounded by the minimum and maximum value). This could lead to saturation of pheromone values, thus a countermeasure to maintain saturated edges on the minimum is also used. The balanced process diversifies the population and avoids getting stuck in local minima. RESULTS Using the the ACO_DTree method together with Particle Swarm Optimization as a local search method, we have obtained 97.11 % accuracy over the training set (training set has been randomly selected from the whole data set in the ratio of 66 % and 33 % of training respective testing data vectors). Using the EEG recordings of patients we obtained an accuracy of 82 % in the artifact removal process. The overall classification accuracy is 71.3 %. The results are summarized in Table I. The ACO_Dtree method outperformed the Random Tree method in all cases. Task ECG Classification EEG Classification EEG Active vs. Quiet sleep EEG Noise removal ACO_Dtree 97.11 % 71.30 % 96.38 % 91.02 % WEKA Random Tree 96.53 % 66.21 % 95.37 % 90.80 \% Table I:. Comparison of the ACO_Dtree method with classical approach for classification tree generation. CONCLUSION In this paper we have presented an introduction into ant inspired techniques with respect to biomedical data processing. We have presented and evaluated a hybrid method which can be used for data partitioning, data classification, which is also usable feature selection. The method is based on the hybrid combination of evolutionary algorithm with ant colony optimization. This combination allows better convergence and leads to increased robustness. The method has been compared with a simple evolutionary algorithm, which does not use pheromone and with Random tree generation method (from the WEKA [26] toolkit). The hybrid method outperformed the other method in all cases. The method has been (after preliminary tests on smaller datasets) applied to the MIT-BIH database with more than 80.000 records. The EEG data contains about 450.000 instances. Certain parameters of the method have been experimentally determined. The population size should equal the number of features in the signal (the square root of the size of pheromone matrix). PSO re-optimization of the individuals is very important, however with vigorous optimization, the advantage of robustness is lost (the results on training data set are excellent, but very poor on the testing data set). Lower accuracy on the EEG set is manly due to high amount of expert misclassification in the data (the neurologists obtain classification consensus in about 70 % of the cases). The results show that the approach is suitable for biological data clustering. The advantage is that it produces clear structure with clinical use. DISCUSSION The ant colony inspired methods can be applied in many stages of the ECG processing process. However, only some process stages are really suitable for such methods. We must be aware that ant colony methods can find only an approximation of the best solution. Thus some compromise has to be accepted. The main research on the nature inspired methods does not focus on the strict modeling of the natural processes; it merely focuses on using the best ideas to improve the convergence and accuracy of such methods. The best results have been obtained in automated ECG beat clustering (and interpretation) process. The main research on the nature inspired methods does not focus on the strict modeling of the natural processes; it merely focuses on using the best ideas to improve the convergence and accuracy of such methods. The best results have been obtained in automated ECG beat clustering (and interpretation) process. ACKNOWLEDGMENT The research has been supported by the research program No. MSM 6840770012 "Transdisciplinary Research in the Area of Biomedical Engineering II" of the CTU in Prague, sponsored by the Ministry of Education, Youth and Sports of the Czech Republic and by the FP6-IST project No. 13569 NiSIS (Nature-inspired Smart Information Systems). REFERENCES [1] R. Carney, J. Blumenthal, P. Stein, L. Watkins, D.Catellier, L. Berkmann, S. Czajkowski, C. O’Connor, P. Stone, K. Freeland. Depression, Heart Rate Variability, and Acute Myocardial Infarction. Circulation. 2001;104:2024-2028 [2] Chudáček V, Lhotská L., Unsupervised Creation of Heart Beats Classes from Long-term ECG Monitoring, Biennial International EURASIP Conference BIOSIGNAL, 18:199–201, 2006 [3] Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220, 2006 [4] Gerla, V.; Lhotska, L.; Krajca, V. & Paul, K.: Multichannel Analysis of the Newborn EEG Data, IEEE ITAB International Special Topics Conference on Information Technology in Biomedicine. Piscataway: IEEE, 2006 [5] Dorigo, M.: Optimization, learning and natural algorithms, PhD. thesis, Dipartimento di Elettronica, Politecnico di Milano, Italy, 1992 [6] Bonabeau, E.; Dorigo, M. & Theraulaz, G.: Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, New York, NY, 1999 [7] Grasse, P., La reconstruction du nid et les coordinations inter-individuelles chez bellicositermes natalensis et cubitermes sp. La théorie de la stigmergie: Essai d'interprétation des termites constructeurs, Insectes Sociaux, 1959, 6, 41-81 [8] Caro, G. D.; Ducatelle, F. & Gambardella, L. M.: AntHocNet: An Adaptive Nature-Inspired Algorithm for Routing in Mobile Ad Hoc Networks, Telecommunications (ETT), Special Issue on Self Organization in Mobile Networking, 2005, 16, No 2 [9] K. Socha. ACO for Continuous and Mixed-Variable Optimization. Proceedings of 4th International Workshop on Ant Colony Optimization and Swarm Intelligence (ANTS'2004), Brussels, Belgium, September 5-8, 2004 [10] Lumer, E. D. & Faieta, B.: Diversity and Adaptation in Populations of Clustering Ants, From Animals to Animats: Proc. of the 3th Int. Conf. on the Simulation of Adaptive Behaviour, 1994, 3, 501-508 [11] J. L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, L. Chrétien. The Dynamics of Collective Sorting: Robot-like Ants and Ant-like Robots. From Animals to Animats: Proc. of the 1st International Conference on Simulation of Adaptive Behaviour 1990:356-353 [12] Vizine, A. L.; de Castro, N. L.; Hruschka, E. R. & Gudwin, R. R.: Towards Improving Clustering Ants: An Adaptive Ant Clustering Algorithm, Informatica 29, 2005, 143-154 [13] Handl, J.; Knowles, J. & Dorigo, M.: Ant-based clustering and topographic mapping, Artificial Life 12(1), 2006, 12, 35-61 [14] Tan, S. C.; Ting, K. M. & , S. W. T.: Reproducing the Results of Ant-based Clustering Without Using Ants, CEC 2006. IEEE Congress on Evolutionary Computation, 2006, 1760-1767 [15] Kuo, R. J.; Wang, H. S.; Hu; Tung-Lai & Chou, S. H.: Application of Ant K-Means on Clustering Analysis, Computers and Mathematics with Applications, Elsevier, 2005, 50, 1709-1724 [16] Chi, S. & Yang, C. C.: Integration of Ant Colony SOM and K-Means for Clustering Analysis, KnowledgeBased Intelligent Information and Engineering Systems, LNCS, Springer, 2006, Volume 4251, 1-8 [17] Tsai, C. F.; Tsai, C. W.; Wu, H. C. & Yang, T.: ACODF: a novel data clustering approach for data mining in large databases, Journal of Systems and Software, 2004, Vol. 73, Issue 1, 133-145 [18] Ling, C.; Li, T. & Yixin, C.: An Ant Clustering Method for a Dynamic Database, D.S. Yeung et al. ICMLC 2005, LNAI 3930, Springer Verlag Berlin Heidelberg, 2005, 1006, 169-178 [19] Labroche, N.; Monmarche, N. & Venturini, G.: Antclust: Ant Clustering and Web Usage Mining, Proceedings of the Genetic and Evolutionary Computation Conference, Chicago Temmuz, 2003, 25-36 [20] Schockaert, S.; Cock, M. D.; Cornelis, C. & Etienne, E. E. K.: Fuzzy Ant Based Clustering, ANTS 2004, LNCS 3172, Springer-Verlag Berlin Heidelberg, 2004, 342-349 [21] Weng, S. S. & Liu, Y. H.: Mining time series data for segmentation by using Ant Colony Optimization, European Journal of Operational Research, 2006 [22] Bursa, M.; Lhotska, L. & Macas, M.: Hybridized Swarm Metaheuristics for Evolutionary Random Forest Generation, Proceedings of the 7th International Conference on Hybrid Intelligent Systems 2007 (IEEE CSP), 2007, 150-155 [23] T. Stützle, T. & Hoos, H. H., MAX-MIN Ant system, Future Gen. Comput. Syst. 16, 2000, 8, 889-914 [24] Dorigo, M. & Stützle, T., Ant Colony Optimization, MIT Press, Cambridge, MA, 2004 [25] Blum, C., Ant colony optimization: Introduction and recent trends, Physics of Life Reviews,Volume 2, Issue 4, 2005, 353-373 [26] Witten, I. H. & Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd Edition, Morgan Kaufmann, San Francisco, 2005
© Copyright 2024