Journal of Applied Mathematics

Volume 2017, Article ID 4323590, 9 pages

https://doi.org/10.1155/2017/4323590

## A Greedy Clustering Algorithm Based on Interval Pattern Concepts and the Problem of Optimal Box Positioning

Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Leninskie Gory 1, Moscow 119991, Russia

Correspondence should be addressed to Stepan A. Nersisyan; s.a.nersisyan@gmail.com

Received 19 June 2017; Accepted 15 August 2017; Published 25 September 2017

Academic Editor: Dimitris Fotakis

Copyright © 2017 Stepan A. Nersisyan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We consider a clustering approach based on interval pattern concepts. Exact algorithms developed within the framework of this approach are unable to produce a solution for high-dimensional data in a reasonable time, so we propose a fast greedy algorithm that solves the problem in a geometrical reformulation and shows a good rate of convergence and adequate accuracy for experimental high-dimensional data. In particular, the algorithm provided high-quality clustering of tactile frames registered by the Medical Tactile Endosurgical Complex.

#### 1. Introduction

We consider the problem of clustering, that is, splitting a finite set into disjoint subsets (called *clusters*) in such a way that points from the same cluster are similar (with respect to some criterion) and points from different clusters are dissimilar (see, e.g., [1]). It is convenient to present the input data in the form of a numerical context (table) whose rows correspond to objects and whose columns correspond to attributes of the objects.

Formal concept analysis (FCA) is a data analysis method based on applied lattice theory and order theory. The object-attribute binary relation is visualized with the use of the line diagram of the concept lattice. Within the framework of this theory a formal concept is defined as a pair (extent, intent) obeying a Galois connection (for exact definitions see the monograph [2] by Ganter and Wille).

There exist several generalizations of FCA to fuzzy and numerical contexts. One of them is known as the theory of pattern structures introduced by Ganter and Kuznetsov in [3]. An important particular case of pattern concepts, which are the key object in the theory of pattern structures, is interval pattern concepts with the operation of interval intersection. Interval pattern concepts allow one to apply cluster analysis to rows of formal numerical contexts. In this case rows are considered similar if, for every attribute, the differences between their values lie within a given interval.

The problem of finding an interval pattern concept of maximum extent size (i.e., cardinality) can be reformulated as the problem of optimally positioning a multidimensional box with given edge lengths for a given finite set of points, that is, finding a position of the box that maximizes the number of points of the set enclosed by the box (the details are given below in Section 2.2).
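To make the reformulation concrete, the following Python sketch counts the points enclosed by an axis-parallel closed box and finds a best position by exhaustive search. It is only an illustration and not the algorithm proposed in this paper: the names `count_in_box` and `best_box_bruteforce`, the closed-box convention, and the sample points are assumptions made for this example. The search relies on the observation that a box can be shifted upward along each axis until its lower face touches an enclosed point without losing any enclosed point, so it suffices to try lower corners whose coordinates occur in the data.

```python
from itertools import product


def count_in_box(points, lower, edges):
    """Count points p with lower[j] <= p[j] <= lower[j] + edges[j] for every j."""
    return sum(
        all(lo <= x <= lo + e for x, lo, e in zip(p, lower, edges))
        for p in points
    )


def best_box_bruteforce(points, edges):
    """Exhaustive search over lower corners built from coordinates seen in the data.

    Hypothetical helper for illustration; returns (lower_corner, enclosed_count).
    """
    # One sorted list of candidate coordinates per dimension.
    candidates = [sorted({p[j] for p in points}) for j in range(len(edges))]
    best_lower, best_count = None, -1
    for lower in product(*candidates):
        c = count_in_box(points, lower, edges)
        if c > best_count:  # keep the first corner that reaches the maximum
            best_lower, best_count = lower, c
    return best_lower, best_count


# Made-up sample data: six points in the plane and a 2 x 2 box.
sample = [(0, 0), (1, 1), (1.5, 0.5), (5, 5), (5.5, 5.2), (9, 0)]
print(best_box_bruteforce(sample, (2, 2)))  # ((0, 0), 3)
```

The number of candidate corners can be as large as the number of points raised to the power of the dimension, which illustrates why exhaustive search is impractical for high-dimensional data.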

The existing algorithms for finding the optimal position of a box cannot produce an exact, or even an approximate, solution for high-dimensional data within a reasonable time (see a detailed survey in Section 2.2). The main goal of this paper is to propose a greedy algorithm that gives an approximate solution to this problem, together with a clustering algorithm based on the optimal positioning problem. The worst-case time and space complexity of the proposed clustering algorithm depend, in particular, on the number of iterations of the main stage of the algorithm and on two parameters that regulate the duration of each iteration. A greater number of iterations and a longer duration of each iteration provide a better approximation.

The rest of the paper is organized as follows. In Section 2 we introduce the main definitions and formalize the statement of the problem. In Sections 3 and 4 we formulate our algorithms. In Sections 5 and 6 we describe the validation results and make some concluding remarks.

#### 2. Main Definitions and Statement of the Problem

In this section we start with the main definitions from the theory of formal concepts and then present a geometrical reformulation of the problem of finding the interval pattern concept of maximum extent size (we call it simply the *maximum interval pattern concept*).

##### 2.1. Formal Concepts

Let us recall the main definitions which we need to formalize our clustering method based on interval pattern concepts. Additional details can be found in [2, 3].

*Definition 1.* An *upper (lower) semilattice* is a partially ordered set $(M, \le)$ such that any two elements $x, y \in M$ have a unique least upper bound (greatest lower bound, resp.).

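*Definition 2.* A *semilattice operation* on the set $M$ is a binary operation $\sqcap \colon M \times M \to M$ that has the following properties for a certain $0 \in M$ and any elements $x, y, z \in M$: (i) $x \sqcap x = x$ (idempotency); (ii) $x \sqcap y = y \sqcap x$ (commutativity); (iii) $(x \sqcap y) \sqcap z = x \sqcap (y \sqcap z)$ (associativity); (iv) $x \sqcap 0 = 0$.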

*Definition 3.* A *lattice* is an ordered set which is at the same time an upper and a lower semilattice.

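*Definition 4.* Let $(P, \le)$ and $(Q, \le)$ be partially ordered sets. A *Galois connection* between these sets is a pair of maps $\varphi \colon P \to Q$ and $\psi \colon Q \to P$ (each of them is referred to as a *Galois operator*) such that the following relations hold for any $p_1, p_2 \in P$ and $q_1, q_2 \in Q$: (i) $p_1 \le p_2 \Rightarrow \varphi(p_1) \ge \varphi(p_2)$ (anti-isotone property); (ii) $q_1 \le q_2 \Rightarrow \psi(q_1) \ge \psi(q_2)$ (anti-isotone property); (iii) $p_1 \le \psi(\varphi(p_1))$ and $q_1 \le \varphi(\psi(q_1))$ (isotone property). Applying the Galois operators twice, namely, $\psi \circ \varphi$ and $\varphi \circ \psi$, defines a *closure operator*.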

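*Definition 5.* A *closure operator* $\operatorname{cl}$ on a set $G$ is a map that assigns a *closure* $\operatorname{cl}(X) \subseteq G$ to each subset $X \subseteq G$ under the following conditions: (i) $X \subseteq Y \Rightarrow \operatorname{cl}(X) \subseteq \operatorname{cl}(Y)$ (monotony); (ii) $X \subseteq \operatorname{cl}(X)$ (extensity); (iii) $\operatorname{cl}(\operatorname{cl}(X)) = \operatorname{cl}(X)$ (idempotency).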

*Definition 6.* A *pattern structure* is a triple $(G, (D, \sqcap), \delta)$, where $G$ is a set of objects, $(D, \sqcap)$ is a meet-semilattice of potential object descriptions, and $\delta \colon G \to D$ is a function that associates descriptions with objects.

The Galois connection between the subsets of the set of objects and the set of descriptions for the pattern structure $(G, (D, \sqcap), \delta)$ is defined as follows:

$$A^{\square} = \bigsqcap_{g \in A} \delta(g) \quad \text{for } A \subseteq G, \qquad d^{\square} = \{\, g \in G \mid d \sqsubseteq \delta(g) \,\} \quad \text{for } d \in D,$$

where $c \sqsubseteq d$ means $c \sqcap d = c$.

*Definition 7.* A *pattern concept of the pattern structure* $(G, (D, \sqcap), \delta)$ is a pair $(A, d)$, where $A \subseteq G$ is a subset of the set of objects and $d \in D$ is one of the descriptions in the semilattice, such that $A^{\square} = d$ and $d^{\square} = A$; $A$ is called the *pattern extent* of the concept and $d$ is the *pattern intent*.

A particular case of a pattern concept is the interval pattern concept. The set of objects $G$ consists of the rows of a numerical context, which are treated as tuples of intervals of zero length. An interval pattern concept is a pair $(A, d)$, where $A$ is a subset of the set of objects and $d$ is a tuple of intervals with ends determined by the smallest and the largest values of the corresponding component in the descriptions of all objects in $A$.
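As a minimal sketch of this construction, the following Python code computes the tuple of intervals spanned by a non-empty subset of rows and then collects every row of the context that fits into those intervals. The function names (`description`, `extent`, `interval_pattern_concept`) and the small context are hypothetical and serve only as an illustration.

```python
def description(rows):
    """Component-wise (min, max) intervals spanned by a non-empty list of rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def extent(context, intervals):
    """Indices of all rows whose every value lies in the corresponding interval."""
    return [
        i
        for i, row in enumerate(context)
        if all(lo <= x <= hi for x, (lo, hi) in zip(row, intervals))
    ]


def interval_pattern_concept(context, subset):
    """Close a non-empty subset of row indices into a pair (extent, intervals)."""
    d = description([context[i] for i in subset])
    return extent(context, d), d


# Made-up numerical context: 4 objects (rows) with 3 attributes (columns).
context = [(4, 5, 3), (5, 5, 4), (4, 4, 4), (2, 3, 5)]

# Rows 0 and 3 span intervals that also contain row 2, so the extent grows.
print(interval_pattern_concept(context, [0, 3]))
# ([0, 2, 3], [(2, 4), (3, 5), (3, 5)])
```

Starting from rows 0 and 3, the resulting extent also includes row 2, which shows that an arbitrary subset of rows need not be the extent of an interval pattern concept until it has been closed in this way.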

Interval pattern concepts are convenient for analyzing numerical contexts when the data must be divided into clusters of objects whose rows contain similarly “distributed” numerical values.

For each component of an interval pattern concept we introduce the width: the difference between the largest and the smallest values of the component. Then a clustering procedure can be defined using a standard greedy approach. Specifically, at each step the maximum interval pattern concept is identified, that is, an interval pattern concept with the maximum number of objects whose width with respect to each component does not exceed a predefined threshold. The objects of the identified interval pattern concept are combined into a cluster and excluded from the set of objects analyzed at subsequent steps.
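The greedy procedure can be sketched in Python as follows. This is an illustration under stated assumptions and not the algorithm developed in this paper: the step that identifies the maximum interval pattern concept is replaced here by an exhaustive search (`largest_group`), which is feasible only for very small contexts, and the function names, the per-component thresholds `max_width`, and the sample grades are hypothetical.

```python
from itertools import product


def largest_group(rows, max_width):
    """Indices of a largest set of rows whose width in component j is <= max_width[j].

    Exhaustive search: the lower end of each interval is taken from the data values.
    """
    candidates = [sorted({r[j] for r in rows}) for j in range(len(max_width))]
    best = []
    for lower in product(*candidates):
        inside = [
            i
            for i, r in enumerate(rows)
            if all(lo <= x <= lo + w for x, lo, w in zip(r, lower, max_width))
        ]
        if len(inside) > len(best):
            best = inside
    return best


def greedy_clusters(rows, max_width):
    """Repeatedly extract a largest admissible group and remove its objects."""
    remaining = list(range(len(rows)))
    clusters = []
    while remaining:
        local = largest_group([rows[i] for i in remaining], max_width)
        cluster = [remaining[k] for k in local]  # map back to original indices
        clusters.append(cluster)
        chosen = set(cluster)
        remaining = [i for i in remaining if i not in chosen]
    return clusters


# Made-up grades of seven pupils in three disciplines; width at most 1 per component.
grades = [(5, 5, 4), (4, 5, 5), (5, 4, 5), (2, 3, 3), (3, 2, 2), (3, 3, 2), (5, 2, 3)]
print(greedy_clusters(grades, (1, 1, 1)))
```

For this sample the procedure groups the rows into the index sets {0, 1, 2}, {3, 4, 5}, and {6}. The loop always terminates because every remaining row fits into a box anchored at its own coordinates, so each step removes at least one object.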

In the example presented in Table 1, the objects are pupils and the numerical data of the context consist of the grades they received in exams in various disciplines.