ALMATY TECHNOLOGICAL UNIVERSITY, FACULTY OF ENGINEERING AND INFORMATION TECHNOLOGIES, Department of Information Technologies
Student Independent Work: "Clustering task"
Grade
Note
Signature
Completed by:
Бердіғалиев Ақниет
Accepted by:
Қабдолдина А.
Almaty, 2023
Clustering is the grouping of objects according to the similarity of their properties; each cluster consists of similar objects, and the objects of different clusters differ significantly.
The goals of clustering in Data Mining vary and depend on the specific task being solved.
Let's consider these tasks.
• Data exploration. Splitting a set of objects into groups of similar objects helps to reveal the structure of the data, make its representation clearer, put forward new hypotheses, and understand how informative the properties of the objects are.
• Facilitating analysis. Clustering can simplify further data processing and model building: each cluster is processed individually, and a separate model is built for each cluster. In this sense, clustering is a preparatory stage before solving other Data Mining tasks: classification, regression, association, and sequential pattern mining.
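The "one model per cluster" idea can be sketched as follows. This is only a minimal illustration under stated assumptions: the data rows, the cluster labels (assumed to come from an earlier clustering step), and the helper `fit_line` are hypothetical and are not taken from the text; a straight-line least-squares fit stands in for whatever model the actual task requires.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical rows: (x, y, cluster_label). The labels are assumed to have
# been produced by a previous clustering step.
rows = [(1, 2, 0), (2, 4, 0), (3, 6, 0), (10, 5, 1), (11, 4, 1), (12, 3, 1)]

# Group the observations by cluster label.
groups = defaultdict(list)
for x, y, label in rows:
    groups[label].append((x, y))

# Build a separate model for each cluster.
models = {}
for label, pts in groups.items():
    xs, ys = zip(*pts)
    models[label] = fit_line(xs, ys)
```

In this toy data a single line fitted to all six rows would describe neither group well, while the two per-cluster lines fit their own points exactly.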
In the previous lessons, we got acquainted with several common metric approaches to classification and regression problems. In those problems, we were given a training sample (labeled data) from which the model was built. In practice, however, there are problems in which the initial set of objects is not labeled, and we need to build a model that distributes the objects into groups (classes) using a given (or chosen) metric between objects.
This is the statement of the clustering problem. In general, the output to be found is:
• a set of clusters;
• a clustering algorithm.
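To make the two outputs concrete, here is a minimal sketch that takes unlabeled objects and a metric and returns both a set of clusters and a rule for assigning objects to them. It uses a simple greedy "leader" scheme chosen purely for illustration; the text does not prescribe a particular algorithm, and the Euclidean metric, the `threshold` parameter, and the sample points are all assumptions.

```python
import math


def euclidean(a, b):
    """Metric between two objects given as equal-length tuples of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_clustering(objects, metric, threshold):
    """Greedy 'leader' clustering.

    Each object joins the first cluster whose leader lies within
    `threshold` of it; otherwise it starts a new cluster as its leader.
    Returns (leaders, labels), where labels[i] is the cluster index
    of objects[i].
    """
    leaders = []
    labels = []
    for obj in objects:
        for idx, leader in enumerate(leaders):
            if metric(obj, leader) <= threshold:
                labels.append(idx)
                break
        else:
            # No existing leader is close enough: open a new cluster.
            leaders.append(obj)
            labels.append(len(leaders) - 1)
    return leaders, labels


# Hypothetical unlabeled objects in the plane.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3)]
leaders, labels = leader_clustering(points, euclidean, threshold=1.0)
print(labels)  # [0, 0, 1, 1, 0]
```

Here the leaders play the role of the set of clusters, and the "nearest leader within the threshold" rule plays the role of the clustering algorithm. The result of this particular scheme depends on the order of the objects and on the chosen threshold.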
Since the algorithm itself must distribute the objects into classes, clustering belongs to the unsupervised learning tasks; that is, it is a special case of learning without a teacher. Besides clustering, there are entirely different methods that also belong to unsupervised learning, so it should not be assumed that clustering covers the whole class of such tasks.