"SCIENCE AND EDUCATION IN THE MODERN WORLD: CHALLENGES OF THE XXI CENTURY" NUR-SULTAN, KAZAKHSTAN, JULY 2019 38
Simply put, machine learning automates the process of extracting known and unknown
patterns from data. The machine learning algorithm expresses these patterns in the form of a
formula or instruction that can be applied to new and previously unknown data. The algorithm
studies the results and adapts its work in accordance with the newly identified patterns. Such
training can be supervised or unsupervised.
Supervised machine learning methods learn from a set of so-called labeled data.
The model is trained on records of both fraudulent and legitimate operations. After that, it
tries to develop a set of functions or instructions that can confirm or deny the fact of fraud in
new samples. Standard methods for supervised machine learning include logistic regression,
neural networks, decision trees, gradient boosting, random forests, support vector machines (SVM), etc.
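The supervised workflow described above can be illustrated with a minimal sketch of logistic regression, one of the methods listed. The two features (operation amount in thousands and a foreign-merchant flag), the toy records, and the function names `train_logistic` and `predict` are assumptions made for illustration only; they do not come from a real data set or library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one labeled record
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (fraud) if the estimated probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)

# Hypothetical labeled data: [amount in thousands, 1 if foreign merchant else 0]
X_train = [[0.1, 0], [0.3, 0], [0.2, 1], [0.5, 0],
           [4.0, 1], [5.5, 1], [3.8, 0], [6.0, 1]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = legitimate, 1 = fraudulent

w, b = train_logistic(X_train, y_train)

# Apply the learned rule to new, previously unseen operations
new_ops = [[0.25, 0], [5.0, 1]]
labels = [predict(w, b, x) for x in new_ops]
print(labels)
```

In practice a library implementation and a much larger labeled set would be used; the sketch only shows how labeled examples of both classes turn into a rule that can be applied to new samples.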
Unsupervised machine learning is based on other principles. Since it is not known in advance
which data are related to fraudulent operations, the model should create a function that describes
the data structure. The algorithm then marks all data that do not fit into this model as
anomalous. To train such a model, it is sufficient to simply provide it with data, and it will try to
create a set of functions or instructions describing the basic structure and parameters of the data.
This set of functions or instructions can then be applied to new and previously unknown data.
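As a rough illustration of this idea, the sketch below describes unlabeled data with k-means centroids and marks any operation that lies unusually far from every centroid as anomalous. The feature choice (amount in thousands, time of day as a fraction), the toy records, the seeding of centroids from the first k points, and the threshold of mean plus three standard deviations are all assumptions made for the example, not a recommended configuration.

```python
import math

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def fit_detector(points, k=2, factor=3.0):
    """Learn centroids and a distance threshold from unlabeled data."""
    centroids = kmeans(points, k)
    d = [min(dist(p, c) for c in centroids) for p in points]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    return centroids, mean + factor * std

def is_anomalous(x, centroids, threshold):
    """An operation is anomalous if it is far from every learned centroid."""
    return min(dist(x, c) for c in centroids) > threshold

# Hypothetical unlabeled operations: [amount in thousands, time of day as a fraction]
data = [[0.1, 0.4], [2.0, 0.5], [0.2, 0.5], [2.1, 0.6], [0.3, 0.6],
        [1.9, 0.6], [0.2, 0.4], [2.2, 0.5], [0.1, 0.5], [2.0, 0.7]]

centroids, threshold = fit_detector(data)

# Apply the learned description to new, previously unseen operations
flags = [is_anomalous(x, centroids, threshold) for x in [[0.2, 0.5], [9.0, 0.1]]]
print(flags)
```

No labels are used at any point: the detector only learns what typical operations look like, so whether a flagged operation is actually fraudulent still has to be established separately.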
The unsupervised approach has the following difficulty: it is often very
difficult to assess how accurate the detection scheme is until the data are manually checked. The
standard unsupervised methods include self-organizing maps, the k-means method,
DBSCAN, kernel smoothing, one-class SVM, principal component analysis, etc.