UDK 004.5
BIG DATA AND MACHINE LEARNING IN BANKING SPHERE
Arsenov Temirlan Erlanovich
2nd year master student of Almaty Management University
Almaty, Kazakhstan
Supervisor - T. Bakibayev
Abstract: Detecting fraud is no easy task. In fact, fraudulent transactions are quite
rare and affect only an extremely small part of an organization's activities. But even a small
percentage of such operations can quickly lead to huge financial losses if the company lacks the
necessary tools and systems. The difficulty is that criminals are very creative: as soon as the
usual schemes become ineffective, they change tactics. The good news is that,
thanks to the development of machine learning technologies, systems can learn, adapt and find
new ways to prevent fraud.
Keywords: Machine learning, banks, Big Data, algorithms, fraud
Most organizations still use rule-based algorithms as their primary fraud detection tool.
Rules readily reveal known patterns, but they are ineffective against
new, as yet unknown schemes: they cannot adapt to new conditions and cannot
withstand fraudsters, who are becoming more sophisticated. Machine learning
systems are designed to handle exactly this.
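
To illustrate the difference, here is a minimal sketch in Python, not a production detector. The limit of 1000, the sample amounts, and the function names `rule_based_flag` and `learned_flag` are assumptions made for this example, and a simple statistical baseline stands in for a trained model. The fixed rule catches only amounts above a hand-set limit, while a baseline estimated from the customer's own history also flags a payment that stays just below that limit.

```python
from statistics import mean, stdev
from typing import Sequence

FIXED_LIMIT = 1000.0  # hypothetical hand-written rule: flag amounts above this


def rule_based_flag(amount: float) -> bool:
    """Static rule: knows only the pattern it was written for."""
    return amount > FIXED_LIMIT


def learned_flag(history: Sequence[float], amount: float, z_limit: float = 3.0) -> bool:
    """Flag an amount that deviates strongly from this customer's own history.

    The baseline (mean and standard deviation) is re-estimated from data,
    so it shifts when the customer's behaviour shifts.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_limit


# Hypothetical history of one customer's card payments.
history = [20.0, 35.0, 18.0, 42.0, 27.0, 31.0, 25.0, 38.0]

# A fraudster who knows the rule keeps the amount just under the limit.
suspicious_amount = 950.0

print(rule_based_flag(suspicious_amount))        # the static rule does not fire
print(learned_flag(history, suspicious_amount))  # the data-driven baseline does
```

The rule would have to be rewritten by hand to catch this case, whereas the baseline changes whenever it is re-estimated from new data.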
Today, machine learning is very popular, and most vendors claim to use it in some form in
their fraud detection solutions. Machine learning itself is not new. However, one
should not conclude that, because it has long been used in this field, it is
now possible to rest on one's laurels. In fact, the opposite is true.
Machine learning is a critical component in a fraud detection toolkit.
Data volumes are growing, and the task of identifying fraud is becoming more complex.
When it comes to creating machine learning systems, the key to success is data. As a rule, the more data,
the better the model, and the fraud detection model is no exception. As the volume and
complexity of information grow, security specialists need scalable machine learning
platforms. Traditional tools are quite effective when working with thousands of records
and several megabytes of data, but real-world problems are measured in gigabytes or even
terabytes of data.
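
One common way to cope with data that does not fit in memory is to update statistics incrementally, one record at a time. The sketch below uses Welford's online algorithm for the mean and variance as an illustration. The class name `RunningStats`, the generator `transaction_stream` and the sample values are assumptions for this example; a real platform would also distribute the work across many machines.

```python
from typing import Iterator


class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def transaction_stream() -> Iterator[float]:
    """Stand-in for a large source read record by record (hypothetical values)."""
    yield from (20.0, 35.0, 18.0, 42.0, 27.0, 31.0, 25.0, 38.0)


stats = RunningStats()
for amount in transaction_stream():
    stats.update(amount)  # only the running totals are kept, never the full data set

print(stats.count, stats.mean, stats.variance)
```

Because each update needs only three numbers, memory use stays constant however many records pass through.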
There is no single algorithm or machine learning method that works in all
cases. The key to success is the ability to try out many individual algorithms and their
combinations, and to test them on various data sets. Data scientists need a range
of supervised and unsupervised learning methods, as well as various feature engineering
techniques. Finally, the creative aspect of using machine learning to detect fraud
should not be overlooked. This means new and unusual ways of applying machine learning - for
example, combining supervised and unsupervised learning, which
is much more effective than using either method on its own.
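
One possible way to combine the two approaches is sketched below under simplifying assumptions; it is an illustration, not a recommended design. An unsupervised step learns what a typical amount looks like from unlabeled transactions (a robust score based on the median), and a supervised step uses a handful of labeled cases to choose the cut-off for that score. All names (`fit_anomaly_scorer`, `fit_threshold`, `is_fraud`) and all data values are hypothetical.

```python
from statistics import median
from typing import Callable, Sequence


def fit_anomaly_scorer(amounts: Sequence[float]) -> Callable[[float], float]:
    """Unsupervised step: learn 'typical' behaviour from unlabeled data.

    Returns a function giving the distance from the median, scaled by the
    median absolute deviation (MAD).
    """
    center = median(amounts)
    mad = median([abs(a - center) for a in amounts]) or 1.0  # avoid division by zero

    def score(amount: float) -> float:
        return abs(amount - center) / mad

    return score


def fit_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Supervised step: pick the score cut-off with the fewest errors on labeled data."""
    best_threshold, best_errors = 0.0, len(labels) + 1
    for candidate in sorted(set(scores)):
        errors = sum((s >= candidate) != bool(y) for s, y in zip(scores, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Plenty of unlabeled transactions (hypothetical amounts).
unlabeled = [20.0, 35.0, 18.0, 42.0, 27.0, 31.0, 25.0, 38.0, 22.0, 900.0, 33.0, 29.0]
score = fit_anomaly_scorer(unlabeled)

# Only a few investigated cases: (amount, label), where 1 means confirmed fraud.
labeled = [(24.0, 0), (40.0, 0), (60.0, 0), (300.0, 1), (750.0, 1)]
threshold = fit_threshold([score(a) for a, _ in labeled], [y for _, y in labeled])


def is_fraud(amount: float) -> bool:
    """Combined model: unsupervised score, supervised cut-off."""
    return score(amount) >= threshold


print(is_fraud(500.0), is_fraud(45.0))
```

In this toy setup the labels keep a legitimate payment of 60.0 from being flagged, even though its score is well above that of everyday amounts; the unsupervised score alone has no way of knowing where to draw that line.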
This may seem obvious, but it is one of the most difficult tasks for many
organizations: once a machine learning model has been developed, it must be integrated into the production
environment. If your data is stored in Hadoop, it makes sense to develop a machine learning
model that can later be integrated into Hadoop without problems. Similarly, in the case of
streaming data in real time, a machine learning system capable of operating in real time or