OVERVIEW OF STATISTICAL LEARNING METHODS IN THE FIELD OF PREDICTIVE MEDICINE
Iglikov T.D., Atymtayeva L.B.
International IT University (IITU)
Abstract
Humanity constantly seeks ways to improve healthcare, and one of the most important goals of medicine is making correct diagnoses. Diagnostic accuracy matters because a correct diagnosis followed by proper treatment can save many lives. Today this problem can be approached not only from the medical perspective but also with computing power and statistical theory. This article therefore gives an overview of statistical learning methods that can help predict various diseases from databases of patients' medical indicators or other meaningful digital information. Its main aim is to examine statistical learning techniques used in medicine in order to find models that are both well interpretable and effectively predictive, and that could increase the probability of correct and early detection of diseases. Implementing such models may, hopefully, lead to higher-quality patient care in any medical organization.
Key words: statistical learning, disease prediction, computer science, supervised learning, unsupervised learning, learning problem, logistic regression, decision tree, random forest, k-nearest neighbors, support vector machines
First, it is important to understand what statistical learning and machine learning are, and how they differ. To many people the two terms seem interchangeable, and even technical students often have only a vague understanding of the distinction. Both statistical learning and machine learning aim to extract useful information from datasets, build a model, and use it for predictions. Machine learning focuses on accurate predictions for unobserved data, while statistical learning places more weight on inference, that is, on understanding the relationship between the observations and the outcome. In many practical cases a clear inference matters less than accurate and precise predictions. For example, a company that wants to predict future prices of cars or houses is not much concerned with how a prediction was made; its main focus is prediction accuracy, so that it can profit from new data. In contrast, when predicting diseases in medicine it is critical to know how the predictions were made, how the features are associated with the outcome, and why some features are more significant than others. Many methods can be used in both statistical learning and machine learning, and all of them can be divided into two broad categories: supervised learning and unsupervised learning.
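To make the inference-versus-prediction distinction concrete, here is a minimal sketch that fits a plain logistic regression (one of the methods listed in the key words) by batch gradient descent in pure Python. The toy dataset, the feature names `biomarker` and `noise`, and the helper functions `fit_logistic` and `predict_proba` are hypothetical and chosen only for illustration; this is not the authors' method. The fitted coefficients answer the inferential question (which feature drives the outcome, and in which direction), while the probability computed for a new input answers the purely predictive one.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error term (p - y) for one observation.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability of the positive class for a single input."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: column 0 is an informative "biomarker",
# column 1 is perfectly balanced across classes and carries no signal.
X = [[-2.0, 1.0], [-2.0, -1.0], [-1.0, 1.0], [-1.0, -1.0],
     [1.0, 1.0], [1.0, -1.0], [2.0, 1.0], [2.0, -1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)

# Inference view: coefficients and odds ratios show which feature matters.
for name, wj in zip(["biomarker", "noise"], w):
    print(f"{name}: coef={wj:.3f}, odds ratio={math.exp(wj):.3f}")

# Prediction view: only the estimated probability for a new patient matters.
print(f"P(disease | biomarker=1.5, noise=0.3) = {predict_proba(w, b, [1.5, 0.3]):.3f}")
```

Because the second column is balanced across the two classes, its coefficient stays at zero while the first column's coefficient becomes clearly positive. On perfectly separable data like this the coefficient keeps growing with more epochs, so a real medical analysis would add regularization and report uncertainty for each coefficient.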
Supervised learning deals with making predictions when the outputs for our inputs are already known. The key idea is to build an efficient model that can make good predictions for future inputs. In unsupervised learning, by contrast, the inputs have no associated outputs; this is commonly the case for clustering and grouping problems, where we try to find patterns or relationships within the dataset [1]. Unsupervised learning is generally more challenging, because there is no response variable to guide the analysis or to check the results against.
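The contrast between the two settings can be sketched on the same toy data. A k-nearest neighbors classifier (supervised, one of the methods in the key words) uses the known labels, while k-means clustering (an unsupervised technique chosen here only as a common example, not one discussed in this section) sees the inputs alone and can only discover groups, not name them. The data, the labels `healthy`/`ill` and the function names `knn_predict` and `kmeans` are hypothetical, and initializing k-means from the first k points is a simplification rather than a recommended practice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Supervised: majority label among the k nearest labelled points."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def kmeans(points, k, iterations=10):
    """Unsupervised: Lloyd's algorithm; labels are never used."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: attach each point to its nearest centroid.
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
    assignments = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                   for p in points]
    return centroids, assignments


# Hypothetical toy data: two groups of patients in a 2-D feature space.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (2.0, 1.0), (9.0, 8.0)]
labels = ["healthy", "ill", "healthy", "ill", "healthy", "ill"]

# Supervised: outputs (labels) are known for the training inputs.
print(knn_predict(points, labels, (1.2, 1.5)))  # → healthy

# Unsupervised: the same inputs without labels; only groups are found.
centroids, assignments = kmeans(points, k=2)
print(assignments)  # → [0, 1, 0, 1, 0, 1]
```

k-means returns cluster indices 0 and 1 with no medical meaning attached; interpreting them as "healthy" or "ill" requires outside knowledge, which is exactly what the labels supply in the supervised case.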