k Nearest Neighbors classifier with sampling dependent decision rules

Abstract

The k Nearest Neighbors (kNN) method is a widely used technique for solving classification and regression problems in machine learning and data science. Compared to other methods such as Support Vector Machines or Neural Networks, kNN has an extremely low number of parameters, which reduces the chance of overfitting when the number of training vectors is relatively small. Consequently, kNN has many practical applications, especially in fields where the available training data is limited or data acquisition is expensive.

However, in many machine learning problems, various circumstances can make the original kNN less accurate. Such circumstances may arise from unbalanced class sizes, from differing densities of training vectors, or from the noisy entities present in most databases. In this study, we introduce novel, local decision rules that also take possible sampling issues into consideration. The proposed model uses only the k nearest neighbors already chosen for classification and executes an algorithm with O(k^2) time complexity, which can be considered efficient as long as k is relatively small. The model was evaluated on widely used benchmark classification databases, and based on the test results we can state that the proposed decision rule is able to increase classification accuracy in various problems.
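For orientation, the baseline that the abstract calls the original kNN can be sketched as below. This is a minimal illustration, not the decision rule proposed in the study, which the abstract does not specify. The function names (`knn_neighbors`, `majority_vote`, `neighbor_pairwise_distances`), the Euclidean metric, and the toy data are all assumptions made for the example. The pairwise-distance helper only shows the kind of k-by-k input that a local rule restricted to the chosen neighbors could work on, which is where an O(k^2) cost would come from.

```python
import numpy as np
from collections import Counter


def knn_neighbors(X_train, x, k):
    """Return indices of the k training vectors closest to x (Euclidean distance, assumed)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return np.argsort(dists)[:k]


def majority_vote(labels):
    """Original kNN decision rule: the most frequent label among the neighbors."""
    return Counter(labels).most_common(1)[0][0]


def neighbor_pairwise_distances(X_neighbors):
    """k-by-k distance matrix among the chosen neighbors only.

    Hypothetical helper: building it takes O(k^2) distance evaluations,
    independent of the size of the full training set.
    """
    diff = X_neighbors[:, None, :] - X_neighbors[None, :, :]
    return np.linalg.norm(diff, axis=2)


def knn_classify(X_train, y_train, x, k):
    """Baseline kNN classification of a single query vector x."""
    idx = knn_neighbors(X_train, x, k)
    return majority_vote(y_train[idx].tolist())


# Toy data (made up for illustration): two well-separated classes.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

query = np.array([0.2, 0.2])
neighbors = knn_neighbors(X_train, query, k=3)
print(knn_classify(X_train, y_train, query, k=3))
print(neighbor_pairwise_distances(X_train[neighbors]).shape)
```

Any sampling-aware rule in the spirit of the abstract would replace `majority_vote` while leaving the neighbor search unchanged.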

Keywords
kNN, k Nearest Neighbors, imbalanced sampling, classification