
Classification of Imbalanced Data Using a Modified Fuzzy-Neighbor Weighted Approach


Harshita Patel1*, Ghanshyam Singh Thakur1


1 Department of Mathematics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462003, India


Classification of imbalanced datasets is one of the most widely explored challenges of the decade. Imbalance arises in many real-world datasets when data are unevenly distributed across classes: one class has many instances while the others have only a few. As a result, traditional classifiers are biased towards the majority class and tend to neglect the classes with less data. Many solutions, both crisp and fuzzy, have been proposed to deal with this issue. This paper proposes a new hybrid fuzzy weighted nearest neighbor approach that aims at better overall classification performance for both the minority and majority classes of imbalanced data. The benefit of the neighbor-weighted K nearest neighbor approach, namely assigning large weights to small classes and small weights to large classes, is merged with fuzzy logic. Fuzzy classification helps classify objects more adequately because it determines the degree to which an object belongs to a class. Experimental results show improved classification of imbalanced data across different imbalance ratios in comparison with other methods.


Imbalanced data, K nearest neighbor, Fuzzy K nearest neighbor, Classification.
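
The idea summarized in the abstract, class weights that favor small classes combined with fuzzy memberships, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given on this page. It assumes the class-weight formula of the neighbor-weighted KNN of Tan (2005) and the inverse-distance membership of the fuzzy KNN of Keller et al. (1985), with crisp training labels. The function names and the parameters `p`, `m`, and `k` are hypothetical choices made for illustration.

```python
import numpy as np

def class_weights(y, p=2.0):
    """Assumed Tan-style weights: 1 / (n_c / n_min)^(1/p); small classes get larger weights."""
    classes, counts = np.unique(y, return_counts=True)
    weights = 1.0 / (counts / counts.min()) ** (1.0 / p)
    return classes, weights

def fuzzy_weighted_knn(X_train, y_train, x, k=3, m=2.0, p=2.0):
    """Return (predicted label, membership per class) for a single query point x."""
    classes, w = class_weights(y_train, p)
    dist = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dist)[:k]  # indices of the k nearest training points
    # Keller-style inverse-distance influence; the floor avoids division by zero.
    inv = 1.0 / np.maximum(dist[nn], 1e-12) ** (2.0 / (m - 1.0))
    # Crisp training labels: each neighbor contributes only to its own class.
    votes = np.array([inv[y_train[nn] == c].sum() for c in classes])
    scores = w * votes  # scale each class's vote by its class weight
    memberships = scores / scores.sum()  # normalize so memberships sum to 1
    label = classes[np.argmax(memberships)]
    return label, dict(zip(classes.tolist(), memberships.tolist()))

# Toy data (made up for illustration): six majority points, two minority points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
              [3, 3], [3.2, 3.1]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
label, memberships = fuzzy_weighted_knn(X, y, np.array([2.9, 3.0]), k=3)
print(label, memberships)
```

Returning the full membership dictionary, rather than only the label, reflects the point made in the abstract that a fuzzy classifier reports how strongly an object belongs to each class.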



Copyright@2008 The Intelligent Networks and Systems Society