DOI: http://dx.doi.org/10.22266/ijies2017.0430.17

Improved Fuzzy-Optimally Weighted Nearest Neighbor Strategy to Classify Imbalanced Data

Author(s):

Harshita Patel¹*, Ghanshyam Singh Thakur¹

Affiliations:

¹Department of Mathematics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462003, India

Abstract:

Learning from imbalanced data is one of the pressing issues of the present era. Traditional classification methods show degraded performance on imbalanced data sets because of the skewed distribution of data across classes. Among the various solutions suggested, instance-based weighted approaches have established their place in such cases. In this paper, we propose a new fuzzy weighted nearest neighbor method that optimally handles the imbalance issue of data. The use of optimal weights improves the performance of the fuzzy nearest neighbor algorithm for the default balanced distribution of data; to classify imbalanced data, it is merged with the concept of adaptive K, which applies a large K (number of nearest neighbors) to the large class and a small K to the small class. We deploy this combination to classify imbalanced data with better accuracy across different evaluation measures. Experimental results affirm that our proposed method performs better than traditional fuzzy nearest neighbor classification on these types of data sets.
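
The abstract combines fuzzy nearest neighbor memberships with a class-dependent K. The Python sketch that follows illustrates that combination under explicit assumptions: it uses the inverse-distance membership weights of the standard fuzzy k-NN (Keller et al., 1985) rather than the optimal weights proposed in the paper, which are not reproduced here; the proportional rule in `class_specific_k` for choosing K per class is hypothetical; and both function names are illustrative, not taken from the paper.

```python
import numpy as np


def class_specific_k(y, k_max):
    """Hypothetical adaptive-K rule: scale K linearly with class size.

    The largest class gets k_max neighbours; smaller classes get
    proportionally fewer, but never fewer than 1.
    """
    classes, counts = np.unique(np.asarray(y), return_counts=True)
    ks = np.maximum(1, np.round(k_max * counts / counts.max()).astype(int))
    return {c: int(k) for c, k in zip(classes.tolist(), ks)}


def fuzzy_adaptive_knn_predict(X_train, y_train, X_test, k_max=7, m=2.0, eps=1e-12):
    """Fuzzy k-NN with a class-dependent number of neighbours (sketch).

    Assumes crisp training labels and fuzzifier m > 1.
    Returns (predicted labels, list of per-class membership dicts).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_test = np.atleast_2d(np.asarray(X_test, dtype=float))
    k_of = class_specific_k(y_train, k_max)

    preds, memberships = [], []
    for x in X_test:
        # Euclidean distance from x to every training instance, nearest first
        d = np.linalg.norm(X_train - x, axis=1)
        order = np.argsort(d, kind="stable")
        # Fuzzy k-NN weight 1 / d^(2/(m-1)); eps guards against d == 0
        w = 1.0 / (d[order] ** (2.0 / (m - 1.0)) + eps)
        labels = y_train[order]

        u = {}
        for c, k in k_of.items():
            # Each class looks only at its own K nearest neighbours
            k = min(k, len(order))
            wk, lk = w[:k], labels[:k]
            # Crisp training memberships: only neighbours of class c contribute
            u[c] = float(wk[lk == c].sum() / wk.sum())
        memberships.append(u)
        preds.append(max(u, key=u.get))
    return np.array(preds), memberships
```

With `k_max=6` on a one-dimensional set of 10 majority points (class 0) and 2 minority points (class 1), the hypothetical rule assigns K = 6 to the majority class and K = 1 to the minority class. Because each class is scored over its own K, the memberships need not sum to 1 across classes; the sketch simply predicts the class with the largest membership.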


Keywords:

Fuzzy classification, Imbalanced datasets, K-nearest neighbors, Optimal solution.



Copyright © 2008 The Intelligent Networks and Systems Society