Implementation of Imbalanced Class Handling Techniques to Improve Naive Bayes Model Performance

  • Elvi Rahmi, Politeknik Negeri Bengkalis
  • Eva Yumami
  • Natasya Muliani

Abstract

Choosing a major is a crucial decision in a student's life, often shaped by interests, aptitude, and external input such as advice from parents or teachers. In Indonesia, high school students typically choose between the Science (IPA) and Social Sciences (IPS) tracks, a decision that significantly affects their later choice of university major. Previous studies have reported promising results in predicting students' major choices with various machine learning algorithms, some achieving 100% accuracy. However, such high accuracy should be interpreted cautiously because it may reflect overfitting. Moreover, these studies often overlook class imbalance, particularly between the IPA and IPS tracks. Such imbalance can produce biased models that favor the majority class.
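To make the majority-class bias concrete, the following minimal sketch shows how accuracy alone can look acceptable while the minority class is never detected. The class counts (90 IPA, 10 IPS) and the always-predict-IPA classifier are hypothetical illustrations and are not taken from the study's dataset.

```python
# Hypothetical counts, NOT the study's data: 90 IPA students and 10 IPS students.
y_true = ["IPA"] * 90 + ["IPS"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["IPA"] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """True positives divided by all actual members of `label`."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred, label):
    """True positives divided by all predictions of `label`."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1(y_true, y_pred, label):
    """Harmonic mean of precision and recall for `label`."""
    p = precision(y_true, y_pred, label)
    r = recall(y_true, y_pred, label)
    return 2 * p * r / (p + r) if (p + r) else 0.0


acc = accuracy(y_true, y_pred)
ips_recall = recall(y_true, y_pred, "IPS")
ips_f1 = f1(y_true, y_pred, "IPS")
print(f"accuracy={acc:.2f}, IPS recall={ips_recall:.2f}, IPS F1={ips_f1:.2f}")
```

In this illustration the overall accuracy is high even though no IPS student is identified, which is why per-class recall and F1-score are worth reporting alongside accuracy.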

This study evaluates the effectiveness of the Synthetic Minority Over-sampling Technique (SMOTE) in addressing class imbalance when predicting high school students' major choices. The results indicate that SMOTE did not improve the performance of a Naive Bayes classifier on our dataset: the models trained with and without SMOTE both achieved 100% accuracy, F1-score, and recall, suggesting that other factors may be more influential in this context.
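As a rough illustration of the idea behind SMOTE, the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified, standard-library-only approximation and not the implementation used in the study, which the abstract does not specify. The function name `smote_sketch`, its parameters, and the two-feature sample data are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Return n_synthetic points, each lying on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours. Simplified illustration; assumes len(minority) >= 2."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature scores, NOT the study's data.
majority = [(80.0 + i, 70.0 + i % 5) for i in range(12)]           # e.g. IPA
minority = [(60.0, 85.0), (62.0, 88.0), (58.0, 90.0), (61.0, 83.0)]  # e.g. IPS

# Oversample the minority class until both classes have the same size.
new_points = smote_sketch(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is normally applied to the training split only, so that the test set still reflects the original class distribution.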

Published
2024-12-24
Section
Articles