A Comparative Study of Classification Algorithms for Identifying Fake Websites

Sunaryono Sunaryono

STMIK Widya Utama

Abstract

Counterfeit websites, or phishing websites, are a form of cybercrime whose prevalence continues to grow, including in Indonesia, as of 2015. This study uses the phishing websites dataset from the UCI Repository, comprising 2,546 records with 30 variables that are used to determine whether a website is a phishing website or not. With the data in hand, the authors set out to determine the most suitable algorithm by comparing classification algorithms. Based on related research and the strengths of each algorithm, five algorithms were selected for testing: Decision Tree (C4.5), Naive Bayes, k-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Neural Network. The evaluation uses accuracy and AUC, together with a parametric difference test (t-test). For each model, the authors divided the main data into five parts, and each training set was validated using k-fold cross-validation. The results show that Neural Network and SVM are the most suitable algorithms, with an average accuracy of 94% and an AUC of 0.9.
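The evaluation steps named in the abstract (k-fold splitting, accuracy, AUC, and a paired t-test on per-fold scores) can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the folds are contiguous and unshuffled, and AUC is computed with the rank-comparison definition. The abstract does not state which tool or settings were actually used.

```python
import math
import statistics


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test on two equal-length lists of fold scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (std_diff / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Five folds over ten hypothetical samples: each test fold holds two indices.
    for train_idx, test_idx in k_fold_indices(10, 5):
        print(len(train_idx), test_idx)
```

In use, each classifier would be trained on the `train` indices of every fold and scored on the `test` indices; the resulting per-fold accuracy lists for two classifiers are then passed to `paired_t_statistic`, and the statistic is compared against a t distribution with k - 1 degrees of freedom.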
