Feature Selection Techniques for Enhancing Sybil Accounts Detection in Online Social Networks

Authors

  • Dheeraj Sonkhla, Assistant Professor, Himachal Pradesh University Regional Centre, Mohli, Khaniara, Dharamshala - 176 218, Himachal Pradesh

DOI:

https://doi.org/10.17010/ijcs/2025/v10/i5/175883

Keywords:

Correlation matrix with heatmap, Feature selection method, Genetic Algorithm (GA), k-Nearest Neighbor (kNN) classifier, Machine Learning, Random Forest (RF) classifier, Recursive Feature Elimination (RFE) method, Univariate method.

Publication Chronology: Paper Submission Date: August 6, 2025; Paper Sent Back for Revision: August 12, 2025; Paper Acceptance Date: August 18, 2025; Paper Published Online: October 5, 2025.

Abstract

The round-the-clock, affordable availability of Online Social Networks (OSNs) is attracting more and more users to these platforms. Fake users and spambots are also vying for an illegitimate share of these OSNs. In the process, OSNs generate large datasets describing all these types of users, and these datasets normally contain large sets of features, both relevant and irrelevant. The datasets can be used to initiate preventive measures against attacks such as the Sybil attack. Feature selection, applied appropriately to a dataset, helps improve the efficiency and overall performance of predictive models. Based on the selection criterion, the three main categories of feature selection techniques developed over time are filter, wrapper, and embedded methods. There is a need to study and compare the performance of these techniques on the same datasets under various prediction models. In this paper, I therefore conducted such experiments on Twitter datasets containing accounts of real users, fake followers, and spambots. The filter methods used in the experiments are the univariate method and the correlation matrix with heatmap. Two wrapper methods, the Genetic Algorithm (GA) and Recursive Feature Elimination (RFE), were applied separately. For the embedded method, Lasso regression was implemented. The results show that the filter methods give the best results for datasets involving fake followers, while the wrapper methods perform better on spambot datasets.
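
As a rough illustration of the filter category described above, the following minimal sketch ranks features by their individual association with the class label and keeps the top k. It is not the paper's implementation: the abstract does not state which univariate scoring function was used, so absolute Pearson correlation is assumed here, and the feature names (`followers_count`, `friends_count`, `random_flag`), the toy values, and the helper functions are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant column carries no information
    return cov / (dev_x * dev_y)


def rank_features(rows, labels, names):
    """Score each feature on its own (univariate) and sort by score."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, abs(pearson(column, labels))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def select_top_k(rows, labels, names, k):
    """Filter step: keep the k highest-scoring feature names."""
    return [name for name, _ in rank_features(rows, labels, names)[:k]]


# Hypothetical toy data: label 0 = real account, 1 = fake follower.
names = ["followers_count", "friends_count", "random_flag"]
rows = [
    (900, 300, 1), (1200, 400, 0), (800, 350, 0), (1500, 500, 1),
    (10, 2000, 1), (5, 1800, 0), (20, 2500, 1), (8, 2200, 0),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

ranking = rank_features(rows, labels, names)
selected = select_top_k(rows, labels, names, k=2)
print(ranking)
print(selected)
```

Because a filter scores each feature without training a classifier, it is cheap to run. Wrapper methods such as RFE or a GA instead evaluate candidate feature subsets by repeatedly fitting a model, which costs more but can capture interactions between features.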

Published

2025-12-26

How to Cite

Sonkhla, D. (2025). Feature Selection Techniques for Enhancing Sybil Accounts Detection in Online Social Networks. Indian Journal of Computer Science, 10(5), 8–17. https://doi.org/10.17010/ijcs/2025/v10/i5/175883
