Ransika Silva

Cybersecurity Applications of Image Classification

Machine learning has become increasingly valuable in the fight against ever-changing cyberattacks. Previously, we discussed how machine learning can support cybersecurity at large. Here, let us look at one of the things machine learning does best, classifying images, and examine its areas of use and the challenges that are unique to cybersecurity.

Detecting Phishing Websites

Phishing sites are replicas of legitimate websites built to mislead users into providing sensitive information. Although ML models check URL and HTML attributes for signs of phishing[1], visual signals are highly relevant too, because attackers duplicate trusted websites with only minute modifications.

Image classifiers can be trained to recognize visual abnormalities such as misplaced logos, outdated visual designs, or an unusually positioned login form. Models such as EvilNet[2] are reported to be over 95% accurate at classifying phishing websites from screenshots.

However, phishers keep adapting, so image classifiers need regular retraining on the latest threats. Highly targeted spear-phishing attacks are likely to be more evasive as well[3]. Screenshots are therefore increasingly valuable to inspect alongside the traditional signals of phishing detection.
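One way to picture the visual side of this is a technique much simpler than a trained classifier: comparing a downscaled screenshot against a reference thumbnail of a trusted brand using an average hash. The sketch below is only an illustration under stated assumptions, not how EvilNet or any cited system works; the function names, the toy 4x4 grids, and the `max_distance` threshold are all hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel rows, values 0-255


def average_hash(pixels: Grid) -> List[int]:
    """Return one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Count the positions where two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def looks_like_brand(candidate: Grid, reference: Grid, max_distance: int = 2) -> bool:
    """True if the candidate thumbnail is visually close to the reference.

    max_distance is a hypothetical cutoff chosen for this toy example.
    """
    distance = hamming_distance(average_hash(candidate), average_hash(reference))
    return distance <= max_distance


# Toy 4x4 "thumbnails"; real screenshots would be downscaled to a grid first.
REFERENCE = [
    [200, 200, 200, 200],
    [200, 30, 30, 200],
    [200, 30, 30, 200],
    [200, 200, 200, 200],
]
# Same layout with one altered pixel, as a lookalike page might have.
LOOKALIKE = [
    [200, 200, 200, 200],
    [200, 30, 30, 200],
    [200, 30, 200, 200],
    [200, 200, 200, 200],
]
# An unrelated layout.
UNRELATED = [
    [30, 30, 200, 200],
    [30, 30, 200, 200],
    [200, 200, 30, 30],
    [200, 200, 30, 30],
]

print(looks_like_brand(LOOKALIKE, REFERENCE))
print(looks_like_brand(UNRELATED, REFERENCE))
```

A page that matches a trusted brand's look while being served from an unrelated domain would be the suspicious case. A real pipeline would pair a visual check like this with the URL and HTML signals mentioned above.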

Analyzing Malware Imagery

Malware often includes graphical resources and icons to appear legitimate and deceive the user into running malicious code. Image classification models can learn these visual patterns.

For instance, malware distributed by e-mail is likely to use plain or standard icons so that it looks plausible, and is more likely to be executed, under different operating system themes. This is a pattern that image classifiers are likely to learn[4]. Screenshots of ransomware payment websites or of malicious app installation pages show similarly learnable patterns[5].

Challenges arise with more complex malware that hides itself using techniques such as binary steganography or obfuscation[6]. Multi-modal analysis that combines binary artifacts and behavioural details with visual content is often needed.
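A minimal sketch of what multi-modal analysis could look like is late fusion: each modality produces its own score and the scores are combined afterwards. This is an assumed design for illustration, not a method taken from the cited work; the modality names, the weights, and the `fuse_scores` helper are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical per-modality weights; in practice they would be tuned on labelled data.
DEFAULT_WEIGHTS: Dict[str, float] = {"visual": 0.3, "binary": 0.4, "behaviour": 0.3}


def fuse_scores(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of per-modality malware scores, each assumed to be in [0, 1].

    Modalities without a weight are ignored, and the weights of the
    modalities that are present are renormalised to sum to 1.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    used = {name: weights[name] for name in scores if name in weights}
    if not used:
        raise ValueError("no known modality in scores")
    total_weight = sum(used.values())
    return sum(scores[name] * weight for name, weight in used.items()) / total_weight


# An innocuous-looking icon, but suspicious binary and runtime behaviour.
sample = {"visual": 0.1, "binary": 0.9, "behaviour": 0.8}
print(round(fuse_scores(sample), 2))
```

The point of the design is that a sample with a harmless-looking icon can still be flagged when the binary and behavioural scores are high, which is the situation steganography and obfuscation create for a purely visual model.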

Detecting Deepfakes

Deepfake videos that swap faces, edit audio tracks, and doctor footage are the new frontier of disinformation and social engineering, and they ought to trouble us. Even if not every deepfake is a cybersecurity threat by nature, those used for impersonation, forgery, extortion, and other malicious purposes most certainly are.

CNNs and other image classification models can effectively identify the artifacts and anomalies present in synthetic videos, such as flicker, warping, and asymmetric blinking[7][8].
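To make one of these artifacts concrete, flicker can be approximated by how much consecutive frames differ. The sketch below is a hand-written heuristic, not a CNN and not the approach of the cited detectors; the function names, the tiny 2x2 frames, and the `threshold` value are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale pixel rows


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def flicker_scores(frames: Sequence[Frame]) -> List[float]:
    """Frame-to-frame change for each consecutive pair of frames."""
    return [mean_abs_diff(prev, cur) for prev, cur in zip(frames, frames[1:])]


def has_flicker(frames: Sequence[Frame], threshold: float) -> bool:
    """True if any consecutive pair changes by more than threshold (a hypothetical cutoff)."""
    return any(score > threshold for score in flicker_scores(frames))


# Toy 2x2 clips: one stable, one that jumps in brightness for a single frame.
STEADY = [[[10, 10], [10, 10]]] * 3
FLICKERING = [
    [[10, 10], [10, 10]],
    [[90, 90], [90, 90]],
    [[10, 10], [10, 10]],
]

print(flicker_scores(FLICKERING))
print(has_flicker(STEADY, threshold=50.0))
```

A learned model would pick up such cues from data rather than from a fixed threshold, and a scene cut in genuine footage would trip this heuristic just as easily, so it serves only as a weak baseline.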

However, deepfake tools are constantly being refined, and their output is increasingly difficult to catch. The most critical challenges are robustness and generalization: detectors trained on particular manipulation techniques sometimes fail to identify new variants[9]. Keeping up with the evolution of synthetic media remains a cat-and-mouse game.
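The generalization problem can be measured by holding out one manipulation technique at a time: train on the others, test on the unseen one. The sketch below only builds those splits, under the assumption that each fake clip is labelled with the method that produced it; the clip identifiers and method names are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (clip identifier, manipulation method)


def leave_one_method_out(samples: List[Sample]) -> Dict[str, Tuple[List[str], List[str]]]:
    """For each method, return (train_ids, test_ids) with that method held out."""
    methods = sorted({method for _, method in samples})
    splits: Dict[str, Tuple[List[str], List[str]]] = {}
    for held_out in methods:
        train = [clip for clip, method in samples if method != held_out]
        test = [clip for clip, method in samples if method == held_out]
        splits[held_out] = (train, test)
    return splits


# Hypothetical fake clips labelled with the technique that generated them.
SAMPLES = [
    ("clip1", "method_a"),
    ("clip2", "method_b"),
    ("clip3", "method_a"),
    ("clip4", "method_c"),
]

for method, (train_ids, test_ids) in leave_one_method_out(SAMPLES).items():
    print(method, train_ids, test_ids)
```

A detector that scores well on random splits but poorly on these held-out splits is likely keying on technique-specific artifacts rather than on properties shared by synthetic media in general. Genuine clips would need to be added to both sides of each split.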

The Road Ahead

We have seen encouraging examples of image classification applied to cybersecurity, but there is much more work to be done. Beyond the cat-and-mouse dynamic with attackers, the challenges include data sparsity[10], adversarial robustness[11], scalability[12], and explainability[13].

As cyberthreats grow in sophistication, visual insight will become increasingly crucial, supplemented by other machine learning and conventional security techniques. Further study of data-efficient learning, generalization, and interpretability will be necessary to make full use of image classification for cybersecurity.


References

  1. Sahingoz, O.K., et al. (2019). Machine learning based phishing detection from URLs. Expert Systems with Applications.
  2. Rao, R.S., & Ali, S.T. (2017). EvilNet: Generating Adversarial Examples to Fool Phishing Detection Models. ArXiv.
  3. Das, A., et al. (2021). Phishpedia: A Web-based Encyclopedia of Phishing Attacks. USENIX Security Symposium.
  4. Kancherla, K., & Mukkamala, S. (2018). Image visualization based malware detection. IEEE Symposium on Computational Intelligence in Cyber Security.
  5. Mercaldo, F., et al. (2019). Ransomware Analysis with AI: The Visualization Approach. Intelligent Systems Reference Library.
  6. Xue, M., et al. (2019). Adaptive Android Malware Detection with Dynamic Analysis. ACM Turing Celebration Conference.
  7. Tolosana, R., et al. (2020). Deepfakes and beyond: A survey of face manipulation and fake detection. Information Fusion.
  8. Mittal, S., et al. (2020). Detecting Deepfakes and Adversarial Attacks using Image Classification Models. IEEE International Conference on Informatics, IoT, and Enabling Technologies.
  9. Vashisht, P., et al. (2021). Generalization in Deepfake Detection: an Empirical Analysis. ArXiv.
  10. Mahdavifar, S., & Ghorbani, A.A. (2020). A Survey of Cybersecurity Datasets for Machine Learning. ArXiv.
  11. Alawad, G., et al. (2022). Adversarial Image Classification by Hybrid Attack. International Conference on Future Communication Technologies and Applications.
  12. Li, C., et al. (2020). Analyzing the Training Costs of Large Scale Image Classification. NeurIPS.
  13. Kuppa, A., & Nadeem, T. (2021). xCyberSec: What, Where, Why and How is Explainability in AI for Cybersecurity. Computers & Security.
