How NLP Can Help Us Understand Web Attackers

Itsik Mantin, Ori Or-Meir at Global AppSec Tel Aviv 2019

Word2Vec is a popular Natural Language Processing technique that other domains have adopted (as Something2Vec) to embed domain objects in Euclidean spaces for similarity and distance calculation, clustering, visualization and more.
We will present our research on applying Word2Vec to Web Application Security. Treating malicious web requests as words and their sequences as sentences, we applied a variant of Word2Vec to embed web attack vectors in a Euclidean space and to analyze their contextual relations. This embedding allows identification of attack vectors that tend to occur together, whether they belong to the same attack category, like different SQL Injection attempts, or to “adjacent” attack types, like File Upload and Backdoor Communication.
We will discuss practical applications of this research, such as modeling web scanning tools and popular attack flows, assessing the accuracy and effectiveness of security rules, isolating attacks that belong to the same campaign, and distinguishing targeted attacks from web scans.
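
The "requests as words, sequences as sentences" idea can be illustrated with a minimal sketch. This is not the Word2Vec variant used in the research: it swaps in raw co-occurrence counts and cosine similarity as a simple stand-in for a learned embedding, which rests on the same assumption that attack vectors seen in similar contexts are related. The session data, the attack labels (such as `sqli_union` and `backdoor_comm`) and the function names are all hypothetical, invented for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical data: each "sentence" is the ordered list of attack-vector
# labels observed in one client session. Labels are invented for illustration.
sessions = [
    ["sqli_union", "sqli_boolean", "sqli_time"],
    ["sqli_boolean", "sqli_union", "sqli_time"],
    ["file_upload", "backdoor_comm"],
    ["file_upload", "backdoor_comm", "file_upload"],
    ["sqli_union", "sqli_time", "sqli_boolean"],
]


def cooccurrence_vectors(sentences, window=2):
    """For each token, count the tokens seen within `window` positions of it.

    Stand-in for a learned embedding: each token is represented by a sparse
    vector of context counts rather than by trained Word2Vec weights.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, token in enumerate(sentence):
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[token][sentence[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(sessions)

# Two SQL Injection variants share context; SQLi and File Upload do not.
same_category = cosine(vectors["sqli_union"], vectors["sqli_boolean"])
cross_category = cosine(vectors["sqli_union"], vectors["file_upload"])

print(f"sqli_union vs sqli_boolean: {same_category:.2f}")
print(f"sqli_union vs file_upload:  {cross_category:.2f}")
```

In this toy data the two SQL Injection labels score higher than the unrelated pair because they share context tokens. A trained embedding would play the same role on real traffic, with dense vectors instead of raw counts.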

Itsik Mantin
Lead Scientist, Imperva
In the last 20 years I have researched and innovated in various cyber-security domains, including web application security, advanced persistent threats, DRM systems, automotive systems and more.

Ori Or-Meir
Data Scientist, Imperva
From an early age I liked solving puzzles of all kinds. Finding solutions to problems is my passion, and finding patterns in ‘randomness’ is my hobby.