Despite having one of the lowest false positive rates for web attack detection on the market, Cloudbric remained determined to find a better way to improve its accuracy. Cloudbric’s research and development team ultimately decided to build one of the world’s first deep learning modules designed specifically for intelligent detection and differentiation of online traffic data.
Thus, VISION was born through a collaborative effort between Cloudbric's internal team of machine learning developers and PhD scholars from the Korea Advanced Institute of Science and Technology (KAIST), Korea University, and the Massachusetts Institute of Technology (MIT).
To properly train the newly developed deep learning module, the team first had to work out how to feed web traffic data to a deep learning machine. In the past, deep learning machines were successfully designed to accept data in pixels. This made feeding image data very convenient, since every image is already composed of thousands of pixels. However, all online requests, communications, and web addresses are expressed as letters and phrases. This left the R&D team with the tough task of presenting alphabetical characters to a deep learning machine in a form that would let it properly conceptualize the data and make sound judgments.
The Cloudbric team concluded that web traffic must be converted into a form of data that a deep learning machine can digest more easily. In this conversion, each letter and symbol of the various components that make up online traffic is first converted into an image. The deep learning machine is then programmed to discover specific patterns, regularities, or irregularities within these corresponding images. From there, the deep learning module is trained to differentiate between the patterns found in legitimate and malicious online traffic.
Cloudbric researchers were able to convert a specific pattern of online traffic known as “DBFC” into the image shown in Figure 1. The deep learning machine is then trained to relate this image to other images, that is, to other pieces of online traffic that serve as a benchmark. Based on its findings, the machine then decides whether “DBFC” is legitimate or malicious traffic.
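As a rough illustration of the character-to-image idea, the sketch below turns a short string into a grid of 0/1 pixels, with one column per character and one row per symbol in the alphabet. It is only a minimal sketch under stated assumptions: the alphabet, the `text_to_image` function, and the `width` parameter are hypothetical, and the output is not meant to reproduce Figure 1 or Cloudbric's actual encoding.

```python
import string

# Hypothetical alphabet, chosen for illustration only; the article does not
# list the characters the real module recognizes.
ALPHABET = string.ascii_lowercase + string.digits + "/?&=.:-_%"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def text_to_image(text, width=16):
    """Return a len(ALPHABET) x width grid of 0/1 pixels.

    Column j holds a single 1 in the row of the j-th character. Characters
    outside the alphabet, and unused padding columns, stay all zero.
    """
    image = [[0] * width for _ in ALPHABET]
    for col, ch in enumerate(text.lower()[:width]):
        row = CHAR_INDEX.get(ch)
        if row is not None:
            image[row][col] = 1
    return image


if __name__ == "__main__":
    image = text_to_image("DBFC", width=4)
    # Print only the rows for a..f to keep the output short.
    for ch in "abcdef":
        print(ch, image[CHAR_INDEX[ch]])
```

Each request becomes a fixed-size grid of this kind, which is the sort of input a convolutional network is designed to scan for local patterns.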
At the outset of Cloudbric’s development, there was a lack of deep learning modules on the market that could properly convert characters into sets of distinguishable images. Cloudbric accomplished this by testing two open source machines based on Convolutional Neural Network (CNN) structures that were suited to the task.
Cloudbric tested both machines and found that each had its advantages and disadvantages. For instance, one machine was easier to train but less accurate than the other. Cloudbric ultimately chose the faster module for web traffic detection because the more accurate machine was unable to process the extraordinary number of web traffic logs being updated on a continuous basis.
After selecting the machine, the research and development team ran into another problem. When they tried to apply the machine specifically to web attacks, the team quickly realized that certain characters in attack URLs fell outside the conventional set of 68 characters recognized by the machine. Because websites can be built in a variety of languages, the Cloudbric team had to find a way for the machine to accept any character without restriction.
This led the team to implement a patented technology that reads cyber attacks in UTF-8 hexadecimal format and then feeds them into the deep learning machine. This allowed the deep learning machine to accept any UTF-8 character, which eventually enabled the team to train the module specifically for web traffic recognition.
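To show why a hexadecimal view of UTF-8 removes the character-set restriction, the sketch below encodes arbitrary text as UTF-8 bytes, writes those bytes as hex digits, and maps the digits onto a 16-row grid. Whatever the language of the input, only the 16 symbols `0` to `f` ever reach the model. This is an illustrative sketch, not the patented method: the function names `to_utf8_hex` and `hex_to_image` and the `width` parameter are assumptions.

```python
HEX_DIGITS = "0123456789abcdef"


def to_utf8_hex(text):
    """Encode text as UTF-8 and return the bytes as a lowercase hex string."""
    return text.encode("utf-8").hex()


def hex_to_image(hex_text, width=32):
    """Return a 16 x width grid of 0/1 pixels, one column per hex digit.

    Input longer than `width` digits is truncated; shorter input leaves the
    remaining columns all zero.
    """
    image = [[0] * width for _ in HEX_DIGITS]
    for col, digit in enumerate(hex_text[:width]):
        image[int(digit, 16)][col] = 1
    return image


if __name__ == "__main__":
    print(to_utf8_hex("id=1"))  # 69643d31
    print(to_utf8_hex("한"))     # ed959c
    image = hex_to_image(to_utf8_hex("한"), width=6)
    print(len(image), "rows x", len(image[0]), "columns")
```

The trade-off in this sketch is length: every byte becomes two hex digits, so the same request needs a wider grid than a one-column-per-character encoding would.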
With regard to machine accuracy, the Cloudbric team came up with a primitive yet effective solution known as incremental learning. Instead of having a single machine do all the work, the Cloudbric team initially establishes four deep learning machines, each trained with four weeks' worth of cyber attack data. The team then assigns a weight to each of these machines depending on its error rate.
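The weighting step can be sketched as a simple weighted vote. The article does not state how the weights are derived from the error rates, so the formula below (weight proportional to one minus the error rate, normalized to sum to 1) is an assumption, as are the function names, the 0.5 threshold, and the sample numbers.

```python
def weights_from_error_rates(error_rates):
    """Assumed scheme: weight each machine by (1 - error rate), normalized."""
    accuracies = [1.0 - e for e in error_rates]
    total = sum(accuracies)
    return [a / total for a in accuracies]


def ensemble_score(scores, weights):
    """Weighted average of per-machine scores (each score in [0, 1])."""
    return sum(s * w for s, w in zip(scores, weights))


def is_malicious(scores, weights, threshold=0.5):
    """Flag a request when the weighted score reaches the threshold."""
    return ensemble_score(scores, weights) >= threshold


if __name__ == "__main__":
    # Made-up error rates for four machines, for illustration only.
    error_rates = [0.10, 0.20, 0.30, 0.40]
    weights = weights_from_error_rates(error_rates)
    # Made-up per-machine "malicious" scores for a single request.
    scores = [0.9, 0.8, 0.3, 0.2]
    print([round(w, 3) for w in weights])
    print(is_malicious(scores, weights))
```

Under this scheme, a machine with a lower error rate has more influence on the final decision, and a new machine can be added or an old one retired by recomputing the weights.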
The period and the final result below are subject to change depending on the accuracy of Cloudbric's testing results. However, current internal deep learning tests have shown a stunning 85% increase in accuracy compared to the standard logic-based Cloudbric Web Application Firewall (WAF) engine. For the sake of comparison, industry analysts regard Cloudbric’s WAF engine as having one of the lowest false positive rates of any WAF on the market.