 
--Dr. V.S. Subrahmanian--
Digitization has indeed made the world more connected! However, this connectivity comes with the threat of public exposure and misuse of confidential and private information. India is also grappling with this threat and was one of the three nations in Asia worst affected by cyberattacks in 2021. The problem, however, is not restricted to India; it is a challenge for countries across the globe. It is therefore of utmost importance to develop tools and systems that prevent cyberattacks. Cybersecurity was accordingly chosen as the subject of the Seventh RBCDSAI LatentView Colloquium, which was held on 12th May from 4-5 PM. Prof. V.S. Subrahmanian, who is the Walter P. Murphy Professor of Computer Science and Buffett Faculty Fellow in the Buffett Institute for Global Affairs at Northwestern University, delivered a talk entitled “VEST: Vulnerability Exploitation Scoring & Timing” discussing his research work in the area of cybersecurity.
Prof. Subrahmanian commenced his talk by dividing vulnerabilities into two categories: those known only to cyber attackers, and those that people know about but have not yet protected against. He mentioned that his research focuses on the latter type. Elaborating on the goals of his research, he said that his team is interested in building a global early warning system for cyberattacks that takes known vulnerabilities, along with some other inputs, and predicts answers to three questions: Will a vulnerability be exploited in the future? When will it be exploited? How severe will it be? Such a warning system, he believes, would help agencies like MITRE and NIST prioritize vulnerabilities, help software and hardware vendors such as Microsoft prioritize the development of patches, and help all companies that use hardware and software decide what to patch and what not to patch.
Further elaborating on the problem, he cited the example of the WannaCry ransomware, which attacked computer systems running Microsoft’s Windows operating system. He explained that in this case attackers took advantage of the long gap between the time the vulnerability became publicly known and the time NIST provided its CVSS (Common Vulnerability Scoring System) score. Next, he gave some important statistics on vulnerabilities: 9.2% of vulnerabilities are actually exploited, 49.6% of vulnerabilities are exploited before they are officially published in the National Vulnerability Database (NVD), half of the vulnerabilities are exploited before the CVSS score is published, and the average time for exploitation to occur is around 24.05 days.
Talking about VEST (Vulnerability Exploitation Scoring & Timing), the early warning system developed in his lab, he said that its predicted exploitation time for the WannaCry cyberattack (117 days) was close to the actual time (114 days). Describing the VEST pipeline, he said that the VEST dataset comprised 26,093 CVEs from NVD and MITRE, about 630,000 (6.3 lakh) tweets containing the word CVE, and about 50,000 authors who used it in their tweets.
Next, he talked about the four features employed by VEST:
- Basic non-NLP features, which include basic Twitter features associated with a CVE such as the number of tweets, liked tweets, retweets, replies, hashtags, number of accounts in the conversation etc.
- A Hawkes process, which estimates the volume of retweets related to a CVE expected in the future based on current trends.
- CAT graphs (CVE Author Tweet graphs), a family of graphs comprising author graphs (the people who posted about a CVE), tweet graphs (the content) and CVE graphs. There are 6 types of unweighted CAT graphs and 18 types of weighted CAT graphs in all.
- A novel attention embedding for each CVE, obtained by training a GCN (graph convolutional network) with attention embedding.
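To make the Hawkes-process feature in the list above more concrete, here is a minimal sketch of a self-exciting retweet model. The talk does not specify the kernel or the parameters VEST uses, so the exponential kernel, the parameter values `mu`, `alpha` and `beta`, the function names and the sample retweet times are all assumptions made purely for illustration.

```python
import math

# Assumed parameters (not from the talk), all in units of "per day":
# mu = baseline retweet rate, alpha = jump added by each retweet,
# beta = how quickly that extra excitement decays.

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Retweet rate at time t given earlier retweet times (exponential kernel)."""
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )

def expected_retweets(t, horizon, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """First-order estimate of retweets in (t, t + horizon].

    Integrates the intensity contributed by the baseline and by retweets
    already observed; it ignores retweets triggered by future retweets,
    so it underestimates the full cascade.
    """
    baseline = mu * horizon
    triggered = sum(
        (alpha / beta)
        * (math.exp(-beta * (t - ti)) - math.exp(-beta * (t + horizon - ti)))
        for ti in event_times
        if ti <= t
    )
    return baseline + triggered

# Hypothetical retweet times, in days since the first tweet about a CVE.
retweet_days = [0.0, 0.2, 0.5, 1.1]
print(hawkes_intensity(1.5, retweet_days))
print(expected_retweets(1.5, 3.0, retweet_days))
```

The point of the sketch is only the self-exciting behaviour: a burst of recent retweets raises the expected volume in the near future, which is the kind of trend signal the Hawkes feature is described as capturing.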
He further mentioned that once a set of predictions is generated from the above features, late fusion is performed by feeding all the features obtained from the GCN to a fusion engine, which then generates the final prediction using either classification or regression. The final prediction, he said, includes the exploitation probability, exploitation timing, CVSS scores and CVSS attributes. He further pointed out that earlier studies made the mistake of predicting vulnerabilities based on CVSS scores or of using cross-validation, and therefore yielded poor results.
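The idea of late fusion can be illustrated with a small sketch in which each feature family first produces its own score and the scores are combined only at the end. The talk does not describe how the VEST fusion engine combines its inputs, so the weighted average below is a simple stand-in rather than the actual method, and the branch names, scores and weights are hypothetical.

```python
def late_fusion(scores, weights=None):
    """Combine per-branch scores into one score with a weighted average.

    scores  : dict mapping a branch name to that branch's score
    weights : optional dict with one non-negative weight per branch;
              equal weights are used when omitted
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total

# Hypothetical exploitation probabilities from three separate branches.
branch_scores = {"basic": 0.30, "hawkes": 0.55, "cat_gcn": 0.80}
print(late_fusion(branch_scores))
print(late_fusion(branch_scores, {"basic": 1.0, "hawkes": 1.0, "cat_gcn": 2.0}))
```

In a learned system the fixed weights would be replaced by a trained classifier (for exploitation probability) or regressor (for timing and scores), which is the distinction the talk draws between the two kinds of final prediction.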
Moving on, he said that the VEST team checked how early their approach (CAT+Hawkes+GCN) can predict the probability of exploitation by using data from the first three days, first seven days and first fourteen days after the first tweet on the subject. They found that the first three days’ tweets are enough to predict the probability of exploitation. For the second question, when exploitation will happen, they found that errors were lowest with the first fourteen days of data, so using more Twitter data for prediction reduces error regardless of the approach used. For estimating severity, however, they found that the first three days of data are enough. Talking about future plans for the research, he said that they aim to improve coverage by adding other data sources such as Reddit, the dark web, hacker forums etc.
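The three observation windows mentioned above (3, 7 and 14 days after the first tweet) amount to a simple filter on tweet timestamps. The sketch below shows one way to build such windows; the function name and the sample timestamps are made up for illustration and are not taken from the VEST dataset.

```python
from datetime import datetime, timedelta

def tweets_in_window(timestamps, days):
    """Return the timestamps that fall within `days` days of the first tweet."""
    if not timestamps:
        return []
    start = min(timestamps)
    cutoff = start + timedelta(days=days)
    return sorted(ts for ts in timestamps if ts < cutoff)

# Hypothetical tweet times for one CVE.
first = datetime(2021, 1, 1, 9, 0)
tweet_times = [first + timedelta(days=d) for d in (0, 2, 5, 10, 20)]

for window in (3, 7, 14):
    print(window, len(tweets_in_window(tweet_times, window)))
```

Features would then be computed separately on each window, which makes it possible to compare how much predictive error drops as more days of data are included.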
He next talked about other models developed by his team in the same space. While VEST helps answer what should be patched and when, the CAFE model predicts the expected spread of malware through the host population, i.e. how many machines in the network will potentially be affected. He also mentioned the Predictive Cyber Alert management system, which helps CISOs identify false alarms, determine how to allocate alerts to different types of analysts, and make the financial case for which security alerts deserve attention. Prof. Subrahmanian added that his lab is also working on ways to prolong the time a cyberattack takes through cyber deception, i.e. by generating fake documents to confuse hackers.
He concluded the talk by saying that we need a global early warning system for cyberattacks and that every country needs culturally sensitive models for acting in the presence of such early warnings.
The video is available on our YouTube channel: Link.
Keywords
Cyberattacks, Early Warning System, Ransomware