

Endogenous intelligence: a necessary condition for advanced threat detection

Posted by chiappelli at 2020-02-27

Threat Intelligence is an effective means to detect network threats.

For widespread commodity malware and attacks, detection mainly relies on threat intelligence generated from Internet traffic and sample data.

For advanced threats that are highly targeted, detection depends heavily on the target organization's own local data and analytical capability.

Therefore, achieving "endogenous intelligence" on the basis of internal information and business systems, and thereby building endogenous security capabilities, becomes a necessary condition for achieving and improving advanced threat detection.

The effectiveness of threat intelligence has been highly recognized

At the beginning of 2019, according to the Threat Intelligence evolution: 2019 Application Research Report released by SANS, a US security research organization, 81% of respondents believed that threat intelligence improves their organization's security detection and response capability. This is a significant increase from 60% in the previous year and indicates that the effectiveness of threat intelligence is widely recognized.

The SANS survey also reports how threat intelligence production is currently distributed:

According to the survey, pure consumers of threat intelligence are the majority, at roughly 50-60% of respondents; organizations that both produce and consume intelligence account for about 40%, which is higher than expected.

In China, consumers probably predominate. This is not to say that the security teams of domestic organizations cannot derive threat intelligence from their own data and analysis, but most of that intelligence has not been systematically managed and shared.

The two most popular types of Threat Intelligence: IOC and TTP

We believe that threat intelligence in the narrow sense has a clear hierarchy in terms of how difficult it is to acquire and how stable it is. On that basis, Qi Anxin proposed a pyramid model of threat intelligence whose layers include file samples, host features, event features, and organization and personnel information.

What types of intelligence are most popular?

Not surprisingly, IOCs are the most popular. IOCs correspond to the bottom two layers of the threat intelligence pyramid. The reason they rank first is simple: the "C" in IOC stands for "compromise". A match means that the relevant assets are already under an attacker's control, which signals a real and urgent danger and demands top-priority handling. In the incident response guide published by Kaspersky, IOCs are the core trigger for security response, not merely one trigger among several.
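As a minimal sketch of why an IOC hit is so directly actionable, the following matches host observations against an indicator set. The indicator values, the `Observation` type and the `match_iocs` helper are all hypothetical and chosen only for illustration; a real deployment would load indicators from an intelligence feed.

```python
from dataclasses import dataclass

# Hypothetical indicators; real ones would come from a threat intelligence feed.
IOC_DOMAINS = {"update.bad-example.test"}
IOC_FILE_HASHES = {"0" * 64}  # placeholder SHA-256 value, not a real sample


@dataclass(frozen=True)
class Observation:
    host: str
    kind: str   # "domain" or "sha256"
    value: str


def match_iocs(observations):
    """Return the observations whose value appears in an IOC set."""
    hits = []
    for obs in observations:
        value = obs.value.lower()  # indicators are compared case-insensitively
        if obs.kind == "domain" and value in IOC_DOMAINS:
            hits.append(obs)
        elif obs.kind == "sha256" and value in IOC_FILE_HASHES:
            hits.append(obs)
    return hits


observations = [
    Observation("ws-017", "domain", "Update.Bad-Example.test"),
    Observation("ws-017", "domain", "intranet.example.test"),
    Observation("srv-02", "sha256", "0" * 64),
]
# Every host with at least one hit is treated as compromised.
compromised_hosts = sorted({hit.host for hit in match_iocs(observations)})
print(compromised_hosts)
```

A hit needs no further interpretation before response starts, which is the property that makes IOCs the first type of intelligence most teams consume.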

In second place is TTP intelligence about the adversary. This is a much more stable type of intelligence than IOCs. As the saying goes, "if you do not understand the attack, how can you know how to defend?": understanding the opponent's tactics is the basis for guiding security protection.

The trend suggests that the popularity of TTP intelligence will continue to rise. In the future, at the advanced level of active defense, TTPs are likely to become more important than IOCs. The reason is that users of threat intelligence will increasingly move their protective measures earlier: learning which new vulnerabilities have emerged and whether the affected equipment has been patched in time, so that the threat is eliminated before an attack causes actual damage; and taking the attacker's perspective under the guidance of TTPs in order to hunt for threats proactively.

The currently very popular ATT&CK framework is essentially a knowledge base of attacker TTPs. The NSA's NTCTF framework is similar:

It should be noted that each stage of ATT&CK contains different techniques, some of which are used more frequently than others. Taking the payload delivery channel of APT attacks as an example, by analyzing a large number of APT campaigns we obtained the following statistics on common attack methods:

To keep their attacks targeted, most APT campaigns use spear-phishing emails, followed by watering-hole attacks. For candidate targets collected through a watering hole, the attacker's server side generally identifies and filters them further according to the collected machine information, and the attacker delivers the follow-up payload only to targets of real interest.

The more defenders know about the attacker's techniques, the better they can identify which techniques in the ATT&CK matrix need to be prioritized and covered as thoroughly as possible. As knowledge and tools based on the ATT&CK framework accumulate, we have reason to believe that TTP intelligence will play an increasingly important role in guiding threat hunting and intelligence production.
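The prioritization idea can be sketched as a frequency count over observed campaigns compared with current detection coverage. The campaign data and the `covered` set below are invented for illustration; only the technique IDs are real ATT&CK identifiers (T1566.001 Spearphishing Attachment, T1189 Drive-by Compromise, T1059 Command and Scripting Interpreter).

```python
from collections import Counter

# Hypothetical campaign observations: technique IDs seen in each campaign.
campaigns = [
    ["T1566.001", "T1059"],
    ["T1566.001", "T1189"],
    ["T1566.001"],
    ["T1189", "T1059"],
]
# Hypothetical set of techniques the defender can already detect.
covered = {"T1059"}


def coverage_gaps(campaigns, covered):
    """Rank techniques that lack detection by how many campaigns used them."""
    usage = Counter(t for campaign in campaigns for t in set(campaign))
    gaps = [(t, n) for t, n in usage.items() if t not in covered]
    # Most frequently used first; the technique ID breaks ties deterministically.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


print(coverage_gaps(campaigns, covered))
```

With this sample data the spear-phishing technique comes first, which matches the observation above that spear-phishing is the dominant delivery channel.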

Endogenous intelligence is an inevitable requirement for dealing with advanced threats

In 2015, Robert M. Lee put forward the well-known Sliding Scale of Cyber Security model, which clearly divides capability into progressive stages. It has become a very useful reference for guiding defenders in building their network security capabilities and in assessing which stage they have reached.

Threat intelligence sits high on this scale: intelligence production and threat hunting belong to the capability-enhancement stage. Consuming intelligence requires the corresponding detection infrastructure, such as a SOC, a SIEM, and deployed border and endpoint security products that can import intelligence, as well as the corresponding team building and process design.

The raw material of threat intelligence production is collected raw data. Collecting it depends on provisions made during infrastructure planning and design in the earlier stages, so that data collection software and hardware can be deployed easily when needed.

Daily security operations and incident response generate information about host activity related to known malicious samples, IP addresses and domain names. While carrying out a large number of on-site emergency response tasks for advanced threats at customers, the Qi Anxin security incident response team also discovers previously unknown malicious entities. All of this is basic threat intelligence. If it can be managed and tracked in a threat intelligence platform (TIP) and shared within a certain scope, the overall protection capability can be greatly improved.

Advanced threat attacks are highly targeted

The essence of APT-class advanced threats is that they are targeted, which makes them entirely different from commodity attacks. Here is a simple comparison of the two kinds of attack:

Some time ago, we gained control of the attack platform of a state-level APT group from South Asia. Its back-end management interface is as follows:

The challenge of restricted network connectivity

1. Isolated network environment

No system in the isolated network can connect directly to the Internet. Security data, including threat intelligence, can enter the network only through one-way import facilities such as USB drives, unidirectional gateways and optical disc towers.

2. Semi-open environment

Endpoints in the network cannot connect directly to the cloud, but they can connect to a centralized control platform, deployed in the local network, that is able to distribute threat intelligence. The platform can receive NDR or EDR data from the endpoints and can connect to the intelligence center in the Internet cloud.

3. Internet environment

Endpoints in the network can connect directly to the threat intelligence center on the Internet to query security information or to upload host activity data for the cloud to judge.

Each of the three deployment modes above accounts for roughly one third of Qi Anxin's customers. Large government and enterprise organizations, the main targets of most APT attacks, mostly adopt the isolated or semi-open mode. The reason is simple: stealing secrets is one of the main purposes of APT attacks, what attackers value is also what defenders value, and so some degree of network isolation is almost inevitable. This closed or semi-closed deployment confines the raw data on network and host activity to the inside of the organization, so external security vendors cannot analyze and correlate it centrally in the cloud.
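The three modes differ mainly in how intelligence gets in and whether raw activity data may leave the site. A small sketch of that distinction follows; the `Deployment` enum, the `intelligence_path` helper and the field names are hypothetical, and the values are a reading of the descriptions above rather than a product specification.

```python
from enum import Enum


class Deployment(Enum):
    ISOLATED = "isolated"
    SEMI_OPEN = "semi-open"
    INTERNET = "internet"


def intelligence_path(mode):
    """Describe how intelligence arrives and whether raw data leaves the site."""
    if mode is Deployment.ISOLATED:
        # USB drive, unidirectional gateway or similar one-way import.
        return {"delivery": "one-way import", "raw_data_leaves_site": False}
    if mode is Deployment.SEMI_OPEN:
        # Endpoints talk only to the local control platform.
        return {"delivery": "local control platform", "raw_data_leaves_site": False}
    # Endpoints query the cloud directly and may upload activity data.
    return {"delivery": "direct cloud query", "raw_data_leaves_site": True}


for mode in Deployment:
    print(mode.value, intelligence_path(mode))
```

In the first two modes the analysis has to happen where the data is, which is the constraint the rest of the article builds on.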

Pushing security vendors' capabilities down to the customer side is the inevitable solution

Advanced threats are highly targeted and their targets' networks are closed, while security vendors analyze threats from the outside, in the cloud. This creates a typical paradox: the analysis needs data from precisely those organizations whose data is hardest to obtain.

The solution to this problem is obvious: push the security vendor's cloud capabilities, including its tool platforms, basic data, operating processes and analysts, down to the customer side, and perform threat determination and intelligence management on local data.

Necessary components and process architecture

A simplified solution architecture is as follows:

There are several core components:

Data sensor

Data sensors are responsible for collecting factual metadata. There are only two types: network-based and host-based collectors. The network collector does not need any judgment capability of its own; its most important task is to reconstruct the metadata of network activity from traffic. For example, the probe of the Qi Anxin Tianyan advanced threat discovery product has strong metadata extraction capability: complete TCP session records, email data, uploaded and downloaded files, DNS resolution records, URL access records, and so on.

Host-based data collection mainly comes from EDR software. The host is the source of truth for all events, and there is a great deal of information to collect: process, file, registry and network activity, system services, and so on. Collection has to balance the impact on host performance and network bandwidth against the coverage of attack methods.

Based on Qi Anxin's many years of threat analysis experience, we can safely say that the future of security analysis is an era in which metadata is king. We must keep complete historical data records and be able to query them, so that there is a reliable basis for backtracking events, reconstructing the whole course of an attack and extracting threat intelligence.
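To illustrate why complete historical metadata matters, the sketch below backtracks stored DNS records when a domain becomes known as malicious only later. The `DnsRecord` type, the `backtrack` helper and the sample data are hypothetical; a real sensor would record many more fields.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DnsRecord:
    timestamp: datetime
    host: str
    query: str


def backtrack(history, new_bad_domains):
    """Find the earliest time each host resolved a newly known bad domain."""
    first_seen = {}
    for rec in history:
        if rec.query in new_bad_domains:
            prev = first_seen.get(rec.host)
            if prev is None or rec.timestamp < prev:
                first_seen[rec.host] = rec.timestamp
    return first_seen


history = [
    DnsRecord(datetime(2020, 1, 5, 9, 30), "ws-017", "cdn.bad-example.test"),
    DnsRecord(datetime(2020, 1, 3, 14, 0), "ws-017", "cdn.bad-example.test"),
    DnsRecord(datetime(2020, 1, 4, 8, 0), "ws-020", "intranet.example.test"),
]
print(backtrack(history, {"cdn.bad-example.test"}))
```

Without the stored records from before the domain was known to be bad, the earliest contact time could not be recovered at all.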

Threat Intelligence Management Platform

The platform is responsible for integrating, managing and sharing threat intelligence. It can receive, in one direction only, threat intelligence and other security data (such as whitelists) from the external cloud. It supports the integration of multi-source threat intelligence and the private management of the organization's own intelligence, and it provides a threat intelligence query interface for SOC/SIEM platforms.
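The integration and query roles of such a platform can be sketched as follows. The source names, indicator values and the `merge_intelligence` and `lookup` helpers are assumptions made for illustration, not the interface of any particular TIP product.

```python
def merge_intelligence(sources, whitelist):
    """Merge indicators from several sources, dropping whitelisted values.

    `sources` maps a source name to a set of indicator strings.
    Returns a dict mapping each indicator to the sorted list of its sources.
    """
    merged = {}
    for name, indicators in sources.items():
        for indicator in indicators:
            if indicator in whitelist:
                continue  # the whitelist suppresses likely false positives
            merged.setdefault(indicator, []).append(name)
    return {ioc: sorted(names) for ioc, names in merged.items()}


# Hypothetical inputs: one feed pushed from the cloud, one produced locally.
sources = {
    "cloud_feed": {"bad-example.test", "cdn.example.test"},
    "local_ir": {"bad-example.test", "198.51.100.7"},
}
whitelist = {"cdn.example.test"}
tip = merge_intelligence(sources, whitelist)


def lookup(indicator):
    """Query interface a SOC/SIEM could call; returns the reporting sources."""
    return tip.get(indicator, [])


print(lookup("bad-example.test"))
```

Keeping the list of reporting sources per indicator lets locally produced intelligence stay distinguishable from what was imported.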

Advanced threat detection and filtering system

This system works on the factual metadata collected locally, combined with the platforms, processes and personnel that the security vendor pushes down to the customer side. It uses rules or models to process files or traffic in batches, filters out objects that are known to be benign or known to be malicious, matches advanced threat characteristics, and outputs the objects that need manual confirmation to the internal operation platform. The overall architecture is described in more detail next.
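The filtering order described here (known benign, then known malicious, then feature matching, then manual confirmation) can be sketched as a small triage function. The verdict names, rule names and sample fields are hypothetical, and the digests are shortened placeholders rather than real hashes.

```python
def triage(sample, whitelist, blacklist, rules):
    """Classify one file sample; returns (verdict, tags)."""
    digest = sample["sha256"]
    if digest in whitelist:
        return "known-white", []   # filtered out, no further work
    if digest in blacklist:
        return "known-black", []   # already known, handled elsewhere
    tags = [name for name, check in rules.items() if check(sample)]
    if tags:
        return "manual-review", tags
    return "no-finding", []


whitelist = {"aaa"}
blacklist = {"bbb"}
rules = {
    "office-macro": lambda s: s.get("has_macro", False),
    "exploit-signature": lambda s: bool(s.get("exploit_tags")),
}
samples = [
    {"sha256": "aaa", "has_macro": True},
    {"sha256": "bbb"},
    {"sha256": "ccc", "has_macro": True, "exploit_tags": ["CVE-2018-0798"]},
    {"sha256": "ddd"},
]
results = [triage(s, whitelist, blacklist, rules) for s in samples]
for sample, result in zip(samples, results):
    print(sample["sha256"], result)
```

Only the third sample reaches an analyst, which is the point of filtering known objects first.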

Security operation center

The security operation center is the core of overall threat handling. It generates alerts based on threat intelligence and asset information, and either issues work orders for manual handling or works with the automated orchestration systems that are now very popular to carry out threat response.

Threat analysis team

Experienced threat analysts are essential for analyzing and judging advanced threats. They analyze samples and events, extract identifying features from samples, design correlation and filtering rules, collect training samples for models, counter sandbox evasion, and act as the final arbiter that closes the analysis loop. The asymmetry between attack and defense shows in two ways. First, defense has to be deployed everywhere, while attack needs only a single point of breakthrough. Second, the attacker operates in the dark while the defender is in the open. Traditional antivirus is an example: a patient attacker can obtain the product as a free test environment and, through repeated attempts, almost always find a way around it. Therefore, for advanced threats, the failure of general security protection is a basic assumption. Detection and determination should cover more stages of the attack chain, and the contest ultimately becomes human against human.

Building advanced threat detection and filtering system

To build this system, we need to apply different kinds of detection depending on the data sources available. The most basic types are files and traffic.

Here we introduce Qi Anxin's logical framework for analyzing file objects to detect APT attacks. Its simplified model is as follows:

This architecture diagram looks a bit complicated. You only need to pay attention to the input and output of each component.

At the top level it is divided into three parts:

1) The underlying core capabilities and technologies, which basically consist of a set of engines or analyzers;

2) The knowledge base in the middle, also called the data set, which holds the data used by the lower-level engines and receives the engines' output;

3) The business applications in the upper layer, which can be a system for managing and operating the analysis of APT groups and related intelligence, or other business systems related to malicious code, supported by the knowledge base and core technologies beneath them.

Some of these components need further explanation:

Whitelist

The whitelist contains benign samples, IP addresses, domain names and other non-malicious data. The vast majority of IOC data is essentially a blacklist, while the whitelist is used to quickly filter out objects that do not need processing and to suppress false positives. Producing a whitelist may require no fewer resources and no less investment than building a blacklist, and it requires collection and analysis from multiple sources. In fact, the whitelist and the blacklist together constitute the core competitiveness of a security vendor. For the threat detection system on the customer side, part of the whitelist has to be imported from outside and part has to be produced inside the customer's own organization.

Threat information base

The threat intelligence base contains known-malicious intelligence collected from open-source or commercial channels. It can be pushed from the cloud and stored in the threat intelligence platform, and it is mainly used to quickly filter out objects that are already known to be malicious.

File metadata

File metadata is the collection of static attributes and dynamic behavior information belonging to a file. It has various possible sources: collected open-source data, the output of the deep file analysis engine, the verdicts of multi-engine antivirus scanning, the network activity observed when the file runs in a sandbox, and so on. This data is what subsequent rule matching and machine learning models operate on, and it is the data foundation of advanced threat analysis.

Rule base

This is a rule set in the broad sense. It includes recognition rules that match file features, rules that judge maliciousness by matching dynamic behavior in the sandbox, and abstract event correlation rules for the SIEM. Some security consultancies now advocate so-called rule-free detection. Here I want to pour a little cold water on that: from now into the foreseeable future, rule-based detection will remain the most effective means of dealing with the known security threats that cause most of the actual losses. As the core and cornerstone of current security detection, its position cannot be shaken in the short term. In our advanced threat detection system, rules are used to identify and filter known threats, while relaxed rules are used to find anomalies and submit them to analysts for ongoing operation.
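One possible reading of "strict" versus "relaxed" rules is full versus partial feature matching, sketched below. This is an interpretation rather than the article's actual rule format; the feature names, the `evaluate` helper and the `relaxed_min` threshold are hypothetical.

```python
def evaluate(rule_features, sample_features, relaxed_min=2):
    """A full match flags a known threat; a partial match goes to analysts."""
    matched = rule_features & sample_features
    if matched == rule_features:
        return "known-threat"
    if len(matched) >= relaxed_min:
        return "analyst-review"
    return "no-match"


# Hypothetical rule describing one known malware family.
rule = {"macro", "downloads-exe", "c2-uri-pattern"}

print(evaluate(rule, {"macro", "downloads-exe", "c2-uri-pattern"}))
print(evaluate(rule, {"macro", "downloads-exe"}))
print(evaluate(rule, {"macro"}))
```

The relaxed threshold trades analyst workload for sensitivity: lowering it surfaces more variants of a known family but also more noise.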

Model base

Machine learning is an effective tool in some scenarios. In our advanced threat detection scenario, it is mainly used to classify and cluster APT-related samples. We implemented a sample classification and clustering engine based on file metadata and built models that output classifications of APT samples even though their accuracy is relatively low. The key question is whether, for a given number of input samples, the model can keep the absolute number of entries that need manual confirmation within bounds. If the daily output can be held to the order of 100 entries, then even at only 50% accuracy the output is very valuable, provided the daily operational capacity is there to review it.
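The budget argument can be made concrete with a short sketch: keep only the highest-scoring samples up to the daily review capacity, then estimate how many of them are expected to be real. The scores below are synthetic and the helper names are hypothetical; 100 entries and 50% are the figures from the paragraph above.

```python
def daily_review_queue(scored_samples, budget=100):
    """Keep only the `budget` highest-scoring samples for manual review."""
    ranked = sorted(scored_samples, key=lambda s: s[1], reverse=True)
    return ranked[:budget]


def expected_true_positives(queue_size, precision):
    """Expected number of confirmed samples in a queue of the given size."""
    return queue_size * precision


# Synthetic (sample_id, model_score) pairs standing in for one day's output.
scored = [("s%d" % i, i / 250) for i in range(250)]
queue = daily_review_queue(scored)
print(len(queue), expected_true_positives(len(queue), 0.5))
```

At 100 entries a day and 50% precision, analysts confirm about 50 relevant samples daily, a workload a small team can sustain.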

File metadata parsing engine

We have implemented a purely static engine that parses files in depth and extracts metadata, especially for non-PE file types. This metadata is used for classification by rules and models. How deep does it go? For an Excel file, for example, we extract the macro code; for a Word file, we extract the data in each stream. Besides extracting information, the parsing engine uses static features to identify the various non-PE document vulnerabilities that APT attacks often exploit, and outputs the corresponding tags.
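The first step of any such static parser is deciding what kind of file it is looking at. The sketch below does this with well-known file signatures and emits initial tags. It is only an illustration of the tagging idea and does not extract macros or streams, which would need a dedicated parser; the tag names and helper functions are hypothetical.

```python
# Well-known leading byte signatures, checked in order.
MAGIC = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls container
    (b"PK\x03\x04", "zip"),                          # OOXML .docx/.xlsx container
    (b"{\\rtf", "rtf"),
    (b"%PDF", "pdf"),
    (b"MZ", "pe"),
]


def sniff(data):
    """Return a coarse file type based on the leading bytes."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return "unknown"


def initial_tags(data):
    """Emit the first tags a static pipeline could attach to a sample."""
    kind = sniff(data)
    tags = ["type:" + kind]
    if kind not in ("pe", "unknown"):
        tags.append("non-pe")  # route to document-specific parsers
    return tags


print(initial_tags(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 8))
print(initial_tags(b"MZ\x90\x00"))
```

Type detection by content rather than file extension matters here because attackers routinely rename attachments.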

Static multi engine contrast scan

A scanning cluster of multiple antivirus engines makes a high-speed initial determination over massive numbers of samples, and its output is used to quickly filter out known malicious code. In our analysis scenario, the purpose of multi-engine scanning is not to decide that something is malicious, but to filter out what is already known to be malicious.

High resistance sandbox

The sandbox in the advanced threat discovery system does not act as a judge; it mainly serves information collection. Its main mission is to make samples exhibit as many of their behaviors as possible. This information is stored together with the samples' static metadata for later correlation analysis. The sandbox is also a component that needs continuous maintenance. When some new samples fail to run in the sandbox, someone has to work out why, whether the cause lies in the execution environment or in upgraded anti-sandbox measures in the malicious code, and then add the corresponding countermeasures. If triggering the malicious behavior requires specific interaction, analysts need to develop a corresponding intelligent interaction mechanism.

This advanced threat detection and filtering system takes files as input and finally outputs tags that describe features of each file.

An implemented system

Following the logical architecture above, we have implemented an advanced threat operation system that handles email and its attachments. The actual architecture is as follows:

A simplified version of the system, deployed at some customers, has found many leads on email-based targeted attacks. The input of the whole system is email and the corresponding attachment files. After the system analyzes and processes them, the tags for emails and files are output to the operation platform for manual confirmation. The following is the tag hit statistics page of the operation platform:

At present, the system supports hundreds of tags for marking various suspicious behaviors and data; the tags come from static analysis, dynamic analysis and machine learning classification. With input on the order of 10,000 files, the system currently produces tagged suspicious samples on the order of 100.

Advanced threat detection results

Suspicious samples based on homology analysis

One sample, suspected to come from the Lazarus group, exploits a vulnerability in the HWP document format. The system identified the exploitation, the homology system assigned group ownership with a certain confidence, and the result was confirmed manually. Another sample exploits the CVE-2018-0798 vulnerability; homology clustering based on machine learning attributed it to the Bitter group with high confidence, and this was finally confirmed manually.

A sample in Word format contains malicious macro code; homology analysis attributes it to the OilRig group.

A sample in Word format executes malicious code through the Office DDE technique; homology analysis attributes it to the APT28 group.

An attack sample in Word format with malicious macro code belongs to the Cobalt Group, a cybercriminal organization that targets ATM and financial systems.
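The article does not describe how the homology system computes similarity. As one simple stand-in, the sketch below attributes a sample to the known group whose tag profile has the highest Jaccard similarity, and returns no attribution below a threshold so that weak matches are left to analysts. The group names, tag names and threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute(sample_tags, group_profiles, threshold=0.5):
    """Return (best_group or None, best_score) for a tagged sample."""
    best_group, best_score = None, 0.0
    for group, tags in sorted(group_profiles.items()):
        score = jaccard(sample_tags, tags)
        if score > best_score:
            best_group, best_score = group, score
    if best_score < threshold:
        return None, best_score  # too weak to suggest an owner
    return best_group, best_score


# Hypothetical tag profiles built from previously confirmed samples.
group_profiles = {
    "group_a": {"hwp-exploit", "dropper-x", "c2-pattern-1"},
    "group_b": {"office-macro", "dde", "c2-pattern-2"},
}
print(attribute({"hwp-exploit", "dropper-x"}, group_profiles))
```

The score plays the role of the "certain confidence" mentioned above: it ranks candidates for an analyst, who still makes the final call.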

The tag-based homology system we built can effectively identify large numbers of samples from APT groups. To share our findings with the security industry at home and abroad, we created a GitHub project called apt_digital_wepon, which publishes file sample information related to APT groups and has collected more than 20,000 sample hashes from more than 150 APT groups or campaigns. If you have VirusTotal access, you can download the samples themselves directly.

This project is very popular. At present, it has more than 300 stars. Those interested in research can visit the URL: https://github.com/reddrip7/apt_digital_wepon

Rule based mail recognition

Recently, the RedDrip team of the Qi Anxin Threat Intelligence Center disclosed the attack activity of "Tiger Hibiscus", an advanced threat group from Northeast Asia that exploited a 0-day browser vulnerability. The lead that exposed the attack also came from the advanced threat detection and filtering system.

The email we captured looked like this:

Based on our understanding of APT attacks that use phishing email, we preset some rules for suspicious features:

This message hit some rules that caused it to be tagged:

The tags indicate a suspicious URL redirection in the email and suspicious Chinese text in the link; an analyst then manually confirmed that the link in the email exploited a browser vulnerability. Discovering such attacks depends on an in-depth understanding of the attacker's techniques, that is, the details of their TTPs, and on using rules to set up the corresponding checkpoints that capture samples needing manual intervention. From this perspective, the information that the ATT&CK matrix currently provides is too coarse and leaves a great deal of room for improvement. It needs the joint efforts of the open-source community to contribute details so that ATT&CK can play a greater guiding role.
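Two of the checks named above, URL redirection and Chinese text in a link, can be sketched with the standard library. The actual rules are not given in the article, so the heuristics, tag names and example URLs below are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def tag_link(url):
    """Return suspicious-feature tags for one link found in an email."""
    tags = []
    parts = urlsplit(url)
    # A query parameter whose decoded value is itself an http(s) URL
    # suggests the link bounces through a redirector.
    for _, value in parse_qsl(parts.query):
        if value.lower().startswith(("http://", "https://")):
            tags.append("suspicious-redirect")
            break
    # CJK Unified Ideographs anywhere in the percent-decoded link.
    if any("\u4e00" <= ch <= "\u9fff" for ch in unquote(url)):
        tags.append("cjk-in-link")
    return tags


print(tag_link("https://portal.example.test/go?url=https%3A%2F%2Flogin.bad-example.test%2F"))
print(tag_link("https://example.test/%E9%80%9A%E7%9F%A5"))
print(tag_link("https://example.test/news"))
```

Checks like these are deliberately loose: they do not prove an attack, they only decide which emails deserve an analyst's attention.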



About the author

Wang Lijun: Director of the Qi Anxin Threat Intelligence Center and expert of the Hufu think tank.