0x00 background
I've been too busy for the past two months, so this post has been delayed until now. I had planned a series on security analysis, but it was put on hold because of work. During the Dragon Boat Festival I went to Chengdu to eat hot pot, and I wrote this piece on Threat Intelligence in security analysis on the train. It is the first post in the Security Analysis series.
0x01 what is Threat Intelligence
I define Threat Intelligence as security information that has been studied and judged.
There are three entities here: research and judgment, security information, and threat intelligence. The relationship among them is as follows:
Security information that has not gone through research and judgment cannot be called "Threat Intelligence". Threat Intelligence is used to support decision-making or security analysis, and security information of unknown source and authenticity will degrade the accuracy of decisions and analysis results.
Threat Intelligence describes the current (and past) state of a target over a period of time. It depends on the quantity and quality of the security information and on how sound the research and judgment process is, so threat intelligence is never 100% correct.
0x02 security information collection
Security analysis relies on the ability to obtain and use data, so the first problem to be overcome is "how to collect security information".
I previously wrote a brief discussion of the life cycle of security analysis. Its main point was that security analysis should start from "setting goals", and the same applies here. Before drawing up a security information collection plan, the objectives and scope must be clear. The plan itself should cover "the types of information to be processed", "the feasible entry points for information research and judgment", "the widest possible range of source channels", and "when to collect information".
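The four items of a collection plan can be written down as a small data structure so that gaps are easy to spot. This is only a sketch: the class name `CollectionPlan`, its field names, and the sample values are all hypothetical and not part of any tool mentioned in this post.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CollectionPlan:
    """Hypothetical container for the items a collection plan should cover."""
    goal: str                 # objective and scope, fixed before collection starts
    info_types: List[str]     # types of information to be processed
    entry_points: List[str]   # feasible entry points for research and judgment
    sources: List[str]        # source channels, as wide as possible
    schedule: Dict[str, int] = field(default_factory=dict)  # source -> interval in minutes

    def missing_parts(self) -> List[str]:
        """Return the names of plan items that are still empty."""
        parts = {
            "info_types": self.info_types,
            "entry_points": self.entry_points,
            "sources": self.sources,
            "schedule": self.schedule,
        }
        return [name for name, value in parts.items() if not value]


# Made-up example: the plan has sources but no collection schedule yet.
plan = CollectionPlan(
    goal="track vulnerabilities affecting our web servers",
    info_types=["vulnerability advisory"],
    entry_points=["product name match"],
    sources=["vendor advisories", "security blogs"],
)
print(plan.missing_parts())  # ['schedule']
```

A check like `missing_parts()` is just one way to make "the plan must be complete before collection starts" enforceable.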
For the information itself, the following points must be paid attention to when collecting:
- Keep the input information clean and avoid useless data as far as possible
- High availability of information
- Ensure high accuracy of information
- The coverage of information sources should not be neglected
- Information source must be trusted
- Information must be timely
Having covered the requirements for information collection, let me turn to the sources of information: OSINT, closed, and confidential.
- OSINT is publicly available data and the most common way to obtain information. It includes media, institutions, public blogs, social platforms, conference papers, announcements from major vendors, and so on. Any information that can be reached over the Internet counts as OSINT data. It is usually collected by crawling web pages or by consuming APIs, RSS feeds, or email subscriptions. There are also many Threat Intelligence platforms on the market built on OSINT data. Information from this source usually has problems with cleanliness, accuracy, and coverage, because open means public, miscellaneous, inaccurate, redundant... When using OSINT data, the information-processing problems must be solved.
- Closed data is information collected for a specific purpose, and public access to it is usually restricted. Examples are VT (VirusTotal), RiskIQ, Recorded Future, ThreatBook... Data from this source may be exclusive, or it may be derived from public intelligence through further processing. Such information is more valuable than OSINT, but obtaining it comes at a price.
- Confidential data is information collected by specific, covert means. Such information is highly accurate, highly available, highly credible, and timely, but its coverage is very narrow and it only serves a single need. Honeypots are the typical data source here.
Security analysts should work from all-source analysis rather than limiting themselves to easily accessible information. Whatever the method, the starting point is getting the desired information, and the goal is to output the high-quality information that decisions require. In terms of cost, collecting open-source data is far cheaper than deploying private assets; open-source data is easy to acquire, but the amount of processing is huge. A more reasonable acquisition structure is therefore one in which the three sources complement each other.
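As a rough illustration of OSINT collection, here is a minimal sketch that parses an RSS document with Python's standard library and applies a basic cleanliness filter (empty and duplicate entries are dropped). The feed content, the `parse_feed` helper, and the output field names are made up for this example; a real collector would fetch the feed over the network and handle many more edge cases.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 snippet standing in for a real advisory feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example advisories</title>
  <item><title> Patch released for ExampleServer </title><link>https://example.com/a1</link></item>
  <item><title>Patch released for ExampleServer</title><link>https://example.com/a1</link></item>
  <item><title></title><link>https://example.com/a2</link></item>
</channel></rss>"""


def parse_feed(xml_text: str, source: str) -> list:
    """Extract items from an RSS document, dropping empty and duplicate entries."""
    root = ET.fromstring(xml_text)
    seen_links = set()
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not title or link in seen_links:  # cleanliness: skip useless or repeated data
            continue
        seen_links.add(link)
        # Keep the source name with every record so it can be evaluated later.
        items.append({"source": source, "title": title, "link": link})
    return items


records = parse_feed(SAMPLE_RSS, source="example-advisories")
print(len(records))         # 1
print(records[0]["title"])  # Patch released for ExampleServer
```

Recording the source alongside each item matters because the source is judged separately from the information itself.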
0x03 information research and judgment
The collection stage only builds the channels for obtaining information; it deals with the sources, not the information itself. Only by researching and judging the information itself can it be turned into Threat Intelligence.
Information research and judgment is a very important link in the life cycle of threat intelligence. The mainstream approaches are manual review by people or running machine learning algorithms.
Human judgment is very accurate. But people have their own fields of expertise and their own blind spots, and it is hard for them to judge information in fields they are not good at. People's energy is also limited, and they are powerless in the face of massive amounts of information.
To address the limits of human judgment, some vendors have introduced machine learning into information research and judgment. There is no denying that this is the trend in an era of information explosion. But given the current bottlenecks of machine learning, it is hard to find an algorithm that can carry out research and judgment fully automatically, then slap results with four-nines accuracy in front of me and tell me outright that this is accurate Threat Intelligence. That is impossible.
Threat Intelligence is used to make decisions and to support our analysis. Any threat intelligence that cannot reach 99.99% accuracy cannot be used directly in production. Where credibility is incomplete, a person must step in. This also indirectly shows why security analysis and security operations are necessary. In the field of security analysis, human-machine collaboration will remain the mainstream for the foreseeable future.
0x04 information research and judgment model
Using OSINT information as the example, I will now describe a concrete method of information research and judgment.
There are three basic principles in the research and judgment process:
- No subjective influence
- Information sources must be evaluated
- Keep information as close to the source as possible
When many people do research and judgment, they reach for NLP, thesauri, or even supervised / unsupervised machine learning, and miss the point. I don't deny that this work is necessary, but the thinking behind it is somewhat off.
I think research and judgment has two dimensions:
- Information source
- Information itself
Many people only pay attention to the information itself and ignore the "information source" dimension. Adding a judgment of source reliability greatly improves the accuracy of research and judgment.
Defining the scales
With these two dimensions in hand, it becomes easier to divide each one into finer grades. Here are the rating scales:
Source:
- Completely reliable
  - Authenticity, integrity, reliability, and professional field are all credible
  - The source has no blemish in its historical record
- Usually reliable
  - Isolated problems in one of: authenticity, integrity, reliability, professional field
  - The source has isolated blemishes in its historical record
- Generally reliable
  - Some problems in two of: authenticity, integrity, reliability, professional field
  - The source has some tainted records in its history
- Unknown
  - The nature of the source cannot be determined, and there is no historical record
- Unreliable
  - Some doubts about authenticity, integrity, reliability, and professional field
  - The source has some tainted records in its history
- Definitely unreliable
  - Clear doubts about authenticity, integrity, reliability, and professional field
  - The source has a large number of tainted records in its history
Information itself:
- Extremely high quality
  - Other independent sources confirm that the information is reliable
  - The information is within our scope of concern
  - The information is logical
- High quality
  - Other independent sources confirm that the information is reliable
  - The information falls outside our scope of concern
  - The information is logical
- General quality
  - Reliability cannot be confirmed from other independent sources, but the information is logical
  - The information is within our scope of concern
- Unknown
  - The reliability, logic, and relevance of the information cannot be determined
- Low quality
  - Reliability cannot be confirmed from other independent sources, but the information is logical
  - The information falls outside our scope of concern
- No value
  - Reliability cannot be confirmed from other independent sources, and the information is not logical
  - The information falls outside our scope of concern
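The two scales above can be encoded as numbers. The sketch below is one possible encoding, not a fixed standard: the enum names, the choice of 0 for "Unknown", the integer spacing, and the `rate_information` helper are all my own assumptions for illustration. Combinations of criteria that the list does not cover fall back to "Unknown".

```python
from enum import IntEnum


class SourceReliability(IntEnum):
    # Assumed scores: positive = reliable, 0 = unknown, negative = unreliable.
    COMPLETELY_RELIABLE = 3
    USUALLY_RELIABLE = 2
    GENERALLY_RELIABLE = 1
    UNKNOWN = 0
    UNRELIABLE = -1
    DEFINITELY_UNRELIABLE = -2


class InfoQuality(IntEnum):
    EXTREMELY_HIGH = 3
    HIGH = 2
    GENERAL = 1
    UNKNOWN = 0
    LOW = -1
    NO_VALUE = -2


def rate_information(confirmed: bool, in_focus: bool, logical: bool) -> InfoQuality:
    """Map the three 'information itself' criteria to a grade.

    confirmed: other independent sources confirm the information
    in_focus:  the information is within our scope of concern
    logical:   the information is logical
    """
    if confirmed and logical:
        return InfoQuality.EXTREMELY_HIGH if in_focus else InfoQuality.HIGH
    if not confirmed and logical:
        return InfoQuality.GENERAL if in_focus else InfoQuality.LOW
    if not confirmed and not logical and not in_focus:
        return InfoQuality.NO_VALUE
    return InfoQuality.UNKNOWN  # assumption: unlisted combinations stay undetermined


print(rate_information(confirmed=True, in_focus=False, logical=True).name)  # HIGH
```

Using `IntEnum` keeps the human-readable grade names while letting the grades be compared and used as coordinates.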
Once the scales are defined, take the information source as the y-axis and the information itself as the x-axis, with the unknown state as the origin, and establish a coordinate system.
In this way, the information processed by the machine can be divided into three levels:
- Valuable Threat Intelligence
- Threat Intelligence requiring manual study
- Spam information
To put it in words: security information that comes from a reliable source and is itself of high quality is valuable threat intelligence.
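The three levels can be sketched as a quadrant test on the two scores. Assumptions in this sketch: scores are signed numbers with 0 meaning unknown, points with exactly one positive score go to manual study, and everything else (including points on the axes with no positive score) is treated as spam. The function name and level labels are illustrative only.

```python
def classify(x: float, y: float) -> str:
    """Classify a piece of security information by its coordinates.

    x: score of the information itself (0 = unknown)
    y: score of the information source (0 = unknown)
    """
    if x > 0 and y > 0:
        return "valuable"  # reliable source and high-quality information
    if x > 0 or y > 0:
        return "manual"    # only one dimension is positive: a person must look at it
    return "spam"          # neither dimension is positive


print(classify(3, 2))    # valuable
print(classify(2, -1))   # manual
print(classify(-1, -2))  # spam
```

Where exactly the boundaries sit is a policy choice; a stricter team might send unknown-source items to manual study instead of discarding them.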
Value description
Of course, threat intelligence comes in different grades, and the information waiting for manual study has different priorities. There are in fact ways to quantify "value".
Although the scales above are described as six levels in words, in the digital world each level is represented by a number, so the standard can be quantified by an algorithm.
The value of a piece of threat intelligence can be expressed as the modulus of its coordinates:
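For a point A that is automatically judged to be threat intelligence, the value is:
|A| = sqrt(x^2 + y^2)  (x > 0 and y > 0)
Similarly, for points B and C that are mapped to the manual study queue, the priority is:
|B| = sqrt(x^2 + y^2)  (x > 0 or y > 0, but not both)
|C| = sqrt(x^2 + y^2)  (x > 0 or y > 0, but not both)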
In this way, different levels of threat intelligence can be told apart. What is described here is limited to OSINT information processing. Other types of security information will use different scales, but the general idea is the same.
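A minimal sketch of ranking by the modulus of the coordinates, using `math.hypot` from the standard library. The sample items and their scores are invented for illustration, and the `value` helper name is my own.

```python
import math


def value(x: float, y: float) -> float:
    """Modulus of the point (x, y): distance from the 'unknown' origin."""
    return math.hypot(x, y)


# Made-up items waiting for manual study: (description, x = information score, y = source score).
queue = [
    ("rumor from a forum", 1, -1),
    ("0day claim from an inactive account", 3, -1),
    ("vendor note about a product we do not use", -1, 2),
]

# Highest modulus first, so the most significant items are studied first.
ranked = sorted(queue, key=lambda item: value(item[1], item[2]), reverse=True)
for description, x, y in ranked:
    print(f"{value(x, y):.2f}  {description}")
# 3.16  0day claim from an inactive account
# 2.24  vendor note about a product we do not use
# 1.41  rumor from a forum
```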
0x05 practice
That was a lot of methodology. To make it easier to understand, here is an example.
The scenario: an enterprise is building its security program and collects vulnerability Threat Intelligence for security operations.
- Step 1: determine the scope
  - First, understand the enterprise's asset information and make clear which vulnerabilities need attention.
- Step 2: make a collection plan
  - Determine the information sources, the information format, the research and judgment method, and the collection method.
  - Common vulnerability information sources include the CVE vulnerability database, the NVD vulnerability database, the CNVD vulnerability database, media websites, email subscriptions, personal / organizational blogs, social platforms (Facebook, Twitter, WeChat groups, WeChat Moments), etc.
  - Make the information format clear. Vulnerability databases generally offer RSS subscriptions, which give structured data that can be matched directly with regular expressions and dictionaries. Information from media websites, blogs, and social platforms is often unstructured and generally needs NLP to process. Different kinds of information are processed in different ways, and a clear information format makes the processing easier.
  - Different information sources have different timeliness and collection methods. Social platforms are highly time-sensitive, so the crawl interval should be as short as possible. Vulnerability databases, by contrast, do not need to be crawled every day. In most cases information is collected actively by crawlers; email subscriptions are the other case, where information is received passively.
- Step 3: set the scales
  - The scales have two dimensions: source reputation and information quality.
  - Source reputation has to be accumulated over time, but it can also be preset. Official announcement sites, verified high-profile Twitter accounts ("big V"), professional security media, and the like can be given a somewhat larger weight.
  - Information quality should be matched according to the source, for example whether we use the product in a vendor announcement, how popular a tweet is, or whether a vulnerability disclosed by security media is also reported by other sources.
- Step 4: machine analysis and judgment
  - As described in the previous section, the machine assigns each item a level and a value from its source and quality scores.
- Step 5: manual study and judgment
  - Machine judgment is not fully credible; it can only pick out high-value information within a certain range. For example, "Microsoft released a security update, and we use the product it applies to" is certain to be mapped to the first quadrant. But in many cases information is mapped to quadrant two or four, especially information from unstructured sources. For example, an inactive user posts a 0day message on Twitter. Information from a suspicious source (meaning the inactive user, not Twitter) often falls into the fourth quadrant. In such cases it needs to be handed over to manual research and judgment.
- Step 6: disposal
  - Omitted.
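Steps 1 to 5 can be tied together in a short sketch. Everything concrete here is invented for illustration: the asset list, the preset reputation scores, the source names, the sample items, and the scoring rule in `quality` (which assumes the information is logical and only checks relevance and corroboration).

```python
# Step 1: products the enterprise actually runs (made-up asset list).
ASSETS = {"windows server", "nginx"}

# Step 3: preset source reputation (assumed scores; unseen sources default to 0 = unknown).
SOURCE_REPUTATION = {
    "vendor-advisory": 3,
    "security-media": 2,
    "inactive-twitter-user": -1,
}


def quality(text: str, corroborated: bool) -> int:
    """Score the information itself; 'in focus' means it mentions one of our assets."""
    in_focus = any(asset in text.lower() for asset in ASSETS)
    if corroborated:
        return 3 if in_focus else 2
    return 1 if in_focus else -1


def triage(item: dict) -> str:
    """Steps 4 and 5: machine judgment first, manual study for mixed signals."""
    x = quality(item["text"], item["corroborated"])
    y = SOURCE_REPUTATION.get(item["source"], 0)
    if x > 0 and y > 0:
        return "threat intelligence"
    if x > 0 or y > 0:
        return "manual study"
    return "spam"


items = [
    {"source": "vendor-advisory", "text": "Security update released for Windows Server", "corroborated": True},
    {"source": "inactive-twitter-user", "text": "0day in nginx, details soon", "corroborated": False},
    {"source": "unknown-forum", "text": "Buy cheap followers", "corroborated": False},
]

for item in items:
    print(triage(item), "<-", item["text"])
# threat intelligence <- Security update released for Windows Server
# manual study <- 0day in nginx, details soon
# spam <- Buy cheap followers
```

The second item lands in the fourth quadrant (relevant information, doubtful source), which is exactly the case that gets routed to a person.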
0x06 postscript
First of all, this article only looks at how to produce Threat Intelligence from the perspective of security analysis, and offers an engineering approach.
Security analysis is a very big topic, and a general method abstracted from it is always rather dry. I hope that reading it together with "0x05 practice" gives readers something to take away.
PS: this post is also published on my blog, http://pi4net.com (reaching it requires getting over the firewall).
If there is a security analysis topic you are interested in, please leave a comment. It may become the subject of the next post.