Preface
How do you know whether your company has been breached? Is it because nobody has attacked it, or because the company lacks the ability to notice and simply hasn't found the intrusion yet? In fact, intrusion detection is a serious challenge for every large Internet enterprise. The more valuable the company, the greater the threat of intrusion. Even an Internet pioneer like Yahoo suffered the theft of essentially all of its user data in its final years. Once an Internet company is successfully breached, the consequences are hard to overstate.
Because attack and defense are an ongoing confrontation, this article will not describe specific intrusion detection models, algorithms, or strategies. Readers who hoped to copy a ready-made detection strategy may be disappointed. We will, however, share some of our operational thinking. Feedback is welcome; if this helps those who come after us, so much the better, and we are happy to discuss it with you.
Definition of intrusion
Typical intrusion scenarios:
A hacker remotely controls the target's laptop, mobile phone, server, or network equipment over the network, and then reads the target's private data at will or abuses functions of the target system: for example, using the phone's microphone to eavesdrop on the target, using the camera to spy on the target, using the target's computing power to mine cryptocurrency, or using the target device's network capacity to launch DDoS attacks. Or the hacker cracks the password of a service and logs in to view sensitive data, or to control access-control systems or traffic lights. All of these are classic intrusion scenarios.
We can therefore define intrusion as follows: a hacker, without authorization, controls and uses our resources (including but not limited to reading and writing data, executing commands, and controlling resources) to achieve their own goals. In a broad sense, all of the following also fall within the scope of intrusion: exploiting an SQL injection vulnerability to steal data; obtaining the account password for the target's domain at the ISP and tampering with DNS to point the site at a defacement page; or taking over the target's accounts on Weibo, QQ, or email and gaining unauthorized control of virtual assets.
Intrusion detection for enterprises
In most cases, enterprise intrusion detection has a narrower scope: it generally refers to hackers gaining control of PCs, systems, servers, and networks (both the office network and the production network).
The most common way for hackers to control host assets such as PCs and servers is to execute commands through a shell. The act of obtaining a shell is called getshell.
For example, a hacker can obtain a webshell through an upload vulnerability in a web service, or execute commands or code directly by exploiting an RCE vulnerability (an RCE effectively provides a shell in disguise). Another typical approach is to implant a Trojan backdoor by some means and then use the Trojan's built-in shell function to control the target remotely.
Intrusion detection can therefore focus on the getshell action itself, as well as the malicious behavior that follows a successful getshell. To expand their foothold, hackers mostly use the shell to probe, search for and steal data, and move laterally to attack other internal targets. Where these actions differ from the behavior of legitimate users, they can also serve as important detection features.
Some peers (including commercial products) like to report the external scanning, attack probing, and attempts that happen before getshell, call it "situational awareness", and tell enterprises that someone is "trying to attack". In my view, this has little practical value. Many enterprises, Meituan included, face attacks from unidentified sources all the time. Knowing that someone is "trying" to attack does not amount to a useful warning if we cannot act on it, and chasing such reports mostly costs effort.
Once we accept attacks as the norm, we solve problems within that norm: which hardening strategies can be used, and which can be operated routinely. Any strategy that cannot be operated routinely, for example one that requires many people to work overtime and stand temporary watch, will most likely fade away before long. Whether or not we adopt such a strategy makes no essential difference.
Web attacks such as SQL injection and XSS, which do not directly lead to getshell, are not considered part of "intrusion detection" in the narrow sense for now. We suggest classifying them under fields such as "vulnerabilities" and "threat awareness", to be discussed separately. Of course, if an attacker uses SQL injection, XSS, or a similar entry point to achieve getshell, we still focus on the getshell itself and do not care where the vulnerability entry was.
"Invasion" and "inner ghost"
The scene close to the invasion is "inner ghost". Intrusion itself is a means, getshell is just the starting point, and the goal of hacker getshell is to control resources and steal data later. The "internal ghost" naturally has legal authority and can legally access sensitive assets, but for purposes other than work, they illegally dispose of these resources, including copying copies, transferring and leaking, tampering with data for profit, etc.
The behavior of the internal ghost is not in the scope of "intrusion detection". It is generally managed and audited from the perspective of internal risk control, such as separation of duties, double auditing, etc. There are also data anti disclosure products (DLP) to assist it, which will not be discussed in detail here.
Sometimes, when the hacker knows that employee a has the right to contact the target asset, he will attack employee a and steal the data with the right of employee a, which is also defined as "intrusion". After all, a is not the "inner ghost" of subjective malice. If it can not be captured at the moment when the hacker attacks a, or it can not distinguish between the data stolen by the hacker controlled a and the access data of the normal employee a, then this intrusion detection also fails.
The nature of intrusion detection
As mentioned above, intrusion means that hackers operate our assets without our consent, and there are no restrictions on the means they use. Intrusion detection is therefore the task of finding what distinguishes intrusion from legitimate, normal behavior and separating the two. In modeling terms, this is a labeling problem with two classes: intrusion and non-intrusion.
Unfortunately, "black" (malicious) samples of intrusions are very rare, which makes it difficult to learn the patterns of intrusion by training a supervised detection model on large amounts of labeled data. Developers of intrusion detection strategies therefore often have to invest a lot of time refining more precise descriptive models, or spend extra effort constructing simulated data that resembles intrusions.
A classic example is webshell detection. Security practitioners can search GitHub for public webshell samples, but there are fewer than 1,000 of them. That is far below the volume machine learning typically needs for training, which is often on the order of millions. Moreover, many of the GitHub samples are near-duplicates generated with a single technique, and some lack evasion techniques altogether. Training on such data in the hope that AI will learn the characteristics of webshells from "a large number of samples" is, in principle, unlikely to succeed fully.
At this point, classifying the known samples and extracting more precise descriptive models is what we call traditional feature engineering. Feature engineering is often regarded as inefficient, repetitive work, but its results tend to be stable: adding one technique-specific feature reliably catches one class of webshell. Constructing large numbers of malicious samples, by contrast, often struggles in real environments despite the halo of machine learning and AI: automatically generated samples rarely capture what a webshell really is, and mostly reflect the characteristics of the generating algorithm instead.
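To make "adding a technical feature" concrete, here is a minimal sketch of hand-written feature rules for PHP webshells. The three regular expressions and their names are hypothetical examples chosen for illustration, not the detection rules used by the author's team; a production rule set would be much larger and tuned against the company's own code.

```python
from __future__ import annotations

import re

# Hypothetical hand-written features for PHP webshells. Each pattern encodes one
# known technique; a real rule set would be larger and tuned on the company's code.
WEBSHELL_FEATURES = {
    "eval_of_request_input": re.compile(
        r"\b(eval|assert)\s*\(\s*\$_(GET|POST|REQUEST|COOKIE)\b", re.IGNORECASE
    ),
    "command_exec_of_request_input": re.compile(
        r"\b(system|exec|shell_exec|passthru|popen)\s*\(\s*\$_(GET|POST|REQUEST)\b",
        re.IGNORECASE,
    ),
    "eval_of_base64_payload": re.compile(r"\beval\s*\(\s*base64_decode\s*\(", re.IGNORECASE),
}


def match_features(source: str) -> list[str]:
    """Return the names of all features that fire on one file's text."""
    return [name for name, pattern in WEBSHELL_FEATURES.items() if pattern.search(source)]


if __name__ == "__main__":
    print(match_features('<?php @eval($_POST["cmd"]); ?>'))  # ['eval_of_request_input']
```

Each new feature is one more entry in the dictionary, which is why this kind of work is repetitive but stable: a rule either matches a known technique or it does not.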
On the other hand, what distinguishes intrusion is whether the behavior itself is authorized, and authorization often has no significant distinguishing feature in the data. If we apply hardening to converge legitimate access onto a limited set of channels, and make those channels strongly distinguishable, we can greatly reduce the cost of intrusion detection. For example, every access source, whether a natural person or a program API, must be strictly authenticated and hold a legitimate credential. When credentials are issued, multi-dimensional authentication and authorization are required as appropriate for each situation; IAM then records and monitors the scope each credential may access, and lower-level logs are generated so that abnormal access can be detected by models.
This full-lifecycle risk-control model is also the premise and foundation on which Google's BeyondCorp perimeterless network is implemented.
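Once legitimate access has converged onto credentials whose scope IAM records, the detection step becomes a simple comparison. The sketch below assumes IAM can export, for each issued credential, the set of resources it may access; the `Access` record, credential IDs, and resource names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    credential_id: str  # credential presented by a person or a program API
    resource: str       # resource that was actually accessed


def out_of_scope(accesses: list[Access], scopes: dict[str, set[str]]) -> list[Access]:
    """Flag accesses made with an unknown credential or outside the credential's scope.

    `scopes` stands in for the IAM record of what each issued credential may access.
    """
    flagged = []
    for access in accesses:
        allowed = scopes.get(access.credential_id)
        if allowed is None or access.resource not in allowed:
            flagged.append(access)
    return flagged


if __name__ == "__main__":
    scopes = {"svc-order": {"db/orders"}, "alice": {"wiki", "db/orders"}}
    log = [
        Access("svc-order", "db/orders"),
        Access("svc-order", "db/users"),
        Access("unknown", "wiki"),
    ]
    print([a.resource for a in out_of_scope(log, scopes)])  # ['db/users', 'wiki']
```

The point of the hardening is that this comparison is only meaningful when every legitimate path carries a credential; otherwise the "unknown credential" branch fires on normal traffic.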
There are therefore two main approaches to intrusion detection:
- Pattern matching based on known-malicious ("black") features, for example keyword matching for webshells.
- Anomaly comparison against the business's historical behavior: generate a baseline model from that history and treat anything not on the whitelist as suspicious. If the business's historical behavior does not converge enough, use hardening to make it converge, and then pick out the small number of abnormal behaviors that do not conform.
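The second approach can be sketched as a baseline built from history plus a "not on the whitelist, therefore suspicious" check. This is a minimal illustration under stated assumptions: the behavior unit is a hypothetical (host role, process name) pair, and `min_count` is an assumed threshold for ignoring behaviors seen too rarely to trust. Real baselines use far richer features.

```python
from __future__ import annotations

from collections import Counter


def build_baseline(history: list[tuple[str, str]], min_count: int = 3) -> set[tuple[str, str]]:
    """Whitelist the (host_role, process) pairs seen at least `min_count` times."""
    counts = Counter(history)
    return {pair for pair, n in counts.items() if n >= min_count}


def find_anomalies(
    events: list[tuple[str, str]], baseline: set[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Anything not in the whitelist is treated as suspicious ("not white, therefore black")."""
    return [event for event in events if event not in baseline]


if __name__ == "__main__":
    history = [("web", "nginx")] * 50 + [("web", "php-fpm")] * 50 + [("web", "curl")]
    baseline = build_baseline(history)
    print(find_anomalies([("web", "nginx"), ("web", "nc"), ("web", "curl")], baseline))
    # [('web', 'nc'), ('web', 'curl')]
```

Note that `curl` is flagged because it appeared only once in history: the less convergent the business behavior, the more such borderline items appear, which is why the article recommends hardening before baselining.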
Intrusion detection and attack vectors
The attack surface exposed to hackers differs from target to target, and so do the intrusion methods hackers are likely to use. For example, attacking and defending our PCs and laptops is quite different from attacking and defending servers deployed in a data center or in the cloud.
For a specific target, the set of access channels may be limited, and so are the paths by which it can be attacked. The combination of an attack method and the target's attack surface is called an attack vector.
So when we discuss the effectiveness of an intrusion detection model, we first need to define the attack vector, collect the corresponding logs (data) for each attack path, and only then build the matching detection model. For example, a dataset of shell commands executed after SSH login cannot be used to detect webshell behavior, and data collected from network traffic cannot reveal which commands a hacker ran in the shell after logging in over SSH.
For this reason, an enterprise that claims to have built a good APT detection model without naming specific scenarios is clearly exaggerating.
Intrusion detection should therefore start by listing all kinds of attack vectors and collecting data for each subdivided scenario (HIDS + NIDS + WAF + RASP + application-layer logs + system logs + PC...), and then build detection models adapted to the company's actual situation based on the real characteristics of its data. Each company's technology stack, data scale, and exposed attack surface have a significant impact on the model. For example, many security engineers are especially good at detecting PHP webshells, but after moving to a Java-based company...
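One way to keep the vector-to-data relationship explicit is a small coverage matrix: for each attack vector, the data sources able to observe it, checked against the sources actually being collected. The vector names and source labels below are hypothetical illustrations, not a complete taxonomy.

```python
from __future__ import annotations

# Hypothetical mapping from attack vectors to the data sources able to observe them.
REQUIRED_SOURCES: dict[str, set[str]] = {
    "webshell upload": {"waf", "hids_file", "rasp"},
    "commands after ssh login": {"hids_command"},
    "phishing email to office pc": {"mail_gateway", "edr"},
}


def uncovered_vectors(collected: set[str]) -> list[str]:
    """Vectors for which none of the observing data sources is being collected."""
    return sorted(v for v, sources in REQUIRED_SOURCES.items() if not sources & collected)


if __name__ == "__main__":
    # Network traffic alone says nothing about commands typed after an SSH login.
    print(uncovered_vectors({"waf", "nids"}))
    # ['commands after ssh login', 'phishing email to office pc']
```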
Common intrusion methods and countermeasures
If we don't understand common hacking techniques, it is hard to defend against them in a targeted way, and we may even fall into the trap of "political correctness". For example, the penetration testing team says: "We performed action A and you didn't detect it, so your detection doesn't work." In reality, that scenario may not be a complete intrusion chain, and failing to detect that one action may have no impact on the overall effectiveness of intrusion detection. How much harm each attack vector can do to the company, how likely it is and how to rank it, and what it costs to address versus the benefit gained all require professional experience to support the decision.
Below is a brief introduction to the classic process found in hacker intrusion tutorials (see the kill chain model for the full process):
Before attacking a target, a hacker may not know it well enough, so the first step is usually reconnaissance: collecting information and deepening their understanding. For example, the hacker wants to know what assets the target has (domain names, IPs, services), the status of each, whether there are known vulnerabilities, who manages them (and how they are legitimately managed), and what information has already leaked (such as passwords in social engineering databases).
Once reconnaissance is complete, a skilled hacker prepares and verifies the feasibility of each attack vector in turn, based on the characteristics of each asset. Common attack methods and defense suggestions are listed below.
High-risk service intrusion
All public services are "high-risk services", because there may be known attack methods against the protocol or against the open-source components that implement it (advanced attackers may even hold corresponding 0days). As long as you are valuable enough and hackers have enough motivation and resources to dig, exposing high-risk services to the Internet, open to everyone, is equivalent to opening the door for hackers.
For example, SSH, RDP, and other operations and maintenance services are designed for administrators: anyone who knows the password or key can log in to the server and complete the intrusion. Hackers may obtain credentials by guessing passwords (combining leaked information from social engineering databases, searches of cloud drives, or brute force). Such attacks are so common that hackers long ago turned them into fully automated worm tools that scan the Internet. A cloud host with a weak password is often infected by a worm within minutes, because there are so many automated attackers.
Your password may be very strong, but that is no reason to keep the service exposed to the Internet. These ports should be restricted so that only your own IPs (or an internal bastion host) can reach them, completely cutting off the possibility of hackers getting in through them.
Similarly, MySQL, Redis, FTP, SMTP, MSSQL, Rsync, and other services used to manage servers, databases, and files should not be open to the Internet without restriction. Otherwise, automated worm tools can break into these services within minutes, and may even encrypt our data and demand a bitcoin ransom.
There are also high-risk services with RCE (remote command execution) vulnerabilities. As long as the port is open, hackers can use an existing exploit to get a shell directly and complete the intrusion.
Defense suggestion: building intrusion detection for each high-risk service is costly, because there are many different high-risk services and they do not necessarily share common characteristics. Converging the attack entry points through hardening is therefore more cost-effective. Prohibiting all high-risk ports from being exposed to the Internet can reduce the probability of intrusion by more than 90%.
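Because the recommendation here is hardening rather than detection, the routine check is an exposure audit. The sketch below models firewall or security-group rules as simple (port, source CIDR) pairs, which is an assumption: real rule formats are richer. The port set uses the standard default ports of the services named above; the allowlisted bastion subnet in the example is hypothetical.

```python
from __future__ import annotations

import ipaddress

# Default ports of the services named above: FTP, SSH, SMTP, Rsync, MSSQL, MySQL, RDP, Redis.
HIGH_RISK_PORTS = {21, 22, 25, 873, 1433, 3306, 3389, 6379}


def exposed_rules(rules: list[tuple[int, str]], allowed_sources: list[str]) -> list[tuple[int, str]]:
    """Return (port, source CIDR) rules that open a high-risk port beyond the allowlist."""
    allowed = [ipaddress.ip_network(cidr) for cidr in allowed_sources]
    findings = []
    for port, source in rules:
        if port not in HIGH_RISK_PORTS:
            continue
        src = ipaddress.ip_network(source, strict=False)
        # subnet_of() requires both networks to be the same IP version.
        inside = any(src.version == a.version and src.subnet_of(a) for a in allowed)
        if not inside:
            findings.append((port, source))
    return findings


if __name__ == "__main__":
    rules = [(22, "0.0.0.0/0"), (22, "10.0.5.10/32"), (443, "0.0.0.0/0"), (6379, "203.0.113.0/24")]
    print(exposed_rules(rules, ["10.0.5.0/24"]))  # [(22, '0.0.0.0/0'), (6379, '203.0.113.0/24')]
```

Running such a check on a schedule is one way to "monitor the effectiveness of the reinforcement in a normalized way": a rule that reopens a high-risk port shows up as a finding rather than as a worm infection.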
Web intrusion
Once high-risk ports are hardened, many attacks in the hacker's playbook stop working. But web services are the main way modern Internet companies deliver their services, and they cannot be turned off. As a result, vulnerabilities in dynamic web services written in PHP, Java, ASP, ASP.NET, Node.js, C-based CGI, and so on have become hackers' most important entry point.
For example, a hacker may use an upload feature to upload a webshell directly; use a file inclusion feature to reference and execute a remote webshell (or code); use a code execution feature to run arbitrary commands directly as a shell entry point; or target image and video processing services by uploading a malicious sample that triggers a vulnerability in the parsing library.
Application security for web services is a specialized field (see "White Hat Talks About Web Security"), and its attack and defense scenarios and countermeasures have become very mature. Of course, because all of these attacks enter through web services, the resulting intrusion behavior has some common traits, and it is relatively easy to find differences between a hacker's getshell and normal business behavior.
To detect intrusion traces in web services, we can consider collecting WAF logs, access logs, system calls recorded by auditd, shell commands, and response-related data at the network level, and extracting the characteristics of successful attacks from them. We suggest focusing on these areas.
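One commonly used host-side trace of a successful web getshell is a web server or application runtime directly starting a shell, which normal request handling rarely does. The sketch assumes process-creation events (for example, derived from auditd execve records) have already been parsed into parent and child executable names; the `ProcessEvent` format and both process lists are hypothetical and would need adjusting to the company's real stack and baseline.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative process names; adjust to the company's real web stack and baseline.
WEB_SERVER_PROCESSES = {"nginx", "httpd", "apache2", "php-fpm", "java", "node"}
SHELLS = {"sh", "bash", "dash", "zsh"}


@dataclass(frozen=True)
class ProcessEvent:
    parent: str   # executable name of the parent process
    child: str    # executable name of the new process
    cmdline: str  # full command line of the new process


def suspicious_spawns(events: list[ProcessEvent]) -> list[ProcessEvent]:
    """Flag shells started directly by a web server or application runtime."""
    return [e for e in events if e.parent in WEB_SERVER_PROCESSES and e.child in SHELLS]


if __name__ == "__main__":
    events = [
        ProcessEvent("php-fpm", "sh", "sh -c id"),
        ProcessEvent("bash", "ls", "ls -la"),
        ProcessEvent("nginx", "nginx", "nginx: worker process"),
    ]
    for e in suspicious_spawns(events):
        print(e.parent, "->", e.cmdline)  # php-fpm -> sh -c id
```

Some businesses legitimately shell out from Java or Node.js, so in practice this rule is usually combined with a per-service baseline rather than used on its own.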
0day intrusion
Judging from the leaked toolkit, the NSA had 0day weapons years ago that could directly attack Apache and Nginx services. This means an adversary may not care at all what our code and services look like: they simply use a few 0days and get a shell.
For intrusion detection, however, this is not so frightening. Whatever vulnerability an adversary exploits as the entry point, the shellcode it uses and the behavior that follows still have common characteristics. Whether Apache is attacked through a 0day or a PHP page is exploited through a basic coding flaw, the intrusion behavior may look exactly the same, so the intrusion detection model can be general.
Focusing on the hacker's getshell entry and subsequent behavior may therefore be more valuable than focusing on the vulnerability entry point. Of course, specific vulnerability exploits should still be followed up, to verify whether the resulting behavior matches expectations.
Office terminal intrusion
In the vast majority of APT reports, hackers start with people (office terminals): for example, they send an email, trick us into opening it, take control of our PCs, observe and browse for a long time, obtain our legitimate credentials, and then roam the intranet. Most of these reports therefore focus on describing Trojan behavior and code similarity within malware families. Most anti-APT products and solutions also work at the system-call level on office terminals, using similar methods to detect the behavior of antivirus-evading Trojans.
EDR products, email security gateways, behavior auditing at the office network egress, and the sandboxes in APT products can therefore collect the corresponding data and support similar intrusion detection models. The most important point is that hackers like to target important internal infrastructure, including but not limited to AD domain controllers, mail servers, password management systems, and permission management systems. Once these are taken, the attacker effectively becomes the "God" of the internal network and can do whatever they want. Important infrastructure should therefore get dedicated attack-and-defense hardening reviews; Microsoft has even published a white paper specifically on hardening AD against attack.
Basic principles of intrusion detection
A model whose alerts are not each followed up thoroughly is effectively an invalid model. In many breaches, alerts were in fact raised before the defenders noticed the intrusion, but there were too many of them, and they were never followed up or investigated thoroughly. Recognizing the alert only in hindsight amounts to having no discovery capability. This is why security operators facing products that generate thousands of alerts per day often feel frustrated.
We must suppress certain similar alerts that occur repeatedly so that we can concentrate on closing out every remaining alert. This produces a whitelist, that is, accepted missed detections, so some false negatives in any model are inevitable.
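A whitelist of this kind can be as simple as (rule, host pattern) pairs. In the minimal sketch below, the alert fields `rule` and `host` and the glob-style host patterns are hypothetical; the design choice worth noting is that suppressed alerts are returned separately rather than dropped, since they are exactly the accepted false negatives and may need periodic review.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def apply_whitelist(
    alerts: list[dict], whitelist: list[tuple[str, str]]
) -> tuple[list[dict], list[dict]]:
    """Split alerts into (kept, suppressed).

    Each whitelist entry is (rule name, host glob). Suppressed alerts are the
    accepted false negatives, so they are returned rather than silently dropped.
    """
    kept, suppressed = [], []
    for alert in alerts:
        if any(alert["rule"] == rule and fnmatchcase(alert["host"], pattern)
               for rule, pattern in whitelist):
            suppressed.append(alert)
        else:
            kept.append(alert)
    return kept, suppressed


if __name__ == "__main__":
    alerts = [
        {"rule": "curl_download", "host": "build-01"},
        {"rule": "curl_download", "host": "web-07"},
    ]
    kept, suppressed = apply_whitelist(alerts, [("curl_download", "build-*")])
    print([a["host"] for a in kept], [a["host"] for a in suppressed])  # ['web-07'] ['build-01']
```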
Because every model has false negatives, we must build multiple models across multiple dimensions so that they correlate with each other and provide defense in depth. Suppose static text analysis of webshells is bypassed by the hacker's obfuscation: malicious calls can still be caught by RASP (runtime application self-protection). In this way we can accept the false negatives of a single model while keeping overall discovery capability.
Since every single-scenario model has both false positives and false negatives, we need to weigh cost-effectiveness when deciding which scenarios to cover and which to skip. For example, some obfuscated webshells can be written to look very much like business code, almost indistinguishable to the human eye; pursuing that arms race in text analysis is a poor cost-benefit decision. Catching them through RASP detection is more cost-effective and more feasible.
It is hard for us to know every attack method hackers might use, and unlikely that we can build a strategy for each one (resources are always scarce). For key business, we should therefore harden the environment (and routinely monitor that the hardening remains effective) so that the paths available to hackers converge drastically, and then resist only at the key points. At a minimum, this protects the core business.
Based on these principles, we accept that we may never achieve 100% detection at any single point, but by combining methods we can make it very difficult for attackers to bypass every point.
When the boss or the red team points out a gap in a single point of detection, the "politically correct" response is to invest endlessly in that point and try to make it 100% effective. In many cases this is like trying to build a perpetual motion machine: a pure waste of manpower with no real return. Saving those resources and deploying more cost-effective layers of defense in depth clearly works better.
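A rough worked example shows why combination helps. Assume, purely for illustration, that $n$ detection layers fail independently, with per-layer miss rates $m_i$ (the values below are hypothetical). An attack then evades every layer with probability:

```latex
P(\text{miss all}) = \prod_{i=1}^{n} m_i,
\qquad m_1 = 0.3,\; m_2 = 0.2 \;\Rightarrow\; P(\text{miss all}) = 0.3 \times 0.2 = 0.06
```

Real layers are correlated (a technique that evades one layer often evades another), so the true figure will be higher than the independent estimate; still, two imperfect but different layers can beat one layer pushed toward perfection.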
Main forms of intrusion detection products
Intrusion detection ultimately builds models on data. For example, to detect webshells we first identify the web directories and then analyze the text of the files under them, which requires a collector. A detection model based on shell commands needs all shell commands, which may require hooking system calls or hijacking the shell. Detection based on network IP reputation or traffic payloads, or content inspection at the mail gateway, may need to be deployed at the network boundary or collect traffic in bypass (mirroring) mode.
Some integrators combine multiple sensors, collect logs from all sources into a SOC or SIEM, and hand them to a big data platform for comprehensive analysis. Intrusion detection products in the industry therefore take roughly the following forms:
- Host agents: after a hacker compromises a host, their actions may leave traces such as logs, processes, commands, and network connections. Deploying a collector on the host (often with some detection rules built in) is called a host-based intrusion detection system, or HIDS.
- Typical products: OSSEC, Qingteng Cloud, Alibaba Cloud Anqishi, and Safedog. Google recently released an alpha version of a similar product, Cloud Security Command Center. Some APT vendors, such as FireEye, also deploy sensors or agents on hosts.
- Network detection: most attack vectors deliver some payload to the target over the network, or the protocol used to control the target has strong characteristics of its own, so detection at the network level has a recognition advantage.
- Typical products: from Snort to commercial NIDS/NIPS, and at the APT level, products such as FireEye NX.
- Centralized log storage and analysis: hosts, network devices, and applications send their logs to a unified backend, where all kinds of logs are analyzed together to determine whether multiple paths of one intrusion can be correlated. For example, host A's web access log shows it was scanned and attacked, then a strange process and network connection appear at the host level, and finally host A tries to move laterally and compromise other hosts on the intranet.
- Typical products: SIEM products such as LogRhythm and Splunk.
- APT sandboxes: a sandbox is similar to a cloud version of advanced antivirus software. It runs unknown samples in an emulated environment and observes their behavior, which compensates for the weak static features of unknown samples. Because emulated execution carries a high performance overhead, sandboxes were once considered a low cost-effective solution. However, malicious files find it hard to hide their behavior at runtime, so behavioral features are hard to evade, and the sandbox has become a core component of APT products. Unknown samples obtained from network traffic, endpoint collection, suspicious files extracted from servers, email attachments, and so on can be submitted to the sandbox and run to determine whether they are malicious.
- Typical products: FireEye, Palo Alto Networks, Symantec, and ThreatBook.
- Endpoint intrusion detection: there is currently no real product for mobile terminals, nor is one necessary. The first necessity for a PC is antivirus software; if it detects a malicious program, intrusion can be avoided to some extent. Against advanced 0days and Trojans, however, antivirus software may be bypassed. Borrowing the idea of server-side HIDS, the concept of EDR was born: in addition to local detection logic, the host sends more data to a backend for comprehensive analysis and coordinated response. Some say the next generation of antivirus software will include EDR capabilities, but for now the two are still sold separately.
- Typical products: antivirus software includes Bit9, SEP, Symantec, Kaspersky, and McAfee; EDR products are not listed here. Tencent's iOA and Alibaba's AliLang can play a similar role to some extent.
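The host A example under centralized log analysis can be sketched as an ordered-stage correlation within a time window. This is a generic illustration, not how any of the named SIEM products implements correlation (they have their own rule languages). The stage labels, the `Event` record, and the one-hour default window are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stage labels emitted by upstream detectors, in kill-chain order.
CHAIN = ["web_attack", "odd_process", "lateral_scan"]


@dataclass(frozen=True)
class Event:
    host: str
    stage: str
    ts: int  # seconds since epoch


def correlated_hosts(events: list[Event], window: int = 3600) -> list[str]:
    """Hosts on which every CHAIN stage occurs, in order, within `window` seconds."""
    by_host: dict[str, list[Event]] = {}
    for ev in sorted(events, key=lambda ev: ev.ts):
        by_host.setdefault(ev.host, []).append(ev)

    hits = []
    for host, evs in by_host.items():
        for i, start in enumerate(evs):
            if start.stage != CHAIN[0]:
                continue
            matched = 1  # CHAIN[0] is matched by `start`
            for ev in evs[i + 1:]:
                if ev.ts - start.ts > window:
                    break
                if ev.stage == CHAIN[matched]:
                    matched += 1
                    if matched == len(CHAIN):
                        break
            if matched == len(CHAIN):
                hits.append(host)
                break
    return sorted(hits)


if __name__ == "__main__":
    events = [
        Event("a", "web_attack", 0), Event("a", "odd_process", 600), Event("a", "lateral_scan", 1200),
        Event("b", "web_attack", 0), Event("b", "lateral_scan", 100),  # no odd process
    ]
    print(correlated_hosts(events))  # ['a']
```

Each single stage on its own would be a weak, noisy alert; requiring the ordered chain on one host is what makes the correlated result worth a ticket.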
Metrics for evaluating intrusion detection
The first metric is the active discovery rate: intrusions we discovered ourselves divided by all intrusions. This is the most intuitive metric. The troublesome part is the denominator: many real intrusions never appear in it, because there is no external report and we did not detect them, so the measured rate always looks better than it really is. Who can guarantee that every current intrusion has been discovered? (In practice, though, once there are enough intrusions, we can add the objectively known ones to the denominator, whether they were reported to our SRC or surfaced as big news on the dark web, and an active discovery rate can always be calculated.)
In addition, real intrusions are low-frequency events; a large Internet company suffering hundreds of intrusions a year would certainly be abnormal. If there is no real intrusion for a long time, this metric stays unchanged for a long time and cannot show whether detection capability is improving.
We therefore usually introduce two additional metrics:
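The denominator problem can be made explicit in code. In this minimal sketch (function and parameter names are hypothetical), the denominator only counts intrusions we know about: those we found plus those learned from outside. Since undiscovered intrusions are missing from it, the result is an upper bound on the true rate.

```python
from __future__ import annotations


def active_discovery_rate(found_by_us: int, reported_externally: int) -> float:
    """Share of known intrusions that we detected ourselves.

    `reported_externally` counts intrusions learned from outside (SRC reports,
    dark-web leaks, news). Intrusions nobody has noticed are absent from the
    denominator, so the true rate can only be lower than this value.
    """
    total = found_by_us + reported_externally
    if total == 0:
        raise ValueError("no known intrusions; the rate is undefined")
    return found_by_us / total


if __name__ == "__main__":
    print(active_discovery_rate(8, 2))  # 0.8
```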
- Active detection rate in red team exercises
- Coverage of known scenarios
The blue army's active high-frequency confrontation and exercise can make up for the low-frequency deficiency of the real invasion event. However, the attack techniques mastered by the blue army are often limited. After many exercises, the techniques and scenes may be listed. Assuming that the builder of a certain scene has not yet completed the ability, the blue army exercises the same posture 100 times, and adds 100 undiscovered exercise cases, which is not more helpful to the builder. Therefore, it is also a good evaluation index to take out the built-up coverage of known attack tactics.
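The difference between the two metrics shows up clearly with repeated exercises. In this minimal sketch, exercise runs are (technique, detected) pairs and the technique IDs are hypothetical: repeating one undetected technique 100 times drags the per-run detection rate toward zero, while known-scenario coverage depends only on which distinct techniques have detection built.

```python
from __future__ import annotations


def exercise_detection_rate(runs: list[tuple[str, bool]]) -> float:
    """Share of red team exercise runs that were detected; every run counts."""
    if not runs:
        raise ValueError("no exercise runs")
    return sum(detected for _, detected in runs) / len(runs)


def known_scene_coverage(known: set[str], built: set[str]) -> float:
    """Share of catalogued attack techniques that have detection capability built."""
    if not known:
        raise ValueError("no known techniques")
    return len(known & built) / len(known)


if __name__ == "__main__":
    runs = [("t1", False)] * 100 + [("t2", True)]  # same undetected technique, 100 times
    print(round(exercise_detection_rate(runs), 4))  # 0.0099
    print(round(known_scene_coverage({"t1", "t2", "t3"}, {"t2"}), 4))  # 0.3333
```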
The intrusion detection team focuses on the priority evaluation and rapid coverage of known attack techniques, and has its own professional judgment on the degree of construction to meet the needs (refer to the "cost performance" principle in the intrusion detection principle).
The basic acceptance principle is to announce that the intrusion detection capability of a scenario has been built:
- The scenario produces fewer than x work orders per day on average, with a peak below y. (Currently, all scenarios combined average fewer than XX work orders per day, with a peak below YY.) Strategies that exceed these limits are not accepted, because too many alarms bury the useful information and even interfere with capabilities that already work, so such a scenario is considered not yet able to hold up in real confrontation.
- Only the first occurrence of an event raises an alarm; repeated occurrences are aggregated automatically.
- The strategy can learn from false alarms on its own.
- Alarms are readable: they carry a clear risk description, key information, handling guidance, and auxiliary information or indexes that make triage easy. Alarms made of raw key-value pairs are discouraged; natural language is recommended for describing the core logic and the response process.
- There is clear documentation and a self-test report (just as when delivering an R&D product, documentation and a self-test process are what guarantee quality).
- There is a Blue Army live-fire acceptance report for the scenario.
- Sending alarms by directly calling WeChat, SMS or similar interfaces is discouraged. (The difference between an alarm and an event is that an event can be tracked to closure, whereas an alarm is just a reminder.) A unified alarm and event framework can manage events effectively, ensure each one is closed out, and provide long-term operational data such as loss-containment efficiency and false alarm volume and rate.
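Two of the acceptance criteria above, the volume limit and first-occurrence alarming, are mechanical enough to sketch. This is a minimal illustration only: `max_avg` and `max_peak` stand in for the unspecified x and y, and treating (scenario, host, indicator) as "the same event" is an assumption; real aggregation keys depend on the scenario.

```python
from dataclasses import dataclass, field
from statistics import mean


def meets_volume_criteria(daily_counts: list[int], max_avg: float, max_peak: int) -> bool:
    """Volume check: daily average below x (max_avg) and peak below y (max_peak)."""
    return mean(daily_counts) < max_avg and max(daily_counts) < max_peak


@dataclass
class AlarmAggregator:
    """Alarm on the first occurrence of an event; fold later repeats into a counter."""
    counts: dict[tuple[str, str, str], int] = field(default_factory=dict)

    def observe(self, scenario: str, host: str, indicator: str) -> bool:
        """Record one occurrence; return True only if it should raise a new alarm."""
        key = (scenario, host, indicator)  # hypothetical notion of "same event"
        is_first = key not in self.counts
        self.counts[key] = self.counts.get(key, 0) + 1
        return is_first


agg = AlarmAggregator()
print(agg.observe("webshell", "web-01", "/var/www/upload/x.php"))  # True: new alarm
print(agg.observe("webshell", "web-01", "/var/www/upload/x.php"))  # False: aggregated
print(meets_volume_criteria([3, 5, 4], max_avg=5, max_peak=10))    # True
```

Keeping the repeat counter, rather than discarding repeats, preserves the data needed later for the false alarm volume and rate statistics mentioned above.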
The strategist's documentation should state in which situations the current model can detect the attack and under which conditions it will not alarm (a test of the author's understanding of both the scenario and the model). Based on the criteria above, the strategy's maturity can be self-rated as a rough score from 0 to 100. A single scenario rarely reaches 100, but that is fine, because the marginal cost of going from 80 to 100 can be very high. Rather than chasing perfection, weigh the whole picture and consider whether to move quickly on to the next scenario.
If real attacks keep occurring in a scenario rated below full marks and no overlapping strategy compensates, the self-assessment may need to be revisited and the acceptance criteria tightened. At the very least, scenarios seen in real cases should be given priority.
Key factors affecting intrusion detection
When discussing what affects intrusion detection, we can simply look at past failures that kept the defenders from discovering an intrusion on their own:
- Dependent data was lost: for example, the HIDS was never deployed, the agent hung, the reporting process dropped data or had a bug, or data was lost in the backend transmission chain.
- A bug kept the strategy script from running (in effect, we had lost that strategy's detection without knowing it).
- We had not built the corresponding strategy (even after intrusions in this scenario had happened many times, the strategy still had not been built).
- The strategy was not sensitive or mature enough (for example, a scan stayed below the detection threshold, or a webshell used obfuscation to evade detection).
- Some of the basic data the model depends on was wrong, leading to a wrong judgment.
- The alarm fired, but the person responsible for triage did not follow up, or the auxiliary information was not enough to reach a verdict, so no action was taken.
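The first failure above, data lost because agents were never deployed or have hung, can be caught with a simple coverage check. A minimal sketch, assuming a hypothetical asset inventory and a per-host "last record received" timestamp; the 10-minute silence limit is an arbitrary placeholder.

```python
from datetime import datetime, timedelta


def find_blind_spots(
    inventory: set[str],
    last_heartbeat: dict[str, datetime],
    now: datetime,
    max_silence: timedelta = timedelta(minutes=10),
) -> tuple[set[str], set[str]]:
    """Return (hosts with no agent data at all, hosts whose agent went silent).

    inventory: every host that should be covered (e.g. from an asset database).
    last_heartbeat: host -> time of the latest record received from its agent.
    """
    missing = inventory - set(last_heartbeat)  # never reported: not deployed?
    silent = {
        host
        for host, seen in last_heartbeat.items()
        if host in inventory and now - seen > max_silence  # agent hung or data lost
    }
    return missing, silent


now = datetime(2024, 1, 1, 12, 0)
print(find_blind_spots(
    {"web-01", "web-02", "db-01"},
    {"web-01": now - timedelta(minutes=2), "web-02": now - timedelta(hours=3)},
    now,
))  # ({'db-01'}, {'web-02'})
```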
So for an intrusion to actually be caught, the detection system has to run for a long time with high quality and high availability. This is highly specialized work, beyond the ability and willingness of most security engineers. We therefore suggest assigning dedicated operators responsible for the following goals:
- Completeness of data collection (end-to-end reconciliation of the whole pipeline).
- Every strategy works correctly at all times (automated dial-test monitoring).
- Accuracy of basic data.
- Usability of the work order operations platform and of the tracing tools.
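The first two goals can be sketched roughly as follows. `reconcile` compares record counts for the same time window at each hop of the pipeline, and `dial_test` feeds each strategy a synthetic known-bad event and reports the ones that do not fire. Both are simplified, in-process illustrations with hypothetical names and data; a production dial test would inject test events into the real pipeline and check that alarms arrive end to end.

```python
from typing import Callable


def reconcile(counts_by_hop: dict[str, int], tolerance: float = 0.001) -> list[str]:
    """Compare record counts for one time window at each hop of the pipeline.

    counts_by_hop must be ordered from source to sink (dicts keep insertion order).
    Returns a message for every hop where the loss ratio exceeds `tolerance`.
    """
    hops = list(counts_by_hop.items())
    problems = []
    for (prev_name, prev), (name, cur) in zip(hops, hops[1:]):
        if prev and (prev - cur) / prev > tolerance:
            problems.append(f"{prev_name} -> {name}: lost {prev - cur} of {prev}")
    return problems


def dial_test(strategies: dict[str, Callable[[dict], bool]], probes: dict[str, dict]) -> list[str]:
    """Feed each strategy a known-bad synthetic event; return the ones that stay silent."""
    silent = []
    for name, detect in strategies.items():
        probe = probes.get(name)
        try:
            fired = probe is not None and detect(probe)
        except Exception:  # a crashing strategy is as blind as a silent one
            fired = False
        if not fired:
            silent.append(name)
    return silent


print(reconcile({"agent_sent": 1_000_000, "collector_received": 1_000_000, "stored": 990_000}))
# ['collector_received -> stored: lost 10000 of 1000000']

strategies = {
    "reverse_shell": lambda e: "bash -i" in e["cmd"],
    "port_scan": lambda e: e["port_count"] > 1000,  # bug: probe uses "ports"
}
probes = {"reverse_shell": {"cmd": "bash -i"}, "port_scan": {"ports": 5000}}
print(dial_test(strategies, probes))  # ['port_scan']
```

The second strategy illustrates the "script bug, strategy silently dead" failure listed earlier: it raises a `KeyError` on every event, and only an active probe reveals it.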
Some readers may object: isn't the key factor in intrusion detection the effectiveness of the model? What is all this other stuff?
In fact, the intrusion detection system of a large Internet company may ingest hundreds of terabytes of data per day, or even more. It involves dozens of business modules and hundreds of machines; by these numbers it is no smaller than the entire data center of some small and medium-sized companies. Keeping such a complex system highly available over the long term requires professional support from SREs, QA and other supporting roles. If it relies on just a few security engineers, it is hard for them, while also researching attack and defense, to keep up with the quality of basic data, the availability and stability of services, disciplined change management at release time, all the operational metrics, and timely response to operations failures. The end result is that even intrusions within the system's detection capability keep going undetected because of one mishap or another.
The author therefore believes that most security teams are held back by poor operational quality, not yet by strategy (technology). Of course, once there are resources for this supporting work, intrusion detection really does come down to the quality of the strategies.
At that point, with so many attack techniques out there, why build this scenario first? Why do we consider the current level of construction enough for present needs? How do we decide to detect some samples and give up on countering others?
These seemingly subjective decisions put professional judgment to a real test. They also make it easy to be accused in front of leadership of lacking ownership, of looking for excuses instead of ways to reach the goal: "Hackers have used this technique against us many times, so why can't it be solved?" "That technique was already on the radar, so why won't it be solved until next year?"
How do we discover APTs?
APT stands for advanced persistent threat. Since it is "advanced", its Trojans are likely to evade antivirus (they cannot be found by antivirus software or common signatures), the vulnerabilities it exploits may be ones no amount of hardening would have blocked, and its attack techniques are advanced too (we may never have seen the attack scenario before).
So in practice, APT means roughly the same thing as an undetectable intrusion. Yet the industry has no shortage of APT detection products, and solution vendors make a living from them. What do they actually do?
- For Trojans that evade antivirus, they use sandboxes plus manual analysis. Even though this is less efficient, they still try to reach a verdict and quickly share the IOCs (threat intelligence) with other customers, so that once a single case is found, customers worldwide gain the same detection ability.
- For encrypted or obfuscated traffic, they use anomaly detection models to identify suspicious, previously unknown IP relationships and payloads. Of course, after identification, operators must follow up carefully to determine what it really is.
- For advanced attack techniques, they still assume the attacker will gain execution through known techniques such as spear phishing and watering-hole attacks, then collect logs from email attachments, PC endpoints and other links to analyze user behavior, using UEBA to find unusual user actions.
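At the consuming end, the IOC sharing in the first item comes down to matching events against a feed of known-bad values. A minimal sketch with hypothetical field names; the IPs and domains are reserved documentation addresses, not real indicators.

```python
def match_iocs(events: list[dict], iocs: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (host, field, value) for every event field that appears in the IOC feed.

    iocs maps a field name (e.g. "sha256", "remote_ip", "domain") to known-bad values.
    """
    hits = []
    for event in events:
        for field_name, bad_values in iocs.items():
            value = event.get(field_name)
            if value is not None and value in bad_values:
                hits.append((event["host"], field_name, value))
    return hits


feed = {"remote_ip": {"203.0.113.7"}, "domain": {"update.example.net"}}
events = [
    {"host": "pc-0142", "remote_ip": "203.0.113.7", "domain": "cdn.example.com"},
    {"host": "pc-0777", "remote_ip": "198.51.100.20"},
]
print(match_iocs(events, feed))  # [('pc-0142', 'remote_ip', '203.0.113.7')]
```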
So what about us? We have no good way to find the legendary antivirus-evading Trojan, but we can extract characteristics of the samples and behaviors produced by known attack frameworks (such as Metasploit and Cobalt Strike). We can also assume that an attacker already controls some machine; when they try to move laterally, we have models that can recognize lateral movement between hosts.
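The article does not describe its lateral movement models, so the following is only a generic illustrative heuristic, not the author's method: flag a host that, within a short window, logs in to several destinations it never accessed during a baseline period. The threshold, window and data layout are all assumptions.

```python
from collections import defaultdict, deque


def lateral_movement_suspects(
    baseline_pairs: set[tuple[str, str]],
    logins: list[tuple[int, str, str]],
    window_seconds: int = 3600,
    threshold: int = 5,
) -> set[str]:
    """Flag source hosts that log in to many never-before-seen destinations quickly.

    baseline_pairs: (src, dst) login pairs observed during a learning period.
    logins: (unix_ts, src, dst) records; processed in timestamp order.
    """
    recent: dict[str, deque] = defaultdict(deque)  # src -> (ts, dst) of novel logins
    suspects = set()
    for ts, src, dst in sorted(logins):
        if (src, dst) in baseline_pairs:
            continue  # routine access, e.g. a jump host reaching its usual servers
        window = recent[src]
        window.append((ts, dst))
        while ts - window[0][0] > window_seconds:
            window.popleft()  # forget novel logins older than the window
        if len({d for _, d in window}) >= threshold:
            suspects.add(src)
    return suspects


baseline = {("jump-01", "web-01")}
logins = [(0, "jump-01", "web-01")] * 20 + [(60 * i, "web-07", f"db-0{i}") for i in range(1, 6)]
print(lateral_movement_suspects(baseline, logins))  # {'web-07'}
```

A web server suddenly reaching five database hosts it has never touched within minutes stands out, while the jump host's routine traffic does not. A patient attacker who spreads out their moves can slip under such a threshold, which is one reason to layer several detections.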
In my view, no method in the world finds APTs 100% of the time. But we can wait for the team behind the APT to make a mistake. As long as our defenses are deep enough and we hold information the attacker does not have, it is very hard for them to avoid tripping every one of our alarms.
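The "enough depth" argument can be made concrete with a rough calculation. It rests on a strong simplifying assumption of ours, that detection layers fail independently; in reality they are often correlated, which makes evasion easier than this suggests. If an attacker must get past n layers and layer i catches them with probability p_i:

```latex
P_{\text{evade all}} = \prod_{i=1}^{n} (1 - p_i),
\qquad
\text{e.g. } n = 5,\ p_i = 0.3:\quad 0.7^{5} \approx 0.168
```

Under these illustrative numbers, five individually weak alarms that each catch only 30% of attempts would together catch the attacker about 83% of the time.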
Even if the attacker manages to carefully sidestep all of our detection logic, having to do so acts as a psychological deterrent and may slow their approach to the target for a long time. And the moment they make a mistake, it is our turn.
All of the demanding standards above (high coverage, low false alarm rates, following every alarm to the very end) and the attitude of "digging three feet into the ground" exist for exactly this moment. The sense of achievement when you catch a worthy opponent is worth remembering.
So I hope every security colleague working on intrusion detection can keep at it. Even after hearing "the wolf is coming" countless times, they can still meet the next alarm, and the opponent behind it, with the utmost respect ("Alarms have abused me a thousand times, yet I await each one like a first love").
The right role for AI in intrusion detection
In the past two years, it seems no story has been complete without AI. But with the AI concept so popular, many people have put traditional data mining and statistical analysis, including classification, prediction, clustering and association algorithms, under the AI umbrella.
In fact, AI is a modern method that delivers very practical results in many areas. Take text analysis of webshells as an example: it can take us a long time to separate the dozens of technique types hidden among thousands of samples, and even longer to build models for them one by one (yes, in this kind of scenario, feature engineering really does take that long).
With AI, doing the data labeling, training and parameter tuning well quickly yields a model that fits the lab environment reasonably well and can be put into production fast. A team with some experience can do it in one to two months.
In this scenario, AI as a modern method can greatly improve efficiency. The problem, as mentioned earlier, is that real attack samples and webshell samples are extremely scarce and cannot fully describe the characteristics of an intrusion. So what AI produces, both its false positive rate and its false negative rate, is heavily affected by the training method and the input samples. We can use AI, but we can never hand the job over to it entirely.
A common phenomenon in security is that labeling is hard to solve with mathematical models. In such cases, security experts usually need to lead and algorithm experts follow, rather than leaving algorithm experts to "fight alone".
For a specific attack scenario, the feature extraction process (how we collect the relevant intrusion data and how we think about what distinguishes the intrusion from normal behavior) often determines the model's final effectiveness. Features set the ceiling; the algorithm only determines how close you get to it.
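To make "features set the ceiling" concrete for the webshell case, here is a sketch of a few hand-crafted text features. The feature choices and the function watch list are illustrative assumptions, not the feature set the author's team used; they show the kind of domain knowledge security experts feed into a model.

```python
import math
import re
from collections import Counter

# Hypothetical watch list of PHP functions often abused by webshells.
SUSPICIOUS_CALLS = ("eval", "assert", "base64_decode", "gzinflate", "str_rot13", "system", "shell_exec")


def shannon_entropy(text: str) -> float:
    """Bits per character; packed or encoded payloads tend to score high."""
    if not text:
        return 0.0
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def webshell_features(source: str) -> dict[str, float]:
    lines = source.splitlines() or [""]
    return {
        "entropy": shannon_entropy(source),
        "max_line_len": max(len(line) for line in lines),  # long one-liners
        "suspicious_calls": sum(
            len(re.findall(rf"\b{re.escape(name)}\s*\(", source)) for name in SUSPICIOUS_CALLS
        ),
        "user_input_refs": len(re.findall(r"\$_(?:GET|POST|REQUEST|COOKIE)\b", source)),
    }


sample = '<?php @eval($_POST["x"]); ?>'
print(webshell_features(sample))
```

Any classifier trained on these four numbers can only be as good as they allow: an obfuscated sample that avoids all listed calls and has ordinary entropy is invisible to it, no matter how well the algorithm is tuned.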
The author once saw a case in which an AI team produced a webshell model that performed excellently in the lab, with a false positive rate of 1 in 1,000,000. Put into production, however, it averaged 6,000 alarms per day, many of them false, which was completely unworkable. Through the joint efforts of the security team and the AI engineers, these problems were gradually resolved, but the model never succeeded in replacing the original feature-engineering model.
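The gap between "one in a million" and 6,000 alarms a day is easier to see with a little arithmetic. The article does not give the production scan volume, so this sketch assumes a hypothetical 10 million benign files per day and, for simplicity, treats all 6,000 daily alarms as false (the text says many, not all, were false, so the real production rate would be somewhat lower).

```python
def expected_false_alarms(fpr: float, daily_benign_volume: int) -> float:
    """False alarms per day predicted by a lab false positive rate."""
    return fpr * daily_benign_volume


def implied_fpr(daily_false_alarms: int, daily_benign_volume: int) -> float:
    """False positive rate actually observed in production."""
    return daily_false_alarms / daily_benign_volume


N = 10_000_000  # hypothetical: benign files scanned per day
lab_fpr = 1e-6
print(expected_false_alarms(lab_fpr, N))  # about 10 per day predicted
prod_fpr = implied_fpr(6000, N)           # treating all 6,000 alarms as false
print(prod_fpr, prod_fpr / lab_fpr)       # about 6e-4, roughly 600x the lab figure
```

Under these assumptions the model's production false positive rate would be hundreds of times its lab figure, consistent with the point that scarce, unrepresentative training samples make lab metrics unreliable.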
There are many AI products and articles in the industry today, but unfortunately most of them are superficial and have not proven themselves in real operations. Hold them to the standards described earlier and you will find that AI is a good thing, but still very much a "semi-finished product". Real operations usually need traditional feature engineering and AI working in parallel, with continuous iteration.
The future will surely belong to AI, but for every bit of artificial intelligence there may be an equal amount of human labor behind it. I hope to keep exploring this road and sharing what I learn with colleagues.
About Meituan Security
Most core members of the Meituan Security Department have many years of hands-on experience in the Internet and security fields. Many have helped build the security systems of large Internet companies, including security operations experts with global experience and attack-and-defense experience across IDCs with millions of servers. The department also has CVE "hunters", speakers invited to top international conferences such as Black Hat, and, of course, many talented women in operations.
Meituan Security currently works across penetration testing, web protection, binary security, kernel security, distributed development, big data analysis and security algorithms, as well as global compliance and privacy-protection policy. We are building an adaptive security system for a mobile office network covering IDCs with millions of servers and hundreds of thousands of connected endpoints. Built on a zero-trust architecture and spanning multiple cloud infrastructures, it covers the network layer, the virtualization/container layer, the server software layer (kernel and user space), the language virtual machine layer (JVM / JS V8), the web application layer and the data access layer, and includes a fully automated security event awareness system based on big data and machine learning. Our aim is to build the industry's most cutting-edge built-in security architecture and defense-in-depth system.
As Meituan grows rapidly and its business becomes ever more complex, the security team faces more opportunities and challenges. We hope to deliver more security projects that represent industry best practice, offer a broad development platform for security practitioners, and keep exploring emerging areas of security.
You are welcome to join Meituan's security technology discussion group and talk with the author directly. To join, add our colleague Camry on WeChat (WeChat ID: mtdptech01), and we will add you to the group.