Hacking Book | Free Online Hacking Learning


internal threat detection method based on dynamic change of user behavior

Posted by herskovits at 2020-03-15

*Original author: Mu Qianzhi

Written in front

The threat detection method based on user behavior modeling (profile) has not been a new thing for a long time. As early as 2008, in a review [1], the author listed a large number of existing research examples of active detection.

However, there are still few mature internal threat active detection systems in the real security industry today. One of the biggest reasons is that the false alarm rate of detection methods proposed by the academic community is too high. Although it seems to be only 1% - 2%, for tens of thousands of daily logs, the workload of security administrator / audit administrator is quite heavy, resulting in Now, although the audit system has been deployed in the organization, the current situation that the administrator doesn't basically look at the alarm of the day is not that they don't want to look at it, but that they really don't know what to look at. The human resources can't really see it. Therefore, the high false alarm rate is one of the biggest constraints to the practical application of active detection.

So today we will introduce a method to reduce the false positive rate of active detection. This method considers the change of user behavior, and thus establishes a dynamic classifier.


1. introduction

2. Dynamic classifier

3. Internal threat detection: non sequential data

4. Internal threat detection: sequence data

5. Hadoop / MapReduce application

6. conclusion

7. References

I. Introduction

The false alarm of internal threat response system based on active detection mainly comes from judging the normal and reasonable behaviors of users as abnormal behaviors. Here, the comparison standard of abnormal behaviors is the user behavior pattern learned from the training set. The traditional research focuses on extracting and selecting better behavior features and classification algorithms to improve the detection accuracy and reduce the false alarm rate; today we focus on the judgment criteria, that is, the user behavior pattern obtained by learning first.

Traditional research has ignored a problem: the behavior of users will change with time, and it is reasonable to change. When we regard the user's behavior data as a data flow under a huge time window, we will find this significant change. We will introduce more details in the next section.

Today, we mainly build classifiers to adapt to dynamic changes from the perspective of user behavior changes. According to the different use of data sets, we analyze non sequence data and sequence data respectively. Finally, we deploy the analysis task in the distributed processing frame based on Hadoop / MapReduce.

2、 Dynamic classifier

As we mentioned in the introduction, when we think of the user's data as a data flow that changes with time, we will find that the user's behavior changes naturally. This change may come from the change of user's own computer usage habits (such as reading news / checking email before going to work every day), or it may come from the change of organization workflow (such as the change of approval process). In any case, with the progress of time, the user's own behavior can't be the same. If the reasonable change is not included in the consideration of detection system, it will be inevitable It leads to false and missed reports.

In order to vividly illustrate the change of user behavior, we give figure 1 to illustrate:


In Figure 1, the impact of pattern migration on the system is shown in detail. The black dot represents the exception, the blank dot represents the normal, and the implementation represents the real exception / normal boundary, while the dotted line represents the boundary divided by the traditional classifier.

We can see from it that there are only two real exceptions in data block 2 compared with data block 1, while the detection system has identified four, so two are false positive. Compared with 1, there are 6 exceptions in data block 3, but the detection system thinks there are only 3 exceptions, that is, false negative.

In order to deal with the above problems, a feasible solution is to introduce a real-time and automatic updating classifier, which can automatically update according to the user's behavior changes. To achieve this goal, the following framework can be used, as shown in Figure 2:


The framework shown in Figure 2 is essentially a K voting framework, that is, the final decision result is a combination of K classifier results. Whenever a new data block (a certain time window data of the user) is analyzed, a global K model update algorithm is started, and a new classifier is calculated by using OCSVM or gbad After comparing the accuracy of the K + 1 classifier in the latest user behavior data block, the worst classifier is eliminated and a new global K model is obtained.

The specific pseudo code of global K model update algorithm is shown in Figure 3:


Briefly explain the meaning of Figure 3 below:

1. Du is the latest marked data block (i.e. normal / abnormal known);

2. For each model m in the existing K-model a, test the accuracy of m-model for Du determination, that is, test (m, Du);

3. A new model Mn is generated for Du by using single SVM algorithm;

4. Test whether the judgment of Mn on Du is correct, i.e. test (Mn, Du);

5. Discard the wrong model from {mnua} and get the updated K model a '.

In this way, we realize the dynamic real-time automatic update of the global K model. Next, we need to adopt different classification algorithms for different data.

3、 Internal threat detection: non sequential data

1. Non sequential data

Our non sequence data here is homologous with the famous intrusion detection data set of Lincoln Laboratory in 1998, namely the famous KDD99, and all come from the MIT'98 data set. This data set has been used for many times in the field of intrusion detection. Among them, the KDD99 data obtained by tcpdump shows a non sequential form and a feature vector. See Figure 4 for details:

Each dimension has its own meaning, for example, the first represents the connection duration, the second represents the protocol type, and the third represents the network service type of the target host.

For KDD99 data download, please refer to the link.

For more detailed analysis of KDD99, please refer to the link.

Classification of MIT'98 dataset:

Fig. 17:

The data of BSM (basic security model) in MIT'98 is in the form of header + token. Before token, header records the size, version number, system call name involved and execution time of token. The BSM part of the MIT '98 dataset was used in the experiment.

For a further processed header + token, refer to figure 5:


The second line records the complete path of command execution, the attribute line records the user group, file system node and device information, the subject line records the audit ID, valid user group ID, real user and group ID, process ID, session ID, port and address, and the return line records the system call execution result / return value. Finally, the features used are analyzed on this basis, as shown in Figure 6:


2. classifier

2.1 SVM

Here we use two kinds of classifiers, one is supervised learning SVM, the other is unsupervised gbad. We will give a brief introduction.

Support vector machine (SVM) is now widely used as a linear classifier. SVM maps nonlinear data to high-dimensional linear separable space, and then calculates the maximum separating hyperplane. This experiment uses SVM based on radial based function, which is implemented by libsvm.

For more information on SVM, see the link

2.2 GBAD

Graph based anomaly detection mainly analyzes the changes of vertices and edges in the graph to detect anomalies, specifically the changes of vertices and edges modification, addition and deletion.

First, we calculate the canonical subgraph of a graph, that is, to find the subgraph with the minimum description length

     L(S,G) = DL(G | S) + DL(S)   

Where g represents the whole graph, s is the subgraph to be analyzed, DL (g| s) is the description length after G is compressed to s, DL (s) is the description length of graph s, we select the final L (s, g) smallest subgraph. The minimum description length here is a mathematical concept. Please refer to the link for details.

Our drawing is similar to figure 7:


The box represents the normal subgraph, while the shaded part represents the exception, showing three cases of vertex modification (exception point E), vertex insertion (exception point C) and edge modification (exception B-C).

3. experiment

In the experiment, OCSVM is used to test the effect of global K model updating and traditional classification, as shown in Figure 8:


It can be seen that the application of global K model can greatly improve the classification effect and reduce the false alarm rate by nearly half.

The results show that supervised learning is better than unsupervised learning, which is easy to understand. Generally speaking, learning with teachers is better than learning without teachers. As shown in Figure 9:


4、 Internal threat detection: sequence data

The above introduces the analysis method of non sequential data, which is in fact the general data analysis in the form of eigenvectors. Let's talk about sequence data. The so-called sequence data refers to the orderly arrangement of elements, which can be repeated. We are using a student project at the University of galgary in 1988, which contains command log files for 168 UNIX users. A typical sequence example is like a student moving a place every day, as shown in Figure 10:

The location is simplified, such as ml = Media Lab.

1. Analysis method

Here we get some command behavior data of the user's computer, as shown in Figure 11:


We build a classifier by analyzing the sequence pattern of the user's commands, specifically:

1. First, simplify the user's commands, such as l for LS command, C for CP command, f for find command and CP for CPP command;

2. Use LZW algorithm to build LZW dictionary for each user. The full name of LZW algorithm is Lempel zivwelch algorithm, which is used to traverse all combination patterns;

3. Calculate the weight of each mode for the obtained LZW dictionary, and specifically use the frequency representation of each mode, as shown in Figure 12:


4. Filter LZW dictionary to get the most representative combination. For each mode, calculate the "representative value" of its own and its adjacent modes, which is equal to the product of mode length and weight, that is, VP = length (pattern) x wi, and repeat continuously to get a representative dictionary;

5. For all global time windows, calculate their respective representative dictionaries. When the number of windows is n, e = RD1, RD2 RDn, RD = Representive Dictionary;

6. If a user's command mode deviates 30% from RDI in E, it is judged as abnormal;

Figure 13 of the analyzed architecture:


2. Experimental results

In the experiment, compared with naive Bayes classifier (NB), the true positive rate and false positive rate are used as indicators. The experiment shows that LZW method is better than NB method under the two indexes. See chart 14:


5、 Hadoop / MapReduce application

LZW dictionary needs to be constructed in sequence data analysis, but for the user's command mode, the workload is very huge, and the general server efficiency is very low. Therefore, the distributed computing platform based on Hadoop / MapReduce is considered. Specifically, the calculation and analysis tasks are programmed with MapReduce model, and the tasks are distributed on the Hadoop platform. Figure 15 of specific architecture:


Six, conclusion

Traditional internal threat detection focuses on data feature extraction and classifier construction, trying to improve detection rate and reduce false alarm rate through these two aspects. However, in practice, the false alarm rate is still too high to be effectively promoted. Today, the method we introduced tries to optimize and complete from the overall idea, including the changes of user behavior that were not considered before but played a key role in the design of the overall solution, so as to obtain the improvement of the traditional detection framework.

The most important contribution of this paper is naturally the global K model, which is embodied in the realization of real-time automatic update and changes in line with the user's behavior; the essence of K model is a K voting framework, that is, the combination learning in machine learning, a combination of multiple classifiers. With the help of the global K-model, this method can automatically track the changes of users' behavior in real time, update the classifier constantly, and make the distance between the classifier and the reality minimum.

In addition to the above work, we also analyze the non sequence data and sequence data respectively. For non sequential data like KDD99, SVM is a good attempt with satisfactory complexity and accuracy. SVM is often used as a standard algorithm to measure other algorithms. It is not a new topic to detect anomalies with the help of graph theory. In this paper, the minimum description distance is used as the key feature to construct a canonical subgraph, and then the user behavior is described from the change of the vertex and edge of the graph.

For sequence data, today we start with the user's command mode, build LZW dictionary to traverse all command modes, and then calculate the most representative mode. If we consider the characteristics of time series, we can also try to use Markov model to build a classifier.

Finally, the overall framework of today's discussion is shown in Figure 16:


1. A Survey of Insider Attack Detection Research, Malek Ben Salem&Shlomo Hershkop, 2008.

2. Evolving insider threat detection stream mining perspective, Pallabi Parveen, etc, 2013.

3. Evolving insider threat detection using stream analytics and big data, P Parveen, 2013.

4. Research on intrusion detection data set KDD cup, Zhang Youxin et al., computer engineering and design, 2010.

*Author: Mu Qianzhi, this article belongs to the original award program of freebuf, which is prohibited to reprint without permission