Hacking Book | Free Online Hacking Learning


Building an Open Source SIEM Platform for Enterprise Security Construction (Middle)

Posted by chiappelli at 2020-02-27


SIEM (Security Information and Event Management), as the name implies, is a management system for security information and events. Commercial SIEM systems are not cheap, and most enterprises cannot afford them. Based on the author's experience, this article introduces how to use open-source software to analyze data offline and identify attack behavior through attack modeling.

Review system architecture

Taking the database as an example: the MySQL query logs are collected with Logstash, backed up to an HDFS cluster in near real time, and analyzed offline for attack behavior with Hadoop scripts.

Database log collection

There are three common ways to collect database logs:

Mirror mode

Most database audit products support this mode: mirror the database traffic, decode the database protocol, reassemble the SQL statements, and extract the SQL logs.

Proxy mode

The typical example is a DB proxy; Baidu, Sohu, Meituan, Jingdong (JD) and others have open-sourced related products. The front end accesses the real database servers at the back end through the DB proxy, so SQL logs can be collected directly on the DB proxy.

Client mode

A client (agent) is installed on the database server to collect SQL logs; a typical approach is to collect the logs with Logstash. This article explains the client mode; the other modes are essentially similar.

Logstash configuration


Download Logstash from https://www.elastic.co/downloads/logstash (version 5.2.1 at the time of writing).

Enable the MySQL query log

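The general query log is disabled by default. A minimal sketch of enabling it in my.cnf (MySQL 5.1+; the log path here simply matches the Logstash input used in this article):

```ini
[mysqld]
# Log every statement the server receives to a file
general_log      = 1
general_log_file = /var/log/mysql/mysql.log
log_output       = FILE
```

It can also be toggled at runtime with SET GLOBAL general_log = 'ON'; without a restart. Keep in mind the performance caveat discussed at the end of this article.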

Configure logstash

localhost:work maidou$ cat mysql.conf

input {
    file {
        type => "mysql_sql_file"
        path => "/var/log/mysql/mysql.log"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

output {
    kafka {
        # Logstash 5.x renamed broker_list to bootstrap_servers and
        # compression_codec to compression_type
        bootstrap_servers => "localhost:9092"
        topic_id => "test"
        compression_type => "snappy" # one of ["none", "gzip", "snappy", "lz4"], default: "none"
    }
}


Run logstash

bin/logstash -f mysql.conf

Examples of logs

2017-02-16T23:29:00.813Z localhost 170216 19:10:15         37 Connect   root@localhost on

2017-02-16T23:29:00.813Z localhost                 37 Quit

2017-02-16T23:29:00.813Z localhost                 38 Connect   root@localhost on

2017-02-16T23:29:00.813Z localhost                 38 Query     SHOW VARIABLES LIKE 'pid_file'

Splitting fields

The simplest approach is to skip field splitting altogether. If you want to automatically split out fields such as database name and time, refer to:

Grok syntax


Grok syntax debugging
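If you do want to split fields without grok, a rough sketch in Python; the regex and field names here are my own assumptions based on the general-log sample above:

```python
import re

# Rough shape of a MySQL general query log entry, e.g.:
#   "170216 19:10:15    37 Query    SHOW VARIABLES LIKE 'pid_file'"
# The timestamp is printed only when it changes, so it is optional here.
LOG_RE = re.compile(
    r"^(?P<time>\d{6}\s+\d{1,2}:\d{2}:\d{2})?\s*"
    r"(?P<thread_id>\d+)\s+"
    r"(?P<command>Connect|Query|Quit|Init DB)\s*"
    r"(?P<argument>.*)$"
)

def parse_line(line):
    """Return a dict of fields, or None if the line is not a log entry."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None
```

For example, parse_line("170216 19:10:15    37 Query    SELECT 1") yields command "Query" and argument "SELECT 1".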


Common attack features

Build a test range with the commonly used wavsep; see my other article, a guide to building a wavsep-based range: http://www.freebuf.com/sectool/125940.html

Scan the links with sqlmap

Analyze the attack characteristics; two are given here, and readers are encouraged to summarize more attack features themselves.

Feature 1


Enumerating data with a UNION query produces a large number of NULL fields.
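For illustration, a hypothetical log line from union-based enumeration, checked with a short Python snippet; the regex keys on three or more consecutive "null," fields, the same pattern used in the detection rule later in this article:

```python
import re

# Hypothetical log line produced by union-based enumeration
line = ("46 Query SELECT username FROM users WHERE id=1 "
        "UNION ALL SELECT NULL,NULL,NULL,NULL,NULL-- ")

# Feature 1: three or more consecutive "null," fields
union_null = re.compile(r"(null,){3,}", re.IGNORECASE)

print(bool(union_null.search(line)))  # True for the line above
```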

Features 2 and 3

When enumerating the database structure, information_schema is used; in addition, some scanners use "GROUP BY x)a".

2017-02-16T23:29:00.998Z localhost    46 Query SELECT username, password FROM users WHERE username='textvalue' AND (SELECT 7473 FROM(SELECT COUNT(*),CONCAT(0x7171716271,(SELECT (CASE WHEN (8199= 8199) THEN 1 ELSE 0 END)),0x717a627871,FLOOR(RAND(0)*2))x FROM INFORMATION_SCHEMA.PLUGINS GROUP BY x)a)-- LFpQ' AND password='textvalue2'

Hadoop offline processing

Hadoop is based on the map and reduce model.

A simplified understanding is:

cat data.txt | ./map | ./reduce

In the most simplified case, we can develop only the map program and match attack behavior by processing the log data line by line in the map.

The mapper below is developed in Perl; a Python version would be similar.

#!/usr/bin/perl -w
use strict;

# Detection rule covering the features above: union null padding,
# information_schema access, and the "GROUP BY x)a)" scanner pattern
my $rule = "(null,){3,}|information_schema|GROUP BY x\\)a\\)";

while ( my $line = <STDIN> ) {
    chomp $line;
    # Print the matching log line as an alert (case-insensitive match)
    if ( $line =~ /$rule/i ) {
        print "$line\n";
    }
}
Just run it under Hadoop.
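"Run it under Hadoop" here means Hadoop Streaming, which turns any program that reads stdin and writes stdout into a mapper. A sketch of the invocation (the jar location and HDFS paths are assumptions for your cluster):

```shell
# Map-only streaming job (reduces set to 0): every line the Perl
# mapper prints becomes an alert record in the output directory.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -D mapreduce.job.reduces=0 \
    -input /siem/mysql_logs \
    -output /siem/mysql_alerts \
    -mapper map.pl \
    -file map.pl
```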

Production environment

The rules in a production environment will be much more complex than this and need to be supplemented continuously; these are only examples.

If only the map is written, there will be a large number of duplicate alarms; a reduce needs to be developed for aggregation.
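A minimal sketch of such a reduce in Python, assuming the mapper simply prints each matching log line. Hadoop Streaming sorts mapper output before the reduce stage, so identical alert lines arrive adjacent to each other:

```python
import sys
from itertools import groupby

def reduce_alerts(lines):
    """Collapse duplicate alert lines into (line, count) pairs.

    Because streaming input is sorted by key, identical lines are
    adjacent and groupby is sufficient for the aggregation.
    """
    stripped = (line.rstrip("\n") for line in lines)
    return [(key, sum(1 for _ in group)) for key, group in groupby(stripped)]

if __name__ == "__main__":
    # Emit one "alert<TAB>count" record per distinct alert line
    for alert, count in reduce_alerts(sys.stdin):
        print("%s\t%d" % (alert, count))
```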

For incident response, you need to know which database was injected and which account was used; these fields need to be added when Logstash splits the log.

For incident response, it is also better to know the link (URL) corresponding to the SQL injection, which requires correlating the web access logs with the SQL logs. A relatively mature scheme is to learn a time-based correlation matrix with machine learning.

Collecting SQL data directly on the client requires MySQL to enable the query log, which has a large performance impact on the server. The large companies I know of mainly access databases through a DB proxy, so it is recommended to collect the data on the DB proxy.

Rule-based recognition of SQL injection has a bottleneck: although it is an improvement over detection at the web-log and traffic levels, SQL semantic analysis is the inevitable next step.


Mining attacks based on sandboxes and algorithms will be covered in the next part.