Using Splunk for intelligence analysis in the early stage of a penetration test

Posted by chiappelli at 2020-04-17

Translated by Prophet from the original author's article: original link

Abstract: when analyzing offensive security data, Splunk has clear advantages over traditional grep for browsing and analyzing data.

Why Splunk instead of ELK?

ELK is an excellent open source project, and with the HELK project by Cyb3rWard0g it becomes even easier to use. In fact, I used ELK for a while before turning to Splunk. With ELK, I found that I had to create a .config file describing the data before I could upload it. This was inconvenient, because for every small dataset I had to specify the corresponding headers and data types. With Splunk, I simply click through the Web UI and upload the CSV file, which makes data upload easy.

I know that many penetration testing teams use ELK for logging and analysis, and I will cover that in another article. ELK is generally not well suited to a quick proof-of-concept environment, the kind used to assess what resources an offensive analysis system would need and what benefits it would bring. In this article, we focus on using Splunk as a log analysis system for rapid visualization and searching of data.

Install Splunk

Installation is very simple. Visit Splunk's website and download the MSI package for Windows, or the package that matches your operating system. My machine has 20 GB of RAM and a 300 GB SSD, and the software runs well on it. Less memory should also be fine, because Splunk does not seem to use much.

We recommend a developer license, which comes with a six-month trial period during which you can index the data you need in Splunk. Indexing simply means importing the data into the database and optimizing it for search.

Data intake

Yes, it's really easy: you can even upload a .json.gz file directly. However, because the Project Sonar file is so large, I used the command splunk add oneshot sonar.json.gz to load it into Splunk. Ingesting the data usually takes a little time, but once it is finished, searches are lightning fast.

If the data is less than 500MB, we can even use the Web UI to upload the data:

Project Sonar

To demonstrate how to work with the forward DNS data from Project Sonar, I decided to examine Splunk's ability to aggregate and make sense of the data.

Content distribution network

Find domain names matching *.cloudfront.net.

General grep command:

Search with Splunk:

Using the grep command to complete the above search takes 13.13 times longer than using Splunk.
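For readers who want to prototype the same filter outside Splunk, here is a minimal Python sketch. It assumes each line of the dataset is a JSON object with a `name` field holding the hostname, and that matching on `name` is what is wanted; the sample records are made up for illustration.

```python
import json


def cloudfront_names(lines):
    """Yield the `name` of every JSON record whose name ends in .cloudfront.net."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        name = record.get("name", "")  # assumed field name
        if name.lower().endswith(".cloudfront.net"):
            yield name


# Made-up sample records, one JSON object per line.
sample = [
    '{"name": "d111111abcdef8.cloudfront.net", "type": "a", "value": "203.0.113.10"}',
    '{"name": "www.example.com", "type": "a", "value": "198.51.100.7"}',
]

if __name__ == "__main__":
    for hostname in cloudfront_names(sample):
        print(hostname)
```

A line-by-line scan like this has to read the whole file on every query, which is the same cost grep pays; an indexed search avoids that.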

Domain search

Get all subdomains of a specific domain:

The process is almost instantaneous, whereas the same search with plain grep usually takes about 10 minutes. More interesting still, we can use Splunk's analysis capabilities to answer questions such as "how many domains share the same host?":

This results in the following:

As you can see from the above figure, all the host names on the right point to a specific server.
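The "how many domains share the same host?" aggregation can be approximated in a few lines of Python. This is only a sketch of the grouping idea, assuming records with `name` (hostname) and `value` (resolved address) fields; the function name is hypothetical.

```python
from collections import defaultdict


def names_by_host(records):
    """Group hostnames by the address they resolve to, busiest host first."""
    hosts = defaultdict(set)
    for record in records:
        hosts[record["value"]].add(record["name"])  # assumed field names
    # Sort by number of distinct names (descending), then by address.
    ranked = sorted(hosts.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(host, sorted(names)) for host, names in ranked]


# Made-up sample data.
records = [
    {"name": "a.example.com", "value": "192.0.2.1"},
    {"name": "b.example.com", "value": "192.0.2.1"},
    {"name": "c.example.org", "value": "192.0.2.2"},
]

if __name__ == "__main__":
    for host, names in names_by_host(records):
        print(host, len(names), ", ".join(names))
```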

In addition, we can map the physical location of the server, for example:

This results in the following:

Sometimes, of course, this may not be useful, but it does give you immediate knowledge of the geographic location of the target server.

If you need to target only the target organization's servers in a specific country, you can use Splunk to filter accordingly:

If necessary, it can even be narrowed down to a specific country.

Get all subdomains

We can also run the following search to get all subdomains. Parsing 1.4 billion results would take too long, so here is an example for a single domain:

Or search for specific subdomains:

This is helpful when you want to look for the same subdomain across the target organization's domain names in order to discover more subdomains.

DomLink domains -> Give me all subdomains

Combined with my tool DomLink, we can take its results and have Splunk give us a complete list of all subdomains belonging to the target organization. Together, the two make this easy.

Run the tool first, and then use the command line flag to output the results to the specified text file:

We now have a list of domains in our output file. We can read these domain names, create a new file whose header is name, and then use a simple regular expression to replace the first field of each domain name with *.
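The transformation just described can be sketched in Python as follows. This is an illustration under assumptions, not the author's exact command: it assumes every input line is a hostname with a leading label (such as www.example.com) and that the lookup file needs a single `name` column; the helper names are hypothetical.

```python
import re


def to_wildcard(domain):
    """Replace the first dot-separated field of a hostname with *."""
    return re.sub(r"^[^.]+", "*", domain.strip())


def build_lookup(domains, header="name"):
    """Return the lookup file content: a header line, then one pattern per line."""
    rows = [header]
    rows += [to_wildcard(d) for d in domains if d.strip()]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Made-up input; in practice these lines would come from the output file.
    print(build_lookup(["www.example.com", "mail.example.org"]), end="")
```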

After the above processing, the content of our file will be as follows:

Save this file to the C:\Program Files\Splunk\etc\system\lookups directory and name it book1.csv. Then, perform the following search:

The results are as follows:

Then we can export these results and import them into other tools.

Password dump

The data used here is the LeakBase BreachCompilation. To get it into Splunk, I followed Outflank's article on processing password dumps with ELK. The script it provides converts the dump data into a space-delimited file, which is very convenient:

I changed the script a little to write the results to disk instead of pushing them directly to ELK / Splunk.

This matters because Splunk cannot ingest space-delimited files, so we have to modify the script provided by Outflank to produce an importable CSV format instead.

Then just run splunk add oneshot input.csv -index passwords -sourcetype csv -hostname passwords -auth "admin:changeme", and you are done!
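As a rough idea of what such a conversion involves, here is a minimal Python sketch. It is not Outflank's script: it assumes each input line is simply `email password` separated by the first space, and the two column names are hypothetical.

```python
import csv
import io


def space_delimited_to_csv(lines, header=("email", "password")):
    """Convert 'email password' lines into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for line in lines:
        line = line.rstrip("\r\n")
        # Split on the first space only, since passwords may contain spaces.
        email, sep, password = line.partition(" ")
        if not sep:
            continue  # skip lines without a separator
        writer.writerow([email, password])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample lines.
    print(space_delimited_to_csv(["alice@example.com hunter2",
                                  "bob@example.com pass,word"]), end="")
```

Using the csv module rather than string concatenation means passwords containing commas or quotes are escaped correctly.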

OK, let's see what the most commonly used password is!

If you are interested in base words, you can use fuzzy matching, for example:

The most common passwords for @facebook.com email addresses:

Next, we present the data in a graphical form:

More interesting still, we can easily cross-check passwords against the email addresses obtained through employee OSINT. Of course, this can also be done with grep.
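The frequency counts in this section boil down to counting passwords, optionally restricted to one email domain. A minimal Python sketch of that idea, assuming the dump has already been reduced to (email, password) pairs; the function name and sample data are hypothetical:

```python
from collections import Counter


def top_passwords(rows, n=10, domain=None):
    """Return the n most frequent passwords, optionally for one email domain."""
    counts = Counter()
    for email, password in rows:
        if domain and not email.lower().endswith("@" + domain.lower()):
            continue  # keep only addresses at the requested domain
        counts[password] += 1
    return counts.most_common(n)


# Made-up sample rows.
rows = [
    ("a@facebook.com", "123456"),
    ("b@facebook.com", "123456"),
    ("c@example.com", "password"),
    ("d@example.com", "123456"),
    ("e@facebook.com", "qwerty"),
]

if __name__ == "__main__":
    print(top_passwords(rows, n=3))
    print(top_passwords(rows, domain="facebook.com"))
```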

DomLink domains -> Give me all passwords for all domains

After running DomLink (as described above), we can reuse its output for this search. This time, however, the regular expression targets @domain.com rather than .domain.com. In addition, you need to set the header field to the email field, as follows:

Then perform the query operation:

In this way, we will get a tabular output, which can be exported to CSV format if necessary:

It's amazing, at least I think so.

Employee OSINT

When carrying out targeted phishing or "drag library" attacks against external infrastructure, we usually need a list of user names first, so understanding the profile of the organization's employees is very important. LinkedInt seems to have been built for exactly this purpose. Beyond that, we need a reliable tool for obtaining the target organization's employee list. For one or two datasets covering a handful of companies, this approach offers no particular advantage. However, if you plan to collect employee data automatically for the Alexa top 1 million websites and the Fortune 500 companies, its advantages become obvious immediately, and these tasks can be fully automated.

Here, we show how to import data from an unpublished dataset, which may be released at HITB GSEC, to analyze basic employee information. The same approach applies to LinkedInt data: we only need to import the data as a CSV file, as described above.

We can use Splunk to search for employees by last name, first name, or geographic location:

We can even look at information such as the number of employees in related jobs:

This shows that most of the employees are delivery personnel, because this company is China's equivalent of Deliveroo.

We can go further and chart this as a role distribution:

Although there are only about 5,000 employee records here, that is enough to understand the general distribution of staff in the target organization. At the very least, before moving on to social engineering, it helps us answer the following questions:

In some companies, a large share (nearly 10%) of the employees are delivery workers. They are unlikely to have contact with regular staff, and it is hard for them to gain the trust of regular staff. It is therefore important to choose targets with the above questions in mind.

Finally, I want to know how large the gap is between my own OSINT and the real distribution of employees. In other words, how accurate is it?

Combining employee OSINT and password dumps

If you combine the email addresses obtained through employee OSINT with the password dump, finding passwords becomes easier:

That's good, at least I think so. Of course, you can do this with grep, but with the method above it is easier to import a new CSV file, and you get the list of matching passwords almost immediately (which you can export as CSV or as a table).
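Conceptually, this combination is a join between two datasets on the email address. A minimal Python sketch of that join, assuming a list of employee email addresses and a dump reduced to (email, password) pairs; the names and sample data are hypothetical:

```python
def passwords_for_employees(employee_emails, dump_rows):
    """Map each employee email found in the dump to its list of passwords."""
    wanted = {e.strip().lower() for e in employee_emails if e.strip()}
    matches = {}
    for email, password in dump_rows:
        key = email.strip().lower()  # compare case-insensitively
        if key in wanted:
            matches.setdefault(key, []).append(password)
    return matches


# Made-up sample data.
employees = ["Alice@example.com", "bob@example.com"]
dump = [
    ("alice@example.com", "hunter2"),
    ("mallory@example.net", "letmein"),
    ("ALICE@example.com", "hunter3"),
]

if __name__ == "__main__":
    for email, passwords in passwords_for_employees(employees, dump).items():
        print(email, passwords)
```

Building a set of wanted addresses first keeps the pass over the dump linear, which matters when the dump has hundreds of millions of rows.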

Nmap scan

Upload the .gnmap file to Splunk and set the timestamp to none.

Then, you can run the following sample query to format the data and complete the corresponding search in an efficient way through Splunk:


This is very effective if you have a large number of scans to parse. For example, you can extend the query to map the geographic location of all servers with TCP port 443 open:
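To make the field extraction concrete, here is a minimal Python sketch that parses Nmap's grepable output and lists hosts with a given port open. It is not the Splunk query from the reference article; it assumes the usual `Host: ip (name)` prefix followed by a tab-separated `Ports:` field with `port/state/protocol/owner/service/...` entries, and it does not handle commas inside version strings.

```python
import re

# Matches the host part and captures the Ports field up to the next tab.
HOST_RE = re.compile(r"^Host: (\S+) \(([^)]*)\)\s+Ports: (.*?)(?:\t|$)")


def parse_gnmap_line(line):
    """Return a dict for a 'Host: ... Ports: ...' line, or None for other lines."""
    match = HOST_RE.match(line)
    if not match:
        return None
    ip, hostname, ports_field = match.groups()
    ports = []
    for entry in ports_field.split(","):
        parts = entry.strip().split("/")
        if len(parts) < 5 or not parts[0].isdigit():
            continue  # skip malformed entries
        ports.append({"port": int(parts[0]), "state": parts[1],
                      "protocol": parts[2], "service": parts[4]})
    return {"ip": ip, "hostname": hostname, "ports": ports}


def hosts_with_open_port(lines, port, protocol="tcp"):
    """Return the IPs of hosts that have the given port open."""
    hits = []
    for line in lines:
        host = parse_gnmap_line(line)
        if host is None:
            continue
        if any(p["port"] == port and p["protocol"] == protocol
               and p["state"] == "open" for p in host["ports"]):
            hits.append(host["ip"])
    return hits


# Made-up sample lines in grepable format.
sample_lines = [
    "Host: 192.0.2.10 (web.example.com)\tPorts: 22/open/tcp//ssh///, 443/open/tcp//https///\tIgnored State: closed (998)",
    "Host: 192.0.2.11 ()\tPorts: 80/open/tcp//http///, 443/closed/tcp//https///",
    "Host: 192.0.2.10 (web.example.com)\tStatus: Up",
]

if __name__ == "__main__":
    print(hosts_with_open_port(sample_lines, 443))
```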

[1] Reference article: how to parse nmap in Splunk


Data processing and analysis play an important role in penetration testing. As the saying goes, to do good work you must first sharpen your tools. Just as password cracking benefits from 32 GPUs, Splunk helps make the penetration testing process simpler and more efficient.

Many people think that the gzip analogy is a little superficial. Well, here is agn0r's deeper insight into how Splunk operates, based on its lexicon structure: