This article discusses the architecture and design of a distributed penetration testing framework and shares some of my humble opinions, which I hope you will find inspiring. It is divided into three topics, best read in order. Across the three topics, I describe two pain points in detail: architecture design and communication.
- Topic - What is a penetration testing framework?
- Topic - Requirement analysis and architecture design
- Topic - Communication and message queues
Topic - What is a penetration testing framework?
Background
Today, vendors large and small, as well as many private teams, are building or already maintain mature scanners, whether open source or for internal use. All of them exist to solve practical problems:
- Penetration testing needs
- Instead of repeating manual operation
- Avoid omission
- Preserve / pass on proven testing methods
- As one of the products of the enterprise
- For professional users
- For ordinary users
- Grey industry (botnet)
In short, a penetration testing framework can make parts of your penetration testing workflow simpler, assist in completing the work, and simplify some of the business logic. For the framework itself, the basic job is to keep plug-ins running reliably; its value really shows once it has accumulated many plug-ins.
Essence
A penetration testing framework is not a particularly advanced thing. Compared with the microservice architecture of an e-commerce system, its infrastructure may not be nearly as complex; by comparison it can even be called insignificant. That said, a penetration testing framework is not well suited to such comparisons, because it is more flexible, and some advanced development concepts (distributed systems / microservices) do not fit it particularly well. A penetration testing framework / scanner can be very concise, even just a flexible script engine; at the same time there are large penetration / scanning frameworks (Nessus / OpenVAS) whose complexity far exceeds that of an ordinary script engine. The essence of a penetration testing framework / scanner is hard to pin down precisely. Judging by its characteristics, what kind of program is it?
- A single running / distributed program framework
- High quality scanning module or flexible script engine
- Stable infrastructure, logic programmable control
Problems it solves
- Replace some repetitive manual labor
- Fixed logic output
- Stable output
Problems it cannot solve
- Vulnerabilities with complex logic cannot be detected
- The design and implementation of a distributed penetration framework remains very difficult
A general pattern
For a framework of this kind, if you simplify the model again and again, it eventually becomes a framework in which RPC (remote procedure call) invokes multiple programs (functional units). This is not a specific RPC protocol but RPC as a general concept; alternatively, it is a framework that invokes multiple programs (functional units) directly.
The biggest difference from ordinary RPC is that the tasks come in many types, tasks may change over time, and the execution environment is very complex. The framework itself must support highly flexible functional units (plug-ins). Put another way, the called programs (functional modules) are only loosely coupled to the framework; even their failure or obsolescence will not affect the framework's operation.
Seen this way, it cannot simply be called RPC. Given the above, the design excludes mechanisms such as up-front module registration: the framework itself cannot know in advance how many kinds of programs it will have, how many instances there will be, or what their basic interfaces are. So RPC is not a particularly apt description.
So what best describes such a framework? Everyone has their own opinion, of course. I prefer a loose master-slave architecture as the model for this kind of framework: slave naturally denotes a functional unit or called program; master denotes the controller / caller, the node that handles the overall transaction logic; and loose means the relationship and interdependence between slave and master are weak. Other mechanisms, such as registration or subscription, can be used to maintain the relationship between them.
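To make the "loose" relationship concrete, here is a minimal Python sketch of a master that tracks slaves purely through registration and heartbeats. All names and the timeout value are hypothetical; a real implementation would layer this on top of the messaging mechanisms discussed later.

```python
import time

class Master:
    """Minimal sketch of a loose master: it knows slaves only through
    registration and heartbeats, and keeps working if any slave vanishes."""

    def __init__(self, heartbeat_timeout=30.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.slaves = {}  # slave_id -> timestamp of last heartbeat

    def register(self, slave_id):
        # a slave announces itself; the master has no prior knowledge of it
        self.slaves[slave_id] = time.time()

    def heartbeat(self, slave_id):
        if slave_id in self.slaves:
            self.slaves[slave_id] = time.time()

    def alive_slaves(self):
        # slaves that missed their heartbeats simply drop out of view;
        # nothing in the master breaks when that happens
        now = time.time()
        return [s for s, t in self.slaves.items()
                if now - t < self.heartbeat_timeout]
```

The point of the sketch is the asymmetry: the master never calls into a slave to discover it, and a dead slave costs the master nothing but a stale dictionary entry.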
Additional notes
Of course, a framework differs from a penetration testing tool. A tool can use a layered architecture directly, or pay little attention to the project's model and structure and simply implement specific functions procedurally, or adopt a microkernel architecture, in part or as a whole, to gain some flexibility.
Model classification of current mainstream scanners / frameworks
- Stand-alone vs. distributed: in this article, the biggest difference between stand-alone and distributed is not whether a network is involved, but whether all task-executing nodes live on a single machine.
- Microkernel architecture: very flexible, usable as a component or as the overall framework. Common script engines, such as nmap's NSE system and sqlmap's tamper mechanism, in fact apply some ideas from the microkernel model.
- Microkernel plus master-slave: the microkernel adopts a master-slave model to decouple functional modules to a certain extent.
The next topic discusses the architecture design of a distributed penetration framework / scanner.
Topic - Requirement analysis and architecture design
Background
In short, the job of this framework is to receive tasks, distribute tasks, execute tasks, and process results. In that sense, a distributed penetration testing framework is much simpler than the microservice architecture of an e-commerce platform: its transactions are comparatively simple, and the very concept of a transaction is diluted. It is, however, a task-intensive application, with a real, if modest, demand for high concurrency and high availability.
Let us walk through the whole process from the beginning: the master receives a task; the task is divided into several atomic tasks, which are sent to different workers according to their type; after a task is executed, the result is sent back to the master for transaction-level aggregation.
Although the process is not very complex, we still need to analyze what we actually need, so that the architecture design follows from the requirements.
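That split-and-dispatch flow can be sketched in a few lines of Python. The task structure (`id` / `targets` / `types` fields) is a hypothetical example, not a prescribed format.

```python
def split_task(task):
    """Split a composite task into atomic tasks, one per (target, type) pair.
    The dict shape here is an illustrative assumption."""
    return [{"target": t, "type": ty, "parent": task["id"]}
            for t in task["targets"] for ty in task["types"]]

def dispatch(atomic_tasks):
    """Group atomic tasks by type; each group corresponds to the worker
    queue (routing key) that handles that task type."""
    queues = {}
    for at in atomic_tasks:
        queues.setdefault(at["type"], []).append(at)
    return queues
```

The `parent` field is what lets the master later aggregate results from many atomic tasks back into one transaction.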
Requirements
After considering many ways of describing requirements, I found no single viewpoint that captures exactly what functions our system needs. It is much like OOP: AOP emerged as an excellent complement to it, and SOP (state-oriented programming) complements it well too. But I digress. After some thought, we will discuss our requirements from the perspectives of space and of time (process / logic).
- The spatial perspective looks at objects, behaviors, data, and entities, similar to the traditional OOP view, so it needs little elaboration.
- The time / process / logic perspective considers a behavior's whole course from beginning to end: for example, the full flow of master-slave communication, the life cycle of a transaction (a task from creation to completion or abandonment), the life cycle of a module, and the scheduling process that carries the main logic.
Spatial requirements
From the perspective of functional entities, we need something like this:
Master
- Transaction management
  - Task reception
  - Task distribution
  - Result collection
- Node management
  - Node audit (inspect)
  - Node start / stop
- User management
- ... (omitted)
Slave
- Stateless
- Transaction processing
  - Task execution
  - Result return
- Node management
  - Node status reporting
Time / process / logic requirements
Here this refers to the series of mechanisms, processes, and logic the framework needs in order to solve its problems:
- Communication system (discussed in detail later)
- Routine system
- Task management system
- Result management system
- Functional module engine
  - High modularity: modules are highly independent, with extremely low or even no coupling
  - Script-engine flexibility: for small tasks that do not warrant a new module, a script engine alone can start and execute the task quickly
  - Low dependency on the framework: modules are written to a given specification and need not expose any interface to the framework
  - Unified interface, diverse module environments: a module may use any language, any environment, any container, but its interface must be one the framework can accept
- Scheduling system: allows modules to work together
- User interface (omitted)
- Framework extension system
  - Task script SDK
  - Key-point / key message queue extension
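To illustrate the "unified interface, diverse environments" idea, here is one possible contract, sketched in Python: the framework talks to any module over stdin/stdout JSON, so the module itself can be written in any language or run in any container. The contract is an assumption made for illustration, not part of the original design.

```python
import json
import subprocess
import sys

def run_module(command, task):
    """Run an external functional unit under a hypothetical contract:
    the module reads a JSON task on stdin and writes a JSON result
    on stdout. The module may be implemented in any language."""
    proc = subprocess.run(command, input=json.dumps(task).encode(),
                          capture_output=True)
    if proc.returncode != 0:
        # module failure is loose: it is reported, never fatal to the caller
        return {"ok": False, "error": proc.stderr.decode()}
    return {"ok": True, "result": json.loads(proc.stdout)}
```

For example, `run_module(["./portscan"], {"target": "10.0.0.1"})` would work the same whether `portscan` is a Go binary, a shell script, or a Python program, because only the JSON boundary is shared.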
Design
Per our requirements, the simplest statement is that the overall architecture should be a loose master-slave architecture: the master does not depend on any slave, while a slave can work only with a master.
Before going further, let us agree on several concepts that will appear in the framework:
- Master - the controlling entity
- MQ - middleware connecting the master to all slave services
- Slave - an entity that carries out specific business
- Functional unit - able to execute one type of task, one task at a time. It can report its survival status and results directly to the master, and it can also exist outside a node.
- Node - manages multiple functional units but does not itself implement the task-execution interface or the ability to execute tasks. It can report survival status and results directly to the master (the results come from its functional units).
The following diagram illustrates the structure.
The overall diagram alone is not enough to describe the whole framework, and would feel rather perfunctory, so let us elaborate:
Master: services and sub-functions
The master is a large collection of functions, but it is not a monolith: it is composed of many services. It is therefore worth splitting the master into specific services and describing their purposes.
- Communication service
- Node and module management service: provides management functions for slaves
  - Node management sub-service
  - Module (functional unit) management sub-service
- Module scheduling service: the coordination logic for scheduling modules / functional units
- Transaction service
  - Task processing sub-service
  - Result processing sub-service
- Persistence service: stores tasks and results
- User service
  - User authentication and management (later stage)
- User interface
Slave: services and sub-functions
- Node service
  - Master communication service
    - Control channel: receives control information from the master
    - Task channel: receives tasks
    - Result channel: reports results
    - Reporting channel: reports additional information
    - ... ...
  - Functional unit management sub-service
  - Functional unit usage sub-service
- Functional unit service
  - Task reception
  - Parameter and sanity checks
  - Task execution
  - Return of final or periodic results
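The four steps of the functional unit service can be sketched as a single handler. The envelope fields (`status`, `reason`, `result`) are hypothetical names chosen for the sketch.

```python
def handle_task(task, executor, validator):
    """One pass of a functional unit: validate the task's parameters,
    execute it, and return a result envelope. A crash inside the
    module is caught so it cannot take the unit's loop down with it."""
    ok, reason = validator(task)
    if not ok:
        return {"status": "rejected", "reason": reason}
    try:
        return {"status": "done", "result": executor(task)}
    except Exception as exc:
        return {"status": "failed", "error": str(exc)}
```

In a real unit, `executor` would be the scanning logic and the returned envelope would be published on the result channel; here both are stand-ins.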
Other components
- Message queue (clustered): the focus of the next topic.
Process design
Alongside the requirements-driven design above, we still need to design the important processes. This differs from the service / function split, but it describes how the framework works just as well, so it is worth explaining from this perspective.
Transaction flow
Transaction processing plays a very important role in the microservice architectures of portals and e-commerce platforms:
Transactions exist to guarantee the integrity and accuracy of task execution. An example illustrates why: your task begins executing, but a node crash or a network failure prevents the operation from completing. On receiving the execution-failure signal, the transaction rolls back to the previous safe state, so the task does not hang in a pending, Schrödinger-like state. This is also one embodiment of eventual consistency (of course, if we run only one transaction control center, there is no need for strong synchronization of data across multiple centers).
In designing this framework, we intend to introduce the concept of a transaction and to process transactions with eventual consistency.
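A minimal sketch of that rollback behavior, assuming a single transaction control center; the state names are illustrative rather than prescribed.

```python
class Transaction:
    """Task-level transaction control with rollback to the last safe
    state. With one controller, no cross-center synchronization is needed."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.state = "pending"
        self.last_safe = "pending"  # most recent state safe to return to

    def start(self):
        # "running" is not safe: a crash here must not leave the task stuck
        self.state = "running"

    def finish(self, success):
        self.state = "done" if success else "failed"
        self.last_safe = self.state

    def on_node_lost(self):
        # node crash or network failure mid-execution: roll back so the
        # task can be re-dispatched instead of hanging in limbo
        self.state = self.last_safe
```

The key property is that `on_node_lost` always returns the task to a state from which it can be re-dispatched, which is the eventual-consistency behavior described above.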
Functional unit life cycle
Of course, every functional entity should have a state machine, but space is limited, so let us take a simple example. The life-cycle state diagram of a functional unit is as follows.
After a functional unit starts, it first enters the initial state for initialization. If initialization succeeds, it enters the prepared state; if initialization fails and crashes the unit, it enters the crash-handling flow.
In the prepared state, the functional unit initiates registration with the master. If registration succeeds, the state changes to registered; if it fails, the unit enters unregistered.
From registered, the unit moves directly into the working state. While working, it sends heartbeats (or uses other mechanisms) periodically to maintain its connection with the master; after repeated disconnections it enters the unregistered state. Transactions are also processed in the working state (details omitted). If the program hits an unrecoverable error during a transaction, it enters the crashed state.
In the crashed state, the functional unit must be reset, and we decide whether to stop or restart it.
The unregistered state automatically closes the functional unit, because unregistered marks the normal end of a functional unit's life.
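The life cycle just described can be encoded as a small transition table. The state names follow the text above; the terminal `stopped` state is an added convenience for the sketch.

```python
# Allowed transitions of a functional unit's life cycle, per the text above.
TRANSITIONS = {
    "initial":      {"prepared", "crashed"},      # init succeeds or crashes
    "prepared":     {"registered", "unregistered"},
    "registered":   {"working"},
    "working":      {"unregistered", "crashed"},  # lost heartbeats or crash
    "crashed":      {"initial", "stopped"},       # reset: restart or stop
    "unregistered": {"stopped"},                  # normal end of the unit
}

def step(state, target):
    """Move to `target` if the transition is legal, else raise."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target
```

Encoding the diagram as data makes illegal transitions (say, working back to registered) fail loudly instead of silently corrupting the unit's state.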
The previous topic said little about message queues (communication), so the next topic elaborates on the problem of node communication.
Topic - Communication and message queues
This part briefly discusses the framework's relationship with the message queue and the design of the message queue.
Message queue fundamentals
Necessity
Is a message queue really necessary for our penetration testing system? My personal answer: yes.
Primary role: solving service (module) communication
Communication is more than just "I send, you receive." The overall architecture of this framework is a master / slaves pattern, that is, one master communicating with multiple slaves at the same time, and the kinds of communication are diverse:
- Task distribution (many-to-many): the master distributes tasks to slaves. This is essentially a producer / consumer model: the master sends tasks and the slaves execute them. Naturally we must guarantee that multiple slaves do not execute the same task at once, which would waste resources.
- Notification and subscription messages (one-to-many): think of broadcast and multicast. We can temporarily call this model fanout. When the master wants to send a notification meant for all slaves, broadcast is the most appropriate mode. Likewise, when the master wants to notify only one group (for example, shutting down all crawler modules, upgrading one group's database, updating code, or deploying new features), it does not want unrelated groups or nodes to receive it. Fanout handles both situations well.
- Point-to-point (one-to-one): this does not mean slave-to-slave communication; our architecture avoids slave-to-slave connections, which would greatly increase coupling and complexity. Single-point communication between master and slave, however, is essential, because we often need to tell one particular slave what to do (shut down, restart, or even upgrade the slave).
- Result reporting and liveness reporting (many-to-many): a stateless slave's first duty after completing a task is to send the result back to the master. The master, in turn, needs to know the slaves' liveness; besides active polling, slaves should also report to the master on their own. And when a key part of a slave goes down, the error message / log should likewise be sent back to the master. None of this is an excessive demand; a fan-in model solves it.
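The delivery patterns above (work-queue, fanout, point-to-point) can be illustrated with a toy in-memory broker. This is only a model of the semantics, not a real message queue, and all names are hypothetical.

```python
import itertools

class Broker:
    """Toy broker modeling three delivery patterns: work-queue
    (each task to exactly one consumer), fanout (to every consumer
    in a group), and direct (to one named consumer)."""

    def __init__(self):
        self.subscribers = {}  # group -> list of consumer names
        self._rr = {}          # group -> round-robin iterator

    def subscribe(self, group, consumer):
        self.subscribers.setdefault(group, []).append(consumer)

    def work_queue(self, group):
        # producer/consumer: successive tasks go to different consumers,
        # so no two slaves execute the same task
        it = self._rr.setdefault(group,
                                 itertools.cycle(self.subscribers[group]))
        return next(it)

    def fanout(self, group):
        # a notification reaches every consumer in the group
        return list(self.subscribers[group])

    def direct(self, consumer):
        # point-to-point: addressed to a single named consumer
        return consumer
```

With a real queue, the same three behaviors come from queue/exchange configuration rather than code, which is exactly what the next sections design.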
Secondary role: reliability / security
- Reliability: under normal operation, the data received at both ends of your communication is identical; nothing diverges, nothing is lost for no reason, and nothing unexpected appears. You can rely on your data sources.
- Security: we do not want intercepted data to cause information disclosure, RCE via a deserialization vulnerability, command injection, or other unknown risks, so communication needs to support SSL.
A reliable message queue has a complete set of mechanisms to ensure messages travel reliably from one end to the other: you need not worry about messages being lost in transit or inside the queue (because of machine restarts or other unexpected factors). Let us take RabbitMQ as an example of the work a queue does for data reliability and consistency.
On the consumer side, RabbitMQ's reliability guide notes that in the event of network failure (or a node crashing), messages can be duplicated, and consumers must be prepared to handle them; if possible, the simplest way is to ensure that consumers handle messages in an idempotent way rather than explicitly deduplicating them.
It also notes: if a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example), RabbitMQ sets the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). Conversely, if the redelivered flag is not set, it is guaranteed that the message has not been seen before. Therefore, a consumer that finds deduplication or idempotent processing expensive can apply it only to messages with the redelivered flag set. In other words, we do not have to build idempotence into the entire application business layer; we can simply lean on this feature of the message queue.
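A consumer following that advice might look like this sketch: it pays the deduplication cost only for messages the broker has flagged as possibly redelivered. The message dict shape is hypothetical; with a real client library the flag comes from the delivery metadata.

```python
def consume(message, seen_ids, process):
    """Deduplicate only when the broker marks a message as possibly
    redelivered, mirroring the redelivered-flag semantics above.
    `message` is a dict with hypothetical 'id' and 'redelivered' keys."""
    if message["redelivered"] and message["id"] in seen_ids:
        # the broker warned us this may be a repeat, and it is: skip it
        return "skipped-duplicate"
    seen_ids.add(message["id"])
    return process(message)
```

Fresh messages (flag unset) bypass the `seen_ids` lookup entirely, which is the cost saving the documentation describes.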
Model selection (programmable protocol - RabbitMQ)
Here we choose RabbitMQ as our message queue. Combining RabbitMQ's characteristics with our framework's, we can sketch the key message queue structure, and with RabbitMQ this design turns out to be quite pleasant: the sender does not know the receiver's specific queue. In fact, the queue is a concept that exists only on the receiving side; on the sending side there are only exchanges and routing. So we can use different kinds of exchanges together with routing keys.
Exchanges
RabbitMQ has four types of exchange:
- Direct exchange: the routing key provides one-to-one delivery, which covers messages from the master to a single node or functional unit.
- Fanout exchange: one exchange delivering to many consumers, which covers notifications from the master to all nodes. This need is not especially pressing (outside of whole-system upgrades / shutdowns) and is not very common.
- Topic exchange: similar in spirit to fanout; subscribe to a topic and you receive the messages relevant to it. Topic subscription is very flexible, and through this exchange we can send messages to any group / type / functional unit or node matching a preset pattern.
- Headers exchange: another form of direct exchange, routed not by routing key but by the message's headers.
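To make topic routing concrete, here is a simplified re-implementation of topic-pattern matching, where `*` matches exactly one dot-separated word and `#` matches zero or more words, as in RabbitMQ's topic exchange. It is an illustration of the semantics, not the broker's actual algorithm.

```python
def topic_match(pattern, routing_key):
    """Match a routing key against a topic-exchange binding pattern:
    '*' matches exactly one word, '#' matches zero or more words."""
    def match(p, k):
        if not p:
            return not k              # pattern exhausted: key must be too
        if p[0] == "#":
            # '#' may absorb any number of remaining words, including none
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))
```

For example, a binding like `result.*.web` would receive `result.high.web` but not `result.high.low.web`, while `result.#` would receive both.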
With these four exchange types we can design our communication system with ease. (In practice we will not use all of them.)
Here is a brief description of the proposed exchanges, in turn:
- Task distribution exchange (direct exchange): each task type is a routing key. A node sets the routing key to receive tasks of that type; on the client side, every functional unit of a given type binds with the same routing key (the type name) to receive tasks.
- Result collection exchange (topic exchange): the logging system, audit system, or any other system can subscribe to exactly the results it wants. Likewise, a newly developed service or module can obtain task execution results by subscribing to this exchange.
- Control exchange (direct exchange): directly manages node behavior (start / stop / update). The routing key here is the node's GUID or UUID, realizing the point-to-point link from master to slave.
- Notification exchange (topic exchange): grouped by functional unit type, for batch operations (updating the database for one type of functional unit, hot patching, batch shutdown).
- Feedback exchange: the same design as the result exchange.
Among them, task distribution and result collection belong to the transaction (task) management service, while the control / notification / feedback exchanges belong to the node (functional unit) management service. In this way the message queue cleanly separates management from business.
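That separation can be captured as a small declaration table. The exchange names here are hypothetical; in a real deployment each entry would correspond to an exchange declaration made through the client library.

```python
# Proposed exchange layout, grouped by the service that owns each exchange.
# Names and the service labels are illustrative assumptions.
EXCHANGES = {
    "task.dispatch":  {"type": "direct", "service": "transaction"},
    "result.collect": {"type": "topic",  "service": "transaction"},
    "node.control":   {"type": "direct", "service": "node-management"},
    "node.notify":    {"type": "topic",  "service": "node-management"},
    "node.feedback":  {"type": "topic",  "service": "node-management"},
}

def exchanges_for(service):
    """List the exchanges a given service is responsible for declaring."""
    return sorted(name for name, cfg in EXCHANGES.items()
                  if cfg["service"] == service)
```

Keeping the layout in one table makes the management/business split auditable: the transaction service owns only task and result flows, and everything node-related lives elsewhere.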
The above describes parts of a distributed penetration framework, but space does not allow a full account of each part's design ideas. My ability is limited; if there are mistakes in the article, I hope readers will point them out.
PS: Zhihu's editor somehow eats the tabs in lists, causing formatting bugs in the nested lists that I could not fix despite trying.