

Experience in the architecture and function design of a cloud management platform

Posted by harmelink at 2020-03-14

The definition of the cloud management platform was put forward by Gartner and can be summed up in two parts. The first is managing public clouds and private clouds together to form a hybrid cloud. The second covers self-service, image provisioning, metering and billing, and workload optimization. The ultimate goal of a cloud management platform is to make applications run optimally on the cloud platform. The public clouds you are familiar with offer cloud services that guarantee application reliability to the greatest extent possible. When we design a cloud management platform ourselves, we should consider this as well, and also add functions for integrating with and managing external systems.

When it comes to cloud computing, OpenStack comes to mind. What is the difference between OpenStack and a cloud management platform? Can a cloud management platform be built on top of OpenStack? From our own definition and from market feedback, we believe the cloud management platform has the broader scope: OpenStack can be treated as one resource module under the cloud management platform, which in turn can manage multiple OpenStack versions.

Some enterprises deploy different OpenStack resource pool management modules in different data centers. On top of OpenStack, a cloud management platform is still needed to manage multiple OpenStack resource pools and different OpenStack versions. In addition, much virtualization is not implemented with OpenStack + KVM: VMware, Xen, and other virtualization products must also be integrated by the cloud management platform. In terms of layering, the cloud management platform solves the last mile of how users actually consume resources, and its self-service portal is what is ultimately presented to users.

The cloud management platform we implemented has three main capabilities:

Heterogeneous virtualization management. OpenStack natively targets KVM and is at a relative disadvantage when integrating Xen, Hyper-V, and VMware. We implemented equivalent management capabilities across all of these platforms.

Management of multiple OpenStack versions and multiple resource pools, with extension across multiple data centers.

Scheduling, metering and billing, and integration with external systems.

A cloud management platform mainly performs three functions: resource management, operation, and self-service. Resource providers include storage, computing, and networking. The resource pool ultimately delivers services to users, such as computing, storage, and network services at the IaaS level, and middleware and data services at the PaaS level. How are they delivered to users? The cloud management platform manages the resources; through operation, which on the public cloud includes metering and billing, the services are then provided to users through the self-service interface.

This is the overall positioning of cloud management: it bridges the layers above and below, managing resources underneath and providing interfaces and APIs to applications on top. If a start-up company designs a cloud management platform itself, five modules are essential: resource management, operation management, service provision management (the user interface and API), operation and maintenance management, and security management.

This is the first version of our CMP 1.0 architecture. We split it into a resource management system and an operation management system. The resource management system is responsible for physical machine management, storage management, network management, and multi-data-center management. The module in the upper right corner of the architecture diagram is a dedicated virtualization management system, which can also be regarded as a resource management module.

Why split out the virtualization management system? Because it is the core module of the cloud management platform and needs the most careful design, so it is separated from the rest. The ultimate goal of the operation management system is to turn the resources managed by the resource management and virtualization management systems into services. For example, the service catalog publishes resources as cloud computing products, while work order management, process management, and metering and billing handle a user's request for a virtual machine or a piece of storage: how it is metered, billed, and delivered.

In addition, the user self-service portal, operation portal, and administrator portal need a unified, open API, because some applications call virtualization, storage, and networking through the API rather than through the self-service interface. To run a system well, operation and maintenance management, such as monitoring and log analysis, is essential. So alongside the cloud management platform we designed a separate operation and maintenance management system, which collects the running status of the system through Zabbix and other collection modules and presents it to the platform administrator in real time. This is version 1.0 of the overall CMP architecture.

Let's look first at the resource management system. The virtualization management system can be regarded as one of its sub-modules. At present the most common virtualization product in the market is VMware: among the telecom operators, banks, and other customers we surveyed, VMware accounts for more than 80%. Next come small and medium-sized enterprises, which mostly use Microsoft Hyper-V, along with Citrix, KVM, and Su Yan virtualization.

In the virtualization management system we built a virtualization adaptation layer: a driver for each virtualization platform. Each driver can issue instructions, report status, and expose an API to the management layer. Basic virtual machine management, such as power on and power off, goes through the driver. For the KVM implementation we drew on the Nova module of OpenStack virtualization management, mainly so that with Nova we did not have to rewrite KVM virtualization management from scratch.
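The adaptation-layer idea can be sketched as a common driver interface with per-platform implementations. This is a minimal illustration, not our actual code: the class names, actions, and return strings are assumptions, and a real driver would call libvirt/Nova or the vSphere API instead of returning strings.

```python
from abc import ABC, abstractmethod

class HypervisorDriver(ABC):
    """Uniform interface the management layer calls; one driver per platform."""

    @abstractmethod
    def power_on(self, vm_id: str) -> str: ...

    @abstractmethod
    def power_off(self, vm_id: str) -> str: ...

class KvmDriver(HypervisorDriver):
    # A real implementation would go through libvirt or the Nova API.
    def power_on(self, vm_id):
        return f"kvm: started {vm_id}"
    def power_off(self, vm_id):
        return f"kvm: stopped {vm_id}"

class VmwareDriver(HypervisorDriver):
    # A real implementation would go through the vSphere API.
    def power_on(self, vm_id):
        return f"vmware: powered on {vm_id}"
    def power_off(self, vm_id):
        return f"vmware: powered off {vm_id}"

DRIVERS = {"kvm": KvmDriver(), "vmware": VmwareDriver()}

def dispatch(platform: str, action: str, vm_id: str) -> str:
    """The adaptation layer routes a uniform request to the right driver."""
    return getattr(DRIVERS[platform], action)(vm_id)
```

The point of the pattern is that the management layer above never branches on the hypervisor type; adding a new platform means adding one driver, not touching the callers.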

When we designed the private cloud, users had very high requirements for application support; for database services, for example, they explicitly required placement on physical machines. The resource management module therefore needs physical machine management. At present this is based on IPMI, which supports physical machine allocation, automated installation, and automated service provisioning. The difference from virtualization is that the allocation unit of a physical machine is the whole machine, while virtualization can carve a physical machine into smaller units.
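As an illustration of IPMI-based control, here is a sketch that builds an `ipmitool` power command; host and credentials are placeholders, and `-E` makes ipmitool read the password from the `IPMI_PASSWORD` environment variable so it never appears on the command line.

```python
def ipmi_power_cmd(host: str, user: str, action: str) -> list[str]:
    """Build an ipmitool invocation for bare-metal power control.

    Only chassis power actions are covered here; a full physical machine
    manager would also drive PXE boot order for automated installation.
    """
    if action not in {"on", "off", "cycle", "status"}:
        raise ValueError(f"unsupported action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", action]
```

The returned list can be passed to `subprocess.run` against each managed server's BMC address.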

Then there is storage. When we talk about cloud computing we often talk about distributed storage or cloud storage, along the lines of object storage. But building storage management and storage services in traditional enterprises also means dealing with traditional storage, namely SAN storage, including FC and iSCSI storage. This storage is managed through the standard SMI-S protocol; where an array does not support it, we fall back to integrating its command line or shell scripts.

Next, network management within resource management. Cloud computing often introduces SDN management; in addition there are routers, switches, and firewalls. This management must automatically carve out network services and the VPCs or subnets on top of them, connected and presented through the network management module's API.

Finally, multi-data-center management. For example, when helping a public cloud customer implement a cloud management platform, the public cloud may operate multiple data centers across the country. Managing them from one cloud management platform is a distributed-architecture problem: each data center uses OpenStack or vCenter as its resource pool module, and a single cloud management platform manages all the data centers and handles operation and service delivery for the platform as a whole.

That briefly summarizes resource management: the unified management of computing, storage, and network resources.

After management comes operation. Operation means publishing the managed resources as services, which requires service template planning and design. A service template defines, for example, a virtual machine as a service, covering its definition, publication, auditing, and presentation in the interface. Once the service catalog gives us the concept of a service, we also need metering and billing: when users consume resources, we record usage statistics. On the public cloud fees can be collected; on the private cloud this enables cross-department usage statistics. This combines order management, user management, and metering and billing of resource usage: determining how long a user or department has used a resource and completing the billing process. Finally there is the operation portal, where the operation administrator manages these services; the regular release of new cloud services by public cloud vendors, for example, happens largely at the operation management level.
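A minimal sketch of the cross-department metering and billing described above, assuming hypothetical product names and hourly rates; a real catalog and rate card would come from the service catalog module.

```python
from collections import defaultdict

# Hypothetical hourly rates per product, for illustration only.
RATES = {"vm.small": 0.05, "block.gb": 0.002}

def bill_by_department(usage):
    """usage: iterable of (department, product, hours) metering records.

    Aggregates metered hours into a per-department total, which is what a
    private cloud needs for cross-department chargeback statistics.
    """
    totals = defaultdict(float)
    for dept, product, hours in usage:
        totals[dept] += RATES[product] * hours
    return dict(totals)
```

For example, 100 hours of a small VM plus 500 GB-hours of block storage would charge the owning department 100 * 0.05 + 500 * 0.002 under these assumed rates.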

To summarize, applications can be divided into two types: real-time transactional and online batch processing. Each has its own specific requirements when migrating to the cloud platform.

Consider first a traditional transactional system, such as an e-commerce system or a CRM/HR management system, divided into three tiers: web, app, and DB. In the past, large and medium-sized enterprises such as banks and telecom operators hosted these on minicomputers: the web tier on x86 servers, the app tier on WebLogic or WebSphere running on minicomputers, and the DB tier on Oracle or DB2 databases, also on minicomputers.

The "de-IOE" that Alibaba has long talked about (removing IBM minicomputers, Oracle databases, and EMC storage) is really a kind of distributed-architecture transformation. After the transformation, the traditional Oracle and DB2 databases gradually move to distributed databases, while less complex data gradually migrates to unstructured stores, in-memory databases, and NoSQL databases. The database layer, in other words, gradually introduces new technologies to realize a modular, distributed redesign.

Transforming the original single database into a distributed one requires data routing and unified access for the distributed database. At the application layer, where we used to speak of WebLogic and WebSphere, we now speak of microservices: each application is decomposed into individual services. With microservices comes microservice governance, such as service orchestration and service access routing, which sits above the microservices; above that is the user interaction layer.

Transforming traditional applications into a cloud architecture brings many new requirements:

The first key point is web access, for example moving from a single machine to load balancing plus multiple back-end nodes. As customer traffic changes, elastic scaling adjusts the number of nodes to match.

The second is x86 cluster deployment. In the minicomputer era, deployment was single-machine or HA pairs. With cloudification, everything from the web tier through the app tier to the data layer is deployed as x86 clusters, so cluster management is required.

The third is distributed data deployment. Once a single database holds hundreds of millions of rows, query and transaction latency grows, so database splitting is considered, such as vertical or horizontal splitting of tables. Splitting creates new problems, such as unified external access to the data: if one table is split into many, cross-database and cross-table joins become new problems. Therefore a data routing layer, or unified data access engine, must be built into the cloud platform. Open-source options include Hibernate Shards and iBATIS sharding, but if an open-source access engine is integrated into the cloud platform, the platform must also present its monitoring and calling logic.

The fourth is the data platform. In the past, app and database logic were tightly coupled, for example through stored procedures in the database with strong logical ties to the middleware layer. After microservicing, each service needs its own database storage in the background plus a centralized storage platform, and finally a data platform for storage analysis and for presenting historical trends such as order history.
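The data routing layer described in the third point can be sketched as a minimal hash-based shard router; the table name, key column, and shard count are hypothetical.

```python
def shard_for(user_id: int, n_shards: int = 4) -> str:
    """Horizontal split: each user's rows live in one physical table."""
    return f"orders_shard_{user_id % n_shards}"

def route_query(user_id: int) -> str:
    # The routing layer rewrites the logical table name to the physical
    # shard before the query reaches any database, so applications keep
    # querying a single logical table.
    return f"SELECT * FROM {shard_for(user_id)} WHERE user_id = {user_id}"
```

Cross-shard joins are exactly what this simple scheme cannot answer from one shard, which is why a unified access engine with scatter-gather logic sits above the router in practice.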

The other type is online analytical processing. The data warehouse used to be a large Oracle warehouse; with the emergence of Hadoop, people want Hadoop to take over Oracle's data warehouse role. The first architectural step is to introduce Hadoop gradually alongside the original Oracle DW. Complex, correlated historical analysis, such as a bank's user profiling, requires an MPP database; simpler analysis can run on Hadoop. This first stage introduced Hadoop's parallel, partitioned architecture: data is imported from the data warehouse into Hadoop for analysis. The problem is that such isolated information systems lack data interactivity and their evolution path is limited. Later came the enterprise-level data platform, unified from the ETL layer through data collection.

To summarize the new requirements that transactional and batch systems place on the cloud management platform during cloud migration: first, elastic computing, with web nodes scaling elastically in real time; second, the app/middleware layer gradually adopting microservices, whose many architectures and services need unified management and control; third, more and more modules needing deeper analysis of the monitoring data they generate (a user who moved to the cloud three years ago has accumulated a large amount of monitoring data in the data center, which is very useful for historical trend analysis); fourth, support for integrating and managing open-source big data and database components.

We did several things. The first is virtual machine orchestration and elastic scaling: as you know, Docker has its own elastic scaling, and the same is needed at the virtual machine layer. Second, microservice management was introduced into the resource management system. Third, support for open-source components: Docker + Kubernetes was introduced, and parts of Hadoop and Spark were migrated onto the elastic computing system, taking the cloud management platform's support for applications a step further. This was the 1.3 design in 2013; the 3.1 designs of 2014 and 2015 gradually introduced the elastic computing system.
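The elastic scaling decision for the web tier can be sketched as a simple policy function; the traffic metric, per-node capacity, and bounds are all hypothetical.

```python
import math

def desired_nodes(current_rps: float, rps_per_node: float,
                  min_nodes: int = 2, max_nodes: int = 20) -> int:
    """Pick a web-tier node count that tracks traffic, clamped to a safe
    range; the orchestrator then adds or removes VMs to match."""
    needed = math.ceil(current_rps / rps_per_node)
    return max(min_nodes, min(max_nodes, needed))
```

A real policy would also add hysteresis and cooldown periods so the node count does not flap on every traffic spike.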

At the beginning, the 1.0 design of the cloud management platform was just resource management. In the 2.0 and 3.0 designs we gradually found that during cloud migration, applications put forward new requirements for the platform, and the platform had to adjust and adopt new technologies to support them. Next, let's look at how the architecture of the cloud management platform itself changed after the introduction of containers and microservices.

Compare the two diagrams. On the left is the 1.0 architecture. Its deployment includes the self-service portal, portal back end, user service, operation service, and operation and maintenance management; the operation service includes order management, billing management, and AZ configuration. All of these are coupled: upgrading one function means upgrading the whole service. This is a disadvantage, and the boundary between development and test teams was unclear, since any function upgrade forced refactoring. We therefore introduced a microservice architecture, turning each function point in the operation service, product support, and operation and maintenance management into a separate microservice.

The microservice architecture has two main parts. The first is the separation of user interaction from business logic: the JavaScript front end is separated from the Java back end. The second is the microservices themselves, governed by service management, service registration, and service discovery. The front end uses the microservice management platform to discover the corresponding services, and requests are routed to the service nodes. The cloud management platform ultimately offers many products, such as virtual machines and block storage, and we turn each product into a service. Microservices give each service in the data center autonomy: different services can be upgraded entirely on their own schedule, as long as the external API stays unified.
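As a minimal sketch of the registration and discovery idea: services register their instances, and the portal discovers one per request. The class and method names are assumptions; production systems use a dedicated registry with health checks and leases.

```python
class ServiceRegistry:
    """In-memory service registry with round-robin discovery."""

    def __init__(self):
        self._instances = {}  # service name -> list of endpoints
        self._cursor = {}     # round-robin position per service

    def register(self, name: str, endpoint: str) -> None:
        """A starting microservice instance announces itself here."""
        self._instances.setdefault(name, []).append(endpoint)

    def discover(self, name: str) -> str:
        """The caller gets one live endpoint, rotating across instances."""
        nodes = self._instances[name]
        i = self._cursor.get(name, 0) % len(nodes)
        self._cursor[name] = i + 1
        return nodes[i]
```

Because callers only ever ask the registry by service name, a service can be scaled out or upgraded instance by instance without the portal knowing.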

Another aspect is distribution. We made the services stateless to make the cloud management platform more elastic. For example, one large telecom operator is group-level, with tens of millions of users, and each big promotion puts heavy pressure on the cloud management platform itself. Previously every promotion meant pre-provisioning many virtual machines; now the stateless microservices can scale out automatically.

What does each management platform look like after microservicing? First, the operation management portal. Each portal has order queries, the product catalog, work orders, and so on. When a service is selected in the portal, the request goes to the operation management API as a REST call. Each API is a separate service: after you click a service, its logic is realized by finding the corresponding registered service through the microservice management platform in the background, and that service performs the final work.

The resource management platform follows the same idea: functions such as asset management and fault alerting each land on the API of one service, and service discovery and service management route the request to the background service for unified processing.

After microservicing, monitoring service capability becomes very important. We currently have two approaches: one uses Ceilometer to collect and present real-time data; the other stores the data after real-time rendering as historical data, which we currently handle with a Spark cluster. Every hypervisor must be presented and monitored in real time; this is an essential module of the platform.

Then there are logs, such as call logs. From a user requesting a virtual machine service to the calls between individual services, there are about seven steps. If any of them fails, the product must be rolled back, because a partial failure would affect the final result, so the call log of every step must be monitored and collected to make problems traceable. We adopted the open-source Elasticsearch, with some encapsulation and adaptation, and present the results in the log management interface.
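The run-steps-then-roll-back behavior above can be sketched as a minimal saga-style sequence with a call log; the step names, actions, and log format here are hypothetical, not our actual pipeline.

```python
def provision(steps):
    """Run provisioning steps in order; on any failure, undo the completed
    steps in reverse and keep a call log for troubleshooting.

    steps: list of (name, action, undo) where action() performs the step
    and undo() reverses it.
    """
    done, log = [], []
    for name, action, undo in steps:
        log.append(f"call {name}")
        try:
            action()
        except Exception as exc:
            log.append(f"fail {name}: {exc}")
            # Roll back everything that succeeded, newest first.
            for prev_name, prev_undo in reversed(done):
                prev_undo()
                log.append(f"rollback {prev_name}")
            return False, log
        done.append((name, undo))
    return True, log
```

Shipping each `log` entry to a store such as Elasticsearch is what makes a seven-step failure traceable after the fact.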

Finally, the relationships between system modules after microservicing. What used to be three services (web, app, and DB) became more than thirty, each with call relationships to maintain; this is the most important part of keeping the system running. In CMP 1.0 we seldom emphasized the role of a chief architect; after introducing microservices we established that role in back-end R&D to maintain the definitions of relationships between modules and the architectural design between related services.

The effect of microservices is faster product response in iteration and change. In the past, a release with a public cloud vendor meant the development team working overtime for several nights to go live; after microservicing, versions are basically updated during the day, enabling rapid response and release.

Finally, a few thoughts on the future of cloud management platforms:

Refined management: stronger, finer-grained management capabilities, covering SDN, SDS, and orchestration across the board.

Refined billing: infrastructure billing needs to become more precise, moving from daily to hourly granularity.

Integrated management and control of cloud plus application: whole-process monitoring and analysis, from user system access to the calls into cloud service components.

Hybrid cloud management: the public cloud is an irresistible trend, and the combined management of private and public clouds is another direction.

Building and supporting industry clouds and community clouds: targeting a specific industry or application scenario, such as rendering clouds or high-performance computing clouds, still offers good profit opportunities.

For large-scale node management, combining AI with the operational data of resource pools to analyze and act on it, building up deep learning of operations and maintenance knowledge.
