

Using big data to tell Cai Xukun's real fans from fake traffic fans

Posted by patinella at 2020-04-08

Some time ago, CCTV News publicly reported that the data of "traffic stars" was faked. Being data geeks, we curiously opened the Weibo page of Cai Xukun, the NBA's Chinese New Year image ambassador. We found that, apart from the latest one, nearly all of his posts have 1 million+ reposts.

In our experience, 1 million+ reposts is the kind of traffic that can bring down Weibo's servers, for example when a star officially announces a marriage or a star's drug use is exposed. Is Cai Xukun really that popular? Is there fake traffic in his 1 million+ reposts, and if so, what share of them is fake?

To answer these questions, we randomly scraped about 100000 reposts of Cai Xukun's latest Weibo post, "Goodbye, Capricious" (snapshot taken at 10:00 on March 11, 2019). This short-video post about Cai Xukun's love of small animals was published at 01:23 on March 9, 2019, and had reached 1 million+ reposts by 18:00 on March 10, 2019.

The data fields include information about each reposter (nickname, gender, number of accounts followed, number of followers, etc.) and the comment attached to the repost.

Before answering this question, we were actually more curious about the gender ratio of Cai Xukun's fans. One would expect female fans to be the majority, but among the 102313 reposts we collected, 93618 came from male accounts and only 8695 from female accounts.

That does not look right. Do more boys than girls like Cai Xukun, and by such a wide margin? So we randomly sampled the reposts from male accounts and found that these accounts mostly follow 0 accounts and have 1 follower.

We can reasonably infer that these reposts are the so-called fake traffic.

So what is the share of fake traffic? How many of the roughly 100000 randomly scraped reposts are fake?

After some exploratory analysis, we flagged two groups of reposts. The first group: the reposter follows no more than 5 accounts or has no more than 5 followers, has no profile description, got 0 likes on the repost, and has a Weibo membership level of 0. The second group: the reposter has more than 5 followed accounts or followers, but the nickname is still the default "user XXXXXXXX".

These flagged reposts are what we call fake traffic.
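The rule above can be sketched in Python. This is only an illustrative sketch, not the authors' code: the field names (`following`, `followers`, `profile`, `likes`, `member_level`, `nickname`) are hypothetical, the way the conditions are combined is one reading of the description, and the default-nickname pattern ("user" or "用户" followed by digits) is an assumption.

```python
import re

# Assumption: default nicknames look like "user 12345678" ("用户" plus digits on Weibo).
DEFAULT_NICKNAME = re.compile(r"^(user|用户)\s*\d+$", re.IGNORECASE)


def is_fake_repost(record):
    """Apply one reading of the fake-traffic rule to a single repost record."""
    low_reach = record["following"] <= 5 or record["followers"] <= 5
    # Case 1: near-empty account with no profile, no likes, no membership.
    empty_account = (
        low_reach
        and not record["profile"]
        and record["likes"] == 0
        and record["member_level"] == 0
    )
    # Case 2: more than 5 on both counts, but still using a default nickname.
    default_named = not low_reach and bool(DEFAULT_NICKNAME.match(record["nickname"]))
    return empty_account or default_named


# Made-up records for illustration only; they are not taken from the dataset.
sample = [
    {"nickname": "kun_fan_01", "following": 0, "followers": 1,
     "profile": "", "likes": 0, "member_level": 0},
    {"nickname": "user 7234567890", "following": 40, "followers": 12,
     "profile": "", "likes": 0, "member_level": 0},
    {"nickname": "xiaoming", "following": 220, "followers": 180,
     "profile": "Music lover", "likes": 3, "member_level": 2},
]

flags = [is_fake_repost(r) for r in sample]
print(flags)
```

With real data, the same predicate could be applied row by row to the scraped reposts, and the flagged rows counted to get the fake share.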

Of the 102313 reposts, 95397 came from fake fans, or 93.24% of the total, and only 6916 came from real fans, or 6.76%. The share of fake traffic is strikingly high!

So how many real fans are behind those 6916 reposts once repeated reposts and chart-boosting are excluded? We deduplicated this part of the data by Weibo user ID and found only 3926 distinct real fans. That is, the number of real fans is 3.84% of the total reposts.

At this ratio, about 38400 real fans are behind the 1 million reposts, which shows that Cai Xukun's fan base and influence are still very large, but far smaller than the 1 million+ reposts displayed on Weibo.
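The arithmetic behind these figures can be reproduced with a few lines of Python. The counts come from the text; the variable names, the `count_unique_users` helper, and the toy ID list are hypothetical and only illustrate deduplication by user ID.

```python
# Counts reported in the article.
total_reposts = 102313
fake_reposts = 95397
real_reposts = 6916
unique_real_fans = 3926
displayed_reposts = 1_000_000


def count_unique_users(user_ids):
    """Deduplicate reposts by user ID and return the number of distinct users."""
    return len(set(user_ids))


# Toy example of deduplication: five reposts from three distinct (made-up) users.
toy_unique = count_unique_users(["u1", "u2", "u1", "u3", "u1"])

fake_pct = round(100 * fake_reposts / total_reposts, 2)
real_pct = round(100 * real_reposts / total_reposts, 2)
unique_pct = round(100 * unique_real_fans / total_reposts, 2)

# Extrapolate the rounded share of unique real fans to the displayed 1 million reposts.
estimated_real_fans = round(unique_pct / 100 * displayed_reposts)

print(fake_pct, real_pct, unique_pct, estimated_real_fans)
```

Note that the extrapolation assumes the scraped sample is representative of all 1 million reposts.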

You may say that even our own posts usually get reposted by some fake fans. For comparison, we also scraped 10006 reposts of the latest post by Wu Qingfeng (13.77 million followers), who was active on the show Singer at the time (snapshot taken at 10:00 on March 11, 2019).

We filtered this data with the same steps as above and found that only a small share of the reposters are fake fans; most are real fans.

In addition, the 9658 reposts from real fans came from as many as 9318 distinct real fans, which indicates that there is almost no repeated reposting to boost the numbers. The contrast with Cai Xukun's data is obvious.
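One way to make this comparison concrete is to compute, for each star, the ratio of distinct real fans to real-fan reposts. The sketch below uses only counts given in the text; the function and variable names are hypothetical, and the fake share for Wu Qingfeng is derived on the assumption that the 9658 real-fan reposts are out of the 10006 scraped.

```python
def unique_ratio(unique_fans, real_reposts):
    """Share of real-fan reposts that come from distinct users (1.0 means no repeats)."""
    return unique_fans / real_reposts


# Cai Xukun: 3926 distinct real fans behind 6916 real-fan reposts.
cai_ratio = unique_ratio(3926, 6916)

# Wu Qingfeng: 9318 distinct real fans behind 9658 real-fan reposts.
wu_ratio = unique_ratio(9318, 9658)

# Derived under the stated assumption: reposts not counted as real are fake.
wu_fake_share = (10006 - 9658) / 10006

print(f"Cai Xukun unique ratio:   {cai_ratio:.1%}")
print(f"Wu Qingfeng unique ratio: {wu_ratio:.1%}")
print(f"Wu Qingfeng fake share:   {wu_fake_share:.1%}")
```

A ratio close to 100% means almost every real-fan repost comes from a different user, while a much lower ratio points to the same fans reposting repeatedly.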

As for the 93.24% of reposts that are fake, how are these fake fans produced, and what behavior do they have in common? We started by building a user portrait of the fake fans.

The 95397 fake reposts come from 40838 fake-fan accounts, and as many as 95.42% of them are male!

We then counted the comments attached to the reposts to see what fake fans like to say when reposting, and found something even more interesting.

A lot of fake fans repost from the accounts "Cai Xukun's Nanan end Yin big miss" and "super super love Cai's thought" (please don't attack them). Looking up these two accounts, we found that each has only one or two hundred followers, that all of their posts are about Cai Xukun, and that many of those posts have 0 reposts while a few have thousands!

These are fans spending their own money to buy traffic for their idol.

In addition, we found that many fake-traffic accounts attach English comments when reposting. A quick search showed that these comments are English song lyrics, lines from American TV shows, or poems by Tagore or Neruda.

Android tops the list of the 10 devices most used by fake fans for reposting, which we take as further evidence that these fans are fake.

There are some other interesting findings: fake fans follow 3.44 accounts on average and have 1.04 followers on average. They have no profile description, and their nicknames mostly follow the format "Chinese characters + English letters and digits". Many fake-fan nicknames contain the characters "Kun", "Cai", "Kui", or "Kun", and their avatars are pictures of Cai Xukun (which suggests that many of them are custom-made fake fans).

Now let's look at the real fans, starting with their gender ratio. Of the 3926 real fans, most are girls, which is the ratio one would expect.

The comments attached to these fans' reposts show that many of them are supporting Cai Xukun to take first place on the Star Power list or the Oriental Wind and Cloud list.

The devices real fans use for reposting are fairly evenly distributed, with the iPhone client being the most common.

Real fans follow 222 accounts on average and have 179 followers on average. As with the fake fans, many of them like to include the characters "Kun", "Cai", "Kui", or "Kun" in their nicknames.

We made a word cloud of the real fans' profile descriptions.

The profiles of real fans often include Cai Xukun's name and say how much they like him and that they want to stay by his side all the way. Looking at words such as "youth", "hard work", "freedom", and "pursuing dreams", we recognized our own younger selves.

We also made a word cloud of the comments attached to their reposts.

Fans clearly care a lot about the "billboard" and want to help Cai Xukun take first place. There are also good-morning check-ins, super topics, and many words such as "happy", "Bi Xin" (a finger-heart gesture), and "warm". This shows that most of the real fans are very warm-hearted.

According to the data, most of the reposts on Cai Xukun's heavily reposted Weibo posts are indeed fake traffic. We estimate that this fake traffic has two sources: it is either purchased by his agency or purchased by loyal fans at their own expense.

If his agency is buying it, this really disrupts the workings of the entire entertainment market, which is bad for the entertainment industry and even for society at large. If loyal fans are buying it, Alfred thinks that data is just data, and the money would do more for their idol's influence if it were spent in other, better ways. I think the "poverty alleviation and starlight action" topic that Cai Xukun recently reposted is very good. We can all do more positive things through our own influence.

Alfred has something to say

1. Original work is not easy. If you like our content, please follow, share, and repost to support us; your recognition is our motivation.

2. Reply "Cai Xukun" in the WeChat official account backend to get the link to the crawler and data analysis code shared on GitHub.

3. We welcome any form of cooperation; please leave a message in the WeChat backend to inquire.

4. If you have an interesting data topic, leave a message and tell us; the next issue may be the one you most want to see!