

On the Interpretability of Recommender Systems

Posted by patinella at 2020-02-28


Explainable Recommendation via Multi-Task Learning in Opinionated Text Data

Published at: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018)


Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks

Published at: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018)



Recommender systems recommend content that users are interested in and give personalized suggestions. However, most current approaches focus on sequence modeling of the recommended items and ignore their fine-grained features. By explaining recommendation results and analyzing the characteristics of the recommended objects, users can decide more intelligently and accurately which results to act on, improving their satisfaction. Based on two papers on explainable recommendation, this article discusses how introducing fine-grained item characteristics into a recommender system can provide interpretability, from two perspectives.

MTER Model: Multi-Task Explainable Recommendation

The MTER model extracts fine-grained personalized features from user reviews, as shown in the figure below. The paper proposes a multi-task learning method for explainable recommendation, which maps users, items, features, and opinion phrases into the same vector space through joint tensor decomposition.

The main variables used in the model are shown in the table below.

User i's reviews and item j's reviews are denoted R_i^u and R_j^p, respectively.

Domain-specific sentiment lexicon entries are represented as triples (feature, opinion phrase, sentiment polarity); since this is not the focus of the paper, the authors do not elaborate on the specific processing here.

The MTER model has three main parts: user preference modeling for item recommendation, user review modeling for explanation, and multi-task learning through joint tensor decomposition.

1. User preference modeling for item recommendation

The paper uses a three-dimensional tensor to model user i's appreciation of feature k of item j. If feature k of item j is mentioned by user i a total of t_ijk times, each time with a sentiment polarity label, a feature score can be computed from these counts and labels. In addition, to incorporate the overall rating matrix into each user's preference tensor, the original three-dimensional tensor is extended with the rating entries, so that the tensor describes the relationships among users, items, and features.

Tucker decomposition with non-negativity constraints is then applied; the non-negativity constraint corresponds to the fact that ratings are non-negative. Tucker decomposition factorizes a three-dimensional tensor into three factor matrices and a core tensor, where the core tensor describes the degree of interaction between factors along the different modes.
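As a minimal illustration of what Tucker decomposition looks like computationally (all names, shapes, and values below are placeholders, not taken from the paper), the user–item–feature tensor can be rebuilt from a core tensor and three factor matrices:

```python
import numpy as np

# Sketch of Tucker-style reconstruction of a 3-way tensor
# (users x items x features) from a core tensor G and factor matrices.
def tucker_reconstruct(G, U, V, F):
    """G: (r1, r2, r3) core tensor capturing factor interactions.
    U: (n_users, r1), V: (n_items, r2), F: (n_features, r3).
    Returns X: (n_users, n_items, n_features)."""
    # Mode-1, mode-2, and mode-3 products, all at once via einsum.
    return np.einsum('abc,ia,jb,kc->ijk', G, U, V, F)

rng = np.random.default_rng(0)
# Non-negative factors mirror the non-negativity constraint on ratings.
G = rng.random((2, 2, 2))
U, V, F = rng.random((4, 2)), rng.random((5, 2)), rng.random((3, 2))
X = tucker_reconstruct(G, U, V, F)
```

Because every factor is non-negative, every reconstructed entry is non-negative as well, matching the constraint on rating scores.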

In this way, the correlation among users, items, and features can be predicted from the decomposition.

However, Tucker decomposition optimizes point-wise reconstruction, whereas recommendation is fundamentally a ranking problem. Therefore, Bayesian Personalized Ranking (BPR), a pairwise ranking algorithm based on optimizing a Bayesian posterior, is introduced: instead of fitting individual scores, it optimizes the relative order of item pairs.

This yields a pairwise optimization objective.
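The pairwise idea behind BPR can be sketched as follows (variable names and scores are illustrative): for each user, an observed ("positive") item should score higher than a sampled unobserved ("negative") item, and the loss penalizes violations of that order.

```python
import numpy as np

# Minimal sketch of the BPR pairwise objective: the score of a positive
# item should exceed the score of a sampled negative item.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(pos_scores, neg_scores):
    """Negative log-likelihood of pairwise preferences, averaged."""
    return -np.mean(np.log(sigmoid(pos_scores - neg_scores)))

pos = np.array([2.0, 1.5, 0.8])   # scores of observed items (toy values)
neg = np.array([0.5, 0.2, 0.1])   # scores of sampled negatives
loss = bpr_loss(pos, neg)
```

When the positive item already scores far above the negative one, the loss approaches zero; when the order is violated, the loss grows, pushing the model to fix the ranking rather than the absolute scores.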

2. User review modeling for explanation

User review modeling involves four types of objects: users, items, features, and opinion phrases. Ideally this would be modeled as a four-dimensional tensor, but because the data are sparse, it is modeled as two three-dimensional tensors: a user-based opinion tensor and an item-based opinion tensor. Only opinions with positive sentiment polarity are selected here.

In this way, a score vector over opinion phrases can be derived from the decomposition.

3. Multi task learning through joint tensor decomposition

The figure below visualizes the joint tensor decomposition for multi-task learning:

The joint decomposition exploits two properties of the core tensor in Tucker decomposition: it captures multivariate interactions among latent factors, and the factor matrices can be regarded as bases of the latent space. Therefore, (1) by sharing the factor matrices, the latent representations of users, items, features, and opinion phrases are learned jointly across the two tasks, while (2) each task keeps its own core tensor to capture task-specific differences and the scale at which the shared latent factors interact.
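A minimal sketch of this sharing scheme (shapes, names, and the choice of which factors each tensor uses are illustrative assumptions, not the paper's exact setup): the tasks share factor matrices but keep separate core tensors.

```python
import numpy as np

# Sketch of joint Tucker-style decomposition with shared factor matrices:
# U (users), V (items), F (features), P (opinion phrases) are shared,
# while each task has its own core tensor.
rng = np.random.default_rng(1)
U = rng.random((4, 2))   # user factors
V = rng.random((5, 2))   # item factors
F = rng.random((3, 2))   # feature factors
P = rng.random((6, 2))   # opinion-phrase factors

G_user = rng.random((2, 2, 2))  # task-specific core: user-side opinion tensor
G_item = rng.random((2, 2, 2))  # task-specific core: item-side opinion tensor

# user x feature x phrase tensor, sharing U, F, P with the rating task
Xu = np.einsum('abc,ia,jb,kc->ijk', G_user, U, F, P)
# item x feature x phrase tensor, sharing V, F, P
Xi = np.einsum('abc,ia,jb,kc->ijk', G_item, V, F, P)
```

Because U, V, F, and P appear in both reconstructions, gradients from both tasks update the same latent representations, which is the mechanism by which the tasks inform each other.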

The two tasks are optimized jointly under a single objective function.

4. Experiments

Data set

The data come from Amazon and Yelp. The authors also analyze convergence behavior on these data sets.

Experimental results

The evaluation metric used in the experiments is NDCG.

The comparative experimental results are shown in the table below.

The effect of adjusting the weight coefficients of the joint tensor decomposition on the experimental results:

Interpretability experiment (explanation based on user opinions)

The following figure shows that the MTER model accurately captures the relationships among users, items, features, and opinion phrases even on a randomly shuffled data set, and can therefore provide reasonable recommendation results.

The impact of the size of the user opinion data set on the association tests between users and items, features, and opinion phrases is shown in the following figure:

The following table contrasts recommendation based on feature prediction with recommendation based on opinion-phrase prediction; the latter shows a clear advantage.

The paper also conducts a user study: five questions are posed, and participants score the performance of several recommender systems. Q1: Overall, are you satisfied with this recommendation? Q2: Do you feel you know something about the recommended product? Q3: Does the explanation help you learn more about the recommended item? Q4: Given the recommended item, do you understand why we recommended it? Q5: Do you think the explanation helps you better understand our system?

KSR Model: Knowledge-enhanced Sequential Recommender

The KSR model combines a memory network with a knowledge base to enhance the recommender system's ability to capture features and provide explanations, addressing the problems that sequential recommenders are not interpretable and cannot capture fine-grained user features.

The figure above depicts the architecture of the KSR model: a GRU component captures the user's sequential preference, and a key-value memory network (KV-MN) captures attribute-based preference features.

The KSR model is introduced in three steps: the GRU-based sequential recommendation model, the memory-network-enhanced sequential recommendation model, and the complete knowledge-enhanced sequential recommendation model.

1. GRU-based sequential recommendation model

Given user u's interaction sequence, the hidden states of the GRU model summarize the sequence, and the final hidden state serves as user u's sequential preference vector.

The representation of each item is pre-trained, again using Bayesian Personalized Ranking (BPR).

Candidate items i are then ranked by their recommendation scores.
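The GRU-then-score pipeline can be sketched as follows. This is a toy NumPy version (random weights, illustrative dimensions, inner-product scoring as an assumed scoring form), not the paper's implementation:

```python
import numpy as np

# Sketch: run a GRU cell over the user's interaction sequence, take the
# final hidden state as the sequential preference, and rank candidates
# by inner product with it.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, W):
    """One GRU update: h is the hidden state, x the current item embedding."""
    hx = np.concatenate([h, x])
    z = sigmoid(W['z'] @ hx)                       # update gate
    r = sigmoid(W['r'] @ hx)                       # reset gate
    h_tilde = np.tanh(W['h'] @ np.concatenate([r * h, x]))
    return (1 - z) * h + z * h_tilde

d = 8
rng = np.random.default_rng(2)
W = {k: rng.standard_normal((d, 2 * d)) * 0.1 for k in ('z', 'r', 'h')}

items = rng.standard_normal((5, d))   # embeddings of the interaction sequence
h = np.zeros(d)
for x in items:
    h = gru_step(h, x, W)             # h ends as the sequential preference

candidates = rng.standard_normal((10, d))
scores = candidates @ h               # rank candidates by inner product
ranking = np.argsort(-scores)         # best candidate first
```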

2. Knowledge-enhanced memory network

In the key-value memory network (KV-MN), the keys correspond to relations in the knowledge graph (KG) and the values correspond to KG entities. The key memory matrix is shared across users, while the value memory matrix is user-specific.

The read operation of the memory network outputs a weighted combination of the stored values: attention weights are computed between the query and the keys, and the final query result is the attention-weighted sum of the corresponding values.
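The read step above can be sketched as a standard softmax attention over the key slots (dimensions and names here are illustrative placeholders):

```python
import numpy as np

# Sketch of the key-value memory read: the query attends over the shared
# key matrix (relations), and the output is the attention-weighted sum of
# the user's value slots (entities).
def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

def kv_read(query, keys, values):
    """query: (d,); keys, values: (n_slots, d)."""
    weights = softmax(keys @ query)      # attention over relation keys
    return weights @ values, weights     # weighted sum of value memories

rng = np.random.default_rng(3)
d, n_slots = 8, 4
keys = rng.standard_normal((n_slots, d))    # shared across users
values = rng.standard_normal((n_slots, d))  # user-specific
q = rng.standard_normal(d)
m, w = kv_read(q, keys, values)             # m: attribute preference vector
```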

The write (update) operation on the values proceeds similarly.

Here, the KG is pre-trained with TransE to obtain representations of entities and relations.
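TransE models each relation as a translation in embedding space: for a true triple (head, relation, tail), head + relation should land close to tail. A tiny sketch with hand-picked toy vectors:

```python
import numpy as np

# TransE scoring sketch: for a plausible triple, h + r ≈ t, so the
# distance ||h + r - t|| is small; implausible triples score larger.
def transe_score(h, r, t):
    return np.linalg.norm(h + r - t)   # lower = more plausible triple

h = np.array([1.0, 0.0])               # toy head-entity embedding
r = np.array([0.0, 1.0])               # toy relation embedding
t_true = np.array([1.0, 1.0])          # exactly h + r
t_false = np.array([-1.0, 0.0])        # unrelated entity

good = transe_score(h, r, t_true)
bad = transe_score(h, r, t_false)
```

Training minimizes this distance for observed triples while pushing it up for corrupted ones; the resulting entity and relation vectors are what the memory network consumes.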

These representations are then used to compute the coefficients of the update gate.

Finally, the updated value memories are obtained.
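A gated blend is one common form for such an update; the sketch below assumes a sigmoid gate computed from the incoming entity embeddings (the gate parameterization and shapes are illustrative, not the paper's exact formulation):

```python
import numpy as np

# Sketch of a gated value-memory write: each slot is blended between its
# old content and the new item's entity embedding, with a sigmoid gate
# deciding how much to overwrite.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def kv_write(values, new_entities, W):
    """values, new_entities: (n_slots, d); W: (d, d) gate parameters."""
    z = sigmoid(new_entities @ W)              # per-slot, per-dim update gate
    return (1 - z) * values + z * new_entities # convex blend of old and new

rng = np.random.default_rng(4)
n_slots, d = 4, 8
values = rng.standard_normal((n_slots, d))     # current user value memory
new_e = rng.standard_normal((n_slots, d))      # entities of the new item
W = rng.standard_normal((d, d)) * 0.1
updated = kv_write(values, new_e, W)
```

Because the gate lies in (0, 1), every updated entry stays between the old value and the new entity value, so the memory drifts toward recent attributes without discarding history outright.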

3. Knowledge-enhanced sequential recommendation model

The overall working mechanism of the KSR model is shown in the figure below.

The sequential preference vector and the attribute preference vector have the same dimensionality.

The scoring function combines the two preference representations.

The loss function is again based on BPR.
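One natural way to combine the two sides, sketched below under assumed shapes and names, is to concatenate the sequential and attribute preferences on the user side, concatenate the item embedding with its KG entity embedding on the item side, and score by inner product:

```python
import numpy as np

# Sketch of a combined scoring function: user = [sequence pref; attribute
# pref], item = [item embedding; entity embedding], score = inner product.
rng = np.random.default_rng(5)
d = 8
h_seq = rng.standard_normal(d)        # sequential preference (GRU state)
m_attr = rng.standard_normal(d)       # attribute preference (memory read)
user = np.concatenate([h_seq, m_attr])

q_item = rng.standard_normal(d)       # item latent embedding
e_item = rng.standard_normal(d)       # item's KG entity embedding
item = np.concatenate([q_item, e_item])

score = user @ item
```

A convenient property of this form: the score decomposes as (h_seq · q_item) + (m_attr · e_item), i.e., a sequence-matching term plus an attribute-matching term, which is what makes the attribute side's contribution inspectable for explanation.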

4. Experiments

The data sets used in the experiments are shown in the following table:

The performance metrics are MAP, MRR, HR, and NDCG.

The baselines are as follows:

The overall experimental results show good improvements on every data set:

In addition, a series of comparative analysis experiments are carried out:

(1) The effect of data set size on experimental results

(2) The influence of sharing the value matrix of key value memory network on experimental results

(3) The effects of different KG embedding methods and embedding dimensions on the experimental results

Interpretability experiment

The top row of the figure is the timeline. The second row shows the attributes of each product, i.e., the keys in the memory network (here, singer and album). The third row shows the recommendation list generated for each attribute. From the second row, at initialization the system believes the user prefers the album (the singer's weight is small and its box light, while the album's weight is large and its box dark); as time passes, the system gradually discovers that the user actually prefers the singer rather than the album (the singer's box darkens while the album's lightens). From the third row, the system's early judgment is wrong and the generated list is inaccurate, but over time the judgment becomes accurate, and the reason for the recommendation is also given: the user wants to listen to this singer's songs, not this album's songs.

The author of this article is Deng Shumin, a direct-admission doctoral student (class of 2017) in the College of Computer Science, Zhejiang University, whose research covers joint representation learning of knowledge graphs and text, interpretability, and temporal prediction.


The purpose of OpenKG.CN (Chinese Open Knowledge Graph) is to promote the openness and interconnection of Chinese knowledge graph data and the popularization and wide application of knowledge graph and semantic technologies.

Reprint notice: reprints should credit the source "OpenKG.CN", the author, and the original link. If the title is changed, please also indicate the original title.
