MEGUEBLI Youssef, 27/03/15, CentraleSupélec, 10 a.m.

SUBJECT: Opinion analysis in news portals to personalize recommender systems.
Leveraging user-generated-content to enhance and to personalize news recommendation.
Thesis prepared at: CentraleSupélec – Lab: E3S

Thesis supervisor: Bich-Liên Doan

-Anne Boyer, Reviewer, Professor
-Benjamin Piwowarski, Examiner, Associate Professor
-Bich-Liên Doan, Supervisor
-Fabrice Popineau, Co-supervisor
-Josiane Mothe, Examiner, Professor
-Mohand Boughanem, Reviewer, Professor
-Mouna Kacimi, Co-supervisor
-Nicolas Sabouret, Examiner, Professor
-Pierre Zweigenbaum, Invited member, Professor


Online news websites are becoming one of the most popular and influential social media platforms, allowing people to easily access information about everyday topics, share their opinions on different issues, and give feedback on published content. The tremendous increase in published news calls for effective recommendation techniques that help users find news articles that match their interests. At the same time, users are continuously encouraged to participate in online news websites and to keep sharing their opinions, which represent a valuable source of social information. In this thesis, we have investigated how to exploit user-generated content for personalized news recommendation. The intuition behind this line of research is that the opinions users post on news websites are a strong indicator of their profiles. By mining such content, we can extract valuable information about users’ domains of interest, their inclination towards a certain version of news articles, their political orientation, their favorite sports teams, their preferences, and many other features. Furthermore, such content can also be used to enrich the content of news articles, particularly controversial ones, since opinions can reveal aspects that are poorly described in, or even absent from, the article itself. Thus, user-generated content is the core component of our work. This thesis is divided into three main parts, described below, which represent the different steps of developing a news recommendation system based on user-generated content.
In the first part, we have developed a fine-grained model that captures both user and article profiles. The profile of each user is extracted from all the opinions and reactions that the user provides on news websites, while the profile of an article is extracted from its content. A profile is mainly composed of the entities, the aspects, and the sentiments expressed in the corresponding content. While entity extraction is a well-established problem, aspect extraction often relies on supervised techniques, which are domain dependent. For a more general solution, we have proposed an unsupervised technique for aspect extraction from opinions and articles. We have investigated two types of models in three different applications.
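To make the profile structure concrete, here is a minimal sketch in Python. It assumes that entity, aspect, and polarity extraction has already been done by some upstream step, which is the hard part and is not shown. The names `Profile`, `add`, `average_sentiment`, and `build_user_profile`, and the polarity range of -1 to 1, are hypothetical choices made for illustration and are not taken from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Illustrative profile: mention counts and summed polarity per (entity, aspect)."""
    # (entity, aspect) -> number of mentions
    mentions: dict = field(default_factory=lambda: defaultdict(int))
    # (entity, aspect) -> sum of polarities, each assumed to lie in [-1, 1]
    sentiment: dict = field(default_factory=lambda: defaultdict(float))

    def add(self, entity, aspect, polarity=0.0):
        key = (entity, aspect)
        self.mentions[key] += 1
        self.sentiment[key] += polarity

    def average_sentiment(self, entity, aspect):
        key = (entity, aspect)
        count = self.mentions.get(key, 0)
        return self.sentiment.get(key, 0.0) / count if count else 0.0


def build_user_profile(annotated_opinions):
    """Aggregate one user's opinions, each given as a list of
    (entity, aspect, polarity) triples produced by an assumed extractor."""
    profile = Profile()
    for opinion in annotated_opinions:
        for entity, aspect, polarity in opinion:
            profile.add(entity, aspect, polarity)
    return profile
```

In this sketch, a profile that ignores sentiment would use only `mentions`, while one that takes sentiment into account would also read `sentiment`.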
The first model, called a sentiment-dependent profile, exploits the sentiments related to each entity and aspect to define the orientations of users towards a specific trend. For this purpose, we have built a knowledge base of trends, more specifically of political orientations, that guides the extraction of profiles in an unsupervised manner. We have assessed the accuracy of the extracted profiles on two datasets crawled from CNN and Al-Jazeera, and the results show that our approach gives high-quality results. The second model, called a sentiment-independent profile, focuses only on entities and aspects and is used for news recommendation. This model was used to define both users’ interests and the content of news articles. We have tested it on a large test collection based on real users’ activities on four news websites, namely The Independent, The Telegraph, CNN, and Al-Jazeera. The results show that our model outperforms baseline models and achieves high accuracy. In the third application, we have used a combination of the two former models for news recommendation: the sentiment-independent profile model, which defines users’ interests, is combined with the sentiment-dependent profile model, which describes the content of news articles. The main goal of this application was to provide a method that deals with the problem of redundancy in the list of recommended news articles. For this purpose, we have applied a diversification model to news article profiles to reduce the redundancy of the recommended list. We have tested our approach on real users’ activities on four news websites: CNN, Al-Jazeera, The Telegraph, and The Independent. The results show that diversification improves the quality of recommended news articles.
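The abstract does not say which diversification model is used. As an illustration of the general idea only, the sketch below uses maximal marginal relevance (MMR), a standard greedy heuristic that trades relevance to the user against similarity to articles already selected. Profiles are simplified to plain sets of terms compared with Jaccard similarity. The function names and the `trade_off` parameter are assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_diverse(user_profile, article_profiles, k=3, trade_off=0.7):
    """Greedy MMR selection (a stand-in technique, not the thesis model).

    user_profile: set of terms describing the user's interests.
    article_profiles: dict mapping article id -> set of terms.
    trade_off: 1.0 ranks by relevance only; lower values penalize redundancy more.
    """
    selected = []
    candidates = dict(article_profiles)
    while candidates and len(selected) < k:

        def mmr(article_id):
            relevance = jaccard(user_profile, candidates[article_id])
            redundancy = max(
                (jaccard(candidates[article_id], article_profiles[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance - (1.0 - trade_off) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        del candidates[best]
    return selected


user = {"obama", "healthcare", "election"}
articles = {
    "a1": {"obama", "healthcare"},
    "a2": {"obama", "healthcare", "senate"},
    "a3": {"election", "polls"},
}
by_relevance = recommend_diverse(user, articles, k=2, trade_off=1.0)
diversified = recommend_diverse(user, articles, k=2, trade_off=0.5)
```

With relevance only, the two near-duplicate articles `a1` and `a2` are both selected. With `trade_off=0.5`, the second slot goes to `a3`, which covers a different aspect of the user's interests.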
In the second part, we have focused on how to enrich the article profiles with user-generated content. The idea is to exploit the rich structure of opinions to tailor the articles to the specific needs and interests of users. The main challenge of this task is how to select the opinions used for profile enrichment. The large number and the noisy nature of opinions call for an effective ranking strategy. To achieve this goal, we have proposed a novel scoring model that ranks opinions based on their relevance and prominence, where the prominence of an opinion is based on its relationships with other opinions. To find prominent opinions, we have (1) suggested a directed graph model of opinions where each link represents the sentiment an opinion expresses about another opinion, and (2) built a new variation of the PageRank algorithm that increases the scores of opinions along links with positive sentiments and decreases them along links with negative sentiments. We have tested the effectiveness of our model through extensive experiments using three datasets crawled from the CNN, The Independent, and The Telegraph news websites. The experiments showed that our scoring model selects meaningful and insightful opinions.
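The exact update rule of the PageRank variation is not given in this abstract. The sketch below is one plausible reading, offered as an assumption: each link carries a sentiment weight in [-1, 1], positive links add to the target opinion's score, negative links subtract from it, and scores are clamped at zero. The function name, the clamping step, and the fixed iteration count are illustrative choices, and convergence is not guaranteed in general.

```python
def sentiment_pagerank(nodes, edges, damping=0.85, iterations=50):
    """Signed PageRank-style scoring (illustrative, not the thesis algorithm).

    nodes: list of opinion ids.
    edges: list of (source, target, sentiment) with sentiment in [-1, 1],
           meaning that `source` expresses that sentiment about `target`.
    """
    n = len(nodes)
    out_degree = {v: 0 for v in nodes}
    for source, _target, _sentiment in edges:
        out_degree[source] += 1

    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every opinion starts each round with the teleportation share.
        new_scores = {v: (1.0 - damping) / n for v in nodes}
        for source, target, sentiment in edges:
            # Positive links raise the target's score, negative links lower it.
            new_scores[target] += damping * sentiment * scores[source] / out_degree[source]
        # Clamp at zero so strongly criticized opinions bottom out instead of going negative.
        scores = {v: max(s, 0.0) for v, s in new_scores.items()}
    return scores


# Opinion "c" agrees with "a"; opinion "d" disagrees with "b".
scores = sentiment_pagerank(
    ["a", "b", "c", "d"],
    [("c", "a", 1.0), ("d", "b", -1.0)],
)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the supported opinion `a` ends up ranked first and the contested opinion `b` last, which matches the intended effect described above.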
In the third part, we have focused on the development of a recommendation technique that exploits the results of the previous part and uses them to enrich the content of news articles. We have tested various methods of leveraging opinions on the content of news articles. Concretely, we have worked on two main aspects. Firstly, we have focused only on sentiment-independent profiles, which consist of entities and aspects, and thoroughly investigated the profile construction process. Secondly, we have enhanced the opinion ranking strategy described earlier by proposing an opinion diversification model based on authority, semantic, and sentiment diversification. The goal is to deal with redundant information and to achieve wide coverage of topic aspects. We have tested our approach by running large experiments on four datasets crawled from CNN, The Independent, The Telegraph, and Al-Jazeera. The results show that our model provides effective recommendations, particularly when the content of news articles is enriched with a diversified set of opinions.