
[Figure: Flowchart and performance of the proposed approach for hate speech detection.]
Online social networks (OSN) and microblogging websites are attracting Internet users more than any other kind of website. Services such as those offered by Twitter, Facebook and Instagram are increasingly popular among people of different backgrounds, cultures and interests. Their content is growing rapidly, constituting a very interesting example of so-called big data. Big data have been attracting the attention of researchers interested in, among other tasks, the automatic analysis of people's opinions and the structure and distribution of users in the networks.
While these websites offer an open space for people to discuss and share thoughts and opinions, their nature and the huge number of posts, comments and messages exchanged make it almost impossible to control their content. Furthermore, given their different backgrounds, cultures and beliefs, many people tend to use aggressive and hateful language when arguing with people who do not share the same background. King and Sutton [1] reported that 481 hate crimes with an anti-Islamic motive occurred in the year following 9/11, and that 58% of them were perpetrated within two weeks of the event. Nowadays, with the rapid growth of OSN, such conflicts flare up after every major event.
Moreover, while the censorship of online content remains a controversial topic, with people divided between supporters and opponents [2], hateful language persists in OSN, and it spreads among young and older users alike more easily than "cleaner" speech.
For these reasons, Burnap and Williams [3] argued that collecting and analyzing temporal data allows decision makers to study the escalation of hate crimes following "trigger" events. However, official information on such events is scarce, since hate crimes often go unreported to the police. In this context, social networks present a richer, though noisier and less reliable, source of information.
To overcome this noise and unreliability, we propose in this work an efficient way to detect both offensive posts and hate speech on Twitter. Our approach relies on writing patterns and unigrams, along with sentiment-based features, to perform the detection.
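As a rough illustration of the three feature families just mentioned (the concrete design is given in Section III), the following Python sketch shows one plausible way to extract them. The function names and the tiny lexicons are our own assumptions for illustration, not the paper's actual implementation.

import re

# Hypothetical, tiny sentiment lexicon; a real system would use a full resource.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "hate", "awful", "unfair"}

def writing_pattern_features(tweet: str) -> dict:
    """Simple stylistic cues: punctuation bursts, capitalization, mentions."""
    return {
        "exclamations": tweet.count("!"),
        "question_marks": tweet.count("?"),
        "all_caps_words": sum(w.isupper() and len(w) > 1 for w in tweet.split()),
        "mentions": len(re.findall(r"@\w+", tweet)),
    }

def unigram_features(tweet: str) -> dict:
    """Bag-of-words presence features over lower-cased tokens."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return {f"uni={t}": 1 for t in tokens}

def sentiment_features(tweet: str) -> dict:
    """Counts of lexicon hits as a crude polarity signal."""
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    return {
        "pos_words": len(tokens & POSITIVE),
        "neg_words": len(tokens & NEGATIVE),
    }

def extract_features(tweet: str) -> dict:
    """Concatenate the three feature families into one feature dictionary."""
    feats = {}
    for group in (writing_pattern_features, unigram_features, sentiment_features):
        feats.update(group(tweet))
    return feats

if __name__ == "__main__":
    print(extract_features("I hate seeing them losing every time! It's just unfair!"))

The resulting dictionary can be fed to any standard classifier; the point here is only that stylistic, lexical and sentiment signals are combined into a single representation.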
The remainder of this paper is structured as follows: in Section II we present our motivations and describe some of the related work. In Section III we formally define the aim of our work and describe in detail our proposed method for hate speech detection and how features are extracted. In Section IV we detail and discuss our experimental results. Section V concludes this paper and proposes possible directions for future work.
Hate speech is a particular form of offensive language in which the speaker bases his or her opinion on a segregationist, racist or extremist ideology, or on stereotypes. Merriam-Webster defines hate speech as "speech expressing hatred of a particular group of people." From a legal perspective, it defines it as "speech that is intended to insult, offend, or intimidate a person because of some trait (as race, religion, sexual orientation, national origin, or disability)." Hate speech is thus considered a worldwide problem that many countries and organizations have been standing up against. With the spread of the Internet and the growth of online social networks, the problem has become even more serious: interactions between people are now indirect, people's speech tends to be more aggressive when they feel physically safer, and many hate groups see the Internet as an "unprecedented means of communication and recruiting" [2].
In the context of the Internet and social networks, hate speech not only creates tension between groups of people; it can also affect businesses or spark serious real-life conflicts. For such reasons, websites such as Facebook, YouTube and Twitter prohibit the use of hate speech. However, it remains difficult to control and filter all their content. Hate speech has therefore been the subject of several research studies attempting to detect it automatically. Most of this work aims either at constructing dictionaries of hate words and expressions [4] or at binary classification into "hate" and "non-hate" [5]. However, it is often difficult to decide whether a sentence contains hate or not, in particular when the hate speech hides behind sarcasm or when no explicit words expressing hate, racism or stereotyping are present.
Furthermore, OSN are full of ironic and joking content that might sound racist, segregationist or offensive but in reality is not. An example is given in the following two tweets:
The first tweet sounds offensive and demeaning to the person it targets. However, given that the two users follow each other, the tweet is actually a joke between two friends. The second tweet presents the same problem: even though the user seems to be offending women, the context of the message (a small discussion among a group of friends) shows that the tweet was not posted to offend women, or even the person it targets.
Such expressions, and others that refer to a particular gender, race, ethnic group or religion, are widely used in joking contexts and have to be clearly distinguished from hate speech. Therefore, dictionaries, and n-grams in general, might not be the optimal tools for separating expressions that express hate from those that do not, as the sketch below makes concrete.
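The following small, hypothetical dictionary-lookup baseline illustrates the limitation; the word list and the two tweets are invented for this example. A friendly joke and a genuine insult receive the same label, because the lookup only sees surface tokens:

# Hypothetical hate-word dictionary; real dictionaries (e.g., [4]) are far larger.
HATE_WORDS = {"idiot", "trash", "stupid"}

def dictionary_flag(tweet: str) -> bool:
    """Flag a tweet if any token appears in the hate-word dictionary."""
    tokens = {t.strip(".,!?").lower() for t in tweet.split()}
    return bool(tokens & HATE_WORDS)

# A joke between friends and an actual insult get the same label.
print(dictionary_flag("haha you absolute idiot, see you tonight!"))  # True
print(dictionary_flag("people like you are idiot trash"))            # True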
It is arguable that sentiment analysis techniques could be used for hate speech detection. However, hate speech detection is a different task that requires more sophisticated techniques. In sentiment analysis, the main task is detecting the sentiment polarity of the tweet, which reduces to detecting any positive or negative words or expressions. This makes it possible to rely on the direct meaning of words: with very few exceptions, a word usually carries the same sentiment polarity regardless of context or actual meaning (e.g., the word "bad" can hardly be interpreted in a positive way). In hate speech, by contrast, some words may be negative, and may even express hate, yet the context makes the text unrelated to hate speech. This is illustrated by the following two examples:
Example 1:
“I hate seeing them losing every time! It’s just unfair!”
Even though the word "hate" is employed here, the sentence does not fall under the category of hate speech, simply because the context is not one of offending a person, let alone offending them because of their gender, race, etc.
Example 2:
“I hate these neggers, they keep making life much painful”
This is obviously hate speech directed at a specific ethnic group.
This makes hate speech detection quite different from, and more challenging than, sentiment analysis: not only is it context-dependent, but we also cannot rely on simple words, or even n-grams, to detect it, as the sketch below illustrates.
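A naive lexicon-based polarity score, sketched below with a tiny invented lexicon, assigns both examples above a negative polarity, which is precisely why polarity alone cannot separate Example 1 (not hate speech) from Example 2 (hate speech):

# Hypothetical polarity lexicons for illustration only.
NEGATIVE = {"hate", "unfair", "painful", "losing"}
POSITIVE = {"love", "great", "fair"}

def polarity(sentence: str) -> int:
    """Naive lexicon score: +1 per positive token, -1 per negative token."""
    score = 0
    for token in sentence.lower().replace(",", " ").replace("!", " ").split():
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE:
            score -= 1
    return score

# Both examples come out negative; polarity says nothing about hate speech.
print(polarity("I hate seeing them losing every time! It's just unfair!"))   # -3
print(polarity("I hate these neggers, they keep making life much painful"))  # -2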