Identifying Toxic Content


Protecto identifies and scores content based on the presence of toxic language, such as inflammatory remarks or discriminatory statements. This guide explains how Protecto scores content and which types of toxic content it identifies.

Scoring System

Protecto assigns each piece of content a score that indicates the severity of any toxic language it contains:

  • Score Range: 0 (not present) → 1 (severe toxic content present)

Customers can use these scores to set thresholds for blocking content or for issuing warnings about potentially harmful content.
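As a minimal sketch of such a policy, the Python function below maps a score in the 0 to 1 range to an action. The function name `moderation_action` and the cut-offs `block_at=0.8` and `warn_at=0.5` are hypothetical choices for illustration, not Protecto defaults. Tune them to your own tolerance for false positives.

```python
def moderation_action(score: float, block_at: float = 0.8, warn_at: float = 0.5) -> str:
    """Map a toxicity score in [0, 1] to "block", "warn", or "allow".

    block_at and warn_at are hypothetical example thresholds, not Protecto defaults.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if warn_at > block_at:
        raise ValueError("warn_at must not be greater than block_at")
    if score >= block_at:
        return "block"
    if score >= warn_at:
        return "warn"
    return "allow"


# Scores taken from the examples in this guide.
print(moderation_action(0.9648129))  # block
print(moderation_action(0.58))       # warn
print(moderation_action(0.07))       # allow
```

With these example thresholds, a borderline threat scored at 58% would trigger a warning rather than a block.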

Types of Toxic Content

Protecto classifies toxic content into the following categories:

  1. Obscene
    • Definition: Use of profanity or sharing sexually suggestive content.
    • Example: Explicit language or graphic sexual references.
  2. Threat
    • Definition: Directly or indirectly threatening violence or harm.
    • Example: Statements implying intent to physically harm someone.
  3. Insult
    • Definition: Use of derogatory language or making personal attacks.
    • Example: Name-calling, slurs, or any language intended to demean an individual.
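Because each category receives its own score, a client can report which categories an input triggers. The sketch below assumes the per-category scores arrive in a dictionary keyed by `obscene`, `threat`, and `insult`, the field names used in the example analysis in this guide. The helper name `flag_categories` and its 0.5 threshold are hypothetical.

```python
# The three categories described above, keyed by the field names used in
# the example analysis output.
CATEGORIES = {
    "obscene": "Profanity or sexually suggestive content",
    "threat": "Direct or indirect threats of violence or harm",
    "insult": "Derogatory language or personal attacks",
}


def flag_categories(analysis: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return categories scoring at or above a (hypothetical) threshold, highest first.

    Missing category keys are treated as 0.0; other keys, such as "toxicity", are ignored.
    """
    scored = [(name, analysis.get(name, 0.0)) for name in CATEGORIES]
    flagged = [(name, score) for name, score in scored if score >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in flagged]


# Category scores for "You're really dumb, aren't you?" from the example analysis.
example = {"obscene": 0.39630863, "threat": 0.0010725626, "insult": 0.9019991}
print(flag_categories(example))                 # ['insult']
print(flag_categories(example, threshold=0.3))  # ['insult', 'obscene']
for name in flag_categories(example):
    print(f"{name}: {CATEGORIES[name]}")
```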

Example Analysis

Below is an example of how Protecto analyzes and scores a piece of content:

"value": "You're really dumb, aren't you?",
"token_value": "You're really dumb, aren't you?",
"toxicity_analysis": {
    "toxicity": 0.9648129,
    "severe_toxicity": 0.011925396,
    "obscene": 0.39630863,
    "threat": 0.0010725626,
    "insult": 0.9019991,

Toxicity Score Examples

The table below lists example inputs with their overall toxicity score and their Insult, Obscene, and Threat category scores, each shown as a percentage of the 0 to 1 scale:

| Example | Toxicity Score | Insult | Obscene | Threat |
|---|---|---|---|---|
| You disgusting piece of s***. | 100% | 96% | 97% | 0% |
| You are ugly, nobody likes you. | 97% | 84% | 13% | 0% |
| You’re really dumb, aren’t you? | 96% | 90% | 39% | 0% |
| If you make this mistake again, I will kill you | 83% | 7% | 5% | 81% |
| I can’t believe how incompetent you are. | 81% | 26% | 2% | 0% |
| I hate people from India | 80% | 3% | 1% | 1% |
| If you come near me again, I will make sure you regret it for the rest of your life | 58% | 2% | 1% | 53% |
| Watch your back, or you’ll be sorry. | 49% | 1% | 1% | 26% |
| You’re not very good at this, are you? | 7% | 0% | 0% | 0% |
| I hope you understand the consequences of your actions. | 0% | 0% | 0% | 0% |
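Comparing the row for "You’re really dumb, aren’t you?" with the example analysis suggests the percentages are truncated rather than rounded: an obscene score of 0.39630863 appears as 39%, not 40%. The sketch below reproduces that conversion. The truncation rule is inferred from this single row, not documented behavior, and the helper name `as_percent` is hypothetical.

```python
import math


def as_percent(score: float) -> int:
    """Convert a 0-1 score to a whole percentage by truncating (inferred rule).

    Rounding to 6 decimals first guards against float artifacts such as
    0.29 * 100 == 28.999999999999996.
    """
    return math.floor(round(score * 100, 6))


# Scores from the example analysis of "You're really dumb, aren't you?".
example = {"toxicity": 0.9648129, "insult": 0.9019991, "obscene": 0.39630863, "threat": 0.0010725626}
row = {name: as_percent(score) for name, score in example.items()}
print(row)  # {'toxicity': 96, 'insult': 90, 'obscene': 39, 'threat': 0}
```

The output matches the 96%, 90%, 39%, and 0% shown in the corresponding table row.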
