How Do Pedophiles Tweet? Investigating the Writing Styles and Online Personas of Child Cybersex Traffickers in the Philippines
Abstract
One of the most important humanitarian responsibilities of every individual is to protect the future of our children. This entails protecting not only a child's physical welfare but also their mental well-being from harms such as sexual coercion and abuse, which in the worst cases can result in lifelong trauma. In this study, we use Natural Language Processing techniques to perform a preliminary investigation of how child sex peddlers in the Philippines spread illegal pornographic content and target minors for sexual activities on Twitter. Our results identify the frequently used and co-occurring words that traffickers use to spread content, as well as four main roles played by these entities that contribute to the proliferation of child pornography in the country.
Index Terms:
pornography, tweets, cybersex, persona, natural language processing

I Introduction
“Child abuse casts a shadow the length of a lifetime.”
- Herbert Ward
Cybersex, or computer sex, is an activity in which two or more people, anonymous in some cases, connect over the Internet to engage in sexually gratifying performances [1]. Activities such as sharing, watching, downloading, and trading explicit online content across websites and social media platforms such as Facebook, Twitter, and Instagram all fall under the umbrella term of cybersex [2]. Behaviors exhibited in cybersex range from solitary acts of self-pleasure and consensual interactions to coercive and forceful activities that are often considered rape [3].
In essence, cybersex allows the exploration of sexual urges and private fantasies while maintaining anonymity [4], and it provides a safe space for physically separated partners to connect over the Web and remain sexually intimate [1]. However, from a moral and ethical point of view, cybersex should only take place between consenting partners of legal age. Non-consensual cybersex often targets extremely underprivileged women and minors, and the media produced are peddled, trafficked, and sold worldwide. Although justice and intelligence agencies in most countries enforce strict laws against the involvement of minors in cybersex activities, the problem remains one of the major challenges for poor, developing areas in Southeast Asia, Africa, and South America, which were often labelled hotspots of child sex tourism from 2014 to 2016 [5].
I-A Proliferation of Child Pornography on Twitter
There are multiple environments in which cybersex, both consensual and non-consensual, is mediated and spread. Internet chat rooms and instant messaging applications are common grounds for these activities. In recent years, however, illegal cybersex peddlers and traffickers have increasingly used social media platforms such as Twitter, since these offer anonymity under the guise of fake accounts [6, 7]. In addition, Twitter lets these accounts share images and videos seamlessly and make their accounts private. Pedophiles, people who are sexually attracted to children, use these features to maintain a closed circle of like-minded individuals and to stay hidden from the public eye. Although Twitter follows strict policies (Twitter Rules: www.help.twitter.com/en/rules-and-policies/twitter-rules) to maintain a safe environment, banning users for any type of abuse, child sexual exploitation, or sexual assault, the number of accounts run by pedophiles and illegal cybersex peddlers continues to surge [8].
In the Philippines, the Cybercrime Prevention Law, signed in 2012, aims to reduce computer-related crimes, including child pornography and other illegal cybersex activities. In recent years, however, the law has done little to curb the proliferation of child pornography: the country topped the latest United Nations Children's Fund (UNICEF) survey of global sources of child sexual abuse materials in 2018 [9]. According to the report, the proportion of internet addresses hosting child pornographic materials in the Philippines tripled starting in 2017. Twitter has become one of the most used platforms in the Philippines and serves as a breeding ground and medium for pedophiles to spread child pornographic content. These individuals hide their identities using multiple fake accounts, colloquially known as alter (alternate) accounts. Accordingly, the term Alter Twitter has become popularly known in the country as a Twitter community of Filipino individuals who use anonymous accounts to conduct, share, and exploit sexual content and activities [10].
Recognizing the need for further research on mitigating the spread of child pornographic content, this paper investigates the general writing styles of pedophiles and cybersex traffickers on Twitter and the roles they commonly assume on the platform. We apply natural language processing techniques to a dataset of nearly a year's worth of child pornographic tweets collected from the Twitter accounts of pimps, peddlers, and traffickers in the Philippines.

II Related Works
II-A Writing Styles in Twitter
Analyzing writing style on social media platforms, for example for authorship attribution, is one of the most interesting tasks in natural language processing. [11] defines writing style as the grammatical choices that writers make in adherence to norms and social identity. An individual's writing style is composed of word choice, sentence and paragraph structure, and the symbols used to convey a message effectively [12].
Writing styles on the web vary widely, since users are free to express themselves and there are no formal rules to follow. In addition, elements specific to social media writing, such as emoticons, add complexity to the task [13]. Social media platforms like Twitter allow researchers in various fields to analyze factors that can affect writing, such as gender [14], user personality [15], and mental illness [16]. Work on these factors paved the way for research on negative social media interactions, including abuse such as racism and sexism [17] and bullying [18].
II-B Themes in Twitter
Identifying salient and underlying themes in large volumes of social media data has also intrigued researchers in the field. In contrast to writing style analysis, which focuses on how each tweet is constructed from elements such as hashtags, emoticons, and symbols, thematic analysis captures what the texts represent by uncovering topics, commonly extracted with unsupervised machine learning algorithms that generate topic models [19, 20]. These topic models give an overview of the important topics (or themes) and the supporting topic words present in a collection of documents [21]. In the Philippine setting, [22], [23], and [24] all used topic models to extract themes from typhoon- and earthquake-related Twitter data, which can be used to improve the country's disaster risk reduction landscape and response.
III Child Pornography-Related Tweets
For this study, we collected 69,675 raw tweets related to child sex trafficking and peddling on Twitter from October 2019 to July 2020, roughly ten months of data. We used the bounding-box feature of the Twitter API to capture only tweets published within the Philippines. In addition, we used hashtags such as #bagets (a colloquial term for 'children') and #sarapngbagets (which conveys sexual desire for children), which were reported to be commonly used by child sex traffickers as bookmarks or subject tags for their tweets [25]. After cleaning and removing retweets and duplicates, 32,899 unique tweets remained for the main analysis.
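The cleaning step can be illustrated with a minimal Python sketch. It assumes each tweet is a plain text string and treats a tweet as a retweet when its text starts with the conventional `RT @user:` prefix; with full Twitter API objects, checking the `retweeted_status` field would be more reliable. Duplicates are detected after lowercasing and collapsing whitespace. The function names are hypothetical, and this is not necessarily the exact procedure used in the study.

```python
import re

# Assumption: retweets are recognised by the conventional "RT @user:" text prefix.
RT_PREFIX = re.compile(r"^RT @\w+:")


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def clean_tweets(tweets):
    """Drop retweets and duplicate tweets, keeping the first occurrence of each text."""
    seen, kept = set(), []
    for text in tweets:
        if RT_PREFIX.match(text):
            continue  # skip retweets
        key = normalize(text)
        if key in seen:
            continue  # skip duplicates
        seen.add(key)
        kept.append(text)
    return kept
```

For example, `clean_tweets(["open thread", "RT @user1: open thread", "Open  thread"])` keeps only the first string.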
IV Writing Style Analysis
For the writing style analysis, we use two methods: a word cloud visualization, which gives a bird's-eye view of the most frequent words in the collected data, and a trigram co-occurrence network, which maps the sequences of connected words used to spread child pornographic content.
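A word cloud scales each term by its frequency, so the core computation is a token count. The following Python sketch assumes simple regex tokenization that keeps hashtags, drops URLs and @-mentions, and filters a small hypothetical stop-word list mixing English and Filipino function words; the study does not state its exact pre-processing. The resulting counts could be passed to any word cloud renderer.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the study does not specify the one it used.
STOP_WORDS = {"the", "a", "to", "and", "ang", "ng", "sa", "mga"}
TOKEN_RE = re.compile(r"#?\w+")  # words, optionally prefixed by '#'


def tokenize(text):
    """Lowercase a tweet, strip URLs and @-mentions, and return word/hashtag tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in TOKEN_RE.findall(text) if t not in STOP_WORDS]


def term_frequencies(tweets, top_k=10):
    """Return the top_k (token, count) pairs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts.most_common(top_k)
```

For example, `term_frequencies(["Follow and RT", "follow for the link"], top_k=1)` returns `[("follow", 2)]`.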

The word cloud visualization in Figure 2 shows the most frequently used terms in the tweets, with the size of each word reflecting its frequency. The words jakol (masturbation), tamod (semen), and boso (voyeur) appear to be three of the most used words in the context of child pornography. In addition, the hashtag #alterph is often appended to tweets to signal that the uploading account is an alter account, with the suffix ph indicating that the user is in the Philippines and prefers to interact with users from the same country. Action words such as chupa (fellatio) and salsal (the motion of stimulating a man's penis) are also frequent, as are words used to target children, such as bagets for hire (children for hire), and altergc (alter group chat), which indicates that videos and other content are also shared on platforms other than Twitter.

Figure 3 shows the chain-like structure of words that co-occur, or appear together, in semantically similar tweets. From a corpus of 2,498 unique words, only three subgraphs form, suggesting that the lexicon used by pedophiles and child traffickers is fairly limited and that terms are reused repeatedly. The first subgraph, on the upper left, contains only two connected words, ctto (credits to the owner) and trade. These two words describe accounts that share tweets with informal credit to the source of the content, and that exchange resources by trading links to the online repositories where videos are stored, as seen in Figure 1. The largest subgraph, in the middle, contains sequences denoting instructions for proliferation and attention, such as follow and rt (retweet), dm me (message me), and follow me. Lastly, the third subgraph, on the lower left, contains the word sequence open thread to, which indicates that content is spread in the form of threads, or series of posts. Overall, these subgraphs model how posts containing child pornographic content, such as lewd photos and videos, are structured. They also describe how users behind alter accounts persuade other users to spread their malicious content through Twitter's interaction features, such as (a) retweets for sharing and (b) likes for exposing the content to a wider audience.
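The paper does not spell out how the trigram network is built, so the following Python sketch shows one plausible construction under stated assumptions: trigrams are counted over tokenized tweets, only trigrams seen at least `min_count` times are kept, each kept trigram links its consecutive words, and the connected components of the resulting undirected graph correspond to the subgraphs discussed above. `min_count` and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def trigram_counts(token_lists):
    """Count every run of three consecutive tokens across all tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts


def cooccurrence_subgraphs(token_lists, min_count=2):
    """Link consecutive words of frequent trigrams; return connected subgraphs, largest first."""
    adj = defaultdict(set)
    for (w1, w2, w3), c in trigram_counts(token_lists).items():
        if c < min_count:
            continue  # prune rare sequences
        for a, b in ((w1, w2), (w2, w3)):
            adj[a].add(b)
            adj[b].add(a)

    # Depth-first search to collect connected components.
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return sorted(components, key=len, reverse=True)
```

Raising `min_count` prunes more sequences, which fragments the vocabulary into fewer, tighter subgraphs.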
Table I: Online personas and their associated vocabulary

| Persona | Vocabulary |
|---|---|
| The Propagator | rt, follow, vid, retweet, videos, like, link, jakol, post, comment |
| The Peddler | bagets, dm, price, pic, area, php, avail, looking, pls, willing |
| The Social | tara, face, pogi, pm, jakol, jan, pwede, want, sino, tayo |
| The Voyeur | boso, face, tara, pic, dm, bagets, baby, sarap, cr, boys |
V Persona Analysis
Beyond analyzing the overall stylistic writing patterns of potential pedophiles and child sex traffickers on Twitter, we also sought to understand the roles these entities play on the platform. To do this, we trained a short-text clustering model, the Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM) [26], on the pre-processed tweet corpus. GSDMM assigns each tweet to a single cluster, grouping tweets that share similar vocabulary, and each cluster is characterized by its most representative words. As seen in Table I, we obtained four main homogeneous clusters representing four online personas, or roles, played by users behind alter accounts tied to child pornography.
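For readers unfamiliar with GSDMM, the following is a minimal, dependency-free Python sketch of its collapsed Gibbs sampler (the Movie Group Process) as described in [26]. Each document is repeatedly removed from its cluster and reassigned with probability proportional to the cluster's size plus α, multiplied by how well the document's words fit the cluster's word counts (smoothed by β). The hyperparameter values, `K`, and the iteration count are illustrative assumptions, not the settings used in this study, and `top_words` is a hypothetical helper for producing a per-cluster vocabulary list like Table I.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Sketch of GSDMM's collapsed Gibbs sampler; docs is a list of token lists.

    Returns one cluster label per document and the word counts of each cluster.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    D = len(docs)
    m_z = [0] * K                         # documents in each cluster
    n_z = [0] * K                         # tokens in each cluster
    n_zw = [Counter() for _ in range(K)]  # per-cluster word counts
    labels = [0] * D

    def move(d, z, sign):
        """Add (sign=+1) or remove (sign=-1) document d from cluster z."""
        m_z[z] += sign
        n_z[z] += sign * len(docs[d])
        for w in docs[d]:
            n_zw[z][w] += sign

    for d in range(D):  # random initial assignment
        labels[d] = rng.randrange(K)
        move(d, labels[d], +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            move(d, labels[d], -1)  # exclude d from the counts
            word_counts = Counter(doc)
            log_p = []
            for z in range(K):
                # Cluster-size term; the shared denominator (D - 1 + K*alpha)
                # is the same for every z and cancels after normalisation.
                lp = math.log(m_z[z] + alpha)
                # Word-fit term: rising products over the document's words.
                for w, c in word_counts.items():
                    for j in range(c):
                        lp += math.log(n_zw[z][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(n_z[z] + V * beta + i)
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            labels[d] = rng.choices(range(K), weights=weights)[0]
            move(d, labels[d], +1)
    return labels, n_zw


def top_words(n_zw, n=10):
    """Most frequent words of every non-empty cluster (zero counts removed via +c)."""
    return {z: [w for w, _ in (+c).most_common(n)] for z, c in enumerate(n_zw) if +c}
```

Because clusters that end up empty are simply dropped, `K` acts as an upper bound on the number of clusters rather than a fixed count, which is one reason GSDMM suits short texts such as tweets.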
From the table, each persona has its own set of thematic words forming a vocabulary used for specific purposes. The first is the Propagator, which is mainly responsible for spreading child pornographic content on the platform. Keywords often used by this type of user are similar to those highlighted in Figure 3, such as rt (retweet), follow, like, post, and comment. The second is the Peddler, which is responsible for the hidden market for trading, buying, and selling child pornographic content. Peddlers often use dm (direct message), price, php, and avail for business transactions, and looking and willing to entice possible victims who are willing to trade sexual content such as photos and videos for money. The third, the Social, encourages users to meet physically or digitally for activities such as jakol (masturbation). This persona often uses descriptive words such as pogi (handsome) as well as semi-coercive phrases such as tara (let's go) and sino pwede jan? (who is available?) to persuade potential users with the same interests. Lastly, the Voyeur often targets minors for voyeuristic content. The main keyword used by this persona is boso, the act of spying on undressed or naked people for sexual pleasure, which often appears in tweets alongside targets such as bagets (children), baby, and boys. This persona also frequently uses the word cr (comfort room), where hidden cameras are often installed.
VI Ethical Considerations
This study makes use of extremely sensitive data involving sexual words that are often used to target minors. However, the authors felt compelled to undertake this study because the child pornography landscape in the Philippines must be understood before the problem can be alleviated. In addition, for the safety of minors, no personal information is revealed anywhere in this document.
VII Conclusion
In the Philippines, child pornography and other illegal cybersex activities are widespread, especially on social platforms like Twitter where users can hide behind anonymous accounts. To gain a deeper understanding of the reasons behind the rapid proliferation of child pornographic content online, we used three types of analysis: word cloud visualization, trigram co-occurrence analysis, and persona analysis. The results reveal the core terms used by child traffickers and peddlers and how these terms co-occur. In addition, these entities can be classified into four possible roles, or online personas, based on their vocabulary. Future work involves partnering with local government units concerned with cybercrime prevention and child protection to track down active child pornography peddlers and traffickers.
References
- [1] P. M. Miller, Principles of Addiction: Comprehensive Addictive Behaviors and Disorders, vol. 1. Academic Press, 2013.
- [2] A. Cooper, “Sexuality and the internet: Surfing into the new millennium,” CyberPsychology & Behavior, vol. 1, no. 2, pp. 187–193, 1998.
- [3] S. Southern, “Treatment of compulsive cybersex behavior,” Psychiatric Clinics of North America, vol. 31, no. 4, pp. 697–712, 2008.
- [4] K. S. Young, “Internet sex addiction: Risk factors, stages of development, and treatment,” American Behavioral Scientist, vol. 52, no. 1, pp. 21–37, 2008.
- [5] Asia Times, “Up to 80% of cybersex victims are Filipino minors,” Feb 2020.
- [6] K. Guilbert, “Webcam slavery: Tech turns Filipino families into cybersex child traffickers,” Jun 2018.
- [7] A. Bevan, “Children at risk of increased online sexual exploitation – Andrew Bevan,” May 2020.
- [8] O. Solon, “Child sexual abuse images and online exploitation surge during pandemic,” Apr 2020.
- [9] UNICEF, “Situation analysis of children in the Philippines,” Manila, Philippines: National Economic and Development Authority (NEDA) and UNICEF Philippines, 2018.
- [10] S. B. H. Piamonte, M. A. M. Quintos, and M. O. Iwayama, “Virtual masquerade: Understanding the role of Twitter’s alter community in the social and sexual engagements of men who have sex with men,” 2020.
- [11] B. Ray, Style: An introduction to history, theory, research, and pedagogy. Parlor Press, 2014.
- [12] P. Sebranek, D. Kemper, and V. Meyer, “Writers INC: A student handbook for writing and learning,” Wilmington, MA: Houghton Mifflin, 2006.
- [13] H. Maeda, K. Shimada, and T. Endo, “Twitter sentiment analysis based on writing style,” in International Conference on NLP, pp. 278–288, Springer, 2012.
- [14] J. D. Burger, J. Henderson, G. Kim, and G. Zarrella, “Discriminating gender on Twitter,” in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1301–1309, 2011.
- [15] H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M. Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E. Seligman, et al., “Personality, gender, and age in the language of social media: The open-vocabulary approach,” PLoS ONE, vol. 8, no. 9, p. e73791, 2013.
- [16] M. De Choudhury, M. Gamon, S. Counts, and E. Horvitz, “Predicting depression via social media,” in Seventh International AAAI Conference on Weblogs and Social Media, 2013.
- [17] I. Clarke and J. Grieve, “Dimensions of abusive language on Twitter,” in Proceedings of the First Workshop on Abusive Language Online, pp. 1–10, 2017.
- [18] Y. Lee, S. Yoon, and K. Jung, “Comparative studies of detecting abusive language on Twitter,” in Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), (Brussels, Belgium), pp. 101–106, Association for Computational Linguistics, Oct. 2018.
- [19] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
- [20] J. H. Lau, N. Collier, and T. Baldwin, “On-line trend analysis with topic models: #twitter trends detection topic model online,” in Proceedings of COLING 2012, pp. 1519–1534, 2012.
- [21] D. M. Blei, “Probabilistic topic models,” Communications of the ACM, vol. 55, no. 4, pp. 77–84, 2012.
- [22] C. Ligutom, J. V. Orio, D. A. M. Ramacho, C. Montenegro, R. E. Roxas, and N. Oco, “Using topic modelling to make sense of typhoon-related tweets,” in 2016 International Conference on Asian Language Processing (IALP), pp. 362–365, IEEE, 2016.
- [23] K. Gorro, J. R. Ancheta, K. Capao, N. Oco, R. E. Roxas, M. J. Sabellano, B. Nonnecke, S. Mohanty, C. Crittenden, and K. Goldberg, “Qualitative data analysis of disaster risk reduction suggestions assisted by topic modeling and word2vec,” in 2017 International Conference on Asian Language Processing (IALP), pp. 293–297, IEEE, 2017.
- [24] L. L. Maceda, J. L. Llovido, and T. D. Palaoag, “Corpus analysis of earthquake related tweets through topic modelling,” International Journal of Machine Learning and Computing, vol. 7, no. 6, pp. 194–197, 2017.
- [25] J. Gavilan, “Child sex abuse material now peddled for as low as P100 on Twitter,” May 2020.
- [26] J. Yin and J. Wang, “A Dirichlet multinomial mixture model-based approach for short text clustering,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 233–242, 2014.