
SetembroBR: Social media analysis for early detection of mental health disorders


The observation that individuals with mental health disorders such as depression and anxiety are often regular users of social media has motivated a wide range of studies in the Natural Language Processing (NLP) field on risk assessment based on the kind of language these individuals employ. Existing work in the field is, however, largely dedicated to the English language, and tends to consider publications (e.g., tweets) produced at any time, including those produced after the individual has already been clinically diagnosed. Models of this kind therefore focus on distinguishing individuals with and without a certain disorder, and are perhaps less able to anticipate a disorder as a means of preventing its possible aggravation. Based on these observations, this project proposes to explore the temporal information provided by the Twitter platform for the study and development of computational models for early recognition of depression and anxiety disorder in Portuguese, using a database - called the SetembroBR corpus - designed to include only texts that are chronologically prior to the date of diagnosis reported by social media users. A study of this kind, in addition to introducing a novel (and possibly more useful) formulation of the computational problem, opens up the opportunity for a number of scientific contributions in the NLP field, including the modeling of textual and non-textual features and the use of recent neural learning methods, and enables novel solutions for a pressing issue of great social interest.
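The temporal constraint described above (keeping only texts written before the self-reported diagnosis date) can be illustrated with a minimal Python sketch. This is not the project's actual corpus-building code: the `Post` record, the `posts_before_diagnosis` function, and the `diagnosis_dates` mapping are hypothetical names introduced here for illustration, and the sketch assumes each diagnosed user has a single known diagnosis date.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical record layout, not the SetembroBR data format.
    user_id: str
    created_at: datetime
    text: str


def posts_before_diagnosis(posts, diagnosis_dates):
    """Keep only posts written strictly before their author's diagnosis date.

    diagnosis_dates maps user_id -> self-reported diagnosis datetime.
    Posts by users missing from the mapping are skipped; handling of
    control users is outside the scope of this sketch.
    """
    kept = []
    for post in posts:
        diagnosed_at = diagnosis_dates.get(post.user_id)
        if diagnosed_at is not None and post.created_at < diagnosed_at:
            kept.append(post)
    return kept


if __name__ == "__main__":
    posts = [
        Post("u1", datetime(2020, 1, 10), "written before the diagnosis"),
        Post("u1", datetime(2020, 6, 1), "written after the diagnosis"),
    ]
    diagnosis_dates = {"u1": datetime(2020, 3, 1)}
    for post in posts_before_diagnosis(posts, diagnosis_dates):
        print(post.created_at.date(), post.text)
```

A strict comparison is used so that a post made at the exact reported diagnosis time is excluded; whether the boundary should be inclusive is a design choice this sketch does not settle.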

Current status

The project ran from May 2022 to April 2024 under FAPESP grant nr. 2021/08213-0. Complementary funding was provided by CAPES grant nr. 88887.475847/2020-00, and by the Center for Artificial Intelligence (C4AI-USP) with support from the São Paulo Research Foundation (FAPESP grant nr. 2019/07665-4) and from the IBM Corporation.

The corpus has been publicly released for reuse - see download link.


dos Santos, Wesley Ramos; Amanda Maria Martins Funabashi; Ivandré Paraboni (2020) Searching Brazilian Twitter for signs of mental health issues. 12th Language Resources and Evaluation Conference (LREC-2020), pp. 6113-6119, Marseille, France.

dos Santos, Wesley Ramos; Rafael Lage de Oliveira; Ivandré Paraboni (2023) SetembroBR: a social media corpus for depression and anxiety disorder prediction. Language Resources and Evaluation. 10.1007/s10579-022-09633-0.


dos Santos, Wesley Ramos; Sungwon Yoon; Ivandré Paraboni (2023) Mental health prediction from social media text using mixture of experts. IEEE Latin America Transactions 21(6), pp. 723-729. 10.1109/TLA.2023.10172137.

da Costa, Pablo Botton; Matheus Camasmie Pavan; Wesley Ramos dos Santos; Samuel Caetano da Silva; Ivandré Paraboni (2023) BERTabaporu: assessing a genre-specific language model for Portuguese NLP. Recent Advances in Natural Language Processing (RANLP-2023), pp. 217-223, Varna, Bulgaria. BERTabaporu download.


dos Santos, Wesley Ramos; Ivandré Paraboni (2023) Predição de transtorno depressivo em redes sociais: BERT supervisionado ou ChatGPT zero-shot? XIV Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (STIL-2023), pp. 11-21. 10.5753/stil.2023.233275.

de Oliveira, Rafael Lage; Ivandré Paraboni (2024) A Bag-of-Users approach to mental health prediction from social media data. 16th International Conference on Computational Processing of Portuguese (PROPOR 2024). Santiago de Compostela, Spain.

de Oliveira, Rafael Lage; João Trevisan Martins; Ivandré Paraboni (2024) Mental health prediction from social media connections. New Review of Hypermedia and Multimedia. 10.1080/13614568.2024.2346227.
