Below are some popular resources released as part of recent work (more to come). If you need a resource that is not listed here, please get in touch.
SetembroBR mental health Twitter corpus
Corpus of tweets (ids only) and accompanying network connections (friends, followers and mentions, all anonymised) of users diagnosed with depression or anxiety disorder, plus a random (control) group, comprising 18,819 unique individuals. The project download link contains the corpus files and instructions on how to (approximately) rebuild the labelled dataset. For further details, see SetembroBR: a social media corpus for depression and anxiety disorder prediction.
UstanceBR stance Twitter corpus
Corpus of tweets (ids only), accompanied by timelines and network connections (friends, followers and mentions, all anonymised) of users who expressed a stance for or against a particular target topic (Brazilian presidents, Covid-related treatment, and Brazilian institutions). The corpus contains over 86K manually labelled tweets and is available for reuse from the download link. For further details, see UstanceBR: a multimodal language resource for stance prediction.
BRmoral corpus
Crowdsourced opinions on moral issues (abortion, death penalty, etc.) labelled with moral foundation scores and author demographics (download). Further details are available from the project page and bib.
b5 and b5-post corpora
Texts labelled with Big Five personality traits and author demographics are available in two formats:
- b5-post: a single CSV file (semicolon separators, comma decimals, and "ISO-8859-1" encoding) containing only Facebook text data and demographics.
- full b5 project (includes additional text data).
Further details are available from the project page and bib.
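Because the b5-post file uses semicolon separators, comma decimals and ISO-8859-1 encoding, default CSV readers may misparse it. The following is a minimal sketch of one way to read such a file with the Python standard library. The column names (`text`, `score`) and the sample row are hypothetical and serve only as illustration; the actual column layout should be checked against the downloaded file.

```python
import csv
import io


def parse_decimal(value: str) -> float:
    """Convert a comma-decimal string such as '3,75' to a float."""
    return float(value.replace(",", "."))


def read_rows(raw: bytes) -> list[dict[str, str]]:
    """Decode ISO-8859-1 bytes and parse them as semicolon-separated CSV."""
    text = raw.decode("ISO-8859-1")
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";")
    return list(reader)


# Hypothetical two-line sample standing in for the real file contents.
sample = "text;score\nolá mundo;3,75\n".encode("ISO-8859-1")

rows = read_rows(sample)
score = parse_decimal(rows[0]["score"])
```

For a file on disk, the same reader can be fed from `open(path, encoding="ISO-8859-1", newline="")`. Users of pandas could likely achieve the same result with `pandas.read_csv(path, sep=";", decimal=",", encoding="ISO-8859-1")`.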