Moderator Discourse Data

Data downloads to accompany the paper "Perceptions of Moderators as a Large-Scale Measure of Online Community Governance".

Project repository on GitHub: behavioral-data/moderator_discourse_public

Perceptions of Moderators as a Large-Scale Measure of Online Community Governance

This website is a companion site for our paper, Perceptions of Moderators as a Large-Scale Measure of Online Community Governance, which will appear at CSCW 2025. A preprint of this paper is available on arXiv; please read it for more details on our methods and results.

A summary of our results, with some additional discussion, is available on /r/TheoryOfReddit.

If you make use of our data, please cite our paper:

@article{weld2024perceptions,
      title={Perceptions of Moderators as a Large-Scale Measure of Online Community Governance},
      author={Galen Weld and Leon Leibmann and Amy X. Zhang and Tim Althoff},
      year={2025},
      journal={CSCW},
}

Moderator Discourse Data

We are in the process of computing moderator discourse data for a longer time period. For now, data is available for all subreddits from April 2017 to May 2022, several years more than the period covered in our paper. Due to Reddit licensing restrictions, we make only ‘dehydrated’ data available here: we do not include the content of the posts and comments, only their sentiment toward moderators, along with some metadata for convenience. If you have any questions, or would like help hydrating the data, please contact Galen Weld, the corresponding author for this work.
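Hydrating the data means re-attaching each post's or comment's text to its dehydrated row, typically by matching on a shared ID. The sketch below illustrates that join under stated assumptions: the column names `id` and `sentiment`, the sample rows, and the `hydrate` helper are all hypothetical, and the text mapping would have to come from a source you are licensed to use. It is not the authors' hydration procedure.

```python
import csv
import io

# Hypothetical dehydrated rows. The column names "id" and "sentiment" are
# assumptions for illustration, not the release's confirmed schema.
DEHYDRATED = """id,sentiment
abc123,negative
def456,positive
ghi789,neutral
"""


def hydrate(dehydrated_csv: str, texts: dict[str, str]) -> list[dict[str, str]]:
    """Attach text to each dehydrated row by ID (hypothetical helper).

    `texts` maps a post/comment ID to its body, obtained separately.
    Rows whose ID has no known text are kept with an empty "text" field,
    so they can be retried later instead of being silently dropped.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(dehydrated_csv)):
        rows.append({**row, "text": texts.get(row["id"], "")})
    return rows


# Example: two of the three IDs have text available.
texts = {"abc123": "example body text", "def456": "another example body"}
hydrated = hydrate(DEHYDRATED, texts)
```

Keeping unmatched rows (rather than dropping them) makes it easy to measure how much of the dataset could be hydrated.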

Data Schema

The data consists of the following fields, in .csv format:

There are two files per month. RS_YYYY-MM.csv includes submissions (posts) for that month, and RC_YYYY-MM.csv includes comments.
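Because the file names follow a fixed pattern, the full list of expected files for a date range can be generated programmatically. The sketch below assumes only the naming convention stated above (one `RS_` and one `RC_` file per month); the helper names `monthly_files` and `parse_name` are hypothetical, and it makes no claim about where the files sit within the repository.

```python
def monthly_files(start: tuple[int, int], end: tuple[int, int]) -> list[str]:
    """List RS_/RC_ file names for every month from start to end, inclusive.

    `start` and `end` are (year, month) tuples.
    """
    names = []
    year, month = start
    while (year, month) <= end:  # tuples compare year first, then month
        stamp = f"{year:04d}-{month:02d}"
        names.append(f"RS_{stamp}.csv")  # submissions (posts)
        names.append(f"RC_{stamp}.csv")  # comments
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def parse_name(name: str) -> tuple[str, int, int]:
    """Split a file name such as 'RC_2019-07.csv' into (kind, year, month)."""
    prefix, stamp = name.removesuffix(".csv").split("_")
    year, month = map(int, stamp.split("-"))
    kind = {"RS": "submissions", "RC": "comments"}[prefix]
    return kind, year, month


# The released period, April 2017 to May 2022, spans 62 months.
files = monthly_files((2017, 4), (2022, 5))
```

Comparing the generated list with the files you actually downloaded is a quick way to spot a missing month.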

Data Download

You may download all of the files above, or individual files, from our GitHub repository.

Daily Counts Data

If you prefer, we also provide a single large .csv file containing daily counts, for each subreddit, of the number of posts and comments with positive, neutral, and negative sentiment toward moderators. Importantly, these data also include counts of total posts and comments (of any type, not just those addressing moderators), as well as the number of removed and deleted comments (detected using [removed] tags). You may download this file here. It spans July 2018 to June 2021.
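Because the daily file carries both sentiment counts and total activity, simple per-subreddit ratios can be derived by summing over days. The sketch below is illustrative only: the column names (`subreddit`, `date`, `positive`, `neutral`, `negative`, `total`), the subreddit names, and the numbers in the sample are made up, so check the real header before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up sample with assumed column names; not taken from the release.
SAMPLE = """subreddit,date,positive,neutral,negative,total
sub_a,2019-01-01,3,10,7,5000
sub_a,2019-01-02,1,8,11,4800
sub_b,2019-01-01,4,2,0,900
"""


def summarize(daily_csv: str) -> dict[str, dict[str, float]]:
    """Sum daily counts per subreddit and derive two simple ratios."""
    sums = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(daily_csv)):
        for col in ("positive", "neutral", "negative", "total"):
            sums[row["subreddit"]][col] += int(row[col])

    out = {}
    for sub, c in sums.items():
        mod = c["positive"] + c["neutral"] + c["negative"]
        out[sub] = {
            # share of moderator-related items with negative sentiment
            "neg_share": c["negative"] / mod if mod else 0.0,
            # moderator-related items as a fraction of all posts and comments
            "mod_rate": mod / c["total"] if c["total"] else 0.0,
        }
    return out


stats = summarize(SAMPLE)
```

Summing counts before dividing weights each day by its volume; averaging per-day ratios instead would give quiet days as much influence as busy ones.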

The data consists of the following columns: