Reddit Community Rules and Sentiments

Rule Taxonomy and Timelines

Overview

Rule timelines were reconstructed from Wayback Machine snapshots, scraped at most once per week. The rules were then classified into a hierarchical taxonomy with 3 levels and 17 classes.

Taxonomy

Rule Taxonomy

Rules were classified by a GPT-4o-based classification pipeline. For more details, see our paper on arXiv.

Rule Data Download

Our rules data, spanning from 2018-04-23 to 2024-06-20, is available for download as a .csv file here.

Note that because Wayback Machine snapshots are infrequent, there is some uncertainty about both when a rule was created and when it was removed. To quantify this, we include the timestamps of the Wayback Machine snapshots on either side of a rule's start and end date. For example, if a snapshot taken on Monday does not include Rule X, but the snapshot taken on Wednesday does include Rule X, we know that Rule X was created sometime between Monday and Wednesday. We include the timestamp of the Monday snapshot (earliest_start) and the Wednesday snapshot (latest_start), as these represent the lower and upper bounds, respectively, of the rule's actual (and unknown) creation date. The same holds for a rule being removed, with earliest_end and latest_end bounding the removal date.
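The bounding logic described above can be sketched in a few lines. This is only an illustration of the idea, not the pipeline used to build the dataset: the function name start_bounds, the snapshot representation (a date paired with a set of rule names), and the rule names are all hypothetical. It also assumes that a rule already present in the first available snapshot has no known lower bound; how the dataset handles that case is not stated here.

```python
from datetime import date


def start_bounds(snapshots, rule):
    """Return (earliest_start, latest_start) for the first appearance of `rule`.

    `snapshots` is a list of (snapshot_date, set_of_rule_names) tuples,
    sorted by date. This layout is an assumption made for illustration.
    """
    previous_date = None
    for snapshot_date, rules in snapshots:
        if rule in rules:
            # The rule was created after the previous snapshot (where it was
            # absent) and no later than the current one (where it is present).
            return previous_date, snapshot_date
        previous_date = snapshot_date
    return None, None  # the rule was never observed


snapshots = [
    (date(2024, 1, 1), {"Rule A"}),            # Monday: Rule X absent
    (date(2024, 1, 3), {"Rule A", "Rule X"}),  # Wednesday: Rule X present
]
bounds = start_bounds(snapshots, "Rule X")
print(bounds)
```

The end bounds follow the same pattern in reverse: the last snapshot that contains the rule gives earliest_end, and the first later snapshot that lacks it gives latest_end.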

The file includes each rule in our dataset as a row, with each row having the following columns:

For ease of computing, we recommend loading the file with pandas and creating a MultiIndex. This can be done with pd.read_csv('path_to_downloaded.csv').set_index(['subreddit', 'earliest_start', 'latest_start', 'earliest_end', 'latest_end']).
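As a slightly fuller sketch of that recommendation, the example below loads the data, builds the MultiIndex, and computes how wide the creation-date uncertainty window is for each rule. The five index column names come from the call above; everything else is an assumption for illustration: the helper name load_rules, the in-memory sample rows, the "rule" column, and the timestamp format (assumed to be parseable by pandas). Replace sample_csv with the path to the downloaded file to run it on the real data.

```python
import io

import pandas as pd

INDEX_COLS = ["subreddit", "earliest_start", "latest_start", "earliest_end", "latest_end"]


def load_rules(source):
    """Read the rules CSV and index it by subreddit and the four bound columns."""
    # Parsing the bound columns as datetimes is an assumption about their format.
    df = pd.read_csv(source, parse_dates=INDEX_COLS[1:])
    return df.set_index(INDEX_COLS).sort_index()


# Made-up rows standing in for the downloaded file; the "rule" column is hypothetical.
sample_csv = io.StringIO(
    "subreddit,earliest_start,latest_start,earliest_end,latest_end,rule\n"
    "r/example,2024-01-01,2024-01-03,2024-03-04,2024-03-11,Be civil\n"
    "r/example,2024-02-05,2024-02-12,2024-05-06,2024-05-13,No spam\n"
    "r/other,2024-01-08,2024-01-15,2024-04-01,2024-04-08,Stay on topic\n"
)
rules = load_rules(sample_csv)

# Select every rule of one subreddit via the first index level.
example_rules = rules.xs("r/example", level="subreddit")

# Width of the window in which each rule may have been created.
start_uncertainty = (
    rules.index.get_level_values("latest_start")
    - rules.index.get_level_values("earliest_start")
)
print(example_rules)
print(start_uncertainty.max())
```

Keeping the bounds in the index makes it straightforward to slice by subreddit or to filter on a time window with rules.index.get_level_values.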