[Air-L] Launch of the Platform Governance Archive (PGA) v2 with new dataset, access options, website and data paper

Christian Katzenbach christian.katzenbach at gmail.com
Mon Jul 24 01:58:00 PDT 2023

Dear colleagues, 

I am happy and proud to announce that our team at the PGMT Lab in Bremen, in collaboration with Open Terms Archive and the Alexander von Humboldt Institute for Internet and Society (HIIG), has relaunched the Platform Governance Archive as PGA v2. It is now a comprehensive resource for researchers, journalists and the interested public to explore and compare the policies of major platforms, in some cases going back to their founding years in the mid-2000s. We are now offering new datasets, full access and bulk download options, a new website and a data paper documenting the data. 


Below you will find the full announcement. Please don’t hesitate to reach out if you have questions or ideas. Please note that we will organise a workshop later this year to bring together people working with platform policies and content moderation guidelines. 

Looking forward, 

Prof. Dr. Christian Katzenbach

Professor of Media and Communication
Head of Lab „Platform Governance, Media and Technology“
Director of MA program „Digital Media and Society“
Centre for Media, Communication & Information Research (ZeMKI), University of Bremen
https://platform-governance.org

Associated Researcher
Alexander von Humboldt Institute for Internet and Society (HIIG)

Launch of the Platform Governance Archive (PGA) v2 with new dataset, access options, website and data paper

The Lab Platform Governance, Media and Technology (PGMT) at ZeMKI, the Centre for Media, Communication and Information Research at the University of Bremen, and the Alexander von Humboldt Institute for Internet and Society (HIIG) are launching an updated version of their pioneering open-access repository of platform policies, the Platform Governance Archive (PGA), this week. The extensive update includes a new website that makes the data easier to access, a data paper that gives a comprehensive overview of how the PGA was built, and an updated dataset that widens the scope of the PGA to more platforms and policies. 

The power of social media platforms has been a focal point of critical discussion and research since long before Elon Musk took over Twitter. Platforms' corporate policies are a key instrument through which platforms govern and order public discourse, as they set out which kinds of content and conduct are allowed or prohibited on their services. These rulebooks are the subject of the Platform Governance Archive (PGA), an open-access repository of platform policies founded by the Alexander von Humboldt Institute for Internet and Society (HIIG) and now hosted at the University of Bremen. The PGA aims to enable collaborative research on, and critical engagement with, how, when and why platforms change their rules. 

The need for the systematic study of platform policies 

The PGA, first launched in April 2021, grew out of the need to systematically study the historical evolution of platform policies and out of the lack of coherently collected data in this area that did not rely on the platforms’ own corporate archives. The resulting PGA v1 dataset <https://www.platformgovernancearchive.org/data/dataset-pga-v1-historical-dataset/> contains all historical versions of the Terms of Service, Community Guidelines and Privacy Policies of Facebook, YouTube, Twitter and Instagram (with the exception of YouTube’s Community Guidelines), from the time they were first introduced through late 2021. 

New download option and data paper

The dataset was built through a combination of automated and manual data collection and data cleaning, which are explained in detail in our newly published data paper <https://doi.org/10.26092/elib/2331>. The paper also lays out the conceptual setup of the PGA, gives a detailed overview of the specifics of the included policies, and describes some of the general trends and patterns that run through the historical evolution of the PGA v1 corpus.

As part of the new PGA website <https://www.platformgovernancearchive.org/>, the dataset is now available as a direct download <https://github.com/PlatformGovernanceArchive/pga-corpus/releases/>. In total, the PGA v1 corpus contains 153 policy documents totalling 6,036 pages, provided in PDF, HTML and Markdown formats. The downloadable archive also contains additional material and tools that were used in the data collection process. 

Collaboration with Open Terms Archive: New dataset includes more platforms and policies 

With the relaunch of the PGA, we are also publishing a new dataset <https://www.platformgovernancearchive.org/data/dataset-pga-v2-ongoing-collection/> which widens the scope of the PGA to cover currently 79 policies across 18 platforms. The dataset is generated in collaboration with Open Terms Archive <https://opentermsarchive.org/> (OTA), an open source initiative dedicated to increasing the transparency and democratic oversight of digital services. 

The PGA v2 dataset goes back to April 2022 and is updated automatically every day to enable continuous tracking of changes to the selected policies. Whenever one of the tracked policies changes, the system stores a snapshot in a GitHub repository, where the change can be examined by way of a change visualisation. The dataset can also be downloaded in bulk as an archive of Markdown files. 
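For readers new to this kind of change tracking, the core idea can be illustrated with a minimal Python sketch. It is not the Open Terms Archive implementation; the function `record_if_changed` and the in-memory `history` dictionary are hypothetical stand-ins for fetching a policy each day and committing it to a GitHub repository. The sketch stores a new snapshot only when the text differs from the last one and returns a line-based diff of the change.

```python
import difflib
from datetime import date


def record_if_changed(history, policy, new_text, day):
    """Store a snapshot of `policy` only if its text changed.

    `history` is a hypothetical in-memory store mapping a policy name to a
    list of (day, text) tuples; it stands in for a version-controlled
    repository. Returns "" for the first snapshot, None if nothing changed,
    and otherwise a unified diff against the previous snapshot.
    """
    snapshots = history.setdefault(policy, [])
    if not snapshots:
        snapshots.append((day, new_text))
        return ""  # first snapshot: nothing to compare against

    previous_day, previous_text = snapshots[-1]
    if previous_text == new_text:
        return None  # unchanged since the last check: store nothing

    snapshots.append((day, new_text))
    diff = difflib.unified_diff(
        previous_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{policy} ({previous_day.isoformat()})",
        tofile=f"{policy} ({day.isoformat()})",
    )
    return "".join(diff)


if __name__ == "__main__":
    history = {}
    record_if_changed(history, "Example ToS", "Be nice.\n", date(2022, 4, 1))
    # Same text on the next day, so no new snapshot is stored.
    record_if_changed(history, "Example ToS", "Be nice.\n", date(2022, 4, 2))
    # A new rule is added, so a snapshot is stored and the diff is printed.
    print(record_if_changed(history, "Example ToS",
                            "Be nice.\nNo spam.\n", date(2022, 4, 3)))
```

Storing snapshots only on change keeps the history compact while still allowing any two versions to be compared later.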

Funding for the PGA has been provided by the hosting institutions as well as by different partners and funding schemes such as the EU Horizon 2020 project reCreating Europe <https://www.recreating.eu/>, Wikimedia Deutschland and the Data Science Center (DSC) at the University of Bremen.

Future directions 

In the future, the PGMT Lab will continue developing the PGA by merging the historical dataset and the ongoing data collection into an integrated dataset. The roadmap also includes adding more platforms and more language versions. The PGA has already been used in a growing body of research on platform policies <https://www.platformgovernancearchive.org/research/> and enables researchers, journalists and the public to answer questions about the historical evolution of platform policies.
