
Dataset for 'Characterizing "Permanently Dead" Links on Wikipedia' IMC submission.


anishnya/Wikipedia-Permanently-Dead-Link-Dataset


Wikipedia-Permanently-Dead-Link-Dataset

This repository contains the dataset for the IMC 2022 paper 'Characterizing "Permanently Dead" Links on Wikipedia'. The dataset contains 10,000 links in JSON format. For questions or more information, please contact [email protected].

The entry for each link has the following structure:

{
    "url": <string>,
    "article_url": <string>,
    "current_status": <string>,
    "date_link_posted": <string>,
    "date_link_marked_dead": <string>,
    "copy_after_posted": [<string>, <string>],
    "copy_before_marked_dead": [<string>, <string>],
    "copy_after_marked_dead": [<string>, <string>]
}
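
As a minimal sketch of working with this structure, the Python snippet below checks that an entry carries all eight fields listed above. The helper `missing_fields` and the sample entry are hypothetical: every value in the sample is made up for illustration and is not taken from the dataset.

```python
import json

# Field names copied from the structure shown above.
EXPECTED_FIELDS = {
    "url",
    "article_url",
    "current_status",
    "date_link_posted",
    "date_link_marked_dead",
    "copy_after_posted",
    "copy_before_marked_dead",
    "copy_after_marked_dead",
}


def missing_fields(record: dict) -> set:
    """Return the expected field names that are absent from a record."""
    return EXPECTED_FIELDS - set(record)


# Illustrative entry only: all values below are invented placeholders.
sample_text = """
{
    "url": "http://example.com/page",
    "article_url": "https://en.wikipedia.org/wiki/Example",
    "current_status": "placeholder",
    "date_link_posted": "2010-03-01T12:00:00+00:00",
    "date_link_marked_dead": "2015-06-15T08:30:00+00:00",
    "copy_after_posted": ["2010-04-01T00:00:00+00:00", "200"],
    "copy_before_marked_dead": ["2015-01-01T00:00:00+00:00", "404"],
    "copy_after_marked_dead": ["2016-01-01T00:00:00+00:00", "404"]
}
"""

sample = json.loads(sample_text)
print(missing_fields(sample))
```

An empty set means the entry has every expected field. How the 10,000 entries are arranged in the dataset file (for example, as one JSON array) is not stated here, so the sketch parses a single entry from a string.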

Each archived copy field (copy_after_posted, copy_before_marked_dead, copy_after_marked_dead) is a two-element list in the following format:

[Archived Copy Date, Archived Copy Status Code]

All dates use the format YYYY-MM-DDTHH:MM:SS+00:00 (ISO 8601 with a UTC offset). Packages such as arrow-py can parse dates in this format automatically.
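
The date format above can also be parsed with Python's standard library, as the following sketch shows. It uses `datetime.fromisoformat` in place of arrow-py; the helpers `parse_date` and `parse_copy` are hypothetical names, and the date strings and the status code "200" are invented examples. It assumes both elements of an archived copy list are present.

```python
from datetime import datetime


def parse_date(text: str) -> datetime:
    """Parse a YYYY-MM-DDTHH:MM:SS+00:00 string into a timezone-aware datetime."""
    return datetime.fromisoformat(text)


def parse_copy(pair):
    """Split an [archived copy date, archived copy status code] pair.

    Assumes both elements are present; the status code is left as a string.
    """
    date_text, status = pair
    return parse_date(date_text), status


# Invented example values, not taken from the dataset.
posted = parse_date("2010-03-01T12:00:00+00:00")
marked_dead = parse_date("2015-06-15T08:30:00+00:00")
copy_date, copy_status = parse_copy(["2011-01-01T00:00:00+00:00", "200"])

# Whole days between posting and being marked dead.
days_until_marked_dead = (marked_dead - posted).days
print(copy_date.isoformat(), copy_status, days_until_marked_dead)
```

Because the parsed values keep their UTC offset, they can be subtracted or compared directly without any timezone conversion.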
