Friday, August 28, 2020

GH Archive

GitHub provides 20+ event types, ranging from new commits and fork events to opening new tickets, commenting, and adding members to a project. These events are aggregated into hourly archives, which you can access with any HTTP client:

Query                               Command
Activity for 1/1/2015 @ 3PM UTC     wget https://data.gharchive.org/2015-01-01-15.json.gz
Activity for 1/1/2015               wget https://data.gharchive.org/2015-01-01-{0..23}.json.gz
Activity for all of January 2015    wget https://data.gharchive.org/2015-01-{01..31}-{0..23}.json.gz
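
The `{0..23}` and `{01..31}` ranges in the commands above are expanded by the shell (bash-style brace expansion), not by wget itself. In an environment without brace expansion, the same list of URLs can be generated in a script. A minimal Ruby sketch, where `archive_urls` is a hypothetical helper name and the URL pattern is taken from the examples above:

```ruby
require 'date'

# Hypothetical helper: lists the hourly archive URLs for every day in a
# date range. The hour is not zero-padded, matching the wget examples.
def archive_urls(first_day, last_day)
  (first_day..last_day).flat_map do |day|
    (0..23).map do |hour|
      "https://data.gharchive.org/#{day.strftime('%Y-%m-%d')}-#{hour}.json.gz"
    end
  end
end

# All of January 2015: 31 days x 24 hourly archives.
urls = archive_urls(Date.new(2015, 1, 1), Date.new(2015, 1, 31))
puts urls.length
puts urls.first
puts urls.last
```

Each URL can then be handed to any HTTP client, one archive at a time.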

Each archive contains JSON-encoded events as reported by the GitHub API. You can download the raw data and apply your own processing to it: write a custom aggregation script, import it into a database, and so on! For example, a short Ruby script can download and iterate over a single archive.
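
What follows is a minimal sketch rather than the project's own script. It uses only Ruby's standard library and assumes each archive holds one JSON object per line; that assumption, the helper name `each_event`, and the choice to tally events by their `type` field are illustrative and not taken from the text above.

```ruby
require 'json'
require 'open-uri'
require 'zlib'

# Hypothetical helper: yields each event from a gzip-compressed IO.
# Assumes the archive is newline-delimited JSON (one event per line).
def each_event(io)
  return enum_for(:each_event, io) unless block_given?

  Zlib::GzipReader.new(io).each_line do |line|
    line = line.strip
    next if line.empty?

    yield JSON.parse(line)
  end
end

# Only runs when the script is invoked directly with an archive URL.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  counts = Hash.new(0)

  # URI.open downloads the archive and yields it as a readable IO.
  URI.open(ARGV[0]) do |gz|
    each_event(gz) { |event| counts[event['type']] += 1 }
  end

  # Print event types, most frequent first.
  counts.sort_by { |_type, n| -n }.each { |type, n| puts "#{type}: #{n}" }
end
```

Saved under a name of your choosing (say `gharchive_sketch.rb`), it could be run as `ruby gharchive_sketch.rb https://data.gharchive.org/2015-01-01-15.json.gz`. Reading line by line keeps memory use modest, since the decompressed archive is never held in memory as a single string.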

  • Activity archives are available starting 2/12/2011.
  • Activity archives for dates between 2/12/2011 and 12/31/2014 were recorded from the (now deprecated) Timeline API.
  • Activity archives for dates starting 1/1/2015 are recorded from the Events API.

For the curious, check out The Changelog episode #144 for an in-depth interview about the history of GH Archive, integration with BigQuery, where the project is heading, and more.



from Hacker News https://ift.tt/2NMOAgw
