GitHub provides 20+ event types, ranging from new commits and forks to opening new tickets, commenting, and adding members to a project. These events are aggregated into hourly archives, which you can access with any HTTP client:
Query | Command |
---|---|
Activity for 1/1/2015 @ 3PM UTC | wget https://data.gharchive.org/2015-01-01-15.json.gz |
Activity for 1/1/2015 | wget https://data.gharchive.org/2015-01-01-{0..23}.json.gz |
Activity for all of January 2015 | wget https://data.gharchive.org/2015-01-{01..31}-{0..23}.json.gz |
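
The brace patterns in the commands above rely on shell expansion, which is not available in every HTTP client. As a rough sketch, the same URL list can be built in Ruby. The helper name `archive_urls` is hypothetical and only for illustration; the URL layout is taken from the table, including the hour written without zero padding.

```ruby
require 'date'

# Build the hourly archive URLs for every day from first_day to last_day, inclusive.
# Hours run 0..23 and are not zero padded, matching the wget patterns in the table.
# `archive_urls` is an illustrative helper, not part of any GH Archive tooling.
def archive_urls(first_day, last_day)
  (first_day..last_day).flat_map do |day|
    (0..23).map do |hour|
      "https://data.gharchive.org/#{day.strftime('%Y-%m-%d')}-#{hour}.json.gz"
    end
  end
end

urls = archive_urls(Date.new(2015, 1, 1), Date.new(2015, 1, 31))
puts urls.length # 31 days * 24 hours = 744
puts urls.first  # https://data.gharchive.org/2015-01-01-0.json.gz
```
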
Each archive contains JSON-encoded events as reported by the GitHub API. You can download the raw data and apply your own processing to it, e.g. write a custom aggregation script or import it into a database. For example, a short Ruby script can download and iterate over a single archive.
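
The original script is not reproduced in this copy, so what follows is a minimal sketch using only the Ruby standard library, not the project's own code. It assumes each archive is a single gzip stream holding one JSON object per line, which may not hold for every archive, particularly older ones. The helper names (`each_event`, `event_type_counts`, `fetch_type_counts`) are hypothetical.

```ruby
require 'json'
require 'open-uri'
require 'zlib'

# Yield each event (a Hash) from a gzip-compressed IO.
# Assumption: the archive holds one JSON object per line.
def each_event(io)
  return enum_for(:each_event, io) unless block_given?

  Zlib::GzipReader.wrap(io) do |gz|
    gz.each_line do |line|
      next if line.strip.empty? # skip blank lines
      yield JSON.parse(line)
    end
  end
end

# Tally events by their "type" field (e.g. "PushEvent").
def event_type_counts(io)
  each_event(io).each_with_object(Hash.new(0)) do |event, counts|
    counts[event['type']] += 1
  end
end

# Download one hourly archive and tally its event types.
# This performs a network request when called.
def fetch_type_counts(url)
  URI.open(url) { |io| event_type_counts(io) }
end
```

Calling `fetch_type_counts('https://data.gharchive.org/2015-01-01-15.json.gz')` would download that hour and return a hash mapping each event type to its count. Reading line by line keeps only one event in memory at a time, which helps with larger archives.
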
- Activity archives are available starting 2/12/2011.
- Activity archives for dates between 2/12/2011 and 12/31/2014 were recorded from the (now deprecated) Timeline API.
- Activity archives for dates starting 1/1/2015 are recorded from the Events API.
For the curious, check out The Changelog episode #144 for an in-depth interview about the history of GH Archive, integration with BigQuery, where the project is heading, and more.