Saturday 16 March 2013

What do the downloadable Twitter backup files look like

About a minute after sending the request shown at the end of Feature request: Tweet backups to Git/GitHub, I received an email with:


image

Accessing the link without being logged in to Twitter shows the sign-in page:

image

Opening that link from a valid Twitter session shows:

image

which changes to (when clicked):

image

The download is 1.6 MB (which I placed in my Dropbox folder as an extra backup):

image

The unzipped data is 7 MB:

image

The Readme.txt provides some good info:

image

Here is the data in CSV format

image
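For anyone who wants to poke at the CSV outside a spreadsheet, here is a minimal Python sketch that lists the column names and counts the rows. It assumes the file has a header row and is UTF-8 encoded; the function name `summarize_csv` and the path in the usage note are hypothetical, so adjust them to whatever your archive actually contains.

```python
import csv


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file with a header row."""
    # newline="" lets the csv module handle quoted fields that span lines.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])  # reads the header row
        row_count = sum(1 for _ in reader)       # consumes the remaining rows
    return columns, row_count
```

Usage would look something like `columns, total = summarize_csv("tweets.csv")`, where `tweets.csv` is an assumed file name.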

Here is the local version (via the index.html):

image

We can open tweets from a particular month:

image

And interestingly, the search works (I wonder how that happens)

image 

Here are the files loaded when one month's data is opened

image

which is consuming the data stored in

image

and

image

Here is an example of the data contained in these .js files (which are basically JSON objects)

image
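Since these .js files are basically JSON, they should be easy to read from a script. The sketch below is only a guess at how that could be done: it assumes each file is either plain JSON or a single JavaScript assignment of the form `some.name = [ ... ]`, and it drops everything up to the first `=` before parsing. The function name `load_js_data` and the variable name used in the usage note are hypothetical.

```python
import json


def load_js_data(path):
    """Parse a .js data file assumed to be JSON, optionally prefixed by `name =`."""
    with open(path, encoding="utf-8") as handle:
        source = handle.read().strip()
    if not source.startswith(("[", "{")):
        # Assumed layout: a leading assignment such as `some.name = [...]`.
        # Keep only what follows the first "=".
        _, _, source = source.partition("=")
    # Tolerate a trailing semicolon after the JSON payload.
    return json.loads(source.strip().rstrip(";"))
```

For a file containing something like `Example.data.tweets_2013_03 = [ {"text": "hello"} ]`, this would return a list with one dictionary. If the real files turn out to use a different wrapper, the prefix handling is the part to change.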

This is quite nicely done

What I’m thinking is that there should be a public repository of these backup files, which could then be used for data analysis and historic searches.

For example, it would be great to have the tweets from OWASP leaders, and from the temp accounts created for conferences, available this way :)

Does this already exist (a public archive of Twitter data)?