Tuesday, 21 May 2013

Downloading the entire NuGet package database

When I was having the serialization problems described in Saving the entire list of NuGet Packages, I realized that adding a NuGet IPackage (retrieved from the GetPackages() method) to a SharedPackageRepository will also download the actual packages :)


This means that this script:

image_thumb[62]

will get the data for the first 10 NuGet packages:

image_thumb[63]

i.e., this will also download the actual nupkg (the zip with all the code+data files)

image_thumb[76]

(also stored in the local computer’s NuGet cache)

image_thumb[66]

This means that with this code:

image

we can download the entire NuGet library :)

On execution, all metadata will be downloaded first:

image_thumb[68]

which took 1m:32s (to get the information about 12197 packages):

image_thumb[69]

and then the download of all packages will start:

image_thumb[70]

and take a while :)

Here it is after 10m (180 MB downloaded)

image_thumb[71]

…after 24m (505 MB downloaded)

image_thumb[72]

…after 45m (697 MB downloaded)

image

…after 1h (1 GB downloaded)

image

After 1h 2m and 4100 packages fetched (1.50 GB downloaded) we hit an exception:

image
image
image

So we need to improve our code to handle errors (and to cache the packages list).

The code below will open a new C# script editor (with the packages passed as the _package variable), which can then be used to trigger another fetch:

image

and here we go again (with the git-got.0.0.1.nupkg error being caught):

image
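The two improvements mentioned above (catching per-package errors and caching the packages list) can be sketched roughly as follows. This is a Python illustration, not the C# script shown in the screenshots; `fetch_all`, `load_cached_ids`, the `fetch_one` callback and the JSON cache file are all hypothetical names made up for this example.

```python
import json
from pathlib import Path


def fetch_all(package_ids, fetch_one, cache_file):
    """Fetch every package, recording failures instead of stopping."""
    # Cache the package list so a restart can skip the slow metadata query.
    Path(cache_file).write_text(json.dumps(list(package_ids)))

    fetched, failed = [], {}
    for package_id in package_ids:
        try:
            fetch_one(package_id)  # hypothetical callback doing the download
            fetched.append(package_id)
        except Exception as error:  # one bad package must not end the run
            failed[package_id] = str(error)
    return fetched, failed


def load_cached_ids(cache_file):
    """Return the cached package list, or None if no cache exists yet."""
    path = Path(cache_file)
    return json.loads(path.read_text()) if path.exists() else None
```

With this shape, a single broken package ends up in `failed` while the loop carries on with the rest.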

…after 15m (1.78 GB downloaded)

image

…after 45m (2.13 GB downloaded)

image

…after 50m (and 2.47 GB downloaded) I had to stop the current download thread (while changing locations)

image

Starting again on a new network (and physical location)

image

Finally (after a couple of interruptions caused by the VM going to sleep) 3.99 GB

image

The 27 errors were caused by network errors; after re-running the scan we got 4 errors:

image

on these packages:

image
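Re-running only the failed packages, as was done here to separate transient network errors from genuinely broken packages, could look something like this. It is a Python sketch under assumptions: `retry_failed` and the `fetch_one` callback are hypothetical names, not part of the script used in this post.

```python
def retry_failed(failed_ids, fetch_one, max_passes=3):
    """Re-fetch failed packages; return the ids that keep failing."""
    remaining = list(failed_ids)
    for _ in range(max_passes):
        still_failing = []
        for package_id in remaining:
            try:
                fetch_one(package_id)  # hypothetical download callback
            except Exception:
                still_failing.append(package_id)
        if len(still_failing) == len(remaining):
            break  # a full pass fixed nothing, so stop retrying
        remaining = still_failing
    return remaining
```

Whatever is left in the returned list is a candidate for a package that is actually broken rather than a victim of a flaky connection.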

So now we have a 4 GB cache folder

image

and a 4 GB NuGet archive

image

each of these 12,193 folders containing an *.nupkg and *.nuspec file

image
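A quick way to check that layout (one *.nupkg and one *.nuspec per package folder) is sketched below in Python. The function name `incomplete_package_folders` and the `archive_root` parameter are made up for this example; the only assumption taken from the post is that each package lives in its own folder directly under the archive root.

```python
from pathlib import Path


def incomplete_package_folders(archive_root):
    """Return names of package folders missing a .nupkg or .nuspec file."""
    incomplete = []
    for folder in sorted(Path(archive_root).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the package folders
        has_nupkg = any(folder.glob("*.nupkg"))
        has_nuspec = any(folder.glob("*.nuspec"))
        if not (has_nupkg and has_nuspec):
            incomplete.append(folder.name)
    return incomplete
```

An empty result would mean every folder has both files.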


See also Offline copy of the entire NuGet.org gallery. What should I do with these 4.05 GB of amazing .Net Apps/APIs?