There's an answer to that: let's monitor the DefCon Internet connection. It's 100 Mbps of sustained traffic throughout the day, carrying some of the craziest stuff that crosses the Internet. DefCon protects this traffic to guard people's privacy (except on the public WiFi, of course), but maybe they should open it up to researchers.
For example, consider these questions:
- What percentage of the websites attendees visit (like Facebook) are protected with 1024-bit SSL keys that can be easily broken by the NSA?
- How many use "forward secrecy"?
- How many use VPNs in a manner that can easily be cracked, such as those using MS-CHAPv2?
- How many Bitcoin transactions, and what's the total value transferred at DefCon?
- What is the percentage of HTTPS vs. HTTP?
- What is the percentage of mobile vs. desktop browsers? Mobile apps?
- How much is Tor? BitTorrent? YouTube streaming?
- How much is "hacker" traffic, such as nmap scans or tunneling non-DNS traffic on DNS port 53?
- What are some cool Maltego transforms that can link people together in 3 degrees of separation?
- How many unique SMTP email addresses were seen? unique login names? passwords? password hashes?
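Several of these questions reduce to the same operation: bucket the traffic into categories and compute percentages. Here's a toy sketch (my own illustration, not how Ferret actually works) that classifies hypothetical flow records by destination port:

```python
from collections import Counter

# Toy port-to-protocol map. Real DPI must inspect payloads, since Tor
# and BitTorrent rarely sit on their well-known ports.
PORT_LABELS = {80: "HTTP", 443: "HTTPS", 53: "DNS", 9001: "Tor", 6881: "BitTorrent"}

def classify(flows):
    """flows: iterable of (dst_port, byte_count) pairs -- a made-up record format."""
    totals = Counter()
    for port, nbytes in flows:
        totals[PORT_LABELS.get(port, "other")] += nbytes
    grand = sum(totals.values())
    return {label: 100.0 * n / grand for label, n in totals.items()}

# Example: a mostly-HTTPS traffic mix
stats = classify([(443, 700), (80, 200), (53, 50), (9001, 50)])
# stats["HTTPS"] is 70.0, stats["HTTP"] is 20.0
```

Port-based bucketing is only the zeroth-order approximation; the interesting questions above (MS-CHAPv2, forward secrecy, Bitcoin) require parsing the protocols themselves.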
In short, assuming that the NSA is monitoring the DefCon traffic, what do they see?
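As a concrete example, the "tunneling non-DNS traffic on port 53" question can be approximated with a character-entropy heuristic on query names. This is my own rough heuristic with guessed thresholds, not a claim about how any real tool detects tunnels:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_like_dns_tunnel(qname: str, min_len: int = 30, min_entropy: float = 3.5) -> bool:
    # Data encoded into DNS labels tends to produce long, high-entropy
    # hostnames; legitimate names are short and repetitive. The thresholds
    # here are guesses, not tuned values.
    label = qname.split(".")[0]
    return len(label) > min_len and shannon_entropy(label) > min_entropy

looks_like_dns_tunnel("www.defcon.org")  # False: label is short and boring
```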
All of this info is interesting without having to tie it back to individual identities. We can report the number of Bitcoins transferred at DefCon without revealing who made the transactions, for example.
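Counting uniques without keeping identities is straightforward: hash each value with a throwaway salt, count distinct hashes, then discard the salt. A minimal sketch of how such a pipeline could work (my assumption, not part of Ferret):

```python
import hashlib
import os

SALT = os.urandom(16)  # generated per run and never stored; once it's gone,
                       # the hashes can't be linked back to the addresses

def count_unique(values):
    """Count distinct values (e.g. email addresses) without retaining them."""
    seen = set()
    for v in values:
        seen.add(hashlib.sha256(SALT + v.strip().lower().encode()).digest())
    return len(seen)

count_unique(["a@x.com", "A@x.com ", "b@x.com"])  # 2 -- normalization folds the duplicate
```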
Unfortunately, the privacy difficulties may be insurmountable. The number of researchers data-mining this would have to be small, and they'd have to sign NDAs, but there's a good chance that even this isn't practical.
The sort of researcher you want is someone building deep-packet-inspection tools, such as my Ferret tool. So consider me a typical researcher.
Strictly speaking, I never need to see the network traffic itself. I can update my tool to answer the above statistics using traffic from other sources, and thus build something that generates this information so DefCon can show the statistics updating live on their website next year. But the results will be incomplete. To extract the best information, I need a copy of the real traffic to work from: run it through my code, look at the results, then go back and modify the code to produce better ones.
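That iterate-on-a-capture workflow starts with mundane plumbing: reading the pcap file the organizers would hand over. Ferret itself is a C tool; the following is just an illustrative stdlib-only sketch that validates a capture's global header before any analysis:

```python
import struct

def pcap_link_type(path):
    """Check a file's pcap magic and return its link-layer type (1 = Ethernet)."""
    with open(path, "rb") as f:
        hdr = f.read(24)  # the pcap global header is 24 bytes
    if len(hdr) < 24:
        raise ValueError("truncated pcap header")
    magic, = struct.unpack("<I", hdr[:4])
    if magic == 0xA1B2C3D4:
        endian = "<"          # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"          # file written big-endian
    else:
        raise ValueError("not a pcap file")
    # version_major, version_minor, thiszone, sigfigs, snaplen, network
    fields = struct.unpack(endian + "HHiIII", hdr[4:24])
    return fields[5]
```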
I'm going to update my Ferret tool over the next year to produce such results live at next year's conference. Hopefully, the conference organizers will find a way to do a live display of the results. But the reason I'm posting this is to encourage The Dark Tangent to collect a large capture (e.g. 1 terabyte) of real data and let me play with it under an NDA.