Public information is public, and it's not a crime (nor an ethics violation) to access it. As a community, we strongly supported Aaron Swartz and Andrew "Weev" Auernheimer in defending this principle. If you'll recall, they were accused (and, in Weev's case, convicted) of scraping public websites. That is all that some engineer at McAfee did. We can't apply this principle when it's convenient, when it's our friends, and then turn around and deny it to others we don't like.
If McAfee then republishes that information without permission, that certainly could be an ethics/legal violation, because that's publishing copyrighted material. But that's not what McAfee is accused of doing. They are accused simply of accessing the information.
What about the clause from their license that says the following? Doesn't that forbid scraping?
4. Obtaining data from this website in a programmatic fashion (e.g. scraping via enumeration, web robot, crawler, etc) is prohibited. Such activity is likely to trigger security software that will permanently block your IP from accessing the site.
No, it doesn't. I can put a license file on my website that forbids anybody from accessing the site who isn't standing on their head, but such a thing has no ethical or legal meaning. The only thing that has meaning is that you published it on your website -- you can't retroactively take that back and tell people they can't access it despite it being public. Such a license can restrict how people republish the information, of course, and of course it can dictate terms for private access, but if you are making information public, it's public.
Indeed, OSVDB doesn't even have a robots.txt file that backs up this statement in the license:
# from http://www.last.fm/robots.txt
User-Agent: *
Disallow: /harming/humans
Disallow: /ignoring/human/orders
Disallow: /harm/to/self
# http://www.shopwiki.com/wiki/Help:Bot
User-Agent: ShopWiki
Disallow: /
User-Agent: www.changedetection.com
Disallow: /
User-Agent: Mozilla/5.0 (compatible; mon.itor.us - free monitoring service; http://mon.itor.us)
Disallow: /
User-Agent: cognitiveseo.com
Disallow: /
What we are looking for is a clause that says "User-agent: *" followed by "Disallow: /", but it's not there. Of course, even if it were there, it still wouldn't make the license valid or the information private. Public information is public.
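For anyone who wants to check this rather than eyeball it, here is a rough sketch using Python's standard urllib.robotparser module. It parses the robots.txt text quoted above and asks whether a generic crawler may fetch a page. The line breaks in the embedded text are reconstructed, and the crawler name "ExampleBot" and the path "/some/page" are made up for illustration; neither comes from OSVDB.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above. The line breaks are reconstructed (an assumption).
ROBOTS_TXT = """\
# from http://www.last.fm/robots.txt
User-Agent: *
Disallow: /harming/humans
Disallow: /ignoring/human/orders
Disallow: /harm/to/self
# http://www.shopwiki.com/wiki/Help:Bot
User-Agent: ShopWiki
Disallow: /
User-Agent: www.changedetection.com
Disallow: /
User-Agent: Mozilla/5.0 (compatible; mon.itor.us - free monitoring service; http://mon.itor.us)
Disallow: /
User-Agent: cognitiveseo.com
Disallow: /
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" and "/some/page" are hypothetical names used only for this sketch.
generic_allowed = rules.can_fetch("ExampleBot", "/some/page")

# ShopWiki is one of the few crawlers the file singles out with "Disallow: /".
shopwiki_allowed = rules.can_fetch("ShopWiki", "/some/page")

# The catch-all "*" record only blocks the three joke paths.
joke_path_allowed = rules.can_fetch("ExampleBot", "/harming/humans")

print("generic crawler, ordinary page:", generic_allowed)
print("ShopWiki, ordinary page:", shopwiki_allowed)
print("generic crawler, joke path:", joke_path_allowed)
```

Under these assumptions, only the handful of named crawlers are shut out. An arbitrary crawler is refused nothing but the three joke paths, which is the point made above: there is no blanket "Disallow: /" for everyone.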
When your organization starts with the word "Open" and your robots.txt doesn't forbid scraping, it's pure lunacy to complain when people do exactly what these things imply.
All corporations care a lot about intellectual property. Individual employees often make mistakes, of course, but corporate policy and practices keep track of all third-party contributions to what they sell. It's unlikely that OSVDB information would ever make it into McAfee products/services without it being clearly tracked by the company -- and paid for.
Conversely, what does happen is that engineers download open stuff and try it out. It's quite likely that an engineer would scrape some information, write some Node.js scripts to parse it, and see how difficult it would be to integrate into an offering. When what they are working with is public, they certainly wouldn't ask permission. The assumption that scraping information means the company intends to republish it is wrong.
To reiterate, accessing public websites is not illegal, wrong, or unethical. We've fought for this principle in the Weev/Swartz cases, and it applies equally here.