Wednesday, May 07, 2014

No, McAfee didn't violate ethics scraping OSVDB

My Twitter feed is full of people retweeting the claim that McAfee (the company) violated ethics by scraping http://osvdb.org. This is completely wrong: McAfee violated no ethics (nor law).

Public information is public, and it's not a crime (nor ethics violation) to access it. As a community, we strongly supported Aaron Swartz and Andrew Auernheimer (Weev) in defending this principle. If you'll recall, they were accused (and convicted in Weev's case) of scraping public websites. And this is all that some engineer at McAfee did. We can't apply this principle when it's convenient, when it's our friends, and then turn around and deny it to others we don't like.

If McAfee then republishes that information without permission, that certainly could be an ethics/legal violation, because that's publishing copyrighted material. But that's not what McAfee is accused of doing. They are accused simply of accessing the information.

What about the clause from their license that says the following? Doesn't that forbid scraping?

4. Obtaining data from this website in a programmatic fashion (e.g. scraping via enumeration, web robot, crawler, etc) is prohibited. Such activity is likely to trigger security software that will permanently block your IP from accessing the site.

No, it doesn't. I can put a license file on my website that forbids anybody from accessing the site who isn't standing on their head, but such a thing has no ethical or legal meaning. The only thing that has meaning is that you published it on your website: you can't retroactively take that back and tell people they can't access it despite it being public. Such a license can of course restrict how people republish the information, and it can dictate terms for private access, but if you are making information public, it's public.

Indeed, OSVDB's robots.txt file doesn't even back up this statement in the license:

# from http://www.last.fm/robots.txt
User-Agent: *
Disallow: /harming/humans
Disallow: /ignoring/human/orders
Disallow: /harm/to/self

# http://www.shopwiki.com/wiki/Help:Bot
User-Agent: ShopWiki
Disallow: /

User-Agent: www.changedetection.com
Disallow: /

User-Agent: Mozilla/5.0 (compatible; mon.itor.us - free monitoring service; http://mon.itor.us)
Disallow: /

User-Agent: cognitiveseo.com
Disallow: /


What we are looking for is a clause that says "User-agent: *" followed by "Disallow: /", but it's not there. Of course, even if it were there, it still wouldn't make the license valid or the information private. Public information is public.
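For anyone who wants to check this mechanically rather than by eye, here is a minimal sketch using Python's standard urllib.robotparser. It is only an illustration, not anything McAfee or OSVDB actually ran: the rules are the ones quoted above, pasted in as a string so nothing is fetched over the network, and the user-agent name "ExampleScraper/1.0" and the helper may_fetch are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above, embedded as a string so the check
# runs offline (comment lines from the original file are omitted).
ROBOTS_TXT = """\
User-Agent: *
Disallow: /harming/humans
Disallow: /ignoring/human/orders
Disallow: /harm/to/self

User-Agent: ShopWiki
Disallow: /

User-Agent: www.changedetection.com
Disallow: /

User-Agent: Mozilla/5.0 (compatible; mon.itor.us - free monitoring service; http://mon.itor.us)
Disallow: /

User-Agent: cognitiveseo.com
Disallow: /
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above let user_agent fetch url.

    Hypothetical helper for this example only.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleScraper/1.0" is an invented crawler name with no entry of
    # its own, so it falls through to the "User-Agent: *" rules.
    print(may_fetch("ExampleScraper/1.0", "http://osvdb.org/"))
    print(may_fetch("ExampleScraper/1.0", "http://osvdb.org/harming/humans"))
    # ShopWiki is one of the few crawlers the file blocks entirely.
    print(may_fetch("ShopWiki", "http://osvdb.org/"))
```

Under these rules a generic crawler is allowed on the site root and turned away only from the three joke paths, while the handful of named bots are blocked everywhere. There is no blanket "Disallow: /" for everyone else.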

When your organization starts with the word "Open" and your robots.txt doesn't forbid scraping, it's pure lunacy to complain when people do exactly what these things imply.

All corporations care a lot about intellectual property. Individual employees often make mistakes, of course, but corporate policy and practices keep track of all third-party contributions to what they sell. It's unlikely that OSVDB information would ever make it into McAfee products/services without it being clearly tracked by the company -- and paid for.

Conversely, what does happen is that engineers download open stuff and try it out. It's quite likely that an engineer would scrape some information, writing some Node.js scripts to parse it, and see how difficult it would be to integrate into an offering. When what they are working with is public, they certainly wouldn't ask permission. The assumption that scraping information means that the company intends to republish it is wrong.

To reiterate, accessing public websites is not illegal, wrong, or unethical. We've fought for this principle in the Weev/Swartz cases, and it applies equally here.


License for blog.erratasec.com: you may only read this website if you are standing on your head.

5 comments:

Bwanshoom said...
This comment has been removed by the author.
Craig Williams said...

˙pǝʇsǝnbǝɹ sɐ ƃuᴉop ʇou ʎq ǝɔuǝɔᴉl ɹnoʎ ǝʇɐloᴉʌ oʇ ʇuɐʍ ʇupᴉp ᴉ ʇnq 'ʎʞɔᴉɹʇ ʎllɐǝɹ sɐʍ ʇuǝɯɯoɔ sᴉɥʇ ƃuᴉdʎʇ ʇnq'pɐǝɥ ʎɯ uo ƃuᴉpuɐʇs ǝlᴉɥʍ pɐǝɹ oʇ ʎsɐǝ ʎlǝʌᴉʇɐlǝɹ sɐʍ ʇᴉ 'ǝlɔᴉʇɹɐ ǝɔᴉN

bob mcbobberson said...

I take issue with the implied use being commercial.

That engineer may well be scraping the data but who knows why or what for... Are big companies not allowed to nurture pet projects or research any more?

Until there is factual proof McAfee abused the data by reselling it within their own products, I'm sorry but there is no way osvdb can infer anything from the accesses.

Backrow said...

I just looked and OSVDB has a post up about abuse of the individual non-commercial nature of the site.

McAfee "made 2,219 requests between 06:25:24 on May 4 and 21:18:26 on May 6. Excuse us, you clearly didn’t want to try our service back then. If you would like to give a shot then we kindly ask you to contact RBS so that you can do it using our API, customer portal, and/or exports as intended."
-- http://blog.osvdb.org/2014/05/07/the-scraping-problem-and-ethics/

The more I read you lately Robert, the more irritating you get. You start out ill-informed many times -- like with bitcoin "change", heartbleed, and other things. Then you make some apology post where you admit you don't know your ass from a backspace key.

Shame.

Shritam Bhowmick said...

I don't see any legal ramifications turning out in favor of OSVDB either, because either way, the information was public. Consider Git: you can fork and pull code only if the code itself was made public.

Don't want to share? Put the code or the content out of reach of the public, grab a license, and commercialize, with violations limited purely to copyrighted content.

Also, how come the vulnerabilities, which were the collective effort of individual penetration testers, researchers, and security enthusiasts, become the property of OSVDB alone? That must be weird, but THINK!