Wednesday, May 09, 2007

Notes from Google Invalid Clicks Seminar

I recently spent 5 hours attending an ‘Invalid Clicks’ seminar at Google, learning about their views on click fraud and what they do to prevent it. We've certainly come a long way from the days of the H.M.S. Click Monkey. Here’s a summary:

Bucket o’ data
-Google said <10% (my read = 8-10%) of total clicks are determined by Google’s proactive and reactive measures to be invalid.

-‘Invalid’ covers both fraudulent clicks and non-fraudulent ones such as double clicks, back button impressions, etc.

-Google really wants to get people thinking about ‘invalid clicks’ rather than click fraud because clicks that advertisers shouldn’t pay for encompass both fraudulent and non-fraudulent clicks. I can see their point - most clicks not worth paying for are non-fraudulent.

-Google claims that only an infinitesimally small share (<0.02%) of invalid clicks goes undetected by Google’s proactive filters or Google-initiated offline analysis. Interestingly, GEICO was in the audience and said 0.02% is roughly the amount of refunds they alone got for clicks they found to be click fraud, which suggests the true figure is probably two decimal places over (~2% IMO).

Statistical Anomaly Detection (that’s S.A.D.)
That’s the name for Google’s 100 or so filters that are applied in real time to detect and filter out invalid clicks. The system captures “the overwhelming majority” of invalid clicks, and only a minority are captured offline by human-led analysis. [NOTE: given the amount of reactive refunds I've seen, I think reactive captures are closer to 0.2-2% than 0.02%.]

Invalid Clicks Team
36 people, which includes engineers, operations and support reps. A good part of that team was at the seminar, and they were very nice people.

False Positives
Google feels strongly that the absolute $ amount of invalid click refunds given has stayed the same or declined since 2004, all while advertiser requests have gone up by at least 2-3X. This means there’s a huge number of false positives among those requests, a point I tend to agree with. False positives include:

a) AOL Proxy. To most advertisers, multiple clicks from one AOL IP address look like click fraud, when in fact it’s AOL’s proxy servers representing multiple AOL users.

b) Back button. Unless advertisers/agencies use auto-tagging and URL redirects, when the searcher clicks on an ad, visits the site and then clicks the back button (think how often we all do that), to many advertisers this appears as multiple clicks from one user. And given how few advertisers are sophisticated in their approach to tracking, it’s not surprising that back button accounts for 40%+ of what advertisers initially deem suspicious activity.
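To make the back-button case concrete, here is a minimal sketch of how an advertiser might separate repeat page loads from distinct paid clicks. It assumes auto-tagging is on, so each ad click arrives with its own `gclid` query parameter and a back-button reload repeats the same value. The function names and the sample URLs are made up for illustration; this is not Google's method, just one way to read your own logs.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def click_id(landing_url):
    """Return the gclid value from a landing-page URL, or None if absent."""
    values = parse_qs(urlparse(landing_url).query).get("gclid")
    return values[0] if values else None


def summarize_hits(landing_urls):
    """Compare raw landing-page hits with distinct tagged clicks."""
    ids = [cid for cid in (click_id(u) for u in landing_urls) if cid is not None]
    per_click = Counter(ids)
    return {
        "raw_hits": len(landing_urls),
        "tagged_hits": len(ids),
        "distinct_clicks": len(per_click),
        # Hits that repeat an already-seen click ID (e.g. back-button reloads).
        "repeat_hits": len(ids) - len(per_click),
    }


# Hypothetical log sample: the first two hits share one click ID.
hits = [
    "http://example.com/landing?gclid=AAA111",
    "http://example.com/landing?gclid=AAA111",
    "http://example.com/landing?gclid=BBB222",
    "http://example.com/landing",
]
summary = summarize_hits(hits)
print(summary)
```

Counting distinct click IDs rather than raw hits per IP also sidesteps the AOL proxy problem, since many users behind one proxy address still carry different click IDs.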

More Transparency Going Forward
1) Google avowed that sometime in Q2 ’07 they will offer IP Exclusion, whereby an advertiser can specify IP addresses or address ranges for which they don’t want their ads to show.

2) Also during Q2, they will provide enhanced reporting, which most in the room took to mean ad group or keyword-level ‘invalid click’ reports.

3) As certain of Google’s search and contextual distribution relationships come up for renewal, Google is inserting in the renewal contracts language that [finally] gives them more control over who the partner can sub-syndicate to. This should help control click arbitrageurs (‘garbitrageurs’) from whom click fraud oftentimes emanates.
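As a rough illustration of the matching that IP Exclusion (item 1) implies, the sketch below checks visitor addresses against a list of single IPs and CIDR ranges using Python's standard `ipaddress` module. The list contents and function names are hypothetical, and nothing here reflects how Google will actually implement the feature; it is the kind of check an advertiser could run on their own logs to decide which addresses to submit.

```python
import ipaddress


def build_exclusions(entries):
    """Parse single addresses and CIDR ranges into network objects."""
    # A bare address such as "198.51.100.7" becomes a one-host /32 network.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_excluded(ip, networks):
    """Return True if the address falls inside any excluded network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Hypothetical exclusion list (documentation-only address ranges).
exclusions = build_exclusions(["192.0.2.0/24", "198.51.100.7"])

for visitor in ["192.0.2.45", "198.51.100.7", "203.0.113.5"]:
    print(visitor, "excluded" if is_excluded(visitor, exclusions) else "allowed")
```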

EF’s role in all this
Many of the attendees were large advertisers who universally complained that they don’t have the bandwidth to constantly monitor referrer logs to make sure that [(PPC traffic – invalid clicks) = PPC $ Spent], and they also said that no SEM or agency is capable of scaling to do this for them and all the SEM's other clients. This is clearly one of the reasons Google's reactive rate is so low - advertisers just don't have the time to do the digging. Good SEMs, then, must develop technical solutions that take advantage of the additional transparency Google is set to provide in Q2. And the industry must continue to push the search engines to clean up their distribution networks - and reward them when they do so.

The Middle Ground
Lest I be called 'brainwashed by Google', I'll point out the following areas where I think Google's being 'Plexpomorphic':

1) 0.02%. No way Google's filters catch 99.98%. If the sum total of advertiser-led refunds in my immediate vicinity is 5X that and we manage 3-4% of total search spend, then the number's probably closer to 2%. It appeared that Google spends 99.98% of their time looking at their own data; I would urge them to look at the advertisers' and SEMs' data more often, even if that means paying for the privilege.

2) The real problem that needs to be talked about more is invalid distribution partners. Google *does* seem to be preparing to address this in Q2, but putting all the burden on advertisers and agencies to proactively identify low-quality distribution partners is nearly as bad as doing nothing at all about the problem. Why should it be the advertisers' sole responsibility to police such a huge network? The answer is - it shouldn't.

3) Google's PPA offering should be rolled out to its entire network, as that is the most efficient and effective way to combat invalid clicks. Google is doing itself and the industry a disservice by continuing to operate its business with clicks and CPC as a translation layer between search engine Revenue Per Query (RPQ) and advertiser ROI.
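For anyone who wants to check the back-of-envelope estimate in point 1, here is one possible reading of it, written out. The assumption (mine, and not something Google stated) is that the refunds seen in one shop equal 5 times the 0.02% figure as a share of all clicks, and that those refunds scale in proportion to the share of search spend that shop manages. The function name is hypothetical.

```python
def extrapolated_rate(observed_share_of_all_clicks, share_of_spend_managed):
    """Scale refunds seen by one manager up to the whole market.

    Assumes refunds are spread in proportion to spend, which is a
    simplification made only for this estimate.
    """
    return observed_share_of_all_clicks / share_of_spend_managed


google_reactive_rate = 0.0002        # the 0.02% figure quoted at the seminar
observed = 5 * google_reactive_rate  # "5X that", i.e. 0.1%

for share in (0.03, 0.04):           # managing 3-4% of total search spend
    rate = extrapolated_rate(observed, share)
    print(f"{share:.0%} of spend managed -> {rate:.1%} estimated reactive rate")
```

Under these assumptions the result lands in the low single digits, the same order of magnitude as the ~2% guess and far from 0.02%.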

Lastly, it's also important to state Google's relative position on invalid clicks. Google is doing a much better job than any other search engine in dealing with invalid clicks, as measured by Google's relatively higher ROI, an impressive feat given how much bigger they are than any other SE. Kudos to Google for that.


Blogger Richard said...

Appreciate the summary. Two questions:

1) Do you think Google will ever create an AdWords "domain network" option for the AdSense for Domains sites rather than spreading that traffic over the existing search and content networks?

2) In light of your distribution fraud posts, I'm a little perplexed by the Efficient Frontier "case study" that claims domain traffic converts better than both search and content traffic. Are you aware of this? Is Efficient Frontier selling out? ;-)

I came across it on the AdWords help page about parked domain traffic. Here's the "study" URL on Google's site:

BTW, you might want to have your colleagues correct this incorrect sentence in that document: "AdSense for domains is a part of the Google content network that enables effective distribution of advertisers' ads on parked domain pages."

9:13 PM

Blogger searchquant said...

Hi Richard,

1) Yes, I think they will; at least that level of granular choice is what they indicated would be rolled out in Q2. That said, however, we'll have to wait and see how granular granular is.

2) I hear you on the case study, but keep in mind that it was done before I started writing about distribution fraud, and without my involvement. Remember, too, that it's a case study. Domain traffic *can* convert well in a unique instance, but that doesn't necessarily mean that it does in general. Your point's valid, though, and we're working on creating an official EF position on the topic.

9:21 AM

Blogger Richard said...

Thanks. Let me know if you ever hear they are going to split the domain traffic onto its own network.

8:13 PM

