Thursday, January 18, 2007
The Google power

REAL DATA
Google has been keeping historical records for quite a while now -- records of how fast content appears, changes and vanishes across different types of web properties. They also keep historical records of backlinks -- their rate of appearance, change, and disappearance in various types of scenarios.

Google has been studying all this quite directly, en masse and "in the wild" rather than just generating ivory-tower abstractions. By now Google has an immense data set.

FOOTPRINTS and PROFILING
Google has established statistically significant footprints for what is natural and unnatural in the areas of appearance, change and disappearance for content, links, and who knows what else. And they can generate such footprints for various "types" of sites and market areas.

With that pile of data, Google can generate very sophisticated web-maps and visual representations of their data -- and confine those maps to various slices of the whole. They can build extremely accurate link profiles and then see visually what the mean distribution really looks like -- with regard to rate of link acquisition, the ratio of deep linking to domain root linking, the differences between branded corporate sites and free hosted pages -- on and on.
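To make "link profile" a bit more concrete, here is a minimal sketch of how two of the measurements mentioned above (rate of link acquisition and the deep-link ratio) could be computed from a list of backlinks. This is purely illustrative: the `Backlink` record, the `link_profile` function and the 30-day "month" are my own hypothetical choices, not anything Google has published.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Backlink:
    target_path: str  # hypothetical: path on the linked-to site, e.g. "/" or "/products/widget"
    first_seen: date  # hypothetical: date the link was first observed


def link_profile(links: list[Backlink]) -> dict[str, float]:
    """Summarize a site's backlinks as a few numeric features (a hypothetical feature set)."""
    if not links:
        return {"deep_link_ratio": 0.0, "links_per_month": 0.0}
    # Anything not pointing at the domain root counts as a deep link.
    deep = sum(1 for link in links if link.target_path not in ("", "/"))
    first = min(link.first_seen for link in links)
    last = max(link.first_seen for link in links)
    # Approximate months as 30 days; floor at 1 so a single-day burst can't divide by zero.
    months = max((last - first).days / 30.0, 1.0)
    return {
        "deep_link_ratio": deep / len(links),
        "links_per_month": len(links) / months,
    }


links = [
    Backlink("/", date(2007, 1, 1)),
    Backlink("/a", date(2007, 1, 15)),
    Backlink("/b", date(2007, 2, 1)),
    Backlink("/c", date(2007, 3, 2)),
]
print(link_profile(links))  # {'deep_link_ratio': 0.75, 'links_per_month': 2.0}
```

Real profiles would surely use far more features than these two, but the idea is the same: reduce each site to a vector of numbers that can be averaged across similar sites.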

THE BIG VIEW and CLOSE-UP
Google can zoom out for an overview of the entire web, or zoom in to look at just e-commerce sites, or just sites without affiliate links. They can profile one single domain and overlay its footprint with the mean profile or footprint they've collected for similar sites or the web as a whole.

They can designate certain hot spots such as "manipulative linking nodes" and display them in red on a link map. Once the available data set grows to a certain level, insights that once seemed almost magical become simple to extract.

TECHNICALLY PRECISE MEANINGS
In a situation like this, words like "natural" or "manipulative" can take on very precise and rigorous technical meaning. And deviation from the normal footprint can be measured algorithmically and have automated consequences.

Statistically significant deviation can also raise a flag for a human to visually inspect the webmap and associated footprints. When major deviations are spotted, they will not commonly be "false positives" -- although with statistics, anything is still possible in a single isolated case.
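As an illustration of what "measuring deviation from the normal footprint" might look like, here is a minimal sketch using a plain z-score: how many standard deviations one site's feature sits from the mean of comparable sites. This is a generic statistical technique I'm using as a stand-in. I have no knowledge that Google uses z-scores, and the function name, feature names and the threshold of 3.0 are all hypothetical.

```python
import statistics


def deviation_flags(site: dict[str, float],
                    peers: list[dict[str, float]],
                    threshold: float = 3.0) -> dict[str, float]:
    """Return the features whose z-score against peer sites exceeds `threshold` in absolute value.

    `site` and each entry in `peers` map hypothetical feature names to values.
    """
    flagged: dict[str, float] = {}
    for feature, value in site.items():
        peer_values = [p[feature] for p in peers if feature in p]
        if len(peer_values) < 2:
            continue  # not enough peers to estimate a spread
        mean = statistics.mean(peer_values)
        stdev = statistics.stdev(peer_values)  # sample standard deviation
        if stdev == 0:
            continue  # every peer is identical; a z-score is undefined
        z = (value - mean) / stdev
        if abs(z) > threshold:
            flagged[feature] = z
    return flagged


peers = [
    {"links_per_month": 10.0, "deep_link_ratio": 0.50},
    {"links_per_month": 12.0, "deep_link_ratio": 0.60},
    {"links_per_month": 11.0, "deep_link_ratio": 0.55},
    {"links_per_month": 9.0, "deep_link_ratio": 0.45},
    {"links_per_month": 13.0, "deep_link_ratio": 0.65},
]
site = {"links_per_month": 40.0, "deep_link_ratio": 0.55}
print(deviation_flags(site, peers))  # only links_per_month is flagged (z is roughly 18)
```

A flag like this wouldn't prove anything by itself, which matches the point above: it's a reason for a human to take a closer look, and with enough peers the rare single-case false positive is the exception rather than the rule.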

GROWING SOPHISTICATION
A lot of the oddities we have seen recently in search results are the result of improvements in the sophistication of this kind of data modeling. Big Daddy gave Google a lot more elbow room to crunch many more numbers, keep more records, and so on.

Google has evolved a long way from the crude text matching that characterised early attempts at web search. When we work to understand what is happening with the SERPs today, we should appreciate the near-magic that has been created at Google and not be too primitive in our assumptions about what is, or soon will be, possible on their back end.

And the beauty of such an approach is that it scales, and it IMPROVES as scale increases.

CAVEAT
And as I said, what I wrote here contains a lot of educated guesswork, fueled by studying Google patents and by close listening to what Googlers say and how they choose their words. I don't "know" that this is all true, but I'd be very surprised if it isn't.


Posted at 07:31 am by PioneerGold
 
