Authority versus Page Rank

September 17, 2006 · Posted by Greg Lloyd

On 15 Sep 2006 Tim Bray wrote in Wikipedia: Resistance is Absent:

What happened was, I went to check out the new Microsoft search engine at live.com (it’s not bad), and I started by looking for myself. I was kind of surprised when my Wikipedia entry came in ahead of ongoing. (Wikipedia’s #2 at Google and Yahoo.) I’m seeing this pattern of Wikipedia inching up the search-result charts for a whole lot of things. Search-result rank, on the Internet, more or less equals Authority. So this trend has to worry the anti-Wikipedians. It worries me too. Maybe it could be reversed, but I don’t think so.

As an example, Tim uses a search: "for each of the ten provinces of Canada, what is its population?" and notes that population figures from government sites are available but hard to find, and scattered throughout cyberspace with horribly meaningless URIs. Tim asks: "Would you bookmark them? ... how confident are you that they would be there after the next site re-org?" [versus the stable and increasingly popular but less authoritative pages of Wikipedia]. Tim posits that Wikipedia is going to win the page rank battle, and asks for other plausible outcomes.

I read global "search-result rank = authority" as the underlying problem, and the ability to select a (large) collection of sources to weight page rank as an interesting alternative. Use the content (or just the links) of a large collection of reference libraries when searching for authoritative facts. Use the content (or just the links) of pop culture sources - or the universal web - when searching for the latest on Britney. This is a very tall order, but I think it's an interesting concept to explore.

Wikipedia has become a popularly cited source, and its encyclopedic organization is extremely "page rank friendly": there is exactly one article titled Douglas Engelbart, and title text is weighted heavily. When internal Wikipedia or external references to Douglas Engelbart are created, they tend to link directly to that one URL, rather than being distributed among many articles that talk about Douglas Engelbart in the context of hypertext, the history of the graphical user interface, the Hypertext Editing System, etc.

When doing serious research, I want to be able to select the authority and viewpoint of sources used to rank my search results. A search for facts would use different sources than a search for buzz about a person, product or political topic. Is it surprising that someone interested in the latest scoop on Britney Spears versus the population of Canadian provinces might want a different page rank, based on the a priori as well as the calculated authority of linking sources? (e.g. links from a trusted World Almanac vs the National Enquirer).

In Tim's example, when searching for specific population facts, I'd want to use a weighted search based on links within and extending from trusted reference libraries whose content is selected, reviewed and updated by professional reference librarians. There would of course be many choices, and the union of the content of many reference libraries would be more valuable than a single library for page rank weighted search.
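One way to make this concrete is a personalized variant of PageRank, in which the random surfer restarts only at pages belonging to the chosen trusted collection. The following is a minimal sketch under stated assumptions, not a description of how any real search engine works: the function name, the tiny link graph, and the page names (`almanac`, `library`, `statcan`, and so on) are all hypothetical.

```python
def personalized_pagerank(links, trusted, damping=0.85, iterations=50):
    """Rank pages by link structure, restarting only at trusted pages.

    links:   dict mapping a page to the list of pages it links to
    trusted: set of pages whose authority the searcher accepts a priori
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)

    seeds = [p for p in pages if p in trusted]
    if not seeds:
        raise ValueError("trusted set must contain at least one known page")

    # Restart (teleport) distribution: uniform over trusted pages only.
    teleport = {p: (1.0 / len(seeds) if p in trusted else 0.0) for p in pages}
    rank = dict(teleport)

    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * teleport[p] for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    nxt[q] += share
            else:
                # Dangling page: hand its weight back to the trusted seeds.
                for q in pages:
                    nxt[q] += damping * rank[p] * teleport[q]
        rank = nxt
    return rank


# Hypothetical link graph: two reference sources and two pop culture sources.
links = {
    "almanac": ["statcan"],
    "library": ["statcan", "wikipedia"],
    "tabloid": ["gossip"],
    "fanblog": ["gossip", "wikipedia"],
}

reference_rank = personalized_pagerank(links, {"almanac", "library"})
popular_rank = personalized_pagerank(links, {"tabloid", "fanblog"})
```

With the reference sources as the trusted set, `statcan` outranks `gossip`; with the pop culture sources as the trusted set, the order reverses. The same link graph yields different rankings depending on whose links the searcher chooses to trust.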

If I were a student at Brown University, I might use anonymous links abstracted from Brown faculty, student, and elibrary content as weighting factors for research (you'd also see content hits in local sources you're permitted to read). You might choose a national reference library, a corporate reference library, or any other aggregate source you have permission to use as a relevance ranking resource. Except for special purposes, you'd likely choose the largest aggregate whose authority (and viewpoint) you trust for your research objective.

For specialized research on hypertext or the history of computing, I might choose a collection of references from specific researchers or organizations whose opinions I particularly value, and use it to weight my general search requests. Those researchers or organizations would not have to explicitly disclose the content of their collections. I just want to use their content to weight the page-rank relevance of the content I search.

For example, the IEEE might make link references of everything they publish available for weighted page rank calculation without necessarily disclosing the content. Members might be encouraged to pool their own relevance ranked references using social software like Furl or del.icio.us.

More organically, individuals and groups within an organization might give permission for their blog / wiki content to be used for link weighting (with or without disclosing the content). Many large professional or scientific organizations or enterprises might pool their references to form a link base of authority weighted references large enough to encompass a large portion of relevant content on the public web as well as their private sources.
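Pooling references without disclosing content could be as simple as each contributor publishing only the URLs it cites. The sketch below is one assumed way to merge such lists into a weighted link base; the function names, the contributor names, and the example.org URLs are hypothetical, and the one-vote-per-contributor rule is an illustrative design choice rather than anything proposed above.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host and drop the fragment so equivalent links merge."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def pool_link_weights(contributions):
    """Merge per-contributor URL lists into normalized weights.

    contributions: dict mapping a contributor name to the URLs it cites.
    Only URLs are shared, never the citing content. Each contributor counts
    at most once per URL, so one prolific source cannot dominate the pool.
    """
    counts = Counter()
    for urls in contributions.values():
        counts.update({normalize(u) for u in urls})
    total = sum(counts.values())
    return {url: n / total for url, n in counts.items()} if total else {}


# Hypothetical contributors and URLs.
contributions = {
    "ieee_index": ["http://example.org/engelbart", "http://example.org/hes"],
    "member_bookmarks": ["http://EXAMPLE.org/engelbart#bio"],
}
weights = pool_link_weights(contributions)
```

Here the Engelbart page is cited by both contributors and receives two thirds of the pooled weight, while the page cited once receives one third. Weights of this kind could serve as the starting distribution for a trust-weighted ranking.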

It becomes particularly important to be able to deal with a variety of sources, since for many work products there will be more than one "authoritative" source of facts, opinions, and analysis. Having one popular Wikipedia is good, and useful in the same sense that having one World Almanac for 2006, one Oxford English Dictionary, or one good source of movie reviews in imdb.com is useful.

But real world research also needs to deal with differences in opinion, analysis, and historical or national perspective. I'd be willing to bet that dailykos.com and rushlimbaugh.com could get into a fight over their opinions on the population of Canadian provinces, regardless of the facts. And you might want to research how many people hold different opinions, why, what influences their opinions, and how their opinions change over time (it's called political science).

Sources can of course also disagree on "objective facts" - such as population of Canadian provinces - based on how the information is collected and by whom. It's good to try to reconcile alternatives to arrive at a consensus on "facts" for ready reference, but important to remember that consensus always reflects a point of view at a particular time.
