Some Clarifying Notes on My LDA Post Yesterday

Updated September 7, 2010

Doc Sheldon

A friend of mine and I were discussing LDA and SEOmoz’s new tool, which I mentioned in yesterday’s post. The conversation turned to all the chatter on the web about it, and he said he felt my post had just fanned the flames. I was a little surprised by that, because I had been careful not to convey the notion that I was “calling anyone out”. So I went back and re-read the portion he pointed out to me, and in all fairness, I can see how someone might get the impression that I was being critical of Rand, SEOmoz or their tool… or maybe all three.

That was definitely not my intention. So I want to clarify my position.

First, though, I’ll say that I went back and re-read Rand’s piece (that’s THREE times now). Don’t get me wrong… I don’t mean to imply that I totally understood it. Far from it! I DO, however, understand more about term vector models and topic modeling than I did this time last week.

And I re-read my post, trying to be as objective as possible. Here’s the part my friend thought might have been misinterpreted, and I’m inclined to agree:

“My opinion, at this point, is that the moz LDA tool might give us some help in discovering a relevancy problem, but it falls short of the more subtle implications in Rand’s article.”

The first part of that statement isn’t meant to discredit the tool, only to say that, in my opinion, it might offer some enlightenment if we were experiencing relevancy issues. Frankly, I still see its benefit as somewhat limited. It is, when all is said and done, just a piece of software, one of a half dozen or so that can be found online, and to me, that means it will have to prove itself. I’ve played with it for a couple of hours, and although I thought the standard deviation was a little on the high side, I think it has potential.

The second portion of that statement, regarding the “subtle implications in Rand’s article”, is more subjective. Early in the piece, the tone was set thus:

“…there have been a number of negative remarks and criticisms from several folks in the search community suggesting that LDA (or topic modeling in general) is definitively not used by the search engines. We think there’s a lot of evidence to suggest engines do use these…”

I suppose I found this suggestive of the notion that SEOmoz had developed a tool that used the same technology as the search engines.

Then, there were a number of references, including the title of the article, to the correlation between Ben’s tests and Google’s results. Again, that may have been suggestive, although Rand was careful to point out, several times, that there appeared to be a correlation. In fact, he even made this statement:

“While the correlations are high, and the excitement around the tool both inside SEOmoz and from a lot of our members and community is equally high, this is not us “reversing the algorithm.” We may have built a great tool for improving the relevancy of your pages and helping to judge whether topic modeling is another component in the rankings, but it remains to be seen if we can simply improve scores on pages and see them rise in the results.”

So it would seem that my closing statement yesterday did, in fact, suggest more skepticism on my part than I would reasonably have toward any other new tool. All I can say is that I may have been unconsciously skeptical of the possibility of someone discovering the secret entrance to the Google Cave. I certainly never meant to imply that SEOmoz, the company, or Rand Fishkin, its CEO, were being less than totally honest and straightforward about the testing and the tool. If anyone got that impression, it is totally wrong, and entirely my fault.

I still have my doubts about the efficacy of this tool. But had I been born a century or so earlier, I would undoubtedly have been skeptical about electricity, too. That’s just the way I am. Do I see some use for it? Certainly! I just don’t think it has earned my confidence yet.

I would also offer this: Even if THIS tool turns out to be a total waste of effort (and I certainly don’t believe that’s the case), it is STILL a step toward understanding how to develop a truly semantic-capable search engine. It really doesn’t matter whether the technology used here is identical in nature to that used by Google… hell, it may end up being BETTER! Eventually, SEOmoz, or someone like them, will build another iteration, then another… until finally they get the one they want. That’s the way things get designed. Not all at once, but in baby steps.