Articles, interviews, and case studies on a wide range of search engine optimization (SEO) topics.
Copyright: StoneTemple Consulting (STC). Mon, 15 Mar 2010 20:44:30 +0100.
Published: March 14, 2010
Matt Cutts joined Google as a Software Engineer in January 2000. Before Google, he was working on his Ph.D. in computer graphics at the University of North Carolina at Chapel Hill. He has an M.S. from UNC-Chapel Hill, and B.S. degrees in both mathematics and computer science from the University of Kentucky. Matt wrote SafeSearch, which is Google's family filter. In addition to his experience at Google, Matt held a top-secret clearance while working for the Department of Defense, and he's also worked at a game engine company. He claims that Google is the most fun by far. Matt currently heads up the Webspam team for Google. Matt talks about webmaster-related issues on his blog.

Interview Transcript

Eric Enge: Let's talk a little bit about the concept of crawl budget. My understanding has been that Googlebot would come to a website knowing how many pages it was going to take that day, and then it would leave once it was done with those pages.

Matt Cutts: I'll try to talk through some of the different things to bear in mind. The first thing is that there isn't really such a thing as an indexation cap. A lot of people were thinking that a domain would only get a certain number of pages indexed, and that's not really the way that it works. There is also not a hard limit on our crawl. The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank. So if you have a lot of incoming links on your root page, we'll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we'll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline. Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank.
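To make the point about PageRank thinning out with depth concrete, here is a toy power-iteration PageRank over a small, hypothetical five-page site: a home page, two category pages, and two product pages, with every page linking back home. The damping factor of 0.85 is the value from the original PageRank paper and is an assumption here; Google's production systems are far more involved, so treat this as an illustration of the depth effect, not of how Googlebot actually allocates its crawl.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline from the "random jump" term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # a page with no links spreads rank evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical site: home -> categories -> products, every page linking home.
site = {
    "/": ["/shoes", "/bags"],
    "/shoes": ["/", "/shoes/runner"],
    "/bags": ["/", "/bags/tote"],
    "/shoes/runner": ["/"],
    "/bags/tote": ["/"],
}

for page, score in sorted(pagerank(site).items(), key=lambda item: -item[1]):
    print(f"{page:15} {score:.3f}")
```

On this graph the home page settles at roughly 0.39, each category at about 0.19, and each product at about 0.11, so every extra click away from the root leaves a page with a smaller share.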
The pages that get linked to a lot tend to get discovered and crawled quite quickly. The lower PageRank pages are likely to be crawled not quite as often. One thing that's interesting in terms of the notion of a crawl budget is that although there are no hard limits in the crawl itself, there is the concept of host load. The host load is essentially the maximum number of simultaneous connections that a particular web server can handle. Imagine you have a web server that can only handle one bot at a time. This would only allow you to fetch one page at a time, and there would be a very, very low host load, whereas some sites like Facebook or Twitter might have a very high host load because they can take a lot of simultaneous connections. Your site could be on a virtual host with a lot of other websites on the same IP address. In theory, you can run into limits on how hard we will crawl your site. If we can only take two pages from a site at any given time, and we are only crawling over a certain period of time, that can then set some sort of upper bound on how many pages we are able to fetch from that host.

Eric Enge: So you have basically two factors. One is raw PageRank, which tentatively sets how much crawling is going to be done on your site. But host load can impact it as well.

Matt Cutts: That's correct. By far, the vast majority of sites are in the first realm, where PageRank plus other factors determines how deep we'll go within a site. It is possible that host load can impact a site as well, however. That leads into the topic of duplicate content. Imagine we crawl three pages from a site, and then we discover that two of those pages were duplicates of the third page. We'll drop two out of the three pages and keep only one, and that's why it looks like the site has less good content. So we might tend to not crawl quite as much from that site.
If you happen to be host load limited, and you are in the range where we have a finite number of pages that we can fetch because of your web server, then the fact that you had duplicate content and we discarded those pages means you missed an opportunity to have other pages with good, unique, quality content show up in the index.

Eric Enge: That's always been classic advice that we've given people: one of the costs of duplicate content is wasted crawl budget.

Matt Cutts: Yes. One idea is that if you have a certain amount of PageRank, we are only willing to crawl so much from that site. But some of those pages might get discarded, which would sort of be a waste. It can also be in the host load realm, where we are unable to fetch so many pages.

Eric Enge: Another background concept we talk about is the notion of wasted link juice. I am going to use the term PageRank, but more generically I really mean link juice, which might relate to concepts like trust and authority beyond the original PageRank concept. When you link from one page to a duplicate page, you are squandering some of your PageRank, correct?

Matt Cutts: It can work out that way. Typically, duplicate content is not the largest factor in how many pages will be crawled, but it can be a factor. My overall advice is that it helps enormously if you can fix the site architecture upfront, because then you don't have to worry as much about duplicate content issues and all the corresponding things that come along with it. You can often use 301 Redirects for duplicate URLs to merge those together into one single URL. If you are not able to do a 301 Redirect, then you can fall back on rel=canonical. Some people can't get access to their web server to generate a 301; maybe they are on a school account, a free host, or something like that. But if they are able to fix it in the site architecture, that's preferable to patching it up afterwards with a 301 or a rel=canonical.
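The host-load point reduces to simple arithmetic. The sketch below assumes a crawler that keeps at most N connections open to a host, that every fetch takes a fixed time, and that duplicates collapse to one page per group. All numbers are hypothetical; the point is only how a connection cap plus duplicate content shrinks the number of unique pages that can reach the index.

```python
def max_fetches(concurrent_connections, window_seconds, seconds_per_fetch):
    """Upper bound on pages fetched from one host during a crawl window.

    Each connection completes window_seconds / seconds_per_fetch fetches,
    so the bound grows linearly with the number of simultaneous connections.
    """
    return int(concurrent_connections * window_seconds // seconds_per_fetch)

def unique_pages(fetch_budget, copies_per_page):
    """Pages left after de-duplication if every page exists at copies_per_page URLs."""
    return fetch_budget // copies_per_page

# Hypothetical host: two simultaneous connections, a one-hour window,
# and half a second per fetch.
budget = max_fetches(2, 3600, 0.5)
print(budget)                   # 14400
print(unique_pages(budget, 3))  # 4800 when every page has three URLs
```

Without the duplicates, the same fetch budget would cover three times as many distinct pages, which is the missed opportunity described above.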
Eric Enge: Right, that's definitely the gold standard. Let's say you have a page that links to ten other pages. If three of those pages are actually duplicates which get discarded, have you then wasted three of your votes?

Matt Cutts: Well, not necessarily. That's the sort of thing where people can run experiments. What we try to do is merge pages, rather than dropping them completely. If you link to three pages that are duplicates, a search engine might be able to realize that those three pages are duplicates and transfer the incoming link juice to the merged page. It doesn't necessarily have to be the case that PageRank is completely wasted. It depends on the search engine and the implementation. Given that every search engine might implement things differently, if you can do it on your site where all your links go to a single page, then that's definitely preferable.

Eric Enge: Can you talk a little bit about Session IDs?

Matt Cutts: Don't use them. In this day and age, most people should have a pretty good idea of ways to make a site that don't require Session IDs. At this point, most software makers should be thinking about that, not just from a search engine point of view, but from a usability point of view as well. Users are more likely to click on links that look prettier, and they are more likely to remember URLs that look prettier. If you can't avoid them, however, Google does now offer a tool to deal with Session IDs. People now have the ability, which they've had in Yahoo! for a while, to say that a URL parameter should be ignored because it is useless or doesn't add value, so the URL can be rewritten to the prettier version. Google does offer that option, and it's nice that we do. Some other search engines do as well, but if you can get by without Session IDs, that's typically the best.

Eric Enge: Ultimately, there is a risk that it ends up being seen as duplicate content.

Matt Cutts: Yes, exactly, and search engines handle that well most of the time.
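If a site already emits Session IDs, one way to act on this advice is to compute a clean URL with the session parameter removed and then 301 to it (or point rel=canonical at it). The names in `SESSION_PARAMS` are hypothetical examples; substitute whatever your own framework actually generates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; use the ones your framework really emits.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def strip_session_ids(url):
    """Return the URL with session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

print(strip_session_ids("https://example.com/shoes?color=red&sessionid=A1B2"))
# https://example.com/shoes?color=red
```

In a request handler you would compare the result with the requested URL and answer with a 301 to the clean URL whenever the two differ.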
The common cases are typically not a problem, but I have seen cases where multiple versions of pages would get indexed with different Session IDs. It's always best to take care of it on your own site so you don't have to worry about how the search engines take care of it.

Eric Enge: Let's touch on affiliate programs. You are getting other people to send you traffic, and they put a parameter on the URL. You maintain the parameter throughout the whole visit to your site, which is fairly common. Is that something that search engines are pretty good at processing, or is there a real risk of perceived duplicate content there?

Matt Cutts: Duplicate content can happen. If you are operating something like a co-brand, where the only difference in the pages is a logo, then that's the sort of thing that users look at as essentially the same page. Search engines are typically pretty good about trying to merge those sorts of things together, but other scenarios certainly can cause duplicate content issues.

Eric Enge: There is some classic SEO advice out there, which says that what you really should do is let affiliates put their parameter on the URL, but when users click on that link to get to your site, you 301 redirect them to the page without that parameter and drop the parameter in a cookie.

Matt Cutts: People can do that. The same sort of thing can also work for ad landing pages, for example. You might consider putting your affiliate and ad landing pages in a separate URL directory, which you could then block in robots.txt. Much like ads, affiliate links are typically intended for actual users, not for search engines. That way it's very traceable, and you don't have to worry about affiliate codes getting leaked or causing duplicate content issues if those pages never get crawled in the first place.

Eric Enge: If Googlebot sees an affiliate link out there, does it treat that link as an endorsement or an ad?
Matt Cutts: Typically, we want to handle those sorts of links appropriately. A lot of the time, that means that the link is essentially driving people for money, so we usually would not count those as an endorsement.

Eric Enge: Let's talk a bit about faceted navigation. For example, on Zappos, people can buy shoes by size, by color, or by brand, and the same product is listed 20 different ways, so that can be very challenging. What are your thoughts on those kinds of scenarios?

Matt Cutts: Faceted navigation in general can be tricky. Some regular users don't always handle it well, and they get a little lost about where they are. There might be many ways to navigate through a piece of content, but you want each page of content to have a single URL if you can help it. There are many different ways to slice and dice data. If you can decide on your own what you think are the most important ways to get to a particular piece of content, then you can actually look at trying to have some sort of hierarchy in the URL parameters. Category would be the first parameter, and price the second, for example. Even if someone is navigating via price and then clicks on a category, you can make it so that your hierarchy of how things should be categorized is enforced by the position of the URL parameters. This way, the most important category is first, and the next most important is second. That sort of thing can help some search engines discover the content a little better, because they might be able to realize they will still get useful or similar content if they remove the last parameter. In general, faceted navigation is a tough problem, because you are creating a lot of different ways that someone can find a page. You might have a lot of intermediate paths before they get to the payload. If it's possible to keep things relatively shallow in terms of intermediate pages, that can be a good practice.
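Enforcing a parameter hierarchy can be as simple as re-sorting the query string before you link to or serve a URL. This sketch assumes a hypothetical priority list (category first, then price, as in the example above) and places unknown parameters last in alphabetical order; a server could then 301 any request whose URL differs from its normalized form.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical facet hierarchy, most important first.
FACET_ORDER = ["category", "price", "brand", "color", "size"]
RANK = {name: position for position, name in enumerate(FACET_ORDER)}

def normalize_facets(url):
    """Rewrite the query string so facets always appear in hierarchy order."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Known facets by rank, unknown parameters after them alphabetically.
    # sorted() is stable, so repeated values of one parameter keep their order.
    ordered = sorted(params, key=lambda kv: (RANK.get(kv[0], len(FACET_ORDER)), kv[0]))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(ordered), parts.fragment)
    )

a = normalize_facets("https://example.com/shop?price=50-100&category=boots")
b = normalize_facets("https://example.com/shop?category=boots&price=50-100")
print(a)       # https://example.com/shop?category=boots&price=50-100
print(a == b)  # True
```

Because the most important facet always comes first, dropping the last parameter always yields a broader listing, which is the property that can help a crawler make sense of the URL space.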
If someone has to click through seven layers of faceted navigation to find a single product, they might lose their patience. It is also weird on the search engine side if we have to click through seven or eight layers of intermediate faceted navigation before we get to a product. In some sense, that's a lot of clicks, and a lot of PageRank that is used up on these intermediate pages with no specific products that people can buy. Each of those clicks is an opportunity for a small percentage of the PageRank to dissipate. While faceted navigation can be good for some users, deciding on your own hierarchy for categorizing the pages and keeping the faceted navigation relatively shallow are both good practices to help search engines discover the actual products a little better.

Eric Enge: If you have pages that have basically the same products, or substantially similar products with different sort orders, is that a good application for the canonical tag?

Matt Cutts: It can be, or you could imagine reordering the position of the parameters on your own. In general, the canonical tag is designed to allow you to tell search engines that two pages of content are essentially the same. You might not want to necessarily make a distinction between a black version of a product and a red version of a product if you have 11 different colors for that product. You might want to just have the one default product page, which would then be smart enough to have a dropdown or something like that. Showing minor variations within a product and having a rel=canonical on all of those is a fine way to use the rel=canonical tag.

Eric Enge: Let's talk a little bit about the impact on PageRank, crawling, and indexing of some of the basic tools out there. Let's start with our favorite, 301 Redirects.

Matt Cutts: Typically, the 301 Redirect would pass PageRank.
It can be a very useful tool to migrate between pages on a site, or even to migrate between sites. Lots of people use it, and it seems to work relatively well, as its effects take hold pretty quickly. I used it myself when I moved from dullest.com to mattcutts.com, and that transition went perfectly well. My own testing has shown that it's been pretty successful. In fact, if you do site:dullest.com right now, you don't get any pages. All the pages have migrated from dullest.com over to mattcutts.com. At least for me, the 301 does work the way that I would expect it to. All the pages of interest make it over to the new site if you are doing a page-by-page migration, so it can be a powerful tool in your arsenal.

Eric Enge: Let's say you move from one domain to another and you write yourself a nice little statement that basically instructs the search engine, and any user agent, how to remap from one domain to the other. In a scenario like this, is there some loss in PageRank that can take place simply because the user who originally implemented a link to the site didn't link to it on the new domain?

Matt Cutts: That's a good question, and I am not 100 percent sure about the answer. I can certainly see how there could be some loss of PageRank. I am not 100 percent sure whether the crawling and indexing team has implemented that sort of natural PageRank decay, so I will have to go and check on that specific case. (Note: in a follow-up email, Matt confirmed that this is in fact the case. There is some loss of PR through a 301.)

Eric Enge: Let's briefly talk about 302 Redirects.

Matt Cutts: 302s are intended to be temporary. If you are only going to put something in place for a short amount of time, then 302s are perfectly appropriate to use. Typically, they wouldn't flow PageRank, but they can also be very useful. If a site is doing something for just a small amount of time, 302s can be a perfect fit for that sort of situation.
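A page-by-page domain move like this one can be expressed as a function that maps each old URL to the same path on the new host. This is a minimal sketch, not any particular web server's configuration: the hostnames follow the migration example (pages moving from dullest.com to mattcutts.com), and the `temporary` flag is a hypothetical switch that picks a 302 instead of a 301 for short-lived moves.

```python
from urllib.parse import urlsplit, urlunsplit

def redirect_for(url, old_host, new_host, temporary=False):
    """Return (status, location) for a URL on old_host, or None if no redirect applies."""
    parts = urlsplit(url)
    if parts.hostname != old_host:
        return None
    # Keep path and query so every old page maps to its own counterpart.
    location = urlunsplit((parts.scheme, new_host, parts.path, parts.query, ""))
    # 301 = moved permanently; 302 = temporary move.
    return (302 if temporary else 301, location)

print(redirect_for("http://dullest.com/blog/?p=42", "dullest.com", "mattcutts.com"))
# (301, 'http://mattcutts.com/blog/?p=42')
```

Mapping each page to its counterpart, rather than sending everything to the new home page, is what makes the migration "page by page".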
Eric Enge: How about server-side redirects that return no HTTP Status Code or a 200 Status Code?

Matt Cutts: If we just see the 200, we would assume that the content that was returned was at the URL address that we asked for. If your web server is doing some strange rewriting on the server side, we wouldn't know about it. All we would know is that we requested the old URL, we got some content, and we indexed that content. We would index it under the original URL's location.

Eric Enge: So it's essentially like a 302?

Matt Cutts: No, not really. You are essentially fiddling with things on the web server to return a different page's content for a page that we asked for. As far as we are concerned, we saw a link, we followed that link to this page, and we asked for this page. You returned us content, and we indexed that content at that URL. People can always do dynamic stuff on the server side. You could imagine a CMS implemented within the web server that would not do 301s and 302s, but it would get pretty complex and it would be pretty error-prone.

Eric Enge: Can you give a brief overview of the canonical tag?

Matt Cutts: There are a couple of things to remember here. If you can reduce your duplicate content using site architecture, that's preferable. The pages you combine don't have to be complete duplicates, but they really should be conceptual duplicates of the same product, or things that are closely related. People can now do cross-domain rel=canonical, which we announced last December. For example, I could put a rel=canonical on my old school account pointing to mattcutts.com. That can be a fine way to use rel=canonical if you can't get access to the web server to add redirects in any way. Most people, however, use it for duplicate content to make sure that the canonical version of a page gets indexed, rather than some other version of a page that you didn't want to get indexed.
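Generating the tag itself is straightforward. The sketch below assumes a hypothetical product page whose `color` and `sort` parameters only pick a variant or an ordering of the same content, as in the black/red product example earlier; every variant emits a rel=canonical pointing at the URL without those parameters, and the default page ends up pointing at itself.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: these parameters only choose a variant or ordering of the same page.
VARIANT_PARAMS = {"color", "sort"}

def canonical_url(url):
    """Drop variant-only parameters to get the one URL that should be indexed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VARIANT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_tag(url):
    """<link> element for the page's <head>; escape() turns & into &amp; for HTML."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'

print(canonical_link_tag("https://example.com/product/42?color=red"))
# <link rel="canonical" href="https://example.com/product/42">
print(canonical_link_tag("https://example.com/product/42"))
# <link rel="canonical" href="https://example.com/product/42">
```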
Eric Enge: So if somebody links to a page that has a canonical tag on it, does Google treat that essentially like a 301 to the canonical version of the page?

Matt Cutts: Yes, calling it a poor man's 301 is not a bad way to think about it. If your web server can do a 301 directly, you can just implement that, but if you don't have the ability to access the web server or it's too much trouble to set up a 301, then you can use a rel=canonical. It's totally fine for a page to link to itself with rel=canonical, and it's also totally fine, at least with Google, to have rel=canonical on every page on your site. People think it has to be used very sparingly, but that's not the case. We specifically asked ourselves about a situation where every page on the site has rel=canonical. As long as you take care that those tags point to the correct pages, that should be no problem at all.

Eric Enge: I think I've heard you say in the past that it's a little strong to call a canonical tag a directive. You call it "a hint," essentially.

Matt Cutts: Yes. Typically, the crawl team wants to consider these things hints, and the vast majority of the time we'll take them under advisement and act on them. If you call it a directive, then you sort of feel an obligation to abide by it, but the crawling and indexing team wants to reserve the ultimate right to determine if the site owner is accidentally shooting themselves in the foot, and not listen to the rel=canonical tag. The vast majority of the time, people should see the effects of the rel=canonical tag. If we can tell they probably didn't mean to do that, we may ignore it.

Eric Enge: The Webmaster Tools "ignore parameters" feature is effectively another way of doing the same thing as a canonical tag.

Matt Cutts: Yes, it's essentially like that. It's nice because robots.txt can be a little bit blunt: if you block a page from being crawled and we don't fetch it, we can't see it as a duplicative version of another page.
But if you tell us in the webmaster console which parameters on URLs are not needed, then we can benefit from that.

Eric Enge: Let's talk a bit about KML files. Is it appropriate to put these pages in robots.txt to save crawl budget?

Matt Cutts: Typically, I wouldn't recommend that. The best advice coming from the crawling and indexing team right now is to let Google crawl the pages on a site that you care about, and we will try to de-duplicate them. You can try to fix that in advance with good site architecture or 301s, but if you are trying to block something out with robots.txt, oftentimes we'll still see that URL and keep a reference to it in our index. So it doesn't necessarily save your crawl budget. It is kind of interesting because Google will try to crawl lots of different pages, even with non-HTML extensions, and in fact, Google will crawl KML files as well. What we would typically recommend is to just go ahead and let Googlebot crawl those pages and then de-duplicate them on our end. Or, if you have the ability, you can use site architecture to fix any duplication issues in advance. If your site is 50 percent KML files or you have a disproportionately large number of fonts and you really don't want any of them crawled, you can certainly use robots.txt. Robots.txt does allow a wildcard within individual directives, so you can block them. For most sites that are typically almost all HTML with just a few additional pages or different additional file types, I would recommend letting Googlebot crawl those.

Eric Enge: You would avoid the machinations involved if it's a small percentage of the actual pages.

Matt Cutts: Right.

Eric Enge: Does Google do HEAD requests to determine the content type?

Matt Cutts: For people who don't know, there are different ways to try to fetch and check on content. If you do a GET, then you are requesting that the web server return that content.
If you do a HEAD request, then you are asking the web server whether that content has changed or not. The web server can just respond more or less with a yes or no, and it doesn't actually have to send the content. At first glance, you might think that the HEAD request is a great way for web search engines to crawl the web and only fetch the pages that have changed since the last time they crawled. It turns out, however, that most web servers end up doing almost as much work to figure out whether a page has changed or not when you do a HEAD request. In our tests, we found it's actually more efficient to go ahead and do a GET almost all the time, rather than running a HEAD against a particular page. There are some things that we will run a HEAD for. For example, our image crawl may use HEAD requests because images might be much, much larger in content than web pages. In terms of crawling the web and text content and HTML, we'll typically just use a GET and not run a HEAD query first. We still use things like If-Modified-Since, where the web server can tell us if the page has changed or not. There are still smart ways that you can crawl the web, but HEAD requests have not actually saved that much bandwidth in terms of crawling HTML content, although we do use them for image content.

Eric Enge: And presumably you could use that with video content as well, correct?

Matt Cutts: Right, but I'd have to check on that.

Eric Enge: To expand on the faceted navigation discussion, we have worked on a site that has a very complex faceted navigation scheme. It's actually a good user experience. They have seen excellent increases in conversion after implementing this on their site. It has resulted in much better revenue per visitor, which is a good signal.

Matt Cutts: Absolutely.

Eric Enge: On the other hand, what they've seen is that the number of indexed pages has dropped significantly on the site.
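The If-Modified-Since mechanism mentioned above is a conditional GET: the client sends the date of its last fetch, and the server can reply 304 Not Modified with no body if nothing has changed. Below is a sketch of only the server-side decision, assuming the server knows each page's last-modified time as a timezone-aware UTC datetime; a real server would also send a Last-Modified header on its 200 responses.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get_status(if_modified_since, last_modified):
    """Return 304 if the page has not changed since the given HTTP date, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header and send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200

last_modified = datetime(2010, 3, 14, 12, 0, tzinfo=timezone.utc)
header = format_datetime(last_modified, usegmt=True)  # 'Sun, 14 Mar 2010 12:00:00 GMT'
print(conditional_get_status(header, last_modified))                           # 304
print(conditional_get_status("Sun, 14 Mar 2010 11:00:00 GMT", last_modified))  # 200
```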
And presumably, it's because of the various flavors of the pages, which for the most part just list products in different orders. The pages are not text rich; there isn't a lot for the crawler to chew on, so they look like poor-quality pages or duplicates. What's the best way for someone like that to deal with this? Should they prevent crawling of those pages?

Matt Cutts: In some sense, faceted navigation can almost look like a bit of a maze to search engines, because you can have so many different ways of slicing and dicing the data. If search engines can't get through the maze to the actual products on the other side, then sometimes that can be tricky in terms of the algorithm determining the value add of individual pages. Going back to some of the earlier advice I gave, one thing to think about is whether you can limit the number of lenses or facets by which you can view the data; that can be a little bit helpful and sometimes reduce confusion. That's something you can certainly look at. If there is a default category, hierarchy, or way that you think is the most efficient or most user-friendly to navigate through, it may be worth trying. You could imagine trying rel=canonical on those faceted navigation pages to pull you back to the standard way of going down through faceted navigation. That's the sort of thing where you probably want to try it as an experiment to see how well it works. I could imagine that it could help unify a lot of those faceted pages down into one path to a lot of different products, but you would need to see how users would respond to that.

Eric Enge: So if Googlebot comes to a site and it sees 70 percent of the pages are being redirected or have rel=canonical to other pages, what happens? When you have a scenario like that, do you reduce the amount of time you spend crawling those pages because you've seen that tag there before?
Matt Cutts: It's not so much that rel=canonical would affect that, but our algorithms are trying to crawl a site to ascertain the usefulness and value of those pages. If there are a large number of pages that we consider low value, then we might not crawl quite as many pages from that site, but that is independent of rel=canonical. That would happen with just the regular faceted navigation if all we see are links and more links. It really is the sort of thing where individual sites might want to experiment with different approaches. I don't think that there is necessarily anything wrong with using rel=canonical to try to push the search engine towards a default path of navigating through the different facets or different categories. You are just trying to take this faceted experience, reduce the number of multiplied paths, and pull it back towards a more logical path structure.

Eric Enge: It does sound like there is a remaining downside here, that the crawler is going to spend a lot of its time on these pages that aren't intended for indexing.

Matt Cutts: Yes, that's true. If you think about it, every level or every different way that you can slice and dice that data is another dimension along which the crawler can crawl the entire product catalogue again, multiplying the number of pages, and those pages might not even have the actual product. You might still be navigating down through city, state, profession, color, price, or whatever. You really want to have most of your pages have actual products with lots of text on them. If your navigation is overly complex, there is less material for search engines to find and index and return in response to users' queries. A lot of the time, faceted navigation can be like layers between the users (or search engines) and the actual products. It's just layers and layers of lots of different multiplicative pages that don't really get you straight to the content.
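The "another dimension" point can be quantified. Assuming each facet is either unselected or set to exactly one of its values, the number of listing URLs is the product of (values + 1) across facets; if parameter order is not enforced, each combination of k selected facets can also be written in k! different orders. The facet sizes below are hypothetical, and the counts cover listing URLs only, before pagination or sort orders multiply them further.

```python
from math import factorial

def facet_url_count(facet_sizes, enforce_order=True):
    """Count listing URLs formed by picking at most one value per facet."""
    # ways[k] = number of value choices that select exactly k facets
    ways = [1]
    for size in facet_sizes:
        nxt = ways + [0]
        for k, count in enumerate(ways):
            nxt[k + 1] += count * size
        ways = nxt
    if enforce_order:
        return sum(ways)  # equals the product of (size + 1) over all facets
    # Without a fixed order, k selected parameters can be written in k! ways.
    return sum(count * factorial(k) for k, count in enumerate(ways))

# Hypothetical facets: 50 cities, 10 states, 20 professions, 12 colors, 8 price bands.
sizes = [50, 10, 20, 12, 8]
print(facet_url_count(sizes))                       # 1378377
print(facet_url_count(sizes, enforce_order=False))  # 124233773
```

Fixing the parameter order, as suggested earlier, removes the k! factor, which in this example is the difference between roughly 1.4 million and 124 million crawlable URLs.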
That can be difficult from a search engine or user perspective sometimes.

Eric Enge: What about PageRank Sculpting? Should publishers consider using encoded JavaScript redirects of links, or implementing links inside iframes?

Matt Cutts: My advice on that remains roughly the same as the advice on the original ideas of PageRank Sculpting. Even before we talked about how PageRank Sculpting was not the most efficient way to try to guide Googlebot around within a site, we said that PageRank Sculpting was not the best use of your time because that time could be better spent on getting more links to your site and creating better content on it. PageRank Sculpting is taking the PageRank that you already have and trying to guide it to different pages that you think will be more effective, and there are much better ways to do that. If you have a product that gives you great conversions and a fantastic profit margin, you can put that right at the root of your site, front and center. A lot of PageRank will flow through that link to that particular product page. Site architecture, meaning how you make links and structure appear on a page in a way that gets the most people to the products that you want them to see, is really a better way to approach it than trying to do individual sculpting of PageRank on links. If you can get your site architecture to focus PageRank on the most important pages or the pages that generate the best profit margins, that is a much better way of directly sculpting the PageRank than trying to use an iFrame or encoded JavaScript. I feel like if you can get site architecture straight first, then you'll have less to do, or no need to even think about PageRank Sculpting. Just to go beyond that and be totally clear, people are welcome to do whatever they want on their own sites, but in my experience, PageRank Sculpting has not been the best use of people's time.
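One rough way to check whether high-value products really sit front and center is to measure how many clicks each page is from the home page. The breadth-first search below does that on two hypothetical link graphs: one where a product is reachable only through category and facet pages, and one where the same product is also linked from the root. Fewer clicks means fewer hops over which PageRank can dissipate, which is the architectural lever recommended here instead of sculpting.

```python
from collections import deque

def click_depths(links, home="/"):
    """Breadth-first search: minimum number of clicks from home to each page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # BFS reaches each page first by a shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

deep = {
    "/": ["/shoes"],
    "/shoes": ["/shoes?color=red"],
    "/shoes?color=red": ["/shoes?color=red&size=9"],
    "/shoes?color=red&size=9": ["/product/runner"],
}
# Same hypothetical site, but the best seller is also linked straight from home.
featured = {**deep, "/": ["/shoes", "/product/runner"]}

print(click_depths(deep)["/product/runner"])      # 4
print(click_depths(featured)["/product/runner"])  # 1
```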
Eric Enge: I was just giving an example of a site with a faceted navigation problem, and as I mentioned, they were seeing a decline in the number of indexed pages. They just want to find a way to get Googlebot to not spend time on pages that they don't want getting in the index. What are your thoughts on this?

Matt Cutts: A good example might be to start with your ten best-selling products, put those on the front page, and then on those product pages you could have links to your next ten or hundred best-selling products. Each product could have ten links, and each of those ten links could point to ten other products that are selling relatively well. Think about sites like YouTube or Amazon; they do an amazing job of driving users to related pages and related products that they might want to buy anyway. If you show up on one of those pages and you see something that looks really good, you click on that and then from there you see five more useful and related products. You are immediately driving both users and search engines straight to your important products rather than starting to dive into a deep faceted navigation. It is the sort of thing where sites should experiment and find out what works best for them. There are ways to do your site architecture, rather than sculpting the PageRank, where you get the products that you think will sell the best or are most important front and center. If those are above-the-fold items, people are very likely to click on them. You can distribute that PageRank very carefully between related products, and use related links straight to your product pages rather than into your navigation. I think there are ways to do that without necessarily going towards trying to sculpt PageRank.

Eric Enge: If someone did choose to do that (JavaScript-encoded links or an iFrame), would that be viewed as a spammy activity or just potentially a waste of their time?
Matt Cutts: I am not sure that it would be viewed as a spammy activity, but the original changes to NoFollow to make PageRank Sculpting less effective were at least partly motivated by the fact that the search quality people involved wanted to see the same or similar linkage for users as for search engines. In general, I think you want your users to be going where the search engines go, and you want the search engines to be going where the users go. In some sense, I think PageRank Sculpting is trying to diverge from that. If you are thinking about taking that step, you should ask yourself why you are trying to diverge and send bots in a different direction than users. In my experience, we typically want our bots to be seen on the same pages and basically traveling in the same direction as search engine users. I could imagine down the road if iFrames or weird JavaScript got to be so pervasive that it would affect the search quality experience, we might make changes on how PageRank would flow through those types of links. It's not that we think of them as spammy necessarily, so much as we want the links and the pages that search engines find to be in the same neighborhood and of the same quality as the links and pages that users will find when they visit the site.

Eric Enge: What about PDF files?

Matt Cutts: We absolutely do process PDF files. I am not going to talk about whether links in PDF files pass PageRank. But a good way to think about PDFs is that they are kind of like Flash in that they aren't a file format that's inherent and native to the web, but they can be very useful. In the same way that we try to find useful content within a Flash file, we try to find the useful content within a PDF file. At the same time, users don't always like being sent to a PDF. If you can make your content in a web-native format, such as pure HTML, that's often a little more useful to users than just a pure PDF file.
Eric Enge: There is the classic case of somebody making a document that they don't want to have edited, but they do want to allow people to distribute and use, such as an eBook. Matt Cutts: I don't believe we can index password-protected PDF files. Also, some PDF files are image-based. There are, however, some situations in which we can actually run OCR on a PDF. Eric Enge: What if you have a text-based PDF file rather than an image-based one? Matt Cutts: People can certainly use that if they want to, but typically I think of PDF files as the last thing that people encounter, and users find it to be a little more work to open them. People need to be mindful of how that can affect the user experience. Eric Enge: With the new JavaScript processing, what actually are you doing there? Are you actually executing JavaScript? Matt Cutts: For a while, we were scanning within JavaScript, and we were looking for links. Google has gotten smarter about JavaScript and can execute some JavaScript. I wouldn't say that we execute all JavaScript, so there are some conditions in which we don't execute JavaScript. Certainly there are some common, well-known JavaScript things like Google Analytics, which you wouldn't even want to execute because you wouldn't want to try to generate phantom visits from Googlebot into your Google Analytics. We do have the ability to execute a large fraction of JavaScript when we need or want to. One thing to bear in mind if you are advertising via JavaScript is that you can use NoFollow on JavaScript links. Eric Enge: If people do have ads on their site, it's still Google's wish that people NoFollow those links, correct? Matt Cutts: Yes, absolutely. Our philosophy has not changed, and I don't expect it to change. If you are buying an ad, that's great for users, but we don't want advertisements to affect search engine rankings.
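Cutts's point is that paid links should carry NoFollow however they are generated. As a minimal sketch, assuming links are rendered server-side in Python, a helper could attach rel="nofollow" only to paid or ad links; the function name render_link and its paid flag are hypothetical. A link inserted with JavaScript would need the same rel attribute set on the anchor element it creates.

```python
from html import escape


def render_link(href: str, text: str, paid: bool) -> str:
    """Render an HTML anchor (hypothetical helper).

    Paid or ad links get rel="nofollow" so they are marked as not
    intended to pass PageRank; editorial links are left as-is.
    """
    rel = ' rel="nofollow"' if paid else ""
    # Escape the URL and text so characters like & and < are HTML-safe.
    return f'<a href="{escape(href)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(render_link("https://advertiser.example.com/?a=1&b=2", "Buy now", paid=True))
    # <a href="https://advertiser.example.com/?a=1&amp;b=2" rel="nofollow">Buy now</a>
```

Keeping the paid/editorial decision in one place means an ad link cannot accidentally be rendered without the attribute.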
For example, if your link goes to a redirect, that redirect could be blocked from robots.txt, which would make sure that we wouldn't follow that link. If you are using JavaScript, you can do a NoFollow within the JavaScript. Many, many ads use 302s, specifically because they are temporary. These ads are not meant to be permanent, so we try to process those appropriately. Our stance has not changed on that, and in fact we might put out a call for people to report more about link spam in the coming months. We have some new tools and technology coming online with ways to tackle that. We might put out a call for some feedback on different types of link spam sometime down the road. Eric Enge: So what if you have someone who uses a 302 on a link in an ad? Matt Cutts: They should be fine. We typically would be able to process that and realize that it's an ad. We do a lot of stuff to try to detect ads and make sure that they don't unduly affect search engines as we are processing them. The nice thing is that the vast majority of ad networks seem to have many different types of protection. The most common is to have a 302 redirect go through something that's blocked by robots.txt, because people typically don't want a bot trying to follow an ad, as it will certainly not convert since the bots have poor credit ratings or aren't even approved to have a credit card. You don't want it to mess with your analytics anyway. Eric Enge: In that scenario, does the link consume link juice? Matt Cutts: I have to go and check on that; I haven't talked to the crawling and indexing team specifically about that. That's the sort of thing where typically the vast majority of your content is HTML, and you might have a very small amount of ad content, so it typically wouldn't be a large factor at all anyway. Eric Enge: Thanks Matt! Matt Cutts: Thank you Eric!
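The ad-network safeguard Cutts describes is a 302 redirect routed through a path that robots.txt blocks. The sketch below uses Python's standard urllib.robotparser to check which URLs a crawler honoring robots.txt would skip. The robots.txt contents, the /adclick/ path, and the example.com URLs are hypothetical, and the sketch models only the robots.txt check, not the 302 itself or how Google actually processes ad links.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an ad network: every ad click goes through
# /adclick/, which (by assumption) answers with a 302 to the advertiser.
ROBOTS_TXT = """\
User-agent: *
Disallow: /adclick/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if a robots.txt-respecting crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable("https://ads.example.com/adclick/?campaign=123"))  # False
    print(is_crawlable("https://ads.example.com/about"))  # True
```

Because the crawler never fetches the blocked redirect URL, it never reaches the advertiser's page through the ad, which is also why ad clicks from bots stay out of the advertiser's analytics.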
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US cities and towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com
Tue, 09 Feb 2010 23:42:26 +0100 Published: February 7, 2010
Chris Baggott began his career doing database marketing in the catalog industry. His frustration was that in spite of terrific insight into customer behavior, he was stuck sending a batch of similar books to everyone regardless of the specifics of individual data. This led him to co-found ExactTarget, the world's leading data-driven email service provider. As the company evolved, he realized that although email is perfect for building dialog and relationships with people you already know, it does nothing for acquisition. The number one online activity other than email is search, and this led him to found Compendium Blogware. Interview Transcript Eric Enge: Tell us a little bit about your background. Chris Baggott: I am the founder of Compendium Blogware, and also of a company called ExactTarget. My background is in database marketing, and I started my career with RR Donnelley in the catalogue business. During those years, the idea of data-driven communications consisted of putting pages of bathing suits in catalogues going to Wisconsin in February, which was big-time stuff back in the late eighties and early nineties. I gradually got into some retail work, building databases and tripping over email as a way to leverage data. One of the things that makes ExactTarget unique is the fact that it was one of the first companies on the web to have an API. The idea that email and communication should be relevant to individual customers was a novel concept at the time. If you think about the early days of email, it was about getting the biggest list possible, and what I call old-fashioned reach-and-frequency marketing. All of a sudden, I had this medium that is basically cheap paper, and if I send ten million emails, I get x new customers, and if I send fifty million, I get y. Eric Enge: Like TV commercials. Chris Baggott: Yes. It's American Airlines sending me emails about weekend escapes, where there is a laundry list of 50 trips, none of which ever originate in Indianapolis.
I am sure it had a good ROI, but with the ExactTarget model, we've always been about the data. We want to know how we can talk to the right person at the right time with the right message. I may send some people 50 emails a year, and I may send others three emails a year. I may, however, make more money on the people I send three to, so it's all about data-driven communication. Sometime around 2006, we started focusing on the problem of acquisition. Email is the greatest tool ever invented for maintaining and building relationships with large volumes of people, and search is the greatest way to acquire people. I am not going to ask people four times if they want to buy a pizza, but if they search and say they want to buy a pizza, all I have to do is show up with a compelling offer and I get the chance of winning their business. Eric Enge: The way I always describe search engine optimization to people is that it's not about finding new customers for your business, it's about enabling people who are interested in the products and services your business carries to find you. Chris Baggott: Right. It's about listening, and that's kind of the way we approach it. If you think about keywords, they are essentially people telling you what they want. If you want to win someone's business and build a relationship, you need to be listening very, very carefully, and then showing up. That's how we've approached the idea of Compendium, and SEO, by listening in a lot of places. If we listen very carefully across hundreds or thousands of different spots and we are able to deliver our message to each spot, we have a very good chance of initiating a relationship. Eric Enge: Let's step back to email marketing for just another minute or two. You gave a little bit of an overview, but what's happening today in terms of declining response rates and the snow blindness that people get from emails?
Are there problems with emails getting filtered and all those sorts of things, and what's new on the email marketing side? Chris Baggott: All of these are trends that are forcing marketers to pay closer attention to their customers as individuals. If you think about a broad term like social media, it just means human beings. Peppers and Rogers wrote this great book back in 1992 called The One to One Future. It was about the dream that we can talk to people in this medium like normal human beings. We have been corrupted all these years into thinking consumers are not humans, but just a brand. The world hates that. Zig Ziglar said people buy from people back in the mid-20th century. That works for the owner of a mom-and-pop shop who can talk to his customers over the counter, but for someone managing thousands or millions of relationships, it doesn't scale very well. What's wonderful about the era we live in is that there are various tools which actually allow me to listen carefully and collect data. I can tailor my conversation because now we have access to the inexpensive data customers readily give us. We've got to respect that and use these new tools to communicate and build relationships. Email is still the most important and best way to do that, and that's the trend you see. We are getting away from people talking about their list size. I can remember calling the second-largest retailer in the world, which is in the home improvement space, and they had a database of about six million addresses. Since they are the second-largest retailer in the world, they wanted to build a database with 30 million addresses in the next year. The reality was that they never got 30,000,000 email addresses, but they make huge sums of money because they respect their customers. They only talk to them when they have something to say, so I think those are some of the big trends you are seeing.
On the downside, all this other social media is really starting to cloud my inbox. The biggest piece of spam I get now comes from Facebook and Twitter asking me to join groups and things like that. I mean, I get 20 or 30 of those a day, and it's not relevant to me. Eric Enge: The good thing is that corporate marketers like the one you mentioned are a lot more educated about the downside of just blasting crap out there. As regards social media, the problem is that the numbers are still small, and it is not yet fully adopted by the broad public. Chris Baggott: I was in a meeting the other day with all of the agency partners of one particular retailer. We had a three-hour scheduled meeting, 45 minutes of which was sucked up by some woman talking about the 2,900 fans they had on Facebook, and everyone was enamored with it. I thought it was ridiculous, because they spend $16 million a week on newspaper inserts, and they are talking about 2,900 people on Facebook like it's some kind of win. It's just bizarre, but it happens because marketers want new and shiny. We are seeing trends now where people are starting to get more ROI-focused, and social media is a big place for that. Last year, they just wanted to chase the shiny thing; now they are asking how it is helping anybody and starting to talk more about ROI. Eric Enge: If you look at Twitter, you'll see all these articles about the tons of traffic people are getting from Twitter every day. But then if you dig into it, the wild success stories only involve getting hundreds of visitors a day. I don't give a crap about a few hundred visitors a day; I build sites that get thousands, tens of thousands, and hundreds of thousands of visitors a day. Chris Baggott: Well, that's right. Last year it was kind of a tough fight, and now we are seeing that turn where we are starting to question what the goals and biggest benefits of social media are.
You get things like blogging, and you start talking about traffic and who the audience is. It's still growing very, very quickly, and it's getting smarter and smarter. All these companies coming into the space are focusing on data. Unica, which is a database company, not an email company, does email for its clients, and it was just recently announced that they bought Pivotal Veracity for $17,500,000. Pivotal Veracity is about deliverability; it's an inbox company. For a publicly traded company like Unica to step up and say that legitimately getting good email into inboxes is a very important business space says a lot about the direction of the industry. Eric Enge: So the transition is slow in coming, but it is coming. Chris Baggott: Everyone has a lot of choice, but at the end of the day, email is still ridiculously effective, especially when used properly. The retail company I was just describing is running this simple email program right now. They trigger an email because they are trying to drive store traffic by offering a free screen cleaner and $100 off a Blu-ray player. The only people who are getting this email are the people who recently bought flat-screen TVs, and it is an amazing program. The numbers are just phenomenal. Customers come in for the free screen cleaner, and they walk out with a brand-new Blu-ray player. I am sure once they get this beautiful TV home and then hook it up to a crappy DVD player, they want something better. The old model used to involve just sending ten million offers of $100 off a Blu-ray player to everybody, and they would be wasting their time. Now, since they are only sending the right message to the right people, everybody is happy. Eric Enge: The last element of that is that companies have to realize that they only have a limited number of offers they can send people via email before people just shut down.
Whereas the scenario you just described probably said something like, "We want to say thanks for buying a new hi-def TV": something that ties it to the event to get customers to recognize it and want to open the email, and then you give them a gift like the one you just talked about. Chris Baggott: Right. It is database marketing, direct marketing one-on-one. What salesmen were doing back in the fifties was to give customers a free gift, have a great catch line, engage people, get them excited and emotionally involved, and make them happy to do business with you. People buy from people, and people buy from people they like. The only thing really different now is that the tools are better. A lot of the stuff Peppers and Rogers were talking about in 1992 was before anyone could even imagine the Internet. Eric Enge: Right. They understood the people. Chris Baggott: Exactly. Eric Enge: Say the publisher of a website gets millions of visitors a month. Any tips on how they should go about building their mailing list from their audience? Chris Baggott: I wrote a book on this, called Email Marketing by the Numbers. These publishers have to make it work the customers' way, because specific offers are finite. This home improvement retailer I was talking about ran a very successful program from an acquisition standpoint, where they offered their customers the opportunity to sign up for twelve days of offers, right before Christmas. This was about four years ago, but it worked phenomenally well. They sent out twelve emails, one every day, and every single email they sent was another opportunity for the customer to engage at another level. Customers were opting in for a minor commitment before making a big commitment, so the company had to keep compelling its customers during those twelve days to take another offer. This is typical in the B2B world, and you do the same thing with whitepapers and webinars, things that are short-term commitments.
I don't want to get your newsletter or be on your list. The whole model where I have to sign up for a company's newsletter and then get its email every week is outdated. It's the same thing as trying to get me to join your Facebook club. I don't need that. I want the company to solve my problem right now. If I come to a site looking for a digital camera, I want a digital camera. Offering me ten tips for taking great seasonal photos, or some kind of curriculum marketing like that, which is compelling and is a small commitment leading to a bigger relationship, is a much better option. Eric Enge: Talk a little bit about Compendium and what you are doing there. Chris Baggott: Compendium is a platform designed to take an organization's blog content and direct it towards search engine optimization. I previously had this experience at ExactTarget, where I had a blog called Chris Baggott's Email Marketing Best Practices. It was a very good blog, as it got me a book deal with Wiley, and one with Forbes Magazine. It was successful from a thought leadership standpoint, which was important back in 2002 and 2003. Then I found myself doing some SEO and trying to rank for terms like "list building strategies", for example. I talk about list building strategies in my blog about 200 times, but you'll never find those posts using search. I thought if I had a blog called "list building strategies", and if I put all of my content about list building strategies on that blog, I would probably rank highly on that term as well. A compendium by definition is a collection of similar writings. In our system, a blogger, who is typically an employee of a company, creates content, and we incorporate a keyword suggestion tool and a keyword strength meter. They usually know the keywords that the company wants them to be using as a part of their normal language. We are definitely not trying to get into keyword stuffing, just good, basic, keyword-rich content.
This keyword meter is basically a little bar that goes from red to green, so if someone writes a good blog post, it's going to turn green. When that post is submitted, our system has an algorithm that reads that content, understands the keywords in the taxonomy, and applies it to what we call a compended blog. If I am writing about "list building strategies" and I've got a blog called "list building strategies", I am going to apply that content to the blog "list building strategies". We are organizing the content of the organization around topics and keywords, as opposed to organizing it around authors. Nobody really cares about the author, and that's kind of a big shock to a lot of bloggers. If I've got customer service people blogging, ultimately as a searcher I care about my problem and how credible the author is as a source to solve my problem. If a blogger is telling stories about how they solve problems for people like me all day long, I am going to land on that page, and I don't really care who the author is. I care about the company, the solution, the products, the services and things like that. We are just about to launch a study we've been doing, called "Who Is Your Audience?" This really comes down to traffic source. So many blogs talk about audience as if you have readers who will be coming back to the site tomorrow. The reality is that the vast majority of business blog traffic is coming from first-time visitors. They are coming either through search or through referrals, and they are most likely not coming back. They are first-time visitors, and they just want to solve their problem. They are not going to subscribe and they are not going to take your feed, because you've already helped them with what they needed. That's the idea behind Compendium. Our clients could be almost anybody, and we consider ourselves a data-driven database marketing company that is trying to deliver a message.
We look at a keyword as a mailbox, and I am going to put a message into that mailbox and then wait for somebody to come by and open it. I know everything about that person except their name. I can usually tell the geography, the volume, and the conversion rates; I just don't know the name of the person. If I put my message in that mailbox, I know that 40,000 people a week, or whatever the number is, are going to come by and open it. Eric Enge: Basically you are enabling employees of a company to participate in generating content. And your tool then automatically categorizes and routes it to the right place? Chris Baggott: Exactly. Eric Enge: That gives you a lot of leverage, first of all by enabling a broader group of people to participate in content generation. From a search perspective, you are automatically grouping it in relevant ways and not relying on a user to manually pick categories. Chris Baggott: Exactly. I saw this thing by Vanessa Fox a couple of years ago, where she said search engines are looking for a page that is about one thing. The problem with most blogs is that usually they are about lots of things, so people have to tag and categorize in many different ways. We need to take all that away and just build the page, so it becomes only about one thing. Say I am a Ford dealer and I have a page called Mustang, one called pickup truck, one called sports car, and another called minivan. In a normal Ford blog, they would just write all that content on one page and categorize and organize it. In our system, this is all done automatically, so there is a page that's only about Mustang. Eric Enge: You must deal with conflicts between categorizations, where a post could potentially go in more than one blog? Chris Baggott: Oh, that's guaranteed. That's actually a desirable feature, so we make sure that everything has a canonical tag and we are clearly pointing back to what the original version was. Let's go back to my Ford example.
Mustang is a sports car, so if I write a post about a Mustang, I want that content to go to Mustang. I don't want it to go to pickup truck, but I do want it to go to sports car. Now, I might write about the Ford F-150 sport version, which is a pickup truck but might also be considered a sports car. Eric Enge: So what you do then is put it in both places and then implement a canonical tag so that the relevance occurs the way you want. So let's talk about corporate blogging. From a company's point of view, let's just talk about the key concepts and the benefits to them of engaging in this. Chris Baggott: It starts with the whole idea of wanting to be in social media. People come to us because they feel like they are missing something and they should be doing this, but they are not sure why or how. Because we are a software-as-a-service solution, there is no technology that our clients need to worry about. One of the main reasons employees and customers don't blog is that they don't trust bloggers, so this is why we have a compliance layer. This way, when content gets created, it runs through a workflow, and we can direct it to various places. Marketing and legal can have approval rights, and no content gets approved until it goes through this workflow. That makes the corporation and the businesses very comfortable about freeing up a lot of people to write content. On the other side, everyone wants to be participating in social networking, and that really comes down to freeing up people and being transparent. How can I be transparent in a controlled environment, freeing up the people who actually are on the front lines to tell the stories? The great element of this is the whole compending process, which organizes the content to help it rank better in search engines, but also drives a lot of user engagement. We see really, really low bounce rates and really high click-through rates because the stories are compelling.
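Baggott describes an algorithm that reads each post, matches it against a keyword taxonomy, and places it on every matching topic ("compended") blog, with a canonical tag pointing back to the original version. The interview does not describe Compendium's actual algorithm, so the following is only a minimal Python sketch of the idea using his Ford example. The taxonomy, the naive substring matching, the blog.example.com URLs, and the /posts/ location of the original are all assumptions for illustration.

```python
from html import escape

BASE_URL = "https://blog.example.com"  # hypothetical site

# Hypothetical taxonomy: topic blog slug -> phrases that route a post there.
TAXONOMY: dict[str, list[str]] = {
    "mustang": ["mustang"],
    "sports-car": ["sports car", "mustang", "sport version"],
    "pickup-truck": ["pickup truck", "f-150"],
    "minivan": ["minivan"],
}


def route_post(text: str) -> list[str]:
    """Return every topic blog whose phrases appear in the post (naive substring match)."""
    lowered = text.lower()
    return [
        topic
        for topic, phrases in TAXONOMY.items()
        if any(phrase in lowered for phrase in phrases)
    ]


def place_post(slug: str, text: str) -> dict[str, str]:
    """Map each topic-blog copy URL to the canonical tag it should carry.

    Every copy points at the post's original URL (assumed to live under
    /posts/), signaling which version search engines should treat as
    the original.
    """
    original = f"{BASE_URL}/posts/{slug}"
    tag = f'<link rel="canonical" href="{escape(original)}">'
    return {f"{BASE_URL}/{topic}/{slug}": tag for topic in route_post(text)}


if __name__ == "__main__":
    print(route_post("Track day in the new Mustang"))
    # ['mustang', 'sports-car']
    for url, tag in place_post("f150-sport", "The F-150 sport version is a pickup truck").items():
        print(url, tag)
```

The Mustang post lands on the Mustang and sports car blogs but not pickup truck, and the F-150 sport post lands on both pickup truck and sports car, matching the example above. Plain substring matching would misfire on real text; the part the interview actually describes is the routing-plus-canonical structure, not the matching method.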
Paula Berg of Southwest always talks about the blog being the hub of the social media strategy. If the blog is the hub, I need to feed this out in lots of ways, like Twitter and Facebook. The vast majority of traffic is going to come through search, but that doesn't mean Twitter traffic is unimportant. It's still worth doing, and by having this hub, you get all the benefits of a central place to have your message, and then a launching pad to get it out through lots of different mediums, including search, which is the most important. Eric Enge: Corporate blogging is one component of social media. There are a lot of other components, ranging from forums to Twitter to Facebook and YouTube, and all these sorts of things. Do you have some thoughts that you can share on how a company can engage with all this stuff? Chris Baggott: Well, there are lots of different ways. We are working with a large travel company right now, and we have not launched this project yet, so I can't go into specifics. But the concept, which I think is great, is that it's one thing to get people to follow the company, but if I can get the customer to share his or her thoughts, it's even better. If I book a trip to Hawaii, for example, this company is going to send me an email saying they would love to hear about my family's trip to Hawaii. They are going to give me this place to post my pictures and basically write a blog post. Once I hit submit, it goes through approval, goes live, and triggers an email back to me thanking me for my post and telling me they just featured my trip to Hawaii on their blog. This content is going to help them win searches on Hawaii, because they are going to get thousands of stories about Hawaii, just as an example. Secondarily, I am going to share that myself. I am going to email all 50 of my relatives and 500 of my friends.
I am going to push it out through my Twitter feed, and I am going to feature it on my Facebook page, bragging that I just got featured on XYZ travel company's website. This word of mouth that I am providing for the company through my own social channels is going to provide a nice, healthy glow for the company and add measurable ROI. They are going to focus on conversion, and that's one of the big things that we haven't talked about. Companies are really starting to focus on conversion of their blog traffic. Corporate blogs are usually dead ends. There is no place to go, and there is nothing to do. The last time I was at Best Buy's blog, I could find it by typing in something about 1080p television sets, but there was no way for me to actually buy a 1080p television set through that blog. So companies are getting a lot smarter about that. Eric Enge: The nice thing about the email scenario you talked about is that when somebody buys a product, or a trip in this case, and the company emails the customer asking them to talk about their trip, the company is creating that point of engagement, and then the social thing happens. Chris Baggott: Right. I don't care how many followers I have on Twitter anymore; it's a whole different dynamic. My wife has 500 email addresses in her address book; she is on a hospital board, and we have four kids in school activities. She is not on Facebook or Twitter, but trust me, she has an extensive social network through her AOL email address book. If you feature her new $25,000 kitchen on your blog, she doesn't care if it is a blog or a website. She is going to send a hundred emails out to all her friends telling them to check out her new kitchen, which was featured on this retailer's site. Again, it's not about how many followers this retailer has, and it's not about publishing it on their Twitter feed; it's about using this as an acquisition tool by getting her to help the company sell its product.
If a company makes its customers really happy by making them feel famous and special, that's a relationship, and the customers are going to do something nice for the company too. I wish it weren't called a social network. I wish it were called the human network, because it's about being human, and human beings listening to each other. I love going back to Zig Ziglar, and another quote I use all the time is one from Tom Hopkins, who said that the #1 sales tactic in the history of mankind is the similar-situation sale. If you tell me how you've solved a problem like mine for someone just like me, I am going to trust you to solve my problem. Think about corporate blogging and freeing up employees to tell similar-situation stories all day long. The odds of you telling a story about someone like me become pretty good. We try to humanize whatever we do. If I am a customer and I talk to someone from a company on the phone, how come my emails don't come from that person? Well, they should. Those are easy, data-driven touches that just make it human. They are going to send me a follow-up email, and they are going to send me a confirmation email anyway. Why just send me a blank piece of text telling me that what I ordered will be delivered on this day? Why not send me a nice, engaging, human-based email from the person I dealt with in the store? If Bob sold me the TV, how come Bob isn't sending me the email confirming my delivery? So I really prefer the term human over social, but I guess they are getting to be interchangeable. Eric Enge: People tend to think of Facebook as a network, and that's constraining because the people you reach probably use dozens of networks. Obviously, the web has enabled an awful lot of new ways for people to connect from a social media perspective. I have a 14-year-old daughter who basically doesn't use email anymore, because email is for old people. She texts, and that's her primary mode of communication.
If she actually uses the phone to call someone, it's pretty unusual. I also have a 16-year-old son who communicates with his friends through Facebook. So when we look at all this, what are your thoughts on how social media has remade the Internet? Chris Baggott: I have kids the exact same ages, so I see the exact same phenomena. There are going to be so many different kinds of media, and you've got to have a way to distribute the right message to the right people at the right time. I also think that the world that you and I play in right now, which is search, is becoming more and more important as well. Kids can communicate and ask questions amongst each other, but when they seek something, they are immune to marketing messages. They are immune to advertising, but they are very well-trained to go to Google and ask for what they want. My 9-year-old uses Google searches for his homework. I was really surprised when comScore came out with its report on the number of searches people do a day. I was thinking that that number might start going down, but it was up 47% between 2007 and 2009. People are searching more than ever, and every business has to consider that a really important medium. Eric Enge: Marketers talked about the potential for a paperless office for years, where computing and electronic communication were supposed to eliminate the need for paper. Guess what? We continue year after year to consume more paper than ever before. What we did was actually enable the consumption of more paper. Perhaps social media will enable more search. Chris Baggott: The other point is that my daughter, who claims she doesn't use email, gets an email every time someone posts on her Facebook wall. I see that she has twenty people writing something to her on her wall every day, so I know she is using email, because each one of those triggers an email. It's kind of a myth that email is going anywhere.
The other interesting aspect to consider is where privacy really plays in with all of this. Everyone was up in arms about Mark Zuckerberg's statement a couple of days ago that Facebook doesn't care about privacy anymore, or something like that. Everyone complaining was like 40, however, and the reality is that my daughter will probably never care about privacy. She has pretty much everything out there in a healthy way, and she is not worried about it. Maybe people my age care a little bit about personal privacy, but these kids, they don't care about that stuff. Are they going to grow to care about it? I don't think so, but it will be interesting to watch. Eric Enge: Most people will feel that the convenience outweighs the fear. Chris Baggott: As marketers, if we can take that information and make better relationships, why not do it? I've always said that I'll give merchants any piece of data they need to make my life better. I don't care if American Airlines wants my toenails and my fingerprints, as long as they can get me through security faster. All those things are coming true too. Our lives are getting much easier, and that's because we give up this privacy, and the trade-off is that things get better for us. We'll see where that goes, but I think it really is an interesting phenomenon. I did this talk last year at Indiana University to the graduating seniors, and I talked about the unintended consequences of the Internet. When it comes to Facebook, many employers want to look at an applicant's Facebook page before they hire them, and a lot of students today feel like that's not fair. What I ask them is: what is going to give an employer a better idea of what the person is like, a piece of paper that says they had a 3.7 GPA and were in Chess Club, or their Facebook page? Eric Enge: Of course there is the example of the woman who was about to receive a job offer until the employer discovered she was running an S&M site and suddenly thought better of it.
Chris Baggott: Those kinds of things are really going to raise some interesting questions about all this, too. We talk about email and the idea of delivering the right message, and it's funny in the catalog business, because I know what products people bought. Why do I have to put only one product in the catalog? If I know you buy plaid shirts and I buy sweater vests, shouldn't my catalog have a model in a sweater vest, and yours a model in a plaid shirt? That kind of personalization, where I see someone just like me in body size, shape, color, age and so on, helps conversion because it builds affinity with the customer. I am obviously very excited about all this stuff.

Eric Enge: What about SEO? How is that going to evolve in this environment?

Chris Baggott: I hope SEO is becoming more content-driven, more personalized and more relevant. In our model, if you write good content and people like it, you are going to get links. We don't do anything proactive as a link-building service for our clients; we don't do anything on linking. We are simply helping them generate a good, healthy volume of the right kind of content, and then organize that content to get in front of lots and lots of keywords. A lot of customers have hundreds to thousands of blogs, and feeding that content now is the ability to personalize search. That's going to become more keyword-centric, and the long tail is getting longer and longer. To me, that's a good thing, and it's certainly a healthy thing for our business. We think that's the direction the market is going in, as it becomes more recency-focused and more content-focused. When Chris Baggott makes a search and you make the same search, you may see a different page, but it still might be from the same company. Recency is becoming a big thing, but I worry about it becoming too spammy. I think you've got to be careful about things. People talk about Twitter search and how whoever talks the most wins. How is that helping anybody?
What's nice about Google, at least, is that there is some semblance of intelligence.

Eric Enge: Well, Google actually made some statements recently about how they rank tweets, and two things emerged. Part of it is how many followers the person has, but it's also about how reputable their followers are. Those are two big ranking factors, so they look past a simple count of followers.

Chris Baggott: Sure, but the important thing is how that scales. If I am doing a search on digital cameras, I could see the big searches, but again, you start getting further and further out on the tail, and there is no way to know which is the most reputable one. It's going to be interesting. What I do know is that if I create a page called digital cameras in Columbus, and I talk about digital cameras in Columbus, I have a really good shot at winning that search. One retailer we are working with has 3,200 blogs targeting 3,200 keywords. They are in the top ten on 2,400 of those pages, and they are #1 on almost 800 of them. They've only been at this for about four weeks, so that strategy obviously works, and I think it will continue to work as long as they can keep feeding good, relevant content and getting people engaged.

Eric Enge: One of the challenges is: what do you write about digital cameras in Columbus that differs from digital cameras in Cincinnati?

Chris Baggott: Right, exactly, and whether it is simply a matter of organizing it on a page. One of the things we are going to start doing is making each of their six-page newspaper inserts 'a blog post.' If I am in Columbus and search for a digital camera, I probably want a digital camera in Columbus, and if I have a recent ad or a coupon for a special on digital cameras in Columbus, that's good content the searcher probably wants. I may or may not want to read a review, and reviews don't scale, so I can probably put a review in the sidebar.
Ultimately, I am probably looking for a coupon or a deal, and a credible source where I can find this Canon EOS or whatever it is I am looking for. The question then becomes how different that content needs to be between Cincinnati and Columbus: can it be the same content, just with a different page title?

Eric Enge: Well, from a search perspective, that's basically seen as a duplicate page.

Chris Baggott: Right. But again, you have a canonical version of the original post. So the main blog has the content, and then we'll also place it out in these category blogs, if you will. The question is, do I need to have some copy to go with that ad, saying, "If you are shopping for a camera in Columbus, Ohio, here is our special today at XYZ Retailer"? Title matters a lot, and so does competition. We are seeing that a lot in these examples, and I am not saying that's right or wrong, but that's what's happening. If someone puts a small business website online or in the online Yellow Pages, and somebody else competes by putting up new pieces of fresh content on their page every single day, the more frequent publisher is usually going to win.

Eric Enge: It's going to be a while before we see all this stuff sorted out. When I was growing up, it seemed like disruptive events happened maybe once a decade or so. Today, the pace of disruptive events has greatly accelerated, with Google and the social media sites. My own conjecture is that the pace is actually going to continue to accelerate.

Chris Baggott: I agree.

Eric Enge: This makes for an environment where no one knows what our industry is going to look like two years from now. I don't think there is anyone who honestly has a clue.

Chris Baggott: No, that's right. It's all about these technologies. A successful business is one that makes its customers really, really happy. Every technology needs to be looked at through the lens of making people really happy.
You get what you want in this world by helping other people get what they want. If you just keep a focus on the customer and on customer satisfaction, you will be successful. That hasn't changed for thousands of years.

Eric Enge: Thanks, Chris!

Chris Baggott: Thank you!
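Chris's approach of keeping one canonical version of a post on the main blog while republishing it on city or category blogs maps onto the standard `<link rel="canonical">` element. The Python sketch below shows one way a publishing script might stamp each category copy with a canonical link back to the original. It is a minimal illustration under stated assumptions: the function names, page layout, and example.com URLs are hypothetical, not the system described in the interview, and Google has described the tag as a hint rather than a directive.

```python
from html import escape


def canonical_link(original_url: str) -> str:
    """Build a <link rel="canonical"> element pointing at the original post."""
    # quote=True also escapes quote characters, so the URL is safe inside the attribute.
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


def render_category_copy(original_url: str, page_title: str, body_html: str) -> str:
    """Render a category-blog copy whose canonical version lives on the main blog.

    Hypothetical layout and parameter names, for illustration only.
    body_html is assumed to be trusted, already-escaped HTML.
    """
    return (
        "<html><head>"
        f"<title>{escape(page_title)}</title>"
        f"{canonical_link(original_url)}"
        "</head><body>"
        f"{body_html}"
        "</body></html>"
    )


if __name__ == "__main__":
    print(
        render_category_copy(
            "https://blog.example.com/digital-camera-deals",  # hypothetical URL
            "Digital cameras in Columbus, Ohio",
            "<p>Today's camera specials...</p>",
        )
    )
```

Each city page can keep its own title and local copy (the Columbus versus Cincinnati question) while pointing search engines at the single original post.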
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder in Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com Mon, 01 Feb 2010 18:52:23 +0100 Published: January 30, 2010
Adam is a second-generation internet marketer and president of AudetteMedia, a search marketing boutique located in Bend, Oregon. Adam is also a lead SEO strategist at Zappos.com, which he has worked with since 2001. A veteran of the internet marketing field since 1996, Adam has worked with companies such as Charming Shoppes, JELD-WEN, Intermec, Motosport, Michelin, University of Phoenix, and HSBC. Adam is a frequent speaker at search conferences nationwide. You can read his blog at and follow him on Twitter.

Interview Transcript

Eric Enge: Can you start off with an introduction of yourself?

Adam Audette: I've been doing SEO for a number of years. I started back in the mid-to-late '90s, when my father, John Audette, had a company called MMG. I was a link builder, and I would do something called the top 100, where we would submit sites to different directories like Yahoo, Rex and others for all kinds of clients. My father sold MMG to a large international agency around 1999 or 2000, and after that I started consulting and working on different projects, and I started working for Link Exchange and then for Zappos. MMG still has some pretty good roots in the SEO world. Some of the people who came out of the company are Derrick Wheeler (Microsoft in-house SEO), Marshall Simmonds (NY Times SEO), Adam Sherk, Bill Hunt (well-known SEO) and Disa Johnson (well-known SEO), among others. All these people who were active in the industry started at MMG, so it's pretty cool to have those old roots. While I was consulting in 2001 and 2002, I started to put together the idea of starting my own company. That idea evolved over time, and now my company, AudetteMedia, partners with my father, who oversees all the operations and more or less guides the direction of the company. Meg Thompson, one of our partners, oversees all the clients and accounts in a vice-president capacity.
We work with a lot of eCommerce clients, and one of the biggest clients we have is Zappos, which we are going to talk about today. I work pretty deeply in the company on SEO, and I have to be there every two weeks or so.

Eric Enge: Do you have other clients that are large in size as well?

Adam Audette: Yes. Some of the other large clients we have are Michelin and the University of Phoenix. We work with Charming Shoppes, which has a number of retailers across the country, including Fashion Bug, Lane Bryant and Catherine's. We do some projects for other large companies like AOL, and we also work with a lot of startups like DriverSide, Motorsport.com and Rockler.com. We also do a lot of one-off projects, such as audits, for other agencies. We are working a lot with large interactive agencies that don't necessarily have SEO capabilities. We'll do an audit for an agency like AKQA and one of their clients, and we'll also do SEO for clients such as HSBC. We work in a lot of different capacities, and I think one of the things we are going to be doing going forward is broadening our reach from strictly e-tailers into other verticals as well. A few months ago, we started working with Primedia, which operates a lot of real estate websites. It's fun to start working in new markets and industries that are so different from eCommerce.

Eric Enge: Can you talk a little bit about your relationship with Zappos?

Adam Audette: In 2001 I started working with Zappos on a few different things, including producing an email newsletter for community building. That changed to social media work, and then about two and a half years ago we started doing SEO. I work there with Aaron Shear, who in my opinion is one of the best SEOs around, and he has a lot of experience in eCommerce as well. Before Zappos, he was with eBay, Shopping.com, and some of the other big CSEs (comparison shopping engines).
Aaron and I work directly with the CEO, Tony Hsieh, whom I had known from my Link Exchange days. Tony actually owned Link Exchange before he sold it to Microsoft back in 1999. He is involved on the SEO side from a high-level perspective, while Aaron and I sit between marketing and development, focusing on interface marketing initiatives and getting them in front of the developers. Over the last year to year and a half, we've had a ton of momentum, and now it's gotten to a point where the development, content, and marketing teams are all on the same page with regard to SEO. The model at Zappos is very top-down. Tony is pretty active in all the initiatives for Zappos, of course, and he is very active in SEO as well. He is able to drive things through that we wouldn't be able to push as effectively as SEO managers. There are different things we've learned process-wise in order to make that happen, and now, looking at SEO for 2010, we have a lot of things planned. Free organic search traffic has turned into a big driver, and it's going really well. A lot of that probably has to do with early adoption, as Zappos has of course been around for a long time, but it also has a lot to do with the good, fluid environment there. There is not a lot of red tape and bureaucracy to impede things getting done, which is a big advantage.

Eric Enge: When you talk about Tony being active, one of the things I took from that was that when individual teams aren't able to come to a consensus and someone needs to play the role of tiebreaker, he is probably pretty active in those kinds of scenarios.

Adam Audette: That's exactly right.

Eric Enge: This is actually a great thing, because the big challenge for a lot of larger organizations is that there are so many people to sell, and if you can't get them on board, paralysis sets in.

Adam Audette: Exactly, that's very true.
Tony does do that, and he can prioritize and de-prioritize things so that it makes sense for everyone, and he makes what he wants done clear to everybody so things get done pretty efficiently.

Eric Enge: I imagine that the environment for each of your clients is quite different.

Adam Audette: Yes, it sure is. Each one is different. Some of the companies we work with require a bit of education, and sometimes we are working with a marketing manager who doesn't really get SEO, while the people above him or her are the ones who get it and hired us. As a result, we end up being managed by people who don't exactly get it. Not only do we have to try to get our recommendations implemented, but we also have to educate either the main marketing contact or the whole marketing department on why we are trying to do what we are trying to do. That's one of the more frustrating things, but I guess an integral part of our job is that it's not as easy as just handing our recommendations over to development and then managing that process to get them implemented. We also have to justify all of our recommendations and explain why we came to these conclusions. That's probably one of the harder and more social parts of SEO that some people in the field don't always think about.

Eric Enge: For a small business whose fate depends on search engine traffic, you can count on the owner of that business probably being at least basically knowledgeable about SEO. But when you have a large, established brand like Michelin that hasn't been dependent on search traffic, it can be much more challenging.

Adam Audette: Yes, that is exactly right. Michelin is interesting because in that case we are the specialty SEO shop, and we are a small shop. We work with companies that already have other digital agencies they work with, and in Michelin's case that's exactly what occurred.
In those cases, we have to share deliverables with larger agencies, so that's kind of an interesting situation as well.

Eric Enge: Going back to Zappos, I guess the other thing that is unique about it is that it developed as a web-only business.

Adam Audette: Exactly. Zappos tried to roll out some outlet stores about a year and a half or two years ago, and I believe there are still two located in Las Vegas. It's been interesting to see a web-only startup roll out some brick-and-mortar stores. They weren't really successful for the first year or two, but the one outlet in Las Vegas had bigger sales this holiday season than it ever had, which is really interesting to see. Zappos is still a pure website all the way, and the main thing about it is that there was never a lot of money put into marketing, as the entire focus of the business was on word of mouth through great customer service. For Zappos, it's all about having a great experience and then getting that customer to tell friends and family about it, and of course getting previous customers to come back again themselves. Word-of-mouth marketing really helped the company grow for a number of years, and SEO fit nicely into that because it's not as heavy an upfront cost, especially when you can build SEO into the processes of the company and get content teams to think about it. Therefore, marketing dollars were saved for things like PPC, some display, and very little TV and print advertising. Then about a year ago, Zappos started to push more into traditional advertising channels.

Eric Enge: What were some of the unique challenges in the Zappos environment, then?

Adam Audette: One of the things that makes this challenging is that it's such a large corporation with so many moving parts.
Getting resources is the biggest challenge we face, so if we have a project or an idea and we want to get it slated, it may not happen because there are 10 or 12 other things that have already been prioritized above it. One thing we do to counteract that is hold monthly meetings where we get the key stakeholders from each department in a room together, including Tony, and we all talk about the outstanding projects and what we need to prioritize. Those have really, really helped, just that one hour together a month, but there are so many other needs between brand and other marketing channels that resources continue to be the biggest challenge.

Eric Enge: The other thing that occurs to me is the size of the site, as it is a large, extremely complex site with multiple navigational paths. Another challenge is just the complexity of keeping that many pages from being seen as either low-quality or duplicates of something else.

Adam Audette: Exactly. Talking about the site, one of the things that will jump out is that there are basically two sites there. There is what we call Classic, which is the old Zappos site, and what we call Zeta for Beta, which is the new Zappos site. Everything has been slowly migrating over to Zeta, and we are basically running two sites in parallel that have been there for a long time. We've had several million pages, so trying to transfer it all over to a new site just wasn't going to happen in one fell swoop. First we started with certain sections of categories, and we rolled it out from there. If you look at the site, SEO consultants and agencies email Zappos marketing all the time, offering to help and listing a number of problems with the site. From a classic SEO perspective, Zappos is doing so many different things wrong.
There might be 2,000 links on a single page for sections such as the brand section, and other pages could have several hundred links. There is no clear information architecture in that way. The environment is now so large that we really have to prioritize big chunks and tackle things one at a time. There may be a neglected part of the site where some old NoFollow tag is still out there, and page titles that are obviously less than ideal. Those are the kinds of lumps we have to take, because even though we want to, we can't do it all.

Eric Enge: In some ways, it gets back to the resources issue and dealing with a massive number of potential projects. You have to find a way to clean house on things that you know you need to do but that just aren't making it up the stack.

Adam Audette: You got it, that's exactly right, and we have to justify the need for every project. When we put a ticket in, it has to have a clear benefit and a clear bottom-line impact. If it doesn't, it's going to get shoved right off. That quickly prioritizes things on our end, because we can see right away whether something is going to have a massive effect on the bottom line. We only want to look at the stuff that really will hit the bottom line.

Eric Enge: Do you ever run into a situation where you have things accumulating bit by bit that wouldn't hit the bottom line substantially by themselves, but dozens or hundreds of them together would make a big cumulative impact?

Adam Audette: Yes, I can think of a number of different examples that I won't share, but in SEO it's so true that all those things add up to affect the overall score. When all those signals point in the same direction on the page, it's going to have a very strong effect on the SEO. When some of those signals point in different directions, that adds up too, and it's going to have a big effect.
This could mean doing minor things such as updating the XML sitemap, fixing internal link structures, updating robots.txt, and dealing with duplicate content, and all those things cumulatively make the difference. We had a lot of issues with things like redirects, which we have addressed in the last year or so, and now we are feeling really good about them. Once that's done, we can look at some of the other things that have been lower priority until now.

Eric Enge: By redirects, do you mean redirects implemented using the wrong HTTP status code?

Adam Audette: Yes, there were some cases where 302s were occurring in areas where we wanted 301s, and with the Zeta site rolling out, we had thousands upon thousands of URLs that needed to redirect from the old Classic URLs to the new versions. Each redirect we roll out introduces some site latency, because it's one more lookup that has to occur. That's something that's very important to us, just the speed of the site. Over the last two years, ever since Aaron Shear and I started really diving in on SEO, site latency has been one of our higher priorities. We were gratified to see Google acknowledge publicly that site latency is going to be a factor in its algorithm, and that's a big focus for us.

Eric Enge: If you look at Webmaster Tools and check out the performance stats for your site, it's pretty interesting. Google is pretty aggressive in terms of the messaging it gives you through Webmaster Tools.

Adam Audette: Very true, very much so. They are really trying to speed up the web, and leave it to Google to do it. If anybody can, they can.

Eric Enge: Yes, they are pretty determined. Look at all the various initiatives they have, such as launching their own DNS infrastructure; not a lot of companies would think about doing that.

Adam Audette: That's true.
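Adam's two redirect concerns, 302s where 301s were intended and the extra lookup each redirect adds, can both be addressed by keeping the legacy-to-new URL map flat, so that every old URL answers with a single permanent redirect. The Python sketch below assumes a simple in-memory dictionary of paths; the paths and function names are hypothetical, not Zappos URLs or code. It collapses chains such as old to interim to final into one hop, rejects loops, and returns a real 404 for unknown paths rather than an error page served with a 200.

```python
from typing import Dict, Set, Tuple


def flatten_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Collapse chains such as A -> B -> C into A -> C so each old URL needs one hop.

    Raises ValueError if the map contains a redirect loop.
    """
    flat: Dict[str, str] = {}
    for source, target in redirects.items():
        seen = {source}
        # Keep following while the current target is itself a redirected URL.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat


def respond(path: str, flat: Dict[str, str], live_paths: Set[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request path."""
    if path in flat:
        # 301 = Moved Permanently; a 302 would signal that the move is only temporary.
        return 301, {"Location": flat[path]}
    if path in live_paths:
        return 200, {}
    # Unknown URL: a real 404, never an error page served with status 200.
    return 404, {}


if __name__ == "__main__":
    legacy = {
        "/classic/mens-boots": "/beta/mens-boots",  # hypothetical paths
        "/beta/mens-boots": "/mens-boots",
    }
    flat = flatten_redirects(legacy)
    print(respond("/classic/mens-boots", flat, {"/mens-boots"}))
```

Flattening once at deploy time, rather than following chains on every request, keeps each legacy request to a single 301. A production server would also need to handle query strings and trailing slashes, which this sketch ignores.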
Eric Enge: Are there any specific example scenarios of issues you've had with large-company SEO that you could share?

Adam Audette: We had a large client with several very large eCommerce sites roll out some new designs under their brand. They had worked with a design company that didn't do SEO natively, so they hired AudetteMedia to vet the work, make sure SEO was built in, and make sure their legacy URLs were redirected. There were a number of challenges that arose when working between a large company and its design agency. Even though we repeatedly hammered home the fact that they needed to 301-redirect all legacy URLs, they failed to do so. All these sites were returning error pages for the legacy URLs, which of course is a big problem in itself. What was happening was that those error pages were actually returning a 200, so thousands and thousands of links were permanently redirected to an error page that returned a 200. We are still trying to recover from that with that company, and it's been several months since it occurred.

Eric Enge: That's one of those education-type scenarios we are talking about, where we need people to understand the basics, or at least have a process in place, so that these things don't get pushed out the door like that.

Adam Audette: Yes. I think it will continue to improve on the web in general. A lot of design agencies are getting better with SEO, but there are still some real horror stories out there. Some of the things we consider to be basic SEO aren't being taken care of. Another good example I can share is when we worked with a company that had implemented the link canonical tag. Unfortunately, every single product page on their site had a canonical link targeting the homepage. We didn't see this problem until three or four months later, because we hadn't been working on that particular site.

Eric Enge: Yes, that's a tough situation too.
That leads to the reason why Google has stated publicly that it treats the canonical tag as a hint, as opposed to a directive. They are anticipating, rightly so, that some people will implement it improperly.

Adam Audette: You got it. In our experience so far, they are treating it as a very strong hint, and it works well when implemented properly, but it can also really work against you. I'll share one more example. We work with a really large company that has a number of websites, and the company wanted us to do some audits for them. One of the first ones we looked at was a site that conditionally redirected the user with a 302 to a landing page if the user agent didn't accept cookies. Of course, search engines generally don't accept cookies, so they all got 302-redirected. Out of the hundreds and thousands of pages on their site, they only had one page indexed, and that page was the page prior to the 302. They were wondering what was going on with that. This is another case of just missing the boat on fundamental things that need to be taken care of.

Eric Enge: Yes, it's amazing. One of the things I have written about along the way is that one of my first recommendations for large companies is to set up training sessions. This way, we can walk them through some of the basics so that we can spend more time being proactive and building traffic, rather than reacting to basic mistakes.

Adam Audette: That's exactly true. Training is something we've focused on a lot; it takes a little bit of time, but the value there is amazing. Getting everybody in line with SEO, even at a superficial level, will save so much time, resources and pain, because you can be proactive rather than having to react to everything that occurs.

Eric Enge: Any final thoughts?
Adam Audette: One thing I would like to say is that despite all the difficulties we face, working with these larger sites is a lot of fun, and I really enjoy it. Eight or nine times out of ten, the brand strength itself will trump a lot of their problems. We are working with a site like Zappos that has millions of backlinks. The amount of muscle to push around is just tremendous. It has been like this in the past, and it may change, but there is an unfair advantage for these larger brands in SEO.

Eric Enge: I like to say that links basically cure all evils.

Adam Audette: Well said.

Eric Enge: Thanks, Adam!

Adam Audette: Thank you, Eric!
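The horror stories in Adam's interview, error pages served with a 200 and every product page canonicalized to the homepage, are the kind of mistake a simple automated check can catch before launch. The Python sketch below is a hypothetical audit helper, not a tool the interviewees describe: it takes a page URL, the HTTP status the server returned, and the HTML body (fetched by whatever crawler you already use), and flags both patterns. The "page not found" text match is an assumed, deliberately crude soft-404 heuristic; adjust the markers to your own error template.

```python
from html.parser import HTMLParser
from typing import List, Optional, Sequence
from urllib.parse import urljoin, urlparse


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.href = attributes["href"]


def audit_page(
    url: str,
    status: int,
    html: str,
    error_markers: Sequence[str] = ("page not found",),  # lowercase markers
) -> List[str]:
    """Return warnings for one fetched page (hypothetical helper, not a real tool)."""
    warnings: List[str] = []

    # Soft 404: the body looks like an error page, but the server answered 200.
    body = html.lower()
    if status == 200 and any(marker in body for marker in error_markers):
        warnings.append("possible soft 404: error page served with status 200")

    # Canonical sanity check: a deep page should not canonicalize to the site root.
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.href is not None:
        canonical = urljoin(url, finder.href)
        page_is_root = urlparse(url).path in ("", "/")
        canonical_is_root = urlparse(canonical).path in ("", "/")
        if canonical_is_root and not page_is_root:
            warnings.append(f"canonical points to the homepage: {canonical}")

    return warnings


if __name__ == "__main__":
    product = '<html><head><link rel="canonical" href="/"></head><body>Boots</body></html>'
    print(audit_page("https://shop.example.com/p/boot-123", 200, product))
```

Run over a crawl of legacy URLs and product pages, a check like this would surface both problems in minutes rather than months; a fuller audit would also follow redirects and record whether each hop is a 301 or a 302.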
Tue, 15 Dec 2009 15:37:47 +0100 Published: December 13, 2009
An in-house SEO with more than seven years of experience, Brent has doubled Tribune's visits from search engines since he joined the company in February 2008. With more than a million visits per day to Tribune's network of websites, Brent drives traffic not only to Tribune's newspaper sites (including the L.A. Times, Chicago Tribune, and Baltimore Sun) but also to Tribune's dozens of broadcast sites (including KTLA, WGN, and WPIX). He is a newspaper SEO authority with experience and expertise in CMS challenges, duplicate content mitigation, PageRank funneling, Google News, and Google index rates. He has also trained large editorial teams on SEO.

Interview Transcript

Eric Enge: Can you outline the basic background of where you work, the number of properties you have, and what your day-to-day work is like?

Brent Payne: My official title is Director of Search Engine Optimization for the Tribune Company. I handle the search engine optimization work for all Tribune properties, including the websites for the Los Angeles Times, Chicago Tribune, Baltimore Sun, Orlando Sentinel, Sun Sentinel, Hartford Courant, Morning Call and Daily Press. KTLA in Los Angeles is one of our larger broadcast sites, along with WGN in Chicago, WPIX in New York, and twenty-one other smaller stations across the country. I also handle the search engine optimization for the websites of about 17 LocalTV, LLC stations, most of which are in the Midwest. We are working on launching some projects that I can't talk about, but some we have already launched include HealthKey.com, a health-related site, and ZooZag.com, a classifieds site.

Eric Enge: You spend four hours a day on each one, correct?

Brent Payne: Ha ha ha! Yes, of course! Seriously, though, I try to break things down by traffic.
For example, the LA Times receives around 30 percent of our traffic from search, so I spend quite a bit of my time on the LA Times, followed by the Chicago Tribune, which, admittedly, gets more of my attention than it normally would simply because I live in Chicago and our corporate offices are located here. The Florida papers also get a bit more attention than their share of traffic would typically warrant, mainly because they tend to be a little more edgy in terms of the content they post, especially in their photo galleries. That gives me some additional opportunities with celebrity-related search queries and other entertainment-related SEO, which is obviously popular on Google. For the Baltimore Sun site, I mainly deal with news related to Washington, DC. The broadcast sites are typically about video SEO, but unfortunately our video SEO is really poor, because our current implementation sticks the video player in an iFrame. However, I am working on getting that fixed, as well as getting the closed captioning for those videos fed automatically, which I feel would be huge for our video SEO.

Eric Enge: What are some of the unique problems you have faced?

Brent Payne: The largest problem I deal with is the CMS. The CMS was built ten years ago, though it has been changed a lot since then. At the time it was built, the prevailing mindset was duplication, duplication, and duplication. It was definitely set up from the print mindset that it was okay to syndicate to ourselves, so when I got here two years ago, literally about 250 copies of the same article would get created for every story. It would be in five or six different sections on the current domain, and then duplicated across about 50 different domains, which created quite a problem.
It was quite a challenge, because I had to convince the general managers of these companies why this was important, and then get tech resources to make a massive change to our CMS. Once we got that part done, we saw an increase in search engine traffic. The current problem I am dealing with is the cross-domain issue: if the LA Times writes an article that the other Tribune properties like, and they put that same story on their own domains, how do we get Google to realize that the LA Times originated the article? Google engineers say they are going to have the cross-domain canonical tag by the end of the year, and I am excited about that, but I know they were pretty slow to start using the canonical tag initially. Thus I'm a bit leery about how effective it will be at first. However, I am hoping that the cross-domain canonical tag actually has some type of impact by March of next year at the latest. Sure, I could do 301s and force everything to a single domain name, which is where I was headed, but we would literally need 3 to 4 months with 3 or 4 developers to complete the process. The cost to do that is high, so I am going to try the cross-domain canonical first, and hopefully that does the trick. If I don't see anything by March or April of next year, then I'll go down the road of literally forcing it to a particular domain. Another concern I deal with is political issues. Say there is a huge breaking news story, and the LA Times and Chicago Tribune both want to write a story about the LA fires, for example. And let's say someone from Chicago was killed in one of the LA fires. Well, the Chicago Tribune would want a story that is more about the death of that person, so the story is more focused on the Chicago area. However, the story coming out of the L.A. Times would be more tailored to a Los Angeles audience.
I have to deal with the online editors at both newspapers and get them to understand that the Chicago Tribune story is not really about the LA fires; it's one angle on the fires, and the Tribune should still link to the LA Times for coverage of the fires themselves. But if the editors in LA could feature the Tribune's story in a related-stories section, that would be appreciated as well. It is always a challenge to get newspapers to link to each other. Even though we all work for the same company, there is still a lot of journalistic competition between the different properties. Eric Enge: You get involved in the negotiations as well, correct? While there is certainly an SEO reason for you to be involved, this is really a more fundamental issue than just SEO. Brent Payne: You are absolutely right, but it's also about consolidating resources. I think we'll get to a point where this is less an SEO conversation and more a resources conversation, where we discuss whether we really need eight or ten different movie critics throughout the Tribune network, instead of just one or two. I think that's going to be a difficult pill for some of us to swallow as we continue moving forward and realize that monetizing on the web is a different model than monetizing in print. Eric Enge: So getting people to make the mental switch from print to online media is a real issue? Brent Payne: Correct. Luckily, I still have significant buy-in within the company, and our upper management, all the way up to the COO of the Tribune Company, understands why it's important for our papers to get traffic from the search engines, which makes it easier for me. I have the tools within our content management system to force a change if I need to, but I don't like being the bad cop with editorial. I'd prefer to build and use relationships rather than force a change. When Michael Jackson died, I didn't care how many versions of the story were out there.
The only property I had ranking well for it was the LA Times, so I literally forced every story about Michael Jackson's death to LATimes.com. It's hard to make a lot of friends doing that, but I rebuild those relationships over time, and I make sure I find out what's important to each of the different newspapers and broadcast sites. Eventually I help them rank well for what really matters to them, because at the end of the day, the Tribune properties care more about local visits than they do about national visits. So if I can get a local win for them through some of these SEO processes, it is much more valuable to them than winning on a national story. Eric Enge: There is obviously a lot of classic corporate negotiating going on. You have well-meaning people with their own individual objectives, just doing their thing and walking around with blinders on, and sometimes you have to help pull the blinders off. Brent Payne: Right. On my first day working for the Tribune Company, there was a huge stack of site reviews, probably two feet high, from some of the best SEO consulting companies in the industry waiting for me on my desk. They had collected these reviews over the 18 months before I started working there, and they seemed pretty accurate to me as I looked them over. Maybe a few things were incorrect, but for the most part they were totally on point. I picked up that stack of SEO site reviews, walked over to the guy who hired me, whom I had signed papers with earlier that morning, and told him that I quit because he really didn't need me to do this. He literally responded, “Okay, let's go to lunch.” So he took me out to lunch and explained that it wasn't a matter of them not knowing what to do, but that they needed someone who could actually implement their plan.
He told me that they hired me because they believed I had the technical ability, from an SEO standpoint, to know what to do to make sure the company didn't make any missteps. But, more importantly, they felt that I could actually get the plans implemented, because I have the personality to effect change, and they had not been able to do that for over a year. Overall, probably 80 percent of what I do is selling: selling ideas, selling concepts, selling why a particular change needs to be made, and then making huge SEO changes. I think there are people in the industry who are considerably more versed in SEO than I am, but I have a unique ability to sell ideas and convince people to get things done. One of the frustrations I have with a lot of consultants is that they tend to tell you exactly what needs to be done in a perfect scenario, but they don't give you any fallback options. Eric Enge: Right, that's a really important part of SEO. The perfect SEO solution might not play well with the way a site is built, so it's important to have other options that can actually be implemented. Brent Payne: There are a lot of people who question why we don't do some things that other companies do. I have had some conversations with Marshall Simmonds (who handles SEO for the NY Times) about some of the things they are and are not doing, and it's not that we don't know some of these simple things we should do. For example, I understand that it would be much better to 301 redirect an LA Times story directly to the LA Times rather than put a canonical tag on it. I get that, but I implement what I can do immediately, and then I work toward the perfect solution over time. Eric Enge: At a recent conference, I heard you say something about how rapidly content is indexed on these sites. Brent Payne: Yes, it's quite different from some of the other sites I have worked on in the past. When I worked for OneCall.com, it was different.
Our content was re-indexed by Google about every two weeks. Today, it literally takes Google five to seven minutes to update the index for the LA Times homepage. I think that's vastly different from what most people deal with, and it sometimes causes problems for us. One of the things I have had to drill into the minds of all the writers, editors, and publishers here is that as soon as they hit save or publish, or whatever the button happens to be, Google will see it. If they don't have their headline right, it's going to cause a problem, because Google is going to see exactly what they have written. If they don't have a lead photo in their story before they save it, Google is not going to see the lead photo. After that, it may be 30 days or longer before Google takes a second look at that URL. You have to get it right the first time: all the information for the story has to be in the story before they save it. If you look at Google Trends for any major breaking news story, huge spikes occur within two to three hours of the break, and then the spike is over, so it is essential to have it right the moment we go live. A scenario that underscores how quickly we are indexed occurred when I was doing SEO training at the Orlando Sentinel about a year ago. At the time, there was a big story about Caylee Anthony and her mother's alleged involvement in her death. In the next room, they were doing CMS training, and they decided to enter Caylee Anthony as a headline, just as an example. Within 15 minutes, they were getting complaints from people who had found that page ranking well in Google for Caylee Anthony, with just a crap story written for it. I had to interrupt the training, add a NoIndex to the page, and then go into Google Webmaster Tools and request removal of the URL. Then I emailed the guys at Google saying that this needed to be taken care of, and luckily it got removed pretty quickly.
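Editor's note: Because the first crawl can happen within minutes, one safeguard is for the CMS to check a story before its first save and to mark test or training stories noindex. The sketch below assumes a hypothetical Story record with headline, body, lead photo, and test-flag fields. It is not the Tribune CMS, and which checks to enforce is a judgment call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Story:
    """Hypothetical CMS record; field names are illustrative only."""
    headline: str
    body: str
    lead_photo_url: Optional[str] = None
    is_test: bool = False  # e.g. a story created during CMS training


def prepublish_problems(story: Story) -> List[str]:
    """List what is still missing before the first save or publish.

    Per the interview, anything absent at the first crawl may not be
    seen again for 30 days or more.
    """
    problems = []
    if not story.headline.strip():
        problems.append("missing headline")
    if not story.body.strip():
        problems.append("missing body")
    if not story.lead_photo_url:
        problems.append("missing lead photo")
    return problems


def robots_meta(story: Story) -> str:
    """Return a noindex robots meta tag for test or training stories."""
    if story.is_test:
        return '<meta name="robots" content="noindex">'
    return ""  # no tag: normal indexing rules apply
```

A publish button wired to refuse saving while prepublish_problems() returns anything would enforce "get it right the first time" mechanically.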
We were ranking for some pretty odd stuff, and that's the power of the trust and authority that some of these major domains have. Eric Enge: It puts the trust itself at risk. Brent Payne: Right, absolutely. Eric Enge: The good news is that your content is indexed incredibly quickly, but if it's not the way you want it when you put it out there, it might not get fixed for thirty days. Brent Payne: Correct, although Google News now claims that it will refresh Google News content within 12 hours. As a side note, I have had conversations with Google News where they told me they literally have 300,000 to 500,000 stories about Barack Obama live at any particular time. Do you really think they are going to refresh all 300,000 to 500,000 of those stories on Barack Obama within twelve hours? Hell no. I have a bit of an issue with the timeframe Google is publicly stating, because I am just not seeing it. What I am seeing is that it can be several days or weeks before they look at a story again. At least they are looking at it again, however, so I have to give them that. This is huge compared to what they were doing even six to nine months ago. Even so, I still tell our journalists to create a new URL for an update to a huge breaking news story, because it's easier to rank in Google News than it is in Google Web. Once that new URL is up, you can go into the older version of the story and 301 redirect it to the new version, so you don't lose your link juice and can still rank well in Google Web. It also helps to change your title tag, your H1 tag, and the first paragraph in order to get past the Google News duplicate content filter. That's what we have been doing, and it's been working great. However, Google web search has made a change that is driving me crazy right now: they have slowed down how quickly they reattribute PageRank through a 301 redirect.
It used to be that they would transfer PageRank immediately, as soon as Googlebot saw that the old URL now pointed to the new location. I could do some pretty cool stuff with that, but now the transfer seems to be delayed by several weeks. This is causing me some issues, given what I have trained the journalists to do. We just aren't getting the web search success that we could. It's even worse now that Google News and Google Web are diverging further, because as content creators and SEOs, we kind of have to choose whether we are going after a Google web search win or a Google News win, and that's frustrating. Eric Enge: I recently wrote a post about the cost of site moves. Basically, the main thrust of the post is that you will lose traffic, and I am trying to help people get a sense of how much traffic is going to be lost. We run into this all the time when working with large companies, where an executive makes a decision that may be brilliant from a traditional marketing point of view but is a total disaster from an SEO perspective. The article is focused on getting people to understand that by changing their domain name, their URL structure, and all of their content, they are going to lose more than half of their traffic, even with properly implemented redirects. Brent Payne: I feel so sorry for our broadcast sites and our TV stations, because they have changed their URLs so much, even in just the past couple of years. We also have not done a great job of 301 redirecting from the old domains to the new ones. We have done some of it, but most of what we did was quick fixes. They are definitely being hindered by it, and I'm frustrated that, as a result, the broadcast sites collectively account for only five percent of our SEO work compared with the newspaper sites.
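Editor's note: Both practices discussed here, retiring an earlier URL for a breaking story and moving pages to a new domain, come down to answering old URLs with a 301 Moved Permanently status and a Location header. Below is a minimal Python WSGI sketch with a hypothetical redirect map. A production site would more likely configure this in its web server or CMS, and, as Brent notes, how quickly search engines pass credit through the redirect is up to them.

```python
from typing import Callable, Dict, Iterable

# Hypothetical map of retired paths to their replacements: a superseded
# version of a breaking story, and a page from an old broadcast domain.
REDIRECTS: Dict[str, str] = {
    "/news/fire-update-1.html": "/news/fire-update-2.html",
    "/old-section/story-456.html": "https://www.example.com/news/story-456.html",
}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app that answers retired paths with a 301."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 marks the move as permanent, the signal search engines
        # generally use to credit the old URL's links to the new one.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"current story"]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it on port 8000.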
Some of these broadcast sites, like KTLA, WGN, and others, are serious destinations for news, and yet their websites are not ranking well because of the massive changes they have made and the poor job we have done moving from domain to domain. It happened exactly as you describe: upper management decided to make a change, and we had to roll with it. Eric Enge: At SMX East, we talked about some of the unusual PageRank sculpting you have done. It would be great to talk about that a little bit. Brent Payne: We have done some rather significant work with that. Last year we did a lot of work with what I call dynamic PageRank sculpting. We had five different levels set up for NoFollows on all of our homepages and section fronts, which could be controlled individually by domain name. A select group of people, including myself, the online editors, and the site producers, could rate what type of news day it was on a scale of one to five. A huge, highly focused national event that people cared about is what we referred to as SEO Level 1. Of the 400 or 500 links on a homepage, we would literally leave just one followed link (the rest were NoFollowed), and that link would also be the H1 on the page. It worked extremely well last year: literally every time we switched to it, we were on the first page of Google, usually in the top five, for whatever that particular story was. Eric Enge: The spike in traffic for breaking news was the critical area for you to focus on. Brent Payne: It's all about breaking news. We have PageRank 8 sites, PageRank 7 sites, and PageRank 6 sites, and we want to focus their authority on the most important news. So we can choose SEO Level 1 through 5, where Level 1 means there is only one followed link on the page, pointing to a particular destination. Level 5 means everything on the page is followed except for the terms and conditions, the privacy policy, and stuff like that.
A normal news day would be SEO Level 3, which NoFollows about half the links on a page. If something bigger happens, we might set it to SEO Level 1 or 2, and then at night open things back up by going back to SEO Level 5, so that Google can get a wider distribution of the PageRank coming from our strongest pages: the homepages and the section fronts. Eric Enge: Right. The fascinating thing here is that this mechanism got a response that quickly. You are talking about PageRank being reprocessed dynamically. Brent Payne: Yes. The fact that news cycles are involved is only one of my problems in News SEO. Query volume is another. It's not as normalized as ecommerce, where the number of people searching for a Sony DVD player per week or per month doesn't change much, unless it's the holiday season. That makes it considerably easier to track what is and isn't working. With news it's much tougher, because the news cycle itself has a lot to do with how much traffic you are getting from search engines. That being the case, I look at whether we are ranking on the first page for the stuff we wanted to rank for, and last year sculpting was working really, really well by that measure. But then I noticed it was more difficult on inauguration night than it was on election night. I don't know if that's when the change in Google's NoFollow treatment occurred, but I know that on election night our dynamic PageRank sculpting worked great. By inauguration night, something seemed to have changed significantly, and it did not work as well anymore (Editor: for more on how Google changed its treatment of NoFollow, see this post). Eric Enge: Welcome to the world of SEO. Brent Payne: Exactly.
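Editor's note: Here is one way the five SEO Levels could be expressed in code, assuming the homepage links are already sorted by editorial priority. Levels 1, 3, and 5 follow Brent's description (one followed link, about half followed, everything followed except utility pages); the fractions for Levels 2 and 4, the utility paths, and the function name are assumptions. As the interview notes, Google changed how it treats NoFollow, so this is a historical illustration, not a recommendation.

```python
from math import ceil
from typing import List, Optional, Tuple

# Share of editorial links left followed at each level. Levels 3 and 5
# follow the interview; 2 and 4 are assumed interpolations. Level 1 is
# handled separately: exactly one followed link.
FOLLOW_FRACTION = {2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}

# Hypothetical utility pages that stay NoFollowed at every level.
UTILITY_PREFIXES = ("/terms", "/privacy")


def assign_rel(links: List[str], level: int) -> List[Tuple[str, Optional[str]]]:
    """Return (url, rel) pairs; rel is None for a followed link.

    links must be ordered by editorial priority, most important first.
    """
    if level not in (1, 2, 3, 4, 5):
        raise ValueError("SEO level must be between 1 and 5")
    editorial = [u for u in links if not u.startswith(UTILITY_PREFIXES)]
    if level == 1:
        budget = 1  # only the lead story (the H1 link) stays followed
    else:
        budget = ceil(len(editorial) * FOLLOW_FRACTION[level])
    followed = set(editorial[:budget])
    return [(u, None if u in followed else "nofollow") for u in links]
```

Keeping the rating in one number per domain, as the Tribune did, means a producer changes a single setting rather than editing hundreds of links by hand.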
Another thing we have been working on is how we flow PageRank across our different domains, specifically through our topic galleries. The New York Times has a single domain, and they have commented that they wish they could link out more from their site. In December 2008, we literally had eight different sets of topic galleries for the eight different newspapers, with the same topic galleries on each domain. In other words, we'd have eight copies of the same topic gallery, such as Barack Obama, instead of a single Barack Obama topic gallery living on one site. That wasn't very effective, and we weren't seeing anything rank particularly well in the search engines as a result, so we consolidated each topic into a single gallery owned by one particular domain. Since we have 50,000 topic galleries, I had to assign the majority of them somewhat on gut reaction. We moved most of them to the Chicago Tribune, which wasn't necessarily fair, but that's what we did. The Tribune newspapers could request any topic gallery they wanted, and a number of them went through the list of 50,000 topics and grabbed what they wanted. For example, the Orlando Sentinel and the Sun Sentinel grabbed a lot of hurricane and Disney topics, and we divided up other topics geographically or based on news events that were occurring locally. When that happened, our topic galleries suddenly popped, and we now rank on the first page for terms like Barack Obama, Michelle Obama, and even weird things like Delta Airlines, because we are able to cross-link from several domains to one domain. Plus, the topic galleries are pretty compelling content to link to, and linkers now have only one destination instead of eight. But obviously, a lot of the strength comes from within the Tribune network of sites.
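Editor's note: The consolidation only pays off if every property links each keyphrase to the one gallery that owns it. Below is a minimal sketch of that automatic linking, assuming a hypothetical keyphrase-to-gallery map and plain article text. It links only the first mention of each phrase and does not handle overlapping phrases or mentions already inside markup.

```python
import html
import re
from typing import Dict

# Hypothetical map: each keyphrase points at the one gallery that owns it,
# no matter which property's article mentions it.
TOPIC_GALLERIES: Dict[str, str] = {
    "Barack Obama": "https://topics.example.com/barack-obama",
    "Delta Airlines": "https://topics.example.com/delta-airlines",
}


def link_keyphrases(text: str, galleries: Dict[str, str]) -> str:
    """Wrap the first mention of each keyphrase in a link to its gallery."""
    for phrase, url in galleries.items():
        # Whole-word, case-insensitive match on the literal phrase.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        href = html.escape(url, quote=True)
        # count=1 links only the first mention; later mentions stay plain.
        text = pattern.sub(
            lambda m: f'<a href="{href}">{m.group(0)}</a>', text, count=1
        )
    return text


if __name__ == "__main__":
    print(link_keyphrases("Barack Obama met Delta Airlines executives.", TOPIC_GALLERIES))
```

Because every newspaper calls the same function with the same map, all eight sites end up pointing at one URL per topic rather than at eight local copies.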
Currently, in stories across all eight newspapers (and we will soon expand that to all fifty domains), mentions of keyphrases link back to the exact same topic gallery location. On the commerce side, we have an example of the visibility Google has into what we are doing. We launched e-stores inside our newspaper domains, and inside those e-stores we were selling products that probably didn't match our demographic. Belly rings were one of the crazier examples, and to complete those purchases we were linking off to other sites that probably didn't have the best link networks. For the sites we were linking to, it was a thrill to have that much PageRank pointing at them for things like belly rings, and it was a good SEO move on their part. However, within 24 hours of launching the subdomain, I received an email from Google telling me we had either been hacked or were linking to a massive spam network. They asked me if I was sure we really wanted to do this, because it was going to cause problems not only for the subdomain but for the main domains as well. We essentially got a warning from Google that we wouldn't have gotten if we were a smaller set of sites, or if I hadn't established the relationships that allow for this type of communication. I know the BMW situation (Editor: for more info on the BMW situation, read this post) is talked about a lot in terms of how larger sites are treated differently from smaller sites. I absolutely agree that we are treated differently. I know Google can't say that, because it's just not a good PR play for them, but I think it's absolutely true. In this business, you have certain relationships, and if you develop all those relationships, you are going to get advance warning like this, either from a good friend or from a close business contact. I think even CNN would agree that it's the same for them.
How many people have literally a dozen Googlers in their instant messaging contacts? Very few. How many people have two or three dozen Googlers they can contact about problems with their site? Very few. We are lucky that we have that, but we are also a major content destination for queries that people search for on a daily basis. I am not going to be pompous enough to think that Google must have us. I do think a lot of news companies are getting a little caught up in the belief that their content is so unique that Google needs them. Internally, we are exploring certain options that may not be great for SEO, such as paid content subscriptions (a "paywall"), but we really do have a lot of commoditized news for which that's just not going to fly. When CNN is writing about the same story without a paywall, or the New York Times is writing about the same topic, the users coming from Google don't really care where they read that news. They just care that they can find out what's going on. I am a little concerned about the direction news sites are trying to take, and about their attitude that we are uber special and Google needs us. Google doesn't need any one company. They would like to have some companies more than others, but they don't need any specific ones, in my opinion. Eric Enge: In my opinion, even if somebody does need you, that shouldn't stop the selling, because good relationships are things that grow. Relationships where one party is arrogant, or in this case both parties, are pretty problematic, I think. Brent Payne: Right, I agree with that. Unfortunately, about a year ago I was in a scenario where I had a public war with Google. I am not going to get into too many details, but you can look it up in a lot of the press releases, either from Google News or from Tribune.
Lobbing press releases back and forth at one another is not a good situation to be in. Getting a phone call from Google News saying that in five minutes they are going to remove you from the index, because they believe you publicly opted out of Google News, is not a good call to take. Having to try to stop that, and literally get executive-level people at both companies to slow down and have a civilized conversation, is tough. We just have to be careful and really figure out how we want to work together, as there is a lot of money on the table. Eric Enge: Thanks, Brent! Brent Payne: Thank you, Eric!
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US cities and towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com Wed, 18 Nov 2009 14:31:36 +0100 Published: November 15, 2009
Josh Cohen is the Senior Business Product Manager for Google News. He is responsible for global product strategy, marketing, and publisher outreach for Google News, which is currently available in 26 languages and more than 50 countries. Prior to joining Google, Josh was Vice President of Business Development for Reuters Media, the world's largest news agency. While there, he led business development for Reuters' Consumer Media team, including all activities with major strategic partners. He was responsible for agreements with AOL, Google, MSN, Yahoo! and numerous media companies around the world for content distribution, revenue generation, and strategic investments. Before joining Reuters, Josh was Director of Business Development for SmartMoney.com, a joint venture between Dow Jones and Hearst, where he led business development and licensing activities for the site. Cohen holds degrees from the University of Michigan and Columbia Business School, where he graduated Beta Gamma Sigma. Interview Transcript Eric Enge: Can you tell me what your responsibility is within Google? Josh Cohen: I am the business product manager for Google News. I work with other folks on the news team on figuring out our roadmap: what features we are working on, and what we want to do with the product in the next 6 months, 12 months, 18 months, and so on. A big focus of my job is working with people outside of Google: talking to publishers, talking to people in the media and at conferences, and just putting a face on Google News and trying to demystify it as much as possible. I also work with a lot of the different cross-functional teams that interact with publishers on a day-to-day basis, and try to tie those efforts together a little better. Eric Enge: Tell us what Google News is, what it does, and who uses it. Josh Cohen: Google News was launched in beta back in 2002. The idea behind Google News is really similar to what we are trying to do in search.
Not to throw the company mantra at you, but it really is about organizing all the news information out there and making it more accessible and useful for users. We are trying to do this in every single country and in every language. We want as many different sources as possible, so that when people are looking for information, they can find it. Interest in news overall is probably higher than it has ever been. More and more people are getting their news online, so the challenge is finding that information and providing some context and organization. So we really are operating as a search engine specifically for news. Eric Enge: How do you define news versus other types of content? Josh Cohen: We really try to keep it as black and white as possible, and we don't get into qualitative discussions about the nature of a news site. We don't include hate speech or pornography. What we look for is whether the site is covering current events, whether it is specifically covering the topics of the day, whether there is some evidence of an editorial organization, and whether there is at least some editorial review process before something actually gets published. But our bias is really toward inclusion. Eric Enge: Right. So you try to be as broad as possible and include as many different sources as you can. Are you looking for unique content, rather than somebody just republishing stuff off a news wire? Josh Cohen: Absolutely. We don't include pure aggregators; there needs to be some original content on the site. Eric Enge: That makes a lot of sense. What is the process people go through when they want to have their site, or some portion of their site, considered for Google News? Josh Cohen: It is actually pretty straightforward. There is a whole Google News help center for users that explains how it works.
A portion of it is dedicated specifically to publishers and explains how it works and how to submit their content. Ultimately, they simply submit their sites, or the portion of their sites they'd like reviewed for inclusion, and then we take a look. Eric Enge: There is a form people can use? Josh Cohen: Yes. It is located here. Eric Enge: What types of questions are covered in the form? Josh Cohen: There are a few basic questions about the organization itself. We do not make editorial judgments about the nature of the site. It is really up to the user, at the end of the day, to decide whether a site adds value for them. So in the form, we are looking for objective information about the site, not a pitch. Eric Enge: So evaluating whether it is unique news content is something your reviewers do. Josh Cohen: Yes, there is a support team that reviews sites as they come in. There is not a single editor or journalist working on Google News. Once a site is included in Google News and in our index, there is no manual intervention in the rankings. It is all done algorithmically. Eric Enge: Right, but the people who review the site check to make sure that it is unique content as opposed to duplicate content. Josh Cohen: Yes, they ensure that it meets that criterion. A lot of that can be done algorithmically. We understand duplicate content, and we can do a full-text analysis. But yes, there needs to be original content. Eric Enge: Right. And suppose that, for some reason, something goes wrong in the process and the site gets turned down, but the publisher thinks there is a fit and really believes it should be reconsidered. Is there a process you would suggest for that?
Josh Cohen: Our bias is toward inclusion, so if there are things we miss, we certainly want to understand the site better. Eric Enge: I know one example of a site that got turned down, and it turned out that the person who reviewed it had not looked at the news portion of the site. Josh Cohen: That is really why we try to ask for as much information about the site as possible, because the webmaster, site owner, or publisher obviously knows a lot more about it and understands the details. We are looking at thousands of different sites, and that review is the one really manual part of Google News, so the more information we can get about the site during the submission process, the better. Eric Enge: We have heard about other kinds of requirements, like a certain volume of news, for example. Josh Cohen: No. There is no volume requirement in terms of the number of articles published a day or anything like that. Volume can certainly have an impact on rankings, but not on inclusion. We have sites publishing hundreds of articles on a daily basis, and others publishing longer analytical or investigative pieces, just a handful a week. So there is really a pretty wide range. Eric Enge: There is also the notion that the URL needs to have a 3-digit code in it. Josh Cohen: That is correct. There are certain technical requirements, which have nothing to do with the nature of the site, but with the ways in which we can pick up that content. The 3-digit identifier is one of the ways we pick up the news content on a site. As you mentioned, there are sites that have a section devoted to news, while the rest of their content may be inappropriate for Google News. Oftentimes, on those sites, the 3-digit identifier is a way for us to pick up the specific news content, so it is a requirement for crawling that content.
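Editor's note: A rough check for the 3-digit identifier Josh describes might look like the sketch below. It simply looks for three or more consecutive digits in the URL path, ignoring the host name. That reading of the requirement is an assumption, the function name is hypothetical, and the exact rule Google applied may have had additional conditions.

```python
import re
from urllib.parse import urlparse

# Three or more consecutive digits anywhere in the URL path.
_THREE_DIGITS = re.compile(r"\d{3,}")


def has_news_identifier(url: str) -> bool:
    """Rough check for the 3-digit identifier described in the interview.

    Only the path is inspected, so digits in the domain name don't count.
    """
    return bool(_THREE_DIGITS.search(urlparse(url).path))
```

A publisher could run this over a section's article URLs before submitting, to spot pages the crawler might skip unless they are listed in a News Sitemap.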
However, once sites are included in Google News, they can submit a News Sitemap, and if you submit your content via a News Sitemap, you can ignore the 3-digit URL requirement, because we can pick up the content that way. Editor's Note: Since this interview took place, Google News Sitemaps have been updated to a new format. Eric Enge: Do the sitemaps bring any other specific advantages? Josh Cohen: Yes. They don't change the ranking; there is no bias toward a site that submits a sitemap versus one that doesn't. The real benefits of submitting a sitemap are that it gives you greater control over which of your articles appear on Google News, and it allows specific metadata to be communicated about each individual article. Right now that is fairly limited, but we are certainly looking to expand what we do with sitemaps, because the more information we have about a publisher's site, the better. For individual articles, there can be basic information like attribution, bylines, location, and so forth. Ultimately, sitemaps are a really good way to clearly identify the content you want crawled. Most questions a publisher has about the ranking of their content on Google News boil down to a technical issue: we didn't pick up an article, or when we tried to crawl it, it failed the extraction process. So sitemaps are a really good way to ensure that we are crawling that content, and they also allow you to proactively address any of those issues, because you can go right in and see when we are having problems crawling your site, whether it is a technical issue on our side or on yours. I won't say sitemaps eliminate all technical issues, but they can certainly limit the impact of some of them, and they give you a better way of monitoring them.
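Editor's note: The sketch below builds a News Sitemap with Python's standard xml.etree.ElementTree. It uses the tags in Google's currently documented format (news:publication with name and language, news:publication_date, and news:title) rather than the older format in use at the time of this interview. The publication name, URL, article fields, and function name are hypothetical; check Google's current documentation before relying on the tag set.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"

# Serialize with a default namespace and the conventional "news" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)


def build_news_sitemap(articles: List[Dict[str, str]],
                       publication_name: str,
                       language: str = "en") -> str:
    """Return News Sitemap XML for articles with loc, published, and title keys."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for art in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = art["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
        # W3C date format, e.g. 2009-11-15 or a full timestamp.
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = art["published"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = art["title"]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_news_sitemap(
        [{"loc": "https://www.example.com/news/story-123.html",
          "published": "2009-11-15",
          "title": "Example headline"}],
        publication_name="Example Daily",
    ))
```

Listing each article explicitly is what gives the publisher the control Josh describes: only URLs in the file are offered for crawling, whatever their URL format.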
Eric Enge: So sitemaps will reduce errors and will not affect the ranking of included stories, but they can affect whether or not a story is included at all. Josh Cohen: Yes, exactly. And that is a pretty big difference. Eric Enge: Yes, it is. Are there other technical issues that people need to be concerned with to make sure that their news articles are friendly to the Google News crawler? Josh Cohen: There are definitely challenges with images, so there are certain best practices that we encourage publishers to follow. Larger images with good aspect ratios are always easier for us to pick up. More descriptive captions always help, as does placing images near the title, inline, and non-clickable. And for the most part we prefer JPEGs. Another thing is to have relevant, useful titles that help readers and help our crawler know what your page is about. Try not to break up the body of the article, for example by putting dates between the title and the body. These tips are not specific to Google News, but they certainly help for Google News. Eric Enge: These things can also influence click-through. Josh Cohen: Absolutely. Eric Enge: Who are the people who consume Google News? Josh Cohen: The focus of Google News, and I think one of its real appeals, is offering as many different perspectives as possible on a given story. It can be a different political perspective or a different geographical perspective, and there are people who want to understand a story from all the different angles; they really want to delve into it. That is why we cluster articles not by source, but by story. The people who click on a bunch of these different links are by and large the ones who get a lot of value from Google News, because they get that diversity.
Eric Enge: From our experience, that certainly includes reporters and editors from a variety of sites. Josh Cohen: They are certainly heavy users of Google News. There are those who come to the front page and like the fact that we aggregate the top stories on the web, letting them browse the top stories, see what is there, click on them, and go read them on the publisher's site. Looking at those top stories is not dramatically different from going directly to a publisher to look for its top stories. They may just want to see what is out there across the web, from both their favorite sources and sources they don't know. Then there is the other half of our users, who use us quite specifically as a search engine: they type in keywords or news stories they have heard about, whether they heard it in the office, on the web, or somebody emailed it to them, and they want to learn more. They will just type in a name or a few keywords and use it much more as a search. Eric Enge: People also set up news alerts, right? Josh Cohen: Absolutely. They can set up alerts or use our RSS feeds, so there are a number of ways they can keep on top of stories. We see our role not as a destination site, but as a starting point. Our goal, very similar to what we are trying to do with web search, is to help people find what they are looking for and then send them on their way. Eric Enge: One of the subtleties here is that you obviously want a title that entices a click-through. But you also want that title to match whatever search terms are used by the editors you want to reach. Josh Cohen: To be clear, having a clean title matters, and the placement of that title on your page matters, but there are a few different elements that we look at in trying to pick up the correct story.
Certainly, the title matters, but the URL and, most importantly, the text of the article itself matter too. If you have a URL that is somewhat unclear, or the information is not that clear in the body of the article, then the title takes on more weight. These are all different components that we look at; so if you have a URL that carries information and the text is very clear to us, then I would say the title is no more important than the other elements. Eric Enge: Are there other things that go into ranking news stories? Josh Cohen: Yes. There are two separate ranking processes. One is story ranking: what is the top sports story of the day, the top entertainment story, science and technology, and so on. There are a number of factors that go into that, but the easiest way to think about it is that we are really relying on what editors think the most important stories are. What is the aggregate editorial interest in a given story: how many people are covering it, and where are they placing it on their pages? These factors do not affect an individual source's results, but they do influence which story lines we think are most important. So that is story ranking. For article ranking there are a number of signals we try to use: is it original content, is it timely, is it relevant, is this a local story with a local source reporting original content on it? That is not relevant to every single story, but it is something else we look for. Other questions we ask are: is it novel, or is it just a rehash of an earlier article, a story somebody else broke that you happened to publish later? These are hard to detect, but they are increasingly something we are trying to include in our rankings. Then there are also source-specific signals that we try to use.
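The story-ranking idea, how many sources cover a story and how prominently they place it, can be sketched as a simple score. The weighting below is purely a hypothetical stand-in. Each source contributes 1/placement for its most prominent placement of the story. Josh does not say how Google actually combines these factors, and story_interest is an illustrative helper, not a real API.

```python
from collections import defaultdict


def story_interest(coverage):
    """coverage: iterable of (story_id, source, placement) tuples, where
    placement 1 means the source gave the story its most prominent slot."""
    best = {}  # (story, source) -> most prominent placement seen
    for story, source, placement in coverage:
        key = (story, source)
        best[key] = min(placement, best.get(key, placement))

    scores = defaultdict(float)
    for (story, _source), placement in best.items():
        # Hypothetical weighting: more sources and more prominent placement
        # both raise a story's aggregate editorial interest.
        scores[story] += 1.0 / placement
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


coverage = [
    ("quake", "Source A", 1),
    ("quake", "Source B", 1),
    ("quake", "Source B", 3),  # same source, less prominent placement: ignored
    ("vote", "Source A", 2),
    ("vote", "Source C", 4),
]
print(story_interest(coverage))  # [('quake', 2.0), ('vote', 0.75)]
```

Counting each source once per story, at its best placement, keeps a single publisher from inflating a story by running it in several slots.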
This is where volume comes in: what is the volume of publication of original content in a given category? The example I like to use is the business category. There you have the Wall Street Journal, Bloomberg, and Reuters, each of which is probably publishing hundreds of original business stories on any given day. By itself, that is a decent signal that each is a quality source in that category. You can then compare that with their volume of original content in the sports category, where you are probably not going to see much, if any, original publication. I would say another really important signal for us in recent quarters has been user behavior. It has become a really helpful signal in trying to determine that same trusted quality of a given source. In a given cluster, the first link gets the most clicks, the second gets fewer, and the third, the fourth, and so on keep getting fewer and fewer clicks. But suppose a user comes in and, instead of clicking on the first link, which is what they were "supposed to do," clicks on the fourth link. That is a very strong signal about both the source they clicked on and the three sources above it that they didn't click on, even though they were "supposed to." Over time, as you aggregate that information and normalize it for the different click positions, you can look at it section by section to get a sense of which sources users feel are the best in given categories. Again, sticking with the business example, suppose some random source is the #1 link in Google News and Reuters is the #3 link. Somebody may come along and say, "Wait a second, this is a business story, I want to see what Reuters has to say," and click on the link in the third spot. That type of behavior happens again and again, and it has become another important signal.
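The position normalization described here can be illustrated with an observed-versus-expected click ratio. First compute the average click-through rate for each result position. Then compare each source's actual clicks with the clicks it would be expected to get at the positions where it was shown. A ratio above 1.0 means users chose the source more often than its position alone would predict. This is a common way to correct for position bias, not necessarily the method Google uses. The data layout and the function name source_click_scores are hypothetical.

```python
from collections import defaultdict


def source_click_scores(impressions):
    """impressions: list of (source, position, clicked) tuples, one per link
    shown, with clicked being 1 or 0. Returns observed/expected clicks per source."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for _source, position, clicked in impressions:
        shown[position] += 1
        clicks[position] += clicked
    # Average click-through rate for each position, across all sources.
    expected_ctr = {pos: clicks[pos] / shown[pos] for pos in shown}

    observed = defaultdict(float)
    expected = defaultdict(float)
    for source, position, clicked in impressions:
        observed[source] += clicked
        expected[source] += expected_ctr[position]
    # Skip sources whose positions never drew any clicks (expected == 0).
    return {s: observed[s] / expected[s] for s in observed if expected[s] > 0}


impressions = (
    [("random", 1, 1)] * 2 + [("random", 1, 0)] * 2  # slot 1: clicked 2 of 4 times
    + [("reuters", 3, 1)] * 2                        # slot 3: clicked 2 of 2 times
    + [("other", 3, 0)] * 4                          # slot 3: never clicked
)
print(source_click_scores(impressions))
```

In the sample data, the source in the first slot gets exactly the clicks its position predicts and scores 1.0. Reuters, in the third slot, gets every click it was offered and scores about 3.0. The never-clicked source scores 0.0.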
Now, that doesn't trump everything else; all these other scores and factors still matter, but all things being equal, we certainly want to take into account some of the qualitative aspects of a source. We try to determine the qualitative nature of a source algorithmically, in addition to the story-specific signals. Eric Enge: Are inbound links a factor? Josh Cohen: Not really. They are obviously a signal on the search side of things; with PageRank, as you know, links are certainly an important factor. On the news side, news comes out so quickly that building up links over time just isn't all that applicable. Eric Enge: What about social media signals, such as Twitter? Josh Cohen: There is nothing specific I can say about those, but I think it is safe to say that we are always looking at new signals. We will always keep working on this, because it remains imperfect. We will test certain signals and run evaluations against them, as we did with user click behavior. Eric Enge: Is there anything you can say about plans for Google News? Josh Cohen: We are experimenting in a number of different ways. For example, we launched Fast Flip two months ago. With Fast Flip we tried to introduce the element of serendipity that you get in the offline world. When you pick up a paper and see the top stories, you may spot an article at the bottom of the page. It is something you would never think to read or look for, but you read it because you spotted it. How do you bring some of that quality into the online experience? Fast Flip is an attempt to do that. Another key component is the speed with which you can browse those pages. If a page takes five to ten seconds to load, you are not going to want to explore different types of content.
Fast Flip is an attempt, both in how it is presented visually and in how quickly it loads, to bring some of the best of the offline experience online. That is a good example of the things we are experimenting with, and we like to keep innovating and finding ways to help our users and work with our partners. Eric Enge: From my perspective, for a publisher looking to get exposure for what they are doing, implementing a quality, relevant news feed and working with Google News is an outstanding opportunity. You get visibility that a lot of people would die for. Of course there is an expense in implementing such a news feed. You have to do a quality job, because you don't want to get in front of people and then have them say this is crap. Josh Cohen: I think that is well said. The way we look at it, it is a real partnership with our publishers. We are a search index focused on news, but we don't have any content, we don't have editors, we don't have any journalists, and we don't create any information. We get that from the publishers. For publishers, we think we bring value by helping them get found and driving traffic to them. In a given month, Google News sends almost a billion clicks to publishers worldwide. Eric Enge: Better still, a significant percentage of that is from news editors and bloggers. So not only are you getting traffic from Google News, you are also getting the chance of being written about in other news outlets. Josh Cohen: Sure, getting written about by others in the market is interesting, but we also help publishers gain loyal users, who may like the aggregation qualities of Google News but will discover their content and like it. Eric Enge: Thanks so much for taking the time to speak with me today, Josh. Josh Cohen: Thank you!
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com Thu, 17 Sep 2009 20:59:22 +0200 Published: September 17, 2009
Chris Silver Smith is the Director of Optimization Strategies at KeyRelevance. Chris has extensive background experience in search engine optimization and Internet application development and he is a regular speaker at Search Marketing Expo, Search Engine Strategies, American Marketing Association seminars, and other technology and marketing conferences. Chris previously worked as Lead Strategist at Netconcepts where he provided search marketing consulting and product development for their GravityStream automated optimization software. Prior to that, Chris served as Head of the Technology Department for Verizon's Superpages.com sites, leading teams which focused upon advertising applications, taxonomic development, usability, user interface design, and more. Chris worked at Superpages for over a decade and his projects included work on R&D, map-based search, Campus Area Yellow Pages, weather forecasting systems, ecards, XML APIs, RSS feeds, mobile applications, city guides, and more. While at Superpages.com, Chris founded their extensive SEO program, initiating research into increasing natural search referral traffic as far back as 1997. In 2004 he was honored for this work with the corporation's highest award, the Verizon Individual Excellence Award, for increasing site visits and associated click-through revenues by many millions of dollars. In 2006, Chris went on to found and chair the Idearc SEO Council, pulling together individuals from across the organization who worked on elements of natural search optimization. Interview Transcript Eric Enge: What are the most important things for people to be concerned about when they want to rank well in local search? Chris Silver Smith: In my opinion, local search optimization is based upon a foundation of regular search engine optimization, but there are a number of additional factors beyond what we see in regular SEO, so there is a little bit more complexity here.
Some of the same classic SEO ranking factors, including well-formed titles and good H1 tags on pages, also feed into local search rankings. But there is also a way to rank well within local search that does not depend on having a website at all. Even if a business doesn't have a website, it can rank within local search because it may have listings in business directories, and those listings get indexed and ranked by various local search engines. (For more on this, see Chris's article on how Google Maps may've switched from ranking sites to ranking businesses, independent of website.) Eric Enge: That's one of the big problems in local search. There is very little control over the way that data is maintained and propagated across online and offline directory listings. At times it seems almost like a mass of random Brownian activity. Chris Silver Smith: That's a really good description of it. All these different little pieces of information are in a way feeding off of and influencing one another, and then showing up in different places within search results. All of them may have something to do with the business you are particularly interested in. Eric Enge: Right. There are actually a lot of things that can go wrong. Someone may mistype a phone number or website address, or the business may change locations or go out of business. These are just a few examples of things that make the data go bad. Chris Silver Smith: Correct. There are many factors, and you've already touched on a few of them. Another example is that there may be multiple ways to write a street address. I've seen this happen in big cities and small towns alike. In big cities like New York City, for example, there could be an East version and a West version of a street or an avenue. Local businesses could be writing their street address the way locals refer to it, without the East or West.
As a result, Google and other mapping systems may pinpoint the address in the wrong spot along the street. That has affected me personally, and it affects consumers very heavily in general, because it undermines their trust in online directories, online mapping systems, and local search systems. It can also affect someone's overall assessment of a business: if they go to an address and the business is not there, they get irritated. They are not going to have much tolerance for this, and even if they eventually find the business, their opinion of it has probably suffered. The difficult thing is that this may have nothing to do with anything the business did, yet it can affect a customer's appreciation and review of the business. It can hurt customer satisfaction before the customer has even arrived. The mapping issue has been one of the biggest problems in local search since the early days of online local search. There are many reasons why businesses may be located incorrectly on maps, and it's the area Superpages received more complaints about than any other over the years. Of course, we didn't even make our maps in-house. We used a few different external mapping systems, and each of them had its own problems and issues. All of those factors feed into the process of getting listed and located correctly. They affect a local search engine's or directory's ability to canonicalize listings, that is, to gather all the different specks of a business's information into one particular listing. I refer to these Brownian specks of information floating around as a sort of "constellation of local search information sources". In a recent article, I wrote that if the search engine successfully associates all those diverse specks of information with one another, they can help to build a particular business's rankings within the search results.
Eric Enge: One issue I think most people don't understand is that if a search engine isn't confident that a business is at a particular location, its enthusiasm for ranking that business highly for related searches is going to drop. Chris Silver Smith: That's true. Eric Enge: This can be a problem even for a business that goes in and corrects its listing at Superpages, for example. It's great that the listing is correct there, but it could still be wrong at yellowpages.com, business.com, or thousands of other websites. The business then has one place where its listing is correct and a bunch of other places where it's wrong, and what's worse, those wrong listings differ from one another. So if the search engine sees a bunch of data and isn't really confident that it is accurate, that can be a ranking issue. Chris Silver Smith: I can describe how that works to some degree. Search engines, local directories, and online yellow pages use a variety of methods to associate the business information they get from multiple sources with the same listing. They do a few different things, including comparing the business name, street address, and phone number. The phone number has typically been considered less variable than the other information coming from all the different business sources. As a result, it may be used to associate those various pieces of information with one listing. But there are cases where people start adding phone numbers, and then these directories don't know which is the primary number for a particular listing. In addition, some businesses use various tracking phone numbers to measure how many calls they get from the different types of promotion and advertising they do.
They might use a different phone number in their newspaper ad, in each of the different print yellow pages directories, and in each of their online ads, so they might have a whole series of different phone numbers showing up across these different media. If the phone number differs, search engines may have difficulty associating the same business's information unless the other pieces of information, such as the business name and street address, are identical or very similar. There are additional problems with that, however, because there are many variations in the way people cite street addresses, as I mentioned before. Google may think that a street address is on "Highway 1", for example, whereas the more common name locally may be "Main Street" or some other alternate name. I've seen cases where streets have three or four different possible names or spellings, and an address could have multiple different businesses at the same street number, as in a shopping center, a large office building, or a shopping mall. This is how it gets more and more difficult for the search engine to associate these pieces of information with one listing. Directories also cite business names in multiple different ways. Doctors, for example, are often listed last name first. There are also other cases where people use variations of a business name, and they may all be valid, just different and used for different reasons. So if a business has different listings in InfoUSA and in Superpages, associating those pieces of information gets very difficult. And if a business has one listing associated with one website and another associated with an alternate website, that will not give it a good chance to rank well in the search results. The same is true for reviews and ratings.
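The matching problem described here can be made concrete with a toy version of name/address/phone association. The rule below is a hypothetical illustration, not any directory's or Google's actual logic. Two listings are treated as the same business if their normalized phone numbers match, or if both the normalized name and the normalized address match. The abbreviation table and the helper names are likewise assumptions.

```python
import re

# Hypothetical abbreviation table; real systems use much larger ones.
ADDRESS_ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "highway": "hwy",
    "east": "e", "west": "w", "north": "n", "south": "s",
}


def normalize_phone(phone):
    """Keep digits only; assume 10-digit North American numbers."""
    return re.sub(r"\D", "", phone)[-10:]


def normalize_text(text, table=None):
    """Lowercase, drop punctuation, and optionally map words through a table."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if table:
        words = [table.get(w, w) for w in words]
    return " ".join(words)


def same_business(a, b):
    """Hypothetical rule: same phone, or same name and same address."""
    phone_a, phone_b = normalize_phone(a["phone"]), normalize_phone(b["phone"])
    if phone_a and phone_a == phone_b:
        return True
    return (normalize_text(a["name"]) == normalize_text(b["name"])
            and normalize_text(a["address"], ADDRESS_ABBREVIATIONS)
            == normalize_text(b["address"], ADDRESS_ABBREVIATIONS))


listing = {"name": "Joe's Pizza", "address": "12 East 4th Street", "phone": "(212) 555-0100"}
ad_copy = {"name": "JOE'S PIZZA", "address": "12 E. 4th St.", "phone": "212-555-0199"}  # tracking number
neighbor = {"name": "Main St. Dental", "address": "12 East 4th Street", "phone": "212-555-0142"}
print(same_business(listing, ad_copy), same_business(listing, neighbor))
```

The tracking number defeats the phone match, so the ad copy is only linked to the listing because the name and the abbreviated address still agree. The neighbor shares the address but not the name, which is the shopping-center case. No abbreviation table will connect "Highway 1" with "Main Street"; that kind of alias needs outside data.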
If three or four different online business directories have reviews and ratings for a business, those can't be collected under a single unified rating when Google pulls all the pieces of information together. In that situation the business is not going to have the best chance to rank highly. Eric Enge: I interviewed Pankaj Mathur of InfoUSA a few months back, and he told me that InfoUSA has 14,000,000 different businesses listed in its data. Google has four or five times that many, almost one business for every five people in the United States. Google's number is so much higher because it gets its data by crawling the whole web, whereas InfoUSA works to verify and confirm every listing it has. Chris Silver Smith: There are a couple of reasons why Google might show so many more listings than InfoUSA. One could be that Google has a variety of different information sources and is having trouble collapsing duplicate listings together. Another is that Google is taking information from sources that are not as high quality as InfoUSA. InfoUSA is one of the very few business listing aggregators that actually calls every single business in its directory once a year to verify its information. The problem is that businesses fail all the time, and they are not very good about notifying directories that they are no longer in business. Those old, dead listings get left in directories everywhere because there are no good processes to delete them comprehensively. Unfortunately for Google, this is one case where "having more" search results is actually an indication of inferior quality. Eric Enge: We have established that if a listing aggregator isn't managing and monitoring its data, the data will eventually go bad, so what's the best way to clean it up? Chris Silver Smith: There are a few different ways, and unfortunately there is no universal approach that will work for everyone.
One of the best ways is to try to clean it up with a main data aggregator such as InfoUSA or Acxiom. They provide their data to many of the most important places, including Google, Superpages, and many other directories. If the business goes to those main data aggregators and gets its listing information updated, that's great, but it can be challenging because those companies are not set up to deal with a lot of small businesses. They rely on getting information back from some of their data partners, including superpages.com, yellowpages.com, and Google. The best thing to do is take a shotgun approach: make a list of all the top directories, then periodically check the business's information in each one to see how it is showing up. I've had clients who were not very careful about this. They would check only one listing that looked right and was ranking well within that directory. If they had looked more closely, however, they would have found a handful of other associated listings that were also showing up. One way to solve that problem is to search by phone number, if the directory offers reverse search. Businesses have to ask themselves whether some other listing might be showing up under their phone number. I even found one egregious case where one of my client's listings had been hijacked by a competitor who added their own URL to the listing in an attempt to steal my client's referrals! Eric Enge: Right, they are using the phone number to hijack the listing. Chris Silver Smith: Yes, exactly. Eric Enge: That gets back to what you said before about how most people assume a company's phone number is the item least likely to change. Chris Silver Smith: Right. Eric Enge: So working with InfoUSA and Acxiom, and also working directly with the major yellow pages sites, would be the basic recipe. Chris Silver Smith: That's correct.
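Part of the shotgun-style audit can be automated once listings have been collected from each directory. Collecting them is out of scope here. The listing records below are made-up sample data, and suspicious_listings is a hypothetical helper. The sketch does a local "reverse search". It keeps every listing that carries the business's phone number, in any formatting, but points to a different website, which is the hijacking pattern Chris describes.

```python
import re


def digits(phone):
    """Digits only, last ten kept (assumes North American numbers)."""
    return re.sub(r"\D", "", phone)[-10:]


def suspicious_listings(listings, my_phone, my_url):
    """Return listings that carry the business's phone number but point to a
    different website. Listings with no URL at all are not flagged."""
    target = digits(my_phone)
    return [
        item for item in listings
        if digits(item["phone"]) == target and item.get("url") and item["url"] != my_url
    ]


listings = [
    {"directory": "A", "phone": "(508) 555-0101", "url": "http://plumbing.example.com"},
    {"directory": "B", "phone": "508.555.0101", "url": "http://competitor.example.net"},
    {"directory": "B", "phone": "508-555-0199", "url": "http://unrelated.example.org"},
    {"directory": "C", "phone": "5085550101", "url": None},
]
for item in suspicious_listings(listings, "508-555-0101", "http://plumbing.example.com"):
    print(item["directory"], item["url"])
```

Running the same phone normalization over every directory's results is what catches the second directory B listing. A check that only looked for a listing that "looked right" would have missed it.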
All of those data sources are really good places to check local listings, and I have seen variations in how they've operated over time. Localeze could also be a very good partner if a business doesn't have a lot of time and wants to pay someone to get its information updated in all those different places. They are a little odd in my view because they straddle the line between being an information provider and an advertising publisher for businesses. They are selling on both ends of the equation: information to business directories and search engines, and advertising to end-users. Eric Enge: Of course, companies like InfoUSA and Acxiom started out as direct marketing companies, so they sell lists for people to mail, email, or call. Chris Silver Smith: Yes, that is correct, but they are not really selling advertising to small businesses, at least I don't think they are. Eric Enge: Can you talk a little bit about links and web references in the context of improving ranking? Chris Silver Smith: Google is using as many ranking factors as possible in an effort to broaden its sources and stop people from exploiting the local search results. Google wants to give fair ranking status to all businesses and to provide high-quality information to its end-users. It doesn't want to be exploited through simplistic ranking routines, as we've seen in the past, so it doesn't overemphasize any single ranking factor, such as inbound links. Inbound links used to be Google's web reference, or citation, method of choice, and they were the foundation of its PageRank algorithm. But Google has recently broadened to a larger spectrum of ranking factors that help influence what should rank best. In terms of ranking businesses, the old methods used by business directories were proximity and alphabetical order, and of course some directories also ranked businesses by how much they spent on advertising.
I bring up online yellow pages in the evolution of local search because they were the only sources of local search before local search engines and map-based search engines were developed. Those two ranking criteria, proximity and alphabetical order, were the original dominant ranking methods. Google changed that paradigm and tried to make keyword relevancy the primary ranking factor, which was a really interesting development. It has since broadened beyond mere keyword relevance to also take factors like ratings and a business's popularity into account. Popularity is a very vague notion and very difficult to define or quantify, but Google uses a number of different methods to try to measure it as effectively as possible. One of those is to measure how often people refer to a business, so a business that is talked about or referred to regularly would rank higher than one that is not. Google may be looking at many different sources of information for these types of citations. These could include how many times a particular business is mentioned in blog posts, on platforms like Twitter or Facebook, or in news stories. Some people even claim that Gmail could factor into this, based on how many times businesses are mentioned in emails. I think that could be a really compelling signal for Google to use. When we talk about citations or web references online, there are a number of different types of references. Links used to be the basis of Google's PageRank algorithm. They continue to be a factor in ranking businesses in local search results, as well as regular web pages in web search.
There are additional types of references that Google is now allowing to influence rankings within Google Maps. These include how many times a business's phone number is referred to across web pages, blog postings, and news stories. Google may also consider how many times an identifiable business name or URL is mentioned in relation to a local area. Many news feeds and newspaper sites have a policy of not linking, but they will occasionally mention a URL in plain text within a news story. Another factor that influences rankings within Google Maps is how many times the address itself is mentioned across the web. I covered this in an article I wrote on using reverse search for local search optimization. If there are two different businesses across town from each other, but one is in a much more popular location, then perhaps that business should be ranked more highly. This is especially applicable in tourist hotspots, like next to Grauman's Chinese Theatre in Los Angeles, or in a shopping center that houses various other businesses. Those businesses could be considered more popular because the street address of that shopping center is referred to over and over again in many different media sources. So even if nobody mentions a particular restaurant there by name, it might have a better chance of ranking higher because it's in a popular location. If there is a particularly popular shopping district, even mentioning it and associating it with the business could be very worthwhile. There are a handful of different types of web references, and they seem to be getting weight within Google Maps. Eric Enge: There is a fairly strong consensus that these types of web references are a factor in local search. Is there any reason why they can't use that kind of signal in regular web search?
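Counting unlinked references of the kinds listed here (phone number, business name, street address) comes down to matching each one in page text regardless of formatting. The sketch below is a hypothetical illustration over text that has already been fetched. It says nothing about how, or whether, Google weights such mentions. The helper names and the sample pages are invented for the example.

```python
import re


def phone_pattern(phone):
    """Match a 10-digit phone number written with (), spaces, dots or hyphens,
    but not as part of a longer run of digits."""
    d = re.sub(r"\D", "", phone)[-10:]
    return re.compile(rf"(?<!\d)\(?{d[:3]}\)?[\s.\-]*{d[3:6]}[\s.\-]*{d[6:]}(?!\d)")


def count_citations(pages, name, phone, address):
    """Count pages that mention the name, phone number, or address, linked or not."""
    phone_re = phone_pattern(phone)
    counts = {"name": 0, "phone": 0, "address": 0}
    for text in pages:
        lowered = text.lower()
        counts["name"] += int(name.lower() in lowered)
        counts["phone"] += int(phone_re.search(text) is not None)
        counts["address"] += int(address.lower() in lowered)
    return counts


pages = [
    "Lunch at Luigi's Trattoria, 100 Main Street. Call (555) 010-2000.",
    "Parking near 100 Main Street is limited on weekends.",
    "Reservations: 555.010.2000",
]
print(count_citations(pages, "Luigi's Trattoria", "555-010-2000", "100 Main Street"))
# {'name': 1, 'phone': 2, 'address': 2}
```

The second page never names the restaurant. It still counts as an address mention, which is the "popular location" effect described above.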
Chris Silver Smith: They certainly could, particularly since those local search ranking factors could easily feed back into regular web search. If a business is ranking well in local search, why wouldn't the same types of ranking factors feed back into the regular web keyword search results? Some of those references could be feeding into the overall keyword rankings for pages. A business's phone number could be associated with its website, and perhaps that could feed into the overall rankings, spread across the pages of that business's website. Eric Enge: In simplistic terms, a newspaper article could mention a URL without making it a link. That's a very simple case of a web reference that a search engine could easily treat as a vote. Chris Silver Smith: That's right. Eric Enge: One might also have high-authority sites that implement links but use Nofollow, which is supposed to stop link juice from being passed. If the search engine can tell from context that the Nofollow is just a site-wide policy and still wants to treat the link as an endorsement, it can. Chris Silver Smith: That's exactly right. Many of us in the SEO profession have the sense that even Nofollowed links might have some level of value. I think Google could consider Nofollowed links, and, to some degree, mentions of URLs in text that is not hyperlinked. I think Google probably weights those two types of references lower, and probably values links from different sources at varying levels. Those Nofollowed links could still pass a fractional amount of PageRank compared to a normal link from the same site. Wikipedia is probably the biggest example of this at this point.
If Google didn't pay attention to links within Wikipedia pages, then I don't believe we would see those links and their anchor keywords listed within Google Webmaster Tools, but that's exactly what we see in practice. If someone has added a valuable link to Wikipedia for a valid reason, they could then see that link appearing within the metrics shown in Google Webmaster Tools. I don't think Google would be listing these links if it weren't going to use them in some way, so I believe they have some influence. Google probably just weights them a little less heavily. Eric Enge: Even if the links aren't Nofollowed, Google certainly knows when a link is coming from a blog or from a user's comment. These are things that they think about all the time. To bring it back around to local search, can you talk a little bit about how social media plays into this situation? How can people use social media to help with their web reference campaign? Chris Silver Smith: The social media services out there are the closest thing Google has to word-of-mouth endorsements, which are the gold standard that Google would ideally like to use as a relative ranking signal. Since they obviously can't hear what we are all saying to each other all the time, the next best thing currently is all the different types of social media. We believe that Google will see these as signals of relative popularity. Google has been interested in this with blogs, in terms of "burstiness". They might look very suspiciously at a site that suddenly got thousands of inbound links. If all those inbound links came from credible sources, however, that is considered normal bursty behavior for something that abruptly appeared on the scene and became popular overnight.
Social media is a good source for this, and we can see that Google is interested in it because many people believe its recent search engine development, the "Caffeine" test platform, was geared in large part toward absorbing highly bursty messaging from sites like Twitter. Google is trying to absorb that content and feed the information into the system, not just so those little pieces of information can be available, but also so they can influence the rankings of other pages. Eric Enge: It's great stuff and a great opportunity for people. I think we are still at a stage where a lot of bias is introduced by the people who use the various social media platforms, which are not yet used by a large percentage of the people who use the web. Still, it's a signal, and you can count on Google using those types of signals. As those platforms grow and get broader, Google is likely to give them more and more weight. Chris Silver Smith: I think that's right. They see a future in all sorts of social media, even if it's a little bit undefined right now. Eric Enge: Thanks Chris! Chris Silver Smith: Thank you Eric!
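To make the distinction between the reference types discussed in this interview concrete, here is a minimal Python sketch that tallies, for a single HTML page, followed links to a business's domain, links marked rel="nofollow", plain-text (unlinked) mentions of the domain, and mentions of the business's phone number in common US formats. It is an illustration under stated assumptions only: the class and function names, the sample domain, the fictional phone number and the phone pattern are all hypothetical, and nothing here reflects how Google actually detects or weights citations.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern for US-style numbers: (508) 555-0100, 508.555.0100, 508-555-0100
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


class ReferenceCounter(HTMLParser):
    """Tally followed links, nofollow links, unlinked mentions and phone mentions."""

    def __init__(self, domain, phone_digits):
        super().__init__()
        self.domain = domain.lower()
        self.phone_digits = phone_digits          # digits only, e.g. "5085550100"
        self.counts = {"followed_links": 0, "nofollow_links": 0,
                       "unlinked_mentions": 0, "phone_mentions": 0}
        self._link_depth = 0                      # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        self._link_depth += 1
        attrs = dict(attrs)
        href = (attrs.get("href") or "").lower()
        if self.domain in href:
            # rel is a space-separated token list, e.g. rel="nofollow external"
            rel_tokens = (attrs.get("rel") or "").lower().split()
            key = "nofollow_links" if "nofollow" in rel_tokens else "followed_links"
            self.counts[key] += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._link_depth:
            self._link_depth -= 1

    def handle_data(self, data):
        # A domain name in text that is not inside a link is an unlinked mention
        if not self._link_depth:
            self.counts["unlinked_mentions"] += data.lower().count(self.domain)
        # Normalize each phone-like match to bare digits before comparing
        for match in PHONE_RE.finditer(data):
            if re.sub(r"\D", "", match.group()) == self.phone_digits:
                self.counts["phone_mentions"] += 1


def count_references(html, domain, phone_digits):
    parser = ReferenceCounter(domain, phone_digits)
    parser.feed(html)
    parser.close()
    return parser.counts


page = ('<p>Call (508) 555-0100 or visit example.com.</p>'
        '<a href="https://example.com/">Example</a>'
        '<a rel="nofollow" href="http://example.com/menu">Menu</a>'
        '<p>Phone: 508.555.0100</p>')
print(count_references(page, "example.com", "5085550100"))
# {'followed_links': 1, 'nofollow_links': 1, 'unlinked_mentions': 1, 'phone_mentions': 2}
```

Normalizing phone numbers to bare digits is what lets "(508) 555-0100" and "508.555.0100" count as the same citation; how much each category should be worth is exactly the open question the interview discusses.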
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com Thu, 06 Aug 2009 20:02:55 +0200 Published: August 6, 2009
As vice president of product at Oyster Hotel Reviews, Eytan leads product management, planning, design, and usability of the Oyster.com site. Prior to Oyster Hotel Reviews, Eytan was a principal program manager lead on Microsoft's Live Search team (now Bing.com), a position he held for four and a half years. This team was responsible for the vision, planning, and specifications that power the web search software for Live Search, including all aspects of Live's core ranking and relevance measurement. Earlier in his career at Microsoft, Eytan worked as a program manager on the Microsoft CRM team, driving program management of functionality related to the core user experience and integration with back-office applications. Eytan received degrees in computer science and history from the University of Pennsylvania. Interview Transcript Eric Enge: Why did you pursue the idea of a high-end hotel review site? Eytan Seidman: The idea was pretty straightforward, really. It came out of a personal experience, like many ideas do. My co-founder Elie and I were staying in a hotel in Alaska that we found online, and it turned out to be nothing like what we had imagined based on what we read and saw online. It was 2007, and we could not figure out why it was so different. As we dug deeper, we found a gaping hole: there was no high-quality content in the hotel space. We then decided to hire journalists trained to look critically and objectively at hotels: to sleep in their beds, swim in their pools, try the spa, experience the entire hotel, take 500 to 1,000 photos of the hotel and write a very detailed, very structured, 2,000-word review. We basically send people out to hotels undercover, and we pay their full cost. We typically narrow the pictures down to 200 to 500, but sometimes we keep more. All of our pages are very well structured. The second piece of our success is having great search on our site, and then also getting indexed by the engines.
The search doesn't even require people to specify a hotel's location; they can literally just search for "spa" and see hotels with spas all across the world on an interactive map. We are trying to introduce something very new, and we think it will continue to scale upwards. We've introduced what we think is a new way of searching for and finding hotels, and we think people might be interested in it. Eric Enge: One of the problems with traditional travel review sites is that visitors don't know what's motivating the reviewer who provided the content. If reviewers love the place, they are going to take a lot of pictures of beautiful scenic views, and none around the garbage cans or behind the kitchen. Eytan Seidman: That's correct, yes. The other approach out there for sites like ours is to use user reviews. We definitely think of user reviews as a complement to what we do, but the problem lies in exactly what you mentioned earlier. Looking at a user review, the real challenge is trying to figure out if the "user" is in fact a real, unbiased customer, and those biases can come in many different forms. For example, if someone who only stays at $1,000-a-night hotels found a four-star hotel to be a total dump, it may simply be because that person is used to staying in very, very high-end hotels. Eric Enge: One possible solution I've heard suggested for this problem is to find other reviews by the same person to try to uncover any biases or tendencies. Eytan Seidman: That's right, yes. Another thing to note about hotels is that most people don't go to them very frequently. Therefore, on most user-review sites, there are probably about fifty different reviewers who have each been to one or two hotels. We have a handful of reviewers who have each been to 50 or 100 hotels, and they benefit from having been to a lot of hotels and having that perspective.
So it isn't impossible to maintain a successful site using only user reviews, but there is the challenge of trying to figure out the reviewer's perspective. Eric Enge: How many roaming reporters do you have at this point? Eytan Seidman: We have roughly 11 or 12 reporters. Eric Enge: How long do they typically stay at a hotel? Eytan Seidman: They typically stay at least one night, and very often two nights. Eric Enge: What sort of things do they typically check for? Eytan Seidman: It's a very, very structured approach. They literally go through it with a checklist. In terms of the room, they check for cleanliness, size and service, and they try to do all the things that regular guests would do. For example, we have them call down to the office and ask for some additional towels, and then note how long it takes to get the towels. It's basically a whole set of tests around every single feature of the hotel. Whether it's a pool, a golf course or tennis courts, our reporters will experience and evaluate every feature that the hotel offers. They also try to get a sense of whether or not the hotel is good for families by talking to guests and looking at all the features offered for families. They will even check if the room has a crib, or if the room is big enough to put a crib in. Going beyond families, they will evaluate the hotel for honeymooners or for those looking for a party scene. The overall feel of the hotel and its ambience are evaluated as well. Location is also very important, of course, whether the hotel is in a remote area or an urban one. Finally, our reporters will evaluate the hotel's food. If the hotel has one or more restaurants, they will visit all of them. If the hotel is all-inclusive, they sample the food so future guests will know its best options. Our reporters also take note of the restaurants and food options available in the area surrounding the hotel.
Eric Enge: One of the important things I think you mentioned earlier is looking at a hotel from multiple perspectives. Obviously your professional reviewer isn't a honeymooner, a business traveler and a family traveler all at the same time, but at least they can try to take the perspective of each of those audiences. This is something that a user-generated review on another website would never do. Eytan Seidman: Correct. Eric Enge: You mentioned earlier how the data on all your pages is structured, so people know what kind of information they are going to find and how to find it. Can you expand a little bit on that? Eytan Seidman: Looking at a review page, a user will see links to descriptions and evaluations of the scene, service, location, beach, rooms, features, family options, pets, cleanliness and food, among others. So a pet owner could click the pet section and see if pets are allowed, if there is a charge, if pets can be left in the room alone, and other things like that. All those sections are well defined and appear for every hotel, so as people navigate through our site, they are always going to see the same sections and options. We have hyperlinked our site pretty aggressively to other content as well, including photos and other hotel websites, because it is the best way to provide the most information to our users. In that way, our site follows the Wikipedia model to some extent. Eric Enge: How many hotels have you reviewed so far? Eytan Seidman: We have reviewed about 400. We have 340 reviews on our site right now, and we are going to be adding about 70 or 80 in the next week or two. Eric Enge: Do you have information for hotels in New York City now? Eytan Seidman: New York is live, yes; we launched it about two weeks ago. Eric Enge: And Las Vegas is next? Eytan Seidman: Las Vegas is next, yes. We'll add about 80 hotels from Las Vegas. Eric Enge: I imagine your plan is based primarily on where the most demand is? Eytan Seidman: Yes, that's right.
We look at where there is going to be a lot of demand and high-involvement decisions. Obviously, Las Vegas, New York, Miami and the Caribbean are all very high-demand leisure destinations. Eric Enge: What is your goal in terms of numbers going forward? Eytan Seidman: Our main goal is simply to keep growing our base. We are already seeing good traction, and there are many, many hotels in the world; we've only started to scratch the surface. In Las Vegas they have everything from midrange hotels to very high-end hotels, but our goal is to keep increasing our coverage and the number of destinations we cover. Eric Enge: So I imagine the number of professional reviewers you have will scale up too? Eytan Seidman: Correct. Eric Enge: Do you have a plan for revisiting hotels you have already covered? Eytan Seidman: Yes, it is built into our model. We will revisit hotels periodically, but obviously the big thing that will require a revisit is a major renovation. There are already hotels that we've stayed at right after a major renovation, and we'll revisit those hotels in the future. Eric Enge: Of course new ownership could warrant a revisit as well? Eytan Seidman: New ownership would also, yes. There are a number of things we look to flag that could lead us to revisit a hotel. Eric Enge: What is your revenue model? Eytan Seidman: We don't make any money today. There are no ads on our site yet, but the hotel industry is a very, very large one, and a key revenue model over time will definitely be advertising. There are a lot of people who would want to advertise against our content. When you go on a trip or a vacation, the hotel is probably one of the largest, most important and most complex purchases you make. Hotels are not really a commodity, if you will, so there are a lot of advertisers that want to advertise against that.
For example, JetBlue may want to advertise to people who are looking for hotels in the Caribbean because it flies to those destinations, and tourism bureaus may want to advertise to attract people to their respective destinations. Eric Enge: How was all of this funded, and how are you going to keep funding it going forward? Eytan Seidman: We are funded by Bain Capital Ventures. We raised about $6.4 million from Bain, and we also raised some money from another, smaller fund called Accelerator Ventures. In time, we will produce revenue, and that revenue will fund the company. Eric Enge: I saw in an interview you did that it will take $40 million to break even. Is that still your projection? Eytan Seidman: Yes. As you pointed out, on the human side we are going to need more and more reviewers, so that will probably continue to require more and more capital. I believe that the reward at the end of the tunnel, in terms of what we can produce, is quite large, however. Eric Enge: This is going to be very interesting to watch from a search perspective, as it goes back to the directory model of doing things while targeting a very specific niche. Eytan Seidman: That's right, yes. At the end of the day, people want to find great content, and they want their search engines to surface the very best content. I think that's been true from day one. Any motivated person who wants to stay at a hotel in Aruba will want to find out about Aruba and will search for it on Google, Bing and Yahoo, looking for the best information they can find. Likewise, Google, Bing and Yahoo are going to want to find the best stuff and deliver it to their users. That has always been a truism, and I think it will be true for a long time to come, so our goal is to be right in the thick of that. For people who are looking for very, very deep content on hotels, we hope to provide what they are looking for. Eric Enge: Who do you view as your competition?
Eytan Seidman: Interestingly, there is no site that takes our exact approach. One of the reasons we find it so interesting is that we hire professional reporters to go and review hotels, similar to the approach of CNET or Consumer Reports, which take products and run them through a series of rigorous tests. The Hotel Albion is a good example. On the Albion page we have 98 photos, and on many hotel pages we have even more than that. How often do you see 98 or 100 high-quality photos on hotel sites today? Most of the sites out there are using the same content that users see elsewhere on the Internet. Eric Enge: They lack any uniqueness. Eytan Seidman: Yes, exactly. Eric Enge: So uniqueness will certainly play a role in the world of search too. Eytan Seidman: Yes, it definitely should. The searching capability of our site is something that we believe will draw people in. It is quite new and it should be fast from anywhere, as the goal is to bring super high-quality search technology to the travel space. Combine that with really extensive, in-depth content and very good software, and we hope we can become a leader in the field. Eric Enge: I see there are several categories where I can narrow down my search based on price range, hotel rating, location and even whether pets are allowed. Eytan Seidman: Exactly, that's right. Every single amenity is hand-collected by our reporters and verified by our staff here. When we say that a hotel allows pets, we've actually gone to great lengths to verify that and ensure it's accurate and up-to-date. We are not just getting a data dump from some unreliable source and pushing it on to consumers. We are actually verifying all the information at every step of our process. Eric Enge: Thanks Eytan! Eytan Seidman: Thank you Eric!
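The faceted search discussed above (narrowing by price range, rating, location and pet policy, or searching for a single amenity such as "spa" without naming a location) can be sketched as a filter over hand-verified amenity records. This is a hypothetical Python illustration: the `Hotel` fields, the sample hotels and the `search` function are assumptions made for the example, not Oyster's data model or code.

```python
from dataclasses import dataclass, field


@dataclass
class Hotel:
    # Hypothetical record; a real site would store many more verified fields
    name: str
    city: str
    price_per_night: int                 # USD
    rating: float                        # e.g. on a 1-5 scale
    amenities: frozenset = field(default_factory=frozenset)


def search(hotels, amenity=None, city=None, max_price=None, min_rating=None):
    """Return hotels matching every facet that was supplied; None means 'any'."""
    results = []
    for hotel in hotels:
        if amenity is not None and amenity not in hotel.amenities:
            continue
        if city is not None and hotel.city != city:
            continue
        if max_price is not None and hotel.price_per_night > max_price:
            continue
        if min_rating is not None and hotel.rating < min_rating:
            continue
        results.append(hotel)
    # Highest-rated first, so the strongest matches surface at the top
    return sorted(results, key=lambda h: h.rating, reverse=True)


hotels = [
    Hotel("Hotel A", "New York", 450, 4.5, frozenset({"spa", "pets allowed"})),
    Hotel("Hotel B", "Las Vegas", 180, 4.0, frozenset({"pool", "spa"})),
    Hotel("Hotel C", "Miami", 220, 3.5, frozenset({"pool", "pets allowed"})),
]

# A location-free amenity search, like searching for "spa" across all cities
print([h.name for h in search(hotels, amenity="spa")])           # ['Hotel A', 'Hotel B']
# A combined facet query: pet-friendly and at most $300 a night
print([h.name for h in search(hotels, amenity="pets allowed", max_price=300)])  # ['Hotel C']
```

Because every hotel is reviewed against the same checklist, each facet maps to a field that exists for every record, which is what makes this kind of filtering dependable.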
Wed, 15 Jul 2009 00:27:35 +0200 Published: July 6, 2009
Peter is Google's Product Manager for Image Search. Prior to Google, Peter was at Ask.com working on Search Quality. Prior to Ask, Peter worked as an engineer at Oracle, where he enabled regular expression support in SQL and developed a migration solution to convert databases to Unicode, both equally challenging and rewarding projects. Peter also teamed up with Jonathan Gennick on a nifty book about Oracle regular expressions. Interview Transcript Eric Enge: What are some of the basic on-page things that you can do to optimize your image search results? Peter Linsley: There are a lot of best practices that we can touch upon, but I'd like to start off by describing the problem that image search engines in particular have. If you think about most web documents, they are to a certain extent structured data that we can crawl easily. The title and body of the page communicate a lot to the search engine. Think of a Wikipedia article titled "History of the United States," whose title describes very accurately what you are about to read in the body of the page. The other key thing with web search is that there are signals like backlinks and anchor text, where you can get a read on what other people say about a particular document, in this case a Wikipedia page. When it comes to images, those signals are not always available. The crawler has to look through the HTML, where it will find an image tag with a source attribute. That's pretty much all it knows about the image, except for the alt text, which is something you can put into the image tag. The best thing to do with the alt text is to describe what is in the image as efficiently as possible. This has value in a lot of ways. I was on a local connection the other day, and I was looking at a page for about 3 or 4 seconds while an image loaded, and all I could see was the alt text. If you have images turned off, if you are visually impaired, or if you are using a text browser like Lynx, the alt text is extremely valuable.
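Peter's point that a crawler sees little more than an image tag's source and alt text can be illustrated with Python's standard-library `html.parser`. The sketch below is a minimal, hypothetical example of pulling `(src, alt)` pairs out of a page; it is not Google's crawler, and the sample markup and function names are invented for the illustration.

```python
from html.parser import HTMLParser


class ImageTagCollector(HTMLParser):
    """Collect (src, alt) for every <img> tag; alt is None when it is missing."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... />, via the default
        # handle_startendtag implementation
        if tag == "img":
            attrs = dict(attrs)
            self.images.append((attrs.get("src"), attrs.get("alt")))


def extract_images(html):
    collector = ImageTagCollector()
    collector.feed(html)
    collector.close()
    return collector.images


page = ('<h1>Silent film stars</h1>'
        '<img src="/img/chaplin.jpg" alt="Charlie Chaplin dancing in the moonlight">'
        '<img src="/img/IMG_0042.JPG" />')
for src, alt in extract_images(page):
    print(src, "->", alt if alt else "(no alt text: little for a crawler to go on)")
```

The second image shows the problem Peter describes: without alt text, all a machine has is a path.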
In addition, on most browsers you can see the alt text if you hover your mouse over the image. We like it because users can see it and it brings them value. I think a good practice is to use the alt text to describe what you can see in the image, and you can use that same text elsewhere on the page. It could be the title of the image or part of a caption. Getting back to how a crawler looks through a page, we see this image tag and we can see the alt text, but of course it doesn't say everything there is to say about the image. In Flickr's case, people will highlight parts of the image and say something about a particular part of it, but none of this is structured, and it's not really machine readable. We have to guess, and also look at what text we believe can be attributed to that particular image on the HTML page. Another good, simple practice is making the title, description or caption of the image obvious to the user. Hopefully we'll be able to figure out algorithmically which text is associated with that image, and rank it accordingly. One other good practice we can talk about is using the file name to label the image as well. If you include an image in multiple places on your site, or if other people happen to include it on their sites, the filename is an attribute that's attached to the image wherever it goes. We certainly look at that too. There are a couple of things to note about the filename. One thing that comes to mind is that a lot of operating systems and web servers do not allow characters from languages that cannot be represented in ASCII, or otherwise restrict certain types of text in filenames. So for a search engine to look at the filename and treat it as the best description of the image would be a mistake.
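A filename travels with an image, but as Peter notes it is an unreliable description on its own. As a hypothetical illustration (not how Google tokenizes filenames), this Python sketch splits an image URL's filename into lowercase terms, percent-decoding it first so that a UTF-8 name such as `caf%C3%A9` becomes readable, and shows how a camera-default name yields nothing descriptive. The `filename_terms` helper and the sample URLs are assumptions for the example.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse


def filename_terms(image_url):
    """Split an image URL's filename (minus extension) into lowercase terms."""
    path = urlparse(image_url).path
    stem, _extension = posixpath.splitext(posixpath.basename(path))
    # Percent-decoding turns UTF-8 escapes like %C3%A9 back into "é"
    stem = unquote(stem)
    # Treat hyphens, underscores, dots and other non-word characters as separators
    return [term for term in re.split(r"[\W_]+", stem.lower()) if term]


for url in ("http://example.com/img/Charlie-Chaplin_dancing.jpg",
            "http://example.com/img/caf%C3%A9-paris.jpg",
            "http://example.com/img/IMG_0042.JPG"):
    print(url, "->", filename_terms(url))
```

The last URL produces only "img" and "0042", which says nothing about what the picture shows; that is one reason a filename works better as a supporting signal than as the primary description.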
There are also some people who can't accurately describe what's in the image with the filename in their own language, and that's one reason why we may or may not consider it a very strong signal as part of the ranking process. Eric Enge: So for the alt text, would something like "picture of Charlie Chaplin dancing in the moonlight" be too long? Peter Linsley: No, I wouldn't say that was too long at all, and it's very descriptive. If you can't see the image, you can imagine what it would be, and that's really the whole point of alt text: you could consider it a replacement for the cases where you can't see the image. That sounds like a perfect example; it's very descriptive of what you would see in the image if you could see it. That's the sort of thing that I would definitely encourage. Eric Enge: When does it become too long? Peter Linsley: Obviously there is no hard and fast rule. I would just think about how the user would feel about it. If it's a very long description of all the details of the image, it's probably something that would be more useful on the webpage itself. A good rule of thumb is just to say what's in the image, and then you can put a title, caption and description elsewhere on the page. I would treat it like an image title, giving it the same sort of treatment you would give the title of an HTML page. Eric Enge: Along those lines, you mentioned the value of text like captions, which your crawler will attempt to associate correctly with the image. Of course, that is generally nearby text, but I think as you look a layer up you have the general context of the page. In my example of Charlie Chaplin dancing in the moonlight, if the page is all about Charlie Chaplin, and that's in the title and discussed in the article, then I would imagine all of this helps with classifying what the image is about. Peter Linsley: Absolutely, there is no doubt about that.
It does so to the extent that the image itself is very important to the page, meaning that if the image were no longer on that page, the page would lose a significant amount of its utility. That's a very strong signal, obviously, and it means we can start to take the context of the entire page into account. If you have a large image above the fold where you can see enough detail, you give it a very clear title, and the page and image are clearly related to one another, this is all a very strong signal. The key here is to think of the user. When users land on your page, they should know at a glance what the page is about, and the image should fit in as an important component of the page itself. Eric Enge: Let's say you had a webpage with an article of 10 paragraphs covering 6 different subtopics, all related to one topic. If you have an image related to each subtopic, there is definitely context matched with content on the page, but no single image is the focus of the page. Peter Linsley: Yes. There are a couple of good things you can do here. Let's take something like a blog category page, where you go to My-blog.com/category/san-francisco or something like that, and you see every blog post tagged with San Francisco on one page, but each post talks about different things. We are usually pretty good at figuring out what one section of the page is about and which image is associated with it. Then we find another section of the page and the content that can be associated with the particular image it uses. Another good practice is having a permalink for each and every entry. In the blog category page example, in a lot of content management or blog publishing systems, the title of the blog post itself would typically be a link to a permalink page for that particular article. If you do have links to the particular section, that can really help us figure out what the canonical URL is for that image.
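One simple way to picture the section-to-image association described for a blog category page is to attribute each image to the nearest preceding heading, and to that heading's link when the post title links to its permalink. The Python sketch below is a hypothetical heuristic built on exactly those two assumptions (headings mark sections; a link inside a heading is the permalink); the interview does not describe Google's actual method, and the class name and sample page are invented.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3"}


class SectionImageMapper(HTMLParser):
    """Attribute each <img> to the closest preceding heading and its link."""

    def __init__(self):
        super().__init__()
        self._in_heading = False
        self._text = []
        self._link = None
        self.current = (None, None)   # (heading text, permalink) of the open section
        self.images = []              # (img src, heading text, permalink)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADINGS:
            self._in_heading, self._text, self._link = True, [], None
        elif tag == "a" and self._in_heading and self._link is None:
            # Treat the first link inside a heading as the post's permalink
            self._link = attrs.get("href")
        elif tag == "img":
            self.images.append((attrs.get("src"), *self.current))

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._in_heading:
            self.current = ("".join(self._text).strip(), self._link)
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self._text.append(data)


page = """
<h1>Category: San Francisco</h1>
<h2><a href="/2009/06/fog">Fog over the bay</a></h2>
<p>Morning fog rolled in.</p><img src="/img/fog.jpg" alt="Fog">
<h2><a href="/2009/06/cable-cars">Riding the cable cars</a></h2>
<img src="/img/cable-car.jpg" alt="Cable car">
"""
mapper = SectionImageMapper()
mapper.feed(page)
mapper.close()
for src, heading, permalink in mapper.images:
    print(src, "|", heading, "|", permalink)
```

Each image ends up paired with a distinct permalink, which is the kind of unambiguous section-to-URL mapping that linking post titles to permalinks makes easy to infer.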
There is no hard and fast rule. It's something I wouldn't worry about too much if it's intuitive to your end user. The goal for us is to try to figure out how to interpret that, and which content is associated with which image. Eric Enge: One of the interesting challenges you've outlined here is that you can't fully parse what the image is by looking at the image file itself. A lot of what we have talked about relates to developing a level of confidence as to what the image is about, so as your confidence that an image is about a certain topic increases, is that a positive ranking signal? Peter Linsley: Absolutely, yes. There are a number of ways we try to figure out what an image is about, what the content is and whether it would match the intent of a search on our site. You've touched on one of the most fundamental problems, which is that machines find it difficult to read an image and know what it is trying to represent. Let's say I was out in San Francisco for the weekend, and I snapped a photograph of a shark jumping over the Golden Gate Bridge, or something ridiculous like that. There is not much I have to do to tell the readers of my site what's going on here. I could just have a simple title that says "Wow, check this out," and then have the image there. The image will speak volumes, but from a crawler's or a machine's point of view, there is nothing available for a search engine to figure out what's going on there unless it actually starts to look at the pixels of the image itself. So it is certainly all about our confidence in how well we've figured out what's going on with the image. Eric Enge: In the HTML world I always refer to this as having two dimensions. One is relevance, and the other is importance, which relates to signals like links.
In the image world you can envision that relevance and importance would certainly still be factors, but now you have the additional factor of confidence. It's a slightly different model, and because there are so many signals in the HTML world, confidence there is usually pretty high. Peter Linsley: That's right. When people link to your page, that's a vote for that page from an external source, but people don't typically link directly to your image, so it's really up to us to figure out when those signals are talking about the image and when they are not, and to factor that into the algorithms. Eric Enge: How does the importance of a webpage influence the ranking of its images in image search? Peter Linsley: It certainly is a signal that we use. PageRank is one of many signals that we consider, reflecting whether people are generally interested in that page, what it is talking about and how much of an authority it is. The value of a webpage can speak volumes about the images that it includes, so when we talk about image search from an SEO perspective, one of the best things we can say is that all of the rules for web search apply. The goal here is to create a site that has a lot of unique and compelling content. To the extent that you are the authority on a particular subject and can talk about it in a compelling manner, you will certainly start to generate a lot of signals that benefit the image as well. Eric Enge: So you have a page with really great content that has been rewarded with links from all over the web and has other positive external signals associated with it, and the benefit of those signals accrues to the images on the page. Peter Linsley: It certainly is part of the equation, yes. Eric Enge: If you have a page with one dominant image on it above the fold, I imagine that it gets a bit more attention than when you have 25 images scattered around the page.
Peter Linsley: Our general belief is that we want to return pages that are going to be useful to the user. That's not to say that we are analyzing whether images are above the fold and so on. It's more about focusing on things that provide a compelling and interesting experience for our users, and hopefully a lot of the signals that we look at will start to point toward that particular page by virtue of doing this. If you have an image-centric site, let's say you're a photo blogger or you run your own little stock photography site, it's a much better practice to bring the user's attention to the image immediately as they land on the page. Because we believe that makes for a good user experience, we are more than happy to send our users to those kinds of pages. It's more about focusing on the user experience. We try to make sure that we return the best possible images out there for each query. You can imagine there are plenty of pages that have many images on them and are perfectly relevant to show for certain queries. Eric Enge: If you have a page where the external links and the page title all match up and appear to be centered on one picture, that's a lot of on-page focus. Peter Linsley: Definitely, and it's a very good user experience to boot. Eric Enge: Speaking of ways to get more data on what images are about, one of the more interesting things is the image labeler game. Can you talk about that a little bit? Peter Linsley: The inventor of the concept is Luis von Ahn, and there is a video of him discussing it that is really interesting to watch. The basic idea here is that there isn't a whole lot of structured data around images on the web. For a lot of people, it's a pain to label images. Think about your own photo collection: I know I have tens of thousands of pictures in mine, and I simply don't have the time to go about describing them.
This obviously makes it very hard to search for images that I took 3 or 4 years ago. The basic concept here is to label images in a more effective manner. The idea is to present participants with an image and to pair you up with somebody else online. You are not aware of each other; you are both looking at the same image, and essentially you have to start typing what you see. Let's say you and I were playing this game and we saw a picture of a shark jumping over the Golden Gate Bridge; we'd start to type in things like Golden Gate Bridge and shark and so on. The better your tags match up, the more points you get. You can imagine the net result is that you end up getting a lot of relevant tags, because the chances of us matching on something like Golden Gate Park if it is not relevant to the image are very, very low. If you could do this in a very scalable manner and get a whole lot of images tagged this way, you can imagine it would be very useful for the search engine. The game is out there, and it's a whole lot of fun. If you haven't tried it, give it a shot. Without getting too much into the specifics, I can say that the data we've got from this has been very interesting, and it's taught us a lot about how we can improve our search quality and results for our end users. Eric Enge: Just to elaborate on the game a little bit more, basically the closer the words that you associate with a picture are to someone else's description, the more points you get, correct? Peter Linsley: That's right. If you look at the leaderboard, you'd be amazed by the number of points that some people have accumulated over the years. Eric Enge: You also make a special note of the people who earn the most points in a day as well, right? Peter Linsley: I believe that's presented on the site, yes. Eric Enge: The other interesting thing is that you get paired with a different person for each image, so you don't develop this symbiotic pattern.
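The agreement mechanism behind the image labeler game can be sketched in a few lines of Python. In this hypothetical version, only tags that both paired players typed count, each agreed tag earns a fixed number of points, and a tag is kept for an image once enough independent pairs have agreed on it. The point value, the `min_pairs` threshold and the function names are illustrative assumptions, not the game's real scoring rules.

```python
from collections import Counter


def normalize(tags):
    """Lowercase and trim tags so "Shark" and " shark" count as the same label."""
    return {tag.strip().lower() for tag in tags if tag.strip()}


def play_round(tags_a, tags_b, points_per_match=100):
    """Return the tags both players agreed on and the points they share."""
    agreed = normalize(tags_a) & normalize(tags_b)
    return agreed, points_per_match * len(agreed)


def trusted_labels(rounds, min_pairs=2):
    """Keep tags that at least `min_pairs` independent player pairs agreed on."""
    votes = Counter()
    for tags_a, tags_b in rounds:
        agreed, _points = play_round(tags_a, tags_b)
        votes.update(agreed)
    return {tag for tag, pairs in votes.items() if pairs >= min_pairs}


# Two different pairs labeling the same shark-over-the-bridge photo
rounds = [
    (["Golden Gate Bridge", "shark", "fog"], ["shark", "golden gate bridge", "ocean"]),
    (["bridge", "Shark"], ["shark", "water"]),
]
print(play_round(*rounds[0]))
print(trusted_labels(rounds))
```

Requiring agreement from independently paired strangers is what makes an irrelevant tag like Golden Gate Park unlikely to survive unless it really describes the image.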
Peter Linsley: That's right, yes. Part of the goal is to make sure you are getting a fair sample, so to speak. The other part of the game is that you can't communicate with the other player by making up tags. You only see the tags when you both match up, so it's a really interesting concept. It's really worth watching the original video of Luis von Ahn demonstrating it.

Eric Enge: You said it has also taught you a lot about the world of images and how to evaluate that data. In your mind, has it had a direct positive impact on image search quality?

Peter Linsley: Without getting into specifics, it has taught us a lot and given us a lot of useful data.

Eric Enge: A couple of years back we witnessed the advent of universal search, which was a big thing. One of its most prominent features is images being served up in the regular search results. Over the years I have also seen that integration increase. You used to be able to type in something photography related and occasionally get images, but now you get them much more frequently. You also get images when you infer that people really want a picture, a photo, or an image. How has this affected image search from a couple of different perspectives? One is: has there been a huge increase in people getting image results that they acted on? The other is: has it done a lot for traffic at http://images.google.com?

Peter Linsley: Just to give a little background on the motivation there: while we do have an image search property, we found that a number of queries we received on web search either had direct image intent, much like the ones you described, or would be best answered by an image.
You can think of various examples of this. We might infer that somebody typing in Empire State Building is purely interested in seeing what it looks like, or that they want to visit a site with a lot of pictures related to their search topic rather than a site that isn't very image-rich. The idea is that if we believe an image would be a good answer for a particular query on web search, then we will just show images. Image search provides the results for universal search when images are shown, and I think it's fair to say that this provides a lot of exposure, given that images are shown for a significant number of queries. A lot of people may not be aware that we have image search, even though it's shown in the tabs across the top. It's second to web search in size, which makes it a really huge property, so this is a really good way of showing that Google searches content from all sorts of different verticals across the web. Certainly, by showing this content we have given the user more options and increased the chance that they will find what they want quickly. Users have two options when they see those units pop up: they can click directly on a thumbnail and be taken straight to the site that contains that image, or they can click on a little link and run the same query on image search. That gives the user the option to focus on a particular image if they choose to do so. So there are certainly cases where it's bringing traffic into images.google.com, and hopefully we are satisfying our users' intent. If they figure out that their question could be answered directly through images.google.com, then that is fine too. The other case, of course, is queries where images answer the question directly. This could be a query like "picture of mount rushmore".
Eric Enge: I understand that the results you show in universal search aren't necessarily the ones you show in image search?

Peter Linsley: That's true, that can happen. We believe expectations are ever so slightly different between somebody doing a query on web search and somebody doing an image search. When they perform an image search at images.google.com, their intent is clearly to get images in response, which is not the case with web search. But there are other ways their intent may differ based on the property they use to perform the search.

Eric Enge: At this point you probably have significantly more images served up in web search than in image search.

Peter Linsley: Well, I think they offer two very different experiences. One thing we found is that for a lot of queries on image search, people like to see a lot of images. Another is that a lot of queries are very subjective in nature. You might do a query for something like waterfalls, and you have in mind the kind of waterfalls you want to see or the kind of site you want to navigate to, but it's very difficult for a search engine to know exactly what you are after ahead of time, which is essentially the goal of web search, of course. So there are cases where 3 or 4 images just don't cut it, and universal search gives you the ability to dive into that image-centric experience, where you can jump right into the property and page through hundreds and hundreds of images. People can consume image snippets at a much faster rate than web results, where you have to click through and evaluate each site and its content, so I think the two complement each other very well. There are queries on web search where we believe users might be interested in seeing images as the answer to their question, and it also offers the ability to dive in and have a very image-centric experience in the image search property.
Eric Enge: Now what about the bane of all search, which is spam? What sort of issues do you face with spam in image search?

Peter Linsley: For the most part, image search inherits quite a lot of the work of the webspam team at Google, who do an incredible job of identifying pages that are not really in the best interest of our users and taking appropriate action. That's something we inherit directly at image search, because every image result is associated with a particular webpage: that's where we take you after you click on the result. This means that the best practices you can read about on the website, and that people talk about across the web, pretty much all apply to image search. The other thing to add is that it's very easy for the average consumer to go and get a nice camera and take photos, so it's quite easy to create unique content. We believe that if your motivation is to get traffic, you should simply put out some unique content; that shouldn't be that difficult.

Eric Enge: Let's talk a little bit about things that are coming in the future. One of the obvious interesting areas is scanning and extracting information directly from the image, for example with facial recognition software, optical character recognition software, and various other techniques. What's exciting to you in that area, and what kinds of things do you think could happen?

Peter Linsley: Image search is certainly a really interesting property, and it's growing very rapidly. But, more importantly, so is the world of online images, especially as it becomes easier for the average user to take photographs. A digital SLR camera would have cost me thousands of dollars several years ago, but now it is much, much cheaper, so it's becoming much easier for the average user to get hold of cameras that take really high-quality images.
The cameras in cell phones are getting better and better, which translates into a whole lot of really nice, unique online content, and our goal is to organize and present that content to users when we believe it's the best possible image for the query. This is just absolutely exploding. Not too far in the future, people will be able to take a photograph and their wireless SD card will simply upload it directly to the web and publish it instantaneously. Pretty much every image you take is unique. You can imagine the world of images is absolutely exploding right now: today we are looking at hundreds of billions of images out there that we are trying to organize and index, and you can imagine that becoming trillions of images in the not too distant future. The question, of course, will be how we can organize all of this content. So a lot of our focus is on where we think this industry and this area are going. You can imagine the amount of effort it takes to write up a nice webpage and add tags, a title, and all the alt text, but most users are lazy and just don't have time to do this. We are very, very interested in and excited about the future of computer vision, or visual search as we call it, which involves looking at the pixels of an image, trying to figure out what's going on, and associating that with the user's intent. It is definitely an area we are very interested in. On the flip side, you can also see a slightly broken paradigm. If somebody wants to see an image, why does that intent have to start with a text query in a query box? Why couldn't they describe the image they are looking for in other, more visual ways? This is another area we are very excited about, and you can see the initial fruits of this labor in Similar Images, which was launched last month.
This allows you to look at search results and explore particular genres of images in more depth. You can do a query like Paris, and a good search engine might show you an image of Paris Hilton, one of Paris, Texas, and a picture of the Eiffel Tower. With Similar Images you can dive into a particular image and take it along as a supplementary query with your text query. This lets you dive into that space and see images that are very similar to the Eiffel Tower image you just clicked on.

Eric Enge: So you are using the image at that point as the next search query?

Peter Linsley: That's right, that's exactly how we look at it. We think it's a very exciting space, and we think it will help answer a lot of our users' questions when they just can't quite figure out how to describe the image they are looking for. Plus, computer vision is certainly an area we are very interested in, and we think it will be able to provide answers to the numerous questions our users have as they arrive on our site.

Eric Enge: I've heard that facial recognition software is already in use.

Peter Linsley: If you are a Picasa user, you may have already used this feature. It's a very interesting way of tagging your images. Say I have photographs of my family members: Picasa will come back and tell me who a person is, and then go and tag all of my images of that particular person. It's really cool technology, and I am sure you can imagine we'd be very interested in how it could be used in image search itself.

Eric Enge: Somebody might type in "thomas jefferson headshot" as a query. You would want to be able to distinguish between full-body pictures and headshots, right? So I think that is an example of a pretty specific thing that is of interest.

Peter Linsley: One of the other things we launched relatively recently was the ability to filter your results down to images that contain only faces.
That's made available in the dropdown menu, so it's something we are already doing today. I think you can imagine a whole lot of similar filters being useful to end users as well.

Eric Enge: Is there anything else you would like to add?

Peter Linsley: One topic I am personally interested in is outreach. We are really interested in hearing more from webmasters about the issues they've perceived with image search and how we can collaborate with other search engines to try and help resolve some of them. I can think of a lot of examples in the web world where representatives from Google and other search engines have been up at search conferences, listened to the audience, and helped them resolve problems they may have had. Most recently, there is the canonical link tag, which allows webmasters to tell Google that this is the one URL they want indexed and treated as the canonical. The Sitemap is another good example: it helps webmasters tell search engines more about their content and how best to crawl and index it. So I am really interested in hearing more from webmasters who have image-centric sites or images on their sites. There could be various ways they think they can help us improve their ability to get their images indexed and ranked, and improve our end-user experience at the same time. I am really excited to get more involved with the webmaster community as we go forward. We will be doing a lot more outreach from the image search team, listening and trying to make it a win-win situation for everybody.

Eric Enge: What is the best way for someone who has an image search question to get in touch with the right person on the image search team?

Peter Linsley: We have the Google Web Search Forums, which are monitored.
Various members of the image search team drop by, and we pretty much follow and respond to every image-search-related question, so I would suggest posting your questions there.

Eric Enge: Thanks, Peter!

Peter Linsley: Thank you, Eric!
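For webmasters who want to try the two mechanisms Peter mentions, a minimal Python sketch is below. It builds a basic Sitemap following the sitemaps.org 0.9 protocol and produces the `<link rel="canonical">` element that goes in a page's `<head>`. The example URLs and the helper names `build_sitemap` and `canonical_link` are hypothetical, and this is an illustration rather than an official Google tool.

```python
import xml.etree.ElementTree as ET
from html import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemaps.org 0.9 <urlset> listing each URL in a <loc>."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default xmlns
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as '&' in text for us.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def canonical_link(url: str) -> str:
    """Return the <link rel="canonical"> element to place in a page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


print(build_sitemap(["https://www.example.com/",
                     "https://www.example.com/gallery?id=1&page=2"]))
print(canonical_link("https://www.example.com/gallery?id=1"))
```

The sitemaps.org protocol also allows optional per-URL fields such as `<lastmod>`; they are left out here to keep the sketch minimal.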
About the Author

Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US cities and towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its website can be found at http://www.stonetemple.com. For more information on web marketing services, contact us at:

Stone Temple Consulting
(508) 485-7751 (phone)
(603) 676-0378 (fax)
info@stonetemple.com

Tue, 16 Jun 2009 15:19:53 +0200 Published: June 15, 2009
Dennis R. Mortensen is a pioneer and expert in the analytics industry. He is an accredited Associate Web Analytics Instructor at the University of British Columbia, the author of a book on data-driven insights with Yahoo! Web Analytics, and a frequent speaker on analytics and online marketing. Mortensen is an entrepreneur and was the COO of IndexTools until it was acquired by Yahoo! Inc. in May 2008. Today he is the Director of Data Insights at Yahoo!, sits on the Board of Directors of the Web Analytics Association, and maintains the highly popular analytics blog VisualRevenue.com/blog.

Interview Transcript

Eric Enge: Can you start with an overview of what has happened to IndexTools / Yahoo! Web Analytics since the acquisition of IndexTools last year?

Dennis Mortensen: We were acquired on May 9th, 2008, and we are now one year into the integration project. We did a first simple update five months after the acquisition, which was renaming and rebranding the tool as Yahoo! Web Analytics, plus a number of minor adjustments. This included a couple of new features that were supposed to come out in the upcoming IndexTools version, which was on the verge of being launched anyway. We then spent the last 7 months retooling it from the bottom up, which means we are now extremely close to being able to disconnect and shut down the IndexTools data centers and move all legacy clients over to Yahoo. We are moving toward a completely different level of scalability, both vertically and horizontally. Vertically, we are now able to take on very large clients. Horizontally, we can now take on a lot of clients: not just 5,000 clients, but knowing how to handle 50,000 or 200,000 clients is what we aspire to. At the same time, with Yahoo!
Web Analytics version 9.5, which just came out on April 28, we managed to include a number of very interesting features on the front end, including demographic dimensions. These are not just demographic reports, but true dimensions within the analytics engine. The same is true for the new psychographic dimensions we included. We also introduced new visualization capabilities from a tool we previously called Rubix, which IndexTools had been developing. We've moved that into Yahoo! Web Analytics now. We also came out with new negative segmentation options. For example, you can now look at people who did not buy, or people who bought but paid less than $100. This is the reverse of the segmentation process you usually do. Finally, we are rolling out the new version of the tracking code, version 5. This version is now served off the Yahoo infrastructure, and of course it has a completely different level of stability. All in all, I think we have around 30 smaller updates. We also increased the number of actions (or goals, as they are called in GA) to 50. We also included 38 custom fields for everybody, not just the selected enterprise segment, so this is no longer a professional services request; it is something included in the tool. We also actually managed to rebrand it in purple. :-)

Eric Enge: What's the use of the tool if it's not purple?

Dennis Mortensen: You and I know that it does not mean anything, but funnily enough this is something people notice, and it's a very good indication of us truly moving from IndexTools to Yahoo! Web Analytics. Most people seem to believe (and fairly so) that it has actually been handed over to Yahoo now.

Eric Enge: I am sure that's a goal you personally have been pursuing with some passion.

Dennis Mortensen: Very much so.
I think the final task of entrepreneurs who have been working on a startup is to make sure that they actually end up handing over the keys to the acquirer so it can run the machine. I have personally been working very hard to get us to the point where we could say we did the job and did it very well.

Eric Enge: What are your thoughts on the impact this product will have on your competition in the analytics space?

Dennis Mortensen: Before I answer that directly, let me answer it a tad indirectly first. I believe it's almost naive to think that any media company can be in business without providing some sort of data back to its customers. You get reporting back if you buy a newspaper, TV, or radio ad, so companies like Yahoo, Google, and Microsoft need to do that as media companies as well. At the same time, of course, you have a set of companies providing independent data collection and independent reporting, such as Coremetrics and Omniture. We need to look at it from those two viewpoints. So, coming back to your question: with regard to the media companies, I don't really think it means too much. I don't believe people will choose Yahoo! Media over Google Media based on the type of analytics we provide. They will just expect us to provide an average or above-average level of reporting and/or analytics. It's important that people understand that even though we collect and report honestly, you can present data in any number of ways, still being honest but favoring your own business. Look at a simple question such as attribution. If I spend $100,000 on search advertising and use Google Analytics, search will probably look more successful than it really is, simply because Google Analytics tends to report on last-click attribution, and this is especially true if I compare that to display campaigns. If you look at Yahoo!
Web Analytics, we provide multiple attribution models, including the original referrer. The original referrer is typically display, which is media that Yahoo is very good at. I am painting a different picture of the same dataset, and I think you'll see the different media companies do that to some extent. So, to conclude, that's one part, where you'll see us compete: namely, on the type of insights we provide to our customers. On the other side, I think you will see independent vendors really having to expand their products, because Yahoo! Web Analytics and Google Analytics are getting more feature-rich. The technology is simply getting better and better. That said, I would like to confirm that we are not out to compete against or replace the Coremetrics or the Omnitures. We are simply there to provide the best possible insight to our customers. That might happen to be somebody who is using Omniture and might choose to use our product instead, but that's not really the goal. We just want them to be enlightened. What happens then, and the reason they have to expand their product portfolios, is that all of a sudden we can do things with our analytics products that they simply can't, such as Yahoo! demographic information. It is something only we can do; Omniture or Coremetrics can't do this, because they don't have access to the rich dataset of user behavior on Yahoo! web properties. They need to figure out other ways of being competitive, and I think that's their task.

Eric Enge: Let's dive into the demographics features in a little more detail.

Dennis Mortensen: Let's just provide a quick example so we know what we are talking about here. Let's say I am a customer of web analytics provider X. The way that works is that I will call up my vendor, and he will provide me with a JavaScript data collection tag. I'll put that on my website, and by doing that I will start collecting data.
The tag essentially takes the data I have and sends it to my vendor, who, to put it simply, reformats it and sends it back to me. If I send him 10 data points, he will send me 10 data points back. What media companies can do is collect the 10 data points, add 5 more, and send 15 data points back. Thus, you also get insight into information you didn't have access to before. We've tested this out in the release that came out on April 28 with a number of dimensions such as age, gender, and interest groups. This way, for everybody who bought a specific product, you can find out what the best converting segment was; for example, you would be able to see that it is females aged 25 to 34 who are interested in politics, and so forth.
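Dennis's earlier attribution point (the same conversions looking very different under last-click and original-referrer models) can be illustrated with a small Python sketch. The path format, channel names, and numbers are hypothetical, and the interview does not describe how Yahoo! Web Analytics or Google Analytics compute attribution internally; this only shows why the choice of model changes which channel gets the credit.

```python
from collections import Counter

MODELS = ("last_click", "original_referrer")


def attribute(conversion_paths: list[list[str]], model: str) -> Counter:
    """Credit each conversion to a single referring channel.

    Each path lists the channels that referred one converting visitor's
    visits, oldest first. 'last_click' credits the final referrer;
    'original_referrer' credits the first one.
    """
    if model not in MODELS:
        raise ValueError(f"unknown attribution model: {model!r}")
    credit: Counter = Counter()
    for path in conversion_paths:
        if not path:
            continue  # no referrer recorded for this conversion
        channel = path[-1] if model == "last_click" else path[0]
        credit[channel] += 1
    return credit


# Hypothetical data: three of four buyers first arrived via a display ad.
paths = [["display", "search"], ["display", "search"], ["search"], ["display"]]
print(attribute(paths, "last_click"))         # Counter({'search': 3, 'display': 1})
print(attribute(paths, "original_referrer"))  # Counter({'display': 3, 'search': 1})
```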
Eric Enge: How many different kinds of data points are you supplying that type of data on?

Dennis Mortensen: We came out with 4 new dimensions. The first of these is gender. Age is second, and FYI, we provide 5 age groups. The reason is that we are not just doing this on Yahoo!-specific traffic. You can buy an advertising campaign from the New York Times, and we can actually measure the 10,000 visits you get from that against these dimensions. That's why we have fewer groups: so we are actually able to come back with some decent information. Then we have 16 interest groups, including politics, sports, and entertainment, among others. On a positive note, we are actually trying to expand that all the way up to 377 interest groups, but it's something we are still working on. Finally, and perhaps less interestingly, we also provide information on what types of Yahoo! properties people visit. You might look at a section of your site and see that a large percentage of its visitors use Yahoo! Finance, Yahoo! Sports, HotJobs, or something like that. That's probably less sexy, but still an interesting marketing data point.

Eric Enge: How many of those do you have?

Dennis Mortensen: In the hundreds.

Eric Enge: What's the basis of how you are assembling the data? Are you doing it by seeing who has visited a website, and then matching them up with data you have because they have visited other Yahoo properties? What are the mechanics of all that?

Dennis Mortensen: Without getting too nerdy, it is based on the fact that we have a huge sample of visitors that we have insight into, who come to one or more of our properties every month. I think the last public number was 550,000,000 unique visitors every month. That is a huge sample. Just think about it: the guys over at Compete are using a sample of perhaps 2,000,000, and they actually provide some pretty good information.
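The "10 data points in, 15 data points out" idea can be sketched as a lookup join: the site's own visit records come back with extra dimensions appended for visitors the media company already knows something about. The field names, the anonymous `visitor_id` key, and the `profiles` table are hypothetical; the interview does not describe how Yahoo! actually matches visitors beyond the sample-based explanation here.

```python
from typing import Optional

# Hypothetical dimensions appended by the media company.
APPENDED_FIELDS = ("gender", "age_group", "interest")


def enrich_visits(visits: list[dict], profiles: dict[str, dict]) -> list[dict]:
    """Return copies of the site's own visit records with extra fields appended.

    `profiles` maps an anonymous visitor ID to what is already known about
    that visitor. Visitors without a profile get None for every appended
    field, so coverage is partial by design.
    """
    enriched = []
    for visit in visits:
        profile: Optional[dict] = profiles.get(visit["visitor_id"])
        extra = {field: (profile or {}).get(field) for field in APPENDED_FIELDS}
        enriched.append({**visit, **extra})  # original fields first, then appended ones
    return enriched


visits = [
    {"visitor_id": "a1", "page": "/product/42", "revenue": 30.0},
    {"visitor_id": "b2", "page": "/home", "revenue": 0.0},
]
profiles = {"a1": {"gender": "female", "age_group": "25-34", "interest": "politics"}}
for row in enrich_visits(visits, profiles):
    print(row)
```

Nothing extra is collected on the site itself in this sketch: the appended fields come entirely from the separate `profiles` table, which matches the "attached, not collected" description.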
Some visitors we will have more information on than others (all anonymously, of course), and on some of them we won't have any information. That means I won't have information on all visits, but I will have it on some of them, and some of them are enough for me to provide statistically significant reporting back to you. Just to elaborate a bit on that, we actually provide a very honest way of reporting on this, something we call a confidence level, which you can set yourself. First of all, we are very eager to make sure we keep our high level of privacy, so you can't report on anything with fewer than five visitors. With our tool you can actually filter down to a single visitor, but if that happens, I won't report on these new dimensions such as age, gender, and interest, because of privacy. Anything less than five, we don't report on. You might look at 40 visits from within the last hour, and I might not have enough information to provide you with the default confidence level. You can then choose to decrease the confidence level within the tool, and that will provide new reporting back to you on the fly. The reason for that is that sometimes you want a very high confidence level. That could be when you choose how to spend a budget, because you want to be sure you are spending it wisely. Other times it might just be a simple, minor redesign where you need an indicator of whether to go left or right, and you might be happy with a confidence level of 70%. So this is something you can set up in the tool, and you can use it for different purposes.

Eric Enge: How is this integrated into the tool itself?

Dennis Mortensen: The really cool thing about this is that there is nothing to install once you've been upgraded to the new tracking code.
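The two reporting rules described above, a five-visitor privacy floor and a user-adjustable confidence level, can be sketched as follows. The normal-approximation interval and the `max_margin` cutoff are assumptions chosen for illustration; the interview does not say how Yahoo! Web Analytics computes its confidence level. With 40 visitors split evenly, the default 95% level suppresses the result under this cutoff, while dropping to 70% returns it, which mirrors the trade-off Dennis describes.

```python
import math
from statistics import NormalDist
from typing import Optional

MIN_VISITORS = 5  # privacy floor mentioned in the interview


def report_share(matches: int, known: int, confidence: float = 0.95,
                 max_margin: float = 0.10) -> Optional[tuple[float, float]]:
    """Estimate a segment's share (e.g. female) among visitors with known demographics.

    Returns (share, margin of error), or None when the figure is suppressed:
    either for privacy (fewer than MIN_VISITORS) or because the margin at the
    requested confidence level exceeds max_margin (a hypothetical cutoff).
    """
    if known < MIN_VISITORS:
        return None
    share = matches / known
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    margin = z * math.sqrt(share * (1 - share) / known)
    if margin > max_margin:
        return None
    return share, margin


print(report_share(2, 4))                     # None: below the privacy floor
print(report_share(20, 40))                   # None: margin too wide at 95%
print(report_share(20, 40, confidence=0.70))  # (0.5, ~0.082)
```

Lowering the confidence level shrinks the critical value `z`, so the same 40 visits produce a narrower margin and pass the cutoff; the data do not change, only how much certainty the user asks for.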
That means that none of the clients we have signed up since the acquisition had to do anything, because they were on the new tracking code to begin with. The legacy clients, as I said earlier, have to be moved over now, so within the next couple of weeks they will have the opportunity to move to the new tracking code as well. Once you use it, these data points are attached, not collected, because this is information we already have. They are appended to your dataset without you having to do anything. We give you a couple of reports right out of the box, including an age report showing the distribution and a gender split report. You can also see the two dimensions at the same time in a matrix. We also give you an interest report, because that's a good way of getting started so you can see what your segments are. That said, the most powerful feature, in my opinion, is the ability to use these dimensions in filters. You might look at sales from yesterday for a specific product, a specific campaign, or a specific section of your site, and choose to filter by these dimensions. If you want to look at the sales for specific products, you can filter by gender and get specific information for those products, which might be very different from other products. So it's not just about giving an overall average for the whole site; it's an opportunity to use this wherever you see fit throughout the tool. People might compare this to what they saw in the now-discontinued Microsoft adCenter Analytics, but that tool only provided a couple of reports out of the box. We were very eager to make sure this was not just about a set of fixed reports, but a true opportunity to use those dimensions within the reporting system.

Eric Enge: As you've always said, even from the early IndexTools days, you were never interested in helping people do report surfing.
The goal was to give them a tool where they could build their own queries and do their own thinking. It sounds like you've been consistent with that here, because the ability to use these metrics in filters is very interesting. You can see how many sales you've made to people of a certain age over a specific period of time, and you can see, for example, whether your sales become heavily concentrated in a certain age group over that time period.

Dennis Mortensen: Exactly. Let's use something simple here. Say you have a 2-year-old website with 20,000 visits whose visitors are split 50/50 between men and women. Then you look at your sales, and you might figure out that while you have an even gender split for traffic, 90% of your sales are coming from women, and that is an insight you can actively use to make changes. It's not just that you have an even split, but that the sales are coming from one specific demographic. Then you can start to figure out how to deal with that specifically. You might increase spending on some campaigns and decrease it on others.

Eric Enge: Or you can just change the focus of the landing pages.

Dennis Mortensen: Indeed!

Eric Enge: There are a number of things you can do with that kind of data, and there are a couple of other things you mentioned that piqued my interest as well. You talked about negative segmentation. Can you expand on that a little bit?

Dennis Mortensen: Usually when people think of segmentation, it's about including things, such as people from New York or people who looked at a specific page on the site. That's how we tend to think of segmentation. What we've included now is what I call negative segmentation: looking at everything except a defined thing. For example, I might like to see people who did not read about shipping details, or people who didn't come from New York, where we have an offline store.
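Positive, negative, and combined segments like these can be sketched as include and exclude filters over per-visit records; the same mechanism covers filtering sales by a dimension such as gender. The visit fields and condition names below are hypothetical, and this is not the tool's actual filter engine.

```python
from typing import Callable, Sequence

Condition = Callable[[dict], bool]


def segment(visits: Sequence[dict],
            include: Sequence[Condition] = (),
            exclude: Sequence[Condition] = ()) -> list[dict]:
    """Keep visits that meet every `include` condition and none of the `exclude` ones.

    An empty `include` keeps everyone, so an exclude-only call gives a purely
    negative segment.
    """
    return [v for v in visits
            if all(cond(v) for cond in include)
            and not any(cond(v) for cond in exclude)]


# Hypothetical conditions over hypothetical visit fields.
def from_new_york(v: dict) -> bool:
    return v["city"] == "New York"


def read_shipping(v: dict) -> bool:
    return "/shipping" in v["pages"]


def bought(v: dict) -> bool:
    return v["purchased"]


visits = [
    {"id": 1, "city": "New York", "pages": ["/home", "/shipping"], "purchased": True},
    {"id": 2, "city": "Boston", "pages": ["/home", "/cart"], "purchased": True},
    {"id": 3, "city": "Chicago", "pages": ["/shipping"], "purchased": False},
]
not_new_york = segment(visits, exclude=[from_new_york])  # ids 2 and 3
bought_no_shipping = segment(visits, include=[bought], exclude=[read_shipping])  # id 2
```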
Negative segmentation is a powerful new way of trying to get insight into your visitors.

Eric Enge: I can imagine there are an awful lot of things you can do with that. You can certainly combine negative segmentation with positive segmentation, for example.

Dennis Mortensen: Exactly. You can choose to look at people who bought something but didn't look at the shipping details.

Eric Enge: If you see that 70% of your abandonment happens after people look at your shipping charges, you can decrease the price of shipping, but you can only find that out by doing some of these comparisons.

Dennis Mortensen: Agreed.

Eric Enge: Can you talk a little bit about the new path analysis details as well?

Dennis Mortensen: As you know, there is a huge debate in our little analytics community on the value of path analysis in general. Some people say it's completely bogus, some people say it provides decent insight, and some people simply use it to figure out how people navigate, where people drop off, and essentially how people get around their site. I am not advocating path analysis, but I'll tell you what the new edition includes. What we had before was the ability to see two levels deep, and only ten steps on each level. What I mean by path analysis here is, for example, that you can look at the last five steps in an e-commerce funnel and see that between checkout and payment you have a drop-off of 60%. Since 60% obviously didn't go down the funnel, they clearly went somewhere else. You want to investigate where they went, and for that you can drill into the path. First of all, the new path analysis is a Flash application. You are thrown into a Flash app within our UI, which you can actually expand to full screen to really work the path analysis. You can now choose to see not just one or two steps; you can drill into endless steps. Let me expand on this a little bit.
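The drop-off analysis in Dennis's example (visitors who reach checkout but don't go on to payment, and where they went instead) can be sketched from ordered page paths. The page names and path format are hypothetical, and the sketch only looks one step past the drop-off rather than offering unlimited levels like the tool described.

```python
from collections import Counter


def step_dropoff(paths: list[list[str]], step: str,
                 next_step: str) -> tuple[float, Counter]:
    """For visitors who reached `step`, return the share who did not go straight
    on to `next_step`, plus a count of where those visitors went instead
    ('(exit)' if `step` was their last page)."""
    reached = 0
    elsewhere: Counter = Counter()
    for path in paths:
        if step not in path:
            continue
        reached += 1
        i = path.index(step)  # first time this visitor hit the step
        following = path[i + 1] if i + 1 < len(path) else "(exit)"
        if following != next_step:
            elsewhere[following] += 1
    rate = sum(elsewhere.values()) / reached if reached else 0.0
    return rate, elsewhere


paths = [
    ["/cart", "/checkout", "/payment"],
    ["/cart", "/checkout", "/shipping"],
    ["/checkout"],
    ["/cart", "/checkout", "/shipping"],
    ["/checkout", "/payment"],
]
rate, where = step_dropoff(paths, "/checkout", "/payment")
print(rate, dict(where))  # 0.6 {'/shipping': 2, '(exit)': 1}
```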
Say I want to look at the 30 pages that my visitors go to on my website; I can now work the path analysis much more visually, because I can move items around. If you imagine an elastic spider web, that's roughly how it looks, and that's how you navigate around it.
Eric Enge: There is a lot of flexibility in how you route it.
Dennis Mortensen: Yes. You are not limited to two levels anymore. It's very flexible, with unlimited levels now, and you can expand it to full screen, so your whole work canvas is the path analysis.
Eric Enge: Can you speak a bit about the changes in visualization techniques?
Dennis Mortensen: Most people will agree with me that most analytics applications are set up in essentially the same way. In the upper left corner you have a calendar, and below that a menu with a number of reports to choose from. On the right-hand side you have a chart, and below that a set of row-based data. That's how you build an analytics application, and that's all well and good, but what we are trying to do with this new version is improve the chart itself. Say you look at visits for the month of May. You'll have that as a bar chart, with visits on the y-axis and dates on the x-axis, and a bar showing how many visits you had each day. That's fine, and it used to be static and nicely illustrated, but we've turned it into an active component. That means you can expand it and choose how you want to visualize the data. You can turn it into a trend chart, a bubble chart, or a bar chart and work it that way. That is not the powerful part, though; the powerful part is that you can work with multiple metrics at the same time in the chart. I might customize the reports so I have visits, time spent on the site, bounce rate, average value per order, and a number of other metrics as well.
What I can now do is choose how I want those to be visualized in the chart. I might want to show two or three of them at the same time, to try to spot a potential correlation. I might want to use bar charts but color them based on the bounce rate, so if there is anything out of the ordinary, I can tell from the chart itself. We even expanded it to the point where people can attach notes to elements on the chart. You can write notes right on the chart; that's how flexible we've made it, and when you are done it works just as it did before. I think we all agree that we can have data that might provide insight, but we can't see it because there are too many numbers to process in our heads. You can keep drilling down until you get it, but with the right visualization options you can sometimes very quickly gauge where you have issues and then start working on them. Visualization is not all about coolness; I think it's actually a pretty essential element to have. Most tools have charting, of course, but I think we tried to take it one step further. Not magic yet, but one step further.
Eric Enge: Is the tool going to be available on an open basis to non-Yahoo! advertisers?
Dennis Mortensen: I believe there is an expectation in the market that we are supposed to become a free-for-all tool like Google Analytics, creating head-on competition. There is nothing more fun than competition, but that doesn't mean it's the right thing to do. At the current time we have no intention of turning this into a free-for-all solution. Don't expect us to put up a web page where you sign up, get a JavaScript snippet, and then go track your website. That's not really what we are aiming for. We are looking for opportunities to connect with advertisers and publishers at the sophisticated end of the market.
That means we'll probably never get to a million accounts, or however many accounts Google Analytics has, but that's not really our target anyway. If we can capture the head and torso of the market, we've pretty much solved what we set out to solve. That said, it doesn't mean you will necessarily have to be a Yahoo! customer in the long run. Right now you need to be affiliated with Yahoo! in some way, shape, or form to get access to an account, but this is something we are expanding all the time.
Eric Enge: What will be coming down the pike next?
Dennis Mortensen: That is a very good question, and one I would have had an easier time answering before we were part of Yahoo!. Now we have legions of communications and legal folks who make it more difficult, but one thing I can safely say is that you will see us keep enriching the data with more and more information, simply because of the power of combining a data company with a media company. I think that's a safe bet. Another safe bet is combining pre-click and post-click data, so you can see what people were doing before they arrived at your website. To give you an example: some people are searching for you or for your products, and you will be in the search results, perhaps not on page 1, but you will be in the results. I'll be able to tell you about their intent before they ever visited your site, such as search phrases whose results they did not click on.
Eric Enge: Right. Like a search funnel.
Dennis Mortensen: That could be one thing, yes. Sometimes people search with seven different phrases. They end up with the last one, and that's the one they click on, but you have no insight into what happened before that. I think the story of what happens before that click is something we might want to move into.
Eric Enge: Thanks Dennis!
Dennis Mortensen: You are most welcome, Eric!
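The checkout-to-payment drop-off Dennis describes in the path analysis discussion, and the follow-up question of where those visitors went instead, can be sketched in a few lines of Python. This is a simplified illustration, not how Yahoo! Web Analytics computes paths: it assumes each visit is a hypothetical list of page URLs, looks only at the page immediately after the step of interest, and uses made-up sample paths.

```python
from collections import Counter


def step_dropoff(paths, step, next_step):
    """For visits that reached `step`, return the share whose very next page was
    not `next_step`, plus a Counter of where those visitors went instead."""
    reached = [p for p in paths if step in p]
    went_instead = Counter()
    for path in reached:
        i = path.index(step)  # first occurrence of the step
        nxt = path[i + 1] if i + 1 < len(path) else "(left site)"
        if nxt != next_step:
            went_instead[nxt] += 1
    dropped = sum(went_instead.values())
    rate = dropped / len(reached) if reached else 0.0
    return rate, went_instead


# Made-up sample visits, one list of pages per visit.
paths = [
    ["/cart", "/checkout", "/payment"],
    ["/cart", "/checkout", "/shipping"],
    ["/cart", "/checkout"],
    ["/home", "/checkout", "/shipping"],
    ["/cart", "/checkout", "/payment", "/thanks"],
]

rate, went_instead = step_dropoff(paths, "/checkout", "/payment")
print(f"drop-off: {rate:.0%}")     # drop-off: 60%
print(went_instead.most_common())  # [('/shipping', 2), ('(left site)', 1)]
```

Drilling further into the path would mean repeating the same count one page deeper for each destination that the drop-offs went to.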
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder in Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com Mon, 08 Jun 2009 15:45:44 +0200 Published: June 8, 2009
Richard Zwicky has been involved in search marketing for 10 years, starting in the late 1990s. He entered the industry by managing the online campaigns for his own successful e-tail operation, which quickly led to developing Metamend, a leading search engine optimization firm he co-founded in 2000. As CEO of Metamend, he managed and led optimization campaigns for web properties ranging from SOHOs to Fortune 500 sites. He split Metamend and Enquisite into separate companies in 2006, as Enquisite's services are designed for use by any SEO or SEM. Today he leads Enquisite, which recently released its first products. Richard's work focuses on helping search marketers manage campaigns more easily and with greater success. Richard believes in long-term successful campaigns that are built from the ground up and are never caught flat-footed by shifts in search engine algorithms or by regional variances in search user behavior.
Interview Transcript
Eric Enge: Can you tell us a little bit about Enquisite Optimizer?
Richard Zwicky: Enquisite Optimizer is built from the ground up for search marketers. I used to run a lot of campaigns, and it was always frustrating and time-consuming to get the right data out of existing web analytics. A lot of the time it was impossible, because, quite honestly, the focus in most analytics products isn't on organic search marketing campaigns; they focus much more on paid search. The legacy of analytics products goes back to the days when IT needed data about page load times and information like that, and they have continued to be built on that foundation. We came at it from a completely different angle and developed a new way of collecting, processing, and reporting the data, to help search marketers understand what they needed to do the job more efficiently and deliver higher value to their customers from a variety of perspectives. Our long tail analysis is one great example of that.
People like Rand Fishkin were always looking for better data about the long tail of the campaigns they were running. They want to know how to visualize it, understand it, and understand its shape, because a good site has a very standardized traffic shape and pattern for its tail, whereas the tails of less well-optimized sites don't follow that standard form. But unless you visualize it, you can't see it, understand it, or know how to deal with it. So we built that element of the report around the long tail graph, and you have the choice of the long tail alone or with overlays, so now you can visualize where your traffic is coming from as well. But we built it using the logic behind how you would run a campaign. Of course, different businesses have different needs. For example, a retailer that sells only within the US will not care about search traffic coming from anywhere else. They want to understand what is coming from the US, and they should be able to analyze that and turn the data into action. The application is built so that you can segment and break out your traffic according to how you actually do business. You can segment geographically down to the zip code level. As another example, a standard analytics package will tell you which phrases bring users to your site and which search engine sent those users. We make it possible to segment your visitors by the page they landed on, by geography, and by a variety of other parameters. So you really get what you want the way you want it. If you are an SEO firm, you might want to target the word organic and find all the different ways people use organic to arrive at your website (long tail segmentation): show me all the strings that include the word organic, show me what that tail looks like, or show me everything that includes the term SEO.
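Richard's "show me all the strings that include the word organic" request amounts to filtering referral queries on a stem and sorting what is left by traffic, which gives that stem's own tail. A minimal Python sketch, assuming a hypothetical dict of search phrase to visit count (the phrases and numbers are made up) and whole-word matching; Enquisite's actual matching rules are not described in the interview.

```python
def stem_tail(query_visits, stem):
    """Return (query, visits) pairs whose query contains `stem` as a whole word,
    sorted from most to least traffic, i.e. the shape of that stem's tail."""
    stem = stem.lower()
    matches = [(q, n) for q, n in query_visits.items() if stem in q.lower().split()]
    return sorted(matches, key=lambda qn: qn[1], reverse=True)


# Made-up referral counts per search phrase.
referrals = {
    "organic seo": 120,
    "organic search optimization": 45,
    "seo firm boston": 80,
    "what is organic seo": 12,
    "organic food": 3,
}

for query, visits in stem_tail(referrals, "organic"):
    print(f"{visits:>4}  {query}")
```

Calling `stem_tail(referrals, "seo")` gives the "everything that includes the term SEO" view from the same data.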
Eric Enge: So you can specify a stem, the way you can in Wordtracker or KeywordDiscovery?
Richard Zwicky: Exactly, but it also provides analysis. Simply because you are getting traffic doesn't mean it is good traffic. Another challenge marketers have is seeing what the traffic really looks like and understanding which part of it is actually meaningful and relevant. If you are a retailer, you care about conversions; if you are a publisher, you care about page views and time on site. Enquisite Optimizer discovers and reports on what is optimal on a site-by-site basis. It compares all of your referral traffic to identify optimal patterns of user behavior, and which traffic has the highest potential and which has the lowest. Just because you are getting a lot of traffic for a term doesn't mean it is ever going to result in a conversion. There is a mathematical process that shows what normal traffic for conversions looks like and what other traffic matches it. You can then target the right terms for the right pages and do a better job of shaping your traffic. This saves you from trying to do it yourself by trial and error, which might take months. With our system, within a couple of days you will start seeing patterns of what is normal and what is optimal. Then all of a sudden it's helping you make those decisions, so you can get on with optimizing and building out your campaign instead of sitting there trying to figure out what to do next.
Eric Enge: You mentioned a little earlier that the shape of the long tail curve is different for sites that are less well-optimized. How is it different?
Richard Zwicky: It's actually quite interesting.
Normally, a very few search terms bring in large amounts of traffic, and a much larger number of terms each bring in relatively small amounts. As it turns out, the cumulative value of all the low-volume terms is about 70 to 75% of your total traffic. In other words, there are so many smaller-traffic terms that together they deliver more traffic than your high-volume terms. But the tail of a poorly optimized site is constructed differently, with much more of the search volume going to the high-volume terms and a much smaller tail. If you target a term like blue shoes, you need to understand all the variations in which it may come up. You want to capture things like blue tennis shoe, blue running shoes, blue canvas shoes, blue leather shoes, blue suede shoes. That is the part of your site's referral tail that people often don't optimize properly for, but as soon as you start recognizing these terms, you get not just those variations but also blue canvas, blue canvas deck shoes, which builds the tail out more and more. That's what you see in a well-optimized campaign: a rich variation of terms focused on certain themes, but all pointing back to the same core term you want to capture. As people construct longer and longer search queries, they are getting more and more definite about what they are looking for. They do that because they are highly motivated: they know what they want, and they want to get on with it, whether that means purchasing it or getting information about it and acting on it. When your tail is properly constructed, you capture all those variations through the optimization of your site, and you can actually see it reflected in the tail.
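The tail share Richard describes can be measured as the fraction of referrals that come from terms outside the highest-traffic "head". A Python sketch under stated assumptions: `tail_share` and `head_size` are hypothetical names, the head is defined simply as the top `head_size` terms by traffic, and both sample sites are synthetic, not real referral data.

```python
def tail_share(term_visits, head_size):
    """Fraction of all search referrals that come from terms outside the
    `head_size` highest-traffic terms (the 'head')."""
    counts = sorted(term_visits.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[head_size:]) / total


# Synthetic sites: one dominated by a single head term, one with a broad tail.
poorly_optimized = {
    "blue shoes": 500,
    "blue tennis shoe": 40,
    "blue running shoes": 35,
    "blue canvas shoes": 25,
}
well_optimized = {"blue shoes": 300, **{f"long-tail variant {i}": 10 for i in range(70)}}

print(f"{tail_share(poorly_optimized, head_size=1):.0%}")  # 17%
print(f"{tail_share(well_optimized, head_size=1):.0%}")    # 70%
```

Where to draw the line between head and tail is a judgment call; trying a few values of `head_size` shows how sensitive the comparison is to that choice.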
Eric Enge: So on a well-optimized site you might have 70% of your traffic coming from the long tail, but on a poorly optimized one it might be the opposite.
Richard Zwicky: Yes, that's a good way to put it.
Eric Enge: Being able to visualize your own long tail is huge, because so much depends on whether the site is optimized properly or not.
Richard Zwicky: There are still opportunities to grow and improve any site's long tail, even if it is already well optimized.
Eric Enge: So you collect the data through JavaScript on the publisher's site?
Richard Zwicky: Yes, we provide every website operator with unique JavaScript for their site. They put it everywhere on their site, not just on particular pages, because we provide user behavior analyses to give them more information. What is nice about it is two-fold. First, the JavaScript is served off the Akamai network, so instead of logging all the data to one central point we can use the nearest Akamai server, which makes it very responsive and very robust. This generally provides a load time of 12 milliseconds or less for anybody on a broadband connection anywhere on earth. It also means we don't miss data even if people start loading a web page and click the first link to move on before the page fully loads. Because we have already captured the log information on their behavior, we can report on it and add value for our clients. This also means we don't have to resort to sampling to fill in gaps, which is a critical issue with some analytics packages. So when we report, we know it's accurate and comprehensive data that we saw and that the customer actually received, as opposed to hypothesizing or extrapolating to complete a picture. You also don't have to worry about whether 10% of the data is missing. That 10% can be crucial. In this case, nothing is missing.
The only thing that would cause data to be missing is if the visiting user has disabled JavaScript in their browser. The other advantage is that we provide a single JavaScript tag for everywhere on your site. So when you want to analyze outcomes, conversions, actions, or anything like that, you don't have to modify the script on a page-by-page basis. You configure it once, specify which events you want to track, and then go back and look through all the data. You collect everything with that one JavaScript tag. If you have had the tag in place for a year and then realize you want to do a new analysis of the past year's data, you can do that. This is not easily doable in many web analytics packages. Additionally, the JavaScript tracks sessions across multiple visits, so you can understand attribution over time rather than attributing all your sales to the last click. If somebody came back twenty times and finally made a purchase, you can see how they first got there and how they came back the second, third, and nineteenth times. This way you can understand how all of your online marketing efforts fit together, and that's incredibly valuable.
Eric Enge: Let's talk a little bit about Enquisite Campaign.
Richard Zwicky: Thanks! We are getting phenomenal feedback from the people who used Enquisite Campaign as beta testers in the lead-up to the launch on May 19. In search marketing, by which I mean the paid side of search, the market has a very straightforward business model that everybody can understand: you invest a certain amount in paid search, and your agency is compensated with a percentage of that amount for managing your spend. The more successful they are, the more money you will spend in the future, and the agency succeeds as well. On the organic side of the business, everybody is always negotiating fixed-rate contracts.
That's fine, except that it's very hard for anybody to understand the true size of the opportunity, forecast what the contract should look like, or know how much effort is really required to succeed. Also, at a certain point a fixed-rate contract becomes a disincentive to perform, because a lot of SEOs can deliver tremendous positive value but don't get paid more for finding that additional opportunity and driving new business to the customer. A fixed-rate contract can be almost counterproductive for them at that point, because they cannot leverage the opportunity, or are limited by the contract scope in how much value they can drive into a customer's business, which also limits how much profit they can make for themselves.
Eric Enge: Right. The danger with fixed-rate consulting contracts is this: a consultant may be savvy enough to know when they've done enough to earn their fees, which is actually better than consultants who aren't, since those probably don't care about earning their fees. But then some other client is barking at the savvy one, and they stop paying attention to you.
Richard Zwicky: That's correct. In the other model, you can still have a base fee for all the base work you are doing. But an incentive model may allow you to stretch your goals, go after that opportunity, and discover where the other opportunities are. Then you have a model where you are rewarded for going the extra mile, because what it basically means is that you are delivering added, unexpected value to the customer. If the customer earns more sales, the consultant should win along with the customer. Today, you might work under a fixed-rate model and get running on an existing campaign, but every time you come up for renewal there is a frustrating discussion about what value you delivered.
That discussion becomes obsolete with Enquisite Campaign. You can prove the value delivered, so it shouldn't be a question of whether you got your money's worth. It's more like, "Wow, I can see that not only did I get my money's worth, I got more than I ever expected. Definitely, I am renewing with you." You need to be rewarded for the value of your work. There is no upside or incentive to go the extra mile in the present system of fixed-rate contracts, which doesn't really reward most SEOs. A lot of large agencies now have to focus on SEO business models, and they are struggling with how to compensate SEOs or build the right pricing models to sell to their clients. Now the ecosystem can run much more efficiently, so some of the larger agencies are going to go out and contract more and more SEOs in a much more efficient manner and help everybody win together. It's a win all around: the client wins because they get value, and the agency wins because it gets compensated for delivering that value. I mean, what more could you ask?
Eric Enge: Can you describe a little how you collect the performance data and do the value calculation?
Richard Zwicky: To collect the data we use the same JavaScript we use in all of our products, but where the application really begins is in helping people determine what an opportunity actually is, so you can tell whether you are focusing on something worthwhile. Or, if a customer comes in and says they want to show up and get customers for a specific term, we can sit down with them and determine whether it is possible and worth the time, effort, and investment. Our system runs an intensive series of calculations to determine how many people will search for the term over the next 30 days. Let's say the term is Blackberry. How many people are going to search for Blackberry over the next 30 days?
And if you are placed in the top four, how many referrals can you expect to receive for that term?
Eric Enge: Are you using the classic AOL data on how many people click on the number-one or number-two result, or are you using your own data?
Richard Zwicky: We use our own data. We've done a lot of analysis work, and one of the beautiful things about having such a large data sample internally is that we can qualify, verify, validate, and iterate on the reference data as the marketplace changes. You are not always going to be number one, but if you are in the top four and bouncing around in there, what's a reasonable expectation of the referral traffic you could acquire? If you are number one all the time, you are going to exceed the numbers we lay out as potentially available, because we use a weighted average of how much traffic you will get if you are in the top four. We're also adding a slider so you can ask and answer "what if I only reach page 3?" type questions. That's the first part of the platform. The second part is that we make it possible for you to build a campaign based around conversions if you want, so the customer pays you as they make sales. We can also do it on a cost-per-click basis. For example, if you are only targeting Massachusetts for Blackberry phones, you can build a very targeted SEO campaign for that, and our system will calculate what the fair market price for organic clicks should be. To establish a fair market organic price, we take into account the differences between informational and transactional queries, the difference in conversion rates between paid and organic in each area, differences in user behavior in the search results and on the website once users arrive, and how much of the traffic you get from organic is actually good.
This helps establish where the market really should be, because you might want to be paid on a cost-per-click model, or a customer may want to pay you on that basis. In an affiliate model, they have to pay for every referral that comes through. But how do you define what the payment is? To date, there hasn't been a good model for defining that in organic search. In Enquisite Campaign we have built one.
Eric Enge: Thanks Richard!
Richard Zwicky: Yes, thank you Eric!
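The opportunity forecast Richard outlines (expected searches over the next 30 days, multiplied by a weighted average of the traffic you would receive while ranking somewhere in the top four) can be sketched as follows. The click-through rates and search volume are hypothetical placeholders, not AOL data or Enquisite's reference data, and `position_weights` is an assumed stand-in for how much of the period you expect to spend at each rank. Changing those weights, and extending the CTR table to lower ranks, is one way to approximate the "what if I only reach page 3?" slider.

```python
def expected_referrals(searches, ctr_by_position, position_weights):
    """Forecast referrals as searches x weighted-average click-through rate,
    weighting each rank by how much of the period you expect to spend there."""
    total_weight = sum(position_weights.values())
    if total_weight == 0:
        return 0.0
    weighted_ctr = sum(
        ctr_by_position[pos] * weight for pos, weight in position_weights.items()
    ) / total_weight
    return searches * weighted_ctr


# Hypothetical placeholder inputs, not real click-through data.
ctr_by_position = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.08}
searches_next_30_days = 10_000

# Bouncing around the top four, equally likely to be at each rank.
top_four = {1: 1, 2: 1, 3: 1, 4: 1}
print(round(expected_referrals(searches_next_30_days, ctr_by_position, top_four)))  # 1575

# Mostly stuck at rank 4: the forecast drops accordingly.
mostly_fourth = {3: 1, 4: 3}
print(round(expected_referrals(searches_next_30_days, ctr_by_position, mostly_fourth)))  # 850
```

As Richard notes, a site that holds the number-one spot all the time would beat a weighted-average forecast like this one.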
Wed, 27 May 2009 15:46:47 +0200 Published: May 24, 2009
For over a decade, Dr. Scott Prevost has worked to bring natural language processing technology to the marketplace. As a graduate student at the University of Pennsylvania, he developed theoretical models of prosody for synthetic speech, as well as technology to generate dialogue for autonomous agents. In post-doctoral research at the MIT Media Lab and FX Palo Alto Lab, he integrated gestures, facial expressions, and other interactional cues into his research, creating lifelike 3D characters with speech recognition, dialogue processing, and vision capabilities. Dr. Prevost co-founded and served as CEO of Headpedal, Inc., a software company that specialized in creating virtual character interfaces for customer-facing applications on the web. Dr. Prevost also previously served as CEO of Animated Speech Corporation, which produces interactive, animated tutors for speech and language development. Dr. Prevost was General Manager and Director of Product at Powerset, where he focused on developing the user experience for natural language search. Powerset was acquired by the Microsoft Live Search division in August 2008, where Dr. Prevost currently holds the position of Principal Development Manager.
Interview Transcript
Eric Enge: Can you provide a quick overview of yourself and Powerset?
Scott Prevost: I have been working on natural language systems for quite a while now, with the particular goal of improving information retrieval. Powerset was founded on the notion that we can improve search results by having a much better understanding of the meaning of documents and of what people intend with their search queries. The way we do this is to apply very deep natural language processing technologies to the documents as we create an index. We also apply that to queries at runtime, so we can do a better job of actually matching meaning to meaning as opposed to just finding keywords.
Powerset was founded in 2006, and we launched our product in May 2008, initially as a Wikipedia search engine. Then we were acquired by Microsoft in the summer, and the deal closed on August 1, 2008.
Eric Enge: Can you talk a little more about the goal of better understanding a searcher's intent, and the mechanics you use after doing that?
Scott Prevost: One of the key points I want to make is that Powerset is not just about understanding the intent in queries. That's part of the equation for getting better search results, but once you have that, you also need a much better understanding of what's in the documents. It's not enough to know that a user is looking for a certain kind of search result; you also need to be able to match that to what's actually in the document. So what we propose to do is very different from what most other search engine startups do. Most of them are trying to take the existing keyword search model and add some bells and whistles or put a new front end on it. What we did is completely reinvent how the index is built, by applying technology we licensed from PARC that allows us to do very deep linguistic processing. We essentially look at a document, break it into sentences, and then analyze each sentence using a very robust linguistic parser. We extract semantic representations from that, and we store those semantic representations in our index. We do similar processing on queries at runtime, and then we match on these semantic properties, the keyword properties, and other document properties. This means we can find sentences that have the right meaning but use slightly different words. If you type "When did earthquakes hit Tokyo" into powerset.com, you will see answers that use words like strike instead of hit.
You will also see that we are able to highlight dates in the captions for those answers, because we've done linguistic analysis on the sentences rather than merely matching keywords.
Eric Enge: So how is this different from Latent Semantic Analysis or Latent Semantic Indexing?
Scott Prevost: We are doing the semantic processing up front and all the hard work on the back end, so that's one big difference from the other approaches we've seen out there.
Eric Enge: So you are doing some preprocessing?
Scott Prevost: Yes. We process the documents as we index them. We are also trying to do some analysis at query time, but natural language technology is still quite expensive in terms of the compute power needed, so the degree to which we can compile all of that out into the index lets us produce a runtime that's on par with a keyword search runtime in terms of latency.
Eric Enge: So, when we talk about the problems with traditional search engines, one of the things I saw you focus on was that they require users to speak their language?
Scott Prevost: That's right, yes. We've all gone through the process of trying to find a document by figuring out the right collection of words that will pull it up. That means you have to start thinking like the author of the document, imagining how the thing you are looking for might have been expressed. We generally try our query a few times before we find what we are looking for. By adding semantic analysis, we allow people to be a little more natural in the way they express themselves. You don't necessarily have to worry about the specific keyword, because we are likely to find a synonym. You also don't have to worry about excluding stop words, or about which words are going to be matched with which words in the matching algorithm.
We just want people to be able to write a natural phrase or even a question, and then let the search engine do the hard part: figuring out what the appropriate matches are.
Eric Enge: Right. In existing search engines it can be a disadvantage to include extra words that aren't actually necessary to the query. This results from using a more basic method for matching words in the query with words on the page.
Scott Prevost: That's right, yes. And of course it creates some interesting issues for us, because now we are trying to change users' behavior a little. They have grown very accustomed to thinking of a search engine as words, and documents that include those words. Now that we are changing that interaction model, our hope is that people's behavior will gradually change as they start to realize the power of the system we are introducing. One thing we have been very careful about at Powerset is maintaining the old model as much as possible. So if you just type keywords into Powerset, you will still get results that are just as good as those from Google, Live Search, or Yahoo.
Eric Enge: You have talked a little about stop words; can you expand on that? Define what they are, how they are treated by regular search engines, and why making use of them in Powerset is important.
Scott Prevost: Stop words are words that the search engine simply disregards: prepositions, or words like "what" and "where." It's a very salient limitation of that implementation. The idea is that these words matter less in a query, because if you tried to match documents on them they would match so many documents. But in reality they are the linguistic glue in the query and in language. They tell you how the other important words in the query link together, and that allows us to look for those links in the document when we match a query by processing it linguistically. Let's go back to the earthquake example.
I am not specifically searching for the word "did," but that word is still part of the verb complex in that query, so the parser knows that "did" and "hit" go together. We are not matching on that specific word; we are matching verbs that semantically match. So instead of "did hit," we can match the word "strike."

Eric Enge: Right. So, for example, you could accidentally match something like "did not hit"?

Scott Prevost: Yes. We are not currently processing negation in the parser at a really detailed level, because it is such a tricky problem. We would actually match results that get the negation wrong, but that is generally useful information for the user anyway, because it is relevant to their query even if it isn't an exact answer.

Eric Enge: Right. So that's an example of something that you would be working on in the future?

Scott Prevost: Oh, absolutely. That and things like sentiment analysis are things that we will be working on in the future. For sentiment analysis, say you want to know what positive things a particular politician said about a particular topic. You would get a different set of results than if you just asked what they said about the topic. Right now we are basically doing sentence-level linguistic matching along with broader document properties like keywords and anchor text, and using all of these things to rank our results. But as the technology improves, we'll start to look at many more of these discourse-level properties, so we can really understand which sentences in the document are most important and how they relate to each other. As we learn from these kinds of approaches, I think we'll see the relevance of search results improve over time.

Eric Enge: Right. For example, if someone types in "The Office," they probably don't just want to search the phrase "Office." They probably mean the TV show.

Scott Prevost: Yes.
And in fact if you type that into Powerset, you will get a result at the very top with a tab for The Office television show. There is also a tab for the UK television series by that name, one for the band and one for Microsoft Office. So that's a pretty ambiguous query, but chances are you meant the television show by phrasing it that way, so that's the one that comes up first.

Eric Enge: Right. So let's get back to Latent Semantic Analysis. With LSA, you look at the entire set of documents and determine relationships between words by proximity and frequency. This way you might discover that "doctor" and "physician" probably mean the same thing, or at least almost the same thing. What I am getting at here is the analysis of the corpus of documents to extract relationships.

Scott Prevost: We are not using what you are thinking of as Latent Semantic Analysis. We are using more of a symbolic approach to the linguistic processing. That's the first phase of what we are doing. We look at a document and break it into sentences, and then we parse the sentences using technology that we've licensed from PARC. This allows us to create fairly complex semantic representations of the meaning of those sentences, and it also allows us to represent ambiguity in those interpretations. This way we can index the most likely reading of a sentence, and the other possible readings as well. These then become semantic features that get thrown into the mix with keyword and other document-property features used by our retrieval and ranking systems. We are not retrieving results based only on meaning matches and partial meaning matches; those are part of the mix, and the retrieval and ranking system is a machine-learning-based algorithm.
In that sense we are starting to use statistical approaches, but we start with a very symbolic representation of the meaning in the document, which a machine-learning algorithm then uses to retrieve and rank the documents. We are not pulling out relationships based on things like frequency; we are uncovering the linguistic and semantic relationships through symbolic approaches. We do have other projects going on within the company that are looking at more statistical approaches to these problems, but I would currently characterize the system as a hybrid.

Eric Enge: What exactly does it mean to say that it's a symbolic approach?

Scott Prevost: It means that it's rule-based semantic processing, as opposed to relationships uncovered by machine-learned approaches. For example, we might have a rule in our system that says if you kill something, it dies.

Eric Enge: What are some examples of search queries that highlight the power of this approach?

Scott Prevost: Let's start with something like "Siddhartha." The first thing you will see is a summary of relevant Wikipedia pages that you can tab through. You were probably looking for Siddhartha, the founder of Buddhism, when you typed it in, but there is also a film, a novel and an American rock band by that name. You can just click on the tabs to see those different snippets. In the section below that, you will see something called Facts from Wikipedia; these are some of the semantic relations that we have automatically extracted using these linguistic techniques. In the second line you will see "Siddhartha renounced the world," and if you click on "world," you will see the sentences from which we extracted that fact. We extracted it from three different sentences on three different Wikipedia pages, and you will see that in the second one we are clearly not relying on proximity.
Siddhartha is actually pretty far away from the word "renounced," but linguistically the two are tightly tied together; there is just another phrase intervening. So this starts to show you how we are taking data that's in Wikipedia and starting to structure it. If you click the More link at the bottom of that section, you'll see a bunch of other relationships that we've pulled out.

Eric Enge: They are just a little less tightly matched.

Scott Prevost: Exactly. You can also get to this structured information pretty directly. If you type in "What did Siddhartha attain," you will see Enlightenment and Nirvana. In a sense, these subject-relation-object semantic triples are great for answering questions. So try something like "What was banned by the FDA." If you look at the right part of the screen, you will see More. If you click that, you will see a longer list, and if you click on something like "cyclamate," you will see the sentences from which we extracted that fact. We are basically allowing a whole new type of interaction: I type a simple subject-relation-object question, and I get a list of answers that are supported by the text we've uncovered through this linguistic analysis. You'll also note that we can start to make distinctions between queries like "who defeated Hulk Hogan" and "who did Hulk Hogan defeat." If you search "who defeated Hulk Hogan" and click on More, you will see the whole list. If you do the other query, "who did Hulk Hogan defeat," you will see that the lists are different, because we are looking for these things in the correct relationship to each other in the text. We are not just looking for the keywords "Hulk," "Hogan," and "defeat." That pair of queries would be very hard for a typical search engine to distinguish, because the key phrases are the same and the word order is what defines the difference.
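The contrast Prevost draws between keyword matching and subject-relation-object matching can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Powerset's implementation: the stop-word list, the one-entry lemma table, and the triples (with the placeholder names `wrestler_a` and `wrestler_b`) are all hypothetical, and turning a question into a (subject, relation, object) pattern, which is the parser's job, is assumed rather than shown.

```python
from collections import defaultdict

# Hypothetical stop-word list and lemma table, for illustration only.
STOP_WORDS = {"who", "what", "where", "did", "does", "the", "a", "an"}
LEMMAS = {"defeated": "defeat"}


def keyword_terms(query):
    """Bag-of-words view of a query: lowercase, strip punctuation,
    drop stop words, and map inflected forms to a lemma."""
    words = (w.strip("?.,!").lower() for w in query.split())
    return frozenset(LEMMAS.get(w, w) for w in words if w and w not in STOP_WORDS)


class TripleIndex:
    """Stores (subject, relation, object) triples with one lookup per direction."""

    def __init__(self, triples):
        self._objects = defaultdict(set)   # (subject, relation) -> objects
        self._subjects = defaultdict(set)  # (relation, object) -> subjects
        for subj, rel, obj in triples:
            self._objects[(subj, rel)].add(obj)
            self._subjects[(rel, obj)].add(subj)

    def objects(self, subj, rel):
        """Answers "who did <subj> <rel>?", e.g. who did Hulk Hogan defeat."""
        return sorted(self._objects.get((subj, rel), ()))

    def subjects(self, rel, obj):
        """Answers "who <rel> <obj>?", e.g. who defeated Hulk Hogan."""
        return sorted(self._subjects.get((rel, obj), ()))


# Placeholder triples, not real match results.
index = TripleIndex([
    ("hulk hogan", "defeat", "wrestler_a"),
    ("wrestler_b", "defeat", "hulk hogan"),
])

# Both phrasings collapse to the same keywords...
assert keyword_terms("who defeated Hulk Hogan") == keyword_terms("who did Hulk Hogan defeat")
# ...but the two triple lookups return different answers.
print(index.subjects("defeat", "hulk hogan"))  # -> ['wrestler_b']
print(index.objects("hulk hogan", "defeat"))   # -> ['wrestler_a']
```

Once stop words are dropped, both Hulk Hogan queries reduce to the same bag of terms, so a pure keyword matcher has nothing left to tell them apart. Keeping a separate lookup keyed on each side of the relation is what lets the direction of the question matter.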
So let's pick a query for the regular search results. Let's type in "how many nuclear reactors does Japan have?" Here is a query with a lot of stop words, right? But it's a query where I think it is pretty easy to tell what the user is looking for. In the very first caption we can see that Japan has 55 reactors. We are interpreting the fact that you typed in "how many" to mean that you are looking for the number of nuclear reactors. This is something you don't get when you use Google, Yahoo, Live Search or any of the keyword search engines. Let's try "Who mocked Sarah Palin?" Obviously, the other search engines do a pretty good job of finding relevant results for this, but what I want to show you are some of the captions in the blue-link results. We get things about impersonating Palin and parodies of Palin. We are not just looking for the specific words "mock Sarah Palin"; we find semantically related words and can highlight those right in the captions. The hope here is that we can help users better understand when one of these blue-link results is truly relevant to them, and save them the clickthrough when it is not. Another thing we can talk about is pulling search results from structured data. If you type "GM board of directors," we connect with Freebase in order to produce the result at the top.

Eric Enge: Along with the pictures of each of the members.

Scott Prevost: Right. If you type in "what movies did Heath Ledger star in," you will get the same results as if you typed in "films with Heath Ledger," because we are doing semantic analysis, and you are essentially looking for the same thing whichever phrase you type.

Eric Enge: The list of movies shown didn't change at all; there were just some subtle changes to the results below it. Those are interesting examples.
Currently you are operating this on Wikipedia?

Scott Prevost: That's right.

Eric Enge: Why did you choose Wikipedia in particular?

Scott Prevost: Well, there are a few reasons. First of all, as we were developing the technology, Wikipedia was a great test bed because it covers just about every topic there is. We wanted to make it very clear that our technology was about linguistic processing, and that we didn't have to stay within a specific, very narrow semantic domain for the technology to work. Some other natural language approaches have taken that very narrow approach, and that's not what we've done, so the fact that Wikipedia is so broad was very appealing to us. The second reason is that Wikipedia is well written, so it parses pretty nicely. That said, our technology is designed so that when we can't parse something, we still index it as keywords; it has to degrade gracefully into the keyword world. The final reason is that Wikipedia shows up in so many search results these days. It's almost hard to find a search query that doesn't have a Wikipedia result in the top ten, so we know it is a very valuable set of documents to index. When it came time to define a product to launch, we had some resource constraints. It takes a lot of hardware to build an index that holds as much information as the Powerset index, so we had to find a smaller set of documents, and then the challenge becomes finding a small set of documents that hangs together for the user in a meaningful way. So we decided initially to restrict ourselves to Wikipedia alone, rather than having Wikipedia plus a few other smaller document sets that might not fit in. But we are now expanding the index, and we've been continually experimenting with other kinds of documents. The technology is not particularly wedded to anything specific to Wikipedia; it's just such a valuable set of documents that so many people on the web use.
Eric Enge: So, if we think about runtime, when someone enters a query, is there reason to believe that Powerset is more or less compute-intensive than regular search?

Scott Prevost: It's marginally more compute-intensive at runtime, and the reason it is only marginally more is that we do the really compute-intensive things at index time.

Eric Enge: I assume that at that time it's probably significantly more compute-intensive.

Scott Prevost: Actually, the only thing that's more intensive at runtime is that we are parsing the query. Once we've parsed the query, the actual retrieval is very similar to keyword retrieval, except that we are retrieving on semantic features as well as keyword features. It's a very similar apparatus.

Eric Enge: Right. But you probably have a higher level of investment to build the index, because you are doing all that preprocessing?

Scott Prevost: That's right. We are doing very deep processing on the documents, as opposed to just pulling out the words.

Eric Enge: Is there any insight you can give us as to how much more difficult it is?

Scott Prevost: It depends on the degree to which we do it. It's a very granular system, and we can adjust a lot of knobs. It can be anywhere from ten to one hundred times more expensive. I am sure we could make it a thousand times more expensive if we thought we would get the benefit from it. Our initial goal has been to improve relevance while, in some sense, disregarding cost, but obviously we are pragmatic when push comes to shove. The goal was to find out which of these features are most important for improving relevance. Then, as we learn more, we can simplify and skip some of the computation that's not giving us as much bang for the buck.

Eric Enge: Are there any components of Powerset that are integrated into Live Search at this point?

Scott Prevost: We've integrated a few things.
We've integrated some of our direct answers using Freebase, and some improved captions and snippets under the blue links for Wikipedia. We've also done some things with related searches. And of course we are working on a much more robust integration plan, although I don't have anything to announce today. But some exciting stuff will be coming down the pipe for sure.

Eric Enge: Any closing comments?

Scott Prevost: We are excited at Powerset to have the opportunity to take this technology to scale and to integrate it into a product like Live Search. We are really thrilled, because it allows us to see our dream actually come to fruition. And I think we have a lot of exciting stuff coming down the road.

Eric Enge: Thanks, Scott!

Scott Prevost: Yes, thank you, Eric!
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com

Mon, 18 May 2009 15:19:17 +0200 Published: May 17, 2009
Our first interviewee, Tracy Chan, is a Product Manager at YouTube. Prior to working at YouTube he was a Financial Analyst at Google. He has also worked as an Associate at Stockamp & Associates and as a Corporate Strategy Intern at eBay. He got his degree at the University of California, San Diego. Our second interviewee, Matthew Liu, is the lead product manager on YouTube Sponsored Videos. In this role, he focuses on building an advertising platform that allows video creators, from the everyday user to a Fortune 500 advertiser, to reach people who are interested in their content, products, or services with relevant videos. Previously, Matthew led numerous other projects at YouTube for advertising, content partnerships and rights management, and community. Matthew has an MS in Management Science & Engineering and a BS in Electrical Engineering from Stanford University.

Interview Transcript

Eric Enge: Can you provide an overview of what Insight is, and why you created it?

Tracy Chan: There are millions of people watching hundreds of millions of videos every single day on YouTube. We started to hear from advertisers, content providers and everyday users that they wanted to understand YouTube's audience. They were asking questions such as: "How do we really stand out, how do we understand our ecosystem, and how do we know how our videos are performing?" They basically wanted to learn more about their audience in order to make better content. As YouTube was growing, it started turning into the world's largest focus group. So what we did was build a pretty powerful analytics tool that helps content providers, advertisers and users better understand their performance on YouTube. This tool is free to anyone who has ever uploaded a video. When I uploaded my first video, I got a hundred views in the first two days. That was actually surprising to me, because it was a little animation video that was not that interesting.
I was wondering if my mom was just watching it over and over, or if other people around the world were watching my video. With this in mind, we built the product almost a year ago and launched it on March 26 of last year. We started with basic functionality that could show you your views over a certain period of time, say a month. On a personal note, it helped me figure out that my video was watched 50 times by my mom in California, but it also got a lot of views in Spain and the UK. So it was really interesting, because you could finally see where your audience was coming from and what the lifecycle of your video looked like. On top of that, we built a feature called Popularity, which analyzes how your video's performance compares to other videos. You can see how well your video performed on any given day relative to all the videos on YouTube, or within specific geographic regions. What is really interesting is that we found businesses were starting to use this basic data in very interesting ways. The obvious value was in understanding the lifecycle of your video and on which days of the week it was most popular. This can help content owners really start to own their programming strategies on YouTube. If they get most of their views within the first three weeks, for example, serial content providers could start uploading their videos every three weeks to maximize the number of views that they get on YouTube. Another interesting phenomenon we observed was that bands would put up concert footage or their new video clip, and they would find interesting pockets of audiences in different areas across the US. They'd actually start planning their touring schedules around those audiences, because nothing is worse than scheduling a concert and having no one show up. By having their content on YouTube, they could understand where the views were coming from and better plan their concert strategies.
Another really interesting use of the tool involves measuring ad effectiveness, which could save you money on promotion within the YouTube ecosystem. You could really start to see the effect of specific advertising campaigns that you ran, and whether you got the views and spikes you expected. Here is an interesting example: if you ran a homepage ad on YouTube, you would expect the video that you ran the ad on to get a spike in views. But what we also saw was that all the other videos in that uploader's channel got spikes in views, even though only one video was on the homepage. So you could really start to see the halo effect of advertising. Interestingly enough, you could also see the effectiveness of the offline promotions you were doing. If you had a movie screening in Michigan, for example, you could see whether that made people in Michigan start looking for your YouTube content, and then see the halo effect in surrounding states where people may have heard of it as well. So a lot of really interesting stuff came out of the first features we created for YouTube Insight, which showed basic views trended over time and geography. A couple of weeks after we launched Insight, we added a discovery feature that lets publishers understand how people get to their video. They can see whether viewers found it through a search on YouTube or Google, or through an external link somewhere on the web. It may be a video embedded elsewhere on the web, or a part of the YouTube site, that drove traffic back to your video. The value here is pretty obvious, and again there is an opportunity to devise optimization strategies around how people find your content. For example, if there were blogs that embedded your video, you could reach out to them and form business relationships. One of the interesting stories that we heard involved the band Weezer.
Weezer debuted one of the videos from their latest album on YouTube, and they got almost 2,000,000 views within the first couple of days, which is a fantastic performance. When they looked in Insight, they found that a lot of those views were actually driven by tech blogs such as Valleywag and TechCrunch, which was a big surprise to them. What they did with this information was even more interesting than the information itself. The single preceded the album release, so when they were promoting the album release and their tour, they spent a lot of their media money on tech blogs, since they knew they were already established there.

Eric Enge: So they reached out directly to the tech blogs, because clearly the tech blogs had an interest in them at that point as well.

Tracy Chan: You can imagine all the types of relationships that you could form from that. Not only do we show you the sources of traffic, we allow you to drill down further. For example, you can actually see the search terms that led people to your videos. We have a great promotional product called Promoted Videos, which is basically AdWords for YouTube and allows you to advertise against specific keywords, so your videos can show up alongside the organic search results on the site. Again, Insight has proven to be a very powerful product here, because now you can know which search terms are really effective and which are less effective. The combination of the two really helps people find the audience that was looking for their content, whether they are advertisers or content providers.

Eric Enge: Right. So if you are a commercial entity that produced a neat video and put it on YouTube, you may want to buy advertising just to create visibility for your video. Then you could use the analytics functionality to see how that campaign performed.

Tracy Chan: Absolutely.
Another really interesting thing about YouTube is that a lot of people come to the site just to be entertained, so, for example, we get a lot of crazy, funny videos. You may find that the term "funny video" actually drives a lot of views to a video such as Tea Partay. Because you now have access to this information, you can understand those general search terms that you may not have thought about before and really start to optimize. Insight is very much real time, so you can optimize in the middle of your campaigns, and it will really start to tell you what your strategies should be. The next feature that we launched was Demographics, which shows you the makeup of your audience broken down by sex and age. This is pretty important to both advertisers and content providers, because they need to see whether they are reaching their target demographic. One of the things we've realized about YouTube is that since it has such a massive audience, you can find any niche audience you want. One example was a PBS producer who produced a show. He wanted to put the pilot up on YouTube, but the management at PBS wasn't really sure that YouTube was the right place, because they thought YouTube was geared toward a younger audience. They put the pilot up on YouTube, let it run for two weeks, and found that 75% of their audience was over 35, which was their target demographic. So it really proves that there is an audience on YouTube for any type of content. We also found that people are starting to use the demographic information provided by YouTube Insight to close deals. One of the most popular comedians on YouTube is a guy named Paul Telner, and he used the demographic information in Insight to show that he appealed to the right target audience and to sign a deal with MuchMusic, which is Canada's #1 cable music network.
Another example is Chris Bosh, who is an NBA All-Star for the Toronto Raptors and also a member of the US Olympic team. Sharing information on his YouTube demographics helped him get a sponsorship deal with AOL Sports.

Eric Enge: You could also view it from the opposite point of view: if you are a content provider, you need to decide which potential advertisers you want to target.

Tracy Chan: Absolutely. And we think people experiment with their content too. They put up multiple creatives to see which demographics the different creatives resonate with. It's using that focus group in a very, very controlled way, but it's very quick and free as well. And you have access to such a wide audience that you can really see how things resonate within different groups. The most recent feature that we've launched is Hot Spots. All the previous features focused on aggregate geographic data, but Hot Spots starts to dig deep into specific views. It shows how your audience related to the video during playback. You see a graph alongside your video, so you can play the video and see how your audience is responding second by second. If people drop off, the graph goes down; if people rewind, you see spikes in attention. We also give you an overall attention score, so you can understand how your video is performing relative to others. We show this attention score and your Hot Spots graph relative to videos of a similar length. This is important because, all else being equal, people drop off more and more as videos run past certain lengths. The data is aggregated across viewers.

Eric Enge: Can you talk about exportable reports?

Tracy Chan: We heard loud and clear from our content providers, our power users, and our advertisers that they want broad access to the data, so we have launched exportable reports. The premise behind it is that we want to give these power users the data how they want it, when they want it, and where they want it.
Exportable reports provide a lot more flexibility on top of the tools we already give you today. There are groupings of videos that publishers want to look at that they are never going to tell YouTube about. For example, if one marketing department focused on one set of videos and another focused on a different set, there was no way to group those arbitrarily, because YouTube had no way of knowing who works on which set of videos. Now they can download analytics for the specific videos and make those comparisons themselves. Until today, we gave you discovery sources by geography and by time, but to see things over time you had to select different date ranges. So if you wanted to see the number of times a specific keyword was searched on a daily basis, you could do it in Insight before this feature was released, but it required some manual work, because you had to switch filters and things like that. With exportable reports you can target the specific types of data that you want. If it is a keyword term, you can select it, filter it in the list and then put it on a timeline. Or if you wanted to compare views from certain keywords against views from having your video embedded on a certain blog, you can compare those sources side by side. We have heard about a lot of interesting things that content providers and advertisers want to do with this data, such as plugging it into their own systems and comparing their advertising campaigns on YouTube against those on the radio. Now they have that flexibility, and if they want to plug into a wider ecosystem, exports can take them a long way toward getting there.

Eric Enge: Is the export a manual process?

Tracy Chan: Yes. It is a link, and we provide it on a per-video and a per-channel basis.
We are going to make improvements in terms of including more types of data and making it easier to access, but we launched this feature very quickly after its conception. It was a three-week cycle; our goal was to launch it very fast, give users access to the features that we were promoting, and then make improvements as we got feedback.

Eric Enge: Can you export any of the data in Insight, or just specific things?

Tracy Chan: Right now, we basically have two reports. The first report gives you views, uniques, popularity information and engagement information, so you can see comments, ratings and favorites on a daily basis by country and by video. The second report is referral data: views by referral source, broken down with all the granularity we have on a daily and per-country basis.

Eric Enge: That is some good stuff for people to pull out. They can combine it with their other analytics data as well.

Tracy Chan: Absolutely. We think that would be a great use of the exported data. We have heard that some advertising agencies have their own internal reporting tools, and any time there is a reporting system they can plug into, it makes them more efficient at optimizing campaigns.

Eric Enge: Right. They can just export the CSV file and then run their other tools on it.

Tracy Chan: Yes, absolutely. We are excited about this new feature, and we have received pretty good press from the blogosphere, and in comments on the YouTube blog we can see people are finding it useful.

Eric Enge: Any comments you can make on plans to enhance the analytics further?

Tracy Chan: Insight had just two features when we launched a year ago, and now we have about six full-featured modules, so we are evolving very, very quickly. I can't speak specifically to the features we are going to build, but you can imagine there is a lot we can do with all the data that YouTube has.
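As a rough illustration of combining an exported report with other analytics data, the Python sketch below totals daily views from a CSV export and lines them up by date with a metric from another tool. It is a sketch under assumptions: the column names (`date`, `country`, `views`), the sample rows, and the `radio_responses` figures are all hypothetical, and the real Insight export format may differ.

```python
import csv
import io
from collections import defaultdict


def daily_views(csv_text):
    """Sum the (hypothetical) 'views' column per 'date' across all countries."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["date"]] += int(row["views"])
    return dict(totals)


def join_by_date(youtube, other):
    """Pair YouTube daily views with another daily metric; missing days count as 0."""
    days = sorted(set(youtube) | set(other))
    return {day: (youtube.get(day, 0), other.get(day, 0)) for day in days}


# Hypothetical export rows, not real YouTube data.
EXPORT = """date,country,views
2009-05-01,US,120
2009-05-01,ES,30
2009-05-02,US,90
"""

# Hypothetical daily figures from some other campaign-tracking tool.
radio_responses = {"2009-05-01": 4, "2009-05-03": 7}

combined = join_by_date(daily_views(EXPORT), radio_responses)
print(combined)
```

Here `combined` maps each date to a pair of (YouTube views, other metric), so `"2009-05-01"` maps to `(150, 4)`. Filling days that appear in only one source with 0 is just one simple choice; a reporting tool might instead leave those days blank.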
We display a lot of data, such as engagement within the site and how people are commenting on and rating the videos. You can imagine that expanding along a number of dimensions.

Eric Enge: And YouTube has now become the #2 search engine on the web, so that really adds to the value of this data.

Tracy Chan: We are looking forward to helping people use the tool, because quite frankly we've been surprised by all the different use cases. Optimizing for search is a great way for people to enhance their experience on YouTube.

Eric Enge: Thanks for joining us today, Tracy!

Tracy Chan: Thanks for having me!

Eric Enge: Hi Matt! Can you give us an overview of your role with YouTube?

Matthew Liu: Hi Eric, my name is Matthew Liu. I am a Product Manager, working alongside Tracy and others on YouTube's advertising platforms. I am working on one of our newest launches, Promoted Videos, which launched at the end of last year. We think of it as the equivalent of AdWords on YouTube, as it is a paid video search product.

Eric Enge: From an optimization point of view, the first thing you have to do is produce content that is interesting to the people who end up discovering it on YouTube, which sort of goes without saying.

Matthew Liu: Yes, absolutely. I think we've always had the philosophy at YouTube, whether we're talking to our users, content partners or advertisers, that whatever you want to share should be good content. So when we speak to advertisers, we ask them to try to make their advertisements videos that people would want to watch anyway. By using our advertising products, advertisers are able to put a little bit of gasoline on the fire, allowing it to spread more quickly and potentially become viral. Similarly, our content partners and everyday users trying to get viewership should really think about what the community is looking for in general at a specific moment.
And they should really try to personalize their video for the YouTube community, as opposed to simply taking content that might otherwise have run on television or some other medium. Eric Enge: So I think there are a couple of key non-SEO things that people typically talk about. For example, advertising and allowing people to share your videos are good things to do. Also, make sure that the content in some way reinforces the brand rather than just being entertainment without purpose, so to speak. Allowing ratings and using well-selected thumbnails are also good promotional strategies, right? Matthew Liu: Yes, absolutely. You touched on a couple of those things, such as ratings, comments and embedding. One of the larger paradigms is that a lot of people put content on YouTube and allow themselves to engage in conversation with the community. Sometimes we see our larger content partners or advertisers shy away from that, because they are afraid of what comments and ratings they are going to get. Accepting comments and ratings may feel a bit more risky, but it definitely offers you very valuable instant feedback. If you are able to get a couple thousand views and see what the ratings are and what people's comments are, it empowers you to make changes. And if you are getting positive feedback, not only is your video getting out there, but you are spurring positive conversation as well. So that's definitely one thing we recommend. Eric Enge: I guess it gets back to the old social media lesson: the conversation is going to take place with or without you. Matthew Liu: Yes, that's a perfect statement. Eric Enge: The choice becomes very obvious once you think about it that way. So do you have any interesting case study examples of someone who used advertising as a way to really launch a successful video? Matthew Liu: Yes. The first example involves OfficeMax, which is a large retail supplier of various office products.
It is a traditional brand advertiser, with its own TV commercials in most cases, but they knew they wanted to do something a little bit edgier, with the potential to go viral. They commissioned The Escape Pod as their agency, because they wanted to do something much more creative. They came up with an interesting series of videos, the Penny Pranks videos, for their Back to School campaign. These involved a funny-looking guy who would go to various places in New York City and try to pay for everything with pennies, and everyone would be outraged. He would try to buy a car with 200,000 pennies, or something similar to that. They decided to use advertising to drive those initial views. They wanted to accelerate that and also, as a byproduct, increase their discoverability in organic search and on YouTube. So they worked with us using Promoted Videos and some other paid media. What they found was that they were able to get fairly efficient views, so they were very pleased with the price. They were able to get a ton of clicks, which drove a lot of traffic to their videos, and as a result they started that viral loop. Over time, we saw that happen for many search query terms. On the organic side, for some of their target queries, their videos became the top search result. OfficeMax was actually so embraced by the community that our search engine deemed them the most relevant for that time period. They also saw additional uplift on their other videos: not just on the videos that they promoted, as users watched and clicked on more from OfficeMax, but also more views on the related videos. They were very pleased, because they had a very successful campaign that they were able to conduct in a very efficient way. That's one major example of brand advertisers efficiently driving traffic to their online videos, engaging in positive conversation and even potentially starting that viral spreading of video.
The second example involves a producer of consumer gadgets and products. During the launch of Promoted Videos, they participated with us in producing a couple of videos that highlighted their iPhone 3G cases. The company is ZAGG, and their product is called the Invisible Shield. It's an invisible, scratch-resistant film that goes on the iPhone. You could take a key or a knife to it and it will prevent your iPhone from being scratched. In the video they show two iPhones side by side, one with the cover and one without it, and they show the different results. When promoted against terms such as iPhone and iPod, the video was not only able to drive traffic, but ZAGG was able to convert that traffic into sales. The amazing thing is that they were actually able to drive conversions at a lower cost than they would have been able to on Google and other competing search engines. One of our hypotheses is that for certain types of products, where the user may not know exactly what the product is, being able to see it is far more compelling than just three lines of text. Eric Enge: So what about the power of send to a friend, and other options for sharing? Matthew Liu: There are a bunch of different sharing options, from sending to a friend, to embedding the video, to sharing on Facebook or MySpace, to just copying and pasting the URL so you can go back to it later. These all have various benefits. I won't go into the details as to which ones we found most successful, but there is a reason why we encourage video distribution through different means beyond just YouTube, whether it's IM, Connections on YouTube or posting to third-party sites. They definitely have a lot of value in driving additional viewership and potentially even subscriptions. It just creates an overall deeper engagement. Eric Enge: Let's get into more basic SEO kinds of things.
Standard advice in the industry places a lot of emphasis on category selection, titles and descriptions, and the use of tags. Can you talk about that a little bit? Matthew Liu: If you pull up a YouTube watch page, you'll see three main fields that the user can fill in: the title, the description and the tags, just as you mentioned. What I think a lot of people are missing when they use these three fields is comprehensiveness. A lot of times we see videos with very short titles, very short descriptions and somewhat erratic tags. The first thing I would say is that if your video has subtopics or a subtitle, include them in the title, and include all the details in the description. We offer a lot of space where you can type in all the details, and we are indexing all those descriptions and tags; they are going to surface in both YouTube video search and Google video search. So it's important that you have comprehensive data. Secondly, be consistent. A lot of videos we see have a good title and a good description, but then totally random tags. We actually do have measures that penalize this poor behavior: we recognize when videos are trying to spam, and that's something we penalize. So be consistent with your title, description and tags. Make them clearly about that video, and don't put unrelated keywords in any of those fields. Another layer of video SEO is to make your video open. Allow it to be embedded, and allow users to comment on it and rate it. We definitely do use user feedback as an additional ranking mechanism. This can hurt you if you end up getting a lot of negative ratings, but the benefit of getting higher ratings outweighs that risk. Now let's talk a little bit more about engaging with the user. You mentioned the thumbnail, which is probably one of the most basic things. Pick a thumbnail that is both representative of your video and engaging.
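To make the consistency advice concrete, here is a minimal Python sketch of a self-check a publisher might run before uploading: it flags tags that share no word with the title or description. This is a hypothetical heuristic for illustration only, not YouTube's actual spam detection, and plain word overlap is only a rough proxy for relevance.

```python
import re

def words(text: str) -> set:
    """Lowercase alphanumeric words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def unrelated_tags(title: str, description: str, tags: list) -> list:
    """Return the tags that share no word with the title or the description."""
    vocabulary = words(title) | words(description)
    return [tag for tag in tags if not (words(tag) & vocabulary)]

# Hypothetical metadata for a product demo video.
title = "iPhone 3G scratch test: protective film vs. bare screen"
description = "We scratch two iPhones with a key to compare a protected screen and a bare one."
tags = ["iphone", "scratch test", "screen protector", "celebrity gossip"]

print(unrelated_tags(title, description, tags))  # ['celebrity gossip']
```

A tag such as "celebrity gossip" is flagged here, which is the kind of unrelated keyword the interview warns against; a flagged tag is a prompt for a human to look again, not proof of spam.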
Right now we give you three thumbnails that we take from points in the video that we think are representative, so any user who uploads a video should definitely take the time to pick the best thumbnail. There are also some benefits to higher-quality videos. Users may or may not care as much about the quality of the video itself, but because we take the thumbnail from the video, a higher-quality video will produce a higher-quality thumbnail as well. And we have definitely noticed that higher-quality thumbnails attract our users. Eric Enge: Right. So you've got to care about the content and the quality of the thumbnail. Matthew Liu: Absolutely. Going further with engagement, we've launched some features such as path annotations. These are becoming more and more powerful over time, as they are an additional way for you to communicate with your users. You are able to put in speech bubbles or links to your other YouTube videos. Oftentimes, savvy users create very interesting video tours where they link from one video to another, or they even build games you can play by clicking on different annotations. It's interesting how you can create an extended cycle of viewership through annotations. Then, rather than just interacting with their own user base, they are also interacting with the rest of the YouTube community. What we've seen is that a lot of successful people cluster together. A lot of our top users have formed a community where they send video responses to each other, comment on each other's videos and subscribe to each other. So we definitely encourage people who are trying to get increased viewership to tag back. We don't want people spamming, randomly adding irrelevant videos as video responses, or comment spamming, and we definitely penalize videos that do these things; but when it is legitimate, posting video responses is a good way to network with other community members.
Think of it almost as a message that you would get back on a social networking feed or a Twitter feed. Just continue that dialogue with important members of the community. Oftentimes, if the original video does get traffic, then your video response may get additional traffic and help viewers discover you as a new source of quality videos. Eric Enge: You get value by building relationships. Matthew Liu: Yes, completely. Eric Enge: Should people strive to avoid "stop words" in their titles? Similarly, should you include the word video in your title or description, so that if somebody searches on "tech software video", for example, you have a better chance of coming up? Do those things make sense as well? Matthew Liu: Yes, they do, especially in the context of discovery from Google, because Google also indexes YouTube videos. Another thought that I forgot to mention: if your video was shot at a particular location or on a particular day, then you should also include that information in the video's description. Eric Enge: Another suggestion I've heard is to use adjectives such as happy or sad to pick up mood-based searches. Matthew Liu: What I can tell you is that YouTube search and Google search are a bit different at times. It's not true in all cases, but we have seen that some users tend to search in more generic terms. You'll see users searching for very specific pieces of content, such as "CBS video" or "NBA video", but you will also see users searching for terms such as "funny videos". What I would say is that video owners should target the very specific terms, and they should also potentially broaden out a little bit so that there are more generic terms in the description and the tags. Eric Enge: What's the best way to get a sense of the best keywords within the YouTube environment? Matthew Liu: Great question. We don't have anything to announce right now, but we are working on various keyword tools.
We have a couple of very basic keyword tools as part of Promoted Videos right now, which allow you to look up similar keywords. The Insight tool that Tracy talked about also helps you understand which keywords are already driving traffic to your video. We are working on a couple of other similar projects that will provide much more robust keyword suggestions in the near future. But in general, I would say use Insight and the keyword tools that are already available in Promoted Videos; those are probably your best bet in the short term. Eric Enge: I have also heard the suggestion that you go to the search box and start entering a query, and that the search suggestions you see there may be in order of volume, from largest to smallest? Matthew Liu: I can't comment specifically on that. Those are suggested queries that we think users might be searching for as they start typing certain letters. I will add a caution: publishers should avoid keyword stuffing. It's very easy to broaden the potential scope of your video by adding a couple of keywords, but it only takes one or two irrelevant keywords to trigger us to think that the video is trying to spam the system. Our penalties will outweigh the benefits you can get from keyword stuffing. Eric Enge: But you did say earlier that it's important to be comprehensive, which means that you should include all the keywords that are in fact relevant (without using too many keywords in total), correct? Matthew Liu: Yes, there is definitely a balance you have to find. It's actually more of an art than a science. Use keywords that are related, but don't type in every letter of the alphabet. Just come up with the most important relevant keywords and add those words to your description and tags. Eric Enge: It's got to be highly relevant, and something that people can search on to discover your video and then have a good chance of being happy when they get there.
At a minimum, they get relevant content, even if it is not exactly what they are looking for. Matthew Liu: Yes, absolutely. Eric Enge: Thanks a lot, Matt! Matthew Liu: Yes, thank you, Eric!
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com Thu, 14 May 2009 21:29:30 +0200 Published: May 11, 2009
John Mueller is currently a Webmaster Trends Analyst at Google Zurich. Prior to working at Google, he became well known for his active participation in Google Groups and a variety of SEO forums. Interview Transcript Eric Enge: Can you provide me with your definition of cloaking? John Mueller: The standard definition of cloaking is showing Googlebot something different from what you would show your users. So, in a worst-case situation, you would show Googlebot a nice family-friendly homepage, and when a user comes to visit that page, they would see something completely different. Eric Enge: Like porn or casino ads or something of that nature? John Mueller: Exactly. So the user searches for something and finds what he thinks is a good result, he clicks on it, and then there is nothing on that page even related to what he was searching for. Eric Enge: Right. So that's clearly an extreme form of cloaking. There are many different levels of cloaking, and I'd like to explore some of those. Some people, for example, may have a content management system that insists on appending session IDs or superfluous parameters to the URLs. The parameters may not be superfluous from the CMS's point of view, because it uses them to pull information from a database or something like that. And given the content management systems that they have, it's actually very difficult and very expensive to fix this problem at its core. So one solution would be to serve the same content to users and to Googlebot, but to modify the URL seen by Googlebot to remove the superfluous parameters and the session IDs. John Mueller: That's something that we've seen a lot of in the past. We now have a great new tool that can really help fix that problem without any redirects and without really changing much at all, and that's the rel="canonical" link element. You can place it in the head section of your pages and specify the canonical URL that you would like to have indexed.
So you could take away all the session ID parameters or anything else that you don't need, and just specify the one URL that you want to have indexed. Eric Enge: Right. And that's something that you announced with the other search engines just a few weeks ago, correct? John Mueller: Yes, it's fairly new. Not everyone has implemented it yet, but a lot of people are already using it to clean up this problem. When a search engine crawls a website and finds many duplicate versions of the same content under different URL parameters, such as session IDs, it can get confused. Using this link element helps to make things clearer and can help to resolve this problem. Eric Enge: So you basically implement the canonical tag on various pages and tell the search engines what the canonical URL is. If, for example, somebody has different sort orders for their products in an e-commerce catalogue (e.g. by price, brand, size, color, ...), you can basically point Googlebot back to the canonical version of the URL. It's supposed to behave much the same way a 301 redirect would, except that it does not actually take the user to the URL specified? Is that a fair summary? John Mueller: Yes. It's not a command that you give Googlebot; it's more like a hint that you give us. One thing we've also seen is that people try to use it, but they use it incorrectly. For instance, they specify their homepage as the canonical for the whole site. If we were to follow that as a 301 redirect, we might completely remove their website. So we have to take that information and determine whether it really is a canonical for the other URL, or whether the webmaster may be doing something incorrectly. Eric Enge: And of course one way you could do that is by making sure the content on the two pages is identical. John Mueller: Yes.
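The canonical link element goes in the page's head section. As a sketch of how a CMS might generate it, the Python below strips session and sort parameters from a URL and emits the resulting link element. The parameter names (sessionid, sid, sort) are hypothetical; replace them with whatever your CMS actually appends. Since search engines treat the element as a hint, the canonical URL should serve the same content as the duplicates that point to it, not, say, the homepage.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace with whatever your CMS actually appends.
NON_CANONICAL_PARAMS = {"sessionid", "sid", "sort"}

def canonical_url(url: str) -> str:
    """Drop session and sort parameters, keeping the others in their original order."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in NON_CANONICAL_PARAMS]
    # The fragment is dropped as well; it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_element(url: str) -> str:
    """Build the <link> element to place in the page's <head> section."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'

print(canonical_link_element("http://www.example.com/shoes?sort=price&sessionid=AB12&page=2"))
# <link rel="canonical" href="http://www.example.com/shoes?page=2">
```

Every sort order and session variant of the product list then declares the same canonical URL, while users keep whatever URL they were browsing.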
Eric Enge: So if you make a mistake and use the canonical tag to send everyone to the home page of your site, presumably the content will differ from the other pages. And, as I understand it, the gold standard solution is to fix the problem at its core and not have to rely on the canonical tag. John Mueller: If you can move to cookie-based session tracking, then that would really help. But we know it's not always easy to change to a system like that; there might be a lot of money involved. So at least with this system there is a fairly simple way to fix that problem. Eric Enge: Right. So it's the backup plan that should be used if you can't fix it at its core, or if it's just too expensive to fix it at its core? John Mueller: Exactly. Eric Enge: Yes, that makes sense. Now I imagine there are also people out there who served a different URL to Googlebot than to their users before the canonical tag existed. Is that problematic? John Mueller: I would suggest doing that for all new users who come to the site without cookies, instead of just for Googlebot. This way, if a user accesses an old URL that has a session ID, you can just redirect him to the proper canonical URL. That would treat users and search engines in the same way, and it would still help solve this problem. Sites that are currently showing prettier URLs to Googlebot should not panic, as long as their intent is genuine and it is properly implemented. But I'd advise against this for sites that are in the process of a redesign or sites that are being newly created. Using rel="canonical" is the current best practice for tackling this problem. Eric Enge: But if the system is relying on the session IDs, then they're there for a reason, right? John Mueller: Yes, but most CMSs usually resort to session IDs when they can't access a cookie. So if you see that a user doesn't have a cookie, you can redirect them away from the session ID.
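The cookie-based approach described here can be sketched as a request-handling decision: if a request arrives on a URL carrying a session ID but without a session cookie, redirect it to the clean URL, whether the visitor is Googlebot or a person. The name sessionid (used for both the URL parameter and the cookie) is hypothetical, and a real handler would also send a 301 status and try to set the session cookie on the response.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "sessionid"  # hypothetical name for both the URL parameter and the cookie

def session_redirect(url: str, cookies: dict) -> Optional[str]:
    """Return the URL to 301-redirect to, or None to serve the request as-is.

    The decision looks only at the URL and the cookies, never at the
    User-Agent, so crawlers and cookieless users are treated identically.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not any(name == SESSION_PARAM for name, _ in params):
        return None  # already a clean URL
    if SESSION_PARAM in cookies:
        return None  # the visitor has a session cookie; leave the request alone
    kept = [(name, value) for name, value in params if name != SESSION_PARAM]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))

print(session_redirect("http://www.example.com/item?id=7&sessionid=abc123", {}))
# http://www.example.com/item?id=7
```

Because the User-Agent never enters the decision, this avoids serving Googlebot a different URL than users, which is the point John Mueller makes next.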
And I think the important thing here is that you find a way to treat users and search engines the same. Eric Enge: Right. You could use JavaScript to append your various tracking parameters to the URL upon the click. So that, in principle, is treating users and Googlebot the same. John Mueller: Yes, but that doesn't really solve the problem on its own, because it would only happen within the site. When a search engine crawls a site, it doesn't execute the JavaScript, so the site would have to work both with and without JavaScript enabled. Eric Enge: Right. So users who don't have JavaScript would of course be handled in an identical fashion to the search engine robots, and users who do have JavaScript would be able to benefit from whatever the tracking parameters are meant to give them. John Mueller: Exactly. That's similar to using AJAX on a website. If you have a normal HTML website and you start adding AJAX components to it, a user with a limited browser, maybe on a mobile phone, or even a search engine crawler, would still be able to navigate your site using standard HTML. But someone who has JavaScript enabled would be able to use all those fancy AJAX elements, which would usually generate slightly different URLs, and I think that's completely normal. Eric Enge: Right. So, let's talk a bit about A/B or multivariate testing, which is something supported by Google's Website Optimizer product. It creates a scenario where users come to a page and some piece of code, usually implemented in JavaScript, runs and decides which version of the page to show them. And of course Googlebot will only see the one version; it won't see the alternate versions. John Mueller: Exactly. So, the key here is that intent matters, as is generally the case with Google. If the intent is really that the webmaster wants to test various versions of the same content, then that's no problem.
And if the intent is to show the user something completely different, then that would be borderline. You would have to look at that. Eric Enge: I mean, you can always take any technique that was created with good intentions and find ways to abuse it. So let's say somebody is testing four different versions of a key landing page on their site to see which performs best for them. Maybe they are changing the logos and moving elements around, they might be changing the messaging a bit to see if one tagline is more effective than another, or they may be changing the call to action. John Mueller: If you are doing that with good intent, to find the best solution for your users, and you are showing more or less the same content, then I wouldn't really worry about that. Eric Enge: Say you have a graphic of some sort, an image file on your site that might be a menu link or a logo. There are various techniques for showing the search engine robot, or any specific user agent, text instead of the graphic. What are your general thoughts in that area? John Mueller: Generally speaking, if you do that with good intent and you more or less match the content up, then it's fine. So, for example, you could have a menu where you use JavaScript and graphics to create a really nice user experience, with an alternate version in static HTML behind the graphic menu. If it matches up, that's fine. And if the home link has alt text, then that's fine too. But if you have a home link with alt text that says, "click here to see our great cleaning products available in these 22 cities," then that's kind of sneaky, and not something that we would like to see. Eric Enge: So there are various grades of this, correct? One level is where the text matches up a hundred percent with what is in the image.
There is also a notion of substantially similar, and you could actually define several more grades, such as somewhat similar, and then completely different. I think you just highlighted an example that's completely different. Identical is an easy case, and I think you already addressed that. What if something is substantially similar, but not word-for-word identical? John Mueller: I would say it depends on the case. If you are not trying to deceive the search engine crawler or the user, then it's generally okay, but in general I would be cautious as soon as the content is not identical. So if you have a link that goes to your homepage and it has a graphic of a house, then you wouldn't have to use "house" as the alt text. You could just say "go to homepage," or something like that, and it's fine. Eric Enge: So again it gets back to the notion of intent that you've already raised? John Mueller: Exactly. Eric Enge: And, of course, one flavor of this is sIFR, which stands for Scalable Inman Flash Replacement. sIFR uses text input to render what is shown in Flash, so it is guaranteed to be identical. John Mueller: Exactly. Where we start to see problems is when a website has a completely Flash-based interface with a lot of different pages hidden behind it, all on the same URL. Then it would be hard to include ten pages of HTML on a single page that match exactly what is written in the Flash file. So you have to find a solution for yourself there: how much really makes sense, and how much you might have to cut back and just leave the basics in HTML while keeping the bulk of your content in Flash. Eric Enge: Right. And of course when you get to that scale, you are past what you do with sIFR, which is really intended for putting anti-aliased fonts on your page and is a more limited technology. But I think once you get into the more complex situations, you can use SWFObject, correct? John Mueller: Yes, it would be something like that.
Eric Enge: That technology doesn't guarantee that the alternate version shown in text is identical to what is in Flash. John Mueller: Exactly. Eric Enge: So it is open to potential abuse, but I would imagine that the policy again gets back to what you actually do and what your intent is in doing it. John Mueller: Yes. And there are two other things that also play a role in that. The first factor is that we have started crawling and indexing Flash files. If you have a lot of content in your Flash file, we will try to at least get to that and include it in our search results. The second is that there are still a lot of devices out there that can't use Flash. So if you have a website that relies on Flash and you suddenly notice that there are a bunch of mobile users trying to use their iPod, iPhone or Android phone to access your website, then you would start seeing problems, because they wouldn't see the Flash content at all. And if the HTML content doesn't match up with what you are trying to get across to the user, they will simply leave the site. Eric Enge: One grade of this problem occurs when you implement something in Flash, but not with the intent of rendering the same thing that you could easily render in HTML. You are probably using it because you want to create a highly graphical experience. It is not always the case, of course, but certainly one of the things that's appealing about Flash is that you can create a really attractive visual experience. Say you have a man driving a fast car on the German autobahn; the Flash isn't going to narrate the course of the drive. But in your text rendering of what is in the Flash, you would want to describe what is happening. For example, "it's a nice day and a man gets into his expensive car and heads out onto the autobahn". So you are actually implementing text that isn't in the Flash, but the content essentially is. John Mueller: Yes, that's generally fine.
If the intent is okay and it matches up, so you can see that there is a car and a man driving on the autobahn, then that would be fine. Eric Enge: So again, it is about making sure that you are rendering pretty much the same information, so that there isn't anything confusing in the user experience if you flip from one mode to another, so to speak: Flash, JavaScript or AJAX enabled or disabled? John Mueller: Yes. Think about it from a user-experience point of view: if the user sees the HTML content in the search results and clicks on that page, does that match up with what he would be expecting? Eric Enge: So what about serving different content based on IP address to address things like language and national or even regional issues? To take a regional example, the products that your customer base in Florida buys could be quite different from the products your customer base in Minnesota buys. So you want to serve the Florida user one set of offerings and the Minnesota user a different set of offerings. John Mueller: That is something that I see a lot as a European user, because in Switzerland we have four different official languages, and as soon as you start using a website, it automatically tries to pick the language that it thinks is right. It is wrong most of the time, and it is something that really bothers me a lot. So I guess I might be a little bit emotional about that. One thing I have noticed is that you have to differentiate based on whether or not your content is really limited to a specific language or geographic location. For example, say you have a casino website that you can show to users in Germany and in France, but not to users in the US. That's kind of an extreme situation, but in a situation like that you would still have to treat Googlebot like any other user coming from that same location.
So if we crawl your website from the US, and your website recognizes us as an American visitor, then you should show us exactly the content that an American visitor would see. And it would be a little bit problematic if the website blocked all American users for legal reasons. So what you would do then is make a public website that everyone can access and then just link to your private website, which is limited to users in a specific region. For example, you would have a general homepage that explains what your website does, gives some information and provides something that search engines can crawl and index. Then, when users are in the right location, they can click through to your actual content. Eric Enge: So are you suggesting that if a user accesses that website from Germany, they come to some initial page and then they have to click further to get through to the page they are actually looking for? John Mueller: Exactly. Eric Enge: So it is not acceptable to just serve them that content directly? John Mueller: Yes, that might cause problems when Googlebot visits. The other problem there is that IP location and language detection are often incorrect. Even at Google, we run into situations where we think an IP address is from Germany, so we show German content, but in reality the user may be based in France, and it is really hard to get that right. So if you try to do that automatically for the user, you are almost guaranteed to do something wrong at some point. That leads to the other version of this problem, where users in the wrong location can still access your website. In a case like that, we would be able to crawl and index the website normally, but I recommend that you include elements on your website that help the user find the version of the website that they really want to use.
The important thing there is that you use different URLs for the different locations or languages, so that we are able to crawl all of the specific content. So when I go to Amazon.com from Germany, for example, I see a little banner on top that says "Hey, don't you want to go to Amazon Germany? We are much closer; we have free shipping". That way, the search engine is still able to see all the content, but users still find their way to the right website. Eric Enge: So this of course is a little bit different from the scenario where you implement a website at casino.co.de, or .co.uk, or .com, or .co.us, where you really are creating versions that are meant to be indexed in the local versions of the search engines? John Mueller: Exactly, yes. Eric Enge: So that's a different approach that someone could use if they wanted to. John Mueller: I think the key point is whether or not users are allowed to access the wrong version of the website, or if there is a legal reason why it is blocked completely. Eric Enge: So if the legal reason isn't there, and it is just that you want to pick the default language that a German user sees, and you are willing to accept the fact that you are right about 90% of the time and wrong about 10% of the time, users can click the French link if they are really from France? John Mueller: Yes. I think the important part, especially with languages, is that you really provide separate URLs so that Google can crawl all language versions. This way you also don't have to do language detection on your site. The user will search for something using a German- or French-language Google, and we will show the German- or French-language pages appropriately. Eric Enge: So they end up in the right place through that mechanism? John Mueller: Yes. And you don't even have to do anything on your side. Maybe, if you have a homepage, you could show a little drop-down and let the user choose.
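Following the Amazon example, a site can keep one URL per language and, instead of redirecting automatically, merely suggest another version. The Python sketch below assumes a hypothetical URL scheme with language prefixes (/de/, /fr/, /it/, /en/) and reads the browser's Accept-Language header; it returns the URL to offer in a banner, or None. Because it never redirects, Googlebot and every visitor can still reach every language version.

```python
from typing import Optional

SITE_LANGUAGES = {"de", "fr", "it", "en"}  # hypothetical: one URL prefix per language version

def preferred_language(accept_language: str) -> Optional[str]:
    """Return the supported language with the highest q-value, if any."""
    best, best_q = None, 0.0
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        if not tag:
            continue
        q = 1.0  # an entry without a q parameter defaults to q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        primary = tag.strip().split("-")[0].lower()  # "fr-CH" -> "fr"
        if primary in SITE_LANGUAGES and q > best_q:
            best, best_q = primary, q
    return best

def language_banner(path: str, accept_language: str) -> Optional[str]:
    """Return the URL to suggest in a banner, or None if no banner is needed.

    This only suggests; the page at `path` is always served as requested.
    """
    current, _, rest = path.lstrip("/").partition("/")
    suggested = preferred_language(accept_language)
    if current not in SITE_LANGUAGES or suggested is None or suggested == current:
        return None
    return f"/{suggested}/{rest}"

print(language_banner("/de/products", "fr-CH,fr;q=0.9,de;q=0.8"))  # /fr/products
```

A request without an Accept-Language header gets no banner at all, and the German page is served either way, so a wrong guess costs the visitor one click rather than locking them out.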
Or you could have it pre-populated with the detected location by default, but you are still giving the user a choice between the different language versions. You give the search engine a choice, and we will try to send users directly to the right version. Eric Enge: What are your thoughts on serving up different content based on cookies, such as explicit or inferred user preferences? John Mueller: I think the general idea is also to make sure that you are not trying to do anything deceptive with that. Say, for example, you have a website with general information. If a normal unregistered user comes there and you show that same general information to Googlebot, that is fine, even if a logged-in user finds more information when accessing the same URL. So if you make sure that it matches up with what a user would see, then that's generally not a problem. Eric Enge: And since we are talking about cookies, presumably we are talking about a user who has been at the site before. So if they come back, they may expect a somewhat enhanced experience based on their previous interactions. John Mueller: Exactly. So if you have it set up in a way that logged-in users or users who have preferences get to see more detailed content, then that's fine in general. But if you have it set up in a way that logged-in users see less content or completely different content, then that would be problematic. Eric Enge: Right. Can you give us an overview of First Click Free and what its purpose is? John Mueller: We started First Click Free for Google News so that publishers could provide a way to bring premium content to their users. For example, if you have a subscription-based model for your website, you could still include those articles in the Google News search results, and a user who goes to those articles would still be able to see them and read them normally.
But as soon as they try to access more of your website, they would see a registration banner, for example. Now, we have extended that to all websites, because we know not everyone can be accepted into Google News; it is kind of a special community. So if you have some kind of subscription or premium content, you can show that to Googlebot and to users who come in through search results. But as soon as something else is accessed on that site, you are free to show a registration banner so that users who are really interested in this content have a way to sign up and actually see it. Eric Enge: So the idea here is that you have subscription-based content, and Google wants to make its users aware that the content exists. John Mueller: Exactly. Eric Enge: So the user goes to Google, they see the article, they decide to go read it, and the site implementing First Click Free checks the referrer and makes sure it is from Google, in which case it shows the full article, including all pages of a multi-page article, not just the first page? John Mueller: Yes. Eric Enge: And then the user potentially gets the registration banner or a subscription box when they go on to a different article? John Mueller: Exactly. Eric Enge: Now, can a user just go back to Google, search on something, and try to find that same article somewhere else in the search results? John Mueller: Theoretically, yes. That would be possible, but we found that most users don't do that. It is more work that way, and if it is content they are really interested in, they will figure out a way to access it normally. When you like the content, you might sign up for a subscription and say "Okay, this is a good website. I want to come back and read more of this content. It is fine if I just pay a small amount for it." Eric Enge: I would imagine that for most subscription-based sites, it is an effective program to expose their content and increase their subscriptions. John Mueller: Yes, absolutely.
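The referrer check Eric describes could look roughly like the Python sketch below. The hostname pattern is a simplified assumption rather than an official list of Google domains, the function names are hypothetical, and verifying that a request really comes from Googlebot (which should also see the full article) is left out. The Referer header is supplied by the browser, so this is a convenience gate for First Click Free, not access control.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Simplified pattern for Google hostnames such as www.google.com, google.de,
# www.google.co.uk or www.google.com.au. An assumption, not an official list.
GOOGLE_HOST = re.compile(r"(?:^|\.)google\.(?:com|[a-z]{2}|co\.[a-z]{2}|com\.[a-z]{2})$")


def is_google_referral(referer: Optional[str]) -> bool:
    """True if the Referer header points at a Google hostname."""
    if not referer:
        return False
    host = urlparse(referer).hostname or ""  # .hostname is already lowercased
    return bool(GOOGLE_HOST.search(host))


def show_full_article(referer: Optional[str], is_subscriber: bool) -> bool:
    """Decide between the full (multi-page) article and a registration banner.

    Subscribers always get the full text; other visitors get it when they
    arrive from a Google result (the "first click"). Later page views without
    a Google referrer fall through to the banner.
    """
    return is_subscriber or is_google_referral(referer)
```

Anchoring the pattern on a dot or the start of the hostname keeps lookalike hosts such as `evilgoogle.com` or `google.example.com` from passing.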
Eric Enge: Exposure is really good. To do this, you basically bypass the login screen when Googlebot comes to the site and give it access to all the content that you want indexed. John Mueller: Exactly, yes. I would expect that you could probably do the same for other search engines. You might want to check with them, but I think that is generally acceptable if the user sees the same content as the search engine crawler would see. One thing that I have noticed when I talk to people about this is that they are kind of unsure how they would actually implement it and whether it would really make a difference in their subscription numbers. It is generally fine to run a test: take a thousand articles and make them available for First Click Free, available for Googlebot to crawl, and available for users to click on. You can leave the rest of your articles blocked completely from Googlebot and from users. Feel free to just run a test and see if it makes a difference or not. If you notice it is helping your subscriptions after a month or so, then you can consider adding more and more content to your First Click Free content. Eric Enge: Right. You can take it in stages. Are there other questions on these topics that you hear from people at conferences or out on the boards? John Mueller: Another thing about cloaking is that we sometimes run into situations where a website is accidentally cloaking to Googlebot. That happens, for example, with some websites that throw an error when they see a Googlebot user agent. It is something that can happen to Microsoft IIS websites, for example, and that would technically also be cloaking. But in a case like that, you are really shooting yourself in the foot, because Googlebot keeps seeing all these errors and it can't index your content. One thing that you can do to see if that is happening is to access your website without JavaScript, with the Googlebot user agent, and see what happens there.
If you can still access your website, then that's generally okay. Another problem that sometimes comes up with language detection is that a website will use the same URLs for all languages and just change the content based on user or browser preferences. The problem here is that Googlebot will only find one language, and we will just crawl the whole website in that one language. So, for example, we have seen cases where Googlebot was accidentally recognized as a German-based user, and we re-crawled the whole website in German, and suddenly the site's search results were only showing up for German users. Eric Enge: So people in the UK couldn't see the UK-English version of the site, because Googlebot wasn't aware the content was there? John Mueller: Users in the UK would be able to see that content, but since Googlebot was recognized as a German user, it was seeing the content in German only. In this case, the old pages would be re-indexed in German, so if someone was searching for an English term, they wouldn't even find that site anymore. The lesson here is to really make sure you have separate URLs for your content in different languages and locations. Eric Enge: Right, for purposes of this example, we are assuming that the content is identical but translated. John Mueller: Exactly. Eric Enge: And you want to have separate pages for the different language versions of the content. John Mueller: Exactly. Eric Enge: Excellent, thanks, John! John Mueller: Excellent, thank you!
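Mueller's check for accidental cloaking, requesting your pages with the Googlebot user agent, can be scripted. The Python sketch below uses only the standard library: it fetches a URL once with Googlebot's user-agent string and once with a browser-like one (a hypothetical value), and flags the case where only the Googlebot request gets an error status. Like any plain fetch, it runs no JavaScript. It also cannot reproduce Googlebot's IP addresses, so IP-based differences will not show up, and the function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Googlebot's documented desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Hypothetical browser-like user agent used for comparison.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

Fetcher = Callable[[str, str], Tuple[int, bytes]]


def fetch(url: str, user_agent: str) -> Tuple[int, bytes]:
    """Fetch a URL with the given User-Agent and return (status, body).

    HTTP error statuses are returned rather than raised; network failures
    (URLError) still propagate.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        return error.code, error.read()


def googlebot_ua_problem(url: str, fetcher: Fetcher = fetch) -> bool:
    """True if the Googlebot user agent gets an error status a browser does not."""
    bot_status, _ = fetcher(url, GOOGLEBOT_UA)
    browser_status, _ = fetcher(url, BROWSER_UA)
    return bot_status >= 400 and browser_status < 400
```

The fetcher is passed in as a parameter so the comparison can be tried against a stub; comparing the two response bodies as well would catch softer differences than an error page.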
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US cities and towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com Mon, 04 May 2009 18:57:23 +0200 Published: May 4, 2009
Sarah serves as COO of SEOmoz.org, Inc. and is also a law and technology blawger (don't be as dense as I was: "blogger" + "law") extraordinaire. Prior to her tenure at SEOmoz, Sarah worked as a litigator for a small firm in Washington State, where she managed diverse and complex cases. Sarah attended Simon Fraser University in Vancouver, Canada; Koç University in Istanbul, Turkey; University of Washington School of Law in Seattle, Washington; and East China University School of Law and Politics in Shanghai, China. Interview Transcript Eric Enge: Let's start by talking a little bit about performance-based SEO agreements and how both sides of the deal should be thinking about them. Sarah Bird: A performance-based SEO agreement can work out really well if the parties have a lot of trust and openness, and if there is a good contract in place. But before you get a good contract, there needs to be a lot of good communication. You've got to talk about all the details and all the expectations. Obviously it's attractive for the merchant, because they don't want to pay for something if they are not going to see results. And they feel like they'll get better results if the person who is doing the SEO work has an incentive to perform well. So, from a merchant's perspective, they are only paying for what they actually get. From the SEO side, there is a feeling that if you have a product or business that you really believe in, then you can participate in the upside driven by that. If you know you can do a good job and take them to a whole new level, reaching their customers and selling their product, go ahead and do it. The potential reward can be really, really tempting and really powerful. So there are some obvious reasons why these kinds of contracts seem like a great idea, and in fact they can be. I am a pretty conservative person myself, so sometimes I talk about performance-based agreements with a negative slant, but I do believe they can work.
It's just that there are so many things you've got to do first. Eric Enge: We have actually done very well with them when we've done them. As an SEO consultant, the way I approach it is to determine what level we'll be working at and how much we'll be spending. For example, if we are working at a $10K-per-month level, we might take half of what we are spending and keep it as a fixed retainer fee, and then take the other half and put it into the performance bucket. And if we are successful, we'll look to double our money on what the base fee would have been. So, if we take $30K in retainer and put $30K at risk based on performance, I want an opportunity to make $60K on that risk piece, so we make a total of $90K. That's where I think it really creates the big gain on the other side. Sarah Bird: Exactly. Clearly it can work. It sounds like what you've been doing well is having a lot of those difficult conversations upfront about what both your expectations and theirs are. Another important part of communicating ahead of time is talking about when it will end: at what point do you no longer get commissions for the work you do? I think a lot of people leave that part out of the conversation, and then it becomes difficult. Many months will go by, and the merchant will not want to keep paying commission forever to the people they have hired. If they are having success, they will want to keep some of that success for themselves. On the other side, the SEO is thinking, "Hey, great, I am just really starting to kick it into gear and just starting to reap the rewards that I knew I could get with all the time and effort I have invested." That is an important conversation to have. I am curious, if you are comfortable talking about your strategies, how long do you typically run your performance-based contracts? Eric Enge: We look for something like a 6-month time interval. One of the things that's very complicated about SEO is timing.
You never know when you are going to see the return. Sometimes you can get returns really quickly when all the chips fall the right way. But other times it takes six months before you see significant changes in traffic. It really depends on where the site is currently, what the smartest strategy for it is, and what it takes to execute that strategy so search engines will pick up on it. So it gets complicated, but six months is usually a pretty safe time interval. Sarah Bird: I think that's a nice time interval. Like you said, you are sure to see some results by that point. And the merchant is probably pretty comfortable with that time period, because it doesn't seem like a lifetime to them. I think that's well within the typical range of about four to ten months of commitment on any kind of performance-based agreement. Another thing I recommend is being very clear upfront about who owns the intellectual property that is being created. That can even include your expertise on what should go in the title tags and what articles should be included. Sometimes you are even writing the skeleton of the article, or actually fully creating the content. And it needs to be clear who owns the domain. Those are important questions. Sometimes SEOs even go in and set up the whole domain. They set up the analytics programs, they do everything, and then two months later the merchant tries to boot them out. The SEO has put in all this hard work and never gotten the chance to be rewarded for it. I think that's an important discussion to have ahead of time, for clarity. Eric Enge: My main concern from that perspective is making sure the contract doesn't imply any restrictions on our SEO techniques and methods. If we use a novel new technique to help the client out, we certainly would agree not to reveal any of their specifics. But as a concept, that piece of SEO expertise we have to own, because that's what we do for a living.
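Eric's retainer split can be worked through in a few lines of Python for illustration. The six-month duration is inferred from his $30K figure together with the 6-month interval he mentions; the 50% retainer share and the 2x upside on the at-risk portion come from his example, and the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetainerSplit:
    retainer: float         # fixed fee, paid regardless of results
    at_risk: float          # fee moved into the performance bucket
    max_performance: float  # performance payout if the targets are hit
    max_total: float        # retainer plus the maximum performance payout


def split_engagement(monthly_level: float, months: int,
                     retainer_share: float = 0.5,
                     upside_multiple: float = 2.0) -> RetainerSplit:
    """Split a fee-based engagement into fixed and performance-based parts."""
    base_fee = monthly_level * months             # what a pure retainer would cost
    retainer = base_fee * retainer_share          # kept as a fixed fee
    at_risk = base_fee - retainer                 # put into the performance bucket
    max_performance = at_risk * upside_multiple   # "double our money" on the risk piece
    return RetainerSplit(retainer, at_risk, max_performance, retainer + max_performance)
```

With Eric's numbers, `split_engagement(10_000, 6)` gives a $30K retainer, $30K at risk, a $60K maximum performance payout, and a $90K maximum total, against $60K for a pure retainer.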
Sarah Bird: Yes, exactly. And I think the more you can set forth what you are counting as your SEO expertise ahead of time in conversation, and hopefully in a contract, the better off both parties will be. If expectations are set, everyone is going to be happier in the end. And if you do have to unwind early, or even if you are just unwinding on time, there are no surprises about what you are taking with you. People who are just beginning to do SEO on this performance-based level tend to overlook this. It is really critical to have that conversation. I really like that you guys are doing a base monthly retainer, because then there is no complete loss to you for the investment and the work you do. I think a lot of SEOs have had the experience where they give advice and have great ideas, but they are not in charge of implementing them for whatever reason. You can give all the advice in the world to a client, but if they don't implement it, or if they implement it the wrong way, then you may not get any good results. It's not something the SEO always has control over, so having a base monthly retainer is going to at least cover a certain amount of your cost and time. It's a great way to mitigate the risk of loss compared to a solely performance-based SEO agreement. Eric Enge: If you make a set of wonderful SEO recommendations, and it takes them four months to implement them, where does that leave you? That's certainly one reason why we always take some of the money in hard cash rather than put it all at risk. The other way to handle it is to provide the web development resources in the contract, then specify that you have the right to make the changes you recommend, within reason, of course. This way you start to have more control, which matches up with the risk in a very nice way. Sarah Bird: I think that is exactly right.
And I think any SEO company would ideally like to have the manpower to also complete the implementation. Whether or not that's an option, or the degree to which it's an option, will really depend on the clients and what kind of business structure they have. So yes, I think that works really well, especially for smaller businesses or for businesses that are just starting out. They are much more likely to want an expert to take over all of their web development. But if you are working with some huge site that already has tens of thousands of pages, there is no way you could do the web development for a site that size. I also want to point out that you have to have a lot of trust in whoever you are working with, especially if you are new to the industry. But trust is not a replacement for a contract, and people shouldn't be shy about requesting a contract. If you are asking for a contract, it doesn't mean that you don't trust them; it means you want to make sure everyone is on the same page and stays on the same page for the time period that you'll be working together. Because you are working with a business, there could always be people coming in and out of roles. New people may not have the same institutional knowledge because they weren't there when the agreement started. I've seen performance-based agreements that begin well but then go downhill when someone new comes in. If you don't have the agreement written down for everyone to rely on, you can get into trouble even when you wouldn't anticipate it at the beginning of the relationship. Eric Enge: I totally agree. When you are trying to build rich information resources to help establish the website as a leader or an expert in its field, you may go off and create lots of different valuable content.
When you do that, especially on large websites, you end up referring to other things on the web, including information that others have published on pages that carry a copyright notice. There is a concept called "fair use" that applies in situations like this. Can you outline what it is, how it works, and how people should interpret it so they don't take risks with the way their content is produced? Sarah Bird: I think this is a great topic, and it's a legally hairy one, because there will never be a crisp definition of what fair use is. On one side of the coin, there is this feeling that if you created the content, you should be able to own it. I think everyone agrees that we want people who make good content to have a good livelihood. But I also think everyone agrees that this product, information, is good for society: it is dialogue, ideas, and speech, and it is how human beings advance. There is a natural tension between these core beliefs. We say people should be compensated and deserve to preserve the value they create, but we also want information to be spread around freely and openly. Fair use is how we resolve this problem, and it basically says you can use other people's content as long as the use is fair. You can't steal that content, but you can use a certain amount of it in certain contexts, and we call that fair and legal. It's okay to do it a little bit, because we want to share the idea; we just don't want to rob the original content owner. So there is no clear definition of fair use. Instead, it is evaluated in context using several factors: the purpose and character of the use, the nature of the content, the amount of the work used in relation to the whole, and the impact on the potential market. The first factor is the purpose and character of the use.
If you are using someone else's content to make money off it, that looks a little worse and feels a little less fair than if you are a professor who is just trying to discuss an idea in a classroom. That's a noncommercial use of ideas, but the situation isn't always that clear, and the decisions are generally not made that easily. You can talk about it and put it into practice, but somewhere along the line it can always get blurry. The next factor is the nature of the content. If it is scientific, biographical, historical, or factual, then the public has a greater interest in accessing the information. And because the goal of fair use is to encourage the progress of knowledge, the more academic or scientific the content is, the more likely a court is to say that the use is fair. Eric Enge: Isn't there also the notion that things that are purely factual and available from multiple sources are fair game? Information that can be used very broadly, like the fact that Seattle is in Washington State. Sarah Bird: Exactly. Eric Enge: But you still have the notion that if somebody renders the data in a specific way, then their specific rendition of the data can still be protected. Sarah Bird: That's where it starts to get blurry. Everyone can agree that the fact that Seattle is in Washington should be fair use, because we want to spread knowledge. But on the other side of that tension, you always have to be aware of the content factor. Whether it's a song, a graphic an artist has created, scenes from a movie, or a product review, these things are not as factually based and not based in science or biography. They contain opinion and creativity that are more highly protected. One of the questions you should ask yourself, among many, is whether it is just a fact or somebody's creative work. If it's a fact, you are probably more in the clear than if it's somebody's creative work.
That can become a little blurry sometimes, but overall I think you hit the nail on the head with your Seattle example. No one owns that, so you will be fine using it. Another factor is the amount of the work used in relation to the whole. For example, when studios launch a new movie, they want you to come see it. In the advertisements they often take extracts from critics who've written reviews, such as "two thumbs up" or "fantastic." But you'll notice they only take a couple of words. They don't take a whole review, only a few words, and that is considered fair use because they are not taking the critic's whole work, just bits and pieces. If you are online reading a great Search Engine Journal piece on the Google algorithm, we can take a sentence here and there, talk about it on our blog, and then expand upon it with our own ideas. So we are interacting with the ideas in Search Engine Journal. Now, we can't take whole paragraphs or the whole article and republish it on our blog, because that would be too much and that would be unfair. But we can have a dialogue, and we can extract little blurbs from it. How much is too much will depend on how large the article is overall, and how important the piece of information you took is. Eric Enge: The related question is the role of citation. How does referencing the original work affect the equation? Sarah Bird: Technically, fair use law does not require you to provide any citation. For example, when you are making a mash-up of various things you've found on the web, you don't need to cite every single one of them. You can create a new product out of bits and pieces of their products with no problem. But if you want to be a participant in a dialogue and help create the impression that you are using this content to engage in that dialogue and to spread ideas, that's going to help your fair-use case, in a sense.
You are basically saying, "I am participating in the communication of ideas. I am not trying to steal someone else's glory or content for myself, and I am not trying to ignore them." As a matter of etiquette, and as a matter of participating in a community, you can give them credit somehow. But citing or giving attribution is not required by law. A judge isn't going to care whether you linked to them or just cited their name. They are not going to ask you that, but as a good net citizen it's something you should consider. You can just pat the other person on the back and tell them you really appreciate the article they wrote. As SEOs, I think we're more aware that it's vital for the other person to get that link back. But I would be misleading people if I told them the law says you have to link back when it doesn't actually say that. Eric Enge: The law doesn't care, basically. Sarah Bird: No, the law doesn't care about that, but it is certainly one of the elements of being a good net citizen. Eric Enge: Right. So you are putting yourself in a position where they are less likely to get angry with you. Sarah Bird: Yes, absolutely. It can help make it clear that you don't have bad intentions, and that you really are just trying to engage with an idea. There is one final factor we should discuss, which is in some ways the most important because it sums up the others. That last factor is whether or not the content you are borrowing from someone else has an impact on the potential market for that content. For example, I can't go and republish someone's book in its entirety on my website, because that would mean I've just stolen part of the market from the original author. It has an impact on his market, because people can get it from me, so they're not going to buy it from him. I think that sort of sums up what we are balancing across all the factors.
It's important to think about whether or not taking someone's song, graphic, or part of an article impacts their market and their ability to get value out of the product they created. Eric Enge: Sometimes it goes further and people actually steal your content. Sarah Bird: Yes, it does happen. It happens all the time: someone takes your content, maybe your whole blog post, and republishes it on their site, and that's obviously not fair use. But there are several things you can do, especially if you are in the US or in some jurisdiction that's covered by the DMCA. The first, and I think best, suggestion I make is to at least try to contact the website owner and say, "Hey, you are stealing my content; take it down." Start by having a professional conversation with them. Some people roll their eyes when I say that, because there are websites who know they are stealing content. They don't really care that they are stealing content, and they are not going to care who contacts them, right? But other times it is a genuine misunderstanding of how the online system works. There are people out there who think that they are not stealing as long as they link back, but that's not true. Those are people who think giving attribution is enough, but it's not the same thing. So the first thing, I think, is to try to contact the website owner. Usually they will have contact information on the website, but if it's just a scraper site, it's unlikely they are going to have any contact information. You can also try to get their WHOIS information, and hopefully that will be accurate. If it is accurate and they are in the US, they are more likely to respond to an email. And if you can get an email address for them, that's already a great sign that the request will be effective. The important thing about the DMCA is that it does give you a quick way to get content that someone's stolen from you out of the search engines.
The whole idea is to protect your content; obviously you don't want people to go to the stolen content, you want them to go to your content. So maybe you can get the person who actually runs the website to take it down, but if that doesn't work, you can get Google to take it out of their search results so that other people can't find it. That's the idea behind a DMCA request. If I am a copyright owner and I have a problem with someone stealing my content, I can send this letter to Google and let them know. If you do that, Google contacts that website to let them know that they are going to take the content out of their index. So if you are doing work for a client and they get one of these messages from Google by mistake, you can actually respond to Google and tell them that you got this letter saying you were stealing content, but it's not true. Then Google, which has to be the middleman, will put it back up. It's a really quick way for copyright owners to address this online stealing of content, but it's not a full trial or anything; it's an incentive for search engines like Google to take content down quickly. And if you've been accused of stealing content, they'll also give you a really quick and easy way to get your site back up. Again, without a trial and without a jury, you just have to contact Google, tell them they have been mistaken, and swear it is your content. Now if there is still a disagreement, the content owner is then responsible for filing a lawsuit, because you've got to protect your rights in court. So the DMCA is the first step, and it works for most people, but you may have to file a lawsuit if the other website claims it is their content. Eric Enge: Any experience with how quickly the DMCA requests are responded to by Google? Sarah Bird: I've had good luck with those, actually.
I think they have fourteen days by statute to respond to DMCA takedown notices, and I had great luck getting within that window. We are talking a matter of days, but I have heard other people say otherwise. I don't know what makes the difference, and I am just guessing, but maybe they didn't send the takedown notice to the right place, or they didn't sign it. There are a few things that have to be done for the process to work smoothly. You've got to sign the request, you've got to swear that you are the owner, and you have to be acting in good faith. So it could be that people who are not having success are missing a step. But I am not really sure. Eric Enge: Right. Another thing that we usually recommend as an interim step is to contact their hosting company if you've already tried and failed to contact the site owner. Sarah Bird: Yes, exactly. Hosting companies are also required to respect DMCA requests. You can follow the same kind of process, asking them to take it down, as we just discussed with regard to Google. So yes, I think that's another great option. Eric Enge: And if the hosting company doesn't respond, then they have some liability in this situation? Sarah Bird: That's exactly right. That's how the law gives the search engines and hosting providers an incentive to act quickly. If they act within a certain period of time, then they can't be held liable for any of the possible infringements taking place. So it's a great incentive. It's not a perfect system, but I think it's a pretty good one. Eric Enge: Well, if you work the system effectively, then you can protect your rights reasonably well. Sarah Bird: Definitely. And I would say there are only two caveats. One is if you are not from the US but you are trying to use the DMCA to take down a site or get a takedown revoked, you should probably think twice about doing that.
Foreign companies may not otherwise be subject to US law, but if you use the DMCA process, you are agreeing to be subject to the US jurisdiction that covers our copyright laws. It's a technicality, but foreign companies and people may not realize that they are agreeing to US jurisdiction if they use this process. It's not just an administrative thing; it's an actual legal process. The second thing I wanted to highlight, which probably goes without saying, is to definitely make sure it really is your content when you send a takedown notice. Don't ever use the DMCA process abusively, because there are really severe financial penalties for abusing the takedown process. So if you send one because you are just trying to get a competitor's site taken out or whatever, you can get in some serious trouble. Make sure that you are not just guessing that there is some copyright infringement; make sure you are acting in good faith and that it's your content. Eric Enge: Right. The penalties are actually pretty stiff. Sarah Bird: Yes, they really are. Eric Enge: Another interesting thing that happened in the industry recently is that the Federal Trade Commission made some changes in the rules about self-advertising. Sarah Bird: Yes. It is debatable whether or not you would actually call them changes, but they are being perceived as changes within the SEO and online marketing world. Rules about stealth marketing techniques and about substantiation have been in place for print and television advertisers for a long time. You have to be able to substantiate any claims you make about your products; you can't just say "Ours is better than Jesse's," or "This will make your kids pay better attention in school." You can't just make those kinds of claims; you have to be able to substantiate them. For a long time it wasn't clear whether or not those same standards of disclosure and substantiation applied to the online world as well.
So finally, the FTC has come up with some guidelines that say they do indeed apply to the online world. This should remind people who make a living on the web to pay attention to what's okay and what's not okay in advertising. Let's say you are an affiliate marketer and you are doing some advertising campaign work for some sort of vitamin company. And say they tell you to put in the advertisement that the vitamin will make people lose 30 pounds. You can't just put that in there without being sure it's true, because you have to take some ownership and responsibility for the claims you are making. You actually have to go to the manufacturers and merchants and ask for proof of these claims about the product. And if they can't give you any proof, then you shouldn't run that in the ad, because you will be on the hook for those claims. Whether or not the claim is true is not the issue; the FTC is more concerned about whether or not you did your homework and got proof. Advertisers have actually made claims about products, and then the FTC has come in and said, "Hey, where is your proof for these claims?" Then the advertiser would say, "We don't have any proof right now, but we'll get you some." And then the advertiser does some research, and it turns out their claims were in fact correct. That is not good enough for the FTC; you can't just make claims and hope they prove to be true once you finally get around to substantiating them. You need the substantiation ahead of time. I think that's an important element. You can't just go with your gut feeling; you have to actually substantiate the claim in advance of making it. And if you are the marketing agency, online or not, you have to make the effort to see that proof before putting the advertisements out on the market. Eric Enge: Right. So, let's ask a question about a specific scenario.
Say you are an agency and you go back to your client and ask them if they can substantiate that 74% of people in the blind taste test picked their cola product over the other cola product. And then they come back to you and send you a nice email that says, "Yes, we did a blind taste test." But they choose to omit the fact that the blind taste test was done directly outside their headquarters building. Still, they ran this test, and you as the agency received their email saying that they had in fact run the test. If you have this kind of documentation, have you done your due diligence at that point? Sarah Bird: Well, you are very close at that point. A confirmation that just says, "Yes, we did substantiate it," is not quite enough. You have to ask them to send you the documentation next. They should have that substantiation on hand; it shouldn't be news to them that they actually have to prepare some kind of document showing that they substantiated their claim. If they tell you that it's not really something they are prepared to show anyone, that should be a red flag to you. Verbal confirmation that a claim is substantiated is not enough; you want to see some sort of evidence that they did it. Eric Enge: Sure. There are certainly a lot of businesses out there with tiny internal staffs that have grown by being very aggressive about their marketing. But if your visibility goes up, your exposure to these kinds of issues goes up. Sarah Bird: Absolutely. Eric Enge: For example, you have all these social media networks that tend to grow very quickly. I don't know if they do that much advertising, but if they did, and they made claims, they may not have had time to do any substantiating. Sarah Bird: Right, and that's just risky according to what the law and the guidelines say. Now of course every business owner has to decide his or her own risk tolerance level.
Some people are going to say that the FTC checks so few companies' claims because they don't have the resources to check everyone, so chances are pretty good that they won't check me. That is true in a way. The FTC is only one agency and they've got a lot of stuff to do, so they are not going to check everyone. So some businesses are going to be willing to take that risk. I think the risk tolerance level is highly correlated with how big a business it is and the kind of presence it has with consumers. Big businesses just need to be more attuned to this risk. There are all kinds of factors to consider in deciding what level of risk you are okay with. To summarize, the FTC definitely made it clear that online advertisers are also held to substantiation rules. Another thing they made clear is that online advertisers are also covered by rules about so-called stealth marketing. This basically means that if it's not obvious that the advertiser is being paid to do this advertisement, there needs to be a disclaimer saying he or she is in fact being financially compensated. Take Tiger Woods advertising golf balls: we all know he is being paid to do that, so they don't have to put a little disclaimer up there saying "Tiger Woods received money to do this". However, there are more subtle cases. If you are in a chat room, or you leave a comment on a blog about a product and link to it, it is not necessarily obvious that you have a financial relationship with the merchant, and that relationship would require disclosure. It's considered to be material to the consumer's decision, because they never really know how much to trust this person and what they are saying about the product. The consumer's perception is going to change if they realize that the person is being paid by the company that makes that product.
So if you are an affiliate for someone and you write a nice post talking about how great a product it is, you should disclose somewhere on your site that you make money every time someone buys the product you are talking about. I think that has people really scared. In the online marketing world right now, there are people who think that's really not a fair law. I personally think it may make marketing much more difficult, and I feel like it hampers people's creativity, but it's probably a good thing for consumers to know if someone is being paid to talk about a product. So it's uncomfortable, but it's probably good for e-business overall. It will increase trust in the marketplace, which I think is always a good thing. Eric Enge: Right. And when you look at that from Google's perspective and their effort to address paid links, what's being said here is that this is not only Google's position, but the FTC's position as well. Sarah Bird: That's exactly right. And you'll notice that that's why Google had to start adding the words "Sponsored Links" to its search results pages, based on that same idea. Google can't just imply that these ads happened to get to the top of the page because they were the best matches to your search, because people actually pay Google to put them there. Google is kind of ahead of the game in that regard. The EU had a law come down specifically addressing stealth marketing and word-of-mouth marketing last summer. It had a very similar disclosure requirement about the financial relationship between marketers and products. So the FTC is actually a little behind on this, which I think is causing a backlash now. People have gotten used to marketing in certain ways, and they are kind of getting a rude awakening now. You've got to disclose who you are to the consumer and make the relationship clear. Eric Enge: Thanks Sarah! Sarah Bird: Thank you Eric!
About the Author Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US Cities and Towns. Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com. For more information on Web Marketing Services, contact us at: Stone Temple Consulting (508) 485-7751 (phone) (603) 676-0378 (fax) info@stonetemple.com Tue, 07 Apr 2009 15:38:54 +0200 Published: April 6, 2009
Bill Mungovan is currently the director of product marketing at Omniture, responsible for its SearchCenter product. Previously, he helped build the search engine marketing practice at Carat, working directly with agency clients to exceed their ROI and branding goals through search. Mungovan brings a broad range of skills in search advertising optimization, account management, search directory development and search content production to his role. He previously served as Director, Client Relations at LookSmart, where he oversaw the day-to-day operations of the Account Management and Customer Service teams. Prior to joining LookSmart, Mungovan worked for Snap/NBCi in the Search and Directory space, and also worked at CNET in San Francisco. Mungovan has shared his search expertise as an invited panelist at several industry events, including Search Engine Strategies in New York, San Jose, Chicago and Dallas, PubCon and OMMA West. He holds an MBA from the University of San Francisco and a B.A. from the University of New Hampshire. Interview Transcript Eric Enge: Let's start with a basic overview of Omniture, and then move into an overview of Omniture SearchCenter. Bill Mungovan: Omniture offers a broad value proposition to the market. Our Online Marketing Suite includes our web analytics tool, SiteCatalyst. It also includes 9 other products, such as Genesis, an integration tool that pulls in data from other sources; Test&Target, a landing page optimization and multivariate testing tool; and of course SearchCenter. Eric Enge: Is Test&Target based on the acquisition of Offermatica? Bill Mungovan: Yes, Test&Target is basically based on the Offermatica technology. It provides dynamic landing page optimization with multivariate testing on landing pages. SearchCenter was basically built in the context of that marketing suite.
SearchCenter is what we call a search management tool, in that it accesses each of the major search engines from a single location and provides automated bid management and portfolio optimization. You can access all sorts of different reporting functionality through the SiteCatalyst integration. We think about SearchCenter as a tool for search marketers, but given that we have Genesis, we can pull data in from other sources, like an email provider, an ad server, a CRM system like Salesforce.com or a client's custom internal database. Eric Enge: Right. Pulling in data from other sources is one of the big challenges with bid or campaign management. People are so used to treating everything as if they were direct response marketers. But a company that has physical locations and a web site is likely going to have people who visit the web site and buy offline, and vice versa. So being able to pull in data from other sources allows you to credit those campaigns in a meaningful way, so that you can manage your bidding strategy more effectively. Bill Mungovan: Yes. That's the heart and soul of the way we think about search, which is obviously the hot topic. We use SiteCatalyst to collect all of that data. So, in your example, you could pull in point-of-sale data or data from a call center. There is really no shortage of examples there. Then we can generate bid rules and bid strategies based on that data. That's how Omniture thinks about the world, given that we have SiteCatalyst as an underlying platform. We can pull data in from all these different sources, and then use that data not just for attribution, but also to improve bid strategies. We have clients whose web sites generate more sales over the phone than on the site itself. Say they sell complicated items that people want to talk through over the phone.
We need to be able to tie back exactly which keywords led to sales over the phone, and how much those sales were worth. So it's not just attributing a sale to the correct channel; it's actually determining bidding based on that data. Eric Enge: You made reference to portfolio management, and there is an aspect of that I'd like to dig into a little bit, which is the notion that if you are bidding on a very high volume keyword, it's really easy to get enough data to decide whether that keyword is profitable or not. But then we have the long tail, where data is scarcer. It's maybe only a few clicks a day, or maybe it's a large pay-per-click account that has hundreds of thousands of keywords that each get a few clicks a week. So, by portfolio management, do you mean a strategy for looking at those keywords more holistically, as a group? Bill Mungovan: Yes, that's exactly what it is. We have two types of bid management in the system. One is bid rules, which are basically just if-then statements. So, if you are getting this much revenue from a keyword, then you should increase the CPC by a little bit. On the other side, we also have portfolio optimization, which is just another option for marketers. We found that having both gives our advertisers more options. Now, with respect to your question, not having enough data to understand what's happening on a keyword-by-keyword basis can certainly happen with long tail keywords. That's the biggest fundamental problem with portfolios of keywords. When the portfolio optimization approach does not have enough data, our tool projects it out based on what limited data we have and what we think may happen in the future. If there is no data, there is only so much we can go on. After a certain point, we just assume that the keyword is not going to generate any clicks. But that's one of the problems that we see.
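To make the "if-then" bid rules concrete, here is a minimal Python sketch of one such rule. It is not SearchCenter's actual logic: judging profitability by revenue per dollar of ad spend (ROAS), the 4.0 target, the 10% step and the 30-click minimum are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class KeywordStats:
    keyword: str
    clicks: int
    cost: float      # ad spend on the keyword over the review window
    revenue: float   # revenue attributed to the keyword over the same window
    cpc_bid: float   # current maximum cost-per-click bid


def apply_bid_rule(stats: KeywordStats, target_roas: float = 4.0,
                   step: float = 0.10, min_clicks: int = 30) -> float:
    """Return a new CPC bid from simple if-then rules.

    target_roas, step and min_clicks are hypothetical defaults,
    not values taken from SearchCenter.
    """
    # Not enough data to act on (the long-tail case): hold the current bid.
    if stats.clicks < min_clicks or stats.cost <= 0:
        return stats.cpc_bid
    roas = stats.revenue / stats.cost  # revenue per dollar of ad spend
    if roas > target_roas:
        new_bid = stats.cpc_bid * (1 + step)   # profitable: bid up a little
    elif roas < target_roas:
        new_bid = stats.cpc_bid * (1 - step)   # unprofitable: bid down a little
    else:
        new_bid = stats.cpc_bid
    return round(new_bid, 2)
```

A rule like this never fires for a long-tail keyword that stays under `min_clicks`, which is the data-scarcity gap Bill says portfolio optimization is meant to address.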
We do mathematical projections for the future based on the limited data that we have. What I am getting at is that our approach to search is the opposite of complicated mathematical Black Box formulas. We do have that built into the tool; we just don't fundamentally believe at Omniture that you can click a button and have your entire search marketing program taken care of. That's a Black Box approach that we feel has run its course in the market. We just don't believe that there is any single approach to bid management or search marketing that's going to work for many different clients. It speaks to the broader vision in which we view search: we are not an agency, but we have agency services within Omniture. Our goal is really to be as transparent as possible with our customers. Transparency is a key issue for us, as we have many clients who have us manage their search program for a short period of time, about three to six months. We coach them along the way on how and what we are doing. We get them up to a certain level of performance, and then hand the program over to them in-house. So not relying on service revenue the way an agency might works to our advantage, because we can give full transparency to our clients. And that model has been working pretty well for us. So, to get back to the portfolio question, the idea of us being able to take care of the whole thing for you is just impossible in our minds. Eric Enge: Right. Say you have a set of keywords that you are managing. There may be ten keywords that are producing great volume, another fifty that are producing marginal volume and then some that produce less than 10% of the volume of the high-volume keywords. You still want to be able to manage those less-than-10% keywords at some level, correct? Bill Mungovan: Yes, and that's an area where you would apply different rules to the different types of keywords.
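Eric's scenario (a few high-volume keywords, a marginal middle group, and a tail under 10% of the leaders' volume) can be sketched as a simple segmentation step, so that each group gets its own portfolio and rule set. This is an illustrative sketch only: measuring volume relative to the single top keyword and the 50% "high" cutoff are assumptions; only the 10% tail threshold comes from the conversation.

```python
from __future__ import annotations


def segment_keywords(clicks: dict[str, int], tail_ratio: float = 0.10,
                     high_ratio: float = 0.50) -> dict[str, list[str]]:
    """Split keywords into portfolios by volume relative to the top keyword.

    tail_ratio follows the 'less than 10%' example in the interview;
    high_ratio is a hypothetical cutoff.
    """
    portfolios: dict[str, list[str]] = {"high": [], "marginal": [], "tail": []}
    if not clicks:
        return portfolios
    top = max(clicks.values())
    for keyword, count in sorted(clicks.items()):
        # Share of the top keyword's volume; all-zero input lands in the tail.
        share = count / top if top else 0.0
        if share >= high_ratio:
            portfolios["high"].append(keyword)
        elif share >= tail_ratio:
            portfolios["marginal"].append(keyword)
        else:
            portfolios["tail"].append(keyword)
    return portfolios
```

Keeping the tail in its own portfolio means weak long-tail keywords cannot drag down the averages used to manage the high-volume group, which is the failure mode Bill warns about next.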
One thing we tell clients a lot is that there is no faster way to lose money in search marketing than to set up the wrong portfolio, or to have poor-performing keywords dragging down the average of your best-performing keywords. Similarly, you wouldn't necessarily want your highest volume keywords in the same portfolio as your lowest volume keywords. You may, depending on what the keywords actually are. But you may not, and we want our clients to be fairly careful about how they set up the rules in their portfolios if they are, in fact, using that particular feature. You may actually give more credit to certain keywords because those at higher volumes are doing all the heavy lifting. Eric Enge: Let's dig a little bit into the announcement you had recently with Scotts Miracle-Gro. Bill Mungovan: In general, we are doing more and more deals within Omniture, both in the SearchCenter business and in other pieces of our business, that involve multiple products. And Scotts is a good example of that, because essentially there is only so much you can do if you just think about search engine marketing as a silo. So by bringing in data from other sources and using it effectively, we opened up a lot of different options for search campaigns. That's what Scotts is trying to do. We don't have results for this particular example, just because it's a new announcement for us. But in general, they were having a hard time understanding exactly how email marketing campaigns could be used to remarket. And they also wanted to know how email marketing may or may not have affected what happened on their site and in their search marketing program. Scotts was trying to take a more holistic view of their online business optimization efforts. They made the choice to stop thinking about email as one silo and search as a separate silo.
And so, using SiteCatalyst as their platform, they used Omniture Genesis to pull in the ExactTarget data and SearchCenter for their search data, and measured it all in one place. Eric Enge: Right. So interactions can be more easily understood. Bill Mungovan: Correct. And, on a related note, they had problems with what they called Post-Click Behavior, which is basically visitor engagement and what happens on their site once they attract a customer. They've stopped thinking about email and search as just visitor acquisition tools, and started to think about the whole thing holistically. They can see what happens when somebody clicks on a keyword and comes to their site, including where they go, how much time they spend on each page and what they are engaging with on the site. And they can do the same thing with email as well. Once someone opens the email and clicks through to their site, they can determine what that person does and what is most important to them. By using all those products in one place, with SiteCatalyst underneath it all, Scotts was able to gain that level of insight. These are relationships that we are pulling together these days, because people want to start looking at online marketing more holistically. Eric Enge: I believe ExactTarget is the email platform that Scotts is using, and there is an integration of data between email and the search campaign. What are some of the other data sources that can be pulled in and integrated in a fashion like this? Bill Mungovan: Omniture Genesis is the name of the product that is designed to pull in data from third-party sources. ExactTarget is one of many, many email providers we know of. There is also ad-serving data, which allows display data to be pulled in. Another very big category for us is CRM data. By tying actual backend CRM data to upfront advertising, or search engine marketing in particular, you can start to learn a whole lot more about what people do after the lead has been generated.
You can also include call center data. That can be anything that people do with an SAP or an Oracle database, or any of those enterprise-level systems, which may hold point-of-sale data, such as data from a system of kiosks. There are really two types of data. The first is marketing data, which includes online data, such as email and display advertising, and offline marketing data. So data from television marketing or any kind of offline media can also be pulled in, depending on how it's structured in whatever system it currently lives in. That's one side: the advertising data. The other side would be backend sales data, which is the CRM, kiosk and call center data and the other enterprise systems that may live in an SAP or an Oracle database. Eric Enge: Right. And there have to be some pretty interesting things going on there to pull in CRM or call center data, which clearly can be massive in size. Bill Mungovan: Yes, and we have a product called Discover OnPremise for when it does get too big. It's something we got from Omniture's acquisition of Visual Sciences. For example, we have a rental car customer who is trying to figure out exactly how many people book online. Then they'll go to each individual location around the country and observe how many people actually show up to pick up the car they reserved online versus how many don't. Then they see what people actually buy, how far they drive and all sorts of other data like that. As you can imagine, it gets absolutely massive at that point. So Discover OnPremise is a much more powerful and robust tool for when integrations go well beyond the needs of a standard advertiser. But we do have advertisers who have millions of keywords in SearchCenter and are tying some of those actions back to systems that don't have anything to do with what happens on their actual site. So it starts to get pretty interesting at that point. Eric Enge: Are there ways to create ties into TV advertising, print advertising and radio advertising?
Bill Mungovan: Well, that's really the million dollar question that every advertising agency in the world is trying to figure out: exactly how does offline data impact online behavior, and vice versa? What we propose to people is to pull that data into SiteCatalyst, start to figure out your own correlations and, if possible, figure out the causality between different marketing programs. We just provide the repository for the data, and then we allow agencies and advertisers to figure out what is occurring on a campaign-by-campaign basis. But yes, you can pull that data into SiteCatalyst. Eric Enge: What are some of the strategies for how you provide the data to SiteCatalyst? And what kind of data are you providing in some of those more difficult scenarios? Bill Mungovan: I believe CRM data is the right place to focus the discussion, because it's a little bit more tangible and measurable. For example, Omniture uses SearchCenter for its own marketing efforts in order to get more Omniture customers. A really common scenario for us would be to run an online advertising program and then generate a lead on a web site. But what actually happens to that lead, at least in our case, is that it then goes to a sales force. The sales force follows up, and some percentage of those leads actually turn into customers. We track it all the way down to how much we spent to acquire that customer, both online and through our sales team, and then we figure out what we got in return for that. In our case the CRM system we use is Salesforce.com. But there are any number of CRM systems from which we can pull the data. For us, Cost-Per-Lead is just one very small piece of the full picture, and a stepping stone toward figuring out Profit-Per-Click.
So, if you are able to figure out how much you spent on all operating costs, you can pull that data in through CRM integration, and then bid on the keywords that lead to the highest profitability for your business. Those are some of the trickiest, but most interesting and most progressive, features. Eric Enge: Let's talk a little bit about some specific tactics. For example, say you know your paid search campaign results in phone orders. One tactic you can implement to make tracking much more effective is to give everyone who comes to your site from a paid search ad a custom 800 number. This way you can know the results of your paid search campaign just based on which number they call. That's a tactic that is designed to give you much more accurate data. Bill Mungovan: Yes, that makes sense. Another tactic that one of our clients is using is automatically generating codes on the site itself. The customer can actually see that code, so each customer who visits the site from a given campaign can be identified. And we can actually get it all the way down to the keyword level: we know which keywords led to a call. Customers see a certain code on the site, then make a phone call and either make a purchase or not. The call center takes that code from the customer, so we can record where that customer came from and what they did on the site. Then we can pull that data back into SiteCatalyst and make bidding decisions based on what happened. Eric Enge: You can also give customers who walk into a physical store a rebate as part of some promotion that the store is holding. They then collect the rebate by going online, filling out a form and entering the rebate number. The web site can then check cookies to see if the person came in from a search campaign of some sort. Bill Mungovan: Yes. But we wouldn't necessarily be able to tell what specific keyword they came from.
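The code-on-site tactic and the Profit-Per-Click idea fit together: once each call-center code maps back to a keyword, phone revenue can be credited to keywords and turned into a per-click profit figure for bidding. The sketch below assumes a hypothetical data layout (a code-to-keyword map and a list of `(code, revenue)` call records) and uses a single hypothetical `gross_margin` as a stand-in for the operating-cost data a CRM integration would supply; it is not how SearchCenter or SiteCatalyst store this data.

```python
from collections import defaultdict


def attribute_calls(code_to_keyword, calls):
    """Sum call-center revenue per keyword.

    code_to_keyword: hypothetical map from the code shown on the site to the
    keyword that brought the visitor in. calls: (code, revenue) pairs logged
    by the call center when the caller reads the code out.
    """
    revenue = defaultdict(float)
    for code, amount in calls:
        keyword = code_to_keyword.get(code)
        if keyword is not None:  # unknown codes stay unattributed
            revenue[keyword] += amount
    return dict(revenue)


def profit_per_click(revenue_by_kw, cost_by_kw, clicks_by_kw, gross_margin=0.4):
    """(attributed revenue * gross margin - ad cost) / clicks, per keyword.

    gross_margin is a hypothetical stand-in for the operating-cost data a
    CRM integration would supply.
    """
    result = {}
    for keyword, clicks in clicks_by_kw.items():
        if clicks == 0:
            continue  # no clicks, no per-click figure
        gross_profit = revenue_by_kw.get(keyword, 0.0) * gross_margin
        ad_cost = cost_by_kw.get(keyword, 0.0)
        result[keyword] = round((gross_profit - ad_cost) / clicks, 2)
    return result
```

A keyword can look fine on cost per lead yet come out negative here once margins and ad cost are netted out, which is the point of bidding on profit rather than on leads.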
It is still a good example, but we've actually seen the opposite happen as well. When people come online from a specific keyword, they land on a page where they have to print out a coupon containing a bar code that encodes information such as the keyword they searched on, bring it into the store and redeem it there. We can see two things: how many people print out the coupon and do not go to the store, and how many people print it out and actually redeem it. So that's just another way of understanding what's driving people to make offline transactions. Eric Enge: Exactly. Try to discover every aspect of the interaction that you can. Bill Mungovan: Here's something really cool that we've seen with a retail client. It's pretty complicated, but it's very interesting at the same time. The client is able to look through SiteCatalyst as they are running geo-targeted campaigns. Again, these are big box retailers with stores in many different locations, and they are running different ad campaigns for different geo-locations. They can see what people are purchasing online and, more importantly, what products people are bundling together in a given geo. So there might be a video game and a CD on sale in the upper Midwest, and that particular bundling may be very, very different from what people are bundling in Los Angeles. So what they've done is taken all the online data and figured out what products people are bundling online. Then they rearrange the actual placements in the store based on what's happening in that geo. So when you walk into a store, you would see two products next to each other on the shelf based on what people are doing online in that geography. Eric Enge: So you basically isolate the best way to put together bundles based on how people are behaving in different areas? Bill Mungovan: Yes.
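The retail example boils down to counting which products show up together in online orders, separately for each geo. A minimal sketch, assuming orders arrive as hypothetical `(region, [product, ...])` records; the retailer's real analysis in SiteCatalyst is not described in enough detail to reproduce, so this shows only the basic co-purchase count.

```python
from collections import Counter
from itertools import combinations


def top_bundles(orders, top_n=3):
    """Return the most common product pairs per region.

    orders: iterable of (region, [product, ...]) records (hypothetical layout).
    """
    pairs_by_region = {}
    for region, products in orders:
        counter = pairs_by_region.setdefault(region, Counter())
        # Dedupe and sort so each unordered pair is counted once per order.
        for pair in combinations(sorted(set(products)), 2):
            counter[pair] += 1
    return {region: counter.most_common(top_n)
            for region, counter in pairs_by_region.items()}
```

The top pairs for each region are the candidates for shelving next to each other in that region's stores.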
We figure out what they are buying based on the digital shelf, apply that knowledge to the actual store and rearrange products accordingly. Eric Enge: What would you recommend to someone running a TV campaign? Bill Mungovan: It is absolutely critical to pull your TV data into the same system where your online advertising data lives. At the very least, make sure that you are actually measuring apples-to-apples in one place. It's not an easy question to answer in terms of which TV campaign yields the highest possible return online. That's a very complicated thing, and it will be different for every customer. But our advice to the market is to pull the data into the same place and then start to run reports on correlations between media in a given geo and what's happening online. Eric Enge: You can also try things like vanity URLs, but things like that are very uncertain. Bill Mungovan: We have seen a lot of studies that tell us that very few people actually remember your URL from the end of your television or radio ad, and even fewer go to their computers and actually type it in. For us it's more interesting to just let the campaigns run separately. So you have your search campaign online, a display campaign, and then a TV campaign. Then pull the data into one place and use analytics to figure out the correlations. If you saw a bump on the 21st of January, you can find out exactly what media was running in which geo, and then start to draw correlations between the two. So I think that tricks like vanity URLs don't work in every case. Eric Enge: Right. Well, I would think that there is a risk of actually lowering your return in exchange for trying to figure out how to measure it. Bill Mungovan: That's right. Eric Enge: Can you outline how the pricing model works for SearchCenter? Bill Mungovan: We typically charge a percentage of ad spend, so the more you spend, the lower the percentage.
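Running "reports on correlations between media in a given geo and what's happening online" can start as simply as a Pearson correlation between two daily series, optionally with a lag to allow for people searching a day after seeing an ad. This is a sketch with hypothetical inputs (daily TV spot counts and site visits for one geo); a high coefficient only says the series move together, and, as Bill notes, establishing causality is a separate question.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(tv_spots, site_visits, lag_days=0):
    """Correlate daily TV spots in one geo with site visits lag_days later."""
    if lag_days < 0:
        raise ValueError("lag_days must be non-negative")
    if lag_days:
        # Pair each day's TV spots with visits lag_days afterwards.
        return pearson(tv_spots[:-lag_days], site_visits[lag_days:])
    return pearson(tv_spots, site_visits)
```

Comparing the coefficient at a few lags per geo is one way to see whether a bump like the one on January 21 lines up with the media that was running there.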
We have customers in all shapes and sizes. We've had clients take it in-house, then feel like they really couldn't handle it for a while, and request additional help from our services group. So we manage it for them for a while, or on an ongoing basis, for an additional percent-of-ad-spend fee. Then after a while we can hand it back when they are ready. We have some flexibility as part of that offering. Eric Enge: Can you say anything about some other well-known customers that you have using the service, and the total spend you have under management? Bill Mungovan: Sure. We have 600,000,000 in spend under management. One example of a large customer that I am allowed to disclose is Delta Airlines. They are using an agency called . We have both agencies and direct clients using the tool. And we have other retailers, like Backcountry.com, using the tool as well. Eric Enge: Thank you Bill! Bill Mungovan: It was good talking to you, thanks a lot!