Over the past few months I’ve made a number of changes to Liliputing and Mobiputing in an effort to recover from a loss of Google search engine traffic following the company’s Panda update. Here are some of the things I’ve done. You can find more details and the back story below.
- Remove some web pages from Google’s search results and cache.
- Switch from a full text RSS feed to a partial feed to reduce the number of sites automatically posting every article from my web site on other web pages.
- Send DMCA takedown notices to sites that have been scraping my content.
- Add “nofollow” attributes to all affiliate links on my web sites.
- Remove all broken links and fix links that redirect to another page.
- Use an SEO plugin to block certain web pages from Google.
- Change image galleries so that instead of opening a new web page with a new URL and very little content, you can now view large images on the same page.
- Remove the Liliputing Product Database (at least temporarily).
A few months ago I discovered that traffic to my web sites was dropping. The change coincided with Google’s Panda update which was designed to identify “content farms” which crank out high-volume, low-quality content. Unfortunately a number of web sites that produce 100% original content have been caught in the crossfire, because the truth is that Google’s computer algorithms can’t always tell the difference between the original source of an article and the hundreds of sites ripping off the content.
I’ve tried to take a Zen approach toward the traffic loss, understanding that it’s not as if Google owes us any traffic at all. More than 50% of traffic to Liliputing has consistently come from Google, and if the search engine determines there’s a better source of information than my web site, so be it. I’ll just work my hardest to find other ways to generate traffic and hope that Google notices what the people who subscribe to my RSS feed, Twitter and Facebook pages, and just regularly stop by to participate in the comments already know: there are few places that cover the affordable mobile tech space as extensively as Liliputing.
Unfortunately, what I’ve realized over the past few months as traffic has continued to fall is that without Google, my traffic and revenue wouldn’t disappear altogether, but it would be much harder to sustain Liliputing and Mobiputing. Right now running these web sites is my full-time job, and I have a few side projects. If Google decided to stop sending any traffic, I’d probably have to find a different full-time job and treat the web sites as a side gig, which would necessarily decrease the quality of coverage.
Anyway, Zen-like aspirations aside, I’ve been struggling frantically over the past few months to identify reasons why Google could be grouping my web sites with the content farms and delivering lower traffic. My first big breakthrough came when I submitted a request to Google for reconsideration.
While most web sites that have lost traffic due to Panda have been affected purely by an algorithmic change, it turns out that Google employees have also taken manual action against some web sites. Liliputing was apparently one of them. I submitted reconsideration requests for both of my web sites, and a week later I got responses letting me know that there was no manual action taken against Mobiputing, but there was one for Liliputing.
Google doesn’t actually tell you why the manual action was taken, only that it was and that your web site is therefore considered low-quality. It’s often the case that a few bad pages on your site can lower the ranking of all pages on your site, and with more than 7,000 articles on Liliputing I didn’t relish the idea of looking back through each one to identify potential problems. But I also didn’t think I’d have to. I wrote most of those articles myself, and I have complete trust in the handful of people who have contributed other articles to the site. The problem must lie elsewhere, I figured.
So the first thing I did was seek out help from the Google Webmaster Forums. Almost immediately some helpful folks identified something I hadn’t noticed: a spammy link to an adult web site in the footer of every page on Liliputing. A few years ago my web site was hacked and I had thought that I managed to remove all of the code that was injected into the site. I was wrong.
I removed the link from the footer and spent the next few days looking through every file in my site’s theme and running scans on my database looking for several problems. I submitted a reconsideration request with Google again and waited.
Seven days later I got a reply. My request was rejected yet again. Maybe the problem wasn’t the spam link at all?
That’s certainly a possibility. But I noticed that when I ran a Google search for that term with the qualifier site:liliputing.com, thousands of pages were still showing up. Google takes a long time to completely crawl and index web sites, and it takes an even longer time to detect changes. Even though the link had been removed, it’s possible that Google’s computers hadn’t noticed — even though I explicitly stated that the link had been removed in my request.
So I might have to wait weeks or months before Google removed all references to the spammy link — but it was also possible that the problem lay elsewhere. So I began an effort to identify other potential problems and take action. Over the past two months I’ve tried about 8 different strategies for improving traffic and getting Liliputing taken off of Google’s hit list.
About 7 weeks after submitting my first request for reconsideration, Google told me that the manual action against the site had been removed — although it may take some time for Liliputing to be re-indexed, and there’s no guarantee that traffic will improve because the latest Google algorithmic changes could be the real problem.
In fact, it’s occurred to me that the manual action could be related to the spam link to the adult web site which may have been around for several years. While it’s possible that Google didn’t notice the link until the Panda update rolled out, it’s also possible that I’ve been blogging with a handicap for years and I simply didn’t notice it until I submitted my first request to Google a few months ago. Google provides broad guidelines for web site quality, but it’s not easy to find specific information about why an individual site receives high rankings or low.
Anyway, traffic hasn’t really recovered yet, but it’s nice to know that Google has at least decided that Liliputing isn’t a purveyor of “low quality” content. After reviewing Google’s guidelines and reading hundreds of articles from other web publishers trying to figure out how Google works now that the game seems to have changed dramatically, here are 8 things I’ve attempted to do with my web sites. Some of them are probably good things to do whether they help my Google rankings or not. Other changes I’m a little less happy about, but overall they should help me focus on the core business of producing news and information articles for my two mobile technology web sites.
1. Removing web pages from Google’s search results and cache
When I noticed that a site:liliputing.com search on Google was still turning up web pages that used to include a spam link in the footer, I decided to help Google speed up its efforts to re-index my web site. You can’t really control how quickly Google scans every page on your web site, but you can identify pages that you want to remove from Google.
So I visited Google’s Webmaster tools, opened the “Site configuration” area, and found the “Remove URL” option under “Crawler access.”
From here, you can enter an individual URL and ask Google to remove it from search results and from Google’s cache. If the web page has actually been removed from your web site, Google will stop sending people to that page. If it still exists, Google will probably eventually find the page again but it will re-index the URL so that any text which is no longer on the page will no longer be listed in Google.
It can take up to 24 hours for Google to remove a URL and it can take much longer than that for it to re-appear, but my hope was that Google would realize that I really had removed the spam link if I encouraged the search engine to remove enough URLs. Over the past few weeks I’ve removed over 600 pages… many of those links were also to pages that I probably shouldn’t have let Google index in the first place, such as links to web pages that really only showed a picture or links to tag pages. I’ll talk more about this in a moment.
2. Switching to a partial RSS feed
One of the things I started to notice when my traffic was dipping was that sites which were copying and pasting full articles from Liliputing onto their own web sites were starting to rank more highly in Google search results than Liliputing.
I’ve never been the sort of person to keep track of specific keywords that drive traffic to my site, but I do have a sense of a handful of web pages which have always received a lot of traffic from Google searches.
What I did notice is that if I copied and pasted a chunk of text — say, about two sentences — from any recent article on Liliputing into Google, my web site would show up about halfway down the Google search results page. Sites that were “scraping” my content were showing up in the first, second, third, and sometimes fourth, fifth, and sixth places.
I used to ignore scraper sites because Google had always done a good job of ignoring them for me. But that seems to have changed with Panda.
Most scraper sites steal content the lazy way. They simply subscribe to a publisher’s RSS feed and automatically post the entire contents of each post to their own web site, sometimes stripping out all links first. So even though my RSS feed includes a link on every post pointing back to my web site, the scraped content often provided little evidence that it originated elsewhere.
So I reluctantly switched my RSS feed from one which shows the full text of an article to one which shows just the first few sentences.
I’m not a huge fan of partial RSS feeds and I’ve offered a full feed since launching Liliputing in April 2008. But by switching to a partial feed, lazy scrapers are now only able to get the first few sentences of any article. A couple of scrapers are a little more ambitious and seem to have found other ways to continue copying full text, but this move has cut down on the amount of Liliputing content showing up on other web pages.
Many observers say that the Panda update has caused Google to penalize sites publishing “duplicate content,” which covers both content that has been published multiple times on the same web site, and content which appears in multiple places on the web. For this reason, switching to a partial feed could possibly reduce the amount of duplicate content and reduce the likelihood of other web sites ranking above Liliputing in search results.
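WordPress handles the full-to-partial switch with a single setting, but the idea behind a partial feed can be sketched in a few lines of Python. The sentence-splitting rule and the three-sentence cutoff here are my own illustrative choices, not what WordPress actually does:

```python
import re

def make_excerpt(text, max_sentences=3):
    """Return roughly the first few sentences of a post,
    similar to what a partial RSS feed exposes to scrapers."""
    # Naive sentence split: break on ., !, or ? followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    excerpt = " ".join(sentences[:max_sentences])
    if len(sentences) > max_sentences:
        excerpt += " [...]"
    return excerpt
```

A scraper that reposts the feed verbatim would then only ever capture the excerpt, with the rest of the article living solely on the original site.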
3. Sending DMCA takedown notices to scraper sites
In mid-June Google rolled out a refined version of Panda which was meant to deal with the scraper problem — my web sites were hardly the only ones with this problem. But while I’ve noticed fewer web sites outranking Liliputing with text copied and pasted from my site, there are still a few that consistently show up higher in search results than my sites — where the content originated.
I’ve also noticed little change in search traffic since the Panda 2.2 update rolled out. It might just be a matter of time before we see any measurable results. But it may also be that the scraper site rankings are a symptom of something else Panda is doing and not a problem in its own right.
Before I came to this almost-conclusion though, I started doing something else I never thought I would have to do: sending DMCA takedown notices to sites that were scraping my content without permission.
US copyright law is an interesting thing. I’m not sure it makes sense for works to be protected under copyright for more than 70 years after an author dies… but I do appreciate the fact that you don’t need to take extraordinary steps to claim ownership of something you created. Basically throwing a note on your web site saying that you’re the original creator and being able to hold up some supporting evidence if anyone ever asks you to prove it is all you need.
When Google did a good job of recognizing that Liliputing and Mobiputing were the original sources of content showing up on unauthorized web sites, I didn’t care all that much and figured that if people wanted to use my content for their own purposes, that was their business. But once the scrapers started to capture traffic that would otherwise have come to my web site, at a rate that seemed like it could actually threaten my livelihood, I took a less kind view. At the advice of a friend, I signed up for a membership with a web site called DMCA.com, which makes it easy to track web sites that are taking your content and to send takedown notices formatted to comply with US law.
It’s tough to find contact information for the publishers of many scraper web sites — but it’s often much easier to figure out where the web site is hosted and send a note to the web host. The web host typically doesn’t want to run the risk of a lawsuit, so they’ll probably either get in touch with the publisher (they have to have contact information) or remove the content on their own.
You can also send DMCA notices to Google search and Google AdSense in some situations which can cause web sites to be removed from Google search listings, or even to have their Google AdSense accounts suspended if the publisher doesn’t remove infringing content.
This whole process is time consuming and may really not make much difference. You can only send one DMCA takedown notice per URL. If a web site has copied hundreds of articles from your page, that means you’d have to send hundreds of takedown notices to get all of those pages removed. But hopefully by sending a few you can let publishers know that you’re willing to fight and that the easiest thing to do is to remove your content from their pages.
I was a much happier blogger when I didn’t have to worry about this at all, because Google did it for me. I’m hopeful that eventually that will be true again and I may stop sending notices and switch back to full text RSS feeds. But for now, these are two of the ways to help Google find original content — by helping to remove the unoriginal content from the web.
4. Adding “nofollow” attributes to affiliate links
Google doesn’t look too kindly on web sites that have little to no content except for paid links to other web sites or affiliate links to retail sites. While I don’t think Liliputing fits this category since we publish a huge amount of original articles with news and information about mobile tech as well as product reviews, there are some individual URLs on my web site that could look to Google like low quality content with long lists of affiliate links.
I regularly publish a list of daily deals on netbooks, tablets, and other mobile tech bargains. These posts tend to include a number of affiliate links, because I need to make a living, but that information is disclosed on our advertising page. These pages have actually become some of the most popular on the site, because the only items I feature are products that are on sale. Some have affiliate links. Some don’t. All are offered for below their normal price.
It turns out Google couldn’t care less if you have affiliate links. But since Google uses links between web sites to determine which pages are most trusted, a high concentration of paid links could mess up Google’s search results. So what the company recommends you do is add a rel=”nofollow” attribute to affiliate links. This means that when Google indexes your page, it won’t place any real weight on that link.
In other words, links that are nofollowed don’t add any link juice. People can click on them as much as they like, but Google doesn’t really pay much attention to them.
So I found a way to identify virtually every affiliate link posted in the 7,000 articles on my web site, and add rel=”nofollow” to each one. You can read more about how I did that in an article I wrote last month.
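The linked article has the details, but the general shape of the batch update can be sketched like this. The affiliate host list and the regex here are illustrative assumptions — a rough sketch, not the code I actually ran against the site:

```python
import re

# Hypothetical affiliate domains; the real list depends on your ad partners.
AFFILIATE_HOSTS = ("amazon.com", "linksynergy.com")

def nofollow_affiliate_links(html):
    """Add rel="nofollow" to <a> tags pointing at known affiliate hosts.
    Simplified: assumes the tags don't already carry a rel attribute."""
    def fix(match):
        tag = match.group(0)
        if any(host in tag for host in AFFILIATE_HOSTS) and 'rel=' not in tag:
            # Re-open the tag just before '>' and append the attribute.
            return tag[:-1] + ' rel="nofollow">'
        return tag
    # Match opening anchor tags only (the \s keeps </a> from matching).
    return re.sub(r'<a\s[^>]*>', fix, html)
```

Run over each stored post body, this marks only the affiliate links while leaving ordinary outbound links untouched.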
Before I did this, I actually briefly removed every single “deals of the day” post from the site… but this didn’t seem to make a difference, so I’m happier with the nofollow solution. I brought most of the previously deleted posts back online — but not all of them because I took too long to come to this solution and misplaced some of the older articles. I have backups that I could restore, but it would be a lot of work to go through to bring articles that are mostly outdated anyway back online. I just hate deleting any published content from the web.
5. Removing all broken links and fixing many redirects
Google Webmaster Tools isn’t just good for removing links from Google’s index. You can also use Google’s tools to identify problems with your web site — including broken links. It turns out that when you publish a web site for more than 3 years, many of the sites you link to may change URLs or cease to exist, which means anybody visiting your web page may click on a link only to encounter an error message.
While Google will point out many broken links, it’s not very easy to use Google’s tools to identify individual pages with errors or to fix broken links in batch jobs.
Liliputing, it turns out, had thousands of these broken links. There’s an excellent desktop tool called Xenu which scans your web site and produces a list of broken links. It can take hours or days to scan a large web site, but the tool is very powerful.
If you’re using WordPress though, there’s an even better option. It’s a plugin called Broken Link Checker which not only identifies broken links, but lets you remove them or make other changes in batch jobs. It will also find links with redirects. For instance, I had thousands of links on my web site that were listed as www.liliputing.com, but they redirected to liliputing.com. That redirect takes a little extra time and server power, so removing the redirects may or may not help my Google search rankings — but it’s definitely a good thing to do.
I absolutely love this plugin, but there are a few problems. First, even with all of your broken links laid out in a simple format, it can take a long time to go through and fix them all — and there are some false positives, so if you don’t check carefully you could remove links that aren’t broken at all.
Second, the plugin uses a lot of processing power, so if you don’t have a good web hosting plan, you may not want to use it at all.
Anyway, long story short, I removed thousands of broken links and fixed thousands of redirects. I’m not sure I’ve caught them all — and I’m not sure how much Google cares. But the upshot is that even if it does absolutely nothing to help me recover from Panda, that’s a few thousand fewer links that visitors could click only to be directed to a dead end.
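Under the hood, tools like Xenu and Broken Link Checker are doing something like the following: fetch each link, note where it ends up, and flag errors and redirects. This is a simplified sketch using only the Python standard library — a real checker also handles rate limiting, retries, and the false positives mentioned above:

```python
import urllib.request
import urllib.error

def classify(url, final_url, status):
    """Classify a checked link: broken, redirected, or ok."""
    if status >= 400:
        return "broken"
    # A trailing-slash-insensitive comparison catches www -> bare-domain hops.
    if final_url.rstrip("/") != url.rstrip("/"):
        return "redirect"
    return "ok"

def check_link(url, timeout=10):
    """Fetch a URL and report its status; network errors count as broken."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen follows redirects, so resp.geturl() is the final URL.
            return classify(url, resp.geturl(), resp.status)
    except (urllib.error.URLError, ValueError):
        return "broken"
```

Feeding every outbound link on the site through `check_link` and reviewing the "broken" and "redirect" buckets is essentially the batch job the plugin automates.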
6. Using an SEO plugin to prevent certain web pages from appearing in Google results
There are roughly a gazillion WordPress plugins that are designed to help your SEO or Search Engine Optimization in one way or another. Historically I haven’t bothered using any of them, because Google never had a hard time finding my content. SEO, as far as I was concerned, was writing high quality, easy to understand content for my readers and giving blog posts titles that were descriptive rather than clever.
Then Panda hit, and I read all sorts of advice about reducing duplicate content by doing things like adding noindex attributes to archive pages. That way Google won’t look at pages such as liliputing.com/tag/netbook, liliputing.com/category/1, liliputing.com/page/1, or liliputing.com/author/brad and see the same content on each — and on the individual article.
What’s the easiest way to do that? Install an SEO plugin.
I settled on Yoast WordPress SEO because I’ve had good experiences with other plugins from this developer, and because it offers a number of options that seem to meet my needs.
In the Indexation Rules section I blocked access to the site’s search results pages and admin pages — which are already blocked in my robots.txt file, but I figured it wouldn’t hurt to double up. Then I started tackling additional areas and added noindex to author archives, date-based archives, category archives, and archive subpages.
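Conceptually, what the plugin does on each page load is emit a robots meta tag based on the kind of page being served. Here is a minimal sketch; the archive path prefixes mirror common WordPress permalink defaults and are my assumptions, not Yoast's actual logic:

```python
def robots_meta(path):
    """Return the robots meta tag an SEO plugin might emit for a URL path.
    Archive pages get noindex,follow: Google should skip the page itself
    but still follow its links to the individual articles."""
    noindex_prefixes = ("/tag/", "/category/", "/author/", "/page/")
    if path.startswith(noindex_prefixes) or "/page/" in path:
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'
```

The `"/page/" in path` check also catches archive subpages such as /category/news/page/2, which would otherwise duplicate content already indexed elsewhere.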
7. Altering image galleries
One of the things people seem to think can lead to a low quality judgment from Google is pages with thin content and a high ad-to-content ratio. I never intentionally created web pages that match that description, but it turns out that indirectly I created many of these pages because that’s exactly what WordPress creates when you upload an image — particularly when you use the gallery feature.
When I uploaded multiple images for a single post and created a gallery with thumbnail icons at the bottom, it allowed users to click on any picture to view a larger image. Each of those images would open on a new page… but you don’t just see the image on those pages. You see the Liliputing theme, complete with a header, navigation bar, list of featured articles, and advertisements. What you don’t see is any article text. This page isn’t really all that useful as far as search engines are concerned.
Unfortunately, WordPress doesn’t use a standard URL structure for these image-only pages, so there’s no simple way to add a noindex attribute using the robots.txt file. But I did discover a couple of neat tricks to avoid this — and to make my image galleries look a whole lot prettier.
First, I installed the Lightbox Gallery plugin for WordPress. Now whenever someone clicks on an image in a gallery, instead of opening a new page, a full sized image appears on top of the current page while the background goes dark. Not only does this prevent visitors from going to a pointless new URL, but it also allows me to post larger images to my gallery since I don’t have to worry about the width of the content column. I used to only be able to upload images that were 500px wide. Now I can upload much larger images which makes me happy, whether there’s an SEO benefit or not.
But just because clicking on the image doesn’t take you to a standalone image page doesn’t mean that WordPress doesn’t create one. It’s still there, waiting for Google to discover it… but there’s a fix for that.
I used the Yoast WordPress SEO plugin’s “Permalinks” feature to “redirect attachment URLs to parent post URL.” Now if someone actually does manage to find the individual image URL, they’ll be redirected to the URL for the article associated with the image.
8. Removing the Product Database
One of the most popular posts I wrote during the early days of Liliputing was a “comprehensive list of low-cost ultraportables.” This was back when there were only a couple dozen netbooks to keep track of. The number continued to grow though, and so I created a database with detailed specifications for each model I came across.
Managing the database got a little beyond my ability, so I partnered with UMPC Portal, jkkmobile, and a number of other mobile tech web sites on a much more comprehensive product database with netbook, laptop, and tablet information.
I’ve always felt that the database provided a useful resource for readers looking to compare specs between multiple devices. But I haven’t done a great job of keeping the database up to date or of fleshing out the individual product pages with detailed reviews, opinions, and other specifications.
That’s left many of the pages with a high ad-to-content ratio, and because specifications vary so little from one netbook to another, there is a lot of repetitive content on these pages.
It never occurred to me that this could be a problem until I looked at the traffic statistics for the product database and realized that visits to the product database were dropping at an even faster rate than those to the main web site.
Last week I made the reluctant decision to take the product database offline entirely. It’s possible it could return one day in another form. For instance, instead of being a sub-page of Liliputing, all of the partner web sites could link to a single, central web page where the product database lives.
A few days after removing the product database I got my note from Google letting me know that the manual action against my web site had been revoked. I still don’t know whether the problem all along was the spam link, or if the problem had to do with the product database. Perhaps it was the way Google was indexing my content, or all the broken links.
In order to scientifically test each of these things, I would have to make one change at a time and wait to see if it made any difference. Since it’s not clear how long it takes Google to react to changes (it could be days, weeks, or even months), there’s no way to know how long you would have to wait after making each change. Just going through each of the changes listed in this article could easily take a year — which is a long time to wait while watching traffic fall.
In other words, if my traffic does recover to any degree now that Google has removed the low quality label from my site, I may never know what change or combination of changes led to it.
I suspect that the manual action alone didn’t lead to the decline in traffic. I’ve been told that Mobiputing wasn’t under any sort of manual penalty, but traffic to that web site has dropped as well. Part of that may actually be related to the drop in Liliputing traffic, since Liliputing is one of the biggest referrers to Mobiputing. But I’m hopeful that some of the changes I made to each site will address items that had nothing to do with the manual action.
I’ve learned a lot about search engines over the past few months — but I’ve also learned that there’s a lot that I don’t know — and a lot that so-called SEO experts don’t know either. But there’s something else fascinating I learned: even while acting under a penalty, a huge portion of Liliputing’s traffic comes from search. It used to be about 60 percent. Now it’s closer to 40 percent. Google still does send traffic my way. If it didn’t, there’s little doubt that at this point I wouldn’t be able to make enough money to pay the bills.
There are thousands of loyal readers who visit Liliputing every day without being sent there by a search engine. But it takes more than thousands of loyal readers to generate enough traffic and revenue to make a news and information blog profitable. The exact figure will vary depending on the topics covered, the types of advertising or other monetization employed, web hosting expenses, and so on, but in my case, it looks like it takes tens of thousands.
I’ve created Facebook and Twitter accounts for each web site, and there are a growing number of people using each to interact with the web sites. I’m making more of an active effort to alert other bloggers when I have exclusive stories that may be of interest — something I used to do much more frequently toward the beginning of my blogging career, but something which I’ve been a bit less diligent about doing recently.
Will that be enough? It’s too soon to tell. At this point all I can do — and all most web publishers who find themselves in similar situations can do — is to make the changes they think will help, continue publishing high quality, original content, and in the meantime, try to find ways to continue attracting readers without relying as heavily on Google.
Update: I forgot to mention a few other minor changes:
- I use the KB Advanced RSS Widget to display the latest headlines from Liliputing in the sidebar of Mobiputing and vice versa. In both cases, I added rel=”nofollow” to the formatting options.
- When I removed the product database, I also removed sidebar widgets showing the last 5 entries and the top 5 entries, thus cutting down the number of links and images loading in the sidebar.
- I previously had several hundred posts titled “Deals of the Day.” I went back and manually added a date to each of these posts so that instead you see something like “Deals of the Day (6-24-11).”
- I disabled trackbacks.
Update 2: Nope, removing those cross-linked posts last week doesn’t seem to have made a difference. Today’s project: I’m adding noindex, nofollow attributes to all posts that contain 100 words or fewer. There are more of these than I would have expected… but that’s what happens when you run a site for a few years and don’t pay that much attention to post length.
A quick glance at Google Analytics suggests that some of these short posts that are more than a year old have received only a handful of page views over the past 12 months — or in some cases, no page views at all. Still, I hate to delete content from the site altogether without good reason, so I’m going to try noindexing the articles instead.
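The word-count test itself is simple enough to sketch. The 100-word threshold matches what I used; the tag-stripping regex is just an illustrative shortcut, not how WordPress counts words:

```python
import re

WORD_LIMIT = 100  # posts at or under this length get noindexed

def should_noindex(post_html):
    """Decide whether a post is short enough to noindex.
    Strips HTML tags first so markup doesn't inflate the word count."""
    text = re.sub(r'<[^>]+>', ' ', post_html)
    return len(text.split()) <= WORD_LIMIT
```

A short loop over all published posts, flagging the ones where this returns true, gives exactly the list of articles that need the noindex treatment.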
This week I took a few more steps:
- Taking advantage of a new function in Google Webmaster Tools, I let Google know it should ignore certain URL parameters which don’t affect the content of a web page. This prevents Google from counting a single web page multiple times and considering it duplicate content. These parameters show up when someone clicks a link from Twitter, an RSS reader, or another third party service that I have no control over.
- I also installed SEO Site Tools for Google Chrome and paid attention to some of the suggestions it made for my web site: for instance, decreasing the number of links on the home page.
- That plugin also recommended checking my site against the W3C Validator, and I did manage to detect a couple of errors in Liliputing’s code that were easily fixed.
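The parameter change in the first item above amounts to telling Google which query strings are noise. The same canonicalization can be expressed in a few lines of Python; the parameter names below are common tracking examples I chose for illustration, not an exhaustive list:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical ignore list; Webmaster Tools lets you declare your own.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def canonicalize(url):
    """Drop tracking parameters so variants of a URL collapse to one page."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))
```

With this applied, a link shared on Twitter and the plain article URL both resolve to the same canonical address instead of looking like two duplicate pages.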
I also brought the Liliputing Product Database back online this week, but added “noindex” to the header so that Google won’t crawl those pages. This doesn’t seem to have had any adverse effects, but leaving the product database offline for a month also didn’t seem to have any real positive effects. To be on the safe side I’ll probably leave that noindex attribute in place for now.