How to deal with smarter crawlers

Long gone are the early days of digital marketing, when simply having keywords in all the right places got you straight to the first page. How do you deal with smarter crawlers, you may ask? My answer to my clients is to stop trying to outsmart them.

There is simply no feasible way to game the system: no quick tricks, hacks, spoofs, or methods to clinch those top spots. Today you simply (easier said than done) have to publish content relevant to a user’s search and provide a good user experience. In a recent article, Search Engine Land found that 35% of domains ranking for high-volume keywords don’t have the keyword present in the title. This suggests that Google’s algorithms are getting better at understanding context and synonyms, and/or that keywords in the page title are becoming a less important ranking factor. SEMrush found that content length generally has a positive correlation with search rankings; content on pages in the top three positions is, on average, 45% longer than content on pages in the 20th position.

What these stats should tell you is that you should write content for people, not for bots. Write content that is shareable, helpful, and interesting. Give it value. Build content that people want to read. Crawlers’ understanding of content has improved dramatically in the past few years.

Lastly, make sure these important pages are crawlable. Pages that crawlers cannot see will not appear in SERPs. Good places to check are your robots.txt file and any “noindex” meta tags on important pages.
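As a sketch of what to look for, a robots.txt rule or a stray meta tag like the ones below would keep a page out of search results (the `/services/` path is just a placeholder for illustration):

```text
# robots.txt — a Disallow rule like this blocks crawlers from the path:
User-agent: *
Disallow: /services/
```

```html
<!-- A noindex meta tag in a page's <head> tells search engines
     not to show that page in results: -->
<meta name="robots" content="noindex">
```

If either of these appears on a page you want ranking, remove it and request recrawling.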

In summary, make sure your site is crawler friendly. Keep an organized URL structure, provide a robots.txt file, submit a sitemap in Search Console so crawlers have a map of your site, publish coherent and valuable content at regular intervals, and improve your site’s UX so users can easily interact with it.

How to deal with malicious crawlers/bots

If you have server access, or know your webmaster well enough to ask them for help, a good place to start looking is your server’s access logs. This is where the server records information about each visitor who connects to your website: their IP address, user agent, device, and operating system.

Download these log files, consolidate them, and open the result in Excel. Sort by number of hits in descending order so the IPs hitting your server most often appear at the top. Then identify visitors with no user agent specified whose hit counts dwarf everyone else’s. We’re talking thousands, and in some cases tens of thousands, of hits more than the next-closest user.
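If you’d rather script this than sort in Excel, the steps above can be sketched in a few lines of Python. This is a minimal sketch assuming the common Apache/Nginx “combined” log format; the sample lines and IP addresses are fabricated for illustration, and you would adjust the pattern to match your server’s actual format:

```python
import re
from collections import Counter

# Combined Log Format: IP, identity, user, timestamp, request line,
# status, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_hits(log_lines):
    """Count hits per IP and track which IPs send no user agent."""
    hits = Counter()
    no_agent = set()
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # skip lines that don't match the expected format
        ip = m.group("ip")
        hits[ip] += 1
        if m.group("agent") in ("", "-"):
            no_agent.add(ip)
    return hits, no_agent

# Two fabricated example lines — the first has no user agent:
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "-"',
    '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
hits, no_agent = count_hits(sample)
# hits.most_common() lists the busiest IPs first — the same
# descending sort described above.
```

IPs that appear both near the top of `hits.most_common()` and in `no_agent` are your prime suspects.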

Use any of the IP address lookup websites out there to identify which country an IP address is coming from. If it’s from one of the “usual suspects” and has a significantly greater number of hits than all other users, go ahead and block it from recording any data in Analytics (by setting up an IP filter) and plug that IP into your server’s IP Deny Manager so it can bother you no more.
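On Apache hosts, the IP Deny Manager (commonly found in cPanel) ultimately writes rules like the following to your site’s .htaccess file. This is a sketch assuming Apache 2.4 syntax; the address shown is a placeholder, so substitute the IP you identified in your logs:

```apache
# .htaccess — allow everyone except the offending IP:
<RequireAll>
    Require all granted
    Require not ip 203.0.113.5
</RequireAll>
```

Blocked visitors receive a 403 Forbidden response, so they stop consuming bandwidth and polluting your analytics.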

Conclusion

Many CMSs now have plugins that can block malicious bots and IP addresses. For example, if you’re using WordPress, you can take advantage of the Wordfence plugin to block IP addresses from accessing your site without having to deal with complicated server tools and log files.

If you’re running into trouble with communication between your site and Google or if you have malicious bots harassing you constantly, don’t hesitate to contact the experts here at Wakefly.