Using Robots.txt with Addon Domains
by Ethan Glover, Sun, Apr 05, 2015 - (Edited) Mon, Dec 25, 2017
Do you host many WordPress websites on a single account? Your robots.txt file on your primary domain applies to all your websites.
This makes sense from a technical point of view. When a spider crawls your site, it looks for yoursite.com/robots.txt. But because addon domains live inside your primary domain's folder, things can get complicated.
For instance, if you have yourprimary.com/robots.txt and youraddon.com/robots.txt with WordPress, guess what happens? If you go to youraddon.com/robots.txt, you'll see the data in the file at yourprimary.com/robots.txt.
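If you want to confirm this on your own account, a quick check is to download both files and compare them. Here is a minimal sketch in Python, assuming the placeholder domains from above and plain HTTP; the function names are my own for illustration, not part of any tool.

```python
import urllib.request


def fetch_robots(domain: str, timeout: float = 10.0) -> bytes:
    """Download /robots.txt for a domain and return the raw response body."""
    url = f"http://{domain}/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def same_robots(body_a: bytes, body_b: bytes) -> bool:
    """Return True when two robots.txt bodies match.

    Line endings and leading/trailing whitespace are ignored so that
    a CRLF vs LF difference does not count as a different file.
    """
    def normalize(body: bytes) -> bytes:
        return body.replace(b"\r\n", b"\n").strip()

    return normalize(body_a) == normalize(body_b)


def compare_domains(primary: str, addon: str) -> bool:
    """Fetch robots.txt from both domains and report whether they match."""
    return same_robots(fetch_robots(primary), fetch_robots(addon))
```

Calling compare_domains("yourprimary.com", "youraddon.com") with your real domains should return True if the addon is serving the primary domain's file.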
So what happens if you want to have different rules for different sites? That's exactly what I wanted to do. My primary domain is libertyresourcedirectory.com (LRD), and it's not exactly SEO friendly. So, I wanted to block spiders from crawling it.
My first thought was to use a simple meta-robots tag in the head section of the HTML. Yet, search engines give priority to the robots.txt file. And because I want the file for ethanglover.biz, I can't delete the one for LRD. (It's the same file!)
So, here's the solution. If you look at this site's robots file, you'll see that I've disallowed a few folders. Those folders don't exist on this site. Rather, they are folders that I use for LRD.
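The reason this works is that robots.txt rules match on the URL path, not the domain. A shared file can therefore list folders that only exist on one site and have no practical effect on the others. Here is a rough sketch using Python's standard urllib.robotparser module. The folder names /directory/ and /listings/ are hypothetical placeholders, not the actual folders I disallow, and the helper names are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical shared robots.txt. The folder names are placeholders
# standing in for folders that exist only on the primary domain.
SHARED_ROBOTS = """\
User-agent: *
Disallow: /directory/
Disallow: /listings/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, agent: str = "*") -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    parser = build_parser(SHARED_ROBOTS)
    # A path under a disallowed folder on the primary domain.
    print(is_allowed(parser, "http://yourprimary.com/directory/page"))
    # An ordinary page on the addon domain, where those folders don't exist.
    print(is_allowed(parser, "http://youraddon.com/blog/post"))
```

The same rules are checked for both URLs, which mirrors the situation where both domains serve one file. Only paths under the listed folders are refused.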
If you look at the robots file for any of my sites, you'll see the exact same layout. Yet the robots files for those sites don't exist. Interestingly enough, this only extends to sites running WordPress. The phenomenon doesn't reach this MediaWiki site, which does use its own robots file.
I'll keep investigating to find out why this happens. But for now, if you're having this issue, this workaround does the job. I've run tests with Screaming Frog and confirmed that it will not crawl LRD, but it will crawl this site without issue.
UPDATE: I have since moved away from WordPress, and this is no longer an issue for this site. Feel free to use these strategies if you still have the issue.