Robots.txt – the world’s view of you in just ten characters
What is a robots.txt file?
Classy hotels always have a concierge who performs multiple functions (see what Rory Sutherland has to say about them in his book Alchemy). The concierge is there to welcome everyone and help with their immediate requests but also there to keep out the unwanted. In short, concierges are a signal.
Robots.txt files are like a digital concierge for search engines. They tell search engines which parts of your site to crawl (and so, in practice, which pages can rank) and where to find the sitemap.
Robots.txt is a small text file that should sit at the root of your website, at an address like this:
Take a look at it. Yes – mine is really that simple – it only does two things:
- It tells search engines not to crawl the admin login area (the WordPress default), so the login page doesn’t turn up in search results for would-be hackers; and
- It says to search engines, "please index everything on this site".
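The example below is a minimal sketch of a file along those lines. It is not a copy of any real site’s file. The `/wp-admin/` rule follows the WordPress convention. The domain and the `Sitemap` URL are hypothetical placeholders, and the sitemap line reflects the concierge role described earlier. The file is checked with Python’s standard `urllib.robotparser`, which reads rules the way a well-behaved crawler would.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the WordPress admin area, allow everything else,
# and point crawlers at a sitemap (example.com is a placeholder domain).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Anything outside /wp-admin/ may be crawled...
print(parser.can_fetch("Googlebot", "https://www.example.com/insights/some-article/"))  # True
# ...while the admin area is off limits to well-behaved crawlers.
print(parser.can_fetch("Googlebot", "https://www.example.com/wp-admin/"))  # False
# The Sitemap line is reported separately (Python 3.8+).
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```

The file itself is public. A `Disallow` line keeps polite crawlers out, but it does not hide the path from people who read the file.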
That is part of why our site ranks highly for some pretty juicy terms, such as The Digital 100, even against Forbes.
Let me guess, not every top 50 law firm has a robots.txt file?
Well, I was looking at the top law firms’ websites this week to see if they all had robots.txt files. And the answer is: No. They do not.
Is that a problem?
Some of the largest firms on the list were the biggest offenders. Yes, search engines will still have crawled their sites, but they arrive expecting a concierge, find none, and mill around the lobby looking for a map of the site instead.
It’s not the end of the world, but it makes life harder for your main source of new traffic and readers, which is an odd choice.
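Below is a hedged sketch of what a missing file means in practice, using Python’s `urllib.robotparser`. An empty body stands in for the missing file. The library’s own `read()` method treats a 4xx response other than 401/403 as “allow everything”. The helper name `crawl_view` and the URLs are hypothetical.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def crawl_view(robots_lines: list[str], url: str) -> tuple[bool, Optional[list[str]]]:
    """Hypothetical helper: (may `url` be crawled?, sitemap URLs or None) for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch("*", url), parser.site_maps()


url = "https://www.example.com/insights/article/"

# No file: crawling is unrestricted, but there is no pointer to a sitemap.
print(crawl_view([], url))  # (True, None)

# A minimal file: same crawling freedom, plus a map of the site.
print(crawl_view(
    ["User-agent: *", "Allow: /", "Sitemap: https://www.example.com/sitemap.xml"],
    url,
))  # (True, ['https://www.example.com/sitemap.xml'])
```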
How many law firms lack the robots.txt file?
Almost a quarter of the top 50 firms (12) don’t have a robots.txt file at all.
Twenty firms have them just right.
Which leaves us with 18 who have gone a little off-piste.
Is this going somewhere?
I looked into this latter group and that’s where this gets interesting.
CMS.law has to hide all of its blog posts from search engines, because otherwise Google would penalise the content in its rankings as duplicate content. CMS is the last firm still running a separate knowledge site, law-now.com, alongside its main site. Pinsents, by contrast, folded outlaw.com into its main site in the middle of last year and saw a massive boost in rankings as a result. CMS could solve this with a canonical tag (rel="canonical") instead, but seems to have opted for this blanket approach (possibly both). Running two separate sites also feels sub-optimal for the user journey, because in our sector content attracts and people convert.
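Here is a hedged sketch of the canonical alternative. Instead of blocking a whole section in robots.txt, each duplicate page carries a `<link rel="canonical">` element in its `<head>` that names the preferred URL. Search engines then generally consolidate ranking signals on that URL; Google treats the tag as a strong hint rather than a directive. The helper name and both URLs are hypothetical placeholders. Which copy should be the preferred one is a business decision.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Hypothetical helper: the <link rel="canonical"> element for a duplicate page's <head>.

    The copy you do NOT want to rank carries this tag, pointing at the copy you do.
    """
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# Hypothetical URLs: one article published on both a main site and a knowledge site.
main_copy = "https://main-site.example/insights/article-x/"
knowledge_copy = "https://knowledge-site.example/article-x/"

# This element goes in the <head> of knowledge_copy.
print(canonical_link(main_copy))
# <link rel="canonical" href="https://main-site.example/insights/article-x/">
```

The two approaches do not combine well. A crawler that robots.txt keeps away from a page never sees that page’s canonical tag.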
Another massive firm has literally listed individual people in its robots.txt file. Since they all now work at other firms, one can only assume this is a “bad leavers” list. Ouch. Why would you do this, given that anyone (journalists included) can read that file and make a few calls? Their profile pages no longer exist, so those entries can be removed, and they now rank on Google at their new firms anyway (I checked). They’ve moved on, so the firm needs to as well.
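Robots.txt has to be served at a public, predictable address, so reading its contents takes very little effort. The sketch below pulls out every disallowed path. In practice you would first download the file, for example with `urllib.request`; here the body is an inline sample. The helper name, the `/people/` paths and the comment line are all invented for illustration.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Hypothetical helper: every non-empty Disallow path in a robots.txt body, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file: these paths are made up and refer to no real people.
SAMPLE = """\
User-agent: *
Disallow: /people/former-partner-a/
Disallow: /people/former-partner-b/  # example comment
Disallow:
"""

print(disallowed_paths(SAMPLE))
# ['/people/former-partner-a/', '/people/former-partner-b/']
```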
Then, a third major firm has decided to hide just one document. Weird. I had to take a look at it. It’s just a series of deal tombstones. Perhaps a client complained about being on there, or a partner wanted it hidden. But it’s a bit odd to hide it in plain sight like this.
Next, Irwin Mitchell hides the pages of its more junior lawyers from Google searches. It’s quite a dated approach, and it assumes headhunters have never heard of LinkedIn (never mind LinkedIn Sales Navigator). It also assumes that clients come through the front door (the home page) when looking for your people. They don’t. They google a name and click on one of the top one or two links.
And then, wowee, it’s my favourite – this is basket case territory.
One firm has literally the longest list of departing partners, people and practices in its robots.txt file, along with every page that relates to them. The problem is that at some point it lumped in one practice area, presumably one it had had problems with previously, and it has left it in there.
So, this practice area has over 400 blog posts, all of which are hidden from Google.
To be clear, you or I can still find them by navigating through the site (a route that accounts for roughly 16% of all blog and insight reads). They can be linked to from emails (which is great for existing clients), and they may even be spotted on social media. But the 70%+ of traffic that should reach those pages through search engines will never arrive, because the firm has told search engines not to crawl them.
This team’s latest blog post may not be the finest piece of legal marketing ever crafted (too many passive constructions, and pitched at the wrong level), but it was timely and should be allowed to work as a lead magnet. Instead, it will never see the light of day, because the firm is blocking it from search engines. I have a screenshot of all the other firms that rank on Google for the same topic with articles released on the same day.
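Here is how a single line can hide hundreds of posts. `Disallow` rules match by path prefix, so one rule for a practice area’s folder covers every URL beneath it. The sketch uses Python’s `urllib.robotparser`. The folder name, the domain and the 400 generated post URLs are hypothetical stand-ins, not the firm’s actual paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: one line covering a whole practice area (path invented).
rules = [
    "User-agent: *",
    "Disallow: /practice-area-x/",
]
parser = RobotFileParser()
parser.parse(rules)

# 400 hypothetical blog-post URLs under that practice area.
posts = [f"https://www.example.com/practice-area-x/blog/post-{n}/" for n in range(1, 401)]
blocked = [url for url in posts if not parser.can_fetch("Googlebot", url)]
print(len(blocked))  # 400: the single prefix rule catches every post

# Pages in other practice areas are unaffected.
print(parser.can_fetch("Googlebot", "https://www.example.com/practice-area-y/blog/post-1/"))  # True
```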
Do you need help with the big strategy and the tactics to improve your digital marketing?
If you’d like to hear more about how to avoid these basic errors, or would like us to audit your digital practices and channels, please just let us know.
I’ll also be sharing more results (like the one below) from The Digital 100 at a talk next week. See you there?