Episode 39: AI bias in guidelines and should you disavow?

What's in this episode?

In this episode you'll hear Mark Williams-Cook talk about the Google Quality Rater Guidelines update: how is Google combating bias in search? Mark also discusses link disavowing (should we be disavowing links when there is no penalty?), new Google features, and free learning resources.

Show note links

Google Quality Rater guidelines: https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf

Episode 16: All about penalties: https://withcandour.co.uk/blog/episode-16-all-about-google-penalties

Package tracking: https://webmasters.googleblog.com/2019/12/package-tracking-early-access-program.html

Free two-day Python class, with lots of free videos and example code: https://developers.google.com/edu/python

Podcast Transcription

MC: Welcome to Episode 39 of the Search with Candour podcast! Recorded on Sunday the 8th of December 2019, I'm Mark Williams-Cook, and in this episode we're going to be talking about: the Google Quality Rater Guidelines update and how it relates to documents leaked from Google earlier this year about bias in search; link disavowing - I've got something I want to say about what we're currently hearing in the community about how link disavow should be used versus what Google is telling us; a small update adding yet another instant-answer, zero-click feature to Google search; and, lastly, some free learning resources you can find.

The Google Search Quality Evaluator (or Rater) Guidelines were updated on December 5th, which was a Thursday. They're closely watched, obviously, by a lot of people looking for any steer on what Google is changing or re-evaluating in how it assesses things. It's not a huge number of changes this time round, but there were a few interesting things in there that I think are worth discussing.

Before I ever speak about the Quality Rater Guidelines, I feel like I need to preface things by saying two things. First, if you haven't read through them before, they are a fascinating read - well over a hundred pages provided to the people who manually rate sites according to these guidelines. Second, it's important to state, if you haven't encountered them before, that these manual reviews - and the last report I heard put the number of people working remotely, rating search results and pages, in excess of 10,000 - are not directly impacting search results. It doesn't mean that somebody is going to rate your website according to these guidelines and then your website's ranking is going to change.

The purpose of these raters is for Google to assess the output of its algorithm - to ensure that what the algorithm judges as good, bad, spam, or brilliant content matches up with what a human reviewer decides, based on the set of rules they're given. That's an important distinction. Additionally, it's worth saying that although Google provides these rater guidelines as a set of techniques for assessing sites, these are not necessarily the same techniques or signals Google itself is looking at.

So, to give a rubbish analogy, because it's the only one I can think of on the spot: imagine Google's job was to identify animals, and it was doing this entirely by looking at DNA, the genetic material of animals. It gets some scraps of DNA, looks at them, and tries to figure out what animal each one came from. It's building an algorithm and a database to do this, and it wants to verify how accurate that algorithm is - so, to do that, it's going to get humans to identify the animals as well and see if the predictions match up.

Now, the difficulty with humans is that any individual person might not know the difference between a manatee and a dugong, for instance, as they're similar animals. So Google is going to provide them with a set of guidelines with which to classify these animals. If they're looking at ducks and want to identify a mallard, the guidelines will ask: does it have a greenish-coloured head? Does it have webbed feet? And so on - and from this they should be able to identify which animal, which species, they're looking at. Now, if you were to look at those guidelines, you might think, 'Ah, look, Google is working out what the animals are by seeing if they have two legs, webbed feet and a greenish head,' when actually that's completely false, because we've said that Google is really identifying the animals just by looking at their DNA, their genetic material - the guidelines are just a way to verify the results.

So that's how I personally think about these Quality Rater Guidelines: they're really helpful to let me know the direction and the things Google is thinking about and taking into consideration, but you've got to take on board that they're not precisely and comprehensively all the things Google is looking at. Just because you take your chicken and paint its head green doesn't mean you've got a mallard. I hope that's cleared things up and not confused people any further.
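As an aside, if you want to picture how that kind of verification works in practice, here's a minimal sketch of measuring agreement between an algorithm's quality predictions and human rater labels. To be clear, this is purely illustrative: the labels and workflow are assumptions, and Google's internal tooling isn't public.

```python
# Illustrative sketch only: comparing an algorithm's quality ratings
# against human rater labels to measure agreement. The labels and
# workflow are assumptions; Google's internal tooling is not public.

def agreement_rate(algorithm_labels, rater_labels):
    """Fraction of pages where the algorithm and the human rater agree."""
    matches = sum(
        1 for algo, human in zip(algorithm_labels, rater_labels)
        if algo == human
    )
    return matches / len(algorithm_labels)

# Quality labels for the same five pages: one set predicted by the
# ranking algorithm, one set assigned by raters following guidelines.
algorithm = ["high", "low", "spam", "high", "medium"]
raters    = ["high", "low", "high", "high", "medium"]

print(f"Agreement: {agreement_rate(algorithm, raters):.0%}")  # Agreement: 80%
```

If agreement drops after an algorithm change, that's a signal the change has drifted away from what human reviewers consider good results - which is exactly the role the raters play.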

So, the actual updates in the Search Quality Rater Guidelines: they've added six new paragraphs at the beginning - an introduction to the search experience - and I just want to read out two of those paragraphs for you, because I think they're the most reflective of the changes that have been made.

So the fourth and fifth paragraphs of the six that are new in the introduction say: 'Different types of searches need very different types of search results. Medical search results should be high quality, authoritative and trustworthy. Search results for cute baby animal pictures should be adorable. Search results for a specific website or web page should have that desired result at the top. Searches that have many possible meanings or involve many perspectives need a diverse set of results that reflect the natural diversity of meanings and points of view. People all over the world use search engines; therefore, diversity in search results is essential to satisfy the diversity of people who use search. For example, searches about groups of people should return helpful results that represent a diversity of demographic backgrounds and cultures.'

So that's two paragraphs from a whole new introduction, and then they've made several updates to existing sections, adding some clarifications. One of the sections is called 'Raters must represent people in their rating locale', and they've added a paragraph saying: 'Unless your rating task indicates otherwise, your ratings should be based on the instructions and examples given in these guidelines. Ratings should not be based on your personal opinions, preferences, religious beliefs, or political views. Always use your best judgment and represent the cultural standards and norms of your rating locale.' And lastly, there's another section, quite deep in the document, that adds an extra sentence: 'Keep in mind that users are from all over the world: people of all ages, genders, races, religions, political affiliations, etc.' In fact, 'political affiliations' is specifically added a few more times throughout the document.

So it's quite clear from this particular update that Google has amended the document to try to bring the bias people may have around their own preferences, religious beliefs, and political views to the forefront of their minds, so it becomes less of an unconscious bias and something raters consciously think through and around. Why this particularly interested me: back in August this year, 2019, there was quite a big story about an ex-Google engineer leaking a thousand or so pages of internal Google documents. The whole thing came about as part of Project Veritas, and the general consensus was that the documents looked authentic. There were all sorts of claims made there which I won't go into now, but I did look through a lot of these documents, and one thing that struck me was an internal presentation from Google which talked about being aware that some of the AI systems, the machine-learned systems they had been building, did have bias built in - whether that be around gender, race, religion, or politics. Boiled down for the layman, which I certainly am, the point to take away from that presentation was that the majority of the data being fed into these machine learning systems was obviously created by people, by humans, and therefore very likely contained bias. That bias is then translated, scored, and weighted into the resulting algorithms, which makes for a nasty cycle. Part of the Project Veritas angle was 'oh look, Google's manipulating search results in certain ways', and I think part of what Google was actually doing was combating this cycle: if you train a system on a data set that has bias, the resulting algorithm will contain and promote that bias, which will then - I don't know if 'infect' is the right word - but it spreads that bias, essentially. And I've noticed this in a few different types of search results, and I haven't seen anyone else mention it: lots of search results, as we know, will now trigger these 'People also ask' boxes, which let you explore a topic more deeply, and I've noticed there are specific questions within Google - I assume they've been targeted manually - that will not trigger these types of results.

So for instance, if you do a search for something like 'is the world flat?' - the last time I did this search, you just get a Google answer box at the top basically saying 'no, the world is an imperfect sphere', and that's presented as fact. There's no breadcrumb trail for you to follow that might start to spread what I'd certainly deem to be misinformation: that the world is flat.

So I think it's interesting that, firstly, they've come out and said internally, 'look, there is an issue with the algorithm as we're training it, promoting and spreading bias', and that one of the steps we can see publicly is in the Quality Rater Guidelines - and I've no doubt this is a multi-faceted approach, as we see with things like the lack of 'People also ask' results for 'is the world flat?'. The practical takeaway for us as content writers, webmasters, companies, entities publishing on the web is that we also need to work harder to ensure this diversity of intent and representation is at the core of everything we do, including the websites we build and the content we make. And I think it's good that if the algorithm starts to represent that, it will give companies - for-profit companies working in the capitalist system - a commercial motivation to do it, because, as we know, sometimes even when it's the right thing to do, unless there's a commercial motivation, companies won't do things. So I thought that was a really interesting update to the QRGs; I think it's a really positive thing and something we should reflect on as web publishers.

Link disavowing is something I want to talk about briefly as well - something I dare to talk about, because it's a really interesting and kind of hot topic. In October 2012, Google first released their link disavow tool, and Matt Cutts introduced it with a little ten-minute video explaining what it was for and how it should be used. We've had this tool now for seven years or so, and over this time we've certainly seen how people consider, approach, and use it change. I've certainly seen a lot of people within the SEO community over the last 12 months be very vocal about quite aggressive use, if you like, of the link disavow tool, and quite vocal about improvements they've seen in rankings because of links they've disavowed - or apparently because of links they've disavowed.

So if we rewind a little bit: when this tool was launched and Matt Cutts introduced it to the SEO community, he introduced it as something to clean up backlinks. The examples he used in 2012 were: if you had been paying for links; if you had been doing blog spam, comment spam, forum spam, or guestbook spam; or if you had paid someone to write low-quality articles with keyword anchor text embedded - and then maybe you'd had a message from Google saying they'd found unnatural links. That's when the disavow tool could be useful for you. And at the time the tool was launched, we were very clearly, in no uncertain terms, told that the first step in removing these links was to email the sites that hosted them, multiple times, to request removal. Only when you got to the stage where you couldn't remove the remaining links - people weren't responding or, as quite amusingly happened, people started charging to remove links they'd probably sometimes been paid to put there in the first place - was when you could use the disavow tool.

In the subsequent years, I've certainly seen people say, 'well, there's no point actually emailing sites; I just bunged them all in the disavow tool and everything was fine after the manual reconsideration request', and I think there's quite a pertinent point here: when the usage of this tool was introduced, they specifically mentioned getting a message from Google saying they had found unnatural links - which, as we know, is a penalty. We spoke in Episode 16 - I'll link to it in the show notes - all about Google penalties and the different types of manual penalty you can get, and unnatural links is absolutely one of those. So if you do get a penalty for paid links, spammy links, whatever link scheme Google's decided you've taken part in, you will get a message within your Google Search Console telling you as much, and then absolutely - that's the time, in my opinion, no doubt about it - you should be using the disavow tool to clean those up.
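For reference, the disavow file you upload through Search Console is just a plain text file: one URL or domain per line, a `domain:` prefix to disavow an entire site, and `#` for comments. The domains below are placeholders, but the format itself is as Google documents it:

```text
# Sites that ignored repeated link removal requests
domain:spammy-directory.example.com
domain:paid-links.example.net

# Individual pages we could not get taken down
http://blog.example.org/some-comment-spam-page.html
```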

Where we are now, at the end of 2019, is that it's quite common to hear people saying, 'oh, we should just use the link disavow tool to clean up what we consider to be low-quality links and that will help', regardless of whether or not you have a penalty. There are two very different schools of thought about how Google is handling these links, and I want to dare to give you my opinion on this.

So the reason I brought this up in this episode is that earlier this month, John Mueller replied on Twitter to someone asking whether they should be disavowing spammy domains they've got links from when they don't have a penalty. John Mueller replied saying, 'we already ignore links from sites where there are unlikely to be natural links, so there's no need to disavow', and he repeated there something quite specific that Google has mentioned time and time again: if they find sites where there are unnatural links, they ignore those links; they do not apply a negative modifier. So it's not that those poor links are damaging your ranking - they're just not being counted. That is what Google is saying. There are definitely two schools of thought on this, the other being people asking, 'well, how does negative SEO work then?', because there are several quite convincing cases where people have used link spam to negative-SEO a competitor - they've delivered them thousands of spammy, low-quality, paid-for links and then seen that competitor's rankings be negatively affected - which would appear to be at odds with what Google is saying, which is 'we ignore these links'.

So, if you believe Google ignores links that it identifies as bad links, then on the face of it there would be no reason at all to disavow links unless you had a penalty. If you did not have a penalty, the only result of disavowing links could be negative. Let's think through the logic: say we have a thousand links, of which a hundred are genuinely bad, and Google can identify 50 of those, so it's ignoring those 50. The other 50, which we know are bad, Google for whatever reason hasn't algorithmically worked out are bad, so it's counting them, or giving us some shred of value for them. Therefore, if we disavowed all 100 bad links, we would be disavowing the 50 bad links Google knows about - which changes nothing, as they currently have no value - but we would also potentially be telling Google that the 50 links it is giving us credit for are bad links too, so we would lose the benefit of those. Overall, theoretically, you could actually damage your ranking. However, that's at odds with what we've seen some people report, saying, 'okay, well, we got rid of these thousand, two thousand bad links and we saw an improvement in rankings'.
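To put rough numbers on that logic, here's the same scenario as a toy calculation. The figures come straight from the example above, but the idea that every counted link passes equal value is a simplification for illustration, not how Google actually scores anything:

```python
# Toy model of the disavow trade-off described above. The notion of
# "counted" links passing equal value is a simplification for
# illustration; Google's real weighting is unknown.

total_links = 1000
known_bad = 100            # links we know are bad
detected_by_google = 50    # bad links Google has already ignored
undetected = known_bad - detected_by_google  # still passing some value

# Before disavowing: Google counts everything it hasn't flagged.
counted_before = total_links - detected_by_google   # 950

# After disavowing all 100 known-bad links: the 50 undetected ones
# stop passing value too, because we have told Google to ignore them.
counted_after = total_links - known_bad             # 900

print(f"Links counted before disavow: {counted_before}")
print(f"Links counted after disavow:  {counted_after}")
print(f"Value-passing links given up: {counted_before - counted_after}")  # 50
```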

Now, I don't actually think these two points of view, which seem to be contradictory, actually are - so bear with me here. I can't give you the exact date, but something interesting I saw Google say earlier this year was that they use your overall link profile, including what you have done with your link disavowing, to make probability judgment calls on links they consider to be on the fence - i.e. they can't really tell if a link is natural or not. When you look quantitatively at the links lots of sites get, the easy-to-identify editorial links from trusted, higher-quality sites are actually the minority, and there's a bulk in the middle that could be natural links or could be unnatural links. This is why we see that PBNs - private blog networks - are still hugely effective within SEO: those links are very hard for search engine algorithms to detect and flag as unnatural, so they still work.

Now, if we've got this sort of bulge of links in the middle that Google is a bit on the fence about, and someone built another five thousand or a million bad links to our site, while those individual links are not negatively affecting us, what it might do is let Google say, 'okay, I can identify that ten percent of the links in this whole profile are definitely unnatural, paid-for spam links, and therefore I am going to raise the probability that the links I am on the fence about are unnatural as well, and discount them. They're not individually going to have a negative impact on the ranking, but I'm now going to choose to ignore them' - so where they were previously giving you benefit, they now will not. The same could apply if you have a reasonable proportion of spammy, bad, poor links in your link profile and you disavow them: potentially, it could change how Google views those links in the middle. Google may have decided to ignore a couple of percent of your link profile, and it's also ignoring a small chunk at the lower end that it's on the fence about. If you fess up and say, 'yeah, look, these actually are unnatural links, don't worry about counting them', it can then say, 'okay, that changes the probability on these on-the-fence links, because there are fewer links I'm independently having to decide are unnatural and ignore myself'.
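One way to picture that profile-level effect is a sketch like the one below, where the share of confirmed-bad links shifts how sceptically an 'on the fence' link is treated. The threshold formula and the numbers are invented purely to illustrate the idea; nothing about Google's actual model is public:

```python
# Hypothetical sketch of profile-level probability adjustment: the
# more of a link profile that is confirmed unnatural, the more
# sceptically uncertain links are treated. The threshold formula is
# invented for illustration; it is not Google's model.

def counts_as_natural(link_spam_score, known_bad_fraction, base_threshold=0.5):
    """Decide whether an uncertain link still passes value.

    link_spam_score:    0.0 (clearly natural) to 1.0 (clearly spam)
    known_bad_fraction: share of the whole profile confirmed unnatural
    """
    # A dirtier overall profile lowers the tolerance for uncertain links.
    threshold = base_threshold - known_bad_fraction
    return link_spam_score < threshold

on_the_fence = 0.4  # a link Google can't quite call either way

print(counts_as_natural(on_the_fence, known_bad_fraction=0.02))  # True
print(counts_as_natural(on_the_fence, known_bad_fraction=0.15))  # False
```

The same on-the-fence link passes value in a mostly clean profile and gets discounted in a dirtier one, which is the behaviour described above.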

So I've changed my view a little bit on that. Until recently, I've always been in the camp that the disavow tool basically shouldn't be touched unless you've got a specific manual penalty. I now think that in some cases it is worth looking, even when you don't have a penalty, at your total link profile and the quality of the links across the board, and trying to work out what your risk profile might be. I've always said that if you do have, you know, paid links - bad links that just haven't been found, or at least that you haven't had a penalty for yet - you should disavow those. But if you have been in my camp of being quite strict with the use of the disavow tool, it might be worth reconsidering, based on the trail of breadcrumbs Google has given us about how they may actually be using it.

The last thing I'll add on the end of this - something I didn't know until a few months ago, and which certainly wasn't announced when they released the disavow tool - is that if you remove links from your disavow list, they are basically counted again. So links are not permanently disavowed once you've submitted them. That does feel like a change, because when the disavow tool was launched, they were very specific about saying it should be used with caution because it can damage your rankings, and that, to me, felt like they were saying disavowing links was a permanent action - because if it weren't, and you did damage your rankings, you could just submit a blank or one-line file and reverse it. So have a think about that: knowing that you can reverse it, if you're in a hole, it may be worth looking at how disavowing some proportion of your profile might impact you in the longer term.

Lastly, on the tail end of this show, I just wanted to pick up on a Google Webmaster Central blog post I saw - I'll link to it again in the show notes, which are at search.withcandour.co.uk, where you'll find all of our podcasts, the latest one first. Google has released yet another zero-click feature in search results, and this time it's to do with package tracking. On Thursday the 5th of December, they posted on their blog: 'People frequently come to Google Search looking to find information on the status of their packages. To make it easier to find, we've created a new package tracking feature that enables shipping companies to show people the status of their packages right on Search.' They then give an example of how it appears within the search results: you search for the delivery company name plus 'package tracking', you can enter the tracking code directly into the SERP, and it will bring up a nice result showing you the expected delivery date and the stage it's at in transit - much like you'd normally get by going to those sites themselves. This is available through Google's early access program, and obviously the delivery companies have to make some effort to integrate with it.

I find this interesting because the blog post talks about it as something people can find directly through the search results, but for me, again, that's absolutely not the primary use for Google of this meshing of data. This is absolutely playing into their Google Home, personal assistant presence: you'll just be able to say, 'hey Google, where's my DPD package?', and it'll ask for the tracking number and you'll read it out - or, if you've looked it up previously, I'm assuming it'll store it - and it will just say, 'oh, it's arriving on the 29th of December', or whatever.

So it's just another part of this rising tide of zero-click, instant-answer search results. Hopefully it doesn't come as a surprise to you anymore that these things are taking over as Google becomes an answers engine, but I just wanted to bring it up in case you haven't heard of it, because there are so many of them appearing now that they do slip through the net. I've seen people talking about things like flight booking in the SERPs recently as if it's new, and Google is tackling so many of these different operations now that it can be hard to keep track. So: package tracking is coming to search results near you.

Lastly, for those who maybe haven't programmed before, I wanted to mention right at the end of the show that there is actually a free two-day Python class from Google, with loads of free videos and example code. I found it on Twitter through Dawn Anderson, who's a great SEO - if you don't follow her, I'd recommend doing so - and I'll put the link in the show notes again, but it's at developers.google.com/edu/python. If you haven't programmed before, or haven't programmed in Python, it can be really helpful if you're working in SEO, especially technical SEO, and I've released a few tools in Python myself.

Basically, it's really useful because I'm kind of lazy and I hate doing repetitive tasks. Some of the best advice I ever got about using a computer was: 'if you're using a computer to do a repetitive task, you're using the computer wrong'. Python is pretty easy to pick up, and there's a whole bunch of - I wouldn't go as far as to call it data analysis - really just manual legwork that you sometimes have to do once you've got log files or data sets from Screaming Frog or similar, to get the output you want. If you put the time in to learn a little bit of Python, you can save yourself a lot of time and obviously improve results for yourself, your clients, your company, whoever it is you're working for.
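As a flavour of the kind of legwork Python takes care of, here's a short sketch that pulls every non-200 URL out of a crawl export. The file name and the 'Address' and 'Status Code' column headers are assumptions based on a typical Screaming Frog internal export, so check them against your own data:

```python
# Minimal sketch: list every non-200 URL from a crawl export CSV.
# "internal_all.csv", "Address" and "Status Code" are assumed names
# from a typical Screaming Frog export; adjust to your own file.
import csv

with open("internal_all.csv", newline="", encoding="utf-8") as f:
    problems = [
        (row["Address"], row["Status Code"])
        for row in csv.DictReader(f)
        if row["Status Code"] != "200"
    ]

for url, status in problems:
    print(f"{status}\t{url}")

print(f"\n{len(problems)} URLs need attention")
```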

So, as I say, the links as always are in the show notes, which are at search.withcandour.co.uk. I'll be back, maybe with Rob, next week on the 16th, and we're going to have one more episode, I hope, on Monday the 23rd - I hadn't quite decided if that's going ahead, but I'm going to commit to it now; I've said it, so it will go ahead - and then we'll have a break for a couple of weeks over the Christmas period. I hope you all have a brilliant week, and do remember to subscribe if you're enjoying Search with Candour!
