In this episode, you will hear Mark Williams-Cook talking about:
- The big Google glitch: another problem at Google this year led many SEOs to believe there had been one of the biggest algorithm changes in the last few years.
- GMB updates: new ways to update Google My Business information, and new insights are coming.
- GPT-3 thoughts: a brief look at GPT-3 and some of its content generation uses.
- Listener Q&A: your SEO questions, answered!
Sitebulb sponsored 60 day trial https://www.sitebulb.com/swc
Patrick Hathaway from Sitebulb interview https://withcandour.co.uk/blog/episode-68-seo-site-audits-and-sitebulb-with-patrick-hathaway
Google My Business revamp announcement https://www.blog.google/outreach-initiatives/small-business/business-profile-maps-and-search/
GPT-3 Turing test https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html
GPT-3 Dice game generation https://twitter.com/sharifshameem/status/1284807152603820032
GPT-3 Answers engine https://twitter.com/paraschopra/status/1284801028676653060
GPT-3 Paper including bias https://arxiv.org/pdf/2005.14165.pdf
Google guidelines on automatically generated content https://support.google.com/webmasters/answer/2721306
MC: Welcome to episode 74 of the Search with Candour podcast, recorded on Friday the 14th of August 2020. My name is Mark Williams-Cook and this week, I'm going to be talking to you about the big Google glitch that happened, that absolutely devastated some people's rankings that we initially thought was maybe just some wild Google update. I'm going to talk about a small Google My Business update that's coming, both in how we can update our GMB data and the insights we're going to be receiving.
I want to talk a little bit about that big scary AI that is GPT-3, and how this might affect SEO, and we've also had some listener questions submitted that I'll do my best to answer. I'm also really proud to say this episode of Search with Candour is sponsored by Sitebulb. Hopefully it's something you've heard of, but if you haven't, Sitebulb is a desktop-based website auditing tool. We actually interviewed one of its founders, Patrick Hathaway, a few weeks ago - it was episode 68 of this podcast - so if you want a bit more detail about how Sitebulb came to be, and some behind-the-scenes information, that's a really good place to start. A cool thing for me is that it's really easy for me to tell you about Sitebulb because I use it a lot - we use it at Candour a lot - and there's a lot I like about it. Especially if you're learning to do SEO, or maybe if you're more senior and you're training people in SEO, it's one of my favourite tools to use, and that's because of the way it produces the audit report you get out of the other end. Obviously there are lots of different types of desktop and online auditing programs you can use, and some just give you raw data, which can be helpful, but there are a couple of things I particularly like about the Sitebulb reports.
Firstly, it doesn't just give you the data; it identifies the actual SEO issue in context, and the team there have gone to great lengths to write up a really detailed explanation of why each issue matters from an SEO perspective. So if you're learning SEO it's really great, because it will say - okay, we found this issue with hreflang - and if you don't know what that is, you can just click on details and you'll get a description of that issue.
So Sitebulb have given us an offer for listeners of Search with Candour. Normally you get a 14-day free trial with Sitebulb; they've extended that to a very generous 60 days. If you go to sitebulb.com/swc - S for Search, W for with and C for Candour, you see where we're going with that - you can download a 60-day trial. So there's nothing to lose: you don't need any credit card details, just go there, download it and have a play.
So let's start on some big, but short-lived, news that could have been massive news. On the 10th of August, four days ago, pretty much every channel that SEOs use to communicate - I saw it on Twitter, I saw it on forums, I saw it discussed on Slack - was talking about an apparent massive Google update, and not in a very positive way. Search quality took an absolute plummet. We had completely incorrect search results - and by incorrect I mean people searching for recipes with beans and getting completely the wrong thing, e-commerce sites vanishing, the SERPs just completely turned on their heads - and it was initially being reported as a possible Google algorithm update, and a very, very large one. But it soon became apparent, thankfully for many people, that that wasn't the case. John Mueller jumped in very early the next day, on August the 11th, and said, 'I don't have all the details yet but it seems like this was a glitch on our side and has been fixed in the meantime. If someone could fix the other 2020 issues, that would also be great.' And that was followed up later that same day with a tweet from Google Webmaster Central that said, 'On Monday we detected an issue with our indexing systems that affected Google search results. Once the issue was identified, it was promptly fixed by our site reliability engineers and by now it has been mitigated. Thank you for your patience.'
So to give you a little behind the scenes from our agency point of view, we had one client that was very obviously affected by this. They were based in the US, and we're talking about dozens and dozens of number one and number two positions suddenly not found in the top hundred any more. When this tweet came out saying the issue had by now been mitigated, our rank tracking was still showing they hadn't quite recovered - it does look like they've now made a full recovery, but it was certainly more than 24 hours in that case. So if somehow you didn't catch wind of this - maybe you've had some time off work and you've come back - you might have seen that massive swing; certainly on tools like SISTRIX and SEMrush there was a very visible dip in visibility for 24 hours, but it does look like it's fixed. It's really interesting for me because I think I'm losing track now; I think this is the second time this year, maybe the third, that Google has had some big, very public issue with their indexing.
So there were definitely two in 2019 that were very publicly noticed, talked about and then fixed, and I think this is the third time this year. Previously we've had trouble with Google indexing new pages, and I believe a few episodes ago I talked about Google discussing this from their point of view: their crawler was overloading their indexing capability and creating a big backlog, which is why webmasters and users were reporting - hey look, it's taking days for us to get new pages indexed when it's normally minutes or hours. So that's just interesting for me, from an outside observer's point of view, looking at Google with this incredibly complex system of lots of little algorithms working together and constantly being tweaked. Gary from Google actually followed up with some nice information about Caffeine, the indexing system, which I'll just read. He tweeted this in relation to the issue they had: 'The indexing system, Caffeine, does multiple things: one, ingests fetch logs; two, renders and converts fetched data; three, extracts links, meta and structured data; four, extracts and computes some signals; five, schedules new crawls; and six, builds the index that is pushed to serving. If something goes wrong with most of the things that it's supposed to do, that will show downstream in some way. If scheduling goes wrong, crawling may slow down. If rendering goes wrong, we may misunderstand the pages. If index building goes bad, ranking and serving may be affected. Don't oversimplify search, for it's not simple at all: thousands of interconnected systems working together to provide users with high quality and relevant results. Throw a grain of sand in the machinery and we have an outage like today.'
So it's just really interesting seeing this point of view, where Google is basically saying - yeah, look, things are super complicated, this may happen once in a while - and that's kind of scary, I guess, for website owners. When we spoke to our client about this, they were really on top of it and messaged us almost as soon as their rankings dropped, asking what was going on. We spoke to them and said, this is the situation with Google, it's a glitch - and what can we do? Well, very little actually; all the power is in Google's hands to sort this, mitigate it and get things back to normal. So I think that's particularly scary if you are reliant on organic traffic.
I caught a little bit of news about Google My Business yesterday on the official Google blog. There's a post I will link to in the show notes at search.withcandour.co.uk, called 'Update your Business Profile on Google Maps and Search', and I'll just read you an excerpt from this post to let you know verbatim what's happening. Google says, 'Today we're making it easier to update your Business Profile directly from Google products you already use. Now you can create posts, reply to reviews, add photos and update business information right from Google Search and Maps.' So that's pretty cool, because at the moment there's the Google My Business centre and there's the app, and they're both a little bit clunky to use, I feel, so this is saying that once we're logged in to Google we can just edit what we see directly. It continues: 'To start, make sure you're signed in with the Google account used to verify your business. On Google Maps, simply tap your profile picture in the top right corner of the mobile app and select your Business Profile to access these tools. On Google Search, you can look up your business name or search for "my business" to update your profile.' The 'my business' functionality is currently available in English and will expand to other languages over the coming months. And then this was quite interesting - they said, 'We're also rolling out more free tools on Google Maps and Search that will help you understand how your business is performing and how you can enhance your online presence. Business owners and managers will see a revamped performance page with new customer interaction insights. This page will provide refreshed metrics on a monthly basis and will evolve over the coming months to share more helpful data with business owners.'
All of these features will be available on an upgraded merchant interface that will offer helpful recommendations about how you can improve your Google presence, whether it's adding new information to your business profile, responding to recent customer reviews, or using Google Ads to help your business stand out.’ So I think there's potentially some really good insights they could start pulling into Google My Business.
So if you've ever done a search, say for a local gym, you'll have seen an estimation of how busy that location is. That's based on Google's tracking of phones - when people are going in and out - so they know roughly how busy it is every day, and things like how long people spend there. I'd be really interested if they could start piping some of that data into Google My Business and giving us that insight. At the moment we've got insights like how many times we're appearing in search, how many times people call us or ask for directions, that kind of thing, and obviously the key phrases that are showing our profile, but I think, especially for high street businesses, having that extra data integrated would be really helpful. But again, those changes are rolling out, and I'll update you once I know more about the new insights when Google announces them.
Over the last couple of weeks, I've been looking through some recent examples of GPT-3. If you haven't heard of GPT-3, or GPT-2, or GPT - GPT stands for 'generative pre-trained transformer', and GPT-3 is an autoregressive language model that uses deep learning to produce human-like text. If you haven't heard of GPT-3, that probably hasn't made it much clearer for you what it is. Essentially, it's an AI that predicts what text comes next. It's created by OpenAI, which Elon Musk helped found, and it's caught a lot of people's imagination recently.
So there was GPT-2, which came out before, and the main difference between GPT-2 and GPT-3 is essentially scale - the size of the model and the quantity of data they're using. GPT-3, and this whole GPT approach to natural language processing, is interesting because, using transformers - the T in GPT - what they're doing is processing massive amounts of text to learn the more general attributes of language. That is then used as a base, a starting point, for adding extra detail for specific language tasks, which they call fine-tuning. It drastically reduces the amount of labelled data required for specific natural language tasks. GPT-2 was put out there so you could download it and play with it, for example in Google Colab notebooks - anyone could use it, and it's still out there; you can generate some really interesting stuff. You can, for instance, ask it a question and give it a data set - some pages where, somewhere in that information, the answer lives - and part of the technology is that it will go there, work out what the answer is, and then use that to construct an answer for you.
You can also just use it to generate the next thing, if you like. So it doesn't have to take that approach of being told where the answer is; it can just try to answer, and there's been some impressive work where they've trained these models to do things like a general pub quiz to see how good their general knowledge is. With GPT-3, they haven't released the code as they did with GPT-2, so you can't just download it and fine-tune it yourself. Instead, they've released an API - an application programming interface - which is in closed beta at the moment. That means you can give it some inputs and get some output from it, but you can't see what's going on behind the scenes.
I was having a discussion with someone the other day, trying to explain the difference between providing the source and having this API access, and the best analogy my friend and I could come up with was this: if someone provides you with the source, it's like trying to cook when someone gives you the instructions, the recipe and all the ingredients; whereas with an API, you just pass the ingredients through the window - you can't see into the kitchen - they prepare the meal, and you just get what comes out the other side, without really knowing what process or what spices went into making it. That's the best analogy - maybe someone can give me a better one, we came up with it on the spot - but I liked it.
And the reason OpenAI have done this and only provided an API is that they have concerns: because the performance of GPT-3 is so impressive, they're worried about how it might be abused. They're not even quite sure how people could abuse it, but there are definitely some interesting thought experiments about things that could be done that might be harmful. I could, for instance, start generating pages of medical advice that sound very convincing to the layman. By providing an API, they have a way to monitor what people are using it for, and if they think someone's doing something they don't agree with, they can just cut off their API access and limit the damage.
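To picture what "through the window" access looks like in practice, here's a minimal sketch of how a beta user might shape a request. The endpoint URL, engine name and parameters are illustrative assumptions about a completions-style API, not details from the episode, and nothing is actually sent:

```python
import json

# Construct (but do not send) a completion-style request. The endpoint and
# field names below are illustrative assumptions, not official documentation.
API_URL = "https://api.openai.com/v1/engines/davinci/completions"  # hypothetical

def build_request(prompt, api_key, max_tokens=64):
    """Return the headers and JSON body such a completion request might use."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # access is gated by a beta key
        "Content-Type": "application/json",
    }
    body = {
        "prompt": prompt,         # the only thing you pass "through the window"
        "max_tokens": max_tokens, # cap on how much text comes back
        "temperature": 0.7,       # sampling randomness; the "kitchen" stays opaque
    }
    return headers, json.dumps(body)

headers, body = build_request("Q: How many eyes does a giraffe have?\nA:", "sk-demo")
print(body)
```

The point of the analogy holds in the code: everything interesting - the model weights, the training data, the decoding - happens behind `API_URL`, and all you ever see is the JSON that comes back.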
That said, there have been some really cool examples from the people that have got access. I saw a tweet from Sharif Shameem who said - wow, I built a React dice component with GPT-3, this feels much more fun than writing JSX. What he's actually using is a site where it says 'describe your app', and he just types in what he would like that app to do - in this case, a box with a button that says 'roll dice' and then displays its value - hits generate, and GPT-3 has understood the functionality and literally written the code for him: a button that you click, which then shows you a dice roll from one to six. I've seen a couple of examples like that, where people have used plain descriptions of what they'd like to see.
I saw one for a website as well, where someone just described what they wanted the website to look like, and it just went off and built it. Kevin Lacker gave GPT-3 a Turing test - again, I'll link to all of these examples in the show notes at search.withcandour.co.uk - and I'll just read out part of the test he gave GPT-3. He tried some common sense tests, because traditionally artificial intelligence struggles with "common sense", but GPT-3 can answer a lot of common sense questions. So this is a question he posed: 'What is your favourite animal?' And GPT-3's answer: 'My favourite animal is a dog.' Then he asks why, and GPT-3 answers, 'Because dogs are loyal and friendly' - a very, very good answer to why you might like a dog, fully believable as human. Then Kevin asks, 'What are two reasons that a dog might be in a bad mood?' and GPT-3 answers, 'Two reasons that a dog might be in a bad mood are if it is hungry or if it is hot' - I found that answer super impressive myself. Then: 'How many eyes does a giraffe have?' - 'A giraffe has two eyes.' 'How many legs does a frog have?' - 'A frog has four legs.' 'Are there any animals with three legs?' - 'No, there are no animals with three legs.' I guess that's kind of true; maybe there are animals with three legs due to injury, but I don't think any are born with three legs, or regularly at least. And then he asks, 'Why don't animals have three legs?' GPT-3 answers, 'Animals don't have three legs because they would fall over.'
To me that's immensely impressive - the answers it's managed to generate here - and it's a huge leap from GPT-2 and other models. You can read Kevin's blog post, where he's given a whole series of questions, and a lot of the answers are very impressive. He does manage to find ways to get GPT-3 to fail the Turing test, though. He starts asking nonsense questions like, 'How do you sporgle a morgle?' and GPT-3 confidently answers, 'You sporgle a morgle by using a sporgle.' Another question is, 'How many rainbows does it take to jump from Hawaii to seventeen?' and GPT-3 confidently answers, 'It takes two rainbows to jump from Hawaii to seventeen.' And Kevin asks, 'Which colourless green ideas sleep furiously?' and GPT-3 answers, 'Ideas that are colourless, green, and sleep furiously are the ideas of a sleep furiously.' So obviously we've gone into the realm of nonsense here, and I guess while it's made these really clever connections, it can't stop and say - hang on, that question is ridiculous, it doesn't make sense. It's always going to have a go at answering, so it would fail the Turing test on these kinds of questions, but it's still massively impressive.
And there are some other examples as well. I found one by Paras Chopra, who has made a fully functioning answers engine on top of GPT-3. Basically it looks like a layer on top of Wikipedia: you can ask his search engine things like 'who killed Mahatma Gandhi?' and it will immediately answer the question for you and give you a link to where it found the answer; or 'how many carbon atoms are there in benzene?' and it immediately comes back with: there are six. This is really interesting for things like chatbots too - you could give them a library of information to use, and they could pull out the answers they need, much like a human would.
Of course, there are downsides and problems with this kind of technology. I was reading through a paper on GPT-3 last week - which, again, I'll link to in the show notes - and it talks about various aspects of GPT-3. The section I found particularly interesting was on biases present in the training data that may lead models to generate stereotyped or prejudiced content. This is from the paper, where they were analysing GPT-3 and running various tests. Looking at the co-occurrence of male and female descriptive words in the data, they found females were more often described using appearance-oriented words, such as 'beautiful' and 'gorgeous', when compared to men. In their investigation of gender bias in GPT-3, they focused on associations between gender and occupation, and they found that occupations in general have a higher probability of being followed by a male gender identifier than a female one - in other words, they're male-leaning. When given a context such as 'The [occupation] was a', 83 per cent of the 388 occupations they tested were more likely to be followed by a male identifier by GPT-3. They measured this by feeding the model a context such as 'The detective was a' and then looking at the probability that the model followed up with male-indicating words (e.g. man, male) or female-indicating words (woman, female). In particular, occupations demonstrating higher levels of education, such as legislator, banker or professor, were heavily male-leaning, along with occupations that require hard physical labour, such as mason, millwright and sheriff. Occupations more likely to be followed by female identifiers included midwife, nurse, receptionist, housekeeper and so on. So there are obvious problems: we don't want to train machines to generate content like this and essentially continue that perpetual cycle, because it's a two-way thing, without going into it too deeply.
Obviously, people are learning from what they read and then regurgitating more of what they've learned, and that just becomes a big feedback loop. So it'll be interesting to see how issues like this are tackled.
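The probing method the paper describes boils down to simple bookkeeping over next-word probabilities. Here's a toy sketch of that bookkeeping, with invented numbers standing in for real model output - GPT-3 itself would supply the actual probabilities:

```python
# Toy illustration of the bias probe described in the GPT-3 paper. The
# probability values below are made up purely to show the measurement.
MALE_WORDS = {"man", "male", "he"}
FEMALE_WORDS = {"woman", "female", "she"}

# Pretend next-word distributions for "The {occupation} was a ..."
FAKE_MODEL = {
    "detective": {"man": 0.40, "woman": 0.15, "he": 0.10, "she": 0.05},
    "banker":    {"man": 0.35, "woman": 0.10, "he": 0.15, "she": 0.05},
    "midwife":   {"man": 0.05, "woman": 0.45, "he": 0.02, "she": 0.20},
    "nurse":     {"man": 0.08, "woman": 0.40, "he": 0.03, "she": 0.18},
}

def leans_male(distribution):
    """True if male-indicating words carry more probability than female ones."""
    male = sum(p for w, p in distribution.items() if w in MALE_WORDS)
    female = sum(p for w, p in distribution.items() if w in FEMALE_WORDS)
    return male > female

male_leaning = [occ for occ, dist in FAKE_MODEL.items() if leans_male(dist)]
share = len(male_leaning) / len(FAKE_MODEL)
print(f"{share:.0%} of occupations lean male: {male_leaning}")
```

The paper's 83 per cent figure is exactly this kind of share, computed over 388 real occupations against the model's real probabilities rather than these four made-up rows.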
I think this is actually something we're going to dig into - we'll have a whole episode on GPT-3 - because I think it is now realistically time to think about how this might affect digital marketing, how it will affect content online, how we view it, how we trust it, and what we do in terms of marketing. John Mueller from Google did say that they may, in the future, have to get more granular with Google's own guidelines on automatically generated content. At the moment the Google guidelines are a little bit vague, but generally they're saying: we don't want you to generate automated content, especially if it's low value. But if the content is genuinely helpful, genuinely good and correct, does it matter, at the end of the day, whether a human wrote it? That's something I want to talk about in more detail - we'll have, I think, a whole episode on it; we'll get someone on and we'll talk about it, because it's a really interesting area to explore in terms of the opportunities and limitations of the technology, and what it would mean for the people that adopt it and for those that don't. I've seen other people in the SEO industry, such as Will Critchlow, talking about how it is now time to look at using this technology to do things like write product descriptions - things that maybe don't take a huge amount of creativity when you're doing them at scale - to do them quickly and efficiently, and have them correct. So we'll explore that in another episode; I just wanted to give you an introduction to GPT-3 if you haven't heard about it. Check out the links in the show notes if you want to know a little more.
We'll finish up with some listener Q&A. Some of you kindly submitted non-site-specific SEO questions, as I'd requested - I don't particularly want to get into the nuts and bolts of individual websites; I'd much rather answer questions that might help everybody. I've just taken three here - we had several more, so I'm sorry if I didn't answer your question; it might have been too difficult for me, you never know - but I'll try to get through the rest over the next few episodes as well.
The first one is from Iris Grossman, who asks: let's say there is a site that has UGC - user-generated content - and it offers those users the possibility to download generated code to embed a snippet on their own website, for example a 'book now' button with some text, linking towards the pages with the content they have created. Would this be considered a link scheme, and should the links be nofollowed? That's a really good question. Google has a whole page about link schemes in their Webmaster Guidelines, and they have a definition covering this widget situation. What they say to avoid is: keyword-rich, hidden or low-quality links embedded in widgets that are distributed across various sites. Their example widget reads 'Visitors to this page: 1,472', and then it's just got a link to 'car insurance' underneath. So that makes sense, and it's definitely something we've seen some very well-known, big websites get penalties for - dishing out these embeddable widgets and then sticking a link in.
I think the important thing is whether the link is the thing they want to place. If these people want to link back - they want to create a 'book now' button linking to your website, and you're just giving them code to do that - then I think that's fine, because that's an editorially placed link. If you didn't generate that code for them, they'd still want to make the link; they'd just have to work out how to code it themselves. So if that was me, I personally don't see why you would need to nofollow those links. If, however, you were generating a widget where the reason someone wants it is something else - to use Google's example, 'I want this widget because it shows how many visitors this page has had' - and then you put a link in there, that link hasn't been placed editorially; it's been placed because you've crowbarred it in there and forced them to have it. That's not the reason they've got the widget. So I think it's important to think about, one, the intent - if the editorial intent is that they want the link, then I don't see any harm in making a widget to do that for them - and the only other thing I'd consider is that, even with that intent, if you're going to distribute it to 5,000 sites, that's probably going to raise some algorithmic eyebrows. If it's a few websites and you're offering that, I wouldn't see a problem with it, no.
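To make the two cases concrete, here's a hypothetical snippet generator of the kind Iris describes - the function name and markup are my own invention, not anything from the episode. When the link itself is the editorial point of the widget it stays followed; when a link is piggybacking on some other functionality, you'd emit rel="nofollow":

```python
from html import escape

def book_now_embed(url, label="Book now", follow=True):
    """Generate an embeddable button snippet. follow=True suits the case
    where the creator explicitly asked for the link (editorial placement);
    pass follow=False to add rel="nofollow" when the link is incidental."""
    rel = "" if follow else ' rel="nofollow"'
    return f'<a href="{escape(url)}"{rel} class="book-now">{escape(label)}</a>'

# The creator asked for a link to their own content page: leave it followed.
print(book_now_embed("https://example.com/ugc/page-123"))
# A link crowbarred into an unrelated widget: mark it nofollow.
print(book_now_embed("https://example.com/", label="Car insurance", follow=False))
```

The design choice mirrors the answer above: the attribute isn't a property of the widget, it's a property of why the link is there.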
The second question is from Ramesh Singh - a very short question: does branding improve organic rankings? Which is a very cool question. I think there are a few things to consider here, and I want to separate one of them immediately, because I've seen these conversations drift into things like exact match domains. We've talked about them before: exact match domains still work, and the reason I think they still work is twofold. Firstly, when you're searching for something where someone owns the exact .com for it, Google can't rule out the possibility that the intent behind your search is navigational - meaning you're not searching generically for those words, you're searching for that website - because that is one thing people use Google, or the omnibox in their browser, for: they just type in the name of the website they want to go to. And the other reason is that any anchor text you earn naturally is just going to be those words as well.
Separating that out, I think the general move Google is making is away from a pure link graph approach and towards a knowledge graph-based approach - understanding what things are and what the relationships between things are. Certainly, I think having Google understand that this is the name of your organisation, and seeing that lots of people are talking about you, that you're well known and that you're linked to, will improve rankings. It's very hard to separate how well your brand is known, and what it's known for, into the part contributed by the link graph and the part contributed by people just knowing you. You can even see, when you look through Google's own advice on things like Google My Business and local rankings, that they talk about how well your company is known - whether it's a well-known entity. In my opinion, if you're doing "good SEO" and "good outreach", there's no way of building links or getting people to talk about you and link to you without improving your branding. So I guess it comes down to how we define branding, and I define brand, a lot of it, as what other people are saying about your company or organisation - and naturally a lot of that is going to appear on the web, and in links, so there's a big overlap. There are other aspects of branding - visual aspects, sentiment aspects - which I think search engines will get to; I don't think they're big ranking factors at the moment.
But the answer is, I think yes, maybe not in a direct sense, but that's one of the things I think Google and search engines are trying to measure - which is how well known, how popular, how trusted you are, which are all things you relate to having a good brand.
And our last question is from Afeez Adebayo, who asks: if the pages that generate the most backlinks to your site are UGC - user-generated content, which you don't have power over - and assuming there are half a million of those UGC pages (500,000), but only 100 of them account for 90% of your site's backlink profile and the rest are worthless to search engines, what would you do to avoid crawl budget issues in this case?
Wow, that's a horrible, horrible question. I'm very tempted to just go down the 'it depends' route, because obviously it will depend on all sorts of things. I can only imagine it's some kind of forum or Q&A site if you've got that many pages, and I'd be asking why such a high percentage of pages are worthless to search engines, because if a page is worthless to search engines, that normally means it's worthless to users as well. If we follow that logic, 499,900 of these pages are pretty much worthless to users and search engines, and we're saying we don't want them to appear in search - so we're not bothered about them being crawled, basically.
So here's how I would solve it, assuming we've got such a tiny number of pages that we're saying are really important - a hundred - and knowing that sending Google, or whoever, off to crawl half a million pages we don't care about is a massive waste of time, given we don't want them to appear in search anyway. Because it's a nice small number, I would lift the links to those 100 pages out of whatever link architecture already exists and put them on another page that's easily accessible - it's good content, you said, so you want people to find it. If it is a forum, that page might be 'featured posts' or 'popular posts', whatever - something that links to those threads so Google can find them. Outside of that, you can then use robots.txt to block off the usual crawl paths to the half a million pages - although if they're already indexed and you want them out, you need to use noindex first, wait until they drop out, and then put the robots.txt rules in place. The 100 good pages will still be accessible in their entirety, because we're linking directly to them and making sure robots are allowed through to those pages. That's the rough approach I would take given the information we have: half a million pages we don't want people to see, which we should block because that's a large number of pages to waste crawler resources on, and 100 we want to keep, so it makes sense to pull those links out of the architecture because that's a manageable amount. You'll still get the benefit of the links to them, people will still find the content that's helpful, and then maybe review why you've got that big issue in the first place.
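To sketch the robots.txt side of that approach: assuming the UGC pages live under a path like /threads/ (a made-up URL structure, purely for illustration), you'd block that path once the unwanted pages have been noindexed and dropped out, while the curated featured page and the 100 good threads are explicitly allowed. Python's standard-library robotparser can sanity-check rules like these:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the /threads/ path (the half a million UGC pages) is
# blocked, but the curated featured page and the valuable threads are
# explicitly allowed. Note: Python's parser applies rules in file order,
# so the Allow lines come before the broad Disallow.
robots_txt = """\
User-agent: *
Allow: /featured-threads
Allow: /threads/how-to-fix-hreflang
Disallow: /threads/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/featured-threads"))            # allowed
print(rp.can_fetch("*", "https://example.com/threads/how-to-fix-hreflang")) # allowed
print(rp.can_fetch("*", "https://example.com/threads/some-worthless-page")) # blocked
```

In practice you'd generate the 100 Allow lines from the list of valuable threads; and, as noted above, the Disallow must only go live after those pages you want deindexed have dropped out, because a blocked page can never have its noindex seen.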
So that's everything we've got time for in this episode, we're gonna be back or I will be back - well actually, no I'm going to be back with a guest because I'm going to find someone to talk to me about GPT-3 next week. So that will be on Monday the 24th of August, I have another couple of guests actually lined up, so really excited for the next few episodes over August and September, I won't tell you who they are just yet, but some really cool people we're going to be hearing from. As usual, if you're enjoying things, please do subscribe, tell a friend, give us a do-follow link, all appreciated and I hope you all have a brilliant week.