Episode 67: Web spam statistics, text fragments and tracking featured snippets

What's in this episode?

In this episode, you will hear Mark Williams-Cook talking about the Google 2019 webspam report: the key stats about combatting webspam in 2019 and how Google is doing; text fragments: a new Google Chrome extension that can generate highlighted text fragment links and the potential impact for SEO; and tracking featured snippets: an inventive way to track clicks on featured snippets using Google Analytics and Tag Manager.

Show notes

Webspam stats, text fragments and measuring featured snippets


MC: Welcome to episode 67 of the Search with Candour podcast. Recorded on Thursday the 25th of June 2020. My name is Mark Williams-Cook and today I'm going to be talking about webspam statistics, text fragments and how you can measure clicks on featured snippets.

Earlier on this month, it was June the 9th, Google published its annual webspam report for 2019. So this is a post they've started doing every year where they review the previous year in terms of how they've been combating webspam from webmasters, marketers, spammers and bots globally, and how that's impacting search experience. I really enjoy reading through these because it does give, firstly, a couple of insights into maybe what spam is working and what's on the decline, but also just the scale of what Google is dealing with on a day-to-day basis. So I've highlighted just some statistics from this long post that I thought were interesting. In the show notes at - I'll put a link to the blog post, so if you want to read the whole thing you can do.

So yeah, the webspam report was really interesting. Here are some of the stats that I picked out that I particularly liked. Google said that on a daily basis they discover 25 billion pages that they are deeming as spammy, so I guess this is the stuff that they can a hundred percent categorize as spam. That is a mind-boggling amount, you know, 25 billion pages. This is like if you got every single person on earth to author a couple of web pages a day; this is the amount of spam that they're finding on a daily basis. And they said, our efforts have helped to ensure that more than 99% of visits from our results lead to spam-free experiences.

Firstly, I think it's really interesting how they've worded that, because they said 99% of visits, and one trend that we have definitely seen in Google over the last few years is the marked increase in the no-click type results, so the featured results, the knowledge panel results, all the results that basically aren't turning into a visit. So I think it's interesting, and I don't know if that's intentional, but obviously that change in what Google is actually providing, in that I think they're providing fewer clicks percentage-wise than they used to, might be bleeding into that number. Either way, 99 percent sounds good. I had a look with a very quick Google, and this might be a little bit out of date, but I found a HubSpot article saying that Google globally serves around 5.8 billion searches per day, because I just wanted to put this in the context of real numbers. So if we assume that 50% of these searches result in no click and only 1% of the remaining visits are spammy, this still means Google is providing 29 million spam clicks a day. So their search results are driving 29 million sessions, or visits, a day to spam pages, which sounds less impressive than 99% of visits, but it's still massively impressive, I think, as a technological achievement.
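As a quick check of that back-of-the-envelope estimate, here is a minimal sketch in Python. The 5.8 billion searches figure is the one quoted from the HubSpot article; the 50% no-click share and the 1% spam share are the assumptions made in the episode, not measured values, and the function name is purely illustrative.

```python
def estimate_spam_visits(searches_per_day: int, no_click_pct: int, spam_pct: int) -> int:
    """Estimate daily visits to spam pages from whole-number percentages."""
    # Searches that actually turn into a visit to a website.
    visits_per_day = searches_per_day * (100 - no_click_pct) // 100
    # Share of those visits assumed to land on a spam page.
    return visits_per_day * spam_pct // 100


if __name__ == "__main__":
    spam_visits = estimate_spam_visits(5_800_000_000, no_click_pct=50, spam_pct=1)
    print(f"{spam_visits:,} estimated spam visits per day")
```

Changing either assumed percentage shows how sensitive the 29 million figure is to the guess about no-click searches.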

They went on to say that in 2018 they reported they had reduced user-generated spam by 80 percent, and we're happy to confirm this type of abuse did not grow in 2019. So it seems that they've made that first big step, no doubt through algorithm changes, but it seems actually what they're saying is they haven't made much more progress there. Link spam, as you can imagine, it says, continued to be a popular form of spam, but our team was successful in containing its impact in 2019. I notice again they say containing, not beating or decreasing. So they said more than 90 percent of link spam was caught by our systems, and techniques such as paid links or link exchanges have been made less effective. Again, you know, the number of links that are posted, injected and created on the web every day is a huge number. So even 10% getting through, and I'm assuming that if they're not able to classify them as spam they're being counted, means that a lot of that activity is still working, and that's to be expected. I don't think it's reasonable to expect a machine to get a lot higher than that, because even human reviewers, I think, would struggle to tell which are spam links or which are paid-for links, and machines are actually probably better at this than humans already in a few cases.

They've gone on to say one of our top priorities in 2019 was improving our spam-fighting capabilities through machine learning systems. So this is something we've spoken about on a few of the podcasts, especially in terms of the changes to nofollow becoming a hint and things like rel sponsored and rel UGC, and Google saying they might or they could use those tags to train models. They were very careful in the - so I listened to the podcast, the name escapes me briefly, Search Off The Record, that was it - so I listened to the first episode of this Search Off The Record podcast and this actually came up, this nofollow, and they were very careful to say that it's something they could do, but they weren't committing to saying that they were actually using those tags to train systems. But this post is confirming that there is some kind of data being used to train the systems that are doing the spam-fighting, so that's particularly interesting.

They've also mentioned, so apart from the machine learning automated approach, that you can obviously manually report search spam, and they said they received nearly 230,000 reports of search spam in 2019 and were able to take action on 82% of those reports they processed. So it's not clear, actually, reading that: they say they've got 230,000 reports and they've taken action on 82% of those that they processed, so I'm not sure if they're saying they processed all 230,000. But if you put that 230,000 number in the context of all the other numbers we've been talking about, that's a tiny, tiny, tiny number, and my guess would be that these reports would actually be better used to help train those machine learning systems, rather than prioritising actioning them, unless they're obviously terrible or leading to harm, because, you know, the amount of spam that's being generated is obviously absolutely massive.

They've said as well that they've observed an increase in spammy sites with auto-generated and scraped content, with behaviours that annoy or harm searchers such as fake buttons, overwhelming ads, suspicious redirects and malware, and again I found this particularly interesting. So I think the reason behind this is there are a lot more publicly accessible, easy ways now to auto-generate content. For instance, there are several different models that have come up where you can just start putting subjects and questions in, and actually the models can generate pretty good, readable, unique English now. And you know, many people don't even know that on a lot of the news sites that they read, mainstream news sites, a lot of the breaking news stories that go on there are actually generated. They're written by robots: they're fed information, dates, statistics, whatever it is, and they'll wrap the English around that to deliver them. So auto-generated content isn't necessarily bad, but using it for spam obviously is. The way people, I guess, have been doing that is using these models to generate huge amounts of spam, because you only need a small percentage of it to get through the filter to make it worthwhile, and then you monetise it very aggressively, with loads of ads or buttons that trick people into clicking on stuff you're getting paid to make them click on. So that's really interesting: we're seeing this arms race of the machine learning approach to tackling spam versus the more advanced generation of different kinds of spam. So I'll be interested to see the 2020 report at the end of next year.

Okay, let's talk about text fragments. I really wanted to talk about this because it's been in the back of my mind for a while and I hadn't had time to look into it. Google has just released an official Chrome extension which allows you to link to text fragments. So again, I'll put a link to this extension in the show notes at - but this is an extension that allows you to directly link to parts of your content on a page, in a way that allows it to be highlighted and jumped straight to by a browser. Let's rewind a little bit and go through that in a bit more detail, because I'm sure there's probably a fair amount of listeners for whom this change has gone under the radar. I've seen it live in Google search results now for at least a month, and it's probably a bit longer than that if I only noticed it a month ago, so it's probably at least a couple of months. What I've started to notice in the featured snippet results, and you'll see it now if you do a Google search that generates a featured snippet result, is that when you click on that result and get taken to the web page, you'll have that part of the text on the page highlighted in yellow, as if it had a highlighter pen dragged over it, if you're running a Chromium browser. So this is something that Chromium-based browsers like Brave, Chrome and Microsoft Edge support, and it's a way that, within the URL, so within the link to a page, you can ask the browser to essentially highlight some of the content on a page, and this is called text fragments. So I'm just going to talk a little bit about text fragments, a little bit about their usage and some speculation on what they might be used for in terms of SEO.

So this highlighter type thing, embarrassingly for me, I kinda didn't notice it on the first few sites I visited from featured snippets. I just thought it was maybe part of their design that they had highlighted that part of the page, and it wasn't until I was outside of work actually, and just leisure browsing, that I noticed it happening again, and I was a bit like, hang on, I didn't get this memo that we're all meant to be highlighting content or text on our sites. And then I started Googling it and realised, oh, actually I'm an idiot, it's this text fragment thing that's rolled out. So, if you're feeling a bit like that, at least know you're not alone; you're at least with me in missing that.

So this text fragment stuff - so what it's doing is, as I said, if you're Googling something and getting a featured snippet, the idea is Google is now generating a link from the featured snippet that will highlight where that text is on the page. This aligns perfectly with everything, hopefully, that as an SEO or as a digital marketer you've been heading towards or educating your stakeholders about, in that people want the answer as quickly as possible with the least amount of friction. There's loads of discussion and ways you can do that in SEO. You know, it starts from the very basics of using good titles and H1s and H2s and allowing your web content to be scannable by the eye, so not writing perhaps like a 1920s newspaper where you need to read from beginning to end to get the details; that's not how people interact with web pages. This is one extra step Google has taken of saying, okay, well, we've started doing the featured snippets, which might be able to answer your query without the click, although lots of people still will click, normally to get a bit more information. Then it becomes the challenge of, okay, well, if it's a three thousand word page, I know that what I've searched for is on there somewhere, because I saw it highlighted in the featured snippet, but where? So this link that Google can now generate will highlight that text straightaway. So that's been happening, as I said, for a while. There is a post on the blog, which I will link to, that describes exactly how this new extension works and that's what I wanted to talk about.

So, the Link to Text Fragment extension allows you to, within Chrome, generate the link that is required to highlight the text. So how it actually works is, if you were going to link to your blog /page, that would just land you on the page, as you're aware. But what these Chromium browsers are now supporting is that, after the hash that starts the fragment in the URL, you can specify what's called a text start and a text end. So you can actually, in here now, put the words you want to start the highlighting from, which is the text start, and where you want the highlighting to end, which is the text end. So if you had a section of your text that started by talking, for instance, about ‘SEO tactics’, and the end of the paragraph you wanted to highlight said something like ‘so implement those changes’, then you would create your link and you would define your text start as ‘SEO tactics’ and your text end as ‘so implement those changes’. What will happen then is, when you actually visit that link, that information that you've put into the URL is used by the browser to say, okay, I understand I need to highlight from here to here. And just to let you know, it's case insensitive, so it just works on the actual text; it doesn't matter if it's capitals or not.
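To make the shape of such a link concrete, here is a minimal sketch in Python that builds one by hand. The `:~:text=` marker and the comma between text start and text end follow the text fragment syntax supported by Chromium browsers; the function name and the example.com address are illustrative assumptions, not something from the extension itself.

```python
from typing import Optional
from urllib.parse import quote


def text_fragment_url(page_url: str, text_start: str, text_end: Optional[str] = None) -> str:
    """Build a link that asks the browser to highlight from text_start to text_end."""

    def encode(part: str) -> str:
        # Percent-encode everything, including the characters the syntax
        # itself uses as separators: "," and "&" (handled by quote) and "-".
        return quote(part, safe="").replace("-", "%2D")

    fragment = encode(text_start)
    if text_end is not None:
        fragment += "," + encode(text_end)
    return f"{page_url}#:~:text={fragment}"


if __name__ == "__main__":
    print(text_fragment_url("https://example.com/page", "SEO tactics", "so implement those changes"))
    # https://example.com/page#:~:text=SEO%20tactics,so%20implement%20those%20changes
```

If only a text start is given, the browser highlights just that exact phrase rather than a range.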

They've added in some extra neat things such as prefixes and suffixes, so this might mean, if you want to highlight a particular section of your site but you use that word many times over, you can specify a prefix word before it and a suffix word at the end. So you can say it's this sentence I'm going to highlight, and it appears immediately after this word and before this word, so there's lots of ways to specify it. Now I imagine if you're trying to follow what I'm saying without looking at the transcription of this episode, or without looking at the blog post, you're probably, if you're around at least my intelligence, struggling to picture that in your head. And it is a little tricky; especially if you're not very familiar with linking and maybe even command-line stuff, where you use different parameters, it probably is a lot to take in, and that's exactly what this Chrome extension is for. I think Google realised there would be a slower uptake if it's difficult or tricky to do. So all you need to do with this Chrome extension is install it, and then you can literally just highlight the text that you want to link to, right-click and choose copy link to selected text, and then, there we go, you will literally have your link which, if anyone clicks on it, will take them to that page with the text you have selected highlighted. Really, really easy. So that's something I think is going to come in as mainstream over the next few months and years, because it makes sense to me and I think it will make things easier. The blog post on did say that there hasn't been any intent signalled by Firefox to implement this as well. So we're still waiting to see whether all browsers will adopt this and whether they'll take the Chromium lead on this, but I suspect that will happen.
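The prefix and suffix idea can be sketched the same way. In the text fragment syntax a prefix is written with a trailing hyphen (`prefix-,`) and a suffix with a leading hyphen (`,-suffix`); neither is highlighted, they only help the browser pick the right occurrence. The function name, the example phrases and the example.com address below are illustrative assumptions.

```python
from typing import Optional
from urllib.parse import quote


def text_fragment_with_context(
    page_url: str,
    text_start: str,
    text_end: Optional[str] = None,
    prefix: Optional[str] = None,
    suffix: Optional[str] = None,
) -> str:
    """Build a text fragment link, optionally disambiguated by a prefix and suffix."""

    def encode(part: str) -> str:
        # Literal hyphens must be encoded so they are not read as the
        # prefix/suffix markers; quote() already encodes "," and "&".
        return quote(part, safe="").replace("-", "%2D")

    parts = []
    if prefix is not None:
        parts.append(encode(prefix) + "-")   # text that must come just before
    parts.append(encode(text_start))
    if text_end is not None:
        parts.append(encode(text_end))
    if suffix is not None:
        parts.append("-" + encode(suffix))   # text that must come just after
    return f"{page_url}#:~:text=" + ",".join(parts)


if __name__ == "__main__":
    print(text_fragment_with_context(
        "https://example.com/page",
        "SEO tactics",
        prefix="our favourite",
        suffix="for beginners",
    ))
```

In practice the Chrome extension works out whether a prefix or suffix is needed for you, so building links by hand like this is mainly useful for understanding what the generated URL means.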

The interesting thing that I was thinking about was we've often given some quite colourful examples of when Google's really badly messed up on featured snippets, so there have been some funny examples of when you've Googled ‘how many legs a duck has’ and Google says ducks have four legs, how many legs rabbits have, this is at 200. So the legs and arms thing was a particular example that Google was really struggling with, and I wonder now if they will be able to use this extra information that's within links to help improve this featured snippet selection. So as far as I've observed, working in SEO, featured snippets are one of the things that Google is struggling to get right, meaning sometimes you get very weird results or just plain wrong results, and Google shrugs and they're like, well, you know, it's impressive what we've achieved, we can't get it right all the time. But now, in combination with anchor text - so what text people are using to link to a page - and the additional edge of information around what text they are actually linking to, I wouldn't be surprised if that information could be used to help improve featured snippet selection. Anyway, as I said, have a look at that extension, the links will be in and if you're on a Chromium browser you can download that extension and give it a play with today.

So lastly, on a related topic on featured snippets, I wanted to talk about how you can track featured snippet clicks via Chrome using Google Tag Manager, actually by tracking what we were just talking about, which is these text fragment components that are in the links. So I discovered this through Brodie Clark's blog, who I've mentioned on the podcast before. Brodie does a lot of really interesting stuff. In the blog post he's written up, he's made it very clear that this wasn't a solo effort from him, that he's had various people, mainly on Twitter through different conversations, help him come up with this idea and actually how to implement it within Google Tag Manager and within Google Analytics.

So I will again link to this blog post if you want the step-by-step guide of how to do this, but I just wanted to give you an overview because I think it's really interesting. So what this is essentially allowing us to do is now within our Google Analytics, we will be able to track when we have had a click on our featured snippets, but we’ll also be able to see within Google Analytics the text fragment that was highlighted, which you know could be really helpful in understanding where people are going to on your page and maybe how you need to adjust that content.

So I'm going to give you an overview of roughly how this works. Obviously, if you want to do it, it would be better just to look at this guide, which I'll link to, but I'll just explain roughly how it works. So tracking featured snippets is going to work through Google Tag Manager, so hopefully you all have Google Tag Manager set up anyway and have a container there. What you've got is a guide here on how to create a custom tag with some JavaScript, and what that's essentially doing is, once someone has clicked on a featured snippet in a Google result, you're gonna have this fragment in the URL, which Google Tag Manager is then going to grab. And that's then going to enable you to import this information basically into a custom dimension within your Google Analytics, and then essentially you're going to have a report, so you're gonna have a secondary dimension on your site content report. So if you look at all pages, you'll have a page URL and there will be a secondary dimension you can add called scroll to text fragment clicks, which is going to tell you basically where they were going on that page.
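To illustrate the "grab the fragment" step, here is a minimal sketch in Python of pulling the text start and text end back out of a landing URL. This is not the JavaScript from Brodie's guide; it assumes the full URL, including the text fragment, is already available as a string, and the function name and example.com address are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import unquote, urlsplit


def parse_text_fragment(url: str) -> Optional[Dict[str, str]]:
    """Return the parts of the first text= directive in a URL, or None if there is none."""
    fragment = urlsplit(url).fragment
    marker = ":~:"
    if marker not in fragment:
        return None
    directives = fragment.split(marker, 1)[1]
    for directive in directives.split("&"):
        if not directive.startswith("text="):
            continue
        parts = directive[len("text="):].split(",")
        result: Dict[str, str] = {}
        # A leading part ending in "-" is a prefix; a trailing part
        # starting with "-" is a suffix. Neither is highlighted.
        if parts and parts[0].endswith("-"):
            result["prefix"] = unquote(parts.pop(0)[:-1])
        if parts and parts[-1].startswith("-"):
            result["suffix"] = unquote(parts.pop()[1:])
        if not parts:
            return None
        result["text_start"] = unquote(parts[0])
        if len(parts) > 1:
            result["text_end"] = unquote(parts[1])
        return result
    return None


if __name__ == "__main__":
    landing = "https://example.com/page#:~:text=SEO%20tactics,so%20implement%20those%20changes"
    print(parse_text_fragment(landing))
```

The text start and text end recovered this way are the kind of values that would be sent on to a custom dimension, so the report can show which passage visitors were taken to.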

I just want to read Brodie's final thoughts on this, because I think it's a really interesting thing to experiment with; it's very quick, easy and cheap to implement. So he does say that while this post is here, it's wise to bear in mind that this sort of trick could be gone tomorrow, and we're not sure if Google is aware that people are doing this, or if they care, and they could easily change it. So he said, here's a running list of ideas for using this approach. Firstly, this allows you to see clicks on featured snippet URLs, which was impossible to do previously. Filtering isn't currently in Google Search Console; they tested this for a bit a few years ago but it's not there anymore. So this allows you to generate an X hash of featured snippets and assign that to a site via Chrome when the highlighting is triggered, so the majority of the time that's going to be there for paragraph snippets at least. Because the data we're collecting in Google Analytics shows both the start and end of the content being highlighted, you can essentially see the part of the page where a lot of users were being directly taken to. Knowing the highlighted section, we can strategically place other elements close to this text based on our goals. If we want the user to continue reading, and other datasets, scroll depth, session duration, etc., are telling us they aren't, we can figure out how to make the content more persuasive below the highlight point. Potentially we can add CTAs - calls to action - such as a free downloadable resource link, directly below the highlighted section. If you were to add this to the very end of the post, and the highlight were to be triggered and the user leaves the page, they might never see the CTA. So it's allowing you to put those calls to action where you know they're going to be seen.

So this, as I said I’ll put the post up at, is actually a really straightforward guide. If you've already got Tag Manager, this is like a 10-minute job, I reckon, to set this up. It's super quick and you'll start getting the data straightaway, so there isn't a lot to lose here. It's a really, really, really great post there by Brodie and Co.

And that’s everything for this episode. We will be back on Monday the 6th of July 2020, so please tune in then. If you are enjoying the podcast and you are listening on one of your favourite podcast apps, please do subscribe, or if you're really bored in lockdown and you have absolutely nothing better to do, why not leave us a review - I will read every single one. I think we have three on Apple now and I enjoyed reading every single one of them. I'll be very happy when they're positive. So I'll be back next week, and until then enjoy the week, stay safe and see you then.
