Candour

Episode 90: December Core Update, crawl stats and budgets, and Core Web Vitals FAQs

What's in this episode?

In this episode, you will hear Mark Williams-Cook talking about:

  • December Broad Core Update: This week Google announced a broad core update for December 2020

  • Crawl stats and budgets: Google gives Search Console a new crawl stats report, and more advice on managing crawl budget has been published

  • Core Web Vitals FAQs: Google has offered more specifics on how Core Web Vitals will work in the context of ranking

Show notes

Crawl reports and budget: https://searchengineland.com/google-expands-local-business-message-map-query-analytics-344306

Google Core Update: https://twitter.com/searchliaison/status/1334521448074006530 https://twitter.com/areej_abuali/status/1334562615499034624

FAQ on Core Web Vitals: https://support.google.com/webmasters/thread/86521401?hl=en

Transcript

MC: Welcome to episode 90 of the Search With Candour podcast recorded on Friday the 4th of December 2020. My name is Mark Williams-Cook and, today, I'm going to be talking to you about the announced Google Broad Core Update, I'm going to be talking about the new FAQ Google has released on the Core Web Vitals and just picking out some interesting things on there, and we're going to have a talk about the new crawl reports in Google Search Console, as well as, again, some more new guidance Google has provided us on crawl budgets. Before we get going, I have the pleasure of telling you this podcast is sponsored by Sitebulb. What is Sitebulb? If you haven't heard of it, where have you been? It's one of the most popular SEO auditing tools for Windows and Macs; it runs from your desktop and it does absolutely loads of great things. We use it at Candour; I've used it for quite a few years now. At the beginning of these podcasts I normally just go through some of the features that I particularly like about Sitebulb and today, based on some client work I've been doing this week, I wanted to talk about hreflangs, those tags which help search engines decide which particular version of a page they should serve, based on language or location.

Simple in theory; in practice, I find they tend to get a little bit more complicated, and it's something Sitebulb can handle really well for you. When you put your URL into Sitebulb, there's a little box you can tick if you want to take a look at the internationalisation aspects of the site, and it will go off, crawl, and check all of these hreflang tags. It does really helpful things: it picks out URLs that self-reference more than once with different hreflang annotations, and it lets you know if URLs mismatch their hreflang declarations. Don't forget as well that URLs with hreflang tags need to be reciprocal; they need these return tags for them to be valid and for search engines to understand them. It can be tricky, it's very easy to miss something, and there aren't many really great tools to check this. That's something Sitebulb does really well: it breaks it down, gives you all the URLs and gives you something really quick and easy to action. In terms of reports, I've spoken before about the kind of detail the reports and guides you get with Sitebulb go into. The great thing about you listening to this podcast is that you can get a 60-day trial of Sitebulb - there's no obligation, no credit card or anything required, and all you need to do is go to sitebulb.com/swc to get that.
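
As a rough illustration of that reciprocity requirement, here is a minimal sketch in Python, assuming you have already extracted each URL's hreflang annotations from a crawl export; the URLs and annotations below are hypothetical.

```python
# A minimal sketch of a reciprocity check, assuming you have already extracted
# each URL's hreflang annotations (e.g. from a crawl export).
# The URLs and annotations below are hypothetical examples.

hreflang_map = {
    "https://example.com/en/": {"en": "https://example.com/en/",
                                "fr": "https://example.com/fr/"},
    "https://example.com/fr/": {"fr": "https://example.com/fr/"},  # missing the en return tag
}

def find_missing_return_tags(hreflang_map):
    """Report hreflang targets that do not annotate a link back to the source URL."""
    problems = []
    for source, annotations in hreflang_map.items():
        for lang, target in annotations.items():
            if target == source:
                continue  # self-reference, nothing to reciprocate
            target_annotations = hreflang_map.get(target, {})
            if source not in target_annotations.values():
                problems.append((source, target, lang))
    return problems

for source, target, lang in find_missing_return_tags(hreflang_map):
    print(f"{target} does not declare a return hreflang tag back to {source} ({lang})")
```

A fuller check would also confirm that each language or region code matches the target page, but the missing return tag is the part that most often slips through.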

Yesterday, at the time of recording at least, so last Thursday if you're listening to this on Monday, Google did one of their, what I think we can now safely call, standard tweets about a broad core algorithm update. The SearchLiaison account very creatively tweeted "later today we are releasing a broad core algorithm update, as we do several times per year. It is called the…" Can you guess it? "December 2020 Core Update. Our guidance about such updates remains as we covered before. Please see this blog post for more about that." Now, you've probably gathered from my tone that I'm a little bit disappointed that Google created this Search Liaison role and this is the level of communication we now get: two or three times a year we get this copy-and-pasted tweet, and the only new information we're told is, and we already know it, the month of the update and that it's a core update.

Interestingly, there are a couple of points I want to add to this, because the same thing applies every time I've spoken about these broad core updates when they're announced: I don't know what they've changed, nobody knows what they've changed, except maybe some internal Googlers who apparently aren't going to tell us anything, so there's not a lot of direct advice you can take from this, but it's good to know. If you are seeing ranking changes, this is a possible reason. Something I found particularly interesting was that on Wednesday I saw several people posting screenshots of the various SERP monitoring tools showing high volatility in search results. It looks like we had some pre-tremors to this announcement, so my guess is that this had already started rolling out before the announcement was made. Certainly, this time around, even on some of our client sites, we have already started to see changes, and these are sites outside of the ones we normally see affected, which recently have largely been "Your Money or Your Life" type sites. If you want to know more about Your Money or Your Life sites, we did a brilliant interview with Lily Ray a few weeks ago; you can find the link to the episode in the show notes at search.withcandour.co.uk.

What I did want to share with you about this update was a thread on Twitter by Areej AbuAli, who we've also talked to on this podcast. Areej is the founder of Women in Tech SEO, and we spoke to her about that, so again, if you have a look at search.withcandour.co.uk you'll find a link to the episode where we spoke to her. She put together some nice advice about this algorithm update that I thought I would share with you. She wrote "getting impacted by core algorithm updates can be stressful but please remember that there is only so much you have control over. Here's a few things to focus on over the next two weeks." I feel calmer already, thank you, Areej.

She says "one, 'communication is key'. Over-communicate with your team, clients, and business stakeholders." Absolutely. Already today, we've emailed a lot of our key clients telling them this has been announced, so if we see any dramatic movements, don't panic, don't do anything rash; let's wait until the dust settles and we can look at what's happened. "Two, revisit your reporting plan. Ensure that you're monitoring and reporting on critical metrics for yourself and your competitors. Three, don't rush into theories." I'll say that again: don't rush into theories. I'm fully expecting to see the 'ultimate guide on how to beat Google's December 2020 Core Update' in the next few weeks. Really good point by Areej; don't rush into theories. It's a core update, it's evaluating your website as a whole, rather than that one change you made to your site a month ago. Don't rush into theories or assumptions. "Four, share your learnings. We work in an industry full of thousands of people who are keen to share their knowledge and learnings with one another. Get involved in those conversations, and ask and share your knowledge openly and kindly. Five, evaluate the best plan forward. Whether you're impacted or not, ensure you have a clear plan on how to continue optimising a website that's closely aligned with the business KPIs." She signs off saying "while some might feel fairly obvious, it's easy to get bogged down in the detail and not take a step back and breathe." I fully agree with everything you've written there, Areej.

I was actually helping a potential client this week with a very odd ranking issue they had with Google; without giving away too much, the results we were seeing in Google weren't logical. The search quality seemed off, and they'd been around talking to various people and pulling their hair out internally trying to figure out what it was, and I was being thrown questions like "could it be this? could it be we don't have enough entities on the page? could it be these hreflangs? could it be that we've got this small percentage of pages erroring?" I had to have this discussion with them and, in the end, I thought it was actually a kink on Google's side; it certainly wasn't anything they had been doing. That's an easy trap to fall into, especially when you get into, as Areej puts it, getting bogged down in the detail.

One thing to take on board is that if you listen to people like Gary Illyes, John Mueller and Martin Splitt having conversations on their podcast, they're constantly joking about the web being broken. This means Google needs to, and does, operate with the assumption that things will be broken, and with the ability to work with what you give it. Yes, it is optimal if pages return 200 status codes when they're working, and if things load quickly, but the fact is lots of stuff on the web is broken, and just because you've got a tiny fraction of a percent of pages not working, it doesn't mean that's going to have a detrimental impact on your ranking. I'll be interested to see over the next few weeks, over Christmas, what changes we see from this core update.

This week I saw Google has posted an FAQ on Core Web Vitals. As usual, if you go to the show notes - search.withcandour.co.uk - you'll find a link to the FAQ there. It's quite long, so I've pulled out some of what I think are the more interesting questions, and maybe stuff we haven't covered before. We've had two previous episodes, again linked in the show notes, where we've discussed Core Web Vitals: what they are, why they exist and why they're good as general metrics; we've also discussed Core Web Vitals in the context of Google using them as a ranking factor, pulling them in as metrics to measure page experience for users. There were a couple of things in this FAQ that, when I shared and reposted them, somebody replied to say "well, that's obvious", and some of them are perhaps obvious. But it is good when you get confirmation from Google about specifically how something works, because that's like a new axiom you can build on, and you know the rock you're putting your foot on is steady. When you start coming up with theories and ways of working built on 'pretty sure' assumptions all stacked on top of each other, it only needs one of those to be wrong before everything else starts falling to bits.

I'm just going to go through a couple of questions I thought were interesting and give some context as to why. One I found interesting was "Is Google recommending that all my pages hit these thresholds and what's the benefit?" That's referring to the targets Google has set for each of the three Core Web Vitals: Cumulative Layout Shift, First Input Delay, and Largest Contentful Paint. Their answer is "we recommend that websites use these three thresholds as a guidepost for optimal user experience across all pages. Core Web Vitals thresholds are assessed at a per-page level and you might find that some pages are above and other pages are below these thresholds. The immediate benefit will be a better experience for users that visit your site, but in the long term we believe that working towards a shared set of user experience metrics and thresholds across all websites will be critical in order to sustain a healthy web ecosystem."

The short answer there is, firstly, that these Core Web Vitals are measured on a per-page basis, not a per-site basis, and that, yes, Google really does want you to try to meet these thresholds on all pages. As I've mentioned before, the primary reason you should be focusing on these Core Web Vitals is user experience; if you improve these things it's likely to impact your bottom line, your leads, your sales, whatever it is you want people on your site to do. SEO is almost a secondary, bonus reason to do it.
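
As a rough sketch of what that per-page assessment looks like, here is a small check against Google's published "good" thresholds at the time of writing (LCP of 2.5 seconds or less, FID of 100 ms or less, CLS of 0.1 or less); the page URLs and metric values are made up for illustration.

```python
# A minimal per-page check against Google's published "good" thresholds at the
# time of writing. The page metrics below are made-up illustration values.

THRESHOLDS = {
    "lcp_ms": 2500,   # Largest Contentful Paint, milliseconds
    "fid_ms": 100,    # First Input Delay, milliseconds
    "cls": 0.1,       # Cumulative Layout Shift, unitless
}

pages = {
    "/": {"lcp_ms": 2100, "fid_ms": 40, "cls": 0.05},
    "/products/": {"lcp_ms": 3400, "fid_ms": 90, "cls": 0.22},
}

for url, metrics in pages.items():
    failures = [name for name, limit in THRESHOLDS.items() if metrics[name] > limit]
    status = "passes" if not failures else f"fails on {', '.join(failures)}"
    print(f"{url}: {status}")
```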

There are some questions around AMP as well, accelerated mobile pages. The question is "if I built AMP pages, do they meet the recommended thresholds?" Google's answer is, unsurprisingly, very pro-AMP: "There is a high likelihood that AMP pages will meet the thresholds. AMP is about delivering high quality, user-first experiences; its initial design goals are closely aligned with what Core Web Vitals measure today. This means that sites built using AMP likely can easily meet Web Vitals thresholds. Furthermore, AMP's evergreen release enables site owners to get these performance improvements without having to change their code base or invest in additional resources. It is important to note that there are things outside of AMP's control which can result in pages not meeting the thresholds, such as slow server response times and unoptimized images."

They're giving themselves a caveat there, but a lot of this initial performance push was spearheaded by the AMP project that Google has been very keen on over the last few years, and they have now taken a step back from it. We talked about it, I think, three episodes ago, when we were discussing inclusion in Google Top Stories, because the criteria for being included there are changing: up until now you've had to have AMP pages to be included, and they're now changing that to anything that meets these Web Vitals metrics, so they do seem to be broadening it out. Another question on AMP is "can a site meet the recommended thresholds without using AMP?" and, as we've kind of said here, Google's saying "yes, you can take a look at the guidance offered on web.dev and how you can optimize your performance against Core Web Vitals", but they're really making it clear with this answer that there are now a myriad of options outside of AMP.

Then we get onto some interesting things about PWAs and SPAs. For progressive web apps: "If my site is a progressive web app, does it meet the recommended thresholds?" The answer is "not necessarily, since it would depend on how the PWA is implemented and how real users are experiencing the page. Core Web Vitals are complementary to shipping a good PWA; it's important that every site, whether a PWA or not, focuses on loading experience, interactivity and layout stability. We recommend that all PWAs follow Core Web Vitals guidelines." What I think is being said between the lines here is: yes, it may be tricky to hit some of these metrics with a PWA, but basically Google doesn't care how you're delivering the experience, it just needs to be good. It's possible to do, but you're not going to get a free 'get out of jail' card just because it's a PWA.

We've got the same question for single page applications, which are super popular: "Can a site meet the recommended thresholds if it is a single page application?" "Core Web Vitals measure the end user experience of a particular web page and don't take into account the technologies and architectures involved in delivering that experience. Layout shifts, input delays and contentful paints are as relevant to a single page application as they are to other architectures. Different architectures may result in different friction points to address to meet the thresholds. No matter what architecture you pick, what matters is the observed user experience." This is really important. I had a protracted conversation over many months with a web development agency that had built a single page application for a client, and amongst the other foibles you normally encounter with SPAs, the Core Web Vitals were all coming back as poor from field data, from real users actually experiencing those pages. The discussion was kind of like, "okay, look, the performance isn't quite where we need it to be," and the developer's response was, "well, it's an SPA so you can't judge site speed as you would normally." What Google's saying here is that it doesn't matter what's happening behind the curtain: those Core Web Vitals you see, especially if you're lucky enough to have that field data in Search Console, are what you're being scored on and what needs to be good.

There's a question here about the mobile and desktop scores; at the moment you've got two different scores in Search Console, and it asks "Why are there differences in scores between mobile and desktop?" The answer is "At this time, using page experience as a signal for ranking will only apply to mobile search." I'll read that again: "At this time, using page experience as a signal for ranking will apply only to mobile search." I thought this was really interesting, and it was one of the tips I shared that I was told was obvious: in May 2021, when Google starts integrating these Core Web Vitals into its ranking algorithm, it's the mobile score they're going to use. Of course this makes sense, because we've had the mobile-first/mobile-only indexing push from Google, so it makes sense that that's the score they're going to pick for Core Web Vitals and ranking.

I think it's good, though, as I said, to have this as a confirmation, because up until now we didn't know whether they were going to use some kind of aggregate or average of the two. It's now been laid out that mobile is the one you need to focus on. If you go into your Google Search Console and everything's coming up green for desktop but red or yellow for mobile, that is something you will need to work on. The rest of the answer essentially says "while the technology used to create the mobile site may be similar to the desktop site, real users of the two versions will have different constraints, such as device, viewport size, network connectivity, and more."
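
As an illustration of working with that mobile field data directly, here is a minimal sketch that queries the Chrome UX Report API, the field data source behind these assessments, for the PHONE form factor; the API key and example URL are placeholders.

```python
# A minimal sketch of pulling mobile (PHONE) field data for a URL from the
# Chrome UX Report API. YOUR_API_KEY and the example URL are placeholders.
import json
import urllib.request

API_KEY = "YOUR_API_KEY"
endpoint = f"https://chromeuxreport.googleapis.com/v1/records:queryRecord?key={API_KEY}"
body = json.dumps({"url": "https://example.com/", "formFactor": "PHONE"}).encode()

request = urllib.request.Request(endpoint, data=body,
                                 headers={"Content-Type": "application/json"})
with urllib.request.urlopen(request) as response:
    record = json.load(response)["record"]

# Print the 75th-percentile value reported for each Core Web Vital.
for metric in ("largest_contentful_paint", "first_input_delay",
               "cumulative_layout_shift"):
    p75 = record["metrics"][metric]["percentiles"]["p75"]
    print(f"{metric}: p75 = {p75}")
```

Swapping formFactor to DESKTOP would give you the other score, which is exactly the mobile versus desktop comparison the FAQ answer is describing.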

This leads on to the next question, which I found particularly interesting: "How do Core Web Vitals account for sites whose user base comprises high volume NBU traffic or other users with poor internet connectivity?" Now, I had to ask what NBU traffic was, because I'd never heard of it, and it turns out not many other people had heard of it either. There were a few guesses, including 'non-broadband users' and 'non built-up areas'. Kindly, Google came back and said sorry, that's some internal terminology and they'll change it. But the point is, the question is asking how Core Web Vitals account for traffic where the internet connection isn't very good.

The answer is "Core Web Vitals is meant to measure the quality of a user's experience on a website. The user population of each site differs and some sites, not limited to any particular region, may have significant populations of users that may be using older devices, using slower networks and so on. In such cases, sites should adapt the content to ensure that such users are still receiving a great user experience, and ideally still meet the recommended Core Web Vitals thresholds." I thought this was really interesting, because it means there's no objective, set standard in terms of performance other than these Core Web Vitals, which are themselves determined by the user's connection. You may have a site that functions perfectly well in the UK or the US, where you get green lights on all of your Core Web Vitals, but if you then deploy that same site as an international version somewhere where most connections are 3G, you'll suddenly find you've got lots of red lights for the exact same site, because those users are having a poor experience on a slow internet connection.

What Google's saying is that in those instances you need to tone the site down; you need to make it a better user experience for those users. Whether that comes down to making pages simpler, less JavaScript, smaller images, whatever it is, you still need to try to hit those metrics. I found that really interesting, because I haven't heard it talked about much in the planning of a lot of websites: actually thinking about the internet connectivity and speed of the users in the country or place you're targeting, versus the web developer checking it in their performance tool and saying everything is fine.
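
One hedged example of what "toning the site down" can look like in practice: a server can check the Save-Data client hint that data-conscious browsers send and serve a lighter variant. This is only a minimal Flask sketch; the route, template and image filenames are hypothetical.

```python
# A minimal Flask sketch (hypothetical route, template and image names) that
# serves a lighter page variant when the browser sends the Save-Data client
# hint, one way to adapt for users on slow or metered connections.
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route("/")
def home():
    # Browsers on data-saving / constrained connections may send "Save-Data: on".
    save_data = request.headers.get("Save-Data", "").lower() == "on"
    return render_template(
        "home.html",
        hero_image="hero-small.webp" if save_data else "hero-large.webp",
        defer_noncritical_js=save_data,
    )
```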

The last question I'm going to go over is "How does Google determine which pages are affected by the assessment of page experience and usage as a ranking signal?" The answer is "Page experience is just one of many signals that are used to rank pages. Keep in mind that the intent of the search query is still a very strong signal. A page with a subpar page experience may still rank highly if it has great, relevant content." It's just highlighting that this isn't a live-or-die thing: if you still have the better site and the better content, it doesn't mean you're not going to rank just because your page is slow; it's one of many signals that interact with each other. There's a link in the show notes to the rest of the FAQ, and I think it's a really interesting one to read through.

Fairly quietly, a new set of reports has been released in Google Search Console: crawl stats reports. This is great news, because it's not something we've had in Search Console for a long time; we've had the legacy crawl reports, which are quite limited. These ones are a little bit hidden away: if you're in Google Search Console you need to go down into Settings, which is right near the bottom on the left. If you click that, you'll see your normal property settings, and in the middle there's a little box labelled "Crawling" with an "Open report" link to get to the crawl stats. From here, we've got a really nice, in-depth report. Not as in-depth as actually going through your logs, but for easy accessibility and cross-referencing what you're seeing with everything else in Search Console, it's really nice.

It's very similar to the search results Performance report: you can easily look at your total crawl requests, the total download size, and the average response time. Google plots those in charts, and you can quite quickly drill down if you're seeing spikes in, for instance, different types of response codes. We've got breakdowns here as well by response code, by file type, and by purpose, which I've never seen before. We've got refresh and discovery listed, so I'm assuming here, without reading any more documentation, that refresh is when Google knows a URL exists and wants to check if something's changed, versus discovery, which is "I'm here to find new pages". We've got Googlebot type as well, so smartphone, desktop, image, AdsBot, etc., and file type, so we can see how Google's crawling your HTML, JavaScript, JSON, images, CSS, and so on. Response code is really helpful; you can click on any of these individually, like the 404s, and it will give you all of the URLs that have been hit by crawlers, how often, and the response codes they're getting. Really nice, and the same goes for average response time as well.

We can look at specific URLs, or sets of URLs, and their response times. It's a really nice, surface-level amount of data that's quickly accessible and can help you root out problems that might otherwise have been more challenging to find. If you have a suspicion, for instance, that a certain set of pages isn't getting crawled, or certainly not as much as it should be, you can break that down and find out through Google Search Console without having to go through log analysis. Anyone who works agency-side will know that getting hold of log files isn't always as simple as you'd like; personally, I've sometimes had to wait months for them. Having this data in Search Console makes it a lot more accessible.
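
If you do eventually get the raw logs, even a tiny script can answer the same kind of question. Here is a minimal sketch that counts Googlebot hits by path and status code, assuming a standard combined access log format and a placeholder filename; note that a proper analysis would also verify Googlebot by reverse DNS rather than trusting the user agent string.

```python
# A minimal sketch: pull Googlebot hits out of an access log and count status
# codes per path. Assumes a common/combined log format; "access.log" is a
# placeholder path, and the user agent is not verified by reverse DNS here.
import re
from collections import Counter

LOG_LINE = re.compile(r'"\w+ (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) .*Googlebot')

status_by_path = Counter()
with open("access.log") as log:
    for line in log:
        match = LOG_LINE.search(line)
        if match:
            status_by_path[(match.group("path"), match.group("status"))] += 1

for (path, status), hits in status_by_path.most_common(20):
    print(f"{status} {path}: {hits} Googlebot hits")
```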

Linked to this, I did want to talk about crawl budget, because these things are related. Google has published some new guidance on crawl budget, which is really cool. They've been doing a few different things lately: they have their Search Off The Record podcast, where they've been talking about Google's crawling, rendering and indexing infrastructure, Caffeine, and they've talked about crawl budget on there too, and now we've got this new documentation, written specifically for owners of large sites on managing crawl budget. This is really helpful because, previously, there wasn't that much official documentation around what crawl budget actually is and when people should be thinking about it. There probably won't be anything in here that really surprises experienced SEOs but, again, they've been very specific on a few points, which I think will help.

Firstly, they've made it clear that the numbers they've given are rough estimates to help you, so the answer of "it depends" does still apply. But, very similar to the advice we've given over the years, you only really need to think about crawl budget if you have what Google defines as a large site, which is, to put a finger in the air, maybe a million-plus unique pages with content that changes moderately often, say once a week, or a medium site of tens or hundreds of thousands of pages with daily changing content. It just gives you a general idea of when these numbers are of a size where everything else might be applicable. If you've got a 500-page website, it's never going to be an issue for you; Google isn't going to hit this kind of limit on its crawling.

The interesting things, which I haven't seen specified before in Google's documentation, are crawl capacity limit and crawl demand. The documentation explains how Google calculates the crawl capacity limit, which is the maximum number of parallel connections Googlebot can use to crawl your site, because Google is quite capable of flooring websites if it wanted to, but it wants to avoid doing that. They list a few things they look at, such as crawl health, which is "if the site responds quickly for a while, the limit goes up, meaning more connections can be used to crawl. If the site slows down, or responds with server errors, the limit goes down and Googlebot crawls less." I think Gary Illyes referred to these on their podcast as what Google calls "back off signals": when they're crawling a site and it looks like they may be putting too much strain on it.
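
As a hedged illustration of deliberately sending one of those back-off signals, here is a minimal Flask sketch that answers with a 503 and a Retry-After header when the server is under strain; the overload check and its threshold are entirely hypothetical.

```python
# A minimal sketch (hypothetical overload check and threshold) of sending an
# explicit "back off" signal: answering with 503 plus Retry-After when the
# server is under strain, one of the responses that leads Googlebot to slow down.
from flask import Flask, Response
import os

app = Flask(__name__)

def server_overloaded():
    # Placeholder health check; in practice this might look at load average,
    # queue depth, upstream error rates, and so on. The 8.0 threshold is made up.
    return os.getloadavg()[0] > 8.0

@app.route("/<path:page>")
def serve(page):
    if server_overloaded():
        # Slow responses and 5xx errors are what push the crawl capacity limit
        # down; Retry-After hints when it is worth coming back.
        return Response("Temporarily unavailable", status=503,
                        headers={"Retry-After": "120"})
    return f"Content for /{page}"
```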

Crawl health is one; the second is any limit set by the site's owner in Search Console, because you can optionally reduce Google's crawl rate there. They've put a note there that if you ask to be crawled more slowly, that's obeyed; if you ask to be crawled more, that won't necessarily happen, it's at their discretion. The third is Google's own crawling limits, because, while Google is big and has a lot of infrastructure, there is a finite amount to its resources, especially at the sharp end when it comes to indexing and rendering, so even in terms of crawling they have to make choices about the resources they have and how they deploy them. Those are the main things they look at when calculating this crawl capacity limit, and this is overlaid with what they've called 'crawl demand'.

What they've written here is "Google typically spends as much time as necessary crawling a site, given its size, its update frequency, page quality, and relevance compared to other sites. The factors that play a significant role in determining crawl demand are, first, perceived inventory: without guidance from you, Googlebot will try to crawl all or most of the URLs that it knows about on your site. If many of these URLs are duplicates, or should not be crawled for some other reason (they're removed, not important, etc.), this wastes a lot of Google's crawling time on your site. This is the factor that you can positively control the most."

Second is popularity: "URLs that are popular on the internet", they haven't said it here but they mean links, "tend to be crawled more often to keep them fresher in our index." Then staleness: "our systems want to recrawl documents frequently enough to pick up any changes." They've also got a note saying "Additionally, site-wide events, like site moves, may trigger an increase in crawl demand in order to re-index the content under the new URLs." Again, that's something Gary Illyes has mentioned on Twitter: if they detect your site has significantly changed, they will dispatch more robots to you. You'll probably be able to see that now, because we've got that cool new crawl stats report, so I imagine you'd see an uptick in discovery crawling if there are new URLs they need to crawl.

Apart from that, we'll link to this guide on crawl budget management in the show notes at search.withcandour.co.uk. It goes on to give guidance, which we've talked about a few times on the podcast and which is a bit too in-depth to cover here, about best practices for managing crawl budget. It's really good to share with your developers, certainly for avoiding the big issues sites sometimes go live with: if you've got what are essentially non-canonical variations of items, products or categories that are crawlable, rather than having to do a duct-tape patch fix with robots.txt afterwards, you could just not make them crawlable in the first place. Check that out; it's a really nice bit of documentation that Google has given us.
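
If you do end up reaching for that robots.txt duct tape, it is worth sanity-checking the rules before they go live. Here is a minimal sketch using Python's standard-library robot parser; the rules and URLs are illustrative only, and note that this parser only understands simple prefix rules, not the wildcard syntax Googlebot supports.

```python
# A minimal sketch, using the standard library's robot parser, to check that
# hypothetical filter/basket URLs are blocked from crawling while the real
# category pages stay open. Rules and URLs are illustrative only, and this
# parser handles simple prefix rules rather than Google's wildcards.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /filter/
Disallow: /basket/
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

for url in ("https://example.com/shoes/",
            "https://example.com/filter/shoes/colour-red/",
            "https://example.com/basket/"):
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url}: {'crawlable' if allowed else 'blocked by robots.txt'}")
```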

We'll close off the show with a couple of brief bits of news I picked up that I don't want to go too deeply into. Google My Business: there are a couple of new features coming to this. Google has put a post live called "New ways to connect with and understand your customers". It's a fairly long post about Google My Business, but I'll save you some time; it basically boils down to two things: messaging and metrics. To pull out the relevant bits, Google says "starting today, we're rolling out the ability for verified businesses to message with customers directly from the Google Maps app. Once you turn messaging on from your business profile, you can start replying to customers on Google Maps from the business messages section in the updates tab." That's one update. The second is a little more vague, but they've said "starting this month we're rolling out more metrics to give you a deeper understanding of how customers discover your business profile."

Earlier in the year, in episode 74, we covered some other updates to Google My Business reporting; you can go and listen to that episode if you missed it. These are some further updates: apparently soon we're going to see a more detailed list of the search queries customers use to find our businesses on Google, and at the beginning of next year we'll see updates to the performance page showing whether customers saw our business via Google Maps or Search, and whether they saw it on a computer or mobile device. Those are the two things changing in Google My Business, and they will just happen to your account.

Lastly, I wanted to mention that, as some of you know, I post daily unsolicited SEO tips on LinkedIn, and we've just compiled 400 of these, would you believe it? You can find them if you just google "unsolicited SEO tips"; it'll probably come up with a LinkedIn post that links through to our site. We've had to host them on our site because the article became too long for LinkedIn, which I found out after adding another 100 to a LinkedIn article, hitting save, closing the tab, and not realising LinkedIn had errored with a "sorry, this content is too long" message. There is quite a low limit, I consider it low anyway, on how much you can put in a LinkedIn article, if you're interested in that, but they're now hosted on our site.

They'll just stay there, and we've even put some videos in; there are clips around different concepts such as robots.txt, canonicalisation and how to remove stuff from Google, so it should be really helpful if you're starting out or if someone's training in SEO. Even if you've been doing SEO for ages, I'm sure there's something in there that will jog your memory, an "oh yeah, I forgot I knew that" or "oh, I forgot I should be looking at Web Stories", something that should help you, so do check them out.

That's everything for this episode. Of course, I'll be back in one week's time, which will be Monday the 14th of December. I think this will be the penultimate episode we record before Christmas, so I'll record one more, probably on the 18th; we'll do an end-of-year episode again and probably get some more guests on to talk about 2020 and what we think is going to happen in 2021. Then I'll probably have a couple of weeks off over Christmas before we start again in 2021. If you are enjoying the podcast, please subscribe. It's been really cool over the last year, year and a half, seeing the subscribers slowly increase month on month, along with some spikes in listeners. That's been really good, and thanks for all the positive feedback we've had. Other than that, have a brilliant week, and I hope everyone working Black Friday has calmed down now and we're into Christmas mode. Take care, everyone!
