
Episode 124: Schema tools, author.url and Core Web Vitals tech report

What's in this episode?

In this episode, you will hear Mark Williams-Cook talking about:

Schema tools: Google pushing for Rich Result Tool usage despite saying it uses "non-supported" schema

Author.url: Google has a new recommended property to identify authors

Core Web Vitals tech report: A quantitative look at how tech stacks are performing with Core Web Vitals

Show notes

Update: https://developers.google.com/search/blog/2020/12/structured-data-testing-tool-update

https://developers.google.com/search/docs/advanced/structured-data

Ep. 69 https://withcandour.co.uk/blog/episode-69-blogspot-in-indexing-low-quality-pages-and-the-rich-results-tool

English Google Webmaster Central office-hours from June 23, 2020 https://www.youtube.com/watch?v=rO6wTSL6joE

Article about structured data https://developers.google.com/search/docs/data-types/article

Episode 85 - https://withcandour.co.uk/blog/episode-85-all-about-googles-eat-ymyl-and-patents-with-lily-ray

Rick Viscomi Twitter https://twitter.com/rick_viscomi

Contribute to the 2021 Web Almanac https://github.com/HTTPArchive/almanac.httparchive.org/issues/2167

SE Roundtable article https://www.seroundtable.com/google-merchant-center-to-enforce-manufacturer-part-number-mpn-31848.html

MPN Definition https://support.google.com/merchants/answer/6324482?hl=en-GB

Transcription

MC: Welcome to episode 124 of the Search with Candour podcast, recorded on Friday, the 13th of August, 2021. My name is Mark Williams-Cook, and in this episode, we're going to be talking about Google's structured data testing tool, and the fact it's been stabilised and what that means. We're going to be talking about Google adding a new recommended property, author.url, to schema. And we're going to be talking about the absolutely brilliant Core Web Vitals Technology Report. Before we kick off, I would like to tell you this podcast is lovingly sponsored by our friends at Sitebulb. And wait, don't skip ahead yet, because I have something of value to tell you. For those that listen to this podcast, you know I normally talk about one feature or one thing I like about Sitebulb. This one might be helpful if you're doing SEO and you're involved in things like site migrations or redesigns, and that is the feature that Sitebulb has around analysing internal links.

So you may know this, if you're a Sitebulb user, that once you've done a crawl of a website, Sitebulb will analyse your internal links, and it will score URLs by how important those pages are based on how you are internally linking to them. Meaning if you are linking to pages a lot, i.e. from a main menu or from other important pages like your homepage, these pages will show up higher on this internal page score. So you know that. So what? So I want to tell you how you can use this really brilliant feature to mitigate problems on migrations, redesigns, things like that. One of the things we do as an agency, of course, is try and help redesigns and migrations run smoothly. And one thing I like to do with Sitebulb is run this crawl on the live site and have this internal scoring, and then, say someone wants to redesign the site, or they're redesigning a menu, once that's live on our staging or our dev site, I will run this same crawl and then export both the before and after page importance metrics, basically, by URL.

Because what this will show you is if your internal linking structure has changed, and this is negatively or positively impacting how important pages are internally linked, this is going to give you an idea of how they might be impacted in terms of ranking. So what I like to do is set up a spreadsheet and have some conditional formatting, so if a page importance has gone from 90 to 20, say, that's going to highlight deep red for me and I know it's something I need to look at, and we can then connect that data up to Google Analytics and see, is this an important page, i.e. a landing page where we're getting lots of organic traffic? If so, this is something we definitely need to address because we're likely to lose traffic if we are taking internal link equity away from this page that's driving us traffic.
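
If you want to do that comparison outside of a spreadsheet, here's a minimal sketch of the same idea in Python. It assumes two hypothetical CSV exports, one from the live crawl and one from the staging crawl, each with a URL column and an importance score column; the file and column names here are made up, so adjust them to whatever your exports actually contain:

```python
import csv

# Hypothetical file and column names ("URL", "Importance") -- adjust them
# to match whatever your crawl exports actually contain.
def load_scores(path):
    """Read a crawl export into a dict of {url: importance score}."""
    with open(path, newline="") as f:
        return {row["URL"]: float(row["Importance"]) for row in csv.DictReader(f)}

before = load_scores("live_crawl.csv")     # crawl of the current live site
after = load_scores("staging_crawl.csv")   # crawl of the staging/dev rebuild

# Flag pages whose internal importance has dropped sharply -- the same idea
# as the "deep red" conditional formatting in the spreadsheet.
THRESHOLD = 30  # points of importance lost before we flag the page

for url, old_score in sorted(before.items()):
    new_score = after.get(url)
    if new_score is None:
        print(f"MISSING IN NEW STRUCTURE: {url} (was {old_score:.0f})")
    elif old_score - new_score >= THRESHOLD:
        print(f"DROPPED {old_score:.0f} -> {new_score:.0f}: {url}")
```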

And the same is true vice versa. Maybe if we've had pages that aren't getting much traffic that are now linked to much more prominently, we may actually get more traffic there. So my rule of thumb is, at the very least, we don't want to change how important our current important pages are, and if possible, we want to bring some other pages up. It's a super useful feature, and one I think Sitebulb does better than any other tool. And oh yeah, the reason they're sponsoring this podcast is so I can tell you that if you go to sitebulb.com/swc you can get an extended 60-day trial of Sitebulb, no credit card, no payment required upfront. Give it a go. You'll love it. It's brilliant.

A nice place to start this episode is the somewhat confusing, maybe contradictory, information we're getting from Google around their various schema and rich snippet testing tools. I mean, I know it's very unlike Google to give us confusing and possibly conflicting information, but here we are. There is an update, which has been covered on a few sites, on the Google Developers Blog on August the 9th that confusingly says the schema markup validator has been stabilised, and now Google redirects the structured data testing tool to a landing page to help you select the right tool. I've got no idea what "has been stabilised" means. If you are a long-time listener of this podcast, you will know that way back in episode 69, that was in July 2020, we covered the Google announcement that they were going to sunset and get rid of their structured data testing tool in favor of their rich results testing tool.

And at the time this got quite the kickback from the SEO community, so to refresh your memory, or to give you the highlights of that if you weren't listening to the podcast back then, this was Barry Adams' response on July 7th in 2020. He said, "This is awful. The structured data testing tool is a tool that validates all schemas and helps make the web a semantically richer place. The rich result test only supports a tiny narrow subset of Google approved schemas. You are downgrading the web with this move. You are making the web worse. You, Google, had a chance to use your vast resources to maintain and improve the existing schema testing tool and help enrich the web. Instead you retreat back into your own narrow little view of the web and do what you want to do."

Ian Lurie, same day, says, "The new tool is painfully slow. The old tool showed a structured data result for the URL tested above. It provided useful feedback and supported industry-wide standardisation. The truth is you're replacing a great structured data tool for an inferior Google specific one." And again, if you remember, the reaction from Google was, "Oh, okay. Well, we will keep the structured data testing tool running. We'll move it over to schema.org and we'll continue also keeping our rich results testing tool live." So for those that maybe haven't used it before, the rich results testing tool only looks at the small set of schema that will trigger Google to provide a special SERP result, and it gives quite limited feedback on those. Whereas the older schema markup validator, or structured data testing tool as it used to be called, will check loads of different types of schema, and it will give you feedback where there are errors and where there are warnings, because debugging schema can actually get really complicated.

Now, as we saw and as we read in that announcement, the structured data testing tool just goes to this landing page where it says "test your structured data", and there is a link off to the rich results test tool and to the old schema markup validator, which is now on schema.org. And this has, again, kicked up a little bit of discussion because Google has been accused of using dark design patterns here to get people to just use their rich results test tool. If you don't know what a dark design pattern is, it's basically when you make your user interface trick people into doing things they might not otherwise do. You're trying to take their choice away from them and steer them down a path.

What they've done here is quite interesting. We've got two buttons: one goes to the rich results test tool, and one goes to the schema markup validator tool. I put a screenshot and all the links, by the way, in the show notes. You can get them at search.withcandour.co.uk, so if you're on your desktop or you've got your phone, you can have a look at what I'm talking about. But basically the button to the rich results test is a big blue button that looks like a button, and they've made the schema markup validator tool just a little outline for a button, and it's white, the same color as the page. So if you glance at this, it definitely looks like the only place to navigate to is the rich results test tool. They've also been quite selective with the wording; they just say the rich results test tool is the official Google tool and the schema markup one is a generic tool, which is factually correct, but they're definitely trying to push people towards the rich results test tool.

And here's the thing. I don't even know why. Because one of the interesting things that we've covered before, again, and one of the things I've put in unsolicited SEO tips, is that Google uses "non-supported" schema types, non-supported in quotes there in that they don't generate rich results and they're not necessarily documented as supported, but nonetheless, Google uses non-supported schema types to better understand a page. How do we know this? Well, again, we'll put a link in the show notes, search.withcandour.co.uk. We've got John Mueller on a webmaster hangout literally saying this. And obviously that steers lots of SEOs and developers towards adding it, and anyway, schema has got a lot of uses. Of course we're going to do that if we're told there are benefits to helping search engines understand the page better.

And again, that's primarily what schema is for. If search engines were perfect in their understanding of text and media on a page, we wouldn't need to use schema. The schema is not on the page to benefit users. It's there to plug a technological gap that search engines have got, because they can't always use their own heuristics to work out with confidence what data relates to what on a page, so we have to label it for them. So I'm not exactly sure what Google was trying to achieve here. Personally, I'll primarily still be using the schema markup validator. I'll just have to update my bookmarks, and I'm interested to see what the community discourse on this is over the next few months.
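
To make that "labelling" idea concrete, here's a rough, hypothetical sketch in Python that outputs JSON-LD for a schema.org type Google doesn't show a rich result for, Service being one example; all the business details below are placeholders rather than a recommendation of any particular markup:

```python
import json

# A schema.org type with no corresponding Google rich result -- the markup is
# there purely to label what the page content is. All values are placeholders.
service_markup = {
    "@context": "https://schema.org",
    "@type": "Service",
    "serviceType": "Technical SEO audit",
    "provider": {"@type": "Organization", "name": "Example Agency"},
    "areaServed": "GB",
}

# Print it as a JSON-LD block ready to place in the page's <head>.
print('<script type="application/ld+json">')
print(json.dumps(service_markup, indent=2))
print("</script>")
```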

Let's talk about authors, because, well, firstly, there's some news about authors, and secondly, it's a subject that really, really interests me personally around SEO. I think that Google and other search engines establishing who authors are, knowing more about them and knowing what they've written, is actually going to be a key milestone in how search engines rank pages, or documents, or whatever we want to refer to them as. Now, I've referenced this episode quite a few times, even recently, which is our episode 85, where I was joined by Lily Ray, and we were originally going to talk about E-A-T, which is expertise, authority, trust, a subject that, among other things, Lily Ray is particularly well known for through her research and talks in this area.

Lily was kind enough, though, to go on a wild speculation run with me as we went through various Google patents and talked about possible ranking futures and what Google may or may not be looking at, with of course the caveat that we all know that just because a patent exists doesn't mean it's implemented in Google, but it's interesting to see what they're filing. Two of the things we spoke about were patents around identifying specific people or specific writers by their writing style. So that's looking at text and being able to use machine learning to classify, "Okay, this is this writer and this is this writer, because we can tell by the style they write in." And similarly with audio, like now with the podcast, Google being able to identify individuals by their voiceprint.

And one thing we spoke about here was that Google, as we know, and a lot of search engines lean on this link graph, which looks at specific domains or URLs to try and get an understanding of how authoritative they are, how popular they are, how relevant they are. And that doesn't necessarily reflect how we would look at it as humans. So for instance, if you knew somebody that you trusted for a specific area, I don't know, say a famous scientist, and you wanted to read their opinion on something, it wouldn't matter to you particularly where that was published, as long as you knew and trusted that author, or knew that person was an expert in their subject.

And that's something search engines have always struggled to do. They've had to lean into the logic of, well, a good website should only publish good articles by trusted people. We can't necessarily say that about any brand new website that we've got no history for; it's safer to assume that they won't have this same level of quality. And we know one way they measure that is through this link graph. So the piece of news, which I guess I can get to straight away, which is fairly small on its own but I think points us in this direction, is that Google has added a new recommended field to their author schema... well, it's actually in their Article schema, and it's about identifying the author of that page. So again, at search.withcandour.co.uk there's a link to the documentation on Google Search Central.

But what this is doing is... we have a list of recommended properties in... well, recommended and required properties when we're defining our article schema. And as many of you will know, the author name is one of the required properties, and a new recommended property is author.url. The description of author.url is a link to a web page that uniquely identifies the author of the article, for example, the author's social media page, an about me page, or a bio page. So this is a method Google has to identify individual authors that are writing across various different websites, which, hopefully you can see, gets those ducks in a row very clearly now, in that this is literally what we were just talking about.
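
As a rough illustration of where that property sits, here's a minimal, hypothetical Article markup built with Python; the headline, date, author name and URL are all placeholders, and the shape follows the documentation linked above (author.name required, author.url recommended):

```python
import json

# Article structured data: author.name is required, and the new recommended
# author.url points at a page that uniquely identifies the author.
# All names, dates and URLs below are placeholders.
article_markup = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "An example headline",
    "datePublished": "2021-08-13",
    "author": {
        "@type": "Person",
        "name": "Jane Example",
        "url": "https://example.com/authors/jane-example",
    },
}

print('<script type="application/ld+json">')
print(json.dumps(article_markup, indent=2))
print("</script>")
```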

But hang on a minute: for any of you that have been doing SEO for a few years, you may remember that Google used to have a similar rel="author" property where you could identify who the author of an article was, and this was fairly promptly abandoned by Google. And this leads me to the interesting question of... as with many new features for SEO, how will this be abused? With author.url, if this is a simple way to say, "Okay, well this is the author name and this is their social media page, so this is who's written the article," and assuming that gives any authority or weight or ranking equity, magic dust, whatever you'd like to call it, to the specific article, the question is of course, what on earth stops me just picking any expert in my area and marking up my articles to say that they wrote them? I'm going to be interested to see how this plays out. But author.url is now live. It's a recommended property. Links at search.withcandour.co.uk if you want to look at the docs.

And while we're at the midpoint of the show, I would like to introduce our sponsor Wix, who has this update for you: URL customisation on Wix is now available on product pages. You can now customise URL path prefixes, or even create a flat URL structure if that floats your boat. Plus, Wix automatically takes care of creating 301 redirects for all impacted URLs. Full rollout coming soon. Also fresh off the press: bot log reports. Get an easy understanding of how bots are crawling your site, without any complicated setup, right inside of Wix. There's so much more you can do with Wix. You can now add dynamic structured data, upload redirects in bulk (including error notifications and warnings), and fully customise meta tags and the robots.txt file. You can get instant indexing of your homepage on Google, while a direct partnership with Google My Business lets you manage new and existing business listings right from the Wix dashboard. Visit wix.com/seo to learn more.

Before I talk about the Core Web Vitals Tech Report, I do want to shine a little bit of a light on something else which I am very late to the party on. So apologies if you already know about this; I'm hoping that because I didn't know about it, and I spend half my life on the internet, some of you may not have heard of it either. So on the Twitter account of Rick Viscomi, who describes himself as a purveyor of fine web transparency at Google, and who was also tweeting about this Core Web Vitals Tech Report, I discovered he is kicking off the 2021 Web Almanac state of the web report, or at least he was on April the 27th when he originally tweeted it. I hadn't actually seen this before, somehow. This had totally gone past me.

It's a massive project on GitHub that is, as it says, looking at the state of the web. So they're looking for authors, reviewers, analysts, and editors. And just listen to this lineup of stuff they're covering. They're looking at page content, and as part of page content, they're going to be looking at CSS, JavaScript, markup, structured data, fonts, media, WebAssembly, and third parties. They cover user experience, which includes SEO, accessibility, localisation, performance, privacy, security, mobile web, capabilities, and PWAs. They're looking at content publishing, so content management systems, e-commerce and Jamstack. And lastly, content distribution, so they are going to be focusing on page weight, resource hints, CDNs, compression, caching, and HTTP. It's a huge, huge, huge project. And despite it actually being tweeted about way back in April, I had a look and they are still looking for help, specifically in the areas of CDNs, so content distribution networks, accessibility, and JavaScript for this 2021 report. I think it's specifically analysts they're looking for.

So if that's your thing and you know about that area, you can find the link to the GitHub project at search.withcandour.co.uk and you can get involved there. I just found that really interesting, and I'm really looking forward to going through that report, probably on one of these podcasts, where we'll focus on some of the areas that we are particularly interested in. So obviously SEO is one, but lots of the things I just read out, especially things like page speed and CDNs, are important and have an impact on SEO. But that's not what I want to talk about right now on this part of the podcast.

What I would like to talk about is this Core Web Vitals Technology Report that Rick has put together. And this is using the raw data from the HTTP Archive, which is on BigQuery. So this is terabytes of data smushed together with... I don't know if smushed is the right word, carefully brought together with the Chrome User Experience Report, the CrUX data. So that's the data from real users, and it's where we get our Core Web Vitals scores from in Google Search Console: that aggregation of how people's browsers are actually fetching and rendering the sites that they're visiting.

So what have we got from this? We've got a really cool Data Studio report that allows us to choose a date range, choose a client, like desktop or mobile, and a technology. For instance, he's been focusing on Shopify, Squarespace, and Wix, and looking at how these sites are performing in terms of Core Web Vitals. It's given as the percentage of sites on these platforms that score good on Core Web Vitals. And in that default report for Shopify, Squarespace, and Wix, it's really interesting and telling to see that around February 2021 all of these platforms increased their scores on Core Web Vitals, as I'm sure everyone was trying to work towards the May-June launch as a ranking factor.
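
For context, "good" here follows the standard Core Web Vitals thresholds, assessed at the 75th percentile of real-user CrUX data: Largest Contentful Paint of 2.5 seconds or less, First Input Delay of 100 milliseconds or less, and Cumulative Layout Shift of 0.1 or less. Here's a tiny sketch of that check, with made-up example figures:

```python
# Core Web Vitals "good" thresholds, assessed at the 75th percentile of
# real-user (CrUX) data: LCP <= 2.5 s, FID <= 100 ms, CLS <= 0.1.
def passes_core_web_vitals(lcp_seconds: float, fid_ms: float, cls: float) -> bool:
    return lcp_seconds <= 2.5 and fid_ms <= 100 and cls <= 0.1

# Made-up 75th-percentile figures for two example origins.
print(passes_core_web_vitals(lcp_seconds=2.1, fid_ms=80, cls=0.05))  # True
print(passes_core_web_vitals(lcp_seconds=3.4, fid_ms=80, cls=0.05))  # False -- LCP too slow
```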

And as you may know, we've spoken about it before. What I find interesting about this is that the one with the runaway best improvement here is Wix, by far. I mean, they've moved from sub 5%, which is truly awful, to around 40%, which puts them within five percentage points of Shopify and of Squarespace. So essentially what that data shows is that Shopify, Squarespace, and Wix are pretty much neck and neck. Now, I would again give you the data caveat that you have to think about the type of people that are using these platforms. So we spoke, when we had Mordy on the podcast, who was the search liaison for Wix, about a report where they were comparing... or not, it was, I believe, an Ahrefs report that was comparing how well WordPress sites ranked versus Wix sites, and they used this quantitative method to pretty much see what the search visibility was across these two platforms, and concluded that WordPress was way more visible, therefore suggesting, "Hey, WordPress might be better for SEO."

And in fact, that isn't necessarily true. What it's showing is that perhaps more people with higher budgets, or people with better SEO teams, are using WordPress. Wix, as we know, traditionally has been marketed to, or has at least been used by, very mom-and-pop style home stores, very small websites, very unlikely to have SEO expertise or maybe even know anything about SEO, especially over the last few years. Whereas WordPress has long been adopted, especially through plugins like Yoast; it's very well known in the SEO community and in the web development community. And it's a little bit more hands-on, which, as we know with various CMSs, has its pros and cons, but it does therefore attract more experienced people.

So just a slight caveat: just because we're seeing specific platforms come in higher or lower doesn't necessarily mean that the technology itself is better or worse in terms of Core Web Vitals. Interestingly, as I said, you can pull in other technologies here, so I pulled in the content management system Statamic versus WordPress. Statamic, which not many people have heard of, is another content management system that by default is flat file. So while you can use it with databases, it will normally generate flat file pages, whereas WordPress, as we know, by default runs on MySQL. And I just wanted to see how these two platforms compared.

Interestingly, WordPress actually finished way below Shopify, Squarespace, and Wix. So Shopify, Squarespace, and Wix were all scoring, as I said, between around 35 and 45%. WordPress came in at around 25%, so about 25% of those sites were performing good on Core Web Vitals. And Statamic goes way up to about 56, 57%, which is incredibly high. I tweeted a screenshot of this and tagged in Statamic, and their reply interested me because they immediately said, "Yes, we've got a community of thoughtful developers." So immediately they were acknowledging the fact that a lot of this is actually down to how it's implemented.

But it's a super interesting set of data. A huge amount of data has gone into this, and I think it's really interesting to see which platforms are progressing as well, and maybe which ones aren't, because that is one thing you can clearly see from the data: regardless of where they sit in absolute terms, which maybe reflects how developers are working with them and implementing them, if you see a move such as Wix's, where they've moved from 5% to 35%, or even Shopify, which has moved from 25% to about 44%, that shows that there have been definite platform-level improvements to push this up. Again, we'll link to it at search.withcandour.co.uk. Hugely interesting data set.

As most of you will know, a lot of the recent episodes we've had have been recordings of our SEO for e-commerce series. If you haven't heard of it, our SEO for e-commerce series is something we do on LinkedIn Live, where we hook up with Quickfire Digital, who are a specialist Shopify build agency, and essentially we talk about various topics of SEO that particularly impact e-commerce sites. It's all run on LinkedIn Live, so we actually do live Q&A as we go along as well, and each one of these is an hour long. So if you're interested in that, the next episode is going to be on Wednesday, the 25th of August, running at 9:30 AM British time. Funnily enough, we're going to be talking about page speed and experience, so we're going to be diving into what I've just been talking about: Core Web Vitals, the difference between performance and page speed, what's important, the impact it's having now, the impact it might have, and answering any specific questions you've got, if I can, during that.

So if you'd like to join in, one thing you can do is follow me on LinkedIn. I'm Mark Williams-Cook. Connect with me and you'll see that event coming up in my events list, and you can subscribe to get an alert for when we go live. Or if you just pop along on Wednesday morning and log in to LinkedIn, you'll get a notification as soon as we go live. I'd be absolutely honored if you'd join us; I really love hearing your questions. The last couple of episodes we've had around a hundred people in the audience, so there's no shortage of questions, but it's really great if you do join us. Apart from that, I will be back as usual in one week's time, which will be Monday the 23rd of August, for our next episode. Really hope you're enjoying the podcast. If so, subscribe, tell a friend, all that lovely stuff, and I hope you have a brilliant week.
