Candour

Episode 125: Mass Google title rewrites and passage indexing deep-dive

What's in this episode?

In this episode, you will hear Mark Williams-Cook talking about:

Google mass title rewrites: Google has been changing the titles shown for lots of websites - how are they doing this and why do some of the results look so bad?

Passage indexing: Search tweak or breakthrough that will change how we do SEO?

Show notes

Passage indexing is likely more important than you think https://www.slideshare.net/DawnFitton/passage-indexing-is-likely-more-important-than-you-think

Episode 66 - SEO, information retrieval and foraging with Dawn Anderson https://withcandour.co.uk/blog/episode-66-seo-information-retrieval-and-foraging-with-dawn-anderson

Ep33 BERT https://withcandour.co.uk/blog/episode-33-bert-lead-form-extensions-and-bumper-video-beta

Summary: Brodie Clark https://brodieclark.com/google-title-tag-update/

Lily Ray tweet https://twitter.com/lilyraynyc/status/1428370833492389891

Google Colab Notebook https://colab.research.google.com/drive/1eD_t2p6f_KpmdhGj281eNOB1iOFhRcg2

Transcription

MC: Welcome to Episode 125 of the Search with Candour podcast recorded on Sunday, the 22nd of August 2021. My name is Mark Williams-Cook, and today, we're going to be talking about the big Google title rewrite. Google has decided that it knows best and will be rewriting your titles. Lots of people have noticed. So, we'll be talking about how Google is selecting these new titles and why some of them are so stunningly shit. We're also going to take a deep dive into passage indexing from a fantastic deck by Dawn Anderson, a real specialist in information retrieval, and she'll be talking about why passage indexing is likely more important than you think.

Before any of that, I want to tell you, this podcast is very kindly sponsored by Sitebulb. Sitebulb is a desktop-based SEO auditing tool for Windows and Mac, something I've personally used for quite a few years, and we've used at the agency as well. It goes one step further than a lot of SEO auditing tools in that it doesn't just crawl your site and dump you out a load of data, they actually have an analysis engine which looks at that data and tries to actually diagnose SEO problems based on that configuration. So, it catches some things that are otherwise hard to spot when you're manually looking at that data. But of course, as I'm sure many of you are thinking, no, you can't replace SEOs with any tools, at least yet, and Sitebulb agrees with you as well. They've got a YouTube channel where they've got experts like Kristina Azarenko and Natalie Arney with live videos on CMS migrations, internal link audits, exactly how to use Sitebulb to do your job as well as you can. You can't ask more than that. They're giving you the tools, they're showing you how to use them, and they're letting professionals do their job. You've got a special offer for listening to Search with Candour. Go to sitebulb.com/SWC for a 60 day extended trial, no credit card required. That's sitebulb.com/SWC.

Passage Indexing is Likely More Important Than You Think was the name of a talk given by Dawn Anderson at Engage Marketing Conference, which was held online. If you haven't heard of Dawn before, she is a veteran SEO. She's appeared on this podcast as well back in Episode 66, which was entitled SEO Information Retrieval and Foraging, and this is really the area that Dawn has been specialising in, which is information retrieval, which is, as you can guess, really central to all of search and SEO, and Dawn describes her talk as whilst passage indexing may seem like a small tweak to search ranking, it is potentially much more symptomatic of the beginning of a fundamental shift in the way that search engines understand unstructured content, determine relevance in natural language, and rank efficiently and effectively.

And the cool thing about Dawn's talk, which is the opposite to talks that I do, is that across the 249 slides she's provided, there's actually quite a lot of text on there. So you've probably heard people saying when you do a talk, you shouldn't just be reading off slides, which is fine, and I get that's to help people with their presentation style. It's not so helpful, though, if I do a talk and then someone asks for the slides afterwards, and essentially, they get 15 pages of random pictures and memes. But Dawn's really nicely put these slides together so even if you didn't see the talk, you can flick through them, and they make a lot of sense because she's highlighted the key points on there. As I said, there's 249 slides. So, I'll put a link to that slide deck in our show notes, which you can get at search.withcandour.co.uk. But I'm going to just summarise some of the key points in her deck to give you a taste for it, and hopefully encourage you to go and look at it.

So, as you've got from the blurb and the title, Dawn's been talking about passage ranking or passage indexing. We're using those terms interchangeably. Google kind of changed how they referred to it because of some original misunderstanding about what they were trying to achieve. Dawn generally refers to it as what she considers correct, which is passage indexing. So, this is what this is. It's a deep dive into passage indexing. If you missed those announcements, it's been live in the US since February 2021, and we're told that it's going to impact around about 7% of queries globally eventually. So it's something, again, that's rolling out. Something we've seen with lots of algorithm changes and tweaks is that they start in the US in English, and then roll out from there.

And what Dawn is suggesting is that passage indexing is actually just the tip of the iceberg, and Dawn says being aware of how important passage research in information retrieval is might help your SEO efforts because it's part of a bigger breakthrough. It's fair to say this, because Google themselves, I think it was in October 2020 when they announced passage indexing, actually described it as a breakthrough in ranking. So, I don't think that's an exaggeration on Dawn's behalf. It's something maybe as a community we've overlooked a little bit because we haven't been able to extrapolate exactly what the effects are and what they might be.

Now, the caveat in her deck is it's not something like many things that you can directly optimise for. So, if someone tells you that they're going to optimise your passages, that's maybe a red flag to run for the hills. What it is, is a new kind of clarifying algorithm. Now, specifically, what we're talking about basically is natural language understanding and the nuance of this. So, if someone searches for ‘bank’, does that mean they're after a bank in the financial sense, or are they talking about a riverbank, for instance. That's what we're talking about when we start getting onto this topic of natural language and understanding.

It's also an algorithm that's primarily aimed at long-form content. So, Dawn goes into quite a bit of detail about how she doesn't think it's really going to impact things like e-commerce pages, and we have discussed at length when we covered BERT on the podcast. That was way back in episode 33. Again, I'll link to it in the show notes, and I'm glad I got it right because we were predicting the impact of BERT and of these kinds of algorithm updates to maybe help what I would class as kind of unoptimised sites, where we have a lot of content quite possibly written by an expert in their field, but they do not know and they do not care about SEO and maybe they don't even kind of they're not thinking about even the user that much. They're just writing what they know about. And from a search point of view, this can be a challenge where you've got these huge documents that don't really have any kind of structure. There's no internal linking there, and it's hard for search engines to pick out. Dawn describes them as the diamonds of really good knowledge that are buried in this landscape of text.

And we've seen when we did speak about BERT how this information retrieval process has changed. So, for instance, stop words such as ‘and’, ‘the’, ‘of’ just used to be stripped out when information retrieval was happening and this analysis was happening. But now, Dawn has got a really nice way of describing them, which is contextual glue, and we gave an example of this, again, when we covered the BERT algorithm update, with the Brazil and US visa example, when we were looking at the quality of search results for people searching for things like ‘Brazil national wanting visa to go to US’. That's quite a complicated search term and doesn't make sense without what were traditionally treated as stop words because as a human you're naturally understanding, okay, this is someone in Brazil, and they're going to the US so that's the way around I need to kind of answer this query and that actually affects a lot of searches.
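To make the stop word point concrete, here is a minimal sketch of what old-style stripping loses. The stop word list and both queries are hand-picked for illustration (the second query is simply the first with the countries swapped), and nothing here reflects how Google or BERT actually process text.

```python
# Illustrative only: a tiny hand-picked stop word list, not any search engine's real list.
STOP_WORDS = {"a", "an", "and", "for", "from", "of", "the", "to"}


def strip_stop_words(query: str) -> list[str]:
    """Old-style preprocessing: lowercase, split on whitespace, drop stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


to_us = "Brazil national wanting visa to go to US"
to_brazil = "US national wanting visa to go to Brazil"

# Treated as bags of keywords with stop words removed, the two opposite
# queries contain exactly the same terms and cannot be told apart.
same_bag = sorted(strip_stop_words(to_us)) == sorted(strip_stop_words(to_brazil))
print(same_bag)

# Kept in order with "to" intact, they are clearly different requests.
same_sequence = to_us.lower().split() == to_brazil.lower().split()
print(same_sequence)
```

The first comparison treats each query as an unordered set of keywords, which is roughly what stripping stop words and matching terms amounts to. The second keeps the full word sequence, which is the information a contextual model has available to work out the direction of travel.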

So, Dawn goes into quite a lot of detail about this and about again, what I think is her specialist topic, which is information retrieval. I'm sure even if you are a super technical SEO that you will learn something from looking through these slides.

But what I really like is Dawn starts to talk about these multiple stages of ranking. So, this is this retrieval stage, and then a reranking stage. Now, I'm not going to do this justice because I'm going to very quickly summarise what she's spoken about here, which is that this first retrieval stage is almost like a rough grab of documents that might be relevant to a particular search query, and this reranking stage is the refinements. So, it's finding the best out of those documents, and that's a very difficult thing to do because firstly; the refinement stage, that second stage, reranking stage is expensive. And by expensive, we mean in a lot of different ways in terms of time, in terms of computation power. They're using a lot of machine learning algorithms to really try and extract what is the best of the best out of these documents and how do we then rank them. And that's a lot more in-depth work for a computer to do, rather than just doing this rough grab of okay, I think these are related.
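The two stages described above can be sketched roughly as follows. This is a toy under stated assumptions: the three documents, the query, and both scoring functions are made up for illustration, and the bigram overlap used in the second stage is only a stand-in for the far more expensive machine-learned rerankers a real search engine would use.

```python
from collections import Counter

# Hypothetical mini-corpus, invented for this sketch.
DOCS = {
    "d1": "us national visa for brazil trips explained",
    "d2": "visa for us travel as a brazil national",
    "d3": "recipe for brazilian cheese bread",
}


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Stage 1: cheap rough grab. Score each doc by how many query terms it shares."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in docs.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def bigrams(text: str) -> Counter:
    """Count adjacent word pairs, so word order starts to matter."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def rerank(query: str, candidates: list[str], docs: dict[str, str]) -> list[str]:
    """Stage 2: costlier scoring, run only on the shortlist from stage 1."""
    query_bigrams = bigrams(query)

    def score(doc_id: str) -> int:
        return sum((query_bigrams & bigrams(docs[doc_id])).values())

    return sorted(candidates, key=lambda doc_id: (-score(doc_id), doc_id))


query = "brazil national visa for us"
shortlist = retrieve(query, DOCS)
final = rerank(query, shortlist, DOCS)
print(shortlist, final)
```

In this toy, both visa documents share the same five query words, so the cheap first stage cannot separate them. The second stage looks at word order and moves the document that matches the direction of the query to the top. The expensive step only ever sees the shortlist, which is the point of splitting the work in two.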

And there's these layers of difficulty that then Dawn goes on to talk about, which are really, really interesting. So, even that model of retrieval and reranking is super basic compared to what search engines like Google are actually trying to do and are trying to achieve because we haven't thought about things like the difficulty of ranking in rich SERP experiences where there is maps, and there's videos, and there's images, and there's PAAs and all this kind of stuff and covering a range of intent spaces.

For instance, while we earlier gave an example of ‘bank’, and we said that could be a financial bank or a riverbank, there's other types of searches, such as a search for ‘Harry Potter’, which is an example I've seen used a lot, and an example Dawn uses. When people search for Harry Potter, do they mean the books, do they mean the films, do they mean the Harry Potter tour, the experience? There is a lot of things that could mean, and this is something, again, that we've talked about quite a few times on the podcast, which are these ambiguous queries, which is why the search results do need to be diverse. If you have a search query like Harry Potter, there isn't necessarily a correct answer, in that not everybody searching for that term is looking for the books; some people might be looking for the films or the tour or some specific branch from that search term because it isn't very specific.

So, there isn't really a correct answer, but what the search results need to try and do is match the probable audience needs. So, they're balancing out okay, we're showing these diverse search results. But we know from looking at all this data, when people search for Harry Potter, we think maybe 80% of those searches are around the films, for instance. They might weight the kind of diversity of results they're showing by that. And again, we've spoken about research done by Semrush on ‘people also ask’ for ambiguous queries and seeing how Google tries to actually sometimes prompt the user when it does seem to be in quite a jam to refine that query for them. They give them the nudge with PAAs. So, the thing we were finding was that PAAs, that's the People Also Ask boxes, tended to appear higher up, or sometimes as the first thing in the search results, for particularly ambiguous queries.
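As a rough illustration of weighting diversity by probable intent, the sketch below splits ten result positions across intents in proportion to their estimated share. The 80% for films is the hypothetical figure from the discussion above; the split between books and the tour, the ten slots, and the rounding rule are all assumptions made for the example, not anything Google has described.

```python
def allocate_slots(intent_share: dict[str, int], slots: int = 10) -> dict[str, int]:
    """Split result positions across intents in proportion to their share.

    Uses largest-remainder rounding so the counts always add up to `slots`.
    """
    total = sum(intent_share.values())
    exact = {intent: share * slots / total for intent, share in intent_share.items()}
    allocation = {intent: int(value) for intent, value in exact.items()}
    leftover = slots - sum(allocation.values())
    # Hand spare slots to the intents that lost the most when rounding down.
    by_remainder = sorted(exact, key=lambda i: (-(exact[i] - allocation[i]), i))
    for intent in by_remainder[:leftover]:
        allocation[intent] += 1
    return allocation


# Hypothetical intent shares (percentages) for the query "harry potter".
shares = {"films": 80, "books": 12, "tour": 8}
result = allocate_slots(shares)
print(result)
```

With these made-up shares the films take eight positions and the books and the tour get one each, so the dominant intent is served without the minority intents disappearing from the page entirely.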

So, everything I've spoken about here is actually only in the first half of Dawn's presentation. So, I really recommend you go and take a look at it. I'll summarise with this, which came a little bit later in the deck, but I think it's a nice takeaway, which is you will be judged, and that's your SEO efforts, your content efforts, on "the whole site pie." And what Dawn means by that is since meeting the needs of underspecified queries requires broad coverage, you will be judged on how many of the specific intents of an underspecified query you meet. And I think that's a really good thing to think about when you're doing keyword research, when you're putting your site together, your categories, your subcategories, whatever it is when you're building these documents, to keep circling back to that, to use things like the People Also Ask information to try and work out what that broad intent is, and what's relevant to you. As I said, I'll link to Dawn's deck, search.withcandour.co.uk. I highly recommend you go and check it out.

We're at the midpoint in the show. So, I want to give you an update from our podcast sponsor, Wix. You can now customise your structured data markup on Wix sites even more than before. Here are some of the new features brought to you by the Wix SEO team: add multiple markups to pages; create the perfect dynamic structured data markup and apply it to all pages of the same type by adding custom markups from your favorite schema generator tools, or modify templates by choosing from an extensive list of variables; easily switch between article subtype presets in blog posts; and add a quick link for structured data validation in Google's Rich Results Test tool. Plus, all this is on top of the default settings, which automatically add markup to dynamic pages like product, event, forum posts, and more.

There's just so much more you can do with Wix, from understanding how bots are crawling your site with built-in bot log reports to customisable URL prefixes and flat URL structures on all product and blog pages. You can also get instant indexing of your homepage on Google, while a direct partnership with Google My Business lets you manage new and existing business listings right from the Wix dashboard. Visit wix.com/SEO to learn more.

Biden is no longer the president of the United States. That was according to a title tag on the White House website that Google decided to change this last week. It renamed him as a vice president. Obviously, Joe Biden is actually the president and Google has since fixed this, but it's a crowning example of an algorithm tweak, change, whatever you like to call it, that they've rolled out that has changed the way Google is replacing website titles. So, this means titles that are provided by content writers, editors, journalists, webmasters, paid content people, whoever, bloggers that are carefully thinking about the title they want their users to see, Google is saying, "Nah, we've got it covered," and is just mass changing it.

Now, there has been a chorus of it's not new people. Yes, Google has changed all kinds of stuff around snippets and titles before, but this is kind of unprecedented, and I can say this because I first became aware of it when I saw people in a dev chat on Discord asking what they'd done wrong on the new CMS they're working on because Google seems to have screwed up their titles. They were not aware that this change has actually just been rolled out across loads of different sites.

Now, as usual, Brodie Clark has done an excellent write-up of this title tag update and given us a little bit of history and pulled together some data from Rank Ranger, which has shown one of the trends we see and what Google appears to have done, or at least an effect of what Google has done, which is that the average length of titles seems to have dropped somewhat, and Brodie's given some really nice examples. So, for an Australian site called Interior Secrets, the title tag in Google on August the 17th was "wooden furniture, Melbourne Sydney, Perth, Adelaide...", as the title is obviously longer, and Google has rewritten this to "wooden furniture, Melbourne, Interior Secrets". So, something a little bit shorter that they think is more relevant.

Now, we've seen lots of examples, as I said of this happening, and the initial scramble was trying to work out how on earth Google is deciding what to use as a title. The initial research done by the SEO community suggested Google might be using header tags as replacements for titles, but as it was one of our own development team who spotted the Joe Biden title change that changed him from president to vice president, when we looked at that, we saw that this was actually just something in content, and as more and more of these examples surface, people were finding that there wasn't particularly any one element that Google was using to replace title. So, it does seem that it's just using quite a general algorithm to decide what might be a better title.

I think the most interesting example here was found by Lily Ray, which was for a website called NewsOne, and the title that Google was showing in their search results for an article was Tim Scott Reportedly Eyeing a Run for President. And interestingly, when you look at the source code for this page, so we looked at the document object model, raw HTML, that string of text is not there anywhere. It's not as a header. It's not in the page content. It's not anywhere on that page. And some further investigation showed that that text that Google has chosen to use as a title actually comes from the anchor text of an internal link somewhere else on the site. I think that's important to let that sink in that Google has changed a page title to something that isn't even on that page, just from the anchor text from somewhere else. And there's loads of examples of Google doing this really badly, and I'm sure there are also lots of examples that we don't even notice where Google is doing a great job, but there are some quite high-profile clangers here. Apart from getting the job title of one of the most influential people on the planet wrong, there are just some examples where it's just butchering page titles.

So, Lily Ray wrote this tweet saying, "This must be really frustrating for the writers, editors and SEOs pouring time, energy, and resources into creating perfectly optimised titles and headlines. The title change is feeling like a major case of if it ain't broke, don't fix it." And she's linked to a New York Times article where the title provided by the author is There's A Name for the Blah You're Feeling: It's Called Languishing, and Google has rewritten this as re Feeling: It's Called Languishing - The New York Times, which obviously makes no sense whatsoever. And Google did respond to this.

So, Danny Sullivan, the search liaison for Google had this to say, "Suffice to say, we've heard the feedback and are looking into all this. That said, it was never the case that writing the 'perfect title' guaranteed that title would be used. We have long used more than title tags for creating page titles. That's not some new change. It's kind of surprising that I keep seeing so many SEOs who seem to believe title tags were always exactly used as titles, not being the case for as long as I can remember, explained on our help page." And there's a link he's given there. He goes on to say, "I'd also like to point out it's easy to spot when titles seem weird, but there are definitely cases where site owners have terrible title tags and longstanding systems of generating titles from more than that improve. But no one, of course, tends to tweet about that... All that said again, the feedback has been heard and we're looking at some of the examples to improve our systems."

And Joost, who you might know from the WordPress plugin Yoast SEO, says, "Not sure if it's something that has been discussed already, but I'd really like to see a way to opt out of title rewriting entirely. As we've seen now with the Biden example, it can lead to disinformation and websites should have the ability to opt out." Danny Sullivan replies, "Yes, the 'I really really mean it' tag, which I used to think should exist. Then someone makes that the default in an SEO tool or it's an option in a tool and someone's told it should get switched on, and every page is home or untitled or blank. Not everyone is an SEO or talks within SEO or has access to an SEO or even reads the basics of SEO. They're busy running a business. I know one site that still uses images for all their texts. So, we focus on how to do the best for all and help them avoid mistakes."

Interesting position. I'm not sure quite where I stand on that yet. I mean, Google does provide us with specific ways to do things like opt out of snippets. While snippets obviously are an additional feature or not kind of a core HTML thing, yeah, I can't see them, well, they're certainly not getting it right at the moment. And as Lily goes to raise later, there could actually be legal implications from this. There are lots of industries where you have to be very careful about what you say. So, if you've made a claim or a title tag is changed, that could potentially land you in hot water.

So, I'm interested to see how Google handles or steers itself around that. Again, I don't know if they can maybe use your money, your life type classification or algorithms to avoid title rewrites in those situations. But in my opinion, it's a complete shit show at the moment, and they're going to have to do some work on it because there is a lot of bad examples. I won't go through them all. It's kind of flogging a dead horse, but have a look on Twitter. There's loads of pretty funny ones that are out there.

Now, out of all this, something interesting, something else interesting I should say, popped up, which was Lee Foot, who is the director at Search Solved, released very quickly a Google Colab notebook that might be able to help you assess where you sit with this situation. So, what his Colab sheet does is it checks your top keywords in Search Console and it runs them through a Google custom search engine API. So, Google CSEs are free for a certain amount of queries. You can basically remake a basic Google, searching the whole web, and access it via API. So, it runs your top Search Console queries through one of these CSEs, and it shows you how title tags and meta descriptions are being displayed in live SERPs. Google custom search API allows for a hundred free searches per day. Paid accounts are only limited by budget. And John Mueller actually replied to Lee when he published this and said, "That's pretty neat," and then in brackets, "I don't know if the CSE uses the same logic for titles and snippets, but it's still neat."
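The general idea of comparing a page's own title tag with the title shown in search results can be sketched as below. This is not Lee Foot's notebook: the sample HTML, the function names, and the three-way classification are made up for illustration, and no network requests are made. The only real-world detail assumed is the Custom Search JSON API endpoint with its `key`, `cx` and `q` parameters, whose JSON response lists results under `items`, each with a `title` field.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class _TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(page_html: str) -> str:
    """Return the <title> text with whitespace collapsed."""
    parser = _TitleParser()
    parser.feed(page_html)
    return " ".join(parser.title.split())


def cse_request_url(api_key: str, engine_id: str, query: str) -> str:
    """Build (but do not send) a Custom Search JSON API request URL."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)


def classify_displayed_title(page_html: str, displayed_title: str) -> str:
    """Say where the title shown in search results appears to come from."""
    shown = " ".join(displayed_title.split()).casefold()
    if shown == extract_title(page_html).casefold():
        return "title tag"
    if shown in " ".join(page_html.split()).casefold():
        return "elsewhere on page"
    return "not on page"


# Hypothetical page, invented for this sketch.
SAMPLE_HTML = (
    "<html><head><title>Wooden Furniture Melbourne, Sydney, Perth, Adelaide"
    " | Example Shop</title></head><body><h1>Wooden Furniture</h1></body></html>"
)

print(classify_displayed_title(SAMPLE_HTML, "Wooden Furniture"))
print(classify_displayed_title(SAMPLE_HTML, "Handmade Oak Tables"))
```

A result of "not on page" is the interesting case, since it matches the NewsOne example where the displayed title came from anchor text elsewhere on the site. The substring check on raw HTML is deliberately crude; a real audit would want to compare against rendered text and headings separately.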

So, I'll be interested to see if that does actually work. It'll probably need quite a few people to use it and just compare it to what we're seeing in the real world. Google custom search engines certainly don't include a lot of the search features that get added after core ranking that you will see in live sites. So, it would depend when this title rewrite is happening. The title rewrite, my guess, would be something they kind of tape on at the end when everything else is done. So, I would be surprised if it does work, but I thought I'd put it out there because I don't know that it doesn't work, and it would be good if everyone can test it and chip in.

Apart from that, I probably wouldn't go away and spend a lot of time trying to change or optimise title tags based on this because I think things are going to change. Google, I believe will be releasing some kind of rollback patch, fix, tweak, whatever you want to call it because I don't think the current situation is good for them. I don't think it's improved search quality, at least for the popular sites. I mean, the argument that Google was providing about helping people maybe that don't have access to SEOs and stuff, have lots of respect for that, is great, but in terms of the total amount of traffic kind of sloshing around the web, how much traffic is going to these big sites where things are very broken compared to okay, here's for every one big site we've broken, there's a thousand tiny sites with basically no traffic that have nicer title tags now. I don't know how that stacks up in the whole kind of search experience and expectation of the user, but we will see as always.

And to console anyone, I don't think this is going away. Google will get it to work eventually. So, it's going to be another thing that we're going to have to live with and work around and work with.

That's all I've got time for in this episode. I'll be back, of course, in one week's time on Monday the 30th of August. But if you'd like to have a chat before then, you can catch me on SEO for e-commerce on Wednesday the 25th, so in two days' time if you're listening on the Monday, at 9:30 AM British Summer Time, and we're going to be talking about page speed and experience on LinkedIn Live. So, if you want to find that, go to LinkedIn, type in Mark Williams-Cook, connect with me, and you'll get a notification when we go live on the Wednesday morning. Otherwise, I hope you come back next Monday, have a listen, and please, if you're enjoying the podcast, tell a friend about it. I hope you all have a great week.
