Google Know Your Data, Google Multisearch and Google's stance on AI-generated content

Jack: Welcome to Episode 14 of Season Two of the Search With Candour podcast, recorded on Wednesday the 13th of April, 2022. My name is Jack Chambers and I'm joined by Mr Mark Williams-Cook. Today we'll be talking about Google Know Your Data, Version Two of Hannah Rampton's Search Console Explorer, which is now live, updates from SISTRIX on trends and Primark's recent migration, Google Multisearch and Google's stance on AI-generated content.

Jack: Search With Candour is supported by SISTRIX the SEO's toolbox. Go to sistrix.com/swc if you want to check out some of the excellent free tools, such as SERPs snippet validation, on-page analysis, hreflang validation, page speed comparison, and tracking your site's visibility index at sistrix.com/swc for free SEO tools, sistrix.com/trends for TrendWatch, and sistrix.com/blog. And we'll actually be talking about a few of the latest blog posts and case studies later on in the show.

Mark: And…we're remote again.

Jack: Aren't we just? I don't know if you can tell, listeners, but we are not in the same room as we have been for all of Season Two, so far. Mark and I are separate, unfortunately. We are, we are separated.

Mark: This is the Search With Candour, COVID edition.

Mark: So I had a great time at BrightonSEO. It was brilliant, but I got COVID again. So we are having...

Jack: Was it about six weeks after you had COVID the first time? It's a pretty quick turnaround for you, wasn't it?

Mark: Little, little bit longer. Yeah, I had it in December. So, like three and a half months. I was kind of hoping I'd wing it, but apparently not. But it was worth it. Kelvin, I don't know if there's a bigger compliment you could have about your conference than, "I got COVID but yeah, it was kind of worth it." But I didn't expect us to be remote again in 2020 style, but here we are.

Jack: Yeah. Speaking of 2020 style, I think I have COVID, although I'm still testing negative, for the record. I originally had it in 2020, all the way back, over two years ago at this point. So yeah, we think both of us have it. A few people who went to BrightonSEO have been testing positive. I've seen it going around on Twitter as well. But I think everybody had a good enough time at BrightonSEO. Like you said, Mark, what's a bigger compliment to an events organiser than to say, "I got COVID but it was worth it"?

Mark: Yeah. So, I mean, what were your highlights for... Let's talk about BrightonSEO just briefly before we kick off.

Jack: Yeah. I briefly touched on it last week. Because I recorded the intro after we'd been, that was just me, home recording it as I was editing my main interview with Claire Kyle, which if you haven't checked out listeners, please do go back and listen to last week's episode, a fantastic interview with Claire. And funnily enough, Claire, as I said last week, was the first person we met at Brighton. Literally, as we walked in to check into our hotel room, there was Claire, sat in reception, Welsh cakes, waiting for us, ready to go.

Mark: I did love that, being greeted with Welsh cakes and seeing Claire.

I was actually surprised the amount of people that came to BrightonSEO, bearing gifts. I got a very nice new D20 from Jamie Indigo, so thank you very much for that, Jammer Volts on Twitter. Appreciate that. Yeah. And a couple of other kind of random things given, too, which was cool.

I got to meet the people from Keyword Insights. So Andy Chadwick, Suganthan, and Nina, absolute pleasure to meet them. We should probably get them on the podcast as well, because I've talked about their tool a few times, Keyword Insights, about primarily kind of keyword clustering, which is a big task in SEO, so it'd be really good to get them on.

Mark: Any talks you particularly liked?

Jack: Yeah, definitely. There were quite a few interesting things. One that really stood out to me was Jess Peck's one about how to build your own crawler and why you should build your own crawler. Basically talking about Python and if you've ever run into any issues with... some of the pre-built crawlers we all know and love here on Search With Candour. You already know Sitebulb from previously sponsoring the show. You know Screaming Frog. Everybody knows that kind of stuff. But maybe there's something you aren't quite getting from those crawlers that you think, actually, this would be really useful. And even just to get a better understanding of how these things work and how crawlers understand the sites that you're working with and stuff like that. It was a really interesting talk from Jess about how to actually build your own crawler using Python and going through that process and kind of... I'm really inspired to get stuck into Python a bit more and really have a look at that. And that was definitely one of my highlights from Day One, for sure.

Mark: Not surprised. Jess is a fantastic person. It's funny, actually, that's one of the... I did that many, many years ago through a tutorial in a Perl book, which was about building your own mini search engine, crawler, and index. And when I went through that tutorial and through that process, one thing that I realized I hadn't actually thought about too much is that so much of the HTML and code on the web is just broken.

And if you make a crawler that's very strict, along the lines of, okay, I'll only follow this HTML if it's correctly formatted, you're going to miss out huge chunks. And it really made me aware that to do a good job of actually crawling the web, your agent that's out there kind of has to be like, yeah, I see what you're trying to do there. I'll give you the benefit of the doubt and follow on. Actually, having said that, I spent more time this year talking to people than seeing talks. So while some of the talks were going on, I was going and meeting people and I met... I summarized it on Twitter, but I met a lot of really good people, especially ones that I've been speaking to quite regularly. I hadn't been to BrightonSEO for a few years, so there were quite a few I hadn't ever met in person, what with the pandemic as well.
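The point about forgiving parsers can be sketched in Python. This is a minimal illustration under assumptions, not code from Jess's talk or from the Perl tutorial: the `LinkCollector` class, the `extract_links` helper and the sample markup are all hypothetical. It uses the standard library's `html.parser`, which keeps going through unclosed tags, unquoted attributes and mixed-case tag names instead of rejecting the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, tolerating sloppy markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value.strip()))


def extract_links(html, base_url):
    """Return absolute URLs for every link found, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately broken markup: unclosed tags, an unquoted attribute value
# and an upper-case tag name. A strict XML parser would reject this page.
BROKEN_HTML = """
<html><body>
<p>Unclosed paragraph
<A HREF=/about>About</a>
<div><a href='contact.html'>Contact
<a href="https://example.com/blog">Blog</a>
</body>
"""

found_links = extract_links(BROKEN_HTML, "https://example.com/")
for link in found_links:
    print(link)
```

All three links are recovered even though the markup never closes its paragraph, div or second anchor, which is the "benefit of the doubt" behaviour a real crawler needs.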

One talk that did stand out for me was by a chap called Harry Sumner, who did a talk on forecasting and using Facebook's Prophet and Google's CausalImpact to take a more objective approach to what impact we think these changes will have. And then afterwards doing, as Harry put it, the "I told you so" of actually saying, this is the exact date when we made this change, and this is the result we've seen from that. And actually, I hadn't heard of those tools before. I'm obviously more focused, specialism-wise, on the technical SEO stuff and less so on forecasting. But I found that really, really interesting. Also enjoyed Azeem's talk. So Azeem Digital, who's got a podcast as well, did a really nice talk about considering channels and their importance. More strategic, but again, really, really enjoyed it. And I think I was saying to you, sometimes you see a talk about a subject maybe that you already know about and understand, but hearing someone else explain it in a different way makes you think about it differently.

Jack: Yeah, definitely. I had that conversation a lot with our Head of Marketing, Brendan, because I spent a lot of time hanging around with Brendan over the BrightonSEO period. We were comparing notes from different talks and making sure the four of us from Candour who were there the whole time went to as many different talks between us as possible, like, oh yeah, make sure you go to this one and I'll go to this one so we can compare notes later on. And Brendan and I had that very similar conversation of like, I kind of already knew that was a thing and my assumptions have been confirmed, but seeing it from a different perspective, from the speaker or from talking to other people as well, is really, really useful to actually be able to bounce ideas off and go “I'm not going crazy, that is a thing!” or “Oh yeah, I did think that, it turns out I was right!” Or maybe you're proven wrong in other cases. It's always good to be proven wrong as well, when you think you know something and it's like, actually, no, that makes way more sense. I should be doing that instead of this. A couple of other ones I really want to shout out as well, in a similar kind of vein. Steve from Conductor slash ContentKing, his talk on log file analysis was really interesting despite the very loud heckler during the talk.

Mark: Yeah, I heard about that. Dealt with very professionally, though.

Jack: Yeah. Steve handled it very well. I actually managed to catch up with Steve afterwards and complimented him on his heckling handling skills. So that was good.

Jack: But yeah, log file analysis is something I definitely want to get more into. I know loads of people are having problems with indexing stuff at the moment, and there's been a lot of issues going around the entire industry, pretty much, with very slow indexation and stuff like that. So having a better understanding of log file insights, is how he put it, so basically being able to have that kind of really technical stuff, be accessible to multiple departments across one company, which I think is something I kind of don't really think about. And I know something we've talked about at Candour a lot is having that interconnectivity between the development team and, for want of a better phrase, the sales team or the SEO team and the PPC team, having us all having the same access to the same data and things like log file analysis can be useful for people across a company, across departments, across clients, however, you want to word that.

And it was really interesting to hear from Steve about how to make that more digestible, basically not just throwing a CSV file at a bunch of people who don't know what log files look like. Here you go, have fun. But actually being able to process that and digest that. And that's something that ContentKing does, I know, for their premium users. You're able to access that and go through and present it in a more manageable way if you have managers who are less technical or clients who are less technical. I thought that was really interesting because log file analysis is something I've dipped my toes in a couple of times, but never really had the time to spend on it and share it with other people and go through it. So yeah, credit to Steve there, for sure.
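As a rough idea of what "digesting" a log file can mean, here is a minimal Python sketch that turns raw access log lines into a small crawl summary. It is not how ContentKing works; it assumes the server writes the common Combined Log Format, and the `summarise_bot_hits` helper, the sample lines and the IP addresses are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format, a common default for Apache and Nginx access logs.
# Assumption: your server uses this layout; adjust the pattern if it does not.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarise_bot_hits(lines, bot_token="Googlebot"):
    """Count hits per path and per status code for one crawler.

    Matching on the user-agent string alone is a simplification: user
    agents can be spoofed, so a real audit would also verify the IPs.
    """
    hits_by_path = Counter()
    hits_by_status = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not fit the expected format
        if bot_token.lower() not in match["agent"].lower():
            continue
        hits_by_path[match["path"]] += 1
        hits_by_status[match["status"]] += 1
    return hits_by_path, hits_by_status


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LINES = [
    '66.249.66.1 - - [13/Apr/2022:10:00:00 +0000] "GET /products/socks HTTP/1.1" '
    f'200 5120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [13/Apr/2022:10:00:05 +0000] "GET /products/socks HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '66.249.66.1 - - [13/Apr/2022:10:01:00 +0000] "GET /old-page HTTP/1.1" '
    f'404 312 "-" "{GOOGLEBOT_UA}"',
    '66.249.66.1 - - [13/Apr/2022:10:02:00 +0000] "GET /products/socks HTTP/1.1" '
    f'200 5120 "-" "{GOOGLEBOT_UA}"',
]

hits_by_path, hits_by_status = summarise_bot_hits(SAMPLE_LINES)
for path, count in hits_by_path.most_common():
    print(f"{path}: {count} crawler hits")
for status, count in sorted(hits_by_status.items()):
    print(f"HTTP {status}: {count} crawler hits")
```

A two-column summary like this (which URLs the crawler visits most, and how many of those visits hit errors) is much easier to hand to a less technical colleague than the raw file.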

Mark: I think it's certainly more important now we're losing kind of fidelity on certain analytics data and sites are getting more complex as well. So I was kind of thinking maybe there would be less of a need for technical SEO over the years because Google's getting smarter, which it is, but it seems that as a web community, we can create problems faster than they can be fixed. Like, Hey, we've built this new, super complicated JavaScript framework, which does all these really clever things and also makes it incredibly difficult for crawlers. So yeah, it's always a good thing to, as you say, be able to dip your toe into, and especially communicate that because it can get fairly complicated, fairly quickly.

Jack: Yeah, definitely. And last couple of highlights, Beth Barnham's talk on advanced schema implementation was really good as well. Again, schema is something I've kind of dipped my toes in a few times but never properly kind of gone through from square one to full implementation and all that kind of stuff. And Beth's way of presenting it was full of humour and making it, again, making it accessible to people who maybe know less about schema, but also going into real high level, kind of advanced technical stuff by the end of the talk there as well. So yeah, really, really enjoyed Beth Barnham's talk on advanced schema implementation as well.

Jack: And perhaps most importantly, I beat you at Mortal Kombat, Mark.

Mark: Oh, come on. Let's talk about Street Fighter.

Jack: Oh, yeah. You've beaten me, what 25 times in a row, on Street Fighter, I think? Something like that.

Mark: And I would like to say this was a flavour of Mortal Kombat I'd never played before. I think it was like Mortal Kombat 3.

Jack: It was 3 Ultimate. Yeah, it was.

Mark: Yeah, never played it. And I knew you'd played it before because you were smashing out those special moves the second we sat down.

Jack: I feel the same way when you play Street Fighter 2.

Mark: Fair. Fair.

Jack: And you actually had a challenger, not me, funnily enough, just take you on and you smashed him as well.

Mark: Yeah.

Jack: There you go. Undefeated BrightonSEO Street Fighter 2 champion, Mark Williams-Cook.

Mark: Definite sign of a misspent childhood there.

Mark: So all those talks, BrightonSEO is going to be online 21st, 22nd of April. So all of the speakers also did a kind of home or office recording of their talk as well. So anything you missed, because sometimes the slides don't tell the story, you'll be able to access them online.

Mark: Okay. Let's kick off with the actual things we were going to talk about now. So there's a couple of tools. Again, I love my tools. A couple of tools I want to mention to you and the first went into beta actually about a year ago. It's from Google and it's just called Know Your Data. And it's at knowyourdata.withgoogle.com. I haven't got loads to say about this. This is their description, to give you an idea about what it does.

"Know Your Data helps researchers, engineers, product teams, and decision-makers, understand data sets with the goal of improving data quality and helping mitigate fairness and bias issues." So that might not mean a lot to you at a first read, but I think it's quite important because especially when you're dealing with things like machine learning. So I've been playing around the last couple of years with various different machine learning models. If you follow me on Twitter, you probably saw I went through a little phase of doing, generating art, quote-unquote, art with AI, with generative adversarial networks, and I found the hardest thing about anything I've played with, with machine learning models, is getting good data in, because basically if you put crap in, you get crap out.

And the thing that has been uncovered before now, especially when we are making these AI assistants for all sorts of things, like I've seen it in things like hiring, is that because we live in a world that has bias entrenched in it, we are feeding AI systems data that is inherently biased. Therefore, we are creating Frankenstein's monsters of AIs that have hardwired bias, racism, sexism, whatever, built in, which is obviously super dangerous.

Jack: I know that was brought up with some police database stuff, not too long ago. Obviously, us being two white guys, we're not going to talk about this from a place of understanding. But from my perspective, looking at it from a data side of things, it was scary to see that how quickly like facial recognition technology and stuff is like, yeah, that went straight to racism. It's like, oh, okay. People are terrible. Human biases are inherited, inherited into the things that they create, and now machines are learning to be racist.

Jack: Oh God, people are terrible.

Mark: Yeah.

Jack: We're teaching the machines terrible things, straight away and just quickly lean towards, you know, targeting people of colour in certain ways and understanding women in different ways. And so like how have the machines learned racism and sexism and stuff straight away?

Mark: AI, for when you want to do terrible things at scale. So obviously this is a well-known issue to people who professionally work within AI. Google has a, quote-unquote, responsible AI toolkit, which helps guide people through these issues. I think it's more of a problem for people who, like myself, who are not experts in data and using these models to make these errors without realizing they were in the data set.

So a couple of things that Know Your Data does is it can look firstly, outside of bias, it can look at data quality. So if you had a set of a few million images, for instance, it can actually add additional metadata to that set of data by using machine learning itself. So for instance, I upload loads of stock photos, for instance of people in the office. It can look at things like image sharpness and say 92% are high-quality images and 8% are blurry. It can say 60% of these pictures contain headshots and only 20% of these pictures are of women, for instance.

So it gives you a very quick way to have a bird's eye view of the data set that you've got, which is pretty much impossible for a human to do, especially when you're going into the millions with the large data sets that you need to make machine learning models work, unless you're, again, super skilled with whatever language it is you're using to interrogate the data yourself. And again, that's then falling onto a very specialist skill set that maybe people using these models don't have. It can then look at the bias side of things. So again, for instance, if you are doing some analysis on wage differences, it can help make sure that your data set is representative of the story you're trying to tell.
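To make the "bird's eye view" idea concrete, here is a small Python sketch of the kind of summary being described: given one metadata record per image, it reports what share of the set falls under each label. This is not Know Your Data's API, which is a web tool; the `share_by_label` helper, the field names and the ten sample records are assumptions made up for illustration.

```python
from collections import Counter


def share_by_label(records, field):
    """Return the percentage of records carrying each value of `field`."""
    counts = Counter(record.get(field, "unknown") for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * count / total, 1) for label, count in counts.items()}


# Hypothetical metadata for ten stock photos. In practice these labels
# would come from annotation or from a classifier, and the set would be
# far too large to eyeball.
records = (
    [{"sharpness": "sharp", "depicts": "man"}] * 7
    + [{"sharpness": "sharp", "depicts": "woman"}] * 2
    + [{"sharpness": "blurry", "depicts": "man"}] * 1
)

sharpness_share = share_by_label(records, "sharpness")
depicts_share = share_by_label(records, "depicts")
print("Sharpness:", sharpness_share)
print("Depicts:", depicts_share)
```

Even this crude tally surfaces both kinds of problem mentioned above: a quality issue (some blurry images) and a representation issue (far fewer photos of women than of men).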

Machine learning, I think, is becoming more accessible to us as marketers because we've got these pre-made models just served up to us and it's kind of like, Hey, just put your data in and we can tell you stuff. And again, I think this is the little bit of knowledge is dangerous situation where suddenly you've got marketers without really understanding, again, highly specialized subject, exactly how it all works. Just putting stuff in, getting stuff out and then running away with conclusions, campaigns, decisions, based on that. So we'll put a link in the show notes, Search.WithCandour.co.uk. I think it would be good idea for everyone, if you are using your own data sets, to have a think about that and use those kind of tools to make sure what you're getting out is good data, fair data, what you actually want to achieve.

Mark: As you mentioned at the top of the show, Hannah Rampton has released a new version of Search Console Explorer. Funnily enough, we spoke about BrightonSEO. I actually met Hannah for the first time, 12 years ago at the second BrightonSEO. So she's been involved in search for over 15 years now. Really, really talented freelancer and her Search Console Explorer is the second version of her Data Studio version of this tool. And I think the third or fourth version overall, because there was a Google Sheets version. So headlines are, this is a free tool, which is amazing. There is a link in the tool to donate, kind of buy me a coffee type thing for Hannah. So if you do use it and it is useful to you, please do that because it provides, in my opinion, a huge amount of value.

What it is, is essentially a Google Data Studio template that you can connect to your search console data. So why would you want to do this? There are lots of good reasons. So search console data is obviously really useful to us as SEOs, as marketers. It's some of the only kind of first-party data we get directly from Google. But it's kind of slow to use through their interface. It's very one dimensional. You can kind of look at this one thing and sometimes you can overlay, very excitingly, sort of a metric with a couple of filters, but it's very limited. It's even limited from the interface, what you can export in terms of data. And what Hannah's tool does is plugs straight into your search console. You select your site and it gives you this amazing visualized overview of all of the data.

So you can literally explore the data. It will immediately give you things like year-on-year or period-on-previous-period clicks, impressions, and CTR, like you have in the Search Console interface, but you can break it down into brand, non-brand, and cumulative year-on-year. It's got a bubble chart of all of your main search terms, their positions versus impressions. It's a tool that I use normally when we first... If we're doing a pitch and we've got access, or when we first land a client, I pipe their search console stuff through this tool because it will give me a really solid and fast orientation of what they've got, what it's ranking for, how it's changed recently, and what areas have particularly changed as well.

So there's a few different tabs in the sheet she's provided. So you can explore this data. You can export data from it, which gets you around the 1,000 row limit. It's got a specialized sheet to consolidate data to find cannibalization issues. You've got tools in there to compare various date ranges over brand, non-brand, with geography overlaid on a map as well.
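For readers unfamiliar with cannibalization checks, the idea can be sketched in a few lines of Python: group the exported rows by query and flag any query where more than one page is collecting meaningful impressions. This is only an illustration of the concept, not how Hannah's tool is built; the `find_cannibalised_queries` helper, the row layout, the threshold and the sample rows are all hypothetical.

```python
from collections import defaultdict


def find_cannibalised_queries(rows, min_impressions=10):
    """Map each query to its competing pages when more than one page ranks.

    Rows below `min_impressions` are ignored so that stray one-off
    appearances do not get flagged. Pages are listed strongest first.
    """
    pages_by_query = defaultdict(dict)
    for row in rows:
        if row["impressions"] < min_impressions:
            continue
        pages = pages_by_query[row["query"]]
        pages[row["page"]] = pages.get(row["page"], 0) + row["impressions"]
    return {
        query: sorted(pages, key=pages.get, reverse=True)
        for query, pages in pages_by_query.items()
        if len(pages) > 1
    }


# Hypothetical query/page rows, shaped like a Search Console export.
rows = [
    {"query": "fluffy socks", "page": "/socks", "impressions": 500},
    {"query": "fluffy socks", "page": "/blog/fluffy-socks", "impressions": 120},
    {"query": "jumpers", "page": "/jumpers", "impressions": 900},
    {"query": "jumpers", "page": "/sale", "impressions": 3},
]

cannibalised = find_cannibalised_queries(rows)
for query, pages in cannibalised.items():
    print(f"{query}: {', '.join(pages)}")
```

Here only "fluffy socks" is flagged, because the second page for "jumpers" falls under the impression threshold.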

And I think one of the most popular features in there is the opportunities tab, which is a really good way of summarizing all those key phrases that have got high potential, that maybe you're tracking and are ranking, say, third, fourth, fifth, 10th, whatever, where with a little bit of a nudge, a little bit of an SEO nudge, you might be able to drive a lot more traffic.
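The "opportunities" idea boils down to a filter: queries that already rank on page one but outside the top couple of spots, and that get enough impressions to be worth the effort. A minimal Python sketch follows; the position window, the impression threshold, the `find_opportunities` helper and the sample rows are all assumptions for illustration, not the rules the tool itself uses.

```python
def find_opportunities(rows, min_position=3.0, max_position=10.0, min_impressions=1000):
    """Return queries ranking in the given window, biggest audience first."""
    candidates = [
        row
        for row in rows
        if min_position <= row["position"] <= max_position
        and row["impressions"] >= min_impressions
    ]
    return sorted(candidates, key=lambda row: row["impressions"], reverse=True)


# Hypothetical rows: average position and impressions per query.
rows = [
    {"query": "jumpers", "position": 8.2, "impressions": 12000},
    {"query": "fluffy socks", "position": 1.4, "impressions": 9000},
    {"query": "mens pyjamas", "position": 4.1, "impressions": 3000},
    {"query": "novelty bow ties", "position": 5.0, "impressions": 40},
    {"query": "winter coats", "position": 23.0, "impressions": 20000},
]

opportunities = find_opportunities(rows)
for row in opportunities:
    print(f'{row["query"]}: position {row["position"]}, {row["impressions"]} impressions')
```

"fluffy socks" is excluded because it already ranks near the top, "winter coats" because it is too far down for a small nudge to help, and "novelty bow ties" because the audience is too small.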

What I really like to do with it is use the Explore section, which will show you the delta, the change, of specific queries in terms of clicks and impressions. So if a client says to you, for instance, oh, okay, we need help with our SEO. We've lost some rankings recently. I can immediately pull their data into Search Console Explorer, look at Explore, and then immediately see a group of the key phrases where they've lost clicks or lost impressions, and see if there is any kind of pattern there in topic or anything like that, which would be a lot trickier to do through the interface itself. Or, again, you've got the data limits of the tool if you export it yourself.
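The delta view described here is simple to reason about: subtract last period's clicks from this period's for every query, then sort so the biggest losers come first. A minimal Python sketch, with a hypothetical `click_deltas` helper and made-up click counts standing in for two Search Console exports:

```python
def click_deltas(previous, current):
    """Return (query, change in clicks) pairs, biggest losses first.

    Queries missing from one period are treated as having zero clicks
    in that period, so new and vanished queries both show up.
    """
    queries = set(previous) | set(current)
    deltas = {query: current.get(query, 0) - previous.get(query, 0) for query in queries}
    return sorted(deltas.items(), key=lambda item: item[1])


# Hypothetical clicks per query for two comparable date ranges.
previous_period = {"fluffy socks": 400, "jumpers": 150, "primark opening times": 900}
current_period = {
    "fluffy socks": 250,
    "jumpers": 160,
    "primark opening times": 880,
    "linen shirts": 30,
}

deltas = click_deltas(previous_period, current_period)
for query, change in deltas:
    print(f"{query}: {change:+d} clicks")
```

With the losers grouped at the top, it becomes much easier to spot whether the drop is concentrated in one topic or spread across the site.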

So really pleased that Hannah's updated this absolutely fantastic tool. Cannot recommend it enough and we'll put a link obviously in the show notes, Search.WithCandour.co.uk. And there are other tools on Hannah's site which I will let you explore, which are also fantastic and very useful.

Jack: We're at the midpoint in the show, let's dive into some of the latest updates from SISTRIX.

Jack: We have a fantastic post on the blog from SISTRIX, talking about trends. And in fact, Steve and the team at SISTRIX had two very special guests, Nicole Scott and Lily Ray, who are both data journalists working with SISTRIX and analyzing trends and you've probably already heard their name if you've heard us talk about SISTRIX and TrendWatch and IndexWatch and all that kind of thing.

And it's a really interesting interview with both of those people talking about what makes a trend, how do you detect a trend and kind of get ahead of the curve. And Nicole's a particularly interesting example. She had built a site that really focused on news coverage and getting to trends before anyone else and built an amazing, completely organic backlink profile from that.

It's a really interesting kind of delve into that side of SEO from both Lily and Nicole and Steve throwing questions at them about how you can bring expertise to your news sites and your coverage, structuring things to make sure you're getting that coverage. Exiting trends once you've kind of then played the role in going towards evergreen content as well. Coverage from all kinds of things. And like I said, we've touched on it a couple of times with TrendWatch, and this is basically exploring how the data journalist team over at SISTRIX come up with the facts and figures for the data for TrendWatch, for IndexWatch and all that kind of stuff. There is a link for that. I will put a link in the show notes, as we said, Search.WithCandour.co.uk. We talk about that and then go through and look at the YouTube video, which is, like I said, an interview with both Nicole and Lily, conducted by Steve over at SISTRIX as well.

Fantastic little piece there to kind of get you in the mood, get you in the zone for thinking about trends and how you might be able to create content around trending data and all that kind of stuff.

And another interesting thing from SISTRIX, I know something Steve himself has actually been diving into and really looking at data, is the recent migration and update to the Primark website.

For those of you who don't know, Primark is a very, very big clothing and, kind of, home brand here in the UK, kind of on the cheaper side of things, to say the least. And it's pretty big. But famously you cannot buy their stuff online. That has always been a thing for them. You have to go into the store.

Mark: That was going to be my first question with the update. Can we buy online yet?

Jack: No.

Mark: Oh.

Jack: It's a weird one. And I think that's why Steve has kind of picked out as a particularly interesting example is because you can view availability in your local store. You can see prices for everything, but there is no option to actually buy anything.

So it's an e-commerce site that you can't buy anything on, which is really weird to me. I still don't fully understand. I'm sure Primark have their reasons. But I have tried to buy stuff online on Primark a few years ago and then realized like, huh, that's not even an option. It's not a check availability thing. It's like there were no options previously. And now it kind of feels more structured like an e-commerce site.

They've completely restructured their category pages. They've updated a lot of their database and completely restructured their libraries, pretty much. And yeah, it's a weird... It's a weird, weird thing. I know you and I were talking about this a little bit earlier, Mark. It's a weird thing to have an e-commerce site you can't buy anything on, right? That seems like a weird choice to me. And I'm trying to think if I've ever seen another example of a site this big, and a brand this well known, have done something like this.

Mark: Yeah, certainly. The thing that interests me from the SISTRIX data was looking at their potential competitors. And the... SISTRIX was trying to summarize with the search data, so the things they're ranking for, the intent, so someone like Next.co.uk, huge online retailer, way more search visibility at the moment than Primark. And SISTRIX was saying kind of like 80% of the searches that they are ranking for are kind of, quote-unquote, do intent. I want to go and onto this website and do something, i.e., buy something, transact, like the normal thing people do with e-commerce sites, right?

And what I think is so interesting about this migration is like you said, the Primark site is now structured just like an e-commerce website, but it doesn't have the add-to-cart button. So I am very interested to know how will Google react to a site that, from any kind of heuristic method it could take of looking at content and structure and understanding, looks like an e-commerce website, is also from a known brand with lots of searches, is like an entity, is trusted, but doesn't actually sell. Therefore, I would say from common sense and from the data that SISTRIX has, from an unbranded search point of view, probably does not fulfill user intent.

So if someone in the UK is searching for something, something Primark, they probably know that they can't buy it online, because in the UK that's like a big thing. Everyone knows that you can't buy online from Primark. It's kind of a running joke. If it's an unbranded search for fluffy socks or whatever, I'm probably expecting to be able to buy that thing online. So I'm really interested in what's going to happen, in terms of will Google work out that this site is not serving user intent and therefore it's never going to actually rank that well. There's a few key hints for Google. Obviously Primark won't be able to do things like provide a shopping feed, which would allow them into Google organic shopping results, and there's other ways that we know Google does really smart things, like if a page says out of stock, it'll soft 404 it. So we know they're looking for these on-page hints for, can I do the thing? Can I buy from it? That's going to be muddied, of course, by the... Will it prevent people linking to it as much, because it's kind of not as helpful, and online you can't just be like, yeah, you can buy that from here, link. It's like, yeah, you can look at that thing online and then just go to the shop.

Jack: You can check the availability of your local store here. Like great, thanks so much.

Mark: Doesn't have the same ring to it, does it? Go online and check the availability in your local store and then get on the bus and go there.

So yeah, that's why I'm really interested in this as kind of like a case study, if you like, for what happens when you have a big brand that's kind of chameleoning its way into these other sites. It looks like an e-commerce site, smells like an e-commerce site, but it doesn't quack like one. So I'm interested to see how it's going to rank.

Jack: Yeah. There's a couple of questions touching on that branded versus unbranded side of things, you just mentioned there, Mark.

There's some notes here from Steve saying the problem with the Primark website is it simply doesn't rank well for anything other than its own brand. Only 3% of its ranking keywords are on page one. And of those, the vast majority of search volume is for Primark-related topics, the branded search terms, as we just mentioned. The most successful non-brand ranking is for fluffy socks, as you mentioned just now, Mark.

When you take a search term with more search volume, like jumpers, for example, the ranking is typically page two and beyond, effectively invisible if people are looking to buy jumpers or look at jumpers. And talking about the structural changes as well, I briefly touched on it, going from a really long URL structure, now redirecting to something that is much smaller and much more manageable to the eye on your status bar. You can really see that, like there's a clear idea for building those redirects and restructuring the whole site, but will Google be able to spend the time to understand this? And, as Steve would say, spend a lot of watts and dollars trying to understand the new structure, and how long will this process take? We've seen migrations take months and months to really pull through on Google's side of things. And could we see that pull through more quickly because Primark is such a big brand, big site?

Or as you said, Mark, because it's not necessarily serving that user intent, will it actually be a slower process, with Google less inclined to crawl, index, and everything like that, compared to other similar e-commerce sites, like Next as an example?

I'm very, very interested and I know Steve is very keen to keep an eye on this, so this will be a live blog post that is going to get updated regularly as this ongoing process happens as the data comes through from SISTRIX's side of things, they can see all the rankings that are changing and the structure of the site changing as well. So we'll keep an eye on that. And I know Steve is very keen to keep an eye on that as well. And we'll have links to all of that in the show notes as well. So you can check out direct links to all the blog posts. Even if you are listening to this in a few weeks or in a few months, there'll be an updated version of that blog post with whatever is going on with Primark at the time as well.

And, as a little tease, we should be having Steve from SISTRIX on the show, in person, in the Candour studios in a few weeks time as well. So if you do have any questions about SISTRIX, if you have any questions for Steve, let us know on Twitter. And yeah, we'll be chatting with Steve in a few weeks in person. Our first in person guest of Season Two. I'm very excited.

Mark: That'll be awesome. Another person I can tick off my 'met in real life' list.

Mark: So while we were on our way down to Brighton, Google announced Google Multisearch on their blog, which is not to be confused with Google's MUM, which is their Multitask Unified Model. So they sound similar. They actually do fairly similar things, but they are not equal. So to start, I guess at the beginning, because I think it's worth explaining what their MUM, their Multitask Unified Model is, that is a piece of technology Google has been trying to roll out, and they're not there yet, which they describe as a thousand times more powerful than BERT, and in a nutshell allows them to crawl, index, compare, and understand queries and results over different modalities, which is their posh word for things like text, images, video, and audio. So you could ask a question in text and they might find the answer within a video, or their final example in their blog post which explains MUM is you can take a photo of your shoes and say could I hike Mount Fuji with these shoes? And it can take the different formats of your query, different modalities, and find you, kind of construct the answer for you.

Jack: I think they use the term Multisearch, as it is obviously much more easy to understand than MUM. But the fact that they kind of address it in the blog post, like we said, links in the show notes, Search.WithCandour.co.uk, the example they give there is take a photo, so you can use the Google app with Google Lens, take a photo of a product or a thing and type a color that you'd like to see that product in.

So they use a yellow dress as an example here, and then you can also type green and you type the word green with a word search in Google, a text search for Google, and an image search using Lens combine the two, and it'll give you examples of dresses that are green that look like that yellow dress you took a photo of, which is fascinating and cool. And something I think will be very, very useful to a lot of people going forward, looking for, like you said, can I do this thing using this thing, will be a really interesting way of like I found this tool. I don't know what it is. It's in the bottom of my toolbox. I've forgotten what it does. How do I find it, take a photo of it? What can I use this for? And it'll be like, oh, by the way, this is a type of Allen key or this is a type of old screwdriver that nobody uses anymore or whatever.

I think it's really interesting that we can now, or will soon be able to, combine the two. And it's always felt fairly separate to me, looking at image search, looking at text search on Google, and the fact that we're now kind of combining the two with Multisearch. And I know they are really keen on pushing Lens and stuff. It's been a thing Google has been harping on for a while now. But the power of Lens consistently amazes me. And the way it's just able to identify the weirdest, obscurest stuff out of nowhere, is incredible. And this is just giving it even more and more power.

Mark: Yeah. I've spoken about Lens to a lot of people, even just kind of outside of marketing, just because it's so useful for what is this trainer? What is this bug? What is this flower? And it's amazing just how quickly Google was like, oh yeah, that's that's this.

And what we were just talking about... So the Multisearch thing, which Google has announced, is essentially, they've just made a manual version of what they're trying to do with MUM for one specific area, because they've said that really you want to focus your searches at the moment around fashion and home decor. And specifically it says for best results, shopping searches, because I think that's the area where they know they have kind of the structured data that they need.

So for instance, if you have a shopping feed and you upload a product, you tell Google what colors it is available in. So they can apply their kind of image matching algorithm, which is already there and it already works really well. And then you can essentially filter data that they have that's structured, I would guess. And rightfully so that they're probably struggling to realize MUM as a generic thing, because that's massively complicated and false positives always stick out like a sore thumb when you try and roll out these kind of algorithms.

Again, it's limited as most new features are, to English. And again, it's saying it's available in English in the US. So the actual guide Google gives is to get started, simply open the Google app on Android or iOS, tap the Lens camera icon, and either search one of your screenshots or snap a photo of something around you, like a stylish wallpaper pattern at your local coffee shop. Who do they think we are? Then swipe up and tap the "+ Add to your search" button to add text. And I would find that helpful anyway because a few times, I've, you know, been using reverse image search for years anyway, but sometimes you need to side step because you're like, it's almost got what I want, but I have no way to point it in the right direction.

Jack: Yeah, definitely.

Mark: And that's what this does. So I think it's potentially, again for users, really helpful. And I think they're doing this just to kind of bridge the gap between them and the situation where they have this model that kind of just works generically with everything.

We're going to end on one of my favourite subjects, which is AI content because Google says AI-generated content is against guidelines.

So this topic was brought up in one of the recent Google Search Central SEO office hours Hangouts. And it was in response to a couple of questions about GPT3, which we've spoken about many times now on the podcast, and the writing tools thus provided. And John Mueller again, very helpfully gave some insight from Google's point of view.

Jack: Was that in between taking selfies at BrightonSEO?

Mark: That was in between taking thousands of selfies at BrightonSEO.

Jack: Poor John!

Mark: And he said, for us, these would essentially still fall into the category of automatically generated content, which is something we've had in the webmaster guidelines since almost the beginning. And people have been automatically generating content in lots of different ways. And for us, if you're using machine learning tools to generate your content, it's essentially the same as if you're just shuffling words around, or looking up synonyms or doing translation tricks that people used to do. Those kind of things. So that's actually a big spammy trick that people used to do as well, which was scrape websites, translate them, republish them as well, and kind of get traffic, especially for countries that had kind of lower SEO competition, it was... I know people have done that from English to say, have websites in Thailand. Things like that.

John goes on to say, my suspicion is maybe the quality of content is a little bit better than the really old school tools, but for us it's still automatically generated content and that means for us, it's still against the Google webmaster guidelines. So we would consider that to be spam. Fascinating. For many reasons. So firstly, I know for a fact that many news websites for a long time use various methods to automatically generate content around things where we have nice, easy structured data sources. So whether it's stock prices or whether it's earthquake alerts, it's very easy to say, there was an earthquake at this time of this magnitude, detected here, breaking story, kind of placeholder for someone else to come along and fill in the human bit of the content. And that's, that's technically AI doing it?

Jack: Yeah.

Mark: Right? It's not like GPT3.

Jack: And we're not talking about nobody websites. These are some of the biggest websites in the world, in the entire internet.

Mark: Yeah. And it's because with, in the news cycle, the speed at which you publish is paramount to getting the clicks, right. And therefore the links and da da da, you know, it's basically the source of their revenue. So first I was like, okay, that's an interesting point. I would also say, I have seen John Mueller say before, and I haven't gone back to look on Twitter, you'll just have to take my word for it, that he said, and I'm paraphrasing here because I don't have a perfect memory, he said something along the lines of Google will likely at some point have to go back and revisit the guidelines around AI content because my position as a user, and this is completely outside of SEO, is as long as the content is answering my query, I don't care who or what wrote it. And there are lots of humans out there that write really shitty content that GPT3 can do a better job of.

And just to kind of demonstrate this, at the time this story came out, I saw someone on LinkedIn saying Google recently mentioned that AI-generated content is against their guidelines. They don't think there's a direct mechanism to track down AI content. But even if an update is rolled out somewhere down the line, da, da, da. And they're asking this question about what do people think about AI-generated content? So I was like, I tell you what would be interesting. So I took their question and I pasted it into OpenAI's GPT3 Q and A model. And the answer it came up with was Google has not provided any specific guidance on how to deal with AI-generated content. However, it is generally advisable to avoid publishing content that is not clearly attributable to a human author. If you do publish AI-generated content, you should make clear that the content was generated by a computer and not by a human author.

And then they followed up on this and were like, thanks. What do you think about Jasper? And then a load of people liked it. Nobody, no human had picked up that this was 100% AI-written content. And I think as a... From an SEO point of view, I'd say I'm a subject expert. That's a decent answer.

Jack: Yeah, absolutely. So I saw it on LinkedIn before I heard your story and I was like, that makes sense. Yeah. That seems like the kind of thing Mark would say, it's detailed. It clearly explains to the person asking the question where it is and it doesn't like... We often say, I don't know, we talked about GPT3 fairly recently having incredibly specific answers and being able to kind of play around with that and expand it. You can have just yes or no, or yes, that happened in 1965. And it expands it a little bit and adds that almost like human element of sentence structure and stuff. That is totally believable to me. 100%.

Mark: The follow on question John had was basically can Google detect this AI-generated content? And his answer was, I can't claim that, but for us, if we see something is automatically generated, then the web spam team can definitely take action on that.

And I don't know how the future will evolve here, but I imagine like with any of these technologies, there'll be a little bit of a cat and mouse game, where sometimes people do something that they can get away with. And then the web spam team catches up and solves that on a broader scale. From our recommendation, we still see it as automatically generated content. Over time, maybe this is something that will evolve in that it will become more of a tool for people. Kind of like you would use machine translation as a basis for creating a translated version of a website, but you would still work through it manually.

And maybe over time, these AI tools will evolve in that direction that you see, that you use them more and more to become efficient in your writing and make sure you're writing in a proper way. Like we've got with kind of Grammarly spelling, kind of checking grammar tools. It comes down to me. If a human can't detect whether it's written by a computer or not, I believe a machine will struggle. And I've had this conversation with some actual SEO experts that have said to me, the opposite is true because obviously, computers are really, really, really good at pattern matching. It's kind of their thing, right? Way better than like mushy brained humans, right?

And from the people I know that are in with this kind of stuff, they've had sometimes more success with really old school content spinning, which is taking words that were written by humans and just swapping out words because structurally it looks more human than stuff that was generated by a computer from scratch.

Now, I don't think we're going to be there much longer in terms of, I don't think they're going to be distinguishable, because the answers that I see in most cases, obviously sometimes it goes way off the rails, but in most cases, the answers that GPT3 gives and whatever GPT4 holds for us, are very, very impressive.

And this lastly, I round off by, I saw you tagged me in it. Someone else sent me a post by a very smart chap named Ferry on LinkedIn. He was highlighting another website that he estimated was generating about 50,000 pounds a month just by scraping... People Also Ask, featured snippets, and kind of mashing them together on a page and sticking some YouTube videos in like, not that technically advanced kind of scraping and kind of long tail content, which people have been doing for years.

But as I showed in the BrightonSEO talk I did about zero-volume keyword research, there are lots of sites that currently are having lots and lots of success and driving millions of visitors doing this, which I think stems back to Google releasing BERT, releasing passage indexing, becoming better at understanding these queries, these very specific queries, they're now more attracted to content that has the specific answers on and are relying slightly less on some of their older, well tested and still in play link metrics, because it seems we have a new generation of kind of AI-type spam that's working really well. So I think that's a whole minefield for Google and I think it'd be incredibly dangerous for John Mueller to be like, yeah, AI content's fine. That would really open the floodgates.

Jack: Yeah, I'm sure he's not allowed to say that, almost like there's a Google executive there with a gun to his back. Like, don't talk about AI content, John.

But yeah, I think also touching on what you just said and combining with what John said as well, it's the cat and mouse thing, right? We're always finding this, whether it's cybersecurity like Infotech people versus hackers, or whether it's Google trying to sort out spam stuff from their end of things, there's always going to be people generating spammy nonsense and making money from it and clogging up, for want of a better phrase, the rest of the web. And there's always going to be hackers that are breaking through and creating new issues for cybersecurity people. And then companies often hire those people, who then become the cybersecurity people who are then fighting the other hackers and da da da. And it's just kind of, yeah, kind of like an ouroboros, snake eating its own tail, a feedback loop, cat and mouse game, where we're just going to be constantly battling spam versus Google, spam versus Google, AI-generated spam versus Google.

And yeah, I'm, I'm very, very intrigued to see where this is going, because this is clearly part of our future, right? Not to go off and into a big like we will become one with the machines, kind of Matrix-style tangent, but AI is getting more and more powerful seemingly by the day. And like just you saying GPT4 just then, just struck a chord of like, oh God, that'll be doing our jobs for us. They'll be auto-generating podcasts and deep faking my voice and creating podcasts from scratch and I won't be required anymore, and all this kind of stuff. It's just yeah, scary stuff. But very interesting as well. Very, very interesting.

Mark: With the kind of cat and mouse thing in mind, what I would like to see, because we have lots of AI kind of chat bots now on sites, is a chat bot for consumers where you can give it the goal. So, yeah, I'm having trouble with my broadband, I want to speak to a human. And then you can just load up their live chat and leave your AI to talk to their AI. And your AI can just answer all their questions and get you to the human or resolve, get the resolution that you're wanting. Just have two computers talk to each other, just be like, this is what I want. This is what I'll settle for. Can you please go and negotiate that on my behalf? Because we thought we did see that with the Google AI that made the phone calls to do like the hair booking thing, a couple years ago. So I'm interested where that will go. But yeah, that's where we are with AI content at the moment, which I would say is a gray area.

Jack: You and John Mueller agree.

Mark: Yeah. Yeah, kind of a gray area. It depends...

Jack: Well, that's all we've got time for this week. We'll be back next week on Monday the 25th of April, with all the latest SEO and PPC news from Mark and I. Thank you so much for listening and we hope you have a wonderful week.
