There is so much nuance to SEO and so much to know. It’s such a treat to talk with someone whose understanding of the discipline is as wide as it is deep, and vice versa. We could probably riff for days on the topic. And the someone I’m talking about is Dixon Jones. Dixon is an award-winning member of the internet marketing community with 20 years of experience in search marketing. He is the CEO and founder of InLinks.net, which is an entity-based SEO tool. He was also formerly the marketing director of Majestic, the largest link intelligence database in the world.
In this episode, we talk about Google’s core algorithm updates and what they mean for you. We talk about authority and trust metrics and how to think of them as page-specific, not site-wide. We get into how a link could be powerful but not trusted, and also how it could be trusted but not powerful. And we get into what triggers a knowledge panel (those boxes on the right-hand side of a Google search result that make you look like more of a big deal), how they’re populated, and how you can get one. This is great practical information. If you want to learn about future-proofing your SEO, this is a must-listen. So without any further ado, on with the show!
In This Episode
- [00:29] – Stephan introduces Dixon Jones, the CEO and founder of InLinks.net, a well-known, respected, and award-winning member of the internet marketing community with 20 years of experience in search marketing.
- [06:13] – Dixon explains how Majestic’s Trust Flow and Citation Flow work.
- [13:17] – Stephan and Dixon talk about Majestic’s scatter graph that can visualize how a link can be powerful and not trusted or can be trusted but is not very powerful.
- [19:20] – Dixon describes a knowledge graph as a big encyclopedia because of its concept and structure.
- [26:03] – What’s the advantage of using inLinks, an entity-based semantic SEO tool, on your website?
- [32:31] – Stephan and Dixon discuss the complication of building a knowledge graph in different languages independently.
- [37:55] – Dixon mentions Jason Barnard’s list of trusted sources cited in Google’s Knowledge Panels.
- [46:51] – How efficient is inLinks’ internal linking optimization for big websites?
- [52:55] – Dixon discusses inLinks’ content schema and FAQ schema.
- [56:37] – Visit inLinks.net to sign up for a free account and check out how inLinks can help you rank higher and stay ranked longer.
Transcript
Dixon, it’s so great to have you on the show.
Stephan, thanks very much for the invitation. I appreciate it.
We’ve known each other for many years. And I’m very excited to finally have you on the show to talk about entities, links, relationships between entities, the knowledge graph, and all that good stuff.
It’s been a long time in the search industry, hasn’t it? It’s good to be on the show. We couldn’t do these kinds of things when we started. The technology wasn’t there yet.
Yeah. When did you start with Majestic? Speaking of ancient history.
It started on the internet, before that, with an agency called Receptional, which I founded as well. But I moved towards Majestic in 2009. Alex was in his bedroom, and I found a beta of Majestic’s product. At the time, Yahoo! Site Explorer was the only backlink provider in town, and everybody else was scraping Yahoo! Site Explorer. So I went on and spent $10 or £10 or something on some backlink data, pressed some buttons, hit the thing, and it didn’t work. I was trying to use a very early feature of the tool, so I just went on to support and started banging out a complaint. And I got Alex coming back to me saying, “Sorry, I’ve reprogrammed it, fixed it, and given you some more credits. Try it again now.” That’s when I realized that he hadn’t used the Yahoo Search API; he had crawled the internet himself to get this data in the first place. And I thought, I know everybody in the world of SEO that needs this product. So we got together around about 2009. And then I became more interested in that than I was in my search agency.
It's a lot easier waking up in the morning knowing that you're proactively future-proofing your business.
That’s very cool. You’ve made a big difference for Alex and Majestic.
You can’t do everything. There’s a big distinction between the marketing of a product and the making of the product. And if there’s one secret I’ve got in life, for any marketing person out there in internet marketing, it’s make sure your developer owns more shares than you do. Because then they’re not going to leave your organization. And I find, again and again, it’s not a problem coming up with the ideas, but it is a problem finding a good developer that’s going to stick it out. So if you can tag onto their idea and help them and make them better, it’s a much easier ride all around, and everyone has a better time.
Now Majestic is a must-have tool for link intelligence.
Find a good developer that’s going to stick it out.
For a lot of people, what’s interesting about it is that it’s gotten a lot more granular. It now looks at links at a block level, so it divides a page into 40 different sections. So for every two and a half percent of your content, the links are marked differently. And there’s quite a lot of research, from the early days of information retrieval, that analyzing blocks of text is more effective than trying to analyze the whole page, because the whole page can be about a whole load of different things. But if you look at links around a certain section or paragraph of the text, then it is much more likely that the algorithms are going to get some better hits and better understanding. Being able to do that is pretty cool. And being able to do things at a topic-based level is turning into something very useful, for inLinks as well. So they’ve done some good stuff.
Just recently (I’m no longer the marketing director for Majestic, but I’m still an ambassador for them), I was very impressed in December when they came out with the ability to visualize links four, possibly five, tiers away. With the link map, you can put in a website or web page, and it’ll show you not just the first-tier links, but the second-tier, third-tier, and fourth-tier links, and there are some incredible visualizations there, which is very interesting if you want to start trying to find backlink networks, or to see that a link can be three or four tiers away from you and it’s affecting your life. Before, we could never see if the front page of the BBC links to an article over here, that then links to an article over here, that is then linked to your site, and they were all citing, effectively, you as the source of information. You get a lot of benefit out of that, and Google would give some benefit to that. But you’d never be able to see it because you could only see the links to your site, so to be able to go several layers back now is changing the game again. They’ve still got a lot of ideas coming out. I hope that they do very well because I still have shares. I’ll be honest.
That’s awesome. We will talk about inLinks and the knowledge graph in just a moment, but one of my favorite things about Majestic is that it isn’t just a power metric; a key distinguishing factor is that Majestic has a separate trust metric from the importance or power metric.
Sites that are used, curated, and loved by humans are more likely to have a better Trust Flow than Citation Flow.
Yeah, so Citation Flow can be the proxy for PageRank, I guess, if you want a better phrase. It’s pretty damn close to a proxy for PageRank. But Trust Flow is kind of interesting because it weeds things out. It’s kind of the same math, but it doesn’t start straight from every page on the internet. Citation Flow initially starts with the number of IPs that are linking to any web page on the internet and takes that as its starting point, in the same way that the PageRank algorithm does. For anyone that’s really into the PageRank algorithm, like you are, the Trust Flow metric doesn’t assume the same thing. It just starts with a subset of sites that we know or believe to be trusted, and then extrapolates. So it gets to the same sort of data, and if there’s a bunch of links that are just purely mechanically generated, that no human would ever link to, then those ones don’t appear in the system. But it kind of changes the emphasis a little bit.
So sites that are used by humans and curated and loved by humans are more likely to have a better Trust Flow than Citation Flow, but they’re the minority because there’s a lot of manufactured links, I suppose, not just manipulative. If I write a post on my blog, then all of my other posts, my author page, will automatically link through to those and stuff, so things happen. The Trust Flow metric is kind of useful, and then again, being able to break that down by topic as well. So if you dig right into this stuff, you can sit there and say, “Well, this has got some trust,” but it’s got about 800 different categories or something in there. When you think about it, logically, Bill Gates is incredibly influential in operating systems and the world of charities, but not so influential in the world of pop music, or social sciences, or other subjects. So trust and authority come with context. And I think that most metrics don’t bear that in mind.
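To make the seed-set idea concrete, here is a toy sketch of seed-based trust propagation in the spirit of TrustRank, contrasted with a score that starts from every page equally. This is an illustration only, not Majestic’s actual algorithm; the link graph, damping factor, and page names are made up.

```python
# Toy illustration: seed-based trust propagation vs. a PageRank-style score
# that starts from every page equally. NOT Majestic's algorithm.

def propagate(graph, seed_weights, damping=0.85, iterations=30):
    """Iteratively push score along outbound links, teleporting back to the seeds."""
    scores = dict(seed_weights)
    for _ in range(iterations):
        new_scores = {page: (1 - damping) * seed_weights.get(page, 0.0) for page in graph}
        for page, outlinks in graph.items():
            if not outlinks:
                continue
            share = damping * scores.get(page, 0.0) / len(outlinks)
            for target in outlinks:
                new_scores[target] = new_scores.get(target, 0.0) + share
        scores = new_scores
    return {page: round(score, 3) for page, score in scores.items()}

# A tiny link graph: page -> pages it links to.
graph = {
    "news.example/article": ["blog.example/post", "shop.example/widgets"],
    "blog.example/post": ["shop.example/widgets"],
    "linkfarm.example/a": ["shop.example/widgets"],
    "linkfarm.example/b": ["shop.example/widgets"],
    "shop.example/widgets": [],
}

# "Power"-style score: every page starts with equal weight.
equal_seed = {page: 1.0 / len(graph) for page in graph}
# "Trust"-style score: only a hand-curated, human-reviewed page starts with weight.
trusted_seed = {page: 0.0 for page in graph}
trusted_seed["news.example/article"] = 1.0

print("power-like :", propagate(graph, equal_seed))
print("trust-like :", propagate(graph, trusted_seed))
```

In the trust-style run, the two link-farm pages end up with no score at all, even though they still pass raw “power” in the first run, which is the distinction Dixon is describing.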
Yeah, super critical. So with Topical Trust Flow, you get a score for each topic, and the most powerful, highest-scoring Topical Trust Flow will appear at the top of the list, and hopefully, it’s relevant to your category, niche, or industry.
For each page, because we do most of our stuff at a page level, we give you metrics at the domain level, but they’re essentially accumulated bits of maths and stuff that’s analyzed at the page level. And so we’ve got an overall score for trust. Then we can work out where that trust came from, the kinds of content that trust came from. It’s made up of pages that are about this, and this and this. So that’s how we then have a different score because it’s not a linear graph, you can’t just add the individual bits of trust together and always get the same number, but it’s all done through the magic of logs and things. It’s a bit complicated in that way.
So that’s an important point, too, that these metrics are logarithmic. And it’s not just Majestic; most tools do this.
Trust Flow and Citation Flow are logarithmic metrics.
Yeah. It’s a little more complicated than that, but it is essentially logarithmic. And I think what people don’t pay attention to is that you can see this in the COVID numbers and all sorts of things in life and nature. But when it comes to links, or pages on the internet and how powerful each page is, I don’t think people realize that a percentage somewhere in the high 90s of pages have no authority whatsoever. So the number of pages that even have any authority, in whichever metric you use, whether it’s Majestic’s or whatever, is already just a fraction of the internet. If you didn’t kick out all of the rough stuff first, you wouldn’t have enough headroom; everything would be down at number one, and you couldn’t spread the numbers out. There are very few web pages and websites, web pages particularly, that are up at the top of that pyramid, certainly as a percentage of the pages on the internet. It’s very few.
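As a back-of-the-envelope illustration of that logarithmic squashing (not Majestic’s real formula; the scaling choice and the maximum raw score here are arbitrary), compressing raw link counts with a log keeps a 0-100 scale usable even when a handful of pages are millions of times more linked than the rest:

```python
import math

def to_flow_style_score(raw, max_raw=1_000_000):
    """Illustrative only: map a heavily skewed raw link score onto 0-100 logarithmically."""
    if raw <= 0:
        return 0
    return round(100 * math.log1p(raw) / math.log1p(max_raw))

for raw in (0, 1, 10, 1_000, 1_000_000):
    print(f"{raw:>9} raw links -> score {to_flow_style_score(raw)}")
```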
Then that brings up another important point that folks tend to think about websites being authoritative, trusted, important, powerful, but it’s about pages because pages link to other pages.
That’s always been my frustration with the phrase “domain authority,” and I’m not blaming Moz for this; everybody uses the concept of domain authority or site authority. Even Google says it’s not a thing. The vast majority of it is not site-based. It’s page-based. Of course, it has to be. Take a big site, take eBay: of course, the homepage of eBay is authoritative. Probably the page on delivery times is authoritative, in some way. But by the time you get down to the individual products, they’re not authoritative at all and have no power or influence. So I did some analysis on Mashable’s website. If you have a look at a site: query for Mashable.com in Google, you can start seeing that some types of pages are much more likely to be at the top of that list than other page types. If you did want to be on Mashable, then obviously being on the homepage would be great. That’d be fantastic. If you can’t do that, one of those static pages would be next. If you can’t do that, it turns out that having a video on Mashable carries much more weight. So the video pages have much more emphasis than the article pages on Mashable. You can start to see how that logic breaks down when you take a big website and just have a look at the architecture of the URLs. Just putting a page up on the BBC doesn’t in itself make your site more valuable.
Well. It’s a nice bragging right. It’s a logo for the “As Seen On” bar.
I did with my agency at the time, Receptional. It got a link from the BBC back around 2004 or 2005, and theoretically, that page is still there. But the only person that links to it is me, when I’m citing it as an example. It’s no better than a PBN right now.
That’s funny. Well, you mentioned eBay a few minutes ago. It’s funny that you mention that because it was one of the big winners in the December core update. Amazon lost a fair amount of ground, but eBay gained a lot of ground with that December core update.
That’s kind of interesting because I know that the guys at eBay have thought about entities and been looking at what it means and things for a long time, actually for several years. That’s kind of reassuring for me that the people that are working on the entity approach are doing well because I hadn’t seen that eBay was a big winner out of that.
Yeah, we should talk about entities. But before we do, I want to close out this topic around pages versus websites and links. One great feature of Majestic that I love is the scatter graph, and comparing the scatter graph for the domain level versus the page level or the homepage that you’re evaluating. You can see what is above the diagonal on that scatter graph, meaning that it has more trust than importance, more Trust Flow than Citation Flow; that’s what puts it above the diagonal. And then the farther that dot is away from the origin of the XY axes, the more powerful and trusted that link is. If it’s domain level, that’s interesting. But when we’re talking about the page-level scatter graph, it’s a lot more relevant to what I’m trying to evaluate. So I can see whether most of what’s happening is below the diagonal, or most of what’s happening is towards the XY axes.
An SEO's worst nightmare is finding out one day your site will not be optimized because of an update you missed.
Absolutely. It was an interesting thing. That was one of the very first visualizations that Majestic put out, before doing tier four or five and stuff; that’s some time back, but it caught people’s imagination. It showed that a link could be powerful and not trusted, and also could be trusted and not very powerful. And it makes people start thinking about that idea. I think Majestic carried on trying to do that all the way, and that’s how they got to topics as well, because they’re trying to break that down more into context, beyond just trust and power.
Yeah. And speaking of trust, those listeners who are not familiar with the acronym E-A-T should read some articles about that. And I have one on Search Engine Land.
Or you could even read the hundred-odd pages of the actual E-A-T documents that Google has: the expertise, authoritativeness, and trustworthiness guidelines. Marie Haynes is always quoted as the woman that talks about it a lot, but I think that your stuff is really good on it. And there’s a lot of useful stuff out there. Google is very open about its guidelines, but what I like about it is that Google is saying this isn’t what the algorithms are doing; humans are scoring the algorithms’ output. So what it’s telling us is where Google is trying to get to with its algorithms. By following the E-A-T approach, you’re trying to get to where Google’s trying to get to.
You’re future-proofing.
By following the E-A-T approach, you’re trying to get to where Google’s trying to get to.
Exactly. I think that it’s a lot easier to wake up in the morning knowing that you’re trying to future-proof your business, because I think the biggest stress for SEOs is they go to bed saying, “Thank God, my client’s site is optimized.” And then there’s an update, which by definition, if you’re going to live by that rule, means the next day you wake up and your site is not going to be optimized. So all you’re doing is living for the cliff. You’re waiting for the cliff. And the stress that goes along with that is just immense.
It’s like trying to trade Bitcoin and just buy on the dip. So I found the article on E-A-T that I had written a while back. It’s called, “There’s no shortcut to authority: Why you need to take E-A-T seriously.” The Google quality rater guidelines are where you can get the information straight from Google, from the horse’s mouth. But if you want a summary article, this is a great start. Also, Marie Haynes has a lot of great content around E-A-T. But the idea is that all these human reviewers who are contracted by Google score different webpages, and that is a data source that the machine learning algorithms can use as training data to get to where Google wants to be in terms of understanding what’s trusted or trustworthy, what’s authoritative, and what is written from an expert opinion versus just an average Joe. Very important.
I went to the Tower of London once, and I was painting a picture of three people that have been to the Tower of London. One was a Beefeater. One was Simon Schama, who’s a well-known historian, a generalist historian that does TV programs in the UK. Another one was Jeremy Clarkson. I don’t know if you guys know him, the Top Gear guy. Anyway, he’s very famous in the UK, but not very authoritative, shall we say? A typical TV presenter. And it’s really interesting: all three of them have some experience with the Tower of London, but then you say, “Well, which one would you trust? Which one has authority when it comes to talking about the Tower of London?” And I think that’s an interesting dilemma for a machine to try and figure out, let alone a human being. So it’s been difficult to figure out. The Beefeater is the one that knows the Tower of London, because they go around and spend every day talking about the history of it to hundreds of people, but no one knows who they are. But of course, they’re the most authoritative.
The knowledge graph is a big encyclopedia.
That’s a great point. Let’s talk about entities and how Google has structured its knowledge graph, what it bases the knowledge graph on. And then how inLinks, your tool, kind of feeds on that or extends that into something that website owners need to think about and take actions on?
I think it’s really interesting that Google went to the knowledge graph in the first place, to be honest with you, because clearly the internet got too big to index page by page. What also happened is that everyone was talking about the same stuff; there are only so many pages that you can write on how to tie a bow tie before it becomes more sensible for Google to understand knots and bow ties as two different ideas and then put those ideas together. So the way that I look at the knowledge graph (I say the knowledge graph or knowledge graphs, because of course it’s not just Google, although that’s the biggie) is that, at scale, it’s better to index and understand concepts than it is to index and understand pages. And by understanding a concept, you get around a whole load of issues. Because if you’re trying to find things around “the art of search,” for example, at a keyword level that’s kind of interesting, but as soon as you get to the concept of search engine optimization, and this is authoritative information to add to a corpus of information about search engine optimization, that becomes a much more efficient way to organize the world’s information. So I see the knowledge graph as a big encyclopedia. And I know somebody will tell me off for saying that, but I ain’t that wrong.
Now, one thing I think is important to point out: you mentioned concepts and how they relate to each other, but there are also persons, places, and things that Google is getting better at analyzing, connecting up, and mapping the relationships between. So if you use, for example, the Google Cloud Natural Language API, which is an AI or machine learning tool that you can try for free, you can put in the text from your homepage and have the tool analyze it and tell you what entities it identified on the page, and which ones are salient, and to what degree. So you get a salience score for all of the topics and entities that it’s identified in the piece of content or on that page. And that’s cool.
We’ve got an interesting little secret about that NLP API, which is about to come out in a blog post. So I might tell you, it’s a secret.
I would love to hear that. But this tool gives you some information about what your page, or the piece of content that you pasted into the text box, is about. But it doesn’t go as far as I’d like it to. It’s more focused on identifying things that are people, places, or geographical locations and things like that. It’s not as good at identifying topics or concepts as I’d like it to be.
Yeah, so there are two things about that, and one of them might be getting mentioned here for the first time. Firstly, Google has some patents about how they understand topics and entities, and they use capital letters as a signal for an entity. I don’t know how you can have a patent on that, because I was always taught at school that a proper noun has a capital letter. But anyway, there is a patent, so they are using capitalization to help them identify entities, which is why places, people, and brands tend to get picked up better in their algorithm than other things. But here’s the really interesting thing about Google’s NLP API: when you run it on their free service, what comes back contains a lot fewer entities than you think. It comes back with a little grid showing all these different things, but if you look at those, some of them are not entities, like “best software,” “best sneakers,” “best blue sneakers,” or whatever. As soon as you get the word “best” in there, it’s not an entity in any way, shape, or form. It’s still a keyword. The only ones that are proper entities, defined in what that algorithm returns, are the ones that have got a link to a Wikipedia article, and you can’t see that at a glance; you’ve got to investigate. All the other ones are essentially just showing the salience of the keywords.
So it’s a natural language processing algorithm, not an entity extraction algorithm. I think a lot of SEOs, and I’ll put myself in the same category, looked at all the little boxes coming back as different entities, but they’re not; it’s only the ones that have got links to Wikipedia articles that are truly defined as entities brought back in the data. So if you have a different algorithm that’s an entity extraction algorithm, which is what inLinks has, and it extracts entities much more aggressively than the ones that come out of Google’s NLP API, we can create a metric we call SEU, Search Engine Understanding, which is essentially the number of actual entities, the ones that link to actual Wikipedia pages in Google’s NLP responses, versus the ones that we see. And that gives us the potential to see how many of the entities that are actually in the text of a page are understood by Google and reported as understood in their NLP API, which goes a long way. But the main point here is that people do not understand what’s an entity in the output of the Google NLP API, because most of the stuff isn’t. Even the salience is referring to keywords, not to entities.
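Here is a hedged sketch of that distinction using Google’s Cloud Natural Language API from Python (the google-cloud-language client and API credentials are assumed; the example text and the crude “understood share” ratio at the end are my own illustration of the SEU idea, not inLinks’ formula). Only the entries that come back with a wikipedia_url in their metadata are unambiguously entities; the rest behave like salient keywords:

```python
# Sketch: separate Wikipedia-grounded entities from keyword-like mentions
# in a Google Cloud Natural Language API response.
from google.cloud import language_v1

def analyze(text):
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_entities(request={"document": document})

    grounded, keyword_like = [], []
    for entity in response.entities:
        wikipedia_url = dict(entity.metadata).get("wikipedia_url")
        record = (entity.name, round(entity.salience, 3), wikipedia_url)
        # Only entries carrying a Wikipedia link are unambiguously entities;
        # the rest ("best blue sneakers" etc.) behave like salient keywords.
        (grounded if wikipedia_url else keyword_like).append(record)
    return grounded, keyword_like

grounded, keyword_like = analyze(
    "Deep Blue was the IBM computer that beat Garry Kasparov at chess."
)
print("Wikipedia-grounded entities:", grounded)
print("Keyword-like mentions:", keyword_like)
# A crude SEU-style ratio, in the spirit of the metric described above:
print("understood share:", len(grounded) / max(1, len(grounded) + len(keyword_like)))
```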
inLinks has an entity extraction algorithm, which extracts entities more aggressively than the ones that are coming out in Google’s NLP API.
That’s great — a very important distinction.
Honestly, I’d better make sure our entity-based SEO guide, which by the time this comes out will have been published on the inLinks.net blog, gives the credit to Fred, not me, because he’s the one who exposed this. He found it out quite a long time ago, but he’s been sitting on it and building our inLinks system on that bit of knowledge for well over a year now.
Let’s describe the benefits to the listeners of using an entity extraction algorithm like inLinks on their website. Why would they bother to do that? Why not just leave it for Google to figure out what topics and entities you’re talking about?
There are two or three different things there, apart from our ability to automate internal links and stuff, which we can maybe talk about. But the main thing is that we start by looking at a piece of web content and extracting out the entities. If I do that at the site level, let’s say your site is about blue widgets, black widgets, and green widgets, and you’ve got those talked about all over the site, but you’ve got one page that is specifically about blue widgets and another page that’s specifically about green widgets. Then every time you talk about these things, we can plot where you’re talking about them and work out some level of cannibalization, so that we can then focus all internal links on the main page that’s about blue widgets.
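As a rough sketch of the cannibalization mapping described here (an illustration of the idea, not inLinks’ code; the URLs, entity names, and canonical assignments are made up), you can invert a page-to-entities map and flag every non-canonical mention as an internal-link opportunity:

```python
from collections import defaultdict

# Hypothetical output of an entity-extraction pass: page URL -> entities found.
pages = {
    "/widgets/blue": ["blue widgets", "widgets"],
    "/widgets/green": ["green widgets", "widgets"],
    "/blog/choosing-widgets": ["blue widgets", "green widgets", "widgets"],
}

# Pages the site owner has nominated as the canonical home of each topic.
canonical = {"blue widgets": "/widgets/blue", "green widgets": "/widgets/green"}

# Invert: entity -> every page that talks about it.
mentions = defaultdict(list)
for url, entities in pages.items():
    for entity in entities:
        mentions[entity].append(url)

# Any entity mentioned on multiple pages is a cannibalization candidate;
# every non-canonical mention becomes an internal-link opportunity.
for entity, urls in mentions.items():
    target = canonical.get(entity)
    if target and len(urls) > 1:
        for url in urls:
            if url != target:
                print(f"{url}: link the mention of '{entity}' to {target}")
```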
The reason why knowledge graphs were created is that it's better to index and understand concepts than it is to index and understand pages.
The other thing that we can do, as soon as you tell us which page is the one about blue widgets, because we’ve understood this relationship between Google’s NLP API and Wikipedia, is write schema on the fly. We inject that straight in with JavaScript. I don’t think many other tools do that on the page with JavaScript, although I suppose the WordPress plugins and things have equivalents. But we can inject these things on the fly. So we can sit there and say, “You’ve told us that this page is primarily about this. But we’ve also read that it’s about this, this, this, and this.” So we can say, “It’s mostly about these ideas,” and we can write that in the schema. We can say, “Dear Google, this is about a thing, or a person, or an event, or whatever, called ‘this.’ And in case you don’t understand what that is, here’s a Wikipedia article so you can’t get it wrong.” So we marry those up, since we’ve got what the page is about and what the page mentions. That gives it a hierarchy as well.
So you can say it’s about these two or three things, or one or two things, and it mentions these other ideas, but they’re subservient to, though related to, the main topic. The beauty of that mapping comes into its own when you try to do on-page optimization, because what we can then do is build a knowledge graph. So that’s building a knowledge graph of your page: your page is now about all these different bits and pieces, these different ideas, and these different entities. But then when you want to optimize for blue widgets, or put in any keyword here (I say keyword), we can go and find the ten most authoritative pages for the phrase that you’ve entered. And we can run the same entity extraction algorithm on that content. And then the whole magic of the on-page stuff is that we can compare your attempt at optimizing for that phrase with the best of breed and show you what topics you’re missing, what topics you might be over-egging, and then you can dive into that and find related topics. You can build up a picture of the topics you need to add to your content to try and be more authoritative around the concepts that are important for the key phrase that you’re trying to optimize for.
The beauty of mapping comes into its own when you try to do on-page optimization because we can build a knowledge graph.
It literally will extract the best-of-breed ideas and compare them to your ideas. And if you haven’t got any ideas, if you haven’t written anything yet, okay, it’ll still come up with the main ideas you should be talking about and give you a brief to say, start with these ideas, write in this kind of order. And I think that’s a much, much better way to go than a keyword-based approach. Because although we start with a keyword to try and get the data, we immediately pull it back to entities: we’re reading these pages of content, and we understand the underlying entities. And it gets rid of that problem of synonyms, and recognizing plurals or variations on a word, or the fact that an engine could be a combustion engine or a search engine or whatever. All those contexts of what something means have already been worked out by the entity algorithm, if you see what I mean, and you don’t get into the habit of trying to say “green widgets” and just rephrase the same keywords. That’s not how future search works; it’s about understanding the underlying entities.
The other thing that’s important about an entity-based SEO approach is that entities are not language-dependent. The keyword for washing machine in England is “washing machine.” The keyword for washing machine in France or Spain is, well, I don’t know the French or Spanish word for washing machine, but it ain’t “washing machine.” That word is different in every language. But the concept of a washing machine is the same. And so an entity-based approach is giving a whole new level of scale to Google, which we probably haven’t even started to think about yet. But if you realize that entities are language-independent, then imagine the magic if they can work out everything they need to know about Mount Fuji in any language and feed it into the knowledge graph entry for Mount Fuji. Somebody in Japanese can add that the mountain has just grown three feet in the last year, and all of a sudden, that pops into something in French over there, because it’s in the underlying information. I think that’s exciting. And it might break down some of the problems of echo chambers that we have as well.
One of the flaws, I think, in this approach, the Wikipedia-based approach in particular, is that Wikipedia itself is hugely biased towards white males like you or me updating Wikipedia. And so, all of a sudden, Brighton Pier gets a lot of emphasis, but a major pier in Addis Ababa has no appearance in Wikipedia, even though it’s used by 10,000 times as many people as Brighton’s. So there’s this inbuilt bias in Wikipedia’s data. And anything that we can do to combat that in the future, I think, is going to be an interesting game. But that’s a story for 2022 or 2023.
Wow, that’s great. And figuring out that a washing machine is an entity with different names in different languages means that you can carry over the relationships that the entities have with each other. So the fact that you probably want to pair up a washing machine with a dryer, and vice versa, is something that you can translate over to another language without having to rebuild the knowledge graph or the whole map of relationships.
Because if you have to build the knowledge graph independently in every single language, without having any connection between them, you would have errors in your knowledge graph because some people wash by hand and some people wash by machine. So there are all sorts of errors that are going to happen if you try and make those things up independently. And there’s not going to be a common point of truth. But if you can make the common point of truth language-independent, then at least search engines can then use that as a basis to derive or extrapolate ideas and concepts.
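You can see that language independence directly in Wikidata’s public API: one entity ID, many language-specific labels. A minimal sketch using the standard wbsearchentities and wbgetentities actions (the search term and the list of languages are just examples):

```python
# Minimal sketch: look up an entity on Wikidata by its English label, then
# read its labels in other languages. Public API, no key needed.
import requests

API = "https://www.wikidata.org/w/api.php"

def find_entity_id(term, language="en"):
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "format": "json",
    }
    results = requests.get(API, params=params).json()["search"]
    return results[0]["id"] if results else None

def labels_for(entity_id, languages=("en", "fr", "es", "ja")):
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    entity = requests.get(API, params=params).json()["entities"][entity_id]
    return {lang: data["value"] for lang, data in entity["labels"].items()}

qid = find_entity_id("washing machine")
print(qid, labels_for(qid))  # one entity ID, many language-specific labels
```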
So as Google moves towards being more of an answer engine and an entity engine, you will see less of this kind of being left behind with a version of Google that is for some language that isn’t that popular. Because if you’re trying to optimize for, let’s say, google.ch, that’s a lot easier to do than optimizing for google.com. And that’s in part because the google.ch algorithm isn’t as up-to-date as google.com is.
If you can make the common point of truth language-independent, then search engines can use that as a basis to derive or extrapolate ideas and concepts.
Yeah, I think that’s true. And then google.ch results can start to bring in and possibly translate articles from other languages if it’s finding weak results in the “ch” database but now understands the underlying concept of the user’s query. With them also doing translations and other things on the fly, there’s every possibility in the future that the results you get back weren’t even written in your language originally. You’ll see them in your language, but they were written in a completely different language and have been translated by machine translation. And I know they’ve got this little “Would you like to see this in the original format? This was originally French,” or whatever it may be. So yeah, I think there’s a lot that Google can do with that. And I don’t suppose they’ve started, but you can see them getting there with the translations. That’s been getting easier and easier to do. And so now, in Chrome, when you go to a page that’s not in your language, it is really easy to press the button to translate it, and the translation is pretty good.
Or it does it automatically in Chrome if you set it up that way.
If you set it up that way, you get to choose to do that. If you want a better translation tool than Google right now, I think DeepL is a little bit better. But who knows whether they’ll keep up if Google carries on like this. Also, I think you see entities coming out in all sorts of other areas of Google’s products as well. Google Discover is a pretty good example. If you’ve got an Android phone, then Google pretty much picks up the ideas that are related to the ones that you’ve been interested in and can suddenly spurt out information about different entities and topics. They’re using the word “topic” more than they’re using the word “entity” at Google, which makes more sense. It comes out in Google Trends too, obviously, because when you start typing into Google Trends, it asks whether you want Danny Sullivan the racer, Danny Sullivan the guy at Google that used to be an internet marketer, or Danny Sullivan as a bunch of search phrases, and it tells you which of those are topics. So it’s picking out and showing what the topics are. And actually, if you go to Google’s main search now, it’s got little icons where it’s got topics; you can see the icons in Google Suggest. So it’s becoming more obvious where the topics are if you just keep half an eye out for them.
If you want a better translation tool than Google right now, DeepL is a little bit better.
Yeah. Now, you said earlier that Wikipedia is a primary source for Google to figure out that something is an entity. What about Wikidata and Freebase? How do those feed into this?
Google bought Freebase, or rather the company that owned Freebase. And the really unfortunate thing is, I wasn’t quick enough to get my name into Freebase, because there was a time when you could just go in and freely add your name or whatever idea to Freebase, and it would become an entity. As a legacy of that, those entities are still there, and you can look at the IDs. If you go in and have a look at the different entities, you can see a Freebase ID. That Freebase ID is now mapped into Wikidata as well. So if you go to Wikidata and look up an entity on wikidata.org, if it’s got a Freebase ID, then that Freebase ID is shown there as well. So Wikidata is trying to match up the Freebase IDs and the Wikidata IDs. And, of course, they’re going to be trying to match them up with other things. So if Google decides that a source is authoritative, then that’s great; they’ve kind of suggested IMDb and a whole bunch of other things. There’s a guy, Jason Barnard from Kalicube.pro.
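That Freebase-to-Wikidata mapping is visible in the data itself: the old Freebase machine ID is stored on Wikidata items as property P646 (“Freebase ID”). A small sketch reading it via the same public API; the example item ID is used purely as an illustration, and not every item carries the property:

```python
# Sketch: read the Freebase ID (property P646) stored on a Wikidata item.
import requests

def freebase_id(wikidata_id):
    response = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbgetentities",
            "ids": wikidata_id,
            "props": "claims",
            "format": "json",
        },
    ).json()
    claims = response["entities"][wikidata_id].get("claims", {})
    # P646 is Wikidata's "Freebase ID" property; many items won't have one.
    statements = claims.get("P646", [])
    return [s["mainsnak"]["datavalue"]["value"] for s in statements
            if "datavalue" in s["mainsnak"]]

# Replace with the Wikidata ID of the entity you care about.
print(freebase_id("Q95"))  # Q95 is Google's own Wikidata item, used as an example
```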
Make sure your developer owns more shares than you do. Then they're not leaving your organization. It's one thing to have ideas and another to have someone develop them for you.
Yep, he was on this podcast.
Yeah. So he spends a lot of time trying to find the data sources mentioned in Google’s Knowledge Panels, finding the data sources and then recording them in a database, his logic being that those data sources are being used to feed the Knowledge Graph. I think one thing that is still unknown, and I don’t know if Jason has got further down this than I have, is that there may be a difference between Google treating something as an authoritative data source and using it as a source that it can cite a bit of information from. So it may not be using all of those data sources to create entities. It may be using them to expand its knowledge about a particular entity, topic, or idea, which then gets displayed in the knowledge panel as interesting facts about that idea.
In other words, an example of this might be a search for Hamlet Batista, who’s an SEO guy, a friend of mine, and he has a knowledge panel that’s not based on Wikipedia. He doesn’t have a Wikipedia article. But what was listed as a source providing a little blurb about him in that knowledge panel was his Search Engine Land author page, and now it’s his Search Engine Journal author page. So those two websites would be trusted sources for Google pulling in data to populate the knowledge panel, but they may not be trusted sources that trigger a knowledge panel to get created in the first place. There’s a difference between those two things.
I think so. So Dixon Jones also has a knowledge panel and no Wikipedia page. It took me a long time to get that. I gotta say I didn’t write a book but writing a book, Stephan, is a great way to do it.
It only takes a couple of years to write a book.
Yeah, and I know you got your knowledge panel because of that book behind you, The Art of SEO. It was a great piece of work and continues to be, and I’m glad you keep on updating it.
Working on it now.
But you’re right, and I think that the data sources that Google may use for seeding ideas are going to be different from the ones that it uses for expanding on an idea. So getting that entity created in the first place is a challenge. But I think you can certainly use Wikidata to do that.
Let’s just clarify for our listeners who are not familiar with that website, wikidata.org is a sister site of wikipedia.org. Just like wikinews.org is a sister site of Wikipedia. And many people don’t even realize that the Wikidata site exists.
Yeah, because it’s just about the underlying entities. But ultimately, every Wikipedia article is referring back to the Wikidata that’s underlying it. And the great thing about the wikidata.org structures is that it’s feeding entities out to all of the Wikimedia Foundation sites and things. So anytime any of those sites cite anything, then it’s probably going to get back to the wikidata.org stuff.
That makes it important to think about what is in the Wikidata repository about my company, or about me personally if I’m considered at least somewhat notable, and whether to call in a favor with a friend who is not directly related, not an employee or a contractor or anything like that, so that I’m not violating the conflict of interest guidelines of the Wikimedia Foundation. And the notability requirements for wikidata.org are lower than wikipedia.org’s. So you might not make the cut to be considered notable enough to have a Wikipedia article about you, but it’s a lot easier to make the cut to be worthy of being in wikidata.org, with a page of a bunch of data points about you, your company, or a product or service that you’ve created.
Yeah. And I think that’s important, but you can still get kicked out of Wikidata, and I’d still be a little wary of just diving in and trying to do it yourself.
Right, because that conflict of interest guideline applies to all of the Wikimedia Foundation’s websites, not just Wikipedia: Wikidata, Wikinews, etc.
It does make me wonder, though, Stephan, when you’ve got that guideline in there, how did they create so many articles when no one’s got an interest to do so? Surely, by definition, the fact that somebody’s writing an article on artificial intelligence or bricks, they’ve got to have some kind of interest.
Wikipedia is a primary source for Google to figure out that something is an entity.
They’ve got an interest, but they don’t have a conflict of interest. They don’t have a direct monetary benefit to writing or revising that article. So if it’s an article about, let’s say, IBM, having employees of IBM editing that article would be a direct conflict of interest because you’re trying, therefore, to sway public opinion towards being favorable towards IBM, and that’s not encyclopedic. Encyclopedia Britannica wouldn’t allow that. Why should Wikipedia allow that?
Except that Encyclopedia Britannica might well employ somebody who is an employee of IBM to write. So I think there’s also a missed opportunity by Microsoft here as well. I don’t know how many of your viewers are old enough to remember Encarta, which was their encyclopedia. And it was a good encyclopedia; it gave Encyclopedia Britannica a run for its money. And one day, they just shelved it; they just gave up on Encarta. And then they watched Google take the idea of an encyclopedia and turn it into a centralized thing. Somebody surely is going to buy Encyclopedia Britannica or something like that and say, “Well, you know what, let’s go and do this with paid editors,” because the idea that Wikipedia is entirely done voluntarily surely has some limits at some point, and the data within it may have a few ambiguities, as we said, along the way.
And biases, right? People who can’t afford to put a bunch of free time into a pet project like Wikipedia editing won’t have their voices reflected in the content, while somebody who’s got a cushy job and makes a lot of money, and therefore has free time they can afford to spend editing Wikipedia, will; that introduces a bias. Oh, and you also mentioned ambiguity, and that brings up the point that disambiguation is an important thing that Wikipedia has done well with, and thus Google has too, because it’s basing a lot of its Knowledge Graph on Wikipedia and Wikidata. Disambiguating means recognizing that the word “dolphins” could refer to the football team as well as to the mammal in the ocean.
Disambiguation is an important thing that Wikipedia has done well with.
And this is where natural language processing really helps with disambiguation in context, because when you’ve got the word “dolphins” on a page on its own, you’ve got no idea which one it means. But as soon as it’s in a sentence, you can quite easily see whether that content is about the topic of football or about the topic of marine life. So that distinction is easily made with a decent NLP algorithm run over the content. But it doesn’t mean that a dolphin is only a mammal that lives in water or only a football team; it’s both, in completely different contexts. And this is one of the reasons why I think the gambling industry, I mean the casino industry, has some real challenges with search engine optimization. All of their slot machines and things are called Pharaoh’s Gold or Klondike or whatever; they’re using words that are completely out of context with what they are. What they are is a gambling platform, and they use words that are designed to attract gamblers emotionally. But those words are a long way from gambling, and as soon as they try and optimize around the gold of the pharaoh, they’re all of a sudden actually de-optimizing their site in many ways. I think the gambling industry has some interesting challenges, not that they will have any trouble getting around them.
Right. So why would a big site, let’s take IBM as an example, need internal linking automation such as the inLinks tool provides? Why would any site want or need that?
Massive efficiencies of scale. There are two things that a big site would need out of inLinks. The first thing we do is run our NLP over whatever pages they tell us they want to have in their knowledge graph. We build a knowledge graph for them. That will allow them to see which topics they are talking about. We can also then use that to compare page by page, because you’ve got that knowledge graph of your site. So you can sit there and say, “Right, okay, these are the things,” and now you’ve got a nice map as a pretty interactive graphic. But say you are explaining a concept in a page of content on IBM’s website. Say you’re talking about Deep Blue, the computer which they built, or Watson or whatever, and you’re talking about these kinds of concepts on a page about Deep Blue: that it’s a precursor to Watson, or the machine that beat Garry Kasparov at chess. Then you’ve got Deep Blue as a concept that’s owned by IBM, and you’ve got Watson as a concept that’s owned by IBM, but somebody might not necessarily know about Watson or Deep Blue.
inLinks runs its NLP on the pages big sites want to have in their knowledge graph.
Being able to guide the user in the text, so that if they want to know more they can dive straight into the page about Deep Blue, and automating all of that, becomes powerful. Because you may at some point update all your information about Deep Blue; they’ve renamed it, or rebranded, or whatever, and you may have another page that is now your cornerstone content for that idea. Well, if you’ve already written all those internal links manually and put them all over the site to say that this is the page, the only option you’ve got is a 301 redirect from the old content, I suppose. But that may not always be appropriate, and it may be that the old page is still relevant, just not the most important page for your new launch. And even if you do have 301s, of course, the more hops you have, the less likely it is that a search engine is going to bother getting to the end of all those 301 redirects. So it’s much better to reanalyze the whole context. If you’ve rewritten your content anyway, some of those links may not be appropriate anymore, and others will be. But if only you had a natural language processing algorithm to understand that, then you could effectively rewrite that link structure for that concept on the fly, just by changing the association to the page, and the algorithm will do the rest.
Yeah, so that’s great for the user. It’s also great for SEO because then you are sending link equity internally to the most authoritative and important pages on your site about the topics, and you’re minimizing the cannibalization.
Yeah. And the important thing is we’re passing that authority in context. We’re not passing it just because we can link from the homepage to any page and give some authority to that page; we’re doing it in context. So when you’re talking about Deep Blue, then we’ll link to Deep Blue. And you can also vary that a little bit. Take something like Marriott Hotels, for example: they may want to have information about all the restaurants locally, but they don’t want to link a pizzeria in New York to a hotel in Tel Aviv. They need to keep some silos in there. So you can add context and say, “I want to link food restaurants to this page, but only when they’re also talking about New York,” or something like that. So you can put context into the rules.
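A rough sketch of what such a contextual linking rule could look like in code (my own illustration of the concept, not inLinks’ implementation; the topics, target URLs, and context rule are hypothetical): only turn a mention into a link when the required co-occurring context is also present on the page.

```python
import re

# Hypothetical knowledge graph: topic -> the page that should receive the links.
target_pages = {"deep blue": "/topics/deep-blue", "watson": "/topics/watson"}

# Optional context rules: only link this topic when the page also mentions
# the given context entity (e.g. link restaurants only alongside "New York").
context_rules = {"watson": "ibm"}

def add_internal_links(url, text):
    lowered = text.lower()
    for topic, target in target_pages.items():
        if url == target or topic not in lowered:
            continue  # never link a page to itself; skip topics not mentioned
        required = context_rules.get(topic)
        if required and required not in lowered:
            continue  # context rule not satisfied, leave the mention unlinked
        # Link only the first occurrence, preserving the original casing.
        pattern = re.compile(re.escape(topic), re.IGNORECASE)
        text = pattern.sub(lambda m: f'<a href="{target}">{m.group(0)}</a>',
                           text, count=1)
    return text

print(add_internal_links(
    "/blog/chess-history",
    "Deep Blue was the IBM machine that preceded Watson.",
))
```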
Contrast that with something that would be a very brain-dead approach to internal linking. I remember this from, I don’t know, at least 10, maybe 15 years ago: Expedia used to include footer links with all these different keyword-optimized anchors linking to landing pages about those topics. So if they wanted to rank for cheap hotels in Bangkok, they would have that in a footer link. And they would mix in all these different footer links depending on which page, and they had some algorithm, I think, that figured out which page to put them on, and when to remove a link and add another one in its place, and so forth. That’s not useful to the user.
It’s not so useful to the user, no. Those links didn’t have any context around them. They worked brilliantly in the old-school days.
But that’s spam these days, right?
It’s not the best approach. It’s much better to take the user when they’re reading, where they can see it in context and take the meaning around it. Because otherwise, you’ve got a page on Miami hotels linking to a page on Tel Aviv hotels, and it’s just a list of links. It’s no different from those old-school directories where you press a button and all those pages are just lists of links. So that context is all-important, at least in the entity-based model. I think that’s all-important. Yeah, absolutely.
So cannibalization happens when you have two or more pages that each have an opportunity to rank for a particular search query. And perhaps the one that is outranking the other isn’t the one that you want to be the higher-ranking page, because it doesn’t convert as well, or it’s not as up to date, or it’s not as relevant. So you need to understand which pages are ranking and have the potential to rank for a particular search query, so that you can guide the search engines to favor the one that you think is the most appropriate and the most valuable.
Yeah. We think internal links can do that, and I think schema can help do that as well: telling Google in the schema, which effectively converts it into machine language, into an entity that Google can understand. You sit there and say, “This page is about this and mentions that secondary idea, whereas this other page just mentions it as a secondary idea,” and it gives that hierarchy as well. And then the internal links just reinforce the direction for the main concept, the main topic.
And what you’re referring to is schema.org markup, specifically, the about markup in schema.org.
We create two lots of markup, about and mentions, which are the same kind of thing. So basically, the about schema (I call it content schema): the syntax is, this page is about this kind of idea, as a thing, or a person, or an organization, or a place. And that’s got a label, which might be “dolphin,” but until you know whether it’s an animal or whether it’s a football team, you can’t get the context. So then you use the Wikipedia URL in the schema to show Google what you mean by that concept of dolphin. That helps to pinpoint it, or it should help; a decent search engine would be reading that information, and our evidence suggests that they are taking that information into account. We also create FAQ schema on the fly as well, which we did because it was relatively easy for us: if we see more than one heading tagged as a question, each followed by a paragraph of text, we assume that’s FAQ content.
When we built that, Google was quite happy to see two lots of FAQ schema on a page. They’ve since come out and said, “If you’ve got two lots of FAQ schema, we might get confused.” And that’s a bit of a thing, because we’ve got no way to turn it off automatically now; you can delete them in our system. You probably should use another schema generation tool at the moment as well, because we only create those two kinds of schema; we don’t create review schema, event schema, recipe schema, or all those other magical snippet things, so we’re not helping with the rich snippets particularly, except the FAQ snippets. So just be a little bit careful, if you do use our tool, that your FAQ schema isn’t clashing with some other FAQ schema that you’ve put in with a Yoast plugin or whatever it may be.
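To pull the schema discussion together, here is a hedged sketch of roughly what the two kinds of markup described above look like as JSON-LD, generated from Python. The URLs, labels, and wording are placeholders rather than inLinks’ exact output; each object would be injected into the page in its own script tag of type application/ld+json.

```python
import json

# Roughly the shape of "about"/"mentions" markup plus FAQ markup as JSON-LD.
# Placeholder values throughout; a real tool generates this from the page itself.
page_markup = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://example.com/dolphin-watching",
    "about": [{
        "@type": "Thing",
        "name": "Dolphin",
        # The Wikipedia URL disambiguates: the animal, not the football team.
        "sameAs": "https://en.wikipedia.org/wiki/Dolphin",
    }],
    "mentions": [{
        "@type": "Thing",
        "name": "Marine biology",
        "sameAs": "https://en.wikipedia.org/wiki/Marine_biology",
    }],
}

faq_markup = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Where can I watch dolphins?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "Coastal boat tours are the most common option.",
        },
    }],
}

# Each object goes in its own <script type="application/ld+json"> tag on the page.
print(json.dumps(page_markup, indent=2))
print(json.dumps(faq_markup, indent=2))
```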
Got it. And then to come full circle to earlier in the conversation, when we started this interview, I was asking you about Majestic. There’s a great tool inside of Majestic called Link Context that shows where on the page the links to your site are coming from. And that’s pretty awesome.
If you take the time to understand those little graphs that often look like candlesticks, you can see exactly where your links sit. If all your links are coming from footers, it’s so obvious that you don’t need to go very far to realize that. And yes, that Link Context stuff came out about a year ago, I think a year and a bit ago. That was the start of them having a little bit of a revival in their love of links. I think they got to a point where people were thinking, “Oh, well, you can use this other tool here or that other tool there.” And instead of trying to go broader, they’ve doubled down on link information and decided not to try to be a rank-checking tool or an all-things-to-all-people kind of tool.
And I appreciate that. It’s a great tool. I recommend it all the time. Awesome. So if folks are interested in trying out inLinks, can they get a demo from you, a trial period, or something…
You can go to inLinks.net and sign up for a free account. That’ll let you play with 20 pages for free forever, and you don’t need a credit card or anything for it. But honestly, if you want to try it properly, as soon as you sign up, you’ll see something that looks like Intercom. It’s not Intercom; it’s a thing called GoSquared, a little chat box where we’ll try to tempt you with one of those messages saying, “Why not do a demo? Do the demo.” You can do it in English with me, in French with Fred, or in English or French with Kareem as well. We’ll spend about 30-45 minutes going through the different features and stuff. The big advantage of doing that is that we will give you a free month afterwards. If you go on the demo, we will refund your first month with no commitment; if you don’t want to carry on, you can tell us that you want to cancel. But essentially, that’s the way to get a free month. We try to be low-cost entry, but we find that people who just sign up without getting involved with us are much more likely to stop after a month or two, because they haven’t seen the power of the whole thing. So that 30-40 minute demo can change your life, and you can start seeing why it’s different from other tools. I hope.
That’s great. And you have a blog on inLinks.net that has some great content there. Some thought leadership posts.
I started there, so I thought, you know what, I’ve got to understand this stuff if I’m going to start marketing this new approach. I guess the thing about SEO is that in the land of the blind, the one-eyed person is king. You’ve got to do some research.
The one identity.
The one identity is king or queen. And my personal stuff is at DixonJones.com as well.
All right, awesome.
Including my current medical state, which is not so great. I gotta be honest.
You’re still healing from that mountain biking accident.
So I’m not allowed to stand up yet. But you know, we’ll get there.
Well, speedy recovery.
Thanks very much.
And thank you for all the wisdom that you dropped on this episode.
That’s very kind of you, Stephan. Thanks very much for writing the book.
Awesome.
Important Links
- Dixon Jones
- LinkedIn – Dixon Jones
- Facebook – Dixon Jones
- Twitter – Dixon Jones
- InLinks.net
- Majestic
- Link Context – Majestic
- Announcing Link Graph – Majestic blog
- Jason Barnard – previous episode
- The Art of SEO
- Moz
- Mashable
- There’s no shortcut to authority: Why you need to take E-A-T seriously – Search Engine Land article
- List of sources cited in Knowledge Panels
- Marie Haynes
- Jeremy Clarkson
Your Checklist of Actions to Take
Prioritize my website’s inbound links. Make sure they are of high quality so that Google deems my content trustworthy.
Hire an excellent developer who can help implement SEO strategies on the website. Building SEO-compliant sites makes for easier crawling by search engines.
Be more mindful of my site’s Citation Flow and Trust Flow. These metrics, coined by Majestic, depict how influential and trustworthy a website is based on its inbound links.
Ensure I depict trust and authority on every web page. It’s best to do a regular site audit to ensure there aren’t any broken links, duplicate, or thin content.
Establish an excellent site structure that can quickly tell Google which pages of the website are the most important.
Be familiar with E-A-T, “expertise, authoritativeness, trustworthiness” in relation to SEO. Be more authentic and holistic in approaching my digital marketing strategies—having these as foundations will help me rank better on SERPs.
Future-proof my business. Be more proactive in implementing strategies that promote longevity. Rethinking past tactics, going digital, and remaining up-to-date with the current trends are some ways to prepare for the future.
Establish a clear and concise concept for Google to quickly understand what my website is all about. Bear in mind that Google increasingly understands concepts, not just pages.
Keep updating my site’s content by adding a blog, resources, or FAQs page that contains relevant content for my target audience.
Check out InLinks.Net to learn more about how I can outperform the competition through content optimization.
About Dixon Jones
A well-known, respected, and award-winning member of the internet marketing community with 20 years of experience in search marketing and 25 years of business innovation. An expert in information retrieval, specifically in big data environments. A pioneer of the freemium SaaS subscription business model. A family man.