TF-IDF is Like a Cheat Code for Keyword Research
Does this sound familiar? That 3,000-word piece of long-form gold you published isn’t ranking or performing.
You did the work! You spent hours meticulously researching keywords, and added them to your post perfectly. Still nothing. What do you have to do to impress this fickle algorithm?
This happened to us recently and we discovered that our failure wasn’t so much about the keywords we were using. It was more about the “other” words that we were not using. Allow us to explain.
We recently discovered the virtues of TF-IDF analysis, and holy smokes, it’s a legit game-changer. It didn’t replace our keyword research; it showed us how to take it up a notch.
What is TF-IDF?
90% of all SEO trends are subjective, at best. When SEO experts like us hear a new “surefire” trick, we roll our eyes and say, “Yeah, OK. Show me the data.” The effectiveness of many of these hacks is gone with the next Google algo update.
But make no mistake, TF-IDF is no trick or hack.
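TF-IDF stands for “Term frequency–inverse document frequency.” It’s a statistical measure that weighs how often a word appears in one document against how common that word is across a whole collection of documents, or corpus. Words that show up a lot in one document but rarely anywhere else score the highest.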
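If you like to see the math, here is a minimal sketch of that calculation in Python. It assumes one common textbook variant (raw term frequency multiplied by the natural log of total documents over documents containing the term). SEO tools may well use their own weighting, and the three sample "documents" are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: score} dict per document in `docs`."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # term frequency * inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up sample corpus, for illustration only.
docs = [
    "mix flour and sugar then bake the cake",
    "bake the cake in a pan for one hour",
    "the divorce lawyer filed the papers",
]
scores = tf_idf(docs)

# Show the terms of the first document, highest score first.
for term, score in sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:8s} {score:.3f}")
```

Notice that "the" appears in every document, so its score drops to zero, while "flour" outscores "cake" because only one document mentions it.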
In the world of SEO, it can take a look at the search terms you’re trying to rank for and show you the words and phrases (besides the keywords) that the top 10 highest ranking blogs have used to get there.
Think of it this way: Keyword research shows you the keywords you need to optimize to rank. TF-IDF shows you the “other” words you need to leapfrog what is already ranking.
Let’s say you’re writing a blog titled, “How to Bake a Cake That Will Impress Your Mother in Law.” Your primary keyword could be “How to bake a cake.” Your secondary keywords may be “bake a cake from scratch” or “bake a cake step by step.” Rock-solid start.
Now, you would take that to the next level with TF-IDF analysis and discover that the top 10 blogs on this topic all used words like flour, sugar, pan and hour. This shows you that failing to use any of these words will likely cause the Google algo to see your content as less complete.
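To make the cake example concrete, here is a rough sketch of that check: find the words every top page uses that your draft does not. The page snippets, the draft text and the function name `missing_common_words` are all hypothetical. A real tool would pull the full text of the ranking pages for you.

```python
import re


def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def missing_common_words(top_pages, draft, min_share=1.0):
    """Words used by at least `min_share` of the top pages but absent from the draft."""
    page_words = [words(page) for page in top_pages]
    draft_words = words(draft)
    vocab = set().union(*page_words)
    needed = len(page_words) * min_share
    return sorted(
        w for w in vocab
        if sum(w in pw for pw in page_words) >= needed and w not in draft_words
    )


# Hypothetical snippets standing in for the top-ranking pages.
top_pages = [
    "Cream the sugar, add flour, pour into a pan and bake for an hour.",
    "Sift flour with sugar, grease the pan, then bake one hour.",
    "You need flour, sugar, a pan and about an hour to bake.",
]
draft = "How to bake a cake that will impress your mother in law: start with sugar."

print(missing_common_words(top_pages, draft))  # ['flour', 'hour', 'pan']
```

Lowering `min_share` to something like 0.7 would also surface the words that most, but not all, of the top pages use.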
How to Use TF-IDF for Next-Level SEO Research
My example above was not great. So, we will now use some real-world examples to show you what TF-IDF can do.
Let’s say you run a divorce law firm. You have identified “file your own divorce” as a moderately competitive keyword that you want to go after and own.
Step 1: The Only Step That Most Marketers Take
Your first step is to plug this term into the keyword tool of your choice to see:
1. How competitive it is
2. What the search volume is
3. Any related keywords you can also use
There are plenty of great keyword tools, but we use ahrefs. It gives us deep, insightful SEO data that we can really nerd out on. So, we begin our research in ahrefs and generate a report that looks like this.
Boom, there we go. We can see that this has a good sweet spot of medium competition, with enough search volume. It’s worth going after. We can also see that we should consider secondary and tertiary keywords such as:
- How to file your own divorce
- File your own divorce
- How to file your own divorce in texas
- File your own divorce in texas
- How to file for divorce on your own
This is where most keyword research would end. For the longest time, this is where our keyword research would end too. We would take this info and turn it into a content plan, and then turn that plan into blogs and other pieces of content.
Now, we take a deeper look at what can truly set us up for ranking success with TF-IDF data.
Step 2: Add the Secret Sauce
You can choose from any number of tools to do this. We have taken a real liking to Surfer SEO, so that’s what we’ll use in this example.
We will begin by punching our primary keyword into Surfer to see what words the top-performing blogs about filing your own divorce are currently using.
Without these words, Google’s algo may see our content as pretty good, but not good enough to take over the rankings.
Here we can see that the obvious word “divorce” is the top word in all 10/10 of the highest rankers.
However, you can also see that many of the top 10 have frequently used some variation of “child” or “children.”
This means that if your content was only going to briefly touch on child support or child custody, you should probably write a good sized section focused on those things, and pepper the words in as naturally as you can.
If you don’t have those words, you are going to have a hard time ranking against any of these pieces. The Google algo will likely determine your piece is a lesser blog.
Next, we will click on the Popular phrases tab to see what multi-word phrases we should be looking at including.
This expands on what we learned on the previous tab and we definitely know that we had better have a goodly amount of content that talks about kids. We can see here that some of the phrases we should consider are:
- Child support
- Child expenses
That paints a clearer picture of how we should consider using the terms child and children. Surprisingly, child custody didn’t show up in these results. But, we should probably still include it. Good to know.
Let’s keep digging. Now we look at the Common words tab.
This shows us the words that all 10/10 of the top rankers are using, what words 9/10 of the top rankers are using, and so on.
You can see that all 10 of the top blogs used words we would have already covered, like:
However, it’s interesting to see that 9/10 of the top rankers had the word ‘business’ in their blog. We may have forgotten to talk about the effects that divorce can have on a business you own. We now know that we should probably talk about it a bit, because 9 of the 10 blogs we’re competing against have some mention of it.
We can also see that 7/10 have some mention of mediation. This could be something that we should at least address in our content.
Next, we can drill down a little bit more to see the Common phrases tab.
This shows us that 7/10 of the top blogs have the phrase ‘filing for divorce.’ It would actually be very hard to write this blog without using that phrase. But, a less obvious phrase appears in 5/10 of the top-performers: ‘uncontested divorce.’ That’s likely worth having.
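For the curious, a "how many of the top pages use this phrase" count can be approximated in a few lines. This is only a sketch under our own assumptions, not how Surfer actually computes its report: it counts each multi-word phrase once per page, ignores punctuation, and uses three invented page snippets.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the set of n-word phrases in a text (punctuation is ignored)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def phrase_page_counts(pages, n=2):
    """Count how many pages contain each n-word phrase at least once."""
    counts = Counter()
    for page in pages:
        counts.update(ngrams(page, n))
    return counts


# Invented snippets standing in for top-ranking pages.
pages = [
    "Filing for divorce is simpler when it is an uncontested divorce.",
    "Before filing for divorce, learn about child support.",
    "An uncontested divorce can still involve child support.",
]

two_word = phrase_page_counts(pages, n=2)
three_word = phrase_page_counts(pages, n=3)

for phrase in ("uncontested divorce", "child support"):
    print(f"{two_word[phrase]}/{len(pages)} pages use '{phrase}'")
print(f"{three_word['filing for divorce']}/{len(pages)} pages use 'filing for divorce'")
```

Because each phrase is counted once per page, a single page that repeats "uncontested divorce" twenty times still only counts as one.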
Now we click on the Prominent words and phrases tab to learn even more.
Here we see a breakdown of how frequently these top words are being used.
Be careful with this data, as following it too closely can ruin your writing. For example, you can see the word divorce has a density of 2.13%. But, that doesn’t mean you need at least that number or higher to rank. More on that later.
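If you want to see where a density figure comes from, here is a minimal sketch. We are assuming density simply means the occurrences of a word divided by the total word count, expressed as a percentage. The tool may count things differently, and the sample sentence is made up. Treat the output as a description of a page, not a target to hit.

```python
import re


def keyword_density(text, word):
    """Percentage of the words in `text` that are exactly `word`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return 100 * tokens.count(word.lower()) / len(tokens)


sample = "Divorce is hard, but divorce mediation can help."
print(keyword_density(sample, "divorce"))  # 2 of 8 words -> 25.0
```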
Now let’s take a look at the Common backlinks tab.
This tab has nothing to do with keywording, but it is still massively helpful. You can see the common referring domains for the top 20 pages, which gives you an idea of the link profiles you will be competing against, and the work you will have to do to best what already ranks.
We can see that 8 of the top pages have Wikipedia links, and if we click that to expand…
We can now drill down deeper to see the subpages and the referring domain.
Again, this has nothing to do with your keywording, but it is still incredibly useful information when you’re chasing SEO success.
See how all of this golden information can help you take your keyword research up several notches? If we didn’t have this report, we may not have written enough about child support, mediation, or the impact a divorce may have on your business.
Now, we will explore the wrong way to use this data.
Warning: This Data Could Also Ruin Your Content
Like keyword data, there is a right and a wrong way to use TF-IDF data.
The wrong way is to take this data and say, “Ok, this report says the word ‘divorce’ has a density of 2.13%. That means we need to use it at least 21 times in a 1000-word blog. Or more!”
If you make that a steadfast rule and become a slave to density numbers, you will ruin your writing! And bad writing doesn’t rank.
The right way to use this data is to look at it and say, “Oh, 8 of the Top 10 talk about child support, and we barely even touch on child support. Let’s add a child support section.”
You need to use these words as organically as possible or all of your next-level research will be for nothing. Awkwardly forcing words into sentences where they don’t belong leads to choppy, awful sentences that both your reader and Google will hate.
Both humans and robots can tell when you’re just trying to hit a certain keyword density. And they will both move on in a hurry.
Keyword Stuffing 2.0
These are lessons that we learned years ago when a lot of people were obsessed with the keyword density of their writing. The most popular target was 2.5%. Marketers and small business owners would kill themselves trying to stuff the keyword in to meet that number, which in turn killed the quality of the writing. Frustrated readers would click away and the SEO would suffer.
Trust us when we say that the Google algo looks at how readers engage with your blog, not whether or not the word ‘divorce’ makes up 2.13% of your word count.
Don’t be a slave to word density! Use this data as a guide into what you should be covering, based on what the top-performers have already done.
How to Use TF-IDF to Plan Your SEO Content
We gave you a snapshot of how to use TF-IDF in the divorce example, but let’s take a detailed look into how your SEO content generation process should look.
Step 1: Perform Keyword Research
You can use any number of tools to find the keywords that will help your business move the needle forward. Again, we are all about ahrefs. We feel it gives you the best data, but the important thing is that you’re performing some level of thoughtful research.
Don’t limit yourself to one-word or two-word keywords. People are Googling and asking Siri/Alexa complete questions about your industry and those are the ones you need to rank for. In fact, single-word keywords only account for about 2.8% of the searches in the United States.
Step 2: Create Your Content Plan
Now, take all of these keywords and turn them into topics for impactful blogs, videos, whitepapers or infographics.
How far ahead should you plan? The popular answer is to create a one-year plan, with 12 months of content in the clip ready to go. This is great if you have the time and resources to plan that far ahead.
However, there is something to be said for only planning one fiscal quarter in advance, particularly when you’re just starting out. This allows for your plan to be nimble and scalable. You can measure what performed well and change your plan/topics for Q2 to align with what resonated in Q1.
Plan the work and work the plan! Find the balance between remaining agile to react to changes, while not jumping off track for any little change.
Step 3: Conduct TF-IDF Research
Perform your TF-IDF research using the steps that we outlined in the previous section.
Add your findings to the content plan. Again, don’t go so far as to create notes for the writer that say, “Must use words ‘child support’ no less than 15 times.” That will stifle creativity and likely hurt the quality of the blog.
Instead, use this research to create broader notes that will ensure you’re covering all of the bases that the top-performing blogs have touched on. This means your notes to the writer would say, “Be sure to have sections that touch on child support and mediation.”
This gives the writer the freedom to create a great piece, without being handcuffed by word density. They will use those “other” keywords organically as they go, trust us.
Step 4: Write, React and Rank
Write your pieces, publish them on a regular schedule, and measure your results.
Too many companies view content marketing as a straight line, or a set-it-and-forget-it type of thing. They do almost no measurement along the way, which robs them of the chance to make crucial adjustments.
Again, doing this every quarter is a golden opportunity to look at your wins and losses so far. You can take a look at:
- What has resonated
- What has missed the mark
- What has surprised you
- What has disappointed you
Measure and react. This way, you can apply some quick fixes to what you’re publishing once a quarter instead of letting underperforming content flounder for a year at a time before you make any adjustments.
How We Used TF-IDF to Fix Our Content
Even hardcore SEO geeks like us aren’t impervious to being wrong… sometimes.
We recently invested a lot of time and effort into some big-time long-form pieces of content. And the more you invest, the more disappointed you are when it doesn’t perform.
There we were, staring at the analytics screen wondering, “What the hell happened? This content is unique, insightful and complete. And both our keyword research and on-page optimization seem to be on point too.”
Here is what the hell happened…
We knew what we were trying to rank against, but we didn’t understand why those other posts were really ranking, until we tried a TF-IDF analysis.
Then it all became clear. We could see that we weren’t using a lot of the terms that the high-rankers were using, simply because there were gaps in our content. We thought we were telling a complete story, but we were not.
We now had the data to see that, “Oh, site X is talking about topic Y, but we don’t even address it. Topic Y is clearly more important than we thought. Let’s retroactively add a section that talks about it.”
We took this data, made some changes, and boom, we’re already seeing the results. It’s amazing.
Giving Underperforming Content a Second Chance
This is a bit different than what we talked about above.
Instead of using TF-IDF data to plan content right from the very start, we used it to find out why certain pieces of content fell flat, even though we thought they were pretty good.
This is possibly even more valuable than using TF-IDF to plan your content. You can also use it to fix it!
You might find yourself in the same position that we were. In fact, I almost guarantee you will. You will be staring at the analytics screen or whiteboard wondering what went wrong. “This blog was actually awesome! But nobody read it and it’s not ranking! What the hell happened?”
Audit your underperforming content with TF-IDF analysis to uncover the same type of gaps that we discovered. See what the top 10 blogs have done, and measure it against what you have done.
This will give you a roadmap to make some simple changes such as:
- Adding a new section
- Changing your verbiage in a few places
- Organically and strategically adding a few key terms, without stuffing
This can breathe new life into stale content, and help you finally see the return-on-investment that you hoped for when you created these blogs in the first place.
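Here is one way such an audit could be sketched in code. It ranks the terms your page is missing by their average TF-IDF weight across the top pages, with the IDF part taken from a separate "background" set of unrelated pages so that everyday words sink to the bottom. The background set, the smoothing formula, the function name `audit_gaps` and all of the sample text are our own assumptions for illustration, not something any particular tool is confirmed to do.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def audit_gaps(top_pages, own_page, background_pages, limit=5):
    """Terms absent from own_page, ranked by mean TF-IDF weight across top_pages."""
    own_terms = set(tokenize(own_page))

    # How many background pages mention each term (drives the IDF part).
    n_bg = len(background_pages)
    bg_freq = Counter()
    for page in background_pages:
        bg_freq.update(set(tokenize(page)))

    mean_weight = Counter()
    for page in top_pages:
        tokens = tokenize(page)
        for term, count in Counter(tokens).items():
            if term in own_terms:
                continue  # already covered, so not a gap
            # Smoothed IDF so terms unseen in the background don't divide by zero.
            idf = math.log((1 + n_bg) / (1 + bg_freq[term]))
            mean_weight[term] += (count / len(tokens)) * idf / len(top_pages)
    return [term for term, _ in mean_weight.most_common(limit)]


# Invented text, for illustration only.
top_pages = [
    "file for divorce and settle child support through mediation",
    "divorce mediation can settle child support and the business",
    "child support and mediation matter when you file for divorce",
]
own_page = "how to file for divorce on your own and keep the house"
background_pages = [
    "how to bake a cake and frost it",
    "the best pan for the job and when to use it",
    "you can file taxes on your own",
]

print(audit_gaps(top_pages, own_page, background_pages, limit=3))
```

With this sample text, the three gaps it surfaces are "child," "support" and "mediation," which is exactly the kind of section-level hint you want. Remember that the output is a list of topics to consider covering, not a list of words to stuff in.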
Other Ways to Refresh Old Content
Of course, TF-IDF data is not the only way to give your content a second life. While you’re auditing your low performers, there are a number of other things you can do to light a fire under them.
Here are just a few.
Audit Your Headlines and Leads
Always start here, because this is where your would-be reader will start.
Do your headline and lead sentence make an emotional connection with your audience? Failure to make this connection can be a performance-ruiner, as people pass your content by with no interest.
We’re not saying you need to resort to sensational click-bait level headlines. We’re saying you need to ask yourself what emotional pain-points your target audience might be feeling, and zero in on them.
Write Emotive Leads
Let’s stay with the divorce lawyer example. Let’s say you want to write a piece on child custody. You might be tempted to use a pretty standard title such as, “5 Child Custody Tips.” That is pretty good, but it lacks any emotion. You can do better.
Put yourself in your audience’s shoes and play to those emotions. You can write pretty much the exact same blog, but change the title to something more emotional like:
- Are You Afraid of Losing Child Custody?
- Has Your Ex Said You Will Never See Your Child Again?
Now, you’ve locked into the pain points that your reader is probably feeling, so seeing that headline makes them far more likely to click.
Apply this same mentality to writing your lead sentence. Tap into that pain point again. You can write stronger leads by:
- Asking a question you know they can identify with: “Are you expecting an ugly battle over your kids?”
- Citing an interesting stat: “Did you know that a third of custodial mothers live below the poverty line?”
Some very simple text changes can lead to some big wins. Try tweaking your leads and headlines and re-sharing them on social.
Revisit Your Images
Data conclusively shows that people are more likely to interact with your content if you use a strong visual.
Boring and cookie-cutter images will instantly tell your would-be readers that your content is also boring and cookie-cutter. Don’t treat your images as an afterthought, because they will be your first impression!
Again, making an emotional connection is the key to earning the click. Let’s say the blog’s title is “Are You Afraid of Losing Child Custody?” Which one of these images do you think will perform better?
Or this image:
The second one makes a far more emotive statement and is more likely to connect with your audience’s pain point.
You don’t need a graphic artist or an expensive stock photography account with Shutterstock or Getty Images.
There are lots of amazing and free images out there from websites such as:
You also don’t need an expensive suite of photo editing tools to add decorative touches. You can use Stencil or Canva to make better blog images or social images, completely free of charge.
I’m no graphic designer. In fact, most designers would tell you I’m awful. However, I was able to download the sad child image from Pixabay for free and edit it in Canva in about 5 minutes. Is it perfect? Far from it. Is it more likely to outperform a boring cookie-cutter image? Almost definitely.
You don’t need to invest any more money. You simply need to invest the time and commit to doing things better.
Check Your Speeds
When I perform an initial SEO audit for a client, I would say that 7/10 of them are being held back by slow load speeds. This is most likely the #1 SEO killer on the web today.
A lot of websites are too slow for two major reasons. The first is that you perform a speed test on your home page and you pass. However, you don’t perform a speed test on your product pages or blog pages. And they could be considerably slower than your home page. You need to test your speeds site-wide!
The second reason is that you might perform a speed test at the start of the year, then spend the next 12 months adding bulky images and videos, while adding more and more plug-ins into the back-end. All of these things will slow your site down. Be cognizant of the fact that every single thing you add to your site can slow it down.
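If you want a quick way to spot-check more than the home page, here is a rough sketch that times how long each URL takes to download and lists the slowest first. Keep in mind the assumptions: it only measures the raw HTML download from wherever you run it, not images, scripts or rendering, so a proper tool such as PageSpeed Insights will tell you far more. The function names are our own, and the URLs in the usage comment are placeholders.

```python
import time
import urllib.request


def fetch(url):
    """Download a URL and return the response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def time_pages(urls, fetcher=fetch):
    """Return {url: seconds taken to fetch}, ordered slowest first."""
    timings = {}
    for url in urls:
        start = time.perf_counter()
        fetcher(url)
        timings[url] = time.perf_counter() - start
    return dict(sorted(timings.items(), key=lambda kv: kv[1], reverse=True))


# Example usage (placeholder URLs, swap in your own pages):
#
#   pages = [
#       "https://example.com/",
#       "https://example.com/blog/",
#       "https://example.com/products/",
#   ]
#   for url, seconds in time_pages(pages).items():
#       print(f"{seconds:6.2f}s  {url}")
```

Running something like this on a schedule, against blog and product pages as well as the home page, covers both of the blind spots described above.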
(Slow) Speed Kills
Speed matters! Google has confirmed that page speed is a ranking factor. Of course, they have been famously tight-lipped about what exactly a good load time is in their view, but the general consensus is, “the faster the better.” There is also this speed impact calculator that you can use.
Google’s algo will anticipate and notice people leaving your site because it takes too long to load.
- Sites that load in 5 seconds see 70% longer average sessions than ones that take 19 seconds
- A 100-millisecond delay in load time can cause conversion rates to drop by 7%, and a full second may decrease your conversion rates by 70%
- Pages that load in 2 seconds have a bounce rate of 9%, but 5 seconds will give you a bounce rate of about 38%
How to Increase Your Site’s Speed
A false sense of security may be weighing you down the most. It really all comes down to what we mentioned earlier:
- Don’t just test the home page and assume you’re fine
- Don’t test once a year and assume you’re fine
If you do find that your blog or product/service pages are too slow, there are a few things you can do to fix them.
1. Remove any plug-ins that you’re not really using in the back end
2. Remove any unnecessary/bulky code or CSS
3. Make sure your images are compressed for web use
If all of these things check out and you’re still too slow, it may be time to have a conversation with your hosting company. Or it could also be time to upgrade your site to something that runs leaner and loads faster.
We joked about this being a cheat code in the headline, but we really need to be clear that TF-IDF is not a cheat or a hack.
We hate SEO cheats and hacks. 99.8% of the time they are silly shortcuts that will give you some fleeting wins that will be erased with the next Google algo update.
TF-IDF is a solid tactic, not a hack. You’re not trying to outsmart the Google algo for a shortcut to a quick win. You’re looking at the pages that Google’s algo has already given top rankings to and attempting to learn from their success.
This can help you:
- Create more complete content, with a higher probability of ranking success
- Revamp content that has underperformed so far
As we said, this doesn’t require you to throw out your current SEO playbook or your existing keyword data. This is going to help you take both of them to the next level. It’s simply an extra step in the content creation process that gives you the opportunity to add more data-driven insights.
We don’t know if you will find TF-IDF as fascinating as we do. But, we’re pretty sure that you will find it as useful as we do!
In the world of competitive SEO, we all have access to pretty much the same tools. Both you and your competition can each have an ahrefs account to perform your keyword research. The SEO wins go to the companies who properly leverage this information and commit to consistently doing things the right way.
And if you’re in a neck-and-neck battle over SEO space with your competition, you will definitely want to start using TF-IDF data before they do!
If you have any questions about TF-IDF or keyword research, please reach out any time.