← Rupert Thieme Freelance Marketing

Case Study · Skim version

How I think, in 13 chapters and 13 decisions.

This case study covers a piece of work I did for Nuud, a natural deodorant brand. At the time, the overwhelming majority of Nuud's search acquisition was paid, and the wider strategy I was working on was about reducing that reliance by building organic presence. This case study is the layer underneath that strategy: the part where I had to work out what the organic content was going to be about in the first place.

Over six months I ran a topic-research audit across Instagram and TikTok, cross-checked against Google Trends, AnswerThePublic and Semrush, with the goal of finding awareness and consideration topics that could pull people into the funnel at a lower CPM than paid search. The hard part was that neither platform publishes demand data the way SEO tools do, so a lot of the work was about constructing that signal myself.

Each card below is one chapter, with the takeaway shown first, so you can expand any of them for the reasoning, or open the full chapter if you want the evidence.

Brand Nuud Deodorant
Reading time 15 seconds (skim) · 3–5 min (with summaries) · 30+ min (full chapters)
Source 101-page research log, embedded in the full version
Period Oct 2025 → Mar 2026 (6 months · complete)


01 · The problem

I wrote down the three questions I was actually trying to answer, before picking any of the tools.

"Before I reached for any of the actual tools, I wrote down the three questions I was actually trying to answer, and if a method didn't help with one of them, I didn't really see the point of putting it in the workflow."

The brief was to build awareness and consideration content for Instagram and TikTok at a lower CPM than paid search. The hard part was that these platforms, unlike Google and the SEO tools built around it, don't publish search-volume data the way you'd want, which meant that picking the right topics involved building a demand signal that the platforms didn't give freely.

So before I picked up any of the tools, I wrote down the three questions I was trying to answer: what people search for across platforms, which posts win on the queries that do have traffic, and which queries have the highest average engagement rates. Every method that ended up in the workflow had to map back to one of those; otherwise I didn't see the point of adding it.

Chapter 01 · full chapter & evidence
02 · Picking the four clusters

I would say it's worth deciding the spreadsheet shape before any of the actual data goes in.

"I would say it's important to figure out the shape of the spreadsheet before any actual data goes into it, particularly the source column, which ends up saving a lot of time later when you want to re-cut by tool."

I spent the first hour or so of the project deciding the spreadsheet shape before any data went into it. It started with three columns (the prompt name, the cluster category and the prompt source), and I added two more later for per-platform traffic.

This matters because schema changes mid-project tend to orphan the older rows. I had specifically split armpit-care from armpit-health up front, knowing that the health side would intersect EU advertising rules. The provenance column (the prompt source) ended up saving me about half a day later, when re-cutting the dataset by tool became useful.
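
As a rough illustration of that sheet shape, the sketch below models one row as a Python dataclass and re-cuts the rows by source. The field names, the `recut_by_source` helper and the sample rows are hypothetical stand-ins rather than the actual spreadsheet.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResearchRow:
    prompt: str    # the prompt name
    cluster: str   # the cluster category
    source: str    # the prompt source (provenance), i.e. which tool surfaced it
    # The two per-platform traffic columns were added later, so they are optional.
    instagram_traffic: Optional[int] = None
    tiktok_traffic: Optional[int] = None


def recut_by_source(rows):
    """Group prompt names by the tool that produced them."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.source].append(row.prompt)
    return dict(grouped)


# Illustrative rows only; the values are made up for the example.
rows = [
    ResearchRow("armpit care", "armpit-care", "Google Trends"),
    ResearchRow("armpit care routine", "armpit-care", "AnswerThePublic"),
    ResearchRow("armpit rash", "armpit-health", "Google Trends"),
]

by_source = recut_by_source(rows)
```

Because every row carries its source from the start, the re-cut is a single grouping pass rather than a backfill of older rows.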

Chapter 02 · full chapter & evidence
03 · Anchoring with Google Trends

When a cluster bombed on Trends, the language was usually wrong rather than the audience interest, and I would say it's worth trying different angles before dropping the cluster.

"When a topic seemed to bomb on Trends, I would say it was usually the language that was wrong rather than the audience interest, and indeed staying with the cluster but trying different angles often surfaced something usable."

Three of the four clusters returned reasonable Google Trends data. The fourth one, armpit health, was dominated by medical-adjacent queries that Nuud can't really compete on under EU advertising rules, and that I would not have been comfortable producing content on anyway.

The reflex would have been to drop the cluster, but the actual audience interest existed, and it was the surface words people were searching for that were wrong for our brand. So I kept the cluster on the list and looked for different language for the same demand, which is how "armpit care" eventually surfaced as the actual Nuud-shaped angle. Killing the cluster outright would have closed off a usable seam in the strategy.

Chapter 03 · full chapter & evidence
04 · Building the spider web

I ordered the filtering rules by cost-of-being-wrong rather than cost-to-run, particularly because the cheapest filter is also the one most likely to misfire on a niche category.

"Because of EU rules I had Claude filter medical-condition terms first, then competitor mentions, and only after that the no-traffic ones, particularly because traffic gating is the cheapest filter to run but also the one most likely to misfire on a niche category like ours."

The spider-web expansion got automated with Claude once doing it manually stopped scaling, and the order I put the filtering rules in for Claude actually mattered quite a lot.

Medical-condition language went out first, because a compliance failure there is binary. Competitor mentions came second, so that we didn't accidentally drift into being a comparison channel. The traffic gate ran last, even though it's the cheapest filter to run, because it's the one most likely to misfire on a niche category like organic personal care, where the tools tend to under-report. In other words, the filters were ordered by cost-of-being-wrong rather than cost-to-run.
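
The ordering can be sketched as a small filter chain. This is a minimal illustration of the idea, not the actual Claude prompt: the term lists, the `first_rejection` helper and the substring matching are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical term lists; real lists would be longer and reviewed for compliance.
MEDICAL_TERMS = {"hyperhidrosis", "infection", "rash"}
COMPETITOR_TERMS = {"brandx", "brandy"}


def mentions_any(term: str, words: set) -> bool:
    return any(word in term for word in words)


# Ordered by cost-of-being-wrong, not cost-to-run:
# compliance first, brand positioning second, traffic gate last.
FILTERS = [
    ("medical", lambda term, traffic: mentions_any(term, MEDICAL_TERMS)),
    ("competitor", lambda term, traffic: mentions_any(term, COMPETITOR_TERMS)),
    ("no-traffic", lambda term, traffic: traffic == 0),
]


def first_rejection(term: str, traffic: int) -> Optional[str]:
    """Return the name of the first filter that rejects the term, or None if it passes."""
    lowered = term.lower()
    for reason, rejects in FILTERS:
        if rejects(lowered, traffic):
            return reason
    return None
```

Returning the rejection reason, rather than a bare yes/no, keeps the terms dropped by the traffic gate identifiable, so they can be revisited if the tools turn out to under-report.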

Chapter 04 · full chapter & evidence
05 · The Semrush sanity-check

Even though this was a social audit I would say running the SEO check made sense, particularly because topics that score on both surfaces tend to compound.

"Although this was a social audit, I would say running the top 20 terms through Semrush for keyword difficulty made sense, particularly because the same article ends up working as a TikTok script, an Instagram carousel and a search-ranking page from the same upfront research."

The audit was a social-content audit, but the wider strategy ran SEO/GEO-heavy in parallel, and topics that scored well on both surfaces had compounding value. The same article ends up spawning a TikTok script, an Instagram carousel and a search-ranking page from the same upfront research, so you might as well prioritise the topics that compound.

So even though I didn't strictly need SEO data for the social audit, I ran the top 20 social-validated terms through Semrush for keyword difficulty before approving any of them for the manual platform audit. I would say the pass paid for itself by about month two of execution.

Chapter 05 · full chapter & evidence
06 · The Instagram manual audit

I would say automated social numbers should really be treated as a starting suggestion rather than a verdict, particularly on niche categories where the tools tend to under-report.

"The tools were reporting zero on terms that I personally knew had active content, and opening the Instagram app on the same query the same day returned posts with hundreds of thousands of views, so I would say automated social numbers should really be treated as a starting suggestion rather than a verdict."

Both AnswerThePublic and Semrush were reporting zero Instagram traffic on terms that I personally knew had active content. "Armpit care" was supposedly empty, and yet opening the Instagram app on the same query the same day returned videos with hundreds of thousands of views on the first scroll.

That observation became the formal justification for adding the manual audit step. The cost of trusting an automated tool that is wrong about your category is six months of content that nobody watches, while the cost of a week of manual capture is a week. So I think it's better to treat automated social numbers as a suggestion rather than a verdict, particularly on niche categories.

Chapter 06 · full chapter & evidence
07 · From sheet to Audit Atlas

I would say the shape of the variance actually matters more than the topline engagement, particularly because a high mean is often just a couple of viral posts holding up a long tail of flops.

"After enough queries I noticed the topline engagement was actually quite misleading, particularly because a high mean with a very low median is really just a couple of viral posts holding up a long tail of flops, which I would argue is not really a content bet you can hand to an intern."

Once enough Audit Atlas queries had been run, a pattern showed up that the topline engagement number was hiding. Some queries had a high mean engagement that was held up by one or two viral posts, followed by a long tail of flops. Others had both a high mean and a high median, which I would say means competent content reliably performs in that space.

So I added an engagement-consistency column to the Audit Atlas (median engagement divided by mean engagement). High-consistency queries got commissioned first, and lottery queries (high mean, low median) got one experimental content slot per week, which is a rule the next operator on the brand can apply without me being in the room to explain it.
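
A minimal sketch of that column, using the median-over-mean ratio described above. The function names, the sample engagement counts and the 0.5 cut-off are hypothetical; the case study does not state a threshold.

```python
from statistics import mean, median


def engagement_consistency(engagements):
    """Median over mean: near 1.0 means typical posts perform like the average,
    near 0 means a few viral posts are holding up the mean."""
    if not engagements:
        raise ValueError("need at least one post to score a query")
    avg = mean(engagements)
    if avg == 0:
        return 0.0
    return median(engagements) / avg


def label_query(engagements, threshold=0.5):
    """Label a query 'consistent' or 'lottery'. The threshold is an assumption."""
    if engagement_consistency(engagements) >= threshold:
        return "consistent"
    return "lottery"


# Made-up engagement counts per post for two queries.
steady_query = [80, 90, 100, 110, 120]
viral_query = [10, 10, 20, 20, 600]

steady_score = engagement_consistency(steady_query)
viral_score = engagement_consistency(viral_query)
```

In this made-up example both queries have a mean of at least 100, but only the first one has a median to match, which is the distinction the topline number hides.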

Chapter 07 · full chapter & evidence
08 · The TikTok pass, different beast

I would say the hook is doing virtually all of the work on TikTok, and everything else in the video is really just supporting it.

"Looking at the retention curves on the Creative Center, particularly the 'Dark underarms' ad that held top 99% retention for the full 1:55, I would say the hook is doing virtually all of the work on TikTok, and everything else in the video is really just supporting it."

The TikTok Creative Center exposes retention curves on top ads, and looking at the highest-retention ad in the audit (the "Dark underarms do NOT = bad hygiene" video, which held the top 99% of industry-average retention for the full 1:55), it was pretty clear that the hook was actually doing virtually all of the work.

I would say this is what ended up shaping the structure for every TikTok asset brief: a 3-second hook, then the educational core, then the product appears naturally inside the context of the video, and the close is a subtle CTA. Everything else is supporting the hook rather than carrying it, and indeed that observation went into the intern's video formula.

Chapter 08 · full chapter & evidence
09 · The TikTok Audit Explorer

Same category, but the platforms actually behaved quite differently, which is why I would say running the audit twice was worth the extra time.

"'How does sweat work' was middling on Instagram and the most engaged query on TikTok, and the gender split on TikTok was the opposite of what I expected too, which is why I would argue running the audit twice (and split by gender on TikTok) was worth the extra time."

What I found running the audit twice was that "how does sweat work" was actually middling on Instagram but the most-engaged query on TikTok, and the male-segment TikTok queries rewarded soft education while the female-segment queries rewarded direct USP messaging. Same category, opposite platform-and-segment behaviour.

I would say that if we had shared one content calendar across both platforms without per-axis topic weighting, we would have under-served TikTok's educational appetite and shipped soft education to a female cohort that actually wanted direct-benefit messaging. Running the audit twice (and split by gender on TikTok) nearly doubled the manual capture time, but the cost of not doing it would have been a wasted quarter of content production.

Chapter 09 · full chapter & evidence
10 · The 8-week structured test

Particularly with a new strategy, I would say it's important to run the test in parallel to the existing programme rather than replacing it.

"Particularly with a new strategy where management wants to see proof before scaling, I would say it's important to run the test in parallel to the existing programme rather than replacing it, so that if the test underperforms, the current numbers don't take the hit."

A new content programme is exactly the kind of bet where leadership wants to see evidence before scaling spend. Particularly with a new strategy, I would say it's important to run the test in parallel to the existing programme rather than replacing it, because if the test underperforms, the current numbers don't take the hit.

The design I proposed was three new structured pieces per week per platform (one awareness, one consideration, one conversion), running alongside whatever was already shipping. Each platform gets its own metric set, because the audit had already established that the platforms aren't really comparable, and there is an optional regional boost on a smaller market before the EU-wide rollout. The decision point lands at week 9.
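
To make the volume of the test concrete, the slots can be enumerated with a short sketch. The constant names and the `build_schedule` helper are hypothetical; only the counts (8 weeks, two platforms, three funnel stages, decision at week 9) come from the plan.

```python
from itertools import product

PLATFORMS = ("Instagram", "TikTok")
STAGES = ("awareness", "consideration", "conversion")
TEST_WEEKS = 8
DECISION_WEEK = TEST_WEEKS + 1  # the decision point lands at week 9


def build_schedule():
    """One new structured piece per week, per platform, per funnel stage."""
    return [
        {"week": week, "platform": platform, "stage": stage}
        for week, platform, stage in product(
            range(1, TEST_WEEKS + 1), PLATFORMS, STAGES
        )
    ]


schedule = build_schedule()
pieces_per_platform = len(schedule) // len(PLATFORMS)
```

Under these assumptions the test adds 48 pieces in total, 24 per platform, on top of the existing programme.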

Chapter 10 · full chapter & evidence
11 · The playbooks for the intern

If a junior can't operate the deliverable cold without me being in the room, I would say it isn't really a deliverable yet.

"I think the hook bank and the philosophy doc only really work if they can be operated by someone who isn't me in the room, and indeed I tried to make every deliverable readable cold by the next person on the brand."

Every deliverable I wrote was checked against the question of whether the next person on the brand could operate it without me being in the room, and if the answer to that was no, then I would say it wasn't really a deliverable yet.

The hook bank, the philosophy doc, the searchable HTML explorer and the playbook for the intern were all written to be readable cold. The content philosophy reads like an intern handbook on purpose, because the test wasn't whether I personally could execute the strategy. The test was whether it survives me leaving.

Chapter 11 · full chapter & evidence
12 · Deliverables & handover

I would say the audit's value is not really in any single insight, it's in being a method that can actually be re-run by whoever inherits the brand.

"I would say the audit's value is not really in any single insight inside it. It's in being a method that can actually be re-run by the next operator on the brand, and the deliverables were all shaped around that goal."

The shape of what got handed over: about 600 rows of research across the four clusters, around 300 audited social posts with engagement metrics, two searchable HTML explorers (Audit Atlas for Instagram and the TikTok Audit Explorer), per-platform per-stage hook banks, a TikTok content philosophy plus a video formula, and the 8-week structured test plan with platform-specific success metrics.

On top of that, there is a documented and repeatable Claude-assisted pattern for batched data ingestion, which I used to scale the audit and wrote up specifically so the next operator on the brand can re-run it. The audit's value is not in any single insight inside it; it's in being a method that can be re-run.

Chapter 12 · full chapter & evidence
13 · What this doesn't prove

I think it's important to write the scope edges into the deliverable openly rather than burying them somewhere, particularly with topic research.

"I think it's important to write the scope edges into the deliverable openly rather than burying them, particularly with topic research, where the limitations (English-only, snapshot-in-time, audit not conversion) actually matter for how to read the findings."

There are three things this audit doesn't prove. It tells you what to make, but not whether the resulting content will convert at the rates the model is assuming (that's the 8-week test's job, not the audit's). It's also English-language only, with the Spanish-language audit scoped but not run. And it's a snapshot in time, particularly on TikTok where the surface changes fast, so I would say the repeat cadence should be quarterly rather than annual.

I think it's important to be open about all of that rather than burying it somewhere, because the audit's value really sits in being a method the next operator can re-run and trust the output of, and that property actually survives every limitation listed.

Chapter 13 · full chapter & evidence

If you got this far and want the longer version with the full 101-page working log embedded inline, all 52 PDF page references, the manual-audit data and the per-platform per-query breakdowns, the full case study is one click away.