Creative Testing Methodology: How Thesis Tests Creatives At Scale

When it comes to finding success on Facebook Ads in a post-iOS 14 world, there are no hacks. There is only creative.

The biggest driver of performance on the platform is good creative that communicates effectively with your audience. But no brand, regardless of how many research studies it has done, comes out batting 1.000 with its ad creatives.

This is why consistent creative testing is core to finding success on the platform.

But how does one effectively test new creatives in an ad account?

Can you simply develop a new creative asset once every few weeks, pop it into your campaigns, and call it a day?

And if you do, how do you protect campaigns from underperforming creative?

At Thesis we’ve developed a testing methodology that generates more learnings, quicker, while also protecting our core scaling campaigns from creative flops. 

In this blog post, we’ll show you exactly how we set up these creative tests and generate more creative wins for our clients. 

Account Structure: Separate Creative Testing From Other Campaigns

Below is an example of the bare-bones account structure that we use for our clients at Thesis. While some brands might also have DPA campaigns or campaigns targeting additional geographical locations, this is the core set-up that we use for growing eCommerce businesses. 

The foundation of our creative testing methodology is that we isolate all creative testing into its own campaign, as shown here in T_PROSPECTING_TESTING_US. There are four main reasons why we separate our creative testing from the core campaigns:

  1. This ensures that our creative tests won’t bring down the performance of our core campaigns. By keeping the core prospecting and retargeting campaigns full of only our top performing creatives, we have a better foundation for scaling. 
  2. We’re able to facilitate iterative testing even with net-new assets, which increases the number of learnings we get from every single test. 
  3. Learnings are easier to track and report on. 
  4. While Facebook suggests simply dripping new assets directly into campaigns, we’ve found that the algorithm tends to starve new assets of spend. Under this methodology, we can force spend to drive faster learnings and curb creative fatigue. 

Some other things to note about this structure: 

  • From a high-level: we take the winners from our creative testing campaign and put them into our prospecting and retargeting campaigns. [More on this later]
  • We do not use Facebook’s A/B Testing tool. We’ve found that this tends to increase the costs of creative tests with little to no upside. When you use Facebook’s A/B testing feature to test creatives, you’re essentially launching one ad at a time, each in its own ad set. Under our testing methodology, this would double the number of ad sets in the testing campaign. 
  • Some brands tend to get hung up on having different creatives for prospecting and retargeting. We haven’t seen much evidence that this matters, because the best prospecting creative often works just as well in retargeting. We’ve found that trying to diversify the messaging between prospecting and retargeting (with the exception of specific offers and discounts) often takes up more time than the results are worth. However, if a creative bombs in our prospecting campaign but we think it would have a good shot in retargeting, we’ll sometimes try it there. 
  • In terms of Campaign Budget Optimization (CBO) or Ad Set Budget Optimization (ABO), I typically use ABO for creative testing and CBO for the core campaigns. With that said, other members of our team opt to use CBO with minimum budgets on the ad sets. As a whole, we see this as a media buyer choice and not something that will fundamentally affect the outcome of a campaign. 
  • Because the core campaign, T_PROSPECTING_US, is always stacked with our current top-performing creatives, this is where we conduct audience testing. 
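As a rough illustration, the structure described above can be written down as plain data. The names T_PROSPECTING_TESTING_US and T_PROSPECTING_US are the ones we use; the retargeting campaign name, the field names, and the helper function are hypothetical and only there to make the sketch readable. The budgeting column reflects one media buyer's preference, not a rule.

```python
# Illustrative sketch only: a plain-data view of the account structure.
# "T_RETARGETING_US" and every field name here are hypothetical.
ACCOUNT_STRUCTURE = {
    "T_PROSPECTING_TESTING_US": {"purpose": "creative testing", "budgeting": "ABO"},
    "T_PROSPECTING_US": {"purpose": "core prospecting", "budgeting": "CBO"},
    "T_RETARGETING_US": {"purpose": "core retargeting", "budgeting": "CBO"},
}


def destinations_for_winner(structure):
    """Return the core campaigns a winning test creative is duplicated into."""
    return [
        name
        for name, config in structure.items()
        if config["purpose"].startswith("core")
    ]


print(destinations_for_winner(ACCOUNT_STRUCTURE))
```

The point of the sketch is the direction of flow: new creatives only ever enter through the testing campaign, and only winners move into the core campaigns.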

Inside the Creative Testing Campaign

Inside of the creative testing campaign, each creative test is separated out at the ad set level. So essentially, each creative test gets its own ad set. We keep track of the tests by assigning them a unique number (which is simply the number of tests we’ve run) and a name that details the test. 

It’s important to note that the only things being tested in these ad sets are the elements of the ad unit itself. This is the actual advertisement that is shown to people as they browse on Facebook or Instagram. In most cases, each new creative test is testing just one element of a creative unit so that we can isolate concrete learnings. This includes: 

  • Ad formats: Single Image/Video, Carousels, Slideshows, Collection Ads
  • New images or videos
  • Thumbnails
  • Copy: Primary Text, Headlines, Descriptions, etc
  • CTAs

Another important call-out at the ad set level is that we conduct all of our creative testing with broad audiences. These are ad sets with zero lookalike or interest targeting, perhaps with a few guardrails on age, gender, or geolocation if necessary, but otherwise openly targeting all of Facebook. 

It’s a common belief amongst experienced media buyers that creative is ultimately what drives the targeting and the best results on Facebook Ads. But an additional reason why we do this is because we want to create the most scalable creatives possible. Since broad audiences are the most scalable (and often the cheapest), we know that when we have a creative that wins with broad, it is also likely to win in our core campaigns, regardless of the additional targeting that we have there. 

Inside of a Creative Test

Inside of each creative test, we aim to have 6 ads. This is still the number of live ads at which we see optimal performance inside of most ad sets, although it’s something we’d like to test further. 

A major perk of testing creatives with this methodology is that we’re able to jump right into iterative testing, meaning that we can test up to 6 different variants of a single creative or creative element, even if it is a net-new asset. 

Under our methodology, all creative tests fall into one of these two categories: 

  1. Net-new: This is a brand new image or video that has not been tested before. 
  2. Iterative: Once a winner has been identified, we conduct tests on variations of that creative to gather more learnings and increase the lifespan of the creative. This article breaks down 20 iterative tests that you can conduct on creative, which include thumbnail tests, messaging tests, hook tests, copy tests, and more. 

An ideal test for a net-new video would look something like this: 

  1. We’d create a new ad set for the test. 
  2. We’d take the new video asset and decide which creative element we’d like to test. For completely untested footage, we’ll often start by testing different hooks: we alter the first three seconds of the asset to produce 6 different variants, so that the only differentiating factor is the first three seconds of the video. (Here’s an article on why we think hook rates are so important.)
  3. While we do test dynamic creative from time to time, to get clean, attributable results we keep all copy, headlines, and CTAs uniform so that the only element being tested on net-new video assets is the hook. 
  4. It’s a similar story for using multiple text options at the ad level: since Facebook doesn’t actually report on which of those copy options is doing best, we prefer to isolate our own copy tests into new ad sets so that we can have that data. However, if we find a number of copy options that work for an asset, sometimes we’ll roll out the multiple text feature in the core campaigns. 
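To make the set-up above concrete, here is a minimal sketch of a net-new hook test expressed as data: one ad per hook, with copy, headline, and CTA held constant. The naming pattern, the field names, and the build_hook_test helper are all hypothetical; this is not Facebook's API, just a way to show which element varies and which stay fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    name: str
    video: str
    hook: str
    primary_text: str
    headline: str
    cta: str


def build_hook_test(test_number, video, hooks, primary_text, headline, cta):
    """Build one ad per hook, holding copy, headline, and CTA constant."""
    if not 1 <= len(hooks) <= 6:
        raise ValueError("we aim for up to 6 ads per creative test")
    # Hypothetical naming pattern: running test number plus a descriptive name.
    ad_set_name = f"{test_number:03d}_{video}_HOOK_TEST"
    ads = [
        Ad(f"{ad_set_name}_V{i}", video, hook, primary_text, headline, cta)
        for i, hook in enumerate(hooks, start=1)
    ]
    return ad_set_name, ads


ad_set_name, ads = build_hook_test(
    test_number=12,
    video="UGC_VIDEO",
    hooks=["hook_a", "hook_b", "hook_c", "hook_d", "hook_e", "hook_f"],
    primary_text="Same primary text on every ad",
    headline="Same headline on every ad",
    cta="SHOP_NOW",
)
print(ad_set_name, len(ads))
```

Because everything except the hook is identical across the six ads, any difference in results can be attributed to the first three seconds of the video.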

Once all of that is set up, we launch the test. 

A Quick Note About The Learning Phase…

One of the perceived downsides of this testing methodology is that far fewer ad sets will be exiting the learning phase. Since many of our tests will never reach the 50 required conversions, they will likely be stuck in learning or even learning limited for their entire lifespan. 

Technically, this means that performance will not be “stabilized”.  

We’re making a conscious choice to accelerate creative learnings versus having the “perfect” consolidated approach. Several other top agencies also do this. 

Ultimately, we’ve found that creative learnings are more valuable than consolidation… even if that means that our creative testing campaigns are often in learning.

With that said, this is something we often go head-to-head on with the Facebook team, so it’s worth pointing out that this methodology does NOT follow their best practices to a T. 

How Much Do You Budget for Creative Testing?

Before we dive into the process for optimizing creative tests and identifying winners, it’s important to discuss how much to budget for your creative campaigns. 

According to Facebook’s best practices, each ad set needs to reach 50 optimization events before exiting the learning phase. Until this occurs, the ad set is likely “to be less stable and have a worse cost per result”. So, if you wanted to follow Facebook’s best practices to a T, you’d only be able to evaluate a creative test once it has hit 50 optimization events over a 7-day period. 

Under these guidelines, if you had an average CPA (cost per purchase) of $35, then you would need to budget at least $1750 for each creative test. If you wanted the creative test to last a week, then this would boil down to a budget of $250/day. 

$35 CPA x 50 purchases = $1750 per test
$1750 / 7 days = $250/day budget
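The same arithmetic can be wrapped in a small helper so you can plug in your own CPA. This is only a sketch; the function name and parameters are ours for illustration, with the 50 events and 7 days taken from the guideline described above.

```python
def creative_test_budget(target_cpa, optimization_events=50, days=7):
    """Return (total budget, daily budget) needed to hit the event target."""
    total = target_cpa * optimization_events
    return total, total / days


total, per_day = creative_test_budget(35)
print(f"Total: ${total}, daily: ${per_day:.0f}")
```

With a $35 CPA this reproduces the $1750 total and $250/day figures above.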

In practice, I would say that we get pretty close to these targets. But we aren’t militant about the 50 purchases for each test, especially if we see a test performing poorly. Most times we will use CPA targets to get a rough idea of the daily budget and then optimize the test based on performance after a few days of spend. 

An additional consideration is what percentage of your total Facebook Ads spend you’ll want to put towards creative testing. Every brand is different in this regard: some brands find their best performance actually coming out of the testing campaign, while others find better results coming out of their core campaigns. 

As a result, the share varies drastically. Our clients are spending anywhere from 15% to 60% of their budget on creative testing every single month. 

We typically recommend following the performance: if a client is driving better results from a higher % of spend in testing, that’s what we’ll do. 

However, as a starting point, we’d suggest at least 20% of spend being dedicated to creative testing each month. 

What Happens After You Start a Test?

Once a creative test is launched, we don’t typically start evaluating results or optimizing the ad set until at least 3 days after launch. Even Facebook now suggests a minimum of 72 hours before evaluating performance to get the most accurate view of your results. 

For creative testing, this allows the ad set to distribute the budget amongst the creatives, and ideally, you should start to get a sense of what kind of CPAs the new creatives are generating. 

Here, we often find ourselves in one of three scenarios: 

  1. Results are looking really good on the new tests. CPA is lower than average, and there are at least one or two potential winning creatives in the ad set. 
  2. Results are average. There are a few purchases being generated by the new creative, at CPAs that fall within 10%-25% of your core campaigns’ CPA. 
  3. Results are poor. CPAs are high across the board or there are little to no purchases coming out of the new creative test. 
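The three scenarios can be roughed out as a simple classifier. Treat this as a sketch of the thinking rather than a rule we apply mechanically: the function name is hypothetical, and reading "average" as "no more than 25% above the core campaign CPA" is an assumption made for the example.

```python
def classify_test(test_cpa, core_cpa, purchases, tolerance_pct=25):
    """Classify a creative test as 'good', 'average', or 'poor'.

    Assumption: 'average' means a CPA no more than tolerance_pct above core.
    """
    if purchases == 0 or test_cpa is None:
        return "poor"  # little to no purchases coming out of the test
    if test_cpa < core_cpa:
        return "good"  # CPA is lower than the core campaigns
    if test_cpa <= core_cpa * (100 + tolerance_pct) / 100:
        return "average"
    return "poor"  # CPA is high relative to core


print(classify_test(test_cpa=30, core_cpa=35, purchases=10))
print(classify_test(test_cpa=40, core_cpa=35, purchases=5))
print(classify_test(test_cpa=60, core_cpa=35, purchases=3))
```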

When results are looking really good, we start scaling the ad set directly in the testing campaign. We do this before dripping the new creative into the core campaigns to make sure that the creative can withstand an increased budget. According to Facebook’s best practices, you’d want to increase the budget by 20% every 3 days to avoid re-entering the learning phase. 

However, depending on the temperament of the account, sometimes we actually like to increase the budget anywhere from 50% to 100% to drive learnings faster. Typically we’ll do this 2 to 3 times before identifying a winner, at which point we’ll duplicate the winning ad into the core prospecting and retargeting campaigns. 
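To see how different these two approaches are in practice, here is a small sketch that projects the daily budget after each increase. The function is hypothetical, and applying the 3-day spacing to the more aggressive increases is an assumption made for the example; in reality the timing depends on the account.

```python
def scaling_schedule(start_budget, increase_pct, steps, days_between=3):
    """Return a list of (day, daily budget) pairs after each budget increase."""
    budget = float(start_budget)
    schedule = [(0, budget)]
    for step in range(1, steps + 1):
        budget = round(budget * (100 + increase_pct) / 100, 2)
        schedule.append((step * days_between, budget))
    return schedule


# Conservative: +20% every 3 days, three times.
print(scaling_schedule(250, increase_pct=20, steps=3))
# Aggressive: +100%, two times.
print(scaling_schedule(250, increase_pct=100, steps=2))
```

Starting from $250/day, three 20% increases land at $432/day, while two 100% increases land at $1000/day, which is why the aggressive route surfaces a winner (or a flop) much faster.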

We never want to turn off an ad set that is performing well, so we’ll keep the ad set running in the testing campaign for as long as results are strong. 

As results start to dwindle on this ad set, and for the creative tests that aren’t getting as good performance, we’ll take the following steps for optimization: 

  1. We always start optimizing at the ad level. If we see an individual ad has a poor CPA after a few days of spend, we turn that ad off to allow other ads to get more spend. 
  2. We repeat this process every few days until we can identify CPAs for each ad. If an ad spends 2x the average CPA without generating a purchase, we’ll turn it off. 
  3. If no winners are found in a creative test after 5-7 days of spend, we turn off the ad set. 
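The two hard cutoffs in these steps can be sketched as simple checks. The data shape and function names are hypothetical, and the sketch only covers the 2x rule and the ad set cutoff; judging whether an ad with purchases has a "poor CPA" in step 1 is still a media buyer call.

```python
def ads_to_pause(ads, average_cpa, kill_multiple=2.0):
    """Return names of ads that spent kill_multiple x average CPA with no purchase.

    ads: list of dicts with 'name', 'spend', and 'purchases' keys (assumed shape).
    """
    threshold = kill_multiple * average_cpa
    return [
        ad["name"]
        for ad in ads
        if ad["purchases"] == 0 and ad["spend"] >= threshold
    ]


def should_pause_ad_set(days_of_spend, winners_found, max_days=7):
    """Pause the whole test if no winner has emerged by max_days of spend."""
    return days_of_spend >= max_days and winners_found == 0


ads = [
    {"name": "V1", "spend": 80.0, "purchases": 0},
    {"name": "V2", "spend": 60.0, "purchases": 2},
    {"name": "V3", "spend": 40.0, "purchases": 0},
]
print(ads_to_pause(ads, average_cpa=35))
print(should_pause_ad_set(days_of_spend=7, winners_found=0))
```

With a $35 average CPA, only V1 is paused: it has spent more than $70 without a purchase, while V3 has not yet spent enough to judge.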

Which Metrics Do You Use to Determine Creative Success?

We use a combination of primary and secondary KPIs to tell the story of how the creatives are performing. 

Since we’re a performance growth agency, driving revenue is our number one priority. So the most important KPIs we use to determine a creative’s success are CPA and the amount spent. Historically, we also used Cost Per Add to Cart as an early indicator of performance, but this metric has become less important after iOS14. 

But these metrics only say whether or not the creatives worked. They don’t tell the story about why a creative performed, or more importantly, why it didn’t. To tell that story, we also track these secondary metrics to inform our learnings and create better creative iterations in the future: 

  • Hook rate: This measures the % of people who watched the first 3 seconds of a video. This is also known as the “Thumb-Stop Rate”. This is one of the most important secondary metrics to track, because it has a huge impact on the number of people who stick around on your ad, and eventually convert. 
  • Hold rate: This is the % of people who watched up to 15 seconds of your video. While the hook rate measures the “click-bait” potential of your video, the hold rate shows whether or not your follow-up visuals and messaging are keeping the right people around. 
  • Unique Outbound Click Through Rate: This is the % of people who clicked on your CTA and went directly to your website. We opt to use this metric as opposed to Link Click Through Rate because this singles out individual users who clicked on your CTA button and actually got to the website. Ultimately, this metric reveals how effective your targeting and creatives are in tandem. 
  • Conversion rates or Purchases/Link Clicks: This metric shows the percentage of people who convert after the click. Typically, we’re tracking this as a primary metric across the entire account performance, but I also like to check it out on an ad by ad basis too. 
  • Post shares: This is kind of a random one that I don’t optimize for in any way… but I think it’s really interesting to see how many people share the ad for free!
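If you export raw counts from Ads Manager, the rate metrics above can be computed with a few divisions. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the denominators for hook rate, hold rate, and unique outbound CTR (impressions, impressions, and reach) are common conventions assumed here, so check them against how your own reports define each metric. Only the conversion rate formula (purchases / link clicks) comes directly from the list above.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def creative_metrics(impressions, reach, three_sec_views, fifteen_sec_views,
                     unique_outbound_clicks, link_clicks, purchases):
    """Compute secondary creative metrics from raw counts."""
    return {
        # Assumed denominator: impressions.
        "hook_rate": safe_ratio(three_sec_views, impressions),
        # Assumed denominator: impressions.
        "hold_rate": safe_ratio(fifteen_sec_views, impressions),
        # Assumed denominator: reach (unique people who saw the ad).
        "unique_outbound_ctr": safe_ratio(unique_outbound_clicks, reach),
        # From the article: purchases / link clicks.
        "conversion_rate": safe_ratio(purchases, link_clicks),
    }


metrics = creative_metrics(
    impressions=10_000, reach=8_000, three_sec_views=3_000,
    fifteen_sec_views=1_000, unique_outbound_clicks=160,
    link_clicks=200, purchases=10,
)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Comparing these numbers across the six ads in a test is usually more informative than looking at any one ad in isolation, since the goal is to learn which element made the difference.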

Final Thoughts for New Brands

While there are few downsides to this testing methodology, it’s probably best suited to brands spending at least $20k/month to ensure enough spend on the tests themselves. 

When faced with a new account and pixel, typically we’d suggest starting off with creative testing to find initial wins at the ad level. Then as a next step, you can launch your core prospecting campaign to begin testing new audiences.
