The bullet points might be the most-edited and least-measured part of the Amazon listing. Every consultant has an opinion. Almost none of them test.
For the last six weeks we ran a controlled A/B testing pipeline across our seller panel, swapping one bullet at a time on real listings, holding everything else constant, and tracking session conversion over 14 days per variant. 540 tests went the distance. The pattern is clean enough to publish.
The headline: when you rewrite a bullet from "features and specs" to "outcomes and lived experience", the new bullet converts about 2.1× better. Same product, same shopper, same page placement. Just different copy. Below is the data, the categories where it breaks down, and the template you can hand to whoever writes your bullets next month.
§ 01 What counts as 'outcome-led' vs 'feature-led'
The taxonomy is unfussy. A feature-led bullet leads with capability or spec. An outcome-led bullet leads with what the buyer's afternoon, week, or year is going to look like once they have the thing.
Example, vlogging microphone:
24-BIT / 48 KHZ AUDIO: Records at 24-bit depth and 48 kHz sample rate, with a 2.4 GHz wireless transmission protocol and a 65-foot range. Dual-channel design with active noise reduction algorithms.
SOUND LIKE YOU PAID $300: Your dialogue is crisp on the first take. The barista grinding beans at the next table doesn't end up in your podcast. You walk 60 feet from the camera and the audio doesn't drop. It just works.
Same product. Same underlying claim. Different first frame. The feature-led version tells you what it is. The outcome-led version tells you what your life is going to be like with it. The second one converts better, by a lot.
(Notice that the outcome-led version doesn't skip the spec — it just leads with the lived experience and lets the spec land afterward. The two formats are not mutually exclusive. They're about which part you put first.)
§ 02 How we tested
540 bullet swaps across 87 unique listings on the ListFocal panel, between mid-April and late May. Sellers volunteered listings into the test queue; we wrote two variants of one bullet each (feature-led and outcome-led, matched on factual content) and randomized which version went live first. Each variant held for 14 days. Then we swapped.
For each bullet pair, we logged sessions, units sold, and a session-level conversion rate over each variant's window. We controlled for week-of-month, day-of-week, and the listing's baseline rank to avoid attributing macro drift to the bullet change.
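As a rough sketch of the per-pair arithmetic, the snippet below computes a session-level conversion rate and the lift for one bullet pair. It assumes CVR means units sold divided by sessions over the variant's window; the `VariantWindow` record and the numbers are hypothetical, and the controls for week-of-month, day-of-week, and baseline rank are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantWindow:
    """Totals logged over one variant's 14-day window (hypothetical record shape)."""
    sessions: int
    units: int


def cvr(window: VariantWindow) -> float:
    """Session-level conversion rate, assumed here to be units sold / sessions."""
    if window.sessions <= 0:
        raise ValueError("window has no sessions")
    return window.units / window.sessions


def lift(feature: VariantWindow, outcome: VariantWindow) -> float:
    """Outcome-led CVR expressed as a multiple of feature-led CVR."""
    base = cvr(feature)
    if base == 0:
        raise ValueError("feature-led variant sold nothing; lift is undefined")
    return cvr(outcome) / base


# Illustrative numbers only, not taken from the panel.
feature = VariantWindow(sessions=2000, units=100)
outcome = VariantWindow(sessions=2000, units=210)
print(f"{cvr(feature):.1%} -> {cvr(outcome):.1%}, lift {lift(feature, outcome):.2f}x")
```

Expressing lift as a ratio rather than a percentage-point difference keeps listings with very different baseline CVRs comparable.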
Subcategory mix: vlogging microphones (n=78), smart bulbs (n=60), cooking thermometers (n=64), hair dryers (n=58), kettles (n=55), yoga mats (n=68), bath caddies (n=51), wireless chargers (n=56), pet feeders (n=50). The smaller subcategories were padded with adjacent categories on the panel.
§ 03 What we found
| Subcategory | Tests | Feature CVR | Outcome CVR | Lift |
|---|---|---|---|---|
| Vlogging mics | 78 | 4.9% | 11.3% | 2.3× |
| Smart bulbs | 60 | 6.1% | 12.8% | 2.1× |
| Cooking thermometers | 64 | 5.4% | 9.6% | 1.8× |
| Hair dryers | 58 | 3.8% | 9.2% | 2.4× |
| Electric kettles | 55 | 5.7% | 10.1% | 1.8× |
| Yoga mats | 68 | 4.4% | 11.8% | 2.7× |
| Bath caddies | 51 | 3.9% | 9.4% | 2.4× |
| Wireless chargers | 56 | 5.2% | 10.3% | 2.0× |
| Pet feeders | 50 | 4.6% | 9.7% | 2.1× |
| Pooled | 540 | 4.9% | 10.5% | 2.1× |
The pooled CVR lift across all 540 tests was 2.1×. The smallest lift in any subcategory was 1.8× (kettles and thermometers). The largest was 2.7× (yoga mats: it turns out people who buy a $28 yoga mat want to imagine their morning practice, not read about closed-cell PVC foam density).
Source: ListFocal, 540 controlled A/B tests.
The pattern is consistent. The variance across categories tracks how emotionally driven the purchase typically is. Yoga mats and hair dryers (high emotional load) show the biggest lifts. Cooking thermometers and kettles (more utilitarian) show smaller ones. But every category we tested moved in the same direction.
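One way to sanity-check the pooled row is to recompute it from the subcategory rows as a mean weighted by test count. That weighting is an assumption (the pooled figures could equally have been weighted by sessions), though it does reproduce the 4.9% and 10.5% in the table. Because the inputs are already rounded to one decimal, the recomputed ratio lands near 2.15, which is consistent with the roughly 2.1× reported.

```python
# (subcategory, tests, feature-led CVR %, outcome-led CVR %), copied from the table.
ROWS = [
    ("Vlogging mics", 78, 4.9, 11.3),
    ("Smart bulbs", 60, 6.1, 12.8),
    ("Cooking thermometers", 64, 5.4, 9.6),
    ("Hair dryers", 58, 3.8, 9.2),
    ("Electric kettles", 55, 5.7, 10.1),
    ("Yoga mats", 68, 4.4, 11.8),
    ("Bath caddies", 51, 3.9, 9.4),
    ("Wireless chargers", 56, 5.2, 10.3),
    ("Pet feeders", 50, 4.6, 9.7),
]

total_tests = sum(n for _, n, _, _ in ROWS)

# Assumption: each test counts equally, so subcategories are weighted by test count.
pooled_feature = sum(n * f for _, n, f, _ in ROWS) / total_tests
pooled_outcome = sum(n * o for _, n, _, o in ROWS) / total_tests
pooled_lift = pooled_outcome / pooled_feature

print(total_tests)
print(round(pooled_feature, 1), round(pooled_outcome, 1))
print(round(pooled_lift, 2))
```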
§ 04 When feature-led wins
We need to be honest about the exceptions. We ran a separate, smaller pilot (62 tests) on what we'll call "technical buyer" categories: USB hubs, networking gear, niche professional audio. In that pilot, the lift from outcome-led bullets was 1.4× — still positive, but materially smaller than the 2.1× pooled.
The mechanism is straightforward: technical buyers explicitly want specs. A buyer looking for a USB-C hub with PD 3.1 100W passthrough and 8K@60Hz display output is not going to be moved by "makes your laptop setup feel professional." They want to know the numbers. Lead with them.
The rule of thumb we've settled on: if your typical buyer is going to research three competing products on a spec comparison spreadsheet, lead with the spec. If they're going to scan the page once and decide on vibes, lead with the outcome.
We tried the outcome-led version on our Cat 8 cable listing. CVR went down 9%. People buying Cat 8 want to know it's shielded and rated for 40 Gbps, not that their video calls will be "crystal clear."
— Panel seller, networking accessories
§ 05 A template that mostly works
We tested several bullet templates inside the outcome-led variants. One structure converted slightly above the others. The shape is:
- ALLCAPS OUTCOME HEADER, 4-7 words, leading with a verb or sensory word.
- One sentence that paints the lived experience (8-15 words).
- Optional second sentence with the spec or capability that makes the outcome real (10-20 words).
- If anything is left to say, one short claim about durability, support, or compatibility (under 12 words).
Applied to the vlogging mic from §1:
SOUND LIKE YOU PAID $300: Your dialogue lands crisp on the first take, every time. 24-bit / 48 kHz lossless capture matches what we used to need a $300 rig to get. Two-year replacement warranty, no questions.
Notice what's missing. No bracketed clusters. No superlatives. No keyword stuffing for the algorithm. The shopper reads a sentence, gets a picture, sees a stat that backs it up, gets a small trust signal. Four ingredients. Out the door.
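The template's length limits are easy to check mechanically. The sketch below is a hypothetical linter, not something used in the tests: it takes the four parts of a bullet as separate strings, counts whitespace-separated words, and returns a list of problems. It does not check that the header leads with a verb or sensory word.

```python
from __future__ import annotations


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (a loose stand-in for 'words')."""
    return len(text.split())


def check_bullet(
    header: str,
    experience: str,
    spec: str | None = None,
    trust: str | None = None,
) -> list[str]:
    """Return a list of template violations; an empty list means the bullet fits."""
    problems = []
    if not header.isupper():
        problems.append("header should be all caps")
    if not 4 <= word_count(header) <= 7:
        problems.append("header should be 4-7 words")
    if not 8 <= word_count(experience) <= 15:
        problems.append("experience sentence should be 8-15 words")
    if spec is not None and not 10 <= word_count(spec) <= 20:
        problems.append("spec sentence should be 10-20 words")
    if trust is not None and not word_count(trust) < 12:
        problems.append("trust claim should be under 12 words")
    return problems


# The vlogging mic bullet from the article, split into its four parts.
mic = check_bullet(
    header="SOUND LIKE YOU PAID $300",
    experience="Your dialogue lands crisp on the first take, every time.",
    spec="24-bit / 48 kHz lossless capture matches what we used to need a $300 rig to get.",
    trust="Two-year replacement warranty, no questions.",
)
print(mic)
```

Passing the parts separately sidesteps sentence splitting, which is unreliable on copy full of specs and abbreviations.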
§ 06 The two failure modes we kept seeing
We pulled the 90 lowest-performing outcome-led variants from our test pool and looked for patterns in why they failed. Two patterns kept coming up.
Failure 1: the outcome was generic. "ENJOY YOUR DAY: A product that brings happiness to your routine" converts no better than the bland feature-led version. The fix is specificity. Don't say "happiness." Say "9-hour shoot day on one charge" or "cup of coffee that's still hot when you get back from the school run." The outcome has to be something the buyer can actually picture.
Failure 2: the spec was missing entirely. Some early variants we wrote leaned all the way into experience and dropped the spec altogether. CVR was actually worse than feature-led in 18 of those cases. Shoppers want to be sold the outcome AND given a tangible reason to believe it. The spec is the believability anchor. Don't skip it. Just don't lead with it.
§ 07 What this does to A10
A 2.1× CVR lift compounds inside the post-May A10 model. CVR-14 is now the top weight. Better bullets lift CVR-14. Lifted CVR-14 lifts organic rank. Lifted rank pulls more impressions, which then need to convert at the new higher rate to keep the loop going.
In the listings where the outcome-led variant was kept after the test, average organic rank improved by 4.7 positions over the 30 days after the swap. We didn't intervene further; the listing was given the chance to settle on the new CVR-14 baseline. The rank lift is the second-order effect of the bullet change.
For most listings, rewriting five bullets is a one-evening exercise that produces measurable CVR lift inside two weeks and measurable rank lift inside a month. It is the cheapest meaningful optimization we've measured this year. (Cheaper than the title rewrite. Cheaper than the image refresh. Cheaper than A+.)
§ 08 A 30-day plan
Days 1 to 7. Pick your top three listings by revenue. Look at each one's five bullets and label them as outcome-led or feature-led. (For consumer categories, anything over 50% feature-led is a candidate for rewrite.)
Days 8 to 14. Rewrite one bullet per top-three listing using the template from §5. One. Not five. You want to be able to attribute outcome to cause. Hold everything else constant.
Days 15 to 21. Wait. CVR-14 doesn't move in 4 days. The signal lands around days 10 to 14 post-change.
Days 22 to 30. Pull session-level conversion before and after. If CVR lifted, rewrite the remaining four bullets on those listings using the same template. If it didn't lift, you have a category in the "technical buyer" pocket: keep the feature-led version and move on.
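The two decision points in the plan can be written down as a small sketch. The function names and label strings are hypothetical, and "lifted" is taken to mean any increase in CVR, since no minimum threshold is specified.

```python
def is_rewrite_candidate(labels: list[str], threshold: float = 0.5) -> bool:
    """Days 1 to 7: flag a listing whose bullets are more than 50% feature-led."""
    feature_share = labels.count("feature") / len(labels)
    return feature_share > threshold


def next_step(cvr_before: float, cvr_after: float) -> str:
    """Days 22 to 30: decide what to do after the single-bullet test."""
    # Assumption: any increase counts as a lift; pick a stricter bar if you prefer.
    if cvr_after > cvr_before:
        return "rewrite the remaining four bullets"
    return "keep the feature-led version"


# Hypothetical listing: three of five bullets lead with specs.
labels = ["feature", "feature", "feature", "outcome", "outcome"]
print(is_rewrite_candidate(labels))
print(next_step(cvr_before=0.049, cvr_after=0.105))
```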
§ 09 Open questions
- Does the lift stick? Our 30-day window catches the initial CVR-14 response. We're measuring at 60 and 90 days to see if outcome bullets keep their edge or fade as shoppers acclimate. Hypothesis: they hold.
- Localization effects. Our panel is Amazon US. The outcome-led pattern likely translates to UK and AU. Whether it works on Amazon DE (German shoppers are notoriously spec-driven) is an open question. We're extending the panel.
- Mobile vs desktop. Mobile shoppers read fewer bullets per session. We don't yet know if the bullet effect is concentrated in bullet 1 vs spread across bullets 1-3.
§ 10 Methodology
Bullet swaps were paired: each test has one feature-led variant and one outcome-led variant of the same product claim. Variant order was randomized per listing. Each variant was held for 14 days, which spans two full weeks and so averages out day-of-week effects. Session-level CVR was computed from session counts and unit attribution windows; we accept the standard ±0.3 percentage-point error on CVR estimation.
We did not control for image refreshes or external traffic events during the test window. We did exclude any test where the listing's rank moved more than 8 positions during the window (mostly to filter out external-shock contamination). That filter removed 47 of the 587 tests we started with, leaving the 540 reported here.
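A minimal sketch of the exclusion rule, assuming "moved more than 8 positions" means the spread between the best and worst daily rank within the window. That reading, the function name, and the sample data are all assumptions for illustration.

```python
def rank_moved_too_much(daily_ranks: list[int], max_move: int = 8) -> bool:
    """True if the listing's rank ranged over more than `max_move` positions."""
    return max(daily_ranks) - min(daily_ranks) > max_move


# Hypothetical daily organic ranks for three tests (shortened windows).
tests = {
    "t1": [12, 13, 11, 12, 14],   # spread 3: keep
    "t2": [20, 22, 35, 31, 24],   # spread 15: exclude
    "t3": [8, 9, 16, 12, 10],     # spread 8: keep (not *more than* 8)
}

kept = [test_id for test_id, ranks in tests.items() if not rank_moved_too_much(ranks)]
print(kept)

# The counts reported in the text: 587 started, 47 removed.
print(587 - 47)
```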
Replication: email [email protected] for the bullet pair corpus and the per-test CVR aggregates. We share with serious researchers and listing tool builders.
Next dispatch is on review velocity — the third-largest weight in the post-May A10 model and the one most sellers think they understand but actually don't. (Spoiler: Vine timing matters more than Vine quantity.)
Cite this work. Figures licensed CC-BY-4.0. Quote any passage with attribution to ListFocal.