Intro
As the solo product designer at Canny, I worked alongside a PM and 4 engineers to shape Autopilot. It's an AI-powered system for capturing and organizing customer feedback from the tools teams already use.
Keeping all of that feedback actionable takes real effort. Autopilot reduces that work by sorting and surfacing what matters. Teams stay in control of what gets prioritized while Autopilot handles the busywork.
Context
Think of Canny like Reddit for product feedback. Users post requests, others upvote and comment, and teams use that signal to decide what to build next.
That works well when the volume is manageable. But once usage grows, the backlog becomes a living system. Posts to review, duplicates to merge, customers to respond to.
And if teams are collecting feedback in other tools too, Canny starts to feel like one source among many rather than the place where decisions get made.
Feature requests from Canny's users
Problem
Which is exactly what was happening. As teams scaled, we noticed that feedback stopped living in Canny.
High-signal requests were showing up across support conversations, sales threads, CRMs, Slack, emails, and app reviews. Canny was supposed to be the source of truth, but with half the signals never making it there, it became just one more tool competing for attention in its users' stack.
Capture became inconsistent. Duplicates multiplied. Important asks got buried in threads nobody had time to comb through. The less teams knew what was in Canny (and elsewhere), the less useful it became.
Feedback, everywhere, all at once
Strategy
Instead of asking our users to bring feedback to Canny, we wanted to make Canny the place feedback naturally ends up.
If we could pull feedback from the tools teams already use and surface it as reviewable insights, they'd spend less time hunting for signals and more time acting on them.
We mapped the workflow end to end and worked backwards from the failure modes: missed signal, duplicates, and conflicting sources.
The concept we landed on was Autopilot. It connects external tools, extracts feature requests, deduplicates them, and returns them to Canny as drafts for review. It handles the repetitive work without taking decisions away from the team.
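In rough strokes, that loop can be expressed in code. This is an illustrative sketch, not Canny's implementation: the `Draft` shape, the similarity threshold, and the plain string matching standing in for model-based deduplication are all assumptions.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

@dataclass
class Draft:
    title: str
    sources: list = field(default_factory=list)

def is_duplicate(a: str, b: str, threshold: float = 0.6) -> bool:
    # Stand-in for the model-based matching Autopilot would use:
    # here, simple string similarity decides whether two requests match.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def process(incoming: list[tuple[str, str]], existing: list[Draft]) -> list[Draft]:
    """Fold extracted (request, source) pairs into a draft queue,
    merging duplicates instead of creating new posts."""
    drafts = list(existing)
    for request, source in incoming:
        match = next((d for d in drafts if is_duplicate(d.title, request)), None)
        if match:
            match.sources.append(source)             # merge into the existing post
        else:
            drafts.append(Draft(request, [source]))  # create a new draft for review
    return drafts
```

The key property is the last step: everything lands as a draft, so the team still makes the final call on what becomes a post.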
The more sources integrated, the bigger the opportunity
Constraints
We began exploring Autopilot in late 2023, when newer AI models made reliable extraction practical. We gave ourselves 6 months to reach public launch, with 4 months to develop an MVP and 2 months to run a closed beta.
To keep scope focused, I was embedded with engineering throughout the project: joining daily syncs, iterating on shared specs, and reviewing builds as they shipped.

Technical planning with the engineers
We couldn't overbuild the bet or pull engineering focus from the rest of the product. But beyond scoping down, the design still had to give teams enough context to feel in control. Most people didn't trust AI in this context yet, so the interface had to do that trust-building on its own.
That was harder without a foundation. Canny didn't have an established design system, so new surfaces had to harmonize with the existing product without shared components to pull from.
Trust also meant accuracy. Discovery calls told us we had to hit at least 90% before users would trust the system. We benchmarked across multiple models and prompt stages using our own data, then ran Autopilot on that backlog end to end until results held up.
Cost pressure made it harder too. Benchmarking is token-heavy, so we had to be deliberate about where we spent compute. We reduced unnecessary model calls with pre-filtering, batched and cached where possible, and reserved heavier models for low-confidence cases.
Solution
I studied tools built for high-volume processing (e.g. Intercom, Zendesk, MailChimp) and found a shared pattern: the inbox. A visual queue you work through top to bottom, taking quick actions until you reach zero.
That model made sense for Autopilot. The open question was how to adapt it for AI-generated suggestions where reviewers had to compare incoming tickets against existing feedback.
Early concept: flat table with type tags and inline quick actions
The first pass was a flat table. Every processed item lived in a row, tagged by type, with quick actions at the right edge. For duplicates, the incoming item and the existing post sat side by side so reviewers could compare without leaving the row.
I started testing, and problems surfaced quickly: mixing decision types in a single stream forced constant context-switching, and content density made labels disappear.
Each iteration focused on the same problem: clarity. I restructured the view so reviewers could batch similar decisions, then made source visibility a priority. Teams needed to know where feedback came from, not just what type it was.
Smaller fixes compounded. I added text labels alongside logos for faster scanning, directional arrows to make merge relationships explicit, and credit usage in the sidebar so teams could track consumption. I also trimmed quick actions to merge and create after usage data showed those covered nearly all decisions.
In V3, I cleaned up the hierarchy by reducing density, establishing a consistent reading order, and focusing actions so every row required one clear decision. It worked well at moderate volume, but row-by-row scanning wore people down as backlogs grew.
V4 separated scanning from deciding. I split the screen into a browsable list and a focused detail canvas so reviewers could move through the queue without losing context on the item in front of them.
During the beta, users were accepting 91% of Autopilot's suggestions; they were just approving each one manually. That made automation a natural fast follow. I placed an automation prompt at the top, letting teams hand over decisions they were already agreeing with. Once enabled, the view shifted from approval queue to audit log: same surface, different mode.
V3 went through the closed beta and became the surface teams used daily. I designed and spec'd V4 in full because I saw the ceiling. Row-by-row scanning would always degrade at high volume, and the split layout solved that.
But V3 was performing well enough that the business case for rebuilding the core surface wasn't there yet. The roadmap moved to other priorities, and V4 stayed on the shelf for a later release.
Knowledge Hub: product context for grounded extractions
Extraction quality depended on how much the model understood about a team's product. I added the Knowledge Hub as a lightweight way to provide that context. Teams upload reference material so extractions stay grounded, without turning setup into a project.
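One way that grounding might work is to pick the uploaded snippets most relevant to a ticket and prepend them to the extraction prompt. This is a hedged sketch under that assumption; the function, the word-overlap scoring, and the prompt shape are all illustrative, not the actual Knowledge Hub internals.

```python
def ground_prompt(ticket: str, knowledge: list[str], max_snippets: int = 2) -> str:
    """Select the reference snippets that overlap most with the ticket
    and prepend them as product context for extraction."""
    def overlap(snippet: str) -> int:
        # Crude relevance score: shared words between snippet and ticket.
        return len(set(snippet.lower().split()) & set(ticket.lower().split()))

    relevant = sorted(knowledge, key=overlap, reverse=True)[:max_snippets]
    context = "\n".join(relevant)
    return (
        f"Product context:\n{context}\n\n"
        f"Ticket:\n{ticket}\n\n"
        "Extract any feature requests."
    )
```

Keeping selection this lightweight is the point: teams upload a handful of reference documents once, and every extraction benefits without any per-ticket setup.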
Every design decision from concept onward followed the same thread: reduce the cost of each decision, make the system's reasoning visible, and let teams control how much they delegate.
Validation
Before launch, we ran a closed beta with users who had a high volume of incoming feedback to further validate Autopilot.
Typeform had the clearest proof point. They handle about 7,500 tickets a month from Zendesk across support, sales, onboarding, and CS. At that volume, manual tagging was never going to scale.
In a side-by-side review of 1,600 tickets, Autopilot processed the set in 53 minutes, surfaced 109 feature requests, and hit 93% accuracy, about 30 points higher than manual review. Deduplication held up too, with a 98.3% acceptance rate.
At that pace, a full 7,500-ticket month is processed in about 4 hours.
Impact
Autopilot fundamentally changed how teams used Canny. What had been a backlog they tried to keep updated became a workflow they returned to daily.
- Productivity improved 20-30x over manual review.
- We logged 80% more feature requests after launch, generating over 100k insights for teams using Canny.
- In its first year, Autopilot drove $300k+ in ARR, roughly 10% of Canny's total. For a bootstrapped company, that was meaningful leverage from a single feature.
Beyond the numbers, it also shifted the product direction. Autopilot became the foundation for Canny's next phase of growth. It proved that we could evolve the product beyond what it had been and create a new vertical in the space.

Celebrating Autopilot's launch in Perugia