Human Ready Evaluation Guide
Daily methods and workflows for evaluating whether products are ready for human users, including onboarding effectiveness, feature completeness, and user experience
Building products for humans requires systematic methods to evaluate whether a product is truly ready for them. This guide covers practical workflows for evaluating human readiness across onboarding, marketing, features, and user experience.
Why Human Ready Evaluation Matters
Products are built for humans, not for AI. While automated tests check code correctness, only human evaluation can ensure your product is truly ready for humans. This is critical because:
- Products must work for humans - The end users are people with emotions, expectations, and real-world contexts
- Commercial success requires human approval - People decide to pay, recommend, and return
- Human experience drives business outcomes - User satisfaction, trust, and delight cannot be automated
These are manual evaluation workflows that humans perform to catch issues that automated tests cannot detect—especially user experience, human readiness, emotional responses, and real-world usability.
Why Human Ready Evaluation is Critical for Commercialization
Automated tests check if code works. Human Ready Evaluation checks if the product is ready to serve humans and succeed commercially.
| What Automated Tests Check | What Human Ready Evaluation Checks |
|---|---|
| Functions execute without errors | UI makes sense to first-time users |
| APIs return expected data | Onboarding flow feels natural and trustworthy |
| Components render correctly | Copy is clear, compelling, and converts |
| Database queries succeed | Payment flow inspires confidence and trust |
| Types are correct | Brand consistency creates professional impression |
| Code compiles | Product solves real human problems |
Critical Insight: Products ultimately serve humans and achieve commercialization through human satisfaction. Key truths:
- Humans are your customers - They decide to sign up, pay, and stay
- Humans spread the word - Referrals and reviews come from satisfied humans
- Humans judge quality - First impressions, trust, and delight are human judgments
- Business success = Human approval - Revenue, retention, and growth require winning human hearts
Bottom Line: Even with 100% test coverage and perfect AI validation, you still need Human Ready Evaluation to build products that humans love and pay for. No automated system can replace human judgment of whether a product is truly ready to serve human needs and create commercial value.
Method 1: Marketing-First Evaluation
Principle: Define your product from the user's perspective before evaluating features.
Demo Meeting Guide
After completing marketing-first evaluation, refer to the Product Demo Meeting Guide sections on Potential Customer Demo, Investor Pitch, and Internal Team Showcase to learn how to present your findings to different audiences.
The Workflow
Instead of starting from code or features, start by designing your product's public-facing presence—as if you're launching on Product Hunt or BetaList today.
Step 1: Design the Marketing Page First
Create or evaluate these pages as if launching to strangers today:
Landing Page (/)
- Can a stranger understand what you do in 5 seconds?
- Is the value proposition crystal clear?
- Does the hero section answer "What is this?" and "Why should I care?"
About Page (/about)
- Does it explain the "why" behind the product?
- Is the story compelling and authentic?
- Does it position the product appropriately (indie vs. enterprise)?
Product Hunt / BetaList Copy
- Write the 60-character tagline and description
- If you can't summarize it clearly, the product positioning isn't ready
Pricing Page (/pricing)
- Are tiers clearly differentiated?
- Can users self-select the right plan?
- Is the free tier compelling enough to start?
Step 2: Reverse-Engineer from Marketing
After designing the marketing, evaluate your actual product:
Marketing Promise → Reality Check
───────────────────────────────────
"Create posts in seconds" → Can a new user actually do this in <60s?
"AI-powered suggestions" → Does the AI feel magical or clunky?
"Team collaboration" → Is inviting teammates intuitive?
"Enterprise-grade security" → Do you actually have SOC2/encryption?

Step 3: User Value Audit
For each feature you're claiming:
Passes if:
- Feature works end-to-end for new users
- Value is immediately visible
- No manual setup or configuration required
- Marketing copy matches actual capability
Example: "One-click deployment" → Deployment completes in one click without errors or manual steps
Fails if:
- Feature requires undocumented setup
- Value only visible after 10+ steps
- Works only for developers/power users
- Marketing overpromises vs. reality
Example: "One-click deployment" → Actually requires SSH keys, environment config, and manual DNS setup
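The pass/fail criteria above can be kept as a small machine-checkable audit table, so each marketing claim is scored against reality instead of being debated. A minimal sketch, assuming a made-up feature list and field names (nothing here comes from a real product):

```typescript
// Hypothetical value-audit entry: one row per marketing claim.
interface ValueAudit {
  claim: string;            // marketing copy, verbatim
  worksEndToEnd: boolean;   // a new user completes it without help
  valueVisible: boolean;    // result is immediately visible
  noManualSetup: boolean;   // no undocumented configuration required
  matchesReality: boolean;  // copy does not overpromise
}

// A claim passes only when every check holds.
function passes(a: ValueAudit): boolean {
  return a.worksEndToEnd && a.valueVisible && a.noManualSetup && a.matchesReality;
}

const audit: ValueAudit[] = [
  // The "one-click deployment" failure example from above, as data.
  { claim: "One-click deployment", worksEndToEnd: true, valueVisible: true,
    noManualSetup: false, matchesReality: false },
];

// List every claim that fails the audit.
const failing = audit.filter((a) => !passes(a)).map((a) => a.claim);
console.log(failing); // ["One-click deployment"]
```

Keeping the audit as data makes it easy to re-run before every release and diff against the previous pass.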
Evaluation Checklist
| Page | What to Check | Pass Criteria |
|---|---|---|
| Landing (/) | Value prop clarity | 5-second rule: stranger understands product |
| About (/about) | Story & positioning | Answers "why this exists" compellingly |
| Pricing (/pricing) | Tiers & value | Users can self-select correct plan |
| Product Hunt | 60-char tagline | Strangers understand without context |
| Screenshots | Visual consistency | Brand colors, modern design, no lorem ipsum |
| Demo Video | First-impression | Shows value in first 30 seconds |
Red Flags 🚩
- Landing page has generic copy like "Revolutionary platform for X"
- About page is just a list of features (not a story)
- Pricing page has vague tier names like "Pro" vs. "Premium"
- Screenshots show developer UI or broken layouts
- Can't explain product in one sentence to a stranger
Method 2: Onboarding Flow Evaluation
Principle: A product is only as good as a new user's first 5 minutes.
Demo Meeting Guide
After completing onboarding flow evaluation, refer to the Product Demo Meeting Guide sections on User Testing Session, Product Review, and Team Training to learn how to test and present the onboarding flow.
The Complete Onboarding Definition
Onboarding is "跑通" (fully working) only when the following end-to-end flow works seamlessly:
┌─────────────────────────────────────────────────────────────┐
│ Onboarding Flow │
├─────────────────────────────────────────────────────────────┤
│ 1. Entry Trigger │
│ → New user logs in → Onboarding modal auto-appears │
│ │
│ 2. Modal Guided Steps │
│ → Step-by-step with clear "Next" buttons │
│ → Each step has success feedback │
│ → Final step shows "Complete" and transitions smoothly │
│ │
│ 3. Post-Modal Guidance │
│ → System provides NEXT STEP guidance (no dead-end) │
│ → Clear path to first task (e.g., "Create Space") │
│ │
│ 4. First Success │
│ → User completes first meaningful task │
│ → Success is VISIBLE (toast, status change, output) │
│ │
│ 5. Quota Verification (Test Payment Flow) │
│ → Use GM command to set credits to 0 │
│ → Retry same task → Payment gate triggers │
│ │
│ 6. Payment Flow Completion │
│ → Payment modal has clear subscription/upgrade options │
│ → After upgrade → Credits restore immediately │
│ → (or clear message: "Refresh to apply") │
│ │
│ 7. Post-Payment Success │
│ → Retry task → Works without payment gate │
│ → Upgrade is effective, user can continue │
└─────────────────────────────────────────────────────────────┘

Step-by-Step Evaluation
1️⃣ Entry Trigger
Test: Sign up as a new user
✅ Pass Criteria:
- Onboarding modal appears automatically
- Modal is focused (no distraction)
- "Start", "Next", or "Skip" buttons are clear
- Default action should encourage starting (not skipping)
❌ Fail If:
- Modal doesn't appear automatically
- User must hunt for "Getting Started" link
- Modal is missable or hidden
- No clear call-to-action
2️⃣ Modal Guided Steps
Test: Complete all steps in the modal
✅ Pass Criteria:
- Each step has a single clear goal
- "Next" button is prominent
- Progress indicator shows current step (e.g., "2 of 3")
- Success states are visual (checkmarks, green highlights)
- Final step transitions smoothly to the product
❌ Fail If:
- Any step is confusing (needs human explanation)
- "Next" button is unclear or hidden
- Steps don't provide feedback on completion
- Modal closes abruptly without transition
3️⃣ Post-Modal Guidance
Test: What happens after closing the modal?
✅ Pass Criteria:
- System provides immediate next action (e.g., "Create your first space")
- Next action is ONE CLICK away (not buried in menus)
- Visual indicator (arrow, tooltip, or guide) shows where to go
- No "blank canvas" paralysis
❌ Fail If:
- User lands on empty dashboard with no guidance
- Next step requires reading documentation
- User must search for "how to start"
- Dead-end (no clear path forward)
Common Patterns:
- Getting Started Stepper: Persistent checklist above chat input
- Setup Guide Button: "Complete Setup" in navigation bar
- Tooltip Guidance: Highlights first action button
- Feature Tour: Spotlight overlay explaining UI elements
4️⃣ First Success
Test: Complete the guided first task
✅ Pass Criteria:
- Task completes without errors
- Success is VISIBLE:
- Success toast notification
- Status changes (e.g., "Post Created")
- Output appears (e.g., post shows in list)
- Confetti or celebration animation (optional but delightful)
- User feels accomplished (not confused)
❌ Fail If:
- Task completes silently (no feedback)
- Success is hidden in logs or backend
- User must refresh to see result
- Error occurs but isn't caught gracefully
Examples of "First Success":
- AI Agent: Send first message → Get AI response
- CMS: Create first post → Post appears in list
- Developer Tool: Make API call → See response
- Team App: Invite teammate → Invitation sent confirmation
5️⃣ Quota Evaluation (Payment Gate Test)
Test: Force payment gate by zeroing credits
Setup:
// Use GM (Game Master) command or system admin
// Set user credits to 0
setUserCredits(userId, 0);

Action: Retry the same first task
✅ Pass Criteria:
- Payment gate triggers immediately
- Gate shows clear message: "Out of credits" or "Upgrade to continue"
- Payment modal appears with options:
- Subscribe to paid plan
- Enter promo/redeem code
- Upgrade space/tier
- Each option is clickable and works
❌ Fail If:
- Task succeeds even with 0 credits (billing broken)
- Payment gate doesn't appear
- Error message is cryptic (e.g., "Error 402")
- No path to upgrade (dead-end)
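The quota check above reduces to one rule: the moment credits cannot cover a task, the task must be blocked with a human-readable message and an upgrade path, never a cryptic error or a silent success. A minimal sketch of that gate logic (the function and type names are assumptions for illustration):

```typescript
interface GateResult {
  allowed: boolean;
  message?: string; // human-readable, never a bare "Error 402"
}

// Decide whether the payment gate should trigger for a task.
function checkQuota(credits: number, taskCost: number): GateResult {
  if (credits >= taskCost) {
    return { allowed: true };
  }
  // Matches the pass criteria above: clear message plus a path forward.
  return {
    allowed: false,
    message: "Out of credits. Upgrade or redeem a code to continue.",
  };
}

console.log(checkQuota(0, 1));  // allowed: false, with message
console.log(checkQuota(10, 1)); // allowed: true
```

The manual test then verifies that the UI actually surfaces this result as a payment modal rather than swallowing it.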
6️⃣ Payment Flow Completion
Test: Complete payment or upgrade
Actions to Test:
Flow: Click "Subscribe" → Choose plan → Enter payment
✅ Pass:
- Payment form loads correctly
- Test card (4242 4242 4242 4242) works in dev/staging
- After payment: Success message
- Credits are restored immediately (or with clear refresh instruction)
- User redirected back to product
❌ Fail:
- Payment form 404s or errors
- Payment succeeds but credits don't update
- No success confirmation
- User stuck on payment page
Flow: Click "Redeem Code" → Enter code → Apply
✅ Pass:
- Code input field is visible
- Valid test code applies successfully
- Credits update immediately
- Success message confirms redemption
- User can continue working
❌ Fail:
- Code input doesn't exist
- Valid codes fail to apply
- No feedback on redemption
- Credits don't update
Flow: Click "Upgrade Space" → Select tier → Confirm
✅ Pass:
- Tier options clearly differentiated
- Price and benefits are visible
- Upgrade confirms immediately
- Quota increases as expected
- User can retry task
❌ Fail:
- Upgrade button doesn't work
- Tier changes but quota doesn't
- Pricing is unclear
- Must refresh or re-login
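All three flows above end the same way: credits must update immediately, with visible feedback on both success and failure. A sketch of the redeem path only, with a made-up promo-code table and credit amounts (a real system would validate codes server-side):

```typescript
// Hypothetical promo-code table, purely for illustration.
const promoCodes: Record<string, number> = { WELCOME50: 50 };

interface RedeemResult {
  ok: boolean;
  credits: number; // balance after the attempt
  message: string; // always give feedback, success or failure
}

function redeem(code: string, credits: number): RedeemResult {
  const bonus = promoCodes[code];
  if (bonus === undefined) {
    return { ok: false, credits, message: "Invalid code." };
  }
  // Pass criteria above: credits update immediately, with confirmation.
  return { ok: true, credits: credits + bonus, message: `Added ${bonus} credits.` };
}

console.log(redeem("WELCOME50", 0)); // ok: true, credits: 50
console.log(redeem("BOGUS", 0));     // ok: false, credits: 0
```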
7️⃣ Post-Payment Success
Test: Retry the original task after payment/upgrade
✅ Pass Criteria:
- Task completes successfully (no payment gate)
- Output matches expectations
- No errors or glitches
- User can continue normal workflow
❌ Fail If:
- Payment gate still appears (upgrade didn't apply)
- Task fails with different error
- Must log out/in for upgrade to take effect
- Quota still shows 0
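The seven stages above are easiest to run consistently if the tester records them as an ordered checklist, so a failed run pinpoints exactly where the flow broke. A minimal sketch (stage names follow the diagram; the tracking helper itself is an assumption):

```typescript
const onboardingStages = [
  "Entry trigger: modal auto-appears",
  "Modal guided steps complete",
  "Post-modal guidance shown",
  "First success is visible",
  "Quota gate triggers at 0 credits",
  "Payment flow completes",
  "Post-payment retry succeeds",
] as const;

// Report the first failing stage, if any; null means fully working.
function firstBreak(results: boolean[]): string | null {
  const i = results.findIndex((ok) => !ok);
  return i === -1 ? null : onboardingStages[i];
}

console.log(firstBreak([true, true, true, true, true, true, true]));  // null
console.log(firstBreak([true, true, false, true, true, true, true])); // "Post-modal guidance shown"
```

A single failing stage fails the whole run, mirroring the acceptance rule below.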
Onboarding Acceptance Criteria
Definition of "跑通" (Fully Working):
The onboarding flow is fully working when a human tester can complete steps 1-7 consecutively without:
- Manual intervention (no console commands, no database edits)
- Confusion (no "what do I do next?" moments)
- Errors (no 500s, no broken buttons)
- Need for documentation (flow is self-explanatory)
One Break = Not Ready: If any step fails or requires manual help, the onboarding is not complete.
Common Onboarding Failures
| Failure | Impact | Fix |
|---|---|---|
| Modal doesn't auto-show | User skips onboarding entirely | Add auto-trigger on first login |
| Steps have no progress indicator | User doesn't know how long it takes | Add "Step 2 of 3" indicator |
| Post-modal has no guidance | User gets lost, abandons product | Add Getting Started checklist |
| First task has no visible success | User unsure if it worked | Add toast notification + visual change |
| Payment gate doesn't trigger | Revenue loss, billing broken | Test quota system end-to-end |
| Payment succeeds but quota doesn't update | User frustrated, tickets increase | Fix credit refresh logic |
| Upgrade requires refresh | Friction in payment flow | Auto-refresh or provide clear instruction |
Method 3: Employee Simulation Evaluation
Principle: Test AI Agents like you would manage a subordinate employee's weekly workflow, ensuring they can understand and execute sequential, progressive tasks.
Demo Meeting Guide
After completing employee simulation evaluation, refer to the Product Demo Meeting Guide sections on AI Agent Customer Demo, Technical Review, and Team Training to learn how to showcase AI Agent capabilities.
Core Concept
Imagine your AI Agent as a subordinate employee. Over a workweek (Monday through Friday), you need to assign tasks and track progress. This method makes AI Agent testing more human-like and relatable, while ensuring it can handle coherent task sequences from real work scenarios.
Best Use Case: This method is especially suitable for chat-initiated AI Agent products (like conversational AI assistants, intelligent customer service) because these products have open-ended and dialogue-driven functionality, rather than being based on fixed buttons and step-by-step workflows. Conversational interactions require stronger context retention and task comprehension capabilities.
Why This Method Works
| Traditional Testing | Employee Simulation Evaluation |
|---|---|
| Tests isolated features | Tests coherent workflows |
| One-time verification | Simulates real weekly work rhythm |
| Technical perspective | Manager's perspective |
| Feature completeness | Task understanding and execution capability |
| Single-point validation | Context retention and task continuity |
Value: This method reveals issues with AI Agents' ability to understand task dependencies, maintain context, and achieve progressive goals.
Weekly Task Assignment Method
Imagine it's Monday morning, you have a new employee (AI Agent), and you need to assign work for the week. Here's the 5-step progressive task assignment flow:
📋 Task Background Setup
Scenario: Your team is conducting market research and competitive analysis for a new product.
Employee: Your AI Agent (playing the role of junior analyst)
Goal: Complete the entire workflow from research to report within one week
Step 1️⃣: Monday Morning - Information Gathering Task
Your Instruction:
"Hey Alex, this week we're doing competitive analysis. Today's Monday, please help me collect basic information about the top 5 competitors in our industry. Include: company name, founding year, main products, and target audience. Send me a preliminary list by end of day."
Validation Points:
- Can the AI Agent understand the relatively vague requirement of "top 5"?
- Will it proactively ask what ranking criteria to use (user base? revenue? brand awareness?)?
- Can it return a structured list within a reasonable timeframe?
- Is the output format easy to read and suitable for follow-up work?
✅ Pass Criteria:
- AI proactively clarifies ranking criteria or makes reasonable assumptions
- Returns structured information for 5 companies
- Information is accurate and sources are traceable
- Asks if more details are needed
❌ Fail Criteria:
- Returns results without asking about criteria
- Information is incomplete or format is messy
- Finds wrong competitors
- Cannot complete task or returns errors
Step 2️⃣: Tuesday Morning - Deep Analysis Task
Your Instruction:
"Alex, I reviewed yesterday's list - good job. Today's Tuesday, based on yesterday's 5 companies, select the 3 closest to us and conduct an in-depth analysis of their pricing strategies. Include: price ranges, subscription models, free tier features, and paid tier differences. Send it to me before tomorrow morning's meeting."
Validation Points:
- Can the AI Agent reference yesterday's results as context?
- Does it understand the criteria for "closest" (product type? target users? price point?)?
- Can it perform secondary filtering and deep diving?
- Is the output comparative and actionable?
✅ Pass Criteria:
- Explicitly references the 5 companies from Step 1
- Has reasonable filtering logic (with explanation)
- Provides detailed pricing comparison table
- Offers preliminary competitive landscape analysis
❌ Fail Criteria:
- Forgets yesterday's results and starts over
- Randomly selects 3 companies without explanation
- Pricing information is inaccurate or outdated
- Only lists information without comparison dimensions
Step 3️⃣: Wednesday Afternoon - Creative Output Task
Your Instruction:
"Alex, the pricing analysis is valuable. It's Wednesday now, based on the past two days' research, help me brainstorm: if we want to differentiate in this market, what 3 directions could we break through? Write 2-3 sentences for each direction explaining the rationale. Let's discuss Thursday morning."
Validation Points:
- Can the AI Agent synthesize findings from the past two days?
- Can it shift from analysis to creative and strategic recommendations?
- Are the recommendations specific, feasible, and insightful?
- Does it balance innovation with practicality?
✅ Pass Criteria:
- Recommendations explicitly reference prior research findings
- 3 directions are distinct with logical support
- Has innovation while considering feasibility
- Uses data or case studies to support recommendations
❌ Fail Criteria:
- Recommendations unrelated to prior research
- Directions are vague or generic (e.g., "do it better")
- Random ideas without reasoning support
- Unrealistic and infeasible suggestions
Step 4️⃣: Thursday Morning - Integration Summary Task
Your Instruction:
"Alex, we discussed yesterday's 3 directions and decided to focus on the 2nd one. Today's Thursday, please compile all this week's findings into a brief, including: competitor overview, pricing comparison, and our differentiation strategy. Use PPT outline format, keep it under 10 pages. We're presenting to management tomorrow morning."
Validation Points:
- Can the AI Agent integrate the entire week's work results?
- Can it extract core insights from scattered information?
- Does the output format meet business presentation standards?
- Is the structure clear and logic coherent?
✅ Pass Criteria:
- Brief includes all key findings from Monday to Wednesday
- Structure follows "background-analysis-recommendation" logic
- PPT outline has clear theme for each page
- Highlights the focus (2nd differentiation direction)
❌ Fail Criteria:
- Omits important information from previous days
- Brief structure is chaotic or repetitive
- Exceeds page limit or information overload
- Doesn't highlight the decision focus
Step 5️⃣: Friday Afternoon - Reflection and Improvement Task
Your Instruction:
"Alex, this morning's presentation was a success - management was very satisfied. It's Friday afternoon now, please review this week's work and write a brief retrospective: what went well? What could be improved? If you had a similar task next week, how would you adjust the process?"
Validation Points:
- Can the AI Agent self-reflect and summarize?
- Can it identify strengths and weaknesses in the workflow?
- Can it propose specific improvement suggestions?
- Does it demonstrate learning and growth capability?
✅ Pass Criteria:
- Accurately summarizes this week's 5-step workflow
- Points out specific successes and shortcomings
- Improvement suggestions are practical and actionable
- Demonstrates understanding of work processes
❌ Fail Criteria:
- Generic self-praise without specific examples
- Doesn't identify any issues or improvement points
- Retrospective is disconnected from actual work
- Cannot propose valuable process optimization suggestions
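The five-day flow above is easiest to reproduce run after run if the tasks and their dependencies are written down as data, and each reply is checked for references to earlier days. A sketch of such a plan (instructions abridged from the steps above; the structure and helper are assumptions):

```typescript
interface WeeklyTask {
  day: string;
  instruction: string;
  dependsOn: string[]; // prior days whose output must be referenced
}

const week: WeeklyTask[] = [
  { day: "Mon", instruction: "Collect top-5 competitor basics", dependsOn: [] },
  { day: "Tue", instruction: "Pricing deep-dive on 3 closest competitors", dependsOn: ["Mon"] },
  { day: "Wed", instruction: "Propose 3 differentiation directions", dependsOn: ["Mon", "Tue"] },
  { day: "Thu", instruction: "Compile brief, under 10 pages", dependsOn: ["Mon", "Tue", "Wed"] },
  { day: "Fri", instruction: "Retrospective on the week", dependsOn: ["Mon", "Tue", "Wed", "Thu"] },
];

// Crude context-retention check: does the agent's reply mention each
// dependency's day? A real review would read the reply, not string-match.
function referencesContext(reply: string, task: WeeklyTask): boolean {
  return task.dependsOn.every((d) => reply.includes(d));
}

console.log(referencesContext("Building on Mon's list of 5 companies...", week[1])); // true
console.log(referencesContext("Here are 3 random companies.", week[1]));             // false
```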
Employee Simulation Evaluation Checklist
Use this checklist to evaluate the AI Agent's "employee performance":
## AI Agent Weekly Work Performance Evaluation
### Task Execution Capability
- [ ] Monday: Information gathering accurate and complete
- [ ] Tuesday: Deep analysis insightful
- [ ] Wednesday: Creative output feasible and valuable
- [ ] Thursday: Integration summary clear structure
- [ ] Friday: Self-reflection has depth
### Context Retention
- [ ] Can remember previous day's work results
- [ ] Can reference prior findings in subsequent tasks
- [ ] Understands dependencies between tasks
- [ ] Maintains coherent work theme throughout the week
### Communication Understanding
- [ ] Understands relatively vague instructions
- [ ] Proactively asks clarifying questions
- [ ] Output format suitable for business scenarios
- [ ] Can adjust direction based on feedback
### Work Maturity
- [ ] Natural progression from execution to analysis to creativity
- [ ] Can distinguish requirements of different task types
- [ ] Output quality is stable without fluctuation
- [ ] Demonstrates independent thinking and judgment
### Human-like Behavior
- [ ] Communication tone is natural, not robotic
- [ ] Understands work rhythm (Monday collection → Friday retrospective)
- [ ] Expresses uncertainty or need for help
- [ ] Overall interaction feels like collaborating with human employees

Common "Employee Performance" Issues
| Issue Manifestation | Possible Cause | Improvement Direction |
|---|---|---|
| Every day feels like a new task, doesn't remember yesterday's content | Context window too short or conversation management issues | Optimize context retention mechanism |
| Can only execute explicit instructions, cannot handle vague tasks | Lacks reasoning and clarification capability | Add proactive questioning to prompts |
| Output format is arbitrary, doesn't fit business scenarios | Lacks scenario-based training | Add business document template examples |
| Friday retrospective is superficial with no substance | Lacks self-evaluation capability | Enhance metacognition and reflection ability |
| Quality deteriorates as tasks progress | Attention decay or context overload | Periodically summarize and reset key information |
Application Scenarios
This method is particularly suitable for validating:
✅ Best Suited for - Chat/Dialogue-Driven AI Agent Products:
- Conversational AI Assistants (like ChatGPT-style personal assistants, work helpers)
- Intelligent Customer Service Systems (multi-turn dialogue understanding, unstructured queries)
- AI Programming Assistants (completing programming tasks through conversation, like GitHub Copilot Chat)
- Content Creation AI (multi-step creative workflows guided by dialogue)
These products are characterized by:
- Open-ended functionality: Users express needs through natural language, not by clicking fixed buttons
- Dialogue-driven: Interaction is continuous conversation, not step-by-step forms
- Context-dependent: Need to remember and understand content from previous conversation turns
✅ Also Suitable for:
- Data Analysis Tools (coherent workflows from collection to analysis to reporting)
- Task Planning Tools (complex tasks requiring multi-step decomposition and tracking)
❌ Less Suitable for:
- Button/step-based workflow products (like e-commerce checkout flows, form wizards)
- Single-interaction utility products (like image editors, calculators)
- Services that don't require context (like translation tools, format converters)
- Enterprise systems that already have complete process testing
Best Practices
Do ✅:
- Set realistic work scenarios: Use actual work scenarios from your own team
- Maintain task continuity: Ensure 5 steps are different stages of the same project
- Record interaction details: Save each day's conversations for Friday comparison
- Use realistic time pressure: Simulate "need this by tomorrow morning" real deadlines
- Test boundary cases: Try temporarily changing task direction to test adaptability
Don't ❌:
- Don't design overly simple tasks: Should have reasonable complexity and challenge
- Don't skip intermediate steps: Must complete the full 5-day workflow
- Don't manually supplement information: If AI forgets previous day's content, record as issue rather than reminding it
- Don't accept robotic responses: Should be as natural as conversing with humans
- Don't ignore reflection session: Friday's self-retrospective is very important
Method 4: User Journey Mapping
Principle: Products are used as journeys, not isolated features.
Critical User Journeys
Map and evaluate these end-to-end journeys:
Journey 1: "Tire-Kicker" (Skeptical Visitor)
Persona: Someone clicking from Product Hunt, not sure if they'll sign up
Land on homepage → Read value prop → Check pricing → View screenshots
↓
Decide to try → Sign up → Skip/complete onboarding → Test one feature
↓
Decision: "Worth my time?" → Bookmark / Close tab

Evaluation:
- Landing page loads in <2s (no patience)
- Value prop understandable in 5 seconds
- Pricing is transparent (no "Contact Sales")
- Screenshots show real product (not stock photos)
- Sign-up is 1-click (Google/GitHub OAuth)
- Demo mode works without login (if applicable)
- First feature works in <1 minute
Journey 2: "Power User" (Converting to Paid)
Persona: Free user who loves product, considering upgrade
Hit free quota limit → See upgrade prompt → Review pricing → Compare tiers
↓
Click upgrade → Enter payment → Confirm → Quota increases → Continue working

Evaluation:
- Quota warnings appear before hitting limit
- Upgrade prompt is helpful, not annoying
- Tier differences are clear
- Payment form is trustworthy (secure badge, no errors)
- Upgrade applies immediately (or clear ETA)
- Receipt email sent automatically
- No disruption to workflow after upgrade
Journey 3: "Team Lead" (Bringing Colleagues)
Persona: Solo user wanting to invite team
Success with product → Wants to share → Finds "Invite Team" → Sends invites
↓
Teammates receive email → Click link → Sign up → Join team space → Collaborate

Evaluation:
- "Invite Team" is discoverable (not hidden in settings)
- Invite email is professional (not spammy)
- Invite link works (not expired or broken)
- New members land in correct space/team
- Permissions work correctly (no accidental admin access)
- Team lead can see who joined
Journey 4: "Lost User" (Needing Support)
Persona: User encountering an error or confusion
Encounter issue → Look for help → Find support channel → Ask question
↓
Receive response → Implement solution → Continue working OR leave frustrated

Evaluation:
- Help/Support link is visible in navigation
- Contact options are clear (chat, email, docs)
- Response time expectation is set
- Error messages include "Get Help" link
- Documentation search works well
- Common issues have instant answers (chatbot/FAQ)
Method 5: Feature Completeness Check
Principle: A feature isn't "done" until it's discoverable, usable, and documented.
The 3-Layer Feature Check
For each major feature in your product:
Layer 1: Discoverability
Question: Can users find this feature without help?
✅ Pass If:
- Feature is in main navigation or prominent CTA
- Feature name is clear (not jargon)
- Icon or visual makes sense
- Search includes this feature
❌ Fail If:
- Feature buried in settings or submenus
- Only accessible via URL hack
- Name is technical (e.g., "CRUD Operations")
- Users ask "Where is X?" in support
Layer 2: Usability
Question: Can users complete the core workflow without documentation?
Test: Give product to a stranger, ask them to use the feature
✅ Pass If:
- Stranger completes workflow in <2 minutes
- No need to explain buttons or flows
- Error states are helpful
- Success states are obvious
❌ Fail If:
- Stranger asks "what do I click?"
- Must explain what buttons do
- Errors are cryptic
- User unsure if action succeeded
Layer 3: Documentation
Question: Is the feature documented for power users?
✅ Pass If:
- Help docs exist at /docs/features/[feature-name]
- Screenshots or video included
- API documentation (if applicable)
- FAQs for common issues
❌ Fail If:
- No documentation exists
- Docs are placeholder text
- Screenshots show old UI
- API docs are out of date
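The three layers above can be scored per feature so that "done" has a concrete definition rather than a gut feeling. A minimal sketch (layer names follow this section; the feature and field names are illustrative):

```typescript
interface FeatureCheck {
  name: string;
  discoverable: boolean; // Layer 1: findable without help
  usable: boolean;       // Layer 2: stranger completes workflow in <2 minutes
  documented: boolean;   // Layer 3: current, non-placeholder docs exist
}

// A feature is "done" only when all three layers pass.
const isDone = (f: FeatureCheck) => f.discoverable && f.usable && f.documented;

const features: FeatureCheck[] = [
  // Hypothetical example: usable and discoverable, but docs are missing.
  { name: "Team invites", discoverable: true, usable: true, documented: false },
];

console.log(features.filter((f) => !isDone(f)).map((f) => f.name)); // ["Team invites"]
```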
Feature Checklist Template
Use this for each feature:
## Feature: [Name]
### Discoverability
- [ ] Appears in main navigation or homepage
- [ ] Feature name is clear to non-technical users
- [ ] Icon/visual makes sense
- [ ] Searchable in app search
### Usability
- [ ] Core workflow completes in <2 minutes
- [ ] No explanation needed for buttons/UI
- [ ] Error messages are helpful (not just "Error 500")
- [ ] Success states are obvious
### Documentation
- [ ] Help doc exists: `/docs/features/[name]`
- [ ] Screenshots match current UI
- [ ] API docs (if feature has API)
- [ ] FAQs for common issues
### Edge Cases
- [ ] Works on mobile (< 768px width)
- [ ] Works in dark mode
- [ ] Handles empty states gracefully
- [ ] Rate limits / quotas are clear
### Performance
- [ ] Loads in <3 seconds
- [ ] No layout shift (CLS)
- [ ] No console errors

Method 6: Pre-Launch Evaluation
Principle: Before announcing your product, evaluate it like a critic, not a creator.
Demo Meeting Guide
After completing pre-launch evaluation, refer to the Product Demo Meeting Guide section on Pre-Launch Mock Meeting to learn how to conduct a final team simulation launch.
The 24-Hour Fresh Eyes Test
Process:
- Don't touch your product for 24 hours
- Open it as if you're a stranger
- Document every friction point
What to Look For:
- Typos and grammar errors
- Broken images or 404 pages
- Confusing copy or jargon
- Slow loading sections
- Buttons that don't work
- Dead-end pages (no next action)
The "Show It to Mom" Test
Process: Show your product to someone completely non-technical
Tasks:
- "What does this product do?"
- "Can you sign up?"
- "Try creating/doing [core action]"
- "What would you click next?"
Red Flags:
- They can't explain what it does
- They ask "where do I click?"
- They get confused by UI
- They say "is this for developers?"
If mom (or non-technical friend) can't use it, neither can most users.
The Checklist: Pre-Launch Evaluation
Copy this checklist for final evaluation:
Product Pages
- Landing page loads in <2s
- Value prop is crystal clear
- Screenshots show current UI (no placeholders)
- All links work (no 404s)
- Mobile responsive (test on phone)
- Dark mode works (no broken styles)
- No typos or grammar errors
- About page tells a compelling story
- Pricing page has clear tiers
User Flows
- Sign up takes <30 seconds
- Onboarding modal works end-to-end
- First success is visible within 1 minute
- Payment/upgrade flow works
- Team invites work
- Help/support is accessible
Technical
- Health check passes: /api/health
- Sitemap exists: /sitemap.xml
- Robots.txt configured
- OpenGraph images work
- Lighthouse score >90 (all metrics)
- No console errors
- Database migrations are stable
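The technical items above lend themselves to a single pass/fail gate, for example in CI. A sketch that only aggregates results (the thresholds mirror this checklist; how each value is collected, e.g. hitting /api/health or running Lighthouse, is left out):

```typescript
interface TechReport {
  healthOk: boolean;     // GET /api/health returned 200
  sitemapOk: boolean;    // /sitemap.xml is served
  robotsOk: boolean;     // robots.txt is configured
  lighthouse: number;    // lowest Lighthouse category score
  consoleErrors: number; // errors observed during a click-through
}

// Launch-ready only if every item on the technical checklist passes.
function launchReady(r: TechReport): boolean {
  return r.healthOk && r.sitemapOk && r.robotsOk &&
    r.lighthouse > 90 && r.consoleErrors === 0;
}

console.log(launchReady({ healthOk: true, sitemapOk: true, robotsOk: true,
  lighthouse: 95, consoleErrors: 0 })); // true
console.log(launchReady({ healthOk: true, sitemapOk: true, robotsOk: true,
  lighthouse: 88, consoleErrors: 0 })); // false: Lighthouse below 90
```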
Content
- Documentation exists and is searchable
- Privacy policy is present
- Terms of service is present
- Contact information is visible
- Social proof (testimonials/logos) if available
Trust & Security
- SSL certificate valid
- No mixed content warnings
- Payment forms use Stripe/secure provider
- Email sending works (test welcome email)
- Password reset works
- No exposed secrets in frontend
Best Practices
Do ✅
- Test with fresh browser (incognito mode) to simulate new users
- Use real test credit cards (Stripe test mode) to validate payment flow
- Time yourself completing critical flows (under 2 minutes = good)
- Document findings as you go (screenshot + note)
- Test on multiple devices (mobile, tablet, desktop)
- Validate in different browsers (Chrome, Firefox, Safari)
- Check analytics after validation (did events fire correctly?)
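One of the Do items above is timing yourself through critical flows. A minimal sketch of that habit, assuming you wrap each manual or scripted flow in an async function (the `timeFlow` helper and its field names are illustrative, not part of ProductReady):

```typescript
interface FlowResult {
  name: string;
  elapsedMs: number;
  withinBudget: boolean;
}

// Run a flow, measure wall-clock time, and flag it against a budget.
// The guide's rule of thumb: critical flows should finish under 2 minutes.
async function timeFlow(
  name: string,
  budgetMs: number,
  flow: () => Promise<void>
): Promise<FlowResult> {
  const start = Date.now();
  await flow();
  const elapsedMs = Date.now() - start;
  return { name, elapsedMs, withinBudget: elapsedMs <= budgetMs };
}
```

Usage: `timeFlow("sign-up", 120_000, runSignUpFlow)` where `runSignUpFlow` is whatever drives the flow (e.g. a Playwright script, or a stopwatch wrapper around a manual run). Logging each `FlowResult` as you go also satisfies the "document findings as you go" rule.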
Don't ❌
- Don't validate your own work immediately (take a break first)
- Don't skip payment flow testing (most revenue-critical flow)
- Don't assume features are discoverable (test with strangers)
- Don't validate only on desktop (50%+ of traffic is mobile)
- Don't trust memory (write down every issue)
- Don't fix while validating (document first, fix later)
- Don't skip edge cases (empty states, error states, slow networks)
When to Run Evaluations
| Evaluation Method | Frequency | When |
|---|---|---|
| Marketing-First | Before launch, quarterly | Repositioning, major releases |
| Onboarding Flow | Every release | Any change to auth/onboarding/payment |
| Feature Completeness | Per feature | Before marking feature "done" |
| User Journey | Monthly | New features, user feedback |
| Employee Simulation | Before AI Agent product launch, after major updates | Evaluate multi-turn dialogue and context retention capability |
| Pre-Launch | Before launch | Final QA before announcement |
Tools & Shortcuts
Developer Shortcuts
ProductReady includes keyboard shortcuts for testing:
| Action | Mac | Windows/Linux |
|---|---|---|
| Toggle Welcome Wizard | ⌘⇧M | Ctrl+Shift+M |
| Skip Welcome Wizard | ⇧⌃⌥M | Shift+Ctrl+Alt+M |
| Open GM Commands | ⌘⇧G | Ctrl+Shift+G |
| Clear Local Storage | Browser DevTools | Browser DevTools |
GM (Game Master) Commands
For testing payment flows and quota limits:
// Set user credits to 0 (trigger payment gate)
gm.setCredits(0);
// Set user to specific tier
gm.setTier('pro');
// Trigger onboarding reset
gm.resetOnboarding();
// Grant unlimited credits (testing)
gm.setCredits(999999);
Human Ready Evaluation Success Metrics
Your product is human-ready when:
| Metric | Target | Meaning |
|---|---|---|
| Landing Clarity | 5-second rule | Strangers understand value prop instantly |
| Sign-up Speed | <30 seconds | Frictionless authentication |
| First Success | <2 minutes | Users achieve value quickly |
| Onboarding Completion | >70% | Flow is clear and compelling |
| Payment Flow | >90% success rate | Payment process is smooth |
| Mobile Usability | No critical errors | Works perfectly on phones |
| Load Time | <3 seconds (4G) | Fast enough for impatient users |
| Lighthouse Score | >90 all metrics | Technical performance meets standards |
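The quantitative rows of the table above can be collapsed into a single readiness gate. This is a sketch under assumptions: the `Metrics` interface and field names are illustrative, rates are expressed as fractions (0..1), and `lighthouseMin` is the lowest of your Lighthouse metric scores so that ">90 all metrics" holds.

```typescript
interface Metrics {
  signupSeconds: number;           // target: <30s
  firstSuccessSeconds: number;     // target: <2 minutes
  onboardingCompletionRate: number; // target: >0.70
  paymentSuccessRate: number;       // target: >0.90
  loadSeconds4G: number;            // target: <3s on 4G
  lighthouseMin: number;            // target: >90 on every metric
}

// True only when every measurable threshold from the table is met.
// Qualitative rows (landing clarity, mobile usability) still need
// human evaluation and are intentionally not encoded here.
function isHumanReady(m: Metrics): boolean {
  return (
    m.signupSeconds < 30 &&
    m.firstSuccessSeconds < 120 &&
    m.onboardingCompletionRate > 0.7 &&
    m.paymentSuccessRate > 0.9 &&
    m.loadSeconds4G < 3 &&
    m.lighthouseMin > 90
  );
}
```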
Related Documentation
- Product Ready Standards - Technical quality checklist
- User Onboarding Strategy - Detailed onboarding design principles
- Testing - Automated testing approaches
- Quick Start - Getting started with ProductReady
FAQ
Q: How long should full evaluation take?
A: Budget 2-5 hours for comprehensive evaluation:
- Marketing-First: 30 minutes
- Onboarding Flow: 45 minutes
- Feature Completeness: 1 hour
- User Journey: 30 minutes
- Employee Simulation: 1 hour (simulate 5-day workflow)
- Pre-Launch: 30 minutes
Q: Should I validate every small change?
A: No. Run onboarding flow evaluation for any auth/payment/core flow changes. Run full evaluation before major releases or launches.
Q: What if I find issues during evaluation?
A: Document them immediately (screenshot + description). Prioritize:
- Critical: Broken payment, broken sign-up, data loss
- High: Confusing onboarding, feature discoverability issues
- Medium: Copy typos, minor UI glitches
- Low: Nice-to-have improvements
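The four-level triage above is easy to apply mechanically once findings are documented. A minimal sketch, assuming each finding is recorded as a title plus one of the priorities listed (the `triage` helper is illustrative):

```typescript
type Priority = "Critical" | "High" | "Medium" | "Low";

// Fix order from the guide: Critical first, Low last.
const PRIORITY_ORDER: Priority[] = ["Critical", "High", "Medium", "Low"];

interface Finding {
  title: string;
  priority: Priority;
}

// Return findings sorted so the most severe issues surface first,
// without mutating the original list of documented findings.
function triage(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => PRIORITY_ORDER.indexOf(a.priority) - PRIORITY_ORDER.indexOf(b.priority)
  );
}
```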
Q: Can I automate these evaluations?
A: Some parts yes (technical checks), but human evaluation is essential for UX, copy, and "does this make sense?" checks. E2E tests (Playwright) can automate flows, but not comprehension.
Q: What's the difference between evaluation and testing?
A:
- Testing = Does the code work correctly? (automated)
- Evaluation = Does the product work for humans? (manual, empathetic)
Both are necessary. Tests catch bugs; evaluation catches UX issues.
Last Updated: December 2025