Human Ready Evaluation Guide

Daily methods and workflows for evaluating whether products are ready for human users, including onboarding effectiveness, feature completeness, and user experience.

As the humans building products, we need systematic methods to evaluate whether a product is truly ready for its human users. This guide covers practical workflows for evaluating human readiness across onboarding, marketing, features, and user experience.

Why Human Ready Evaluation Matters

Products are built for humans, not for AI. While automated tests check code correctness, only human evaluation can ensure your product is truly ready for humans. This is critical because:

  • Products must work for humans - The end users are people with emotions, expectations, and real-world contexts
  • Commercial success requires human approval - People decide to pay, recommend, and return
  • Human experience drives business outcomes - User satisfaction, trust, and delight cannot be automated

These are manual evaluation workflows that humans perform to catch issues that automated tests cannot detect—especially user experience, human readiness, emotional responses, and real-world usability.


Why Human Ready Evaluation is Critical for Commercialization

Automated tests check if code works. Human Ready Evaluation checks if the product is ready to serve humans and succeed commercially.

| What Automated Tests Check | What Human Ready Evaluation Checks |
|----------------------------|-------------------------------------|
| Functions execute without errors | UI makes sense to first-time users |
| APIs return expected data | Onboarding flow feels natural and trustworthy |
| Components render correctly | Copy is clear, compelling, and converts |
| Database queries succeed | Payment flow inspires confidence and trust |
| Types are correct | Brand consistency creates professional impression |
| Code compiles | Product solves real human problems |

Critical Insight: Products ultimately serve humans and achieve commercialization through human satisfaction. Key truths:

  • Humans are your customers - They decide to sign up, pay, and stay
  • Humans spread the word - Referrals and reviews come from satisfied humans
  • Humans judge quality - First impressions, trust, and delight are human judgments
  • Business success = Human approval - Revenue, retention, and growth require winning human hearts

Bottom Line: Even with 100% test coverage and perfect AI validation, you still need Human Ready Evaluation to build products that humans love and pay for. No automated system can replace human judgment of whether a product is truly ready to serve human needs and create commercial value.


Method 1: Marketing-First Evaluation

Principle: Define your product from the user's perspective before evaluating features.

Demo Meeting Guide

After completing marketing-first evaluation, refer to the Product Demo Meeting Guide sections on Potential Customer Demo, Investor Pitch, and Internal Team Showcase to learn how to present your findings to different audiences.

The Workflow

Instead of starting from code or features, start by designing your product's public-facing presence—as if you're launching on Product Hunt or BetaList today.

Step 1: Design the Marketing Page First

Create or evaluate these pages as if launching to strangers today:

  1. Landing Page (/)

    • Can a stranger understand what you do in 5 seconds?
    • Is the value proposition crystal clear?
    • Does the hero section answer "What is this?" and "Why should I care?"
  2. About Page (/about)

    • Does it explain the "why" behind the product?
    • Is the story compelling and authentic?
    • Does it position the product appropriately (indie vs. enterprise)?
  3. Product Hunt / BetaList Copy

    • Write the 60-character tagline and description
    • If you can't summarize it clearly, the product positioning isn't ready
  4. Pricing Page (/pricing)

    • Are tiers clearly differentiated?
    • Can users self-select the right plan?
    • Is the free tier compelling enough to start?

Step 2: Reverse-Engineer from Marketing

After designing the marketing, evaluate your actual product:

Marketing Promise  →  Reality Check
───────────────────────────────────
"Create posts in seconds"  →  Can a new user actually do this in <60s?
"AI-powered suggestions"   →  Does the AI feel magical or clunky?
"Team collaboration"       →  Is inviting teammates intuitive?
"Enterprise-grade security" → Do you actually have SOC2/encryption?

Step 3: User Value Audit

For each feature you're claiming:

Passes if:

  • Feature works end-to-end for new users
  • Value is immediately visible
  • No manual setup or configuration required
  • Marketing copy matches actual capability

Example: "One-click deployment" → Deployment completes in one click without errors or manual steps

Fails if:

  • Feature requires undocumented setup
  • Value only visible after 10+ steps
  • Works only for developers/power users
  • Marketing overpromises vs. reality

Example: "One-click deployment" → Actually requires SSH keys, environment config, and manual DNS setup

Evaluation Checklist

| Page | What to Check | Pass Criteria |
|------|---------------|---------------|
| Landing (/) | Value prop clarity | 5-second rule: stranger understands product |
| About (/about) | Story & positioning | Answers "why this exists" compellingly |
| Pricing (/pricing) | Tiers & value | Users can self-select correct plan |
| Product Hunt | 60-char tagline | Strangers understand without context |
| Screenshots | Visual consistency | Brand colors, modern design, no lorem ipsum |
| Demo Video | First impression | Shows value in first 30 seconds |

Red Flags 🚩

  • Landing page has generic copy like "Revolutionary platform for X"
  • About page is just a list of features (not a story)
  • Pricing page has vague tier names like "Pro" vs. "Premium"
  • Screenshots show developer UI or broken layouts
  • Can't explain product in one sentence to a stranger

Method 2: Onboarding Flow Evaluation

Principle: A product is only as good as a new user's first 5 minutes.

Demo Meeting Guide

After completing onboarding flow evaluation, refer to the Product Demo Meeting Guide sections on User Testing Session, Product Review, and Team Training to learn how to test and present the onboarding flow.

The Complete Onboarding Definition

Onboarding is fully working only when the following end-to-end flow works seamlessly:

┌─────────────────────────────────────────────────────────────┐
│                     Onboarding Flow                         │
├─────────────────────────────────────────────────────────────┤
│ 1. Entry Trigger                                             │
│    → New user logs in → Onboarding modal auto-appears       │
│                                                              │
│ 2. Modal Guided Steps                                        │
│    → Step-by-step with clear "Next" buttons                 │
│    → Each step has success feedback                         │
│    → Final step shows "Complete" and transitions smoothly   │
│                                                              │
│ 3. Post-Modal Guidance                                       │
│    → System provides NEXT STEP guidance (no dead-end)       │
│    → Clear path to first task (e.g., "Create Space")        │
│                                                              │
│ 4. First Success                                             │
│    → User completes first meaningful task                    │
│    → Success is VISIBLE (toast, status change, output)      │
│                                                              │
│ 5. Quota Verification (Test Payment Flow)                   │
│    → Use GM command to set credits to 0                     │
│    → Retry same task → Payment gate triggers                │
│                                                              │
│ 6. Payment Flow Completion                                   │
│    → Payment modal has clear subscription/upgrade options   │
│    → After upgrade → Credits restore immediately            │
│    → (or clear message: "Refresh to apply")                 │
│                                                              │
│ 7. Post-Payment Success                                      │
│    → Retry task → Works without payment gate                │
│    → Upgrade is effective, user can continue               │
└─────────────────────────────────────────────────────────────┘

Step-by-Step Evaluation

1️⃣ Entry Trigger

Test: Sign up as a new user

Pass Criteria:

  • Onboarding modal appears automatically
  • Modal is focused (no distraction)
  • "Start", "Next", or "Skip" buttons are clear
  • Default action should encourage starting (not skipping)

Fail If:

  • Modal doesn't appear automatically
  • User must hunt for "Getting Started" link
  • Modal is missable or hidden
  • No clear call-to-action
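
To make the auto-trigger concrete, here is a minimal sketch of the entry-trigger logic. The `currentUser` object, the `openOnboardingModal()` helper, and the flag names are assumptions; adapt them to your own auth and UI layer.

```js
// Minimal sketch: auto-open the onboarding modal on first login.
function maybeStartOnboarding(currentUser) {
  // Prefer a server-side completion flag; localStorage shown for brevity.
  const completed =
    currentUser.onboardingCompleted ||
    localStorage.getItem(`onboarded:${currentUser.id}`) === 'true';

  if (!completed) {
    openOnboardingModal({
      focusTrap: true,          // keep the modal focused (no distraction)
      primaryAction: 'Start',   // default action encourages starting
      secondaryAction: 'Skip',  // skipping stays possible but secondary
    });
  }
}
```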

2️⃣ Modal Guided Steps

Test: Complete all steps in the modal

Pass Criteria:

  • Each step has a single clear goal
  • "Next" button is prominent
  • Progress indicator shows current step (e.g., "2 of 3")
  • Success states are visual (checkmarks, green highlights)
  • Final step transitions smoothly to the product

Fail If:

  • Any step is confusing (needs human explanation)
  • "Next" button is unclear or hidden
  • Steps don't provide feedback on completion
  • Modal closes abruptly without transition

3️⃣ Post-Modal Guidance

Test: What happens after closing the modal?

Pass Criteria:

  • System provides immediate next action (e.g., "Create your first space")
  • Next action is ONE CLICK away (not buried in menus)
  • Visual indicator (arrow, tooltip, or guide) shows where to go
  • No "blank canvas" paralysis

Fail If:

  • User lands on empty dashboard with no guidance
  • Next step requires reading documentation
  • User must search for "how to start"
  • Dead-end (no clear path forward)

Common Patterns:

  • Getting Started Stepper: Persistent checklist above chat input
  • Setup Guide Button: "Complete Setup" in navigation bar
  • Tooltip Guidance: Highlights first action button
  • Feature Tour: Spotlight overlay explaining UI elements

4️⃣ First Success

Test: Complete the guided first task

Pass Criteria:

  • Task completes without errors
  • Success is VISIBLE:
    • Success toast notification
    • Status changes (e.g., "Post Created")
    • Output appears (e.g., post shows in list)
    • Confetti or celebration animation (optional but delightful)
  • User feels accomplished (not confused)

Fail If:

  • Task completes silently (no feedback)
  • Success is hidden in logs or backend
  • User must refresh to see result
  • Error occurs but isn't caught gracefully

Examples of "First Success":

  • AI Agent: Send first message → Get AI response
  • CMS: Create first post → Post appears in list
  • Developer Tool: Make API call → See response
  • Team App: Invite teammate → Invitation sent confirmation
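
As a sketch of what "visible success" can look like in code, the snippet below pairs the API call with an immediate toast and list refresh. `showToast` and `refreshPostList` are hypothetical helpers, and `/api/posts` is a placeholder route.

```js
// Minimal sketch: make the first success visible instead of silent.
async function createFirstPost(draft) {
  try {
    const res = await fetch('/api/posts', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(draft),
    });
    if (!res.ok) throw new Error(`Request failed: ${res.status}`);

    showToast('Post created 🎉');  // success toast: immediate feedback
    await refreshPostList();       // output appears without a manual refresh
  } catch (err) {
    // Catch errors gracefully rather than failing silently
    showToast(`Couldn't create post: ${err.message}`, { tone: 'error' });
  }
}
```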

5️⃣ Quota Evaluation (Payment Gate Test)

Test: Force payment gate by zeroing credits

Setup:

// Use a GM (Game Master) command or a system admin tool
// to set the user's credits to 0
setUserCredits(userId, 0); // or: gm.setCredits(0), see "GM Commands" below

Action: Retry the same first task

Pass Criteria:

  • Payment gate triggers immediately
  • Gate shows clear message: "Out of credits" or "Upgrade to continue"
  • Payment modal appears with options:
    • Subscribe to paid plan
    • Enter promo/redeem code
    • Upgrade space/tier
  • Each option is clickable and works

Fail If:

  • Task succeeds even with 0 credits (billing broken)
  • Payment gate doesn't appear
  • Error message is cryptic (e.g., "Error 402")
  • No path to upgrade (dead-end)
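
For reference, a server-side quota gate can be as small as the sketch below. It assumes an Express-style middleware and a hypothetical `getCredits` lookup; the important detail is returning a structured error the client can turn into the payment modal, not a cryptic bare status code.

```js
// Minimal sketch of a server-side quota gate (Express-style middleware).
async function quotaGate(req, res, next) {
  const credits = await getCredits(req.user.id); // hypothetical lookup
  if (credits <= 0) {
    return res.status(402).json({
      error: 'out_of_credits',
      message: 'Out of credits. Upgrade to continue.',
      // Mirror the payment modal's options so the client can render them:
      actions: ['subscribe', 'redeem_code', 'upgrade_space'],
    });
  }
  next();
}
```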

6️⃣ Payment Flow Completion

Test: Complete payment or upgrade

Actions to Test:

Flow: Click "Subscribe" → Choose plan → Enter payment

Pass:

  • Payment form loads correctly
  • Test card (4242 4242 4242 4242) works in dev/staging
  • After payment: Success message
  • Credits are restored immediately (or with clear refresh instruction)
  • User redirected back to product

Fail:

  • Payment form 404s or errors
  • Payment succeeds but credits don't update
  • No success confirmation
  • User stuck on payment page

Flow: Click "Redeem Code" → Enter code → Apply

Pass:

  • Code input field is visible
  • Valid test code applies successfully
  • Credits update immediately
  • Success message confirms redemption
  • User can continue working

Fail:

  • Code input doesn't exist
  • Valid codes fail to apply
  • No feedback on redemption
  • Credits don't update

Flow: Click "Upgrade Space" → Select tier → Confirm

Pass:

  • Tier options clearly differentiated
  • Price and benefits are visible
  • Upgrade confirms immediately
  • Quota increases as expected
  • User can retry task

Fail:

  • Upgrade button doesn't work
  • Tier changes but quota doesn't
  • Pricing is unclear
  • Must refresh or re-login

7️⃣ Post-Payment Success

Test: Retry the original task after payment/upgrade

Pass Criteria:

  • Task completes successfully (no payment gate)
  • Output matches expectations
  • No errors or glitches
  • User can continue normal workflow

Fail If:

  • Payment gate still appears (upgrade didn't apply)
  • Task fails with different error
  • Must log out/in for upgrade to take effect
  • Quota still shows 0

Onboarding Acceptance Criteria

Definition of "Fully Working":

The onboarding flow is fully working when a human tester can complete steps 1-7 consecutively without:

  • Manual intervention (no console commands, no database edits)
  • Confusion (no "what do I do next?" moments)
  • Errors (no 500s, no broken buttons)
  • Need for documentation (flow is self-explanatory)

One Break = Not Ready: If any step fails or requires manual help, the onboarding is not complete.


Common Onboarding Failures

| Failure | Impact | Fix |
|---------|--------|-----|
| Modal doesn't auto-show | User skips onboarding entirely | Add auto-trigger on first login |
| Steps have no progress indicator | User doesn't know how long it takes | Add "Step 2 of 3" indicator |
| Post-modal has no guidance | User gets lost, abandons product | Add Getting Started checklist |
| First task has no visible success | User unsure if it worked | Add toast notification + visual change |
| Payment gate doesn't trigger | Revenue loss, billing broken | Test quota system end-to-end |
| Payment succeeds but quota doesn't update | User frustrated, tickets increase | Fix credit refresh logic |
| Upgrade requires refresh | Friction in payment flow | Auto-refresh or provide clear instruction |


Method 3: Employee Simulation Evaluation

Principle: Test an AI Agent the way you would manage a subordinate employee's weekly workflow, ensuring it can understand and execute sequential, progressive tasks.

Demo Meeting Guide

After completing employee simulation evaluation, refer to the Product Demo Meeting Guide sections on AI Agent Customer Demo, Technical Review, and Team Training to learn how to showcase AI Agent capabilities.

Core Concept

Imagine your AI Agent as a subordinate employee. Over a workweek (Monday through Friday), you need to assign tasks and track progress. This method makes AI Agent testing more human-like and relatable, while ensuring it can handle coherent task sequences from real work scenarios.

Best Use Case: This method is especially suitable for chat-initiated AI Agent products (such as conversational AI assistants and intelligent customer service) because their functionality is open-ended and dialogue-driven rather than built on fixed buttons and step-by-step workflows. Conversational interactions demand stronger context retention and task comprehension.

Why This Method Works

| Traditional Testing | Employee Simulation Evaluation |
|---------------------|--------------------------------|
| Tests isolated features | Tests coherent workflows |
| One-time verification | Simulates real weekly work rhythm |
| Technical perspective | Manager's perspective |
| Feature completeness | Task understanding and execution capability |
| Single-point validation | Context retention and task continuity |

Value: This method reveals issues with AI Agents' ability to understand task dependencies, maintain context, and achieve progressive goals.


Weekly Task Assignment Method

Imagine it's Monday morning: you have a new employee (your AI Agent) and need to assign the week's work. Here's the 5-step progressive task assignment flow:

📋 Task Background Setup

Scenario: Your team is conducting market research and competitive analysis for a new product.

Employee: Your AI Agent (playing the role of junior analyst)

Goal: Complete the entire workflow from research to report within one week


Step 1️⃣: Monday Morning - Information Gathering Task

Your Instruction:

"Hey Alex, this week we're doing competitive analysis. Today's Monday, please help me collect basic information about the top 5 competitors in our industry. Include: company name, founding year, main products, and target audience. Send me a preliminary list by end of day."

Validation Points:

  • Can the AI Agent understand the relatively vague requirement of "top 5"?
  • Will it proactively ask what ranking criteria to use (user base? revenue? brand awareness?)?
  • Can it return a structured list within a reasonable timeframe?
  • Is the output format easy to read and suitable for follow-up work?

Pass Criteria:

  • AI proactively clarifies ranking criteria or makes reasonable assumptions
  • Returns structured information for 5 companies
  • Information is accurate and sources are traceable
  • Asks if more details are needed

Fail Criteria:

  • Returns results without asking about criteria
  • Information is incomplete or format is messy
  • Finds wrong competitors
  • Cannot complete task or returns errors

Step 2️⃣: Tuesday Morning - Deep Analysis Task

Your Instruction:

"Alex, I reviewed yesterday's list - good job. Today's Tuesday, based on yesterday's 5 companies, select the 3 closest to us and conduct an in-depth analysis of their pricing strategies. Include: price ranges, subscription models, free tier features, and paid tier differences. Send it to me before tomorrow morning's meeting."

Validation Points:

  • Can the AI Agent reference yesterday's results as context?
  • Does it understand the criteria for "closest" (product type? target users? price point?)?
  • Can it perform secondary filtering and deep diving?
  • Is the output comparative and actionable?

Pass Criteria:

  • Explicitly references the 5 companies from Step 1
  • Has reasonable filtering logic (with explanation)
  • Provides detailed pricing comparison table
  • Offers preliminary competitive landscape analysis

Fail Criteria:

  • Forgets yesterday's results and starts over
  • Randomly selects 3 companies without explanation
  • Pricing information is inaccurate or outdated
  • Only lists information without comparison dimensions

Step 3️⃣: Wednesday Afternoon - Creative Output Task

Your Instruction:

"Alex, the pricing analysis is valuable. It's Wednesday now, based on the past two days' research, help me brainstorm: if we want to differentiate in this market, what 3 directions could we break through? Write 2-3 sentences for each direction explaining the rationale. Let's discuss Thursday morning."

Validation Points:

  • Can the AI Agent synthesize findings from the past two days?
  • Can it shift from analysis to creative and strategic recommendations?
  • Are the recommendations specific, feasible, and insightful?
  • Does it balance innovation with practicality?

Pass Criteria:

  • Recommendations explicitly reference prior research findings
  • 3 directions are distinct with logical support
  • Has innovation while considering feasibility
  • Uses data or case studies to support recommendations

Fail Criteria:

  • Recommendations unrelated to prior research
  • Directions are vague or generic (e.g., "do it better")
  • Random ideas without reasoning support
  • Unrealistic and infeasible suggestions

Step 4️⃣: Thursday Morning - Integration Summary Task

Your Instruction:

"Alex, we discussed yesterday's 3 directions and decided to focus on the 2nd one. Today's Thursday, please compile all this week's findings into a brief, including: competitor overview, pricing comparison, and our differentiation strategy. Use PPT outline format, keep it under 10 pages. We're presenting to management tomorrow morning."

Validation Points:

  • Can the AI Agent integrate the entire week's work results?
  • Can it extract core insights from scattered information?
  • Does the output format meet business presentation standards?
  • Is the structure clear and logic coherent?

Pass Criteria:

  • Brief includes all key findings from Monday to Wednesday
  • Structure follows "background-analysis-recommendation" logic
  • PPT outline has clear theme for each page
  • Highlights the focus (2nd differentiation direction)

Fail Criteria:

  • Omits important information from previous days
  • Brief structure is chaotic or repetitive
  • Exceeds page limit or information overload
  • Doesn't highlight the decision focus

Step 5️⃣: Friday Afternoon - Reflection and Improvement Task

Your Instruction:

"Alex, this morning's presentation was a success - management was very satisfied. It's Friday afternoon now, please review this week's work and write a brief retrospective: what went well? What could be improved? If you had a similar task next week, how would you adjust the process?"

Validation Points:

  • Can the AI Agent self-reflect and summarize?
  • Can it identify strengths and weaknesses in the workflow?
  • Can it propose specific improvement suggestions?
  • Does it demonstrate learning and growth capability?

Pass Criteria:

  • Accurately summarizes this week's 5-step workflow
  • Points out specific successes and shortcomings
  • Improvement suggestions are practical and actionable
  • Demonstrates understanding of work processes

Fail Criteria:

  • Generic self-praise without specific examples
  • Doesn't identify any issues or improvement points
  • Retrospective is disconnected from actual work
  • Cannot propose valuable process optimization suggestions

Employee Simulation Evaluation Checklist

Use this checklist to evaluate the AI Agent's "employee performance":

## AI Agent Weekly Work Performance Evaluation

### Task Execution Capability
- [ ] Monday: Information gathering accurate and complete
- [ ] Tuesday: Deep analysis insightful
- [ ] Wednesday: Creative output feasible and valuable
- [ ] Thursday: Integration summary clear structure
- [ ] Friday: Self-reflection has depth

### Context Retention
- [ ] Can remember previous day's work results
- [ ] Can reference prior findings in subsequent tasks
- [ ] Understands dependencies between tasks
- [ ] Maintains coherent work theme throughout the week

### Communication Understanding
- [ ] Understands relatively vague instructions
- [ ] Proactively asks clarifying questions
- [ ] Output format suitable for business scenarios
- [ ] Can adjust direction based on feedback

### Work Maturity
- [ ] Natural progression from execution to analysis to creativity
- [ ] Can distinguish requirements of different task types
- [ ] Output quality is stable without fluctuation
- [ ] Demonstrates independent thinking and judgment

### Human-like Behavior
- [ ] Communication tone is natural, not robotic
- [ ] Understands work rhythm (Monday collection → Friday retrospective)
- [ ] Expresses uncertainty or need for help
- [ ] Overall interaction feels like collaborating with human employees
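
If you want to script the replay itself, a minimal harness sketch is below. It assumes a hypothetical `/api/chat` endpoint that keeps conversation state per `conversationId`; the context-retention judgments (for example, whether Tuesday's reply references Monday's list) remain human calls made while reviewing the transcript.

```js
// Minimal harness sketch: replay the five daily instructions in one
// conversation and save the replies for the Friday comparison.
const DAILY_TASKS = [
  'Monday: collect basic info on our top 5 competitors...',
  "Tuesday: pick the 3 closest from yesterday's list and analyze pricing...",
  "Wednesday: propose 3 differentiation directions from this week's research...",
  "Thursday: compile the week's findings into a <10-page PPT outline...",
  "Friday: write a brief retrospective of this week's work...",
];

async function runWeeklySimulation(conversationId) {
  const transcript = [];
  for (const task of DAILY_TASKS) {
    const res = await fetch('/api/chat', {   // hypothetical endpoint
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ conversationId, message: task }),
    });
    const { reply } = await res.json();
    transcript.push({ task, reply });        // record interaction details
  }
  return transcript;
}
```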

Common "Employee Performance" Issues

| Issue Manifestation | Possible Cause | Improvement Direction |
|---------------------|----------------|------------------------|
| Every day feels like a new task; doesn't remember yesterday's content | Context window too short or conversation management issues | Optimize context retention mechanism |
| Can only execute explicit instructions; cannot handle vague tasks | Lacks reasoning and clarification capability | Add proactive questioning to prompts |
| Output format is arbitrary; doesn't fit business scenarios | Lacks scenario-based training | Add business document template examples |
| Friday retrospective is superficial with no substance | Lacks self-evaluation capability | Enhance metacognition and reflection ability |
| Quality deteriorates as tasks progress | Attention decay or context overload | Periodically summarize and reset key information |

Application Scenarios

This method is particularly suitable for validating:

Best Suited for - Chat/Dialogue-Driven AI Agent Products:

  • Conversational AI Assistants (like ChatGPT-style personal assistants, work helpers)
  • Intelligent Customer Service Systems (multi-turn dialogue understanding, unstructured queries)
  • AI Programming Assistants (completing programming tasks through conversation, like GitHub Copilot Chat)
  • Content Creation AI (multi-step creative workflows guided by dialogue)

These products are characterized by:

  • Open-ended functionality: Users express needs through natural language, not by clicking fixed buttons
  • Dialogue-driven: Interaction is continuous conversation, not step-by-step forms
  • Context-dependent: Need to remember and understand content from previous conversation turns

Also Suitable for:

  • Data Analysis Tools (coherent workflows from collection to analysis to reporting)
  • Task Planning Tools (complex tasks requiring multi-step decomposition and tracking)

Less Suitable for:

  • Button/step-based workflow products (like e-commerce checkout flows, form wizards)
  • Single-interaction utility products (like image editors, calculators)
  • Services that don't require context (like translation tools, format converters)
  • Enterprise systems that already have complete process testing

Best Practices

Do ✅:

  • Set realistic work scenarios: Use actual work scenarios from your own team
  • Maintain task continuity: Ensure 5 steps are different stages of the same project
  • Record interaction details: Save each day's conversations for Friday comparison
  • Use realistic time pressure: Simulate real deadlines like "need this by tomorrow morning"
  • Test boundary cases: Try temporarily changing task direction to test adaptability

Don't ❌:

  • Don't design overly simple tasks: Should have reasonable complexity and challenge
  • Don't skip intermediate steps: Must complete the full 5-day workflow
  • Don't manually supplement information: If the AI forgets the previous day's content, record it as an issue rather than reminding it
  • Don't accept robotic responses: Should be as natural as conversing with humans
  • Don't ignore reflection session: Friday's self-retrospective is very important

Method 4: User Journey Mapping

Principle: Products are used as journeys, not isolated features.

Critical User Journeys

Map and evaluate these end-to-end journeys:

Journey 1: "Tire-Kicker" (Skeptical Visitor)

Persona: Someone clicking from Product Hunt, not sure if they'll sign up

Land on homepage → Read value prop → Check pricing → View screenshots

Decide to try → Sign up → Skip/complete onboarding → Test one feature

Decision: "Worth my time?" → Bookmark / Close tab

Evaluation:

  • Landing page loads in <2s (no patience)
  • Value prop understandable in 5 seconds
  • Pricing is transparent (no "Contact Sales")
  • Screenshots show real product (not stock photos)
  • Sign-up is 1-click (Google/GitHub OAuth)
  • Demo mode works without login (if applicable)
  • First feature works in <1 minute

Journey 2: "Power User" (Converting to Paid)

Persona: Free user who loves product, considering upgrade

Hit free quota limit → See upgrade prompt → Review pricing → Compare tiers

Click upgrade → Enter payment → Confirm → Quota increases → Continue working

Evaluation:

  • Quota warnings appear before hitting limit
  • Upgrade prompt is helpful, not annoying
  • Tier differences are clear
  • Payment form is trustworthy (secure badge, no errors)
  • Upgrade applies immediately (or clear ETA)
  • Receipt email sent automatically
  • No disruption to workflow after upgrade

Journey 3: "Team Lead" (Bringing Colleagues)

Persona: Solo user wanting to invite team

Success with product → Wants to share → Finds "Invite Team" → Sends invites

Teammates receive email → Click link → Sign up → Join team space → Collaborate

Evaluation:

  • "Invite Team" is discoverable (not hidden in settings)
  • Invite email is professional (not spammy)
  • Invite link works (not expired or broken)
  • New members land in correct space/team
  • Permissions work correctly (no accidental admin access)
  • Team lead can see who joined

Journey 4: "Stuck User" (Needing Support)

Persona: User encountering an error or confusion

Encounter issue → Look for help → Find support channel → Ask question

Receive response → Implement solution → Continue working OR leave frustrated

Evaluation:

  • Help/Support link is visible in navigation
  • Contact options are clear (chat, email, docs)
  • Response time expectation is set
  • Error messages include "Get Help" link
  • Documentation search works well
  • Common issues have instant answers (chatbot/FAQ)

Method 5: Feature Completeness Check

Principle: A feature isn't "done" until it's discoverable, usable, and documented.

The 3-Layer Feature Check

For each major feature in your product:

Layer 1: Discoverability

Question: Can users find this feature without help?

Pass If:

  • Feature is in main navigation or prominent CTA
  • Feature name is clear (not jargon)
  • Icon or visual makes sense
  • Search includes this feature

Fail If:

  • Feature buried in settings or submenus
  • Only accessible via URL hack
  • Name is technical (e.g., "CRUD Operations")
  • Users ask "Where is X?" in support

Layer 2: Usability

Question: Can users complete the core workflow without documentation?

Test: Give product to a stranger, ask them to use the feature

Pass If:

  • Stranger completes workflow in <2 minutes
  • No need to explain buttons or flows
  • Error states are helpful
  • Success states are obvious

Fail If:

  • Stranger asks "what do I click?"
  • Must explain what buttons do
  • Errors are cryptic
  • User unsure if action succeeded

Layer 3: Documentation

Question: Is the feature documented for power users?

Pass If:

  • Help docs exist at /docs/features/[feature-name]
  • Screenshots or video included
  • API documentation (if applicable)
  • FAQs for common issues

Fail If:

  • No documentation exists
  • Docs are placeholder text
  • Screenshots show old UI
  • API docs are out of date

Feature Checklist Template

Use this for each feature:

## Feature: [Name]

### Discoverability
- [ ] Appears in main navigation or homepage
- [ ] Feature name is clear to non-technical users
- [ ] Icon/visual makes sense
- [ ] Searchable in app search

### Usability
- [ ] Core workflow completes in <2 minutes
- [ ] No explanation needed for buttons/UI
- [ ] Error messages are helpful (not just "Error 500")
- [ ] Success states are obvious

### Documentation
- [ ] Help doc exists: `/docs/features/[name]`
- [ ] Screenshots match current UI
- [ ] API docs (if feature has API)
- [ ] FAQs for common issues

### Edge Cases
- [ ] Works on mobile (< 768px width)
- [ ] Works in dark mode
- [ ] Handles empty states gracefully
- [ ] Rate limits / quotas are clear

### Performance
- [ ] Loads in <3 seconds
- [ ] No layout shift (CLS)
- [ ] No console errors

Method 6: Pre-Launch Evaluation

Principle: Before announcing your product, evaluate it like a critic, not a creator.

Demo Meeting Guide

After completing pre-launch evaluation, refer to the Product Demo Meeting Guide section on Pre-Launch Mock Meeting to learn how to conduct a final team simulation launch.

The 24-Hour Fresh Eyes Test

Process:

  1. Don't touch your product for 24 hours
  2. Open it as if you're a stranger
  3. Document every friction point

What to Look For:

  • Typos and grammar errors
  • Broken images or 404 pages
  • Confusing copy or jargon
  • Slow loading sections
  • Buttons that don't work
  • Dead-end pages (no next action)

The "Show It to Mom" Test

Process: Show your product to someone completely non-technical

Tasks:

  1. "What does this product do?"
  2. "Can you sign up?"
  3. "Try creating/doing [core action]"
  4. "What would you click next?"

Red Flags:

  • They can't explain what it does
  • They ask "where do I click?"
  • They get confused by UI
  • They say "is this for developers?"

If mom (or non-technical friend) can't use it, neither can most users.


The Checklist: Pre-Launch Evaluation

Copy this checklist for final evaluation:

Product Pages

  • Landing page loads in <2s
  • Value prop is crystal clear
  • Screenshots show current UI (no placeholders)
  • All links work (no 404s)
  • Mobile responsive (test on phone)
  • Dark mode works (no broken styles)
  • No typos or grammar errors
  • About page tells a compelling story
  • Pricing page has clear tiers

User Flows

  • Sign up takes <30 seconds
  • Onboarding modal works end-to-end
  • First success is visible within 1 minute
  • Payment/upgrade flow works
  • Team invites work
  • Help/support is accessible

Technical

  • Health check passes: /api/health
  • Sitemap exists: /sitemap.xml
  • Robots.txt configured
  • OpenGraph images work
  • Lighthouse score >90 (all metrics)
  • No console errors
  • Database migrations are stable
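
Several of these technical checks can be smoke-tested with a few lines of script. Here is a minimal sketch, assuming Node 18+ (for global `fetch`) and the routes listed above; swap in your own base URL and paths:

```js
// Minimal pre-launch smoke check for the endpoints in this checklist.
const BASE = process.env.BASE_URL || 'https://example.com'; // placeholder
const PATHS = ['/', '/api/health', '/sitemap.xml', '/robots.txt'];

async function smokeCheck() {
  for (const path of PATHS) {
    const res = await fetch(BASE + path);
    console.log(`${res.ok ? 'PASS' : 'FAIL'} ${path} -> ${res.status}`);
  }
}

smokeCheck();
```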

Content

  • Documentation exists and is searchable
  • Privacy policy is present
  • Terms of service is present
  • Contact information is visible
  • Social proof (testimonials/logos) if available

Trust & Security

  • SSL certificate valid
  • No mixed content warnings
  • Payment forms use Stripe/secure provider
  • Email sending works (test welcome email)
  • Password reset works
  • No exposed secrets in frontend

Best Practices

Do ✅

  • Test with fresh browser (incognito mode) to simulate new users
  • Use real test credit cards (Stripe test mode) to validate payment flow
  • Time yourself completing critical flows (under 2 minutes = good)
  • Document findings as you go (screenshot + note)
  • Test on multiple devices (mobile, tablet, desktop)
  • Validate in different browsers (Chrome, Firefox, Safari)
  • Check analytics after validation (did events fire correctly?)

Don't ❌

  • Don't validate your own work immediately (take a break first)
  • Don't skip payment flow testing (most revenue-critical flow)
  • Don't assume features are discoverable (test with strangers)
  • Don't validate only on desktop (50%+ traffic is mobile)
  • Don't trust memory (write down every issue)
  • Don't fix while validating (document first, fix later)
  • Don't skip edge cases (empty states, error states, slow networks)

When to Run Evaluations

| Evaluation Method | Frequency | When |
|-------------------|-----------|------|
| Marketing-First | Before launch, quarterly | Repositioning, major releases |
| Onboarding Flow | Every release | Any change to auth/onboarding/payment |
| Feature Completeness | Per feature | Before marking feature "done" |
| User Journey | Monthly | New features, user feedback |
| Employee Simulation | Before AI Agent product launch, after major updates | Evaluate multi-turn dialogue and context retention capability |
| Pre-Launch | Before launch | Final QA before announcement |

Tools & Shortcuts

Developer Shortcuts

ProductReady includes keyboard shortcuts for testing:

| Action | Mac | Windows/Linux |
|--------|-----|---------------|
| Toggle Welcome Wizard | ⌘⇧M | Ctrl+Shift+M |
| Skip Welcome Wizard | ⇧⌃⌥M | Shift+Ctrl+Alt+M |
| Open GM Commands | ⌘⇧G | Ctrl+Shift+G |
| Clear Local Storage | Browser DevTools | Browser DevTools |

GM (Game Master) Commands

For testing payment flows and quota limits:

// Set user credits to 0 (trigger payment gate)
gm.setCredits(0);

// Set user to specific tier
gm.setTier('pro');

// Trigger onboarding reset
gm.resetOnboarding();

// Grant unlimited credits (testing)
gm.setCredits(999999);
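
A typical way to chain these during the onboarding quota test (Steps 5-7):

```js
// Scripted pass through the payment-gate portion of the onboarding flow.
gm.resetOnboarding(); // start from a fresh first-login state
gm.setCredits(0);     // Step 5: force the payment gate on the next task
// ...perform the first task in the UI; the upgrade modal should appear...
gm.setTier('pro');    // Step 6: simulate a successful upgrade
// ...retry the task; it should now complete without a gate (Step 7)...
```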

Human Ready Evaluation Success Metrics

Your product is human-ready when:

| Metric | Target | Meaning |
|--------|--------|---------|
| Landing Clarity | 5-second rule | Strangers understand value prop instantly |
| Sign-up Speed | <30 seconds | Frictionless authentication |
| First Success | <2 minutes | Users achieve value quickly |
| Onboarding Completion | >70% | Flow is clear and compelling |
| Payment Flow | >90% success rate | Payment process is smooth |
| Mobile Usability | No critical errors | Works perfectly on phones |
| Load Time | <3 seconds (4G) | Fast enough for impatient users |
| Lighthouse Score | >90 all metrics | Technical performance meets standards |


FAQ

Q: How long should full evaluation take?

A: Budget 2-5 hours for comprehensive evaluation:

  • Marketing-First: 30 minutes
  • Onboarding Flow: 45 minutes
  • Feature Completeness: 1 hour
  • User Journey: 30 minutes
  • Employee Simulation: 1 hour (simulate 5-day workflow)
  • Pre-Launch: 30 minutes

Q: Should I validate every small change?

A: No. Run onboarding flow evaluation for any auth/payment/core flow changes. Run full evaluation before major releases or launches.

Q: What if I find issues during evaluation?

A: Document them immediately (screenshot + description). Prioritize:

  1. Critical: Broken payment, broken sign-up, data loss
  2. High: Confusing onboarding, feature discoverability issues
  3. Medium: Copy typos, minor UI glitches
  4. Low: Nice-to-have improvements

Q: Can I automate these evaluations?

A: Some parts yes (technical checks), but human evaluation is essential for UX, copy, and "does this make sense?" checks. E2E tests (Playwright) can automate flows, but not comprehension.
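
As an illustration of the automatable half, here is a Playwright sketch; the `/signup` route and the exact selectors are assumptions, and whether the copy actually makes sense to a stranger still needs a human.

```js
// Minimal Playwright sketch: a fresh user should see the onboarding modal.
// Assumes baseURL is configured in playwright.config so relative paths work.
const { test, expect } = require('@playwright/test');

test('new user sees onboarding modal', async ({ page }) => {
  await page.goto('/signup');
  // ...complete sign-up here with a throwaway test account...
  const modal = page.getByRole('dialog');
  await expect(modal).toBeVisible();                         // entry trigger fires
  await modal.getByRole('button', { name: 'Next' }).click(); // guided steps work
  await expect(modal.getByText('2 of 3')).toBeVisible();     // progress indicator
});
```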

Q: What's the difference between evaluation and testing?

A:

  • Testing = Does the code work correctly? (automated)
  • Evaluation = Does the product work for humans? (manual, empathetic)

Both are necessary. Tests catch bugs; evaluation catches UX issues.


Last Updated: December 2025
