Human Ready Evaluation Guide
Daily methods and workflows for evaluating whether products are ready for human users, including onboarding effectiveness, feature completeness, and user experience
Building products for humans requires systematic methods to evaluate whether a product is truly ready for them. This guide covers practical workflows for evaluating human readiness across onboarding, marketing, features, and user experience.
Why Human Ready Evaluation Matters
Products are built for humans, not for AI. While automated tests check code correctness, only human evaluation can ensure your product is truly ready for humans. This is critical because:
- Products must work for humans - The end users are people with emotions, expectations, and real-world contexts
- Commercial success requires human approval - People decide to pay, recommend, and return
- Human experience drives business outcomes - User satisfaction, trust, and delight cannot be automated
These are manual evaluation workflows that humans perform to catch issues that automated tests cannot detect—especially user experience, human readiness, emotional responses, and real-world usability.
Why Human Ready Evaluation is Critical for Commercialization
Automated tests check if code works. Human Ready Evaluation checks if the product is ready to serve humans and succeed commercially.
| What Automated Tests Check | What Human Ready Evaluation Checks |
|---|---|
| Functions execute without errors | UI makes sense to first-time users |
| APIs return expected data | Onboarding flow feels natural and trustworthy |
| Components render correctly | Copy is clear, compelling, and converts |
| Database queries succeed | Payment flow inspires confidence and trust |
| Types are correct | Brand consistency creates professional impression |
| Code compiles | Product solves real human problems |
Critical Insight: Products ultimately serve humans and achieve commercialization through human satisfaction. Key truths:
- Humans are your customers - They decide to sign up, pay, and stay
- Humans spread the word - Referrals and reviews come from satisfied humans
- Humans judge quality - First impressions, trust, and delight are human judgments
- Business success = Human approval - Revenue, retention, and growth require winning human hearts
Bottom Line: Even with 100% test coverage and perfect AI validation, you still need Human Ready Evaluation to build products that humans love and pay for. No automated system can replace human judgment of whether a product is truly ready to serve human needs and create commercial value.
Method 1: Marketing-First Evaluation
Principle: Define your product from the user's perspective before evaluating features.
Demo Meeting Guide
After completing marketing-first evaluation, refer to the Product Demo Meeting Guide sections on Potential Customer Demo, Investor Pitch, and Internal Team Showcase to learn how to present your findings to different audiences.
The Workflow
Instead of starting from code or features, start by designing your product's public-facing presence—as if you're launching on Product Hunt or BetaList today.
Step 1: Design the Marketing Page First
Create or evaluate these pages as if launching to strangers today:
Landing Page (/)
- Can a stranger understand what you do in 5 seconds?
- Is the value proposition crystal clear?
- Does the hero section answer "What is this?" and "Why should I care?"
About Page (/about)
- Does it explain the "why" behind the product?
- Is the story compelling and authentic?
- Does it position the product appropriately (indie vs. enterprise)?
Product Hunt / BetaList Copy
- Write the 60-character tagline and description
- If you can't summarize it clearly, the product positioning isn't ready
Pricing Page (/pricing)
- Are tiers clearly differentiated?
- Can users self-select the right plan?
- Is the free tier compelling enough to start?
Step 2: Reverse-Engineer from Marketing
After designing the marketing, evaluate your actual product:
Marketing Promise → Reality Check
───────────────────────────────────
"Create posts in seconds" → Can a new user actually do this in <60s?
"AI-powered suggestions" → Does the AI feel magical or clunky?
"Team collaboration" → Is inviting teammates intuitive?
"Enterprise-grade security" → Do you actually have SOC2/encryption?

Step 3: User Value Audit
For each feature you're claiming:
Passes if:
- Feature works end-to-end for new users
- Value is immediately visible
- No manual setup or configuration required
- Marketing copy matches actual capability
Example: "One-click deployment" → Deployment completes in one click without errors or manual steps
Fails if:
- Feature requires undocumented setup
- Value only visible after 10+ steps
- Works only for developers/power users
- Marketing overpromises vs. reality
Example: "One-click deployment" → Actually requires SSH keys, environment config, and manual DNS setup
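The pass/fail criteria above can be kept as a small machine-checkable audit table, so each marketing claim is scored against reality instead of being debated. A minimal sketch, assuming a made-up feature list and field names (nothing here comes from a real product):

```typescript
// Hypothetical value-audit entry: one row per marketing claim.
interface ValueAudit {
  claim: string;            // marketing copy, verbatim
  worksEndToEnd: boolean;   // a new user completes it without help
  valueVisible: boolean;    // result is immediately visible
  noManualSetup: boolean;   // no undocumented configuration required
  matchesReality: boolean;  // copy does not overpromise
}

// A claim passes only when every check holds.
function passes(a: ValueAudit): boolean {
  return a.worksEndToEnd && a.valueVisible && a.noManualSetup && a.matchesReality;
}

const audit: ValueAudit[] = [
  // The "one-click deployment" failure example from above, as data.
  { claim: "One-click deployment", worksEndToEnd: true, valueVisible: true,
    noManualSetup: false, matchesReality: false },
];

// List every claim that fails the audit.
const failing = audit.filter((a) => !passes(a)).map((a) => a.claim);
console.log(failing); // ["One-click deployment"]
```

Keeping the audit as data makes it easy to re-run before every release and diff against the previous pass.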
Evaluation Checklist
| Page | What to Check | Pass Criteria |
|---|---|---|
| Landing (/) | Value prop clarity | 5-second rule: stranger understands product |
| About (/about) | Story & positioning | Answers "why this exists" compellingly |
| Pricing (/pricing) | Tiers & value | Users can self-select correct plan |
| Product Hunt | 60-char tagline | Strangers understand without context |
| Screenshots | Visual consistency | Brand colors, modern design, no lorem ipsum |
| Demo Video | First-impression | Shows value in first 30 seconds |
Red Flags 🚩
- Landing page has generic copy like "Revolutionary platform for X"
- About page is just a list of features (not a story)
- Pricing page has vague tier names like "Pro" vs. "Premium"
- Screenshots show developer UI or broken layouts
- Can't explain product in one sentence to a stranger
Method 2: Onboarding Flow Evaluation
Principle: A product is only as good as a new user's first 5 minutes.
Demo Meeting Guide
After completing onboarding flow evaluation, refer to the Product Demo Meeting Guide sections on User Testing Session, Product Review, and Team Training to learn how to test and present the onboarding flow.
The Complete Onboarding Definition
Onboarding is "跑通" (fully working) only when the following end-to-end flow works seamlessly:
┌─────────────────────────────────────────────────────────────┐
│ Onboarding Flow │
├─────────────────────────────────────────────────────────────┤
│ 1. Entry Trigger │
│ → New user logs in → Onboarding modal auto-appears │
│ │
│ 2. Modal Guided Steps │
│ → Step-by-step with clear "Next" buttons │
│ → Each step has success feedback │
│ → Final step shows "Complete" and transitions smoothly │
│ │
│ 3. Post-Modal Guidance │
│ → System provides NEXT STEP guidance (no dead-end) │
│ → Clear path to first task (e.g., "Create Space") │
│ │
│ 4. First Success │
│ → User completes first meaningful task │
│ → Success is VISIBLE (toast, status change, output) │
│ │
│ 5. Quota Verification (Test Payment Flow) │
│ → Use GM command to set credits to 0 │
│ → Retry same task → Payment gate triggers │
│ │
│ 6. Payment Flow Completion │
│ → Payment modal has clear subscription/upgrade options │
│ → After upgrade → Credits restore immediately │
│ → (or clear message: "Refresh to apply") │
│ │
│ 7. Post-Payment Success │
│ → Retry task → Works without payment gate │
│ → Upgrade is effective, user can continue │
└─────────────────────────────────────────────────────────────┘

Step-by-Step Evaluation
1️⃣ Entry Trigger
Test: Sign up as a new user
✅ Pass Criteria:
- Onboarding modal appears automatically
- Modal is focused (no distraction)
- "Start", "Next", or "Skip" buttons are clear
- Default action should encourage starting (not skipping)
❌ Fail If:
- Modal doesn't appear automatically
- User must hunt for "Getting Started" link
- Modal is missable or hidden
- No clear call-to-action
2️⃣ Modal Guided Steps
Test: Complete all steps in the modal
✅ Pass Criteria:
- Each step has a single clear goal
- "Next" button is prominent
- Progress indicator shows current step (e.g., "2 of 3")
- Success states are visual (checkmarks, green highlights)
- Final step transitions smoothly to the product
❌ Fail If:
- Any step is confusing (needs human explanation)
- "Next" button is unclear or hidden
- Steps don't provide feedback on completion
- Modal closes abruptly without transition
3️⃣ Post-Modal Guidance
Test: What happens after closing the modal?
✅ Pass Criteria:
- System provides immediate next action (e.g., "Create your first space")
- Next action is ONE CLICK away (not buried in menus)
- Visual indicator (arrow, tooltip, or guide) shows where to go
- No "blank canvas" paralysis
❌ Fail If:
- User lands on empty dashboard with no guidance
- Next step requires reading documentation
- User must search for "how to start"
- Dead-end (no clear path forward)
Common Patterns:
- Getting Started Stepper: Persistent checklist above chat input
- Setup Guide Button: "Complete Setup" in navigation bar
- Tooltip Guidance: Highlights first action button
- Feature Tour: Spotlight overlay explaining UI elements
4️⃣ First Success
Test: Complete the guided first task
✅ Pass Criteria:
- Task completes without errors
- Success is VISIBLE:
- Success toast notification
- Status changes (e.g., "Post Created")
- Output appears (e.g., post shows in list)
- Confetti or celebration animation (optional but delightful)
- User feels accomplished (not confused)
❌ Fail If:
- Task completes silently (no feedback)
- Success is hidden in logs or backend
- User must refresh to see result
- Error occurs but isn't caught gracefully
Examples of "First Success":
- AI Agent: Send first message → Get AI response
- CMS: Create first post → Post appears in list
- Developer Tool: Make API call → See response
- Team App: Invite teammate → Invitation sent confirmation
5️⃣ Quota Evaluation (Payment Gate Test)
Test: Force payment gate by zeroing credits
Setup:
// Use GM (Game Master) command or system admin
// Set user credits to 0
setUserCredits(userId, 0);

Action: Retry the same first task
✅ Pass Criteria:
- Payment gate triggers immediately
- Gate shows clear message: "Out of credits" or "Upgrade to continue"
- Payment modal appears with options:
- Subscribe to paid plan
- Enter promo/redeem code
- Upgrade space/tier
- Each option is clickable and works
❌ Fail If:
- Task succeeds even with 0 credits (billing broken)
- Payment gate doesn't appear
- Error message is cryptic (e.g., "Error 402")
- No path to upgrade (dead-end)
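The quota check above reduces to one rule: the moment credits cannot cover a task, the task must be blocked with a human-readable message and an upgrade path, never a cryptic error or a silent success. A minimal sketch of that gate logic (the function and type names are assumptions for illustration):

```typescript
interface GateResult {
  allowed: boolean;
  message?: string; // human-readable, never a bare "Error 402"
}

// Decide whether the payment gate should trigger for a task.
function checkQuota(credits: number, taskCost: number): GateResult {
  if (credits >= taskCost) {
    return { allowed: true };
  }
  // Matches the pass criteria above: clear message plus a path forward.
  return {
    allowed: false,
    message: "Out of credits. Upgrade or redeem a code to continue.",
  };
}

console.log(checkQuota(0, 1));  // allowed: false, with message
console.log(checkQuota(10, 1)); // allowed: true
```

The manual test then verifies that the UI actually surfaces this result as a payment modal rather than swallowing it.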
6️⃣ Payment Flow Completion
Test: Complete payment or upgrade
Actions to Test:
Flow: Click "Subscribe" → Choose plan → Enter payment
✅ Pass:
- Payment form loads correctly
- Test card (4242 4242 4242 4242) works in dev/staging
- After payment: Success message
- Credits are restored immediately (or with clear refresh instruction)
- User redirected back to product
❌ Fail:
- Payment form 404s or errors
- Payment succeeds but credits don't update
- No success confirmation
- User stuck on payment page
Flow: Click "Redeem Code" → Enter code → Apply
✅ Pass:
- Code input field is visible
- Valid test code applies successfully
- Credits update immediately
- Success message confirms redemption
- User can continue working
❌ Fail:
- Code input doesn't exist
- Valid codes fail to apply
- No feedback on redemption
- Credits don't update
Flow: Click "Upgrade Space" → Select tier → Confirm
✅ Pass:
- Tier options clearly differentiated
- Price and benefits are visible
- Upgrade confirms immediately
- Quota increases as expected
- User can retry task
❌ Fail:
- Upgrade button doesn't work
- Tier changes but quota doesn't
- Pricing is unclear
- Must refresh or re-login
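All three flows above end the same way: credits must update immediately, with visible feedback on both success and failure. A sketch of the redeem path only, with a made-up promo-code table and credit amounts (a real system would validate codes server-side):

```typescript
// Hypothetical promo-code table, purely for illustration.
const promoCodes: Record<string, number> = { WELCOME50: 50 };

interface RedeemResult {
  ok: boolean;
  credits: number; // balance after the attempt
  message: string; // always give feedback, success or failure
}

function redeem(code: string, credits: number): RedeemResult {
  const bonus = promoCodes[code];
  if (bonus === undefined) {
    return { ok: false, credits, message: "Invalid code." };
  }
  // Pass criteria above: credits update immediately, with confirmation.
  return { ok: true, credits: credits + bonus, message: `Added ${bonus} credits.` };
}

console.log(redeem("WELCOME50", 0)); // ok: true, credits: 50
console.log(redeem("BOGUS", 0));     // ok: false, credits: 0
```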
7️⃣ Post-Payment Success
Test: Retry the original task after payment/upgrade
✅ Pass Criteria:
- Task completes successfully (no payment gate)
- Output matches expectations
- No errors or glitches
- User can continue normal workflow
❌ Fail If:
- Payment gate still appears (upgrade didn't apply)
- Task fails with different error
- Must log out/in for upgrade to take effect
- Quota still shows 0
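The seven stages above are easiest to run consistently if the tester records them as an ordered checklist, so a failed run pinpoints exactly where the flow broke. A minimal sketch (stage names follow the diagram; the tracking helper itself is an assumption):

```typescript
const onboardingStages = [
  "Entry trigger: modal auto-appears",
  "Modal guided steps complete",
  "Post-modal guidance shown",
  "First success is visible",
  "Quota gate triggers at 0 credits",
  "Payment flow completes",
  "Post-payment retry succeeds",
] as const;

// Report the first failing stage, if any; null means fully working.
function firstBreak(results: boolean[]): string | null {
  const i = results.findIndex((ok) => !ok);
  return i === -1 ? null : onboardingStages[i];
}

console.log(firstBreak([true, true, true, true, true, true, true]));  // null
console.log(firstBreak([true, true, false, true, true, true, true])); // "Post-modal guidance shown"
```

A single failing stage fails the whole run, mirroring the acceptance rule below.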
Onboarding Acceptance Criteria
Definition of "跑通" (Fully Working):
The onboarding flow is fully working when a human tester can complete steps 1-7 consecutively without:
- Manual intervention (no console commands, no database edits)
- Confusion (no "what do I do next?" moments)
- Errors (no 500s, no broken buttons)
- Need for documentation (flow is self-explanatory)
One Break = Not Ready: If any step fails or requires manual help, the onboarding is not complete.
Common Onboarding Failures
| Failure | Impact | Fix |
|---|---|---|
| Modal doesn't auto-show | User skips onboarding entirely | Add auto-trigger on first login |
| Steps have no progress indicator | User doesn't know how long it takes | Add "Step 2 of 3" indicator |
| Post-modal has no guidance | User gets lost, abandons product | Add Getting Started checklist |
| First task has no visible success | User unsure if it worked | Add toast notification + visual change |
| Payment gate doesn't trigger | Revenue loss, billing broken | Test quota system end-to-end |
| Payment succeeds but quota doesn't update | User frustrated, tickets increase | Fix credit refresh logic |
| Upgrade requires refresh | Friction in payment flow | Auto-refresh or provide clear instruction |
Method 3: Employee Simulation Evaluation
Principle: Test AI Agents like you would manage a subordinate employee's weekly workflow, ensuring they can understand and execute sequential, progressive tasks.
Demo Meeting Guide
After completing employee simulation evaluation, refer to the Product Demo Meeting Guide sections on AI Agent Customer Demo, Technical Review, and Team Training to learn how to showcase AI Agent capabilities.
Core Concept
Imagine your AI Agent as a subordinate employee. Over a workweek (Monday through Friday), you need to assign tasks and track progress. This method makes AI Agent testing more human-like and relatable, while ensuring it can handle coherent task sequences from real work scenarios.
Best Use Case: This method is especially suitable for chat-initiated AI Agent products (like conversational AI assistants, intelligent customer service) because these products have open-ended and dialogue-driven functionality, rather than being based on fixed buttons and step-by-step workflows. Conversational interactions require stronger context retention and task comprehension capabilities.
Why This Method Works
| Traditional Testing | Employee Simulation Evaluation |
|---|---|
| Tests isolated features | Tests coherent workflows |
| One-time verification | Simulates real weekly work rhythm |
| Technical perspective | Manager's perspective |
| Feature completeness | Task understanding and execution capability |
| Single-point validation | Context retention and task continuity |
Value: This method reveals issues with AI Agents' ability to understand task dependencies, maintain context, and achieve progressive goals.
Weekly Task Assignment Method
Imagine it's Monday morning, you have a new employee (AI Agent), and you need to assign work for the week. Here's the 5-step progressive task assignment flow:
📋 Task Background Setup
Scenario: Your team is conducting market research and competitive analysis for a new product.
Employee: Your AI Agent (playing the role of junior analyst)
Goal: Complete the entire workflow from research to report within one week
Step 1️⃣: Monday Morning - Information Gathering Task
Your Instruction:
"Hey Alex, this week we're doing competitive analysis. Today's Monday, please help me collect basic information about the top 5 competitors in our industry. Include: company name, founding year, main products, and target audience. Send me a preliminary list by end of day."
Validation Points:
- Can the AI Agent understand the relatively vague requirement of "top 5"?
- Will it proactively ask what ranking criteria to use (user base? revenue? brand awareness?)?
- Can it return a structured list within a reasonable timeframe?
- Is the output format easy to read and suitable for follow-up work?
✅ Pass Criteria:
- AI proactively clarifies ranking criteria or makes reasonable assumptions
- Returns structured information for 5 companies
- Information is accurate and sources are traceable
- Asks if more details are needed
❌ Fail Criteria:
- Returns results without asking about criteria
- Information is incomplete or format is messy
- Finds wrong competitors
- Cannot complete task or returns errors
Step 2️⃣: Tuesday Morning - Deep Analysis Task
Your Instruction:
"Alex, I reviewed yesterday's list - good job. Today's Tuesday, based on yesterday's 5 companies, select the 3 closest to us and conduct an in-depth analysis of their pricing strategies. Include: price ranges, subscription models, free tier features, and paid tier differences. Send it to me before tomorrow morning's meeting."
Validation Points:
- Can the AI Agent reference yesterday's results as context?
- Does it understand the criteria for "closest" (product type? target users? price point?)?
- Can it perform secondary filtering and deep diving?
- Is the output comparative and actionable?
✅ Pass Criteria:
- Explicitly references the 5 companies from Step 1
- Has reasonable filtering logic (with explanation)
- Provides detailed pricing comparison table
- Offers preliminary competitive landscape analysis
❌ Fail Criteria:
- Forgets yesterday's results and starts over
- Randomly selects 3 companies without explanation
- Pricing information is inaccurate or outdated
- Only lists information without comparison dimensions
Step 3️⃣: Wednesday Afternoon - Creative Output Task
Your Instruction:
"Alex, the pricing analysis is valuable. It's Wednesday now, based on the past two days' research, help me brainstorm: if we want to differentiate in this market, what 3 directions could we break through? Write 2-3 sentences for each direction explaining the rationale. Let's discuss Thursday morning."
Validation Points:
- Can the AI Agent synthesize findings from the past two days?
- Can it shift from analysis to creative and strategic recommendations?
- Are the recommendations specific, feasible, and insightful?
- Does it balance innovation with practicality?
✅ Pass Criteria:
- Recommendations explicitly reference prior research findings
- 3 directions are distinct with logical support
- Has innovation while considering feasibility
- Uses data or case studies to support recommendations
❌ Fail Criteria:
- Recommendations unrelated to prior research
- Directions are vague or generic (e.g., "do it better")
- Random ideas without reasoning support
- Unrealistic and infeasible suggestions
Step 4️⃣: Thursday Morning - Integration Summary Task
Your Instruction:
"Alex, we discussed yesterday's 3 directions and decided to focus on the 2nd one. Today's Thursday, please compile all this week's findings into a brief, including: competitor overview, pricing comparison, and our differentiation strategy. Use PPT outline format, keep it under 10 pages. We're presenting to management tomorrow morning."
Validation Points:
- Can the AI Agent integrate the entire week's work results?
- Can it extract core insights from scattered information?
- Does the output format meet business presentation standards?
- Is the structure clear and logic coherent?
✅ Pass Criteria:
- Brief includes all key findings from Monday to Wednesday
- Structure follows "background-analysis-recommendation" logic
- PPT outline has clear theme for each page
- Highlights the focus (2nd differentiation direction)
❌ Fail Criteria:
- Omits important information from previous days
- Brief structure is chaotic or repetitive
- Exceeds page limit or information overload
- Doesn't highlight the decision focus
Step 5️⃣: Friday Afternoon - Reflection and Improvement Task
Your Instruction:
"Alex, this morning's presentation was a success - management was very satisfied. It's Friday afternoon now, please review this week's work and write a brief retrospective: what went well? What could be improved? If you had a similar task next week, how would you adjust the process?"
Validation Points:
- Can the AI Agent self-reflect and summarize?
- Can it identify strengths and weaknesses in the workflow?
- Can it propose specific improvement suggestions?
- Does it demonstrate learning and growth capability?
✅ Pass Criteria:
- Accurately summarizes this week's 5-step workflow
- Points out specific successes and shortcomings
- Improvement suggestions are practical and actionable
- Demonstrates understanding of work processes
❌ Fail Criteria:
- Generic self-praise without specific examples
- Doesn't identify any issues or improvement points
- Retrospective is disconnected from actual work
- Cannot propose valuable process optimization suggestions
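The five-day flow above is easiest to reproduce run after run if the tasks and their dependencies are written down as data, and each reply is checked for references to earlier days. A sketch of such a plan (instructions abridged from the steps above; the structure and helper are assumptions):

```typescript
interface WeeklyTask {
  day: string;
  instruction: string;
  dependsOn: string[]; // prior days whose output must be referenced
}

const week: WeeklyTask[] = [
  { day: "Mon", instruction: "Collect top-5 competitor basics", dependsOn: [] },
  { day: "Tue", instruction: "Pricing deep-dive on 3 closest competitors", dependsOn: ["Mon"] },
  { day: "Wed", instruction: "Propose 3 differentiation directions", dependsOn: ["Mon", "Tue"] },
  { day: "Thu", instruction: "Compile brief, under 10 pages", dependsOn: ["Mon", "Tue", "Wed"] },
  { day: "Fri", instruction: "Retrospective on the week", dependsOn: ["Mon", "Tue", "Wed", "Thu"] },
];

// Crude context-retention check: does the agent's reply mention each
// dependency's day? A real review would read the reply, not string-match.
function referencesContext(reply: string, task: WeeklyTask): boolean {
  return task.dependsOn.every((d) => reply.includes(d));
}

console.log(referencesContext("Building on Mon's list of 5 companies...", week[1])); // true
console.log(referencesContext("Here are 3 random companies.", week[1]));             // false
```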
Employee Simulation Evaluation Checklist
Use this checklist to evaluate the AI Agent's "employee performance":
## AI Agent Weekly Work Performance Evaluation
### Task Execution Capability
- [ ] Monday: Information gathering accurate and complete
- [ ] Tuesday: Deep analysis insightful
- [ ] Wednesday: Creative output feasible and valuable
- [ ] Thursday: Integration summary clear structure
- [ ] Friday: Self-reflection has depth
### Context Retention
- [ ] Can remember previous day's work results
- [ ] Can reference prior findings in subsequent tasks
- [ ] Understands dependencies between tasks
- [ ] Maintains coherent work theme throughout the week
### Communication Understanding
- [ ] Understands relatively vague instructions
- [ ] Proactively asks clarifying questions
- [ ] Output format suitable for business scenarios
- [ ] Can adjust direction based on feedback
### Work Maturity
- [ ] Natural progression from execution to analysis to creativity
- [ ] Can distinguish requirements of different task types
- [ ] Output quality is stable without fluctuation
- [ ] Demonstrates independent thinking and judgment
### Human-like Behavior
- [ ] Communication tone is natural, not robotic
- [ ] Understands work rhythm (Monday collection → Friday retrospective)
- [ ] Expresses uncertainty or need for help
- [ ] Overall interaction feels like collaborating with human employees

Common "Employee Performance" Issues
| Issue Manifestation | Possible Cause | Improvement Direction |
|---|---|---|
| Every day feels like a new task, doesn't remember yesterday's content | Context window too short or conversation management issues | Optimize context retention mechanism |
| Can only execute explicit instructions, cannot handle vague tasks | Lacks reasoning and clarification capability | Add proactive questioning to prompts |
| Output format is arbitrary, doesn't fit business scenarios | Lacks scenario-based training | Add business document template examples |
| Friday retrospective is superficial with no substance | Lacks self-evaluation capability | Enhance metacognition and reflection ability |
| Quality deteriorates as tasks progress | Attention decay or context overload | Periodically summarize and reset key information |
Application Scenarios
This method is particularly suitable for validating:
✅ Best Suited for - Chat/Dialogue-Driven AI Agent Products:
- Conversational AI Assistants (like ChatGPT-style personal assistants, work helpers)
- Intelligent Customer Service Systems (multi-turn dialogue understanding, unstructured queries)
- AI Programming Assistants (completing programming tasks through conversation, like GitHub Copilot Chat)
- Content Creation AI (multi-step creative workflows guided by dialogue)
These products are characterized by:
- Open-ended functionality: Users express needs through natural language, not by clicking fixed buttons
- Dialogue-driven: Interaction is continuous conversation, not step-by-step forms
- Context-dependent: Need to remember and understand content from previous conversation turns
✅ Also Suitable for:
- Data Analysis Tools (coherent workflows from collection to analysis to reporting)
- Task Planning Tools (complex tasks requiring multi-step decomposition and tracking)
❌ Less Suitable for:
- Button/step-based workflow products (like e-commerce checkout flows, form wizards)
- Single-interaction utility products (like image editors, calculators)
- Services that don't require context (like translation tools, format converters)
- Enterprise systems that already have complete process testing
Best Practices
Do ✅:
- Set realistic work scenarios: Use actual work scenarios from your own team
- Maintain task continuity: Ensure 5 steps are different stages of the same project
- Record interaction details: Save each day's conversations for Friday comparison
- Use realistic time pressure: Simulate "need this by tomorrow morning" real deadlines
- Test boundary cases: Try temporarily changing task direction to test adaptability
Don't ❌:
- Don't design overly simple tasks: Should have reasonable complexity and challenge
- Don't skip intermediate steps: Must complete the full 5-day workflow
- Don't manually supplement information: If AI forgets previous day's content, record as issue rather than reminding it
- Don't accept robotic responses: Should be as natural as conversing with humans
- Don't ignore reflection session: Friday's self-retrospective is very important
Method 4: User Journey Mapping
Principle: Products are used as journeys, not isolated features.
Critical User Journeys
Map and evaluate these end-to-end journeys:
Journey 1: "Tire-Kicker" (Skeptical Visitor)
Persona: Someone clicking from Product Hunt, not sure if they'll sign up
Land on homepage → Read value prop → Check pricing → View screenshots
↓
Decide to try → Sign up → Skip/complete onboarding → Test one feature
↓
Decision: "Worth my time?" → Bookmark / Close tab

Evaluation:
- Landing page loads in <2s (no patience)
- Value prop understandable in 5 seconds
- Pricing is transparent (no "Contact Sales")
- Screenshots show real product (not stock photos)
- Sign-up is 1-click (Google/GitHub OAuth)
- Demo mode works without login (if applicable)
- First feature works in <1 minute
Journey 2: "Power User" (Converting to Paid)
Persona: Free user who loves product, considering upgrade
Hit free quota limit → See upgrade prompt → Review pricing → Compare tiers
↓
Click upgrade → Enter payment → Confirm → Quota increases → Continue working

Evaluation:
- Quota warnings appear before hitting limit
- Upgrade prompt is helpful, not annoying
- Tier differences are clear
- Payment form is trustworthy (secure badge, no errors)
- Upgrade applies immediately (or clear ETA)
- Receipt email sent automatically
- No disruption to workflow after upgrade
Journey 3: "Team Lead" (Bringing Colleagues)
Persona: Solo user wanting to invite team
Success with product → Wants to share → Finds "Invite Team" → Sends invites
↓
Teammates receive email → Click link → Sign up → Join team space → Collaborate

Evaluation:
- "Invite Team" is discoverable (not hidden in settings)
- Invite email is professional (not spammy)
- Invite link works (not expired or broken)
- New members land in correct space/team
- Permissions work correctly (no accidental admin access)
- Team lead can see who joined
Journey 4: "Lost User" (Needing Support)
Persona: User encountering an error or confusion
Encounter issue → Look for help → Find support channel → Ask question
↓
Receive response → Implement solution → Continue working OR leave frustrated

Evaluation:
- Help/Support link is visible in navigation
- Contact options are clear (chat, email, docs)
- Response time expectation is set
- Error messages include "Get Help" link
- Documentation search works well
- Common issues have instant answers (chatbot/FAQ)
Method 5: Feature Completeness Check
Principle: A feature isn't "done" until it's discoverable, usable, and documented.
The 3-Layer Feature Check
For each major feature in your product:
Layer 1: Discoverability
Question: Can users find this feature without help?
✅ Pass If:
- Feature is in main navigation or prominent CTA
- Feature name is clear (not jargon)
- Icon or visual makes sense
- Search includes this feature
❌ Fail If:
- Feature buried in settings or submenus
- Only accessible via URL hack
- Name is technical (e.g., "CRUD Operations")
- Users ask "Where is X?" in support
Layer 2: Usability
Question: Can users complete the core workflow without documentation?
Test: Give product to a stranger, ask them to use the feature
✅ Pass If:
- Stranger completes workflow in <2 minutes
- No need to explain buttons or flows
- Error states are helpful
- Success states are obvious
❌ Fail If:
- Stranger asks "what do I click?"
- Must explain what buttons do
- Errors are cryptic
- User unsure if action succeeded
Layer 3: Documentation
Question: Is the feature documented for power users?
✅ Pass If:
- Help docs exist at /docs/features/[feature-name]
- Screenshots or video included
- API documentation (if applicable)
- FAQs for common issues
❌ Fail If:
- No documentation exists
- Docs are placeholder text
- Screenshots show old UI
- API docs are out of date
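The three layers above can be scored per feature so that "done" has a concrete definition rather than a gut feeling. A minimal sketch (layer names follow this section; the feature and field names are illustrative):

```typescript
interface FeatureCheck {
  name: string;
  discoverable: boolean; // Layer 1: findable without help
  usable: boolean;       // Layer 2: stranger completes workflow in <2 minutes
  documented: boolean;   // Layer 3: current, non-placeholder docs exist
}

// A feature is "done" only when all three layers pass.
const isDone = (f: FeatureCheck) => f.discoverable && f.usable && f.documented;

const features: FeatureCheck[] = [
  // Hypothetical example: usable and discoverable, but docs are missing.
  { name: "Team invites", discoverable: true, usable: true, documented: false },
];

console.log(features.filter((f) => !isDone(f)).map((f) => f.name)); // ["Team invites"]
```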
Feature Checklist Template
Use this for each feature:
## Feature: [Name]
### Discoverability
- [ ] Appears in main navigation or homepage
- [ ] Feature name is clear to non-technical users
- [ ] Icon/visual makes sense
- [ ] Searchable in app search
### Usability
- [ ] Core workflow completes in <2 minutes
- [ ] No explanation needed for buttons/UI
- [ ] Error messages are helpful (not just "Error 500")
- [ ] Success states are obvious
### Documentation
- [ ] Help doc exists: `/docs/features/[name]`
- [ ] Screenshots match current UI
- [ ] API docs (if feature has API)
- [ ] FAQs for common issues
### Edge Cases
- [ ] Works on mobile (< 768px width)
- [ ] Works in dark mode
- [ ] Handles empty states gracefully
- [ ] Rate limits / quotas are clear
### Performance
- [ ] Loads in <3 seconds
- [ ] No layout shift (CLS)
- [ ] No console errors

Method 6: Pre-Launch Evaluation
Principle: Before announcing your product, evaluate it like a critic, not a creator.
Demo Meeting Guide
After completing pre-launch evaluation, refer to the Product Demo Meeting Guide section on Pre-Launch Mock Meeting to learn how to conduct a final team simulation launch.
The 24-Hour Fresh Eyes Test
Process:
- Don't touch your product for 24 hours
- Open it as if you're a stranger
- Document every friction point
What to Look For:
- Typos and grammar errors
- Broken images or 404 pages
- Confusing copy or jargon
- Slow loading sections
- Buttons that don't work
- Dead-end pages (no next action)
The "Show It to Mom" Test
Process: Show your product to someone completely non-technical
Tasks:
- "What does this product do?"
- "Can you sign up?"
- "Try creating/doing [core action]"
- "What would you click next?"
Red Flags:
- They can't explain what it does
- They ask "where do I click?"
- They get confused by UI
- They say "is this for developers?"
If mom (or non-technical friend) can't use it, neither can most users.
The Checklist: Pre-Launch Evaluation
Copy this checklist for final evaluation:
Product Pages
- Landing page loads in <2s
- Value prop is crystal clear
- Screenshots show current UI (no placeholders)
- All links work (no 404s)
- Mobile responsive (test on phone)
- Dark mode works (no broken styles)
- No typos or grammar errors
- About page tells a compelling story
- Pricing page has clear tiers
User Flows
- Sign up takes <30 seconds
- Onboarding modal works end-to-end
- First success is visible within 1 minute
- Payment/upgrade flow works
- Team invites work
- Help/support is accessible
Technical
- Health check passes: /api/health
- Sitemap exists: /sitemap.xml
- Robots.txt configured
- OpenGraph images work
- Lighthouse score >90 (all metrics)
- No console errors
- Database migrations are stable
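The technical items above lend themselves to a single pass/fail gate, for example in CI. A sketch that only aggregates results (the thresholds mirror this checklist; how each value is collected, e.g. hitting /api/health or running Lighthouse, is left out):

```typescript
interface TechReport {
  healthOk: boolean;     // GET /api/health returned 200
  sitemapOk: boolean;    // /sitemap.xml is served
  robotsOk: boolean;     // robots.txt is configured
  lighthouse: number;    // lowest Lighthouse category score
  consoleErrors: number; // errors observed during a click-through
}

// Launch-ready only if every item on the technical checklist passes.
function launchReady(r: TechReport): boolean {
  return r.healthOk && r.sitemapOk && r.robotsOk &&
    r.lighthouse > 90 && r.consoleErrors === 0;
}

console.log(launchReady({ healthOk: true, sitemapOk: true, robotsOk: true,
  lighthouse: 95, consoleErrors: 0 })); // true
console.log(launchReady({ healthOk: true, sitemapOk: true, robotsOk: true,
  lighthouse: 88, consoleErrors: 0 })); // false: Lighthouse below 90
```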
Content
- Documentation exists and is searchable
- Privacy policy is present
- Terms of service is present
- Contact information is visible
- Social proof (testimonials/logos) if available
Trust & Security
- SSL certificate valid
- No mixed content warnings
- Payment forms use Stripe/secure provider
- Email sending works (test welcome email)
- Password reset works
- No exposed secrets in frontend
Best Practices
Do ✅
- Test with fresh browser (incognito mode) to simulate new users
- Use real test credit cards (Stripe test mode) to validate payment flow
- Time yourself completing critical flows (under 2 minutes = good)
- Document findings as you go (screenshot + note)
- Test on multiple devices (mobile, tablet, desktop)
- Validate in different browsers (Chrome, Firefox, Safari)
- Check analytics after validation (did events fire correctly?)
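One of the Do items above is timing yourself through critical flows. A minimal sketch of that habit, assuming you wrap each manual or scripted flow in an async function (the `timeFlow` helper and its field names are illustrative, not part of ProductReady):

```typescript
interface FlowResult {
  name: string;
  elapsedMs: number;
  withinBudget: boolean;
}

// Run a flow, measure wall-clock time, and flag it against a budget.
// The guide's rule of thumb: critical flows should finish under 2 minutes.
async function timeFlow(
  name: string,
  budgetMs: number,
  flow: () => Promise<void>
): Promise<FlowResult> {
  const start = Date.now();
  await flow();
  const elapsedMs = Date.now() - start;
  return { name, elapsedMs, withinBudget: elapsedMs <= budgetMs };
}
```

Usage: `timeFlow("sign-up", 120_000, runSignUpFlow)` where `runSignUpFlow` is whatever drives the flow (e.g. a Playwright script, or a stopwatch wrapper around a manual run). Logging each `FlowResult` as you go also satisfies the "document findings as you go" rule.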
Don't ❌
- Don't validate your own work immediately (take a break first)
- Don't skip payment flow testing (most revenue-critical flow)
- Don't assume features are discoverable (test with strangers)
- Don't validate only on desktop (50%+ of traffic is mobile)
- Don't trust memory (write down every issue)
- Don't fix while validating (document first, fix later)
- Don't skip edge cases (empty states, error states, slow networks)
When to Run Evaluations
| Evaluation Method | Frequency | When |
|---|---|---|
| Marketing-First | Before launch, quarterly | Repositioning, major releases |
| Onboarding Flow | Every release | Any change to auth/onboarding/payment |
| Feature Completeness | Per feature | Before marking feature "done" |
| User Journey | Monthly | New features, user feedback |
| Employee Simulation | Before AI Agent product launch, after major updates | Evaluate multi-turn dialogue and context retention capability |
| Pre-Launch | Before launch | Final QA before announcement |
Tools & Shortcuts
Developer Shortcuts
ProductReady includes keyboard shortcuts for testing:
| Action | Mac | Windows/Linux |
|---|---|---|
| Toggle Welcome Wizard | ⌘⇧M | Ctrl+Shift+M |
| Skip Welcome Wizard | ⇧⌃⌥M | Shift+Ctrl+Alt+M |
| Open GM Commands | ⌘⇧G | Ctrl+Shift+G |
| Clear Local Storage | Browser DevTools | Browser DevTools |
GM (Game Master) Commands
For testing payment flows and quota limits:
// Set user credits to 0 (trigger payment gate)
gm.setCredits(0);
// Set user to specific tier
gm.setTier('pro');
// Trigger onboarding reset
gm.resetOnboarding();
// Grant unlimited credits (testing)
gm.setCredits(999999);
Human Ready Evaluation Success Metrics
Your product is human-ready when:
| Metric | Target | Meaning |
|---|---|---|
| Landing Clarity | 5-second rule | Strangers understand value prop instantly |
| Sign-up Speed | <30 seconds | Frictionless authentication |
| First Success | <2 minutes | Users achieve value quickly |
| Onboarding Completion | >70% | Flow is clear and compelling |
| Payment Flow | >90% success rate | Payment process is smooth |
| Mobile Usability | No critical errors | Works perfectly on phones |
| Load Time | <3 seconds (4G) | Fast enough for impatient users |
| Lighthouse Score | >90 all metrics | Technical performance meets standards |
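The quantitative rows of the table above can be collapsed into a single readiness gate. This is a sketch under assumptions: the `Metrics` interface and field names are illustrative, rates are expressed as fractions (0..1), and `lighthouseMin` is the lowest of your Lighthouse metric scores so that ">90 all metrics" holds.

```typescript
interface Metrics {
  signupSeconds: number;           // target: <30s
  firstSuccessSeconds: number;     // target: <2 minutes
  onboardingCompletionRate: number; // target: >0.70
  paymentSuccessRate: number;       // target: >0.90
  loadSeconds4G: number;            // target: <3s on 4G
  lighthouseMin: number;            // target: >90 on every metric
}

// True only when every measurable threshold from the table is met.
// Qualitative rows (landing clarity, mobile usability) still need
// human evaluation and are intentionally not encoded here.
function isHumanReady(m: Metrics): boolean {
  return (
    m.signupSeconds < 30 &&
    m.firstSuccessSeconds < 120 &&
    m.onboardingCompletionRate > 0.7 &&
    m.paymentSuccessRate > 0.9 &&
    m.loadSeconds4G < 3 &&
    m.lighthouseMin > 90
  );
}
```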
Related Documentation
- Product Ready Standards - Technical quality checklist
- User Onboarding Strategy - Detailed onboarding design principles
- Testing - Automated testing approaches
- Quick Start - Getting started with ProductReady
FAQ
Q: How long should full evaluation take?
A: Budget 2-5 hours for comprehensive evaluation:
- Marketing-First: 30 minutes
- Onboarding Flow: 45 minutes
- Feature Completeness: 1 hour
- User Journey: 30 minutes
- Employee Simulation: 1 hour (simulate 5-day workflow)
- Pre-Launch: 30 minutes
Q: Should I validate every small change?
A: No. Run onboarding flow evaluation for any auth/payment/core flow changes. Run full evaluation before major releases or launches.
Q: What if I find issues during evaluation?
A: Document them immediately (screenshot + description). Prioritize:
- Critical: Broken payment, broken sign-up, data loss
- High: Confusing onboarding, feature discoverability issues
- Medium: Copy typos, minor UI glitches
- Low: Nice-to-have improvements
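The four-level triage above is easy to apply mechanically once findings are documented. A minimal sketch, assuming each finding is recorded as a title plus one of the priorities listed (the `triage` helper is illustrative):

```typescript
type Priority = "Critical" | "High" | "Medium" | "Low";

// Fix order from the guide: Critical first, Low last.
const PRIORITY_ORDER: Priority[] = ["Critical", "High", "Medium", "Low"];

interface Finding {
  title: string;
  priority: Priority;
}

// Return findings sorted so the most severe issues surface first,
// without mutating the original list of documented findings.
function triage(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => PRIORITY_ORDER.indexOf(a.priority) - PRIORITY_ORDER.indexOf(b.priority)
  );
}
```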
Q: Can I automate these evaluations?
A: Some parts yes (technical checks), but human evaluation is essential for UX, copy, and "does this make sense?" checks. E2E tests (Playwright) can automate flows, but not comprehension.
Q: What's the difference between evaluation and testing?
A:
- Testing = Does the code work correctly? (automated)
- Evaluation = Does the product work for humans? (manual, empathetic)
Both are necessary. Tests catch bugs; evaluation catches UX issues.
Last Updated: December 2025