W3D3: Testing Course Advisor Bots
Goal
Critically analyze your Course Advisor Bot through peer testing, stakeholder analysis, and red teaming to understand its limitations and broader implications.
Key Insights
Stakeholder analysis reveals hidden impacts: AI systems affect multiple groups in different ways, often beyond the intended users. Tip: Use the “ripple effect” method—start with direct users and work outward to anyone whose work, decisions, or experiences might change.
Technology embeds values through design choices: Every AI system encodes assumptions about what’s important, how people behave, and what constitutes “good” outcomes. Tip: Look for what the system optimizes for, what it makes easy vs. hard, and whose needs are centered in the design.
Systematic testing uncovers both technical flaws and social harms: Red teaming and bias analysis reveal not just bugs, but patterns of failure that could disadvantage certain groups. Tip: Test with edge cases, adversarial inputs, and scenarios representing different user backgrounds.
Setup
- Teams should have their Course Advisor Bot working
Activity
Part A: Class Discussion - Stakeholder Analysis
Whole-class discussion of your Course Advisor Bot experiences:
Stakeholders: Who might be affected by a course advisor bot? Think beyond current students.
Impact analysis: How might an “advisor bot” impact each stakeholder group? What are the potential benefits and harms?
Context: The assignment had you create a very narrow concept of how a course advisor might work. What assumptions, values, and visions of “good” were embedded in your system’s design?
A stakeholder is anyone who affects or is affected by a system. In technology, this includes not just users, but also people whose work changes, who make decisions based on the system’s outputs, or who are impacted by its widespread adoption.
What does this system assume about user knowledge, goals, and context? What behaviors does it encourage or discourage? Whose definition of “helpful” does it embody?
Part B: Peer Testing
Now pair up with another team to test each other’s Course Advisor Bots. Create a shared document to record your testing observations.
Warm-up: share your code with each other. Describe your successes and failures.
For each bot, evaluate across multiple dimensions:
1. Correctness Testing
- Try 3-5 different user inputs representing realistic student queries
- For each response, document:
- What about the response is good or useful?
- What is problematic, inaccurate, or unhelpful?
- Does the bot recommend courses that actually exist?
2. Performance Testing
- Latency: How long between request and first sign of output?
- Completeness: How long until a fully useful response?
- Consistency: Try the same query multiple times - how much do responses vary?
3. Red Teaming
Can you get the bot to:
- Leak its instructions or system prompts?
- Behave inappropriately (talk like a pirate, insult users)?
- Invent courses that don’t exist?
- Give advice that seems unsafe, unethical, or discriminatory?
- Fail to handle edge cases (empty input, very long input, non-English)?
Red teaming comes from military and cybersecurity practices—taking an adversarial perspective to find vulnerabilities. For AI systems, this reveals not just technical failures but potential misuse patterns. The goal isn’t to “break” systems for fun, but to understand failure modes before deployment.
Record: Specific examples of inputs and outputs for each testing dimension
Part C: Systematic Bias Analysis
Systematic bias in AI systems creates predictable patterns of unfairness—certain groups consistently get worse outcomes. Unlike random errors that affect everyone equally, systematic bias compounds over time and can perpetuate or amplify existing inequalities.
Consider these questions about your bot:
- Are there courses that will get systematically under- or over-recommended? Why?
- What types of students might your bot serve well vs. poorly?
- How does your bot handle students who:
- Don’t know what they want?
- Have accessibility needs?
- Are exploring rather than searching for specific topics?
- Have scheduling constraints?
Record: Patterns you notice and potential systematic biases
Debrief Questions
Discuss with your team and be prepared to share:
Stakeholder considerations: Which stakeholder group’s needs were most overlooked in the original course advisor design? How might you address that?
Red-teaming and bias analysis: What did red-teaming and bias analysis reveal about systematic problems in these systems? How might you uncover similar issues in your own project?
Applying the approach: How might you apply this broader analysis approach (stakeholder mapping, systematic testing, bias analysis) to your own project?
- What dimensions of evaluation matter in your project domain?
- What stakeholders need to be considered?
- How might you uncover systematic biases or hidden assumptions in your design?