Human-Centered AI Analysis and Design Project

This semester-long project integrates critical analysis, technical evaluation, and user-centered design around a single AI system. You will examine how existing AI systems serve (or fail to serve) human flourishing, then build and test your own approach.

You’ll choose from a list of project options, or propose your own.

Project Components

You should write a report with the following main sections:

Part 1: Critical Analysis

  • Design Analysis: Examine how your chosen system’s key design decisions impact human flourishing. Consider questions of agency, capability building, relationship to work/learning, privacy, etc.
  • Alternative Envisioning: Identify one significant alternative design choice and analyze its potential benefits and drawbacks for human wellbeing.

You might use a framework to structure your analysis.

I suggest using an AI chatbot to help you do a quick competitive analysis of similar systems.

Part 2: Technical Evaluation

  • Building a Toy Model: Create a simplified but functional version of your system’s core AI functionality, leveraging existing APIs where possible.
  • Systematic Testing: Design and conduct a quantitative evaluation of either the original implementation or your toy model according to some measures that connect to human contexts.
  • Performance Analysis: Document where your approach succeeds, fails, and differs from the original.

Scope Warning

Don’t train your own model here – unless there’s something really lacking about existing models for your task. Instead, use existing APIs (e.g., OpenAI, Google Gemini, etc.) or open-source models (e.g., via Hugging Face, Ollama, etc.) to build a simplified but functional version of your system’s core AI functionality.
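For concreteness, here is a minimal sketch of what that can look like. It assumes the OpenAI Python SDK and an OPENAI_API_KEY in your environment; the model name and system prompt are placeholders to adapt to your system, and a Gemini or Ollama version would have the same shape.

```python
# A minimal toy model: one function wrapping one API call.
# Assumes `pip install openai` and OPENAI_API_KEY set in your environment.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = "You are a concise writing-feedback assistant."  # placeholder

def toy_model(user_input: str) -> str:
    """Send one input through the core AI functionality; return the output."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works here
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_input},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(toy_model("Here is my draft thesis statement: ..."))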

Part 3: User-Centered Design

  • Prototype Development: Build a testable version of some system in the neighborhood of the system you analyzed. (Ideally you’d incorporate your envisioned improvement from Part 1, but this might not be feasible.)
  • User Testing: Have someone try it out. Reflect on the experience. Revise the prototype. Repeat.
  • Iteration Documentation: Track how user feedback shaped your design decisions.
  • Reflection: What did building and testing reveal about your original analysis?

Deliverables

You should use a collaborative writing environment (like Google Docs or Word) and a collaborative coding environment (like GitHub) for your project. Share the documents with teammates and the instructor.

Each week you’ll submit an update on your project progress as a weekly milestone. In-class activities will also often have a component of “Apply this to your project”.

Projects will be presented during the last two class meetings.

You should also make a portfolio version of this work in a form that you might share with future employers or graduate schools.

Strong projects will demonstrate how your initial analysis informed what you chose to build, how building and measuring revealed blind spots in your analysis, and how user feedback challenged or refined your assumptions about both the original system and the design space.

Milestones

  • Milestone 1: Project proposal, including team members, chosen existing system, and a rough outline:
    • What questions might you raise in your analysis?
    • What will be the input and output of your toy model?
    • What situation are you considering for your prototype?
  • Milestone 2: Revised proposal. Here are some ideas of things to deepen:
    • Analysis:
      • What’s a design choice that would distinguish an existing system from your proposed alternative?
      • Who are some stakeholders that you might not have considered?
    • Toy Model:
      • Try a few specific prompts in an API; report what you tried and what the results were like. (What evaluation metrics might you use? A starting sketch appears after this list.)
      • How might your toy model break? Consider vulnerabilities, systematic biases, and hidden assumptions.
    • Prototype:
      • Write a brief storyboard or narrative about how someone would use your system. Be as specific as possible.
      • Or: vibe-prototype a few key interactions.
  • Milestone 3: A draft of your report that has something in every section, even if it’s rough.
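For the toy-model part of Milestone 2, something like this is plenty: run a few prompts through your toy model and record what comes back so you can report on it. This sketch continues in the same file as the toy_model function from Part 2; the test prompts and the looks_ok check are placeholders for measures you would choose yourself.

```python
# Run a few test prompts through the toy model and record the results.
# Continues the Part 2 sketch (uses its toy_model function); the prompts
# and the looks_ok check are placeholders for your own measures.
import csv

TEST_PROMPTS = [
    "Give feedback on this sentence: 'Their going to the store.'",
    "Give feedback on this paragraph: ...",
]

def looks_ok(output: str) -> bool:
    # Placeholder metric: did the model ask questions rather than rewrite?
    # Replace with a measure that connects to your human context.
    return "?" in output

with open("milestone2_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "output", "looks_ok"])
    for prompt in TEST_PROMPTS:
        output = toy_model(prompt)
        writer.writerow([prompt, output, looks_ok(output)])
```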

Assessment Criteria

As a post-script to your project report, include a self-assessment where you:

  • Cite evidence that your project meets the baselines below
  • Identify the strongest aspects of your project, with evidence
  • Identify areas where you could have gone deeper, and how you would do so

We will also discuss your projects in your final-grade meeting.

Baselines

Every project should show evidence of the following:

Depth

Projects should go beyond the baselines in some ways. Here are some ideas:

  • Integration across the parts of the project (e.g., how did your analysis inform your design? how did building and testing reveal blind spots in your analysis?)
  • Analysis:
    • Consideration of a wider range of stakeholders and their needs
    • Real insights gained from systematic analysis using a framework
    • Alternative design choices that are non-obvious and motivated by real-world experience or research
  • Evaluation:
    • Systematic testing protocol that is well-matched to human contexts
    • Honest and insightful analysis of results, including limitations, and what they mean for the system’s impact on human flourishing
  • Design:
    • Technical design decisions that connect with what you’ve learned about how AI systems work
    • Thoughtful incorporation of user feedback into design iterations
    • Non-obvious reflections on how user testing challenged or refined your assumptions

Project Options

Here are some project options that you can choose from, then extend and customize.

Writing Feedback

Can AI help people write, without doing the writing for them?

  • Analyze some existing writing feedback app like Grammarly.
  • Evaluate how well the feedback addresses what a writer actually needs.
  • Prototype a system that … provides a different kind of feedback, offers coaching, helps people reflect on their writing product or process, supports interactions with a writing coach, etc. (A starting sketch appears below.)

(For inspiration, see the Thoughtful App.)
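One hedged starting point for the prototype: put the design constraint (feedback, not rewriting) in a system prompt and see whether it holds. This assumes the OpenAI Python SDK; the prompt wording is an illustrative guess, not a tested design.

```python
# Feedback-without-rewriting: the design constraint lives in the system prompt.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()

COACH_PROMPT = (
    "You are a writing coach. Identify strengths, point out specific "
    "weaknesses, and ask questions that prompt revision. Never rewrite "
    "the writer's sentences for them."
)

def coach(draft: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": COACH_PROMPT},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content
```

Part of your evaluation could then be checking how often that constraint actually holds.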

Translation Tools

Can AI help people communicate across language barriers, without being wrong?

  • Analyze existing translation systems like Google Translate’s Conversation mode or Messages translation on iOS.
  • Evaluate translations within conversational or other continuous contexts (e.g., how does it handle ambiguous references? how does it make decisions when multiple options are possible?). Or, evaluate the instructor’s careful translation workflow or live note-taking translation system.
  • Prototype a system that provides translations with some interventions that reduce the chance of mistranslation.
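One such intervention you might prototype is a round-trip check: translate, translate back, and flag cases where the meaning seems to have drifted. A sketch, assuming the OpenAI Python SDK; the language pair and the yes/no drift judgment are placeholder choices.

```python
# Round-trip translation check: flag outputs whose back-translation drifts.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def translate(text: str, source: str, target: str) -> str:
    return ask(f"Translate from {source} to {target}. "
               f"Reply with only the translation.\n\n{text}")

def round_trip_check(text: str, source: str = "English", target: str = "Spanish") -> dict:
    forward = translate(text, source, target)
    back = translate(forward, target, source)
    verdict = ask("Do these two sentences mean the same thing? Answer yes or no.\n"
                  f"1. {text}\n2. {back}")
    return {
        "translation": forward,
        "back_translation": back,
        "flagged": verdict.strip().lower().startswith("no"),
    }
```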

Event Summarization

Can AI help with synchronous events?

  • Analyze some meeting summary app.
  • Evaluate how useful the meeting summary is for various purposes, e.g., catching up for someone who was late or had to step out for a few minutes.
  • Prototype a system that, e.g., offers text suggestions for outline points. (A starting sketch appears below.)

(For inspiration, see my Live Outline project.)
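A first prototype of outline suggestions can be as simple as re-prompting with the transcript so far and the outline so far. This sketch assumes the OpenAI Python SDK; in a real system the transcript chunks would come from live captions or a speech-to-text service.

```python
# Suggest outline points from a running transcript.
# Assumes the OpenAI Python SDK; in a real prototype the transcript
# chunks would come from live captions or a speech-to-text service.
from openai import OpenAI

client = OpenAI()

def suggest_outline_points(transcript_so_far: str, outline_so_far: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": (
                "Here is a meeting transcript so far:\n"
                f"{transcript_so_far}\n\n"
                "Here is the outline so far:\n"
                f"{outline_so_far}\n\n"
                "Suggest 1-3 new outline points, or reply 'none' if nothing new."
            ),
        }],
    )
    return response.choices[0].message.content
```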

Pedagogy

Can AI help students reflect on what they’ve learned? Or make a course more accessible or understandable?

  • Analyze some AI pedagogy app or system (e.g., ChatGPT’s Study Mode).
  • Evaluate how well an AI does, e.g., with reference to some things we know from learning sciences, or by evaluating Feedback Bot conversations.
  • Prototype a system for tutoring, reflection, feedback, etc.
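A reflection-focused prototype might start as a prompt that asks rather than tells. Another sketch assuming the OpenAI Python SDK; the loop keeps the conversation history so the bot can follow up.

```python
# A reflection bot: asks one short question at a time instead of lecturing.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()

REFLECT_PROMPT = (
    "You help students reflect on what they learned this week. Ask one "
    "short follow-up question at a time. Do not explain the material yourself."
)

def run_reflection_chat() -> None:
    messages = [{"role": "system", "content": REFLECT_PROMPT}]
    while True:
        user_turn = input("You: ")
        if user_turn.strip().lower() == "quit":
            break
        messages.append({"role": "user", "content": user_turn})
        response = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages  # placeholder model name
        )
        reply = response.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        print("Bot:", reply)
```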

Imitating AI in the Wild

(We may or may not actually do this one.) Imitate some AI work that we see in the world:

  • What specific LLM API calls might some service be making?
  • Make an oversimplified proof-of-concept using a vibe-code approach; where does it fail?
  • What evals are they using to check whether their results are good, the kind they might run before deciding to switch to a new model?
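For that last bullet, here is a sketch of the kind of harness a team might run before switching: the same prompts against two models, outputs side by side. The model names and prompts are placeholders; it assumes the OpenAI Python SDK.

```python
# Before switching models: run the same prompts against two models and
# compare the outputs side by side. Model names and prompts are placeholders.
from openai import OpenAI

client = OpenAI()

PROMPTS = ["Summarize this meeting note: ...", "Translate to Spanish: ..."]
MODELS = ["gpt-4o-mini", "gpt-4o"]  # current model vs. candidate replacement

def run(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for prompt in PROMPTS:
    print("PROMPT:", prompt)
    for model in MODELS:
        print(f"--- {model} ---")
        print(run(model, prompt))
```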