W2D2: Course Advisor Bot
This exercise focuses on prompting and structured-output techniques, with the goal of making a useful and reliable system out of an LLM.
You may have done this in CS 375/376. If so, take this opportunity to put your approach to the test.
The fancy (resume/buzzword) name for what we’re going to do here is Agentic RAG. But we’re going to own our control flow rather than letting the LLM fully drive the interaction. We’re also going to be practicing engineering techniques to make the system reliable and measure its performance.
Task: Make a course advisor bot
We’ll try to create a chatbot that can help students choose courses according to their interests and goals, using retrieval-augmented generation (RAG) techniques to query the course catalog.
You may choose to do this as a Streamlit app or a Jupyter / Colab notebook.
Here’s how I approached it:
First, I defined a set of “tools” that the bot can use: for example, a tool that can query the course catalog and a tool that can recommend a set of courses. (Note that tools are just structured outputs, so we don’t need a model that has been specifically trained to “use tools”.)
Then, I wrote a function that basically did the following:
def get_courses_matching_interests(interests):
    messages = [{
        "role": "system",
        "content": "",  # System message describing the goal, the tools available, and guidance for the conversation.
    }]
    messages.append({
        "role": "user",
        "content": interests,
    })
    # Get search queries from the LLM
    search_query_tool = do_llm_call()  # with Search Query output format required
    messages.append({
        "role": "assistant",
        "content": search_query_tool.model_dump_json(),
    })
    # Search for courses matching the queries.
    courses = search_courses(search_query_tool.queries)
    messages.append({
        "role": "user",
        "content": format_courses(courses),
    })
    if len(courses) == 0:
        # Repeat the previous request, so the model can try a different search.
        ...
    # Get recommendations from the LLM
    recommendations = do_llm_call()  # with the Recommendations output format required
    return recommendations
We’ll walk through the LLM calls and the course search process below.
Part 1: Structured Output from an LLM
When we’re making a larger system out of component modules, it’s critical that each module have a well-defined interface. Fortunately, we can constrain the LLM to generate responses of a desired format.
Start by getting an OpenAI-compatible LLM endpoint. Here are a few options:
- Use the OpenAI API (e.g., gpt-4o), but that will cost money. So…
- Use a free Google Gemini API key (but use it through the OpenAI-compatible API, following these instructions). Or:
- Run locally, using Ollama.
I’d recommend you start with Gemini (using its OpenAI-compatible API) to get things running, then, once you’ve got the basics working, move to Ollama, because (1) it’s actually running on your computer, and (2) you’ll be constrained to smaller LLMs, so prompt engineering will make a bigger difference. To do this:
- Install ollama.
- Start the server: ollama serve.
- Pull the model: ollama pull gemma3:1b-it-qat. If you have a lot of memory, or a good GPU, you can try gpt-oss:20b (to use OpenAI’s GPT-OSS) or gemma3:4b-it-qat.
If you do this, you can use:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
model = "gemma3:1b-it-qat"
Test that your model is working:

completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(completion.choices[0].message.content)
Structured Output
A common library for working with structured data in Python is Pydantic. It allows you to define a data model (not to be confused with an AI model) and then validate that the data you get from any source (an API result, an LLM call, etc.) matches that model.
Here’s an example Pydantic model for a search query:
from typing import Literal
from pydantic import BaseModel
class SearchTool(BaseModel):
    tool_name: Literal["search_course_catalog"] = "search_course_catalog"
    thinking: str
    queries: list[str]

example_search = SearchTool(
    thinking="The user wants to know some trivia.",
    queries=[
        "What is the capital of France?",
        "What is the largest mammal?",
    ],
)
print(example_search.thinking)
print('; '.join(example_search.queries))
And here’s how we might use it in an OpenAI-compatible LLM call:
completion = client.chat.completions.parse(
    model=model,
    messages=[
        {"role": "system", "content": f"""Write 10 search queries."""},
        {"role": "user", "content": "I'm looking for courses related to AI."},
    ],
    response_format=SearchTool,
    temperature=0.5,
)
search_query_tool = completion.choices[0].message.parsed
search_query_tool
Observe that the response_format parameter is set to SearchTool, which means the LLM will be forced to output JSON that matches the SearchTool schema.
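The parsed attribute gives you a real SearchTool instance, so you can work with it like any other Python object. As a side note, Pydantic can also validate raw JSON text directly, which is handy if you ever call an endpoint without the parse helper; here’s a small sketch (the raw_json string below is just an illustration):

# Work with the validated object directly.
print(search_query_tool.thinking)
print('; '.join(search_query_tool.queries))

# Pydantic can also validate raw JSON text, e.g., from a plain completions call.
raw_json = '{"tool_name": "search_course_catalog", "thinking": "demo", "queries": ["machine learning"]}'
validated = SearchTool.model_validate_json(raw_json)
print(validated.queries)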
Try making the following changes to the system prompt and see how they affect the output:
Add an example
For in-context learning, it can sometimes be helpful to provide examples of the kind of output you expect. But it can also lead the model to get fixated on your specific examples. Try it out by adding something like this to the system prompt:
Example:
Student interest: "art"
Queries: ["art", "photography", "visual rhetoric", "painting", "sculpture", "art history", "graphic design", "digital media", "art theory", "contemporary art"]
How useful was adding this example?
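If you’re not sure where the example should go, one option is to append it to the system message f-string. This is just a sketch; the wording around it is up to you:

system_prompt = f"""Write 10 search queries.

Example:
Student interest: "art"
Queries: ["art", "photography", "visual rhetoric", "painting", "sculpture", "art history", "graphic design", "digital media", "art theory", "contemporary art"]"""

Then pass system_prompt as the content of the system message in the parse call above.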
Add additional instructions
You might try adding a “notes” section to the system prompt to give the model additional guidance. For example, you could say:
Notes:
- Before responding, write a short thought about what kinds of courses might be relevant to the user's interest.
- Assume that queries will be run against a specific course catalog, so avoid general terms like "course" or "department".
- Ensure that each query would match the title or description of one or more specific courses in an undergraduate program.
Did these notes help the model produce better output? How would you measure that?
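One crude way to answer that (a sketch, not an official rubric): generate queries for a handful of test interests under each prompt variant and count how many of the queries match at least one course in the catalog, using the search_courses function you’ll build in Part 2. The test_interests list here is just an illustration.

test_interests = ["artificial intelligence", "art", "business", "creative writing"]

def query_hit_rate(system_prompt: str) -> float:
    """Fraction of generated queries that match at least one course in the catalog."""
    hits = total = 0
    for interest in test_interests:
        completion = client.chat.completions.parse(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": interest},
            ],
            response_format=SearchTool,
        )
        queries = completion.choices[0].message.parsed.queries
        total += len(queries)
        hits += sum(1 for q in queries if search_courses(q))
    return hits / total

A higher hit rate isn’t the whole story (the matched courses still need to be relevant), but it’s a quick way to compare prompt variants.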
Add the output schema
You might add (within the f-string):
The output should be JSON with the following schema: {json.dumps(SearchTool.model_json_schema())}
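Note that this uses json.dumps, so you’ll need import json. If you’re curious what schema the model will actually be shown, you can print it (a quick inspection, nothing model-specific):

import json

# Inspect the JSON Schema that Pydantic derives from the SearchTool model.
print(json.dumps(SearchTool.model_json_schema(), indent=2))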
Overall, which of these changes was most helpful for getting the model to produce useful output? Are there any other changes you could make? Refer to our course readings on prompt engineering for more ideas.
Part 2: Search the Course Catalog
Now we need to find courses that match those queries.
To keep it fast and simple, we’ll use a local mirror of the course catalog.
- I found a list of course sections that will be offered in FA25 by inspecting what network requests were made by the Calvin Course Offerings tool.
- To avoid hitting their site too hard, I downloaded that JSON file and mirrored it here.
Here’s how to load that file and search it:
import requests
# sections_json_url: the URL of the mirrored sections JSON file linked above.
sections_json = requests.get(sections_json_url)
sections_json.raise_for_status()
sections = sections_json.json()

example_section = next(section for section in sections if section['SectionName'].startswith('CS 108'))
print(example_section)
The listing is by section, so it’ll be helpful to organize by course instead:
course_descriptions = {
    section['SectionName'].split('-', 1)[0].strip(): (section["SectionTitle"], section["CourseDescription"])
    for section in sections
    if "CourseDescription" in section
    and section.get('AcademicLevel') == 'Undergraduate'
    and section.get('Campus') == 'Grand Rapids Campus'
}
print("Found", len(course_descriptions), "courses")
print(course_descriptions["CS 108"])
Here’s a function to find courses matching a query:
def search_courses(query: str):
    """
    Search for courses that match the query.
    """
    query = query.lower()
    matches = []
    for course, (title, description) in course_descriptions.items():
        if query in title.lower() or query in description.lower():
            matches.append((course, title, description))
    return matches

search_courses("programming")
If you have multiple queries, you might want to combine the results:
def find_courses_matching_queries(queries: list[str]):
    """
    Find courses that match any of the queries.
    """
    return set(
        course
        for query in queries
        for course in search_courses(query)
    )

find_courses_matching_queries(["programming", "AI"])
Part 3: Recommendations
Here’s a possible recommendation output format (it has some issues that you might want to fix later):
class CourseRecommendation(BaseModel):
    course_code: str
    course_title: str
    course_description: str
    reasoning: str

class RecommendTool(BaseModel):
    tool_name: Literal["recommend_course"] = "recommend_course"
    thinking: str
    recommendations: list[CourseRecommendation]
Now put it together to make a course advisor bot! First try running these steps “by hand” to see how they work. Then put it all together in a function; you can follow the rough outline given in the code snippet at the top of this page.
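For reference, here’s one way the pieces might fit together. Treat it as a sketch, not the definitive implementation: it assumes the client, model, SearchTool, RecommendTool, find_courses_matching_queries, and format_courses defined above, plus a SYSTEM_PROMPT string that you write yourself, and it leaves out the retry-on-empty-search step from the outline.

def get_courses_matching_interests(interests: str):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},  # your system prompt (assumed defined)
        {"role": "user", "content": interests},
    ]

    # Step 1: ask the LLM for search queries, constrained to the SearchTool schema.
    completion = client.chat.completions.parse(
        model=model, messages=messages, response_format=SearchTool)
    search_query_tool = completion.choices[0].message.parsed
    messages.append({"role": "assistant",
                     "content": search_query_tool.model_dump_json()})

    # Step 2: run the queries against the local catalog and show the results to the model.
    courses = find_courses_matching_queries(search_query_tool.queries)
    messages.append({"role": "user", "content": format_courses(courses)})

    # Step 3: ask the LLM for recommendations, constrained to the RecommendTool schema.
    completion = client.chat.completions.parse(
        model=model, messages=messages, response_format=RecommendTool)
    return completion.choices[0].message.parsed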
Part 4: Testing
Test your bot with a few different student interests. Measure at least the following:
- How often does it fail to respond, or give poor results?
  - Can you fix this, e.g., by changing the system prompt?
- Latency of the system, broken down by part. (Which part is slow? Does that make sense? See the timing sketch after this list.)
  - Can you improve this, e.g., by changing the input or output formats?
- Are the search queries relevant to the student’s interest?
- Are the courses returned relevant to the search queries?
- Are the recommendations relevant to the interest?
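For the latency breakdown, a low-tech timer around each step is usually enough. Here’s a sketch; the timed helper and the label strings are just illustrations:

import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Example: time the catalog search step on its own.
courses = timed("catalog search", find_courses_matching_queries, ["programming", "AI"])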
You’ll have to think about how to measure this. You might want to ask a few friends to try it out and give you feedback.
Think about how you could improve the system.