How to design a large language model (with anti-waste app examples)

I recently interviewed for a contract position focused on designing the user experience for a new AI language model product.

Large Language Models (LLMs) are machine learning models designed to understand and generate human language by analyzing extensive text datasets. Popular LLMs include ChatGPT, Claude, and Gemini.

According to a recent Adobe study, traffic to U.S. retail websites from generative AI sources in February 2025 was up 1,200% compared to July 2024. Survey respondents said they use LLMs for research, getting gift ideas, and creating shopping lists.

Before the interview, besides sharing my work materials, I was also asked to answer a question.

AI Policy: All content on this website is written by me. I do not use AI such as ChatGPT or other LLMs to generate articles from prompts or similar. All content reflects my own thinking, ideas, style, and craft. Occasionally, I ask AI (such as Frase or Formalizer) to summarize or re-state my own ideas on the basis of a complete skeleton I’ve written. Based on the response, I may reorder, restructure, or alter my original thinking. I personally write each draft and final copy.


What LLM products do you use, and what specific UX improvements would you make to them?

goblin.tools is a collection of small, simple, single-task tools, mostly designed to help neurodivergent people with tasks they find overwhelming or difficult. Most tools use AI technologies in the back-end to achieve their goals.

My answer:

I use goblin.tools, especially the Formalizer, which turns chaotic thoughts into classy ones, or vice versa. Formalizer can change your text in 17 different ways.

You can make it more formal, less formal, more technical, more accessible, easier to read, less emotional, etc. You can also turn the spiciness level up or down, deciding how strongly you want the text to come across.

The following are the UX improvements I’d potentially propose for Formalizer, my favorite LLM product.

Test user understanding of 17 text options – Are the distinctions clear? Are tones of sarcasm, anger, and snark interpreted consistently among different people?
Review analytics – Which options get used the most? Which options get used the least? Which options require multiple conversions to make the text usable?
Consolidate redundant and unused options – For example, more accessible and easier to read mean the same thing to me.
Add examples – Provide 17 examples demonstrating how the same sentence changes depending on the selected option.
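To make the analytics review more concrete, here is a minimal sketch of how those three questions could be answered from a usage log. The event format (session ID plus selected option) and the sample data are assumptions for illustration; this is not real Formalizer data, and I don't know what goblin.tools actually logs.

```python
from collections import Counter

# Hypothetical event log: (session_id, option_selected). Invented sample data.
events = [
    ("s1", "more formal"),
    ("s1", "more formal"),
    ("s1", "more formal"),
    ("s2", "easier to read"),
    ("s3", "more accessible"),
    ("s3", "more accessible"),
    ("s4", "more formal"),
]


def usage_by_option(log):
    """Count how many conversions each option received."""
    return Counter(option for _, option in log)


def repeat_rate_by_option(log):
    """Share of sessions per option that needed more than one conversion."""
    per_session = Counter(log)  # (session, option) -> number of conversions
    sessions = Counter()
    repeats = Counter()
    for (_, option), conversions in per_session.items():
        sessions[option] += 1
        if conversions > 1:
            repeats[option] += 1
    return {option: repeats[option] / sessions[option] for option in sessions}


usage = usage_by_option(events)
repeat_rate = repeat_rate_by_option(events)

print(usage.most_common())  # most used options first, least used last
print(repeat_rate)          # options with a high rate may need clearer labels
```

A high repeat rate is only a hint that an option is unclear; it would still need to be checked against the user testing described in the first point.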

Key components and processes involved in designing a language model

After the client reached out to set up a meeting, I created a Miro board to illustrate my approach to shaping user experience for AI language models.

Clients generally respond positively to visual information, even when it’s not pixel-perfect, and I’m also sharing my approach in more detail here to provide insight into my thinking process. Presentation goals included:

highlighting previous relevant experience
showcasing my usual process and approach
suggesting deliverables
discussing knowledge gaps
figuring out if we were aligned on AI’s powers and limitations
demonstrating my enthusiasm

Miro board illustrating my approach to shaping UX for AI language models and previous related work

Research with development & product – Key components and processes involved in designing a language model

My work typically begins with conducting research involving product users and relevant stakeholders (developers, product managers, marketers). The research for this project would have 2 main goals.

Answer why this matters to users

The first goal of the initial research would be to answer why this matters to users. This seems obvious, but it’s criminally overlooked in discussions surrounding hyped topics like AI, even in business settings.

One of the most worthwhile investments I make monthly is my subscription to Platformer for $10. Casey Newton’s reporting of the day’s biggest events at the intersection of technology and democracy is honest, analytical, and a great way to stay up-to-date with the latest joys and horrors of my field without needing to research them actively.

Casey put it beautifully: Everyone has a model. Almost no one has a business.


Figuring out why—more realistically, if—an AI tool matters to users is essential for developing a successful AI model and business strategy.

This project was for a startup with no paying customers yet, so we would have to put ourselves in our users’ shoes (and then talk to potential customers, of course) to build something that improves or simplifies their lives.

When confronting a new system, the potential user will have these unspoken questions:

Who are you?

What can you do for me?

Why should I care?

How should I feel about you?

Why should I trust you?

What do you want me to do next?

If you haven’t answered these questions explicitly in the design process, the system won’t provide clear, meaningful answers to your user. Many digital systems are quite complex, but if you can’t distill what you offer into a single introductory sentence you’re putting the work of understanding it onto your potential customer.
Conversational Design, Erika Hall

Frame values in human terms

The second goal of this initial research would be to frame values in human terms. This matters when designing a language model because the exchange of information with an LLM happens in a conversational user interface (UI), and users expect a conversational tone when interacting with it.

In a perfect world, everyone on the team sees things the same way and can clearly express their vision for the product, but this doesn’t happen often. As a consultant, I’d set up some time with stakeholders to discuss and align on values. What do you care about? How do you want to be described? When will you be successful?

Excerpt from Erika Hall’s book, Conversational Design

Mapping user journeys – Key components and processes involved in designing a language model

My research includes user research, stakeholder interviews, and content research (content inventories, analytics reviews, content audits), but content research wasn’t needed in this project (no content yet).

After research, my next step is usually to map user flows. In this step, I try to recreate—generally in a workflow diagram—how users interact with the product and key moments in their journey. For Booking.com, those key moments might include opening an account, adding billing details, and booking a stay successfully.

The premise of the work is the same as always, but for LLM design I love leaning on PAIR’s guidebooks. People + AI Research (PAIR) is a team at Google that provides practical guidance for designing human-centered AI products.

Based on PAIR interaction design policies

Identify Critical User Journeys + critical moments within the product experience when users interact with an AI system

The example above features Plannerific, an event planning app that’s one of the hypothetical product examples in Google’s People + AI Guidebook. Drafting an invite is likely a Critical User Journey (CUJ); designing an invitation is another.

Too Good To Go is a Copenhagen-founded company dedicated to fighting food waste. Their app is the world’s largest marketplace for surplus food. Users get to enjoy food at ½ price or less and help the environment by reducing food waste. You can get a varied mix of food from grocery stores or restaurants, or you can end up with 12 tomatoes and 10 cucumbers. It’s like a lottery, which is what makes it fun.

Cook To Go, a hypothetical Too Good To Go AI product example, is a personal chef/meal-planning app that gives you meal ideas based on what’s in your fridge, what’s generally available in your area, and what you like to eat.

Cook To Go strives to ensure that all GenAI features are inclusive across broad dimensions of identity. This means acknowledging that various types of meals are prepared across cultures, and dietary restrictions due to religious, cultural, personal, and lifestyle considerations are relevant.

Cook To Go Critical User Journey	Cook To Go Critical Moment
As a beginner cook, I want a weekly meal plan that is easy to follow and offers variety.	– Create meal prep plan from user’s input – Offer a way for users to input ingredients they already have – Provide different levels of detail for recipes
As a vegan with an almond allergy living in Switzerland, I want to eat good food that’s available near me without spending much.	– Draft a map of potential Too Good To Go stores and restaurants from user’s input
As a party host, I want to create a dinner menu for my guests, some of whom are lactose intolerant, allergic to almonds, or onion haters.	– Create grocery shopping list – Offer a way for users to include or exclude specific ingredients

Align on interaction design policies: Acceptable actions, Unacceptable actions, Levels of uncertainty, Vulnerabilities

IxD policies are made up of 4 key parts centered around a critical moment, which PAIR describes as:

Acceptable actions – Given an input, what are the types of actions, use cases, or tasks that the AI system should be capable of performing? These are the range of tasks that will help people use AI or GenAI to meet their objectives.
Unacceptable actions – Given a user input, what are the types of actions, use cases, or tasks that the AI system should not perform? These are the range of tasks that a user might unintentionally ask the model to perform.
Levels of uncertainty – What are the thresholds of uncertainty or confidence, above which a prediction can be surfaced? These are the weak predictions that can reduce performance or slow down a user, but not actively harm them.
Vulnerabilities – What are the different types of errors that the model can produce? What kinds of risks are users unwittingly vulnerable to? These are outputs that cause the system to fail, or outcomes that need to be mitigated altogether.
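The "levels of uncertainty" part is the easiest one to picture in code. The sketch below assumes the model exposes a confidence score between 0 and 1 and that the team has agreed on a threshold; the 0.7 default and the names `next_step`, `show_prediction`, and `ask_clarifying_question` are hypothetical, not something PAIR prescribes.

```python
def next_step(confidence, threshold=0.7):
    """Decide whether to surface a prediction or ask the user for more detail.

    `confidence` is assumed to be a score between 0 and 1; the 0.7 default
    threshold is an arbitrary placeholder a team would tune together.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return "show_prediction"
    # Weak prediction: slow the user down a little rather than show a bad result.
    return "ask_clarifying_question"


print(next_step(0.9))
print(next_step(0.4))
```

Where the threshold sits is a product decision as much as an ML one: set it too high and users get asked too many questions, too low and they see weak suggestions.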

A critical moment example: Users interact with a generative AI system in Cook To Go to create weekly menus.

Let’s look at what interaction design policies might look like for this critical moment in a user’s journey.

Acceptable actions	Unacceptable actions	Levels of uncertainty	Vulnerabilities
We want people to successfully use GenAI for/to…	We don’t want people to intentionally or unintentionally use GenAI for/to…	When GenAI predictions are weak, people won’t mind being asked to…	An incorrect or wrong GenAI prediction can harm individuals and groups when/if/by…
…Produce a nutritional menu that accommodates various religious, personal, and lifestyle restrictions, while using as much as possible seasonal, local ingredients that are prone to waste.	…Develop a nutrient deficiency or eating disorder. …Produce a menu that may perpetuate a stereotype or generate inappropriate messages about cultures or religions.	…Specify further the type of food and drinks they like/dislike or already have at home as long as they don’t need to rewrite the entire menu.	…Insensitive remarks around food consumption can upset cooks. …Cook To Go recipes often “forget” key ingredients that cooks discover only halfway through meal prep and can’t purchase immediately. …The stated availability of ingredients in Too Good To Go packages is wrong more often than not.
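One way to keep a policy like the one in the table above reviewable by both designers and engineers is to store it as structured data. This is a rough sketch under my own assumptions: the `InteractionPolicy` class, its field names, and the `missing_parts` helper are hypothetical, and the entries are shortened from the table.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionPolicy:
    """The four IxD policy parts for one critical moment (hypothetical structure)."""
    critical_moment: str
    acceptable: list = field(default_factory=list)
    unacceptable: list = field(default_factory=list)
    uncertainty: list = field(default_factory=list)
    vulnerabilities: list = field(default_factory=list)

    def missing_parts(self):
        """Names of the policy parts that have no entries yet."""
        parts = {
            "acceptable": self.acceptable,
            "unacceptable": self.unacceptable,
            "uncertainty": self.uncertainty,
            "vulnerabilities": self.vulnerabilities,
        }
        return [name for name, items in parts.items() if not items]


# Entries shortened from the Cook To Go table.
weekly_menu = InteractionPolicy(
    critical_moment="Create weekly menus with GenAI in Cook To Go",
    acceptable=["Produce a nutritional menu that accommodates restrictions"],
    unacceptable=["Produce a menu that perpetuates a stereotype"],
    uncertainty=["Ask users to specify food they like, dislike, or already have"],
    vulnerabilities=["Recipes that forget key ingredients"],
)

print(weekly_menu.missing_parts())
```

A check like `missing_parts` could flag critical moments where the team hasn't yet agreed on one of the four parts.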

Define User Experience (UX) + Machine Learning (ML) requirements

Mapping out critical moments for each CUJ and aligning on their IxD policies helps the team identify potential risks early and avoid policy violations.

“Many errors and risks can be identified by auditing the product journey with potential users or third-party experts, assessing violations of an organization’s values, and from local and regional legal and compliance requirements. This allows time to plan for necessary UX interventions, (for example, instituting a no-show policy) and ML requirements (for example, conducting adversarial tests, adding safety classifiers).”
Interaction Design Policies: Design for the opportunity, not just the task, PAIR

What might UX & ML requirements look like for the Cook To Go example above?

GenAI sometimes “hallucinates,” meaning it generates information that sounds plausible but is wrong. In the Cook To Go example, this could look like incorrect measurements, ingredients, or cooking times. Focusing on the last IxD component, Vulnerabilities:

the UX team could design a flow to ask the user to try again if all model outcomes are bad (unsafe, wrong, offensive)
the ML team could introduce safety classifiers to filter out unsafe content
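The two interventions above could fit together roughly as follows. This is a sketch, not a real implementation: `is_safe` is a keyword-matching stand-in for an actual safety classifier, and the blocked terms, function names, and retry message are all made up for illustration.

```python
# Placeholder terms; a real safety classifier would be a trained model.
BLOCKED_TERMS = {"undercooked chicken", "bleach"}


def is_safe(text):
    """Stand-in for a safety classifier: flags text containing blocked terms."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def respond(candidates):
    """Show the first safe candidate, or ask the user to try again if none is safe."""
    safe = [candidate for candidate in candidates if is_safe(candidate)]
    if safe:
        return {"action": "show", "recipe": safe[0]}
    return {
        "action": "ask_retry",
        "message": "We couldn't find a good recipe this time. Want to try again?",
    }


print(respond(["Salad with undercooked chicken", "Tomato and cucumber salad"]))
print(respond(["Clean the tomatoes with bleach"]))
```

The ML team owns what `is_safe` does; the UX team owns what happens, and what the message says, when nothing passes it.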

Feedback can be collected by a thumbs up or thumbs down button. The ML team can use this feedback to improve model accuracy.

The PAIR team “used analytical rigor in deciding which feedback signals to capture, eventually determining the number of chips used (with a limit of five), what the chips should say, and whether or not to include a rewrite option.” This is exactly the type of work a content designer does, AI or no AI.

Rating an AI output not helpful surfaces a rich feedback card (via Building the plane while flying it: an LLM case study)
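As a sketch of what capturing this feedback might involve, the code below models a thumbs up or down rating plus a feedback card with at most five chips, the limit mentioned in the PAIR quote. The chip labels, function names, and record shape are hypothetical; the case study does not publish its implementation.

```python
MAX_CHIPS = 5  # limit mentioned in the PAIR case study


def build_feedback_card(chip_labels, allow_rewrite=True):
    """Build the card shown after a 'not helpful' rating (hypothetical shape)."""
    if len(chip_labels) > MAX_CHIPS:
        raise ValueError(f"a feedback card can show at most {MAX_CHIPS} chips")
    return {"chips": list(chip_labels), "allow_rewrite": allow_rewrite}


def record_feedback(output_id, helpful, card, selected=()):
    """Return a feedback record the ML team could aggregate later."""
    if helpful:
        return {"output_id": output_id, "helpful": True, "reasons": []}
    unknown = [label for label in selected if label not in card["chips"]]
    if unknown:
        raise ValueError(f"unknown chips: {unknown}")
    return {"output_id": output_id, "helpful": False, "reasons": list(selected)}


# Invented chip labels for the Cook To Go example.
card = build_feedback_card(["Missing ingredient", "Wrong measurements", "Not to my taste"])
print(record_feedback("menu-1", True, card))
print(record_feedback("menu-2", False, card, selected=["Missing ingredient"]))
```

Deciding what the chips say is the content design work; the code only enforces the limits once they've been agreed on.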

Wireframes – Key components and processes involved in designing a language model

I’ll begin sketching the interface only after conducting research, identifying Critical User Journeys, aligning on IxD policies, and defining UX + ML requirements.

In the interview, I shared some previous big ideas expressed in workflow diagrams and low-fidelity wireframes.

Only when you clearly understand the big idea and its significance should you transition to screens (and then testing), even though many designers would disagree with this approach.

UX design for LLM interfaces: Back to basics

This is my approach to designing the user experience of an AI language model. When I asked Formalizer to break down the task of writing this article into subtasks, it recommended doing the following:

Explain the foundational theories and technologies behind language modeling in simple terms.
Discuss the various data sources that can be used for training language models.
Describe the process of choosing algorithms and architectures suitable for language models.
Incorporate best practices for evaluating the performance of language models.

I was honest with the client that collaboration with engineers would be necessary to complete ML requirements.

I’ll keep being honest here: I lack a clear understanding of the foundational theories and technologies behind language modeling required to be able to explain them simply. The process of choosing algorithms and architectures suitable for language models? A mystery.

“It’s exciting stuff working with interactive, interconnected design. Every year brings something that used to be the stuff of science fiction. I’m certain that in the near future, we’ll be able to control computers with our minds. Telepathy and telekinesis will become practical realities. And even these abilities won’t change anything fundamental about human nature.”
Conversational Design, Erika Hall

There is a lot I don’t know about this space. The folks at PAIR said it best: we’re building the plane while flying it. But I believe that regardless of what the next hot trend may be, whether it pertains to my work (like AI or LLMs) or is unrelated (like cryptocurrencies or NFTs), it won’t change anything fundamental about human nature. It won’t change anything fundamental about UX, business strategy, clear language, or work relationships. With that in mind, this is how I would design a language model.

For assistance with conversational UIs, tone-tuning (making this A Thing), or general support in refining the user experience of your responsible large language model, please reach out at [email protected].
