Building a Better AI for Group Chats: Part 2

For the past few months, my side project has been building an interactive AI designed for group chats. I’m interested in figuring out how the AI/UX for this use case should behave, and exploring how the nature of a group chat evolves as people grow accustomed to hanging out with an AI. Since February I’ve been testing it in a group of ~10 of my friends from college.

In Part 1, I covered the motivation up through v1, which was designed to mimic one of the group members. In this post, I’ll cover v2 which shifted into a more interactive AI.

In the previous post I broke down three principles of group chats, which I could build upon to create an engaging AI for this use case.

Group chats are always “on”: In the best group chats, there’s enough people and enough activity that there’s always something going on
Group chats have open conversations: Even pairwise conversations happen out loud, so everyone else can amuse themselves through reading along; or contribute with a lightweight interaction; or jump in
Group chats tap into a shared history: Most social circles come with people who have their own personality — a history of inside jokes, common memories, a sense of connection, etc.

v1 experimented heavily with the third principle of shared history, by building an AI that literally replicates one of the people in the channel in order and acts as a kind of digital sidekick. This translated to taking that person’s chat history as training data, and fine-tuning a model and embeddings on that history, in order to mimic that person’s style of speech and opinions on known topics. It was engaging initially, but had diminishing returns.

In mulling v2, a different approach would be to focus on the second principle of open conversations, by taking more of an interactive chatbot-style implementation, on top of a model like ChatGPT (GPT-3.5).

These types of models are limited in how much input data they can actually take in, as everything needs to be passed in at run-time through a capped-length prompt. Rather than passing years of history, I’d be limited to whatever I can fit into a few thousand characters. This would make it harder to specifically mimic somebody.

But it would also opens up a lot more opportunity for multi-turn conversations and clearer tone of voice, which could spur more interactions.

While I was debating this in March, OpenAI released GPT-4.

And GPT-4 is… awesome. It’s head and shoulders above even ChatGPT at being able to maintain a coherent tone and personality, yet remain malleable. So I pivoted and rebuilt D4ve on top of GPT-4.

This also meant shifting the product from a bot intended to mimic someone’s actual opinions, into a more of group member designed for interactivity with just traces of that specific person’s style implemented in the prompt – from “Mimic” to “Chatty”.

*v2 (“Chatty D4ve”) was an interactive bot based on GPT-4. This had better initial results than v1, but qualitative projects soon emerged that led me to cut off v2 and move right to v3*.

The shift from a personality-driven mimic to open-ended interactivity led to a big new burst of engagement. And while it still had a novelty effect, the peak was higher and more sustained than v1.

That said — while engagement was higher, there was still a drop off. And it didn’t seem like it was going to really improve. Because within a week, my friends were getting frustrated at how anodyne the generic GPT-4 responses were, which ultimately led to dead-end conversations.

However, studying the usage patterns did give me an idea.

As I observed the chat logs for D4ve v2, there was an interesting pattern – different people interacted with the AI in different, but consistent ways. One made jokes with it, one probed it as an AI to test its limits, one asked legitimate questions like it was a friend, and so on.

Rather than trying to mimic a single coherent personality, perhaps there was something about complementing everybody with their own unique kind of response. It’s still just one AI — not multiple bots for multiple people — but could the AI be more of a social chameleon, able to respond to different people in different ways?

So I paused v2 early and moved onto v3 with that idea — The Chameleon — which I cover in the next post.

(Side note — GPT-4 is also… expensive. It’s non-trivial to figure out how to replicate these results on a custom model, or a smaller model, etc. But I’m leaving that for Future Me to figure out.)

*Daily usage costs for my Open AI account. The red line represents the last day of blissful ignorance*.

Thanks for reading. See here for Part 1 and Part 3 of this series.If you’ve made it this far and you’re a builder interested in jamming on the AI/UX space, I’m happy to hear from you – you can reach me at skynetandebert at gmail dot com or @ajaymkalia on Twitter.

Building a Better AI for Group Chats: Part 2

Published by ajaymkalia

Leave a reply. Cancel reply

Share this:

Published by ajaymkalia

Leave a reply. Cancel reply