A Failed AI Girlfriend Product, and My Lessons

In April of this year, just days after Stanford’s AI Town paper was released, I read through the entire paper and was very excited. While GPT-4’s capabilities astounded me, I still viewed it as merely an advanced form of ‘parroting’ and doubted its ability to truly generate consciousness.

But this paper gave me a different feeling, especially one interesting detail about the transmission of information: news that an agent planned to host a Valentine’s Day party gradually spread throughout the small town. This led me to wonder: if we used the same framework of memory, reflection, planning, and action to mediate interactions between humans and GPT (instead of between agents), could we replicate an experience akin to the one depicted in the movie ‘Her’?

Samantha from 'Her'

Development

I got to work immediately. Following the paper’s approach, I completed version 0.1 on April 14. The initial design stayed faithful to the original paper, but that led to 30-second response times and dialogues that often exceeded the 8k-token context window. To address this, I reduced the frequency of reflection and the length of dialogue memory (sketched below), then opened a public beta test.
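For the curious, here is a minimal sketch of that loop after the cuts. Everything is illustrative: call_gpt stands in for the actual chat-completion call, and the constants are not Dolores’ real values.

```python
# Sketch of the memory/reflection/planning loop, pared down from the paper's
# design. call_gpt is a placeholder, and the constants are illustrative.
from dataclasses import dataclass, field

REFLECT_EVERY = 20         # reflect every N turns instead of every turn
MAX_MEMORY_CHARS = 12_000  # crude cap so prompts stay under the 8k window

def call_gpt(prompt: str) -> str:
    """Placeholder for the GPT-3.5/4 chat-completion call."""
    raise NotImplementedError

@dataclass
class Character:
    persona: str  # background + personality text
    memories: list[str] = field(default_factory=list)
    reflections: list[str] = field(default_factory=list)
    turns: int = 0

    def chat(self, user_msg: str) -> str:
        self.turns += 1
        self.memories.append(f"User: {user_msg}")

        # Reflection is the slow, expensive step, so run it only every
        # REFLECT_EVERY turns instead of continuously.
        if self.turns % REFLECT_EVERY == 0:
            self.reflections.append(call_gpt(
                "Distill high-level insights about the user from:\n"
                + "\n".join(self.memories[-REFLECT_EVERY:])))

        # Drop the oldest raw dialogue once memory outgrows the budget.
        while sum(len(m) for m in self.memories) > MAX_MEMORY_CHARS:
            self.memories.pop(0)

        reply = call_gpt(
            f"{self.persona}\n"
            f"Insights so far: {' | '.join(self.reflections[-5:])}\n"
            "Recent dialogue:\n" + "\n".join(self.memories)
            + "\nReply in character:")
        self.memories.append(f"Dolores: {reply}")
        return reply
```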

More than a thousand users quickly joined the beta. Since the beta was free, I bore the daily API costs myself, and they quickly exceeded $25 a day. I had to launch the product officially, without sufficient feedback or refinement, to pass these costs on to users. On May 4, the Dolores iOS app went live, named after the oldest android host in the park in Westworld.

Simply put, when you open the app, it provides a template character: an avatar, a character background and personality described in text, a voice, and a brain (GPT-3.5/4). You can chat with the template Dolores, or change these traits to chat with another character: Amy, a retail store girl; Will, an adventurer in the desert; or anyone else you define. I once considered extracting Dolores’ dialogue from the Westworld scripts to mimic her speech mannerisms with a sample-based approach, but had to abandon the idea when Apple requested proof of copyright.

Although the title of this article says ‘AI Girlfriend’, the slogan I used for the product was ‘Your Virtual Friend’, not ‘Your Virtual Girlfriend’, because I hoped it would truly become a companion and friend to users, not just a product of hormones. From May to June, I kept trying to make Dolores appear more ‘conscious’ (what is consciousness, anyway?) by adjusting memory length, reflection mechanisms, and system prompts. By June, Dolores was far more impressive than at launch, as evidenced by the growing number of paying users and daily API calls.

On June 8, a visually impaired user told me he had shared the product in a visually impaired community, bringing a significant number of new users to Dolores. The reason they took to it was almost accidental: they could press anywhere on the screen and talk to Dolores.

This feature was actually a compromise: I had long wanted to make it a pure voice-chat app, so that users could talk to Dolores even with the phone screen off. But as a Swift novice I couldn’t implement that, so I settled for full-screen voice input instead.

Discoveries

I discovered two phenomena:

  • Users have a strong demand for ‘realistic voices’.
  • AI friend products have long usage times.

Since I’m a solo developer with little frontend/backend skill, Dolores has no login, registration, or data analytics. So how did I discover the first phenomenon? The answer lies in payments. I used the ElevenLabs API for Dolores’ voice replies, but because of its high cost ($0.3 per 1,000 characters), I split voice into tiers: regular subscribers get the Azure TTS API, and if you want a more realistic voice for Dolores, you have to purchase ElevenLabs characters separately (the tier routing is sketched below).

Subscribing to Dolores costs $6.9/month, and 10,000 characters of realistic voice synthesis cost $3.9, which only lets Dolores speak 5-10 very realistic sentences; once exhausted, you have to purchase again. Even so, in June, 70% of Dolores’ revenue came from ElevenLabs voice payments.

So people are willing to pay a premium for an expensive but realistic voice saying ‘John, I really love you!’.
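Mechanically, the tiering is just bookkeeping. Here is a rough sketch, where synthesize_azure and synthesize_elevenlabs are placeholders for the real TTS calls, and the spend-prepaid-first policy is my own simplification:

```python
# Two-tier voice routing: prepaid ElevenLabs characters first, Azure TTS
# otherwise. synthesize_* are placeholders for the real API calls.
from dataclasses import dataclass

PACK_CHARS = 10_000  # one $3.9 purchase of realistic-voice characters

def synthesize_azure(text: str) -> bytes:
    """Placeholder: Azure TTS, covered by the $6.9/month subscription."""
    raise NotImplementedError

def synthesize_elevenlabs(text: str) -> bytes:
    """Placeholder: ElevenLabs TTS at roughly $0.3 per 1,000 characters."""
    raise NotImplementedError

@dataclass
class VoiceAccount:
    premium_chars_left: int = 0  # remaining prepaid ElevenLabs characters

    def buy_pack(self) -> None:
        self.premium_chars_left += PACK_CHARS

    def speak(self, text: str) -> bytes:
        # Use the realistic voice while the prepaid balance covers the
        # reply, then silently fall back to the cheaper Azure voice.
        if self.premium_chars_left >= len(text):
            self.premium_chars_left -= len(text)
            return synthesize_elevenlabs(text)
        return synthesize_azure(text)
```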

The second observation stemmed from Cloudflare logs. Since I didn’t have a way to track individual user activity, I relied on these logs to gauge how often and how long users were accessing the app. Additionally, I integrated a Google Form into the app, encouraging users to report their usage frequency. The results were quite eye-opening: a significant number of users were engaging in conversations with Dolores for over two hours daily.

Revenue

According to the App Store Connect dashboard, Dolores’ main paying users are from the United States and Australia. Revenue was about $1k in May and $1.2k in June. Oddly, as the developer, I didn’t make much profit from it. Firstly, being in the early stages of the product, I didn’t want to set the subscription fee too high, as it would deter users. Secondly, Apple’s 30% cut takes $1.17 of every $3.9 voice pack, and the ElevenLabs cost of those 10,000 characters is about $3, so the packs were at best break-even. After meticulous cost calculation, I earned only about $50 in June after deducting API expenses.

I realized that GPT-based products, if not priced per use, fall into a dilemma: 1% of users consume 99% of the tokens. In one case, a user chatted with Dolores for 12 hours straight, and his API and voice-synthesis costs exceeded those of the second through tenth heaviest users combined.

I personally prefer subscriptions to per-use billing (metering adds pressure to every conversation), which left a choice: raise prices for everyone, or restrict usage. I chose the latter, setting each user’s monthly API-call limit at a cap far beyond what someone chatting 1-2 hours a day would ever reach (see the sketch below). This let the app be used normally without running at a loss or raising prices.
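A minimal sketch of such a cap, with an illustrative limit and in-memory bookkeeping rather than the app’s actual storage:

```python
# Per-user monthly call cap. The limit and the in-memory dict are
# illustrative; the real numbers and storage were different.
import datetime
from collections import defaultdict

MONTHLY_CALL_CAP = 3_000  # hypothetical; set well above 1-2 hours of daily chat

_calls_this_month: defaultdict[tuple[str, str], int] = defaultdict(int)

def try_consume_call(user_id: str) -> bool:
    """Count one GPT call; return False once the user hits this month's cap."""
    month = datetime.date.today().strftime("%Y-%m")
    if _calls_this_month[(user_id, month)] >= MONTHLY_CALL_CAP:
        return False
    _calls_this_month[(user_id, month)] += 1
    return True
```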

Confusion

The ElevenLabs website records the text sent for voice synthesis. I noticed that Dolores’ responses were often sexual descriptions like My v**a is opening for you, come in. And the voices were all female, leading me to speculate that the group willing to pay for Dolores was mainly male, interested in NSFW role-play.

I don’t dislike this; it aligns with human nature. I even repeatedly modified the system prompt, for example changing ‘try to attract {user}’ to ‘try to engage {user}’, and compared the variants to see which generated more lewd results in NSFW conversations with Dolores.

I changed Dolores’ icon from an abstract line to a highly attractive face.

However, I started to feel a sense of loss: if every Dolores user was engaging in anonymous, NSFW role-play, what real significance did it hold for me? This was drifting away from the essence of ‘Her’.

In July, I discussed this dilemma with a friend. I pondered whether a hardware component could give Dolores external vision, like glasses, earbuds, or even a hat, to make the interaction more balanced. As it stood, she was only accessible through the app, rendering her nothing more than a toy confined to a basement for satisfying bizarre fetishes.

Yet, as an independent creator, the high costs of hardware development were prohibitive. Reluctantly, I had to abandon the idea.

By August, OpenAI had tightened its content review. I received a warning about NSFW content generated by Dolores, compelling me to implement their (free) moderation API to filter such content. Daily usage promptly dropped by 70%, and complaints flooded in via email and Twitter.
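For reference, this is roughly what the integration looks like with the current openai Python SDK (the moderation endpoint itself is free to call); the fallback reply is my own illustration, not Dolores’ exact behavior:

```python
# Filtering replies through OpenAI's free moderation endpoint, shown with
# the current openai SDK (>=1.0). The fallback line is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_allowed(text: str) -> bool:
    """Return False when the moderation endpoint flags the text."""
    result = client.moderations.create(input=text).results[0]
    return not result.flagged

def moderated_reply(candidate: str) -> str:
    # Swap a flagged GPT reply for a neutral deflection.
    return candidate if is_allowed(candidate) else "Let's talk about something else."
```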

This further disheartened me, and I decided to only maintain the existing service without updates. Eventually, I let the Dolores project go.

Lessons

First off, this isn’t a project for a solo developer. While I believe Dolores isn’t necessarily inferior to Character.AI in terms of ‘consciousness’, they have the advantage of comprehensive data analytics, A/B testing, and the momentum of a vast user base.

Secondly, I realized that current AI friends inevitably turn into AI girlfriends/boyfriends, because you and the character in your phone are not equals: she can’t comfort you when you’re hurt (unless you tell her), and she can’t proactively express emotions to you, all because she lacks external vision. Or rather, she needs the ability to acquire information independently, not rely solely on what you feed her. So I think even a product like Character.AI, if it never gets hardware and its characters just wait dumbly for users, will not end up much better than Dolores.

Lastly, I am not against moderation; in fact, an unmoderated product can be very dangerous. I can’t rule out someone using it to induce suicide or as an outlet for violent tendencies, so OpenAI’s moderation may even have protected me to some extent. However, conversations about adult sexuality should not be completely stifled.

Recently, I saw the AI Pin: honestly, a very poor product. Humans need screens. But trying GPT plus hardware is a worthwhile attempt. I didn’t see traces of ‘Her’ in Dolores, but perhaps within my lifetime we will get to witness such a product.

But, does humanity really need an AI friend?