It’s been nine months since the release of ChatGPT, which awakened the broader public to the possibilities of generative AI. After the initial hype, what are the areas in which excitement about generative AI is warranted?
I’m excited to publish a panel discussion on this topic that I participated in. Matt Dupree, the CEO of ATLAS, organized and moderated the conversation. Jonathan Pedoeem, Founder of PromptLayer, and Vijay Umapathy, Sr. Director of Product at Heap, were also panelists.
What exactly is the value of ChatGPT?
What are some examples of where generative AI has failed to live up to initial expectations?
What are some emerging categories of generative AI where it’s proving to be truly useful?
You can listen to the conversation or else read the lightly edited transcript below. Enjoy!
Is there a new use case for generative AI that you’re excited about? I’d love to hear about it. Feel free to respond to this email newsletter or message me on LinkedIn.
Matt Dupree (Atlas): I'm excited to moderate a panel of folks to talk about AI and product strategy. So first, we're going to have Jonathan Pedoeem, who's the founder of PromptLayer.
Jonathan Pedoeem (PromptLayer): Thank you for having me.
MD: You've seen a lot of interesting use cases for LLMs as a part of building out PromptLayer. And so I’m really excited to have you here and get your insights on the state of things.
Next, we have Allison Pickens, AI-focused investor and former COO of Gainsight. I’m also really excited to have her perspective given her background.
Then finally we have Vijay Umapathy, the director of product at Heap. Heap is doing some interesting things with LLMs and thinking really in a nuanced way about prioritization. I’m really excited to have his perspective. Thanks for joining us everybody.
Jonathan, you said earlier today in your talk that the AI hype is calming down a little bit. I think I saw a Reuters article about ChatGPT usage being down. There's been some decline of usage for other AI native startups. Do you folks agree with Jonathan that the hype is dying down a little bit? Or is this just some weird seasonality?
Allison Pickens: With the ChatGPT usage decline specifically, there's probably some element of seasonality to it with students taking off the summer. Students were probably a pretty significant percentage of total usage. But it’s also likely that major adoption cycles like this are not linear, but actually are a little bit more bumpy than we would expect. Particularly, people are trying to figure out how to incorporate LLMs into their workflows. That's actually a huge opportunity for the future, and for a lot of startups. If I'm trying to use ChatGPT in marketing, sales, or customer success, I’ll probably want more purpose-built tools for that.
Vijay Umapathy (Heap): Agreed. I don't think it was ever a reasonable assumption that the world was going to stop using the rest of the internet and only interface with information through a chat window and ChatGPT. Especially when they released plugins, some people were really hyped about that. But if you go back to UX basics, you don't want to have this high-friction experience to give an LLM the right context. And so ChatGPT has probably inspired a lot of companies to realize the usefulness of LLMs. We're just getting started on the value of LLMs. A lot of these companies are trying to figure out how to integrate them into their workflows. That’s the easiest way to give the right context to these models to actually get value.
MD: That makes sense. You think it's an S-shaped thing. It's not like it's just going to go exponential crazy with it.
VU: Or ChatGPT is not a good measure. I don't think any of us really care about ChatGPT daily active users as a measure of the usefulness of LLMs. I think it’s not a useful exercise.
MD: I love it.
AP: The primary value of ChatGPT was to educate the broader lay-public on what LLMs can do to change their lives. And what's more interesting is, again, the more purpose-built applications to come. It's likely that a lot of folks got pretty excited about LLMs for a period and started to try out some use cases and then realized that the workflow wasn't sufficient.
For example, several months ago, my husband and I decided to geek out on a Saturday night and try out Midjourney for the first time. Midjourney was not built with a particular user in mind, and particularly not for a layperson. So, we tried it out for a few hours. I eventually hired an intern to poke around and see if there were use cases for my fund [for example, a fun LinkedIn headshot]. But after that initial attempt, I'll probably wait until there's some application that's built into my day-to-day.
MD: That makes sense. Jonathan, did you have any thoughts?
JP: At least from our perspective, there are probably two empirical things. We do see a transition from people just flooding in to more mature users. Hobbyists and hackers were coming in at the beginning. So, there's maybe less volume, but the quality is higher: it's now real companies that are coming through our door and talking to us. Another thing I've noticed empirically from my non-technical friends or older people I know in my community is that when ChatGPT was coming out, everybody was just talking about it. They’d say, "Oh, I used it for this email. I used it to write my daughter's entrance letter to this camp," or something like that. Now I’ve had a few people come to me and say, "Yeah, man. ChatGPT didn't do so well on this task or that other task." Now, they realize they can't use it everywhere. There’s definitely a maturity of the users that are coming through our door. A lot more mature companies are coming through.
MD: Now that you mention it, I've noticed something similar. Some of the folks that have signed up for our wait list, early on, they had just seen us in the AI newsletter and don't work for a particular company. They're just messing around. And that's been less true lately.
It seems like the group consensus is that ChatGPT is basically a demo or a toy. It's not really serious. But I'm curious if people do think there are any use cases where ChatGPT will replace more traditional uses of the internet. People were saying this about Google for a while. I think that's worn off. But I'm curious if there is any sliver of use case or usage that'll be displaced by ChatGPT.
VU: Cheating on homework.
MD: Fair enough. That's about the answer I would expect given what you all said earlier. That's a good one. Wasn't there a professor who asked ChatGPT if people were cheating, and it said yes, and then he gave people zeros? Did people see this? Did I make this up? Did I hallucinate this?
VU: I don't know.
MD: I feel like I saw something like this. All right. All right. Nobody remembers. I hallucinated it. Sorry.
VU: It's tricky. Honestly, I can't think of a whole lot of them because in a lot of these cases, you're inevitably going to get better by providing better context and you're only going to do that if you're embedded in actual workflows that have that context. The end state is very few things actually make sense in ChatGPT and not in a separate tool.
MD: I met somebody last week who said this is the end of the internet. Everybody's just going to use ChatGPT. I wish that he was in this room so that you guys could duke it out.
JP: As someone who codes, I slowly used ChatGPT for some coding questions. It's a good UX for that, in my opinion.
VU: What about integrating in your IDE though? Sometimes if I'm writing something in a language I'm not that intimately familiar with and I'm going to make mistakes, I will paste something into Chat. Or I'll ask ChatGPT to write me a boilerplate starting point. It'll do that.
Inevitably you reach a point where you get an error. Then I go into my terminal and I copy and paste the error that I found and I say, "I ran into this error. What do you think I should do?" But that step of hopping out of my context into my separate terminal application, copying, pasting it back in and not necessarily having a good notion of the history of all that, that's a lot of work I was doing. This is why I think Copilot and all these other tools are going to be the central place to do this.
JP: Fair point. I agree.
MD: We figured it out. ChatGPT is just a demo. It's done. We hinted a little bit at the answer to this question too, but I'd like to hear takes on whether current LLM capabilities are over or underrated.
AP: I think they're overrated where there's precision that's required in the answer or some kind of convergent thinking about concrete solutions. Interestingly, there's been this emerging category of companies that have been trying to help people get access to their company's data, to query the data and find, for example, how many of their users gave them a certain Net Promoter Score. The expectations there were pretty high, and it's hard for these companies to meet those expectations. When you're chatting with generative AI, you're chatting with someone who is supremely creative and is more attuned to divergent thinking.
MD: That's a really interesting point. So it’s overrated in this convergent thinking area specifically. I think that's right. We're trying to use LLMs to help people learn how to use software. And part of the decision to make that our mission is really a recognition of the limitations of more ambitious applications. You could try and make LLMs that would just use the software outright. That requires a kind of convergent thinking that I just don't think is there yet.
VU: I second this notion. I work at a product analytics company and so we are constantly exposed to everyone being obsessed with the text box interface for doing all analytics. I completely agree that it's an environment where the cost of hallucination is really high. It takes a lot of effort to build trust in data and it takes very little effort to erode that trust. For tools that have code as an input, for example, if you have a SQL runner or something that's syntactically specific that the average person may not be that good at, LLMs can be really helpful with giving you a leg up into those kinds of inputs. But even then, it comes with a lot of caveats.
I would kind of break that down. Take, for example, an LLM used just to input SQL into a reporting tool. That is solving two problems and it's doing one of them a lot better than the other. It is solving a workflow automation problem of generating functioning SQL code and it's probably going to save you a lot of time there and do a decent job of it. But is it referencing the correct events? The little nuanced pieces that actually get you to the right answer are the areas where those kinds of products are probably going to struggle a lot.
That's why they may be overrated out-of-the-box, but underrated if you put the right investments in creating the right APIs underlying it that are exposing access to that data. If you do a good job of that and you invest a lot in that infrastructure to give the LLM the cleanest, most reliable dataset you can to reduce hallucinations, people may be underrating how useful it can be in the long run.
MD: I think that's right. Some of what Zach was just talking about was investing in the supporting infrastructure to make these LLMs really effective. You're right that it could be underrated taking that tack. It's kind of an all-or-nothing stance to say that if ChatGPT isn't an AGI, you’re not interested. This kind of sentiment is misguided.
This conversation's flowing really nicely with the flow of questions that I wrote, which I didn't use ChatGPT for, by the way. But I'm curious if you guys have seen any emerging technical or product patterns to deal with the limitations of LLMs? Vijay touched a little bit on building supporting infrastructure to make sure the right data is available to the LLMs. Are there any other patterns, technical or product, that you've seen?
JP: Well, there's one. Good prompt engineering. Having feedback loops where you can see in production where the problem is occurring and then addressing that, whether that be prompt engineering, fine-tuning, or those types of things. There's that feedback loop in the wild of making sure hallucinations don't happen and the system's not questioning or doing the wrong type of data selection.
There are also people trying very, very hard to come up with test sets and metrics to look at that will be able to tell you how well this will operate in the wild. Although, I'm not sure if we've found something that could work and be as flexible as people need it to be in production, beyond just letting it run in the wild and seeing where it makes mistakes.
MD: When you say looking at the actual behavior and production, you're almost talking about prompt observability or something like that?
MD: That's interesting. Is that a part of what PromptLayer does?
JP: Yeah. That's a big part of what we do. We have the feedback loop from iterating on your prompts to seeing how it responds in production.
For example, say you have a chatbot and you didn't anticipate that customers would use a specific type of language, and that triggers the LLM to give a response that you don't want. So, you’ll see that in production. You didn't anticipate the specific edge case. And once that edge case has occurred, you have that data to adjust your prompt in order to deal with it.
MD: That makes a lot of sense. You also mentioned test cases. I'm hearing that a lot.
One thing we haven't talked about is prompt drift where you have a prompt that works at one point, and then it stops working. I asked Zach about this in the last session and he also brought up this idea of test cases and enabling folks to see when something stops working. Those definitely seem like patterns given that Zach and you brought it up. I spoke with another person a couple of weeks ago at another conference about trying to set up some sort of test case or test harness for these sorts of things.
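A minimal sketch of the test-case idea for catching prompt drift, under stated assumptions: nothing below is PromptLayer's API or any panelist's actual setup. The test cases, the prompt template, and the two stand-in "model" functions are all hypothetical and exist only so the example runs without an API key. In practice `call_model` would wrap a real LLM call, and the baseline would be a pass rate recorded when the prompt was known to work.

```python
from typing import Callable, List, Tuple

# Hypothetical test cases: (question, substring a correct answer must contain).
TEST_CASES: List[Tuple[str, str]] = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]

# Hypothetical prompt whose behavior we want to keep stable over time.
PROMPT_TEMPLATE = "Answer briefly: {question}"


def pass_rate(call_model: Callable[[str], str]) -> float:
    """Run every test case through the model and return the fraction that pass."""
    passed = 0
    for question, expected in TEST_CASES:
        answer = call_model(PROMPT_TEMPLATE.format(question=question))
        if expected.lower() in answer.lower():
            passed += 1
    return passed / len(TEST_CASES)


def has_drifted(baseline: float, current: float, tolerance: float = 0.1) -> bool:
    """Flag drift when the pass rate drops more than `tolerance` below the baseline."""
    return (baseline - current) > tolerance


# Stand-in models (assumptions, not real APIs) that simulate a behavior change.
def model_last_month(prompt: str) -> str:
    return "Paris" if "France" in prompt else "4"


def model_today(prompt: str) -> str:
    return "Paris" if "France" in prompt else "I'm not sure."


baseline = pass_rate(model_last_month)
current = pass_rate(model_today)
print(f"baseline={baseline}, current={current}, drifted={has_drifted(baseline, current)}")
```

Substring matching is a deliberately crude pass criterion; the point is only that the same prompt is re-run against fixed cases so a drop in the pass rate becomes visible.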
AP: On the subject of what's the emerging technical or product landscape for solving for hallucinations and other things, it'll be interesting to see what kind of guardrail products are built over time. I've talked to some entrepreneurs who are interested in solving that as a horizontal problem. How do we just make sure there's a universal guardrail? I think that's really unlikely. More likely it'll be very catered to specific use cases. For example, I encountered a company called Nomos the other day, which is initially focused on creating compliance guardrails for financial services companies in particular. So, let's say you're a consumer, you're interested in certain banking products and you're talking with a chatbot about what products might be useful. That chatbot needs to say certain things about the cost of the potential investment opportunities and what the returns might be, and it needs to be in compliance with certain company standards, as well as legal standards. That's a very specific use case.
You could imagine financial services companies plugging into some kind of API in order to access Nomos and then maybe having the Nomos brand on their company websites so that people know they're protected by Nomos for this specific thing. But it’s likely not just going to be Nomos. There will be other categories of guardrails that need to be built and that might, again, build great brands around their trustworthiness.
MD: That does fit a little bit with Zach's talk before. A lot of the guardrails that he discussed, some of them are generalizable, but they feel pretty specific to what they're doing.
VU: So, we're a little earlier. We're only in the POC stage. One of the ways we're doing things, to one of the points that was made earlier, is trying to not test in demo-ware cases, but instead creating distributions and actually creating as large data sets as we can.
For example, when we look at a given use case, we'll start with maybe a test set of one or two examples and then create test data. Then we create several variations of prompts and test those different prompts.
Right now we're doing a lot of this in a spreadsheet. But over time as we scale, we'll probably move to more systematically testing out lots of different prompts for that domain. Then if that domain looks promising, we’ll start to scale out the data set accordingly and try to make the data set more and more representative of the real thing. One other thing about the way we're going about building these: as we get directional signal that a certain domain is probably going to click really well, we're using that signal to prioritize the infrastructure investments accordingly.
In an analytics tool, if you're looking at querying versus data governance, if the data governance use cases actually seem more promising and reliable, we will prioritize building more robust APIs for data governance over querying. We haven't gotten to a lot of those decision points yet, but that's how we're thinking about it and approaching it.
We’re being systematic, if there are any product people watching. It's very easy to toss 50% of your roadmap and just decide LLMs are cool and go do it. But then you end up sacrificing a lot of impact. It's important to test methodically as you go, because the technology is evolving really fast. But at the same time, it's also really easy to burn time on something that doesn't bear fruit when it's actually very easy to test.
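The spreadsheet-style process described above (a small test set, several prompt variations, a score per variation) could be sketched roughly as follows. This is an illustration under assumptions, not Heap's implementation: the test set, the prompt variants, the event names, and `fake_model` are all hypothetical, and `fake_model` is a stand-in that only answers well when the prompt supplies context.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test set: (user question, event name a good answer should mention).
TEST_SET: List[Tuple[str, str]] = [
    ("How many users signed up last week?", "signup"),
    ("Which page has the most visits?", "pageview"),
]

# Hypothetical prompt variations for the same task.
PROMPT_VARIANTS: Dict[str, str] = {
    "bare": "{question}",
    "with_context": (
        "Known events: signup, pageview. "
        "Name the event needed to answer: {question}"
    ),
}


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call (an assumption so the sketch runs offline)."""
    if "Known events" not in prompt:
        return "I am not sure which event to use."
    return "signup" if "signed up" in prompt else "pageview"


def score_variants(call_model: Callable[[str], str]) -> Dict[str, float]:
    """Return the fraction of test cases each prompt variant gets right."""
    scores: Dict[str, float] = {}
    for name, template in PROMPT_VARIANTS.items():
        hits = sum(
            expected in call_model(template.format(question=question))
            for question, expected in TEST_SET
        )
        scores[name] = hits / len(TEST_SET)
    return scores


scores = score_variants(fake_model)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Growing `TEST_SET` toward a more representative distribution, as described in the conversation, changes only the data and not the loop.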
MD: Sure. And just to be clear, you're prioritizing the infrastructure investments so that you can provide the proper context for the LLM?
VU: Exactly. But we're manually doing that in the near-term for proofs of concept.
MD: Sure. Nail it and scale it, right?
VU: Yeah. And the other lever that we look at is for the same task, how did cheaper models perform versus more expensive models? Although, it's interesting too, that those gaps are starting to go down. A lot of the more performant, more capable models are getting exponentially cheaper.
By the way, I do see one common pattern, at least in products that have productionized these things. Most of what is out there is hitting OpenAI's APIs. It's the fastest way to get into production and the fastest way to ship something reasonable. There are a couple categories. One is they're just going for it with a full chat domain and chat is the relevant interface and domain. Maybe for support automation sorts of things. They're just doing it very directly.
The other type is they're basically using... I totally forgot what the other type was. I'll think about that a little bit more. I totally forgot.
MD: No worries. It's tough. This stuff's complicated. I think we can move on to the next question. And hopefully, Vijay will think of it before we get too deep into it.
VU: I’ve got several minutes to try to remember.
MD: That's right. Vijay hinted at this just now, that an obvious mistake to make with LLMs and product strategy is to just throw away your roadmap, build the shiny things and not really be thoughtful about it. But I'm curious what other mistakes people are seeing as folks try to incorporate LLMs into their products. Allison, maybe you could give more of the mistakes that you're seeing AI native startups do.
AP: There are a number of companies that might've gotten started in 2022. You could call them pre-ChatGPT wave, but they're still young companies that were in their pursuit of product market fit and then the GPT thing became evident to everyone and they felt the need to react, with their board telling them they had to look into it and figure out how to use ChatGPT. So, they diverted their focus away from the search for product market fit toward how to just make sure that they have GPT embedded in their product in some way. Maybe just a chat interface. And I think that's a distraction.
I think if there's a real way that you can take out costs for your customers or draw attention in a big way to your app, that’s fine. Because there are a lot of products that were built before the GPT era, but then suddenly got traction because they used GPT. Gamma.app is a great example of this. They were founded I think in 2021, but then got 10,000 new users per day in the several weeks after their launch a couple months ago because they launched (relaunched basically) with generative AI. So, that's an opportunity. Taking out meaningful costs or adding a lot of value for your customers through GPT is a meaningful opportunity. But if it's not on your path and you're still just searching for product market fit, it's a distraction, in my opinion.
VU: By the way, I remembered the thing I was saying earlier.
MD: Okay. We can work you in artfully. Don't use ChatGPT to help.
VU: No. The other pattern I was mentioning, which isn’t super surprising, is a lot of people are trying opt-in versus opt-out or default interfaces. So, instead of summarizing everything on a page automatically for you, they will make you ask for a summary. That gives them a simple feedback loop of knowing if this is delivering value, which is a very good thing. But it's also controlling costs. That's just another pattern I've observed in different tools.
MD: That could be used just to understand the limitations of the LLM. If it's not providing value, people are not going to keep opting in.
MD: We're thinking about something like that, too, for Atlas. We're guiding people to the right place in an application. And if they're not repeat users of that, it's a pretty clear indication that we got it wrong.
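As a rough illustration of the opt-in pattern discussed above, the sketch below computes two signals from an event log: how often users ask for a summary, and how many of those users come back for another one. The event names, the log format, and the data are hypothetical and not taken from any tool mentioned in the conversation.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical event log: (user_id, event_name).
# "page_view" means the user saw the page; "summary_requested" means
# the user explicitly opted in to an LLM-generated summary.
EVENTS: List[Tuple[str, str]] = [
    ("u1", "page_view"), ("u1", "summary_requested"),
    ("u2", "page_view"),
    ("u3", "page_view"), ("u3", "summary_requested"),
    ("u1", "page_view"), ("u1", "summary_requested"),
    ("u4", "page_view"),
]


def opt_in_rate(events: List[Tuple[str, str]]) -> float:
    """Fraction of page views on which the user asked for a summary."""
    views = sum(1 for _, event in events if event == "page_view")
    requests = sum(1 for _, event in events if event == "summary_requested")
    return requests / views if views else 0.0


def repeat_user_share(events: List[Tuple[str, str]]) -> float:
    """Among users who opted in at least once, the fraction who opted in again."""
    per_user = Counter(user for user, event in events if event == "summary_requested")
    if not per_user:
        return 0.0
    repeaters = sum(1 for count in per_user.values() if count >= 2)
    return repeaters / len(per_user)


def llm_calls_avoided(events: List[Tuple[str, str]]) -> int:
    """Calls saved versus summarizing automatically on every page view."""
    views = sum(1 for _, event in events if event == "page_view")
    requests = sum(1 for _, event in events if event == "summary_requested")
    return views - requests


print(opt_in_rate(EVENTS), repeat_user_share(EVENTS), llm_calls_avoided(EVENTS))
```

A low repeat share would be the "we got it wrong" signal, while the avoided calls correspond to the cost control that the opt-in design provides.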
VU: Going back to your other question about some mistakes that people are making, Matt, there's two categories that maybe have two different types of mistakes. One is the AI platform companies that are potentially building the feature of a larger platform company. You have to ask yourself what is your defensibility in this situation?
The other category is people who are building end user applications that are going to come in and disrupt X type of marketing content tool and are going to take an LLM based approach. The advantage that these companies have over the existing companies is that the existing companies were maybe created in a world where they assumed they had no LLMs for content creation, et cetera. So, they would've staffed up and accordingly built out a very high marginal cost approach. Whereas you have the opportunity to disrupt on cost if you can create something that's sufficiently high quality with a lot of automation. But that also means your go to market can't be sales-led. If your ultimate advantage is going to be much better margins, you had better be going to market in a way that is much lower cost.
Take some of these companies that are going to do an AI-based approach to X market. Say you're making a marketing tool and you're competing with Canva. Canva can wake up one day and decide they’re going to go build an LLM based approach. And they have a massive distribution advantage. So how are you going to counteract that? And is your whole company, including the go-to-market motion, actually taking into account that advantage?
That's something that's going to be really interesting to see with a lot of these companies that are cropping up. They're under a lot of pressure with all this venture funding that they're raising to grow very quickly. And it's really easy to just decide that the best way to grow quickly is to hire a sales team. That then fundamentally changes your unit economics, so you don't have that advantage that you would've had otherwise. So I'm a big proponent of this for these LLM companies: if their advantage is going to be cost, you’ve got to also sell in a way that's cost-efficient.
MD: We're wrapping up, and maybe we could just share resources that we consume for keeping up with what's going on with LLMs and product strategy. Are there things that people look at that are really useful?
AP: There are so many newsletters nowadays that are talking about LLMs, you could pick your favorite one. What's potentially more useful is trying to use as many products as you can yourself. Try it out. Experiment. Get the firsthand experience. And you'll develop the skills, as well, that we're all going to need in this new economy.
MD: I love it. Taste the soup.
JP: I think Twitter's pretty strong. A lot of noise there.
MD: I think he means X, right?
JP: I completely forgot about that. Twitter, X, Threads, Mastodon or whatever the other one was called. There's a lot of people putting interesting stuff there. I use that as a way to watch things bubble up. Every other day, there's a new library, a new technique. If it has staying power on Twitter, then you can decide that it’s time to block off some time on your calendar and dig deep.
VU: I like paper summaries on YouTube. AI Explained is a good one. I like the ones that are not opinion thought pieces. They're just a summary of a paper that was published. If you have a few minutes, they can get you through a lot of content.
MD: We didn't get to talk about this, but I don’t think the fundamentals of product strategy have really changed even though LLMs exist in the world. I've found it really useful to just revisit some stuff. 7 Powers is a really interesting book on product and business strategy written by a former Stanford economist who is now doing VC stuff.
I'm finding that stuff extremely useful as I think about the shifting landscape with LLMs. The fundamentals are the same. There's some stuff that's changing up here, but knowing the principles and fundamentals has been really useful. We are out of time. Thank you, guys, so much. This was a delight.