The “modern data stack” has been a gigantic leap forward. It feels cutting-edge. But might there be an even newer frontier?
In this podcast episode, I take another trip down the Web3 rabbit hole with Danny Zuckerman, Co-Founder at 3Box Labs. We discussed:
What does the Web3 version of the "modern data stack" look like?
What is the value of decentralized data?
To whom is it valuable?
This conversation was mind-blowing to me. Feel free to reach out to me with any thoughts. You can also read the lightly edited transcript of the conversation below. Let's dive in!
Leadership Roles
As always, I’ll share a few leadership roles at companies I’m excited about.
Various roles @ 3Box Labs (the company that Danny leads!)
Head of Marketing @ Swantide
Head of Content Marketing (Utah or SF) @ Workstream
VendorPM:
Director of Engineering @ LaunchNotes
Product Marketing / Head of Marketing (US/Europe) @ Palette
Marketing leader @ Guide
Transcript
Allison: Danny, I'm so excited to have you here on the podcast today. You are squarely at the intersection of the data world and the Web3 world, both of which I’m very interested in. And I'm excited to help folks like me, who are coming from the Web2 world, understand why Web3 might be relevant to them and to the data world. So thanks for joining us today.
Danny: Of course, excited to be here, Allison.
A: To start out, can you tell us exactly what your company does and how you got started?
D: I'm one of the founders of 3Box Labs. We got started about four years ago, and we had actually been working together for a couple of years before that, trying to make decentralized, and especially composable, data possible for application developers building products and services on the web.
What that really means is this: there has been a vision going back to the dawn of the web in which the data that powers most of the web, instead of being isolated and fragmented on siloed servers behind each individual application, is shared across apps, products, and services and hyperlinked together. That way you can build on it in really interesting ways that give you a much more seamless and powerful experience across the web.
To do that, you need a few new primitives that didn't exist before Web3: a decentralized way to manage this data and a decentralized way to store it. We started building that over the last couple of years. Our focus has been entirely on spec’ing, building, and shipping Ceramic, a decentralized network for composable data that is open source and that the community will eventually fully govern and control.
A: For the lay folks out there, can you explain a little bit about what decentralized data means? I'd also love to hear your perspective on what open source means in your context, and whether it differs from open source in the Web2 world.
D: Decentralized means so many different things, so it's good to ask that question, and frankly, we should get away from using it as the leading term in our space. For us, what it really means is the opposite of siloed. Silos are the result of centralized data.
When we talk about decentralized data, it’s less about “let's get every app's data stored on each of our phones and laptops and away from AWS.” That is one form of decentralized data, but it's actually the control plane for data that we really care about decentralizing. To dramatically oversimplify, all these apps, products, and services that we use online are powered by code and data. The data is what fills them with information, which makes them rich and useful. It also makes the apps that aggregate lots of users and data very powerful, because they benefit from data lock-in and network effects.
The idea of decentralized data is to remove this really powerful data from the silos of any one application or platform, so that the data I have access to on a random new mom-and-pop e-commerce store is exactly the same as what I might see on a Shopify store, on Amazon, or anywhere else. When we say decentralize, what we really mean is shifting what is often called the owner of data, but what we think of as the controller, from single monolithic platforms to anyone who participates in building or using the web. That pushes control to the edges, so that data can be brought from app to app and follow us across our experiences.
A: Anything you would add about what open source means in your context?
D: Open source is most commonly talked about for software. Most of the web today runs at least to some degree on open source software. What it enabled was for apps everywhere to use the same code rather than recreating redundant code that did the same thing. So it saved massive amounts of time and created the composable software stack that the web now runs on.
What open source was for code, blockchains are for finance: it's open finance. Blockchains are a way to store financial assets in an open, shared, composable way, which gives you what people call money legos.
We talk about open data as the final piece of this Web3 stack. Instead of every app having a bunch of backend engineers design its data model and run a bunch of servers, duplicating what every other app does, we can all build on the same data sources, data sets, and data management tools. This is much more powerful, because you get to share the same data across applications.
So this is really an extension of what's started with open source software.
A: I know there's a lot of debate across the industry about the benefits of decentralization versus the benefits of centralization. Given security and other concerns, centralization sometimes makes a comeback. What do you think is the value proposition of decentralization, specifically for data?
D: I think there are many. The biggest one in my mind, though, is composability: the ability to natively share data across the boundary of any given application or platform. That makes every piece of data more valuable, because it can now be used in more contexts. Instead of having to recreate it everywhere or having a limited experience with each app, you have your whole identity and your whole set of data with you to interact with.
We have that in a super limited way on the web today through APIs. That's how centralized services have tried to make their data shareable beyond their boundaries, but it requires an explicit decision by the company to give access. Those APIs are often metered, and they require a one-off integration every single time you want to share that data. So it's a super, super limited form of composability. It's also always at risk of the platform turning off the API, which both Twitter and Facebook have done to their developers. Any time someone builds up enough power, they just do it.
This is a way to have native composability — a single API for any data that you want to get to.
That's not the entire benefit of decentralization, but it's the one we think is going to drive the most adoption. We think startups especially will capitalize on it, because instead of having to build their user base, data set, and social graph from zero by themselves, they can build on an open social graph, an open data set, and all of the user's existing data. So they get to leap ahead very quickly to where it would otherwise have taken them much longer to get.
A: I think the Facebook example you shared is a really compelling one because, as you said, we've seen how much power Facebook has amassed by setting limits on who can access its data. In a world where data is decentralized, do you think there will be less of a tendency toward monopoly and a greater distribution of power among players? Will there be more competition? How do you see it?
D: Yeah, 100%. There will always be vectors toward centralization, so this won't eliminate all of that forever, but yes. People talk a lot about censorship when it comes to Facebook, Twitter, and some of the other big social platforms, because they have so much power and we spend so much of our lives there. That censorship power is important, but it is tiny compared to the much bigger issue: the lack of innovation that results because nobody can compete with Facebook, which has such an entrenched advantage in its social graph and its data. When users can opt to take their data with them and use multiple products at the same time, that lock-in isn't there. The bar for competition is much, much lower, and so are the switching costs.
On the traditional web, we often hear people talk about the need for a 10x better product. It has to be 10x better before people are willing to go through the pain of switching, because switching is a real pain. But when you can take all your data with you, switching isn't painless, but it isn't very painful either. For example, I regularly switch between Zapper and Zerion, two DeFi apps that access the same underlying data, because they do slightly different things. When you can switch that fast, you don't need a 10x better product. You just need a slightly better product, and people can start using it. Any one product might not be 10x better, but with that compounding innovation, very quickly you're 100x better than you were before. That permissionless innovation is what we think is the most exciting and valuable part of composable data.
A: Do you think that Web3 data companies will replace Web2 data companies in the future? It's a provocative question, in part because so much attention is being paid to the Web2 data space right now. Those companies command enormous valuation multiples, and there's so much innovation happening, with probably thousands of companies trying to build products in this space. If they are at some point going to be disrupted by Web3 data companies, that's very interesting. What do you think that disruption path might look like, if it exists at all?
D: I think a lot of them are here to stay for quite a while. This is an imperfect analogy, but when we went from Web1 to Web2, on-prem databases didn't completely disappear. There's still a role for them in certain circumstances. Similarly, as we go from Web2 to Web3, centralized databases and centralized models won't completely disappear. In the near term there's still a role for them in anything with super, super low latency requirements, because decentralization does come at some cost, including latency. Then again, web apps had higher latency than native apps, but they had all these other benefits, so people adopted them. And certain very, very sensitive data should not be put on an open network, even encrypted.
There will always be a need for some form of centralized (even if distributed) database storage for running very rapid analytics or AI models. Maybe you want a centralized data warehouse for those things. So there are parts of that data stack where, even as the web shifts from Web2 to Web3, those companies can adapt and play a very big role.
But I do think, and hope, that on the order of three to five years from now, no new app will be built on a siloed Web2 database model from the start. You won't have to run a backend yourself. You'll be building on this composable ecosystem of data, and the shared data models will be so much bigger that no one would think to compete with all of that on their own. It won't happen all at once, but I do think it will start happening more and more rapidly pretty soon.
A: That's so interesting. I'm wondering how the split between Web2 and Web3 will work. For example, do you imagine a data company selling a centralized solution as one of its products? Or could you imagine customers buying Web2 solutions for some use cases and a Web3 solution from the same company for another?
D: That's an interesting question. I definitely think there will be a mixture in this stack: companies will use a combination of centralized databases and Web3 distributed databases, just as companies today build their applications on a whole bunch of different databases. You might use Postgres for one thing and MongoDB for another, depending on the use case. Similarly, especially in the next few years, you'll see applications with some data stored on a centralized server, using analytics services and all the extremely sophisticated products that have been built up on that stack. But for the very core things where they need to tap into composability, they'll use Web3 data. As the Web3 stack matures, more and more products will be built for it or migrated over to it, and more things will move over. But I think there will be a mixture for quite a while, for sure.
A: For the companies that do decide to procure Web3 data solutions, what are the concrete benefits to them, and what might be the benefits to the end users of those Web3 customers?
D: I'll start with the benefits to end users. People often say that end users will have control of their data, that they'll have the same experience from app to app, and that they'll have lower switching costs. All of these are true and, I think, dramatically undervalued. But it is very hard to get end users to take action on behalf of any of these fairly abstract benefits, including privacy, which is one reason we've always focused on serving developers, not end users, as the route to getting all of this adopted. Then end users will start to go to applications and have all of their data follow them there, giving them a far more customized experience. They won't feel like each time they go to a new app, they have to bend themselves to that app: its forms, its onboarding, its social graph. Instead, the internet bends to them and becomes their metaverse, to use the term that is sticking now. Very quickly, once we all had iPhones and everything was in the cloud, it became extremely painful not to have access to your Google Docs or your messages, even when you were traveling and away from home. Once we felt that, we could never go back. It's going to be like that.
For developers, the benefits lie along a different set of dimensions: speed, cost, and power are what you get when you build on the Web3 data stack.
When you build on shared data infrastructure, you don't have to stand up your own backend, run it, or maintain it. That all happens on shared infrastructure. So as a front-end developer, or pretty soon even as a visual developer (whatever we end up calling these no-code coders), you'll be able to build a full application very, very quickly, because you don't need to build a backend. You just pick and choose the data models for the features you want. You want a social graph, blog posts? You just use existing ones, and that's your data portal. Then you plug it into the front end, and now you have a full-stack app. So you can build something much faster and get adopted much faster, because users bring their own data to you, and you have an entire social graph right off the bat.
As an end user, I could use Notion while some of my team still used Evernote. We each go to the tool we want, and everybody still gets to collaborate on the same data.
Because of composability, the experience you deliver to your users as a startup doesn't necessarily end when they leave your application. They might write a blog post on your app because you have the best writing interface. Then they go to another app that uses the same underlying content, the same blog posts, but is designed more for social, so a whole bunch of comments get made there. When people come back to your app, those comments come with them too. So the total experience you're delivering is much wider than it used to be.
A: To make things more concrete for the audience: who are your target customers?
D: Right now, we are working purely within the core Web3 community. Everybody building on top of Ceramic today is part of a blockchain ecosystem, building blockchain applications or distributed data applications: basically, people who already believe in some of the value propositions of this more distributed way of building applications. The reason is that there are tradeoffs in this tech, including speed and, in certain ways, reliability. It’s folks who have been building in the Ethereum ecosystem, because that's where we have spent the last five years. They’re building a lot of collaboration tools, everything from note-taking, to future-of-work tools, to communication.
DAOs, which don't necessarily have a traditional org structure, and many other projects that want to collaborate more fluidly benefit from this composable data, because people can collaborate much more naturally. There are tons of reputation-related projects where reputational data aggregates around the user: say, the projects you've completed, or an import of your financial history to get a decentralized credit score. All these things are being built on Ceramic today.
A: Danny, I'd love to understand your view of what the modern data stack would look like in the Web3 context, especially if you've thought about what the Web3 analogues of Web2 companies would be. That would be an interesting translation.
D: I do think the stack is still sorting itself out in some ways, and it won't look like the Web2 data stack. In particular, it won't be a super clean stack: there are going to be many different protocols that play in very thin layers and plug in in different ways. We don't know exactly what that looks like yet, but at a high level, the analogues for the raw storage layer (AWS data centers in Virginia) are things like Filecoin and Arweave, which at this point provide very well-proven distributed file storage.
Then on top of that, you have the database layer. I called Ceramic a distributed database earlier, but that's only half true. Ceramic is like the “write” side of a database: it writes transactions to a dynamic data protocol, and that data can then be stored on Filecoin or Arweave. The flip side of “write” is “read”: indexing data and making it queryable. That's really powerful in Web3, because an index can draw on all these different data protocols (blockchains, Arweave, Filecoin, Ceramic) and make all of it queryable through a single API, accessible to an application in one very easy interface. So that's the “read” side.
And then on top of that, to dramatically oversimplify, are lots of different SDKs and dev tools that wrap these layers up and add features and components to make it really easy for developers to build on top. We are huge believers in the idea that devs really want to choose the right stack and the right permutations for themselves.
There are a bunch of messaging protocols for sharing messages between different applications in a decentralized way, including XMTP, EPNS, and others, as well as lots of analytics protocols that can use this data.
There are identity companies like Spruce and Privy that are adding more sophisticated privacy and permissioning models to all of this.
The last one is the wallet layer, which handles authentication: actually signing these messages and managing this data. Magic, Web3Auth, MetaMask, Phantom, and others are the most user-facing part of the stack.
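To make the “write” versus “read” split described above a bit more concrete, here is a toy sketch in Python. It is not Ceramic's API or any real indexer's API: `Commit`, `WriteLog`, and `Indexer` are hypothetical names, and signatures, persistence to storage networks such as Filecoin or Arweave, and networking are all left out. The write side keeps an append-only log of updates per stream; as an assumption standing in for real signature checks, only a stream's original author may extend it. The read side ingests records from several sources and answers queries by data model through one call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class Commit:
    """One update to a stream (hypothetical; real commits would be signed)."""
    stream_id: str
    model: str              # hypothetical data-model name, e.g. "BlogPost"
    author: str
    patch: Dict[str, Any]   # fields to set on the stream's current state


class WriteLog:
    """Toy "write" side: an append-only list of commits per stream."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[Commit]] = defaultdict(list)

    def write(self, commit: Commit) -> None:
        log = self._logs[commit.stream_id]
        # Assumption: only the stream's first author may update it
        # (a stand-in for real signature checks).
        if log and log[0].author != commit.author:
            raise PermissionError(f"{commit.author} does not control {commit.stream_id}")
        log.append(commit)

    def current_states(self) -> Iterable[Dict[str, Any]]:
        """Replay each stream's commits in order to derive its latest state."""
        for stream_id, log in self._logs.items():
            state: Dict[str, Any] = {}
            for commit in log:
                state.update(commit.patch)
            yield {"stream_id": stream_id, "model": log[0].model,
                   "author": log[0].author, **state}


class Indexer:
    """Toy "read" side: merges records from several sources behind one query call."""

    def __init__(self) -> None:
        self._by_model: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def ingest(self, source: str, records: Iterable[Dict[str, Any]]) -> None:
        # Tag each record with where it came from, grouped by data model.
        for record in records:
            self._by_model[record["model"]].append({"source": source, **record})

    def query(self, model: str, **filters: Any) -> List[Dict[str, Any]]:
        # Return every indexed record of this model whose fields match all filters.
        return [r for r in self._by_model.get(model, [])
                if all(r.get(k) == v for k, v in filters.items())]


log = WriteLog()
log.write(Commit("s1", "BlogPost", "alice", {"title": "Hello", "body": "Draft"}))
log.write(Commit("s1", "BlogPost", "alice", {"body": "Final"}))
log.write(Commit("s2", "Comment", "bob", {"post": "s1", "text": "Nice post"}))

indexer = Indexer()
indexer.ingest("write-log", log.current_states())
# A second, hypothetical source, e.g. records derived from a blockchain.
indexer.ingest("chain", [{"model": "Comment", "author": "carol",
                          "post": "s1", "text": "Tipped the author"}])

print([c["text"] for c in indexer.query("Comment", post="s1")])
# ['Nice post', 'Tipped the author']
```

One design point the sketch tries to capture: the write log never needs to know who will read its data, so any number of independent indexers (or apps) could ingest the same streams, which is one simple way to picture the composability discussed in this conversation.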
A: In closing, Danny, I'd love to know, are you recruiting for any interesting roles right now that perhaps members of the audience would like to learn about?
D: We're hiring a product lead: someone who has taken data products to market before. There are tons of interesting things to do around building this data ecosystem, as well as the data products this person would oversee. We're also hiring protocol engineers, whether from a cryptography or a distributed systems background, to help build out the core protocol, and a community lead to help architect and shape the Ceramic developer community, which has been growing more rapidly than we can keep up with. There are a bunch of other roles too. They're all on careers.3boxlabs.com.
A: Great, Danny, thank you so much for joining us today. This has been a really educational conversation. I learned a lot about how the data space, which I'm really interested in, might evolve in the future. And I wish you the best of luck in helping to champion that innovation.