Big Data is a means to an end, not an end in itself.
“Data tends to be overrated and intelligence tends to be underrated,” said Antoine Blondeau, CEO of Sentient Technologies, speaking at CB Insights’ Innovation Summit. “We don’t need all of the core pattern data, we need interaction with the system to deliver the right experiences to our users.”
In fact, the very idea of big data — of the most data winning — may be obsolete, according to Blondeau. “I completely disagree with the idea that the data hoarders will win. There is a lot of opportunity for smart systems to render the need to start with lots of data obsolete.”
Blondeau gave the example of AI algorithms making adjustments in real time, as opposed to making adjustments from historical data sets. For example, segmenting website presentation based on user actions.
Blondeau was joined on the panel by Janet George, chief data officer from Western Digital, and Peter Coles, head economist at Airbnb, in a discussion moderated by Heather Somerville of Reuters.
George disagreed with Blondeau’s assessment. In her experience data still comes first, “then Artificial Intelligence (AI). Because in the world of AI you need a lot of training data, and from the data you find a useful signal to train and re-train the algorithm.”
Coles agreed. Lots of data up front can be useful for tailoring user experiences. “The more data we have, the better we can make our customer experience on Airbnb.”
And, even if AI can help reduce the amount of data companies ingest, George still expects companies to continue ingesting lots of it. “Storage is so cheap, so you can practically afford to hoard data,” she said. Big data has become the go-to route for optimizing systems and processes, largely because it is so cost effective.
All three panelists agreed, however, that data collection systems and processes must adapt over time. For George, this means “consistently asking about new ways of collecting data, because the existing data … won’t always answer the questions you will have going forward. You can’t be afraid to scrap your old data collection and synthesis strategies, in order to answer the questions that come up as your organization evolves.”
Heather Somerville, Reporter, Reuters, Moderator: Good afternoon everybody. How are you doing? Good. All right, so we are gonna talk about big data and also continuing in a theme that’s run through the day, a bit about AI. So I’ve got three great panelists here to talk about both of those. So you’re each from very different industries. I’d be interested to hear from each of you, maybe starting with Janet, about how does your company use big data as a competitive advantage. How do you use it to satisfy customers, to drive profits, and to make the company run more smoothly?
Janet George, “Fellow” Chief Data Officer, SanDisk/Western Digital: Okay. So Western Digital is a leading storage company. We are a very traditional manufacturing company. Our journey with data started about three years ago, and unlike most companies that produce data after the product… So you create a product and then you produce data, we actually produce data and then we produce product out of that data because we have no choice. We are leaders in storage technology, we create new memory nodes, and because of these new memory nodes that we create, we have to be able to analyze the data and create more new memory nodes. So one data point: I think in 2020, we’re going to have 40 billion zettabytes of data and we’re gonna need a lot of storage. So that’s how we’re thinking about it. We have to deal with data, we have to confront data.
So my job when I came in…we actually started to think about what would it look like to have a data strategy that would take us into the next generation of data infrastructure, machine learning, all of it. So we started with the strategy, we implemented a completely new infrastructure, we started looking at our data very differently than we’ve ever looked at our data, and then we started machine learning and AI on that data. So now we are about three years into that journey and machine learning and AI is at the very basics of everything we do. I can say that it’s in our DNA.
Heather: Yeah, great. And Peter over at Airbnb, what do you do with big data to keep your guests coming back and beat the hotels and all the other travel businesses?
Peter Coles, Head Economist, Airbnb: Actually, we think of ourselves as growing the hospitality product.
Heather: Okay, excuse me.
Peter: Yeah, so just a show of hands here. How many people have used Airbnb as a guest? Okay. And as a host? All right. Good for you guys. As a host in Santa Barbara? Nobody, okay, all right. So Airbnb is a peer-to-peer marketplace where hosts are listing their spaces and guests who are interested in travel are staying with them. We think of…we’re certainly a technology company. We are a hospitality company, it is a community, and it is also a marketplace where we’re matching hosts and guests. And so the world that I come from, I came from academia before I came out to Silicon Valley, and the field that I studied, which is a growing field in academia, is called market design, and the question that is asked there is: what can you do to design your marketplace so that it functions as effectively as possible to give the best possible experience for your users, which in our case are hosts and guests?
So there are a lot of dimensions where we’re constantly looking to improve. The fundamental one is matching. How do you match the right guests with the right hosts? And that involves finding out what it is that the guests are looking for and what the characteristics of these various units are, because you have heterogeneity on both sides of the market, and that contrasts with a lot of other spaces like Uber, which is seeking to give you a ride from A to B in absolutely the most standard, commoditized way possible.
So we use data to understand host quality, guest quality, make sure you’re matched with high quality partners on the other side, thinking about pricing, our hosts. We don’t set prices, our hosts set prices, but we want to be able to give them the most informative recommendations possible, and so the more data we have about who’s matching with who and what prices will help you get bookings, the better those recommendations can be, and the list goes on and on and on. Trust and safety, advertising and new sectors that we’re going into now too.
Heather: Okay, great. Thanks. And Antoine, big data runs through and through in what you do, but maybe you can offer a concrete example of how you guys have used big data to get a new customer.
Antoine Blondeau, CEO, Sentient Technologies: So we use data obviously to train algorithms, AI algorithms, but I think very interestingly we tend to think of data as being the product sometimes of AI as opposed to the feeder into AI. Let me give you an example: ecommerce, intelligent commerce products. We use interaction with the user to improve the system in real time, to improve the AI in real time. And so all the data that you have to feed into the engines in the beginning is just one part of the equation. The other part of the equation is that as you plug the system online and you begin to learn from your user and you begin to evolve your system with the user to gather data on interactive processes, you progressively need less training data, and yet your system continues to become better.
Heather: Okay, great. One phenomenon that is really interesting to me is how regulators at all levels of government are lusting in a way after a lot of the big data that Silicon Valley technology companies have, and it’s created a lot of tension at times, whether this is with Airbnb or Uber or other marketplace type of companies. So I’m curious what the three of you think about how one, whether it’s your own company or another company, uses big data in dealings with regulators, how big data might help drive some policy decisions, and also if there’s any responsibility on behalf of Silicon Valley companies to hand over data that may have some sort of public good, and Peter, I’m gonna kick this one to you first for obvious reasons.
Peter: Okay. I hope other people answer this too because I’d be very interested to learn. The question about the responsibility is kind of an interesting one. I would say I was probably on my third week on the job when I got an email from a fairly prominent policy researcher and the first line was, “I want your data, and I want your money.” And I remember thinking, “Wow,” and then I saw she was kidding, but I really wasn’t sure if she really was kidding and it is interesting how there’s from many people’s perspectives, like here we are a company, we have this information, and it feels like it’s an absolute obligation to turn over everything that we have.
Obviously, at Airbnb just like probably every other private company in the world, the privacy of our users is paramount. I mean, people, they’re signing up with us, they’re giving us information about themselves, and they expect that we will keep that information as confidential as possible and we do. But at the same time, we are a part of the public economy and Airbnb’s relationship with cities is very important and we do have a responsibility to work with cities and we…back at the end of 2015, we issued something called the community compact, which was a framework of how we are interested in working collaboratively with cities, and one of the three tenets of that framework was to be transparent with our data to help cities make better decisions regarding home sharing and regulation surrounding home sharing. Because by the way, Airbnb is just as interested as municipal governments in arriving at sensible home sharing regulations so there’s clarity for our guests and our hosts about what the rules are, and we’re interested in getting to regulation that works for both our guests and our hosts and for cities so that we’re in a sustainable place. We don’t need to re-litigate the regulations every year.
So what does that mean? In many cases, a starting point might be, “Hey, here’s the information that we need, why don’t you just give us a list of everything? Why don’t you give us a list of all of your users and their addresses and their Social Security numbers and how much they’ve made, et cetera, et cetera, et cetera.” And the more you have conversations, the more you realize, what is the goal? What is the goal of this information? And by having open conversations you start to realize, okay, well, really you’re interested in how many people are in a commercial zone versus the non-commercial zone and you’re interested in totals, and if it turns out it’s only 10% better than the non-commercial zone then really the problem is a lot smaller than we think, and so providing aggregate data in an anonymous form can actually satisfy the needs of a city to make good decisions.
So in many cases, just having great conversations and a little bit of creativity can help solve these challenges without violating the privacy of our users, and there are many examples where this is happening.
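Editor's illustration: the aggregate, anonymous reporting Peter describes can be sketched in a few lines of Python. This is a minimal sketch and not Airbnb's actual process. The record fields, the zone names, and the `MIN_GROUP_SIZE` suppression threshold are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical listing records; the field names and values are invented for illustration.
LISTINGS = (
    [{"listing_id": i, "zone": "residential"} for i in range(1, 7)]
    + [{"listing_id": i, "zone": "commercial"} for i in range(7, 10)]
    + [{"listing_id": 10, "zone": "industrial"}]
)

# Assumed threshold: groups smaller than this are suppressed so that a
# published total cannot point at one or two individual hosts.
MIN_GROUP_SIZE = 3


def aggregate_by_zone(records, min_group_size=MIN_GROUP_SIZE):
    """Return per-zone counts and shares; identifiers never leave this function."""
    counts = Counter(record["zone"] for record in records)
    total = sum(counts.values())
    report = {}
    for zone, count in counts.items():
        if count < min_group_size:
            # Too few listings to publish safely, so report nothing for this zone.
            report[zone] = {"count": None, "share": None}
        else:
            report[zone] = {"count": count, "share": round(count / total, 2)}
    return report


if __name__ == "__main__":
    for zone, row in sorted(aggregate_by_zone(LISTINGS).items()):
        print(zone, row)
```

The report carries only zone-level totals, which is the kind of output a city could use to compare commercial and non-commercial activity. A real release would need more care than this, since a suppressed group can sometimes be inferred from the published total.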
Heather: Can you give one?
Peter: Yeah, I mean, I’ll actually call it, the Canadians have been particularly I would say diplomatic and just great people generally.
Heather: I feel like that’s cheating, using Canada as your example.
Peter: I don’t know, like the city of Vancouver for example was very interested in a lot of things like the level of activity, the level of activity in different neighborhoods, they were very interested in who was coming. Are these Canadians who are traveling? Are they international? Are they coming from the U.S.? Why were they interested in that, I’m not exactly sure, but it was very important. So they came up with a list of 10 things that were pretty important to them. Some of which were pretty…required a high level of detail, like how often are people booking? Are they booking 30 days, 60 days, 90 days, 180 days? In the discussions, we described that we would certainly not release anything that would offer individually identifiable information about guests and hosts, and they didn’t need it. So we offered this report and it’s publicly available on our website, and there were forums where they discussed the information as an input to the regulatory process, and it went from being a contentious situation to a situation where it’s very likely there will be some great mutually beneficial agreements.
Heather: And I think one example of how much local governments want this data can be seen in places like Santa Monica, where they got a bunch of Airbnb data through a third party, LAANE, a Los Angeles area advocacy group, and used that to drive some of their policy making. So it seems that if they’re not gonna get it from Airbnb, they’re gonna try to get it from somewhere else because there is such a need.
Peter: Right, and then often you’ll get data that is inaccurate in various ways and they’ll have to do their best with that data. So the more that we can offer the data directly probably the better it is for decision making.
Heather: And did Janet or Antoine want to weigh in…?
Janet: Yeah, I can talk about my experience at Yahoo. We collected a lot of data and we anonymized the data from the get go. So not even the employees got to actually see any of the PII. It was completely anonymized. So I think that was great because we didn’t get to see any user data or anything of that sort. What we did get to see was when the users opted in. So we would ask for opt in and when they actually opted in, we could actually see that data.
My second example was working with the city of New York, and I was so surprised when I did some machine learning use cases with the city of New York. I learnt that the city of New York had so much data, it kind of frightened me. They even had sewer data. So they could tell which households had a drinking problem. They had unemployment records, they had city lights on and off, they could correlate that to safety purchases that you did nearby. So there was an incredible amount of connective tissue that they had in terms of the data, but I believe when we did the analysis we completely anonymized that data. So there was no individual identification, but the data was available. I think the data is out there, and it’s our responsibility to make sure that the data is completely anonymized and there is no personally identifiable information.
Heather: Yeah, sure.
Antoine: If I can take this and try to talk about how we can turn the problem around a little bit here. Because talking about privacy, I think that data tends to be overrated and intelligence is underrated. Let me give you an example: if you walk into a brick-and-mortar store and you go and talk to a shoe salesperson, he or she will not need to know what you did yesterday, where you’re coming from, what people like you do, when he or she interacts with you on helping you select the shoe that you’re interested in. You just have to point to something on the shelf and say, “I like this.” And immediately that person is gonna say, “You like this? Let me show you optionality” around what you just talked about. That interaction here is a very, very important clue to your intent in the moment.
There is no reason why AI for example cannot do the same in real time. We’re actually doing some of that with our clients and so in that context, the emphasis is on intelligence and on privacy, where the interaction drives the understanding of the system, and you don’t need all this data that is the core data, the pattern data, what you did yesterday, what you did last month. You don’t need all of this. What you really need is to interact with the system, and that system will then deliver you what you need.
Heather: I think you’re getting to my next question, which is this interplay between AI and big data and how that appears to be changing. I mean, there has been this thought that big data always wins, but maybe we’re heading towards a place where AI comes first and big data comes later, and so how are you at Sentient Technologies doing that? And can you talk a little bit about this future that you see where big data isn’t a necessary starting point?
Antoine: For sure, there was a point made this morning that the data hoarders will win. I completely disagree with that. In fact, I think there’s a lot of opportunities for smart systems to actually obviate completely the need to start with a lot of data. Let me give you an example. We have a system that evolves websites in real time. So take a website and you guys probably know about this, the industry of conversion rate optimization is A/B testing. So think of this as not being A/B, but A/Z, or AI in fact here, where you’re able to test millions of combinations in real time just through interaction with users, and as you do that, you will see hour-by-hour, day-by-day, and week-after-week, the conversion ratios of your website improve dramatically. And not only improve across the board to your entire audience, but also progressively segment the presentation of your website to different types of users over time, and again, you didn’t start with any data, you just started with plugging a smart system or a self-running system online and grew from there, and we launched this five or six months ago. We have seen at least a 40% increase in conversion for each of our clients.
Heather: Janet, at SanDisk you guys have gone through a bit of an evolution from big data to AI as well. Do you want to talk about that?
Janet: Yeah, I think for me, my experience has been data comes first, then AI, because in the world of AI you need a lot of training data and you have…I was doing AI before big data and the difference was sample sets do not give you results that you can apply practically, especially to high volume manufacturing. You really need a lot of data, you need volume data, and so on the other side, when you don’t have data you’re dealing with the curse of sampling, and when you sample, those results are not scalable. On this side, when you have a lot of data you’re dealing with dimensionality issues, you need to figure out from that data what’s actually useful signal that you can then use to train, and you have to not just train but retrain and then relearn. So it’s a continuous process. You can’t just build an algorithm and say, “I’ve trained the algorithm and the algorithm is going to perform.” Very often it doesn’t. There are many, many outliers. False negatives, false positives. So you have to perfect the algorithm and that takes data to perfect. So I think one helps the other.
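Editor's illustration: the idea of improving a site from live interaction alone, with no historical data set, can be sketched with an epsilon-greedy multi-armed bandit. This is a swapped-in textbook technique and not Sentient's system, whose internals are not described in the talk. The variant names, the conversion rates, and the `epsilon` value are assumptions invented for the example.

```python
import random


class EpsilonGreedy:
    """Pick page variants from live feedback only; no training data up front."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon  # fraction of traffic spent exploring
        self.shows = {v: 0 for v in self.variants}
        self.conversions = {v: 0 for v in self.variants}
        self.rng = random.Random(seed)

    def rate(self, variant):
        """Observed conversion rate so far (0.0 if the variant was never shown)."""
        shown = self.shows[variant]
        return self.conversions[variant] / shown if shown else 0.0

    def choose(self):
        """Mostly show the best variant seen so far, sometimes a random one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        return max(self.variants, key=self.rate)

    def record(self, variant, converted):
        """Feed one visitor's outcome back into the running statistics."""
        self.shows[variant] += 1
        self.conversions[variant] += int(converted)


def simulate(true_rates, visitors=5000, seed=0):
    """Drive the bandit with simulated visitors; true_rates is hypothetical."""
    bandit = EpsilonGreedy(true_rates, epsilon=0.1, seed=seed)
    visitor_rng = random.Random(seed + 1)
    for _ in range(visitors):
        variant = bandit.choose()
        bandit.record(variant, visitor_rng.random() < true_rates[variant])
    return bandit


if __name__ == "__main__":
    bandit = simulate({"layout_a": 0.02, "layout_b": 0.05, "layout_c": 0.20})
    for variant in bandit.variants:
        print(variant, bandit.shows[variant], round(bandit.rate(variant), 3))
```

A bandit like this handles a handful of variants. Searching millions of page combinations, as described above, would need a different search strategy, and segmenting by visitor type would need context about the visitor in addition to the running totals.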
Peter: Just to add to that, I was going to say, so the question of not just relying on more data is better data. One of the things that can happen is you become over reliant on that and you spend less time thinking critically about basic decision making, basic marketplace design, and specifically, what types of information would be most useful. I came from eBay, I spent two years at eBay before coming to Airbnb, and at eBay, you have this funny property that we didn’t really have much of a catalog system. If you wanna put something for sale, it comes, it’s up for two weeks and it goes. Believe it or not, it was an incredibly hard problem to understand if two items were the same thing.
At Amazon, it’s really easy. These 30 guys are all selling this item, and in eBay, that was a machine learning challenge, a clustering challenge, to understand what was the same thing. There were teams of people who were trying to come up with their clustering algorithms to match things together. None of them were great, none of them even knew each other in the company, and it was a terrible user experience because you’re searching for something and you don’t even know where the other items are that match up with that. And so, at some point we had to make the tough business decision, believe it or not, to ask sellers for the manufacturer’s number, the UPC code, when you’re uploading your item. Which is like, why don’t you do that to begin with, but we were afraid to do that for a very long time because it added this additional friction. It was so easy to just put something up and sell it before, and if you ask for this extra information, we were afraid we’d lose a little business. And you would in the short run, but by collecting that information, that wiped out this massive need for all kinds of data collection and analysis, and in the long run eBay will be much better off.
Heather: Great. So AI is among the buzziest words in Silicon Valley today, that’s for sure, and CB Insights has a report out today that talks about venture investment for the quarter, and while it was a downturn for just about everybody, AI companies managed to raise $705 million, a 16% increase on a quarterly basis. So my question for you Antoine is, you’ve been working on AI for what’d you say, over 18 years, something like that?
Antoine: That’s right.
Heather: You guys have been fairly successful in your fund raising, $130 million approximately. So why the interest from VCs now?
Antoine: So when we started eight years ago, nobody was talking about AI, even at the time nobody was talking about big data. So can you imagine this eight years later? So when we started we had no idea that eight years later we’d be in this environment where AI is a buzz word. Why the interest? It’s a combination of three things. Algorithms have matured. There’s not been an enormous amount of innovation in algorithms, but they have matured, and slightly more people now than eight years ago know how to use them, number one. Number two, data is available in ever greater numbers, and number three is having more flexible, affordable compute available.
When we started eight years ago, we had to imagine creating an enormous amount or accessing an enormous amount of compute ourselves, and that’s how we actually created our infrastructure, which is one of the world’s largest now with about 2 million CPUs and 5,000 GPU cards, a massive system. But if we were to start now, we would actually be able to access and develop this a lot quicker through Amazon, Azure, GCE and so on and so forth. And that’s great, flexibility and affordability. So a combination of compute, data, and the algos together makes it possible now to have a meaningful impact quickly on businesses and consumers, where 15 years ago it was roughly impossible.
Heather: So big data is by definition big, so as more companies try to become big data companies recognizing the necessity for that, how do they know what data to collect? It is said that some 90% of the data collected by companies goes unanalyzed which strikes me as a waste of time and resources. So particularly from Janet and Peter, I’m interested to hear how do you decide what data to go after? And then Janet, you could talk a little bit about these categories that you have at SanDisk.
Janet: So we basically implemented a data lake and we decided actually…initially, we said we can analyze all the data we have. We started there, but what we found was we knew everything about the data we already were collecting and we had analyzed that. We had queried that data, we had some business intelligence around this. There was nothing really greatly new about the data we already had in existence, and the reason was because this data was already stuffed into some database, or it was stuffed into a data warehouse. It was already existing. And so we started to have to ask questions about what new data should we be collecting? What would that new data give us? How should we look at data differently? What will that data tell us?
And so when we created the data lake, we started bringing all the data and storage is so cheap that you can practically afford to collect hoards of data and in the manufacturing world, think internet of things. So you have these test equipments that are generating highly variable bits of data, near real time and you just dump them, you ingest them through machine learning algorithms into your data lake and then from your data lake as you are doing analysis up the Hadoop stack, you can see the movement of data, you know very quickly that in 30 days what’s hot, cold, or warm and you can move the data out of the data lake into some traditional archive. So you can be very efficient in your systems as you collect the data, but I think it’s important to think about new ways of collecting data because the existing data we have sometimes won’t give us the answers we need going forward.
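Editor's illustration: the hot, warm, and cold sorting Janet mentions can be sketched as a rule on how recently each data set was accessed. This is a minimal sketch under stated assumptions, not Western Digital's pipeline. Only the 30-day horizon comes from her remarks; the 7-day hot window, the data set names, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=7)    # assumed: touched within the last week
WARM_WINDOW = timedelta(days=30)  # the 30-day horizon mentioned in the talk


def classify(last_access, now):
    """Label a data set hot, warm, or cold by time since its last access."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


def plan_archive(last_access_by_dataset, now):
    """Return the names of cold data sets, the candidates to move to an archive."""
    return sorted(
        name
        for name, last_access in last_access_by_dataset.items()
        if classify(last_access, now) == "cold"
    )


if __name__ == "__main__":
    now = datetime(2017, 1, 31)
    # Hypothetical access log: data set name -> time of most recent read.
    access_log = {
        "tester_line_3": datetime(2017, 1, 30),
        "wafer_probe_q4": datetime(2017, 1, 10),
        "legacy_yield_2015": datetime(2016, 11, 1),
    }
    for name, last_access in access_log.items():
        print(name, classify(last_access, now))
    print("archive:", plan_archive(access_log, now))
```

Tracking only the last access time keeps the rule cheap to evaluate. A production system would more likely weigh access frequency and storage cost as well.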
Peter: That was very interesting. Actually, at Airbnb we also recently released some visual tools that help us see which fields are accessed the most and, interestingly, which people are accessing which field, which has the by-product of learning sort of what data is the most important and who’s working the hardest, at least on that particular data field.
But I think the evolution at Airbnb as I understand it, I’ve only been there for a year and a quarter, is that in the olden days, people would query the production databases, so the data that was collected was just exactly what the early engineers needed to run their business. So that was step one. And then step two was, you poured everything out into dedicated databases for the purposes of analytics, and you duplicate things. And then, after some ad hoc work in deciding what are some things that we need to better understand our business, Airbnb had this big off site with the analysts and engineers who were there at the time and came up with something that was called core data. And so there, it was a common set of metrics that the entire company agrees upon, like what they mean, what the definition is, queries, documentation, and so now that’s grown and we have a team of data architects who are responsible for just maintaining that core data, and since I joined we added core policy data. So geographic data that regulators might be interested in.
And so but in the meantime, the other way…another big way that you find out what data you might need is product managers. When they’re offering a new product you think very carefully about how are we going to, one, understand how people are using this data? How people are clicking through the website. That should be very high dimensional, very complex early on, and so you will specify and we recently overhauled our logging system. Here’s what we want to follow specifically about how users are interacting in this product, and then that data will be documented, available to others and will be part of our architecture.
Janet: I have one more example.
Janet: So one example I was talking about was HR came to me and said, “Can you do a very simple prediction problem?” And I’m like, “What is the simple prediction problem?” “Can you predict employee performance?”
We’ve all been through this, we can predict employee performance. So the first question is what data do you have and the data I was given was very traditional data that we collect in our success factors or traditional business processes. Managers calibrate employees on performance and then that’s the data I’m given to predict how employees are performing. So machine learning does its job, it predicts exactly what the manager predicted and so I take the data back to the managers and they say, “What did machine learning do that I didn’t already know?” I predicted what my employee performance looks like and machine learning did exactly the same thing. And so that was the start of a whole different conversation. Great, machine learning did what you did.
Peter: Fire all the managers. No, no, no, I’m kidding, I was kidding.
Janet: A better conversation is around what data should we be collecting. We should not allow the managers to calibrate, let machine learning do the calibration, but what data do the managers use for calibration and where are the biases in the calibration? That’s the conversation we need to have. And so we started talking about, how do we capture…we got rid of the rating system completely, and we said, “What do we need to talk about employee performance?” Goals, impacts, how does the employee do with teams. Different characteristics, different features that we would want to look at for calibration; that is where the conversation led.
So we completely got rid of the existing mechanism of how we were calibrating employees through managers taking that data and using that for prediction and bringing up completely different data, collecting that data and then doing calibration.
Heather: Thank you. So I’m just gonna bring this full circle to this morning’s discussion about Alexa, and the creepy factor, and all the data being collected about us as consumers. Increasingly companies are collecting more and more data about their consumers, whether it’s hosts and guests, whether it’s the shoppers that ecommerce companies that you work with are collecting data on… So I’d like to hear from you what additional safeguards need to be in place to make sure that that data is being properly stored and shared as it ought to be. In addition, what sort of backlash do you see from consumers against companies and data collection? Is that something that will just go away because we’ll all just be resigned to the fact that everybody’s gonna know about us? Or do you see that tension lasting? Yeah, please.
Antoine: So perhaps I can start with this: the eventual line of defense is to have the same amount of intelligence available to the consumer as is available to the large consumer-focused companies, e-tailers, travel sites, and what have you. So interestingly in fact, the chatbots and the Alexas and Echos of the world are the beginning of this. Today they are owned by large corporations mostly, but eventually if they become more pervasive and become your advocate, your consumer advocate, then you can be shielded. You can actually interact with the web and interact with services on your terms, not on someone else’s terms.
So if we can democratize access to AI for example and give it to consumers and make it not the domain of the large firms, then we can have our avatar, our proxy navigating the web and interacting with the services on our terms, and I think that’s one thing.
Heather: Can you unpack that just a little more, Antoine? Because that’s really interesting, but how would that work for an average consumer who’s shopping online? What will that look like?
Antoine: You’ll have your own robot that does your business and you will tell it what it needs to know about and you’re in complete control, and I think that is a distinct possibility as long as these types of technologies are made available to consumers, and I think it will…
Heather: And on a big enough scale, right? Like through Google.
Antoine: But if you leave it to Google, Amazon, and Facebook of course it won’t be, but if startups actually develop these types of technologies that then interface with the big guys, then it is a distinct possibility. That’s on the intelligent side, and I think it’s realistic but it’s work to do.
Heather: Great. And then Peter, I know that you guys deal with privacy a lot, it’s one of your…
Peter: I can’t speak to the technical storage of the data and security issue, that’s not my field, but since a lot of what I do is research. That’s another area where information we have about consumers, the privacy is absolutely paramount, and for us it’s like a cardinal rule that anything that we release in research results will just involve aggregate information and anonymized data. And a lot of this research is actually incredibly useful to consumers.
For example, one of the things that we did is we looked at the demographics of our guests and our hosts, and unlike most tech companies out there, our best hosts on Airbnb are seniors, 60 and over, 65 and over. In particular, senior women, and that’s just like an incredibly cool result, to see a tech company where the seniors are ahead of the game versus millennials, and there’s a lot to learn. The millennials can learn from seniors. And so we do this research, we shared it with AARP, AARP is excited about it, and now we’re figuring out how to make that experience even better. But there’s a lot more research that we could put out if we thought about things at the individual level, but doing things at the aggregate is just mission critical for us.
Heather: Yeah, it’s also a matter of consumers trusting the companies themselves. If you wanna take Uber as an example, they fight tooth and nail against releasing data that they think will have personal information and yet there’s been ample reporting on their God View, which is a massive invasion of privacy. So I think there is also that tension within companies as well. I don’t know if you guys have had to work to deal with any of that sort of tension internally at Airbnb?
Peter: Yes, I mean it’s interesting like…I’m gonna have to think more carefully about this, but I spend all of my time working with data internally and I just think it’s very unlikely that there is anything that we’re collecting about consumers that they would be surprised that we know. It’s like where people are booking, where they’re staying, their patterns, do they like beaches and so forth. We have that information. It’s very useful to help them, the next time they’re searching on Airbnb, find a great place. So I haven’t seen as much of that conflict as…I mean, we’ve been missing it.
Heather: Last thoughts, Janet.
Janet: So, you know, my take: it’s amazing to see how little security we had before big data, because we would throw our data into a database, and databases are subject to security. So if you break through the firewall you’re instantly in the database and you can filter all your data out, but it was not scary to us. We had all our data in different databases, it was siloed off and the data was sitting there. What’s scary now is that with big data, you can aggregate all the silos, and then when you look at all this aggregated data, it scares the living bejesus out of you. You’re like, “Wow. They had that information about me from the time I was born and where I lived and who I work with?” And everything, it’s really scary, but technology has improved so what you can do now is you can go beyond perimeter security. You can do stack level security, you can do application layered security.
So when you do these kinds of layered security beyond perimeter security, it’s almost impossible to get your data once it’s in a platform and if the stack is properly secured and if it’s on a private cloud. Public cloud versus private cloud, there are many pros and cons to both, but if you make the right architecture choices, you can secure the data technologically far better than you ever did before, but a lot more of your data is out there and a lot more people are aggregating that data so that’s where the risk is. And if your data is aggregated and you’re surprised with it, and it’s intrusive, it’s really bad. But if it’s actually aggregated and it’s turned out to be useful for you, the consumers don’t mind it as much.
Heather: Great. Thank you. I’ve gone a couple minutes over but I’d like to see if there are questions.
Cameron McCurdy, Moderator, CB Insights: Yeah, we’re running a little bit over so we’re just gonna ask one quick question.
Cameron: What’s the mistake that your companies encounter when it comes to relying on or analyzing data? Has it ever led you astray or been misinterpreted?
Heather: Have you ever been led astray?
Peter: I’m trying to think about this. I mean, I don’t know if this is… This might be a mistake, but so the pricing very early on, what we did was we offered a pricing tool where when you listed your home, you got a suggestion right at the beginning. That was better than nothing. The version before that was laissez faire, and like let people fend for themselves and figure out what their place is worth, but there was no updating after that. And that led to a lot of funny situations where the Super Bowl would be coming to town to Houston next week and if you weren’t a sports fan, you might not know about that, and your room might be booked out for $100 instead of three or four times that much, and so the hosts sometimes were unhappy with those recommendations and often there, like, wasn’t a very good match in the market of supply and demand. So it took a little while to realize how big of a problem that was, and then we finally overhauled it to make these recommendations dynamic so you can set and forget and let them adjust according to supply and demand signals.
Antoine: Dynamic is the keyword here, dynamic, because that’s how effectively you can use what you have, a system to learn progressively how to adapt, and based on feedback improve over time and the key is to make the systems dynamic.
Peter: Oh yeah, and actually I do also want to say one more thing, which is that people make individual mistakes all the time, all the time, and in fact internally on our team, the data science team, we have something called “air fails,” and this is to celebrate the fact that sometimes you need to perform some analysis and make a mistake, and every quarter the people who did the craziest things, they win a prize. So for example, there was one person who didn’t realize that they were querying the production database and they couldn’t understand why their results were so quick, and you wonder if all of the…any performance slowness might be correlated to how hard that person was working that Saturday, and so after a while some data engineer realized that that was a hole and had to fix it, and said we’re back in the clear, but mistakes are very common and in fact are encouraged.
Heather: On that note, thank you everybody, thank you Janet, thank you Peter, thank you Antoine. Brilliant discussion.