Predict your next investment


See what CB Insights has to offer


Acquired | Acquired

About I.AM

I.AM is a hospitality group.

I.AM Headquarters Location

Las Vegas, Nevada, United States

Latest I.AM News

How I built this: Machine learning with Amazon Personalize and a Customer Data Platform

Jan 17, 2021

Simplify tracking code, improve performance, and eliminate vendor overhead with a single API for customer data.

By making off-the-rack machine learning models accessible for anyone to use, cloud ML services like Amazon Personalize help make ML-driven customer experiences available to teams at any scale. You no longer need in-house data science and machine learning experts to get the benefit of propensity scoring or product recommendations.

Key Challenges with Machine Learning

However, while models can be outsourced, your data can't. The effectiveness of machine learning insights will always be limited by the quality and completeness of the data they are based on. Cloud ML platforms (by themselves) leave three key challenges unsolved:

- Collecting and supplying quality user data to train and update your model
- Making the insights gained from the model available where they are needed
- Knowing how well your ML-driven experiences are working

These are infrastructure challenges, and one way they can be overcome is with a Customer Data Platform (CDP). The goal of a CDP is to get customer data from wherever it is, organize it into a single view of the customer, and make that view available to all services that need it. Instead of thinking about machine learning as just another data silo, a CDP can help you build machine learning insights into your core data infrastructure by connecting ML-driven learnings to additional external services for activation.

Let's dig into how a CDP can help you solve each of the three infrastructure challenges.

1. Collecting and Supplying Quality User Data

To train an ML model, you need accurate data about user behavior, and lots of it. Data quality can be broken down into three components:

- Identity Resolution - To be able to generate recommendations based on all actions of a user, you need to be able to resolve user identity.
Many off-the-rack ML solutions skip this requirement, tracking activity occurring on a particular device and calculating insights for that device only. This method is convenient, but it doesn't reflect a customer's true history of interaction with your brand across your digital properties and therefore can lead to incomplete insights. Identity resolution is a core capability of a CDP.
- Consistency across platforms - Once you solve the identity resolution challenge, you still need to map data from all those different sources to a single schema that you can use to train your model. This means bringing together multiple teams of developers across multiple languages and platforms to collect data under a single schema.
- Updating in real time - Finally, you need to upload all that data to your ML platform and keep on updating it in close to real time, or your recommendations will quickly become outdated.

2. Making ML Insights Available and Actionable

Just as the data that powers an ML model can come from any platform, the insights that machine learning models generate are most valuable when they can be used to power personalized experiences for your website, apps, brick-and-mortar stores, call centers, etc. Without modern customer data infrastructure, making ML actionable is a huge challenge.

For example, say you've used ML to generate churn risk scores for your customers. Without the ability to automatically connect those insights to additional systems:

- Can your call center automation system treat high-risk customers differently?
- Do your customer support representatives know when they're speaking with a high churn risk customer?
- Can your website surface retention offers?
- Can you segment on churn risk in your ESP?

Without the data connections provided by a CDP, making your ML scores available where they’re needed would require dedicated development work and additional cost.

3.

The Project: Personalized Product Recommendations

Items4U ("The finest items, which you will particularly enjoy") operates a retail business across its website, native iOS and Android apps, and network of brick-and-mortar stores throughout the country. Our challenge is that the sheer number of items we offer can make the shopping experience on our apps feel a little scattershot. I want to use ML to figure out which products I should focus on surfacing for each user.

By the end of this project, I'll have set up a mechanism to deliver personalized product recommendations to each user, which will automatically continue to grow and improve over time. I'll be using Amazon Personalize and mParticle as my Customer Data Platform. At the end of the project I'll be using Amplitude to measure success.

At a high level, the data flow looks like this:

- mParticle collects commerce data from my website, apps and stores.
- Each action is attributed to a master mParticle User ID and forwarded on to Amazon, using Amazon’s Kinesis streaming service.
- An AWS Lambda function converts the data into a format that can be used to train ML models and uploads it to Amazon Personalize.
- The same function requests custom product recommendations from Amazon Personalize, and uploads the recs back to a master customer profile in mParticle.
- The mParticle customer profile powers personalization on the Items4U website and apps, as well as making the same information available in my messaging and analytics platforms.

There’s a fair amount of work required to set up the AWS assets we need, but the good news is that most of it can be automated for subsequent iterations. For this reason, I’m using the AWS CLI and other scripting-friendly tools wherever possible.

In this post, we’ll walk through how to:

- Collect commerce event data through mParticle
- Create a Kinesis Stream and start streaming event data from mParticle
- Create a Personalize dataset group
- Create an AWS Lambda function to load data into my Personalize dataset group until I have enough data to train an ML model
- Create a Personalize campaign
- Update my Lambda function to request recommendations for each customer and store the recommendations on mParticle’s customer profile

Collect Data with a CDP

To train an ML model to give product recommendations, I need data about how my customers interact with products. Fortunately, I don't have to start from scratch just for ML. Capturing commerce data is a core function of mParticle, and by the time a retail brand like Items4U is ready to explore ML, the required data is already being captured and used for more basic use cases, like app analytics, segmentation and re-targeting.

When ready to begin integrating ML with a CDP, I've already:

- Set up inputs to collect data from the following channels: iOS, Android, Web, Custom Feed (Point of Sale), Custom Feed (Amazon Personalize)
- Added mParticle's client-side SDKs to my iOS, Android and Web apps, and configured my point-of-sale platform to forward purchase events to mParticle using the NodeJS server-side SDK.

Capture Product Interactions

mParticle uses a single standard schema for capturing commerce events, and this schema is enforced by the SDKs. This means I don't have to rely on individual developers on each platform picking the right event names. To my ML model, a purchase made through the iOS app will look the same as a purchase made on the website, or in-store. For example, here's how I would log a purchase on my web app.

    // 1. Create the product
    var product = mParticle.eCommerce.createProduct(
        'Skateboard', // Name
        'prsx-10',    // SKU
        100.00,       // Price
        1             // Quantity
    );

    // 2. Summarize the transaction
    var transactionAttributes = {
        Id: 'some-transaction-id',
        Revenue: 100,
        Tax: 9
    };

    // 3. Log the purchase event
    mParticle.eCommerce.logProductAction(
        mParticle.ProductActionType.Purchase,
        [product],
        null, // optional custom attributes would go here
        null, // optional custom flags would go here
        transactionAttributes
    );

What mParticle forwards to downstream services, like my ML model (stripped down to just the fields we care about), will look like this:

    {
        "mpid": 761556044463767215, // master user identity
        "environment": "production",
        "user_identities": {
            "email": ""
        },
        "user_attributes": {
            "$firstname": "Milo",
            "$lastname": "Minderbinder"
        },
        "events": [{
            "data": {
                "product_action": {
                    "action": "view_detail", // Other actions are "add_to_cart", "remove_from_cart", and "purchase"
                    "products": [{
                        "id": "prsx-10", // Product SKU
                        "price": 100
                    }]
                },
                "timestamp": 1604695231872
            },
            "event_type": "commerce_event"
        }]
    }

Identity Resolution

Ideally, my product interaction data is linked to a customer ID that works on my website, on my mobile apps and in-store. Here, that's the mParticle ID (MPID). mParticle's identity resolution allows me to gradually build up identities for each channel and resolve those identities to a single MPID.

For example, when a customer visits the website for the first time, I can link a cookie ID to the MPID. If the customer creates an account, I can add an email address, and perhaps a phone number. If they make an online purchase, I can add a secure hash of their credit card number. This means that if the same person then makes a purchase in a physical store with the same credit card, I can attribute that purchase to the same customer profile. This process lets me train my ML models based on a complete set of customer interactions.

Create the AWS Assets

- A Kinesis stream receives events from mParticle
- A Personalize campaign creates product recommendations
- A Lambda function acts as a broker.
It transforms data from mParticle into a format accepted by Personalize, and uploads product recommendations back to mParticle.
- IAM controls access and permissions for the other components.

These services can be configured in the AWS UI, but I'll be using Amazon's CLI tool. This way, I can reuse my work by creating a script to quickly spin up future iterations. I've followed Amazon's documentation to create an IAM user with access to the above four systems and log in to the console. As I go, I’ll need to save the Amazon Resource Name (ARN) for each asset I create. I’ll need these ARNs to set up interactions between the different resources I create.

Create a Kinesis Stream

Kinesis is a tool for processing streaming data. mParticle will forward commerce event data to Kinesis, where it will be picked up by the Lambda function I'll set up later.

1. Create the stream

Save the

2. Create a role for mParticle to assume

For mParticle to be able to upload to the Kinesis stream, I need to create an IAM role for mParticle to assume. This role needs a policy allowing PutRecord access to Kinesis (sample), and a trust policy (sample) allowing mParticle to assume the role.

    aws iam create-role --role-name mparticle-kinesis-role --assume-role-policy-document file:///path/to/mp-trust-policy.json
    aws iam put-role-policy --role-name mparticle-kinesis-role --policy-name mp-kinesis-put --policy-document file:///path/to/mp-kinesis-role.json

3. Connect mParticle to Kinesis

mParticle offers an "event" output for streaming event data to Kinesis. This can be set up and controlled from the mParticle dashboard. You can read an overview of event outputs in the mParticle docs.

Create Configuration

First, I need to create an overall configuration for Kinesis. This holds all the settings that will remain the same for every input I connect. Each mParticle integration requires different settings. For example, API keys are commonly required.
For Kinesis, I've already granted mParticle write access using IAM, so I only need to provide my AWS account number here.

Connect All Sources

Now I need to connect each of my four inputs: iOS, Android, Web and POS, to Kinesis.

Set Filters

mParticle lets me switch each individual event name on or off for a particular output, like Kinesis. These help me ensure that I'm only sending to Kinesis the data that I need to train my ML model. I'm interested in 4 types of commerce events:

- Add to cart

A Dataset Group can include up to three datasets:

- Items: Contains detail about products, including price, category, color, etc.
- Users: Contains detail about customers, like age, location, gender, etc.
- Interactions: Details interactions between users and items. For example, a user viewing a product, purchasing it, or adding it to a cart or wishlist.

Only the Interactions dataset is required, so to keep things simple it's the only one I'll use. I can come back later and improve future iterations of my model by adding other datasets.

Before I can create the dataset, I need a schema. For this example, I use the following elements:

- User ID

Train the Model

In order to train a Machine Learning solution, I need at least 1000 records in my dataset. One way to do this is to upload CSVs of historical events. mParticle integrates with several Data Warehouses, including Amazon Redshift. If I have access, I can easily create a training set from my past data. The CSV would look something like this:

    USER_ID,EVENT_TYPE,ITEM_ID,SESSION_ID,TIMESTAMP
    761556044463767215,view_detail,prsx-23,Q8bQC4gnO8J7ewB,1595492950
    -6907502341961927698,purchase,prsx-14,VA9AUJBhoJXAKr7,1595492945

However, training the model on historical data is not strictly required, and since data warehouse access is often tightly controlled, this step can be a huge bottleneck in attempts to implement ML. An alternative way to train the model is simply to start forwarding real-time event data as it comes in.
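
For the historical-CSV route, the rows could be produced from stored event batches with something like the following. This is only a sketch: it assumes batches shaped like the stripped-down sample payload shown earlier, the `sessionId` field on the batch is a hypothetical name used for illustration, and it assumes no value contains a comma (so no CSV escaping is done).

```javascript
// Sketch: turn mParticle-style event batches into Interactions CSV rows.
// Field names follow the stripped-down sample payload; `sessionId` is a
// hypothetical field, not a documented mParticle property.

const REPORT_ACTIONS = ["purchase", "view_detail", "add_to_cart", "add_to_wishlist"];
const HEADER = "USER_ID,EVENT_TYPE,ITEM_ID,SESSION_ID,TIMESTAMP";

// Returns one CSV row per (commerce event, product) pair in a batch.
function batchToRows(batch) {
  const rows = [];
  for (const e of batch.events || []) {
    if (e.event_type !== "commerce_event") continue;
    const action = e.data && e.data.product_action;
    if (!action || !REPORT_ACTIONS.includes(action.action)) continue;
    // The sample payload carries milliseconds; the CSV uses seconds.
    const seconds = Math.floor(e.data.timestamp / 1000);
    for (const product of action.products || []) {
      rows.push([batch.mpid, action.action, product.id, batch.sessionId, seconds].join(","));
    }
  }
  return rows;
}

// Joins the header and all rows into one CSV document.
function batchesToCsv(batches) {
  const rows = batches.flatMap(batchToRows);
  return [HEADER, ...rows].join("\n");
}

// MPIDs are kept as strings because they can exceed
// Number.MAX_SAFE_INTEGER and would lose precision as JS numbers.
const sampleBatch = {
  mpid: "761556044463767215",
  sessionId: "Q8bQC4gnO8J7ewB",
  events: [{
    event_type: "commerce_event",
    data: {
      product_action: { action: "view_detail", products: [{ id: "prsx-10", price: 100 }] },
      timestamp: 1604695231872
    }
  }]
};
console.log(batchesToCsv([sampleBatch]));
```

Keeping the row builder a pure function makes it easy to run over a warehouse export and spot-check a few rows before uploading anything.
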
To do this I need to set up my Lambda function. Eventually, the function will perform three tasks every time a new event is received at my Kinesis stream:

- Transform the mParticle data into my Interactions schema, and upload it to my Personalize dataset.
- Call my Personalize Campaign and ask for updated product recommendations for the user.
- Use the mParticle API to store the updated recommendations on the mParticle user profile.

However, since I can't create a Personalize Campaign until I can train a Solution, this first version of the Lambda performs only the first task, while I collect the minimum 1000 events.

Lambdas can use several different languages and runtimes. I'll use Node for mine. The first version looks like this:

    // Import Dependencies
    const AWS = require('aws-sdk');
    const JSONBig = require('json-bigint')({storeAsString: true}); // needed to parse 64-bit integer MPID

    // Define the product actions we want to report to Personalize
    const report_actions = ["purchase", "view_detail", "add_to_cart", "add_to_wishlist"];

    // Initialize Personalize
    const personalizeevents = new AWS.PersonalizeEvents({apiVersion: '2018-03-22'});

    exports.handler = (event, context) => {
        for (const record of event.Records) {
            // Parse encoded payload
            const payload = JSONBig.parse(Buffer.from(, 'base64').toString('ascii'));

            // Extract required params
            const events =;
            const mpid = payload.mpid;
            const sessionId = payload.message_id;
            const params = {
                sessionId: sessionId,
                userId: mpid,
                trackingId: process.env.TRACKER_ID // matches the TRACKER_ID variable set when the Lambda is created
            };

            // Get interactions from events array
            const eventList = [];
            for (const e of events) {
                if (e.event_type === "commerce_event" && report_actions.indexOf( >= 0) {
                    const timestamp = Math.floor( / 1000);
                    const action =;
                    const event_id =;
                    for (const product of {
                        const obj = { itemId:, };
                        eventList.push({
                            properties: obj,
                            sentAt: timestamp,
                            eventId: event_id,
                            eventType: action
                        });
                    }
                }
            }

            if (eventList.length > 0) {
                params.eventList = eventList;
                // Upload interactions to tracker
                personalizeevents.putEvents(params, function(err) {
                    if (err) console.log(err, err.stack);
                    else console.log(`Uploaded ${eventList.length} events`)
                });
            }
        }
    };

1. Create the IAM role: As before, I need to create an IAM role to grant my Lambda function the permissions it needs to access Kinesis and Personalize. The necessary trust policy can be found here.

    aws iam create-role \
        --role-name items4u-lambda-personalize-role \
        --assume-role-policy-document file:///path/to/lambda-trust-policy.json

Save the

    aws iam attach-role-policy \
        --role-name items4u-lambda-personalize-role \
        --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole
    aws iam attach-role-policy \
        --role-name items4u-lambda-personalize-role \
        --policy-arn arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess

2. Create the Lambda: To create the Lambda I need a zip file including the function itself, as well as its dependencies in the node_modules folder. I'll also need the mParticle API credentials for the Custom Feed I created for Amazon Personalize, which I supply as environment variables for the Lambda along with the Dataset Tracker ID.

    aws lambda create-function \
        --function-name Items4UPersonalizeLambda \
        --runtime nodejs12.x \
        --zip-file fileb:///path/to/ \
        --role {{role arn}} \
        --handler index.handler \
        --environment Variables="{MP_KEY=SomeAccessKey,MP_SECRET=SomeAccessSecret,TRACKER_ID=SomeTrackerID}"

3. Create an event-source mapping: Configure the Lambda to be triggered by new events received at the Kinesis stream.

    aws lambda create-event-source-mapping \
        --function-name Items4UPersonalizeLambda \
        --event-source-arn {{Kinesis stream arn}} \
        --starting-position LATEST

Wait...

By now, every time a commerce event is collected across any of my app platforms, mParticle is forwarding it to Kinesis. From here, the Lambda uploads the event to my Personalize dataset. Now I need to wait to get at least 1000 records loaded. This can take some time.
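
While the dataset fills up, the decoding and transformation steps can be exercised locally without touching AWS. The following is a minimal sketch, not the article's Lambda: it assumes the payload matches the stripped-down sample shown earlier (including its `timestamp` field), the helper names `decodeRecord` and `toEventList` are hypothetical, and it quotes the MPID with a regular expression instead of using `json-bigint` so that it has no dependencies. The one structural fact it relies on is that Lambda delivers each Kinesis record's data base64-encoded under `record.kinesis.data`.

```javascript
// Sketch: decode one Kinesis record and build Personalize event entries.
// Helper names are hypothetical; payload shape follows the sample payload.

const REPORTED = ["purchase", "view_detail", "add_to_cart", "add_to_wishlist"];

// MPIDs are 64-bit integers, so the first "mpid" number is quoted before
// JSON.parse to avoid precision loss. This is a simplification: it would
// also match the same pattern inside a string value.
function decodeRecord(record) {
  const text = Buffer.from(record.kinesis.data, "base64").toString("utf8");
  const safe = text.replace(/"mpid"\s*:\s*(-?\d+)/, '"mpid":"$1"');
  return JSON.parse(safe);
}

// One entry per product in each reported commerce event.
function toEventList(payload) {
  const list = [];
  for (const e of payload.events || []) {
    const action = e.data && e.data.product_action;
    if (e.event_type !== "commerce_event" || !action) continue;
    if (!REPORTED.includes(action.action)) continue;
    for (const product of action.products || []) {
      list.push({
        eventType: action.action,
        sentAt: Math.floor(e.data.timestamp / 1000), // milliseconds to seconds
        properties: { itemId: product.id }
      });
    }
  }
  return list;
}

// Build a fake record the same way Kinesis would deliver it.
const body = '{"mpid": 761556044463767215, "events": [{"event_type": "commerce_event", ' +
  '"data": {"product_action": {"action": "purchase", "products": [{"id": "prsx-10", "price": 100}]}, ' +
  '"timestamp": 1604695231872}}]}';
const fakeRecord = { kinesis: { data: Buffer.from(body).toString("base64") } };
const decoded = decodeRecord(fakeRecord);
console.log(decoded.mpid, toEventList(decoded));
```

Running this against a few captured payloads is a cheap way to confirm that the MPID survives parsing intact before comparing against what shows up in the logs.
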
In the meantime, I can check the logs in AWS CloudWatch to make sure the Lambda function is being invoked as expected.

Create an ML Campaign

A Personalize Campaign requires three components:

- A "Solution" which describes the particular ML recipe we want to use for the campaign. One dataset group can contain many solutions.
- A "Solution Version" is an instance of a Solution trained on a specific dataset.
- The "Campaign" is what will actually dispense product recommendations for a user.

1. Create a Solution:

    aws personalize create-solution \
        --name Items4URecsSolution \
        --dataset-group-arn {{dataset group ARN}} \
        --recipe-arn arn:aws:personalize:::recipe/aws-user-personalization

Save the
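
Once a campaign exists, the two remaining Lambda tasks described earlier (requesting recommendations and storing them on the customer profile) might be shaped roughly as follows. This is a sketch under stated assumptions: `client` is assumed to be an AWS SDK v2 `PersonalizeRuntime` instance passed in by the caller, `product_recs` is a hypothetical attribute name, and whether the mParticle API accepts the MPID as a string should be checked against the mParticle docs.

```javascript
// Sketch: shape Personalize recommendations into an mParticle-style update.
// `product_recs` is a hypothetical attribute name, not a documented one.

// Ask a campaign for recommendations for one user.
// `client` is assumed to be an AWS SDK v2 PersonalizeRuntime instance.
function requestRecommendations(client, campaignArn, userId, numResults = 5) {
  return client
    .getRecommendations({ campaignArn, userId: String(userId), numResults })
    .promise();
}

// GetRecommendations responds with an itemList of { itemId, ... } entries.
function recommendedItemIds(response) {
  return (response.itemList || []).map((item) => item.itemId);
}

// Build a batch that sets the recommendations as a user attribute.
function buildProfileUpdate(mpid, response, environment = "development") {
  return {
    environment,
    mpid: String(mpid),
    user_attributes: { product_recs: recommendedItemIds(response) }
  };
}

// Local example with a stubbed response, so no AWS call is made.
const fakeResponse = { itemList: [{ itemId: "prsx-23" }, { itemId: "prsx-14" }] };
console.log(buildProfileUpdate("761556044463767215", fakeResponse));
```

Separating the network call from the payload shaping keeps the part most likely to change (the attribute layout) testable on its own.
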
