All About CTV

Lookalike Audiences Algorithm with GA4 & Salesforce Data

By Mariusz Przydatek
Published in Code
March 19, 2025

Table Of Contents

  1. Algorithm Overview
  2. Example Code
  3. Explanation
  4. Addressing the Error
  5. Example Use Case
  6. Integration with Other Tools
  7. Best Practices

This article outlines an algorithm for creating lookalike audiences from standard Google Analytics and Salesforce data. It combines user behavior metrics from Google Analytics with customer demographics and purchase history from Salesforce to find users who resemble an existing (seed) audience.

Algorithm Overview

  1. Data Collection:
    • Google Analytics Data: Collect user behavior metrics such as page views, session duration, bounce rate, and goal completions.
    • Salesforce Data: Gather customer demographics (e.g., age, income) and purchase history.
  2. Data Integration:
    • Merge Google Analytics and Salesforce data on a common identifier (e.g., user ID).
  3. Feature Selection:
    • Select relevant features from the combined data, such as page views, session duration, bounce rate, goal completions, age, income, and purchase history.
  4. Data Standardization:
    • Use a scaler such as scikit-learn’s StandardScaler to give each feature zero mean and unit variance, so that no metric dominates the clustering simply because of its units.
  5. Clustering:
    • Apply clustering algorithms (e.g., K-Means) to group users based on their characteristics.
  6. Lookalike Audience Creation:
    • Identify the cluster(s) that most closely match your seed audience (e.g., existing customers).
    • Create the lookalike audience by selecting users from the matching cluster.
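
As a small illustration of step 4, the sketch below standardizes two features that live on very different scales. The column names and values are made up for the example and are not real GA4 or Salesforce fields.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample: income is in the tens of thousands, bounce rate is a percentage
df = pd.DataFrame({
    "income": [50000, 60000, 70000],
    "bounce_rate": [30, 20, 10],
})

# StandardScaler subtracts each column's mean and divides by its standard deviation
scaled = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

# After scaling, both columns have mean 0 and (population) standard deviation 1
print(scaled.round(3))
```

Without this step, a distance-based method such as K-Means would group users almost entirely by income, because its raw differences are orders of magnitude larger than those of bounce rate.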

Example Code

Here’s a simplified Python example that uses pandas and scikit-learn to create a lookalike audience:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans


def create_lookalike_audience(seed_data, google_analytics_data, salesforce_data, n_clusters=5):
    """
    Create a lookalike audience using seed data, Google Analytics, and Salesforce data.

    Parameters:
    - seed_data (list of dict): The seed audience; only user_id is used here.
    - google_analytics_data (list of dict): Google Analytics data with user behavior metrics.
    - salesforce_data (list of dict): Salesforce CRM data with customer demographics and purchase history.
    - n_clusters (int): Maximum number of K-Means clusters.

    Returns:
    - lookalike_audience (list of dict): Users in the cluster(s) that the seed audience falls into.
    """
    # Convert data to DataFrames
    seed_df = pd.DataFrame(seed_data)
    ga_df = pd.DataFrame(google_analytics_data)
    sf_df = pd.DataFrame(salesforce_data)

    # Merge Google Analytics and Salesforce data on a common identifier (e.g., user_id).
    # The seed data is not merged in: an inner join with it would drop every
    # non-seed user and duplicate shared columns (age becomes age_x and age_y).
    combined_df = pd.merge(ga_df, sf_df, on='user_id', how='inner')

    # Select relevant features for clustering, keeping only those present in the DataFrame
    desired_features = ['page_views', 'session_duration', 'bounce_rate', 'goal_completions', 'age', 'income', 'purchase_history']
    features = [f for f in desired_features if f in combined_df.columns]

    # Standardize the features
    scaler = StandardScaler()
    standardized_features = scaler.fit_transform(combined_df[features])

    # Apply K-Means clustering; there cannot be more clusters than users
    kmeans = KMeans(n_clusters=min(n_clusters, len(combined_df)), n_init=10, random_state=42)
    combined_df['cluster'] = kmeans.fit_predict(standardized_features)

    # Identify the most common cluster(s) among the seed users.
    # mode() returns a Series and may contain several values when there is a tie.
    is_seed = combined_df['user_id'].isin(seed_df['user_id'])
    seed_clusters = combined_df.loc[is_seed, 'cluster'].mode()

    # Select users from the matching cluster(s); this includes the seed users themselves
    lookalike_audience = combined_df[combined_df['cluster'].isin(seed_clusters)]

    # Return the lookalike audience as a list of dictionaries
    return lookalike_audience.to_dict(orient='records')


# Example input data
seed_data = [
    {'user_id': 1, 'age': 30, 'income': 60000, 'purchase_history': 5},
    {'user_id': 2, 'age': 25, 'income': 50000, 'purchase_history': 3}
]
google_analytics_data = [
    {'user_id': 1, 'page_views': 10, 'session_duration': 300, 'bounce_rate': 20, 'goal_completions': 2},
    {'user_id': 2, 'page_views': 8, 'session_duration': 250, 'bounce_rate': 30, 'goal_completions': 1},
    {'user_id': 3, 'page_views': 15, 'session_duration': 400, 'bounce_rate': 10, 'goal_completions': 3}
]
salesforce_data = [
    {'user_id': 1, 'age': 30, 'income': 60000, 'purchase_history': 5},
    {'user_id': 2, 'age': 25, 'income': 50000, 'purchase_history': 3},
    {'user_id': 3, 'age': 35, 'income': 70000, 'purchase_history': 7}
]

# Generate the lookalike audience (the sample has only three users, so use two clusters)
lookalike_audience = create_lookalike_audience(seed_data, google_analytics_data, salesforce_data, n_clusters=2)

# Output the lookalike audience
print(lookalike_audience)

Explanation

  1. Data Integration: The algorithm starts by merging Google Analytics and Salesforce data on a common identifier like user_id. This ensures that each user’s behavior metrics and demographic data are linked.
  2. Feature Selection: It selects relevant features from the combined data, such as page_views, session_duration, bounce_rate, goal_completions, age, income, and purchase_history.
  3. Data Standardization: The features are standardized using StandardScaler to ensure all metrics are on the same scale, which is crucial for clustering algorithms.
  4. Clustering: K-Means clustering is applied to group users based on their characteristics. The number of clusters (n_clusters) controls granularity: more clusters produce smaller, more tightly matched audiences, and the value cannot exceed the number of users.
  5. Lookalike Audience Creation: The algorithm identifies the cluster(s) that most closely match the seed audience (e.g., existing customers) and creates the lookalike audience by selecting users from these matching clusters.
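
One common way to choose the number of clusters, rather than fixing it by hand, is to compare silhouette scores across a few candidate values. The sketch below is one possible approach, not part of the algorithm above: pick_n_clusters is a hypothetical helper, and the data is synthetic and assumed to be already standardized.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_n_clusters(features, candidates=range(2, 7)):
    """Return the candidate k with the highest silhouette score (hypothetical helper)."""
    best_k, best_score = None, -1.0
    for k in candidates:
        # silhouette_score requires 2 <= k <= n_samples - 1
        if k < 2 or k >= len(features):
            continue
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(features)
        score = silhouette_score(features, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Synthetic data: three well-separated groups of 20 users each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [-5, 5]])
features = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

print(pick_n_clusters(features))
```

A higher silhouette score means users sit closer to their own cluster than to neighboring ones. Real behavioral data is rarely this cleanly separated, so treat the result as a starting point rather than a definitive answer.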

Addressing the Error

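If you encounter a KeyError indicating that certain features are not in the index, the feature list names a column that does not exist in the DataFrame. This usually means a field is missing from one of the sources, or that both merged tables share a column besides the join key, in which case pandas renames the duplicates with _x and _y suffixes (for example, age becomes age_x and age_y). Check combined_df.columns after the merge and adjust your feature selection accordingly.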
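
A quick way to diagnose this is to list which desired features are actually present after a merge. The sketch below uses a hypothetical helper, split_features, and two tiny made-up tables that share an age column.

```python
import pandas as pd


def split_features(df, desired_features):
    """Return (present, missing) feature names for a DataFrame (hypothetical helper)."""
    present = [f for f in desired_features if f in df.columns]
    missing = [f for f in desired_features if f not in df.columns]
    return present, missing


# Two tables that share 'age' besides the join key, so the merge suffixes it
left = pd.DataFrame({"user_id": [1], "age": [30]})
right = pd.DataFrame({"user_id": [1], "age": [30], "page_views": [10]})
merged = pd.merge(left, right, on="user_id")

print(merged.columns.tolist())   # ['user_id', 'age_x', 'age_y', 'page_views']
present, missing = split_features(merged, ["age", "page_views"])
print(present, missing)          # ['page_views'] ['age']
```

Logging the missing list before clustering makes it obvious when a feature has silently dropped out of the model.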


Example Use Case

This algorithm can be used to target new customers who resemble your existing customer base. For instance, if you’re a fashion brand, you can use this algorithm to identify potential customers who have similar browsing behaviors and demographics to your existing customers, thereby increasing the likelihood of conversion.
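
Because a cluster-based audience also contains the seed users themselves, targeting new customers usually means filtering existing customers out first. A minimal sketch, assuming the audience is a list of dictionaries with a user_id key; the function name and sample records are hypothetical:

```python
def exclude_existing_customers(audience, seed_ids):
    """Drop records whose user_id is already in the seed audience."""
    seed_ids = set(seed_ids)
    return [record for record in audience if record["user_id"] not in seed_ids]


# Hypothetical clustered output: users 1 and 2 are existing customers
audience = [
    {"user_id": 1, "cluster": 0},
    {"user_id": 2, "cluster": 0},
    {"user_id": 7, "cluster": 0},
]

print(exclude_existing_customers(audience, seed_ids=[1, 2]))  # [{'user_id': 7, 'cluster': 0}]
```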


Integration with Other Tools

To further enhance the effectiveness of your lookalike audiences, consider integrating them with other marketing tools like CRM systems or email marketing platforms. This allows you to automate workflows and personalize marketing campaigns based on the characteristics of your lookalike audience.
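
Many marketing tools accept audience uploads as CSV files, so one simple integration path is to serialize the audience records. The sketch below is only an illustration: audience_to_csv is a hypothetical helper, the field names are assumptions, and each platform defines its own required columns and format.

```python
import csv
import io


def audience_to_csv(audience, fields=("user_id", "cluster")):
    """Serialize audience records to CSV text, keeping only the given fields."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in fields
    writer = csv.DictWriter(buffer, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(audience)
    return buffer.getvalue()


# Hypothetical record; income is deliberately left out of the export
audience = [{"user_id": 7, "cluster": 0, "income": 65000}]
print(audience_to_csv(audience))
```

Restricting the export to an explicit field list keeps sensitive attributes such as income out of files shared with third-party tools.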


Best Practices

  • Regularly Review and Update Models: Ensure that your clustering models remain relevant by regularly reviewing and updating them based on new data.
  • Use Real-Time Data: Leverage real-time data from Google Analytics and Salesforce to keep your lookalike audiences accurate and up-to-date.
  • Collaborate Across Teams: Encourage collaboration between sales, marketing, and analytics teams to ensure that insights from lookalike audiences inform broader business strategies.
