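Hands-On Data Analysis: Digital Marketing Conversion Analysis
Objectives
Customer conversion prediction: build a model from the data to predict how likely each customer is to convert
Channel and campaign analysis: assess how different marketing channels and campaign types affect customer conversion
Key factor identification: analyze which factors contribute most to customer engagement and conversion
Advertising and strategy optimization: use the results to adjust ad spend and marketing strategy to improve ROI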
# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, roc_auc_score, roc_curve
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
import warnings
import matplotlib
import platform
from matplotlib import font_manager

warnings.filterwarnings('ignore')


# Configure a font that can render the Chinese chart labels
def setup_chinese_font():
    """Pick a matplotlib font that supports Chinese; works on Windows, macOS and Linux."""
    system = platform.system()
    available_fonts = [f.name for f in font_manager.fontManager.ttflist]

    if system == 'Windows':
        font_candidates = ['Microsoft YaHei', 'SimHei', 'KaiTi', 'FangSong', 'SimSun',
                           'Microsoft JhengHei', 'PMingLiU', 'MingLiU']
    elif system == 'Darwin':  # macOS
        font_candidates = ['Arial Unicode MS', 'PingFang SC', 'STHeiti', 'Heiti SC',
                           'STSong', 'Songti SC']
    else:  # Linux
        font_candidates = ['WenQuanYi Micro Hei', 'WenQuanYi Zen Hei', 'Noto Sans CJK SC',
                           'Droid Sans Fallback', 'AR PL UMing CN']

    # Use the first candidate that is actually installed
    selected_font = None
    for font in font_candidates:
        if font in available_fonts:
            selected_font = font
            break

    if selected_font:
        plt.rcParams['font.sans-serif'] = [selected_font] + plt.rcParams['font.sans-serif']
        print(f"✓ 成功设置中文字体: {selected_font}")
    else:
        # Fall back to any installed font whose name looks like a Chinese font
        chinese_fonts = [f for f in available_fonts
                         if any(keyword in f for keyword in
                                ['YaHei', 'SimHei', 'Hei', 'Song', 'Kai', 'Fang', 'Ming', 'PingFang', 'Noto'])]
        if chinese_fonts:
            selected_font = chinese_fonts[0]
            plt.rcParams['font.sans-serif'] = [selected_font] + plt.rcParams['font.sans-serif']
            print(f"✓ 找到并设置中文字体: {selected_font}")
        else:
            plt.rcParams['font.sans-serif'] = ['DejaVu Sans', 'Arial', 'sans-serif']
            print("⚠ 未找到合适的中文字体,图表中的中文可能显示为方块")

    plt.rcParams['axes.unicode_minus'] = False
    plt.rcParams['font.size'] = 10
    return selected_font


# Apply the font settings
chinese_font = setup_chinese_font()

# Set the chart style; matplotlib raises OSError for an unknown style name
sns.set_style("whitegrid")
try:
    plt.style.use('seaborn-v0_8-darkgrid')
except OSError:
    try:
        plt.style.use('seaborn-darkgrid')
    except OSError:
        plt.style.use('default')

if chinese_font:
    sns.set(font_scale=1.0, font=chinese_font)
else:
    sns.set(font_scale=1.0)

print(f"当前使用字体: {chinese_font if chinese_font else '默认字体(可能无法显示中文)'}")
✓ 成功设置中文字体: Microsoft YaHei
当前使用字体: Microsoft YaHei
Data Loading and Exploration
# Load the data
df = pd.read_csv('digital_marketing_campaign_dataset.csv')

# Show basic information about the data
print("=" * 60)
print("数据基本信息")
print("=" * 60)
print(f"数据形状: {df.shape}")
print(f"行数: {df.shape[0]}, 列数: {df.shape[1]}")

print("\n" + "=" * 60)
print("数据前5行")
print("=" * 60)
print(df.head())

print("\n" + "=" * 60)
print("数据信息")
print("=" * 60)
print(df.info())

print("\n" + "=" * 60)
print("数据描述性统计")
print("=" * 60)
print(df.describe())
============================================================
数据基本信息
============================================================
数据形状: (8000, 20)
行数: 8000, 列数: 20

============================================================
数据前5行
============================================================
   CustomerID  Age  Gender  Income CampaignChannel CampaignType      AdSpend  \
0        8000   56  Female  136912    Social Media    Awareness  6497.870068
1        8001   69    Male   41760           Email    Retention  3898.668606
2        8002   46  Female   88456             PPC    Awareness  1546.429596
3        8003   32  Female   44085             PPC   Conversion   539.525936
4        8004   60  Female   83964             PPC   Conversion  1678.043573

   ClickThroughRate  ConversionRate  WebsiteVisits  PagesPerVisit  TimeOnSite  \
0          0.043919        0.088031              0       2.399017    7.396803
1          0.155725        0.182725             42       2.917138    5.352549
2          0.277490        0.076423              2       8.223619   13.794901
3          0.137611        0.088004             47       4.540939   14.688363
4          0.252851        0.109940              0       2.046847   13.993370

   SocialShares  EmailOpens  EmailClicks  PreviousPurchases  LoyaltyPoints  \
0            19           6            9                  4            688
1             5           2            7                  2           3459
2             0          11            2                  8           2337
3            89           2            2                  0           2463
4             6           6            6                  8           4345

  AdvertisingPlatform AdvertisingTool  Conversion
0            IsConfid      ToolConfid           1
1            IsConfid      ToolConfid           1
2            IsConfid      ToolConfid           1
3            IsConfid      ToolConfid           1
4            IsConfid      ToolConfid           1

============================================================
数据信息
============================================================
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8000 entries, 0 to 7999
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   CustomerID           8000 non-null   int64
 1   Age                  8000 non-null   int64
 2   Gender               8000 non-null   object
 3   Income               8000 non-null   int64
 4   CampaignChannel      8000 non-null   object
 5   CampaignType         8000 non-null   object
 6   AdSpend              8000 non-null   float64
 7   ClickThroughRate     8000 non-null   float64
 8   ConversionRate       8000 non-null   float64
 9   WebsiteVisits        8000 non-null   int64
 10  PagesPerVisit        8000 non-null   float64
 11  TimeOnSite           8000 non-null   float64
 12  SocialShares         8000 non-null   int64
 13  EmailOpens           8000 non-null   int64
 14  EmailClicks          8000 non-null   int64
 15  PreviousPurchases    8000 non-null   int64
 16  LoyaltyPoints        8000 non-null   int64
 17  AdvertisingPlatform  8000 non-null   object
 18  AdvertisingTool      8000 non-null   object
 19  Conversion           8000 non-null   int64
dtypes: float64(5), int64(10), object(5)
memory usage: 1.2+ MB
None

============================================================
数据描述性统计
============================================================
        CustomerID          Age         Income      AdSpend  ClickThroughRate  \
count   8000.00000  8000.000000    8000.000000  8000.000000       8000.000000
mean   11999.50000    43.625500   84664.196750  5000.944830          0.154829
std     2309.54541    14.902785   37580.387945  2838.038153          0.084007
min     8000.00000    18.000000   20014.000000   100.054813          0.010005
25%     9999.75000    31.000000   51744.500000  2523.221165          0.082635
50%    11999.50000    43.000000   84926.500000  5013.440044          0.154505
75%    13999.25000    56.000000  116815.750000  7407.989369          0.228207
max    15999.00000    69.000000  149986.000000  9997.914781          0.299968

       ConversionRate  WebsiteVisits  PagesPerVisit   TimeOnSite  \
count     8000.000000    8000.000000    8000.000000  8000.000000
mean         0.104389      24.751625       5.549299     7.727718
std          0.054878      14.312269       2.607358     4.228218
min          0.010018       0.000000       1.000428     0.501669
25%          0.056410      13.000000       3.302479     4.068340
50%          0.104046      25.000000       5.534257     7.682956
75%          0.152077      37.000000       7.835756    11.481468
max          0.199995      49.000000       9.999055    14.995311

       SocialShares   EmailOpens  EmailClicks  PreviousPurchases  \
count   8000.000000  8000.000000  8000.000000        8000.000000
mean      49.799750     9.476875     4.467375           4.485500
std       28.901165     5.711111     2.856564           2.888093
min        0.000000     0.000000     0.000000           0.000000
25%       25.000000     5.000000     2.000000           2.000000
50%       50.000000     9.000000     4.000000           4.000000
75%       75.000000    14.000000     7.000000           7.000000
max       99.000000    19.000000     9.000000           9.000000

       LoyaltyPoints   Conversion
count    8000.000000  8000.000000
mean     2490.268500     0.876500
std      1429.527162     0.329031
min         0.000000     0.000000
25%      1254.750000     1.000000
50%      2497.000000     1.000000
75%      3702.250000     1.000000
max      4999.000000     1.000000
Data Quality Checks
# Check for missing values
print("=" * 60)
print("缺失值检查")
print("=" * 60)
missing_values = df.isnull().sum()
missing_percent = (missing_values / len(df)) * 100
missing_df = pd.DataFrame({
    '缺失数量': missing_values,
    '缺失百分比': missing_percent
})
missing_df = missing_df[missing_df['缺失数量'] > 0]
if len(missing_df) > 0:
    print(missing_df)
else:
    print("✓ 数据中没有缺失值")

# Check for duplicate rows
print("\n" + "=" * 60)
print("重复值检查")
print("=" * 60)
duplicate_count = df.duplicated().sum()
print(f"重复行数: {duplicate_count}")
if duplicate_count > 0:
    print("⚠ 发现重复数据,建议进一步检查")
else:
    print("✓ 数据中没有重复值")

# Check the distribution of the target variable
print("\n" + "=" * 60)
print("目标变量分布")
print("=" * 60)
conversion_counts = df['Conversion'].value_counts()
conversion_percent = df['Conversion'].value_counts(normalize=True) * 100
print("转化情况统计:")
for idx, val in conversion_counts.items():
    status = "已转化" if idx == 1 else "未转化"
    print(f" {status}: {val} ({conversion_percent[idx]:.2f}%)")

# Check the data types
print("\n" + "=" * 60)
print("数据类型检查")
print("=" * 60)
print(df.dtypes)
============================================================
缺失值检查
============================================================
✓ 数据中没有缺失值

============================================================
重复值检查
============================================================
重复行数: 0
✓ 数据中没有重复值

============================================================
目标变量分布
============================================================
转化情况统计:
 已转化: 7012 (87.65%)
 未转化: 988 (12.35%)

============================================================
数据类型检查
============================================================
CustomerID               int64
Age                      int64
Gender                  object
Income                   int64
CampaignChannel         object
CampaignType            object
AdSpend                float64
ClickThroughRate       float64
ConversionRate         float64
WebsiteVisits            int64
PagesPerVisit          float64
TimeOnSite             float64
SocialShares             int64
EmailOpens               int64
EmailClicks              int64
PreviousPurchases        int64
LoyaltyPoints            int64
AdvertisingPlatform     object
AdvertisingTool         object
Conversion               int64
dtype: object
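The target is heavily imbalanced (87.65% converted), which matters when reading any accuracy figure later on. As a minimal sketch using only the counts printed above, the snippet below computes the accuracy of a trivial classifier that always predicts "converted". The variable names are illustrative and not part of the original notebook.

```python
# Counts copied from the target-distribution output above.
converted = 7012
not_converted = 988
total = converted + not_converted

# A "model" that always predicts the majority class is right exactly
# as often as that class occurs in the data.
baseline_accuracy = max(converted, not_converted) / total
imbalance_ratio = converted / not_converted

print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
print(f"Converted : not converted = {imbalance_ratio:.1f} : 1")
```

Any model accuracy should be judged against this baseline rather than against 50%.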
Exploratory Data Analysis (EDA)
# Visualize the distribution of the target variable
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Pie chart
# sort_index() orders the counts as 0, 1 so they line up with the labels below;
# value_counts() alone sorts by frequency, which would put the converted class first.
conversion_counts = df['Conversion'].value_counts().sort_index()
labels = ['未转化', '已转化']
colors = ['#ff9999', '#66b3ff']
axes[0].pie(conversion_counts.values, labels=labels, autopct='%1.1f%%',
            colors=colors, startangle=90)
axes[0].set_title('客户转化分布', fontsize=14, fontweight='bold')

# Bar chart
axes[1].bar(labels, conversion_counts.values, color=colors)
axes[1].set_title('客户转化数量', fontsize=14, fontweight='bold')
axes[1].set_ylabel('客户数量')
for i, v in enumerate(conversion_counts.values):
    axes[1].text(i, v, str(v), ha='center', va='bottom', fontsize=12)

plt.tight_layout()
plt.show()

print(f"转化率: {(df['Conversion'].sum() / len(df) * 100):.2f}%")
转化率: 87.65%

# Distribution of the numeric variables
numeric_cols = ['Age', 'Income', 'AdSpend', 'ClickThroughRate', 'ConversionRate',
                'WebsiteVisits', 'PagesPerVisit', 'TimeOnSite', 'SocialShares',
                'EmailOpens', 'EmailClicks', 'PreviousPurchases', 'LoyaltyPoints']

fig, axes = plt.subplots(4, 4, figsize=(20, 16))
axes = axes.ravel()
for i, col in enumerate(numeric_cols):
    axes[i].hist(df[col], bins=30, edgecolor='black', alpha=0.7)
    axes[i].set_title(f'{col} 分布', fontsize=10)
    axes[i].set_xlabel(col)
    axes[i].set_ylabel('频数')
    axes[i].grid(True, alpha=0.3)

# Hide the unused subplots
for i in range(len(numeric_cols), len(axes)):
    axes[i].axis('off')

plt.tight_layout()
plt.show()

# Distribution of the categorical variables
categorical_cols = ['Gender', 'CampaignChannel', 'CampaignType']

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for i, col in enumerate(categorical_cols):
    value_counts = df[col].value_counts()
    axes[i].bar(range(len(value_counts)), value_counts.values, color='steelblue')
    axes[i].set_xticks(range(len(value_counts)))
    axes[i].set_xticklabels(value_counts.index, rotation=45, ha='right')
    axes[i].set_title(f'{col} 分布', fontsize=12, fontweight='bold')
    axes[i].set_ylabel('数量')
    axes[i].grid(True, alpha=0.3, axis='y')
    # Add value labels above the bars
    for j, v in enumerate(value_counts.values):
        axes[i].text(j, v, str(v), ha='center', va='bottom', fontsize=10)

plt.tight_layout()
plt.show()

Channel and Campaign Analysis
# Conversion rate by marketing channel
channel_conversion = df.groupby('CampaignChannel').agg({
    'Conversion': ['count', 'sum', 'mean']
}).round(4)
channel_conversion.columns = ['总客户数', '转化客户数', '转化率']
channel_conversion = channel_conversion.sort_values('转化率', ascending=False)

print("=" * 60)
print("不同营销渠道的转化情况")
print("=" * 60)
print(channel_conversion)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Conversion rate comparison
axes[0].barh(channel_conversion.index, channel_conversion['转化率'], color='coral')
axes[0].set_xlabel('转化率', fontsize=12)
axes[0].set_title('不同营销渠道的转化率', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='x')
for i, v in enumerate(channel_conversion['转化率']):
    axes[0].text(v, i, f'{v:.3f}', va='center', fontsize=10)

# Total customers vs. converted customers
x = np.arange(len(channel_conversion.index))
width = 0.35
axes[1].bar(x - width/2, channel_conversion['总客户数'], width, label='总客户数', color='skyblue')
axes[1].bar(x + width/2, channel_conversion['转化客户数'], width, label='转化客户数', color='lightcoral')
axes[1].set_xlabel('营销渠道', fontsize=12)
axes[1].set_ylabel('客户数量', fontsize=12)
axes[1].set_title('不同营销渠道的客户数量对比', fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(channel_conversion.index, rotation=45, ha='right')
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
============================================================
不同营销渠道的转化情况
============================================================
                 总客户数  转化客户数     转化率
CampaignChannel
Referral         1719   1518  0.8831
PPC              1655   1461  0.8828
SEO              1550   1359  0.8768
Email            1557   1355  0.8703
Social Media     1519   1319  0.8683
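The channel conversion rates above span only about 1.5 percentage points, so it is worth asking whether the differences exceed sampling noise. The sketch below computes a Pearson chi-square statistic by hand from the counts in the table and compares it with the 5% critical value for 4 degrees of freedom (9.488). This check is an addition for illustration and is not part of the original analysis.

```python
# (total customers, converted customers) per channel, copied from the table above.
channels = {
    "Referral": (1719, 1518),
    "PPC": (1655, 1461),
    "SEO": (1550, 1359),
    "Email": (1557, 1355),
    "Social Media": (1519, 1319),
}

total = sum(n for n, _ in channels.values())
total_converted = sum(c for _, c in channels.values())
overall_rate = total_converted / total

# Pearson chi-square statistic for the 5 x 2 table
# (channel x converted / not converted).
chi2 = 0.0
for n, conv in channels.values():
    for observed, expected in (
        (conv, n * overall_rate),
        (n - conv, n * (1 - overall_rate)),
    ):
        chi2 += (observed - expected) ** 2 / expected

dof = (len(channels) - 1) * (2 - 1)
CRITICAL_5PCT_DOF4 = 9.488  # chi-square critical value, 4 dof, alpha = 0.05

print(f"chi2 = {chi2:.3f}, dof = {dof}")
print("significant at 5%:", chi2 > CRITICAL_5PCT_DOF4)
```

With these counts the statistic stays well below the critical value, which suggests the channel ranking should be read with caution.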

# Conversion rate by campaign type
campaign_conversion = df.groupby('CampaignType').agg({
    'Conversion': ['count', 'sum', 'mean']
}).round(4)
campaign_conversion.columns = ['总客户数', '转化客户数', '转化率']
campaign_conversion = campaign_conversion.sort_values('转化率', ascending=False)

print("=" * 60)
print("不同活动类型的转化情况")
print("=" * 60)
print(campaign_conversion)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Conversion rate comparison
axes[0].barh(campaign_conversion.index, campaign_conversion['转化率'], color='mediumseagreen')
axes[0].set_xlabel('转化率', fontsize=12)
axes[0].set_title('不同活动类型的转化率', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='x')
for i, v in enumerate(campaign_conversion['转化率']):
    axes[0].text(v, i, f'{v:.3f}', va='center', fontsize=10)

# Total customers vs. converted customers
x = np.arange(len(campaign_conversion.index))
width = 0.35
axes[1].bar(x - width/2, campaign_conversion['总客户数'], width, label='总客户数', color='lightblue')
axes[1].bar(x + width/2, campaign_conversion['转化客户数'], width, label='转化客户数', color='salmon')
axes[1].set_xlabel('活动类型', fontsize=12)
axes[1].set_ylabel('客户数量', fontsize=12)
axes[1].set_title('不同活动类型的客户数量对比', fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(campaign_conversion.index, rotation=45, ha='right')
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
============================================================
不同活动类型的转化情况
============================================================
               总客户数  转化客户数     转化率
CampaignType
Conversion     2077   1939  0.9336
Retention      1947   1671  0.8582
Awareness      1988   1701  0.8556
Consideration  1988   1701  0.8556
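Unlike the channels, the campaign types show one clear outlier: Conversion campaigns convert at 93.36% versus roughly 86% for the rest. As a rough check, the sketch below runs a two-proportion z-test of Conversion campaigns against the other three types pooled together, using only the counts in the table. The pooling and the choice of test are assumptions made for this illustration.

```python
import math

# Counts copied from the campaign-type table above.
conv_n, conv_x = 2077, 1939          # Conversion campaigns: total, converted
rest_n = 1947 + 1988 + 1988          # Retention + Awareness + Consideration
rest_x = 1671 + 1701 + 1701

p1 = conv_x / conv_n
p2 = rest_x / rest_n

# Pooled standard error under the null hypothesis of equal rates.
pooled = (conv_x + rest_x) / (conv_n + rest_n)
se = math.sqrt(pooled * (1 - pooled) * (1 / conv_n + 1 / rest_n))
z = (p1 - p2) / se

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"Conversion campaigns: {p1:.4f}, other types: {p2:.4f}")
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

The large z value indicates that this gap, unlike the channel differences, is very unlikely to be a sampling artifact.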

# Cross-tabulation of channel and campaign type
channel_campaign = pd.crosstab(df['CampaignChannel'], df['CampaignType'],
                               values=df['Conversion'], aggfunc='mean')
channel_campaign = channel_campaign.round(4)

print("=" * 60)
print("渠道与活动类型的转化率交叉分析")
print("=" * 60)
print(channel_campaign)

# Heatmap
plt.figure(figsize=(12, 6))
sns.heatmap(channel_campaign, annot=True, fmt='.3f', cmap='YlOrRd',
            cbar_kws={'label': '转化率'}, linewidths=0.5)
plt.title('不同渠道与活动类型的转化率热力图', fontsize=14, fontweight='bold', pad=20)
plt.xlabel('活动类型', fontsize=12)
plt.ylabel('营销渠道', fontsize=12)
plt.tight_layout()
plt.show()
============================================================
渠道与活动类型的转化率交叉分析
============================================================
CampaignType     Awareness  Consideration  Conversion  Retention
CampaignChannel
Email               0.8413         0.8561      0.9327     0.8444
PPC                 0.8592         0.8688      0.9396     0.8571
Referral            0.8840         0.8550      0.9303     0.8601
SEO                 0.8487         0.8482      0.9403     0.8663
Social Media        0.8408         0.8494      0.9237     0.8622

Key Factor Identification
# Correlation between the numeric variables and the Conversion target
numeric_cols = ['Age', 'Income', 'AdSpend', 'ClickThroughRate', 'ConversionRate',
                'WebsiteVisits', 'PagesPerVisit', 'TimeOnSite', 'SocialShares',
                'EmailOpens', 'EmailClicks', 'PreviousPurchases', 'LoyaltyPoints']

correlation_with_conversion = df[numeric_cols + ['Conversion']].corr()['Conversion'].sort_values(ascending=False)
correlation_with_conversion = correlation_with_conversion.drop('Conversion')

print("=" * 60)
print("各变量与转化率的相关性(按相关性绝对值排序)")
print("=" * 60)
correlation_df = pd.DataFrame({
    '变量': correlation_with_conversion.index,
    '相关系数': correlation_with_conversion.values
})
correlation_df['绝对值'] = correlation_df['相关系数'].abs()
correlation_df = correlation_df.sort_values('绝对值', ascending=False)
print(correlation_df.to_string(index=False))

# Visualize the correlations (green = positive, red = negative)
plt.figure(figsize=(12, 8))
colors = ['red' if x < 0 else 'green' for x in correlation_df['相关系数']]
plt.barh(correlation_df['变量'], correlation_df['相关系数'], color=colors, alpha=0.7)
plt.xlabel('相关系数', fontsize=12)
plt.title('各变量与转化率的相关性分析', fontsize=14, fontweight='bold', pad=20)
plt.axvline(x=0, color='black', linestyle='--', linewidth=0.8)
plt.grid(True, alpha=0.3, axis='x')
for i, v in enumerate(correlation_df['相关系数']):
    plt.text(v, i, f'{v:.3f}', va='center', fontsize=10)
plt.tight_layout()
plt.show()
============================================================
各变量与转化率的相关性(按相关性绝对值排序)
============================================================
               变量      相关系数       绝对值
       TimeOnSite  0.129609  0.129609
      EmailClicks  0.129521  0.129521
       EmailOpens  0.124884  0.124884
          AdSpend  0.124672  0.124672
 ClickThroughRate  0.120012  0.120012
PreviousPurchases  0.111781  0.111781
    PagesPerVisit  0.102840  0.102840
    LoyaltyPoints  0.095004  0.095004
   ConversionRate  0.093185  0.093185
    WebsiteVisits  0.079339  0.079339
           Income  0.013974  0.013974
     SocialShares -0.011449  0.011449
              Age  0.001606  0.001606

# Compare numeric features of converted and non-converted customers
converted = df[df['Conversion'] == 1]
not_converted = df[df['Conversion'] == 0]

fig, axes = plt.subplots(3, 4, figsize=(20, 12))
axes = axes.ravel()
key_features = ['Age', 'Income', 'AdSpend', 'ClickThroughRate', 'ConversionRate',
                'WebsiteVisits', 'PagesPerVisit', 'TimeOnSite', 'SocialShares',
                'EmailOpens', 'EmailClicks', 'PreviousPurchases']

for i, col in enumerate(key_features):
    axes[i].hist(not_converted[col], bins=30, alpha=0.6, label='未转化', color='red', edgecolor='black')
    axes[i].hist(converted[col], bins=30, alpha=0.6, label='已转化', color='green', edgecolor='black')
    axes[i].set_title(f'{col} 分布对比', fontsize=10, fontweight='bold')
    axes[i].set_xlabel(col)
    axes[i].set_ylabel('频数')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Compare the means of the key metrics
comparison_cols = ['Age', 'Income', 'AdSpend', 'ClickThroughRate', 'ConversionRate',
                   'WebsiteVisits', 'PagesPerVisit', 'TimeOnSite', 'SocialShares',
                   'EmailOpens', 'EmailClicks', 'PreviousPurchases', 'LoyaltyPoints']

comparison_df = pd.DataFrame({
    '变量': comparison_cols,
    '未转化均值': [not_converted[col].mean() for col in comparison_cols],
    '已转化均值': [converted[col].mean() for col in comparison_cols]
})
comparison_df['差异'] = comparison_df['已转化均值'] - comparison_df['未转化均值']
comparison_df['差异百分比'] = (comparison_df['差异'] / comparison_df['未转化均值'] * 100).round(2)
comparison_df = comparison_df.sort_values('差异百分比', key=abs, ascending=False)

print("=" * 80)
print("转化客户与未转化客户的关键指标对比")
print("=" * 80)
print(comparison_df.to_string(index=False))

# Visualize the largest differences.
# The variables are on very different scales (AdSpend in the thousands, rates below 1),
# so the smaller ones are hard to read on a shared y-axis.
top_features = comparison_df.head(8)
fig, ax = plt.subplots(figsize=(12, 8))
x = np.arange(len(top_features))
width = 0.35
ax.bar(x - width/2, top_features['未转化均值'], width, label='未转化', color='lightcoral', alpha=0.8)
ax.bar(x + width/2, top_features['已转化均值'], width, label='已转化', color='lightgreen', alpha=0.8)
ax.set_xlabel('变量', fontsize=12)
ax.set_ylabel('均值', fontsize=12)
ax.set_title('转化客户与未转化客户的关键指标对比(Top 8)', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(top_features['变量'], rotation=45, ha='right')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
================================================================================
转化客户与未转化客户的关键指标对比
================================================================================
               变量         未转化均值         已转化均值           差异  差异百分比
      EmailClicks      3.481781      4.606246     1.124465  32.30
       EmailOpens      7.576923      9.744581     2.167658  28.61
PreviousPurchases      3.625506      4.606674     0.981168  27.06
       TimeOnSite      6.267871      7.933413     1.665541  26.57
          AdSpend   4058.398466   5133.750850  1075.352384  26.50
 ClickThroughRate      0.127972      0.158613     0.030641  23.94
    LoyaltyPoints   2128.483806   2541.244438   412.760632  19.39
   ConversionRate      0.090766      0.106308     0.015542  17.12
    PagesPerVisit      4.835002      5.649945     0.814943  16.86
    WebsiteVisits     21.726721     25.177838     3.451117  15.88
     SocialShares     50.681174     49.675556    -1.005618  -1.98
           Income  83265.308704  84861.301911  1595.993207   1.92
              Age     43.561741     43.634484     0.072743   0.17
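The correlation table and the mean-comparison table describe the same relationship in two ways. For a binary target, the Pearson correlation equals the difference in group means multiplied by the standard deviation of the target and divided by the standard deviation of the feature. The sketch below checks this identity for TimeOnSite using only numbers printed earlier in this article; the variable names are illustrative.

```python
# Values copied from the outputs above.
mean_converted = 7.933413       # TimeOnSite mean, converted customers
mean_not_converted = 6.267871   # TimeOnSite mean, non-converted customers
std_time_on_site = 4.228218     # std of TimeOnSite from describe()
std_conversion = 0.329031       # std of Conversion from describe()

# Point-biserial form of the Pearson correlation for a 0/1 target:
# r = (M1 - M0) * s_y / s_x
r = (mean_converted - mean_not_converted) * std_conversion / std_time_on_site

print(f"Reconstructed correlation: {r:.6f}")
print("Reported correlation:      0.129609")
```

A 26.57% gap in mean TimeOnSite therefore corresponds to a correlation of only about 0.13, because the spread within each group is large relative to the gap.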

Customer Conversion Prediction Model
# Data preprocessing
# Work on a copy so the original DataFrame stays unchanged
df_model = df.copy()

# Encode the categorical variables as integers
le_gender = LabelEncoder()
le_channel = LabelEncoder()
le_campaign = LabelEncoder()
df_model['Gender_encoded'] = le_gender.fit_transform(df_model['Gender'])
df_model['CampaignChannel_encoded'] = le_channel.fit_transform(df_model['CampaignChannel'])
df_model['CampaignType_encoded'] = le_campaign.fit_transform(df_model['CampaignType'])

# Select the features (the ID and the confidential platform/tool fields are excluded)
feature_cols = ['Age', 'Gender_encoded', 'Income', 'CampaignChannel_encoded',
                'CampaignType_encoded', 'AdSpend', 'ClickThroughRate', 'ConversionRate',
                'WebsiteVisits', 'PagesPerVisit', 'TimeOnSite', 'SocialShares',
                'EmailOpens', 'EmailClicks', 'PreviousPurchases', 'LoyaltyPoints']
X = df_model[feature_cols]
y = df_model['Conversion']

print("=" * 60)
print("特征准备完成")
print("=" * 60)
print(f"特征数量: {len(feature_cols)}")
print(f"样本数量: {len(X)}")
print(f"特征列表: {feature_cols}")
============================================================
特征准备完成
============================================================
特征数量: 16
样本数量: 8000
特征列表: ['Age', 'Gender_encoded', 'Income', 'CampaignChannel_encoded', 'CampaignType_encoded', 'AdSpend', 'ClickThroughRate', 'ConversionRate', 'WebsiteVisits', 'PagesPerVisit', 'TimeOnSite', 'SocialShares', 'EmailOpens', 'EmailClicks', 'PreviousPurchases', 'LoyaltyPoints']

# Feature importance analysis (random forest)
# Note: this forest is fitted on the full dataset, before the train/test split below,
# so it is used only for ranking features, not for evaluation.
rf_importance = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf_importance.fit(X, y)

feature_importance = pd.DataFrame({
    '特征': feature_cols,
    '重要性': rf_importance.feature_importances_
}).sort_values('重要性', ascending=False)

print("=" * 60)
print("特征重要性排序(随机森林)")
print("=" * 60)
print(feature_importance.to_string(index=False))

# Visualize the feature importances
plt.figure(figsize=(12, 8))
plt.barh(feature_importance['特征'], feature_importance['重要性'], color='steelblue')
plt.xlabel('重要性得分', fontsize=12)
plt.title('特征重要性分析', fontsize=14, fontweight='bold', pad=20)
plt.gca().invert_yaxis()
plt.grid(True, alpha=0.3, axis='x')
for i, v in enumerate(feature_importance['重要性']):
    plt.text(v, i, f'{v:.4f}', va='center', fontsize=9)
plt.tight_layout()
plt.show()
============================================================
特征重要性排序(随机森林)
============================================================
                     特征       重要性
          PagesPerVisit  0.092809
                AdSpend  0.091146
             TimeOnSite  0.090630
       ClickThroughRate  0.090102
         ConversionRate  0.089485
          LoyaltyPoints  0.083438
          WebsiteVisits  0.072412
             EmailOpens  0.064021
      PreviousPurchases  0.058790
                 Income  0.057766
            EmailClicks  0.054914
           SocialShares  0.051835
                    Age  0.048186
   CampaignType_encoded  0.024629
CampaignChannel_encoded  0.022320
         Gender_encoded  0.007517
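Random forest importances sum to 1, so a useful reference point is the uniform share 1/16 = 0.0625 that each of the 16 features would receive if all were equally informative. The sketch below, which uses only the values printed above, lists the features that exceed that share and the cumulative importance they cover. The uniform-share threshold is an illustrative convention, not something the original analysis applies.

```python
# Importances copied from the table above, in the same order.
importances = {
    "PagesPerVisit": 0.092809, "AdSpend": 0.091146, "TimeOnSite": 0.090630,
    "ClickThroughRate": 0.090102, "ConversionRate": 0.089485,
    "LoyaltyPoints": 0.083438, "WebsiteVisits": 0.072412, "EmailOpens": 0.064021,
    "PreviousPurchases": 0.058790, "Income": 0.057766, "EmailClicks": 0.054914,
    "SocialShares": 0.051835, "Age": 0.048186, "CampaignType_encoded": 0.024629,
    "CampaignChannel_encoded": 0.022320, "Gender_encoded": 0.007517,
}

total_importance = sum(importances.values())
uniform_share = 1 / len(importances)

# Features that carry more than an equal share of the total importance.
above_uniform = [name for name, value in importances.items() if value > uniform_share]
covered = sum(importances[name] for name in above_uniform)

print(f"Sum of importances: {total_importance:.6f}")
print(f"Uniform share: {uniform_share:.4f}")
print(f"{len(above_uniform)} features above the uniform share, covering {covered:.1%}")
```

The top importances are close to each other, so the ranking among the first five or six features should not be over-interpreted.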

# Split into training and test sets (stratified on the target)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("=" * 60)
print("数据集划分")
print("=" * 60)
print(f"训练集大小: {X_train.shape[0]}")
print(f"测试集大小: {X_test.shape[0]}")
print(f"训练集转化率: {y_train.mean():.4f}")
print(f"测试集转化率: {y_test.mean():.4f}")

# Standardize the features (used for logistic regression only);
# the scaler is fitted on the training set and then applied to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
============================================================
数据集划分
============================================================
训练集大小: 6400
测试集大小: 1600
训练集转化率: 0.8766
测试集转化率: 0.8762
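Stratification is why the training and test conversion rates above are nearly identical. The sketch below illustrates the idea in plain Python: split each class separately and then combine the parts. It is a simplified stand-in with a hypothetical helper name, and it does not reproduce scikit-learn's implementation or its exact row selection.

```python
import random


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), keeping each class's share in both parts."""
    rng = random.Random(seed)

    # Group the row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-class test size
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Same class counts as the dataset: 7012 converted, 988 not converted.
labels = [1] * 7012 + [0] * 988
train_idx, test_idx = stratified_split(labels)

test_positive = sum(labels[i] for i in test_idx)
print(f"train: {len(train_idx)}, test: {len(test_idx)}")
print(f"test conversion rate: {test_positive / len(test_idx):.4f}")
```

Without stratification, a random 20% sample could by chance contain noticeably more or fewer of the 988 non-converted customers, which would make the minority-class metrics less stable.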
# Train several models and compare them
models = {
    '逻辑回归': LogisticRegression(random_state=42, max_iter=1000),
    '随机森林': RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1),
    '梯度提升': GradientBoostingClassifier(n_estimators=100, random_state=42)
}

results = {}
print("=" * 60)
print("模型训练与评估")
print("=" * 60)

for name, model in models.items():
    # Logistic regression uses the standardized data; the tree models use the raw data
    if name == '逻辑回归':
        model.fit(X_train_scaled, y_train)
        y_pred = model.predict(X_test_scaled)
        y_pred_proba = model.predict_proba(X_test_scaled)[:, 1]
    else:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        y_pred_proba = model.predict_proba(X_test)[:, 1]

    accuracy = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_pred_proba)
    results[name] = {
        'accuracy': accuracy,
        'auc': auc,
        'model': model,
        'y_pred': y_pred,
        'y_pred_proba': y_pred_proba
    }
    print(f"\n{name}:")
    print(f" 准确率: {accuracy:.4f}")
    print(f" AUC: {auc:.4f}")

# Pick the best model by AUC
best_model_name = max(results, key=lambda x: results[x]['auc'])
print(f"\n最佳模型: {best_model_name} (AUC: {results[best_model_name]['auc']:.4f})")
============================================================
模型训练与评估
============================================================

逻辑回归:
 准确率: 0.8888
 AUC: 0.7707

随机森林:
 准确率: 0.8919
 AUC: 0.8008

梯度提升:
 准确率: 0.9050
 AUC: 0.8137

最佳模型: 梯度提升 (AUC: 0.8137)
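The models are ranked by AUC rather than accuracy, which makes sense here because all three accuracies sit within a few points of the share of converted customers. AUC can be read as the probability that a randomly chosen converted customer receives a higher score than a randomly chosen non-converted one. The sketch below implements that definition directly on hypothetical toy data; the function name `roc_auc` is illustrative and is not scikit-learn's `roc_auc_score`.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two non-converted and two converted customers.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly

# A constant score carries no ranking information, so AUC is 0.5.
print(roc_auc([0, 1, 1], [0.5, 0.5, 0.5]))  # 0.5
```

Because AUC depends only on the ordering of the scores, it is not inflated by class imbalance the way accuracy is.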
# Detailed evaluation of the best model
best_model = results[best_model_name]['model']
y_pred = results[best_model_name]['y_pred']
y_pred_proba = results[best_model_name]['y_pred_proba']

print("=" * 60)
print(f"{best_model_name} 详细分类报告")
print("=" * 60)
print(classification_report(y_test, y_pred, target_names=['未转化', '已转化']))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['未转化', '已转化'],
            yticklabels=['未转化', '已转化'])
plt.title(f'{best_model_name} 混淆矩阵', fontsize=14, fontweight='bold')
plt.ylabel('真实值', fontsize=12)
plt.xlabel('预测值', fontsize=12)
plt.tight_layout()
plt.show()
============================================================
梯度提升 详细分类报告
============================================================
              precision    recall  f1-score   support

         未转化       0.83      0.29      0.43       198
         已转化       0.91      0.99      0.95      1402

    accuracy                           0.91      1600
   macro avg       0.87      0.64      0.69      1600
weighted avg       0.90      0.91      0.88      1600
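The report shows a model that is strong on converted customers but finds only 29% of the non-converted ones. The sketch below recomputes the F1 scores from the reported precision and recall, and derives balanced accuracy (the mean of the per-class recalls), which matches the macro-average recall in the report. The dictionary keys are illustrative English names for the two classes.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, copied from the classification report above.
report = {
    "not_converted": (0.83, 0.29),
    "converted": (0.91, 0.99),
}

f1_scores = {name: f1(p, r) for name, (p, r) in report.items()}

# Balanced accuracy weights both classes equally, regardless of their size.
balanced_accuracy = sum(r for _, r in report.values()) / len(report)

for name, score in f1_scores.items():
    print(f"F1 {name}: {score:.2f}")
print(f"Balanced accuracy: {balanced_accuracy:.2f}")
```

A balanced accuracy of 0.64 against an overall accuracy of 0.91 makes the effect of the class imbalance explicit.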

# ROC curves
plt.figure(figsize=(10, 8))
for name, result in results.items():
    fpr, tpr, _ = roc_curve(y_test, result['y_pred_proba'])
    plt.plot(fpr, tpr, label=f'{name} (AUC = {result["auc"]:.4f})', linewidth=2)

# Diagonal reference line for random guessing
plt.plot([0, 1], [0, 1], 'k--', label='随机猜测', linewidth=1)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('假阳性率 (FPR)', fontsize=12)
plt.ylabel('真阳性率 (TPR)', fontsize=12)
plt.title('ROC曲线对比', fontsize=14, fontweight='bold', pad=20)
plt.legend(loc="lower right", fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Advertising and Strategy Optimization
# Ad spend efficiency analysis (ROI-related)
# Average ad spend and conversion rate per channel
channel_roi = df.groupby('CampaignChannel').agg({
    'AdSpend': 'mean',
    'Conversion': 'mean',
    'CustomerID': 'count'
}).round(2)
channel_roi.columns = ['平均广告支出', '转化率', '客户数量']
# Efficiency = conversion rate per 10,000 units of average spend.
# Note: round(2) above is applied before this division, so the efficiency
# is computed from conversion rates rounded to two decimals.
channel_roi['支出效率'] = channel_roi['转化率'] / channel_roi['平均广告支出'] * 10000
channel_roi = channel_roi.sort_values('支出效率', ascending=False)

print("=" * 60)
print("不同渠道的广告支出效率分析")
print("=" * 60)
print(channel_roi)

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Average ad spend
axes[0, 0].barh(channel_roi.index, channel_roi['平均广告支出'], color='lightblue')
axes[0, 0].set_xlabel('平均广告支出 ($)', fontsize=11)
axes[0, 0].set_title('不同渠道的平均广告支出', fontsize=12, fontweight='bold')
axes[0, 0].grid(True, alpha=0.3, axis='x')

# Conversion rate
axes[0, 1].barh(channel_roi.index, channel_roi['转化率'], color='lightgreen')
axes[0, 1].set_xlabel('转化率', fontsize=11)
axes[0, 1].set_title('不同渠道的转化率', fontsize=12, fontweight='bold')
axes[0, 1].grid(True, alpha=0.3, axis='x')

# Spend efficiency
axes[1, 0].barh(channel_roi.index, channel_roi['支出效率'], color='coral')
axes[1, 0].set_xlabel('支出效率 (转化率/支出)', fontsize=11)
axes[1, 0].set_title('不同渠道的广告支出效率', fontsize=12, fontweight='bold')
axes[1, 0].grid(True, alpha=0.3, axis='x')

# Scatter plot: spend vs. conversion rate (marker size = number of customers)
axes[1, 1].scatter(channel_roi['平均广告支出'], channel_roi['转化率'],
                   s=channel_roi['客户数量']*2, alpha=0.6, c=range(len(channel_roi)), cmap='viridis')
for idx, row in channel_roi.iterrows():
    axes[1, 1].annotate(idx, (row['平均广告支出'], row['转化率']),
                        fontsize=9, ha='center')
axes[1, 1].set_xlabel('平均广告支出 ($)', fontsize=11)
axes[1, 1].set_ylabel('转化率', fontsize=11)
axes[1, 1].set_title('广告支出与转化率关系', fontsize=12, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
============================================================
不同渠道的广告支出效率分析
============================================================
                 平均广告支出   转化率  客户数量      支出效率
CampaignChannel
PPC              4954.22  0.88  1655  1.776263
SEO              4994.13  0.88  1550  1.762069
Social Media     4965.32  0.87  1519  1.752153
Referral         5034.04  0.88  1719  1.748099
Email            5055.60  0.87  1557  1.720864
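The efficiency column above is derived from conversion rates rounded to two decimals (0.87 or 0.88), so part of the ranking reflects rounding rather than the data. The sketch below recomputes the same ratio using the four-decimal conversion rates from the channel table in the channel analysis section, together with the average spends printed above. It is a cross-check under the same efficiency definition, not a replacement metric.

```python
# (average ad spend, conversion rate to 4 decimals) per channel.
# Spend comes from the efficiency table above; rates come from the
# channel conversion table in the channel analysis section.
channels = {
    "PPC": (4954.22, 0.8828),
    "SEO": (4994.13, 0.8768),
    "Social Media": (4965.32, 0.8683),
    "Referral": (5034.04, 0.8831),
    "Email": (5055.60, 0.8703),
}

# Same definition as in the notebook: conversion rate per 10,000 units of spend.
efficiency = {
    name: rate / spend * 10000 for name, (spend, rate) in channels.items()
}
ranking = sorted(efficiency, key=efficiency.get, reverse=True)

for name in ranking:
    print(f"{name:<13} {efficiency[name]:.4f}")
```

With the less-rounded rates, Referral moves ahead of Social Media, while all five values stay within about 4% of each other, so the channels are close to indistinguishable on this metric.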

# Spend efficiency by campaign type
campaign_roi = df.groupby('CampaignType').agg({
    'AdSpend': 'mean',
    'Conversion': 'mean',
    'CustomerID': 'count'
}).round(2)
campaign_roi.columns = ['平均广告支出', '转化率', '客户数量']
# As above, the efficiency is computed from values already rounded to two decimals
campaign_roi['支出效率'] = campaign_roi['转化率'] / campaign_roi['平均广告支出'] * 10000
campaign_roi = campaign_roi.sort_values('支出效率', ascending=False)

print("=" * 60)
print("不同活动类型的广告支出效率分析")
print("=" * 60)
print(campaign_roi)

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

axes[0].barh(campaign_roi.index, campaign_roi['平均广告支出'], color='steelblue')
axes[0].set_xlabel('平均广告支出 ($)', fontsize=11)
axes[0].set_title('不同活动类型的平均广告支出', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='x')

axes[1].barh(campaign_roi.index, campaign_roi['转化率'], color='mediumseagreen')
axes[1].set_xlabel('转化率', fontsize=11)
axes[1].set_title('不同活动类型的转化率', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='x')

axes[2].barh(campaign_roi.index, campaign_roi['支出效率'], color='coral')
axes[2].set_xlabel('支出效率', fontsize=11)
axes[2].set_title('不同活动类型的广告支出效率', fontsize=12, fontweight='bold')
axes[2].grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()
============================================================
不同活动类型的广告支出效率分析
============================================================
               平均广告支出   转化率  客户数量      支出效率
CampaignType
Conversion     4959.11  0.93  2077  1.875337
Consideration  4960.40  0.86  1988  1.733731
Retention      5017.14  0.86  1947  1.714124
Awareness      5069.34  0.86  1988  1.696473

# Customer engagement metrics
engagement_metrics = ['ClickThroughRate', 'ConversionRate', 'WebsiteVisits',
                      'PagesPerVisit', 'TimeOnSite', 'SocialShares',
                      'EmailOpens', 'EmailClicks']

engagement_analysis = pd.DataFrame({
    '指标': engagement_metrics,
    '转化客户均值': [converted[col].mean() for col in engagement_metrics],
    '未转化客户均值': [not_converted[col].mean() for col in engagement_metrics]
})
# Relative uplift of converted over non-converted customers, in percent
engagement_analysis['提升幅度'] = ((engagement_analysis['转化客户均值'] -
                                engagement_analysis['未转化客户均值']) /
                               engagement_analysis['未转化客户均值'] * 100).round(2)
engagement_analysis = engagement_analysis.sort_values('提升幅度', ascending=False)

print("=" * 60)
print("客户参与度指标对比分析")
print("=" * 60)
print(engagement_analysis.to_string(index=False))

# Visualization
fig, ax = plt.subplots(figsize=(12, 8))
x = np.arange(len(engagement_analysis))
width = 0.35
ax.bar(x - width/2, engagement_analysis['未转化客户均值'], width,
       label='未转化客户', color='lightcoral', alpha=0.8)
ax.bar(x + width/2, engagement_analysis['转化客户均值'], width,
       label='转化客户', color='lightgreen', alpha=0.8)
ax.set_xlabel('参与度指标', fontsize=12)
ax.set_ylabel('均值', fontsize=12)
ax.set_title('客户参与度指标对比', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(engagement_analysis['指标'], rotation=45, ha='right')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
============================================================
客户参与度指标对比分析
============================================================
              指标     转化客户均值    未转化客户均值   提升幅度
     EmailClicks   4.606246   3.481781  32.30
      EmailOpens   9.744581   7.576923  28.61
      TimeOnSite   7.933413   6.267871  26.57
ClickThroughRate   0.158613   0.127972  23.94
  ConversionRate   0.106308   0.090766  17.12
   PagesPerVisit   5.649945   4.835002  16.86
   WebsiteVisits  25.177838  21.726721  15.88
    SocialShares  49.675556  50.681174  -1.98

# Relationship between ad spend and conversion rate
# Split ad spend into five equal-width bins
df['AdSpend_Bin'] = pd.cut(df['AdSpend'], bins=5, labels=['很低', '低', '中', '高', '很高'])

spend_conversion = df.groupby('AdSpend_Bin').agg({
    'Conversion': ['count', 'sum', 'mean'],
    'AdSpend': 'mean'
}).round(4)
spend_conversion.columns = ['总客户数', '转化客户数', '转化率', '平均支出']
spend_conversion = spend_conversion.sort_values('平均支出')

print("=" * 60)
print("不同广告支出区间的转化情况")
print("=" * 60)
print(spend_conversion)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Conversion rate trend
axes[0].plot(range(len(spend_conversion)), spend_conversion['转化率'],
             marker='o', linewidth=2, markersize=8, color='steelblue')
axes[0].set_xticks(range(len(spend_conversion)))
axes[0].set_xticklabels(spend_conversion.index, rotation=45, ha='right')
axes[0].set_ylabel('转化率', fontsize=12)
axes[0].set_title('不同广告支出区间的转化率', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)
for i, v in enumerate(spend_conversion['转化率']):
    axes[0].text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=10)

# Total customers vs. converted customers per bin
x = np.arange(len(spend_conversion))
width = 0.35
axes[1].bar(x - width/2, spend_conversion['总客户数'], width,
            label='总客户数', color='skyblue', alpha=0.8)
axes[1].bar(x + width/2, spend_conversion['转化客户数'], width,
            label='转化客户数', color='salmon', alpha=0.8)
axes[1].set_xlabel('广告支出区间', fontsize=12)
axes[1].set_ylabel('客户数量', fontsize=12)
axes[1].set_title('不同广告支出区间的客户数量', fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(spend_conversion.index, rotation=45, ha='right')
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
============================================================
不同广告支出区间的转化情况
============================================================
             总客户数  转化客户数     转化率       平均支出
AdSpend_Bin
很低           1623   1349  0.8312  1108.7369
低            1660   1378  0.8301  3076.0391
中            1539   1358  0.8824  5060.4029
高            1619   1481  0.9148  6988.3938
很高           1559   1446  0.9275  8979.9128
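Conversion rate rises with spend, from about 83% in the two lowest bins to about 93% in the highest, but spend rises much faster. One rough way to see this is the average spend per converted customer in each bin, computed below from the table above. This assumes AdSpend is a per-customer amount, which the dataset columns suggest but the article does not state, and the English bin names are translations of the printed labels.

```python
# (bin label, average spend, conversion rate), copied from the table above.
spend_bins = [
    ("very low", 1108.7369, 0.8312),
    ("low", 3076.0391, 0.8301),
    ("medium", 5060.4029, 0.8824),
    ("high", 6988.3938, 0.9148),
    ("very high", 8979.9128, 0.9275),
]

# Average spend divided by conversion rate = spend per converted customer.
spend_per_conversion = {
    label: avg_spend / rate for label, avg_spend, rate in spend_bins
}

for label, cost in spend_per_conversion.items():
    print(f"{label:<10} {cost:10.2f}")
```

Under that assumption, spend per conversion grows roughly sevenfold from the lowest to the highest bin while the conversion rate gains about ten percentage points, which points to diminishing returns at higher spend levels.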

# Customer purchase history vs. conversion
history_metrics = ['PreviousPurchases', 'LoyaltyPoints']
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

for i, metric in enumerate(history_metrics):
    # pd.cut bins are right-closed, (a, b], so edges and labels must agree;
    # the last edge is open-ended to match the '6+' / '5000+' labels
    if metric == 'PreviousPurchases':
        bins = [-1, 0, 1, 3, 5, np.inf]
        labels = ['0', '1', '2-3', '4-5', '6+']
    else:
        bins = [0, 500, 1000, 2000, 5000, np.inf]
        labels = ['0-500', '500-1000', '1000-2000', '2000-5000', '5000+']
    df[f'{metric}_Bin'] = pd.cut(df[metric], bins=bins, labels=labels, include_lowest=True)

    # observed=False keeps empty bins in the result so the x-axis stays complete
    history_conversion = df.groupby(f'{metric}_Bin', observed=False)['Conversion'].agg(
        ['count', 'sum', 'mean']).round(4)
    history_conversion.columns = ['Customers', 'Converted', 'Conversion rate']

    axes[i].bar(range(len(history_conversion)), history_conversion['Conversion rate'],
                color='mediumpurple', alpha=0.8)
    axes[i].set_xticks(range(len(history_conversion)))
    axes[i].set_xticklabels(history_conversion.index, rotation=45, ha='right')
    axes[i].set_ylabel('Conversion rate', fontsize=12)
    axes[i].set_title(f'{metric} vs. conversion rate', fontsize=12, fontweight='bold')
    axes[i].grid(True, alpha=0.3, axis='y')
    for j, v in enumerate(history_conversion['Conversion rate']):
        # An empty bin has a NaN rate; labelling it triggers matplotlib's
        # "posx and posy should be finite values" warning, so skip it
        if np.isfinite(v):
            axes[i].text(j, v, f'{v:.3f}', ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.show()
Note: matplotlib prints "posx and posy should be finite values" whenever a bar label is placed at a NaN height. That is what happens for any bin that contains no customers (its conversion rate is NaN) unless such bins are skipped.

Demographic Feature Analysis
# Gender vs. conversion rate
gender_conversion = df.groupby('Gender').agg({
    'Conversion': ['count', 'sum', 'mean']
}).round(4)
gender_conversion.columns = ['Customers', 'Converted', 'Conversion rate']

print("=" * 60)
print("Conversion by gender")
print("=" * 60)
print(gender_conversion)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: conversion rate per gender
axes[0].bar(gender_conversion.index, gender_conversion['Conversion rate'],
            color=['lightblue', 'lightpink'], alpha=0.8)
axes[0].set_ylabel('Conversion rate', fontsize=12)
axes[0].set_title('Conversion rate by gender', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')
for i, v in enumerate(gender_conversion['Conversion rate']):
    axes[0].text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=12)

# Right panel: total vs. converted customers per gender
x = np.arange(len(gender_conversion))
width = 0.35
axes[1].bar(x - width/2, gender_conversion['Customers'], width,
            label='Customers', color='skyblue', alpha=0.8)
axes[1].bar(x + width/2, gender_conversion['Converted'], width,
            label='Converted', color='salmon', alpha=0.8)
axes[1].set_xlabel('Gender', fontsize=12)
axes[1].set_ylabel('Number of customers', fontsize=12)
axes[1].set_title('Customer counts by gender', fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(gender_conversion.index)
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
============================================================
Conversion by gender
============================================================
        Customers  Converted  Conversion rate
Gender
Female       4839       4240           0.8762
Male         3161       2772           0.8769
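The two rates differ by less than a tenth of a percentage point. One way to check whether such a gap is distinguishable from noise is a two-proportion z-test, sketched here with the standard library only. The test is an addition for illustration and is not part of the original analysis; the function name is hypothetical.

```python
import math

# Counts from the table above
female_n, female_conv = 4839, 4240
male_n, male_conv = 3161, 2772


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF expressed through the error function
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - normal_cdf)


z_stat, p_value = two_proportion_z_test(female_conv, female_n, male_conv, male_n)
print(f"z = {z_stat:.3f}, p = {p_value:.3f}")
```

A p-value far above 0.05 here means the data give no evidence that gender on its own affects conversion.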

# Age vs. conversion rate
# Split age into bins; pd.cut bins are right-closed, so '<30' also includes age 30,
# '30-40' covers 31-40, and so on
df['Age_Bin'] = pd.cut(df['Age'], bins=[0, 30, 40, 50, 60, 100],
                       labels=['<30', '30-40', '40-50', '50-60', '60+'])

age_conversion = df.groupby('Age_Bin').agg({
    'Conversion': ['count', 'sum', 'mean'],
    'Age': 'mean'
}).round(4)
age_conversion.columns = ['Customers', 'Converted', 'Conversion rate', 'Mean age']
age_conversion = age_conversion.sort_values('Mean age')

print("=" * 60)
print("Conversion by age group")
print("=" * 60)
print(age_conversion)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Left panel: conversion rate per age group
axes[0].bar(range(len(age_conversion)), age_conversion['Conversion rate'],
            color='mediumturquoise', alpha=0.8)
axes[0].set_xticks(range(len(age_conversion)))
axes[0].set_xticklabels(age_conversion.index)
axes[0].set_ylabel('Conversion rate', fontsize=12)
axes[0].set_xlabel('Age group', fontsize=12)
axes[0].set_title('Conversion rate by age group', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')
for i, v in enumerate(age_conversion['Conversion rate']):
    axes[0].text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=10)

# Right panel: total vs. converted customers per age group
x = np.arange(len(age_conversion))
width = 0.35
axes[1].bar(x - width/2, age_conversion['Customers'], width,
            label='Customers', color='lightblue', alpha=0.8)
axes[1].bar(x + width/2, age_conversion['Converted'], width,
            label='Converted', color='lightcoral', alpha=0.8)
axes[1].set_xlabel('Age group', fontsize=12)
axes[1].set_ylabel('Number of customers', fontsize=12)
axes[1].set_title('Customer counts by age group', fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(age_conversion.index)
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
============================================================
Conversion by age group
============================================================
         Customers  Converted  Conversion rate  Mean age
Age_Bin
<30           1930       1689           0.8751   23.9979
30-40         1591       1396           0.8774   35.5814
40-50         1554       1366           0.8790   45.4331
50-60         1512       1321           0.8737   55.3823
60+           1413       1240           0.8776   64.9236
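With more than two groups, a chi-square test of independence answers whether conversion depends on the age group at all. The sketch below is an illustrative addition, not a step in the original notebook. It computes Pearson's statistic from the counts above and uses the closed-form tail probability that holds for 4 degrees of freedom.

```python
import math

# (age bin, customers, converted) from the table above
age_table = [
    ("<30", 1930, 1689),
    ("30-40", 1591, 1396),
    ("40-50", 1554, 1366),
    ("50-60", 1512, 1321),
    ("60+", 1413, 1240),
]

total_n = sum(n for _, n, _ in age_table)
total_conv = sum(c for _, _, c in age_table)
overall_rate = total_conv / total_n

# Pearson chi-square over the 5 x 2 table (converted / not converted)
chi2 = 0.0
for _, n, conv in age_table:
    for observed, expected in (
        (conv, n * overall_rate),
        (n - conv, n * (1 - overall_rate)),
    ):
        chi2 += (observed - expected) ** 2 / expected

dof = len(age_table) - 1  # (rows - 1) * (columns - 1) = 4
# For exactly 4 degrees of freedom: P(X > x) = exp(-x / 2) * (1 + x / 2)
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```

The statistic is far below the 5% critical value for 4 degrees of freedom (about 9.49), consistent with the nearly flat bars in the chart.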

# Income vs. conversion rate
# Split income into bins (right-closed, so '<40K' also includes exactly 40,000)
df['Income_Bin'] = pd.cut(df['Income'], bins=[0, 40000, 60000, 80000, 100000, 200000],
                          labels=['<40K', '40-60K', '60-80K', '80-100K', '100K+'])

# Note: round(2) also rounds the conversion rate to two decimals in this table
income_conversion = df.groupby('Income_Bin').agg({
    'Conversion': ['count', 'sum', 'mean'],
    'Income': 'mean'
}).round(2)
income_conversion.columns = ['Customers', 'Converted', 'Conversion rate', 'Mean income']
income_conversion = income_conversion.sort_values('Mean income')

print("=" * 60)
print("Conversion by income level")
print("=" * 60)
print(income_conversion)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Left panel: conversion rate per income level
axes[0].bar(range(len(income_conversion)), income_conversion['Conversion rate'],
            color='gold', alpha=0.8)
axes[0].set_xticks(range(len(income_conversion)))
axes[0].set_xticklabels(income_conversion.index, rotation=45, ha='right')
axes[0].set_ylabel('Conversion rate', fontsize=12)
axes[0].set_xlabel('Income level', fontsize=12)
axes[0].set_title('Conversion rate by income level', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')
for i, v in enumerate(income_conversion['Conversion rate']):
    axes[0].text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=10)

# Right panel: total vs. converted customers per income level
x = np.arange(len(income_conversion))
width = 0.35
axes[1].bar(x - width/2, income_conversion['Customers'], width,
            label='Customers', color='lightgreen', alpha=0.8)
axes[1].bar(x + width/2, income_conversion['Converted'], width,
            label='Converted', color='orange', alpha=0.8)
axes[1].set_xlabel('Income level', fontsize=12)
axes[1].set_ylabel('Number of customers', fontsize=12)
axes[1].set_title('Customer counts by income level', fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(income_conversion.index, rotation=45, ha='right')
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
============================================================
Conversion by income level
============================================================
            Customers  Converted  Conversion rate  Mean income
Income_Bin
<40K             1288       1112             0.86     29933.17
40-60K           1174       1031             0.88     49785.64
60-80K           1247       1102             0.88     69995.62
80-100K          1175       1027             0.87     89921.17
100K+            3116       2740             0.88    124316.24

# Combined feature analysis: channel + campaign type + gender
combo_analysis = df.groupby(['CampaignChannel', 'CampaignType', 'Gender']).agg({
    'Conversion': ['count', 'sum', 'mean']
}).round(4)
combo_analysis.columns = ['Customers', 'Converted', 'Conversion rate']
combo_analysis = combo_analysis.sort_values('Conversion rate', ascending=False)

print("=" * 80)
print("Conversion by channel + campaign type + gender (Top 10)")
print("=" * 80)
print(combo_analysis.head(10))

# Pick the best combination (first row after sorting by conversion rate)
best_combo = combo_analysis.index[0]
print(f"\nBest combination: {best_combo}")
print(f"  Conversion rate: {combo_analysis.loc[best_combo, 'Conversion rate']:.4f}")
print(f"  Customers: {combo_analysis.loc[best_combo, 'Customers']}")
================================================================================
Conversion by channel + campaign type + gender (Top 10)
================================================================================
                                      Customers  Converted  Conversion rate
CampaignChannel CampaignType Gender
SEO             Conversion   Female         261        248           0.9502
PPC             Conversion   Female         279        265           0.9498
Social Media    Conversion   Female         214        201           0.9393
Email           Conversion   Female         262        246           0.9389
Referral        Conversion   Male           174        162           0.9310
Referral        Conversion   Female         271        252           0.9299
PPC             Conversion   Male           168        155           0.9226
Email           Conversion   Male           154        142           0.9221
SEO             Conversion   Male           141        130           0.9220
Social Media    Conversion   Male           153        138           0.9020

Best combination: ('SEO', 'Conversion', 'Female')
  Conversion rate: 0.9502
  Customers: 261
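Each combination covers only 140 to 280 customers, so the ranking is sensitive to sampling noise. The sketch below attaches a Wilson score interval to the first and last rows of the Top 10 table. This is an illustrative addition, and `wilson_interval` is a hypothetical helper rather than part of the original notebook.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# (converted, customers) taken from the first and last rows of the Top 10 table
top_combo = wilson_interval(248, 261)    # SEO / Conversion / Female
tenth_combo = wilson_interval(138, 153)  # Social Media / Conversion / Male

print(f"SEO / Conversion / Female:        {top_combo[0]:.3f} to {top_combo[1]:.3f}")
print(f"Social Media / Conversion / Male: {tenth_combo[0]:.3f} to {tenth_combo[1]:.3f}")

intervals_overlap = top_combo[0] <= tenth_combo[1]
print("Intervals overlap:", intervals_overlap)
```

Overlapping intervals are only a rough heuristic, not a formal test, but they suggest that the "best combination" should be read as one of several similar top performers rather than a clear winner.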
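Conclusion
This report analyzes digital marketing campaign data for 8,000 customers, combining exploratory analysis with machine-learning models, and derives optimization recommendations from the results. The main findings are:
Key findings
High overall conversion: the overall conversion rate is 87.65% (7,012 of 8,000 customers), which suggests the current marketing strategy works well overall. The 12.35% of customers who did not convert are the remaining room for improvement.
Referral is the best-converting channel: Referral has the highest conversion rate (88.31%), although its margin over the overall rate of 87.65% is under one percentage point. Word-of-mouth and customer referral programs are a reasonable focus for investment.
Conversion-type campaigns are the most effective: campaigns of type Conversion convert best (93.36%), so they should be prioritized and paired with high-converting channels such as Referral.
PPC is the most spend-efficient channel: Referral has the highest conversion rate, but PPC (pay-per-click) has the highest spend efficiency, so under a limited budget PPC offers the better return on ad spend.
Email engagement is the strongest signal of conversion:
Email clicks (EmailClicks): converted customers average 32.3% more than non-converted customers
Email opens (EmailOpens): converted customers average 28.6% more than non-converted customers
Time on site (TimeOnSite): converted customers average 26.6% more than non-converted customers
These metrics separate converted from non-converted customers most clearly. The comparison is correlational, so it does not by itself show that raising them causes conversions.
The model has useful predictive power: the gradient boosting model reaches 90.50% accuracy and an AUC of 0.8137 on the test set. Because 87.65% of customers convert, always predicting "converted" would already be right roughly 88% of the time, so AUC is the more informative figure. The model can rank customers by conversion probability to support targeted marketing.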
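The reason accuracy and AUC should be read separately on an imbalanced dataset can be shown with a toy example. The data below are hypothetical and the two helper functions are illustrative pure-Python versions of the metrics, not the code used to evaluate the models in this report.

```python
def accuracy(labels, predictions):
    """Share of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 7 converted customers and 1 non-converted (87.5% base rate)
labels = [1, 1, 1, 1, 1, 1, 1, 0]
always_convert = [1] * len(labels)     # majority-class "model"
constant_scores = [0.9] * len(labels)  # the same score for every customer

baseline_accuracy = accuracy(labels, always_convert)
baseline_auc = roc_auc(labels, constant_scores)
print(f"Majority baseline: accuracy = {baseline_accuracy:.3f}, AUC = {baseline_auc:.3f}")
```

A model that learns nothing still scores high on accuracy here while its AUC stays at chance level, which is why the gap between the model's accuracy and the base conversion rate matters more than the accuracy figure alone.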
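Summary of strategy recommendations
Channel strategy: focus investment on the Referral channel while keeping the PPC channel running efficiently
Campaign strategy: prioritize Conversion-type campaigns and run them on high-converting channels
Engagement: improve email content to raise opens and clicks; improve the website experience to increase time on site
Targeted marketing: use the gradient boosting model to identify customers with a high conversion probability and personalize outreach to them
Continuous optimization: set up A/B testing, monitor key metrics continuously, and adjust the strategy promptly
Implementing these recommendations is expected to raise conversion rates further, improve the allocation of ad spend, and increase ROI.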
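The A/B testing recommendation can be made concrete by estimating how many customers each variant needs. The sketch below uses the standard normal-approximation formula for comparing two proportions. The two-percentage-point target, the 5% significance level, and the 80% power are hypothetical choices for illustration, not values from this report.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_variant: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate customers needed per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


baseline_rate = 7012 / 8000         # overall conversion rate from this report
target_rate = baseline_rate + 0.02  # hypothetical goal: +2 percentage points
n_per_group = sample_size_per_group(baseline_rate, target_rate)
print(f"Customers needed per group: {n_per_group:,}")
```

Because the baseline rate is already high, detecting small improvements takes several thousand customers per variant; halving the detectable effect roughly quadruples the required sample.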
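# Generate the final report summary
print("=" * 80)
print("Digital Marketing Conversion Analysis - Final Report Summary")
print("=" * 80)

print("\n📊 Data overview")
print(f"  • Total customers: {len(df):,}")
print(f"  • Converted customers: {df['Conversion'].sum():,}")
print(f"  • Overall conversion rate: {df['Conversion'].mean():.2%}")
print(f"  • Non-converted customers: {len(df) - df['Conversion'].sum():,} "
      f"(room for improvement: {(1 - df['Conversion'].mean()):.2%})")

print("\n🎯 Key findings")
print(f"  • Best channel: {best_channel} (conversion rate: {best_channel_rate:.2%})")
print(f"  • Best campaign type: {best_campaign} (conversion rate: {best_campaign_rate:.2%})")
print(f"  • Most spend-efficient channel: {best_roi_channel}")
print("  • Key engagement metrics:")
# engagement_analysis was built earlier; its column names '指标' (metric) and
# '提升幅度' (lift, %) are kept exactly as defined there
top_engagement = engagement_analysis.head(3)
for idx, row in top_engagement.iterrows():
    print(f"    - {row['指标']}: {row['提升幅度']:.1f}% higher for converted than non-converted customers")

print("\n🤖 Model performance")
# Print the measured values instead of hard-coding figures that can drift from the results
best_auc = results[best_model_name]['auc']
best_acc = results[best_model_name]['accuracy']
print(f"  • Best model: {best_model_name}")
print(f"  • AUC: {best_auc:.4f}")
print(f"  • Accuracy: {best_acc:.4f} ({best_acc:.1%})")
print("  • The model can be used to identify customers with a high conversion probability")

print("\n💡 Core strategy recommendations")
print("  1. [Channels] Focus investment on Referral while keeping PPC running efficiently")
print("  2. [Campaigns] Prioritize Conversion-type campaigns on high-converting channels")
print("  3. [Engagement] Improve email content (converted customers click 32.3% more) "
      "and the website experience (they spend 26.6% more time on site)")
print("  4. [Targeting] Use the gradient boosting model to find high-probability customers and personalize outreach")
print("  5. [ROI] Track spend efficiency and balance conversion rate against cost")

print("\n📈 Expected impact (rough estimates, to be validated with A/B tests)")
print("  • Optimizing the channel and campaign mix: 5-10% higher conversion rate")
print("  • Raising email engagement: 20-30% more email conversions")
print("  • Model-driven targeting: 15-25% higher marketing ROI")

print("\n" + "=" * 80)
print("Analysis complete. Thank you for reading this report.")
print("=" * 80)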
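================================================================================
Digital Marketing Conversion Analysis - Final Report Summary
================================================================================
📊 Data overview
  • Total customers: 8,000
  • Converted customers: 7,012
  • Overall conversion rate: 87.65%
🎯 Key findings
  • Best channel: Referral (conversion rate: 88.31%)
  • Best campaign type: Conversion (conversion rate: 93.36%)
  • Most spend-efficient channel: PPC
🤖 Model performance
  • Best model: gradient boosting
  • AUC: 0.8137
  • Accuracy: 0.9050
💡 Core recommendations
  1. Optimize the channel mix and focus investment on high-converting channels
  2. Adjust the campaign-type strategy and prioritize high-converting campaigns
  3. Raise customer engagement by improving the website and email marketing
  4. Use the predictive model for targeted marketing
  5. Track ROI and optimize the allocation of ad spend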
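# Print the strategy recommendations
print("=" * 80)
print("Marketing Strategy Recommendations")
print("=" * 80)

print("\n[Recommendation 1: Channel optimization]")
print(f"  • Focus investment on the channel with the highest conversion rate: {best_channel}")
print(f"  • Consider increasing the budget share of the {best_channel} channel")
print("  • For channels with lower conversion rates, refine the placement strategy or reduce spend")

print("\n[Recommendation 2: Campaign-type optimization]")
print(f"  • Prioritize {best_campaign}-type campaigns")
print(f"  • Pair {best_campaign} campaigns with high-converting channels")
print("  • Investigate why other campaign types convert less and adjust them")

print("\n[Recommendation 3: Ad-spend optimization]")
print(f"  • Focus on the most spend-efficient channel: {best_roi_channel}")
print("  • Analyze how ad spend relates to conversion rate to find the best spend range")
print("  • Avoid over-investing; track ROI rather than absolute spend")

print("\n[Recommendation 4: Customer engagement]")
# '指标' (metric) and '提升幅度' (lift, %) are column names defined earlier in the notebook
top_engagement = engagement_analysis.head(3)
print("  • Focus on the following engagement metrics:")
for idx, row in top_engagement.iterrows():
    print(f"    - {row['指标']}: {row['提升幅度']:.1f}% higher for converted than non-converted customers")
print("  • Improve the website experience to raise page views and time on site")
print("  • Improve email marketing to raise opens and clicks")

print("\n[Recommendation 5: Customer segmentation]")
print("  • Use the model to identify high-probability customers for targeted marketing")
print("  • Build personalized strategies for repeat buyers and high-loyalty customers")
print("  • Compare conversion across age and income groups and differentiate where it differs")

print("\n[Recommendation 6: Data-driven decisions]")
print(f"  • Use the trained model ({best_model_name}) to predict customer conversion")
print("  • Monitor key metrics continuously and adjust the strategy promptly")
print("  • Set up A/B testing to validate each change")

print("\n" + "=" * 80)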
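================================================================================
Marketing Strategy Recommendations
================================================================================
[Recommendation 1: Channel optimization]
  • Focus investment on the channel with the highest conversion rate: Referral
  • Consider increasing the budget share of the Referral channel
  • For channels with lower conversion rates, refine the placement strategy or reduce spend
[Recommendation 2: Campaign-type optimization]
  • Prioritize Conversion-type campaigns
  • Pair Conversion campaigns with high-converting channels
  • Investigate why other campaign types convert less and adjust them
[Recommendation 3: Ad-spend optimization]
  • Focus on the most spend-efficient channel: PPC
  • Analyze how ad spend relates to conversion rate to find the best spend range
  • Avoid over-investing; track ROI rather than absolute spend
[Recommendation 4: Customer engagement]
  • Focus on the following engagement metrics:
    - EmailClicks: 32.3% higher for converted than non-converted customers
    - EmailOpens: 28.6% higher for converted than non-converted customers
    - TimeOnSite: 26.6% higher for converted than non-converted customers
  • Improve the website experience to raise page views and time on site
  • Improve email marketing to raise opens and clicks
[Recommendation 5: Customer segmentation]
  • Use the model to identify high-probability customers for targeted marketing
  • Build personalized strategies for repeat buyers and high-loyalty customers
  • Compare conversion across age and income groups and differentiate where it differs
[Recommendation 6: Data-driven decisions]
  • Use the trained model (gradient boosting) to predict customer conversion
  • Monitor key metrics continuously and adjust the strategy promptly
  • Set up A/B testing to validate each change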
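For the segmentation recommendation, a simple way to judge whether model scores are useful for targeting is the lift among the highest-scored customers. The sketch below uses hypothetical (score, outcome) pairs. In the notebook the scores would presumably come from the fitted classifier's predicted probabilities, which is an assumption here, and `top_fraction_lift` is an illustrative helper.

```python
def top_fraction_lift(scored_customers, fraction):
    """Conversion rate among the top-scored fraction divided by the overall rate."""
    ranked = sorted(scored_customers, key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(converted for _, converted in ranked[:k]) / k
    overall_rate = sum(converted for _, converted in ranked) / len(ranked)
    return top_rate / overall_rate


# Hypothetical (predicted probability, actually converted) pairs
scored_customers = [
    (0.97, 1), (0.95, 1), (0.91, 1), (0.88, 1), (0.84, 0),
    (0.80, 1), (0.72, 1), (0.65, 0), (0.51, 1), (0.30, 0),
]
lift_top_40 = top_fraction_lift(scored_customers, 0.40)
print(f"Lift in the top 40%: {lift_top_40:.2f}x")
```

A lift above 1 means the top-scored group converts more often than the average customer. With an overall rate near 88%, the achievable lift is bounded at about 1.14, so the model may be at least as valuable for finding the small group unlikely to convert.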
https://www.heywhale.com/mw/project/69719946663d9934efd4c4db
