The Feature Engineering Frontier: A Systematic Review of Purchase Prediction
A new systematic literature review synthesizes 47 studies on predicting customer purchases in e-commerce, focusing on two persistent challenges: feature engineering and class imbalance. The review finds that while ensemble methods such as XGBoost and Random Forest remain dominant, their performance depends heavily on effective feature engineering, which now spans deep learning, graph-based feature extraction, and advanced feature-selection techniques. For imbalanced datasets, SMOTE (Synthetic Minority Over-sampling Technique) and ensemble learning are the standard remedies, while combined sampling strategies are attracting interest even though their validation remains limited. The key takeaway is that no single model is universally best: optimal performance depends on a careful interplay between feature engineering, imbalance handling, and model selection, tailored to the specific data context.
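To make the imbalance-handling terms concrete, here is a minimal plain-Python sketch; it is not code from the review or any study it covers. SMOTE creates synthetic minority-class samples by picking a minority point, picking one of its k nearest minority-class neighbours, and placing a new point at a random position on the segment between them. The "combined sampling" part assumes one common pairing, SMOTE oversampling plus random undersampling of the majority class; the reviewed studies may combine methods differently. The helper names `k_nearest_indices`, `smote`, and `combined_resample` are hypothetical.

```python
import math
import random


def k_nearest_indices(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    by_distance = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in by_distance[:k]]


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest_indices(i, minority, k))
        gap = rng.random()  # uniform in [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def combined_resample(majority, minority, target_per_class, k=5, seed=0):
    """Assumed combined strategy: randomly undersample the majority class and
    top up the minority class with SMOTE until both reach target_per_class."""
    rng = random.Random(seed)
    kept_majority = rng.sample(majority, min(target_per_class, len(majority)))
    n_new = max(0, target_per_class - len(minority))
    return kept_majority, list(minority) + smote(minority, n_new, k, rng)


# Toy usage: 20 majority vs 3 minority points, rebalanced to 8 per class.
majority = [(float(i), float(i)) for i in range(20)]
minority = [(0.0, 10.0), (1.0, 10.0), (2.0, 11.0)]
kept, oversampled = combined_resample(majority, minority, target_per_class=8, k=2)
print(len(kept), len(oversampled))  # 8 8
```

In practice a maintained implementation, such as `SMOTE` from the imbalanced-learn package, is the more likely choice. Whatever the tool, resampling should be applied only to training data (for example, inside each cross-validation training fold), so that evaluation data keeps its real class ratio.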
Why it might matter to you: This review directly addresses core machine learning challenges you work with, such as feature selection, handling imbalanced datasets, and model evaluation. It provides a structured overview of current best practices and identifies critical research gaps, like the need for better validation of imbalance techniques, which can inform your own model development and hyperparameter tuning strategies. For anyone focused on building robust predictive systems, this synthesis offers a valuable roadmap for navigating the trade-offs between complex feature engineering and model performance.
Stay curious. Stay informed, with Science Briefing.
Always double-check the original article for accuracy.
