Important Considerations when Predictive Modeling with Tabular Data
My Notes for “How to Win a Data Science Competition From Top Kagglers” on Coursera.org
10 min read · Oct 10, 2020

Overall, I found the course helpful and insightful; I would rate it 4.79/5. It covered many ideas that I had not considered before, so I am posting some of my notes here. More than likely, you have seen most of these ideas already, so I will try to focus on the most interesting ones. Here is the link to the course.
TOC
- Data Exploration Checklist
- Validation
- Target Leakage
- Metrics and Loss Functions
- Metric Optimization
- Mean Encoding
- Coding Tips
- Advanced Feature Engineering
- Ensemble Strategies
- StackNet
- Creating a Diverse Set of Models
- Tips on Meta-Learning and Stacking
- Text Based Features in XGBoost
- Sequence Feature Extraction (XGBoost)
- Semi-supervised & Pseudo Labeling
- My Opinion