Escalation Path
When default results don’t meet your needs, work through these steps in order. Each step is lower effort than the next, so start at the top.
Improve your target column
For prediction tasks, the single highest-impact change is usually the target column itself. See Preparing a Prediction Target in the Data Preparation Guide.
Add domain features
The model can only learn from what is in the dataset. Adding columns that encode domain knowledge often produces a larger accuracy lift than any parameter change.
High-value feature types:
| Source column | Derived feature | Why it helps |
|---|---|---|
| signup_date | days_since_signup, signup_month, signup_day_of_week | Captures time-based patterns the raw date obscures |
| plan_type + monthly_spend | spend_per_plan_tier | Encodes relative value rather than absolute spend |
| last_login_date, created_date | days_inactive | Surfaces recency without the model doing date math |
| city, state | is_metro (boolean) | Replaces high-cardinality text with a meaningful signal |
| num_logins_30d, num_logins_7d | login_trend (ratio) | Captures direction, not just volume |

Remove the raw columns once you’ve derived better ones; leaving both in can dilute the signal.
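As a rough illustration, several of the derivations in the table above can be sketched in plain Python. The column names come from the table; the exact definitions are assumptions made for this example (days_inactive is measured from last_login_date to a reference date, and login_trend is the 7-day daily login rate divided by the 30-day daily login rate), and `derive_features` is a hypothetical helper, not part of Wood Wide AI.

```python
from datetime import date


def derive_features(row: dict, today: date) -> dict:
    """Derive time-based and trend features for one record.

    The definitions below are illustrative assumptions, not a fixed spec.
    """
    signup = date.fromisoformat(row["signup_date"])
    last_login = date.fromisoformat(row["last_login_date"])

    # Compare daily login rates so the 7-day and 30-day windows are on the same scale.
    rate_7d = row["num_logins_7d"] / 7
    rate_30d = row["num_logins_30d"] / 30

    return {
        "days_since_signup": (today - signup).days,
        "signup_month": signup.month,
        "signup_day_of_week": signup.weekday(),  # Monday = 0
        "days_inactive": (today - last_login).days,
        # Above 1 means activity is rising; None when there is no 30-day baseline.
        "login_trend": rate_7d / rate_30d if rate_30d else None,
    }


row = {
    "signup_date": "2024-01-15",
    "last_login_date": "2024-03-01",
    "num_logins_7d": 7,
    "num_logins_30d": 15,
}
features = derive_features(row, today=date(2024, 3, 11))
```

Passing the reference date explicitly keeps the derived values reproducible when the same dataset is rebuilt later.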
Remove low-signal columns
Too many irrelevant columns add noise and slow down training. Remove columns that are unlikely to carry predictive signal before uploading.
Safe to remove:
- Pure ID columns (row_id, uuid, transaction_id) with no repeating values
- Free-text fields with near-unique values per row (notes, email, full_address)
- Columns with more than 80–90% missing values
- Columns that are direct proxies for your target (leakage columns), such as cancellation_date when predicting churn
Increase training data volume
More rows generally means better generalization, especially for rare classes or fine-grained segmentation.
- Add historical data. If you have older records you excluded, include them—even if they’re noisier.
- Combine related datasets. If you have similar data from different sources or time windows, union them before training.
- Reconsider row filters. If you filtered rows to simplify the problem, check whether those rows are actually irrelevant or just inconvenient.
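Combining related datasets might look like the sketch below. It is a minimal, dependency-free example under stated assumptions: `union_rows` is a hypothetical helper, every source is assumed to share the same columns, and dropping repeated records by a key column (here customer_id) is a choice made for this example rather than something the platform requires.

```python
def union_rows(*sources, key):
    """Concatenate row lists, keeping the first occurrence of each key value."""
    seen = set()
    combined = []
    columns = None
    for source in sources:
        for row in source:
            # Refuse to merge sources whose columns do not line up.
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError(f"column mismatch: {sorted(set(row) ^ columns)}")
            if row[key] in seen:
                continue  # record already taken from an earlier source
            seen.add(row[key])
            combined.append(row)
    return combined


older = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
]
recent = [
    {"customer_id": 2, "churned": 1},
    {"customer_id": 3, "churned": 0},
]
training_rows = union_rows(older, recent, key="customer_id")
```

If the same entity legitimately appears in several time windows with different values, deduplicating on a composite key (for example, entity plus period) may be more appropriate than the single key used here.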
Tune your feature set for the specific task
Different Wood Wide AI tasks respond to different feature profiles. See Preparing Data by Task for the per-task breakdown.
Re-examine your evaluation metric
Accuracy alone is often the wrong measure. Choose the metric that reflects what matters in your use case.
Evaluate on a held-out test set that mirrors your real inference distribution, not just cross-validation on training data.
| Situation | Better metric |
|---|---|
| Imbalanced classes (e.g., fraud detection) | Precision / Recall / F1 on the minority class |
| High cost of false positives | Precision |
| High cost of false negatives | Recall |
| Regression with outliers | MAE instead of RMSE |
| Regression with proportional errors | MAPE or RMSLE |
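To make the table concrete, here is a small sketch with made-up labels and predictions, purely for illustration. It shows how accuracy can look acceptable while minority-class precision and recall are poor, and how a single outlier inflates RMSE more than MAE. The helper functions are written out by hand so the example has no dependencies.

```python
import math


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Imbalanced classification: 2 fraud cases (label 1) out of 10 rows.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
preds = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
accuracy = sum(t == p for t, p in zip(labels, preds)) / len(labels)
precision, recall, f1 = precision_recall_f1(labels, preds, positive=1)

# Regression: four small errors and one large outlier error.
actual = [10, 10, 10, 10, 10]
predicted = [11, 11, 11, 11, 20]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

In this toy data, accuracy is 0.8 while precision, recall, and F1 on the fraud class are all 0.5. MAE is 2.8, whereas RMSE is roughly 4.56 because the single 10-unit error dominates the squared term.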
Iterate with compound workflows
When a single model isn’t enough, combine Wood Wide AI operations.
- Cluster then predict. Segment your data first, then train a prediction model per cluster. Groups with genuinely different behavior often produce better per-segment models than one model trained on everything.
- Anomaly filter then predict. Remove anomalous training rows before training a prediction model to reduce noise in the training set.
- Predict then explain. Run inference, then use factor analysis on the output to understand which features are driving predictions across different sub-populations.
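The routing logic behind "cluster then predict" can be sketched as follows. This example does not use any Wood Wide AI API. It assumes each row already carries a cluster label from an earlier clustering step, and it uses the segment's mean target as a stand-in for a real prediction model. The names `fit_per_segment` and `predict`, and the cluster and spend columns, are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_per_segment(rows, segment_key, target_key):
    """Fit one stand-in model per segment (here, the segment's mean target)."""
    targets_by_segment = defaultdict(list)
    for row in rows:
        targets_by_segment[row[segment_key]].append(row[target_key])
    return {segment: mean(values) for segment, values in targets_by_segment.items()}


def predict(models, row, segment_key, fallback):
    """Route a row to its segment's model; use the fallback for unseen segments."""
    return models.get(row[segment_key], fallback)


training_rows = [
    {"cluster": "A", "spend": 10},
    {"cluster": "A", "spend": 20},
    {"cluster": "B", "spend": 100},
    {"cluster": "B", "spend": 120},
]
models = fit_per_segment(training_rows, segment_key="cluster", target_key="spend")
global_mean = mean(row["spend"] for row in training_rows)

prediction_a = predict(models, {"cluster": "A"}, "cluster", fallback=global_mean)
prediction_new = predict(models, {"cluster": "C"}, "cluster", fallback=global_mean)
```

The fallback matters in practice: rows assigned to a segment that was absent at training time still need a prediction, and a model trained on all rows is one reasonable default.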
Quick Reference
Target Quality
Clean labels, clear classes, minimal ambiguity. Usually the highest-impact fix.
Feature Engineering
Add domain columns, extract datetime components, compute ratios and trends.
Feature Pruning
Remove IDs, high-missing columns, leakage columns, and near-constant columns.
More Data
Add historical records, combine related datasets, revisit row filters.
Task-Specific Tuning
Match your feature set to the task: prediction, clustering, anomaly detection, or factor analysis.
Compound Workflows
Combine cluster → predict, anomaly filter → predict, or predict → explain.
Related
Data Preparation Guide
Formatting, column types, missing values, and pre-upload checklist.
Compound Insight Recipes
End-to-end examples combining multiple Wood Wide AI operations.