Improving Performance - Wood Wide AI SDK Documentation

Wood Wide AI produces strong results out of the box—the platform handles type inference, schema alignment, and missing-value treatment automatically. Before reaching for any of the techniques below, make sure your data is clean and well-structured. See the Data Preparation Guide for the full checklist.

Escalation Path

When default results don’t meet your needs, work through these steps in order. Each step is lower effort than the next, so start at the top.

Improve your target column

For prediction tasks, the single highest-impact change is usually the target column itself. See Preparing a Prediction Target in the Data Preparation Guide.

Add domain features

The model can only learn from what is in the dataset. Adding columns that encode domain knowledge often produces a larger accuracy lift than any parameter change.

High-value feature types:

Source column	Derived feature	Why it helps
`signup_date`	`days_since_signup`, `signup_month`, `signup_day_of_week`	Captures time-based patterns the raw date obscures
`plan_type` + `monthly_spend`	`spend_per_plan_tier`	Encodes relative value rather than absolute spend
`last_login_date`, `created_date`	`days_inactive`	Surfaces recency without the model doing date math
`city`, `state`	`is_metro` (boolean)	Replaces high-cardinality text with a meaningful signal
`num_logins_30d`, `num_logins_7d`	`login_trend` (ratio)	Captures direction, not just volume

Remove the raw columns once you’ve derived better ones—leaving both in can dilute the signal.
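As a rough illustration of the first and last rows of the table, the sketch below derives `days_since_signup`, `signup_month`, `signup_day_of_week`, and `login_trend` in plain Python before upload. It is not part of the Wood Wide AI SDK: the `derive_features` helper, the ISO date strings, the explicit reference date, and the choice to return 0.0 when there were no 30-day logins are all assumptions made for the example.

```python
from datetime import date

# Raw columns replaced by derived features (names taken from the table above).
RAW_COLUMNS = ("signup_date", "num_logins_30d", "num_logins_7d")

def derive_features(row, today):
    """Hypothetical helper: return a copy of `row` with derived columns added
    and the raw source columns removed. Dates are assumed to be ISO strings."""
    signup = date.fromisoformat(row["signup_date"])
    out = {k: v for k, v in row.items() if k not in RAW_COLUMNS}
    out["days_since_signup"] = (today - signup).days
    out["signup_month"] = signup.month
    out["signup_day_of_week"] = signup.weekday()  # Monday is 0
    # Ratio of recent to monthly logins; 0.0 when there were no logins at all.
    logins_30d = row["num_logins_30d"]
    out["login_trend"] = row["num_logins_7d"] / logins_30d if logins_30d else 0.0
    return out

example = derive_features(
    {"plan_type": "pro", "signup_date": "2024-01-01",
     "num_logins_30d": 20, "num_logins_7d": 5},
    today=date(2024, 1, 31),
)
```

With perfectly steady usage this `login_trend` sits near 7/30. Values well above that suggest rising activity, and values near zero suggest a drop-off.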

Remove low-signal columns

Too many irrelevant columns add noise and slow down training. Remove columns that are unlikely to carry predictive signal before uploading.

Safe to remove:

Pure ID columns (`row_id`, `uuid`, `transaction_id`) with no repeating values
Free-text fields with near-unique values per row (`notes`, `email`, `full_address`)
Columns with more than 80–90% missing values
Columns that are direct proxies for your target (leakage columns)

Leakage check: Remove columns that can only be populated after the outcome you’re predicting is already known. A common example is including `cancellation_date` when predicting churn.
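The checks above can be approximated with a small pre-upload audit. The following is a minimal sketch, not an SDK feature: `low_signal_columns` is a hypothetical helper, rows are assumed to be dictionaries, the 0.8 threshold is taken from the lower end of the 80–90% guideline, and leakage columns still have to be named by hand because they depend on knowledge of how the data was collected.

```python
def low_signal_columns(rows, max_missing=0.8, leakage=()):
    """Hypothetical helper: map each column worth dropping to the reason.

    rows        -- list of dicts sharing the same keys
    max_missing -- drop columns whose missing fraction exceeds this value
    leakage     -- column names known to be filled in only after the outcome
    """
    reasons = {}
    n = len(rows)
    columns = list(rows[0]) if rows else []
    for col in columns:
        present = [r.get(col) for r in rows if r.get(col) not in (None, "")]
        if col in leakage:
            reasons[col] = "leakage"
        elif 1 - len(present) / n > max_missing:
            reasons[col] = "mostly missing"
        # Only text columns are treated as IDs or free text; numeric columns
        # with all-distinct values (for example spend) are left alone.
        elif (n > 1 and len(set(present)) == n
              and all(isinstance(v, str) for v in present)):
            reasons[col] = "unique per row"
    return reasons

rows = [
    {"row_id": "r1", "plan": "pro", "notes": None, "cancellation_date": "2024-02-01", "spend": 10.0},
    {"row_id": "r2", "plan": "free", "notes": None, "cancellation_date": None, "spend": 12.5},
    {"row_id": "r3", "plan": "pro", "notes": None, "cancellation_date": None, "spend": 7.25},
]
to_drop = low_signal_columns(rows, leakage=("cancellation_date",))
```

Treat the result as a list of candidates to review rather than an automatic drop list, since a unique text column can still be useful if a feature is derived from it first.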

Increase training data volume

More rows generally means better generalization, especially for rare classes or fine-grained segmentation.

Add historical data. If you have older records you excluded, include them—even if they’re noisier.
Combine related datasets. If you have similar data from different sources or time windows, union them before training.
Reconsider row filters. If you filtered rows to simplify the problem, check whether those rows are actually irrelevant or just inconvenient.

There is no strict row minimum, but prediction tasks typically benefit from at least a few thousand rows for complex patterns.
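If the related datasets do not share exactly the same columns, one way to union them is to take every column seen in any source and leave gaps empty. The sketch below assumes rows are dictionaries and uses a hypothetical `union_datasets` helper; whether to keep all columns or only the shared ones is a judgment call that depends on how sparse the result becomes.

```python
def union_datasets(*datasets):
    """Hypothetical helper: concatenate row lists, aligning columns.

    Columns keep first-seen order; values missing from a source become None.
    """
    columns = []
    for rows in datasets:
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns}
            for rows in datasets for row in rows]

current = [{"customer_id": 1, "monthly_spend": 10}]
historical = [{"customer_id": 2, "region": "EU"}]
combined = union_datasets(current, historical)
```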

Tune your feature set for the specific task

Different Wood Wide AI tasks respond to different feature profiles. See Preparing Data by Task for the per-task breakdown.

Re-examine your evaluation metric

Accuracy alone is often the wrong measure. Choose the metric that reflects what matters in your use case.

Situation	Better metric
Imbalanced classes (e.g., fraud detection)	Precision / Recall / F1 on the minority class
High cost of false positives	Precision
High cost of false negatives	Recall
Regression with outliers	MAE instead of RMSE
Regression with proportional errors	MAPE or RMSLE

Evaluate on a held-out test set that mirrors your real inference distribution, not just cross-validation on training data.
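To make the table concrete, here is a small self-contained sketch of two of the rows, using standard textbook definitions rather than anything from the SDK. The function names and the toy labels are illustrative assumptions. It shows a classifier that looks fine on accuracy while missing every minority-class case, and a single outlier inflating RMSE more than MAE.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Imbalanced classes: a model that always predicts the majority class.
labels = ["ok"] * 8 + ["fraud"] * 2
always_ok = ["ok"] * 10
acc = accuracy(labels, always_ok)                                   # 0.8
fraud_scores = precision_recall_f1(labels, always_ok, "fraud")      # (0.0, 0.0, 0.0)

# Regression with one outlier error of 9 among errors of 1.
actual = [10, 10, 10, 10]
predicted = [11, 11, 11, 19]
mae_value = mae(actual, predicted)    # 3.0
rmse_value = rmse(actual, predicted)  # sqrt(21), noticeably larger than the MAE
```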

Iterate with compound workflows

When a single model isn’t enough, combine Wood Wide AI operations.

Cluster then predict. Segment your data first, then train a prediction model per cluster. Groups with genuinely different behavior often produce better per-segment models than one model trained on everything.
Anomaly filter then predict. Remove anomalous training rows before training a prediction model to reduce noise in the training set.
Predict then explain. Run inference, then use factor analysis on the output to understand which features are driving predictions across different sub-populations.

See Compound Insight Recipes for worked examples.

Quick Reference

Target Quality

Clean labels, clear classes, minimal ambiguity. Usually the highest-impact fix.

Feature Engineering

Add domain columns, extract datetime components, compute ratios and trends.

Feature Pruning

Remove IDs, high-missing columns, leakage columns, and near-constant columns.

More Data

Add historical records, combine related datasets, revisit row filters.

Task-Specific Tuning

Match your feature set to the task: prediction, clustering, anomaly detection, or factor analysis.

Compound Workflows

Combine cluster → predict, anomaly filter → predict, or predict → explain.

Data Preparation Guide

Formatting, column types, missing values, and pre-upload checklist.

Compound Insight Recipes

End-to-end examples combining multiple Wood Wide AI operations.
