
Documentation Index

Fetch the complete documentation index at: https://docs.woodwide.ai/llms.txt

Use this file to discover all available pages before exploring further.

Here are four sample patterns for combining Wood Wide AI tasks to answer questions that no single model can answer alone. Each task type (prediction, clustering, anomaly detection, and factor analysis) is useful on its own, but the most powerful analyses chain them together. These recipes show you how.

Recipe 1: Segment, Then Predict

This recipe helps you cluster your data first, then train a separate prediction model per cluster. Why try it? A single prediction model trained on all your data learns average patterns. However, the signal that predicts churn in your enterprise accounts looks nothing like the signal in your SMB accounts. Segmenting first lets each model focus on the patterns that actually matter for that group.

How to run it

1. Cluster your dataset. Upload your dataset and run Clustering. Use the columns that describe behavior or structure, not the outcome you want to predict. Let the model find the natural groupings.
2. Export each cluster as a separate dataset. Tag each row with its cluster label, then split into separate files, one per cluster.
3. Train a prediction model per cluster. Upload each cluster file and train a Prediction model on each one, using the same target column across all of them.
4. Run inference within each segment. When new data arrives, assign it to a cluster first (using the clustering model), then run inference with the matching prediction model.
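
Here is the same flow sketched locally with scikit-learn, for readers who want to see the mechanics. KMeans and logistic regression stand in for the platform's Clustering and Prediction tasks; the file names, feature columns, and churned target are hypothetical.

```python
# Recipe 1 sketch: cluster on behavior columns, then train one predictor per cluster.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("accounts.csv")  # hypothetical labeled training file
behavior_cols = ["usage_freq", "contract_value", "adoption_score"]  # not the target

# Steps 1-2: cluster on behavior/structure columns, tag each row with its label
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df[behavior_cols])
df["cluster"] = kmeans.labels_

# Step 3: train a separate prediction model per cluster, same target everywhere
models = {}
for label, group in df.groupby("cluster"):
    models[label] = LogisticRegression(max_iter=1000).fit(
        group[behavior_cols], group["churned"])

# Step 4: assign new rows to a cluster first, then score with the matching model
new = pd.read_csv("new_accounts.csv")
new["cluster"] = kmeans.predict(new[behavior_cols])
new["churn_score"] = 0.0
for label, group in new.groupby("cluster"):
    new.loc[group.index, "churn_score"] = models[label].predict_proba(
        group[behavior_cols])[:, 1]
```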

Example use cases

Cluster customers by usage frequency, contract value, and product adoption metrics. Train a churn prediction model separately for each segment. High-usage enterprise customers and low-usage trial accounts have completely different churn patterns; one model for both will underperform on both.
Cluster open opportunities by deal size, sales cycle length, and number of stakeholders involved. Train a win/loss prediction model per cluster. A $500K multi-stakeholder deal has different predictive features than a $10K single-buyer deal.
Cluster accounts by payment history, average invoice size, and days-to-pay trends. Train a default prediction model per cluster. Seasonal businesses and steady recurring accounts default for different reasons at different times.
Cluster machines or devices by operating environment (load, temperature range, run hours). Train a failure prediction model per cluster. A machine running at 90% capacity fails differently than one running at 40%.

Recipe 2: Detect, Then Diagnose

This recipe uses anomaly detection to flag unusual records, then runs factor analysis on the flagged set to understand what they have in common. Why try it? Anomaly detection tells you whether something is off. Factor analysis tells you what pattern the anomalies share. Together they turn a list of outliers into an actionable finding.

How to run it

1. Train an anomaly detection model. Upload your historical data and run Anomaly Detection. The model learns what normal looks like across your numeric columns.
2. Run inference and export flagged records. Run inference on your current data. Export only the rows with high anomaly scores; these are your outliers.
3. Run factor analysis on the flagged set. Upload the flagged records and run Factor Analysis. This surfaces the underlying dimensions that the anomalies have in common.
4. Interpret the factors. The factors point to which combinations of columns are driving the unusual pattern. Use this to form a hypothesis about root cause.
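
A minimal local sketch of the detect-then-diagnose chain, assuming scikit-learn's IsolationForest and FactorAnalysis stand in for the platform's Anomaly Detection and Factor Analysis tasks. The file names, flagging threshold, and factor count are hypothetical choices.

```python
# Recipe 2 sketch: flag outliers, then run factor analysis on only the flagged rows.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

history = pd.read_csv("transactions_history.csv")  # step 1: learn "normal"
current = pd.read_csv("transactions_current.csv")  # step 2: score current data
numeric = history.select_dtypes("number").columns

iso = IsolationForest(random_state=0).fit(history[numeric])
scores = -iso.score_samples(current[numeric])  # higher = more anomalous
flagged = current[scores > scores.mean() + 2 * scores.std()]  # arbitrary cutoff

# Step 3: factor analysis on the flagged set only (standardize first)
X = StandardScaler().fit_transform(flagged[numeric])
fa = FactorAnalysis(n_components=3, random_state=0).fit(X)

# Step 4: columns that load heavily together on one factor describe
# what the anomalies have in common
loadings = pd.DataFrame(fa.components_.T, index=numeric, columns=["F1", "F2", "F3"])
print(loadings.round(2))
```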

Example use cases

Train on 12 months of monthly revenue data by account (ARR, expansion, contraction, churn, new bookings). Flag months where the pattern breaks. Run factor analysis on flagged months to identify whether the anomaly is driven by contraction in a specific segment, a drop in new bookings, or an unusual churn spike.
Train on normal transaction data (amount, frequency, merchant category, time of day, geography). Flag transactions with high anomaly scores. Run factor analysis on flagged transactions to surface the combination of features (such as unusually high amounts at unusual hours in unusual locations) that characterizes the fraud pattern.
Train on healthy quarter-close pipeline data (deals by stage, average deal age, stage conversion rates, rep activity metrics). Flag quarters or territories where the pattern breaks. Factor analysis on the anomalies reveals whether the breakdown is concentrated in a specific stage, deal size band, or rep cohort.
Train on normal procurement and fulfillment data (lead times, order volumes, supplier fill rates, inventory levels). Flag disrupted periods. Factor analysis on the flagged records reveals whether disruptions cluster around specific suppliers, SKUs, or logistics routes.

Recipe 3: Predict, Then Slice by Segment

This recipe helps you train one prediction model on all your data, run inference, then break out the predicted outcomes by cluster to compare risk or opportunity across groups. Why try it? A single model gives you a score per row. Slicing those scores by segment tells you where the risk or opportunity is concentrated, and by how much. This turns a flat list of scores into a decision-making tool.

How to run it

1. Train a prediction model on your full dataset. Upload your labeled data and train a Prediction model. This gives you a model that scores any new record.
2. Cluster your inference data. Upload your current (unlabeled) data and run Clustering to assign each record to a segment.
3. Run inference with the prediction model. Run your prediction model on the same dataset. Each record now has both a cluster label and a predicted outcome score.
4. Compare predicted outcomes across clusters. Group by cluster and look at the distribution of predicted scores. Which segments have the highest predicted churn? The highest predicted revenue? The most at-risk accounts?
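
Sketched locally with scikit-learn and pandas: a random forest and KMeans stand in for the platform's Prediction and Clustering tasks, and the churned target, file names, and 0.5 threshold are hypothetical.

```python
# Recipe 3 sketch: one model on all data, one clustering pass, then slice scores.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

labeled = pd.read_csv("history.csv")   # has the target column
current = pd.read_csv("current.csv")   # unlabeled inference data
features = [c for c in labeled.columns if c != "churned"]

# Step 1: one prediction model trained on the full labeled dataset
clf = RandomForestClassifier(random_state=0).fit(labeled[features], labeled["churned"])

# Step 2: cluster the current (unlabeled) data into segments
current["cluster"] = KMeans(n_clusters=4, n_init=10,
                            random_state=0).fit_predict(current[features])

# Step 3: score the same rows, so each has a cluster label and a predicted score
current["churn_score"] = clf.predict_proba(current[features])[:, 1]

# Step 4: compare predicted outcomes across clusters
summary = current.groupby("cluster")["churn_score"].agg(["mean", "count"])
summary["share_above_0.5"] = (
    current.groupby("cluster")["churn_score"].apply(lambda s: (s > 0.5).mean()))
print(summary.sort_values("mean", ascending=False))
```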

Example use cases

Train a churn model on historical customer data. Cluster your current active accounts by spend, tenure, and engagement. Run inference and compare predicted churn rates across clusters. You might find that 70% of your predicted churn is concentrated in a single cluster, e.g. mid-market accounts with declining usage in months 8–14.
Train a model to predict expansion revenue (accounts that upgraded in the past). Cluster current accounts by feature adoption and usage depth. Run inference and rank clusters by predicted expansion probability. Prioritize the highest-scoring cluster for outbound.
Train a default prediction model on historical loan data. Cluster current borrowers by loan size, term, payment history, and utilization. Compare predicted default rates across clusters to identify where portfolio risk is concentrated before it surfaces in actuals.
Train a conversion model on historical closed/lost deal data. Cluster inbound leads by firmographic and behavioral attributes (company size, industry, pages visited, time to first action). Run inference and compare predicted conversion rates across clusters to prioritize outreach.

Recipe 4: Baseline, Then Monitor

This recipe helps you train an anomaly detection model on clean historical data as your baseline, then run inference on each new period’s data to detect drift over time. Why try it? Anomaly detection is relative; it needs to know what normal looks like. By anchoring on a stable historical baseline, you can detect when current data starts behaving differently, before the difference shows up in your lagging indicators.

How to run it

1. Upload a clean historical dataset as your baseline. Choose a period that represents normal operations, not one with known incidents, seasonality outliers, or data quality issues. Upload it as your baseline dataset.
2. Train an anomaly detection model on the baseline. Run Anomaly Detection on the baseline. The model encodes what normal looks like for this data.
3. Run inference on each new period. Each week, month, or quarter, upload your new data and run inference with the baseline model. Higher anomaly scores mean greater deviation from the baseline pattern.
4. Track anomaly scores over time. Compare aggregate anomaly scores across periods. A rising score signals drift. Combine with Recipe 2 (Detect → Diagnose) to identify which factors are driving the drift.
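
A minimal local sketch of the monitoring loop, with scikit-learn's IsolationForest standing in for the platform's Anomaly Detection task. The baseline and per-period file names are hypothetical.

```python
# Recipe 4 sketch: fit a detector on a clean baseline, score each new period
# against it, and track the aggregate score over time.
import pandas as pd
from sklearn.ensemble import IsolationForest

baseline = pd.read_csv("baseline_months.csv")  # step 1: known-normal period
numeric = baseline.select_dtypes("number").columns

# Step 2: the model encodes what "normal" looks like for the baseline
iso = IsolationForest(random_state=0).fit(baseline[numeric])

# Steps 3-4: score each new period with the frozen baseline model, then
# aggregate; a sustained rise in the mean score signals drift
drift = []
for path in ["2024-01.csv", "2024-02.csv", "2024-03.csv"]:
    period = pd.read_csv(path)
    scores = -iso.score_samples(period[numeric])  # higher = more anomalous
    drift.append({"period": path, "mean_anomaly": scores.mean()})

print(pd.DataFrame(drift))  # watch for an upward trend across periods
```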

Example use cases

Baseline on 12 months of stable ARR data (new bookings, expansion, contraction, churn by cohort). Run inference on each subsequent month. A spike in anomaly score in month 15 might catch an unusual contraction pattern two months before it shows up in net revenue retention.
Baseline on a representative period of transaction data (amounts, frequencies, category distributions, timing). Run inference on each new week or month. Useful for detecting shifts in customer purchasing behavior, seasonal anomalies outside expected ranges, or early signs of fraud pattern changes.
Baseline on quarters where pipeline converted at expected rates (stage progression speed, deal age by size, close rate by rep). Run inference each quarter. Flags when pipeline behavior deviates — deals stalling in a stage they don’t normally stall in, or close rates dropping in a segment before it shows in quota attainment.
Baseline on normal operating periods for any numeric operational dataset — logistics (delivery times, fill rates, order volumes), finance (expense ratios, budget utilization, invoice timing), or SaaS usage (logins, feature calls, API volume). Run inference each period to catch shifts before they become incidents.

AI-Assisted Analysis Prompts

Not sure how to interpret your results or structure your next step? Paste your data summary into Claude or ChatGPT with one of these prompts.
Prompt for Recipe 1 (Segment, Then Predict)

I'm running a two-step machine learning analysis on tabular data.

Step 1: I clustered my dataset using an unsupervised clustering model. Each row
now has a cluster label (e.g., Cluster 0, Cluster 1, Cluster 2). Here is a
summary of each cluster — the average values of key columns per cluster:

[PASTE CLUSTER SUMMARY HERE]

Step 2: I want to train a separate prediction model on each cluster to predict
[TARGET COLUMN — e.g., churned, converted, defaulted].

Please help me:
1. Describe what each cluster likely represents in plain business terms based
   on the column averages
2. Identify which columns are likely to be the strongest predictors of
   [TARGET COLUMN] within each cluster, and why they might differ across clusters
3. Flag any clusters that may be too small or too homogeneous to train a
   reliable prediction model on
4. Suggest whether any clusters should be merged before training
Prompt for Recipe 2 (Detect, Then Diagnose)

I ran anomaly detection on my dataset and flagged a set of records with high
anomaly scores. I then ran factor analysis on just the flagged records to
understand what they have in common.

Here is the factor analysis output — the columns with the highest loadings
on each factor:

[PASTE FACTOR ANALYSIS RESULTS HERE]

The dataset contains the following columns:
[LIST YOUR COLUMN NAMES AND WHAT THEY REPRESENT]

Please help me:
1. Interpret each factor in plain terms — what business concept does it represent?
2. Explain what the combination of high-loading columns suggests about why
   these records were flagged as anomalies
3. Suggest 2-3 hypotheses about the root cause of the anomaly pattern
4. Recommend what additional data or context I should look at to confirm
   or rule out each hypothesis
Prompt for Recipe 3 (Predict, Then Slice by Segment)

I trained a prediction model to predict [TARGET — e.g., churn, conversion,
default] and ran inference on my current dataset. I also clustered the same
dataset into segments. Each record now has both a predicted score and a
cluster label.

Here is the distribution of predicted scores by cluster:

[PASTE CLUSTER × PREDICTION SUMMARY — e.g., average predicted score,
% above threshold, count per cluster]

The clusters have these approximate characteristics:
[PASTE CLUSTER SUMMARY OR DESCRIPTION]

Please help me:
1. Identify which cluster(s) represent the highest concentration of risk
   or opportunity based on the predicted scores
2. Explain in plain terms why that cluster might score higher, based on
   its characteristics
3. Suggest what action to take for each cluster — prioritize, monitor,
   investigate, or ignore
4. Flag any clusters where the prediction scores seem surprising or
   inconsistent with the cluster profile, which might indicate a data issue
Prompt for Recipe 4 (Baseline, Then Monitor)

I trained an anomaly detection model on a historical baseline dataset
representing normal operations. I've been running inference on each new
period's data and tracking aggregate anomaly scores over time.

Here are my anomaly scores by period:

[PASTE PERIOD-BY-PERIOD ANOMALY SCORE SUMMARY]

The dataset tracks the following metrics:
[LIST YOUR COLUMN NAMES AND WHAT THEY REPRESENT]

Known context: [ADD ANY KNOWN EVENTS — e.g., "we ran a promotion in March",
"a new pricing tier launched in Q3", "leave blank if none"]

Please help me:
1. Identify which periods show meaningful deviation from baseline and which
   are within normal variation
2. Suggest whether the drift pattern looks like a sudden shift (single
   period spike) or a gradual trend (scores rising over multiple periods)
3. Recommend which columns to investigate first based on the period where
   scores started rising
4. Help me determine whether this drift is likely operational (a real change
   in behavior) or a data quality issue (a change in how data was recorded)

Combining Recipes

These recipes are composable. Here are a few combinations to get even more out of your data:
| Combination | What it answers |
| --- | --- |
| Recipe 1 + Recipe 3 | Segment first, predict per segment, then compare predicted outcomes across segments for sharper prioritization |
| Recipe 4 + Recipe 2 | Monitor for drift over time, then diagnose what's driving it when anomaly scores rise |
| Recipe 2 + Recipe 1 | Detect anomalous accounts, cluster the anomalies to find subgroups, then predict which subgroup is most likely to churn or default |
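
As a rough illustration of the second row (Recipe 4 + Recipe 2): when a monitored period's anomaly score spikes, pull that period's flagged rows and run factor analysis on them to see what is driving the drift. Same scikit-learn stand-ins and hypothetical file names as the sketches above.

```python
# Combination sketch (Recipe 4 + Recipe 2): monitor for drift, then diagnose it.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

baseline = pd.read_csv("baseline_months.csv")
period = pd.read_csv("2024-03.csv")  # the period whose anomaly score spiked
numeric = baseline.select_dtypes("number").columns

# Recipe 4: score the drifting period against the frozen baseline detector
iso = IsolationForest(random_state=0).fit(baseline[numeric])
scores = -iso.score_samples(period[numeric])
flagged = period[scores > scores.mean() + 2 * scores.std()]  # rows driving the spike

# Recipe 2: factor analysis on just the drifting rows; high-loading columns
# describe what the drift has in common
fa = FactorAnalysis(n_components=2, random_state=0).fit(
    StandardScaler().fit_transform(flagged[numeric]))
print(pd.DataFrame(fa.components_.T, index=numeric, columns=["F1", "F2"]).round(2))
```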