Clustering

Clustering models group the rows of your data into clusters and generate a human-readable description for each cluster. The number of clusters is determined automatically.

Training

Training fits the model to your data and discovers cluster structure. A portion of the training data is held out to compute validation metrics:

Metric	Description
`n_clusters`	Number of clusters discovered.
`silhouette_score`	Silhouette coefficient measuring cluster separation (range -1 to 1; higher is better). Only computed when there are at least 2 clusters.

These metrics are available on the model object via `GET /models/{model_id}`, in the `current_metrics` field. At training time, the platform also generates a human-readable description for each cluster, summarizing the distinguishing characteristics of the rows in that cluster. These descriptions are included in inference output.
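
To make the `silhouette_score` metric more concrete, here is a minimal sketch of the standard silhouette coefficient on a toy one-dimensional dataset. It is not the platform's implementation: the function name `silhouette_1d`, the absolute-difference distance, and the sample points are all assumptions chosen for illustration, and the distance measure the platform actually uses is not described here.

```python
from statistics import mean


def silhouette_1d(points, labels):
    """Mean silhouette coefficient for 1-D points (illustrative helper)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette is undefined for fewer than 2 clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(abs(p - q) for q in other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Two tight, well-separated groups, so the score is close to 1.
toy_score = silhouette_1d([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
print(round(toy_score, 3))
```

Each per-row score compares how close a row is to its own cluster with how close it is to the nearest other cluster, which is why the metric needs at least 2 clusters.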

Python

import os
import time

import requests

api_key = os.getenv("WOODWIDE_API_KEY")
base_url = "https://api.woodwide.ai"
headers = {"Authorization": f"Bearer {api_key}"}

# Upload data
with open("customers.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/datasets",
        headers=headers,
        files={"file": ("customers.csv", f, "text/csv")},
        data={"dataset_name": "customers"},
    )
resp.raise_for_status()  # stop early if the upload failed
dataset_id = resp.json()["dataset"]["id"]

# Train a clustering model
resp = requests.post(
    f"{base_url}/models/train",
    headers=headers,
    json={
        "model_name": "customer_segments",
        "model_type": "clustering",
        "dataset_id": dataset_id,
    },
)
resp.raise_for_status()  # stop early if the training request was rejected
model_id = resp.json()["model"]["id"]

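# Wait for training, giving up after 30 minutes (an arbitrary limit; adjust as needed)
deadline = time.monotonic() + 30 * 60
while True:
    resp = requests.get(f"{base_url}/models/{model_id}", headers=headers)
    resp.raise_for_status()
    model = resp.json()
    if model["status"] == "ready":
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Model {model_id} was not ready after 30 minutes")
    time.sleep(5)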

print(model["current_metrics"])  # e.g. {"n_clusters": 4, "silhouette_score": 0.62}
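
Because `silhouette_score` is only computed when there are at least 2 clusters, code that reads `current_metrics` may want to tolerate its absence. The sketch below is one way to do that. The helper name `describe_metrics` is hypothetical, and whether the field is omitted or set to null in that case is an assumption, so the code handles both.

```python
def describe_metrics(metrics):
    """Summarize clustering metrics, tolerating a missing silhouette score."""
    n_clusters = metrics.get("n_clusters")
    # Assumption: the score may be absent or null when fewer than 2 clusters exist.
    silhouette = metrics.get("silhouette_score")
    if silhouette is None:
        return f"{n_clusters} cluster(s); silhouette score not available"
    return f"{n_clusters} clusters; silhouette score {silhouette:.2f}"


# Illustrative inputs, not real API responses.
print(describe_metrics({"n_clusters": 4, "silhouette_score": 0.62}))
print(describe_metrics({"n_clusters": 1}))
```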

Inference

To get cluster assignments, run inference on your data. Because clustering is unsupervised, it is common to run inference on the same dataset you trained on, which gives you the cluster assignment for each row. You can also run inference on new data; those rows are assigned to the clusters that were discovered during training.

Python

# Run inference on the training data to get cluster assignments
with open("customers.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/models/{model_id}/infer",
        headers=headers,
        files={"file": ("customers.csv", f, "text/csv")},
        data={"output_type": "json"},
    )

resp.raise_for_status()  # stop early if inference failed
results = resp.json()["data"]
print(results)

See Output Formats for the full output schema.
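
Once you have the inference results, a common next step is to count how many rows fall into each cluster. The sketch below assumes, purely for illustration, that the results are a list of per-row dictionaries carrying a cluster identifier under a key named `cluster_id`. That key name and the sample rows are hypothetical; check Output Formats for the actual field names.

```python
from collections import Counter


def cluster_sizes(rows, cluster_key="cluster_id"):
    """Count rows per cluster. `cluster_key` is a hypothetical field name."""
    return Counter(row[cluster_key] for row in rows)


# Illustrative stand-in for the inference results; not real API output.
sample_rows = [
    {"cluster_id": 0},
    {"cluster_id": 1},
    {"cluster_id": 0},
]

sizes = cluster_sizes(sample_rows)
for cluster, count in sizes.most_common():
    print(f"cluster {cluster}: {count} rows")
```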
