Clustering models automatically group rows in your data into meaningful clusters and generate human-readable descriptions for each cluster. The number of clusters is determined automatically.

Training

Training fits the model to your data and discovers cluster structure. A portion of the training data is held out to compute validation metrics:
| Metric | Description |
| --- | --- |
| n_clusters | Number of clusters discovered. |
| silhouette_score | Silhouette coefficient measuring cluster separation (range -1 to 1; higher is better). Only computed when there are at least 2 clusters. |
These metrics are available on the model object via GET /models/{model_id} in the current_metrics field. At training time, the platform also generates human-readable descriptions for each cluster, summarizing the distinguishing characteristics of rows in that cluster. These descriptions are included in inference output.
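For intuition about what silhouette_score measures, here is a minimal pure-Python sketch of the standard silhouette coefficient with Euclidean distance. It is an illustration only: the function name and sample points are made up for this example, and the platform's own computation (for instance, how it uses the held-out data) may differ.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Illustrative only; not necessarily how the platform computes its metric.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least 2 clusters")

    def mean_dist(i, members):
        # Average distance from point i to every other point in `members`.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        a = mean_dist(i, own)  # cohesion: distance to own cluster
        b = min(                # separation: distance to nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Two tight, well-separated groups score close to 1 ...
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])
# ... while labels that mix the groups score below 0.
bad = silhouette_score(points, [0, 1, 0, 1])
print(round(good, 2), round(bad, 2))
```

A score near 1 means rows sit much closer to their own cluster than to the nearest other cluster; a score near or below 0 means the clusters overlap.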
import os, time, requests

api_key = os.getenv("WOODWIDE_API_KEY")
base_url = "https://api.woodwide.ai"
headers = {"Authorization": f"Bearer {api_key}"}

# Upload data
with open("customers.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/datasets",
        headers=headers,
        files={"file": ("customers.csv", f, "text/csv")},
        data={"dataset_name": "customers"},
    )
resp.raise_for_status()  # stop here if the upload failed
dataset_id = resp.json()["dataset"]["id"]

# Train a clustering model
resp = requests.post(
    f"{base_url}/models/train",
    headers=headers,
    json={
        "model_name": "customer_segments",
        "model_type": "clustering",
        "dataset_id": dataset_id,
    },
)
resp.raise_for_status()  # stop here if the training request was rejected
model_id = resp.json()["model"]["id"]

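# Wait for training, with a deadline so a job that never becomes ready cannot hang the script
deadline = time.monotonic() + 30 * 60  # 30-minute cap; adjust for your data size
while True:
    resp = requests.get(f"{base_url}/models/{model_id}", headers=headers)
    resp.raise_for_status()
    model = resp.json()
    if model["status"] == "ready":
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Model {model_id} not ready; last status: {model['status']}")
    time.sleep(5)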

print(model["current_metrics"])  # e.g. {"n_clusters": 4, "silhouette_score": 0.62}
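Because silhouette_score is only computed when there are at least 2 clusters, code that reads current_metrics should not assume the value is always there. The helper below is a small sketch of defensive handling; describe_metrics is a hypothetical name, and whether the field is omitted or set to null in the single-cluster case is an assumption, so the code tolerates both.

```python
def describe_metrics(metrics):
    """Summarize clustering metrics, tolerating a missing silhouette_score."""
    n_clusters = metrics.get("n_clusters")
    # Assumption: the score may be absent or None when fewer than 2 clusters exist.
    score = metrics.get("silhouette_score")
    if score is None:
        return f"{n_clusters} cluster(s); silhouette_score not available"
    return f"{n_clusters} clusters; silhouette_score={score:.2f}"

summary = describe_metrics({"n_clusters": 4, "silhouette_score": 0.62})
print(summary)
```

With the model object from the polling loop, this would be called as describe_metrics(model["current_metrics"]).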

Inference

To get cluster assignments, run inference on your data. Because clustering is unsupervised, it is common to run inference on the same dataset you trained on, which gives you the cluster assignment for each training row. You can also run inference on new data; each new row is assigned to one of the clusters discovered during training.
# Run inference on the training data to get cluster assignments
with open("customers.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/models/{model_id}/infer",
        headers=headers,
        files={"file": ("customers.csv", f, "text/csv")},
        data={"output_type": "json"},
    )

resp.raise_for_status()  # stop here if inference failed
results = resp.json()["data"]
print(results)
See Output Formats for the full output schema.
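A common next step after inference is to group rows by their assigned cluster, for example to count segment sizes. The sketch below assumes the results are a list of per-row dictionaries and uses a hypothetical field name, cluster_id, for the assignment; both are assumptions made for illustration, so check Output Formats for the actual schema and pass the real field name as cluster_key.

```python
from collections import defaultdict

def group_by_cluster(rows, cluster_key="cluster_id"):
    """Group row dictionaries by their cluster assignment.

    `cluster_key` is a placeholder; use the field name from the real output schema.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)
    return dict(groups)

# Made-up rows for illustration only; real inference output may look different.
sample_rows = [
    {"customer": "a", "cluster_id": 0},
    {"customer": "b", "cluster_id": 1},
    {"customer": "c", "cluster_id": 0},
]
groups = group_by_cluster(sample_rows)
sizes = {cluster: len(members) for cluster, members in groups.items()}
print(sizes)
```

Keeping the field name as a parameter means the helper needs no changes once the real schema is known.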