

Clustering models automatically group rows in your data into meaningful clusters and generate human-readable descriptions for each cluster. The number of clusters is determined automatically.

Training

Training fits the model to your data and discovers cluster structure. A portion of the training data is held out to compute validation metrics:
Metric           | Description
n_clusters       | Number of clusters discovered.
silhouette_score | Silhouette coefficient measuring cluster separation (range -1 to 1; higher is better). Only computed when there are at least 2 clusters.
These metrics are available on the model object via GET /models/{model_id} in the current_metrics field. At training time, the platform also generates human-readable descriptions for each cluster, summarizing the distinguishing characteristics of rows in that cluster. These descriptions are included in inference output.
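For intuition about the silhouette score, it can be reproduced locally with scikit-learn. This is an illustration on synthetic data, not part of the Woodwide API:

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two well-separated synthetic blobs: high silhouette score.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)

score = silhouette_score(X, labels)
print(score)  # near 1 for well-separated clusters
```

Overlapping clusters pull the score toward 0, and a score below 0 suggests many rows sit closer to a neighboring cluster than to their own.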
Python
import os, time, requests

api_key = os.getenv("WOODWIDE_API_KEY")
base_url = "https://api.woodwide.ai"
headers = {"Authorization": f"Bearer {api_key}"}

# Upload data
with open("customers.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/datasets",
        headers=headers,
        files={"file": ("customers.csv", f, "text/csv")},
        data={"dataset_name": "customers"},
    )
resp.raise_for_status()  # surface upload errors before reading the body
dataset_id = resp.json()["dataset"]["id"]

# Train a clustering model
resp = requests.post(
    f"{base_url}/models/train",
    headers=headers,
    json={
        "model_name": "customer_segments",
        "model_type": "clustering",
        "dataset_id": dataset_id,
    },
)
resp.raise_for_status()  # surface training-request errors early
model_id = resp.json()["model"]["id"]

# Wait for training
while True:
    model = requests.get(
        f"{base_url}/models/{model_id}", headers=headers
    ).json()
    if model["status"] == "ready":
        break
    time.sleep(5)

print(model["current_metrics"])  # e.g. {"n_clusters": 4, "silhouette_score": 0.62}
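
The polling loop above waits indefinitely if the model never reaches "ready". A defensive variant factors the loop into a helper with a timeout; `wait_until_ready` is a hypothetical name, and since the documentation does not enumerate terminal failure statuses, the sketch simply times out rather than guessing at them:

```python
import time

def wait_until_ready(fetch_model, timeout=600, interval=5):
    """Poll fetch_model() until the model status is "ready".

    fetch_model is any zero-argument callable returning the model dict,
    e.g. lambda: requests.get(f"{base_url}/models/{model_id}",
    headers=headers).json(). Raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        model = fetch_model()
        if model["status"] == "ready":
            return model
        time.sleep(interval)
    raise TimeoutError(f"model not ready after {timeout}s")
```

Passing the fetch as a callable keeps the helper independent of the HTTP client, so it can be reused for any resource that exposes a status field.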

Inference

To get cluster assignments, run inference on your data. Since clustering is unsupervised, it is common to run inference on the same dataset you trained on — this gives you the cluster assignment for each row. You can also run inference on new data, but rows will be assigned to the clusters that were discovered during training.
Python
# Run inference on the training data to get cluster assignments
with open("customers.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/models/{model_id}/infer",
        headers=headers,
        files={"file": ("customers.csv", f, "text/csv")},
        data={"output_type": "json"},
    )

results = resp.json()["data"]
print(results)
See Output Formats for the full output schema.
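As a sketch of working with the JSON results, the snippet below tallies rows per cluster. It assumes each result row carries a `cluster` field with the assigned cluster ID; the sample rows are hypothetical, and the real field names should be checked against Output Formats before relying on them:

```python
from collections import Counter

# Hypothetical sample of inference output rows; real rows come from
# resp.json()["data"] and field names may differ -- see Output Formats.
results = [
    {"cluster": 0}, {"cluster": 1}, {"cluster": 0}, {"cluster": 2},
]

counts = Counter(row["cluster"] for row in results)
for cluster_id, n in sorted(counts.items()):
    print(f"cluster {cluster_id}: {n} rows")
```

Grouping by cluster ID this way is a common first step before joining the assignments back onto the original rows for per-segment analysis.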