Documentation Index

Fetch the complete documentation index at: https://docs.woodwide.ai/llms.txt

Use this file to discover all available pages before exploring further.
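
For example, an agent or a script can pull the index with a plain HTTP GET. The snippet below is a minimal sketch using the requests library; it assumes the index is publicly readable and needs no API key.
Python
import requests

# Fetch the documentation index (a plain-text listing of all pages)
resp = requests.get("https://docs.woodwide.ai/llms.txt")
resp.raise_for_status()
print(resp.text)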

Wood Wide AI provides a single API for training and running inference across six capabilities on tabular data. You upload a dataset, train a model by specifying a model type, and then run inference to get results.

Available Model Types

Prediction

Supervised classification and regression on a target column.

Clustering

Unsupervised grouping with human-readable cluster descriptions.

Anomaly Detection

Identify unusual rows in your data.

Embeddings

Generate dense vector representations of each row.

Search

Find the most similar training-set row for each query row.

Factor Analysis

Discover the latent factors that explain variance in your data.
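
Each capability is selected with the model_type field at training time. As a quick reference, the mapping below is a sketch based on the identifiers used in the workflow example that follows (only prediction also needs a label_column):
Python
# Capability name -> model_type string used in POST /models/train
# (identifiers taken from the comment in the training example below)
MODEL_TYPES = {
    "Prediction": "prediction",          # supervised; requires label_column
    "Clustering": "clustering",
    "Anomaly Detection": "anomaly",
    "Embeddings": "embedding",
    "Search": "search",
    "Factor Analysis": "factors",
}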

General Workflow

Every capability follows the same three-step workflow:
  1. Upload a dataset: POST /datasets with a CSV or Parquet file.
  2. Train a model: POST /models/train with the dataset_id, a model_name, and the desired model_type.
  3. Run inference: POST /models/{model_id}/infer with a CSV file to get results.
Python
import os, time, requests

api_key = os.getenv("WOODWIDE_API_KEY")
base_url = "https://api.woodwide.ai"
headers = {"Authorization": f"Bearer {api_key}"}

# 1. Upload dataset
with open("data.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/datasets",
        headers=headers,
        files={"file": ("data.csv", f, "text/csv")},
        data={"dataset_name": "my_data"},
    )
dataset_id = resp.json()["dataset"]["id"]

# 2. Train model (change model_type for different capabilities)
resp = requests.post(
    f"{base_url}/models/train",
    headers=headers,
    json={
        "model_name": "my_model",
        "model_type": "prediction",  # or: clustering, anomaly, embedding, search, factors
        "dataset_id": dataset_id,
        "label_column": "target",    # required for prediction only
    },
)
model_id = resp.json()["model"]["id"]

# 3. Wait for training
while True:
    status = requests.get(
        f"{base_url}/models/{model_id}", headers=headers
    ).json()["status"]
    if status == "ready":
        break
    time.sleep(5)

# 4. Run inference
with open("data.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/models/{model_id}/infer",
        headers=headers,
        files={"file": ("data.csv", f, "text/csv")},
        data={"output_type": "json"},
    )
print(resp.json()["data"])
Training a second model on the same dataset — even for a different capability — is significantly faster than the first training run.
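
For example, once my_data is uploaded you can train an anomaly detection model against the same dataset_id without re-uploading the file. The sketch below reuses base_url, headers, and dataset_id from the example above; the model name is just a placeholder.
Python
# Train a second model on the already-uploaded dataset.
# No label_column is needed because anomaly detection is unsupervised.
resp = requests.post(
    f"{base_url}/models/train",
    headers=headers,
    json={
        "model_name": "my_anomaly_model",  # placeholder name
        "model_type": "anomaly",
        "dataset_id": dataset_id,          # same dataset as before
    },
)
anomaly_model_id = resp.json()["model"]["id"]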

Supervised vs. Unsupervised Tasks

Prediction is the only supervised task: it requires a label_column at training time and predicts that column at inference time on new data. All other tasks (clustering, anomaly detection, embeddings, search, factor analysis) are unsupervised. For these, it often makes sense to run inference on the same dataset you trained on. For example, to cluster your data, you would train a clustering model on that dataset and then run inference on the same dataset to get the cluster assignments. You can also run inference on different data, but rows will be assigned to the clusters learned from the training data. The same applies to the other unsupervised tasks; see the section for each model capability for details.
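
As a concrete illustration, a clustering model trained on data.csv can be queried with that same file to get a cluster assignment for every training row. The sketch below assumes the model has already finished training; clustering_model_id stands in for the id returned by the training call.
Python
# Run inference on the same file the clustering model was trained on
# to obtain a cluster assignment for each row.
with open("data.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/models/{clustering_model_id}/infer",
        headers=headers,
        files={"file": ("data.csv", f, "text/csv")},
        data={"output_type": "json"},
    )
print(resp.json()["data"])  # one cluster assignment per row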