Search models build an index over the training data at training time. At inference time, each row in the inference data is matched to the most semantically similar row in the training dataset. This is useful for nearest-neighbor matching, deduplication, record linkage, and recommendation systems.
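
To make the matching step concrete, here is a minimal sketch of nearest-neighbor lookup over embeddings. It is not the service's implementation: the toy vectors, the choice of cosine similarity, and the helper names `cosine_similarity` and `nearest_row` are all assumptions made for illustration. A production index would likely use a learned embedding model and an approximate-nearest-neighbor structure rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_row(query, index):
    """Return the position of the indexed vector most similar to `query`."""
    return max(range(len(index)), key=lambda i: cosine_similarity(query, index[i]))


# Toy "training" embeddings (hypothetical values), one per training row.
training_embeddings = [
    [1.0, 0.0, 0.0],  # row 0
    [0.0, 1.0, 0.0],  # row 1
    [0.7, 0.7, 0.0],  # row 2
]

# Toy "inference" embeddings, one per query row.
query_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
]

# One match per query row: the position of its closest training row.
matches = [nearest_row(q, training_embeddings) for q in query_embeddings]
print(matches)
```

Each query row yields exactly one match, which mirrors the one-result-per-row behavior described above.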

Training

Training embeds the entire training dataset and builds a nearest-neighbor index over those embeddings. No label_column is needed. No validation metrics are computed for search models.

Python
import os
import time

import requests

api_key = os.getenv("WOODWIDE_API_KEY")
base_url = "https://api.woodwide.ai"
headers = {"Authorization": f"Bearer {api_key}"}

# Upload the dataset to search against
with open("catalog.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/datasets",
        headers=headers,
        files={"file": ("catalog.csv", f, "text/csv")},
        data={"dataset_name": "product_catalog"},
    )
resp.raise_for_status()  # surface HTTP errors here rather than as a KeyError below
dataset_id = resp.json()["dataset"]["id"]

# Train a search model
resp = requests.post(
    f"{base_url}/models/train",
    headers=headers,
    json={
        "model_name": "catalog_search",
        "model_type": "search",
        "dataset_id": dataset_id,
    },
)
resp.raise_for_status()  # stop early if the training request was rejected
model_id = resp.json()["model"]["id"]

# Wait for training
while True:
    model = requests.get(
        f"{base_url}/models/{model_id}", headers=headers
    ).json()
    if model["status"] == "ready":
        break
    time.sleep(5)
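
The loop above runs until the status becomes "ready", so it would spin indefinitely if training never completes. Below is a hedged sketch of a polling helper with a timeout and an early exit on failure. Only the "ready" status comes from the example above. The helper name `wait_until_ready`, the "failed" status, and the stubbed "pending" and "training" statuses are assumptions, so check the API reference for the real terminal states.

```python
import time


def wait_until_ready(fetch_status, timeout=600.0, interval=5.0,
                     failure_statuses=("failed",), sleep=time.sleep):
    """Poll `fetch_status()` until it returns "ready".

    `fetch_status` is any zero-argument callable returning the model's
    status string. The contents of `failure_statuses` are an ASSUMPTION,
    not documented API behavior.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "ready":
            return status
        if status in failure_statuses:
            raise RuntimeError(f"training ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model not ready after {timeout} seconds")
        sleep(interval)


# Demo with a stubbed status source (hypothetical status names) so the
# sketch runs without network access.
fake_statuses = iter(["pending", "training", "ready"])
final_status = wait_until_ready(lambda: next(fake_statuses), sleep=lambda _: None)
print(final_status)
```

Against the real API, `fetch_status` could be `lambda: requests.get(f"{base_url}/models/{model_id}", headers=headers).json()["status"]`, reusing the names from the training example.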

Inference

Provide a CSV of query rows. For each query row, the model returns the row ID of the closest match in the training dataset.

Python
# Find the closest catalog item for each query.
# Reuses base_url, headers, and model_id from the training example.
with open("queries.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/models/{model_id}/infer",
        headers=headers,
        files={"file": ("queries.csv", f, "text/csv")},
        data={"output_type": "json"},
    )

resp.raise_for_status()  # surface HTTP errors before parsing the body
results = resp.json()["data"]
print(results)

See Output Formats for the full output schema.
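
Because the model returns a row ID for each query row, a common next step is to join those IDs back to the catalog. The sketch below shows one way to do that under two loudly stated assumptions that the text above does not confirm: that the results reduce to one matched ID per query row in query order, and that each ID is the zero-based position of a row in the catalog. The helper name `attach_matches` and the inline CSV contents are hypothetical. Consult Output Formats for the actual schema before adapting this.

```python
import csv
import io


def attach_matches(query_rows, catalog_rows, matched_ids):
    """Pair each query row with the catalog row it was matched to.

    ASSUMPTIONS: `matched_ids` holds one ID per query row, in query order,
    and each ID is the zero-based position of a row in the catalog.
    """
    if len(query_rows) != len(matched_ids):
        raise ValueError("expected exactly one match per query row")
    return [
        {"query": query, "match": catalog_rows[row_id]}
        for query, row_id in zip(query_rows, matched_ids)
    ]


# Inline stand-ins for catalog.csv and queries.csv (hypothetical contents).
catalog_csv = "name,category\nsteel bolt,hardware\noak plank,lumber\n"
queries_csv = "name\nwooden board\nmetal screw\n"

catalog_rows = list(csv.DictReader(io.StringIO(catalog_csv)))
query_rows = list(csv.DictReader(io.StringIO(queries_csv)))

# Hypothetical model output: one matched row ID per query row.
matched_ids = [1, 0]

pairs = attach_matches(query_rows, catalog_rows, matched_ids)
for pair in pairs:
    print(pair["query"]["name"], "->", pair["match"]["name"])
```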