Search - Wood Wide AI SDK Documentation

Search models build an index over the training data at training time. At inference time, each row in the inference data is matched to the most semantically similar row in the training dataset. This is useful for nearest-neighbor matching, deduplication, record linkage, and recommendation systems.
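To make the matching step concrete, here is a toy sketch in plain Python. It assumes each row has already been converted to a numeric vector, and it uses cosine similarity with a brute-force scan. The embedding model, similarity metric, and index structure the service actually uses are not described on this page, so the helper names `cosine_similarity` and `nearest_row` are hypothetical and for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_row(query, training_vectors):
    """Return the index of the training vector most similar to `query`."""
    return max(
        range(len(training_vectors)),
        key=lambda i: cosine_similarity(query, training_vectors[i]),
    )


# Made-up vectors standing in for embedded training rows and query rows.
training_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.8]]

# One match per query row: the position of its closest training row.
matches = [nearest_row(q, training_vectors) for q in queries]
print(matches)
```

A real index would avoid scanning every training row for every query, but the result has the same shape: one matched training row per query row.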

Training

Training embeds the entire training dataset and builds a nearest-neighbor index over those embeddings. No label_column is needed. No validation metrics are computed for search models.

import os, time, requests

api_key = os.getenv("WOODWIDE_API_KEY")
base_url = "https://api.woodwide.ai"
headers = {"Authorization": f"Bearer {api_key}"}

# Upload the dataset to search against
with open("catalog.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/datasets",
        headers=headers,
        files={"file": ("catalog.csv", f, "text/csv")},
        data={"dataset_name": "product_catalog"},
    )
resp.raise_for_status()  # stop here if the upload failed
dataset_id = resp.json()["dataset"]["id"]

# Train a search model
resp = requests.post(
    f"{base_url}/models/train",
    headers=headers,
    json={
        "model_name": "catalog_search",
        "model_type": "search",
        "dataset_id": dataset_id,
    },
)
resp.raise_for_status()  # stop here if the training request was rejected
model_id = resp.json()["model"]["id"]

# Wait for training
while True:
    model = requests.get(
        f"{base_url}/models/{model_id}", headers=headers
    ).json()
    if model["status"] == "ready":
        break
    time.sleep(5)
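The loop above polls until the status is "ready" and would run forever if that never happens. A minimal sketch of a bounded wait follows. The helper `wait_until_ready` and its parameters are hypothetical and not part of the SDK. It checks only for the "ready" status shown above, because this page does not list any other status values.

```python
import time


def wait_until_ready(fetch_status, timeout=600.0, interval=5.0, sleep=time.sleep):
    """Call `fetch_status` until it returns "ready" or `timeout` seconds pass.

    `fetch_status` is any zero-argument callable that returns the model's
    current status string. `sleep` is injectable so the helper can be
    exercised without real waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "ready":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"model not ready after {timeout} seconds (last status: {status!r})"
            )
        sleep(interval)
```

With the variables from the training example, `fetch_status` could be `lambda: requests.get(f"{base_url}/models/{model_id}", headers=headers).json()["status"]`.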

Inference

Provide a CSV of query rows. For each query row, the model returns the row ID of the closest match in the training dataset.

# Find the closest catalog item for each query
with open("queries.csv", "rb") as f:
    resp = requests.post(
        f"{base_url}/models/{model_id}/infer",
        headers=headers,
        files={"file": ("queries.csv", f, "text/csv")},
        data={"output_type": "json"},
    )

resp.raise_for_status()  # stop here if inference failed
results = resp.json()["data"]
print(results)

See Output Formats for the full output schema.
