
API documentation

All endpoints are open and rate-limited at the Vercel edge. Responses are JSON, and read endpoints require no authentication.

GET /api/cognates

Find every lemma sharing a canonical root across the 17 Semitic varieties in our index. Supports both strict identity matching and Proto-Semitic reflex-aware fuzzy matching.

Query parameters

  • canonical — space-separated phonetic key (required). e.g. k t b
  • fuzzy — set to 1 to include PS-reflex-aware matches. Default: strict identity only.
  • per_lang — max lemmas per language. Default 5, cap 10.
  • exclude_lang — exclude a language (usually the query source).

Example

curl "https://semitic-search.andy-barr.com/api/cognates?canonical=k+l+b&fuzzy=1"

Returns {languages, language_names, lemmas, lemma_count, lang_count} with per-lemma via_reflex and attestation fields where available.
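
For callers working from Python rather than curl, the query parameters above can be assembled roughly as follows. This is a minimal sketch using only the standard library. The helper names `cognates_url` and `fetch_cognates` are hypothetical and not part of the API, and rejecting `per_lang` values above 10 on the client side is an assumption about how a caller might want to treat the documented cap.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://semitic-search.andy-barr.com"


def cognates_url(canonical, fuzzy=False, per_lang=5, exclude_lang=None):
    """Build a GET /api/cognates URL from the documented query parameters."""
    if per_lang > 10:
        # Assumption: fail early instead of relying on the server-side cap.
        raise ValueError("per_lang is capped at 10")
    params = {"canonical": canonical}  # space-separated key, e.g. "k t b"
    if fuzzy:
        params["fuzzy"] = 1  # opt in to PS-reflex-aware matching
    if per_lang != 5:
        params["per_lang"] = per_lang  # 5 is the documented default
    if exclude_lang:
        params["exclude_lang"] = exclude_lang
    # urlencode turns the spaces in the key into "+", as in the curl example.
    return f"{BASE_URL}/api/cognates?{urlencode(params)}"


def fetch_cognates(canonical, timeout=10, **options):
    """Perform the request and decode the JSON response (hypothetical helper)."""
    with urlopen(cognates_url(canonical, **options), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `cognates_url("k l b", fuzzy=True)` produces the same URL as the curl example above.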

GET /api/reflexes

Inverse query: given a Proto-Semitic reconstructed root (e.g. *Ḏ-H-B), enumerate every surface reflex the reflex table predicts and return lemmas attested in each language.

Query parameters

  • proto — space-separated PS labels, uppercase (e.g. Ḏ H B). Required.
  • per_lang — max lemmas per language. Default 5, cap 10.

Example

curl "https://semitic-search.andy-barr.com/api/reflexes?proto=%E1%B8%8E+H+B"
# Ḏ H B — returns Ar ḏ-h-b, He z-h-b, Syc d-h-b reflexes plus lemmas
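
The percent-encoded `proto` value in the curl example is easy to get wrong by hand, so the sketch below shows one way to derive it in Python. The helper `reflexes_url` is hypothetical. Uppercasing the input and normalizing it to Unicode NFC are client-side conveniences and assumptions, chosen so that a decomposed "D + combining macron below" encodes to the same bytes as the precomposed Ḏ in the documented example.

```python
import unicodedata
from urllib.parse import urlencode

BASE_URL = "https://semitic-search.andy-barr.com"


def reflexes_url(proto, per_lang=None):
    """Build a GET /api/reflexes URL for space-separated PS labels."""
    # The endpoint documents uppercase labels; NFC composes Ḏ into U+1E0E.
    labels = unicodedata.normalize("NFC", proto.upper())
    params = {"proto": labels}
    if per_lang is not None:
        params["per_lang"] = per_lang
    # urlencode emits UTF-8 percent-escapes and "+" for spaces.
    return f"{BASE_URL}/api/reflexes?{urlencode(params)}"


print(reflexes_url("Ḏ H B"))
# prints https://semitic-search.andy-barr.com/api/reflexes?proto=%E1%B8%8E+H+B
```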

POST /api/reconstruct

Given a set of cognate roots across languages, infer the Proto-Semitic ancestor with per-slot confidence. Accepts either raw script (Arabic/Hebrew/Syriac/etc.) or pre-canonicalized keys.

Body

{ "cognates": [
    ["ar", "ذ ه ب"],
    ["he", "ז ה ב"],
    ["syc", "ܕ ܗ ܒ"]
]}

Example

curl -X POST "https://semitic-search.andy-barr.com/api/reconstruct" \
  -H "content-type: application/json" \
  -d '{"cognates":[["ar","ذ ه ب"],["he","ז ה ב"]]}'

# → { "ps_root": "Ḏ H B", "overall_confidence": 1.0,
#     "slots": [{ "position": 1, "ps_label": "Ḏ", "confidence": 1.0,
#                 "supporters": ["ar","he"], "dissenters": [], ... }, ...] }
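
The same request can be sketched in Python. The block below builds the POST body from (language, root) pairs and picks out low-confidence slots from a response shaped like the one shown above. The helper names and the 0.8 threshold are illustrative assumptions, and only the response fields that appear in the example (`slots`, `position`, `ps_label`, `confidence`) are relied on.

```python
import json
from urllib.request import Request, urlopen

RECONSTRUCT_URL = "https://semitic-search.andy-barr.com/api/reconstruct"


def build_reconstruct_request(cognates):
    """Wrap (language, root) pairs in the documented JSON body."""
    payload = {"cognates": [list(pair) for pair in cognates]}
    # ensure_ascii=False keeps Arabic/Hebrew/Syriac script readable in the body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return Request(
        RECONSTRUCT_URL,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def weak_slots(result, threshold=0.8):
    """Return (position, ps_label, confidence) for slots below the threshold."""
    return [
        (slot["position"], slot["ps_label"], slot["confidence"])
        for slot in result.get("slots", [])
        if slot["confidence"] < threshold
    ]


def reconstruct(cognates, timeout=10):
    """Send the request and decode the JSON response (hypothetical helper)."""
    with urlopen(build_reconstruct_request(cognates), timeout=timeout) as resp:
        return json.load(resp)
```

A caller might run `weak_slots(reconstruct([("ar", "ذ ه ب"), ("he", "ז ה ב")]))` to see which root positions the supplied cognates leave uncertain.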

Data downloads

Bulk JSON for the top 60 polyglot root families, plus the empirical reflex weight table derived from editor-curated cognate claims.

curl "https://semitic-search.andy-barr.com/data/root_families.json"
curl "https://semitic-search.andy-barr.com/data/family_slugs.json"
curl "https://semitic-search.andy-barr.com/data/reflex_weights.json"

Individual root families also have JSON + BibTeX download buttons on their /roots/[slug] pages.
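
To mirror the three bulk files locally, something like the following sketch may be enough. The helper names and the default directory `semitic-data` are hypothetical, and the files are written to disk as raw bytes because their schema is not described here.

```python
from pathlib import Path
from urllib.request import urlopen

DATA_BASE = "https://semitic-search.andy-barr.com/data"
DATA_FILES = ("root_families.json", "family_slugs.json", "reflex_weights.json")


def data_url(name):
    """Return the download URL for one of the documented bulk files."""
    if name not in DATA_FILES:
        raise ValueError(f"unknown data file: {name}")
    return f"{DATA_BASE}/{name}"


def download_all(dest_dir="semitic-data", timeout=30):
    """Save each bulk file under dest_dir and return the written paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in DATA_FILES:
        with urlopen(data_url(name), timeout=timeout) as resp:
            target = dest / name
            target.write_bytes(resp.read())  # stored unparsed, as served
            saved.append(target)
    return saved
```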

Attribution

Wiktionary data used under CC-BY-SA. OSHB (Open Scriptures Hebrew Bible) under CC-BY. Quranic Arabic Corpus (Dukes 2011) under GPL. Built-in reflex table is hand-curated from Lipiński (2001), Huehnergard (2000), and Moscati (1964). Please cite the underlying sources when re-publishing data.