Navigate the fragmented landscape of Hebrew and Yiddish ML datasets and models. Covers ivrit.ai (22K+ hours of Hebrew audio, whisper-large-v3 ASR variants, Yiddish models), Dicta (DictaLM 3.0 LLM family, DictaBERT variants, HeQ reading comprehension), the Israeli National NLP Program / NNLP-IL (HebrewSentiment, HebNLI), AlephBERT, and Knesset Plenums. Helps researchers and ML engineers pick the right dataset for a task by use case, license (commercial vs research), Hebrew register coverage, and model-dataset pairing. Use when choosing training data for a Hebrew NLP or ASR project, verifying license compatibility for a commercial product, finding a baseline model for a Hebrew downstream task, or exploring Yiddish ML resources. Do NOT use for Arabic NLP, general HuggingFace dataset discovery, or Hebrew OCR dataset selection (use hebrew-ocr-forms).
Trust score 81/100 (Trusted) · 2 GitHub contributors · MIT license
The Israeli ML community punches above its weight, but its datasets and models are scattered. ivrit.ai publishes world-class Hebrew speech corpora on one HuggingFace org, Dicta publishes Hebrew LLMs and BERT variants on another, and the Israeli National NLP Program maintains benchmarks under HebArabNlpProject. Licenses range from fully commercial-friendly to research-only. A researcher who wants to fine-tune a Hebrew sentiment classifier on customer support chat for a commercial product has to hunt across five orgs and read every dataset card to find a usable combination.
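The license triage described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: it assumes each dataset or model is represented by Hub-style tags of the form `license:<id>`, the two license sets are illustrative rather than complete or legally vetted, and the helper names (`license_from_tags`, `classify`) and the repository names in the usage example are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Lowercase license identifiers as they commonly appear in card metadata.
# ASSUMPTION: these sets are illustrative, not exhaustive, and not legal advice.
COMMERCIAL_OK = {"mit", "apache-2.0", "cc-by-4.0", "cc0-1.0"}
NON_COMMERCIAL = {"cc-by-nc-4.0", "cc-by-nc-sa-4.0", "cc-by-nc-nd-4.0"}


def license_from_tags(tags: Iterable[str]) -> Optional[str]:
    """Return the license id from tags such as 'license:mit', or None."""
    for tag in tags:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1].lower()
    return None


def classify(tags: Iterable[str]) -> str:
    """Bucket a resource by license; anything unrecognized needs a manual read."""
    lic = license_from_tags(tags)
    if lic in COMMERCIAL_OK:
        return "commercial-ok"
    if lic in NON_COMMERCIAL:
        return "research-only"
    return "read-the-card"


def triage(catalog: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each repository id in the catalog to its license bucket."""
    return {repo: classify(tags) for repo, tags in catalog.items()}


if __name__ == "__main__":
    # Hypothetical entries for illustration only; not real repositories.
    catalog = {
        "example-org/hebrew-speech": ["language:he", "license:cc-by-4.0"],
        "example-org/hebrew-chat": ["language:he", "license:cc-by-nc-4.0"],
        "example-org/yiddish-text": ["language:yi", "license:other"],
    }
    for repo, bucket in triage(catalog).items():
        print(f"{repo}: {bucket}")
```

The deliberate design choice is the `read-the-card` fallback: custom or unlisted licenses are never guessed at, so the sketch only narrows the list and leaves the final attribution and usage check to a human reading the dataset card.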
npx skills-il add skills-il/developer-tools --skill hebrew-ml-datasets-navigator -a claude-code
I want to train a sentiment classifier on Hebrew customer support chat for a commercial SaaS product. Which dataset should I use, which starting model, and what does the license say about attribution?
I am building a Hebrew podcast transcription product. What does ivrit.ai offer, which ASR model should I use in production with low latency, and how do I handle multiple speakers?
I need a Hebrew LLM that runs on consumer hardware (16GB VRAM max) for a Hebrew product. What does Dicta offer, what are the size differences, and what are the upstream licenses?
I am researching Yiddish and looking for datasets and models for speech recognition and text processing. What is available in 2026 and what are the licenses?