by @skills-il
Guide developers in using Hebrew NLP models and tools including DictaLM, DictaBERT, AlephBERT, and ivrit.ai. Use when user asks about Hebrew text processing, Hebrew NLP, "ivrit", Hebrew tokenization, Hebrew NER, Hebrew sentiment analysis, Hebrew speech-to-text, or needs to process Hebrew language text programmatically. Covers model selection, preprocessing, and Hebrew-specific NLP challenges. Do NOT use for Arabic NLP (different tools) or general English NLP tasks.
npx skills-il add skills-il/localization --skill hebrew-nlp-toolkit

| Task | Recommended Model | Size | Notes |
|---|---|---|---|
| Text generation | DictaLM 3.0 (14B) | 14B | Best Hebrew generation |
| Classification | DictaBERT | 110M | Fast, good accuracy |
| NER | DictaBERT-NER | 110M | Trained on Hebrew NER dataset |
| Sentiment | DictaBERT-Sentiment | 110M | Hebrew sentiment classification |
| Embedding/Search | AlephBERT | 110M | Good for similarity tasks |
| Speech-to-text | ivrit.ai Whisper | Various | 22K+ hours training data |
| Translation | DictaLM 3.0 (7B) | 7B | Hebrew to/from English |
| Tool calling | DictaLM 3.0 Chat | 7B/14B | Supports function calling |
DictaBERT (classification tasks):

    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained("dicta-il/dictabert")
    # The base checkpoint has no trained classification head; fine-tune before use.
    model = AutoModelForSequenceClassification.from_pretrained("dicta-il/dictabert")

DictaLM 3.0 (generation):

    from transformers import AutoTokenizer, AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained("dicta-il/dictalm-3.0-7b-chat")
    model = AutoModelForCausalLM.from_pretrained("dicta-il/dictalm-3.0-7b-chat")

ivrit.ai Whisper (speech-to-text):

    from transformers import pipeline
    # whisper.load_model() accepts only built-in model names or local checkpoint
    # paths, so load the ivrit.ai fine-tuned model through the Hugging Face pipeline.
    # Confirm the exact model ID at https://huggingface.co/ivrit-ai.
    asr = pipeline("automatic-speech-recognition", model="ivrit-ai/whisper-large-v3-he")

Before feeding text to models:

    import re
    import unicodedata

    def preprocess_hebrew(text):
        # Normalize Unicode to composed form
        text = unicodedata.normalize('NFC', text)
        # Remove cantillation marks and niqqud (within U+0591 to U+05C7), keeping
        # the punctuation in that range: maqaf, paseq, sof pasuq, nun hafukha
        text = re.sub(r'[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]', '', text)
        # Normalize whitespace
        text = re.sub(r'\s+', ' ', text).strip()
        return text

https://huggingface.co/dicta-il
https://huggingface.co/ivrit-ai
https://huggingface.co/onlplab/alephbert-base

User says: "I need to classify Hebrew customer reviews as positive or negative"
Result: Guide to use DictaBERT-Sentiment with fine-tuning on domain data.
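For the review-classification case, a minimal sketch might look like the following. It assumes the `dicta-il/dictabert-sentiment` checkpoint is available on Hugging Face and works with the standard `transformers` text-classification pipeline. The helper names (`clean`, `load_classifier`, `classify_reviews`) are hypothetical, and the label strings returned depend on the checkpoint.

```python
import re
import unicodedata


def clean(text):
    """Light normalization: NFC plus whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def load_classifier(model_id="dicta-il/dictabert-sentiment"):
    """Build a text-classification pipeline (downloads weights on first use).

    The model ID is an assumption; verify it at https://huggingface.co/dicta-il.
    """
    from transformers import pipeline  # imported lazily: heavy dependency
    return pipeline("text-classification", model=model_id)


def classify_reviews(reviews, classifier):
    """Return a (review, label, score) tuple for each review.

    `classifier` is any callable mapping a list of strings to a list of
    {"label": ..., "score": ...} dicts, which is the shape a transformers
    text-classification pipeline returns for list input.
    """
    cleaned = [clean(r) for r in reviews]
    results = classifier(cleaned)
    return [(text, r["label"], r["score"]) for text, r in zip(cleaned, results)]
```

With the real model this would be called as `classify_reviews(reviews, load_classifier())`. Passing the classifier in as an argument keeps the function usable with a checkpoint fine-tuned on domain data, as the result above recommends.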
User says: "Extract company and person names from Hebrew articles"
Result: Use the DictaBERT-NER model and demonstrate it on example text.
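The name-extraction case could be sketched as below. This assumes the `dicta-il/dictabert-ner` checkpoint and the `transformers` token-classification pipeline with `aggregation_strategy="simple"`, which merges word pieces into whole entities. The helper names are hypothetical, and the `PER` and `ORG` label names are an assumption about the model's tag set, so check the model card before relying on them.

```python
def load_ner(model_id="dicta-il/dictabert-ner"):
    """Build an NER pipeline (downloads weights on first use).

    The model ID is an assumption; verify it at https://huggingface.co/dicta-il.
    """
    from transformers import pipeline  # imported lazily: heavy dependency
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # group sub-word tokens into entities
    )


def extract_names(text, ner, wanted=("PER", "ORG")):
    """Return (entity_text, entity_type) pairs for the wanted entity types.

    `ner` is any callable mapping a string to a list of dicts that carry
    "word" and "entity_group" keys, the shape produced by an aggregated
    token-classification pipeline.
    """
    return [
        (entity["word"], entity["entity_group"])
        for entity in ner(text)
        if entity["entity_group"] in wanted
    ]
```

With the real model: `extract_names(article_text, load_ner())`. Other entity types, such as locations, are filtered out unless added to `wanted`.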
scripts/preprocess_hebrew.py: Normalize Hebrew text before feeding it to NLP models (DictaBERT, DictaLM, AlephBERT). Handles Unicode NFC normalization, niqqud removal, whitespace cleanup, URL stripping, shekel symbol normalization, and mixed Hebrew-English text segmentation. Run: python scripts/preprocess_hebrew.py --help

references/model-comparison.md: Side-by-side comparison of Hebrew NLP models (DictaLM 3.0, DictaBERT, AlephBERT, ivrit.ai Whisper, Hebrew-Gemma) with VRAM requirements, HuggingFace IDs, and a task-to-model mapping table. Consult when choosing which model to use for a specific Hebrew NLP task.

Cause: Hebrew morphology splits off prefixes (b-, k-, l-, m-, sh-, v-).
Solution: This is expected behavior. Hebrew words like "bveit" (in the house) are split into morphemes.

Cause: DictaLM 14B requires ~28GB VRAM.
Solution: Use the 7B or 1.7B variant, or quantize with bitsandbytes (4-bit).
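The ~28GB figure is consistent with storing 14B parameters at 16 bits each. The sketch below works through that arithmetic and shows one way 4-bit loading might be configured with `BitsAndBytesConfig` from `transformers`. The helper names are hypothetical, the estimate covers weights only (activations and the KV cache add more), and `load_4bit` assumes a CUDA GPU with `bitsandbytes` installed.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return params_billion * bits_per_param / 8


def load_4bit(model_id):
    """Load a causal LM with 4-bit weights (needs a CUDA GPU and bitsandbytes)."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat quantization
        bnb_4bit_compute_dtype=torch.float16,  # matmuls still run in fp16
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )


print(weight_memory_gb(14, 16))  # 14B at 16-bit
print(weight_memory_gb(14, 4))   # 14B at 4-bit
print(weight_memory_gb(7, 16))   # 7B at 16-bit
```

By this estimate, 4-bit quantization brings the 14B weights to roughly a quarter of their 16-bit size, while the 7B variant at 16 bits needs about half.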
This skill can execute scripts and commands on your system.
by @skills-il
Schedule meetings, deployments, and events respecting Shabbat, Israeli holidays (chagim), and Hebrew calendar constraints. Use when user asks to schedule around Shabbat, "zmanim", check Israeli holidays, plan around chagim, set Israeli business hours, or needs Hebrew calendar-aware scheduling logic. Includes halachic times (zmanim) via HebCal API, full Israeli holiday calendar, and Israeli business hour conventions. Do NOT use for religious halachic rulings (consult a rabbi) or diaspora 2-day holiday scheduling.
by @skills-il
Write and edit professional content in Hebrew including marketing copy, UX text, articles, emails, and social media posts. Use when user asks to write in Hebrew, "ktov b'ivrit", create Hebrew marketing content, edit Hebrew text, write Hebrew UX copy, or optimize Hebrew content for SEO. Covers grammar rules, formal vs informal register, gendered language handling, and Hebrew SEO best practices. Do NOT use for Hebrew NLP/ML tasks (use hebrew-nlp-toolkit) or translation (use a translation skill).
by @skills-il
Implement right-to-left (RTL) layouts for Hebrew web and mobile applications. Use when user asks about RTL layout, Hebrew text direction, bidirectional (bidi) text, Hebrew CSS, "right to left", or needs to build Hebrew UI. Covers CSS logical properties, Tailwind RTL, React/Vue RTL, Hebrew typography, and font selection. Do NOT use for Arabic RTL (similar but different typography) unless user explicitly asks for shared RTL patterns.