Discover and install security-vetted AI agent skills for Claude Code, Cursor, GitHub Copilot, OpenClaw and more. Hebrew-ready, trusted, and open source.
Benchmark and compare LLMs on Hebrew reasoning, comprehension, sentiment, translation, and Israeli cultural knowledge. Wraps the HuggingFace Open Hebrew LLM Leaderboard tasks (HeQ, HebrewSentiment, Hebrew Winograd, translation) and the DictaLM 3.0 benchmark tasks (Summarization, Nikud, Israeli Trivia) in a reproducible evaluation harness. Runs evals against Claude, GPT, Gemini, AI21 Jamba, DictaLM, Llama, and local HuggingFace models, and produces comparison scorecards in JSON and markdown. Use when choosing an LLM for a Hebrew product, answering procurement questions about Hebrew performance, validating a fine-tuned Hebrew model, or tracking regressions in Hebrew performance after a model upgrade. Do NOT use for Arabic NLP, ASR benchmarking, or general English benchmarks.
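As an illustration of what a JSON and markdown comparison scorecard could look like, here is a minimal sketch. It is not the skill's actual code: the function names `build_scorecard`, `to_json`, and `to_markdown` are hypothetical. It also assumes that per-task scores are already normalized to [0, 1] and that models are ranked by an unweighted mean over the tasks every model was evaluated on.

```python
import json
from statistics import mean


def build_scorecard(results: dict[str, dict[str, float]]) -> dict:
    """Aggregate per-task scores into a ranked scorecard (hypothetical helper).

    `results` maps model name -> {task name -> score}. Scores are assumed to be
    normalized to [0, 1]; the real harness may weight or normalize differently.
    """
    if not results:
        return {"tasks": [], "ranking": []}
    # Compare only on tasks every model was run on, so the means are comparable.
    shared = sorted(set.intersection(*(set(tasks) for tasks in results.values())))
    if not shared:
        raise ValueError("no task was evaluated for every model")
    rows = [
        {
            "model": model,
            "scores": {task: scores[task] for task in shared},
            "mean": mean(scores[task] for task in shared),
        }
        for model, scores in results.items()
    ]
    # Highest unweighted mean first.
    rows.sort(key=lambda row: row["mean"], reverse=True)
    return {"tasks": shared, "ranking": rows}


def to_json(card: dict) -> str:
    # ensure_ascii=False keeps Hebrew task or model names readable in the output.
    return json.dumps(card, ensure_ascii=False, indent=2)


def to_markdown(card: dict) -> str:
    header = ["Model", *card["tasks"], "Mean"]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    for row in card["ranking"]:
        cells = [
            row["model"],
            *(f"{row['scores'][task]:.3f}" for task in card["tasks"]),
            f"{row['mean']:.3f}",
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Restricting the comparison to shared tasks is a deliberate choice. It keeps a model that skipped a hard task (for example, Nikud) from looking better than it is, at the cost of dropping tasks that only some models were run on.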