Vijil emerges from stealth


Opinion

vijil

Apr 27, 2024

When developers want to understand how well different Large Language Models (LLMs) perform across a common set of tasks, they turn to standard benchmarks such as Massive Multitask Language Understanding (MMLU) and Grade School Math 8K (GSM8K). These benchmarks can also help them track changes in performance as a result of changes in the training data or in the model parameters. Running benchmarks for every change and every model, however, is time-consuming and expensive. For example, the time and cost of running MMLU-Pro, a more robust and challenging version of MMLU, on three popular LLMs are as follows.
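To see why this adds up, here is a rough sketch of what a do-it-yourself MMLU-Pro run involves, assuming an OpenAI-compatible chat endpoint and the public TIGER-Lab/MMLU-Pro dataset on Hugging Face. The model name and the 100-question slice are placeholders; a full run repeats this loop over roughly 12,000 questions, for every model and every change you want to track.

```python
# Minimal do-it-yourself MMLU-Pro sketch (illustrative, not a benchmark harness).
# Assumes: an OpenAI-compatible endpoint, OPENAI_API_KEY in the environment,
# and field names from the public TIGER-Lab/MMLU-Pro dataset card.
from datasets import load_dataset
from openai import OpenAI

client = OpenAI()               # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"           # placeholder model under test

dataset = load_dataset("TIGER-Lab/MMLU-Pro", split="test")
sample = dataset.select(range(100))   # small slice; the full split has ~12k questions

correct = 0
for row in sample:
    # Format the multiple-choice options as "A. ...", "B. ...", up to ten choices.
    options = "\n".join(
        f"{chr(65 + i)}. {opt}" for i, opt in enumerate(row["options"])
    )
    prompt = (
        f"{row['question']}\n{options}\n"
        "Answer with the letter of the correct option only."
    )
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=5,
        temperature=0,
    )
    answer = (reply.choices[0].message.content or "").strip().upper()
    if answer.startswith(row["answer"]):   # gold answer is a letter such as "B"
        correct += 1

print(f"Accuracy on {len(sample)} questions: {correct / len(sample):.1%}")
```

Every question is one paid model call, and a careful run also needs retries, rate limiting, and answer-parsing logic, which is where the time and cost come from.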

Use Vijil Evaluate to run MMLU-Pro quickly and cheaply

Vijil Evaluate makes it easy to run MMLU-Pro — either the full or the Lite version — on any LLM to which you have access. Sign up at https://vijil.ai to use the minimalist UI and dashboard. Or just modify this Colab notebook with your provider and model. 

Vijil_Evaluate_MMLU.ipynb

You will need a Vijil API key to get started. To get one, send an email to contact@vijil.ai (tell them Aditya sent you). 
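If you prefer to script the run rather than use the UI, the notebook above contains the authoritative client calls. As a rough illustration only, the sketch below assumes a hypothetical Python client of this general shape; the class, method, and parameter names (Vijil, evaluations.create, harnesses) are placeholders rather than the confirmed SDK surface, so substitute the exact calls from Vijil_Evaluate_MMLU.ipynb when you run it.

```python
# Hypothetical sketch of launching an MMLU-Pro evaluation through a Vijil-style
# Python client. Names below are placeholders; use the calls shown in
# Vijil_Evaluate_MMLU.ipynb for the real interface.
import os
from vijil import Vijil  # assumed package/client name

client = Vijil(api_key=os.environ["VIJIL_API_KEY"])  # key obtained via contact@vijil.ai

# Point the evaluation at the provider and model you want to benchmark.
evaluation = client.evaluations.create(
    model_hub="openai",           # placeholder provider
    model_name="gpt-4o-mini",     # placeholder model under test
    model_api_key=os.environ["OPENAI_API_KEY"],
    harnesses=["mmlu_pro"],       # full MMLU-Pro; a Lite variant may also be available
)

# Poll for completion and print the aggregate result (hypothetical accessor).
print(client.evaluations.get_status(evaluation["id"]))
```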

© 2025 Vijil. All rights reserved.
