
Evaluating and documenting AI models and data across performance dimensions informed by system and mission constraints. Provides a high-level overview of critical test and evaluation (T&E) concepts that will be …
To evaluate DeepSeek models, which are available as open-weight models, CAISI downloaded their model weights from the model-sharing platform Hugging Face and deployed the models on CAISI’s …
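The excerpt names the general workflow (pull open weights from Hugging Face, run the model locally) without showing it. Below is a minimal sketch of that workflow using the transformers library; the checkpoint id is an illustrative open-weight DeepSeek repository, not necessarily the one CAISI used, and this is not CAISI's actual deployment pipeline.

```python
# Minimal sketch: download open weights from Hugging Face and run one
# local generation. The repo id is illustrative, not CAISI's choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "deepseek-ai/deepseek-llm-7b-base"  # illustrative open-weight checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# device_map="auto" (requires the accelerate package) places weights on
# available GPUs/CPU automatically.
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```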
Large Language Models: Pretraining (and how to train transformers for language modeling). Pretraining: the big idea that underlies all the amazing performance of language models is to pretrain a transformer …
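As a concrete illustration of that big idea, here is a toy-scale sketch of the pretraining objective: next-token prediction with cross-entropy through a causally masked transformer. The hyperparameters and the random token stream are stand-ins for real corpora and real scale.

```python
# Toy sketch of language-model pretraining: predict token t+1 from
# tokens 1..t under a causal mask, minimizing cross-entropy.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, n_heads, seq_len = 1000, 64, 4, 32  # toy values

embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(d_model, vocab_size)

params = list(embed.parameters()) + list(encoder.parameters()) + list(lm_head.parameters())
opt = torch.optim.AdamW(params, lr=3e-4)

# Causal mask so position t cannot attend to positions > t.
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

for step in range(100):
    tokens = torch.randint(0, vocab_size, (8, seq_len + 1))  # stand-in for real text
    inputs, targets = tokens[:, :-1], tokens[:, 1:]          # shift by one token
    hidden = encoder(embed(inputs), mask=causal_mask)
    logits = lm_head(hidden)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```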
Large language models (LLMs) and multimodal models spanning text and images are enabling new capabilities, from code generation to the creation of images from natural language descriptions.
Dual-use foundation models with widely available model weights (referred to in this Report as open foundation models) introduce a wide spectrum of benefits. They diversify and expand the array of actors, …
Large language models (LLMs) are powerful but static; they lack mechanisms to adapt their weights in response to new tasks, knowledge, or examples. We introduce Self-Adapting LLMs (SEAL), a …
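The excerpt states the problem SEAL addresses (static weights) but not its mechanism, so the sketch below shows only the conventional primitive it contrasts with: a supervised finetuning gradient step that updates a model's weights on a new example. The model, the fact, and the learning rate are illustrative stand-ins; this is not the SEAL algorithm.

```python
# Baseline sketch of adapting weights to new knowledge via one
# supervised finetuning step. Not SEAL; names are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "gpt2"  # small stand-in model for illustration

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A new piece of knowledge a static model would otherwise never absorb.
text = "The Glorp-7 sensor was released in 2031."  # hypothetical fact
batch = tokenizer(text, return_tensors="pt")

# One gradient step on the new example updates the weights in place.
out = model(**batch, labels=batch["input_ids"])
opt.zero_grad()
out.loss.backward()
opt.step()
```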
In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data.
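To make "finetuning on chain-of-thought data" concrete at the data level, here is a hedged sketch of instruction-tuning example formatting (not the exact pipeline from the paper): plain instruction/target pairs, plus chain-of-thought examples whose targets include the rationale before the final answer, all serialized into single sequences for causal LM training.

```python
# Hedged sketch of instruction-finetuning data preparation. The examples
# and the serialization format are illustrative, not the paper's own.
examples = [
    {"instruction": "Translate to French: 'Good morning'", "target": "Bonjour"},
    {
        # Chain-of-thought example: the target spells out the reasoning.
        "instruction": "Q: A bag has 3 red and 2 blue marbles. How many marbles "
                       "in total? Let's think step by step.",
        "target": "There are 3 red marbles and 2 blue marbles, so 3 + 2 = 5. "
                  "The answer is 5.",
    },
]

def to_training_text(example: dict) -> str:
    """Concatenate instruction and target into one sequence for causal LM training."""
    return example["instruction"] + "\n" + example["target"]

for ex in examples:
    print(to_training_text(ex))
    print("---")
```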