The new phi-4 model is another classic example of data quality >>> data quantity.
We needed "big data" to get a foundation model, but need "quality data" to get to AGI.
A "foundation model" using "big data" is akin from a child growing from the age of 0 to 10 building a foundational model of the world.
However, intelligence that drives changes is usually founded on the formative years of 10 to 20 where the quality of your environment and surroundings are a lot more critical.
> We present phi-4 [...] focused on data quality. Unlike most language models [...] strategically incorporates synthetic data throughout the training process. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size [...] due to improved data [...]
https://simonwillison.net/2024/Dec/15/phi-4-technical-report/#atom-everything
https://arxiv.org/abs/2412.08905
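To make the "quality over quantity" point concrete, here is a minimal Python sketch of the general curation pattern the phi reports gesture at: score every document with a quality model and train on only the top fraction. This is illustrative only, not phi-4's actual pipeline; `quality_score` here is a hypothetical stand-in for a learned quality classifier.

```python
# Illustrative sketch of quality-based data curation (not phi-4's pipeline):
# score each document, keep the top fraction, trade quantity for quality.

def quality_score(doc: str) -> float:
    # Hypothetical stand-in: real pipelines use a trained classifier
    # (e.g. one that rates "textbook-like" prose). This toy heuristic
    # rewards lexical diversity and some minimum length so the example runs.
    words = doc.split()
    if not words:
        return 0.0
    diversity = len(set(words)) / len(words)
    length_factor = min(len(words) / 100, 1.0)
    return diversity * length_factor

def curate(corpus: list[str], keep_fraction: float = 0.1) -> list[str]:
    """Keep only the highest-scoring fraction of the corpus."""
    ranked = sorted(corpus, key=quality_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]

if __name__ == "__main__":
    corpus = [
        "buy now click here buy now buy now",
        "The derivative of a function measures how its output changes "
        "as its input changes, a foundational idea in calculus.",
    ]
    print(curate(corpus, keep_fraction=0.5))  # keeps the explanatory text
```

In practice the scorer is itself a trained model, and phi-4 goes further by generating synthetic data rather than only filtering, but the trade is the same: fewer, better tokens.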
---
I've been thinking of rewriting this whole blog post: my thinking is still mostly the same, but how I want to communicate this problem and opportunity has fundamentally changed.