We are excited to announce the 10 finalists named in the Datasets for Education Innovation track of the 2026 Tools Competition!
In this track, 150 teams sought to respond to the field’s growing need for high-quality, publicly available datasets that serve as shared infrastructure for education research and development. Winners will receive $100,000 to prepare and release robust, education-focused datasets.
Interest in this track has grown significantly, reflecting a broader emphasis on responsible data sharing and the role of data as a public good. Teams are working to expand access to datasets that enable replication, benchmarking, and more rigorous research, while reducing barriers to innovation. Their work also addresses critical challenges related to privacy, bias, and the responsible use of data in AI-driven systems.
Finalists of the Datasets for Education Innovation Track
- AIED-Unplugged Dataset | CESAR (Brazil)
The AIED-Unplugged Dataset will be composed of 100,000 images of students’ handwriting activities to inform the development of AI models that understand and support learning in public classrooms.
- A Multimodal College-Level Writing and Feedback Dataset | University of Pittsburgh (USA)
The team presents a multimodal college-level writing dataset comprising student essays, instructor feedback, and office hours audio recordings, designed to support the study of writing and confidence development over time and to benchmark AI-generated feedback for idea development.
- AskTilli: 360 Foundational Skills Dataset | Tilli (Jordan, Sri Lanka, India, and USA)
AskTilli will collect longitudinal cognitive and Social Emotional Learning skills data of 4-10-year-olds in diverse, non-WEIRD contexts in South Asia and the Middle East to help inform the design of equitable, culturally relevant AI models for curriculum design, learning outcomes prediction, assessment design, and other foundational learning models.
- Curiosity Through Questions | Child-Centered AI Lab at Harvard Graduate School of Education (USA)
Curiosity Through Questions will capture elementary school students’ authentic question-asking behaviors with an AI chatbot.
- Erandi Aprende Learning Data (United States, Mexico, and Chile)
The Erandi Aprende Learning Data captures teacher–student interactions, project designs, and learning outcomes to improve bilingual, project-based STEAM education through data-driven insights and adaptive AI models.
- Multimodal Human-AI Tutoring | Carnegie Mellon University (USA)
Multimodal Human-AI Tutoring/PLUS is a multimodal dataset of year-long human-AI math tutoring with 1500+ low-income and diverse 6–8th graders and 200+ college-student tutors, merging ITS logs (MATHia, i-Ready, MobyMax) with Zoom transcripts, audio, and AI video summaries. It enables causal modeling of math learning to better understand mechanisms behind effective, equitable instruction.
- Open Multimodal Dataset for Inclusive Learning Analytics (OMDILA) (Uganda)
OMDILA will collect multimodal learner interaction data (audio, text, and performance metrics) to understand and improve inclusive, multilingual AI-driven education systems in low-resource African contexts.
- SkillFlix for Autistic Young Adults | dfusion, Inc (USA)
SkillFlix AYA dataset will collect conversation-embedded communication skill data to support social and educational interventions for neurodiverse learners.
- South Asian K-5 Oral Reading Assessment Dataset | Beaj Education (Pakistan)
Using the Beaj Literacy Bot, we collect and tag oral reading data from South Asian K–5 learners to create an open dataset and advance AI tools for accent-inclusive literacy assessment.
- The School Climate Data Commons Project | Yale Center for Emotional Intelligence (USA)
The School Climate Walkthrough Dataset includes more than 30,000 students’ perceptions of safety and belonging within their school environment, upon which we can continue to build to better understand national strengths, opportunities, and disparities in school experience for youth.
See finalists for all tracks here.
The Tools Competition has three phases of evaluation. In the third and final phase, track finalists will pitch their tool before a panel of judges, who will nominate the winners of the competition.
The Tools Competition has previously named 150 winners from 48 countries, reaching nearly 50 million learners and educators worldwide, from early childhood to adulthood. Winners for all tracks in the 2026 competition cycle will be announced in summer 2026.
The 2026 Tools Competition is a program of Renaissance Philanthropy, organized by The Learning Agency, and is supported by: the Walton Family Foundation, Griffin Catalyst, Axim Collaborative, Oak Foundation, Cinelli Family Foundation, Gordon and Betty Moore Foundation, and Chan Zuckerberg Initiative.
Want to be kept in the loop for the next competition cycle? Join our mailing list!



