Public Assets for Better EdTech

Now in its fifth cycle, the Tools Competition has consistently placed learning engineering at its heart in order to drive progress in learning science. Strong competitors showcase how their tools embody learning engineering principles, offering opportunities for both their teams and external researchers to gain deeper insights that advance our understanding of learning.

The 2025 competition cycle deepens this commitment by introducing (1) a competitive priority for teams that generate or share public assets as part of their proposal and (2) a Dataset Prize for teams preparing and releasing education-focused datasets.

Read on to learn more and understand what ideas may be competitive.

As a field, we have the opportunity to advance the development of more reliable, equitable, and effective technology by expanding the availability of public data and resources (public assets) that others can use to train and evaluate their own models and platforms. 

Public assets may include:

  • Datasets for training or evaluating AI models (see more below). These datasets are ideally provided in an ‘AI-ready’ format, meaning they have already been sized, structured, and feature-engineered for effective modeling (a brief sketch of what this can look like follows this list). 
  • Open-source AI models, where users have free access to the underlying source code and can use or modify it (e.g., fine-tune the models) with their own custom datasets.
  • Other open-source frameworks and tools that streamline the process of building AI models (e.g., plug-ins or software packages for data preprocessing, model training, model inference, or model evaluation).
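
To make “AI-ready” more concrete, here is a minimal sketch in Python of what working with such a dataset might look like. The file name, column names, and prediction task are all invented for illustration; the point is simply that an AI-ready dataset needs no further cleaning or encoding before modeling.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical AI-ready dataset: features are already cleaned,
# encoded, and documented, so it can feed a model directly.
df = pd.read_csv("student_interactions.csv")  # invented file name

X = df.drop(columns=["mastered_skill"])       # invented label column
y = df["mastered_skill"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GradientBoostingClassifier().fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")
```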

Visit the “Build” section of the Learning Engineering Hub to see examples of public assets and other helpful learning engineering resources. 

Given the new Dataset Prize in the 2025 Tools Competition, this post takes a closer look at the types of datasets that would be compelling and how this can be reflected in your proposal. 

What sorts of datasets would be competitive?

While engineers have access to some educational datasets (e.g., student-level assessment data), more robust, higher-quality data can spur innovation and help address the open questions of privacy, bias, and equity in the use of AI in education. 

Benchmark datasets, specifically, can support the development of tools that reliably serve diverse student populations and capture a holistic picture of learning. They differ from other datasets in their role as a shared standard for training and evaluating AI models: a benchmark dataset provides the basis against which the performance of different models can be compared (a short sketch after the list below makes this concrete). 

Benchmark datasets: 

  • Are novel. They include data on something that isn’t already accessible – like the ASAP dataset, which pioneered automated essay scoring algorithms, or the Mind Wandering dataset, which uses a webcam-based eye tracker to detect mind wandering during learning.
  • Include high-quality data. Quality may refer to the completeness and structure of the data, the sampling method, representativeness, how the dataset addresses issues of bias and fairness, and other markers.
  • Have the potential to lead to a quality or generalizable algorithm. Robust, high-quality data can be used to train algorithms with strong potential for impact and reach. 
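
As a rough sketch of the “basis for comparison” idea, the example below scores two stand-in models against the same frozen benchmark split, so their numbers are directly comparable. The synthetic data and model choices are assumptions for illustration, not anything prescribed by the competition.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a published benchmark dataset.
X, y = make_classification(n_samples=2000, random_state=42)

# The benchmark's test split is frozen, so every model is judged
# against the exact same yardstick.
X_train, X_bench, y_train, y_bench = train_test_split(
    X, y, test_size=0.3, random_state=42
)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(random_state=0)),
]:
    model.fit(X_train, y_train)
    score = accuracy_score(y_bench, model.predict(X_bench))
    print(f"{name}: benchmark accuracy = {score:.3f}")
```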

A few key types of benchmark datasets would be particularly impactful:

  • Expanded datasets with more metadata. Most existing datasets lack substantial metadata providing additional context on students, teachers, and the classroom. 

With limited metadata, engineers can train models that work for the general population, but this leaves out marginalized and other special learner populations. Expanded datasets would help customize AI models to ensure they serve diverse student populations and would support evaluations of fairness and bias (see the sketch after this list). Examples of metadata include grade level, English-language-learner status, race/ethnicity, special needs/IEP status, and others.

  • Multimodal datasets. These include data in multiple modalities, such as text, images, audio, and video. Multimodal datasets are valuable because they offer a more comprehensive and holistic view of students, enabling models to predict learning trajectories more accurately. This potential, however, remains underexplored in public education, likely due to the lack of consensus on how best to maintain student privacy. 

An example of a tool that captures multimodal data could be an AI reading tutor that records students’ audio during reading exercises, captures video of their interactions with peers, and tracks their reading performance over time. 
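
As a minimal sketch of how the metadata described above enables fairness checks, the example below disaggregates a model’s accuracy by English-language-learner status; a large gap between groups is one simple signal of bias to investigate. All column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical evaluation table: one row per student, holding model
# predictions, true labels, and the kind of metadata an expanded
# dataset would carry (all values invented for illustration).
results = pd.DataFrame({
    "y_true":     [1, 0, 1, 1, 0, 1, 0, 0],
    "y_pred":     [1, 0, 0, 1, 0, 1, 1, 0],
    "ell_status": ["ELL", "ELL", "ELL", "non-ELL",
                   "non-ELL", "non-ELL", "non-ELL", "non-ELL"],
})

# Disaggregate accuracy by subgroup; without the metadata column,
# only the overall number would be visible.
per_group = (
    results.assign(correct=results["y_true"] == results["y_pred"])
           .groupby("ell_status")["correct"]
           .mean()
)
print(per_group)
```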

What is a competitive priority? How do I apply?

A subset of prizes is reserved for proposals meeting the designated competitive priority for public assets, ensuring that aligned proposals appear among the final slate of competition winners. You do not need to apply to the competitive priority directly. In Phase I, competitors will simply indicate their proposal’s alignment to the competitive priority. In Phase II, competitors will have additional space in the proposal form to elaborate.

What is the Dataset Prize? How do I apply?

In addition to the competitive priority for public assets, the 2025 competition is offering prizes of $50,000 to teams preparing and releasing education-focused datasets. Competitors may apply directly to the Dataset Prize, or may include this as a supplement to their track proposal and be eligible for an additional $50,000. Follow the guidance in the submission form. Learn more here.

Which option is right for me and what could this look like in my proposal? 

Public assets may emerge as a byproduct of your tool or be a central focus of your proposal. If you plan to dedicate significant capacity and funding to preparing and releasing high-quality datasets, consider applying to the Dataset Prize. Otherwise, we recommend focusing on your core track proposal and detailing its alignment to the competitive priority. In this case, you will not be eligible for supplemental funding, but the alignment will give your proposal a competitive boost.

In Phase II, you will have dedicated space to elaborate on your plans by: 

  • Describing the public assets you will generate, including detail on their development, content, and quality, and their potential applications. 
  • Detailing your strategy for making these assets publicly available. Explain how you will ensure these assets reach those who need them to support advancements in the field. 
