A Multimodal College-Level Writing and Feedback Dataset

2026 Winner

Collecting writing, feedback, scoring, and classroom data to improve AI-supported writing instruction

Department of Computer Science, School of Computing and Information, University of Pittsburgh

United States of America

Focus Area:

Education Datasets

Prize Level:

Dataset Prize

Project Description

Our dataset consists of student essays paired with detailed instructor feedback (written or audio) across multiple drafts, collected from first-year English as a Second Language (ESL) and creative writing courses in the University of Pittsburgh’s Department of English. Compared with existing writing-feedback corpora, the essays are substantially longer (2 to 5 pages for ESL, 5 to 8 pages for creative writing), and the feedback targets idea development, narrative plotting, readerly engagement, and clarity rather than surface-level error correction. To date, we have collected 566 essays from 157 students across nine classes, and we aim to collect 15–20k essays from 500 students. Future iterations will incorporate office-hour audio recordings to capture instructor-student interactions. The dataset supports education research on how students develop as writers through feedback, as well as AI and NLP applications such as feedback classification, feedback generation, and personalized writing support systems.