Ignacio Luri – Synthetic Data: A New Tool for Teaching at DePaul

Ignacio Luri (Assistant Professor, Department of Marketing) presented “Synthetic Data as a Teaching Tool” at the AI in Teaching Symposium, May 31, 2024.

In his presentation, Luri explores the use of synthetic data in the classroom. Synthetic data, as Luri explains, is artificially created data that mimics real-world data. The concept itself isn’t new; simulations, for example, have long been used in teaching.

What’s new is the application of AI, particularly generative AI models, to create synthetic data. This AI-generated synthetic data simulates human responses, such as survey or interview data. While the validity of synthetic data for research or real-world applications is still being debated, its potential for teaching is significant.

AI as a Research Participant

Luri argues that large language models (LLMs) can stand in for research participants in a teaching context. This gives students a way to practice research methods hands-on without recruiting real participants.

Examples of Synthetic Data Use in Teaching

Here are three examples of how Luri uses synthetic data in his classes:

  1. Teaching Interview Skills: Students can practice interview techniques in a safe and controlled environment by interviewing a chatbot powered by ChatGPT. This allows them to apply their learning and receive feedback on their skills without the pressure of a real interview.
  2. Quantitative Data Analysis: Synthetic data can be used to teach quantitative data analysis skills. With synthetic survey data generated to have specific characteristics, students can practice data cleaning, descriptive statistics, and even predictive modeling.
  3. Positioning and Segmentation: Tools like Synthetic Users can generate diverse synthetic participant profiles with unique backgrounds, personalities, and demographics. This allows students to explore real-world marketing and social science concepts like positioning, segmentation, and persona development.
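
As a rough illustration of the second example, the sketch below generates a small synthetic survey and runs the kind of cleaning and descriptive statistics a student might practice. It is not taken from Luri's materials: the column names, the 1-5 satisfaction scale, the 10% missing-answer rate, and the function names are all assumptions made for illustration, and it uses random numbers from Python's standard library rather than a generative AI model.

```python
import random
import statistics

def make_survey(n_rows, missing_rate=0.1, seed=42):
    """Generate hypothetical survey rows, with some answers deliberately missing."""
    rng = random.Random(seed)  # fixed seed so every student gets the same data
    rows = []
    for respondent_id in range(1, n_rows + 1):
        age = rng.randint(18, 65)
        # Satisfaction on an assumed 1-5 scale; None marks a skipped question.
        satisfaction = rng.randint(1, 5)
        if rng.random() < missing_rate:
            satisfaction = None
        rows.append({"id": respondent_id, "age": age, "satisfaction": satisfaction})
    return rows

def clean(rows):
    """Data cleaning step: drop rows where the satisfaction answer is missing."""
    return [row for row in rows if row["satisfaction"] is not None]

def describe(rows):
    """Descriptive statistics for the satisfaction scores."""
    scores = [row["satisfaction"] for row in rows]
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }

survey = make_survey(200)
cleaned = clean(survey)
summary = describe(cleaned)
print(summary)
```

Raising `missing_rate` or changing the answer range is one way an instructor could make the cleaning exercise harder or easier.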

Benefits of Using Synthetic Data

  • Accessibility: Synthetic data allows students to engage with data analysis and research methods without needing access to real-world data, which can be difficult or costly to obtain.
  • Safety and Ethics: Synthetic data avoids the ethical concerns of using sensitive or private real-world data in the classroom.
  • Control and Experimentation: Instructors can control the characteristics of synthetic data to create specific scenarios and experiment with different variables, providing tailored learning experiences.
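
The control point can be made concrete with a minimal sketch in which the instructor plants a known relationship in the data and students try to recover it. Everything here is a hypothetical scenario chosen for illustration, not something described in the presentation: the variables (`ad_spend`, `sales`), the planted slope and intercept, and the noise level are all assumed values.

```python
import random

def make_ad_data(n_rows, true_slope, true_intercept, noise_sd, seed=0):
    """Generate (ad_spend, sales) pairs with an instructor-chosen linear relationship."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_rows):
        ad_spend = rng.uniform(0, 100)
        # Sales follow the planted line plus random noise.
        sales = true_intercept + true_slope * ad_spend + rng.gauss(0, noise_sd)
        data.append((ad_spend, sales))
    return data

def fit_line(data):
    """Ordinary least squares fit of sales on ad_spend; returns (slope, intercept)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Instructor's settings: the "answer" students should approximately recover.
data = make_ad_data(500, true_slope=2.0, true_intercept=10.0, noise_sd=5.0)
slope, intercept = fit_line(data)
print(round(slope, 2), round(intercept, 2))
```

Because the instructor knows the planted values, student estimates can be checked against them, and increasing `noise_sd` shows how noisier data makes the relationship harder to recover.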

Luri believes that synthetic data will become increasingly mainstream in education. It’s a powerful tool for teaching research methods, data analysis, and critical thinking skills. Beyond these practical applications, synthetic data also provides a platform for discussing the broader ethical implications of AI, such as representation, bias, and human substitution.