Crafting University Experiences:

Perspectives and Practices in Teaching Introductory Data Science

Sinem Demirci
Mine Dogucu
Joshua Rosenberg
Andrew Zieffler

Research Team

Acknowledgements

  • This study has been sponsored by the Scientific and Technological Research Council of Türkiye & University College London, Mathematical and Physical Sciences.


  • Data collection and preliminary analysis were completed in the Department of Statistical Science at University College London.

Agenda


  • Background and Motivation
  • Aim of the Talk
  • Recruitment and Data Collection
  • Similarities & Differences in
    • Introductory Data Science (IDS) Courses
    • IDS Students
    • IDS Instructors
  • Highlights and Shadows in IDS Teaching/Learning Environments


Teaching Data Science in Higher Education


  • Data science is new compared to other fields (Kelleher & Tierney, 2018).
  • It is interdisciplinary (e.g., Asamoah et al., 2020).
  • Determining its scope poses unique challenges (Yan & Davis, 2019).


Some initiatives such as

Motivation Behind This Talk


“I think introduction to data science is a pretty difficult course to teach… So, the other [intro course that I taught] was quite stable. But from my current experience with data science, stable will not be the word that I will describe.” [Participant 14]


“I hope that what I’m doing here [teaching the IDS course] is useful.” [Participant 13]


“I would say something I think about on this course a lot is… Is it necessary? Would students be better served by [other] classical intro data science course where they are doing [something else]. And I don’t know the answer.” [Participant 09]


“I know that other people have different configurations of introductory data science, and that includes … some aspects of statistics. So, I am curious about ultimately what you [this research] find out about that, and what is the right configuration?” [Participant 07]

Aim of This Talk

  • to provide a glimpse into the practices of introductory data science courses using qualitative data.


  • to share a range of examples across the following dimensions:
    • content choices
    • student background
    • instructional approaches
    • course capacity


  • NOT to prescribe the ideal configuration for IDS courses.

Methodology


We chose a qualitative research design (Merriam, 2009).


Our aim was to understand


  • how IDS instructors interpret their teaching experiences in IDS courses

  • “what meaning they attribute to their experiences” (Merriam, 2009, p. 23).

Procedure to Recruit Participants


We recruited participants…

  • who taught an IDS course at least twice at the undergraduate level.

  • whose course titles included "Data Science" and one of the following keywords: Introduction, Principles, Elements, or Fundamentals.

  • 16 participants (2 pilot, 14 main study)

  • Participants received a gift card for their time.

Data Collection and Analysis

IDS Teaching/Learning Environments


Source: PHD Comics

IDS Teaching/Learning Environments - 1


Table 1: Institution type

Institution n
Research University 6
Liberal Arts College 8


Table 2: Where IDS Courses Are Offered

Course Offering Unit n
Mathematics 3
Mathematics and Statistics 2
Statistics 2
Center/Institute 2
Statistics and Data Science 2
Mathematics and Computer Science 1
Data Analytics 1
Other 1

IDS Teaching/Learning Environments - 2


Table 3: Prerequisite Courses of IDS Courses


 Yes No Not Sure
Has a prerequisite 6 8 0
Is a prerequisite for a follow-up course 11 2 1

IDS Teaching/Learning Environments - 3

Table 5: Class Sizes

Class Size n
300+ 2
200-299 1
100-199 2
30-39 2
20-29 3
10-19 3
1-9 1

Large IDS classes tend to:

  • be in research universities,
  • have a teaching team (co-instructors, TAs, graders, student mentors),
  • provide auto-graded feedback.


IDS instructors teaching in small class sizes tend to provide more details on:

  • their orientation to teach IDS,
  • topic-specific teaching strategies,
  • students’ learning & motivation, and
  • formative assessment practices.

IDS Instructors and Their Choices

Who are these IDS Instructors?


  • Based in North America
  • Terminal degrees in varying subjects, including statistics, mathematics, computer science, genetics, and economics.
  • Experience teaching IDS ranging from 1 to 10 years.

Formal Training in Data Science

Formal Training in Teaching


Orientations to Teach IDS


Their perceptions of the purposes, goals, and reasons for teaching data science reflect three orientations:

  • Enhance Data Literacy / Teach to Learn from Data

  • Familiarize Students with a Programming Language

  • Attract Students to a Major/Minor in Data Science

Content Choices of IDS Instructors

  • All IDS instructors teach:
    • Introduction to Data Science
    • Data and Variables
    • Data Visualization
    • Data Wrangling
  • Some IDS instructors teach:

Topic n
Statistical Inference 7
Ethics 6
Introduction to Machine Learning 4
Text Analysis 4
Clustering 3

Programming Language n
R 7
Python 2
Both Python and R 2
Both SQL and R 2
No programming language 1


  • Most IDS instructors prepared their own materials.

Instructional Approaches

Subject-Specific Teaching Strategies

  • Every IDS instructor uses more than one teaching strategy.

Table 8: Most Commonly Used Teaching Strategies

Teaching Strategy n
Lecturing 9
Live Coding 8
Questioning 3
Group Work 3
Think-Pair-Share 3

Topic-Specific Teaching Strategies

  • Only 3 IDS instructors mentioned topic-specific teaching strategies:

    • Storytelling while introducing real-world cases

    • Questioning while teaching data ethics

    • Role Playing while teaching how to join data sets

    • Tactile Simulation while teaching sampling distribution

Dynamic Nature of IDS Teaching


  • Almost all IDS instructors changed their teaching over time. No pattern related to years of experience was observed.

    • Added/removed a prerequisite

    • Refined content choices

    • Refined pedagogical choices

    • No change (2 participants)

    • No longer co-teaching (2 participants)

IDS Students

Who are IDS Students?

  • Students come from almost every major/department & every grade level.

    • Majority – Mathematics, Statistics, Computer Science, Data Science
    • Others – Engineering, Business School, Social Science, Economics, Humanities, Life Sciences, Environmental Science, Political Science, Health Science, Undecided

Students’ Difficulties in IDS Courses

  • Students without a programming background tend to spend more time learning coding.

  • Almost every IDS student experiences difficulties in developing solid strategic knowledge*.

  • *Strategic knowledge: integrating syntactic knowledge of programming and conceptual knowledge of disciplines to solve novel problems (Qian & Lehman, 2017).
  • More details are in our paper, Learning Difficulties of Introductory Data Science Students (Demirci et al., 2023).

Misconceptions and Stereotypes


  • Every IDS instructor shared observations about IDS students’ misconceptions and stereotypes, and about their possible sources.

Some Example Statements

  • If you can code quickly, you’re a good data scientist.

  • They’re either good or bad at data science (lack of growth mindset).

  • Data science is a collection of big hammers: if they learn how to swing each of these hammers, they could whack every problem in the world.

Potential Sources:

  • Society, Popular Culture, Cultural Biases, Lack of Knowledge, YouTube…

Frustration and Challenges of Students


“They are frustrated that it’s not a simple task, very similar to math anxiety.” [Participant 04]


“At first, they don’t understand that programming is trial and error a lot.” [Participant 11]


“Their motivation decreases when they see [an] error message.” [Participants 11 & 12]


“Sometimes they are frustrated when they failed to complete a task in R.” [Participant 15]





“Seeing end products in data science creates an outcome-centric perception bias [among students].” [Participant 07]

Highlights and Shadows in IDS Education

Highlights

Shadows


  • How do IDS instructors balance teaching in different class sizes and diverse majors?

  • What is missing in large class sizes?

  • Lack of guidelines and student learning outcomes for IDS courses

    • Despite high self-efficacy beliefs, the IDS instructors wondered how well their courses align with the growing consensus on

      “What should an introductory data science course be?”

Conclusion


  • We are still beta-testing what to teach and how to teach an IDS course.

    • More systematic studies in IDS education with empirical data are needed.

    • A policy-level document is required to guide us along the way.

References


Asamoah, D. A., Doran, D., & Schiller, S. (2020). Interdisciplinarity in data science pedagogy: A foundational design. Journal of Computer Information Systems, 60(4), 370-377. https://doi.org/10.1080/08874417.2018.1496803

De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., … & Ye, P. (2017). Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application, 4, 15-30.

Demirci, S., Dogucu, M., Zieffler, A., & Rosenberg, J. M. (2023). Learning difficulties of introductory data science students. In E. Jones (Ed.), Proceedings of the International Association for Statistical Education Satellite Conference. https://escholarship.org/uc/item/01p3k7f3

Kelleher, J. D., & Tierney, B. (2018). Data science. MIT Press.

Merriam, S. B. (2009). Qualitative research: A guide to design and implementation. San Francisco, CA: Jossey-Bass.

National Academies of Sciences, Engineering, and Medicine. (2018). Data science for undergraduates: Opportunities and options (Consensus report). Washington, DC: The National Academies Press. https://nas.edu/envisioningds

Qian, Y., & Lehman, J. (2017). Students’ misconceptions and other difficulties in introductory programming: A literature review. ACM Transactions on Computing Education (TOCE), 18(1), 1-24. https://doi.org/10.1145/3077618

Schwab-McCoy, A., Baker, C. M., & Gasper, R. E. (2021). Data science in 2020: Computing, curricula, and challenges for the next 10 years. Journal of Statistics and Data Science Education, 29(sup1), S40-S50.

Yan, D., & Davis, G. E. (2019). A first course in data science. Journal of Statistics Education, 27(2), 99-109. https://doi.org/10.1080/10691898.2019.1623136

Thank you!