An Assessment of Readiness to Teach Data Science in Higher Education
May 10th, 2023
Sinem Demirci, PhD
Postdoctoral Visiting Researcher/Lecturer - UCL
sinemdemirci.github.io
sinemdemirci
sinemmdemirci
drsinemdemirci
s.demirci@ucl.ac.uk
In this talk, I will cover:
Figure 1. Venn diagram of data science (Mike and Hazzan, 2023)
The US Bureau of Labor Statistics, Occupational Outlook Handbook
LinkedIn Jobs on the Rise 2022: The 25 U.S. roles that are growing in demand: Machine Learning Engineer (4th)
6 In-Demand Data Scientist Jobs in 2023
The Indeed Editorial Team outlined the following reasons for the high demand for data science jobs:
Figure 3. Exploratory design. Source: (Fraenkel, Wallen & Hyun, 2012, p. 560)
Self-taught – 4 participants
Workshops – 4 participants
Industry experience – 2 participants
Others – 5 participants (took some DS courses during their graduate studies, or graduated from closely related fields such as statistics and computer science)
Workshops – 3 participants
TA trainings – 3 participants
Course/internship – 3 participants
Degree – 1 participant
None – 5 participants
Students come from almost every major/department.
Course has a prerequisite: Yes – 6; No – 8
Course is a prerequisite for another course: Yes – 11; No – 2; Not sure – 1
Table 1. Class sizes reported by introductory data science (IDS) instructors
Class Size | n |
---|---|
300+ | 2 |
200-299 | 1 |
100-199 | 2 |
… | |
30-39 | 2 |
20-29 | 3 |
10-19 | 3 |
1-9 | 1 |
To enhance the trustworthiness of the study, we collected indicators for transferability, dependability, and credibility (Merriam & Tisdell, 2016).
In this part, we present the findings of our qualitative content analysis for a single component of pedagogical content knowledge (PCK), which we categorized into three themes: (1) Knowledge of Students' Syntactic Difficulties; (2) Knowledge of Students' Conceptual Difficulties; and (3) Knowledge of Students' Strategic Difficulties.
We also introduce a PCK map of an IDS instructor who participated in our study.
Table 2. Knowledge of Students’ Syntactic Difficulties
Categories | Codes |
---|---|
Markup Languages and Reproducibility Tools | HTML, R Markdown, Quarto Markdown, Jupyter Notebook, Linux, Git/GitHub |
Programming Languages | Packages, Libraries, Misspelling, Adapting the Code, How to Read Data |
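We grouped students' conceptual difficulties into five categories: Mathematics, Statistics, Computer Science, Domain-Specific Knowledge, and Interdisciplinary Knowledge. The codes that emerged from the data are given in Table 3.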
Table 3. Knowledge of Students' Conceptual Difficulties
Category | Concepts and Topics |
---|---|
Mathematics | Algorithms, Permutation Testing |
Statistics | Types of Variables, Confidence Interval, Principles of Data Visualization, Hypothesis Testing, Correlation vs. Causality, Bootstrapping, Inductive Inference, Statistical Analysis Methods-Modelling, p-value, Sampling Distribution |
Computer Science | I/O File Management, Working Mechanisms of Markup Languages, Basics of Coding, Filter Function, Basics of Web Scraping, Select Function, Joining Data Sets, Mapping Functions, Loops, Creating Functions |
Domain-Specific Knowledge | Understanding Technical Writing, Understanding the Nature of Data |
Interdisciplinary Knowledge | Ethics, Machine Learning |
“…So certainly, so this so kind of so statistical analysis in so kind of correct statistical analysis in general is a problem. So, everyone is very tempted to just kind of throw any tool they can, they can at the problem and just like, look at the outputs to see if the if the p-value is significant. So, this so I try to instill this kind of skeptical mindset of like, you know, does that, does the model fit? Does the question make sense? … [conversation continues] So that, I would say, is kind of one of the more challenging things to teach.”
Table 4. Knowledge of Students’ Strategic Difficulties
Strategic Knowledge Difficulties |
---|
Debugging |
Communication |
Data Wrangling |
Appreciating the Complexity of Interdisciplinary Research |
Making Appropriate Data Visualization Decisions |
Creative Thinking |
Proper Use of Descriptive Statistics |
Conducting Good Research |
Deciding on Statistical Analysis Methods-Modelling |
Working with Real and Messy Data |
Handling Missing Data |
Asking Good Questions |
Web Scraping |
Setting up a Data Science Pipeline |
This study is funded by the Scientific and Technological Research Council of Turkey (TÜBİTAK) and University College London.
The collaborators on this project are Dr Mine Dogucu, Assist. Prof. Dr Joshua M. Rosenberg, and Teaching Assoc. Prof. Dr Andrew Zieffler.
Asamoah, D. A., Doran, D., & Schiller, S. (2020). Interdisciplinarity in data science pedagogy: A foundational design. Journal of Computer Information Systems, 60(4), 370-377. https://doi.org/10.1080/08874417.2018.1496803
De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., … & Ye, P. (2017). Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application, 4, 15-30.
Donoghue, T., Voytek, B., & Ellis, S. E. (2021). Teaching creative and practical data science at scale. Journal of Statistics and Data Science Education, 29(sup1), 27-39. https://doi.org/10.1080/10691898.2020.1860725
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education. New York: McGraw-Hill.
Kelleher, J. D., & Tierney, B. (2018). Data science. MIT Press.
Magnusson, S. J., Borko, H., & Krajcik, J. S. (1999). Nature, source, and development of pedagogical content knowledge for science teaching. In J. Gess-Newsome & N. Lederman (Eds.), Examining Pedagogical Content Knowledge (pp. 95-132). United States: Kluwer Press.
Merriam, S. B. (2009). Qualitative Research: A Guide to Design and Implementation. San Francisco, CA: Jossey-Bass.
Merriam, S. B., & Tisdell, E. J. (2016). Qualitative Research: A Guide to Design and Implementation (4th ed.). San Francisco, CA: Jossey-Bass.
Mike, K., & Hazzan, O. (2023). What is data science? Communications of the ACM, 66(2), 12-13. https://doi.org/10.1145/3575663
National Academies of Sciences, Engineering, and Medicine. (2018). Data Science for Undergraduates: Opportunities and Options (Consensus Study Report). Washington, DC: The National Academies Press. https://nas.edu/envisioningds
Park, S., & Chen, Y. C. (2012). Mapping out the integration of the components of pedagogical content knowledge (PCK): Examples from high school biology classrooms. Journal of Research in Science Teaching, 49(7), 922-941.
Park, S., & Oliver, J. S. (2008). Revisiting the conceptualisation of pedagogical content knowledge (PCK): PCK as a conceptual tool to understand teachers as professionals. Research in Science Education, 38, 261-284.
Park, S., & Suh, J. K. (2019). The PCK map approach to capturing the complexity of enacted PCK (ePCK) and pedagogical reasoning in science teaching. In Repositioning pedagogical content knowledge in teachers’ knowledge for teaching science (pp. 187-199).
Qian, Y., & Lehman, J. (2017). Students’ misconceptions and other difficulties in introductory programming: A literature review. ACM Transactions on Computing Education (TOCE), 18(1), 1-24. https://doi.org/10.1145/3077618
Schwab-McCoy, A., Baker, C. M., & Gasper, R. E. (2021). Data science in 2020: Computing, curricula, and challenges for the next 10 years. Journal of Statistics and Data Science Education, 29(sup1), S40-S50.
Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1-23.
Yan, D., & Davis, G. E. (2019). A first course in data science. Journal of Statistics Education, 27(2), 99-109. https://doi.org/10.1080/10691898.2019.1623136