July 11th, 2023
Sinem Demirci, PhD
Postdoctoral Visiting Researcher/Lecturer - UCL
In this talk, I will be talking about
Figure 1. Venn diagram of data science (Mike and Hazzan, 2023)
Self-taught – 4 participants
Workshops – 4 participants
Industry experience – 2 participants
Others – 5 participants (enrolled in some DS courses during graduate studies, or graduated from closely related areas such as Statistics and Computer Science)
Workshops – 3 participants
TA trainings – 3 participants
Course/internship – 3 participants
Degree – 1 participant
None – 5 participants
Students come from almost every major/department.
Course has a prerequisite: Yes – 6; No – 8
Course is a prerequisite to any other course: Yes – 11; No – 2; Not sure – 1
Table 1. Class sizes reported by IDS instructors
Class Size | n |
---|---|
300+ | 2 |
200-299 | 1 |
100-199 | 2 |
… | |
30-39 | 2 |
20-29 | 3 |
10-19 | 3 |
1-9 | 1 |
To enhance the trustworthiness of the study, we collected indicators for transferability, dependability, and credibility (Merriam & Tisdell, 2016).
In this part, we present the findings of the study, which were categorized into three themes: (1) Syntactic Knowledge Difficulties; (2) Conceptual Knowledge Difficulties; and (3) Strategic Knowledge Difficulties.
Table 2. Knowledge of Students’ Syntactic Difficulties
Categories | Codes |
---|---|
Markup Languages and Reproducibility Tools | HTML, R Markdown, Quarto Markdown, Jupyter Notebook, Linux, Git/GitHub |
Programming Languages | Packages, Libraries, Misspelling, Adapting the Code, How to Read Data |
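Two of the codes above, "Misspelling" and "How to Read Data", can be illustrated with a minimal sketch. The data set, the column names, and the `mean_mass` helper are hypothetical and were invented for this illustration; they do not come from the study. The sketch reads a small CSV table and then shows the kind of error a single misspelled name produces.

```python
import csv
import io

# Hypothetical two-row data set, inlined so the example is self-contained.
raw = "species,mass_g\nadelie,3700\ngentoo,5000\n"

# "How to Read Data": parse the CSV text into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(raw)))


def mean_mass(records):
    """Return the mean of the 'mass_g' column (values arrive as strings)."""
    return sum(float(r["mass_g"]) for r in records) / len(records)


errors = []

# "Misspelling" a column name: 'mas_g' instead of 'mass_g' raises KeyError.
try:
    sum(float(r["mas_g"]) for r in rows)
except KeyError as err:
    errors.append(type(err).__name__)

# "Misspelling" a function name: 'men_mass' instead of 'mean_mass' raises NameError.
try:
    men_mass(rows)
except NameError as err:
    errors.append(type(err).__name__)

print(mean_mass(rows))
print(errors)
```

Both failures come from a one-character slip, but they surface as different error types, which may be part of why such mistakes are hard for novices to diagnose.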
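We categorized conceptual knowledge difficulties into five categories: Mathematics, Statistics, Computer Science, Domain-Specific Knowledge, and Interdisciplinary Knowledge.
The codes that emerged from the data are given in Table 3.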
Table 3. Knowledge of Students’ Conceptual Difficulties
Category | Concepts and Topics |
---|---|
Mathematics | Algorithms, Permutation Testing |
Statistics | Types of Variables, Confidence Interval, Principles of Data Visualization, Hypothesis Testing, Correlation vs. Causality, Bootstrapping, Inductive Inference, Statistical Modelling, p-value, Sampling Distribution |
Computer Science | I/O File Management, Working Mechanisms of Markup Languages, Basics of Coding, Filter Function, Basics of Web Scraping, Select Function, Joining Data Sets, Mapping Functions, Loops, Creating Functions |
Domain-Specific Knowledge | Understanding Technical Writing, Understanding the Nature of Data |
Interdisciplinary Knowledge | Ethics, Machine Learning |
“…So certainly, so this so kind of so statistical analysis in so kind of correct statistical analysis in general is a problem. So, everyone is very tempted to just kind of throw any tool they can, they can at the problem and just like, look at the outputs to see if the if the p-value is significant. So, this so I try to instill this kind of skeptical mindset of like, you know, does that, does the model fit? Does the question make sense? … [conversation continues] So that, I would say, is kind of one of the more challenging things to teach.”
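The concern in the quote, that a significant p-value says nothing about whether the model fits, can be sketched in a few lines. This is an illustration under stated assumptions, not an analysis from the study: the data are synthetic (y = x² on the integers 0 to 20), and the helper names `pearson_r`, `fit_line`, and `permutation_p_value` are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation test of 'no association' using |r| as the statistic."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic data with a curved relationship (assumption made for illustration).
xs = list(range(21))
ys = [x ** 2 for x in xs]

p_value = permutation_p_value(xs, ys)
slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"p-value: {p_value:.4f}")
print("residuals at x = 0, 10, 20:", residuals[0], residuals[10], residuals[-1])
```

The test reports a strong association, yet the residuals of the straight-line fit are positive at both ends and negative in the middle. That systematic pattern shows the linear model is the wrong shape for these data, which is the kind of check the instructor describes.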
Table 4. Knowledge of Students’ Strategic Difficulties
Strategic Knowledge Difficulties | |
---|---|
Debugging | |
Communication | |
Data Wrangling | |
Appreciating the Complexity of Interdisciplinary Research | |
Making Appropriate Data Visualization Decisions | |
Creative Thinking | |
Proper Use of Descriptive Statistics | |
Conducting Good Research | |
Deciding Statistical Analysis Methods-Modelling | |
Working with Real and Messy Data | |
Handling Missing Data | |
Asking Good Questions | |
Web Scraping | |
Setting up a Data Science Pipeline | |
This study is funded by the Scientific and Technological Research Council of Turkey (TÜBİTAK) and University College London.
The collaborators on this project are Dr Mine Dogucu, Assist. Prof. Dr Joshua M. Rosenberg, and Teaching Assoc. Prof. Dr Andrew Zieffler.
Asamoah, D. A., Doran, D., & Schiller, S. (2020). Interdisciplinarity in data science pedagogy: a foundational design. Journal of Computer Information Systems, 60(4), 370-377, https://doi.org/10.1080/08874417.2018.1496803
Bayman, P., & Mayer, R. E. (1988). Using conceptual models to teach BASIC computer programming. Journal of Educational Psychology, 80(3), 291, https://psycnet.apa.org/doi/10.1037/0022-0663.80.3.291
Danyluk, A., & Leidig, P. (2021). Computing competencies for undergraduate data science curricula: ACM data science task force. Peer-Reviewed Publications, 8, https://scholarworks.gvsu.edu/cispeerpubs/8
De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., … & Ye, P. (2017). Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application, 4, 15-30.
Donoghue, T., Voytek, B., & Ellis, S. E. (2021). Teaching creative and practical data science at scale. Journal of Statistics and Data Science Education, 29(sup1), 27-39, https://doi.org/10.1080/10691898.2020.1860725
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (Vol. 7, p. 429). New York: McGraw-Hill.
Merriam, S. B. (2009). Qualitative Research: A Guide to Design and Implementation. San Francisco, CA: Jossey-Bass.
Merriam, S. B., & Tisdell, E. J. (2016). Qualitative Research: A Guide to Design and Implementation (Fourth Edition). San Francisco, CA: Jossey-Bass.
Mike, K., & Hazzan, O. (2023). What is data science? Communications of the ACM, 66(2), 12–13, https://doi.org/10.1145/3575663
National Academies of Sciences, Engineering, and Medicine (2018). Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press, https://nas.edu/envisioningds
Qian, Y., & Lehman, J. (2017). Students’ misconceptions and other difficulties in introductory programming: A literature review. ACM Transactions on Computing Education (TOCE), 18(1), 1-24, https://doi.org/10.1145/3077618
Yan, D., & Davis, G. E. (2019). A first course in data science. Journal of Statistics Education, 27(2), 99-109, https://doi.org/10.1080/10691898.2019.1623136