From Instructor Pedagogy to Policy Perspectives
Sinem Demirci, PhD
Figure 1. Self-Conceptualization of Data Science Education Layers
Figure 2. Venn diagram of data science (Mike and Hazzan, 2023)
The US Bureau of Labor Statistics, Occupational Outlook Handbook
LinkedIn Jobs on the Rise 2022: The 25 U.S. roles that are growing in demand: Machine Learning Engineer (4th)
Indeed Editorial Team outlined the reasons for high demand in data science jobs as follows:
6 In-Demand Data Scientist Jobs in 2024
Image by jcomp on Freepik
Undergraduate Data Science Education at the Instructor Level
Self-taught – 4 participants
Workshops – 4 participants
Industry experience – 2 participants
Others – 5 (enrolled in some DS courses during graduate years, or graduated from closely related areas such as Stat and CS)
Workshops – 3 participants
TA trainings – 3 participants
Course/internship – 3 participants
Degree – 1 participant
None – 5 participants
Students are coming from almost every major/department.
Prerequisite Yes – 6; No – 8
Prerequisite to any other course Yes – 11; No – 2; Not sure – 1
Table 2: Class sizes reported by IDS instructors
Class Size | n |
---|---|
300+ | 2 |
200-299 | 1 |
100-199 | 2 |
… | |
30-39 | 2 |
20-29 | 3 |
10-19 | 3 |
1-9 | 1 |
Data analysis started simultaneously with data collection and proceeded iteratively throughout the qualitative research.
Qualitative content analysis (Merriam & Tisdell, 2016) was used for generating a comprehensive codebook.
To enhance the trustworthiness of the study, we collected indicators for transferability, dependability, and credibility (Merriam & Tisdell, 2016).
There were three recurring orientations among the IDS instructors:
In this study, IDS instructors shared their teaching styles, revealing their potential level of PCK as well as the interactions between its elements.
Although data analysis is still ongoing, initial findings have started to emerge. We anticipate unveiling interaction maps of IDS instructors’ PCK, intending to identify needs in undergraduate data science teaching.
While we found the PCK framework useful in providing an initial understanding of the nature of IDS instructors’ PCK, we acknowledge that further modifications may be required to determine the elements of PCK in the data science education context.
Undergraduate Data Science Education at the Policy Level
Undergraduate data science education is a major focus in the data science education community.
Efforts being made to identify data science competencies such as
Many professional organizations and scholars worldwide have called for a better understanding of undergraduate data science education and of the scientific literature on this topic.
In this project, the goals were to
We conducted a systematic literature review (Evans & Benefield, 2001; Liberati et al., 2009) by using certain criteria.
We opted to extract data from six databases that potentially include publications on data science education. These databases were
ERIC ProQuest
IEEE Xplore
PubMed
ScienceDirect
Scopus and
Web of Science
In addition to making inclusion-exclusion decisions, we finalized a list of publication variables that may be worth examining during the in-depth analysis stage.
In this stage, abstracts were read.
We collected data related to
Dependability: We recorded our research steps, such as how data were collected, how categories were derived, and how we made decisions throughout the research study, as suggested by Merriam (2015).
Transferability: We provided a rich, thick description (Merriam, 2015) of the data collection, data wrangling, and data analysis procedures in a public repository that is openly accessible to any reader who wishes to examine the data analysis closely or reproduce it.
We also ensured maximum variation by including undergraduate data science education studies conducted in different fields and/or covering different content areas.
The body of literature identified is very recent and growing monotonically; the oldest paper was published in 2015.
Over the past eight years,
1. Open Access: A majority of published studies in undergraduate data science education are open access, marking a substantial strength in the field.
2. Interdisciplinarity: Scholars from diverse fields are contributing to the data science education literature.
Data science education practices in different programs such as
- data science programs,
- computer science education (Bile Hassan & Liu, 2020),
- microbiology (Dill-McFarland et al., 2021), and
- business (Miah et al., 2020).
Course examples such as
- introductory computing (Fisler, 2022),
- modern technologies course
- computer science and engineering students (Rao et al., 2019),
- general education IT course (Haynes et al., 2019),
- medicine (Doudesis & Manataki, 2022b),
- psychology (Tucker et al., 2023).
1. There is insufficient empirical data: 44 of the 77 studies did not collect data.
Given the scopes of content areas such as calls to action, educational technology, and program examples coupled with the emergent nature of the field, the lack of empirical data is not a surprising finding.
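To make the scale of this finding concrete, the reported counts can be turned into proportions. The following is a minimal sketch using only the two figures stated above (44 and 77); the variable names are illustrative and are not taken from the project repository.

```python
# Counts reported in the review; variable names are illustrative assumptions.
total_studies = 77
studies_without_data = 44

# Derived count and shares.
studies_with_data = total_studies - studies_without_data
share_without_data = studies_without_data / total_studies
share_with_data = studies_with_data / total_studies

print(f"Studies without empirical data: {studies_without_data} ({share_without_data:.1%})")
print(f"Studies with empirical data: {studies_with_data} ({share_with_data:.1%})")
```

Under these counts, a little more than half of the reviewed studies did not collect data.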
Warning
Emphasis on lack of empirical data is not a promotion of empiricism over all other ‘ways of knowing’.
What is being highlighted is the disproportionately high percentage of studies lacking empirical data.
2. Reproducibility is one of the potential challenges in undergraduate data science education research: Speculatively, the absence of critical information about research designs, such as missing research questions, missing participant profiles, and the non-collection of data, may contribute to the reduced reproducibility of available studies.
One could argue that the succinct nature of conference articles may inadvertently interfere with the comprehensive documentation necessary for the effective replication or modification of research.
A potential reason for this gap may also be the minimal training that most instructors receive in reproducibility (Horton et al., 2022).
3. Not all Data Science disciplines contribute equally to the overall body of knowledge: The prevailing trend indicates ongoing multidisciplinary collaborations.
Notably, computer science and data science emerge as the leading contributors to the literature.
This result aligns with the study of Wiktorski et al. (2017b), who reported that Mathematics and Statistics departments are not at the forefront of data science degree programs.
This is perhaps the most important finding for the statistics community.
Scientific studies are integral to reviewing existing practices and to improving higher education institutions’ data science practices. Thus, we should
Recommendation 1: Prioritize investments in empirical studies.
Recommendation 2: Diversify research efforts to enrich the spectrum of studies.
Recommendation 3: Encourage scholars in key data science fields to contribute more to publications.
Undergraduate Data Science Education at the Classroom/Student Level
Over the next few years, my research will extend to encompass university and community college students enrolled in data science courses.
I am planning to explore the role of students’ contexts in terms of multiple cognitive, affective, and multicultural variables to support their learning.
Note
Threshold concepts: “…‘conceptual gateways’ or ‘portals’ that lead to a previously inaccessible, and initially perhaps ‘troublesome’, way of thinking about something,” without which students cannot proceed further in learning a certain discipline.
There are studies reporting these threshold concepts (e.g., Beitelmal et al., 2022; Thomas et al., 2010), but the literature on this topic has yet to be explored systematically.
During the next 5 years of my research, I would like to
become proficient in supporting data science instructors’ PCK to teach data science
conduct research to determine threshold concepts and explore possible ways to promote students’ learning of these concepts
share best practices with the data science education community.
Asamoah, D. A., Doran, D., & Schiller, S. (2020). Interdisciplinarity in data science pedagogy: a foundational design. Journal of Computer Information Systems, 60(4), 370-377, https://doi.org/10.1080/08874417.2018.1496803
Beitelmal, W. H., Littlejohn, R., Okonkwo, P. C., Hassan, I. U., Barhoumi, E. M., Khozaei, F., … & Alkaaf, K. A. (2022). Threshold Concepts Theory in Higher Education—Introductory Statistics Courses as an Example. Education Sciences, 12(11), 748.
De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., … & Ye, P. (2017). Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application, 4, 15-30.
Donoghue, T., Voytek, B., & Ellis, S. E. (2021). Teaching creative and practical data science at scale. Journal of Statistics and Data Science Education, 29(sup1), 27-39, https://doi.org/10.1080/10691898.2020.1860725
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (Vol. 7, p. 429). New York: McGraw-Hill.
Kelleher, J. D., & Tierney, B. (2018). Data science. MIT Press.
Magnusson, S.J., Borko, H., & Krajcik, J.S. (1999). Nature, source, and development of pedagogical content knowledge for science teaching. In: Gess-Newsome, J., & Lederman, N., (Eds.), Examining Pedagogical Content Knowledge. United States: Kluwer Press. pp. 95-132
Merriam, S. B. (2009). Qualitative Research: A Guide to Design and Implementation. San Francisco, CA: Jossey-Bass.
Merriam, S. B., & Tisdell, E. J. (2016). Qualitative Research: A Guide to Design and Implementation (Fourth Edition). San Francisco, CA: Jossey-Bass.
Mike, K., & Hazzan, O. (2023). What is data science? Communications of the ACM, 66(2), 12–13, https://doi.org/10.1145/3575663
National Academies of Sciences, Engineering and Medicine Consensus Report (2018). Data Science for Undergraduates: Opportunities and Options. Washington, https://nas.edu/envisioningds.
Park, S., & Chen, Y. C. (2012). Mapping out the integration of the components of pedagogical content knowledge (PCK): Examples from high school biology classrooms. Journal of Research in Science Teaching, 49(7), 922-941.
Park, S., & Oliver, J. S. (2008). Revisiting the conceptualisation of pedagogical content knowledge (PCK): PCK as a conceptual tool to understand teachers as professionals. Research in Science Education, 38, 261-284.
Park, S., & Suh, J. K. (2019). The PCK map approach to capturing the complexity of enacted PCK (ePCK) and pedagogical reasoning in science teaching. In Repositioning pedagogical content knowledge in teachers’ knowledge for teaching science, 187-199.
Qian, Y., & Lehman, J. (2017). Students’ misconceptions and other difficulties in introductory programming: A literature review. ACM Transactions on Computing Education (TOCE), 18(1), 1-24, https://doi.org/10.1145/3077618
Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1-23.
Schwab-McCoy, A., Baker, C. M., & Gasper, R. E. (2021). Data science in 2020: Computing, curricula, and challenges for the next 10 years. Journal of Statistics and Data Science Education, 29(sup1), S40-S50.
Thomas, L., Boustedt, J., Eckerdal, A., McCartney, R., Moström, J. E., Sanders, K., & Zander, C. (2010). Threshold concepts in computer science: An ongoing empirical investigation. In Threshold concepts and transformational learning (pp. 241-257). Brill.
Yan, D., & Davis, G. E. (2019). A first course in data science. Journal of Statistics Education, 27(2), 99-109, https://doi.org/10.1080/10691898.2019.1623136