Toward A Person-Centered Approach To The Study Of Data Analysis

Matthew E. Vanaman
Postdoctoral Researcher
Advisor: Roger D. Peng

Data Analysis

Done badly

Done well

Data Analysis

In classroom

In real world

Objective: Understand (and hopefully improve) Real-World Analysis

Two Approaches: Top-Down vs Bottom-Up

Why Care?

Data Analysis Is Struggling

  • Crises of replication1–7 and reproducibility8,9
  • Questionable research/measurement practices10–12
  • Statistical misunderstandings13
  • Data overwhelm (new medical diagnostics, internet of things, etc.)

Why Bottom-Up Approach?

Good Analyses Are Done By Good Analysts

If data analysis is to be well done, much of it must be a matter of judgment19.

Need Measure Of Individual Differences In (Good) Analysis Skill

  • Important for hiring, training, and evaluating analysts

  • Measure must be valid for its scores to be interpretable

Ensure Conceptualization Matches Reality

  • Theories of analysis ought to match practice of analysis

  • Bottom-up facilitates comparison and iterative refinement

It’s Been Done Before!

Toward a Person-Centered Approach to the Study of Data Analysis

General Idea

  • Study experts to understand why they are better than novices

  • Inspired by classic approaches in psychology
      - Industrial/organizational psychology
      - Judgment and decision-making

  • Proven history of usefulness across numerous domains20

Two Traditions

  • Heuristics and Biases21

  • Naturalistic Decision-Making22

In Favor of the Naturalistic Decision-Making Approach

  • We already assume experts are better

  • NDM emphasizes judgment in real-world conditions

Our Pilot Study

  • Research question: what do experts think makes for good analysis, based on their experience?

Some Provisional Themes

Thank you!

Questions?

References

1. Coiera E, Ammenwerth E, Georgiou A, Magrabi F. Does health informatics have a replication crisis? Journal of the American Medical Informatics Association. 2018;25(8):963-968. doi:10.1093/jamia/ocy028
2. Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294(2):218-228. doi:10.1001/jama.294.2.218
3. Valentine JC, Biglan A, Boruch RF, et al. Replication in prevention science. Prevention Science. 2011;12(2):103-117. doi:10.1007/s11121-011-0217-6
4. Coiera E, Tong HL. Replication studies in the clinical decision support literature–frequency, fidelity, and impact. Journal of the American Medical Informatics Association. 2021;28(9):1815-1825. doi:10.1093/jamia/ocab049
5. Wen H, Wang HY, He X, Wu CI. On the low reproducibility of cancer studies. National Science Review. 2018;5(5):619-624. doi:10.1093/nsr/nwy021
6. Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349(6251):aac4716. doi:10.1126/science.aac4716
7. Dreber A, Johannesson M. Statistical significance and the replication crisis in the social sciences. In: Oxford Research Encyclopedia of Economics and Finance. 2019. doi:10.1093/acrefore/9780190625979.013.461
8. Munafò MR, Nosek BA, Bishop DVM, et al. A manifesto for reproducible science. Nature Human Behaviour. 2017;1(1):1-9. doi:10.1038/s41562-016-0021
9. Peng R. The Reproducibility Crisis in Science: A Statistical Counterattack. Significance. 2015;12(3):30-32. doi:10.1111/j.1740-9713.2015.00827.x
10. Simmons JP, Nelson LD, Simonsohn U. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science. 2011;22(11):1359-1366. doi:10.1177/0956797611417632
11. Artino AR Jr, Driessen EW, Maggio LA. Ethical shades of gray: International frequency of scientific misconduct and questionable research practices in health professions education. Academic Medicine. 2019;94(1):76. doi:10.1097/ACM.0000000000002412
12. Flake JK, Fried EI. Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them. Advances in Methods and Practices in Psychological Science. 2020;3(4):456-465. doi:10.1177/2515245920952393
13. Gigerenzer G. Statistical Rituals: The Replication Delusion and How We Got There. Advances in Methods and Practices in Psychological Science. 2018;1(2):198-218. doi:10.1177/2515245918771329
14. Nosek BA, Spies JR, Motyl M. Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability. Perspectives on Psychological Science. 2012;7(6):615-631. doi:10.1177/1745691612459058
15. Edwards MA, Roy S. Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science. 2017;34(1):51-61. doi:10.1089/ees.2016.0223
16. O’Mara RJ, Hsu SI, Wilson DR. Should MD-PhD programs encourage graduate training in disciplines beyond conventional biomedical or clinical sciences? Academic Medicine. 2015;90(2):161-164. doi:10.1097/ACM.0000000000000540
17. Rassie K. The apprenticeship model of clinical medical education: time for structural change. The New Zealand Medical Journal. 2017;130(1461):66-72.
18. Stalmeijer RE, Dolmans DHJM, Wolfhagen IHAP, Scherpbier AJJA. Cognitive apprenticeship in clinical practice: Can it stimulate learning in the opinion of students? Advances in Health Sciences Education. 2009;14(4):535-546. doi:10.1007/s10459-008-9136-0
19. Tukey JW. The future of data analysis. The Annals of Mathematical Statistics. 1962;33(1):1-67. doi:10.1214/aoms/1177704711
20. Lipshitz R, Klein G, Orasanu J, Salas E. Taking stock of naturalistic decision making. Journal of Behavioral Decision Making. 2001;14(5):331-352. doi:10.1002/bdm.381
21. Tversky A, Kahneman D. Judgment under Uncertainty: Heuristics and Biases. Science. 1974;185(4157):1124-1131. doi:10.1126/science.185.4157.1124
22. Klein G. Naturalistic decision making. Human Factors. 2008;50(3):456-460. doi:10.1518/001872008X288385

Method

Focus group interviews with experienced analysts

Sample Characteristics:

  • 4 women, 3 men
  • Mean age: 44.3 years (SD = 10.1)
  • 5 White, 2 Asian
  • 2 Hispanic

  • Mean data analysis experience: 18.4 years (SD = 9.4)
  • 4 Academia, 3 Private Sector

Title                             N (%)
Professor                         2 (28.6)
Postdoctoral Researcher           1 (14.3)
Research Scientist                1 (14.3)
Senior Data Scientist             1 (14.3)
Senior Machine Learning Engineer  1 (14.3)
User Experience Researcher        1 (14.3)
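
As a quick check on the table above, the percentages can be recomputed from the counts. This is only an illustrative sketch: the names `title_counts` and `percent_shares` are made up for this example and are not part of the study's materials. The counts are copied from the table.

```python
# Counts per job title, copied from the table above.
title_counts = {
    "Professor": 2,
    "Postdoctoral Researcher": 1,
    "Research Scientist": 1,
    "Senior Data Scientist": 1,
    "Senior Machine Learning Engineer": 1,
    "User Experience Researcher": 1,
}


def percent_shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {title: round(100 * n / total, 1) for title, n in counts.items()}


shares = percent_shares(title_counts)

# Print each row in the same "N (%)" layout used in the table.
for title, n in title_counts.items():
    print(f"{title}: {n} ({shares[title]})")
```

With seven participants, each person accounts for about 14.3% of the sample, so the rounded percentages need not sum to exactly 100.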

Example probing question:

Reflect back to when you were first starting out. Now that you have had all your years of experience analyzing data in the wild, what advice would you give your less-experienced self?

Analytic Approach

Grounded theory:

  • Open coding
  • Thematic (axial) coding ⇦
  • Selective coding