Other Resources
Data Ethics Resources
Data Ethics Principles
In his book “The Data Revolution,” Rob Kitchin defined data ethics as “concerned with the thought and practice related to value concepts such as justice, equality, fairness, honesty, respect, rights, entitlements, and care” with respect to how data is created, collected, shared, protected, and used. As we discuss specific examples of data, think about the people or individuals (the Who) and whether there is respect, justice, and benefits for them in how their data is used. In many countries, governments regulate the ethical use of data for the purposes of research. For a brief history of ethical regulation in human research, see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3593469/.
Ethical Regulations
In the United States, the Belmont Report is the main federal document that provides the “Ethical Principles and Guidelines for the Protection of Human Subjects of Research”. The three fundamental ethical principles for using any human subjects for research are:
Respect for persons: This principle is about protecting the autonomy of all people and treating them with courtesy and respect and allowing for informed consent. Researchers must be truthful and conduct no deception;
Beneficence: This principle is the philosophy of “do no harm” while maximizing benefits for the research project and minimizing risks to the research subjects; and
Justice: This principle is about ensuring that reasonable, non-exploitative, and well-considered procedures are administered fairly and equally — that is, the fair distribution of costs and benefits among potential research participants.
Data Privacy
You have many identifiers: a Social Security number, a student ID number, possibly a passport number, a health insurance number, and probably a Google account name. Privacy experts worry that cyber thieves may link your identity across these different areas of your life, allowing, for example, your health, education, and financial records to be merged. Online companies such as Facebook and Google are able to connect your online behavior to some of these identifiers, which carries both advantages and dangers. Unlike research data, data collected for commercial, non-research purposes is subject to far fewer regulations. Did you realize that you are one of the cases in these data sets? Do you know what these companies are doing with your identifying data?
How Data Harms
Recommended Readings:
“Data Violence and How Bad Engineering Choices Can Damage Society” by Dr. Anna Lauren Hoffmann, a scholar of data, technology, culture, and ethics at the Information School at the University of Washington
Hello World by Dr. Hannah Fry (Available through Inter-Library Loan)
Weapons of Math Destruction by Dr. Cathy O’Neil (Paper copy available at the Mac library!)
Data Justice
Data Feminism
by Catherine D’Ignazio and Lauren Klein (Reading Group Videos)
Data Justice Principles:
Examine Power
Challenge Power
Elevate Emotion and Embodiment
Rethink Binaries and Hierarchies
Embrace Pluralism
Consider Context
Make Labor Visible
Data Equity and Justice
Data Equity Framework from We All Count
Data for Black Lives
“A movement of activists, organizers, and mathematicians committed to the mission of using data science to create concrete and measurable change in the lives of Black people. Since the advent of computing, big data and algorithms have penetrated virtually every aspect of our social and economic lives. These new data systems have tremendous potential to empower communities of color. Tools like statistical modeling, data visualization, and crowd-sourcing, in the right hands, are powerful instruments for fighting bias, building progressive movements, and promoting civic engagement.”
“But history tells a different story, one in which data is too often wielded as an instrument of oppression, reinforcing inequality and perpetuating injustice. Redlining was a data-driven enterprise that resulted in the systematic exclusion of Black communities from key financial services. More recent trends like predictive policing, risk-based sentencing, and predatory lending are troubling variations on the same theme. Today, discrimination is a high-tech enterprise.” (Links provided by Prof. Brianna Heggeseth)
W.E.B. Du Bois’ (+ others) Infographics
Serve Society: Model Responsibly
During the COVID-19 pandemic, statistical and mathematical models made it into the public spotlight. Since the models did not and could not predict how the pandemic would unfold, many people developed a distrust of models with any uncertainty. But you’ve learned that we can use that uncertainty to our benefit! We need to communicate that models can help us explore questions, as suggested in the manifesto published by a group of statisticians and mathematicians (https://www.nature.com/articles/d41586-020-01812-9). To ensure that models serve society:
Mind the assumptions: Assess the sensitivity of the model to your assumptions about the sources of uncertainty
Mind the hubris: Complexity can be the enemy of relevance; find a balance between complexity and error
Mind the framing: No model is neutral; it is matched to a purpose and a context
Mind the consequences: Quantification can give a false sense of certainty, and indiscriminate use of statistical tests can be dangerous
Mind the unknowns: Communicate what is unknown in addition to what is known
Model responsibly.
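To make “mind the assumptions” concrete, here is a minimal sketch of a sensitivity analysis: rather than reporting a single projection from one assumed growth rate, a toy exponential model is run over a plausible range of rates so the uncertainty is visible. All parameter values and the model itself are illustrative assumptions, not taken from the manifesto.

```python
# A minimal sketch of assessing sensitivity to assumptions: vary the assumed
# daily growth rate in a toy exponential model and report a range of outcomes
# rather than a single number. All parameter values here are illustrative.

def projected_cases(initial: float, daily_growth_rate: float, days: int) -> float:
    """Simple exponential projection: cases after `days` days."""
    return initial * (1 + daily_growth_rate) ** days

initial_cases = 100
days = 30

# Instead of committing to one assumed growth rate, explore a plausible range
# and communicate the resulting spread of outcomes.
for rate in [0.05, 0.10, 0.15]:
    print(f"growth rate {rate:.0%}: ~{projected_cases(initial_cases, rate, days):,.0f} cases")
```

The wide spread across reasonable assumptions is itself the finding worth communicating: a single point estimate would hide it.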
Social Identity Categories
Demography is the statistical study of societies and populations using characteristics that can be used to create groups of similar individuals. Every individual has a unique set of identities and life experiences, so people often do not fit cleanly into categories. Nevertheless, broad, imperfect social categorical variables such as race, gender, age, and religion can facilitate the study of general sociohistorical patterns, discrimination, and disparities within our society. For example, the documentation of race in health records provides an opportunity to detect racial disparities in health outcomes due to differential healthcare, such as during the COVID-19 pandemic (Source). Broad social categories protect the privacy of an individual by placing them in a larger group, at the expense of not reflecting individual variation. So care must be taken to create appropriate groups that show useful general patterns while not harming individuals by erasing identities. We recognize that historical (and modern) data using binary gender categories exclude or misclassify non-binary individuals, and we encourage future data collection to use more inclusive gender identity categories. To read more about data and gender, see Hoffmann, A.L., (2017). Data, technology, and gender: Thinking about (and from) trans lives. In A. Shew & J. Pitt (eds.), Spaces for the Future: A Companion to Philosophy of Technology (pp. 3-13). London, UK: Routledge.
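As a small illustration of the trade-off described above, here is a sketch in pandas of collapsing individual-level values into broad categories. The ages, bin edges, and labels are hypothetical choices made for this example; in practice the grouping should be chosen with the privacy and representation concerns above in mind.

```python
import pandas as pd

# Hypothetical individual ages; in a small dataset, exact values
# could help identify specific people.
ages = pd.Series([19, 23, 35, 47, 62, 71, 88])

# Broad, non-overlapping age groups protect privacy by placing each
# person in a larger group, at the cost of individual detail.
age_groups = pd.cut(
    ages,
    bins=[0, 24, 44, 64, 120],
    labels=["0-24", "25-44", "45-64", "65+"],
)

# Only the group-level counts are reported, not individual ages.
print(age_groups.value_counts().sort_index())
```

The same mechanics apply to any broad social category: the analyst chooses the bins, and that choice determines both what patterns become visible and whose experiences get blurred together.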