16:05 - 16:10 Introduction + Disclaimers

(Slide 2)

  • Talking about ethics and biases in science can get emotional and heated, and some topics may be triggering for some people. Some of these conversations can be difficult. We ask that you bring your thoughts forward with the following in mind:

“I share my thoughts into a space where I allow myself and others to be open and honestly express care, affection, responsibility, respect, commitment and trust.”

  • I/we don’t have all the answers.
  • Link to code of conduct: QR codes.

(Slide 3)

  • If you have been subject to or witnessed unacceptable behaviour, either during a meeting or on this repository, then you can get in touch with either Susana or Melanie.
  • If you would prefer to contact someone who is not directly involved in organising the group, then you can direct your concerns to

16:10 - 16:30 Data Hazards Intro

(Slide 4) Data hazards intro:

  • The Data Hazards framework was developed by Nina Di Cara and Natalie Zelenka in 2020, out of a shared concern that many projects with potentially significant societal impacts never have those impacts scrutinised by an Ethics Committee, simply because they do not technically have human research participants. The aim of the Data Hazards project is essentially to improve the quality of ethical consideration in data science, and we want to do that through a shared vocabulary of ethical issues that people can turn to for prompts about the different ethical considerations they might want to bring into their project. We call these Data Hazards, and I'll talk about them in a moment.

  • As part of this project we want to encourage people to bring together and respect diverse and interdisciplinary viewpoints, because we strongly believe that ethics is something we need to do collectively, not something that an individual working on a project can effectively do alone without bringing in other views.

  • We also want the project as a whole to be developed with an open-source, collaborative ethos. That means we want people to have a say in which ethical considerations should be included and in how these should be addressed as part of data science projects. We also want the hazards to grow over time, because we might encounter (and already have encountered) more and more issues that we had not thought about before, and people can add those as time goes on.

(Slide 5) Data hazards intro:

  • So why is it important that we do something like this in research?

  • The variety of biases in science is really quite astounding, and honestly choosing examples is always hard for me. It is no secret that science affects our lives, and that it therefore has the potential to cause harm if we don't carefully think about how to prevent it.

  • For example, there are numerous examples of how skin of colour is under-represented in dermatology textbooks, and how this continues to carry a historical racial bias. If we use data sets that have racial bias in them, the result can be increased or replayed discrimination of exactly the kind we would hope our tools are not reinforcing.

  • And also just to say that science can affect your life even if you have never picked up a computer or phone, because data science is being used in government to make decisions; it's being used, potentially, even just to decide how resources are allocated in your local area. All of these things are done with excellent intentions, in the hope that they can help. But we also need to make sure that we're thinking, early on in these projects, about how we can prevent harm.

(Slide 6) Data hazards intro:

  • We can also see how incredibly useful data science can be to us, and the things that it could achieve. But we have to make sure that we are doing it fairly.

(Slide 7) Data hazards intro:

  • So what are data hazard labels?

  • Data Hazards are presented like Control of Substances Hazardous to Health (COSHH) chemical hazards. Just as a chemical substance may be labelled "corrosive" or "flammable", a scientific hazard can be labelled "lacks informed consent" or "high environmental cost".

  • They serve as a framework for people at all stages of data science technology development to communicate about the potential "hazards" of their research, no matter how far away these outcomes might seem. The framework provides an emerging, novel and exciting structure for standardising thinking about ethics in the field of computational systems biology.

  • So each label has:

  • A label image,

  • A name: this one says "reinforces existing bias".

  • A description: this one says "Reinforces unfair treatment of individuals and groups. This might be due to input data, algorithm, or software design choices, or society at large."

  • An example: so, natural language processing tools can reinforce sexist tropes about women. This links to a paper about how language models will describe men as "computer programmer" much more often than women, whom they are more likely to describe as "homemaker" (see the first sketch after this list).

  • Safety precautions: just as with chemicals, we don't want people to avoid data science because it has these risks, but we do want them to think about what they might want to do differently (with bleach it might be wearing gloves). In this case we suggest, amongst other things, testing the effect of the algorithm on different marginalised groups: make sure it works for everyone (see the second sketch after this list).
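
To make the "reinforces existing bias" example concrete, here is a minimal sketch of the kind of probe behind it, assuming gensim and a small pretrained GloVe model are available; the library, model name and probe words are our illustrative choices, not part of the Data Hazards framework or the linked paper's exact setup:

```python
# Minimal sketch: probe a word embedding for gendered occupation
# associations, in the spirit of the "man : computer programmer ::
# woman : homemaker" finding. gensim and this GloVe model are
# illustrative assumptions; the first run downloads the vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

# "man is to programmer as woman is to ...?"
# i.e. nearest neighbours of (programmer - man + woman).
for word, score in vectors.most_similar(
        positive=["woman", "programmer"], negative=["man"], topn=5):
    print(f"{word:12s} {score:.3f}")
```

Whatever comes back reflects the training corpus: biased text in, biased associations out.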
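
And as a sketch of the safety precaution itself, one simple way to "test the effect of the algorithm for different marginalised groups" is to report a classifier's performance per group instead of as a single aggregate number. The arrays and group labels below are made up purely for illustration:

```python
# Minimal sketch: per-group evaluation of a classifier, so that poor
# performance for one group is not hidden by a good overall average.
# y_true, y_pred and group are stand-ins for your own evaluation data.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
group = np.array(["a", "a", "b", "b", "a", "b", "b", "a"])

for g in np.unique(group):
    mask = group == g
    print(f"group {g}: "
          f"accuracy={accuracy_score(y_true[mask], y_pred[mask]):.2f}, "
          f"recall={recall_score(y_true[mask], y_pred[mask]):.2f}")
```

A large gap between groups is exactly the kind of early warning this hazard label is asking for.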

(Slide 8) Data hazards intro:

  • These are the 11 current data hazards. You have them by your tables too.
  • All of them have the features that I just described.
  • I don't have time to go through them in detail now, but you can find all of them on our website! And you can also suggest new hazards, as well as changes to wording, more examples, or safety precautions.
  • But these are designed to have a large coverage of data science issues, including:
  • issues of privacy ("by choosing to store this data, are you risking a data breach?"),
  • mental health risks for the people labelling the data (for example, if you were building a classifier to predict domestic violence),
  • a lack of informed consent if you are taking data from Twitter,
  • and dangers of the algorithm being used in cases where it doesn't work, or being used malevolently by others (e.g. using deep fakes to incriminate people).

An important thing to note about these hazards is that they're worst-case scenarios. It's very possible for most, or even all, of these hazard labels to apply to a lot of data science projects. A hazard doesn't need to be likely to happen for it to apply.

Another important thing to note is that these don't just apply to the project that you are developing right now, but also to all of the ways that other people could use that project: for example, how people may use the data you are making available.

(Slide 9) Data hazards intro:

  • Case study (5 min)
  • Questions (5 min)
  • Breakout into small groups.

(Slide 11) Data hazards intro:

  • What data hazards apply? (15 min)
  • What did the case study presenter find useful? (5 min)

(Slide 12) Broader discussion

  • 15 min.