The Ethics in Data Modeling

Ethical data modeling is the practice of applying ethical principles to the design and use of data models. It’s about making sure your models are accurate, fair, and transparent. A good ethical model considers not just what you can do with data, but what you should do.

In the spring of 2020, the United Kingdom’s government faced a serious challenge. At the end of every school year, university-bound students take the A-level exams. Students use their scores from these exams to apply to colleges and universities. But the global pandemic made it unsafe for these students to sit in crowded exam rooms.

So the government needed to come up with an accurate way to predict the scores.

The challenge, of course, is that there’s no way to know the scores that the students would have received if they’d actually sat for the exams.

So instead, the government came up with a pretty simple formula to predict test scores. First, it looked at each student’s grades for the year. Then it considered the historical track record of the student’s school.
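To make the idea concrete, here’s a minimal sketch of a formula like this in Python. This is not the government’s actual algorithm. The 0-100 scale, the `Student` fields, the `predict_score` function, and the equal weighting are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    teacher_grade: float   # the student's own grade for the year (assumed 0-100 scale)
    school_history: float  # the school's average exam score in past years (assumed 0-100 scale)


def predict_score(student: Student, school_weight: float = 0.5) -> float:
    """Blend a student's own grade with their school's historical average.

    school_weight is a hypothetical knob: 0 ignores the school entirely,
    1 ignores the individual student entirely.
    """
    if not 0.0 <= school_weight <= 1.0:
        raise ValueError("school_weight must be between 0 and 1")
    return (1 - school_weight) * student.teacher_grade + school_weight * student.school_history


# Two invented students at the same historically weaker school.
overachiever = Student("A", teacher_grade=90.0, school_history=60.0)
typical = Student("B", teacher_grade=60.0, school_history=60.0)

print(predict_score(overachiever))  # 75.0
print(predict_score(typical))       # 60.0
```

In this sketch, the overachiever at a historically weaker school is pulled down from 90 to 75, while the typical student’s score doesn’t move. The more weight you give the school’s history, the more an individual’s prediction depends on how other students did in the past.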

This formula actually resulted in more students receiving high scores than in previous years, when students sat for the exams.

The algorithm also highlighted an uncomfortable reality. It predicted that if all of your friends at your school get into top universities, then it’s more likely that you’ll also get into a top university. The opposite was also true. If you go to a school where very few people get into top universities, then it’ll be much more difficult for you to place in a top university.

It’s an unfortunate truth that the exams themselves likely would have reflected this pattern. So the algorithm tried to do the same.

But even though this scoring may have been accurate, it still struck many people as unfair. It’s one thing to do poorly on an exam, but it’s an entirely different thing to have an algorithm decide that you were destined to do poorly.

So if you worked for the UK government, how would you design this algorithm? Is it enough to be accurate, or do you have an ethical obligation to try to improve the students’ prospects?

From the start it looked like the UK was taking a utilitarian approach. They structured the algorithm in a way that gave most students higher scores. They tried to deliver the greatest good to the largest number of people.

But there are two types of utilitarianism: act utilitarianism and rule utilitarianism. Act utilitarianism focuses on actions that create the best outcome, while rule utilitarianism focuses on designing a rule that creates the best outcome.

So if you look at it in terms of act utilitarianism, many students who were overachievers at their schools actually got lower scores because of the algorithm. For them, the outcome didn’t maximize their happiness.

But if you look at it in terms of rule utilitarianism, then most students received higher scores. So the overall outcome was better for the average student.

So when you’re working with data, keep in mind that an accurate prediction won’t always lead to an ethical outcome. If you look at it in terms of act utilitarianism, you might want to put a process in place that lets individual students challenge the accuracy of your algorithm’s predictions.

Unfortunately, an accurate algorithm will often shine a light on underlying challenges in the data. Then it will be up to your organization to decide what ethical responsibilities you have in predicting the results.

The Importance of Ethical Decision-Making

When you prioritize ethics in data science, you make decisions that are both sound and fair. Ethical models can help reduce inequality by treating people more fairly and promoting inclusivity.

When companies care about doing the right thing, they also earn more trust from their customers. People like to support businesses that show they value their information and rights. So being ethical in data science isn't just good for society. It's also good for business.

Integrating Ethics into Data Modeling

To create ethical data models, start by setting clear principles for how data is collected and used. Regularly check your models for bias and adjust them as needed.

Transparency is key, so make sure people can understand your data processes. And always respect privacy, which means obtaining clear consent and safeguarding sensitive information.

Think of ethical modeling as an ongoing process. As technology gets better, new challenges come up. These are some things you can do to integrate ethics into data modeling:

Establish Ethical Guidelines: Define clear principles for data practices.
Conduct Bias Audits: Regularly test models for implicit and explicit biases.
Enhance Data Transparency: Make data processes understandable to stakeholders.
Ensure Consent and Privacy: Respect users’ rights to control their data.
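
A bias audit can start small. The sketch below compares how often a model’s prediction falls below the teacher’s grade for each group of schools. The group labels, the record layout, and the `downgrade_rate_by_group` helper are hypothetical, and the four records are invented purely to show the mechanics.

```python
from collections import defaultdict


def downgrade_rate_by_group(records):
    """Return the share of students in each group whose predicted grade
    is lower than their teacher's grade.

    records: iterable of (group, teacher_grade, predicted_grade) tuples.
    """
    totals = defaultdict(int)
    downgraded = defaultdict(int)
    for group, teacher_grade, predicted_grade in records:
        totals[group] += 1
        if predicted_grade < teacher_grade:
            downgraded[group] += 1
    return {group: downgraded[group] / totals[group] for group in totals}


# Invented example data, not real results.
records = [
    ("school_low_history", 80, 70),
    ("school_low_history", 75, 75),
    ("school_high_history", 80, 85),
    ("school_high_history", 70, 70),
]

rates = downgrade_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())

print(rates)  # {'school_low_history': 0.5, 'school_high_history': 0.0}
print(gap)    # 0.5
```

A large gap between groups doesn’t prove on its own that the model is unfair, but it tells you where to look more closely.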

Ethics in Data Modeling

Data science is a powerful tool, but with great power comes great responsibility. When you work with data, you need to think about being fair, honest, and responsible. This means making sure that what you do has a positive impact on people.

Remember, it's not just about what you can do. It's about what you should do. The decisions you make now will shape the future of data science and its role in society.

Frequently Asked Questions

What are the ethical considerations when using algorithms in data science?

Ethical considerations when using algorithms include ensuring fairness, avoiding bias, and maintaining transparency. Algorithms should be designed to respect digital privacy and avoid unethical data collection practices.

Why is managing data ethically important?

Managing data ethically is important to protect individuals' privacy, maintain trust, and prevent harm. It ensures that data subjects have a right to control their personal data and that organizations handle data responsibly, following privacy policies and terms and conditions.

What role does data governance play in data ethics?

Data governance plays a critical role in data ethics by establishing policies and procedures for managing data. It ensures data security, data quality, and compliance with regulations like the General Data Protection Regulation, helping organizations use data ethically.

How can organizations prevent unethical data collection?

Organizations can prevent unethical data collection by implementing strict data management policies, obtaining informed consent from data subjects, and regularly auditing their data collection practices to ensure compliance with ethical standards and regulations.

What are the risks of not handling data ethically?

Not handling data ethically can lead to breaches of data privacy, loss of trust, legal penalties, and damage to an organization's reputation. It can also result in the misuse of sensitive data and harm to individuals whose data is collected.

This is my weekly newsletter that I call The Deep End because I want to go deeper than the results you’ll see from searches or LLMs. Each week I’ll go deep to explain a topic that’s relevant to people who work with technology. I’ll be posting about artificial intelligence, data science, and ethics.

This newsletter is 100% human written 💪 (* aside from a quick run through grammar and spell check).