
Ethical data in beauty: Building next-gen beauty AI that works for everyone

by Mia Maras

Data used in AI to develop and test models has been a hot topic recently – especially when it comes to ethical data collection.

On the one hand, there are a lot of strict regulations to protect personal information. On the other, AI companies need as much data as possible to develop models that work for everyone, in all kinds of real-world situations.

In beauty AI, data is also a matter of trust. Foundation matching, skin tone analysis, and makeup visualization all depend on data that truly represents the full spectrum of beauty. At Arbelle, we’ve built our ethical data collection process on three principles: expertise, compliance, and inclusivity.

So, let’s see how this works in practice.

Where our data comes from

At Arbelle, we develop specialized models that require very specific data, which is usually hard to find. So, we approach our ethical data sourcing with precision and responsibility:

1. Trusted data providers

Where suitable, we work with specialized data providers (companies and research institutions) that can supply datasets meeting our quality and compliance requirements. We also require the data to be relevant, meaning specifically designed for AI model development (random data is not usable), inclusive, and suitable for bias detection and mitigation.

Furthermore, we require our dataset providers to guarantee that personal data was collected in compliance with applicable privacy laws, including the GDPR, and that data subjects were fully informed of the purposes of processing. Other GDPR requirements apply as well: for example, data subjects must receive all relevant information, be able to exercise their rights, and have their data collected and processed on a valid legal basis.

2. Internal data collection

In most cases, we need very specific data that is not readily available. That is when we create it ourselves. Our internal data collections are thorough, controlled, and designed to ensure maximum diversity and accuracy. This gives us complete oversight of quality and guarantees our models reflect the real-world needs of our users.

The entire data collection process is supervised by our data and legal teams, who make sure everything is done in compliance with the GDPR and other applicable laws. Most importantly, we always collect data with consent, and participants keep the right to opt out of further use after giving it.

3. Data security and compliance

We rely on well-governed datasets and build privacy, security, and transparency into every step. We aim to go beyond the GDPR by enabling stand-off periods and opt-out rights, so that data subjects can exercise their rights if needed. At collection, we exclude sensitive or intrusive sources and insist on transparency and consent. Access to datasets is strictly role-based, and storage is confined to defined locations with control protocols.

This approach reflects our commitment to AI and data ethics, ensuring our technologies respect both user rights and real-world diversity.

Often, the most important layer is the annotation work. Even when we source images or videos from trusted providers, our team of beauty experts steps in to annotate them – whether that’s identifying skin tone, face shape, or the precise placement of facial features. This extra step helps us boost both the reliability and inclusivity of our data. Like the rest of our process, annotation is governed by appropriate contractual safeguards to ensure legality and compliance.

A look at our data collection process

All of Arbelle’s beauty AI solutions are powered by exceptional data. But one of our biggest and most ambitious projects was building the dataset for our Shade Finder.

Shade Finder is a foundation matching tool based on the industry’s most inclusive foundation scale. Used by cosmetic brands and retailers, Shade Finder’s mission is simple but powerful: to give every makeup shopper the most accurate foundation shade recommendation possible.

But to make this happen, we needed something extraordinary – a dataset as diverse as the people who would use it. So, we rolled up our sleeves and built it ourselves.

✔ Building the foundation

Creating the Shade Finder dataset took months of planning and problem-solving. At the heart of this project were people – our dedicated team members and the incredible participants who helped bring it to life.

We carefully selected subjects to represent the broadest possible range of skin tones. To create a balanced dataset, we dedicated special sessions to tones that are often underrepresented in beauty. This helps Shade Finder work equally well for darker and lighter complexions.
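As a rough illustration of what checking for balance can look like, here is a minimal sketch that flags groups whose share of a dataset falls below a chosen threshold. It is not a description of Arbelle's actual tooling: the function name, the tone-group labels, and the 25% threshold are all hypothetical, chosen only for the example.

```python
from collections import Counter


def underrepresented_groups(labels, min_share):
    """Return the groups whose share of the dataset is below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(g for g, n in counts.items() if n / total < min_share)


# Hypothetical tone-group labels, one per participant (illustrative only).
labels = ["light"] * 50 + ["medium"] * 35 + ["deep"] * 15

# Groups below a 25% share would be candidates for extra collection sessions.
print(underrepresented_groups(labels, min_share=0.25))
```

In this made-up sample, only the "deep" group falls below the threshold, which is the kind of signal that could prompt a dedicated session.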

✔ Prioritizing privacy

Inclusivity is important in our work, but so are respect and privacy. Before taking part, every participant knew exactly how their data would be used. All images were stored and processed according to strict privacy protocols. We collected only what was truly necessary, and nothing more. We also followed the principle of purpose limitation and processed the data within a pre-defined retention period, during which participants kept the right to opt out. This approach ensures that our datasets are not only high-quality but also ethical and legal.
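To make the idea of a retention period combined with an opt-out concrete, here is a minimal sketch of filtering records before they are used. It is an assumption-laden example, not Arbelle's implementation: the `ImageRecord` fields, the `usable_records` helper, and the 365-day retention period are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ImageRecord:
    # Hypothetical record shape, for illustration only.
    record_id: str
    collected_on: date
    opted_out: bool = False


def usable_records(records, today, retention_days):
    """Keep records that are within the retention period and not opted out."""
    cutoff = today - timedelta(days=retention_days)
    return [r for r in records if not r.opted_out and r.collected_on >= cutoff]


records = [
    ImageRecord("a", date(2024, 6, 1)),
    ImageRecord("b", date(2024, 6, 1), opted_out=True),  # participant opted out
    ImageRecord("c", date(2023, 1, 1)),                  # past retention period
]

kept = usable_records(records, today=date(2025, 1, 1), retention_days=365)
print([r.record_id for r in kept])
```

Applying the filter at the point of use, rather than only at collection time, means an opt-out or an expired retention period takes effect the next time the data is read.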

✔ Capturing real-world scenarios

For the data to be truly useful, it had to reflect the real world. That meant capturing more than just faces – we needed accurate information on which foundation shades are suitable for which subjects, together with images with and without makeup in different light conditions.

Those variations matter. Different light conditions are important to make the model robust, while images with makeup are used for rendering purposes, helping users see how a product might look before purchase. Training the model under multiple scenarios made it better at working in everyday situations.

✔ Expert verification

From day one, professional makeup artists were a key part of the process. They helped assess undertones and pinpoint the perfect foundation match for each subject. We prioritized products from brands known for their inclusive shade ranges, making sure we had the best possible match for everyone.

✔ Great data for great performance

Collecting our own data may be challenging, but it gives us high-quality data, built to the standards our solutions demand, that no off-the-shelf dataset can match. This makes model development much more efficient and lays the foundation for top-tier performance.

Beauty AI that works for everyone

The right data makes all the difference, especially in the world of beauty. By combining ethical sourcing, expert verification, and an inclusivity-first approach, we at Arbelle create AI models that work for everyone.

For example, one of the coolest things our data collection showed us is that most people – regardless of their skin tone – aren’t totally sure about their undertone or exact shade. So, not only did our data collection help us build more inclusive tech, but everyone who participated left with their perfect foundation match confirmed by both our tech and a professional makeup artist. It was fun, useful, and a win-win all around.

So, while building these datasets takes effort, the result is beauty AI that is accurate, inclusive, and built with real-world diversity in mind. And that makes it all worth it.

For more insights on how we build amazing beauty AI, read about our approach here. And if you’re ready to grow your brand and get better results, reach out for our expert advice.

Contact us

Reach out to us anytime to get more info or to get started with beauty AI.