By Pengyuan Zhou and Lik-Hang Lee for 360info
Many people believe that technology is neutral or unbiased, but the reality is more complex, especially in an immersive environment.
If the proponents of the metaverse have their way, we’ll one day be lining up for healthcare or a mortgage in a virtual world run by virtual decision makers. Design of the artificial intelligence systems driving this world, still the task of humans, has real potential for harm.
Beyond commercial incentives, implicit biases that exist offline around ethnicity, gender and age are often reflected in the large datasets collected from the internet.
Machine learning models trained on these bias-embedded datasets unsurprisingly adopt the same biases. For instance, in 2019, Facebook (now Meta) was sued by the US Department of Housing and Urban Development for “encouraging, enabling, and causing” discrimination based on race, gender, and religion through its advertising platform.
Facebook later said it would take “meaningful steps” to stop such behaviour, but it continued to deliver the same discriminatory ad service to over two billion users on the basis of their demographic information.
Technical flaws during data collection, sampling, and model design can further exacerbate unfairness by introducing outliers, sampling bias, and temporal bias (where a model works well at first, but fails in the future because future changes weren’t considered when building the model).
As AI nonetheless pervades more of our daily lives, governments and the tech giants have started talking about “Trustworthy AI”, a term formalised by the European Commission in 2019 with its Ethics Guidelines.
The guidelines speak to issues of fairness, but current systems already struggle to define what’s fair on today’s internet, let alone in the metaverse.
A recent study exploring Trustworthy AI and the metrics selected to deliver it found most were based on functionality-driven design as opposed to user-centred design.
Looking specifically at search engine ranking and recommendation systems, we already know search engine rankings sometimes systematically favour certain sites over others, distorting the objectivity of the results and losing the trust of users.
In recommendation systems, the number of recommendations is often fixed to promote products or ads with greater commercial benefits, instead of offering fair recommendations based on the ethical use of data.
To fix these issues and deliver “trustworthy AI”, search engines must guarantee that users receive neutral and impartial services. Deciding on the metrics for fairness is where it gets difficult.
A common strategy for metric selection is to focus on one factor and measure the deviation from equality in that factor.
For search engine rankings, for example, this means focusing on the (potential) attention items receive from users, measured through factors such as click-through rates, exposure, or inferences of the content’s relevance, and then working out the gap between what an average user sees and what a user sees when bias is at play.
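To make the idea of "deviation from equality" concrete, here is a minimal sketch of one way an exposure gap could be computed for a single ranked list. It is an illustration under stated assumptions, not a metric taken from the study mentioned above: the logarithmic attention model, the function names, the item labels and the choice of an equal share per group as the baseline are all hypothetical, and the equal-share baseline only makes sense when the groups contain the same number of items.

```python
import math
from collections import defaultdict


def position_exposure(rank):
    """Assumed attention model: exposure decays logarithmically with 1-based rank."""
    return 1.0 / math.log2(rank + 1)


def group_exposure_shares(ranking, group_of):
    """Return each group's share of the total exposure in a ranked list."""
    totals = defaultdict(float)
    for rank, item in enumerate(ranking, start=1):
        totals[group_of[item]] += position_exposure(rank)
    overall = sum(totals.values())
    return {group: value / overall for group, value in totals.items()}


def exposure_gap(ranking, group_of):
    """Largest deviation of any group's exposure share from an equal share.

    Assumes every group holds the same number of items, so that an equal
    share of exposure is a reasonable baseline.
    """
    shares = group_exposure_shares(ranking, group_of)
    equal_share = 1.0 / len(shares)
    return max(abs(share - equal_share) for share in shares.values())


# Hypothetical items from two equally sized groups, A and B.
group_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
segregated = ["a1", "a2", "b1", "b2"]   # group A holds the top positions
interleaved = ["a1", "b1", "b2", "a2"]  # groups alternate more evenly

print("segregated gap: ", round(exposure_gap(segregated, group_of), 3))
print("interleaved gap:", round(exposure_gap(interleaved, group_of), 3))
```

A gap of zero would mean both groups receive the same share of assumed attention; the ranking that places one group entirely on top produces the larger gap. Swapping in click-through rates or relevance estimates for the position-based exposure would give the other variants the text mentions.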
Reviewers of recommender systems have employed similar metrics for fairness, such as bias, average, and score disparities.
Trustworthy AI design and metric selection in such systems also often focus on functionality during specific life-cycle phases. Ideally, they should consider trustworthiness through the whole life-cycle of usage.
These considerations will be even more important in the metaverse. Immersive in nature, the metaverse is more tied to users’ feelings and experiences than current cyberspace. These experiences are harder to quantify and assess, and pose more challenges for those trying to determine what “fair AI” is.
The current mindset of trustworthy AI design and metric selection, restricted by the aforementioned design philosophies, takes into consideration only part of human cognition, specifically the conscious and concrete areas that can be more easily measured and quantified.
Pattern recognition, language, attention, perception, and action are widely explored by AI communities.
The exploration of the unconscious and abstract areas of cognition, such as mental health and emotions, is still new. Methodological limits are a key reason for this: for example, the lack of devices and theories to accurately capture bioelectrical signals and to infer someone’s emotions from them.
A new set of metrics will be required for the metaverse to ensure fairness.
Designers will need to:
Carefully select data. It’s dangerous to just throw data at an AI model: the data often inherits the bias from the real world where it was collected. System operators should carefully select data samples focused on ensuring data diversity.
Design a fair system. The system should guarantee neutral treatment for all users, uninfluenced by factors such as age, education level or environment. A fair system design can help ensure the diversity of data collection.
Design a fair AI algorithm. Aiming at improving the utility of the majority, AI algorithms normally prioritise the optimisation of common performance metrics such as accuracy.
For this reason, many AI algorithms set thresholds that exclude users who may undermine this goal, such as those with poor network connections. Balancing the trade-off between algorithm performance and fairness is important in fair AI algorithm design.
Ensure fair usage. After designing a fair system and algorithm and training with fairly collected data samples, the next step is to ensure fair usage for all users without bias based on ethnicity, gender, age etc.
This last piece of the cycle is the key to sustaining fairness by allowing continuous collection of diverse data and user feedback to optimise fairness.
(The authors are associated with the University of Science and Technology of China and the Korea Advanced Institute of Science and Technology respectively)