Why Machines Don’t Speak Spanish Well (and Why They Should)

June 7, 2021


This article was translated from our Spanish edition using AI technologies. Errors may exist due to this process.


This story originally appeared on The Conversation

By Elena González-Blanco, IE University

People talk about artificial intelligence (AI) more naturally every day. We are getting used to this label, whose meaning is for many still surrounded by an enigmatic halo, entering our daily routines ever more often.

Almost without realizing it, we smile to unlock our mobile phone, unaware that after that second in front of the camera, thousands of pixels converted into data feed deep learning algorithms at high speed. Today these algorithms can automate facial recognition with accuracy above 98%.

The takeoff has been stellar. The victory of DeepMind’s AlphaGo over one of the world’s top Go players in 2016 can be seen as a turning point. In just five years, the fortunate combination of the exponential volume of data generated, the creation of sufficiently powerful processing systems (graphics processing units, or GPUs) and the maturity and open release of neural network libraries (such as TensorFlow) has made it possible to program all the mathematical theory whose foundations were laid in the 1950s, with the first theories of Marvin Minsky and John McCarthy on machine learning.

Talking robots? It’s not that easy

Beneath that magic, which has computers trying to behave like the human brain, lies a combination of different technologies and types of data that does not work equally well, or in the same way, for every problem.

The paradox is served: we fear a world in which robots threaten to take over our jobs, yet today assistants such as Siri, Alexa and Google Home still cannot hold a conversation lasting more than a few minutes. They can do little beyond answering requests for data, following simple commands or setting up specific routines.

Making machines talk, and write, is one of the most complex tasks computing has ever faced. As early as 1950, Alan Turing posed the challenge of the imitation game, in which human and machine could be mistaken for each other through language, a reality that remains very distant today.

Human language is highly complex and varied. It is a living system, and the algorithms that weave together the digital neurons of artificial intelligence learn it from the data they are fed. In this way, these computational cells acquire vocabulary and improve their linguistic structures through constant exposure to conversational data.

Better in English than Spanish

Reality shows us that, today, this technology performs much better in English than in other languages. This is because both the major scientific developments and the large companies that have exploited them commercially originated in English-speaking countries, and their systems were trained on English data.

The linguistic reality is very different from the technological one: Spanish is the world’s second language, with more than 585 million speakers and annual growth of 7.5%. Even so, there is no artificial intelligence today capable of processing its many variants (which arise from different geographical, social or contextual circumstances) with high quality.

The reason for this lag behind English is the fragmentation of Spanish-language technology companies. In general, they are small and focused on very specific functions, with a history strongly tied to translation and to the Peninsular variety of the language.

Furthermore, despite the large amount of data that exists in our language, much of it cannot be exploited because it is private. Even the data held by public and cultural institutions sits in silos that are not prepared for open consumption.

For these reasons, companies and large clients often choose solutions that were not built in our language or trained on our data, but adapted through subsequent translation. This makes their success rate much lower.

Here are some examples: to train a robot for the Spanish legal field, it is necessary to have abundant legal texts in our language, but also knowledge of Roman law and of how jurisprudence works in Spain.

To distinguish the different varieties of Spanish in Latin America, it is essential to know not only the lexical variants but also the phonetics, and even the situational (pragmatic) use of some expressions in certain contexts. All these nuances are lost in translation.

An opportunity for advancement

Despite everything, interest in the development of artificial intelligence applied to language is growing. Scientific papers on natural language processing and AI applied to language increased by 34.5% between 2019 and 2020, which shows the growing maturity of the technology.

Moreover, its development has become key to economic growth. Currently, China is strongly leading the technological revolution, followed by the United States. Meanwhile, Europe struggles not to be left behind, looking for niches in which to shine that are linked to new opportunities and to the cultural, economic and historical reality of the old continent. Language is, without a doubt, one of them, since the assets that serve as a starting point, the data, are here and have hardly been used yet.

Within this race to develop artificial intelligence, Spanish as a native language of AI, with its market potential and the richness and variety of its data, is a gold mine that has only just begun to be exploited.

There is no need to reinvent the wheel: it is enough to provide open, available data to train existing algorithms and to align the business community in the same direction.

The goal is to create an artificial intelligence whose power matches the number of Spanish speakers. This would pave the way not only for new companies and better algorithms, but also for the digitization and digital preservation of a cultural, linguistic and historical heritage that deserves a privileged place in the future of international digital transformation. The time is now, and the responsibility is ours.

This article is republished from The Conversation under a Creative Commons license.
