AI is getting very good at translating text and speech from one language to another, transposing not just words, but also meaning, and in real time. I previously wrote about the implications for opening up the world’s culture from behind language barriers (for example, only three percent of the world’s literature is translated into English).
But in thinking about ways to explain the conceptual shift that AI is bringing to how we interact with the world, it occurs to me that one way to describe what’s happening is to see AI as a universal translator, not just across text but also across images, video, color, sound, smell and movement: basically any observable phenomenon, whether an idea or an action in the real world.
Until recently our machines were essentially confined to comparing like data. Search engines could find text within text. But we could find images or videos only if they were tagged with keywords. We could analyze databases as long as we grouped the data in categories, then compared them. Our ability to find and listen to music has depended on metadata entered for each track. We’ve been searching (imperfect) descriptions rather than the actual music itself.
Since the rise of social media, users’ ability to tag and describe content, or to take some action such as clicking, “liking” or sharing, has generated enormous amounts of new data. That data has powered the curation algorithms that help us find what we’re looking for.
But AI blows up the clunky and time-consuming requirement to describe everything before it can be found. To an AI model, a picture is data, sound and music are data, as is traditional spoken or written language. That data is translatable, interchangeable, and, most importantly, linkable and actionable. That means that video, music, sound, movement and image can interact in a common language.
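For readers who like to see an idea in code, here is a minimal sketch of what a "common language" across media could look like. It assumes that some multimodal model has already turned each item, whatever its medium, into a list of numbers (an embedding) in one shared space. The file names, the three-number vectors and the `search` helper are all made up for illustration; a real system would use a trained model and far longer vectors.

```python
import math

# Hypothetical catalog: items from different media, each already converted
# into a vector in the same shared space. The numbers are hand-written
# stand-ins, not the output of any real model.
CATALOG = {
    ("image", "beach_sunset.jpg"): [0.9, 0.1, 0.0],
    ("audio", "waves_crashing.wav"): [0.8, 0.2, 0.1],
    ("video", "city_traffic.mp4"): [0.1, 0.9, 0.2],
    ("text", "string quartet review"): [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity: closer to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, top_k=2):
    """Rank every item, regardless of medium, by similarity to the query."""
    ranked = sorted(
        CATALOG.items(),
        key=lambda item: cosine(query_vector, item[1]),
        reverse=True,
    )
    return [key for key, _ in ranked[:top_k]]


# Assumption: a text query such as "ocean at dusk" would be embedded by the
# same model. Here its vector is simply hard-coded.
query = [0.85, 0.15, 0.05]
results = search(query)
for medium, name in results:
    print(medium, name)
```

The point of the sketch is that nothing in `search` knows or cares whether an item is a picture, a sound or a sentence. No one had to tag the photo or the recording with the word "ocean" for them to be found.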
Within any medium there are rules of behavior, principles that define genre, and logic that provides scaffolding for understanding. And, in part because of this vast boost in context across media, AIs can detect and develop meaning in ways that seem familiar to us humans, who skate across image, sound and text with ease. But the AI, with its trillions of data points and powerful processors, is vastly more capable than any human at making associations, analyzing data (aka the world) and seeing relationships.
Examples? In the social media age our preferred language of communication has gone from text to image to video. Sixty percent of time spent on Facebook is now video, and YouTube and TikTok are the platforms of choice. There are 800 million videos on YouTube. Until now, most of that video has been invisible from the outside. That is, to find someone saying or doing something in a video you had to watch the video itself; it wasn’t searchable in the way text documents are, and the crude means of discovery depended on tagging in the description. Using AI, any moment in any video is now searchable and accessible in the way text has long been.
Not only that, the moment in the video you’re looking for can be accessed directly, and, if you want, associated with text or images or sound elsewhere that relate to your search. Last week at its Cloud Developer Conference, Google unveiled its AI-powered Google Vids. With it, you can create full-production videos on the fly. If, for example, you want to create a presentation, the AI will scour data wherever you tell it to look (about your company, about an idea, about a brand, in spreadsheets, on webpages, in video and image libraries) and create real-time video and messaging.
Or how about as an educational tool: Classical music is an opaque mystery for many newcomers and casual listeners. An AI that has “studied” the entire repertoire and every recorded performance could answer questions about pieces or performances that relate to one another, compare performances and genres, and start to interact with your preferences. The algorithms that choose your music today are primitive and often unsatisfying; they depend on the tagging of content and generic observation of users’ behavior rather than the content itself. With the actual music as data, AI makes music more accessible.
To understand how this might work, take a look at how Google searches have recently evolved. Type in a search now and instead of returning pages of links to thousands of websites where you might go looking for your answer, the Google AI returns an answer to your question directly, along with sources for that answer. Apply this to music and your ability to find and compare pieces, performers and performances is supercharged. Apply AI to finding and keeping track of live performances and calendar management determined by your interests, and your ability to find events you want to go to works at a whole different level. Calendar listings for arts events have been a disaster in the digital era. This could change the way arts organizations reach audiences.
Finally, as real-world sensors (our phones, cameras, appliances, cars, etc.) get more and more linked and networked, the universal translation of real-time data will become important. A company called Limitless makes a little device that clips to your clothing and records everything around you as you go through your day. Need a summary of the meeting you just had? Need a reminder for something you just promised a friend you’d do? Want an analysis of some interaction or more context for it? The device produces these, and it also itemizes and generates reminders for your calendar.
We all carry around data-collection devices in the form of our smartphones and smartwatches and health trackers. Linked up, they will increasingly provide real-time data about what we need at the moment we need it: when the next bus will arrive, reviews of the product we see in a store, the name of the person we’re about to encounter at a party, or warning of the car that’s about to run us down. Creepy? Perhaps. But AI as an extension of our ability to process and be aware of what’s around us, both online and in the real world, will change how we both perceive and understand the world.