Artificial Intelligence Innovation in 2023: The 3 Biggest Innovations of AI
In many ways, 2023 was the year that people began to recognize what
artificial intelligence (AI) really is and what it can do. It was also the
year when chatbots went viral and governments started to take AI risk
seriously. These developments were not so much new innovations as existing
technologies and ideas coming to the fore after a long period of development.
However, there were also plenty of new innovations that took
place in 2023. Below are three of the most significant ones from the past year.
Multimodality
The term “multimodal” is best understood in contrast with “text-first” or
“text-only” AI models. Multimodality refers to an AI system’s ability to
process many types of data, including text, images, video, audio, and more.
This year saw the first public availability of high-performance multimodal
AI models. One of the first was OpenAI’s GPT-4, which allows users to upload
images as well as text inputs.
GPT-4 is capable of “seeing” the content of an image. This
opens up a wide range of possibilities, such as asking GPT-4 to determine what
to make next for dinner based on an image of the fridge’s contents.
In September, OpenAI rolled out voice capabilities for ChatGPT, so that
users can interact with it by voice as well as text.
The Gemini model, announced by Google DeepMind in December, is capable of
working with images as well as audio. In the launch video, which Google
shared on YouTube, the model identified a duck from a line drawing on a
Post-it note. Elsewhere in the demo, after being shown an image of pink and
blue yarn and asked what could be made with it, Gemini generated a picture of
a pink-and-blue octopus plush. (In the marketing video, Gemini appeared to
observe moving images and respond in real time to audio commands, but Google
said in a post on its website that the video was “edited for brevity” and
that the model was prompted with still images rather than video, and with
text prompts rather than audio, though the model has audio capabilities.)
Constitutional AI
How to align AI with human values is one of the biggest unanswered questions
in the field. If AI systems become smarter and more powerful than humans,
they could cause untold damage to our species, with some even warning of
total extinction, unless they are constrained by rules that put human
flourishing at their center. The process that OpenAI had been using to align
ChatGPT, and to prevent the racially biased and gender-biased behaviour of
previous models, was working well, but it required a great deal of human
effort. It relied on a technique called reinforcement learning from human
feedback, or RLHF. Human raters would evaluate the AI’s answers and, if a
response was useful, harmless and compliant with OpenAI’s content rules, give
it the computational equivalent of a dog treat. By rewarding the AI when it
is good and penalising it when it is bad, OpenAI has created an effective and
relatively benign chatbot.
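The reward-and-penalty loop described above can be pictured with a toy
sketch in Python. This is an illustration under stated assumptions, not
OpenAI's method: the candidate responses, the rater scores, and the simple
score-nudging rule are all invented here. Production RLHF systems typically
train a separate reward model on human ratings and then update a large
language model with a reinforcement learning algorithm.

```python
import math

# Hypothetical candidate responses and rater verdicts, invented for illustration.
RATINGS = {
    "helpful, polite answer": +1.0,  # rater approves: the "treat"
    "rude answer": -1.0,             # rater disapproves: the penalty
    "off-topic answer": -1.0,
}


def softmax(scores):
    """Turn preference scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def train(ratings, rounds=20, learning_rate=0.5):
    """Nudge each response's score up or down by its rater reward."""
    scores = {response: 0.0 for response in ratings}
    for _ in range(rounds):
        for response, reward in ratings.items():
            scores[response] += learning_rate * reward
    return softmax(scores)


probabilities = train(RATINGS)
best = max(probabilities, key=probabilities.get)
print(best, round(probabilities[best], 3))
```

After a few rounds of feedback, nearly all of the probability lands on the
response the rater rewarded, which is the basic intuition behind training a
chatbot with human ratings.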
Text-to-video
The rapid progress of text-to-video tools has been a major consequence of
the billions of dollars flowing into artificial intelligence this year. Last
year, text-to-image tools had barely emerged from their infancy; now a number
of companies offer the ability to transform sentences into moving images with
increasingly fine-grained accuracy.
Runway is one of these companies, an AI video startup in Brooklyn that wants
to make filmmaking accessible to everyone. Its latest model, Gen-2, allows
users not just to generate a video from text, but also to change the style of
an existing video based on a text prompt (for example, turning a shot of
cereal boxes on a tabletop into a nighttime cityscape), in a process it calls
video-to-video.
Runway CEO Cristobal Valenzuela told TIME in May that the company’s mission
is to develop tools for human creativity. He acknowledges that this will have
consequences for creative jobs, where some forms of technical knowledge are
quickly being made obsolete by AI tools, but believes the world on the other
side is worth the disruption. His vision is a world in which people’s
creativity is amplified and enhanced, and in which the work is less about
craft, budget, and technical specifications.