The Communiqué News

In an effort to keep ahead of industry rivals, Microsoft-backed OpenAI has announced its latest breakthrough, Sora, a cutting-edge text-to-video model.


Pritish Bagdi

The move demonstrates OpenAI's determination to preserve its competitive edge in the fast-growing field of artificial intelligence (AI) at a time when text-to-video tools are becoming increasingly popular.


What is Sora?

Sora, which means sky in Japanese, is a text-to-video diffusion model capable of producing minute-long videos that are difficult to distinguish from real footage.

OpenAI stated in a post on the X platform (formerly Twitter) that "Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions."

According to the company, the new model can also create lifelike videos from still photos or user-supplied footage.

"We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction," the post read.

How can you try it?

Most of us will have to wait to use the new AI model. Although the company unveiled the text-to-video model on February 15, it is currently in the red-teaming stage.

Red teaming is the process in which a group of experts, called the "red team," simulates real-world use to find flaws and vulnerabilities in the system.

"We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals," the business stated.

Nonetheless, the company posted a number of demonstrations in its blog post, and OpenAI's CEO shared videos generated from user-requested prompts on X.

How does it work?

Imagine starting with a noisy, static-filled image on a TV and gradually clearing away the fuzz to reveal a clean, moving video. That is essentially what Sora does. The model uses a "transformer architecture" to progressively remove noise and produce videos.
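The gradual noise removal described above can be illustrated with a toy Python sketch. It is not Sora's method: a real diffusion model uses a trained network to estimate the noise at each step, whereas the hypothetical `toy_denoise` function below simply assumes the clean result is already known and moves the noisy sample toward it in equal stages.

```python
import random


def toy_denoise(target, steps=10, seed=0):
    """Start from random noise and step toward `target`.

    Assumption for illustration only: the clean target is known in advance.
    A real diffusion model has to predict the noise with a trained network.
    """
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    history = [sample]
    for step in range(steps):
        remaining = steps - step
        # Remove an equal share of whatever noise is still left.
        sample = [x + (t - x) / remaining for x, t in zip(sample, target)]
        history.append(sample)
    return history


def max_error(sample, target):
    """Largest absolute difference between a sample and the target."""
    return max(abs(x - t) for x, t in zip(sample, target))


target = [0.2, 0.8, 0.5]  # stand-in for the pixel values of a clean frame
history = toy_denoise(target, steps=10)
errors = [max_error(sample, target) for sample in history]
```

Each entry in `errors` is no larger than the one before it, which mirrors the picture of static fading step by step until the clean signal remains.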

Rather than working frame by frame, it can generate an entire video at once. Users direct the video's content by feeding the model text descriptions, and because the model sees many frames at a time, it can keep a subject consistent even if that subject briefly leaves the frame.

Consider how GPT models produce text word by word. Sora works in a similar way, but with images and videos, which it divides into smaller units known as patches.
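To make the idea of patches concrete, here is a minimal Python sketch that cuts a tiny video into fixed-size blocks spanning time, height, and width. The article does not say how Sora forms its patches, so the function name `split_into_patches`, the block sizes, and the use of raw pixel values are all assumptions made for illustration.

```python
def split_into_patches(video, pt, ph, pw):
    """Split video[t][y][x] into blocks of pt frames by ph rows by pw columns.

    Each patch is returned as a flat list of values, loosely analogous to
    one token in a text model. Illustrative only, not Sora's actual scheme.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                patches.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches


# Toy grayscale "video": 4 frames of 4x4 pixels, each pixel numbered 0..63.
video = [[[t * 16 + y * 4 + x for x in range(4)] for y in range(4)]
         for t in range(4)]
patches = split_into_patches(video, 2, 2, 2)
```

With these sizes the 64 pixel values end up in 8 patches of 8 values each, and every pixel belongs to exactly one patch.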

"Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user’s text instructions in the generated video more faithfully," the company said in the blog post.

However, the company has not provided any details on what kind of data the model is trained on.
















Gemini is a multimodal model that can effortlessly comprehend and combine many sorts of information, including text, code, voice, image, and video, according to Demis Hassabis, CEO and Co-Founder of Google DeepMind.


Pritish Bagdi







Gemini is unique in that it is natively multimodal, meaning that separate components for different modalities don't need to be stitched together. This approach, refined through extensive collaboration across Google teams, positions Gemini as a versatile and efficient model that can run on everything from mobile devices to data centers. Gemini's powerful multimodal reasoning, which allows it to extract insights from large datasets, is one of its most notable qualities. The model is also capable of understanding and producing well-written code in widely used programming languages.



But even as Google steps into this new AI era, accountability and security remain top priorities. Gemini is subjected to thorough safety reviews, which include toxicity and bias analyses. Google is actively working with outside specialists to address potential blind spots and ensure the ethical use of the model.

The Bard chatbot is among the Google products to which Gemini 1.0 is now being rolled out. There are plans to integrate Gemini 1.0 with Search, Ads, Chrome, and Duet AI. Nevertheless, the Bard update won't be made available in Europe unless regulators give their approval.

Gemini Pro is available to developers and enterprise users through the Gemini API in Google AI Studio or Google Cloud Vertex AI. With Android 14, a new system feature called AICore will enable Android developers to build with Gemini Nano.








Shopify has unveiled a slew of upgrades in its Summer '23 Edition, with AI at the forefront. According to the corporation, the primary goal of these changes is to provide merchants with increased efficiency, creativity, and skills, ushering in a new age in commerce.


Pritish Bagdi

AI

Sidekick, an AI-enabled commerce assistant built specifically for Shopify business owners, is one of the highlights of the Summer '23 Edition. Entrepreneurs can converse with Sidekick to spark the creative process, improve store quality, streamline workflows, and make more informed business decisions.

Shopify Magic, a collection of AI-enabled capabilities integrated across the platform, improves the merchant experience even further. Notable features include AI-generated FAQ responses tailored to individual stores, instant generation of blog posts for holidays and campaigns, and optimised, commerce-oriented emails.

Shopify has also launched Marketplace Connect, which allows merchants to sell directly on major marketplaces like Amazon, eBay, and Walmart from within the Shopify ecosystem. This streamlined programme makes it much easier to handle numerous sales channels by optimising order fulfillment, inventory management, and product listings.









