April 20, 2024
A.I

Ideogram is a new AI image generator that blows away the competition, beating MidJourney and Dall-E 3

Ideogram AI, a startup founded by former Google engineers along with members from prestigious institutions such as UC Berkeley, Carnegie Mellon University, and the University of Toronto, has announced the release of the first full version of its eponymous image generator.

“We are excited to release Ideogram 1.0, our most advanced text-to-image model to date,” Ideogram AI said in an official blog post. “Trained from the ground up like all Ideogram models, Ideogram 1.0 offers state-of-the-art text rendering, unprecedented photorealism and prompt adherence, and a new feature called Magic Prompt that helps you write detailed prompts for beautiful, creative images.”

The launch comes alongside news of an $80 million Series A funding round led by Andreessen Horowitz, with participation from Redpoint Ventures, Pear VC and SV Angel.

Decrypt was able to test the model, and Ideogram AI’s claims are not wildly exaggerated; a side-by-side comparison can be found below. The first full version of Ideogram is a clear improvement over its predecessors, v0.1 and v0.2: it stands out for its prompt adherence, image quality and text generation capabilities.

The model is not open source, so there is limited visibility into how it works and no research paper to evaluate. But the results obtained with the model speak for themselves, potentially making it the best model currently available, at least until Stable Diffusion 3 is publicly released.

The new model is arguably the most capable image generator in terms of text capabilities, generating longer text strings with fewer errors than Dall-E 3 or MidJourney. The current free tier also gives it an advantage over competitors like Dall-E 3 and MidJourney, the latter of which does not have a free tier. Microsoft Copilot also uses Dall-E 3, but it only outputs 1:1 square images, while Ideogram supports a broader set of aspect ratios.

Ideogram also offers two paid plans, at $7 and $15 per month, which give access to more than 400 generations per day along with other benefits such as an image editor, better quality downloads, img2img (which allows modifications or variations on an existing image) and private generations. All lower tiers publicly display the generated images.

Ideogram is able to understand long prompts, go toe-to-toe with Stable Diffusion 3, and outperform all other image generators in this area.

One of the standout features of Ideogram is “Magic Prompt,” which can be turned on and off. This feature analyzes the prompt and enhances it to create better quality images, essentially giving the model the ability to understand natural language like Dall-E 3. However, Ideogram is more versatile because this feature is optional. With Dall-E 3 in ChatGPT Plus, prompt rewriting is always enabled, which sometimes causes inaccuracies.

Finally, Ideogram is less aggressively censored than MidJourney and Dall-E 3, and is so far capable of generating images of famous people, company logos, and art styles. It does not fully allow NSFW content, but it is more lenient when it comes to censoring prompts.

Early testers also seem to prefer Ideogram over other models. “Using an evaluation protocol like DALL·E 3, we found that human evaluators prefer Ideogram 1.0 over DALL·E 3 and Midjourney V6 for prompt alignment, image coherence, overall preference, and text rendering quality,” the startup said.

Side by Side Comparison: Ideogram vs MidJourney vs Dall-E 3

Decrypt tested Ideogram’s capabilities and compared it to its main competitors, MidJourney and Dall-E 3. Stable Diffusion 3 and Google’s top-of-the-line ImageFX are not evaluated here because SD3 has not yet been released and ImageFX is not widely available.

Generating long strings of text

Prompt: A futuristic android in a cyberpunk city with a sign that says: “Don’t be late for the AI trend: Emerge via Decrypt.”

Generations with Ideogram (left), MidJourney (center) and Dall-E 3 (right).

Ideogram AI was able to render both the requested aesthetic and the text. However, it made a typo, rendering “you” instead of “the.”

MidJourney was unable to generate any coherent text and focused on generating a futuristic and detailed android, which is the main subject of the entire composition. The city is not cyberpunk at all.

Dall-E 3 sits in the middle. It was able to generate the futuristic robot, and the city is cyberpunk, but the sign did not include the word “Emerge.”

Interestingly, Ideogram understood that the robot was in the city and associated it with the sign, while Dall-E 3 assumed that the sign was part of the cityscape.

Long prompts and spatial capabilities

Prompt: A surreal and intriguing scene showing a cat perched on top of a television next to a sign that says “Emerge.” In the background, on one side is a futuristic android and, on the other, an astronaut. The walls of the room are adorned with a striking image of a molecule and a strand of DNA.

Generations with Ideogram (top), MidJourney (bottom left) and Dall-E 3 (bottom right).

Ideogram was by far the best overall generator. It understood every part of the prompt, generated the text without typos, and understood the placement of each element: the cat on top of a television, the sign next to it, and the android and the astronaut on either side. It even understood that there must be a molecule and a strand of DNA in the background.

MidJourney’s aesthetic was not surreal, but hyperrealistic. It generated the word “Emerge,” but put it on the TV and did not generate the sign. The cat is also next to the television and not on top of it. It did not generate the android and did not follow the instructions for the background, instead generating one that fit better with the aesthetics of the composition, giving more importance to the subject (the cat) than to the overall scene.

Dall-E 3 kept its signature cartoon style and failed to follow the instructions completely. It has more spatial understanding and prompt adherence than MidJourney, but much less than Ideogram. It loses, however, in terms of style. It generated the cat on top of the TV, but failed to generate the “Emerge” sign next to the cat. It did not generate the android and did not follow the instructions when generating the background.

Censorship

Prompt: A hot and sexy girl.

Generations with Ideogram (left), MidJourney (center) and Dall-E 3 (right).

The prompt does not include language that could be interpreted as hate speech or insults, much less explicitly sexual content. After all, a “hot and sexy girl” can be fully clothed and not aggressively sexualized.

Ideogram AI understood the prompt and generated an image that followed the instructions. However, Ideogram has an AI moderator that activates when more explicit words are used, immediately leading to a censored generation (e.g. slang words for genitals or tags like naked, nude, etc.).

Meanwhile, both MidJourney and Dall-E 3 refused to generate the image and blocked the words, even though they would not have resulted in an NSFW generation.

Ideogram seems to be more lenient about censorship, and it is possible to view the generated image (NSFW or questionable) before the application deletes it.

Famous people and images protected by copyright

Prompt: Joe Biden and Vladimir Putin happy in front of a wall with the text “Decrypt”, holding hands.

Generations with Ideogram (top), Dall-E 3 (bottom left) and MidJourney (bottom right).

Ideogram AI generated the image: the text is correct, the setting is realistic, and the characters are easily identifiable (even if they are not 100% accurate).

Dall-E 3 generated the image, but Biden is not easily identifiable and Trump can only be identified by his signature hairstyle. The text is not correct and the setting is not realistic but cartoonish.

MidJourney declined to generate the image.

Conclusion

Free and widely available from the start, Ideogram may be the best image generator currently on the market. It is excellent at understanding natural language and has outstanding spatial abilities and prompt adherence. It is also the best generator currently available for rendering text.

If aesthetics are the most important consideration (to the point that prompt adherence and text are less important), then MidJourney could still be a solid contender for specific use cases. While it is not especially strong and is heavily censored, Dall-E 3 may still make sense as part of a ChatGPT Plus subscription.

Ideogram AI holds the crown in our image generator toolbox, for now.

Edited by Ryan Ozawa.
