FOD#8: Generative AI hype is overwhelming

But so many incredible things are happening in AI & ML technology and research!

Froth on the Daydream (FOD) – our weekly summary of over 150 AI newsletters. We connect the dots and cut through the froth, bringing you a comprehensive picture of the ever-evolving AI landscape. Stay tuned for clarity amidst the surrealism and experimentation.

Today, we discuss the mass adoption of AI through sports, become aware of 'Model Collapse', explore what governments are doing on the AI regulatory front, highlight the latest advancements in generative AI and robotics, discover computer vision trends with Andrew Ng, introduce Inflection AI and its founders, and share a few noteworthy papers, articles, and releases for further reading and exploration in the AI field. Enjoy!

It's a peculiar situation with AI. For many years, we've been exposed to and guided by an AI-altered reality through social network algorithms, e-commerce recommendations, Instagram filters, iPhone face recognition and more. But it took generative AI to get us to the point where we started to speak about AI mass adoption and its consequences. We are at the end of the hype cycle; soon it will all die down, and we will be able to concentrate on the really interesting technical and ethical parts of AI. Let’s see what the last week brought us.

What’s New in Generative AI

CB Insights published its annual list of the 100 most promising private AI companies in the world.

The last two weeks have seen generative AI in the audio and image sectors gaining significant momentum.

Meta recently unveiled Voicebox, an innovative text-to-speech generative AI model, while Google introduced AudioPaLM, a multimodal speech model that incorporates techniques from both PaLM 2 and AudioLM. ElevenLabs, a company specializing in speech and voice cloning software, was mentioned in many newsletters after raising a $19 million round led by a16z.

Last week was also big for image creation: we saw the release of Midjourney's version 5.2, which offers enhancements in aesthetics, coherence, and text understanding, along with sharper images and outpainting. WhyTryAI gave it a try.

Stability AI launched SDXL 0.9, capable of generating hyper-realistic images, thereby marking a significant leap in the creative use cases for generative AI imagery. The company has specifically focused on improving the realism of human hands in their output.

Robotic Achievements

Google DeepMind just announced RoboCat, a self-improving AI agent. With as few as 100 demonstrations, RoboCat can pick up a new task, thanks to its diverse dataset. It's not just learning tasks across different robotic arms; it's also generating its own training data to refine its skills.

Robots introduced by Carnegie Mellon University are learning to do chores by watching YouTube. If only it worked with kids! CMU researchers have developed an algorithm, VRB, that enables robots to learn tasks from videos. Unlike its predecessor, WHIRL, VRB doesn't require the robot to operate in the same setting as the video. The robots learn by identifying contact points and trajectory in tasks, like opening drawers. The researchers are using vast video databases for training data, turning our procrastination into robotic education.

Trends in Computer Vision

Andrew Ng recently went to the Computer Vision and Pattern Recognition Conference (CVPR) in Vancouver, Canada, and was quite impressed by the active interest in the computer vision field. He thinks that big changes could be coming soon and also pointed out some trends he noticed at CVPR:

  1. Vision transformers are starting to be seen as a good alternative to the usual convolutional neural networks.

  2. There's interesting work going on in image editing and giving users more control in image generation, especially when it comes to making faces.

  3. Neural Radiance Fields (NeRF) are getting a lot of attention for turning 2D images into 3D scenes. People are working on making them more scalable, efficient, and better at handling moving scenes.

  4. There's growing interest in multimodal models that can process both image and text inputs using transformers.

  5. Research continues on self-driving cars, and there's a feeling that big, pretrained transformers could be key to progress in this area.

Worth paying attention to (papers, articles and releases):

  • "Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training" presents a new method for hyperparameter optimization using a Gaussian process and the concept of a Pareto frontier, designed for large-scale model training.

  • "How reinforcement learning from human feedback (RLHF) works" offers insight into how reinforcement learning models utilize human feedback to improve their performance.

  • "LLM Survey Report - MLOps Community" explores the diverse applications of LLMs, the challenges in deploying them, and the innovative solutions being applied.

  • "Eliminating Bias in AI/ML" explores the concept of bias in AI and ML, arguing that it's a complex issue that requires a thorough understanding for effective mitigation.

  • "Fast Segment Anything" introduces FastSAM, a real-time solution for the 'segment anything' task in computer vision, which combines instance segmentation with prompt-guided selection to achieve efficient and effective segmentation.

  • MosaicML introduces MPT-30B, a new, more powerful member of its Foundation Series of open-source models, trained with an 8k context length on H100s.

  • OpenLM Research releases OpenLLaMA, an open-source reproduction of Meta’s LLM LLaMA, with the weights hosted on Hugging Face.

What I want to try next week:

  • GPT-Engineer, a novel tool that generates software or games based on user concepts, asking clarifying questions and writing the necessary code until the project is complete.

What else is happening around AI

Mass adoption

From the man-on-the-street perspective, AI becomes real when sports are involved. So here we go: this year, in partnership with tech giant IBM, Wimbledon is introducing AI-powered commentary. This isn't just a love game, it's a game changer. The AI commentary will be available on Wimbledon's app and website, separate from the BBC's coverage. The AI has been trained in the unique language of tennis (ooh! aah!), ready to deliver a smashing performance and deeper insights. This is a step towards full AI commentary on matches, and a clear sign that the future of sports coverage is here. So now, AI is officially real. Soon, on every screen in the world (as it basically was before, but who remembers that).

Synthetic data

As generative AI keeps generating, researchers might run into a problem: their models will be trained on data that was itself generated by models. I know, mind-boggling. In the paper The Curse of Recursion: Training on Generated Data Makes Models Forget, the researchers describe a worrying phenomenon called 'Model Collapse'. It happens when models are trained on model-generated content, leading to irreversible defects and the loss of information about the original data. This issue isn't exclusive to language models; it also affects Variational Autoencoders and Gaussian Mixture Models. The paper stresses the importance of addressing this issue to maintain the benefits of large-scale web data training and highlights the growing value of genuine human interaction data as language models become more prevalent online. Reddit’s CEO is rubbing his hands, hoping that his new business model, based on high API prices, will work.
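For the curious, here is a toy illustration of the idea, not the paper's experiment. It assumes the simplest possible "model", a single Gaussian, which is repeatedly refit on samples drawn from its own previous fit. The function name, sample size, and number of generations are arbitrary choices made for this sketch.

```python
import random
import statistics


def simulate_collapse(generations=300, sample_size=10, seed=0):
    """Refit a Gaussian on its own samples and record the fitted spread.

    Toy setup (an assumption of this sketch): generation 0 sees "real"
    data from N(0, 1); every later generation only sees samples drawn
    from the previous generation's fitted Gaussian.
    """
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    stds = []
    for _ in range(generations):
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # maximum-likelihood estimate
        stds.append(sigma)
        # The next generation trains only on generated data.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return stds


if __name__ == "__main__":
    stds = simulate_collapse()
    print(f"fitted std, first generation: {stds[0]:.4f}")
    print(f"fitted std, last generation:  {stds[-1]:.3e}")
```

With small samples, each refit tends to lose a little of the spread, and the losses compound: after enough generations the fitted distribution has narrowed to almost a single point and no longer resembles the original data. Real language models are far more complex, so treat this only as intuition for why training on generated data can go wrong.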

Governments are paying full attention

Wimbledon hasn’t even started, but governments are already on top of the AI problem. Last week, the Senate finally invited leaders from the open-source community. Here is a tweet from Clement Delangue, co-founder and CEO of Hugging Face, one of the most significant forces behind the open-source AI movement.

Senate Majority Leader Chuck Schumer (D-N.Y.) is ready to take charge in the realm of AI regulation with his "SAFE Innovation framework." He aims to protect, expand, and harness AI's potential while ensuring national security, economic stability for workers, and democratic ideals. With accountability, he tackles copyright issues, disinformation, and bias. Schumer wants businesses to explain AI systems' answers in simple terms. Ha! Hoping for swift action, Schumer urges Congress to join the AI revolution and produce a bill in the coming months. Congress usually waits years or even decades before establishing guardrails for new industries. There is buzz around town, though, that ChatGPT can make everyone 10 times more efficient.

President Biden (read: bAIden) also noticed that everything happens much faster now. During a meeting with AI experts like Fei-Fei Li and Oren Etzioni, he said he expects to see "more change in the next 10 years than we've seen in the last 50 years and maybe beyond that," with AI driving this transformation. The Biden administration swiftly followed up by announcing that the White House chief of staff's office holds meetings two to three times a week to develop secure AI strategies. Phew!

New Name in LLMs – Inflection AI

In the same week, the assiduous President Biden attended a fundraiser co-hosted by Reid Hoffman. What is Reid Hoffman famous for? A few things: he is the creator of LinkedIn; he introduced Mark Zuckerberg to Peter Thiel and then became Thiel's co-investor in Facebook's first financing round; he was an initial investor in OpenAI and served on its board until March 2023; and he is a co-founder of Inflection AI, the AI studio behind the Pi chatbot, which he started in March 2022 with his longtime friend Mustafa Suleyman (co-founder of DeepMind and former colleague at Greylock). A heavyweight in the industry.

Inflection just unveiled its advanced language model Inflection-1, which, according to a technical memo, is the best model in its compute class, outperforming GPT-3.5, LLaMA, Chinchilla, and PaLM-540B on a wide range of benchmarks commonly used for comparing LLMs.

To further captivate everyone's attention, Inflection co-founder Mustafa Suleyman suggested that the traditional Turing test, proposed by Alan Turing in 1950, is no longer relevant for measuring AI. Instead, he proposes a "modern Turing test" that measures what he calls artificial capable intelligence (ACI). This test evaluates AI on its ability to set and achieve complex goals with minimal human intervention, rather than focusing solely on artificial general intelligence (AGI) that matches or surpasses human cognitive abilities. Suleyman describes an experiment in which an AI is given $100,000 and challenged to turn it into $1 million by researching a product, creating its blueprint, finding a manufacturer, and successfully selling the item on platforms like Amazon or Walmart. He believes that AI will reach this threshold within the next two years and emphasizes the significance of practical applications of AI beyond conversational abilities. How many additional combinations of three letters containing "A" and "I" will we encounter?

Inflection AI is also mentioned in this post by AIsupremacy, which lists six more companies competing with OpenAI.

Speaking about OpenAI…


In the EU, as Time reports, OpenAI was caught lobbying fiercely to prevent its powerful AI systems, including GPT-3 and DALL-E, from being labeled as "high risk" under the EU AI Act. Its efforts paid off in the final draft, as its creations were spared that designation. However, OpenAI's victory is not complete, as it now faces more rigorous transparency requirements. At this moment, none of the companies complies with the draft EU AI Act.

Transparency is of huge importance! AI Snake Oil provides an insightful post suggesting that transparency reporting is technically feasible and can be mostly automated, requiring only a small sample of user interactions for analysis. Additionally, this Stanford-Princeton team offers three ideas for regulating generative AI.

Thank you for reading! Please feel free to share with your friends and colleagues. Every referral will eventually lead to some great gifts 🤍
