Updates:

May 31, 2024

Occasionally, we return to update our series about GenAI Unicorns. Today, for the first time, we examine a Chinese startup. Though Zhipu AI is not a new company, it has recently started to appear in the news more often thanks to its consistent model releases and updates, its varied partnerships, the widespread belief that it is an OpenAI rival, and the fact that it recently became a generative AI unicorn with a $2.5 billion valuation. The story of Zhipu AI – an academic startup and one of the four new AI tigers of China (along with Moonshot AI, Baichuan, and MiniMax) – is a fascinating look at how things are done in China, and at a special kind of startup: one founded at a university and led by PhDs and professors.

Zhipu AI’s CEO is sure that by the end of the year, they will be on par with GPT-4 and very close to AGI. Whether or not that proves true, let’s look together at what Zhipu AI offers.

In today’s episode

How this AI tiger was born

This is our journey to AGI (the mission)

First models

Recent research worth checking

Towards open-source

Financial situation

Zhipu AI Business Model and Partnerships

Zhipu AI vs OpenAI: What's Next

How Zhipu AI was born

In 2019, Zhipu.AI was established at Tsinghua University Science Park in Beijing, known as the "Center of the Universe" for tech startups. Founders Tang Jie and Li Juanzi, both professors at Tsinghua University's Department of Computer Science and Technology and active participants in the university's Knowledge Engineering Group (KEG), initially focused the company on building knowledge graphs to enhance research and innovation.

The early days of Zhipu AI in 2019 were challenging; as one of many academic startups without a clear business model, the company struggled to secure initial investment. The administrative commission of Zhongguancun Science Park provided the team with rent-free office space for three months, which helped kickstart their development. But in 2020, earlier than many others, the team recognized the potential of large models and began investing in and developing its first large-model AI technologies alongside its Knowledge Atlas business. By September 2021, the shift in focus had paid off with a major funding round that raised nearly $15 million from local venture capitalists.

The company's ethos is both academic and technologically idealistic. According to Zhang Peng, CEO of Zhipu AI and an alumnus of Tsinghua University, there is an internal saying: "No matter how much money we raise or how much money we make, it will be a hindrance on our road to AGI."

Currently, the company has more than 800 employees, about 60 to 70% of whom work in research and development (R&D).

Zhipu AI's Mission: Making Machines Think Like Humans

Last December, Zhang Peng, CEO of Zhipu, said that “2024 will be the first year of AGI.” In another interview, he said: “Our slogan is ‘Let machines think like humans’, which is AGI.” He thinks that these goals are what unite Zhipu AI with OpenAI (which recently switched to the term “superintelligence”).

“Today, I feel that Zhipu AI is moving from quantitative change to qualitative change, especially in terms of the emergence of large models. Standing at this juncture, whether looking forward or backward, I feel that our team is quite fortunate. The market has given us, as hard workers, many opportunities. Although the road of independent innovation and research and development will be difficult, we are still persevering. In addition, the development of AI will focus more on artificial general intelligence (AGI), achieving super cognitive intelligence beyond human level, realizing AI’s self-explanation, self-assessment, and self-supervision. At the same time, to ensure that the model’s performance aligns with human values and safety standards, a superalignment technology is under development with the goal of achieving automatic machine alignment with human intelligence and human values to enable model self-reflection and control,” Zhang Peng said.

Zhipu AI Models: GLM, ChatGLM, and GLM-4

The now-famous ChatGLM conversational model, open-sourced in 2023, would not have been possible without the General Language Model (GLM) family that forms its foundation. GLM takes us back to 2021.

GLM and the first problems

Like many other startups in China, Zhipu emerged from a university and remains closely connected to it. Tsinghua KEG initiated the GLM in December 2021 to create a highly accurate, bilingual language model for both Chinese and English. Unlike earlier models like GPT-3, GLM was engineered to perform robustly on single-server systems equipped with suitable GPUs, making advanced natural language processing (NLP) capabilities more accessible globally.

GLM's journey began with the ambitious goal of creating a 100-billion-parameter bilingual model. Immediately, the team ran into several problems:

  • there was not enough investment for compute;

  • developing or adapting an algorithm for efficient bilingual training presented uncertainties;

  • the desired universal model required extensive resources, which led to innovations in inference technologies.

Despite all this, and with the aid of the Tsinghua PACMAN team (which focuses on parallel architecture and compiler technology), the project overcame hardware failures and software bugs during its early phases. Collaborative efforts with various tech platforms related to the university enabled the effective operation of pre-training algorithms across multiple systems.

Original model

By April 2022, GLM's training scaled across 96 A100 servers. It was featured at ACL 2022 under the paper titled “GLM: General Language Model Pretraining with Autoregressive Blank Infilling.” The model was open-sourced.

Like many other NLP models, GLM employs a Transformer-based architecture. The Transformer uses masked self-attention layers that allow it to consider the context of the input data, but GLM adds several innovative features:

  • Autoregressive Blank Infilling and 2D Positional Encoding: Enhancing text prediction capabilities by understanding the position and context of missing text spans within a sequence.

  • Arbitrary Order Span Prediction: This allows GLM to predict text in any order, providing flexibility essential for complex tasks.

  • Adaptive Task Training: GLM is designed to adapt its training strategy based on the task. By adjusting the number and length of the blanked spans, GLM can be tailored to excel at both NLU tasks, which may require fine-grained token-level predictions, and text generation tasks, which often benefit from longer, more continuous spans of text generation.

These technical advancements enable GLM to excel in tasks that include natural language understanding and text generation, often outperforming traditional models like BERT and GPT with fewer resources.
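To make autoregressive blank infilling and 2D positional encoding more concrete, here is a minimal sketch of how one training example could be assembled. It follows the scheme described in the GLM paper as we understand it: the corrupted text (Part A) replaces each sampled span with a single [MASK], the spans themselves (Part B) are appended in shuffled order, and each token gets two position ids. The function name, the 0-based indexing, and the token strings are our own assumptions for illustration, not code from the GLM repository.

```python
import random


def build_blank_infilling_example(tokens, spans, seed=0):
    """Assemble a GLM-style blank-infilling example (illustrative sketch).

    tokens: list of token strings.
    spans:  non-overlapping (start, end) index pairs, end exclusive.
    Returns (input_tokens, position_ids, intra_span_ids, targets).
    """
    rng = random.Random(seed)
    spans = sorted(spans)

    # Part A: the corrupted text, with every span collapsed into one [MASK].
    part_a, mask_positions = [], []
    cursor = 0
    for start, end in spans:
        part_a.extend(tokens[cursor:start])
        mask_positions.append(len(part_a))
        part_a.append("[MASK]")
        cursor = end
    part_a.extend(tokens[cursor:])

    input_tokens = list(part_a)
    position_ids = list(range(len(part_a)))  # 1st dimension: position in Part A
    intra_span_ids = [0] * len(part_a)       # 2nd dimension: 0 for all of Part A
    targets = [None] * len(part_a)           # nothing is predicted in Part A

    # Part B: the blanked spans, in random order (arbitrary-order prediction).
    order = list(range(len(spans)))
    rng.shuffle(order)
    for i in order:
        start, end = spans[i]
        span = tokens[start:end]
        span_input = ["[START]"] + span      # what the model sees
        span_target = span + ["[END]"]       # what it must predict next
        input_tokens.extend(span_input)
        targets.extend(span_target)
        # Every token of a span points back at its [MASK] in Part A ...
        position_ids.extend([mask_positions[i]] * len(span_input))
        # ... and counts its own offset inside the span, starting at 1.
        intra_span_ids.extend(range(1, len(span_input) + 1))

    return input_tokens, position_ids, intra_span_ids, targets


if __name__ == "__main__":
    toks = ["x1", "x2", "x3", "x4", "x5", "x6"]
    for row in zip(*build_blank_infilling_example(toks, [(2, 3), (4, 6)])):
        print(row)
```

Blanking many short spans pushes the objective toward understanding-style token prediction, while one long span at the end of the text turns the same setup into ordinary left-to-right generation, which is the adaptive training idea mentioned above.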

GLM-130B

In October 2022, the GLM model family saw an important update with the release of GLM-130B. This bilingual (English and Chinese) pre-trained language model, featuring 130 billion parameters, is comparable to GPT-3 in size and capabilities but extends its accessibility and functionality through open-source availability and enhanced bilingual support. The model was presented in the paper “GLM-130B: An Open Bilingual Pre-trained Model” at ICLR 2023.

Key features and capabilities

  • Bilingual Performance: GLM-130B excels in handling tasks in English and Chinese due to its training on a diverse dataset comprising over 1.2 trillion tokens. It demonstrates superior performance over leading models like GPT-3 and significantly outperforms ERNIE TITAN 3.0 in Chinese language benchmarks.

  • Open-Source Availability: Unlike many large-scale models, GLM-130B is fully open-sourced, including its framework, model weights, and training logs →check GitHub repo.

  • Efficient Hardware Utilization: Through techniques like INT4 quantization, GLM-130B can run on lower-end GPUs such as the RTX 3090 and RTX 2080 Ti, making cutting-edge AI technology more accessible to a broader audience. It also supports fast inference on a single 8x A100 (40G) server, so no multi-node cluster is required.

This model was the sole entry from Asia to be recognized in Stanford's global evaluation in 2022.
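To see why INT4 quantization matters for hardware requirements, the sketch below does the back-of-the-envelope arithmetic for a 130-billion-parameter model and shows a generic symmetric absmax quantizer on a toy weight row. This is an illustration under our own assumptions (per-row scaling, integer range [-7, 7], weights only, no activations or cache), not the quantization code used by GLM-130B.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int4(row):
    """Symmetric absmax quantization of one weight row to integers in [-7, 7]."""
    scale = max(abs(w) for w in row) / 7 or 1.0  # fall back to 1.0 for an all-zero row
    quantized = [max(-7, min(7, round(w / scale))) for w in row]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point weights from the 4-bit integers."""
    return [v * scale for v in quantized]


if __name__ == "__main__":
    print(weight_memory_gb(130e9, 16))  # FP16 weights: 260.0 GB
    print(weight_memory_gb(130e9, 4))   # INT4 weights: 65.0 GB

    row = [0.7, -0.3, 0.0, 0.14]
    q, scale = quantize_int4(row)
    print(q)                            # [7, -3, 0, 1]
    print(dequantize(q, scale))         # close to the original row
```

The fourfold drop in weight memory is what brings a model of this size within reach of a handful of consumer cards, at the cost of a small rounding error per weight (at most half a quantization step).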

ChatGLM

The company has demonstrated significant progress in its ChatGLM series, a line of Chinese-English bilingual dialogue models based on the GLM architecture. Each generation of ChatGLM has built on the previous, refining and extending capabilities to better meet user needs and technological advancements.

ChatGLM-6B

Launched in March 2023, the ChatGLM-6B model incorporates advanced quantization technology and is designed to run efficiently on consumer-grade graphics cards with a minimum of 6GB video memory. This model, a derivative of the open-sourced GLM-130B, supports robust bilingual capabilities through training on approximately 1 trillion tokens in both Chinese and English. It is optimized for nuanced conversations, featuring techniques such as supervised fine-tuning and RLHF. The model is fully open for academic research and commercial use, provided users complete a registration questionnaire. You can try the online demo on Hugging Face Spaces.

Later, the company launched Zhipu Qingyan, an assistant based on ChatGLM2.

ChatGLM2-6B

Released in June 2023, ChatGLM2-6B retains the user-friendly features of its predecessor while introducing significant enhancements:

  • Performance: Improved through pre-training on 1.4 trillion bilingual tokens and alignment with human preferences, showing substantial gains across various datasets.

  • Context Length: Expanded context capacity from 2K to 8K, supporting longer dialogue sessions, although it still has limitations with very long documents.

  • Inference Efficiency: Enhanced by the Multi-Query Attention technique, boosting inference speed by 42% and expanding supported dialogue length under constrained memory conditions.
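The memory benefit of Multi-Query Attention comes from the key/value cache: standard multi-head attention stores keys and values for every head, while multi-query attention shares a single key/value head across all query heads. The sketch below compares the two cache sizes. The layer count, head count, and head dimension are hypothetical round numbers chosen for illustration and should not be read as the published ChatGLM2-6B configuration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the key/value cache for one sequence.

    The leading factor 2 accounts for storing both keys and values;
    bytes_per_value=2 assumes FP16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, for illustration only.
LAYERS, HEADS, HEAD_DIM, SEQ_LEN = 28, 32, 128, 8192

multi_head = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, SEQ_LEN)  # one K/V pair per head
multi_query = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ_LEN)     # one shared K/V pair

if __name__ == "__main__":
    print(multi_head / 2**20)        # 3584.0 MiB
    print(multi_query / 2**20)       # 112.0 MiB
    print(multi_head // multi_query) # 32
```

Under these assumptions the cache shrinks by a factor equal to the number of heads, which is why the same GPU memory can hold a much longer dialogue.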

ChatGLM3-6B

In collaboration with Tsinghua University’s KEG Laboratory, Zhipu AI developed ChatGLM3, further enhancing the model's functionality:

  • Advanced Training: Utilizes a diverse data set and optimized training strategies, making it the strongest performer in its class for tasks including semantics, mathematics, and code.

  • Extended Features: Supports complex interaction scenarios such as tool calling and code execution, catering to more specialized usage requirements.

  • Open Source Commitment: Continues Zhipu AI’s tradition of open-sourcing its models, promoting collaborative development and innovation within the community.

The developers state that each version of ChatGLM has been designed with ethical and safe usage in mind. They also caution that despite efforts to ensure data accuracy, the stochastic nature of these models means that output reliability cannot be guaranteed, and that the models are susceptible to being misled by user input.

Other models in Zhipu’s arsenal

  • Search enhancement model WebGLM (→read the paper)

  • Image understanding model VisualGLM (→check GitHub)

  • Multimodal understanding model CogVLM (→read the paper)

  • Text-to-image generation model CogView (→read the paper)

  • Code generation model CodeGeeX2-6B, which surpasses the 15-billion-parameter StarCoder in performance and demonstrates enhanced coding capabilities across multiple programming languages while requiring only 6GB of video memory.

  • Text quality evaluation model CritiqueLLM is designed to provide high-quality, low-cost scoring and explanations for text generated by large models.

  • AlignBench addresses the gap in evaluating the alignment of Chinese language models with human intentions, a crucial factor for their practical application. It is the first comprehensive benchmark tailored for Chinese LLMs, designed to assess models on their capability to follow instructions, understand intentions, and provide useful responses in real-world scenarios.

  • CogAgent, a visual GUI Agent based on the CogVLM architecture, capable of interpreting and interacting with GUI interfaces through visual modalities for more direct and effective decision-making.


GLM-4

The team continues to refine its models. On January 16, 2024, Zhipu AI unveiled GLM-4, its new large language model. It approaches the capabilities of OpenAI's GPT-4, featuring enhanced support for longer contexts, stronger multimodal interactions, and faster inference. This advancement significantly reduces computational costs while increasing the system's ability to handle multiple requests concurrently.

GLM-4 is designed as a multilingual model, adept at Q&A, multi-turn dialogues, and code generation. It excels in extended contextual understanding and enables the creation of personalized intelligent agents, simplifying the use of sophisticated AI technologies. These enhancements bolster GLM-4's utility across diverse applications, making it a robust tool for both developers and end-users.

Performance metrics for GLM-4 are impressive, with achievements in benchmark datasets like MMLU, GSM8K, MATH, BBH, HellaSwag, and HumanEval, where it closely matches or reaches GPT-4 levels. Additionally, GLM-4 shows remarkable proficiency in following instructions, with high compliance in both Chinese and English evaluations.

The model also demonstrates superior alignment capabilities, particularly in Chinese, suggesting an optimized understanding of nuanced linguistic elements. In long text assessments, GLM-4 surpasses Claude 2.1, and in the challenging "needle in the haystack" scenario, it maintains a perfect recall rate over lengthy spans.
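For readers unfamiliar with the "needle in the haystack" test, the sketch below shows the general idea: a single fact is hidden at different depths inside filler text of different lengths, the model is asked to retrieve it, and recall is the share of trials in which the answer contains the fact. The harness, the filler sentence, and the toy stand-in model are all our own assumptions; this is not Zhipu AI's evaluation code, and a real run would call an actual model instead of `toy_model`.

```python
def build_haystack(filler, needle, num_sentences, depth):
    """Hide `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    sentences = [filler] * num_sentences
    sentences.insert(round(depth * num_sentences), needle)
    return " ".join(sentences)


def recall_rate(ask_model, needle, question, expected, lengths, depths,
                filler="The sky was grey that day."):
    """Fraction of (length, depth) trials in which the answer contains `expected`."""
    hits, trials = 0, 0
    for num_sentences in lengths:
        for depth in depths:
            context = build_haystack(filler, needle, num_sentences, depth)
            answer = ask_model(context, question)
            hits += expected.lower() in answer.lower()
            trials += 1
    return hits / trials


def toy_model(context, question, window=None):
    """Stand-in for a real LLM call: scans the visible context for the needle.

    `window` mimics a limited context length by keeping only the last
    `window` characters.
    """
    visible = context if window is None else context[-window:]
    for sentence in visible.split(". "):
        if "magic number" in sentence:
            return sentence
    return "I don't know."


if __name__ == "__main__":
    needle = "The magic number is 7421."
    question = "What is the magic number?"
    grid = dict(lengths=[10, 100], depths=[0.0, 0.5, 1.0])
    print(recall_rate(toy_model, needle, question, "7421", **grid))
    short_sighted = lambda c, q: toy_model(c, q, window=200)
    print(recall_rate(short_sighted, needle, question, "7421", **grid))
```

A model that can read the whole context scores 1.0 on this grid, while one that only sees the tail of the text misses needles placed early, which is the failure mode the test is designed to expose.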

In terms of multimodal capabilities, preliminary tests indicate that CogView3, a component of GLM-4, substantially outperforms DALL-E 3 in various metrics, underscoring GLM-4’s capacity to integrate and interpret visual and textual data effectively.

This combination of advancements solidifies GLM-4 as a highly potent development in the field of large language models.

UPDATE from Sept 9, 2024:

Zhipu AI has introduced a real-time video call feature for its chatbot, ChatGLM, allowing users to engage in live video conversations with AI. Launched at KDD 2024, the feature became available to select users on August 30, with a wider release planned. The AI accurately identifies objects and provides lifelike interactions, but occasionally misidentifies characters. Alongside this, Zhipu AI has upgraded its core models, including GLM-4-Plus, which rivals top models like GPT-4o and Llama 3.1.

Zhipu AI Research: ChatGLM-Math and Latest Papers

ChatGLM-Math: Improving Math Problem-Solving in LLMs with a Self-Critique Pipeline

  • Researchers from Zhipu.AI and Tsinghua University developed a novel Self-Critique pipeline to enhance the math problem-solving abilities of LLMs without compromising language capabilities. This pipeline uses a Math-Critique model, trained from the LLM itself, to provide targeted feedback, improving mathematical precision. Tested on the challenging MathUserEval dataset, the approach showed significant performance improvements, surpassing larger models →read the paper
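As a rough illustration of how critique scores can be turned into training data, the sketch below runs one data-collection round: the model samples several answers per question, a critique function scores them, high-scoring answers are kept for fine-tuning, and best/worst pairs are kept for preference optimization. This mirrors the two stages we understand the paper to describe (rejective fine-tuning followed by direct preference optimization), but the function names, the 1 to 10 score scale, the threshold, and the toy generator and critic are all our own assumptions.

```python
def self_critique_round(questions, generate, critique,
                        num_samples=4, keep_threshold=8.0):
    """Collect fine-tuning examples and preference pairs from scored samples.

    generate(question, sample_index) -> answer string   (stand-in for the LLM)
    critique(question, answer) -> score, higher is better (stand-in for Math-Critique)
    """
    sft_data, preference_pairs = [], []
    for question in questions:
        answers = [generate(question, i) for i in range(num_samples)]
        scored = sorted(((critique(question, a), a) for a in answers),
                        key=lambda pair: pair[0], reverse=True)
        best_score, best = scored[0]
        worst_score, worst = scored[-1]
        if best_score >= keep_threshold:
            sft_data.append((question, best))                 # keep only good answers
        if best_score > worst_score:
            preference_pairs.append((question, best, worst))  # chosen vs rejected
    return sft_data, preference_pairs


def toy_generate(question, sample_index):
    """Toy 'model': adds two integers, but every other sample is off by one."""
    a, b = (int(part) for part in question.split("+"))
    return str(a + b + sample_index % 2)


def toy_critique(question, answer):
    """Toy 'critic': full marks for the right sum, a low score otherwise."""
    a, b = (int(part) for part in question.split("+"))
    return 10.0 if int(answer) == a + b else 2.0


if __name__ == "__main__":
    sft, pairs = self_critique_round(["2+3", "10+4"], toy_generate, toy_critique)
    print(sft)    # [('2+3', '5'), ('10+4', '14')]
    print(pairs)  # [('2+3', '5', '6'), ('10+4', '14', '15')]
```

The key design point is that the feedback signal comes from a critic derived from the model itself, so no additional human labeling is needed inside the loop.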

Zhipu AI Open-Source Initiative and Developer Fund

Zhipu AI has launched an ambitious initiative to support the open-source AI community and energize local R&D and commercialization efforts in the LLM field. It has established several support mechanisms, including:

  • an open-source fund that provides GPUs, cash, and free API access to assist developers;

  • a $100 million "Z Ventures" startup fund aimed at innovative companies specializing in LLMs;

  • expanded academic grants and industry partnerships through organizations like the China Computer Federation (CCF).

Zhipu AI Funding, Valuation, and Business Model

Zhipu AI's business model is centered on Model as a Service (MaaS), targeting the business-to-business (B2B) market. This approach focuses on selling AI-driven solutions and services directly to other businesses, leveraging their willingness to invest in advanced AI technologies. CEO Zhang Peng initially estimated that the company would break even by 2026 or 2027, while acknowledging that this timeline may need adjustment due to rapid changes in the market, such as rising computing costs and increasing competition.

Zhipu AI Business Model and Partnerships

Just a few recent partnerships:

  • BioGeometry Partnership: Developing a multimodal model linking human and biological languages to advance AI in life sciences and medical research →details

  • YanRong Tech Collaboration: Created a large-scale AI architecture with A100+ all-flash storage to enhance storage systems →details

  • Partnership with Qihoo: Integrated the 360GLM model into Qihoo’s search engine to improve search and internet security.

  • Electronic Contracting with Shangshangqian: Launched Hubble, the first AI product in electronic contracting using the GLM-130B model, to streamline contract management →details

Zhipu AI vs OpenAI: What's Next

Academic roots can be a powerful launchpad (as we’ve also seen in the story of Databricks). The rise of Zhipu AI from an academic project to an AI unicorn in just a few years is a testament to the unique dynamics and potential of China's AI startup ecosystem. Born out of Tsinghua University, the company leveraged its academic roots to attract top talent, forge key partnerships, and pioneer an open-source approach that has rapidly built a developer community and spurred adoption. Zhipu AI's idealistic mission to reach AGI, diverse industry partnerships, and B2B focus provide a unifying force and multiple paths to monetization. As Zhipu AI's string of cutting-edge model releases demonstrates, China's AI startups are emerging as serious global contenders. It’s important to closely track the AI pioneers emerging from China's universities. More AI tigers are coming.
