Turing Post

Token 1.5: From Chain-of-Thoughts to Skeleton-of-Thoughts, and everything in between

How to distinguish all the CoT-inspired concepts and use them for your projects


The groundbreaking paper by Google Brain at NeurIPS 2022 introduced the world to Chain-of-Thought Prompting (CoT). It changed prompting significantly. But it didn't stop there; it kicked off a whole new area of study, giving birth to various "chain" spin-offs and related research.

Just to give you an impression of the impact: a search for the keyword "chain-of-thought" pulls up 461 papers on Semantic Scholar and 374 on arXiv. That's a lot of papers! But we're not here to give you an exhaustive list you can find on any research platform. We aim to explore the papers that introduced new ideas building on the original CoT research, map its influence, and decode what is novel in each follower and the principles behind it.

In chronological order, we unfold the Chain-of-Thought lineage and explain the following terms:

  • Chain-of-thought prompting (recapping the fundamentals)

  • Self-consistency

  • Zero-Shot Chain-of-Thought (Zero-shot-CoT)

  • Automatic-Chain-of-Thought (Auto-CoT)

  • Program-of-Thoughts Prompting (PoT)

  • Multimodal Chain-of-Thought Reasoning (Multimodal-CoT)

  • Tree-of-Thoughts (ToT)

  • Graph-of-Thoughts (GoT)

  • Algorithm-of-Thoughts (AoT)

  • Skeleton-of-Thought (SoT)

No more confusion around CoT. Please use this up-to-date "dictionary" as a reliable reference point for understanding the complexities of this evolving field.

Recapping the fundamentals

Before diving into the nuanced world of chain-of-thought prompting (a specialized form of basic prompting), it's essential to revisit the foundational terminology in the prompting ecosystem.

Zero-shot prompting

The term "zero-shot prompting" derives from the concept of zero-shot learning*.

*Zero-shot learning is a model's ability to complete a task without having received or used any training examples.

When we apply this intuition to prompting, it means that our prompt gives the model no additional information and no examples. In other words, the model can only use the knowledge it acquired during its training to produce the output.

Task: Named Entity Recognition

Prompt: "Identify the name of the person in the sentence: 'Steve Jobs founded Apple.'"

Response from Model: Steve Jobs

In this brief example, the language model identifies the name "Steve Jobs" based on the prompt, without requiring any previous examples for named entity recognition. This effectively demonstrates the power of zero-shot prompting in action.

Zero-shot prompting has contributed to the widespread adoption of LLMs. But sometimes it's just not enough to get the desired outcome. Adding a few examples to the prompt can help improve the model's output. This is what we call few-shot prompting.

Few-shot prompting

Similar to its zero-shot counterpart, few-shot prompting also finds its roots in a similarly named learning approach: few-shot learning*.

*Few-shot learning is the process in which a pre-trained model is given only a few examples to learn about a new, previously unseen category of data.

Few-shot prompting can be used as a technique to enable in-context learning* where we provide demonstrations in the prompt to steer the model to better performance. The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.

*In-context learning (ICL) is a specific method of prompt engineering where demonstrations of the task are provided to the model as part of the prompt (in natural language).

Task: Sentiment Analysis

"The sun is shining." - Positive
"I lost my keys." - Negative
"How is the statement 'The movie was a hit' classified?"

Response from Model: Positive

In this example, the first two statements serve as demonstrations to guide the model's behavior. The model then classifies the third statement as "Positive," using the preceding examples for contextual understanding.
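To make the structure concrete, here is a minimal Python sketch that assembles the same few-shot prompt from a list of labeled demonstrations. The helper name `build_few_shot_prompt` and the exact line format are our own assumptions. In practice you would send the resulting string to whatever model API you use.

```python
# A minimal sketch: assemble a few-shot prompt from labeled demonstrations.
# The '"sentence" - Label' layout mirrors the example above and is only an assumption.


def build_few_shot_prompt(demos: list[tuple[str, str]], query: str) -> str:
    """Return one demonstration per line, followed by the unlabeled query."""
    lines = [f'"{text}" - {label}' for text, label in demos]
    lines.append(f"How is the statement '{query}' classified?")
    return "\n".join(lines)


demos = [("The sun is shining.", "Positive"), ("I lost my keys.", "Negative")]
prompt = build_few_shot_prompt(demos, "The movie was a hit")
print(prompt)
```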

Having established these foundational techniques, we now arrive at the intriguing world of chain-of-thought prompting. Let's talk about it.

Chain-of-thought prompting

For tasks demanding intricate reasoning and sequential thinking, merely providing a few examples often proves insufficient. To address this, researchers proposed a new technique called chain-of-thought prompting.

This technique modifies the original few-shot prompting: each example shows not only the problem and its answer but also a detailed description of the intermediate reasoning steps that lead to the solution. Consider this example from the original paper:

Image Credit: CoT Original Paper
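For readers who prefer text to screenshots, here is a sketch of how a few-shot CoT prompt is typically put together. Each demonstration carries its worked reasoning before the final answer, and the last question is left open for the model to continue. The demonstration follows the well-known tennis-ball example from the paper. The helper name and the "Q:/A:" layout are illustrative assumptions.

```python
# Sketch of a few-shot chain-of-thought prompt: each demonstration pairs a question
# with a worked rationale that ends in "The answer is N." before the next question.

COT_DEMOS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
                    "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "rationale": "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
                     "6 tennis balls. 5 + 6 = 11.",
        "answer": "11",
    },
]


def build_cot_prompt(demos, question: str) -> str:
    """Concatenate Q/A demonstrations (with reasoning) and leave the final 'A:' open."""
    parts = [
        f"Q: {d['question']}\nA: {d['rationale']} The answer is {d['answer']}."
        for d in demos
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


prompt = build_cot_prompt(
    COT_DEMOS,
    "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?",
)
print(prompt)
```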

The authors of this approach showed how complex reasoning abilities emerge naturally in sufficiently large LMs via chain-of-thought prompting. Prompting the model to produce a series of intermediate reasoning steps significantly improves the ability of LLMs to perform complex reasoning.

Crucially, the benefit of chain-of-thought prompting is an emergent ability tied to model scale, as identified by the authors in the paper "Emergent Abilities of Large Language Models." Chain-of-thought prompting does not improve performance for small models and only yields performance gains when used with models of around 100B parameters.

Chain-of-thought Lineage

While various prompting techniques exist, this review focuses on those intimately connected with chain-of-thought prompting. To ensure coherence, we'll proceed in chronological order to trace the evolution of ideas.

March 2022: Self-consistency

Researchers from Google Brain (most of whom authored the original paper) introduced a nuanced technique they termed 'self-consistency.' Rather than overhauling the prompting itself, this technique tweaks the response-generation phase. It samples a diverse set of reasoning paths instead of producing a single, greedily decoded answer.

Intuition Behind the Approach. As problem complexity scales, so does the multiplicity of valid solution paths. Mirroring human cognitive processes, the authors advocate for choosing the most consistent answer among the generated options.

To illustrate that, the process consists of three main steps:

  1. Prompt a language model using chain-of-thought (CoT) prompting.

  2. Replace the greedy decoding step in CoT prompting with sampling from the language model's decoder, thereby generating a diverse set of reasoning paths.

  3. Marginalize out the reasoning paths and select the most consistent final answer (in practice, a majority vote over the answers) as the model's final output.
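Here is a minimal sketch of the three steps. It assumes completions end with "The answer is N." so the final answer can be parsed. `fake_sample` is a hypothetical stand-in for sampling from an LLM at a nonzero temperature, and its canned paths make the output reproducible. The aggregation is plain majority voting over final answers, the simple rule the paper relies on.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final number out of a completion ending in 'The answer is N.'"""
    match = re.search(r"The answer is\s*(-?\d+)", completion)
    return match.group(1) if match else None


def self_consistency(sample: Callable[[str], str], prompt: str, num_samples: int = 8) -> str:
    """Sample several CoT completions and return the majority-vote answer."""
    answers = []
    for _ in range(num_samples):
        answer = extract_answer(sample(prompt))  # each call = one sampled reasoning path
        if answer is not None:  # skip paths whose answer cannot be parsed
            answers.append(answer)
    if not answers:
        raise ValueError("no sampled path produced a parsable answer")
    # Marginalize out the reasoning paths: keep only the final answers and vote.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for temperature sampling from an LLM: cycles through canned paths.
_CANNED_PATHS = itertools.cycle([
    "16 - 3 - 4 = 9 eggs are left. 9 * 2 = 18. The answer is 18.",
    "16 - 3 = 13. 13 * 2 = 26. The answer is 26.",
    "She sells 16 - 7 = 9 eggs at $2 each, so 9 * 2 = 18. The answer is 18.",
    "I am not sure how to solve this.",
])


def fake_sample(prompt: str) -> str:
    return next(_CANNED_PATHS)


question = ("Q: A farm's ducks lay 16 eggs a day. The owner eats 3 and bakes with 4. "
            "She sells the rest at $2 each. How much does she make daily?\nA:")
result = self_consistency(fake_sample, question, num_samples=8)
print(result)  # 18: four votes for 18, two for 26, two unparsable
```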

May 2022: Zero-Shot Chain-of-Thought (almost poetry here!) or just Zero-shot-CoT → 'Let's think step by step' approach

Emerging five months after the original CoT paper, "Large Language Models are Zero-Shot Reasoners" was the result of a joint effort between the University of Tokyo and Google Brain. Intriguingly, the paper proposes replacing CoT's hand-crafted step-by-step demonstrations with a single phrase: "Let's think step by step."

This means we are no longer in the few-shot paradigm, because we have moved to zero-shot. The authors demonstrated that LLMs have an internal "understanding" of CoT that can be invoked by one phrase in the prompt, without adding any examples!
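In the paper, Zero-shot-CoT runs in two stages. The trigger phrase first elicits a rationale, and then a second prompt extracts the final answer from it. The sketch below shows that flow. To the best of our knowledge, the answer-extraction wording matches the paper's arithmetic setting, but treat it as an assumption. `fake_llm` is a hypothetical stand-in for a real model call.

```python
from typing import Callable


def reasoning_prompt(question: str) -> str:
    # Stage 1 (reasoning extraction): the trigger phrase elicits a step-by-step rationale.
    return f"Q: {question}\nA: Let's think step by step."


def answer_prompt(question: str, rationale: str) -> str:
    # Stage 2 (answer extraction): feed the rationale back and ask only for the answer.
    return f"{reasoning_prompt(question)} {rationale}\nTherefore, the answer (arabic numerals) is"


def zero_shot_cot(llm: Callable[[str], str], question: str) -> str:
    rationale = llm(reasoning_prompt(question))
    return llm(answer_prompt(question, rationale)).strip().rstrip(".")


# Hypothetical stand-in for an LLM call, so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    if prompt.endswith("is"):
        return " 8."
    return "There are 16 balls in total. Half of them are golf balls, so 16 / 2 = 8."


question = "A juggler has 16 balls. Half of the balls are golf balls. How many golf balls are there?"
print(zero_shot_cot(fake_llm, question))  # 8
```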

October 2022: Automatic Chain of Thought (Auto-CoT)

The authors of this paper took automation a step further. They proposed the Auto-CoT method to automatically construct the demonstrations (examples) that had to be crafted manually under the original CoT approach.

To make Auto-CoT work, we need a dataset with sample reasoning tasks without solutions. Here is how the method works:

  1. The method partitions the sample questions into a small number of clusters of similar questions.

  2. From each cluster, the method selects a representative question and generates a reasoning chain for it using Zero-shot-CoT.

  3. When the user submits a new task, the method prepends the selected questions and their generated reasoning chains from step 2 to the prompt as few-shot demonstrations.
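The pipeline can be sketched in a few dozen lines of Python. The paper clusters sentence embeddings of the questions with k-means. Here the 2D "embeddings" are made up, the k-means is a tiny hand-rolled version with deterministic initialization, and `fake_zero_shot_cot` is a hypothetical stand-in for the Zero-shot-CoT call. The paper also filters candidate demonstrations with simple heuristics (for example, on question and rationale length), which we leave out.

```python
import math
from typing import Callable


def kmeans(points, k, iters=10):
    """Tiny k-means with deterministic farthest-point initialization (our simplification)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(i)
        centroids = [
            tuple(sum(points[i][d] for i in members) / len(members) for d in range(len(points[0])))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return clusters, centroids


def auto_cot_demos(questions, embeddings, k, zero_shot_cot: Callable[[str], str]):
    """Pick one representative question per cluster; Zero-shot-CoT writes its rationale."""
    clusters, centroids = kmeans(embeddings, k)
    demos = []
    for members, centroid in zip(clusters, centroids):
        if not members:
            continue
        # Representative: the member closest to the cluster centre.
        rep = min(members, key=lambda i: math.dist(embeddings[i], centroid))
        demos.append((questions[rep], zero_shot_cot(questions[rep])))
    return demos


def build_prompt(demos, new_question):
    """Prepend the auto-generated demonstrations to the user's new question."""
    blocks = [f"Q: {q}\nA: {r}" for q, r in demos]
    blocks.append(f"Q: {new_question}\nA: Let's think step by step.")
    return "\n\n".join(blocks)


# Hypothetical data: these 2D "embeddings" are invented; real ones come from a sentence encoder.
questions = [
    "How many legs do 3 spiders have?",
    "What is 12 + 30?",
    "How many legs do 2 cats have?",
    "What is 7 + 8?",
]
embeddings = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]


def fake_zero_shot_cot(q):  # hypothetical stand-in for a real Zero-shot-CoT call
    return f"Let's think step by step. (model-written rationale for: {q})"


demos = auto_cot_demos(questions, embeddings, k=2, zero_shot_cot=fake_zero_shot_cot)
print(build_prompt(demos, "How many legs do 4 ants have?"))
```

Taking one question per cluster is what gives the demonstrations their diversity. That way, a single flawed auto-generated rationale is less likely to be echoed across every example in the prompt.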

November 2022: Program-of-Thoughts Prompting (PoT)

Program-of-Thoughts Prompting (PoT) addresses LLMs' weakness at precise computation by separating reasoning from calculation. A code-capable model (Codex, in the paper) expresses its reasoning as a program that mixes text and programming statements, and an external interpreter such as Python executes that program to compute the answer. This approach shines in numerical computations and complex mathematical tasks.
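Here is a minimal sketch of the execution half. It assumes the model has already returned a Python program that stores its result in a variable named `ans`. The program below is hypothetical and was written by hand for illustration. Note that stripping builtins is not a real sandbox, so never `exec` untrusted model output outside an isolated environment.

```python
# Hypothetical program a code model might emit for the question:
# "Pens cost $3 each. Tom buys 4 pens and pays with a $20 bill. How much change does he get?"
GENERATED_PROGRAM = """
price_per_pen = 3
num_pens = 4
amount_paid = 20
total_cost = price_per_pen * num_pens
ans = amount_paid - total_cost
"""


def run_program_of_thought(program: str, answer_var: str = "ans"):
    """Execute a model-written program and return the value bound to `answer_var`."""
    namespace: dict = {}
    # An empty __builtins__ blocks casual misuse but is NOT a security sandbox.
    exec(program, {"__builtins__": {}}, namespace)
    return namespace[answer_var]


print(run_program_of_thought(GENERATED_PROGRAM))  # 8
```

The model never has to do the arithmetic itself. It only has to get the program right, and the interpreter takes care of the numbers.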

February 2023: Multimodal Chain-of-Thought Reasoning

This innovative paper introduces Multimodal-CoT, focusing primarily on the interplay between vision and language.

Multimodal-CoT operates in two distinct stages:

  • Rationale Generation. In this initial stage, the model receives both language and visual inputs and generates what are termed 'rationales.'

  • Answer Inference. Subsequently, in the answer inference stage, the rationale generated in the first stage is appended to the original language input. This modified language input, along with the original visual input, is then fed back into the model to derive the final answer.

While both stages employ the same underlying model architecture, they differ in their input-output dynamics.
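The two-stage flow can be sketched as follows. `fake_model` is a hypothetical stand-in for a fine-tuned vision-language model, and the "Rationale:" separator is our own assumption. As we read the paper, each stage is trained separately on the same architecture, so the function takes two model callables. Passing the same stub twice keeps the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class MultimodalInput:
    text: str
    image: bytes  # placeholder; real systems feed vision features, not raw bytes


Model = Callable[[MultimodalInput], str]


def multimodal_cot(rationale_model: Model, answer_model: Model,
                   question: str, image: bytes) -> Tuple[str, str]:
    # Stage 1, rationale generation: language + vision in, rationale out.
    rationale = rationale_model(MultimodalInput(text=question, image=image))
    # Stage 2, answer inference: append the rationale to the original language input
    # and pass the same visual input again.
    answer = answer_model(MultimodalInput(text=f"{question}\nRationale: {rationale}", image=image))
    return rationale, answer


def fake_model(x: MultimodalInput) -> str:
    """Hypothetical stand-in for a fine-tuned vision-language model."""
    if "Rationale:" in x.text:
        return "The answer is (B)."
    return "Both objects in the picture are made of soft fabric."


rationale, answer = multimodal_cot(
    fake_model, fake_model,
    "Which property do the two objects have in common? (A) hard (B) soft",
    b"<image bytes>",
)
print(rationale)
print(answer)
```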

May 2023: Tree of Thoughts (ToT)

Tree of Thoughts (ToT) introduces a novel, structured approach to problem-solving that aims to improve the way language models tackle complex queries.

Researchers in the domain of human problem-solving have observed that human cognition often navigates through a combinatorial problem space, employing a series of heuristics to guide decision-making.

ToT adopts a more human-like approach to problem-solving by framing each task as a search across a tree of possibilities. Each node in this tree represents a partial solution. The core of ToT comes down to four design questions:

  • Thought Decomposition: Unlike Chain of Thought (CoT), which doesn't delineate intermediate steps explicitly, ToT utilizes problem characteristics to segment the process into distinct thought steps.

  • Thought Generation: This phase uses one of two strategies. The first samples independent and identically distributed (i.i.d.) thoughts from a CoT prompt, which works best when the thought space is rich. The second proposes thoughts sequentially with a "propose prompt," which is better suited to more constrained thought spaces.

  • State Evaluation: At this juncture, the ToT framework uses the language model itself as a heuristic to evaluate states. There are two strategies: one that values each state independently and another that votes across multiple states.

  • Search Algorithm: Depending on the structure of the problem tree, different search algorithms like Breadth-First Search (BFS) and Depth-First Search (DFS) can be deployed.
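To see how the four components fit together, here is a BFS sketch on a toy problem: pick three numbers from 1 to 4 whose sum reaches 10. In real ToT, `propose` and `value` would be LLM calls (a "propose prompt" and a value prompt). Here they are hypothetical deterministic stand-ins, and `breadth` is the number of states kept at each level.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # a partial solution: the numbers chosen so far

TARGET = 10
CHOICES = (1, 2, 3, 4)


def propose(state: State) -> List[State]:
    """Thought generation: hypothetical stand-in for an LLM 'propose prompt'."""
    return [state + (n,) for n in CHOICES]


def value(state: State) -> float:
    """State evaluation: hypothetical stand-in for an LLM value prompt (closer to TARGET is better)."""
    return -abs(TARGET - sum(state))


def tot_bfs(root: State, steps: int, breadth: int,
            propose_fn: Callable[[State], List[State]],
            value_fn: Callable[[State], float]) -> State:
    frontier = [root]
    for _ in range(steps):  # thought decomposition: one tree level per thought step
        candidates = [child for state in frontier for child in propose_fn(state)]
        # Search algorithm: BFS that keeps only the `breadth` best states per level.
        frontier = sorted(candidates, key=value_fn, reverse=True)[:breadth]
    return max(frontier, key=value_fn)


best = tot_bfs((), steps=3, breadth=2, propose_fn=propose, value_fn=value)
print(best, sum(best))
```

To get the DFS variant, replace the level-by-level loop with a recursive search that abandons a branch when its value drops below a threshold.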

August 2023: Graph of Thoughts (GoT)

Researchers have recently introduced the Graph of Thoughts (GoT) framework, designed to augment the reasoning capabilities of LLMs using a graph-based structure. It goes beyond chain- and tree-shaped prompting techniques like Chain-of-Thought (CoT) by providing a structured, extensible mechanism for thought transformations, evaluations, and rankings.

The GoT framework is built as a set of interacting modules:

  • Prompter

  • Parser

  • Scoring module

  • Controller

Each module performs a specialized task in the reasoning process, ranging from preparing the prompt to validating and scoring the generated thoughts. The Controller coordinates these modules and also houses two key elements: the Graph of Operations (GoO) and the Graph Reasoning State (GRS), which further assist in managing the LLM reasoning process.
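What a graph adds over a tree is aggregation: one thought can have several parents. The sketch below wires a tiny, fixed graph of operations for sorting (split, sort each part, merge, score), similar in spirit to the sorting tasks used to evaluate GoT. The operation names and the scoring rule are our own simplifications, and `transform_sort` is a hypothetical stand-in for an LLM call.

```python
import heapq
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Thought:
    content: List[int]
    parents: List["Thought"] = field(default_factory=list)  # incoming edges in the graph
    score: float = 0.0


def generate_split(t: Thought, parts: int) -> List[Thought]:
    """Generate: branch one thought into several sub-problems."""
    size = math.ceil(len(t.content) / parts)
    return [Thought(t.content[i:i + size], parents=[t]) for i in range(0, len(t.content), size)]


def transform_sort(t: Thought) -> Thought:
    """Hypothetical stand-in for an LLM call asked to sort a short sublist."""
    return Thought(sorted(t.content), parents=[t])


def aggregate_merge(thoughts: List[Thought]) -> Thought:
    """Aggregate: combine several thoughts into one node with multiple parents."""
    return Thought(list(heapq.merge(*(t.content for t in thoughts))), parents=list(thoughts))


def score_sorted(t: Thought, original: List[int]) -> float:
    """Scoring: penalize out-of-order neighbours and missing or extra elements."""
    disorder = sum(1 for a, b in zip(t.content, t.content[1:]) if a > b)
    missing = sum((Counter(original) - Counter(t.content)).values())
    extra = sum((Counter(t.content) - Counter(original)).values())
    t.score = -(disorder + missing + extra)
    return t.score


# A tiny, fixed graph of operations: split -> sort each part -> merge -> score.
root = Thought([5, 1, 4, 8, 2, 7, 3, 6])
parts = [transform_sort(p) for p in generate_split(root, 2)]
result = aggregate_merge(parts)
score_sorted(result, root.content)
print(result.content, result.score, len(result.parents))
```

In the framework's terms, the fixed sequence of calls at the bottom plays the role of a Graph of Operations. The `Thought` objects and their parent links form the reasoning state that the Controller would track.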

September 2023: Algorithm-of-Thoughts (AoT)

The authors argue that while CoT generally improves the coherence of solution paths, it's prone to errors and biases. They propose an experimental framework to tackle this issue that relies heavily on search algorithms like Depth-First Search (DFS) and Breadth-First Search (BFS).

Core Concepts:

  • Algorithmic Reasoning Pathways: This strategy teaches LLMs to think more like algorithms. It improves the model's reasoning abilities by guiding it through well-defined logical steps.

  • In-Context Learning: Instead of halting and re-prompting the LLM at every step of the search, this method shows the whole search process in its in-context examples. The model can then reason in a continuous, fluid manner within a single generation, which reduces computational overhead and costs.

  • Single or Few Queries: One of the big wins is the reduction in the number of queries needed for reasoning. This saves time, money, and computational resources.
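Here is a minimal sketch of the idea on a toy subset-sum problem rather than the paper's benchmarks. We write out a complete depth-first search, including pruned branches and backtracking, as an in-context example. A single query can then ask the model to search the same way on a new problem. The trace format and helper names are hypothetical.

```python
from typing import List, Tuple


def dfs_trace(numbers: List[int], target: int) -> Tuple[List[str], bool]:
    """Record a depth-first search, with pruning and backtracking, for a subset summing to target.
    Assumes positive integers, so any partial sum above target can be pruned."""
    lines: List[str] = []

    def dfs(start: int, chosen: List[int], total: int) -> bool:
        if total == target:
            lines.append(f"{chosen} sums to {target}: found.")
            return True
        for i in range(start, len(numbers)):
            step = chosen + [numbers[i]]
            if total + numbers[i] > target:
                lines.append(f"{step} exceeds {target}: prune.")
                continue
            lines.append(f"try {step} (sum {total + numbers[i]})")
            if dfs(i + 1, step, total + numbers[i]):
                return True
            lines.append(f"backtrack from {step}")
        return False

    found = dfs(0, [], 0)
    return lines, found


def build_aot_prompt(demo_numbers, demo_target, new_numbers, new_target) -> str:
    """One prompt = a fully worked search trace + the new problem, sent as a single query."""
    trace, _ = dfs_trace(demo_numbers, demo_target)
    return (
        f"Find a subset of {demo_numbers} that sums to {demo_target}.\n"
        + "\n".join(trace)
        + f"\n\nFind a subset of {new_numbers} that sums to {new_target}.\n"
    )


print(build_aot_prompt([8, 5, 4, 3], 7, [9, 6, 2, 1], 8))
```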

October 2023: Skeleton-of-Thought (SoT)

Researchers have pushed LLM generation further with the "Skeleton-of-Thought" (SoT) approach, a method that reimagines how LLMs generate text. Instead of building responses strictly sequentially, SoT uses parallelism, mainly to speed up generation, and the authors report that it can also improve answer quality on some question types. At its core, SoT creates a concise "skeleton" of the answer, then fills in the details in parallel, mimicking the organized way humans think.

Key Components

  • Skeleton Stage: Utilizing a template, the LLM crafts a skeletal response to the user's question, from which key points are extracted.

  • Point-Expanding Stage: These points are then expanded upon in parallel, leading to a more detailed final answer.

Parallel decoding not only boosts computational efficiency but also significantly reduces end-to-end latency.
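Here is a minimal sketch of the two stages using Python's `concurrent.futures`. It assumes an API-based model where each point expansion is an independent request. The prompt templates are paraphrases, not the paper's exact wording, and `fake_llm` is a hypothetical stand-in for the API call.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Paraphrased prompts; the paper's actual templates are longer.
SKELETON_PROMPT = (
    "Write only a numbered skeleton (3-5 points, a few words each) for answering the question.\n"
    "Question: {question}\nSkeleton:"
)
EXPAND_PROMPT = "Question: {question}\nSkeleton point: {point}\nExpand this point in 1-2 sentences:"


def parse_points(skeleton: str) -> List[str]:
    """Extract the text of lines like '1. Some point'."""
    return [m.group(1).strip() for m in re.finditer(r"^\s*\d+\.\s*(.+)$", skeleton, flags=re.MULTILINE)]


def skeleton_of_thought(llm: Callable[[str], str], question: str, max_workers: int = 4) -> str:
    # Skeleton stage: one sequential call produces the outline.
    points = parse_points(llm(SKELETON_PROMPT.format(question=question)))
    # Point-expanding stage: independent calls run concurrently; map() keeps the original order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(
            lambda p: llm(EXPAND_PROMPT.format(question=question, point=p)), points))
    return "\n".join(f"{i}. {p}: {e}" for i, (p, e) in enumerate(zip(points, expansions), start=1))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an API call, so the sketch runs offline."""
    if prompt.endswith("Skeleton:"):
        return "1. Keep a sleep schedule\n2. Exercise regularly\n3. Limit caffeine"
    point = prompt.split("Skeleton point: ")[1].split("\n")[0]
    return f"(details about '{point.lower()}')"


print(skeleton_of_thought(fake_llm, "How can I sleep better?"))
```

With a locally hosted model, the same idea is usually implemented as batched decoding of the point-expansion requests rather than as threads.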


The Chain-of-Thought paper kicked off a flurry of new prompting methods, each aiming to better mirror how people think. From Self-Consistency to Skeleton-of-Thought, these techniques aren't just incremental improvements; they're part of a bigger push to make machine reasoning more like human thought. We expect further developments in this field and will keep you posted.

While we've dug deep into the ideas that grew out of the Chain-of-Thought paper, the word "chain" shows up well beyond this lineage. Other new ideas and methods, such as chain-of-verification (CoVe) and chain-of-density (CoD), also use the term "chain" but take it in different directions. We'll break down these other "chain" ideas in our next tokens. Stay tuned!

Chain-of-Thought Applications

We've compiled a list of research papers that detail the application of the 'Chain-of-Thought' methodology as a foundational baseline across various domains.

Thank you for reading. You can leave your feedback in the comment section.
