Welcome to our new series, AI Infrastructure Unicorns! These companies provide the hardware, software, and services that Generative AI startups need. Even if GenAI someday goes extinct, these infrastructure builders won’t be out of work, because they serve the much larger industry of AI/ML models in general.

Introduction

Next up on our list of AI infrastructure unicorns is Scale AI, which boasts one of the largest valuations in the highly competitive data labeling market. Despite facing early competition from Amazon’s Mechanical Turk and mounting challenges thereafter, it has held its ground, and this year marks its 8th anniversary! Over that time, Scale has raised five funding rounds, the most recent in 2021, which valued it at $7.3 billion, and has seen two secondary market rounds, the most recent in September 2023. Just this week, The Information reported that 'CEO Alexandr Wang last year privately said he wanted to raise funding at a valuation of as much as $14 billion.' That's roughly twice the latest valuation!

Scale AI is also known for playing on every field, from self-driving cars to the military-industrial complex. Wang’s secret seems to be his unusually keen sense for trends: he pivoted and launched new products just slightly ahead of most of his competitors. Let’s take a closer look at Scale AI’s impressive journey of growth and expansion.

  1. Scale AI's beginnings: Starting a company without a clear vision

  2. How Scale AI succeeded

  3. Main pivots + path to Generative AI

  4. Competition

  5. Products and acquisitions

  6. The AI War and How to Win It – scratching the military potential

  7. Scale AI in the present

  8. Scale AI’s mission throughout the years

  9. Funding rounds

  10. Conclusion

Scale AI's beginnings: Starting a company without a clear vision

Alexandr (Alex) Wang and Lucy Guo, the dynamic duo behind what is now known as Scale AI, first crossed paths at Quora, a platform dedicated to the exchange of knowledge through questions and answers. Alex, a mere 18 years old, had already risen to the role of tech lead at Quora, where Lucy, at 21, worked as a product designer with a software development background.

It was 2015 when they envisioned their first startup, a mobile app that would help people find and book doctors’ appointments. “But we couldn’t focus on product dev because we were just calling doctors all day,” recalls Lucy.

An app to book doctors' appointments? One of their roommates joked: "Hah! You should create an API for humans."

Lucy reflected on a challenge they faced in their early startup days: “We wished there were an API that would call the doctors for us every time someone made an appointment in the app.” This need became apparent not just in their venture but also during their experiences at Quora and Snapchat. Both these companies relied on manual content moderation processes for things like handling images and flagged posts.

Their roommate's joke was a revelation. The young co-founders recognized how valuable human annotation and data labeling were for businesses, especially at a time of growing AI potential, when models needed massive amounts of labeled data to be trained on (what’s called supervised learning*). As Accel, the venture capital firm that backed Scale AI's early funding rounds, pointed out, "Alex understood that progress in AI would not hinge on algorithms or technical limitations but on data availability."

*Supervised learning is an ML method where models are trained on labeled data to predict outcomes from inputs.
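To make the footnote concrete, here is a minimal supervised learning sketch in Python: a nearest-centroid classifier that learns one average point per label from a handful of labeled examples, then predicts the label of a new input. The data points and labels are invented for illustration; the labeled pairs play the role of the human-annotated data discussed above.

```python
from math import dist

# Labeled training data: (features, label) pairs. Values are made up for illustration.
TRAIN = [
    ((1.0, 1.2), "cat"),
    ((0.8, 1.0), "cat"),
    ((4.0, 4.2), "dog"),
    ((4.3, 3.9), "dog"),
]


def fit_centroids(examples):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(centroids, features):
    """Predict the label whose centroid is closest (Euclidean distance) to the input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    model = fit_centroids(TRAIN)
    print(predict(model, (0.9, 1.1)))  # cat
    print(predict(model, (4.1, 4.0)))  # dog
```

The model is only as good as its labels: mislabel a few training points and the centroids shift, which is why label quality matters so much in this business.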

Motivated by this insight, and after accepting an initial investment from Y Combinator, the founders launched Scale. It offered developers an API they could plug into an app to automate human-powered tasks, anything from appointment scheduling to more complicated work like content moderation and transcription.

When the startup was just three weeks old, early adopters like Houzz, HigherMe, Hush, RealTalk, and seven others were already testing its services. Major corporations such as P&G, Uber, and Alphabet later became clients. Despite these high-profile clients, Scale initially had no clear focus on specific labeling tasks and accepted a broad range of them, which demonstrated versatility but also posed scalability challenges for the fledgling startup. So how did it succeed in these first steps?

How Scale AI succeeded

AI

2015 was also the year AI took off. "After a half-decade of quiet breakthroughs in artificial intelligence, 2015 has been a landmark year," begins an insightful Bloomberg article covering the market trends of the time. The availability of powerful cloud computing infrastructure, collected data, and cost-effective software development tools played a critical role. Together, they made neural networks, the state of the art in AI at the time, much more accessible and affordable.

Notably, Google open-sourced TensorFlow, and Facebook shared its AI hardware design. That led to rapid uptake by the tech industry's largest companies. Innovations emerged such as Google’s model that mastered Atari games and Microsoft’s new Skype system that automatically translates from one language to another. Elon Musk and Sam Altman unveiled a $1 billion nonprofit called OpenAI. The public was not yet aware of AI's promise, but if you were in the tech world, you just needed to pay attention. Wang was very good at that.

All these companies needed data to train their AI systems. CrowdFlower, which supplies structured data to companies, reported a dramatic uptick in 2015 in the amount of data businesses requested to help them conduct AI research.

Scale API’s advantages

Before Scale API's launch in 2016, companies struggled with data labeling, typically relying on in-house teams or platforms like Amazon's Mechanical Turk. In-house teams were costly and affordable only for large tech companies, while crowdsourcing platforms, though more accessible, often lacked quality control and were fraught with spam, as Y Combinator's Jared Friedman and Scale’s Alexandr Wang noted.

Scale API introduced a game-changing solution by simplifying the data labeling process. Developers could integrate Scale with a simple line of code, enabling on-demand task completion. Scale's system involved routing data to its servers for initial software-based labeling, followed by human contractors for final edits and quality assurance, ensuring high-quality outputs.
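The article doesn't detail how Scale split work between software and people, so the following is only a minimal sketch of one common human-in-the-loop pattern, assuming a model that attaches a confidence score to each pre-label: confident pre-labels are accepted automatically, the rest go to a human review queue, and human corrections overwrite the model's guess. All names (`PreLabel`, `route`, `apply_human_edits`) and the 0.9 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreLabel:
    item_id: str
    label: str         # label proposed by a (hypothetical) model
    confidence: float  # model confidence in [0, 1]


def route(pre_labels, threshold=0.9):
    """Split model pre-labels into auto-accepted items and a human review queue.

    The 0.9 threshold is an illustrative assumption, not a Scale setting.
    """
    accepted, review_queue = [], []
    for p in pre_labels:
        (accepted if p.confidence >= threshold else review_queue).append(p)
    return accepted, review_queue


def apply_human_edits(review_queue, corrections):
    """Apply human corrections (item_id -> label) to queued items.

    Items the reviewer left unchanged keep the model's label; every reviewed
    item is marked with confidence 1.0 because a person has checked it.
    """
    return [
        PreLabel(p.item_id, corrections.get(p.item_id, p.label), 1.0)
        for p in review_queue
    ]


if __name__ == "__main__":
    batch = [
        PreLabel("img-1", "car", 0.97),
        PreLabel("img-2", "pedestrian", 0.55),
        PreLabel("img-3", "truck", 0.92),
    ]
    accepted, queue = route(batch)
    reviewed = apply_human_edits(queue, {"img-2": "cyclist"})
    final = {p.item_id: p.label for p in accepted + reviewed}
    print(final)  # {'img-1': 'car', 'img-3': 'truck', 'img-2': 'cyclist'}
```

The threshold is the key design knob: raising it sends more work to people (higher cost, higher quality), lowering it automates more (cheaper, riskier).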

“On Mechanical Turk, it’s basically a crowdsourced model where anybody can sign up to be a Turker, I think is what they call them. That’s caused quality to be very low as a result. When companies reach a certain scale or have a need for quality, Mechanical Turk doesn’t cut it,” said Alexandr Wang.

What distinguished Scale was its focus on quality, achieved through a rigorous vetting process for its contractors, known as Scalers. Unlike Mechanical Turk's pay-per-task model, Scale paid its workers an hourly rate, significantly higher than what was typically earned on similar platforms. This payment structure rewarded thoroughness over speed.

Scale's service was more responsive and versatile than Mechanical Turk, providing on-demand tasks and a wider range of services, including phone calls. Scale democratized access to quality human labor, allowing companies of all sizes to benefit from a workforce comparable to that of tech giants like Google and Facebook.

Were they really that much better than the others? Maybe, but it’s possible that the people supporting them played a significant role in their success.

Later, Alex Wang admitted that the hardest part for him personally was not market competition but building a team of the best people who “can do things” instead of doing everything himself, and learning how to sell. He also named his mentors: “People in Silicon Valley are incredibly helpful. To name a few: Dan Levine, Mike Volpi, Nat Friedman, Adam D’Angelo, Ilya Sukhar, Jonathan Swanson, Albert Ni, Jeff Arnold, Charlie Cheever, and Drew Houston to name a few. I’m very very lucky.”

It's interesting how often the names of Nat Friedman (ex-CEO of GitHub) and Adam D'Angelo (Quora's founder) come up in news about large companies. These people are very influential, and also fairly young. With the US elections coming up, allow us a note: in the USA, the President must be at least 35 years old. Meanwhile, quite a few of the biggest and most successful companies in this country are run or advised by people younger than that.

Back to Scale 😉 How did Scale manage the burgeoning number of tasks after Y Combinator's first investment, and what strategies did it employ?

Main pivots + path to Generative AI

Riding the self-driving car wave

The following year, in 2017, Scale announced the release of six APIs, including Image Annotation, OCR Transcription, Categorization, Comparison, and Data Collection. Alex Wang remembers: “At the time there were all these opportunities around, a bunch of opportunities around imagery and different kinds of sensory data for computer vision. [At the same time] there are all these other opportunities around sort of more structured data reforms — PDF documents, etc.”

So Scale took a dual-focus strategy, tackling both areas simultaneously, but struggled to gain significant traction in either domain.

The first pivotal moment came with the decision to focus exclusively on imagery, computer vision, and sensor data. This shift proved major for Scale AI, especially given the growing self-driving car market. By focusing on image-related APIs, the company finally found its specialty and began to attract far more attention and success in the industry.

This adaptability has been a key advantage for Scale, demonstrating that the capacity to evolve and refine one's approach in response to both failures and new insights is crucial for sustained progress and innovation.

In 2017, IEEE Spectrum dubbed it "The Year of Self-Driving Cars and Trucks," echoing BCG's 2015 description of the self-driving trend as a "Revolution in the Driver's Seat." Scale AI seized this opportunity. By 2018, when it announced its Series B round, it already had partnerships with leading names in the industry such as GM Cruise, Lyft, Zoox, and nuTonomy, and had labeled over 200,000 miles of self-driving data, close to the distance from the Earth to the Moon (about 239,000 miles on average).

Self-driving vehicle startups were in dire need of vast amounts of high-quality training data, a demand that general-purpose data vendors couldn't meet. Scale AI stepped in to fill this gap, specializing in the intricate task of labeling complex data, such as LiDAR point clouds, more effectively than its competitors. This set them apart as a leader in the field.

In 2018, Lucy Guo left Scale AI, leaving all decisions to Alex Wang.

What if the hype is over?

One Hacker News user posted in 2019: “Scale AI's secret sauce is in labeling lidar point clouds, which is really a necessity for the self-driving car industry. Assuming the self-driving bubble deflates (it will) then Scale AI will suddenly find itself a commodity business as labeling images is not so hard that it can't be easily copied. It will be a race to the bottom. Unless there is another sudden surge of demand for labeling lidar point clouds.”

That’s a logical question. Alex Wang answered in another HN thread:

“Self-driving is one of many applications of AI/ML to the real world, each of which likely requires high-quality labeled data to truly be production-ready. This includes other robotics, self-checkout like Amazon Go, natural language understanding, and more.

Second, self-driving as a problem space will need labels for a very long time. In an application where (1) verifiable model performance is paramount, and (2) the models need to be extremely robust for cars to be safe, the need for labeled data is only magnified.”

Wang backed his position with actions as the company continued to grow. Even before closing its Series C round, which valued the company at over $1B in 2019, Scale proved its bold vision by entering the cutting-edge field of large language models through its work with OpenAI. Scale worked on the dataset behind GPT-2, labeling 1M data points per week! Another big name was Standard Cognition, which builds software to automate the checkout process at retailers, similar to Amazon Go.

As for the uncertainty raised in the comment: according to Bloomberg, there were certainly risks and plenty of competition in the data labeling market. Uber acquired the labeling automation startup Mighty AI, and startups like Hive and Alegion offered similar services.

Alex's ability to build a dream team, along with the support of powerful advisors and investors, was definitely a strong point for Scale. But what also propelled Scale forward, despite many claims that "focusing on data labeling is not a sustainable option," rising competition, and fears that the market would soon be oversaturated, was their technology and experience. Scale's investors always said Wang's tools were more advanced and could label data faster and more cheaply. And they were right.

Scale has built software that looks over the images first. In many cases, it’s able to label most of the objects automatically. Workers are then asked to review the images. If they need to intervene, the system lets them click once somewhere, say, in the middle of a car, and it traces the object for them. Tasks that used to take hours end up taking just a couple of minutes.
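The article doesn't describe the algorithm behind the click-to-trace tool, and a production system would likely rely on learned models. As a much simpler stand-in that illustrates the interaction, here is a classic region-growing (flood-fill) sketch in Python: starting from one clicked pixel, it collects connected pixels of similar intensity and returns the traced region and its bounding box. The function names, the tolerance value, and the toy image are assumptions for illustration.

```python
from collections import deque


def trace_from_click(image, seed, tolerance=10):
    """Grow a region from a clicked pixel (breadth-first flood fill).

    image: 2D list of grayscale intensities (rows of ints)
    seed:  (row, col) of the user's click
    Collects 4-connected neighbors whose intensity is within `tolerance`
    of the seed pixel and returns the set of (row, col) pixels in the region.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    target = image[r0][c0]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - target) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def bounding_box(region):
    """Return (top, left, bottom, right) of a traced region, inclusive."""
    rs = [r for r, _ in region]
    cs = [c for _, c in region]
    return min(rs), min(cs), max(rs), max(cs)


if __name__ == "__main__":
    # 0 = road, ~200 = a "car" blob; values are made up for illustration.
    image = [
        [0, 0, 0, 0, 0],
        [0, 200, 205, 0, 0],
        [0, 198, 202, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    region = trace_from_click(image, (1, 1))
    print(len(region), bounding_box(region))  # 4 (1, 1, 2, 2)
```

The worker supplies a single point, and the software does the tedious boundary work, which is the source of the time savings described above.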


The pandemic and e-commerce

2019 was the year of real hype around Scale AI. It featured in many articles as it reached a $1B valuation and was named to the Forbes AI 50: America’s Most Promising AI Companies, and it lived up to that promise. One year later, in 2020, Scale's finances reached a new milestone: the company broke even while doubling its annualized revenue run rate year-over-year in the third quarter. This growth was further endorsed by a $155 million funding round led by Tiger Global, valuing the company at over $3.5 billion.

Apart from the previously mentioned OpenAI and Standard Cognition, Scale started working with companies in e-commerce (DoorDash), logistics (Flexport), and insurance. Its work with DoorDash, which saw an uptick in demand during the coronavirus pandemic, gave Scale a significant growth spurt. “We’re frankly just trying to keep up,” Wang said.

Competition

Image Credit: Sacra.com

Despite competition from entities like Appen and Amazon's Mechanical Turk, Scale distinguishes itself by delivering high-quality data and leveraging technology to solve problems at a large scale. Amidst the economic downturn triggered by the pandemic, Scale adopted a conservative financial approach, moderating its hiring pace to ensure longevity and continued support for its clients.

Products and acquisitions

Scale AI's robust financial health enabled strategic moves, like the acquisition of Helia AI, a startup specializing in real-time AI applications for video streams. Most importantly, the deal brought on board specialists from prestigious projects, including Tesla's Autopilot, and marked Scale's evolution beyond data labeling toward comprehensive software services by 2021.

A key product named Nucleus emerged from Helia AI’s technology. It let Scale users identify and re-label mislabeled data that could hamper AI performance. This was especially important for "edge cases" (uncommon scenarios that are not well represented in the training data), as explained by Russell Kaplan, a former Tesla senior machine learning engineer who leads the Nucleus team at Scale.
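How Nucleus finds mislabeled data isn't described here, so the following is only a minimal sketch of one widely used heuristic, not Nucleus's actual method: flag items where a trained model confidently predicts a different class than the stored human label, and rank those disagreements so reviewers see the strongest candidates first. The function name, the input format, and the 0.8 cutoff are hypothetical.

```python
from typing import Dict, List, Tuple


def flag_suspect_labels(
    labels: Dict[str, str],
    predictions: Dict[str, Tuple[str, float]],
    min_confidence: float = 0.8,
) -> List[str]:
    """Return item IDs whose stored label disagrees with a confident model prediction.

    labels:      item_id -> human-assigned label
    predictions: item_id -> (predicted label, confidence in [0, 1])
    The 0.8 cutoff is an illustrative assumption.
    """
    suspects = []
    for item_id, label in labels.items():
        pred = predictions.get(item_id)
        if pred is None:
            continue  # no model output for this item
        pred_label, conf = pred
        if pred_label != label and conf >= min_confidence:
            suspects.append(item_id)
    # Most confident disagreements first: the best candidates for re-labeling.
    return sorted(suspects, key=lambda i: predictions[i][1], reverse=True)


if __name__ == "__main__":
    labels = {"f1": "car", "f2": "car", "f3": "stop_sign", "f4": "cone"}
    predictions = {
        "f1": ("car", 0.95),
        "f2": ("truck", 0.91),       # confident disagreement -> flagged
        "f3": ("yield_sign", 0.60),  # disagreement, but model unsure -> skipped
        "f4": ("pedestrian", 0.97),  # confident disagreement -> flagged
    }
    print(flag_suspect_labels(labels, predictions))  # ['f4', 'f2']
```

Low-confidence disagreements like `f3` are often the rare edge cases themselves, so in practice they may deserve a separate review pass rather than being ignored.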

Scale Rapid

Another product introduced by Scale was Scale Rapid. As you might guess from its name, it was created to tackle slow labeling processes: Scale Rapid enabled teams to label a data sample within one to three hours. This further reinforced Scale AI's reputation for fast, reliable, high-quality services. Brad Porter, the company’s CTO, explained that simply having “massive armies of contractors” was no longer enough: a company needs a stable end-to-end workflow that runs fast and doesn't take weeks to set up.

Scale also acquired SiaSearch, a data management platform from the European AI venture studio Merantix. This acquisition enhanced Scale's suite of AI tools, extended its presence in European markets, and enriched the company's technical expertise. SiaSearch, known for its innovative interval querying engine for rapid searches across large video and LiDAR datasets, was integrated into Scale's Nucleus business unit. The deal brought partnerships with leading automakers such as Porsche and Volkswagen, enabling Scale’s expansion in Germany and strengthening its service for European customers.

Scale’s position was also reinforced by the Series E funding amounting to $325 million at a $7 billion valuation. Wang said about it: “When we started Scale nearly five years ago, our mission was to accelerate the development of AI. Today, I’m proud to say that we’re seeing this mission come to life. But we’re still just scratching the surface of the potential that AI has to transform every business and industry.”

The AI War and How to Win It – scratching the military potential

One of the most intriguing aspects of Scale AI's growth has been its collaboration with the U.S. government and military organizations. In 2018, two years after founding Scale, Alex Wang visited China, and the trip shed light on the dual-use nature of AI technologies. He was shown a facial recognition startup's capabilities through a demonstration involving a giant screen that displayed demographic information about individuals entering the lobby, an experience Wang found unsettling.

By 2020, Scale AI had forged a significant partnership with the U.S. Army Research Lab, securing a governmental contract worth $90,865,236. This contract was aimed at creating and refining high-quality annotated datasets crucial for AI and ML development within the Department of Defense. Scale was among 34 companies recognized as small businesses to receive such a contract. Uncovering the details of this collaboration between Scale and the US government required some digging, as coverage was primarily confined to specialized media circles.

Once in the system, Scale continued to work with the US government. In 2022, it was awarded a $249 million contract to supply a broad range of AI technologies to the Defense Department, by which time it already counted the Army, the Air Force, Marine Corps University, and military truck maker Oshkosh among its clients. That year also brought a surge in Scale's visibility, following the $325 million funding round in 2021 that raised its valuation to $7 billion.

Soon Wang felt comfortable openly discussing the intersection of AI and warfare, notably through a blog post titled "The AI War and How to Win It."

His main points were:

  1. “AI will disrupt warfare.

  2. China is currently outpacing the United States.

  3. The US, both the government and AI technologists, need to start acting.

  4. The AI War is at the core of the future of our world. Will authoritarianism prevail over democracy? Do we want to find out?”

Additionally, Scale provided free AI-ready datasets to support Ukraine, offering damage assessments to those in immediate need and sharing structure recognition training datasets with the wider AI community. In 2022, the startup also worked with both the US and Ukrainian governments to glean insights into what was happening in Ukraine by running AI algorithms over satellite data and mapping out the level of damage to 370,000 buildings in major cities – on a day-to-day basis. The insights helped to direct humanitarian and medical resources to where they were needed most.

Coincidentally or not, in 2022, Wang's acquaintance, Mike Gallagher, was appointed chair of the Select Committee on the Chinese Communist Party, a committee before which Wang presented briefings twice in 2023. Reports by Semafor indicated that Scale spent over $1 million on federal lobbying in 2022.

Another factor that likely influenced Scale's government contracts was the Department of Defense's search for a new data labeling vendor for Project Maven, an AI initiative that had faced protests from Google employees. According to multiple sources familiar with the matter, the Defense Department invited Scale to apply for the contract.

In May 2023, Scale became the first AI company to deploy a large language model, akin to ChatGPT, on a classified network after signing a deal with the Army’s XVIII Airborne Corps. The chatbot, dubbed Donovan, is designed to summarize intelligence and expedite commanders' decision-making processes.

Two months later, Wang testified before a House Armed Services Subcommittee, outlining Scale AI’s contributions to US defense and advocating for a comprehensive AI strategy to maintain technological superiority against global competitors.

Scale AI in the present

Wang emphasizes that Scale AI's client base extends well beyond its defense contracts, while presenting the company as a supporter of U.S. leadership in a complex world. He sees AI's power as crucial for both military strength and a robust economy.

  • In 2022, swiftly moving to capitalize on Generative AI's potential, Scale launched its Enterprise Generative AI Platform (EGP) and ventured into synthetic data with Scale Synthetic and AI-generated imagery with Scale Forge.

  • In 2023, Scale AI released its AI Readiness Report, positioning proprietary data as a strategic asset – a good call to action to use Scale AI services.

  • By 2024, Scale partnered with the U.S. Department of Defense to develop an LLM testing framework, aiming to enhance military AI's safety and effectiveness. Additionally, Scale's collaboration with the National Institute of Standards and Technology (NIST) underscores its ongoing involvement in AI regulation.

This February, Forbes surprised us with news of Scale's brief engagement with TikTok, a move at odds with Wang's views on the US-China AI race. The controversial partnership ended swiftly after reports of potential surveillance, protecting Scale's reputation and its future government collaborations.

Scale AI’s mission throughout the years

Scale AI's mission has evolved to facilitate the widespread application of AI technologies across industries. Since its inception, the company has adapted its vision to the changing landscape of AI and ML, consistently focusing on a core objective while expanding its scope and methodologies over time.

  • 2016: Scale AI set out with a vision to automate human-powered processes for companies, foreseeing a future where AI and machine learning would handle 90% of requests, leaving the remainder to humans for high-quality resolution of complex cases.

  • 2018: Scale declared its mission to accelerate AI applications, acknowledging the transformative impact of AI and deep learning across various sectors. That year, it identified obtaining labeled data as the primary bottleneck in AI development.

  • 2021: Scale reiterated its commitment to accelerating AI development, now offering a data-centric, end-to-end solution that covers everything from data annotation to automation and evaluation.

  • 2022: The company focused on removing the data bottleneck in AI development, positioning itself as a key enabler of ambitious AI projects globally.

  • 2023: Scale AI focused on the adoption of Generative AI and the shift from older models to more advanced generative models. Scale's mission remained to empower ambitious AI projects by providing comprehensive tools for data collection, curation, and model optimization.

Funding rounds

Scale API's strategic edge contributed significantly to Y Combinator's decision to accept and fund the company in August 2016. This seed investment launched Scale into the startup world. But the investor who really believed in them was Accel Partners' Dan Levine. When the co-founders explained their vision for Scale, Levine was immediately hooked. He said he had experienced the pain of finding freelancers when he was a data engineer and relied on them to help moderate edits. After that, raising funds wasn’t hard for Scale AI.

Conclusion

All these years, Alex Wang has managed to scale his Scale AI, catering to every industry, be it self-driving cars, the military, or, most recently, GenAI. Wang's power lies in his ability to spot trends and act on them immediately. Such agility in a leader, no matter his age, can transform a company without a concrete idea into a market leader with an evolving mission and a solid strategy. He is not very public but seems accessible. His company delivers, while he works hard to gain influence.

However, some of his approaches raise questions about the depth of Scale AI's commitment to specific markets and the long-term sustainability of a model that might prioritize breadth over depth. As AI continues to evolve, will Scale AI's rapid adaptation strategy keep it at the forefront, or could it lead to overextension and a dilution of expertise? Moreover, the move into highly sensitive areas like military contracts invites scrutiny over ethical considerations and the potential impact on company reputation. Wang's journey with Scale AI is a testament to visionary leadership, yet it also underscores the need for a careful balance between rapid growth and strategic focus in the ever-changing tech landscape.

Thank you for reading, please feel free to share with your friends and colleagues. 🤍
