The race in generative AI has gained immense traction and is arguably at its peak. This year, OpenAI unveiled several noteworthy innovations, including Sora, GPT-4o, and the o1 series, captivating the tech community across the globe. Not far behind, companies like Runway have made significant strides with their latest image model, Frames. Meanwhile, Midjourney is gearing up to showcase version 7 of its renowned model, and Claude 3.5 is set for an upgrade. On the hardware front, NVIDIA has announced its latest AI audio model, Fugatto.
The advancements are not confined to international giants; Asian tech firms are making waves as well. Companies such as ByteDance, Baidu, and Tencent have reported significant developments in their large models, with a focus on using these technologies to enhance cloud services and create added value.
As the momentum intensifies, the landscape for start-ups dedicated to large models is also shifting rapidly. A vivid illustration is StepFun, a company focused on developing Artificial General Intelligence (AGI) models. On November 27, 2024, StepFun quietly began internal testing of Step-Video, a video generation model that users can apply to try through the company's "Leap Inquiry" website. Development of the second version of this model is already underway.
This low-profile but ambitious start-up has launched at least six foundation models within just eight months, establishing a strong presence on the international stage. Within a single week, its multimodal understanding model, Step-1V, and its trillion-parameter language model, Step-2, secured places near the top of global evaluations. Both made headlines in authoritative assessments such as the LMSYS Chatbot Arena and LiveBench, where they led among Chinese models.
In these evaluations, Step-1V matched the performance of Gemini-1.5-Flash-8B-Exp-0827 in the LMSYS Chatbot Arena.
Meanwhile, Step-2's results came close to those of OpenAI's o1-mini-2024-09-12 and surpassed other mainstream international models such as gpt-4o-2024-08-06. Notably, Step-2 is the only Chinese language model to appear in the top ten of these rankings.
As December 1, 2024 approaches, the tech world is marking the two-year anniversary of the AI chatbot ChatGPT, which sparked fresh global enthusiasm for the development of AI models. Reports indicate that the total number of AI models worldwide has soared to 1,328, with China accounting for 36%, which places it among the leaders of the industry.
Competition in the AI model market is becoming increasingly fierce. Among the players, start-ups have often taken the lead. StepFun in particular, founded only in April 2023, has built an edge in comprehensive technical capabilities within a span of just 600 days.
The company has introduced the Step series, a broad array of models that covers everything from understanding to generation, across both text and multimodal tasks. The Step-1 model has 100 billion parameters and outperforms GPT-3.5 in several areas, including logical reasoning and knowledge in both Chinese and English.
The multimodal Step-1V model has achieved performance equivalent to GPT-4V by accurately interpreting and describing various forms of information in images, which paves the way for tasks such as content creation, logical reasoning, and data analysis. Step-2, with a trillion parameters, is distinguished as the first release from a start-up to leverage the Mixture-of-Experts (MoE) architecture, and it is aimed at pushing the depth of model intelligence further.
This model excels at creative language generation while maintaining tight control over details, which allows for improved understanding of and adherence to human instructions.
Step-1.5V iterates on Step-1V, enhancing its multimodal understanding capabilities, which now encompass interpreting and generating video content. Lastly, the Step-Video video generation model stands out for its ability to transform text into video, efficiently producing 10-second 1080P clips and marking another significant milestone.
Given these advancements, StepFun is evidently a formidable player among the "six small tigers" of large models, and it has gained a reputation for strong multimodal model technology. Its founder and CEO, Jiang Daxin, outlines an ambitious trajectory towards AGI: starting from single-modality systems, moving to multimodal systems, then to a unified understanding and generation model, and finally to a world model that drives towards AGI.
Jiang emphasizes that integrating multimodal understanding and generation is crucial to truly building a world model. In his view, this paves the way for embodied intelligence and ultimately for AGI, enhancing societal capabilities and economic value.
Research firm IDC predicts that by 2028, global spending on AI technologies may reach $632 billion, more than doubling current expenditure and implying a compound annual growth rate (CAGR) of 29% over the next five years. The explosive growth of generative AI is projected to be a significant contributor to this boom, attracting an estimated $202 billion in investment, or a 32% share of total AI expenditure.
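The arithmetic behind these forecast figures can be checked with a short sketch. The dollar amounts and the 29% rate are taken from the text; the function names are illustrative, and the assumption that growth compounds annually over exactly five years is ours, not IDC's stated methodology.

```python
# Figures quoted in the article, in billions of US dollars.
TOTAL_AI_SPEND_2028 = 632.0
GENAI_SPEND_2028 = 202.0
CAGR = 0.29
YEARS = 5  # assumption: annual compounding over exactly five years


def share(part: float, whole: float) -> float:
    """Fraction of the whole represented by the part."""
    return part / whole


def growth_multiple(rate: float, years: int) -> float:
    """Total growth factor implied by compounding `rate` for `years` periods."""
    return (1.0 + rate) ** years


def implied_base(final_value: float, rate: float, years: int) -> float:
    """Starting value that reaches `final_value` under the given compounding."""
    return final_value / growth_multiple(rate, years)


genai_share = share(GENAI_SPEND_2028, TOTAL_AI_SPEND_2028)
multiple = growth_multiple(CAGR, YEARS)
base = implied_base(TOTAL_AI_SPEND_2028, CAGR, YEARS)

if __name__ == "__main__":
    print(f"Generative AI share of 2028 spending: {genai_share:.1%}")
    print(f"Growth multiple at {CAGR:.0%} over {YEARS} years: {multiple:.2f}x")
    print(f"Implied starting spend: ${base:.0f} billion")
```

Under these assumptions the generative AI share works out to about 32%, matching the quoted figure, while a 29% rate compounded over five full years corresponds to a growth multiple of roughly 3.6. A shorter compounding window would imply a smaller multiple and a larger starting base.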
However, the generative AI industry is still in its infancy. Academic figures such as Gao Wen, an academician of the Chinese Academy of Engineering and a professor at Peking University, liken AGI to a toddler just learning to walk. Yet from a usability standpoint, AI is already adept at addressing important problems in production, society, and services. There is no need to wait for a flawless model; incremental development, enhancement, and iteration are the logical steps forward.
Increasingly, developers and enterprises are harnessing StepFun's array of models to create a variety of AI applications.