Surpassing GPT-5 and Gemini Deep Research: the AI financial analyst from Renmin University's Gaoling School is proficient in data analysis, chart creation, and research report writing

An AI financial analyst that can automatically retrieve data, write analyses, and generate professional financial charts is here!

Recently, the Gaoling School of Artificial Intelligence at Renmin University of China proposed a multimodal research report generation system for real-world financial investment research scenarios: Yulan-FinSight.

To meet a user's research needs, FinSight automatically breaks the task down, collects heterogeneous data (stock prices, financial reports, and news) from the Internet and financial databases, and generates illustrated reports of 10,000 words or more with chapters such as "Development History," "Core Business Architecture," and "Competitive Landscape."

The system also won first place among 1,289 teams in the Challenge Group of the AFAC 2025 Financial Intelligence Innovation Competition, and surpassed GPT-5 w/Search, OpenAI Deep Research, and Gemini-2.5-Pro Deep Research in multiple evaluations, demonstrating financial analysis and writing capabilities close to those of human experts.

Let's look at the details below.

Why can't general AI produce good financial research reports?

Researchers believe the key issue is not that the model "cannot write," but that research reports in the financial industry are highly structured, logically rigorous, and visually compelling expert-level work involving multiple processes.

Compared to general question answering, retrieval, or text generation tasks, financial investment research places higher demands on data integration capabilities, analytical depth, and expression formats.

Specifically, existing general-purpose AI systems mainly face three challenges:

1. The disconnect between domain knowledge and data:

General-purpose search systems struggle to effectively integrate structured financial data such as stock prices and financial statements with unstructured information like news and announcements. Due to the lack of a unified data representation and multi-agent collaborative analysis mechanism, these systems often only perform superficial processing on single information sources, making it difficult to generate systematic financial insights.

2. Lack of professional-grade visualization capabilities:

Financial research reports rely heavily on charts to convey dense information, but existing models can only generate static images or simple line charts, which cannot meet professional financial visualization needs such as multi-dimensional comparison and event annotation. They also lack strict data-consistency constraints between text and charts, so a chart may be unrelated to the surrounding text or even contradict it.

3. Lack of "iterative research" capability:

Most systems still use a fixed "search first, generate later" process, and once the research path is determined, it is difficult to adjust.

In contrast, human analysts continuously adjust their research focus based on intermediate findings, and this ability to change strategy dynamically is precisely what existing general-purpose AI systems lack.

FinSight's core idea: Work like a financial analyst

To overcome these limitations, FinSight did not simply "pile up models," but instead started with cognitive processes, simulating the working methods of human financial experts, and proposed three key technological innovations.

Core Architecture: Code-driven, variable-memory intelligent agent architecture

At its core, FinSight employs a novel multi-agent architecture called Code Agent with Variable Memory (CAVM).

As shown in the figure, the existing Agent architecture is still essentially limited by the conversational memory paradigm, which uses history such as messages or task progress as the state carrier. This paradigm is prone to revealing structural bottlenecks in expressiveness and controllability as task complexity and process length increase.

CAVM restructures this paradigm into a code-driven variable memory space. The system no longer uses natural language dialogue as a collaboration medium, but instead maps data, tools, and intermediate inference results into readable and writable program variables, with multiple Code Agents completing collaborative inference by sharing the variable space.

By elevating "memory" from a message sequence to an operable variable structure, CAVM enables complex tasks to be explicitly modeled, continuously revised, and modularly combined, providing the necessary structural support for long-term, multi-process expert-level reasoning.

In this design, data, tools, and agents are uniformly abstracted into a programmable variable space:

Financial statements, market data, and news texts serve as data variables.

Search, analysis, and charting capabilities serve as tool variables.

Agents with different functions are scheduled and coordinated through Python code.

This "code-centric" design enables the system to efficiently process large-scale heterogeneous financial data and support complex multi-process task collaboration.
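The idea of a shared variable space can be illustrated with a minimal sketch. This is not FinSight's actual code: the `VariableSpace` class, the `growth_rate` tool, and the revenue figures are all hypothetical, and the "agent-written" code is hard-coded where a real system would have a model generate it.

```python
# Hypothetical sketch of a shared variable space; names and numbers are
# illustrative assumptions, not FinSight's API or data.
from typing import Any, Dict, List


class VariableSpace:
    """A shared namespace holding data, tools, and intermediate results."""

    def __init__(self) -> None:
        self.vars: Dict[str, Any] = {}

    def put(self, name: str, value: Any) -> None:
        self.vars[name] = value

    def run(self, code: str) -> None:
        # Execute agent-written code against the shared namespace, so any
        # variable the code assigns becomes visible to later agents.
        exec(code, self.vars)


def growth_rate(series: List[float]) -> float:
    """Toy analysis tool: relative change from the first to the last value."""
    return (series[-1] - series[0]) / series[0]


space = VariableSpace()
space.put("revenue", [100.0, 120.0, 150.0])  # data variable (made-up numbers)
space.put("growth_rate", growth_rate)        # tool variable

# Code an "analysis agent" might emit; hard-coded here for illustration.
space.run("revenue_growth = growth_rate(revenue)")
# Code a later "writing agent" might emit, reading the earlier result.
space.run("summary = f'Revenue grew {revenue_growth:.0%} over the period.'")

print(space.vars["summary"])
```

Because both snippets run against the same namespace, the second agent reads `revenue_growth` directly instead of re-parsing it from a chat transcript, which is the contrast with message-based memory that the article draws. Executing model-generated code with `exec` would need sandboxing in any real deployment.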

Visual Breakthrough: Iterative Visual Enhancement Mechanism

To address the common issues of professionalism and credibility in financial chart generation, the researchers propose an Iterative Vision-Enhanced Mechanism, which models the drawing process as an iteratively optimized visual generation problem.

This mechanism adopts the Actor-Critic collaborative paradigm:

The text model, acting as the Actor, is responsible for generating compilable and executable drawing code, fully leveraging its strengths in code generation and logic control. The vision-language model, acting as the Critic, examines the rendered image directly and provides feedback on dimensions such as data integrity and overall aesthetics.

The key to this design lies in complementary strengths: language models excel at coding and reasoning but struggle to obtain realistic visual feedback, while vision models have strong perception and discrimination capabilities but are limited in generating complex code.

By decoupling the two and placing them in a closed loop, the system continuously optimizes itself through multiple rounds of "generation-evaluation-correction" at test time, so that the drawing quality improves with the number of iterations.
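The "generation-evaluation-correction" loop can be sketched as follows. This is a simplified illustration under stated assumptions: in the system described above the Actor emits plotting code and the Critic inspects the rendered image, whereas here a plain dictionary stands in for the chart, and `actor`, `critic`, and `refine` are hypothetical rule-based stand-ins for the two models.

```python
# Hypothetical sketch of an actor-critic refinement loop. A dict stands in
# for the chart; both "models" are simple rules, not real LLM or VLM calls.
from typing import Dict, List, Tuple

ChartSpec = Dict[str, object]


def critic(spec: ChartSpec) -> List[str]:
    """Stand-in for the vision-language model: list the visible problems."""
    issues = []
    if not spec.get("title"):
        issues.append("missing title")
    if not spec.get("dual_axis"):
        issues.append("missing second axis")
    if not spec.get("annotations"):
        issues.append("missing event annotation")
    return issues


def actor(spec: ChartSpec, feedback: List[str]) -> ChartSpec:
    """Stand-in for the text model: fix the first reported issue."""
    revised = dict(spec)
    if not feedback:
        return revised
    issue = feedback[0]
    if issue == "missing title":
        revised["title"] = "Revenue vs. Net Margin"
    elif issue == "missing second axis":
        revised["dual_axis"] = True
    elif issue == "missing event annotation":
        revised["annotations"] = ["product launch"]
    return revised


def refine(spec: ChartSpec, max_rounds: int = 5) -> Tuple[ChartSpec, List[int]]:
    """Alternate critique and revision until no issues remain."""
    issue_counts = []
    for _ in range(max_rounds):
        feedback = critic(spec)
        issue_counts.append(len(feedback))
        if not feedback:
            break
        spec = actor(spec, feedback)
    return spec, issue_counts


final_spec, issue_counts = refine({"series": ["revenue", "net_margin"]})
print(issue_counts)  # number of open issues seen in each round
```

The `max_rounds` cap is the knob that trades extra compute for quality, which is what makes the loop a form of test-time scaling.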

Ultimately, the system can reliably generate professional financial charts with dual-axis alignment, event annotation, and complex structures, as shown in the figure. What used to be a static, one-shot result becomes a test-time scaling process.

Two-stage writing framework: first analyze, then write

At the writing level, FinSight does not attempt to generate a complete long research report in one go, but instead restructures research report writing into a two-stage process of "analysis-integration".

First, the system generates a set of Chains-of-Analysis (CoA). Each chain corresponds to a specific sub-task (such as company history, financial analysis, competitor analysis, or risk factors) and, within that local scope, collects evidence, makes key judgments, and extracts core conclusions.

This step is necessary because a research report is often composed of multiple coupled sub-problems. If a long article is generated directly end-to-end, it is difficult to ensure the accuracy and depth of all analyses.

Subsequently, the system uses these CoAs as a "skeleton" to organize and arrange the scattered insights at the global level, generate an outline, and write chapters one by one: while ensuring the coherence of the chapter structure and the chain of argumentation, it aligns the textual description, data citations, and chart presentations, and finally synthesizes them into a logically consistent long report.
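The two stages can be sketched roughly as below. The `ChainOfAnalysis` dataclass, the `analyze` and `write_report` functions, and the placeholder facts are hypothetical; a real system would call a language model where this sketch uses string formatting.

```python
# Hypothetical sketch of "analyze first, then write". The data structure and
# function names are illustrative assumptions, not FinSight's implementation.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChainOfAnalysis:
    subtask: str
    evidence: List[str]
    conclusion: str


def analyze(subtask: str, facts: Dict[str, List[str]]) -> ChainOfAnalysis:
    """Stage 1: gather evidence and draw a conclusion for one sub-task."""
    evidence = facts.get(subtask, [])
    conclusion = f"{len(evidence)} supporting finding(s) for {subtask}"
    return ChainOfAnalysis(subtask, evidence, conclusion)


def write_report(chains: List[ChainOfAnalysis]) -> str:
    """Stage 2: build an outline from the chains, then write each chapter."""
    outline = [chain.subtask.title() for chain in chains]
    sections = []
    for number, chain in enumerate(chains, start=1):
        body = " ".join(chain.evidence)
        sections.append(
            f"{number}. {chain.subtask.title()}\n{body}\n"
            f"Takeaway: {chain.conclusion}."
        )
    return "Outline: " + "; ".join(outline) + "\n\n" + "\n\n".join(sections)


# Placeholder facts, invented purely for the example.
facts = {
    "company history": ["Founded as a regional supplier.", "Listed later."],
    "financial analysis": ["Revenue rose in each of the last three years."],
}
chains = [analyze(subtask, facts) for subtask in facts]
report = write_report(chains)
print(report)
```

Each chain is produced independently, so a weak analysis of one sub-task can be redone without regenerating the whole report; the writing stage only arranges and connects conclusions that already exist.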

This "analyze first, then write" strategy effectively avoids the common problem of loose logic in long articles, ensuring that reports remain structurally clear and deeply argued even when they exceed 20,000 words.

To further ensure the accuracy of facts and consistency between text and graphics in the long research report, the authors also introduced a generative retrieval mechanism during the writing stage.

Unlike the traditional pipeline of "retrieve first, then generate," this method embeds retrieval in the writing itself: when generating a specific paragraph, the model dynamically generates index identifiers for data and images based on the current analysis chain and writing context, and a post-processing step then resolves these identifiers and embeds the referenced items in a uniform way.

This design protects citation accuracy and text-image consistency as far as possible.
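A rough sketch of the post-processing half of this mechanism follows. The `[[data:...]]` and `[[fig:...]]` placeholder syntax, the `REGISTRY` contents, and the `resolve` function are assumptions made for illustration; the article does not specify the identifier format.

```python
# Hypothetical sketch: the writer emits index identifiers inline, and a
# post-processing pass resolves them. Syntax and names are assumptions.
import re
from typing import Dict, List, Tuple

# Items collected earlier in the pipeline (made-up values).
REGISTRY: Dict[str, str] = {
    "data:rev_growth": "50%",
    "fig:revenue_trend": "Figure 1 (revenue trend)",
}

PLACEHOLDER = re.compile(r"\[\[((?:data|fig):[a-z_]+)\]\]")


def resolve(draft: str, registry: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace known identifiers and report any that cannot be resolved."""
    missing: List[str] = []

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key in registry:
            return registry[key]
        missing.append(key)
        return match.group(0)  # leave unknown identifiers visible

    return PLACEHOLDER.sub(substitute, draft), missing


draft = (
    "Revenue grew [[data:rev_growth]], as [[fig:revenue_trend]] shows; "
    "margins are in [[fig:margin_trend]]."
)
text, missing = resolve(draft, REGISTRY)
print(text)
print(missing)
```

Because the draft only ever contains identifiers, a number or figure can appear in the final text only if it exists in the registry, and unresolved identifiers are surfaced rather than silently filled in.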

In this way, FinSight can continuously align textual narratives, data sources, and visualization results during the writing process of long reports, avoiding common problems such as factual mismatch and disconnect between text and graphics. As a result, it can maintain the stability and consistency of the overall logic and chain of evidence even as the length of the report continues to expand.

Experimental Results: Completely Surpasses Existing Deep Research Systems

The authors conducted a systematic evaluation of FinSight on high-quality benchmarks covering both corporate and industry research.

The results show that FinSight significantly outperforms Gemini-2.5-Pro Deep Research and OpenAI Deep Research in all three core metrics (factual accuracy, analytical depth, and presentation quality), achieving a comprehensive score of 8.09.

In terms of visualization, thanks to its iterative visual enhancement mechanism, FinSight achieved a score of 9.00, significantly outperforming the compared systems and demonstrating an effective improvement in the ability to generate professional financial charts.

The results of the iterative plotting analysis are equally impressive.

In long text generation scenarios, the research reports generated by the system have an average length of over 20,000 words, include more than 50 charts and structured data references, and the quality of the reports remains stable as the length increases, without significant degradation.

Furthermore, in the AFAC 2025 Financial Intelligence Innovation Competition, FinSight ranked first among 1,289 participating teams from enterprises and universities, winning the championship in Challenge Group Question 4, further validating its practicality and robustness in real-world scenarios.

Researchers believe that FinSight is not just a financial tool, but demonstrates the potential of agent architecture in highly complex vertical fields.

By unifying data, tools, and intelligent agents, and introducing a multi-stage closed loop of vision and writing, the AI system has demonstrated near-human analyst capabilities for the first time in the "expert-intensive" scenario of financial investment research.

The significance of this paradigm extends beyond finance.

This indicates that in "expert-intensive" scenarios that rely heavily on specialized knowledge, long-horizon reasoning, and multimodal presentation, AI systems are no longer merely information aggregators, but are beginning to take on roles similar to those of human experts: breaking down problems, verifying hypotheses, revising conclusions, and ultimately producing a complete and traceable result.

From this perspective, FinSight is more like a starting point.

As the Agent architecture continues to mature, complex fields such as scientific research analysis, legal judgment, and medical decision-making may gradually usher in a new generation of productivity centered on expert-level AI Agents.

Paper and project authors: Renmin University of China, Gaoling School of Artificial Intelligence: Jiajie Jin, Yuyao Zhang, Yimeng Xu, Hongjin Qian, Yutao Zhu, Zhicheng Dou

Paper link: https://arxiv.org/abs/2510.16844

Code link: https://github.com/RUC-NLPIR/FinSight

This article is from the WeChat official account "Quantum Bit", authored by the FinSight team, and published with authorization from 36Kr.
