2025 Year in Review

2025 was my first full year as an RA in Hangzhou after finishing my undergraduate studies. Judging by the key metric, it was not a smooth year: all of my submitted papers were rejected. Meanwhile, technology continued to move faster than expected, and my technical debt kept growing. It also became harder to predict how different research areas would evolve. As Prof. Zhang put it, “while people are still debating the ending of 3D, some fields have already quietly collapsed in an unnoticed corner, without much controversy.”

I started working on generative models around the end of 2023, so it has been about two years. At the beginning, during my internship in industry, I worked on many applications of generative models. After returning to school as an RA, I gradually moved toward more fundamental problems, that is, from application research to fundamental research. There was inevitably some discomfort during the transition, but in retrospect, it was valuable. Fundamental research requires defining problems more actively, and it has also taught me not to work on problems that appear self-consistent but lack clear validation criteria.

The pace of technical progress in 2025 made me realize that some directions I was familiar with were not necessarily suitable for an academic setting, or at least were not fully aligned with my current research preferences. For example, controllable generation, which I had spent considerable time on, has already been redefined by newer general-purpose models and product forms. This is the kind of “quietly collapsed” direction mentioned above. Starting in June, I began to adjust my research direction. My most basic criterion for deciding whether a direction is worth investing in is whether its evaluation is objective enough. Either there are widely accepted metrics and evaluation data, namely benchmarks, or it is at least possible to build a stable and reproducible evaluation system that relies as little as possible on unquantifiable subjective judgment.

This is also part of training research taste: learning to judge what remains worth studying even as model capabilities continue to improve. In a sense, this also relates to The Bitter Lesson. Many judgments cannot be formed through reading and discussion alone. Even after one has read the relevant arguments, they may still feel abstract. Only after going through enough concrete projects can one truly build and accept these judgments.

In research, good ideas matter, but consistently producing “good enough” ideas is difficult. Compute is limited, time is limited, and capability is limited. In the past, I was prone to the illusion that if I kept thinking and kept reading papers, a good idea would naturally appear. In reality, producing a paper often involves a degree of contingency. It requires continuous exploration in an unknown space. Exploration is hard to fully explain with a fixed methodology. In simple terms, it means trying things repeatedly and building understanding through those attempts.

This process of exploration depends heavily on engineering ability. Engineering ability is also one of the few abilities that is relatively measurable and can be continuously improved through training. It shows in whether the research problem can be defined more precisely, whether the experimental tools are good enough to iterate on different ideas, and whether the experiments are truly reproducible rather than accidental. Many moments of “lacking ideas” come, in essence, from not actually validating things, and from an experimental environment and system too messy to tell what problem is being solved. In other words, it is a form of laziness in action.

In 2026, I need to focus on more important problems, improve my engineering ability, and make my research clearer. I have also set a concrete criterion for myself: after completing a project, I should be able to produce a directly reusable handbook or guideline that explains the development of the related research line and shows how the final method is derived step by step from the baseline. The point is not simply to open-source the work. It is to provide a stable, controllable, and continuously extensible evaluation tool for future research: others can reproduce it, and I can use the same benchmark to quickly verify whether the next change is real progress or self-deception.

There was a saying at the beginning of 2026 that we are about to enter the summer of the 21st century. Looking back at 2025, the year still had clear gains, especially in expanding my map of Earth Online.

Figure 1. 2025 trip in the US and Japan.