The paper introduces CrossAlpha, a public benchmark for cross-market factor research built on annual-report data from five markets (the United States, Japan, Taiwan, South Korea, and Hong Kong). It addresses three obstacles to cross-market analysis: it standardizes heterogeneous filings into English business descriptions, constructs PCA-whitened cross-market firm-pair scores from schema-level disclosures, and aligns those scores with price data so that signals are evaluated under feasible trading-time protocols. In the US-to-Japan setting, disclosure-derived cross-market peers outperform domestic text, industry-code, and return-correlation peers (ICIR 0.39 versus 0.07 to 0.18), and cross-market sources beat the domestic text baseline in most target markets.
Cross-market peer signals derived from annual reports can predict returns better than domestic peer signals, most clearly in the US-to-Japan setting.
Cross-market factor research studies whether firm-level signals from one or more markets can predict returns in a target market, but existing public benchmarks do not support cross-market disclosure-to-return evaluation. Building such a benchmark is challenging because filings differ across languages and regulatory systems, disclosure-derived similarity can be biased by common reporting components, and cross-market signals must be evaluated under feasible trading-time alignment. We introduce \textbf{CrossAlpha}, a public annual-report benchmark for cross-market factor research. CrossAlpha addresses these challenges through three corresponding components: \emph{Disclosure Distillation}, which standardises heterogeneous filings into ten-category English business descriptions; \emph{Residual Schema Graph Construction}, which builds PCA-whitened cross-market firm-pair scores from schema-level disclosures; and \emph{Timing-Aligned Evaluation}, which pairs the graph with 11 years of daily OHLCV data to construct forward-return labels under feasible cross-market execution protocols. CrossAlpha covers about 3,600 firms and 10,700 firm-year reports from the United States, Japan, Taiwan, South Korea, and Hong Kong, and releases about 19M directed firm-pair scores. In experiments, disclosure-derived cross-market peers outperform domestic text, industry-code, and return-correlation peers in the US-to-Japan setting (ICIR 0.39 versus 0.07--0.18), and cross-market sources beat the domestic text baseline in most target markets. CrossAlpha offers an open-sourced, reusable, return-grounded benchmark for cross-market financial NLP.
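The abstract reports results as ICIR without defining it. As a minimal sketch, assuming the common convention that ICIR is the mean of per-period information coefficients divided by their sample standard deviation, and assuming a rank (Spearman-style) IC, the metric could be computed as below. The function names `rank_ic` and `icir` are illustrative and not taken from the CrossAlpha release; the benchmark may use a Pearson IC, average-rank tie handling, or an annualised variant.

```python
import numpy as np


def _ordinal_ranks(x):
    """Return 0-based ordinal ranks of x (ties broken by position; an assumption)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks


def rank_ic(signal, forward_returns):
    """Cross-sectional rank correlation between a signal and forward returns."""
    r_sig = _ordinal_ranks(signal)
    r_ret = _ordinal_ranks(forward_returns)
    return float(np.corrcoef(r_sig, r_ret)[0, 1])


def icir(signals_by_period, returns_by_period):
    """Mean of per-period ICs divided by their sample standard deviation.

    Each argument is a sequence with one cross-section (array of firms) per period.
    """
    ics = np.array(
        [rank_ic(s, r) for s, r in zip(signals_by_period, returns_by_period)]
    )
    return float(ics.mean() / ics.std(ddof=1))


# Toy usage: three periods, four firms each (made-up numbers, not benchmark data).
signals = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
returns = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.1, 0.2, 0.5], [0.4, 0.3, 0.2, 0.1]]
toy_icir = icir(signals, returns)
```

Under this definition, a higher ICIR means the signal's predictive correlation is both larger on average and more stable across periods, which is why it is often preferred to the mean IC alone when comparing peer definitions.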