Ren Kishimoto

Institute of Science Tokyo

Research focus

Natural Language Processing (1), Recommendation & Information Retrieval (1), RLHF & Preference Learning (1)

Frequent co-authors

Koichi Tanaka (1), Bushun Kawagishi (1), Yusuke Narita (1), Yasuo Yamamoto (1)

Papers (1)

Mar 19, 2026 · also Cornell, Institute of Science Tokyo, LY Corporation, Meiji University +2

Off-Policy Learning with Limited Supply

Greedy off-policy learning, though optimal in theory, can fail badly when supply is limited, but a simple fix, prioritizing items with high *relative* reward, can restore performance.

Koichi Tanaka, Ren Kishimoto, Bushun Kawagishi +4

Natural Language Processing, Recommendation & Information Retrieval, RLHF & Preference Learning
