EVE Online Stackless Python tasklet memory reuse bug
CCP Games · EVE Online
CCP karkur began investigating an issue in which the drone window in EVE Online would occasionally stop updating. Her initial debugging showed that tasklets were failing to wake up from sleep and were ending up in a state that was neither scheduled nor blocked. A reproduction script was then developed to trigger this “sleeping disorder” reliably.
The problem was traced to a subtle memory reuse bug within the game’s Stackless Python implementation, specifically concerning channel objects used by sleeping tasklets. When a tasklet was killed, its associated channel object’s memory was immediately freed. If another tasklet then went to sleep on the same processing tick, it could inadvertently be assigned the same memory address for its new channel.
This premature memory reuse led to a critical flaw: when the system attempted to clean up the originally killed tasklet, it would mistakenly identify and remove the new tasklet’s channel from the sleeper heap. This left the new tasklet in a permanent “limbo” state, unable to be scheduled or woken up.
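The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not CCP's code: `ToyAllocator`, `simulate_buggy_tick`, the integer "addresses", and the wake times are all hypothetical, and the sleeper heap is modelled as a Python `heapq` list whose cleanup step matches entries purely by channel address.

```python
import heapq


class ToyAllocator:
    """Hypothetical allocator: hands out integer 'addresses' and reuses
    the most recently freed one first, like a simple free list."""

    def __init__(self):
        self._next = 0x1000
        self._free = []

    def alloc(self):
        if self._free:
            return self._free.pop()
        addr = self._next
        self._next += 0x10
        return addr

    def free(self, addr):
        self._free.append(addr)


def simulate_buggy_tick():
    allocator = ToyAllocator()
    sleepers = []         # toy sleeper heap: (wake_time, channel_addr, name)
    pending_cleanup = []  # channel addresses of killed tasklets

    # Tasklet A goes to sleep on a freshly allocated channel.
    chan_a = allocator.alloc()
    heapq.heappush(sleepers, (10.0, chan_a, "A"))

    # A is killed: its channel memory is freed right away, while the
    # removal of its sleeper entry is left for a later cleanup step.
    allocator.free(chan_a)
    pending_cleanup.append(chan_a)

    # Same tick: tasklet B goes to sleep and receives the recycled address.
    chan_b = allocator.alloc()
    heapq.heappush(sleepers, (12.0, chan_b, "B"))

    # Cleanup for A matches by address only, so B's entry is removed too.
    for addr in pending_cleanup:
        sleepers = [entry for entry in sleepers if entry[1] != addr]
        heapq.heapify(sleepers)

    address_reused = chan_a == chan_b
    b_still_sleeping = any(name == "B" for _, _, name in sleepers)
    return address_reused, b_still_sleeping
```

In this model, `simulate_buggy_tick()` reports that the address was reused and that B no longer has an entry in the sleeper heap, which corresponds to the “limbo” state: nothing is left that would ever wake B.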
For players, the impact was that UI elements such as the drone window, overview, and health bars stopped updating, which degraded the gameplay experience. The failures were difficult to reproduce, but players reported them, particularly after large-scale events such as mass tests or heavy combat.
The fix was a targeted code change: the Py_DECREF calls for channel objects were moved into the SleepWallclock function. This keeps channel objects from being recycled prematurely, which prevents the memory address collision and the erroneous removal of active tasklets from the sleeper heap.
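The general principle behind this kind of fix can be sketched with a toy model. It does not reproduce the actual C change in SleepWallclock; it only assumes that the channel's memory is held until the bookkeeping that depends on its address has finished. `ToyAllocator`, `simulate_fixed_tick`, the integer "addresses", and the wake times are hypothetical names and values chosen for illustration.

```python
import heapq


class ToyAllocator:
    """Hypothetical allocator: hands out integer 'addresses' and reuses
    the most recently freed one first, like a simple free list."""

    def __init__(self):
        self._next = 0x1000
        self._free = []

    def alloc(self):
        if self._free:
            return self._free.pop()
        addr = self._next
        self._next += 0x10
        return addr

    def free(self, addr):
        self._free.append(addr)


def simulate_fixed_tick():
    allocator = ToyAllocator()
    sleepers = []         # toy sleeper heap: (wake_time, channel_addr, name)
    pending_cleanup = []  # channel addresses of killed tasklets

    # Tasklet A goes to sleep on a freshly allocated channel.
    chan_a = allocator.alloc()
    heapq.heappush(sleepers, (10.0, chan_a, "A"))

    # A is killed: queue the cleanup, but keep the channel memory alive.
    pending_cleanup.append(chan_a)

    # Same tick: tasklet B goes to sleep. A's address is still held,
    # so the allocator has to hand out a different one.
    chan_b = allocator.alloc()
    heapq.heappush(sleepers, (12.0, chan_b, "B"))

    # Cleanup removes A's entry, and only then releases A's channel.
    for addr in pending_cleanup:
        sleepers = [entry for entry in sleepers if entry[1] != addr]
        heapq.heapify(sleepers)
        allocator.free(addr)

    addresses_differ = chan_a != chan_b
    remaining = [name for _, _, name in sleepers]
    return addresses_differ, remaining
```

Here the two channels get distinct addresses, so the cleanup for A removes only A's entry and B stays in the sleeper heap, ready to be woken.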