{"UUID":"e21166ff-b35a-4309-99cd-4a8bf5e22305","URL":"http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Authoritative_Account.html","ArchiveURL":"https://web.archive.org/web/20161230103247if_/http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Authoritative_Account.html","Title":"Mars Pathfinder system resets due to priority inversion","StartTime":"0001-01-01T00:00:00Z","EndTime":"0001-01-01T00:00:00Z","Categories":null,"Keywords":["mars pathfinder","priority inversion","vxworks","flight software","nasa","jpl","real-time operating system","semaphore","spacecraft","system resets","july 1997"],"Company":"NASA","Product":"Mars Pathfinder flight software","SourcePublishedAt":"2025-12-02T16:36:12Z","SourceFetchedAt":"2026-05-04T18:19:40.925248Z","Summary":"NASA's Mars Pathfinder spacecraft experienced system resets a few days after landing on Mars (1997).  Debugging features were remotely enabled until the cause was found: a [priority inversion](https://en.wikipedia.org/wiki/Priority_inversion) problem in the VxWorks operating system.  The OS software was remotely patched (all the way to Mars) to fix the problem by adding priority inheritance to the task scheduler.","Description":"The Mars Pathfinder spacecraft experienced unexpected system resets a few days after its landing in July 1997. These resets occurred when the high-priority `bc_sched` task detected that the `bc_dist` task, responsible for data distribution, failed to complete its execution within its hard deadline. Each reset reinitialized hardware and software, terminating current ground-commanded activities and delaying daily operations, though no collected data was lost.\n\nThe root cause was identified as a priority inversion problem within the VxWorks operating system. A low-priority `ASI/MET` task acquired a mutual exclusion semaphore but was then preempted by several medium-priority tasks before releasing it. 
When the high-priority `bc_dist` task then tried to acquire the same semaphore, it blocked. In effect it was waiting on the much lower-priority `ASI/MET` task, which could not run to release the semaphore while the medium-priority tasks held the processor. `bc_dist` missed its deadline, and the violation triggered the system reset.\n\nThe problem was diagnosed by reproducing the failure in a lab environment, using the debug and trace facilities built into the flight software. Once the failure was reproduced, the priority inversion became evident.\n\nThe fix was a remote patch to the spacecraft's flight software that enabled priority inheritance for the semaphore used by the `select()` mechanism within VxWorks. With priority inheritance enabled, when a high-priority task blocks on a semaphore held by a lower-priority task, the holder temporarily inherits the higher priority, so medium-priority tasks can no longer preempt it before it releases the semaphore.\n\nExtensive ground testing verified the fix and checked for performance impacts or behavioral changes. The remote update sent only the differences between the onboard and desired software versions, and custom software on the spacecraft applied them. The patch stopped the system resets, allowing the mission to continue its scientific objectives."}