NodeManager not Reachable: java.io.IOException: Invalid State File Format

WebLogic managed servers cannot be started from the WLS console because Node Manager is not reachable.
In addition, the nodemanager.log file shows the following messages:
FMW\WLS1033\Oracle\Middleware\user_projects\domains\domainName\servers\serveName\data\nodemanager\startup.properties">
<Feb 9, 2011 11:06:01 AM> <WARNING> <There was a problem initializing the domain 'steffworld' at '\FMW\WLS1033\Oracle\Middleware\user_projects\domains\domainName'. Please make sure that this domainName: 'domainName' is registered and is fully enrolled for this nodemanager at: '\FMW\WLS1033\Oracle\Middleware\user_projects\domains\domainName'.>
<Feb 9, 2011 11:06:01 AM> <WARNING> <I/O error while reading domain directory>
java.io.IOException: Invalid state file format. State file contents:
at weblogic.nodemanager.common.StateInfo.load(StateInfo.java:135)
at weblogic.nodemanager.server.ServerMonitor.loadStateInfo(ServerMonitor.java:475)
at weblogic.nodemanager.server.ServerMonitor.isCleanupAfterCrashNeeded(ServerMonitor.java:139)
at weblogic.nodemanager.server.ServerManager.recoverServer(ServerManager.java:255)
at weblogic.nodemanager.server.DomainManager.initialize(DomainManager.java:103)
at weblogic.nodemanager.server.DomainManager.<init>(DomainManager.java:55)
at weblogic.nodemanager.server.NMServer.getDomainManager(NMServer.java:257)
at weblogic.nodemanager.server.Handler.handleDomain(Handler.java:224)
at weblogic.nodemanager.server.Handler.handleCommand(Handler.java:108)
at weblogic.nodemanager.server.Handler.run(Handler.java:70)
at java.lang.Thread.run(Thread.java:619)

Changes

After a power failure, the server machine (with WebLogic running as a Windows service) restarted automatically.

Cause

The state file of the managed server is invalid; for example, it may be empty. Under each managed server directory, the data\nodemanager directory contains a state file named <managed_server_name>.state. If this file is empty or corrupt, the errors described above occur.
For example, under \FMW\WLS1033\Oracle\Middleware\user_projects\domains\domainName\servers\<serverName>\data\nodemanager, the <serverName>.state file is empty.
The cause of the empty file could be one of the following:

  1. When the computer crashed, this particular state file may have been in the process of being updated. For instance, it may have been changing from "RUNNING:Y:N" to "SHUTTING_DOWN:Y:N", and the new line was not written before the machine actually shut down. This could have left an empty file.
  2. A file corruption issue at the OS level.
In both cases, WebLogic Server has no control over the cause and therefore cannot prevent it. The environment has to be recovered after the problem occurs.
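
To see which managed servers are affected, the state files can be checked with a small script. The following is only a sketch: the function names are hypothetical, and the validity rule (three colon-separated fields, the last two being Y or N) is an assumption inferred from the example lines in this note, not a documented format.

```python
from pathlib import Path


def looks_valid(text):
    """Assumed shape, inferred from the examples in this note: STATE:Y|N:Y|N."""
    parts = text.strip().split(":")
    return (
        len(parts) == 3
        and parts[0] != ""
        and parts[1] in ("Y", "N")
        and parts[2] in ("Y", "N")
    )


def find_suspect_state_files(domain_home):
    """Return state files under servers/*/data/nodemanager that are empty or malformed."""
    suspects = []
    pattern = "servers/*/data/nodemanager/*.state"
    for state_file in sorted(Path(domain_home).glob(pattern)):
        # errors="replace" keeps the scan going even if a file holds binary garbage
        text = state_file.read_text(errors="replace")
        if not looks_valid(text):
            suspects.append(state_file)
    return suspects
```

Call find_suspect_state_files with the domain directory (on Windows, a raw string such as r"\FMW\WLS1033\Oracle\Middleware\user_projects\domains\domainName") and review the returned paths before changing anything.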

Solution

We have two options to recover the environment.

OPTION 1:
  1. Edit the <managed_server_name>.state file so that it contains a valid line, such as:
       FORCE_SUSPENDING:Y:N
  2. Save the file, then restart Node Manager. In the WLS console, the Node Manager status should now show:
       Status: Reachable
  3. Repeat the above steps for all managed servers in the domain.
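
If several managed servers are affected, the edit in Option 1 can be scripted. This is a minimal sketch under stated assumptions: repair_state_file is a hypothetical helper, it only touches files that are missing or contain nothing but whitespace (a file with corrupt but non-empty content still needs manual attention), and writing a trailing newline after the line is an assumption.

```python
from pathlib import Path

VALID_LINE = "FORCE_SUSPENDING:Y:N"  # the example line given in this note


def repair_state_file(state_file, line=VALID_LINE):
    """Write `line` into state_file if it has no content; return True if changed."""
    path = Path(state_file)
    current = path.read_text(errors="replace") if path.exists() else ""
    if current.strip():
        # The file already has content; leave it untouched.
        return False
    path.write_text(line + "\n")
    return True
```

Run it once per managed server against <managed_server_name>.state, then restart Node Manager as described above.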
OPTION 2:

  1. Stop the managed server if it is RUNNING.
  2. Stop the Admin Server.
  3. Stop Node Manager.
  4. Delete the following files (paths are relative to the domain directory):
    \servers\<managed_server_name>\data\nodemanager\<managed_server_name>.state
    \servers\<managed_server_name>\data\nodemanager\<managed_server_name>.lck
    \servers\<managed_server_name>\data\nodemanager\<managed_server_name>.pid
  5. Start Node Manager again.
  6. Start the Admin Server again.
  7. Start the managed server using the Admin Console.
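
The file deletion in step 4 of Option 2 can be sketched as follows. The helper name and its dry_run parameter are hypothetical. It assumes the managed server, Admin Server, and Node Manager have already been stopped, and it defaults to a dry run so the affected files can be reviewed first.

```python
from pathlib import Path


def remove_nodemanager_files(domain_home, server_name, dry_run=True):
    """List (and, unless dry_run, delete) the .state, .lck and .pid files of one server."""
    nm_dir = Path(domain_home) / "servers" / server_name / "data" / "nodemanager"
    targets = [nm_dir / f"{server_name}.{ext}" for ext in ("state", "lck", "pid")]
    existing = [p for p in targets if p.exists()]
    if not dry_run:
        for p in existing:
            p.unlink()
    return existing
```

Call it first with the default dry_run=True to see which files exist, then again with dry_run=False to delete them before starting Node Manager.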