by Shailesh K. Mishrah
How login failures in WebLogic can be diagnosed with the help of available debug flags and log files.
February 2014
Authentication in Oracle WebLogic Server can fail for multiple reasons. When failure is consistent in nature (i.e., it happens all the time), it is somewhat easy to debug and to fix if you understand how authentication is performed in WebLogic. However, when failure is intermittent, things get a little tricky. This article explores the debugging that has to be turned on and which log files should be consulted to diagnose intermittent authentication failures, especially when WebLogic is configured with an external system—like Lightweight Directory Access Protocol (LDAP)—for authentication. This article also discusses the scenario in which the user account is soft locked in WebLogic due to such intermittent authentication failures, how we can verify account soft lock, and how we can unlock it.
This article is motivated by a recent customer situation in which Oracle Identity Manager (OIM) APIs started failing intermittently due to authentication failures in WebLogic. It assumes that the reader has a good understanding of WebLogic security concepts and authentication mechanisms. WebLogic version 10.3.6 was used for this article.
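WebLogic uses authentication providers to verify the supplied credentials. The WebLogic Security Framework supports multiple authentication providers for a security realm; how these authentication providers are configured (the value of the Java Authentication and Authorization Service (JAAS) control flag for each provider) can affect the overall outcome of the authentication process. Below are the JAAS control flag values and how they control the overall authentication process (please see the references section for more detail):
REQUIRED: The provider is always called and must succeed. Whether it succeeds or fails, authentication continues down the list of providers.
REQUISITE: The provider must succeed. If it succeeds, authentication continues down the list of providers; if it fails, control returns to the application immediately.
SUFFICIENT: The provider is not required to succeed. If it succeeds, control returns to the application immediately; if it fails, authentication continues down the list of providers.
OPTIONAL: The provider is not required to succeed. Whether it succeeds or fails, authentication continues down the list of providers.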
As the description of the REQUIRED control flag shows, an authentication provider with this flag must succeed, or the end-user authentication fails. The provider can fail even when the supplied credentials are correct, for example because of network issues or unexpected behavior of the external system (e.g., LDAP) that the provider talks to.
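To make the effect of the control flags concrete, the following minimal sketch combines per-provider results into an overall outcome. It models the generic JAAS rules only; it is not WebLogic's implementation, and the function name overall_login and the (flag, succeeded) input format are assumptions made for illustration.

```python
REQUIRED, REQUISITE, SUFFICIENT, OPTIONAL = (
    "REQUIRED", "REQUISITE", "SUFFICIENT", "OPTIONAL")


def overall_login(results):
    """Combine (control_flag, succeeded) pairs, listed in configured
    provider order, into one overall authentication outcome."""
    has_mandatory = False      # a REQUIRED or REQUISITE provider was seen
    mandatory_failed = False   # a REQUIRED or REQUISITE provider failed
    any_success = False
    for flag, succeeded in results:
        if flag in (REQUIRED, REQUISITE):
            has_mandatory = True
            if not succeeded:
                mandatory_failed = True
                if flag == REQUISITE:
                    return False  # a REQUISITE failure stops immediately
        if succeeded:
            any_success = True
            if flag == SUFFICIENT:
                # A SUFFICIENT success stops here; earlier REQUIRED and
                # REQUISITE results still decide the outcome.
                return not mandatory_failed
    if has_mandatory:
        return not mandatory_failed
    # With no REQUIRED or REQUISITE provider, one success is enough.
    return any_success


# Correct credentials, but the REQUIRED LDAP provider cannot reach LDAP.
ldap_first = [(REQUIRED, False), (SUFFICIENT, True)]
# Same providers, with the SUFFICIENT one configured first.
sufficient_first = [(SUFFICIENT, True), (REQUIRED, False)]
print(overall_login(ldap_first), overall_login(sufficient_first))
```

The two sample configurations differ only in provider order, which is why the order of providers matters as much as the flags themselves when an external system is unreliable.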
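The first thing to do when login failures start to appear is to turn on security debugging in WebLogic. This has to be done on every server where a request may land, that is, on all servers configured in the load balancer or in the t3 URL (t3://host1:port1,host2:port2), because the setting is specific to each server. To turn on security debugging, enable the DebugSecurityAtn debug flag for each of these servers; in the Administration Console it is found under Environment > Servers > server name > Debug, in the weblogic > security > atn scope. With the flag enabled, authentication debug records such as the following are written to the server log: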
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044555>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044555>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044555>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044555>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044555>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044556>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044556>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044556>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044556>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044556>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044556>
#### <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <> <1389453044556>
Below is a sample output for a failed login where the security realm is configured with an LDAP authenticator and the failure happened because of an LDAP connection issue:
#### <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <5b0dc9d8a952b6d1:-1eccb494:13f1505b73b:-8000-000000000001c5b1> <1370498845643>
#### <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <5b0dc9d8a952b6d1:-1eccb494:13f1505b73b:-8000-000000000001c5b1> <1370498845644>
#### <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <5b0dc9d8a952b6d1:-1eccb494:13f1505b73b:-8000-000000000001c5b1> <1370498845673>
#### <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <5b0dc9d8a952b6d1:-1eccb494:13f1505b73b:-8000-000000000001c5b1> <1370498845673> <[Security:090294]could not get connection>
Unfortunately, the reasons for failure are not always this clear. For example, the following output (from an intermittent login failure case where the security realm is configured with an LDAP authenticator) does not clearly reveal what is going wrong:
#### <[ACTIVE] ExecuteThread: '246' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <1381164261368> <[Security:090295]caught unexpected exception>
.................................................................................................
#### <[ACTIVE] ExecuteThread: '246' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <1381164261368>
#### <[ACTIVE] ExecuteThread: '246' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <1381164261368>
In such cases, for out-of-the-box WebLogic LDAP authenticators, take a look at the ldap_trace.logATN log file, which is found under the domain directory. This file contains information on what's going on with LDAP communication. For the scenario above, this log file reveals that there is a connection drop issue with the LDAP server.
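When failures are intermittent, it helps to pull only the security error entries and their times out of a large server log so they can be lined up against ldap_trace.logATN. The sketch below is one way to do that. It assumes that the 13-digit fields in the records above are epoch timestamps in milliseconds and that error entries carry an ID of the form [Security:NNNNNN]; the function name security_events is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Either a 13-digit epoch-millisecond field or a [Security:NNNNNN] entry.
TOKEN = re.compile(
    r"<(?P<millis>\d{13})>|\[Security:(?P<id>\d{6})\](?P<text>[^>\n]*)")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def security_events(log_text):
    """Return (utc_time, message_id, message) tuples, pairing every
    [Security:...] entry with the closest preceding timestamp field."""
    events = []
    last_time = None
    for match in TOKEN.finditer(log_text):
        if match.group("millis"):
            last_time = EPOCH + timedelta(
                milliseconds=int(match.group("millis")))
        else:
            events.append((last_time,
                           match.group("id"),
                           match.group("text").strip()))
    return events


# Fragments taken from the sample records above.
sample = (
    "<1370498845673> <[Security:090294]could not get connection>\n"
    "<1381164261368>\n"
    "<[Security:090295]caught unexpected exception>\n"
)
for when, message_id, message in security_events(sample):
    print(when.isoformat(), message_id, message)
```

Because the timestamp and the message are matched independently, the function works whether a record sits on one line or is wrapped across several.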
For code running in WebLogic and performing programmatic logins, the actual cause of failure can be propagated to the caller by out-of-the-box WebLogic authenticators. In the Provider Specific tab of the authenticator, there is a flag called Propagate Cause For Login Exception (Figure 1), which, if checked, propagates the actual cause of the login exception to the caller. This can help diagnose programmatic login failures quickly.
Figure 1: Propagate Cause For Login Exception
Account soft lockout is a mechanism in WebLogic that prevents denial-of-service (DoS) attacks against a user account. For example, if the login ID of a user account is known, someone could make repeated invalid login attempts and cause the account to be locked permanently in the backend system that manages it (e.g., LDAP). The real user would then be unable to log in because the account is locked. To prevent such situations, WebLogic provides an account soft lockout feature that, when enabled, locks an account in the WebLogic runtime itself for a time t1 if there were n invalid login attempts within a time interval t2, where t1, t2, and n are configurable. Once the account has been soft locked in the WebLogic runtime, WebLogic does not try to validate the account credentials against the backend system, which prevents the account from being permanently locked there.
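The rule "lock for t1 after n invalid attempts within t2" can be illustrated with a toy model. The class below is a simplified sketch, not WebLogic's implementation: the class name, the use of plain seconds for time, and the choice to reset the failure history when a lock is applied are all assumptions made for illustration.

```python
class SoftLockoutModel:
    """Toy model: lock a user for t1 seconds after n invalid login
    attempts within a window of t2 seconds."""

    def __init__(self, n, t1, t2):
        self.n, self.t1, self.t2 = n, t1, t2
        self.failures = {}      # user -> times of recent invalid attempts
        self.locked_until = {}  # user -> time at which the soft lock ends

    def is_locked_out(self, user, now):
        return now < self.locked_until.get(user, float("-inf"))

    def attempt(self, user, check_backend, now):
        """Return True on a successful login. While the user is soft
        locked, the backend check is never called."""
        if self.is_locked_out(user, now):
            return False
        if check_backend():
            return True
        # Keep only failures that are still inside the t2 window.
        recent = [t for t in self.failures.get(user, [])
                  if now - t < self.t2]
        recent.append(now)
        self.failures[user] = recent
        if len(recent) >= self.n:
            self.locked_until[user] = now + self.t1
            self.failures[user] = []
        return False

    def clear_lockout(self, user):
        self.locked_until.pop(user, None)
        self.failures.pop(user, None)


model = SoftLockoutModel(n=3, t1=1800, t2=300)
backend_calls = []


def failing_backend():
    backend_calls.append("call")
    return False


# Three failures within the window trigger the soft lock.
for second in (0, 10, 20):
    model.attempt("svc_account", failing_backend, now=second)
locked = model.is_locked_out("svc_account", now=30)

# The backend is healthy again, but the soft lock still rejects the login.
rejected = model.attempt("svc_account", lambda: True, now=60)
model.clear_lockout("svc_account")
accepted = model.attempt("svc_account", lambda: True, now=61)
print(locked, rejected, accepted, len(backend_calls))
```

The rejected attempt at second 60 shows why a soft lock can outlive the original problem: valid credentials are refused until the lock expires or is cleared.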
While this feature is very useful, it can sometimes land us in tricky situations, specifically when we are seeing intermittent login failures. Suppose a service account is configured in an application for use by a scheduled service, and that service runs very frequently. When the service starts, it retrieves the service account credentials and tries to perform a programmatic login. If this login fails multiple times, the WebLogic runtime will soft lock the account, making the situation worse for the scheduled service: it cannot log in even after the cause of the login failures is gone. The only way out is to manually stop the service and remove the soft lock for the service account.
(In my opinion, you should never store service credentials in the system because it creates account life cycle management problems. For example, if the password for this account changes, it must be changed everywhere it is stored; otherwise, it will cause failures. I prefer to use identity assertion in these cases, which requires only the account's user ID. Please refer to the references section for information on how to perform identity assertion in WebLogic with OPSS.)
The section below shows where the account soft lockout configuration can be viewed in WebLogic and how UserLockoutManager can be used to remove a soft lock.
You can see the soft lockout configuration by navigating to Security > realm name > User lockout, as shown in the image below:
Figure 2: Soft Lockout Configuration
You can also see the stats around invalid logins for a particular WebLogic server instance by navigating to Servers > Server Name > Monitoring > Security, as shown in the image below:
Figure 3: Invalid Login Stats
To manually remove an account soft lockout, WebLogic provides a UserLockoutManager MBean, which has isLockedOut and clearLockout methods. Both methods take the user login ID as the parameter. Invoke the isLockedOut method to check whether the account is soft locked, and invoke the clearLockout method to remove the account's soft lockout.
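The check-then-clear sequence can be wrapped in a small helper, sketched below. Only the method names isLockedOut and clearLockout come from the text above; the helper name release_soft_lock and the FakeLockoutManager class are hypothetical. The fake class stands in for the real MBean so that the example runs on its own; how the MBean is looked up (for example, from a WLST session or a JMX client) is not shown here.

```python
class FakeLockoutManager:
    """Hypothetical stand-in that exposes the two methods named above."""

    def __init__(self, locked_users):
        self._locked = set(locked_users)

    def isLockedOut(self, user_id):
        return user_id in self._locked

    def clearLockout(self, user_id):
        self._locked.discard(user_id)


def release_soft_lock(lockout_manager, user_id):
    """Clear the soft lock for user_id if one is active.
    Return True only if a lock existed and is gone afterwards."""
    if not lockout_manager.isLockedOut(user_id):
        return False
    lockout_manager.clearLockout(user_id)
    return not lockout_manager.isLockedOut(user_id)


manager = FakeLockoutManager(["svc_account"])
first = release_soft_lock(manager, "svc_account")   # a lock was present
second = release_soft_lock(manager, "svc_account")  # nothing left to clear
print(first, second)
```

Checking the lock state before and after the call gives the operator a clear signal of whether the soft lock was really the reason the service account could not log in.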
Figure 4: Account Soft Lock Status
This article explains how login failures in WebLogic can be diagnosed with the help of available debug flags and log files. For performance reasons, do not leave the security debug flag enabled; once the authentication issue has been diagnosed, turn it off. This article also shows how intermittent login failures can result in an account being soft locked in WebLogic and how the UserLockoutManager MBean can be used to remove such a soft lock.
The author would like to thank Shaun Pei for his help in triaging authenticator-related issues.
Shailesh K. Mishrah is part of the Oracle Identity Manager team. He earned his B. Tech. from IIT BHU and spends his free time exploring middleware performance and security.