Diagnosing Intermittent Authentication Failures and User Lock-Outs in Oracle WebLogic

by Shailesh K. Mishrah

How login failures in WebLogic can be diagnosed with the help of available debug flags and log files.

February 2014

Introduction

Authentication in Oracle WebLogic Server can fail for multiple reasons. When failure is consistent in nature (i.e., it happens all the time), it is somewhat easy to debug and to fix if you understand how authentication is performed in WebLogic. However, when failure is intermittent, things get a little tricky. This article explores the debugging that has to be turned on and which log files should be consulted to diagnose intermittent authentication failures, especially when WebLogic is configured with an external system—like Lightweight Directory Access Protocol (LDAP)—for authentication. This article also discusses the scenario in which the user account is soft locked in WebLogic due to such intermittent authentication failures, how we can verify account soft lock, and how we can unlock it.

This article is motivated by a recent customer situation in which Oracle Identity Manager (OIM) APIs started failing intermittently due to authentication failures in WebLogic. It assumes that the reader has good understanding of WebLogic security concepts and authentication mechanisms. WebLogic version 10.3.6 was used for this article.

Understanding Authentication Flow in WebLogic

WebLogic uses authentication providers to prove the correctness of given credentials. The WebLogic Security Framework supports multiple authentication providers for a security realm; how these authentication providers are configured (the value of the Java Authentication and Authorization Service (JAAS) control flag for each provider) can affect the overall outcome of the authentication process. Below are the JAAS control flag values and how they control the overall authentication process (please see the references section for more detail):

REQUIRED: The authentication provider is always called, and the user must always pass its authentication test. Whether authentication succeeds or fails, it continues down the list of providers.
REQUISITE: The user is required to pass the authentication test of this authentication provider. If the user passes, subsequent providers are executed but can fail (except for authentication providers with the JAAS control flag set to REQUIRED).
SUFFICIENT: The user is not required to pass the authentication test of the authentication provider. If authentication succeeds, no subsequent authentication providers are executed. If authentication fails, authentication continues down the list of providers.
OPTIONAL: The user is allowed to pass or fail the authentication test of this authentication provider. However, if all authentication providers configured in a security realm have the JAAS control flag set to OPTIONAL, the user must pass the authentication test of one of the configured providers.

As the description of the REQUIRED control flag demonstrates, the authentication provider with this flag MUST pass the authentication or the end user authentication will fail even if the provided credentials were correct; reasons for this include network issues, unexpected behavior of an external system (e.g., LDAP) to which the authentication provider talks, etc.

Turn On Security Debugging To See What's Going On

The first thing to be done when we start seeing login failures is to turn on security debugging in WebLogic; this has to be done on all servers where a request may land – on all servers configured in the load balancer or in the t3 url (t3://host1:port1,host2:port2). This setting is specific for each server. To turn on security debugging:

Go to Servers > server_Instance.
Select WebLogic > Security > atn > DebugSecurityAtn
Click the Enable button.

Copy



####     <[ACTIVE]
ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
<1389453044555>  
 
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044555>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044555>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044555>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044555>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044556>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044556>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044556>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044556>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044556>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044556>   
####     <[ACTIVE]
 ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <>
  <1389453044556>

Below is a sample output for a failed login where the security realm configured with an LDAP authenticator, and failure happened because of an LDAP connection issue:

Copy



####     
<[ACTIVE] ExecuteThread: '2' for queue:  'weblogic.kernel.Default (self-tuning)'> 
<> <> <5b0dc9d8a952b6d1:-1eccb494:13f1505b73b:-8000-000000000001c5b1> 
<1370498845643>  
 
 . 
 ####     
 <[ACTIVE] ExecuteThread: '2' for queue:  'weblogic.kernel.Default (self-tuning)'> 
 <> <> <5b0dc9d8a952b6d1:-1eccb494:13f1505b73b:-8000-000000000001c5b1> 
 <1370498845644>  
  
 . 
 ####     
 <[ACTIVE] ExecuteThread: '2' for queue:  'weblogic.kernel.Default (self-tuning)'> 
 <> <> <5b0dc9d8a952b6d1:-1eccb494:13f1505b73b:-8000-000000000001c5b1> 
 <1370498845673>  
  
 . 
 ####    
 <[ACTIVE] ExecuteThread: '2' for queue:  'weblogic.kernel.Default (self-tuning)'> 
 <> <> <5b0dc9d8a952b6d1:-1eccb494:13f1505b73b:-8000-000000000001c5b1> 
 <1370498845673>  <[Security:090294]could not get connection>

Unfortunately, the reasons for failure are not always this clear. For example, the following output (from an intermittent login failure case where the security realm configured with an LDAP authenticator) does not clearly reveal what is going wrong:

Copy



####     <[ACTIVE]
ExecuteThread: '246' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <>
 <1381164261368> 
<[Security:090295]caught unexpected exception> 
.................................................................................................
####     <[ACTIVE]
ExecuteThread: '246' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <>
 <1381164261368>  

####     <[ACTIVE]
ExecuteThread: '246' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <>
 <1381164261368>

In such cases, for out-of-the-box WebLogic LDAP authenticators, take a look at the ldap_trace.logATN log file, which is found under the domain directory. This file contains information on what's going on with LDAP communication. For the scenario above, this log file reveals that there is a connection drop issue with the LDAP server.

Propagating failure exception to caller (applicable for callers running in same JVM)

For code running in WebLogic and performing programmatic logins, the actual cause of failure can be propagated to the caller by out-of-the-box WebLogic authenticators. In the Provider Specific tab of the authenticator, there is a flag called Propagate Cause For Login Exception (Figure 1), which, if checked, propagates the actual cause of login exception to the caller, as shown in image below. This could help quickly diagnose the programmatic login failures.

mishra-wls-auth-fig01

Figure 1: Propagate Cause For Login Exception

Account Soft Lockout

Account soft lockout is a mechanism in WebLogic that prevents denial of service (DoS) attacks against a user account. For example, if the user account login is known, someone could try multiple invalid login attempts and cause this account to be locked permanently in the backend system that is managing the account (e.g., LDAP). The real user will not be able to login because the account is locked. To prevent such situations, WebLogic provides an account soft lockout feature that, when enabled, locks an account in WebLogic runtime itself for t1 time if there was n invalid login attempts within t2 time interval, where t1, t2 and n are configurable. Once the account has been soft locked in WebLogic runtime, it does not try to validate the account credentials against the backend system, thus preventing it from being permanently locked.

While this feature is very useful, sometimes (specifically, when we are seeing intermittent login failures) it can land us in tricky situations. Suppose there were a service account configured in an application to be used in a scheduled service, and that service runs very frequently. When the service starts, it retrieves the service account credentials and tries to perform a programmatic login. If this login fails multiple times, WebLogic runtime will soft lock this account, making the situation worse for this scheduled service: it can't login even if the reason for login failure is gone. The only way out would be to manually stop the service and remove the soft lock for the service account.

(In my opinion, you should never store service credentials in the system because it creates account life cycle management problems. For example, if the password for this account changes, it must be changed everywhere it is stored--otherwise, it will cause failures. I prefer to use identity assertion in these cases, which requires only the account's user id. Please refer to the references section for information on how to perform identity assertion in WebLogic with OPSS).

The section below describes a situation in which account soft lockout is configured in WebLogic, and how UserLockoutManager can be used to remove it.

WebLogic Soft Lockout Configuration and Manager

You can see the soft lockout configuration by navigating to Security > realm name > User lockout, as shown in the image below

mishra-wls-auth-fig02

Figure 2: Soft Lockout Configuration

You can also see the stats around invalid logins for a particular WebLogic server instance by navigating to Servers > Server Name > Monitoring > Security, as shown in the image below:

mishra-wls-auth-fig03

Figure 3: Invalid Login Stats

To manually remove an account soft lockout, WebLogic provides a UserLockoutManager mBean, which has isLockedOut and clearLockout methods. These methods take the user login ID as the parameter. Invoke the clearLockout method to remove an account's soft lockout. You can check whether the account is soft locked by invoking the isLockedOut method.

mishra-wls-auth-fig04

Figure 4: Account Soft Lock Status

Conclusion

This article explains how login failures in WebLogic can be diagnosed with the help of available debug flags and log files. For performance reasons, do not leave this debug logger enabled; once the authentication issue has been diagnosed, turn off this flag. This article also shows how intermittent login failures can result in an account being soft locked in WebLogic and how the UserLockoutManager mBean can be used to remove such a soft lock.

References

Authentication Providers: http://docs.oracle.com/cd/E23943_01/web.1111/e13718/atn.htm
Configuring Authentication Providers: http://docs.oracle.com/cd/E29542_01/web.1111/e13707/atn.htm

The author would like to thank Shaun Pei for his help in triaging authenticator-related issues.

About the Author

Shailesh K. Mishrah is part of the Oracle Identity Manager team. He earned his B. Tech. from IIT BHU and spends his free time exploring middleware performance and security.