Introduction
During its 10 years of existence, Java* has evolved from a “run anywhere” client-side programming language to become today’s ideal development platform for enterprise server-side applications. Software developers designing applications that reach the edge of the organization now have a stable of Java building blocks to choose from for many system-level functions, which can substantially shorten their development cycle.
Java applications, which are operating system independent, rely on the Java* Virtual Machine (JVM) to take advantage of the benefits of the underlying hardware architecture. Java's platform independent applications implicitly depend on the JVM to provide the optimal performance for the platform. How well a JVM handles code generation, thread management, memory allocation and garbage collection helps determine Java application performance.
What Is the Problem?
Java* enterprise applications need to access much larger data sets, such as massive volumes of financial information or collections of product specifications, and serve the information to increasing numbers of connections without perceptible lag time. Developers of Java applications for enterprise resource planning, supply chain management, and business intelligence are required to optimize code to accommodate the locking mechanisms and persistence requirements needed to access large databases.
Java applications must increasingly contend with larger data sets as companies migrate to Web services and store and distribute information in more robust formats, such as Extensible Markup Language (XML) and Really Simple Syndication (RSS). Applications also have to be sensitive to the additional overhead of processing encrypted data, which is increasingly used to boost security.
The introduction of the 64-bit Intel® Xeon® processor and Itanium® processor has made it possible for Java server-side applications to manipulate significantly larger data sets by providing access to large amounts of memory, thereby significantly reducing disk accesses. This not only helps to reduce or avoid time-consuming disk swapping, it also enhances the efficiency of software caching. In addition, Itanium®-based servers apply EPIC technology to deliver explicit parallelism, massive resources and inherent scalability.
Itanium-based systems harness the power of Explicitly Parallel Instruction Computing (EPIC) by allowing the software to explicitly communicate with the processor whenever operations can be done in parallel, thus surpassing the sequential processing of conventional architectures.
Developers writing for Intel's 64-bit computing platforms can also create applications that divide functions into smaller tasks to take advantage of multiprocessor systems and Intel’s Hyper-Threading Technology, which effectively doubles the number of logical processors.
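As a minimal sketch of dividing one function into smaller tasks, the following example splits the summation of an array into chunks and runs them on a fixed thread pool from the standard java.util.concurrent package. The class name ParallelSum and the chunking scheme are illustrative assumptions, not something prescribed by Intel or BEA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: sums an array by splitting it into per-worker chunks.
final class ParallelSum {

    static long sum(long[] data, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Round up so that every element lands in exactly one chunk.
            int chunk = (data.length + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.length);
                parts.add(pool.submit(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += data[i];
                    }
                    return partial;
                }));
            }
            // Combine the partial results on the calling thread.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        // One worker per logical processor reported by the JVM.
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(sum(data, workers));
    }
}
```

Sizing the pool from availableProcessors() lets the same code use additional logical processors when they are present, without a code change.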
Increasing the number of users per processor while extending the complexity of business can stress Java applications so much that they can become bottlenecks. Performance optimization requires Java developers to fine-tune programming parameters for managing memory and retrieving and discarding resources as needed.
Whether you are programming applications for Windows* Server Enterprise Edition, Red Hat Linux* Advanced Server, or Red Hat Linux* Advanced Workstation, Intel's 64-bit parallel processing architectures can also enhance processor productivity by simultaneously executing up to six instructions. Developers should optimize software to take advantage of this parallel processing capability, thereby enabling the processor to focus its resources on fast execution through predication, which reduces branch delays, and speculation, which preloads essential data.
Optimizing the runtime behavior of enterprise Java applications can require a lengthy development effort to understand how they will perform under changing operating conditions. However, it is ultimately the application and the JVM that determine runtime performance.
The Importance to Developers
To perform optimally, the Java* Virtual Machine (JVM) should automatically adapt its behavior based on the operating conditions of the applications and the underlying environment, such as variations in the number of concurrent users and memory requirements, and variations in the system resources such as available memory and CPUs.
Developers have to design code to execute efficiently, but it is equally important that the hardware can quickly process the code. Extracting maximum performance from a 64-bit server platform requires developers to design their applications to perform reliably under a variety of load conditions and to automatically scale as resources are added.
As organizations expand to include more mobile devices and client machines connecting from time zones throughout the world, the complexity of understanding how to respond to peak cycles continues to increase. Developers need to measure performance characteristics and tune application parameters based on real-world data collected from 24-hour cycles.
Depending on network performance and server resources, Java* server-side processing may require longer execution times than applications that process the data on the client workstations. Longer execution times can increase the burden on the developer in optimizing heap allocation, memory management, garbage collection, and other parameters.
Heap management is important in enterprise environments where users simultaneously run multiple instances of an application on the same system. Depending on how each JVM defines its heap size, launching instances of JVMs can degrade performance because of insufficient memory space. A larger heap allows for larger applications with fewer garbage collections, but can lead to heap fragmentation.
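The memory pressure described above comes down to simple arithmetic. The sketch below estimates how many JVM instances fit into physical memory for a given maximum heap size. The class HeapBudget, its parameters, and the per-instance overhead figure are hypothetical and serve only to illustrate the calculation.

```java
// Hypothetical helper: estimates how many JVM instances fit into physical memory.
final class HeapBudget {

    /**
     * @param physicalBytes        total physical memory on the machine
     * @param reservedForOsBytes   memory set aside for the operating system and other processes
     * @param maxHeapBytes         maximum heap size configured for each JVM
     * @param nonHeapOverheadBytes assumed per-JVM memory outside the heap (code, stacks, and so on)
     * @return the number of JVM instances that fit without exceeding physical memory
     */
    static long maxInstances(long physicalBytes, long reservedForOsBytes,
                             long maxHeapBytes, long nonHeapOverheadBytes) {
        if (maxHeapBytes <= 0 || nonHeapOverheadBytes < 0 || reservedForOsBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative and heap positive");
        }
        long available = physicalBytes - reservedForOsBytes;
        if (available <= 0) {
            return 0;
        }
        return available / (maxHeapBytes + nonHeapOverheadBytes);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 16 GiB machine, 2 GiB reserved, 3 GiB heap plus 1 GiB overhead per JVM:
        // (16 - 2) / (3 + 1) = 3 instances (integer division).
        System.out.println(maxInstances(16 * gib, 2 * gib, 3 * gib, 1 * gib));
        // The maximum heap this JVM itself will attempt to use, for comparison.
        System.out.println(Runtime.getRuntime().maxMemory());
    }
}
```

Launching more instances than this estimate allows is the point at which the operating system starts paging and performance degrades.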
Depending on how well developers optimize garbage collection, applications can scale successfully, or their performance can degrade significantly under heavy loads. For example, frequent garbage collection pauses cannot be tolerated in real-time applications such as those used in financial services.
With Itanium® processors, the compiler has even greater responsibility than on other platforms because EPIC-enhanced systems allow the compiler significantly more room to extract benefit from the architecture. The compiler’s code scheduler evaluates available instructions and decides which it can effectively bundle together and execute during the same clock cycle. If the compiler can observe a greater number of instructions, it is more likely to find code that can be bundled to take advantage of EPIC parallelism.
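One way to see how much time an application spends in garbage collection is to read the counters exposed by the standard java.lang.management API. The following is a minimal sketch; the class name GcStats is an assumption, and the reported time is the approximate accumulated collection time, which for concurrent collectors is not the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: aggregates the standard garbage collector counters of the running JVM.
final class GcStats {

    /** Total number of collections so far; collectors that report -1 (undefined) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds; -1 values are skipped. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Average time per collection, or 0 when no collection has happened yet. */
    static double averageMillis(long totalMillis, long collections) {
        return collections == 0 ? 0.0 : (double) totalMillis / collections;
    }

    public static void main(String[] args) {
        long collections = totalCollections();
        long millis = totalCollectionMillis();
        System.out.println("collections: " + collections);
        System.out.println("time (ms):   " + millis);
        System.out.println("average:     " + averageMillis(millis, collections));
    }
}
```

Sampling these counters at intervals during a load test gives a rough picture of how collection frequency changes as the load increases.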
Thus, developers are challenged by the scope, or amount of code, that the code scheduler can access at a given time. Java code characteristically consists of a large number of classes and methods, so the code must balance how much space to dedicate to both. The size of the methods can sharply limit the scope seen by the scheduler, requiring the JVM to find ways to expand the scope.
The scope visible to the code scheduler is thus an important factor in application performance.
The 64-bit Intel® Xeon® processor and Itanium® processor also provide significantly more addressable memory. Intel® Extended Memory 64 Technology (Intel®64), which is incorporated into all Xeon processor-based systems, improves performance by allowing the system to address more than 4 gigabytes of both virtual and physical memory. Applications that take advantage of the additional physical memory can significantly reduce time-consuming disk swapping.
Since Java applications can spend 90 percent or more of their execution time in approximately 10 percent of their methods, developers have to optimize the classes and methods that are cached. Unfortunately, JVM startup time suffers if all methods are optimized from the beginning.
Inadequate memory management in Java enterprise applications has the potential to cause performance problems, especially with the high user and transaction loads found in enterprise environments. Also, the speed of the compiler can limit overall platform performance.
In Java, code generation occurs during application execution, so a slow code generator will negatively impact application performance. Therefore the compiler must generate good code and do so quickly. Also, slow application start-up is undesirable.
Java applications should also be neutral to the application server, whether it is JBoss Application Server*, Oracle Application Server*, BEA WebLogic* Server, or any other platform.
What Is the Solution?
So, how can developers best leverage the 64-bit Intel® Xeon® processor, Itanium® processor and Java* server-side applications to lower the total cost of ownership of enterprise applications?
Fortunately, BEA Systems has worked closely with Intel to make sure that the WebLogic* JRockit JVM maximizes the performance of the Xeon and Itanium processors. JRockit is fully certified as Java* 2 Standard Edition (J2SE) compliant, and is the only JVM designed to optimize application performance and scalability without requiring developers to tune any configuration parameters.
Developers do not need to worry about fine-tuning heap size during runtime when using JRockit because the JVM monitors its own memory utilization and dynamically adjusts the size of its own heap depending on its application requirements at the time. For example, financial applications that need more memory during peak transaction times would increase the heap during business hours and reduce the size in the off hours.
This avoids the problems associated with a heap that is too small (out-of-memory errors) and a heap that is too large (long garbage collection pauses or slow overall system performance because other applications are starved for memory).
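The idea of a heap that grows and shrinks with demand can be illustrated with a toy policy. The sketch below is not JRockit's algorithm, which the text does not describe. The class HeapSizingPolicy, the 80 percent and 30 percent occupancy thresholds, and the doubling and halving steps are all assumptions chosen for illustration.

```java
// Toy illustration of adaptive heap sizing; thresholds and step sizes are assumed.
final class HeapSizingPolicy {

    private static final double GROW_ABOVE = 0.80;   // grow when the heap is more than 80% live
    private static final double SHRINK_BELOW = 0.30; // shrink when the heap is less than 30% live

    /**
     * Proposes the next heap size from the amount of live data left after a collection.
     * The result is always clamped to the range [minSize, maxSize].
     */
    static long nextHeapSize(long currentSize, long liveBytesAfterGc, long minSize, long maxSize) {
        if (currentSize <= 0 || minSize <= 0 || maxSize < minSize) {
            throw new IllegalArgumentException("invalid heap bounds");
        }
        double occupancy = (double) liveBytesAfterGc / currentSize;
        long proposed = currentSize;
        if (occupancy > GROW_ABOVE) {
            proposed = currentSize * 2;   // little headroom left: grow
        } else if (occupancy < SHRINK_BELOW) {
            proposed = currentSize / 2;   // mostly empty: give memory back
        }
        return Math.max(minSize, Math.min(maxSize, proposed));
    }

    public static void main(String[] args) {
        // Peak hours: 900 of 1000 units live, so the heap grows to 2000.
        System.out.println(nextHeapSize(1000, 900, 256, 4096));
        // Off hours: 100 of 1000 units live, so the heap shrinks to 500.
        System.out.println(nextHeapSize(1000, 100, 256, 4096));
    }
}
```

Because shrinking happens only below 30 percent occupancy, halving the heap never reduces it below the live data it has to hold.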
JRockit also enables developers to choose whether response time for the application or throughput is the primary consideration. Compacting the heap during garbage collection alleviates the problem of fragmentation, but impacts performance since compacting large heaps is expensive. Avoiding compaction altogether can be problematic if portions of the heap become unusable, thus increasing the frequency of garbage collection.
JRockit solves this compaction conundrum by using a sliding compaction window, which compacts a different small part of the heap during each garbage collection. A properly sized window offers the best of both worlds: the heap performs as well as it would with full compaction, while the cost of garbage collection stays small.
Optimizing Java requires using different garbage collection strategies at different times, as application behavior changes depending on the demands of the environment at the moment. JRockit eliminates the complexity of coding applications to change garbage collection by providing an adaptive garbage collection mode that automatically detects the appropriate strategy.
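The sliding window idea can be shown with a small simulation. This is a conceptual sketch only, not JRockit's implementation: the heap is modeled as an array of slots, where 0 marks a free slot and any other value is a live object, and each call compacts just one window before moving on to the next. The class name SlidingCompaction and the slot model are assumptions.

```java
import java.util.Arrays;

// Toy model of a sliding compaction window over an array of heap slots.
final class SlidingCompaction {

    private final int[] heap;      // 0 = free slot, non-zero = live object id
    private final int windowSize;
    private int windowStart = 0;

    SlidingCompaction(int[] heap, int windowSize) {
        if (windowSize < 1 || windowSize > heap.length) {
            throw new IllegalArgumentException("window must fit inside the heap");
        }
        this.heap = heap.clone();
        this.windowSize = windowSize;
    }

    /**
     * Compacts only the current window by sliding its live objects toward the
     * start of the window, then advances the window (wrapping at the end).
     * Returns the number of objects moved, which is the cost of this step.
     */
    int compactNextWindow() {
        int end = Math.min(windowStart + windowSize, heap.length);
        int write = windowStart;
        int moved = 0;
        for (int read = windowStart; read < end; read++) {
            if (heap[read] != 0) {
                if (read != write) {
                    heap[write] = heap[read];
                    heap[read] = 0;
                    moved++;
                }
                write++;
            }
        }
        windowStart = (end >= heap.length) ? 0 : end;
        return moved;
    }

    int[] snapshot() {
        return heap.clone();
    }

    public static void main(String[] args) {
        SlidingCompaction gc = new SlidingCompaction(new int[] {1, 0, 2, 0, 0, 3, 0, 4}, 4);
        gc.compactNextWindow();
        System.out.println(Arrays.toString(gc.snapshot())); // [1, 2, 0, 0, 0, 3, 0, 4]
        gc.compactNextWindow();
        System.out.println(Arrays.toString(gc.snapshot())); // [1, 2, 0, 0, 3, 4, 0, 0]
    }
}
```

Each call touches at most windowSize slots, so the work per collection is bounded by the window size rather than by the size of the whole heap.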
JRockit switches garbage collection strategies during runtime, for example, using a parallel strategy when higher throughput is needed, or concurrent garbage collection for single-threaded batch-oriented applications.
JRockit’s Management Console and real-time monitoring APIs enable developers to track application performance and provide many features that automatically optimize the JVM for varying environmental conditions.
JRockit balances startup time and runtime performance by compiling each method as it is needed “just in time” (JIT) and then caching it for subsequent automatic reuse. The compiler builds code from startup and samples the application during runtime to identify which methods are good targets for optimization. To expedite startup, JRockit does not apply all possible compiler optimizations to every method, instead optimizing only those methods whose optimization will most benefit application performance.
JRockit uses a sophisticated, low-cost, sampling-based technique to identify which methods merit optimization: a sampler thread wakes up at periodic intervals and checks the status of application threads. It tracks thread execution and uses the history to determine which methods are frequently invoked, then earmarks them for optimization.
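The sampling approach can be approximated in plain Java with the standard Thread.getAllStackTraces() call. The sketch below is only an illustration of the principle: JRockit's sampler works inside the JVM at far lower cost, and the class MethodSampler and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical user-level sampler: counts which method is on top of each thread's stack.
final class MethodSampler {

    private static volatile long sink; // keeps the demo worker loop observable

    private final Map<String, Integer> hits = new HashMap<>();

    /** Takes one sample of every live thread except the calling (sampler) thread. */
    void sampleOnce() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = entry.getValue();
            if (entry.getKey() == Thread.currentThread() || stack.length == 0) {
                continue;
            }
            record(stack[0].getClassName() + "." + stack[0].getMethodName());
        }
    }

    /** Records one hit for the given method name. */
    void record(String method) {
        hits.merge(method, 1, Integer::sum);
    }

    /** Returns the method with the most hits, or null if nothing was sampled. */
    String hottest() {
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, Integer> entry : hits.entrySet()) {
            if (entry.getValue() > bestHits) {
                best = entry.getKey();
                bestHits = entry.getValue();
            }
        }
        return best;
    }

    int hitsFor(String method) {
        return hits.getOrDefault(method, 0);
    }

    private static void spin() {
        long x = 0;
        while (true) {
            x = x * 31 + 7;
            sink = x;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(MethodSampler::spin, "worker");
        worker.setDaemon(true);
        worker.start();

        MethodSampler sampler = new MethodSampler();
        for (int i = 0; i < 50; i++) { // wake up at periodic intervals
            Thread.sleep(10);
            sampler.sampleOnce();
        }
        System.out.println("samples in spin(): " + sampler.hitsFor("MethodSampler.spin"));
    }
}
```

The JVM's own housekeeping threads are sampled as well, so a real profiler would filter them out before deciding which methods to earmark.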
JRockit provides real-time information on CPU utilization, garbage collection pause times, heap utilization, the number and state of threads, and other runtime behavior such as time spent in individual methods. The JRockit Management Console gives developers control over runtime behavior such as garbage collection parameters without introducing overhead that affects performance. The console includes rule-based alerts of events such as excessive heap utilization that can degrade performance.
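A rule of the kind described above, such as an alert on excessive heap utilization, can be sketched with the standard MemoryMXBean. This is not the JRockit Management Console's rule engine: the class HeapAlertRule and the 90 percent threshold used in the example are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical alert rule: fires when heap utilization exceeds a configured fraction.
final class HeapAlertRule {

    private final double threshold; // fraction of the heap limit, in (0, 1]

    HeapAlertRule(double threshold) {
        if (!(threshold > 0.0 && threshold <= 1.0)) {
            throw new IllegalArgumentException("threshold must be in (0, 1]");
        }
        this.threshold = threshold;
    }

    /** Pure rule check, separated from measurement so that it is easy to test. */
    boolean triggered(long usedBytes, long limitBytes) {
        return limitBytes > 0 && (double) usedBytes / limitBytes > threshold;
    }

    /** Evaluates the rule against the current heap usage of this JVM. */
    boolean triggeredNow() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined; fall back to committed memory.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return triggered(heap.getUsed(), limit);
    }

    public static void main(String[] args) {
        HeapAlertRule rule = new HeapAlertRule(0.90);
        if (rule.triggeredNow()) {
            System.out.println("ALERT: heap utilization above 90 percent");
        } else {
            System.out.println("heap utilization is within limits");
        }
    }
}
```

Calling triggeredNow() from a scheduled task would turn the check into a periodic alert.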
This information helps developers to understand application behavior problems and correct them before they severely degrade system performance. Developers can create applications that access the JVM through the JRockit Monitoring and Management APIs to monitor runtime information on the application without having to instrument the application byte code.
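The same principle, reading runtime information from the JVM rather than instrumenting the application, is available on any compliant JVM through the standard java.lang.management package. The sketch below uses ThreadMXBean to count threads by state. It does not use the JRockit-specific APIs, and the class name ThreadStateReport is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical report: counts live threads by state using only the standard management API.
final class ThreadStateReport {

    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<Thread.State, Integer> entry : countByState().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

No application class is modified or recompiled here; all of the data comes from the JVM itself.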
The architectural advantages of the 64-bit Intel Itanium and Xeon processors are enabling Java to become the de facto programming language for large-scale, enterprise-level applications, if the many technical challenges of optimization can be met. When used together, the Intel 64-bit processors and the powerful engine of the BEA WebLogic JRockit JVM provide the processing capabilities and innovative code and memory handling to meet those challenges.
Using the SPECjbb2000 benchmark, WebLogic JRockit for the Itanium 2 processor has completed 50,000 operations per second for a two-processor system and more than 100,000 operations per second for a four-processor system, which illustrates the scalability of the processor/JVM tandem.
By shipping Java applications with JRockit embedded, independent software vendors will enable their clients to instantly realize a performance boost without modifying their existing architecture.
Additional Information