How can we address the issue of JVM warm-up time? This is something that obviously affects lots of people when they're running applications that are compiled into bytecodes.
The idea behind this presentation is to look at a number of different potential solutions to this problem, because really there isn't one solution that fits everything.
So we'll look at a variety of different approaches to how you can alleviate or even eliminate the whole warm-up issue. The first thing to look at is what we actually mean by starting a Java application.
If we compare this to natively compiled languages like C and C++, which we statically compile into a binary executable, things are a little bit different in Java. We're all aware of the fact that we have the JVM, the Java Virtual Machine, and we can divide how an application starts up into three distinct areas.
## The startup of the JVM
The first one we have is the startup of the JVM itself, the JVM being an executable that's statically compiled into native instructions. So that starts up, and it has to do various things to get itself running: it loads all the code it needs and the libraries, does the dynamic library resolution, that type of thing. Then one of the interesting things it does is generate a set of templates for all of the bytecodes that you have in the JVM instruction set.
<font color="#ff0000">so every time you start the jvm it will actually go and it will create its own set of templates </font>the idea behind this is to try try and optimize as much as possible for the particular architecture that you're running on and not just in terms of Intel versus Arm but the specific micro architecture that you're using what it enables the jvm to do is to provide a template which is specific to the particular platform you're running on the particular micro architecture and hopefully get just that little bit more performance out of the bytecodes there are about 200 bytecodes so you've got 200 templates that are created that really doesn't take very long it's actually incredibly quick to do that and the only thing that it does in terms of optimization Beyond creating those templates is that there are a couple of situations where pairs of byte codes happen very frequently together so you'll get a couple of extra templates which double up those byte codes so once the jvm itself is running you need to load your application and that goes through the process of loading the classes that you need to start your application and then initializing some of those classes as well doesn't necessarily mean that you're going to initialize all the classes you load straight away because we do a lot of lazy loading and lazy initialization but you're effectively going to say right load the classes we absolutely have to have and initialize the classes we absolutely have to have and then do any application specific initialization then we enter the main entry point and we actually start running the code for your application that's when things kind of start getting more complicated because everything up to that point is fairly straightforward and you can kind of identify exactly what needs to be done and that's it but when your application starts running we need to go through this warm-up phase <font color="#ff0000">the warm-up phase is all about identifying where you've got frequently called pieces of code frequently called Methods and then determining at some point that you want to take those methods and compile them into native instructions so that they run faster than using the byte codes and then of course you've got the application specific workload part of it in terms of initializing whatever code you need initializing objects creating things like database connections and stuff like that and if we look at the the sort of breakdown of that the time that you get in terms of in starting up the jvm and starting up your application </font>that's the time to First operation so the time that you can actually get to the first operation you can complete and then if you combine that with all of the warm-up time as well it's how long it takes to complete n operations so there's a startup time to get anything done and then there's the amount of time it takes to get everything done if you look at a comparison in terms of the performance of those parts what you'll find is that starting in jvm is very very quick there's really nothing we can do in terms of improving that beyond what it does now same with application startup that's somewhat dependent on how many classes you load and the structure of your application but it's it's basically quick and there's not really much we can do there application warm-up is the thing that takes a long time potentially again depending on the profile of your application what it does how it uses classes how it uses methods so obviously that performance warm-up of the 
So obviously the warm-up of the JVM and of your application code is dependent on a number of things. As I said, we start with bytecodes: we go through interpreted mode, where we use the templates to execute native instructions for each bytecode as we see it.
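If you want to watch this happening on a standard HotSpot JVM, one low-effort way is to run a small hot loop with the `-XX:+PrintCompilation` flag and watch methods move from interpreted execution into compiled code as they get hot (the C1 and C2 tiers described in the next section). The class below is just an illustrative sketch, not code from the talk.

```java
// Run with:  java -XX:+PrintCompilation HotLoop
// Each line of output is a method being JIT-compiled; on a standard tiered HotSpot
// build, levels 1-3 are C1 compilations and level 4 is C2.
public class HotLoop {

    // A small method that quickly becomes "hot" once the loop below starts.
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long total = 0;
        for (long i = 0; i < 50_000_000L; i++) {
            total += square(i);   // starts interpreted, then gets compiled by C1, then C2
        }
        System.out.println(total);
    }
}
```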
## C1 and C2
As we identify these hot spots in the code (hence the name of the JVM), we take our methods and pass them to the C1 JIT compiler. That runs a compiler that executes very quickly to generate code but doesn't do very much in terms of optimization. Once we've got some native code, we'll run that native code for the method each time it's called, and we'll profile it to figure out how it's being used while the application is running. At some second threshold, when that method is identified as being really hot, we'll recompile it using the C2 JIT compiler; that takes the profiling information and spends longer compiling to generate much more heavily optimized code. Eventually we get to a point where all the frequently used pieces have been compiled with C1 and then with C2, and we get our steady state in terms of performance.

If we look at a very typical performance graph (it's not like this for everything, but it's a very typical graph for a JVM-based application), we're going to see something like this: on the left-hand side we start our application; the tiny bit of yellow is where we're running in interpreted mode; then we pass methods to C1 and get compiled code to run, which is the green part, and that starts to improve the performance of our application; then we start to compile those methods again using C2 and we gradually get to the point where we've got a steady state. This is good, because it gives us the ability to run our application and compile it as it's running: write once, run anywhere. But we have application warm-up time, and that could be minutes, it could be hours, it could be days, depending on what the application is doing and how it's being used.

The problem we see, though, is that when we run our application the first time we go through this whole process of identifying the methods that are called frequently, compiling them with C1, tracking them, profiling them, then passing them to C2, doing all that work and getting to our steady state. All good. But then, if we start the application again, there is no knowledge of anything that happened previously; we have to go through the whole process exactly the same every time we run the application. So the first run looks like that, the second run looks like that, and the third and subsequent runs are all going to look the same. This is not really ideal, because surely we should be able to learn from what we've done in the past. Ideally, what we would like to see is: we run the application, we get it to the point where it's got a steady state of performance, the optimized level, and then when we run it again we don't have to go through that learning process. We just say: we know what we're doing, take everything we've learned before, start running the application exactly as it was, and we immediately get the optimized level of performance; and if we run it again we get the same thing. That's what we'd like to do. How can we go about doing it? As I said, we're going to talk about a number of different solutions that we can apply to this problem.

## Solution 1: ahead-of-time compilation

The first of those is something people have been suggesting right from the very beginning, when Java first came out: why don't we just compile into native instructions rather than compiling into bytecodes and having to go through this JIT compilation process? Let's just compile to native code and run the native code straight away: ahead-of-time (AOT) compilation.
This is something people have been working on really since the start of programming languages; we know how to build static compilers. We can take source code, parse it, do the semantic analysis, check the syntax, create an intermediate representation, and then take that and generate the executable code. Based on certain assumptions we can optimize in different ways: we can do things like method inlining, loop unrolling, and all sorts of other things like lock elision. This is great because we don't have to interpret bytecodes, so we're not doing the slow part of our graph; we don't have to analyze the code as it's running; we don't have to find any hotspots, because it's already compiled; and we don't have to compile the code while the application is running. That last one has a double benefit, because not only are we avoiding that slow warm-up, we're also avoiding using the resources of whichever machine we're running on (which could be a container) to do the compilation at the same time as the work of the application. With JIT compilation we're reducing the throughput of the application not just because we're running interpreted or C1-compiled code, but because we're doing the compilation at the same time. With AOT compilation we start at full speed straight away.

This all looks really good so far. This is the Graal native image approach: we take the code, we use GraalVM Native Image, and we've solved our problem. So at this point I could just walk away and say that's what we did, except no, you knew there was a catch to this. Not so fast, and I mean that in more than one way.

There are some issues with using statically compiled, ahead-of-time compiled code. AOT is by definition static: it does not change, so you compile your code once and that's what you're using. The other thing is that you compile the code before it is run (ahead of time, very logically), so the compiler doesn't have any knowledge of what that code is actually going to do at runtime. It can look at the way the code's been written, and as I said it can use techniques like escape analysis, loop unrolling and method inlining, but it doesn't know exactly how that code is going to be used as it runs, so there are limitations on the optimization techniques we can apply.

We can take an alternative approach, which is something GraalVM does as well if you're using the Enterprise Edition: profile-guided optimization. It's a technique that's been around, again, since the beginning of compilation, which says: we'll compile the code statically, we'll run the application, and we'll also profile it as it runs. That way we can take the information about what the code did when it was run, feed that back into the compiler, and say: now optimize it again, but with the knowledge of what happened when we ran it. So you get better optimization, but that only works so far, because you can run the application with a representative workload, yet there may be things that change as you're running the application; you may have different workloads, and so on. So you're still stuck with the static nature of what you've done, even though it's profile-guided.

The other thing you can't do with static compilation is speculative optimizations, and these are very important. Speculative optimizations are where you look at how the application has run up until the point where you start compiling, and you make certain assumptions: it behaved in a particular way up to that point, so let's assume it's going to continue to do so in the future.
I'll give you a couple of examples of that. First, a monomorphic call site. In this case I've got a simple class called Animal, which encapsulates a value, color. In order to access that I use an accessor method: I call getColor and it gives me the color. Now, if I use that in my code I have to call the getColor method, because the field is private and I can't access it directly. The compiler will look at that and go: hang on, if I compile that code as you've written it, what I'm going to have to do is create a stack frame, push it onto the stack, get the value of color, pop the stack frame off and return the color. That's very inefficient, because I'm creating a stack frame even though all you're doing is returning the value of color. Because the compiler can see precisely what's happening, it's able to say: in this case I'm going to cheat, if you like; since the method only returns color, we'll inline it and treat color as if it were a public variable, avoiding the overhead of the method call. That will work fine so long as you only have one implementation of getColor for the Animal class. But Java is a dynamic language: you can dynamically load classes at runtime, you could change the way Animal behaves, and you could change the way that method behaves. So if you're trying to statically compile, you run into problems; this kind of thing is only really possible with JIT compilation.

As a slightly more complex example, we can also do branch analysis. This is a slightly contrived example, but it's very good at demonstrating the kind of thing the compiler can do. Here I've got a method called computeMagnitude. It takes a value, and we test to see if the value is greater than nine. If it is, we call a different method to compute the bias from that value; if it's not greater than nine, we simply set the bias to be one. Then we return log base 10 of the bias plus 99. Branch analysis says: let's count how many times the code goes through the true branch or the false branch as we run it. When we come to compile, if we see that we have gone through the true branch zero times, that we've never executed that code, the compiler will assume that's something we can depend on in the future and compile the code accordingly. What it can end up doing is changing the code to something like this: we've still got our computeMagnitude method, and we still have to have the test to see whether the value is greater than nine, because there's a contract between the code and the compiler and we have to respect it. If the value ever is greater than nine, it means our assumption about the way the code has worked up until this point is now wrong, so we have to throw away the code we generated and recompile based on the new knowledge; that's what we call a de-optimization. To do that we call an uncommon trap, which tells the JVM it has to throw away the code and recompile it. However, if we never see a value greater than nine, we can simplify the code quite a lot: we know the bias is one, one plus 99 is 100, so we can eliminate that, and we can eliminate the call to log base 10 because we know log base 10 of 100 is 2, so all the method ends up doing is return 2.
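Since the talk's slides aren't reproduced here, the following is only a rough reconstruction of the two examples just described; the speculatively optimized form is shown as a second method, and the computeBias helper is invented purely for illustration.

```java
public class Animal {
    private String color;

    public Animal(String color) {
        this.color = color;
    }

    // Monomorphic call site: if the JIT only ever sees this one implementation,
    // it can inline the call and read the field directly, with no extra stack frame.
    public String getColor() {
        return color;
    }
}

class Magnitude {

    // Original form: the JIT's profiling counts how often each branch is taken.
    static double computeMagnitude(int value) {
        double bias;
        if (value > 9) {
            bias = computeBias(value);   // suppose profiling shows this branch is never taken
        } else {
            bias = 1;
        }
        return Math.log10(bias + 99);
    }

    // Roughly what the speculative compilation behaves like when the true branch was
    // never taken: the test remains (the contract with the compiler), but its body
    // becomes an "uncommon trap" that throws the compiled code away and recompiles,
    // and the rest constant-folds because log10(1 + 99) == 2.
    static double computeMagnitudeSpeculative(int value) {
        if (value > 9) {
            throw new IllegalStateException("uncommon trap: deoptimize and recompile");
        }
        return 2;
    }

    static double computeBias(int value) {   // hypothetical helper, just for the example
        return value * 0.5;
    }
}
```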
That's much more efficient than executing all of the code we had in the first version of the method, but it only works if our assumption holds true. What we've found at Azul is that speculative optimizations are responsible for about 50% of the performance gains we get from JIT compilation. They're really useful for getting higher-performing code, so it's very important that you're able to do them if you want the best possible optimization. However, de-optimizations are very bad: you've spent time compiling code, you've used the CPU cycles to do that, the assumption has turned out to be wrong, you've had to throw the code away and you've had to recompile the same code in a different way. De-optimizations are really what we don't want.

So if we compare AOT with JIT and look at the pros and cons of both: AOT has limitations in terms of what we can do from a Java perspective. As I said, Java is a dynamic language, so we can do class loading; if you want to do AOT you can't do dynamic class loading at runtime. Similarly, you can't do dynamic bytecode generation, because, well, there are no bytecodes. Reflection is possible, but you effectively have to declare everything in advance, and it becomes more complicated to do reflection with ahead-of-time compilation than with JIT compilation because of the way things work. You can't use speculative optimizations, because you don't have the ability to throw away code when a speculative optimization turns out to be wrong; you can't compile saying "I'm going to assume everything works this way" and then have something suddenly not work that way, because what do you do now? Yes, you could argue we could have different branches: one set of code for this method if one assumption is true, a different set of code if a different assumption is true. But that gets very complicated, it bloats your code, and it really isn't a good solution to the problem. Overall, I wouldn't say in every case, but typically the performance you'll get from AOT-compiled code will be lower, because you can't use all of the optimizations you can use with JIT compilation. The good things about AOT are obviously that you get full speed right from the very beginning, there is no warm-up, there is no waiting for the code to compile, and because you're not doing the compilation at runtime you're not using resources that you have to share with the application code.

JIT-compiled code is basically the opposite of that: you can use aggressive method inlining and techniques like that, you can use dynamic bytecode generation, reflection is relatively simple (you'd never say reflection is simple, but it's relatively simple compared to AOT), you can use speculative optimizations, and overall you will typically see better performance. The downside is obviously the required warm-up time and the CPU overhead. So we come back to our graph and compare: if we're looking at JIT-compiled code, this is the profile we see; if we look at what we get with AOT, we see very fast, essentially instant startup, but the level of overall performance is going to be lower. Profile-guided optimization brings it up a bit further, but you won't necessarily get the level you'll get with JIT-compiled code. I know the Graal team have published some results on this; they've done some comparisons and find that they can get very close to JIT-compiled code in certain situations, but there are going to be times when you see distinct differences.
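To make the dynamic-language limitation above concrete, here is a minimal sketch (the plugin class name is hypothetical, not from the talk) of the kind of code that just works under JIT compilation but needs everything declared up front, for example via reflection configuration, when you build a closed-world AOT image:

```java
import java.lang.reflect.Method;

public class PluginLoader {
    public static void main(String[] args) throws Exception {
        // The class name only becomes known at runtime, e.g. from configuration.
        String pluginClass = args.length > 0 ? args[0] : "com.example.ReportPlugin"; // hypothetical name

        // Under JIT compilation the JVM simply loads and links this class on demand.
        // Under static AOT compilation the image builder has to be told about it in
        // advance, otherwise the lookup fails at runtime.
        Class<?> plugin = Class.forName(pluginClass);
        Object instance = plugin.getDeclaredConstructor().newInstance();

        Method run = plugin.getMethod("run");
        run.invoke(instance);
    }
}
```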
So when do you use AOT? Well, ephemeral microservices are a great example: things like AWS Lambdas, where, well, I'm going to use the terrible term serverless computing. No, it's not serverless computing, because you have to run it somewhere, and you run it on a server. But these are ephemeral microservices that are very short-lived; there isn't very much time involved, they're functions that perform something and then stop. That's great for AOT-compiled code, because you're not running for long enough to get any benefit from JIT compilation. Similarly, although you have to include garbage collection in the compiled code, with ephemeral microservices you're not really too concerned about GC, because most of the time they live for such a short space of time that you don't even get GC cycles running. AOT is also very good if you're running resource-constrained services: if you're running in something like a two-vCore container, then JIT compilation is going to significantly reduce the throughput you'll get in that kind of environment. So AOT definitely does have a benefit.

## Solution 2: storing JIT compilation data with ReadyNow

Solution two: what about if we store some of the JIT compilation data? This is a project we've created called ReadyNow. It takes the same idea as profile-guided optimization, but rather than feeding the information back into a compiler to generate static code, the approach is: you have your application, you run it with a representative workload, maybe in production, and you let it warm up to the point where all of the code you need on a regular basis has been compiled and optimized and you're happy with the level of application performance. At that point you take a profile, and the profile consists of a number of pieces of information: all the currently loaded classes; all the currently initialized classes, because again that could potentially be a subset of the loaded classes; all the JIT profiling data that was collected during the run of the C1 code and was used as input into the C2-compiled code; very importantly, because we also need to learn from our mistakes, all the information about de-optimizations that happened, so we can avoid those problems when we reuse the profile; and then a copy of all the compiled code that was generated up to that point.

So now we've got a snapshot of the compilation data and the code for that application. When you restart the application, rather than having to go through the whole process again of identifying the methods that need to be compiled, doing the C1 compilation, profiling them and recompiling with C2, we take the information from that profile and immediately use it to get our code. We can say: we know which classes we need to load and which classes we need to initialize, so just do that straight away. That could obviously be more than you would get when you start an application by default, but we know they're the ones you needed at the point where we took the profile. What we can then do is load the code that we have stored for those methods, or, if we need to, recompile the code based on certain assumptions about how it works, and we do all of that before we get to the main entry point. That way, when you start your application, all of the code you need has already been compiled and is ready to go by the time you get into your main method.
Realistically you're going to be getting about 98% of the performance you had when the profile was taken, and a few transactions later you'll be at the 100% level. There are a few things we can't do to get to 100% straight away, but we get about 98% of the performance. So this sounds like a good idea, and it is. If we look at a simplified performance graph, we see something like this: we've got a warm-up slope and then we've got our steady state of application performance; that's without using ReadyNow. If we apply ReadyNow we get a nicer-looking graph: suddenly we get much faster startup, a much steeper slope on the warm-up, and we immediately get to the very high level of performance that we had when we took our profile.

There's just one downside to this, which is that if you look closely at the graph you'll see a little bit of a blank spot where we're not doing any work for the application. That's because there's obviously a lot more work involved before we get to the main entry point: we have to do more class loading and potentially more class initialization, we have to look at the methods we're using, and we have to determine whether we can reuse the code from the cache or whether we have to recompile it, all before we start running any of the application. That means the time to first transaction is potentially longer than we'd see without using ReadyNow. So again, this is very good for certain situations. If you're running something like a trading application where you know that nine o'clock is the time when trades start, all you do is take into account the fact that the time to first transaction is longer and start your application at five to nine, or ten to nine perhaps if it needs longer. You start it before trading begins and allow it the time to do all the work it needs to do, but then you know that when the bell rings and you need to start processing transactions, you're ready to go at a much faster rate than you would be if you'd started the application and warmed it up yourself. A lot of trading systems try to warm up in advance by feeding fake trades to them; that still won't give you the same level of performance you get with real data, which is what ReadyNow gives us.

## Solution 3: decoupling the JIT compiler

Solution number three: what about if we decouple the JIT compiler? If we look at the architecture of the JVM we see something like this, and we don't need to go into too much detail, but essentially you've got the class loader subsystem at the top, which handles loading the classes and verifying them, making sure they don't do anything they shouldn't from a security perspective or break the standard. Then we've got our runtime data areas: the method area, the heap, the stacks, the native stacks and the memory used by those. And then we've got the actual execution engine, which is the interpreter, the JIT compiler, the garbage collector and all that good stuff. The thing we need to focus on here is obviously the JIT compiler, because that's doing work whilst the application is also trying to do work, and as I said, you're battling for resources by having to do the JIT compilation at the same time. So what we could do is create a unit of optimized code for a particular situation: it will include the actual code for the method, but it will also include the set of assumptions we had when we compiled that code, the speculations we made.
We can include the assumptions from the earlier examples, so we could say: we've only got one getColor method, and we've got a value that's less than 10. Great, so now we've got code that will work in that situation. We might have another set of compiled code for the case where there's more than one getColor method, or where the value can also be 10 or more. Now, the thing about JIT compilation, as I've already said and won't belabor, is that it's CPU-intensive: it requires a lot of processing to compile the code as the application is running, and if we want to optimize the code more heavily, to get even better performance, it's going to require even more computing resources. Basically, the more effort you throw at it, the better the optimizations. That works really well if we've got a nice big machine: with something like a 64-vCore machine with 64 gigs of RAM, great, we're not really going to notice too much degradation. If we've got a two-vCore container with two gigs of memory, we could see a 50% drop in throughput, as one vCore is used for compilation whilst one vCore is used for application work, versus two vCores being used for application work. So the code we generate and the compilation approach often end up being a compromise, balancing not degrading throughput too much against getting enough optimization.

Let's again take a two-vCore container and look at both the throughput of our application and the CPU utilization. The top graph is the actual throughput, and it's similar to what we've seen: we've got the warm-up phase here, a simple application warms up quite quickly, and then we've got steady state. But we can see on the bottom graph that we're thrashing the CPU in order to do the compilation at the beginning, and then that suddenly drops down, to one vCore I think, there. Essentially what that's saying is that you have to provision more resources to do that first bit, and then you could actually get away with less for most of the time you're running the application. Now, if we overlay another graph, this is using Azul's replacement for the C2 JIT compiler: we have a different JIT compiler called Falcon, based on the open source LLVM project, which does a lot more optimization. We can see from the shape of these graphs that it has both benefits and drawbacks. The right-hand side of the graph is good, because the blue line, which is what we get from Falcon, is higher than the red line that we get from C2, so we've got better overall performance. But the left-hand side of the graph shows that it takes longer to get to that point, and we're seeing dips along the way, which are de-optimizations that occur more frequently because we're doing more aggressive speculation. Similarly, if we look at the graph at the bottom, we're seeing a lot more CPU utilization: we need a lot more resources to get that code to compile, and there are spikes as well, as we have to recompile when we de-optimize.

So what we said is: let's decouple the JIT compiler from the JVM and make it into a cloud native compiler. We're running pretty much everything in the cloud now, so why not take the JIT compiler, separate it from the JVM itself, and offer it as a service? Then all the JVMs can rely on a single service, passing the methods they need compiled to the service and having the results returned.
Network latency and overhead aren't really a problem, because most methods are quite small and the amount of network traffic involved is going to be quite small as well, so you won't see degradation from that point of view. So if we look at our graph again: this was the problem we had, better overall performance, but it takes longer and there's more fluctuation getting there, and a lot of heavy CPU utilization. Let's now overlay the graph we see if we use the cloud native compiler. We get a green line now, which is much better, and if I drop out the blue one we see a clearer picture: we've got the same profile at the top in terms of warm-up, so it warms up very quickly and gives us better overall performance for most of the time we're running, because we're getting fast warm-up. But even better, the CPU utilization drops off very, very quickly, so we can literally get away with reducing the resources we give that application.

Of course, you might say: isn't this just shifting the cost? You're shifting the cost from each of the individual JVMs into a centralized resource, and you've still got to pay for that. Well, yes, you do, but we're shifting it to a more efficient place, because remember, we can share the results. When we start up microservice A we need to compile certain methods, so we pass those to our cloud native compiler and get the results back. If we start up another instance of microservice A, it needs to compile the same method, so we pass it to the cloud native compiler, but now it's already got the results, so all it needs to do is pass back the code straight away. You get the code back quicker, the cloud native compiler doesn't have to do as much work, and we're sharing the results amongst multiple instances. We can also dedicate more resources to that centralized service, so if we know we need to do heavier optimization we can do it very efficiently, letting the service spend the time on that optimization so that we get the benefit across all the smaller deployments that are using a minimal number of vCores in a container. By caching the code, as I say, it's a much more efficient way of doing things, because we keep that memory across runs and no longer have to do the same work every time we start the application or the service.

The other thing we can do is be clever about this. We talked about the idea of having a sort of compilation unit where we've got compiled code together with the set of assumptions used for that particular way of compiling the method. We can have multiple different sets of assumptions: even though it's the same method, it's compiled in different ways for different sets of assumptions. When the JVM asks for a particular method, it passes the profiling information; it's a bit like Shazam, really, but for methods. You've got a fingerprint that you create of the method together with the information about its profile, and we match that against what we've got in our cache and say: yes, we've already got a piece of compiled code for that method that matches that profile, so we return it straight away. If we don't have it, we say: this is something we haven't seen before, so let's compile it using the JIT-as-a-service cloud native compiler and then return the code.
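Purely as a conceptual sketch of that matching idea (these names and types are invented for illustration and are not the Cloud Native Compiler's real API), the cache behaves roughly like a map keyed on the method plus a fingerprint of its profile:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Conceptual illustration only: invented names, not the real service API.
final class CompiledCodeCache {

    // Key is the method identity plus a fingerprint of its profiling data,
    // so the same method can map to several differently optimized bodies.
    record Key(String methodSignature, String profileFingerprint) {}

    private final Map<Key, byte[]> cache = new ConcurrentHashMap<>();

    byte[] getOrCompile(String methodSignature, String profileFingerprint,
                        Supplier<byte[]> compileOnService) {
        // The first JVM to ask for this method/profile pays for the compilation;
        // later requests with a matching fingerprint get the cached code immediately.
        return cache.computeIfAbsent(new Key(methodSignature, profileFingerprint),
                                     key -> compileOnService.get());
    }
}
```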
The next time somebody comes and asks for that particular method with that profile, we can match against it, and now we've got a version we can reuse across different applications. So depending on what the application is doing, we can provide the same method but with different compiled code.

What about if we combine solutions two and three? What do I mean by that? Well, we've got the cloud native compiler, which is a centralized way of doing compilation; why not combine that with the idea of ReadyNow, which is taking the information about which things need to be compiled in advance? So we take our profile and we put that into the cloud as well. Now what we're saying is: when you start your application, we can immediately go, right, we know which classes to load, we know which classes to initialize and which methods need to be compiled, so we'll pass those to the cloud native compiler, get the code where it's already available, potentially compile methods if they're needed, and feed that in. We've got all the different bits coming together to give us the best of both the ReadyNow profiles and the centralized compilation.

## Solution 4: saving the whole application state with CRaC

The last solution I want to talk about is: what about if we save the whole application state? What do I mean by this? On Linux there is a thing called CRIU, Checkpoint/Restore In Userspace as some people refer to it. The idea behind it is really the migration of containers from one Linux instance to another; essentially what it allows you to do is take a running process, freeze it at a particular point, and persist it into a set of user-space files. If you think about the way the operating system already works, we have context switching: you take a process, you run it, and you've got multiple processes running on the same machine, easily more processes than you have cores or CPUs. The way you do that is by context switching between different processes; do it fast enough and you give the illusion that they're all running at the same time. It's well-understood technology that we've had for decades. Essentially what we're doing is a context switch, but rather than saving the context of the application to memory and swapping it back in when we need it, we write it out to a set of files which can be persisted for a lot longer and potentially moved to a different machine. That's the idea behind CRIU: take the state of a running process and persist it into a set of user-level files.

What we thought we could do is apply that to Java. The JVM is just a process, so we can use CRIU to take a running JVM, persist it into a set of user-level files and then restart it at some later time. That would work, but there are some real problems with it, as we'll see. What we really want to do is make the application aware that it's going to be checkpointed, so we came up with what we call CRaC: Coordinated Restore at Checkpoint. We run our application as a normal Java application, and then, rather than just freezing the JVM itself, which would also freeze all the state of the running Java application, we tell the application that it's going to be checkpointed. That way it's aware that it needs to do certain things to tidy up, so that when we restart the application we're in a better situation and more able to restart reliably. We pause for some time, and then the application, being aware that it's being restored from a checkpoint, can do whatever work is necessary to carry on.
Now, CRaC enforces more restrictions than CRIU does. We say that you cannot have open file descriptors, and you cannot have open sockets or network connections, when you do a checkpoint. That's because if you have open file descriptors or sockets and you do a restore, it may be that those files have changed or disappeared, and network connections may have timed out, and it becomes very difficult to restore those reliably. By enforcing the closing of those in the application code, you allow the application code to restore them in a controlled way when we do a restore, and it makes it much easier to have, as I say, a reliably restored application.

We use a very simple API for this. The idea is that you have resources; it's a simple interface with two methods, beforeCheckpoint and afterRestore, and you'll never guess when those methods get called. For all the classes where you've got things like file descriptors or network connections, you implement the Resource interface and provide implementations of beforeCheckpoint and afterRestore to do whatever work is necessary. The way we use that is by taking those resources and registering them with a context; the context you would typically get is the global context, which is the JVM's context, so that when you do a checkpoint it knows it needs to call the beforeCheckpoint method on all these different classes. It goes through the list, calls beforeCheckpoint on each of them, and then when it restores them it will also call the afterRestore method on all of those registered resources, and off you go. So you take your resources and register them with the context; the context you get from the Core class as a global context, though you can create your own if you want to. When you use this, like I said, you implement the interface and add it to your code, and when the JVM does a checkpoint it will call beforeCheckpoint in the order you registered the resources, so you can determine the logical sequence for that to happen. When you do a restore the sequence will be reversed, so you go down the stack, if you like, and then when you come back out you go up the stack. That way you've got a defined order, you know exactly which methods are getting called in which sequence and how they'll be called on the way out, so you can take appropriate action as the restore is happening.
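Here's a minimal sketch of that API using the org.crac package names (a jdk.crac package with the same shape exists in CRaC-enabled JDKs). The PriceFeed class and its endpoint are illustrative assumptions rather than code from the talk, and the launch flags in the comments should be checked against the CRaC project documentation for your build.

```java
// Typical usage, per the OpenJDK CRaC project docs (verify against your build):
//   start with     java -XX:CRaCCheckpointTo=/tmp/cr -jar app.jar
//   checkpoint via jcmd <pid> JDK.checkpoint
//   restore with   java -XX:CRaCRestoreFrom=/tmp/cr
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

import java.io.IOException;
import java.net.Socket;

public class PriceFeed implements Resource {
    private final String host;
    private final int port;
    private Socket connection;   // an open network connection we must manage around checkpoints

    public PriceFeed(String host, int port) throws IOException {
        this.host = host;
        this.port = port;
        this.connection = new Socket(host, port);
        // Register with the global context so the JVM notifies us around checkpoints.
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Open sockets can't be checkpointed, so close ours in a controlled way.
        connection.close();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Re-establish the connection once the process has been restored.
        connection = new Socket(host, port);
    }
}
```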
Does it work? Well, we did a proof of concept, which you can download from our website; we've also made this a project on OpenJDK, and this is the kind of result we get. We wrote some simple applications using things like Spring Boot, Micronaut and Quarkus and measured the time it took to get to first transaction. Spring Boot is a good example: time to first transaction, for whatever application we wrote on whatever hardware we were using, took just under four seconds; using a CRaC restore from a checkpoint, time to first transaction was 38 milliseconds. That's two orders of magnitude faster, and remember that when you get that 38 milliseconds you're at the same point you were at when you ran your application: all of the state is there, all of your heap, all of your data is pre-loaded, all the JIT-compiled code is there ready to go, and you've got the same level of performance, identical to what you had when you did the checkpoint. So you're running much faster, with all the benefits of the optimizations. Micronaut and Quarkus gave very similar results; they took a little bit less time to get to first transaction without using CRaC, but again it was roughly two orders of magnitude faster with it.

So, just to summarize solving the JVM warm-up problem: the important thing to get across here is that there isn't one solution that fits everything. AOT is good for fast startup if you're doing things like ephemeral services and you're concerned about footprint, because you'll get a much smaller footprint with AOT-compiled code; it works well in that particular situation. ReadyNow provides a way of remembering the JIT compilation data across runs so you can reuse it and say: I know what needs to be done before I start executing the application. But clearly the downside is that you potentially have to do a lot more work before you get to the main entry point, and that can take longer to get to first transaction. The cloud native compiler offloads the JIT work so you don't have the impact on the resources you've got: you have all of your vCores to run your application code, versus sharing them to do the JIT compilation as you're warming up. ReadyNow Orchestrator combines the best of ReadyNow and the cloud native compiler, so you know what work to do, and ideally you're not having to do all of it locally because you're using the cloud native compiler. And CRaC is the idea of taking an application, pausing it at a known point, and then being able to restart it from exactly that point, with the restrictions of having to re-establish file descriptors, network connections and so on.

I will mention Project Leyden, which is an OpenJDK project; I said CRaC is an OpenJDK project that we contributed, and Project Leyden is another project looking at how to improve startup time. They have some other ideas about approaches; there's nothing really concrete around it yet, but they've come up with a higher-level architectural view of what they call condensers, and the idea of condensers is to have different potential stages involved in how you improve the startup performance of an application. I think it'll be interesting to see how that develops and what they deliver in terms of actual implementation details. If you're interested in trying out our Prime JVM, you can try it for free: just go to our azul.com website and there are Stream builds available to download there. It's a plug-in replacement: you don't have to recompile any code, you don't have to change any code, it just potentially gives you better overall performance as a drop-in replacement. And so that's that. I have one minute left, I've done perfectly in terms of time, so thank you very much.