Garbage collection in .NET is designed to relieve the programmer of explicit memory management. It works as an automatic memory manager, and an efficient one at that. It performs as desired in most cases, though there are a few situations where the way it works can cause problems.
Role of the Garbage Collector
In the ideal scenario, here is all that the garbage collector does:
- It frees you from having to manage memory manually, so you can focus on the application you are developing rather than on releasing the memory for every object you create.
- It provides memory safety by ensuring that an object cannot use the memory allocated to another object.
- It reclaims objects that are no longer needed, clearing the memory they used so that it is available for future allocations.
- It allocates objects on the managed heap efficiently.
How the Garbage Collector Works
After it is initialised by the CLR, the garbage collector allocates a segment of memory in which it stores and manages objects. This memory is referred to as the managed heap. Each managed process has its own managed heap, and all threads in the process allocate memory for objects on that same heap.
The garbage collector works on a rather simple principle: it identifies all the objects that the running program can still reach. The process starts with what are known as garbage collector roots, or simply GC roots. These are memory locations that hold references to objects the program has created. All objects referenced directly by a root are marked live.
Next, the garbage collector looks for the objects referenced by the objects it has already marked live. Those objects are marked live too. It continues in this way, following references from live objects, until it has built a map of every reachable object.
Specifically, the garbage collector uses information such as the following to find out whether an object is live:
- Static data: static objects in the application, which may themselves reference other objects.
- Stack roots: stack variables reported by the just-in-time (JIT) compiler and the stack walker. Note that JIT optimisations can lengthen or shorten the regions of code within which a stack variable is reported to the garbage collector.
Once the garbage collector has identified all live objects, it discards the remaining objects, and the space thus reclaimed is made available for new objects. .NET also compacts the heap, so that the memory used by live objects is arranged contiguously while the free memory forms a single block at the end of the heap. As a result, allocating memory for a new object is extremely fast.
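The mark, sweep and compact steps described above can be sketched with a toy object graph. This is only an illustrative model, not how the CLR is implemented: `Node`, `ToyCollector`, `Mark` and `SweepAndCompact` are hypothetical names invented for this example, and a plain list stands in for the managed heap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model: a Node stands in for a managed object.
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class ToyCollector
{
    // Mark phase: walk outward from the roots and record every reachable node.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var live = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!live.Add(current)) continue; // already marked live
            foreach (Node child in current.References) pending.Push(child);
        }
        return live;
    }

    // Sweep and compact: keep only the live nodes, packed together in heap order.
    public static List<Node> SweepAndCompact(List<Node> heap, IEnumerable<Node> roots)
    {
        HashSet<Node> live = Mark(roots);
        return heap.Where(n => live.Contains(n)).ToList();
    }

    public static void Main()
    {
        var a = new Node("a"); var b = new Node("b");
        var c = new Node("c"); var d = new Node("d");
        a.References.Add(b); // a -> b
        c.References.Add(d); // c -> d, but no root reaches c
        d.References.Add(c); // d -> c, a cycle that is still unreachable
        var heap = new List<Node> { a, c, b, d };

        List<Node> survivors = SweepAndCompact(heap, new[] { a });
        Console.WriteLine(string.Join(",", survivors.Select(n => n.Name))); // prints "a,b"
    }
}
```

Note that `c` and `d` reference each other yet are still discarded, because reachability from a root, not the mere existence of a reference, decides whether an object is live.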
To reserve memory on Windows, the garbage collector calls the Win32 function VirtualAlloc, reserving one segment of memory at a time for the managed application. When a segment is no longer needed, the garbage collector clears it of any objects and releases it back to the operating system by calling the Win32 function VirtualFree.
The managed heap, meanwhile, is a combination of two distinct heaps: the large object heap and the small object heap. The large object heap holds objects of 85,000 bytes or more, which are typically arrays. An instance object can occasionally be that big too, though this is rare.
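A quick way to observe the threshold is to ask the runtime which generation it reports for a freshly allocated array. The sketch below uses the real `GC.GetGeneration` method; `LargeObjectDemo` and `GenerationOfNewArray` are hypothetical names, and the array lengths are chosen only to fall on either side of the 85,000-byte boundary.

```csharp
using System;

public static class LargeObjectDemo
{
    // Returns the generation the runtime reports for a freshly allocated byte array.
    public static int GenerationOfNewArray(int length)
    {
        var buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // A small array lands on the small object heap and normally starts in generation 0.
        Console.WriteLine(GenerationOfNewArray(1000));
        // An array of 85,000 bytes or more lands on the large object heap,
        // which the runtime reports as generation 2.
        Console.WriteLine(GenerationOfNewArray(85000));
    }
}
```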
Types of Garbage Collector Root
There are four main types of roots in .NET, as described below.
- Local variables: any local variable declared in a method that is currently executing qualifies as a GC root. The objects these variables refer to can always be reached directly by the method that declared them, so they must be kept alive.
How long a local variable remains a root depends on how the program was built. In a debug build, a local variable lasts for as long as its method is on the stack. In a release build, however, the JIT can determine the last point at which the method uses the variable, and the variable stops being treated as a root after that.
- Static variables: all static variables are considered GC roots, since the objects they reference can be reached at any time by the class that declares them, or by the whole program if the static variable is declared public.
- Managed objects passed to COM+: a managed object also becomes a GC root when it is passed to an unmanaged COM+ library through interop. COM+ does not do garbage collection; it uses reference counting instead. When the COM+ library is done with the object, it releases its reference and the reference count drops to 0. At that point the object is no longer a GC root and becomes eligible for collection again.
- Objects with a finalizer: such an object is not removed as soon as the garbage collector finds it to be dead. Instead, it is treated as a special kind of root until .NET has called its finalizer method. Such objects therefore usually need more than one garbage collection before they are removed and their memory is made available to new objects.
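The effect of a root can be observed with a `WeakReference`, which tracks an object without keeping it alive. The following is a minimal sketch: `RootsDemo` and its methods are hypothetical names, and the "unrooted" result depends on build settings and JIT behaviour, so it is labelled as typical rather than guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RootsDemo
{
    private static object staticRoot; // a static field acts as a GC root

    // NoInlining keeps the temporary inside this method's frame,
    // so no stack root remains once the method has returned.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateUnrooted()
    {
        return new WeakReference(new object());
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateStaticallyRooted()
    {
        staticRoot = new object();
        return new WeakReference(staticRoot);
    }

    public static void Main()
    {
        WeakReference unrooted = AllocateUnrooted();
        WeakReference rooted = AllocateStaticallyRooted();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine("unrooted alive: " + unrooted.IsAlive); // typically False
        Console.WriteLine("rooted alive: " + rooted.IsAlive);     // True while staticRoot is set
    }
}
```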
Conditions that Trigger Garbage Collection:
- The system is running low on physical memory: this is detected either through a low-memory notification from the operating system or by the host.
- The memory used by allocated objects on the managed heap has passed an acceptable threshold. This threshold is not fixed; it is continuously adjusted as the process runs.
- The GC.Collect method is called explicitly: in almost all cases you never need to call this method, because the garbage collector runs automatically. Still, the option to force a collection is there if you need it.
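The third trigger is easy to try out. The sketch below forces a collection with `GC.Collect` and uses `GC.CollectionCount` to confirm that one took place; `TriggerDemo` and `ForceCollection` are hypothetical names used only for illustration.

```csharp
using System;

public static class TriggerDemo
{
    // Forces a blocking collection of all generations and returns how many
    // generation 0 collections were counted as a result.
    public static int ForceCollection()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("gen 0 collections caused: " + ForceCollection());
        Console.WriteLine("managed bytes in use: " + GC.GetTotalMemory(false));
    }
}
```

Forcing collections like this is useful for experiments and diagnostics; in production code it is usually better to let the runtime decide when to collect.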
The objects on the heap are classified into generations, which helps the garbage collector deal efficiently with both long-lived and short-lived objects. There are three generations of objects on the heap.
- Generation 0: This is the youngest generation and holds the objects with the shortest life span, for example a temporary variable. Newly allocated objects start out in generation 0. Garbage collection occurs most frequently in this generation, and most objects are reclaimed here without ever reaching the next generation. The exception is large objects, which go on the large object heap and are collected together with generation 2.
- Generation 1: This generation also mostly contains short-lived objects, and it serves as a buffer between short-lived and long-lived objects.
- Generation 2: This generation almost always consists of long-lived objects, since an object would not otherwise have survived this far. Static data in a server application that lives for the entire duration of the process is a good example of generation 2 objects.
An object that is not reclaimed during a garbage collection is called a survivor and is promoted to the next generation. So, a generation 0 survivor is promoted to generation 1, and a generation 1 survivor is promoted to generation 2. A generation 2 object that survives a collection simply remains in generation 2.
The garbage collector also dynamically adjusts the allocation threshold of each generation according to the survival rate of the objects in it. If objects in a particular generation have a high survival rate, more memory is allotted to that generation. In doing so, the CLR continually balances two priorities: not letting garbage collection take up too much time, and keeping the application's working set within manageable limits.
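Promotion can be watched directly with `GC.GetGeneration`. This is a small sketch rather than a guaranteed trace: `PromotionDemo` and `TrackGenerations` are hypothetical names, and the exact generations reported can vary with the runtime and GC configuration.

```csharp
using System;

public static class PromotionDemo
{
    // Records the generation of one object before and after two forced collections.
    public static int[] TrackGenerations()
    {
        var survivor = new object();
        var seen = new int[3];
        seen[0] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor); // keep the local rooted up to this point
        return seen;
    }

    public static void Main()
    {
        // Typically prints "0 -> 1 -> 2": the object survives each collection
        // and is promoted until it reaches generation 2.
        Console.WriteLine(string.Join(" -> ", TrackGenerations()));
    }
}
```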
Limitations of the Garbage Collector
Unused objects that are still referenced
The garbage collector often falls short when it comes to picking up objects that are unused, and this has much to do with how it is designed to function in the first place. The garbage collector decides whether an object is dead by checking whether any chain of references still leads to it.
However, there are cases where the program has stopped using an object but some live object, or a root, still holds a reference to it. As long as such a path from a root to the object exists, the garbage collector treats the object as live and cannot reclaim it, even though it will never be used again. Typical culprits are event subscriptions, static collections and caches that are never cleared. Such cases lead to memory leaks in .NET, which in turn can cause hard-to-detect errors and performance issues.
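A common instance of this problem is an event subscription that is never removed. In the hedged sketch below, `Publisher`, `Subscriber` and `LeakDemo` are hypothetical types invented for illustration; the static event holds a delegate whose target is the subscriber, which keeps the subscriber reachable even though the program has no other use for it.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Publisher
{
    // A static event keeps every subscriber reachable for the life of the process.
    public static event EventHandler SomethingHappened;
    public static void Clear() { SomethingHappened = null; }
}

public sealed class Subscriber
{
    public Subscriber() { Publisher.SomethingHappened += OnSomethingHappened; }
    private void OnSomethingHappened(object sender, EventArgs e) { }
}

public static class LeakDemo
{
    // The caller keeps only a weak reference, so the event is the sole strong reference.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateForgottenSubscriber()
    {
        return new WeakReference(new Subscriber());
    }

    public static void Main()
    {
        WeakReference leaked = CreateForgottenSubscriber();
        GC.Collect();
        Console.WriteLine("still alive: " + leaked.IsAlive); // True: the event references it

        Publisher.Clear(); // dropping the reference makes the subscriber collectable
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine("after clearing: " + leaked.IsAlive); // typically False
    }
}
```

Unsubscribing with `-=` when the subscriber is no longer needed has the same effect as `Clear` here, without discarding other subscribers.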
Fragmentation of the Heap
This is another issue that the garbage collector does little about by default. It concerns the large object heap, whose objects are rarely moved by the runtime. When longer-lived large objects are eventually removed, they leave gaps behind in the heap. Over time the free memory can become so fragmented that no single block is big enough for a new large object, and the allocation fails even though plenty of memory is free in total.
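Where fragmentation of the large object heap becomes a real problem, .NET Framework 4.5.1 and later let you request a one-off compaction through `GCSettings.LargeObjectHeapCompactionMode`. The sketch below is only one possible way to use that setting; `LohCompactionDemo` and `CompactLargeObjectHeapOnce` are hypothetical names.

```csharp
using System;
using System.Runtime;

public static class LohCompactionDemo
{
    // Asks the runtime to compact the large object heap during the next
    // blocking generation 2 collection, then triggers that collection.
    public static void CompactLargeObjectHeapOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }

    public static void Main()
    {
        CompactLargeObjectHeapOnce();
        // Shows the mode that is in effect after the collection.
        Console.WriteLine(GCSettings.LargeObjectHeapCompactionMode);
    }
}
```

Compacting the large object heap means copying large blocks of memory, so it is best reserved for moments when a pause is acceptable.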
Garbage Collector Modes
The garbage collector comes in two flavours, workstation and server, and each can run its collections either concurrently (in the background) or non-concurrently. While the garbage collector tunes itself to operate in wide-ranging scenarios, you always have the option to choose the type of garbage collection according to the workload. The two types of garbage collection that the CLR supports are described below.
Workstation Garbage Collection
This is the default mode and is designed for client applications running on workstations or stand-alone PCs. Workstation garbage collection can be of two types: concurrent and non-concurrent.
Server Garbage Collection
As the name implies, this is intended for server applications where scalability and throughput are of prime importance. Server garbage collection is also of two types: non-concurrent or background. Server garbage collection is generally faster than workstation garbage collection on the same heap size, because several garbage collection threads work simultaneously. It can, however, be resource-intensive, particularly under high memory load.
So if a system is running many instances of an application concurrently, say hundreds of them, consider using workstation garbage collection with concurrent garbage collection disabled. This should give better performance because there will be less context switching.
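The garbage collection type is selected through configuration rather than code. The sketch below only reports what the running process ended up with, using the real `GCSettings.IsServerGC` and `GCSettings.LatencyMode` properties; `GcModeDemo` and `Describe` are hypothetical names, and the configuration settings quoted in the comments are the standard ones, shown for orientation only.

```csharp
using System;
using System.Runtime;

public static class GcModeDemo
{
    // The mode itself is chosen in configuration, for example:
    //   .NET Framework app.config, inside <runtime>:
    //     <gcServer enabled="true"/>  and  <gcConcurrent enabled="false"/>
    //   .NET Core and later, in runtimeconfig.json configProperties:
    //     "System.GC.Server": true  and  "System.GC.Concurrent": false
    public static string Describe()
    {
        string flavour = GCSettings.IsServerGC ? "server" : "workstation";
        return flavour + " GC, latency mode " + GCSettings.LatencyMode;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```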
Concurrent Garbage Collection
You have the option to enable concurrent garbage collection with both workstation and server garbage collection. In this mode, the collection is performed on a dedicated thread, and the application's managed threads keep running alongside it for most of the duration of the collection.
The main benefit of concurrent garbage collection is that pauses during a collection are kept to a minimum, which makes an interactive application more responsive. Because managed threads can run most of the time while the concurrent garbage collection thread is at work, the pauses they do experience are shorter.
Background workstation garbage collection
Background garbage collection replaces concurrent workstation garbage collection starting with .NET Framework 4, and it replaces concurrent server garbage collection starting with .NET Framework 4.5. Background garbage collection applies to generation 2 collections. If enough objects are allocated in generation 0 while a background collection is running, the CLR performs a generation 0 or generation 1 foreground garbage collection.
Background garbage collection is enabled by default. The dedicated background garbage collection thread checks at frequent safe points whether a foreground garbage collection has been requested. If one has, the background collection suspends itself so that the foreground collection can run. Only once the foreground collection is complete do the dedicated background garbage collection thread and the user threads resume.
Background server garbage collection
This is the default mode for server garbage collection starting with .NET Framework 4.5. It works in much the same way as background workstation garbage collection, though there are a few differences.
For instance, background workstation garbage collection uses a single dedicated background garbage collection thread, whereas background server garbage collection uses multiple threads, generally one dedicated thread per logical processor. Also, unlike the workstation background garbage collection thread, these threads do not time out.
Garbage collection is designed to happen automatically, with minimal supervision from the developer. Ideally, you would never need to think about how it functions. Unfortunately, that is not always how things pan out in reality. Programs and applications do sometimes behave erroneously or perform poorly, and the cause can be traced to the way they interact with the .NET garbage collector. That should be reason enough to have a clear understanding of what goes on during garbage collection and how it affects the program. The discussion above should at least get you off the ground on the subject.