If not ThreadLocal then what? ScopedValue!

2023-07-29

What is a ScopedValue?

A ScopedValue is a value that methods can see and use without receiving it as a parameter. Until recently, the way to do this in Java was a ThreadLocal variable. Let’s look at ThreadLocal for a moment to refresh our knowledge, and then see what its newer replacement, ScopedValue, brings.

What is a ThreadLocal variable?

ThreadLocal variables are typically declared as private static fields. They guarantee that the value a thread reads with get() is the value that same thread last stored with set() or, if the thread has not stored anything yet, the value returned by the protected initialValue() method.

This way we can, for example, define a static method get() that returns the identifier of the current thread:

import java.util.concurrent.atomic.AtomicInteger;

public class ThreadName {
    // Atomic integer containing the next thread ID to be assigned
    private static final AtomicInteger nextId = new AtomicInteger(0);

    // Thread local variable containing each thread's ID
    private static final ThreadLocal<Integer> threadId =
        new ThreadLocal<Integer>() {
            // Called once per thread, on that thread's first get()
            @Override protected Integer initialValue() {
                return nextId.getAndIncrement();
            }
        };

    // Returns the current thread's unique ID, assigning it if necessary
    public static int get() {
        return threadId.get();
    }
}

Source: ThreadName.java
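
To check the claim that every thread gets its own value, here is a small sketch of the same idea written with the ThreadLocal.withInitial factory (Java 8+). The class name ThreadIdDemo and the helper idsSeenBy are made up for this illustration and are not part of the original listing.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadIdDemo {
    private static final AtomicInteger NEXT_ID = new AtomicInteger(0);

    // Same idea as ThreadName, using a supplier instead of an anonymous subclass
    private static final ThreadLocal<Integer> THREAD_ID =
            ThreadLocal.withInitial(NEXT_ID::getAndIncrement);

    static int currentId() {
        return THREAD_ID.get();
    }

    // Starts `count` threads, lets each read its ID twice, and returns the distinct IDs seen
    static Set<Integer> idsSeenBy(int count) {
        Set<Integer> ids = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(() -> {
                int first = currentId();
                int second = currentId(); // same thread, so the same value comes back
                if (first == second) {
                    ids.add(first);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Three threads should end up with three different IDs
        System.out.println(idsSeenBy(3).size());
    }
}
```

Each thread triggers the supplier once, on its first get(), so the set ends up with as many distinct IDs as there were threads.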

How does ThreadLocal work internally?

I used to believe that ThreadLocal keeps a map like ConcurrentHashMap<Thread, T>, with Thread.currentThread() as the key and the value of the ThreadLocal variable as the corresponding value. I thought that, depending on the currently executing thread, we retrieve (or set in the map) the appropriate value.

Such a map would run the risk of heavy thread contention. Additionally, since ThreadLocal variables are static, the map would also be a static resource: it would be kept in memory along with the class, essentially throughout the entire application runtime. The finished threads, being keys in that map, could not be released by the garbage collector together with their resources. We could use a WeakHashMap, so that the collector could reclaim a terminated thread (along with its whole “jungle”), but the problem of thread congestion around the map would remain.
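
To make that mistaken mental model concrete, here is a minimal sketch of it. This is explicitly not how java.lang.ThreadLocal is implemented; the class NaiveThreadLocal and its methods are hypothetical names invented for the illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// NOT how java.lang.ThreadLocal works: a sketch of the naive design described above
final class NaiveThreadLocal<T> {
    // One shared map for all threads: every get()/set() from every thread goes through it
    private final Map<Thread, T> values = new ConcurrentHashMap<>();
    private final Supplier<T> initialValue;

    NaiveThreadLocal(Supplier<T> initialValue) {
        this.initialValue = initialValue;
    }

    T get() {
        // ConcurrentHashMap rejects null values, so this sketch assumes non-null values
        return values.computeIfAbsent(Thread.currentThread(), t -> initialValue.get());
    }

    void set(T value) {
        values.put(Thread.currentThread(), value);
    }

    // Number of threads (alive or already finished) the map still references
    int trackedThreads() {
        return values.size();
    }

    // A worker stores a value and terminates; its entry stays in the shared map
    static int entriesLeftAfterWorkerFinished() {
        NaiveThreadLocal<String> local = new NaiveThreadLocal<>(() -> "initial");
        Thread worker = new Thread(() -> local.set("worker value"));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return local.trackedThreads();
    }
}
```

The demo method shows the leak: the worker has finished, yet the map still holds a strong reference to its Thread object and to the stored value.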

However, these variables don’t work that way at all!

Take a look:

Each thread stores a map (and has exclusive access to it, so synchronization is not needed) in which the keys are ThreadLocal variables and the values are the values of those variables. Regardless of the code path taken in a given thread, reads and writes of ThreadLocal variables do not go to fields of the ThreadLocal object but to the map sitting in the thread object:

// Conceptual model only: one such map per Thread instance, touched only by its owner thread
Map<ThreadLocal<?>, Object> threadLocals = new WeakHashMap<>();

In practice, ThreadLocal does not use WeakHashMap but its own internal map implementation, ThreadLocal.ThreadLocalMap. This implementation optimizes the way it calculates hashes for keys to avoid excessive collisions when inserting values into the map. This is important because many ThreadLocal variables with default values are often created and initialized at once.

Additionally, ThreadLocal.ThreadLocalMap takes care of removing stale entries. Its keys are held through weak references, so a ThreadLocal object that is no longer referenced anywhere else can be garbage collected. The entry whose key has been cleared (its key now reads as null) may still linger in the map, and the internal implementation handles the removal of such entries, particularly during collisions or when the map needs to resize.

In summary, the ThreadLocal.ThreadLocalMap implementation is designed to optimize hash calculations and efficiently manage entries to ensure proper handling of ThreadLocal variables across threads.
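
The per-thread layout can be imitated in a few lines. The sketch below is only a model under loud assumptions: Local and OwnerThread are hypothetical types, a plain WeakHashMap stands in for ThreadLocalMap, and the real JDK keeps the map in a field of java.lang.Thread rather than in a subclass.

```java
import java.util.Map;
import java.util.WeakHashMap;

final class PerThreadMapSketch {
    // Hypothetical stand-in for ThreadLocal: the object itself is only a key
    static final class Local<T> {
        @SuppressWarnings("unchecked")
        T get() {
            return (T) OwnerThread.current().locals.get(this);
        }

        void set(T value) {
            OwnerThread.current().locals.put(this, value);
        }
    }

    // Hypothetical Thread subclass owning the map; only the thread itself touches it,
    // so a plain, unsynchronized map is enough
    static final class OwnerThread extends Thread {
        final Map<Local<?>, Object> locals = new WeakHashMap<>();

        OwnerThread(Runnable body) {
            super(body);
        }

        static OwnerThread current() {
            return (OwnerThread) Thread.currentThread();
        }
    }

    // Two threads write to the same Local and each reads back only its own value
    static String[] demo() {
        Local<String> user = new Local<>();
        String[] seen = new String[2];
        OwnerThread a = new OwnerThread(() -> { user.set("alice"); seen[0] = user.get(); });
        OwnerThread b = new OwnerThread(() -> { user.set("bob"); seen[1] = user.get(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Because the lookup starts from the current thread, there is no shared structure to contend on, and when a thread dies its whole map becomes unreachable with it.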

What are ThreadLocal variables used for?

Sessions

ThreadLocal variables can be used to store user-specific information, for example, session data of the currently logged-in user in a multi-threaded web application.

Database connections

In multi-threaded applications, each thread can have its own database connection, which helps to avoid thread contention when accessing the database and improves the overall performance of the application.

Request identification

ThreadLocal variables can be used to track requests - a thread can store the request identifier in a ThreadLocal variable to correlate logs belonging to that request as it traverses through the entire request handling path in the application.

Transaction management

ThreadLocal variables can store transaction identifiers or objects representing transaction states. Libraries and frameworks that assist in writing transactional code, such as Spring, often use ThreadLocal variables to store the transaction state. This is why attempting to commit a transaction in a different thread than the one in which the transaction was created leads to errors. This property allows the implementation of transactionality in a relatively “transparent” way for developers writing transactional code, for example, offering “declarative transactions” in the Spring framework.

Context

ThreadLocal variables can be used to store contextual information, such as the logging level or the target logging location (file/console/socket).

Locale

In multi-threaded web applications, ThreadLocal variables can store information about the user’s preferred language and locale, which is sent by the browser in the HTTP request header (see: Accept-Language).

Cache

Cache instances can be stored in ThreadLocal variables (thread-local cache) to avoid thread contention when accessing a global cache instance.

Logging

ThreadLocal variables can also be utilized to store information about:

Processing context (logging which business methods have been invoked).
Execution time (the start/end time of processing).
Request identification (in the context where one thread handles one request).

Indeed! The first time I encountered ThreadLocal variables was when I was delving into request logging and related transactions in large “enterprise” applications. I wasn’t aware of the existence of MDC (Mapped Diagnostic Context) and the problem it solves (spoiler: it allows you to configure logging in a way that automatically adds context information, such as which request, session, or transaction is being processed). Popular library implementations of MDC are based precisely on ThreadLocal variables, for example:

Logback:
Log4J:
- MDC in Log4j2
- MDC in Log4j1

A multi-threaded application handles many requests at the same time, and without the ThreadLocal mechanism this context information (for example, the request identifier, session, or transaction) would have to be passed manually between methods, which would be cumbersome and error-prone.

Thanks to ThreadLocal, each thread stores its own copy of the context, which can be easily read and used by different elements of the application during the execution of operations within the same thread. Therefore, MDC implementations based on ThreadLocal variables enable automatic addition of relevant information to the logs for each thread, making it easier to analyze and understand the application’s behavior in multi-threaded environments.
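
A toy version of the idea fits in a few lines. The sketch below is not the Logback or Log4j implementation; MiniMdc and its methods are hypothetical names, and the output format is invented for the illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// A toy MDC: each thread gets its own context map through a ThreadLocal
final class MiniMdc {
    // TreeMap keeps keys sorted, so the formatted output is deterministic
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(TreeMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    // Must be called when the request is done, otherwise a pooled thread
    // would carry the old context into the next request it handles
    static void clear() {
        CONTEXT.remove();
    }

    // Prefixes the message with whatever context the current thread has collected
    static String format(String message) {
        return CONTEXT.get() + " " + message;
    }

    public static void main(String[] args) {
        put("requestId", "42");
        System.out.println(format("order accepted")); // {requestId=42} order accepted
        clear();
        System.out.println(format("idle"));           // {} idle
    }
}
```

The code that logs the message never receives the request identifier as a parameter; it only has to run on the thread that stored it.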

Scoped Values - details

After this lengthy digression, let’s take a look at what ScopedValue is. According to the description in the JavaDoc:

A value that may be safely and efficiently shared to methods without using method parameters.

The same can essentially be said about ThreadLocal! However, when we delve into the description, we notice that values are not “assigned” or “set” (as in ThreadLocal variables) but “bound”.

The ScopedValue API works by calling a method in which the ScopedValue object is “bound” to a specific value for a “limited” (“scoped”) period during which the method is executed (…and this “binding” disappears after exiting the “scope”).

Dynamic context

The “binding” of a value with a certain limited duration (or more precisely, for as long as the bound operation runs, including every method it calls directly or indirectly) is called a “dynamic context”: outside this context, the variable is not bound to any value.

One can think of these variables as follows:

We need a variable (let’s call it VAL) of type ScopedValue<V> (V being the type of the value stored in ScopedValue).
We need it to have a certain value v of type V.
We want to bind the VAL variable with the value v during the execution of a certain runnable: () -> process().

These requirements can be encoded as follows:

record V(String v) {}

private static final ScopedValue<V> VAL = ScopedValue.newInstance();

// VAL is bound to the given V only while process() runs
ScopedValue.runWhere(VAL, new V("Answer is: 42"), () -> process());

Source: Scoped.java
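
To see the binding appear and disappear, here is a small runnable sketch. It assumes a JDK where java.lang.ScopedValue is available (it is a preview API in JDK 21, so --enable-preview is needed there), and it uses the fluent ScopedValue.where(...).run(...) form instead of runWhere. The class name ScopedBindingDemo and the body of process() are made up for the illustration.

```java
class ScopedBindingDemo {
    record V(String v) {}

    private static final ScopedValue<V> VAL = ScopedValue.newInstance();

    // Reads the "implicit parameter"; nothing is passed in explicitly
    static String process() {
        return VAL.isBound() ? VAL.get().v() : "<unbound>";
    }

    static String[] demo() {
        String[] seen = new String[3];
        seen[0] = process();                         // before: no binding yet
        ScopedValue.where(VAL, new V("Answer is: 42"))
                   .run(() -> seen[1] = process());  // inside: VAL is bound
        seen[2] = process();                         // after: the binding is gone
        return seen;
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```

Unlike with ThreadLocal, there is no set() to forget to undo: the value exists only for the duration of the run(...) call.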

Rebinding

ScopedValue variables should be declared as static final and understood as “keys” to obtain values within the context in which they are defined.

“Binding” always occurs within the context of the current thread, but it can be “rebound” for the needs of a called method: in this case, we nest a new dynamic scope (created after rebinding) inside the old context.

import java.util.concurrent.StructuredTaskScope;

// Written against the JDK 21 preview APIs (compile and run with --enable-preview).
// ScopedValue.runWhere was removed in later JDKs in favor of ScopedValue.where(...).run(...).
public class ScopedThread {
  private record V(String v) {}

  private static final ScopedValue<V> VAL = ScopedValue.newInstance();

  public void printMyVal() {
    System.out.println(VAL.get());
  }

  public void test(String... values) {
    for (String s : values) {
      // Outer scope: VAL is bound to V(s) while the lambda runs
      ScopedValue.runWhere(VAL, new V(s), () -> {
        try (var scope = new StructuredTaskScope<String>()) {
          // Forked subtasks inherit the binding of VAL
          scope.fork(() -> childTask(1));
          scope.fork(() -> childTask(2));
          try {
            scope.join();
          } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
          }
        }
      });
    }
  }

  String childTask(int id) {
    // Set the name of the thread: where do I execute?
    Thread.currentThread().setName("thread-" + id);
    // Rebind the "implicit parameter" VAL to a new value with id appended
    ScopedValue.runWhere(VAL, new V(VAL.get().v() + id), () -> {
      var val = "Task_%s: VAL=%s thread=%s".formatted(id, VAL.get(), Thread.currentThread().getName());
      System.out.println(val);
    });
    return "";
  }

  public static void main(String[] args) {
    var s = new ScopedThread();
    s.test("one", "two", "three");
  }
}

In the above program:

For each string passed to test(), a new scope is created in which the variable VAL is bound to a value of type V storing the string.
Within that scope, a structured scope StructuredTaskScope is created (I will write more about it in the next post), in which two tasks are forked for execution as virtual threads.
In the childTask(int id) function:
- I set the name of the current thread.
- I create a new scope in which I rebind VAL to a new value of V, to which I append the value of the parameter id.
- I print the value in VAL and the name of the current thread.

I know, this program makes no sense at all. But my goal was to check if I have the correct understanding of the model of what will happen :)

Source: ScopedThread.java
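
The rebinding rule from this section can also be checked without any threads. The sketch below assumes a JDK where java.lang.ScopedValue is available (a preview API in JDK 21, so --enable-preview is needed there) and uses the fluent ScopedValue.where(...).run(...) form. RebindingDemo and NAME are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RebindingDemo {
    private static final ScopedValue<String> NAME = ScopedValue.newInstance();

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        ScopedValue.where(NAME, "outer").run(() -> {
            log.add(NAME.get());                        // value of the outer binding
            // Nested scope: NAME is rebound only while the inner lambda runs
            ScopedValue.where(NAME, NAME.get() + "/inner")
                       .run(() -> log.add(NAME.get()));
            log.add(NAME.get());                        // the outer binding is visible again
        });
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [outer, outer/inner, outer]
    }
}
```

The inner binding shadows the outer one instead of overwriting it, so the caller's value is restored automatically when the nested scope ends.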