Nasty Java HashMap race condition

By Confusion on Tuesday 09 June 2009 22:03 - Comments (13)
Categories: Java, Software engineering, Views: 6.948

After reading this I think I need to check some applications I wrote in the past.

The piece is excellent and the details matter, but I'll attempt a summary anyway:
if you use a regular HashMap in a multithreaded environment, it seems the worst that can happen is that you incur some additional cache misses due to race conditions. In a lot of situations that is perfectly acceptable, and since wrapping the HashMap with Collections.synchronizedMap() incurs quite a performance penalty (and at that point, that was basically the choice), you are tempted to just leave the HashMap in. Don't. The 'put' operation may trigger a resize and redistribution of the internal data structure of the HashMap, which can thoroughly hose the map if it is concurrently accessed during the restructuring, to the extent that your program goes into an infinite loop.

These days, there is no performance reason to decide in favor of the regular HashMap: java.util.concurrent.ConcurrentHashMap has excellent performance, even in single-threaded applications. Still, I think I've made the mistake of using a regular HashMap somewhere in the past. The application has never malfunctioned (as far as I know), so it may well be that the chance of this bug occurring is so small that it is negligible for all practical purposes. Nevertheless, unless you want to do the math, replacing it with a ConcurrentHashMap is a safe bet.
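
For illustration, here is a minimal sketch of the hazard (the class name and loop counts are made up for this example): several threads putting into a shared, unsynchronized HashMap will sooner or later race on a resize, and on affected HashMap implementations the program may then hang instead of finishing.

Java:
import java.util.HashMap;
import java.util.Map;

public class HashMapRaceDemo {

    public static void main(String[] args) throws InterruptedException {
        // Deliberately unsynchronized and shared between two threads.
        final Map<Integer, Integer> map = new HashMap<Integer, Integer>();

        Runnable writer = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    // Concurrent puts occasionally collide on the internal resize.
                    map.put((int) (Math.random() * Integer.MAX_VALUE), i);
                }
            }
        };

        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // With a corrupted map this line may never be reached (infinite loop),
        // or the reported size may simply be wrong. Swapping in a
        // java.util.concurrent.ConcurrentHashMap removes the hazard.
        System.out.println("Done, size = " + map.size());
    }
}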


Comments


By Tweakers user JeRa, Tuesday 09 June 2009 22:16

Well... why would you want to use an unsynchronized class in a multi-threaded application anyway? It's like warning people against putting sand into their gas tanks - they know it's wrong and at some point, all hell can break loose. So why do it?

By Tweakers user Jeanpaul145, Tuesday 09 June 2009 22:59

This is interesting. I didn't know about ConcurrentHashMap. Thanks for the tip!

By Tweakers user Nick_S, Tuesday 09 June 2009 23:48

Before using any class, please read the documentation:
Note that this implementation is not synchronized. If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.
This was even there in Java 1.4.

[Comment edited on Tuesday 09 June 2009 23:48]


By Tweakers user Confusion, Wednesday 10 June 2009 07:32

@Nick_S:
I doubt you have read the documentation for all the classes you use equally carefully ;). Had I read it that carefully, the specific emphasis on 'structural modification' might have tipped me off. Of course, had I ever implemented a hashmap myself, I would have realised this problem. However, my background isn't in CS and I think that holds for many in the field. Then again, I wonder how many CS students would make the same mistake.


@JeRa:
I think your analogy is misleading: you're comparing it to a situation that will always lead to failure, while this will only sometimes lead to failure. A HashMap whose size stays below the threshold where it rearranges itself can safely be used concurrently. Even if it sometimes rearranges itself, the number of rearrangements is very small, making it extremely unlikely that this problem ever shows up. A better comparison would be mountain climbing: you know it's a dangerous hobby and that people die every year, but in general it's safe to practice. Of course, it's never up to you as a developer to decide for a client that they want to run that risk, so, as I said in the blog post, it's still wrong to do it.
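
Purely to illustrate that point (not as a recommendation), a map can be created with an initial capacity large enough that it never crosses the resize threshold; the numbers below are made up and assume the default load factor of 0.75:

Java:
// With an initial capacity of 2048 and the default load factor of 0.75,
// this map will not rearrange itself until it holds more than 1536 entries.
Map<String, String> map = new HashMap<String, String>(2048);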

[Comment edited on Wednesday 10 June 2009 07:37]


By Tweakers user _Erikje_, Wednesday 10 June 2009 10:09

Just recently, I came across an IBM article about the performance of Hashtable and ConcurrentHashMap. I was totally stunned when I saw the figures: a dramatic gain in performance is visible.
The article: http://www.ibm.com/develo...a/library/j-jtp07233.html

[Comment edited on Wednesday 10 June 2009 10:09]


By Tweakers user YopY, Wednesday 10 June 2009 10:21

This topic made me a bit curious, as I'm currently working on a project (a servlet) that uses a HashMap. I read the section Nick_S quoted and looked for a solution. The javadoc itself suggests using Collections.synchronizedMap(), which is a rather simple wrapper around a regular HashMap in which all methods are marked synchronized. As you can imagine, this isn't optimal, but it's a simple and easy solution.

An alternative would be to put all operations on the HashMap in a synchronized block:


Java:
Map<String, String> myMap = new HashMap<String, String>();

// snip
synchronized (myMap) {
    myMap.put(someKey, someValue);
}


etc.

A synchronized HashMap is an in-between solution, but it too requires you to use a synchronized block when iterating over it, to prevent concurrent modification and the like:


Java:
Map<String, String> myMap = Collections.synchronizedMap(new HashMap<String, String>());
synchronized (myMap) {
  for (Map.Entry<String, String> entry : myMap.entrySet()) {
    // do something
  }
}


So far I've used the above snippet, but after some testing and reading this post, I'm guessing ConcurrentHashMap is a better solution. In my particular test case, I even found that ConcurrentHashMap performs better than HashMap in a single-threaded environment! I don't know why this is, though - decompiling HashMap and ConcurrentHashMap and glancing over the results does show that ConcurrentHashMap seems to use more low-level code (read: short variable names, =D). But I guess that requires some more study - I'm more of a high-level programmer in any case.

Anyway, I performed three tests in which two arrays of Objects with a size of a million were inserted into a HashMap, a synchronized map, and a ConcurrentHashMap in turn; on my system, the results, averaged over 15 runs, were:

Average hashmap: 769ms
Average synchronized hashmap: 882ms
Average concurrent hashmap: 655ms

Note that this is in a single-threaded environment. I was rather surprised to see that ConcurrentHashMap performs better than HashMap, although that might just be my particular test case. I don't know (yet) how it performs in a multi-threaded environment, but perhaps someone else could whip up a test for that? Here's my test code; excuse the length:


Java:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Test {

    public static void main(String[] args) {
        new Test();
    }
    
    public Test() {
        long runs = 15;
        int size = 500000;
        long totalHash = 0;
        long totalSynch = 0;
        long totalConc = 0;
        
        for (int run = 0; run < runs; run++) {
            Object[] keys = getObjects(size);
            Object[] values = getObjects(size);
            Map<Object, Object> hashMap = new HashMap<Object, Object>();
            long start = System.currentTimeMillis();
            for (int j = 0; j < keys.length; j++) {
                hashMap.put(keys[j], values[j]);
            }
            long end = System.currentTimeMillis();
            long time = end - start;
            System.out.println("Run " + run + ": Hashmap took " + time + " ms");
            totalHash += time;
    
            keys = null;
            values = null;
            System.gc();
            keys = getObjects(size);
            values = getObjects(size);
            
            hashMap = Collections.synchronizedMap(new HashMap<Object, Object>());
            start = System.currentTimeMillis();
            for (int j = 0; j < keys.length; j++) {
                hashMap.put(keys[j], values[j]);
            }
            end = System.currentTimeMillis();
            time = end - start;
            System.out.println("Run " + run + ": Synchronized map took " + time + " ms");
            totalSynch += time;
    
            keys = null;
            values = null;
            System.gc();
            keys = getObjects(size);
            values = getObjects(size);
            hashMap = new ConcurrentHashMap<Object, Object>();
            start = System.currentTimeMillis();
            for (int j = 0; j < keys.length; j++) {
                hashMap.put(keys[j], values[j]);
            }
            end = System.currentTimeMillis();
            time = end - start;
            System.out.println("Run " + run + ": Concurrent HashMap took " + time + " ms");
            totalConc += time;
        }
        
        totalHash = totalHash / runs;
        totalSynch = totalSynch / runs;
        totalConc = totalConc / runs;
        
        System.out.println("Average hashmap: " + totalHash);
        System.out.println("Average synchronized hashmap: " + totalSynch);
        System.out.println("Average concurrent hashmap: " + totalConc);
    }
    
    public Object[] getObjects(int length) {
        Object[] result = new Object[length];

        for (int i = 0; i < length; i++) {
            result[i] = new Object();
        }
        
        return result;
    }
}
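
As for the multi-threaded test asked about above, here is a minimal sketch of one possible comparison, assuming that several writer threads inserting distinct keys is a representative workload (the class name and parameters are made up):

Java:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class MultiThreadedMapTest {

    public static void main(String[] args) throws InterruptedException {
        int threads = 4;
        int putsPerThread = 250000;
        System.out.println("Synchronized map: "
                + time(Collections.synchronizedMap(new HashMap<Integer, Integer>()), threads, putsPerThread) + " ms");
        System.out.println("ConcurrentHashMap: "
                + time(new ConcurrentHashMap<Integer, Integer>(), threads, putsPerThread) + " ms");
    }

    // Starts the given number of writer threads, each inserting 'puts' distinct
    // keys into the shared map, and returns the wall-clock time until all are done.
    static long time(final Map<Integer, Integer> map, int threads, final int puts) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(threads);
        long start = System.currentTimeMillis();
        for (int t = 0; t < threads; t++) {
            final int offset = t * puts;
            new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < puts; i++) {
                        map.put(offset + i, i);
                    }
                    done.countDown();
                }
            }).start();
        }
        done.await();
        return System.currentTimeMillis() - start;
    }
}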

[Comment edited on Wednesday 10 June 2009 10:23]


By Tweakers user JeRa, Wednesday 10 June 2009 13:46

@Confusion

You haven't answered my question: why use unsynchronised classes in a multithreaded application? You're right about the wrong analogy, but my point was that the problem is much more generic than just a wrong usage of a HashMap:

When you're developing a multithreaded application, always make sure you're using synchronized classes when accessing and manipulating data from more than one thread.

I agree with Nick_S on this one. Also, the automatic structural changes within the HashMap are irrelevant - another JRE developer could choose an entirely different implementation, which means the only real solution is to read and adhere to the documentation.

[Comment edited on Wednesday 10 June 2009 13:49]


By Tweakers user Confusion, Wednesday 10 June 2009 14:24

@JeRa:
The answer to 'why' is: because of (premature) optimization. Because of not thinking it through far enough. Because a naive HashMap implementation would actually work that way. Because of all the usual wrong reasons.

'Just using synchronized classes' usually isn't enough BTW. For example, I've added quite a few synchronized{} blocks around compound operations on synchronized classes since then.
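
A minimal sketch of what such a compound operation looks like (the key is made up): every individual call on a synchronized map is thread-safe, but a check followed by a put is not, so it needs its own lock on the map; ConcurrentHashMap offers putIfAbsent() for exactly this case.

Java:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CompoundOperationDemo {

    public static void main(String[] args) {
        String key = "example";

        Map<String, Integer> counts = Collections.synchronizedMap(new HashMap<String, Integer>());

        // Each call is synchronized on its own, but the check-then-put as a whole
        // is not: two threads can both see the key as absent and both insert.
        // The compound operation therefore needs an external lock on the map.
        synchronized (counts) {
            if (!counts.containsKey(key)) {
                counts.put(key, Integer.valueOf(0));
            }
        }

        // ConcurrentHashMap provides the same compound operation atomically.
        ConcurrentHashMap<String, Integer> concurrent = new ConcurrentHashMap<String, Integer>();
        concurrent.putIfAbsent(key, Integer.valueOf(0));
    }
}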

[Comment edited on Wednesday 10 June 2009 14:31]


By Tweakers user YopY, Wednesday 10 June 2009 14:59

@confusion, I'd argue that in this case, using synchronized / thread-safe classes isn't 'premature optimization', it's 'solving (very) hard-to-find bugs before they occur'. As far as I've read (no, I don't have that much experience with multithreaded apps), concurrency-related bugs can be terribly difficult to spot.

Preferring ConcurrentHashMap over a regular HashMap in an environment where you know it'll be accessed by multiple threads at once is, in this case, and imo, a Good Thing. Just as long as you'll make sure to add synchronized blocks when and where required.

The investment in time of setting it up will be marginal compared to the time it would take to locate a potential bug.

By Tweakers user JeRa, Wednesday 10 June 2009 15:04

'Just using synchronized classes' usually isn't enough BTW. For example, I've added quite a few synchronized{} blocks around compound operations on synchronized classes since then.
Just exactly who are you quoting there? ;)

Of course that isn't enough - but not doing it usually results in mayhem :p Any composite action that is used by multiple threads at once should be synchronised if anything can go wrong. It just so happens that most methods of unsynchronized classes consist of composite actions, such as HashMap.put(), which can trigger the resize :)

By Tweakers user Confusion, Wednesday 10 June 2009 16:42

@YopY:
I think JeRa was questioning why I chose to do it back then.

@JeRa:
That paragraph was added as an afterthought, for those that didn't know that yet. Without it, our discussion may seem to indicate that just using synchronized classes solves all your concurrency problems.

By Javin @ Tibco RV Tutorial, Thursday 03 February 2011 10:03

Hi,
Thanks for this nice article. Just to add: while discussing HashMap, it's worth mentioning questions that are frequently asked in Java interviews these days, like 'How HashMap works in Java' or 'How the get() method of HashMap works in Java' (http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html). From a conceptual point of view these questions are great and expose whether a candidate knows the deep details.

Javin
<a href="http://javarevisited.blogspot.com/2011/01/difference-between-fix-42-vs-fix-44-in.html" title="Difference between FIX4.2 vs FIX4.4">Difference between FIX4.2 vs FIX4.4</a>

By pranav, Thursday 11 August 2011 15:55

Very nice article and comments too.
I agree with you that nowadays it doesn't matter which flavor of HashMap you use, as long as you're not doing extensive math.

Comments are closed