Calculation of hash value in Java and compression to get index

This post walks through how Java calculates hash values for the basic data types, which fall into four categories and eight kinds: byte, short, int, long, float, double, char, and boolean.

For hashing purposes they can be roughly divided into six groups (this grouping is my own; I only use it to keep the explanation clear).

Calculate the hash value

// Group 1: int


// hashCode() can only be called on an object, so the wrapper class Integer is used here. Its hash value is simply the int value itself.

Integer integer = 2147483647;
System.out.println(integer.hashCode());


// Group 2: byte, short, char


// For these three, the hash value is the value itself widened to int, so they behave the same way as group 1. For char, that is the character code ('0' gives 48).

Byte aByte = 127;
System.out.println(aByte.hashCode());

Short aShort = 32767;
System.out.println(aShort.hashCode());

Character character = '0';
System.out.println(character.hashCode());
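As a quick check of the first two groups, here is a minimal sketch that uses the static hashCode helpers on the wrapper classes (available since Java 8). The class name SmallTypeHashDemo is only an illustrative name, not something from the JDK.

```java
// Illustrative class name; not part of the JDK.
class SmallTypeHashDemo {
    public static void main(String[] args) {
        // Integer, Byte, Short: the hash is the value itself, widened to int.
        System.out.println(Integer.hashCode(-5) == -5);              // true
        System.out.println(Byte.hashCode((byte) 127) == 127);        // true
        System.out.println(Short.hashCode((short) 32767) == 32767);  // true

        // Character: the hash is the UTF-16 code unit, so '0' gives 48, not 0.
        System.out.println(Character.hashCode('0'));                 // 48
    }
}
```

The static helpers avoid boxing, which is handy when only the hash value is needed.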


// Group 3: float is also 32-bit data. Its hashCode() calls Float.floatToIntBits(float f), which returns the int whose bits are the binary representation of the float value.

Float aFloat = 32.1321F;
System.out.println(aFloat.hashCode());
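To make "the bits of the float, read as an int" more concrete, here is a small sketch with values whose bit patterns are easy to check by hand. The class name FloatHashDemo is an assumption made for this example.

```java
// Illustrative class name; not part of the JDK.
class FloatHashDemo {
    public static void main(String[] args) {
        // 1.0f is stored as sign 0, exponent 127, mantissa 0: bits 0x3F800000.
        int bits = Float.floatToIntBits(1.0f);
        System.out.println(Integer.toHexString(bits));     // 3f800000
        System.out.println(Float.hashCode(1.0f) == bits);  // true

        // 0.0f and -0.0f compare equal with ==, but their bit patterns differ,
        // so their hash codes differ as well.
        System.out.println(0.0f == -0.0f);                 // true
        System.out.println(Float.hashCode(0.0f));          // 0
        System.out.println(Float.hashCode(-0.0f));         // -2147483648
    }
}
```

This matches how Float.equals() works: it also compares bit patterns, so 0.0f and -0.0f are different keys in a hash-based collection.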


// Group 4: long is 64-bit data. Casting it directly to int (32 bits) throws away the upper 32 bits, so any two values with the same lower 32 bits would return the same hash value.


// Is this good? This is not good (please picture the Ma Baoguo meme here), so Java performs an operation that can be called "folding".

Long aLong = 2147483648L;


int longHashValue = (int) (aLong ^ (aLong >>> 32)); // XOR (^) mixes the upper 32 bits into the lower 32 bits, so both halves influence the result


System.out.println(longHashValue);
System.out.println(aLong.hashCode());
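The difference between plain truncation and folding is easiest to see with two values that share their lower 32 bits. The following is a minimal sketch; the class name LongFoldDemo and the helper fold are hypothetical names chosen for this example.

```java
// Illustrative names; fold() repeats the formula from the snippet above.
class LongFoldDemo {
    // XOR the upper 32 bits into the lower 32 bits, then keep the lower half.
    static int fold(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        long a = 1L;               // upper 32 bits: 0, lower 32 bits: 1
        long b = (1L << 32) | 1L;  // upper 32 bits: 1, lower 32 bits: 1

        // Plain truncation keeps only the lower 32 bits, so a and b collide.
        System.out.println((int) a + " " + (int) b);      // 1 1

        // Folding lets the upper half change the result.
        System.out.println(fold(a) + " " + fold(b));      // 1 0

        // The fold agrees with the JDK's own implementation.
        System.out.println(fold(b) == Long.hashCode(b));  // true
    }
}
```

Folding cannot remove collisions entirely, since 64 bits are still squeezed into 32, but it stops the upper half from being ignored.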


// Group 5: double


// First, Double.doubleToLongBits(double d) returns the long whose bits are the binary representation of the double value.


// Then the resulting long is folded in the same way as in group 4. The whole process looks like this:

Double aDouble = 412312.123213;
long tempLong = Double.doubleToLongBits(aDouble);
int doubleHashValue = (int) (tempLong ^ (tempLong >>> 32));
System.out.println(doubleHashValue);
System.out.println(aDouble.hashCode());
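For a value whose bits can be followed by hand, 1.0 is a convenient choice, because only its upper half is non-zero. The sketch below works through the two steps; the class name DoubleHashDemo is an assumption made for this example.

```java
// Illustrative class name; not part of the JDK.
class DoubleHashDemo {
    public static void main(String[] args) {
        // Step 1: reinterpret the double as a 64-bit pattern.
        long bits = Double.doubleToLongBits(1.0);
        System.out.println(Long.toHexString(bits));          // 3ff0000000000000

        // Step 2: fold the two 32-bit halves together with XOR.
        int upper = (int) (bits >>> 32);  // 0x3FF00000
        int lower = (int) bits;           // 0x00000000
        int folded = upper ^ lower;
        System.out.println(folded);                          // 1072693248
        System.out.println(folded == Double.hashCode(1.0));  // true
    }
}
```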


// Group 6: boolean


// There is not much to say here. There are only two values: true returns 1231 and false returns 1237.

// Why these two numbers? From what I could find, two fairly large prime numbers were chosen so that the hash values are spread more evenly when the index is computed.


// But why exactly these two? If any expert knows, please explain.

Boolean aBoolean = true;
System.out.println(aBoolean.hashCode());

Hash a string

The above covers hashing the basic data types. In practice, hashing strings is more common, and it works as follows.


The most basic way to hash a string is to take a hash value for each character in the string and add them all up.


But there is a problem with this: strings with the same letters in a different order (e.g. listen, silent) return the same hash value (collision, collision, collision). Again, is this good? No, this is not good.


So a good hash function takes the order of the characters into account.
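The order problem described above can be reproduced in a few lines. This is a minimal sketch; the class name NaiveStringHashDemo and the helper sumOfChars are hypothetical names, and the sum-of-characters hash is the naive scheme from the text, not anything the JDK uses.

```java
// Illustrative names; sumOfChars() is the naive scheme, not a JDK method.
class NaiveStringHashDemo {
    // Naive hash: add up the character codes, ignoring their positions.
    static int sumOfChars(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two anagrams: same letters, different order.
        System.out.println(sumOfChars("listen") == sumOfChars("silent"));  // true
        // String.hashCode() depends on position, so these two differ.
        System.out.println("listen".hashCode() == "silent".hashCode());    // false
    }
}
```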

String string = "tai";


// f(string) = f(t)*b^2 + f(a)*b^1 + f(i)*b^0 (here ^ means "to the power of"; the exponent starts at (string length - 1) and ends at 0). b is a constant, and the designers chose 31 (the purpose is to reduce hash collisions).

int stringHashValue = (31 * 31 * 't') + (31 * 'a') + (1 * 'i');

// P.S. The order of calculation here differs from the order used in the Java source code. The string here has only three characters, so the result does not overflow.


// With many characters the result will overflow int (not "may", it will). Java's int arithmetic wraps around in that case, and String.hashCode() wraps around in the same way, so the two values below still match as long as the formula is evaluated in int. Either way, this does not affect understanding.


System.out.println(stringHashValue);
System.out.println(string.hashCode());
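The same polynomial can be evaluated one character at a time with Horner's method, which is the usual way to write it and avoids computing powers of 31 explicitly. The sketch below is one way to do that; the class name StringHashDemo and the method polynomialHash are hypothetical names, and the loop is meant to mirror the documented String.hashCode() formula rather than copy the JDK source.

```java
// Illustrative names; mirrors the documented String.hashCode() formula.
class StringHashDemo {
    // Horner's method: h = 31 * h + c for each character, left to right.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);  // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("tai"));                        // 114588
        System.out.println(polynomialHash("tai") == "tai".hashCode());    // true

        // A longer string overflows int along the way, yet still matches.
        String longer = "a much longer string that overflows int";
        System.out.println(polynomialHash(longer) == longer.hashCode());  // true
    }
}
```

For "tai" this gives 116 * 961 + 97 * 31 + 105 = 114588, the same number as the hand-written formula.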

"Compress" the hash value to get the index

From hash value to index

The above covers how hash values are calculated for the basic types and for strings.


However, the resulting hash value is generally quite large and cannot be used directly as an index, because it would fall outside the index range. So how do we get an index?


The simple way is to take the remainder. Using the example above, 114588 divided by 16 leaves a remainder of 12.

int hashValue = 114588;
int N = 16;
int index = hashValue % N; // index = 12
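One caveat with the remainder approach: hashCode() can return a negative int, and in Java the % operator then yields a negative result, which is not a valid array index. A minimal sketch of the problem and one possible fix using Math.floorMod (available since Java 8) follows; the class name ModuloIndexDemo and the sample value -7 are chosen only for illustration.

```java
// Illustrative class name and sample values.
class ModuloIndexDemo {
    public static void main(String[] args) {
        int n = 16;

        // Non-negative hash: % works as expected.
        System.out.println(114588 % n);                      // 12

        // Negative hash: % keeps the sign of the dividend.
        int negativeHash = -7;
        System.out.println(negativeHash % n);                // -7

        // Math.floorMod always lands in the range 0 .. n-1 for positive n.
        System.out.println(Math.floorMod(negativeHash, n));  // 9
    }
}
```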

But there is a faster method. "Faster" here means faster for the computer to execute: a bit operation.


// When the N above is an integer power of 2:
index = hashValue & (N - 1); // Equivalent to hashValue % N for non-negative hash values, but faster


// & is the bitwise AND operation: a result bit is 1 only when both input bits are 1
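The mask trick only works when N is a power of 2, because only then is N - 1 a run of 1 bits. The sketch below checks both the working case and a failing case; the class name MaskIndexDemo and the helper indexFor are hypothetical names used for this example.

```java
// Illustrative names; indexFor() is not a JDK method.
class MaskIndexDemo {
    // Keep only the low bits of the hash; valid when capacity is a power of 2.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        // 16 - 1 = 15 = binary 1111, so the mask keeps the lowest 4 bits.
        System.out.println(indexFor(114588, 16));  // 12, same as 114588 % 16

        // Unlike %, the mask never produces a negative index.
        System.out.println(indexFor(-7, 16));      // 9

        // With a capacity that is not a power of 2 the trick breaks:
        // 10 - 1 = 9 = binary 1001, and 13 = binary 1101.
        System.out.println(13 % 10);               // 3
        System.out.println(indexFor(13, 10));      // 9
    }
}
```

This is one reason hash tables that use masking keep their capacity at a power of 2.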

Ending

That basically covers how hash values are calculated for the basic data types and strings, and how the corresponding index is derived from them.


However, the source code of HashMap and HashSet also performs some additional operations to reduce hash collisions, although the overall idea is the same as above.

I will come back and write about that later.
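As a small preview of those additional operations: to my knowledge, HashMap in OpenJDK 8 and later XORs the upper 16 bits of hashCode() into the lower 16 bits before masking, so that the upper bits also affect the index in small tables. The sketch below imitates that step; the class name SpreadDemo and the method spread are hypothetical names, and this is a simplified illustration rather than the actual HashMap source.

```java
// Illustrative names; a simplified imitation, not the HashMap source.
class SpreadDemo {
    // Mix the upper 16 bits into the lower 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000;  // differs from b only in the upper 16 bits
        int b = 0x20000;

        // Without spreading, the mask sees only zeros and both collide.
        System.out.println((a & (n - 1)) + " " + (b & (n - 1)));                  // 0 0

        // After spreading, the upper bits reach the index.
        System.out.println((spread(a) & (n - 1)) + " " + (spread(b) & (n - 1)));  // 1 2
    }
}
```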