I want experts to look at my c-source for hashtables

Sven · Post by **Sven** » Sun Jun 14, 2009 8:33 pm

JensBNielsen wrote:I actually discovered the >> error by reading my own post here. It should be >= instead.

Have you checked whether your program works better with that fix?

JensBNielsen wrote:Thanks for discovering the difference in w_check_direction and b_check_direction. How did you discover that?!

The compiler prints a warning about an uninitialized variable, and I check each warning (and if it is my program, I fix the code in order to remove all warnings).

JensBNielsen wrote:First I would just like the hashtables to work properly.
Then I would like my code to be clean and straight, but I would also like to add new knowledge and ideas - and I have a lot!

If you have no problem to understand your own code, and if you therefore are able to identify bugs in the code which you detect by testing, then you may of course postpone making your code "clean and straight".

If, however, you see that your program does not work properly with respect to your hashtable implementation, and you do not understand all of that hashtable code well enough, then I strongly recommend - as most others here would probably do - to start _carefully_ with cleaning. You won't be able to fix any bug in complex code if you don't understand your code fully, which means that by changing two lines you'll most probably introduce two new bugs.

Of course cleaning is only a first step, and not the most important one, but I think it has to come first. You are tempted of course to say "Come on, I'm just one or two tiny bugs away from heaven, so why bother with cleaning now, I can do that later?!". Nice if you - or someone else here - find these tiny bugs quickly. But if you don't? You will try 20, 50, 100 different fixes at various places until you give up, having improved your Elo by some -50 points.

O.k., I exaggerated, but not too much.

After cleaning, now that you have managed to have code that is easier to change, the second step IMO is then to improve testability of the code. This is much work and may include things like:
- making the code "safe" by adding assert's;
- isolating functionality, i.e. making different pieces of code more independent from each other, in order to be able to test one functionality isolated from the other;
- writing specific debugging and testing code and managing to trigger specific tests from the user interface.

It is fun to reach a state where you can see that you really have your program under control - and not vice versa!

Just my 2 cents, again ...

Sven

JensBNielsen · Post by **JensBNielsen** » Sun Jun 14, 2009 10:29 pm

I have not spotted any change after correcting the >> error, as the hashtables did not work (because of other errors...). But it is nice to have that error corrected

)

Thanks for your help and advices.
I assume that it is about 5-10% of my program that I do not have full control over.
Fx the hashtables, but I know how they ought to work. I coded the hashtables in 1998 and stopped programming shortly after that.

I resumed programming a year ago, but the 10 years in between (without any programming in C) is a little challenge!

Best, Jens

PS. You did not spot why I have to alloce twice as much memory for hashtables than my program uses?!

Sven · Post by **Sven** » Mon Jun 15, 2009 12:38 am

JensBNielsen wrote:I assume that it is about 5-10% of my program that I do not have full control over.

That's quite a lot

JensBNielsen wrote:PS. You did not spot why I have to alloce twice as much memory for hashtables than my program uses?!

I don't know why you *have* to, and maybe this is because I don't know exactly what your variables mean. Could you please explain these: "farp", "compa", "hash", "hashint" ? What I know from the code is this:

- "farp" is a dynamically allocated data block of currenty 20MB size, represented as an array of integer (int * farp).

- "compa" is an array of currently 2^19 entries of type hash_struct where each entry *might* have a size of 20 bytes (but this depends on structure alignment of the compiler and on sizes of int and long). If you assume that the size of an int is 2 (i.e., a short int) then the struct has size 16 but today an int has typically 4 bytes.

- "hash" is a single struct of type hash_struct. In function read_hash() it is assigned the value compa[key] but then its first 16 bytes are immediately overwritten by some data from "farp". In function write_hash(), "hash" gets copied both into compa[key] and "farp".

- "hashint" seems to serve as an intermediate container for conversion only.

Now why are there both "compa" and "farp", and what about these funny size differences? Why gets "hash" overwritten within read_hash(), but only the first 16 bytes (assuming a hash_struct is bigger)?

Your turn ...

Sven

Sven · Post by **Sven** » Mon Jun 15, 2009 9:10 pm

Let me guess what you might need to do for your hashtable.

Define your hash entry properly. Here I assume you want each entry to have a size of 16 bytes, not 18 or 20, so I would use a short int for the score, assuming it fits into the 16 bit range.

Instead of "compa" and "farp" have only one table, allocate it with malloc(), and use it like a normal array. No conversion to and from integers needed IMO.

Keep it as simple as possible.

Code: Select all

// ...
#include <assert.h>
// ...

typedef struct &#123;
    unsigned long int hash_id;
    unsigned long int hash_idx;
    signed short int  hash_score;   
    signed char       hash_fr;
    signed char       hash_to;
    signed char       hash_score_final; // 0=not final   1=final   2=vurdering&#40;) 
    signed char       hash_depth;
    signed char       hash_ply;
    unsigned char     hash_mpc;
&#125; hash_struct;

// example
#define TOTAL_HASH_SIZE_MB    &#40;64&#41;
#define TOTAL_HASH_SIZE_BYTES &#40;1024*1024*TOTAL_HASH_SIZE_MB&#41;
#define NUMBER_HASHELEMS      &#40;TOTAL_HASH_SIZE_BYTES / sizeof&#40;hash_struct&#41;)

hash_struct * hash_table = 0;
hash_struct hash;

// ...
unsigned long int hash_maxindex=NUMBER_HASHELEMS-1, // ...
// ...

void read_hash&#40;unsigned long int idx&#41;
&#123;
    assert&#40;idx<=hash_maxindex&#41;;
    hash=hash_table&#91;idx&#93;;
&#125;

void write_hash&#40;unsigned long int idx&#41;
&#123;
    assert&#40;idx<=hash_maxindex&#41;;
    if &#40;swhash>0&#41; &#123;
        hash_table&#91;idx&#93;=hash;
    &#125;
&#125;

// ...

int CDECL main&#40;)
&#123;
    // ...
    // verify assumptions and consistent definitions
    assert&#40;sizeof&#40;hash_struct&#41; equal 16&#41;;
    assert&#40;TOTAL_HASH_SIZE_BYTES equal NUMBER_HASHELEMS * sizeof&#40;hash_struct&#41;);

    hash_table = malloc&#40;TOTAL_HASH_SIZE_BYTES&#41;;
    assert&#40;hash_table != 0&#41;;

    printf&#40;"allocated %lu bytes for hashtables\n", TOTAL_HASH_SIZE_BYTES&#41;;

    // ...
    free&#40;hash_table&#41;;
    return 0;
&#125;

Sven

JensBNielsen · Post by **JensBNielsen** » Mon Jun 15, 2009 11:35 pm

Hi again

The reason for I *had to* allocate more memory than I needed was that Dabbaba simply crashed if I did not allocate so much.

My Pelle-compiler says that sizeof(hash) is 20.... Perhaps that is a part of the problem, if other parts of the program assumes the length is 16...
Perhaps too little memory is copied; or perhaps too much so the element next to the current is violated...
But my check-hashtables do not report any error.

Here we are back to my original post. This hash-code is something I have been given and I do not fully understand all the commands.

Your code looks more straight and clean, but there still are some lines I am not familiar with.
See below - I assume that assert(idx<=hash_maxindex) is a kind of shifting used instead of an AND.
But I do not have to understand everything in detail.
As I read your code the length of an element has to be a power of two...

So I guess I should try your code

)

Thanks for your help!

Best, Jens

Sven Schüle wrote:
hash_struct * hash_table = 0;
hash_struct hash;

int CDECL main()
{
// ...
// verify assumptions and consistent definitions
assert(sizeof(hash_struct) equal 16);
assert(TOTAL_HASH_SIZE_BYTES equal NUMBER_HASHELEMS * sizeof(hash_struct));

hash_table = malloc(TOTAL_HASH_SIZE_BYTES);
assert(hash_table != 0);
}[/code]

Sven

ffao · Post by **ffao** » Tue Jun 16, 2009 2:50 am

JensBNielsen wrote: See below - I assume that assert(idx<=hash_maxindex) is a kind of shifting used instead of an AND.

No, that isn't shifting. assert() is a macro used to verify, when in debug mode, that certain conditions are true. In that case, Sven used it to verify that the index you get is not larger (<=: less than or equal to) than the hash array -- if it were, you would read into memory outside the hash table.

I'll just second what Sven said here. If you're not familiar with assert, it's a good time to get used to it. Otherwise, you may get a program that reads into uninitialized memory (aka Bubble) and you'll have no idea where to start debugging.

(And if you want to ask how I know that Bubble reads into uninitialized memory, it's because it behaves different under different compilers and debug/release modes.)

Felipe

I want experts to look at my c-source for hashtables

Re: I want experts to look at my c-source for hashtables

Re: I want experts to look at my c-source for hashtables

Re: I want experts to look at my c-source for hashtables

Re: I want experts to look at my c-source for hashtables

Re: I want experts to look at my c-source for hashtables

Re: I want experts to look at my c-source for hashtables