Wouldn't it be nice if C++ GPU

chrisw
Posts: 1552
Joined: Tue Apr 03, 2012 2:28 pm

Wouldn't it be nice if C++ GPU

Post by chrisw » Thu Apr 25, 2019 10:49 am

Wouldn't it be nice to have a C++ header file which supported:

model = LoadTrainedModelFromFile(filename); // model and weights, saved in some appropriate format from Python

results = model.predict(inputs); // using GPU
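A toy sketch of what that interface could look like. The names `Model`, `LoadTrainedModelFromFile` and `predict` are taken from the wish above; the single-dense-layer CPU backend and the text weight format are purely illustrative stand-ins (a real implementation would dispatch to CUDA/cuDNN and parse whatever format the Python side exported):

```cpp
// Hypothetical sketch of the wished-for header. Illustrative only:
// one dense layer on the CPU stands in for a real GPU backend.
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

class Model {
public:
    // predict: y = W * x + b (single dense layer, CPU)
    std::vector<float> predict(const std::vector<float>& x) const {
        if (x.size() != n_in) throw std::runtime_error("bad input size");
        std::vector<float> y(n_out);
        for (std::size_t r = 0; r < n_out; ++r) {
            float acc = bias[r];
            for (std::size_t c = 0; c < n_in; ++c)
                acc += weights[r * n_in + c] * x[c];
            y[r] = acc;
        }
        return y;
    }

    std::size_t n_in = 0, n_out = 0;
    std::vector<float> weights;  // row-major, n_out x n_in
    std::vector<float> bias;     // n_out entries
};

// Assumed toy text format: "n_out n_in", then n_out*n_in weights,
// then n_out biases, all whitespace-separated.
inline Model LoadTrainedModelFromFile(const std::string& filename) {
    std::ifstream in(filename);
    if (!in) throw std::runtime_error("cannot open " + filename);
    Model m;
    in >> m.n_out >> m.n_in;
    m.weights.resize(m.n_out * m.n_in);
    for (float& w : m.weights) in >> w;
    m.bias.resize(m.n_out);
    for (float& b : m.bias) in >> b;
    return m;
}
```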

Rémi Coulom
Posts: 426
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: Wouldn't it be nice if C++ GPU

Post by Rémi Coulom » Thu Apr 25, 2019 11:59 am

I developed my own home-made C++ deep-learning framework just to be able to do that. I used tensorflow for a while, but it was too painful to use from C++. What you describe can be done with tensorflow, but last time I tried, I had to use undocumented/unsupported features of the low-level C++ tensorflow library, and it was really unpleasant (having to compile the library from source with bazel, ...).

Maybe other frameworks have better C++ support.

At the moment, I am using some simple C++ classes on top of CuDNN. I don't have autodiff, but manually calculating a gradient is not such a big deal in my opinion. I am thinking about compile-time autodiff with template meta-programming.
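For what it's worth, forward-mode autodiff with dual numbers is one simple building block in this direction: the templates inline away at compile time, and operator overloading applies the chain rule automatically. A minimal sketch (illustrative, not code from any engine in this thread):

```cpp
// Forward-mode autodiff via dual numbers: each value carries its
// derivative, and the overloaded operators apply the chain rule.
// Illustrative sketch only.
#include <cmath>

struct Dual {
    double val;  // f(x)
    double dot;  // f'(x)
};

inline Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
inline Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.dot - b.dot}; }
inline Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.dot * b.val + a.val * b.dot};  // product rule
}
inline Dual tanh(Dual a) {
    double t = std::tanh(a.val);
    return {t, (1.0 - t * t) * a.dot};  // chain rule through tanh
}

// Differentiate any templated function at x by seeding dot = 1.
template <class F>
double derivative(F f, double x) {
    return f(Dual{x, 1.0}).dot;
}
```

For example, `derivative([](Dual x) { return x * x + x; }, 3.0)` evaluates d/dx(x²+x) = 2x+1 at x = 3, giving 7.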

If people here want help about how to use tensorflow from C++ with a nVidia GPU, I could explain a little how I did it. It is not very difficult to do, but it is not documented.

I applied to the Tensorflow Research Cloud, and was accepted. This gives me access to 100 TPUs for one month. After days of trying to use a TPU from C++, I gave up. I am sure it is doable, but there is no documentation at all.

jdart
Posts: 3751
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Wouldn't it be nice if C++ GPU

Post by jdart » Thu Apr 25, 2019 12:51 pm

Caffe (https://github.com/BVLC/caffe) supports C++; apparently it is the main language, and Python is a binding. I don't know if it does quite what you need, though.

--Jon

chrisw
Posts: 1552
Joined: Tue Apr 03, 2012 2:28 pm

Re: Wouldn't it be nice if C++ GPU

Post by chrisw » Thu Apr 25, 2019 1:54 pm

Daniel has some code here https://github.com/dshawul/egbbdll/blob ... val_nn.cpp

which includes what looks like a load-model function, for the UFF format:

Code: Select all

void TrtModel::LoadGraph(const string& uff_file_name, int dev_id, int dev_type) {
    std::string dev_name = ((dev_type == GPU) ? "/gpu:" : "/cpu:") + std::to_string(dev_id);
    printf("Loading graph on %s\n",dev_name.c_str());
    fflush(stdout);

    Model::id = dev_id;
    cudaSetDevice(Model::id);

and so on ......
and what looks like a predict ...

Code: Select all

void TrtModel::predict() {

    cudaSetDevice(Model::id);

    context->execute(BATCH_SIZE, buffers.data());

    if(nn_type == DEFAULT || nn_type == SIMPLE) {
        for(int i = 0;i < n_batch;i++) {
            float p = buffers_h[valuei][3*i+0] * 1.0 + buffers_h[valuei][3*i+1] * 0.5;
            scores[i] = logit(p);
            
and so on ......
It seems to need various support includes, and it is not exactly easy to work out what is going on. I would guess the CUDA support is ongoing, because wanting to get the predictor into C++ for production apps, and lose the Python requirement at runtime, is a pretty obvious thing for many people (not just in games).

Daniel Shawul
Posts: 3657
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: Wouldn't it be nice if C++ GPU

Post by Daniel Shawul » Thu Apr 25, 2019 4:04 pm

egbbdll is very easy to use because it was originally designed for probing endgame bitbases.
You can essentially do probe(FEN_string) and get value and policy results.
The user doesn't need to know how and where it is evaluated, but of course it can use both CPU and GPU.
Both Tensorflow & TensorRT are supported, which can use cuDNN, so of course it can use CUDA too.
Lc0 explicitly wrote CUDA code for its backend, but I am getting equal nps using TensorRT.
Moreover, one can use INT8 and maybe INT4. So writing backend code when there is a plethora of deep-learning
libraries is a futile endeavour IMHO.

This is the actual code I use for probing bitbases and the neural network. It has become a little cumbersome
after I added the policy head, but it is still easy to use. You populate your pieces, feed in history info (for lczero nets),
and just probe. The egbbdll takes care of "batching" with a multi-threaded approach, and of caching as well.

Code: Select all

/*
Probe:
Change internal Scorpio board representation to [A1 = 0 ... H8 = 63]
board representation and then probe the bitbase.
*/

void SEARCHER::fill_list(int& count, int* piece, int* square) {
    PLIST current;

#define ADD_PIECE(list,type) {                  \
       current = list;                          \
       while(current) {                         \
          piece[count] = type;                  \
          square[count] = SQ8864(current->sq);  \
          count++;                              \
          current = current->next;              \
       }                                        \
    };
    ADD_PIECE(plist[wking],_WKING);
    ADD_PIECE(plist[bking],_BKING);
    ADD_PIECE(plist[wqueen],_WQUEEN);
    ADD_PIECE(plist[bqueen],_BQUEEN);
    ADD_PIECE(plist[wrook],_WROOK);
    ADD_PIECE(plist[brook],_BROOK);
    ADD_PIECE(plist[wbishop],_WBISHOP);
    ADD_PIECE(plist[bbishop],_BBISHOP);
    ADD_PIECE(plist[wknight],_WKNIGHT);
    ADD_PIECE(plist[bknight],_BKNIGHT);
    ADD_PIECE(plist[wpawn],_WPAWN);
    ADD_PIECE(plist[bpawn],_BPAWN);
    piece[count] = _EMPTY;
    square[count] = SQ8864(epsquare);
    count++;
}

int SEARCHER::probe_bitbases(int& score) {
#ifdef EGBB
    int piece[MAX_PIECES],square[MAX_PIECES],count = 0;
    fill_list(count,piece,square);
    score = probe_egbb(player,piece,square);
    if(score != _NOTFOUND)
        return true;
#endif
    return false;
}

int SEARCHER::probe_neural(bool hard_probe) {
#ifdef EGBB
    UBMP64 hkey = ((player == white) ? hash_key : 
             (hash_key ^ UINT64(0x2bc3964f82352234)));

    int moves[3*MAX_MOVES];
    int *s = moves;
    for(int i = 0; i < pstack->count; i++) {
        MOVE& m = pstack->move_st[i];
        int from = m_from(m), to = m_to(m);
        if(is_castle(m)) {
            if(to > from) to++;
            else to -= 2;
        }
        *s++ = SQ8864(from);
        *s++ = SQ8864(to); 
        *s++ = m_promote(m);
    }
    *s++ = -1;

    nnecalls++;
    if(nn_type == 0) {
        int piece[33],square[33],isdraw[1];
        int count = 0, hist = 1;
        fill_list(count,piece,square);

        return probe_nn(player,castle,fifty,hist,isdraw,piece,square,moves,
            (float*)pstack->score_st,pstack->count,hkey,hard_probe);
    } else {

        int piece[8*33],square[8*33],isdraw[8];
        int count = 0, hist = 0, phply = hply;
        
        for(int i = 0; i < 8; i++) {
            isdraw[hist++] = draw();
            fill_list(count,piece,square);

            if(hply > 0 && hstack[hply - 1].move) 
                POP_MOVE();
            else break;
        }

        count = phply - hply;
        for(int i = 0; i < count; i++)
            PUSH_MOVE(hstack[hply].move);

        if(isdraw[0])
            hkey ^= UINT64(0xc7e9153edee38dcb);
        hkey ^= fifty_hkey[fifty];

        return probe_nn(player,castle,fifty,hist,isdraw,piece,square,moves,
            (float*)pstack->score_st,pstack->count,hkey,hard_probe);
    }
#endif
    return 0;
}

void PROCESSOR::set_num_searchers() {
#ifdef EGBB
    if(SEARCHER::use_nn && set_num_active_searchers) {
        int n_searchers = n_processors - n_idle_processors;
        set_num_active_searchers(n_searchers);
    }
#endif
}
Daniel

Rein Halbersma
Posts: 685
Joined: Tue May 22, 2007 9:13 am

Re: Wouldn't it be nice if C++ GPU

Post by Rein Halbersma » Thu Apr 25, 2019 4:58 pm

Rémi Coulom wrote:
Thu Apr 25, 2019 11:59 am
I developed my own home-made C++ deep-learning framework just to be able to do that. I used tensorflow for a while, but it was too painful to use from C++. What you describe can be done with tensorflow, but last time I tried, I had to use undocumented/unsupported features of the low-level C++ tensorflow library, and it was really unpleasant (having to compile the library from source with bazel, ...).

Maybe other frameworks have better C++ support.

At the moment, I am using some simple C++ classes on top of CuDNN. I don't have autodiff, but manually calculating a gradient is not such a big deal in my opinion. I am thinking about compile-time autodiff with template meta-programming.

If people here want help about how to use tensorflow from C++ with a nVidia GPU, I could explain a little how I did it. It is not very difficult to do, but it is not documented.

I applied to the Tensorflow Research Cloud, and was accepted. This gives me access to 100 TPUs for one month. After days of trying to use a TPU from C++, I gave up. I am sure it is doable, but there is no documentation at all.
LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md

Rémi Coulom
Posts: 426
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: Wouldn't it be nice if C++ GPU

Post by Rémi Coulom » Thu Apr 25, 2019 5:18 pm

Rein Halbersma wrote:
Thu Apr 25, 2019 4:58 pm
LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.

By the way, has anybody here tried to code a fast convolution in CUDA directly? I will probably try soon. My impression is that the performance of cuDNN is very bad for small batches. Good performance with small batches is important for tree search.
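As a baseline for such experiments, here is a naive single-channel 3x3 "same" convolution over an 8x8 plane (the spatial shape of a chess board), as plain reference C++. This is only the scalar definition of the computation that a hand-written CUDA kernel (one thread per output cell, same index arithmetic) or cuDNN would be measured against; it is not code from this thread:

```cpp
// Naive reference: single-channel 3x3 convolution with zero padding
// ("same" output size) over an 8x8 board. Illustrative baseline only.
#include <vector>

std::vector<float> conv3x3_same(const std::vector<float>& in,  // 64 floats
                                const float kernel[9]) {       // row-major 3x3
    const int H = 8, W = 8;
    std::vector<float> out(H * W, 0.0f);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float acc = 0.0f;
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx) {
                    int sy = y + ky, sx = x + kx;
                    if (sy < 0 || sy >= H || sx < 0 || sx >= W)
                        continue;  // zero padding at the border
                    acc += in[sy * W + sx] * kernel[(ky + 1) * 3 + (kx + 1)];
                }
            out[y * W + x] = acc;
        }
    return out;
}
```

A real chess-net layer adds input/output channels and a batch dimension on top of this; the small-batch question is precisely how well those extra loops fill the GPU when the batch is 1.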

Rémi

smatovic
Posts: 715
Joined: Wed Mar 10, 2010 9:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic
Contact:

Re: Wouldn't it be nice if C++ GPU

Post by smatovic » Thu Apr 25, 2019 5:38 pm

Rémi Coulom wrote:
Thu Apr 25, 2019 5:18 pm
...
By the way, has anybody here tried to code a fast convolution in cuda directly? I will probably try soon. My impression is that the performance of cuDNN is very bad for small batches. Good performance with small batches is important for tree search.

Rémi
https://github.com/ankan-ban/ConvTest

--
Srdja

Rein Halbersma
Posts: 685
Joined: Tue May 22, 2007 9:13 am

Re: Wouldn't it be nice if C++ GPU

Post by Rein Halbersma » Thu Apr 25, 2019 6:23 pm

Rémi Coulom wrote:
Thu Apr 25, 2019 5:18 pm
Rein Halbersma wrote:
Thu Apr 25, 2019 4:58 pm
LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.
That's not what tensorflow_cc advertises: https://github.com/FloopCZ/tensorflow_cc
This repository makes possible the usage of the TensorFlow C++ API from the outside of the TensorFlow source code folders and without the use of the Bazel build system.

Rémi Coulom
Posts: 426
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: Wouldn't it be nice if C++ GPU

Post by Rémi Coulom » Thu Apr 25, 2019 6:48 pm

Rein Halbersma wrote:
Thu Apr 25, 2019 6:23 pm
Rémi Coulom wrote:
Thu Apr 25, 2019 5:18 pm
Rein Halbersma wrote:
Thu Apr 25, 2019 4:58 pm
LeelaChessZero uses the 3rd party tensorflow_cc wrapper library around the official Tensorflow C++ API, to avoid the Bazel build stuff. See https://github.com/LeelaChessZero/lc0/b ... sorflow.md
Thanks for the link. Bazel is still necessary to build the library itself. This is in fact what I had managed to do by myself. It is still very unpleasant to do.
That's not what tensorflow_cc advertises: https://github.com/FloopCZ/tensorflow_cc
This repository makes possible the usage of the TensorFlow C++ API from the outside of the TensorFlow source code folders and without the use of the Bazel build system.
This means that you don't need bazel to build your own code, but you need it to build the tensorflow library:
If you require GPU support on Ubuntu, please also install Bazel
(from https://github.com/FloopCZ/tensorflow_cc)

Post Reply