Profile data request

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Profile data request

Post by Kempelen »

I want to compare my engine profile data with others. Would someone be so kind to post profile data of strong engines?
thx.
Fermín
Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/
User avatar
hgm
Posts: 28359
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Profile data request

Post by hgm »

I have a more general question about profiling: why is it that every routine in my engine takes 0.00% of the time, and even the total cumulative of all routines takes 0.00%? I know I can write fast code, but could it really be that fast...? I am using gprof and compile with "gcc -pg", under cygwin.
User avatar
Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Re: Profile data request

Post by Kempelen »

That is because you are using threads and gprof is not compatible wit threads. There is some workaround (http://sam.zoy.org/writings/programming/gprof.html).

I use a trick. If I am profiling I dont call beginthread, but call the thread function directly. Of course I can not use the gui, but gprof show correct data:

I use this:

Code: Select all

#ifndef PROFILE_MINGW
	if ( _beginthread ( pensar, 0 , NULL ) == -1 ) {
#else
  if (1==0) { // Para hacer un profile con MINGW
#endif
    Error ( "Error: Can not create main thinking thread." );
		Salir();
  }
#endif
.....
    #ifdef PROFILE_MINGW
    procesarEPD("a epd string");
    fin_pensar = FALSE;
    color_Rodin = n->posicion.turno;
    nivel = NIVEL_TIEMPO_POR_MOVIMIENTO;
    tiempo_a_usar = 60 * 100;
    pensar();
    return;
    #endif

Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/
User avatar
Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Re: Profile data request

Post by Kempelen »

forgot to say that I have my own profile system. It is based on the following:

1) I call PROFILE_START function each time I enter in a function. PROFILE_START need a tag, which I put the name of the function
2) Each time I return from a function, I call PROFILE_STOP with the tag of the function that has ended
3) PROFILE_START and PROFILE_STOP act like a "timer on" and "timer off".
4) If call to PROFILE_START and there are a previous time running, I stop it and put in a stack. When returning I get from the stack and put it on again.
5) When I want, I call PROFILE_PRINT to show profile data (I show and example at the end of the post).
6) It is a slow system, but as it is only for profile purposes it is useful. Also I have seen is very accurate, although not perfect.
7) I use a #define tag to activate this system. If not tag no code is generated.

Anyway, there a lot of issues to take into account, like running the application in only one CPU. As I said, my system is enought to profile a chess engine, although not perfect. If you are interested in knowing more issues about problems of profiling multithread programs and this system, take a look at:
http://cboard.cprogramming.com/showthre ... ht=profile

Now an example of the output after running 25 different position at "sd 5". Function names are in Spanish, sorry......

Code: Select all

+------------+------------+------------+-------+------------------------------+
| Calls (xK) | Total Ms.  | Ms.Child   |  %    | Tag                          |
+------------+------------+------------+-------+------------------------------+
| 0000007188 | 0000002505 | 0000015999 |  7.3% | hacerMovimiento()            |
| 0000020204 | 0000002355 | 0000002355 |  6.9% | atacantes()                  |
| 0000025692 | 0000002171 | 0000002171 |  6.3% | movilidadPieza()             |
| 0000000630 | 0000002159 | 0000039901 |  6.3% | evaluarMedioJuego()          |
| 0000000727 | 0000001998 | 0000111875 |  5.8% | busqueda()                   |
| 0000001497 | 0000001996 | 0000019767 |  5.8% | calcularAtaquesYMovilidad()  |
| 0000007678 | 0000001830 | 0000024797 |  5.3% | ordenarMovimiento()          |
| 0000000866 | 0000001527 | 0000026994 |  4.4% | evaluarFinal()               |
| 0000003467 | 0000001427 | 0000020102 |  4.2% | SEE()                        |
| 0000010761 | 0000001379 | 0000001379 |  4.0% | estaAtacada()                |
| 0000007188 | 0000001357 | 0000001357 |  4.0% | memcpy_hacerMovimiento()     |
| 0000001964 | 0000001176 | 0000001176 |  3.4% | situacionPiezas()            |
| 0000000776 | 0000001067 | 0000001067 |  3.1% | generarMovimientos()         |
| 0000002688 | 0000000983 | 0000070027 |  2.9% | quiesce()                    |
| 0000002831 | 0000000849 | 0000049512 |  2.5% | evaluar()                    |
| 0000003984 | 0000000803 | 0000000803 |  2.3% | piezaClava()                 |
| 0000002479 | 0000000738 | 0000010312 |  2.2% | SEE_SQ()                     |
| 0000006778 | 0000000704 | 0000006438 |  2.1% | evalPeonNegro()              |
| 0000010789 | 0000000703 | 0000000703 |  2.0% | menorAtacante()              |
| 0000006564 | 0000000681 | 0000005923 |  2.0% | evalPeonBlanco()             |
| 0000008528 | 0000000658 | 0000002117 |  1.9% | esJaque()                    |
| 0000003603 | 0000000539 | 0000000539 |  1.6% | actualizarHash()             |
| 0000000630 | 0000000515 | 0000006948 |  1.5% | examinarColumnas()           |
| 0000006936 | 0000000410 | 0000000410 |  1.2% | filaPeonNegro()              |
| 0000006936 | 0000000406 | 0000000406 |  1.2% | filaPeonBlanco()             |
| 0000000644 | 0000000356 | 0000000356 |  1.0% | generarCapturas()            |
| 0000007209 | 0000000349 | 0000000349 |  1.0% | deshacerMovimiento()         |
| 0000002962 | 0000000344 | 0000003656 |  1.0% | extensiones()                |
| 0000005663 | 0000000298 | 0000000298 |  0.9% | evaluarVentajaMaterial()     |
| 0000003603 | 0000000290 | 0000000290 |  0.8% | fasePartida()                |
| 0000000000 | 0000000288 | 0000000304 |  0.8% | mostrarVariantePrincipal()   |
| 0000002831 | 0000000247 | 0000000247 |  0.7% | evaluarCambios()             |
| 0000002986 | 0000000221 | 0000000221 |  0.6% | esTablasLegales()            |
| 0000000199 | 0000000170 | 0000003501 |  0.5% | quiesce_root()               |
| 0000000630 | 0000000154 | 0000001313 |  0.4% | escudoPeonesBlancos()        |
| 0000000630 | 0000000155 | 0000001391 |  0.5% | escudoPeonesNegros()         |
| 0000002222 | 0000000119 | 0000000226 |  0.3% | reducciones()                |
| 0000002017 | 0000000102 | 0000000102 |  0.3% | alfilAtrapado()              |
| 0000001261 | 0000000071 | 0000000071 |  0.2% | escalarAtaque()              |
| 0000000850 | 0000000061 | 0000000061 |  0.2% | probarHash()                 |
| 0000000850 | 0000000047 | 0000000047 |  0.1% | guardarHash()                |
| 0000000866 | 0000000040 | 0000000040 |  0.1% | esTablasNoLegales()          |
| 0000000023 | 0000000020 | 0000000020 |  0.1% | evaluarApertura()            |
| 0000000000 | 0000000012 | 0000153043 |  0.0% | pensar()                     |
| 0000000027 | 0000000009 | 0000000045 |  0.0% | hacerPromocion()             |
| 0000000000 | 0000000004 | 0000146645 |  0.0% | busqueda_raiz                |
| 0000000004 | 0000000004 | 0000000029 |  0.0% | strMovimientoSAN()           |
| 0000000021 | 0000000004 | 0000000004 |  0.0% | hacerMovimientoNulo()        |
| 0000000000 | 0000000001 | 0000000018 |  0.0% | strVariantePrincipal()       |
| 0000000000 | 0000000001 | 0000010471 |  0.0% | movimientosRaiz()            |
| 0000000021 | 0000000001 | 0000000001 |  0.0% | peonPasado()                 |
| 0000000000 | 0000000002 | 0000000002 |  0.0% | inicializarHash()            |
| 0000000028 | 0000000001 | 0000000001 |  0.0% | R()                          |
| 0000000006 | 0000000001 | 0000000001 |  0.0% | ordenarMovimientosRaiz()     |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | inicializarKillers           |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | controlTiempoRaiz()          |
| 0000000000 | 0000000000 | 0000000025 |  0.0% | generarMovsLegales()         |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | hash_vp()                    |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | pulsarReloj()                |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | parsearMovimientoSAN()       |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | parsearMovimiento()          |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | obtenerHash()                |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | controlTiempo()              |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | hacerAlPaso()                |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | comprobarHistorico()         |
+------------+------------+------------+-------+------------------------------+
another output again, but sorted by accumulated time per function (Ms Child):

Code: Select all

+------------+------------+------------+-------+------------------------------+
| Calls (xK) | Total Ms.  | Ms.Child   |  %    | Tag                          |
+------------+------------+------------+-------+------------------------------+
| 0000000000 | 0000000012 | 0000153043 |  0.0% | pensar()                     |
| 0000000000 | 0000000004 | 0000146645 |  0.0% | busqueda_raiz                |
| 0000000727 | 0000001998 | 0000111875 |  5.8% | busqueda()                   |
| 0000002688 | 0000000983 | 0000070027 |  2.9% | quiesce()                    |
| 0000002831 | 0000000849 | 0000049512 |  2.5% | evaluar()                    |
| 0000000630 | 0000002159 | 0000039901 |  6.3% | evaluarMedioJuego()          |
| 0000000866 | 0000001527 | 0000026994 |  4.4% | evaluarFinal()               |
| 0000007678 | 0000001830 | 0000024797 |  5.3% | ordenarMovimiento()          |
| 0000003467 | 0000001427 | 0000020102 |  4.2% | SEE()                        |
| 0000001497 | 0000001996 | 0000019767 |  5.8% | calcularAtaquesYMovilidad()  |
| 0000007188 | 0000002505 | 0000015999 |  7.3% | hacerMovimiento()            |
| 0000000000 | 0000000001 | 0000010471 |  0.0% | movimientosRaiz()            |
| 0000002479 | 0000000738 | 0000010312 |  2.2% | SEE_SQ()                     |
| 0000000630 | 0000000515 | 0000006948 |  1.5% | examinarColumnas()           |
| 0000006778 | 0000000704 | 0000006438 |  2.1% | evalPeonNegro()              |
| 0000006564 | 0000000681 | 0000005923 |  2.0% | evalPeonBlanco()             |
| 0000002962 | 0000000344 | 0000003656 |  1.0% | extensiones()                |
| 0000000199 | 0000000170 | 0000003501 |  0.5% | quiesce_root()               |
| 0000020204 | 0000002355 | 0000002355 |  6.9% | atacantes()                  |
| 0000025692 | 0000002171 | 0000002171 |  6.3% | movilidadPieza()             |
| 0000008528 | 0000000658 | 0000002117 |  1.9% | esJaque()                    |
| 0000000630 | 0000000155 | 0000001391 |  0.5% | escudoPeonesNegros()         |
| 0000010761 | 0000001379 | 0000001379 |  4.0% | estaAtacada()                |
| 0000007188 | 0000001357 | 0000001357 |  4.0% | memcpy_hacerMovimiento()     |
| 0000000630 | 0000000154 | 0000001313 |  0.4% | escudoPeonesBlancos()        |
| 0000001964 | 0000001176 | 0000001176 |  3.4% | situacionPiezas()            |
| 0000000776 | 0000001067 | 0000001067 |  3.1% | generarMovimientos()         |
| 0000003984 | 0000000803 | 0000000803 |  2.3% | piezaClava()                 |
| 0000010789 | 0000000703 | 0000000703 |  2.0% | menorAtacante()              |
| 0000003603 | 0000000539 | 0000000539 |  1.6% | actualizarHash()             |
| 0000006936 | 0000000410 | 0000000410 |  1.2% | filaPeonNegro()              |
| 0000006936 | 0000000406 | 0000000406 |  1.2% | filaPeonBlanco()             |
| 0000000644 | 0000000356 | 0000000356 |  1.0% | generarCapturas()            |
| 0000007209 | 0000000349 | 0000000349 |  1.0% | deshacerMovimiento()         |
| 0000000000 | 0000000288 | 0000000304 |  0.8% | mostrarVariantePrincipal()   |
| 0000005663 | 0000000298 | 0000000298 |  0.9% | evaluarVentajaMaterial()     |
| 0000003603 | 0000000290 | 0000000290 |  0.8% | fasePartida()                |
| 0000002831 | 0000000247 | 0000000247 |  0.7% | evaluarCambios()             |
| 0000002222 | 0000000119 | 0000000226 |  0.3% | reducciones()                |
| 0000002986 | 0000000221 | 0000000221 |  0.6% | esTablasLegales()            |
| 0000002017 | 0000000102 | 0000000102 |  0.3% | alfilAtrapado()              |
| 0000001261 | 0000000071 | 0000000071 |  0.2% | escalarAtaque()              |
| 0000000850 | 0000000061 | 0000000061 |  0.2% | probarHash()                 |
| 0000000850 | 0000000047 | 0000000047 |  0.1% | guardarHash()                |
| 0000000027 | 0000000009 | 0000000045 |  0.0% | hacerPromocion()             |
| 0000000866 | 0000000040 | 0000000040 |  0.1% | esTablasNoLegales()          |
| 0000000004 | 0000000004 | 0000000029 |  0.0% | strMovimientoSAN()           |
| 0000000000 | 0000000000 | 0000000025 |  0.0% | generarMovsLegales()         |
| 0000000023 | 0000000020 | 0000000020 |  0.1% | evaluarApertura()            |
| 0000000000 | 0000000001 | 0000000018 |  0.0% | strVariantePrincipal()       |
| 0000000021 | 0000000004 | 0000000004 |  0.0% | hacerMovimientoNulo()        |
| 0000000028 | 0000000001 | 0000000001 |  0.0% | R()                          |
| 0000000006 | 0000000001 | 0000000001 |  0.0% | ordenarMovimientosRaiz()     |
| 0000000000 | 0000000002 | 0000000002 |  0.0% | inicializarHash()            |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | hacerAlPaso()                |
| 0000000021 | 0000000001 | 0000000001 |  0.0% | peonPasado()                 |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | hash_vp()                    |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | pulsarReloj()                |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | parsearMovimientoSAN()       |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | parsearMovimiento()          |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | obtenerHash()                |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | controlTiempo()              |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | inicializarKillers           |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | controlTiempoRaiz()          |
| 0000000000 | 0000000000 | 0000000000 |  0.0% | comprobarHistorico()         |
+------------+------------+------------+-------+------------------------------+
do you find samething showy?
Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/
User avatar
hgm
Posts: 28359
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Profile data request

Post by hgm »

Kempelen wrote:That is because you are using threads and gprof is not compatible wit threads. There is some workaround (http://sam.zoy.org/writings/programming/gprof.html).
Problem is that I do not use threads at all. (Well, I guess you could call that a single thread...) Joker is a simple console I/O program that uses a PeekNamedPipe call to periodically check for pending input that should terminate pondering.

I find this (just for a 12-ply search from the opening):

Code: Select all

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00 24421188     0.00     0.00  SEE
  0.00      0.00     0.00 14154977     0.00     0.00  Evaluate
  0.00      0.00     0.00  4646760     0.00     0.00  capt_gen
  0.00      0.00     0.00  3208159     0.00     0.00  capturable
  0.00      0.00     0.00  1060358     0.00     0.00  move_gen
  0.00      0.00     0.00    14022     0.00     0.00  make_list
  0.00      0.00     0.00      486     0.00     0.00  GetTime
  0.00      0.00     0.00      188     0.00     0.00  CheckMarks
  0.00      0.00     0.00       29     0.00     0.00  PrintPV
  0.00      0.00     0.00        5     0.00     0.00  key_init
  0.00      0.00     0.00        4     0.00     0.00  PrintResult
  0.00      0.00     0.00        3     0.00     0.00  delta_init
  0.00      0.00     0.00        2     0.00     0.00  ReadFEN
  0.00      0.00     0.00        2     0.00     0.00  Search
  0.00      0.00     0.00        2     0.00     0.00  pack
  0.00      0.00     0.00        1     0.00     0.00  MakeMove
  0.00      0.00     0.00        1     0.00     0.00  ReadMove
  0.00      0.00     0.00        1     0.00     0.00  makeup_init
  0.00      0.00     0.00        1     0.00     0.00  pboard
  0.00      0.00     0.00        1     0.00     0.00  piece_init
  0.00      0.00     0.00        1     0.00     0.00  unbuffered_input
Aleks Peshkov
Posts: 900
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Profile data request

Post by Aleks Peshkov »

There are two places (compiler and linker) in makefile where you need to insert "-pg" switches in GCC. Sorry, cannot be more specific, because I have not GCC handy.
User avatar
Roman Hartmann
Posts: 295
Joined: Wed Mar 08, 2006 8:29 pm

Re: Profile data request

Post by Roman Hartmann »

hgm wrote:I have a more general question about profiling: why is it that every routine in my engine takes 0.00% of the time, and even the total cumulative of all routines takes 0.00%? I know I can write fast code, but could it really be that fast...? I am using gprof and compile with "gcc -pg", under cygwin.
I'm on a Mac here but I used to have the same problem. I'm using Shark now as I cannot use gprof.

Roman
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Profile data request

Post by bob »

Kempelen wrote:That is because you are using threads and gprof is not compatible wit threads. There is some workaround (http://sam.zoy.org/writings/programming/gprof.html).

I use a trick. If I am profiling I dont call beginthread, but call the thread function directly. Of course I can not use the gui, but gprof show correct data:

I use this:

Code: Select all

#ifndef PROFILE_MINGW
	if ( _beginthread ( pensar, 0 , NULL ) == -1 ) {
#else
  if (1==0) { // Para hacer un profile con MINGW
#endif
    Error ( "Error: Can not create main thinking thread." );
		Salir();
  }
#endif
.....
    #ifdef PROFILE_MINGW
    procesarEPD("a epd string");
    fin_pensar = FALSE;
    color_Rodin = n->posicion.turno;
    nivel = NIVEL_TIEMPO_POR_MOVIMIENTO;
    tiempo_a_usar = 60 * 100;
    pensar();
    return;
    #endif

I use both gcc and icc and both work just fine with threads for me. I even use PGO and run a multi-thread test set so that the parallel search code also gets profiled / optimized.
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Profile data request

Post by Tord Romstad »

Kempelen wrote:I want to compare my engine profile data with others. Would someone be so kind to post profile data of strong engines?
thx.
Fermín
Glaurung (all functions with at least 1.0%):

Code: Select all

evaluate():                             31.9%
MovePicker::pick_move_from_list():       8.0%
search():                                5.3%
do_move():                               5.0%
see():                                   4.5%
TranspositionTable::Retrieve():          4.4%
qsearch():                               4.4%
Position::move_is_check():               4.0%
MovePicker::get_next_move():             4.0%
generate_captures():                     4.0%
pop_1st_bit():                           2.6%
Position::undo_move():                   2.2%
Position::discovered_check_candidates(): 2.1%
init_node:                               1.9%
PawnInfoTable::get_pawn_info():          1.8%
generate_evasions():                     1.7%
generate_checks():                       1.2%
Position::move_is_legal():               1.2%
Position::pinned_pieces():               1.1%
extension():                             1.0%
Tord
Greg McGlynn

Re: Profile data request

Post by Greg McGlynn »

hgm wrote:I have a more general question about profiling: why is it that every routine in my engine takes 0.00% of the time, and even the total cumulative of all routines takes 0.00%? I know I can write fast code, but could it really be that fast...? I am using gprof and compile with "gcc -pg", under cygwin.
I had this problem too; it seems that there are bugs:

http://www.nabble.com/Cygwin-1.5.25:-gp ... 21057.html

I fixed the problem by replacing libgmon.a as mentioned in the linked thread.