I want to compare my engine profile data with others. Would someone be so kind to post profile data of strong engines?
thx.
Fermín
Profile data request
Moderator: Ras
-
- Posts: 620
- Joined: Fri Feb 08, 2008 10:44 am
- Location: Madrid - Spain
-
- Posts: 28359
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Profile data request
I have a more general question about profiling: why is it that every routine in my engine takes 0.00% of the time, and even the total cumulative of all routines takes 0.00%? I know I can write fast code, but could it really be that fast...? I am using gprof and compile with "gcc -pg", under cygwin.
-
- Posts: 620
- Joined: Fri Feb 08, 2008 10:44 am
- Location: Madrid - Spain
Re: Profile data request
That is because you are using threads and gprof is not compatible wit threads. There is some workaround (http://sam.zoy.org/writings/programming/gprof.html).
I use a trick. If I am profiling I dont call beginthread, but call the thread function directly. Of course I can not use the gui, but gprof show correct data:
I use this:
I use a trick. If I am profiling I dont call beginthread, but call the thread function directly. Of course I can not use the gui, but gprof show correct data:
I use this:
Code: Select all
#ifndef PROFILE_MINGW
if ( _beginthread ( pensar, 0 , NULL ) == -1 ) {
#else
if (1==0) { // Para hacer un profile con MINGW
#endif
Error ( "Error: Can not create main thinking thread." );
Salir();
}
#endif
.....
#ifdef PROFILE_MINGW
procesarEPD("a epd string");
fin_pensar = FALSE;
color_Rodin = n->posicion.turno;
nivel = NIVEL_TIEMPO_POR_MOVIMIENTO;
tiempo_a_usar = 60 * 100;
pensar();
return;
#endif
-
- Posts: 620
- Joined: Fri Feb 08, 2008 10:44 am
- Location: Madrid - Spain
Re: Profile data request
forgot to say that I have my own profile system. It is based on the following:
1) I call PROFILE_START function each time I enter in a function. PROFILE_START need a tag, which I put the name of the function
2) Each time I return from a function, I call PROFILE_STOP with the tag of the function that has ended
3) PROFILE_START and PROFILE_STOP act like a "timer on" and "timer off".
4) If call to PROFILE_START and there are a previous time running, I stop it and put in a stack. When returning I get from the stack and put it on again.
5) When I want, I call PROFILE_PRINT to show profile data (I show and example at the end of the post).
6) It is a slow system, but as it is only for profile purposes it is useful. Also I have seen is very accurate, although not perfect.
7) I use a #define tag to activate this system. If not tag no code is generated.
Anyway, there a lot of issues to take into account, like running the application in only one CPU. As I said, my system is enought to profile a chess engine, although not perfect. If you are interested in knowing more issues about problems of profiling multithread programs and this system, take a look at:
http://cboard.cprogramming.com/showthre ... ht=profile
Now an example of the output after running 25 different position at "sd 5". Function names are in Spanish, sorry......
another output again, but sorted by accumulated time per function (Ms Child):
do you find samething showy?
1) I call PROFILE_START function each time I enter in a function. PROFILE_START need a tag, which I put the name of the function
2) Each time I return from a function, I call PROFILE_STOP with the tag of the function that has ended
3) PROFILE_START and PROFILE_STOP act like a "timer on" and "timer off".
4) If call to PROFILE_START and there are a previous time running, I stop it and put in a stack. When returning I get from the stack and put it on again.
5) When I want, I call PROFILE_PRINT to show profile data (I show and example at the end of the post).
6) It is a slow system, but as it is only for profile purposes it is useful. Also I have seen is very accurate, although not perfect.
7) I use a #define tag to activate this system. If not tag no code is generated.
Anyway, there a lot of issues to take into account, like running the application in only one CPU. As I said, my system is enought to profile a chess engine, although not perfect. If you are interested in knowing more issues about problems of profiling multithread programs and this system, take a look at:
http://cboard.cprogramming.com/showthre ... ht=profile
Now an example of the output after running 25 different position at "sd 5". Function names are in Spanish, sorry......
Code: Select all
+------------+------------+------------+-------+------------------------------+
| Calls (xK) | Total Ms. | Ms.Child | % | Tag |
+------------+------------+------------+-------+------------------------------+
| 0000007188 | 0000002505 | 0000015999 | 7.3% | hacerMovimiento() |
| 0000020204 | 0000002355 | 0000002355 | 6.9% | atacantes() |
| 0000025692 | 0000002171 | 0000002171 | 6.3% | movilidadPieza() |
| 0000000630 | 0000002159 | 0000039901 | 6.3% | evaluarMedioJuego() |
| 0000000727 | 0000001998 | 0000111875 | 5.8% | busqueda() |
| 0000001497 | 0000001996 | 0000019767 | 5.8% | calcularAtaquesYMovilidad() |
| 0000007678 | 0000001830 | 0000024797 | 5.3% | ordenarMovimiento() |
| 0000000866 | 0000001527 | 0000026994 | 4.4% | evaluarFinal() |
| 0000003467 | 0000001427 | 0000020102 | 4.2% | SEE() |
| 0000010761 | 0000001379 | 0000001379 | 4.0% | estaAtacada() |
| 0000007188 | 0000001357 | 0000001357 | 4.0% | memcpy_hacerMovimiento() |
| 0000001964 | 0000001176 | 0000001176 | 3.4% | situacionPiezas() |
| 0000000776 | 0000001067 | 0000001067 | 3.1% | generarMovimientos() |
| 0000002688 | 0000000983 | 0000070027 | 2.9% | quiesce() |
| 0000002831 | 0000000849 | 0000049512 | 2.5% | evaluar() |
| 0000003984 | 0000000803 | 0000000803 | 2.3% | piezaClava() |
| 0000002479 | 0000000738 | 0000010312 | 2.2% | SEE_SQ() |
| 0000006778 | 0000000704 | 0000006438 | 2.1% | evalPeonNegro() |
| 0000010789 | 0000000703 | 0000000703 | 2.0% | menorAtacante() |
| 0000006564 | 0000000681 | 0000005923 | 2.0% | evalPeonBlanco() |
| 0000008528 | 0000000658 | 0000002117 | 1.9% | esJaque() |
| 0000003603 | 0000000539 | 0000000539 | 1.6% | actualizarHash() |
| 0000000630 | 0000000515 | 0000006948 | 1.5% | examinarColumnas() |
| 0000006936 | 0000000410 | 0000000410 | 1.2% | filaPeonNegro() |
| 0000006936 | 0000000406 | 0000000406 | 1.2% | filaPeonBlanco() |
| 0000000644 | 0000000356 | 0000000356 | 1.0% | generarCapturas() |
| 0000007209 | 0000000349 | 0000000349 | 1.0% | deshacerMovimiento() |
| 0000002962 | 0000000344 | 0000003656 | 1.0% | extensiones() |
| 0000005663 | 0000000298 | 0000000298 | 0.9% | evaluarVentajaMaterial() |
| 0000003603 | 0000000290 | 0000000290 | 0.8% | fasePartida() |
| 0000000000 | 0000000288 | 0000000304 | 0.8% | mostrarVariantePrincipal() |
| 0000002831 | 0000000247 | 0000000247 | 0.7% | evaluarCambios() |
| 0000002986 | 0000000221 | 0000000221 | 0.6% | esTablasLegales() |
| 0000000199 | 0000000170 | 0000003501 | 0.5% | quiesce_root() |
| 0000000630 | 0000000154 | 0000001313 | 0.4% | escudoPeonesBlancos() |
| 0000000630 | 0000000155 | 0000001391 | 0.5% | escudoPeonesNegros() |
| 0000002222 | 0000000119 | 0000000226 | 0.3% | reducciones() |
| 0000002017 | 0000000102 | 0000000102 | 0.3% | alfilAtrapado() |
| 0000001261 | 0000000071 | 0000000071 | 0.2% | escalarAtaque() |
| 0000000850 | 0000000061 | 0000000061 | 0.2% | probarHash() |
| 0000000850 | 0000000047 | 0000000047 | 0.1% | guardarHash() |
| 0000000866 | 0000000040 | 0000000040 | 0.1% | esTablasNoLegales() |
| 0000000023 | 0000000020 | 0000000020 | 0.1% | evaluarApertura() |
| 0000000000 | 0000000012 | 0000153043 | 0.0% | pensar() |
| 0000000027 | 0000000009 | 0000000045 | 0.0% | hacerPromocion() |
| 0000000000 | 0000000004 | 0000146645 | 0.0% | busqueda_raiz |
| 0000000004 | 0000000004 | 0000000029 | 0.0% | strMovimientoSAN() |
| 0000000021 | 0000000004 | 0000000004 | 0.0% | hacerMovimientoNulo() |
| 0000000000 | 0000000001 | 0000000018 | 0.0% | strVariantePrincipal() |
| 0000000000 | 0000000001 | 0000010471 | 0.0% | movimientosRaiz() |
| 0000000021 | 0000000001 | 0000000001 | 0.0% | peonPasado() |
| 0000000000 | 0000000002 | 0000000002 | 0.0% | inicializarHash() |
| 0000000028 | 0000000001 | 0000000001 | 0.0% | R() |
| 0000000006 | 0000000001 | 0000000001 | 0.0% | ordenarMovimientosRaiz() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | inicializarKillers |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | controlTiempoRaiz() |
| 0000000000 | 0000000000 | 0000000025 | 0.0% | generarMovsLegales() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | hash_vp() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | pulsarReloj() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | parsearMovimientoSAN() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | parsearMovimiento() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | obtenerHash() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | controlTiempo() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | hacerAlPaso() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | comprobarHistorico() |
+------------+------------+------------+-------+------------------------------+
Code: Select all
+------------+------------+------------+-------+------------------------------+
| Calls (xK) | Total Ms. | Ms.Child | % | Tag |
+------------+------------+------------+-------+------------------------------+
| 0000000000 | 0000000012 | 0000153043 | 0.0% | pensar() |
| 0000000000 | 0000000004 | 0000146645 | 0.0% | busqueda_raiz |
| 0000000727 | 0000001998 | 0000111875 | 5.8% | busqueda() |
| 0000002688 | 0000000983 | 0000070027 | 2.9% | quiesce() |
| 0000002831 | 0000000849 | 0000049512 | 2.5% | evaluar() |
| 0000000630 | 0000002159 | 0000039901 | 6.3% | evaluarMedioJuego() |
| 0000000866 | 0000001527 | 0000026994 | 4.4% | evaluarFinal() |
| 0000007678 | 0000001830 | 0000024797 | 5.3% | ordenarMovimiento() |
| 0000003467 | 0000001427 | 0000020102 | 4.2% | SEE() |
| 0000001497 | 0000001996 | 0000019767 | 5.8% | calcularAtaquesYMovilidad() |
| 0000007188 | 0000002505 | 0000015999 | 7.3% | hacerMovimiento() |
| 0000000000 | 0000000001 | 0000010471 | 0.0% | movimientosRaiz() |
| 0000002479 | 0000000738 | 0000010312 | 2.2% | SEE_SQ() |
| 0000000630 | 0000000515 | 0000006948 | 1.5% | examinarColumnas() |
| 0000006778 | 0000000704 | 0000006438 | 2.1% | evalPeonNegro() |
| 0000006564 | 0000000681 | 0000005923 | 2.0% | evalPeonBlanco() |
| 0000002962 | 0000000344 | 0000003656 | 1.0% | extensiones() |
| 0000000199 | 0000000170 | 0000003501 | 0.5% | quiesce_root() |
| 0000020204 | 0000002355 | 0000002355 | 6.9% | atacantes() |
| 0000025692 | 0000002171 | 0000002171 | 6.3% | movilidadPieza() |
| 0000008528 | 0000000658 | 0000002117 | 1.9% | esJaque() |
| 0000000630 | 0000000155 | 0000001391 | 0.5% | escudoPeonesNegros() |
| 0000010761 | 0000001379 | 0000001379 | 4.0% | estaAtacada() |
| 0000007188 | 0000001357 | 0000001357 | 4.0% | memcpy_hacerMovimiento() |
| 0000000630 | 0000000154 | 0000001313 | 0.4% | escudoPeonesBlancos() |
| 0000001964 | 0000001176 | 0000001176 | 3.4% | situacionPiezas() |
| 0000000776 | 0000001067 | 0000001067 | 3.1% | generarMovimientos() |
| 0000003984 | 0000000803 | 0000000803 | 2.3% | piezaClava() |
| 0000010789 | 0000000703 | 0000000703 | 2.0% | menorAtacante() |
| 0000003603 | 0000000539 | 0000000539 | 1.6% | actualizarHash() |
| 0000006936 | 0000000410 | 0000000410 | 1.2% | filaPeonNegro() |
| 0000006936 | 0000000406 | 0000000406 | 1.2% | filaPeonBlanco() |
| 0000000644 | 0000000356 | 0000000356 | 1.0% | generarCapturas() |
| 0000007209 | 0000000349 | 0000000349 | 1.0% | deshacerMovimiento() |
| 0000000000 | 0000000288 | 0000000304 | 0.8% | mostrarVariantePrincipal() |
| 0000005663 | 0000000298 | 0000000298 | 0.9% | evaluarVentajaMaterial() |
| 0000003603 | 0000000290 | 0000000290 | 0.8% | fasePartida() |
| 0000002831 | 0000000247 | 0000000247 | 0.7% | evaluarCambios() |
| 0000002222 | 0000000119 | 0000000226 | 0.3% | reducciones() |
| 0000002986 | 0000000221 | 0000000221 | 0.6% | esTablasLegales() |
| 0000002017 | 0000000102 | 0000000102 | 0.3% | alfilAtrapado() |
| 0000001261 | 0000000071 | 0000000071 | 0.2% | escalarAtaque() |
| 0000000850 | 0000000061 | 0000000061 | 0.2% | probarHash() |
| 0000000850 | 0000000047 | 0000000047 | 0.1% | guardarHash() |
| 0000000027 | 0000000009 | 0000000045 | 0.0% | hacerPromocion() |
| 0000000866 | 0000000040 | 0000000040 | 0.1% | esTablasNoLegales() |
| 0000000004 | 0000000004 | 0000000029 | 0.0% | strMovimientoSAN() |
| 0000000000 | 0000000000 | 0000000025 | 0.0% | generarMovsLegales() |
| 0000000023 | 0000000020 | 0000000020 | 0.1% | evaluarApertura() |
| 0000000000 | 0000000001 | 0000000018 | 0.0% | strVariantePrincipal() |
| 0000000021 | 0000000004 | 0000000004 | 0.0% | hacerMovimientoNulo() |
| 0000000028 | 0000000001 | 0000000001 | 0.0% | R() |
| 0000000006 | 0000000001 | 0000000001 | 0.0% | ordenarMovimientosRaiz() |
| 0000000000 | 0000000002 | 0000000002 | 0.0% | inicializarHash() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | hacerAlPaso() |
| 0000000021 | 0000000001 | 0000000001 | 0.0% | peonPasado() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | hash_vp() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | pulsarReloj() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | parsearMovimientoSAN() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | parsearMovimiento() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | obtenerHash() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | controlTiempo() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | inicializarKillers |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | controlTiempoRaiz() |
| 0000000000 | 0000000000 | 0000000000 | 0.0% | comprobarHistorico() |
+------------+------------+------------+-------+------------------------------+
-
- Posts: 28359
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Profile data request
Problem is that I do not use threads at all. (Well, I guess you could call that a single thread...) Joker is a simple console I/O program that uses a PeekNamedPipe call to periodically check for pending input that should terminate pondering.Kempelen wrote:That is because you are using threads and gprof is not compatible wit threads. There is some workaround (http://sam.zoy.org/writings/programming/gprof.html).
I find this (just for a 12-ply search from the opening):
Code: Select all
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 24421188 0.00 0.00 SEE
0.00 0.00 0.00 14154977 0.00 0.00 Evaluate
0.00 0.00 0.00 4646760 0.00 0.00 capt_gen
0.00 0.00 0.00 3208159 0.00 0.00 capturable
0.00 0.00 0.00 1060358 0.00 0.00 move_gen
0.00 0.00 0.00 14022 0.00 0.00 make_list
0.00 0.00 0.00 486 0.00 0.00 GetTime
0.00 0.00 0.00 188 0.00 0.00 CheckMarks
0.00 0.00 0.00 29 0.00 0.00 PrintPV
0.00 0.00 0.00 5 0.00 0.00 key_init
0.00 0.00 0.00 4 0.00 0.00 PrintResult
0.00 0.00 0.00 3 0.00 0.00 delta_init
0.00 0.00 0.00 2 0.00 0.00 ReadFEN
0.00 0.00 0.00 2 0.00 0.00 Search
0.00 0.00 0.00 2 0.00 0.00 pack
0.00 0.00 0.00 1 0.00 0.00 MakeMove
0.00 0.00 0.00 1 0.00 0.00 ReadMove
0.00 0.00 0.00 1 0.00 0.00 makeup_init
0.00 0.00 0.00 1 0.00 0.00 pboard
0.00 0.00 0.00 1 0.00 0.00 piece_init
0.00 0.00 0.00 1 0.00 0.00 unbuffered_input
-
- Posts: 901
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Profile data request
There are two places (compiler and linker) in makefile where you need to insert "-pg" switches in GCC. Sorry, cannot be more specific, because I have not GCC handy.
-
- Posts: 295
- Joined: Wed Mar 08, 2006 8:29 pm
Re: Profile data request
I'm on a Mac here but I used to have the same problem. I'm using Shark now as I cannot use gprof.hgm wrote:I have a more general question about profiling: why is it that every routine in my engine takes 0.00% of the time, and even the total cumulative of all routines takes 0.00%? I know I can write fast code, but could it really be that fast...? I am using gprof and compile with "gcc -pg", under cygwin.
Roman
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Profile data request
I use both gcc and icc and both work just fine with threads for me. I even use PGO and run a multi-thread test set so that the parallel search code also gets profiled / optimized.Kempelen wrote:That is because you are using threads and gprof is not compatible wit threads. There is some workaround (http://sam.zoy.org/writings/programming/gprof.html).
I use a trick. If I am profiling I dont call beginthread, but call the thread function directly. Of course I can not use the gui, but gprof show correct data:
I use this:Code: Select all
#ifndef PROFILE_MINGW if ( _beginthread ( pensar, 0 , NULL ) == -1 ) { #else if (1==0) { // Para hacer un profile con MINGW #endif Error ( "Error: Can not create main thinking thread." ); Salir(); } #endif ..... #ifdef PROFILE_MINGW procesarEPD("a epd string"); fin_pensar = FALSE; color_Rodin = n->posicion.turno; nivel = NIVEL_TIEMPO_POR_MOVIMIENTO; tiempo_a_usar = 60 * 100; pensar(); return; #endif
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Profile data request
Glaurung (all functions with at least 1.0%):Kempelen wrote:I want to compare my engine profile data with others. Would someone be so kind to post profile data of strong engines?
thx.
Fermín
Code: Select all
evaluate(): 31.9%
MovePicker::pick_move_from_list(): 8.0%
search(): 5.3%
do_move(): 5.0%
see(): 4.5%
TranspositionTable::Retrieve(): 4.4%
qsearch(): 4.4%
Position::move_is_check(): 4.0%
MovePicker::get_next_move(): 4.0%
generate_captures(): 4.0%
pop_1st_bit(): 2.6%
Position::undo_move(): 2.2%
Position::discovered_check_candidates(): 2.1%
init_node: 1.9%
PawnInfoTable::get_pawn_info(): 1.8%
generate_evasions(): 1.7%
generate_checks(): 1.2%
Position::move_is_legal(): 1.2%
Position::pinned_pieces(): 1.1%
extension(): 1.0%
Re: Profile data request
I had this problem too; it seems that there are bugs:hgm wrote:I have a more general question about profiling: why is it that every routine in my engine takes 0.00% of the time, and even the total cumulative of all routines takes 0.00%? I know I can write fast code, but could it really be that fast...? I am using gprof and compile with "gcc -pg", under cygwin.
http://www.nabble.com/Cygwin-1.5.25:-gp ... 21057.html
I fixed the problem by replacing libgmon.a as mentioned in the linked thread.