CT800 engine: Android problems?

Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Post by Ras »

Hi,

I'm a bit stuck with my current Android version, especially because I don't have an Android device and the emulator from the SDK doesn't work on my PC. I don't see problems under Windows or PC-Linux. I've already used the GCC sanitiser options under Linux to rule out a possible segfault.

The symptoms seem to be that Android binaries compiled with NDK r15c and r16b don't work, and the x86 version doesn't even work with r15b. Droidfish reports "engine terminated" or similar.

I've put together a ZIP with versions compiled with r15b, r15c and r16b, for both ARM and x86 and for both 32 and 64 bits. Everything is linked statically with -pie and -fPIE. The API level is 16 (Android 4.1) for 32 bit and 21 (Android 5.0) for 64 bit.

Could someone please take a look at how that fails?

https://www.ct800.net/download/ct800_v1.20p04.zip

thanks, Rasmus
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: CT800 engine: Android problems?

Post by Ferdy »

arm32_r15b worked well in Android v4.4.2 on Droidfish, Chess for Android, and Analyze This (Pro).
I have not tested the other compiles.

Post by Ras »

That's something positive, thank you. The issue is that all the others don't seem to work. Could you have a look at the r15c and r16b, too?

Post by Ferdy »

Ras wrote:That's something positive, thank you. The issue is that all the others don't seem to work. Could you have a look at the r15c and r16b, too?
arm32_r15c worked, arm32_r16b did not work.

Post by Ras »

Ferdy wrote:arm32_r15c worked, arm32_r16b did not work.
Ok, thanks again, so at least it's reproducible.

Probably the symptom is that the engine doesn't seem to react. I think I'll have to overhaul the IO system.
abik
Posts: 819
Joined: Fri Dec 01, 2006 10:46 pm
Location: Mountain View, CA, USA
Full name: Aart Bik

Post by abik »

Ras wrote: Could someone please take a look at how that fails?
Results running on Android 8.1.

ct800_v1.20p04_andarm32_r15b        runs
ct800_v1.20p04_andarm64_r15b        runs

ct800_v1.20p04_andarm32_r15c        runs
ct800_v1.20p04_andarm64_r15c        runs

ct800_v1.20p04_andarm32_r16b        Segmentation fault
ct800_v1.20p04_andarm64_r16b        runs
Sample run:

# ./ct800_v1.20p04_andarm64_r16b
CT800 V1.20P04 64 bit UCI version
2016-2018 by Rasmus Althoff
Free software under GPLv3+
uci
id name CT800 V1.20P04 64 bit
id author Rasmus Althoff
option name Hash type spin default 8 min 1 max 1024
option name Keep Hash Tables type check default true
option name Clear Hash type button
option name Book Moves type button
option name OwnBook type check default true
option name Contempt Value [cps] type spin default 30 min -300 max 300
option name Contempt End [moves] type spin default 34 min 0 max 400
option name CPU Speed [%] type spin default 100 min 1 max 100
option name Move Overhead [ms] type spin default 100 min 0 max 1000
option name Eval Blur [cps] type spin default 0 min 0 max 500
option name Show Short PVs type check default false
option name Show Current Move type combo default Every Second var Every Second var Continuously
option name UCI_EngineAbout type string default The CT800 is free software under GPLv3+. Website: www.ct800.net
uciok
go
info depth 1 seldepth 1 score cp 1 time 0 nodes 1 nps 0 hashfull 0 tbhits 0 pv e2e4
bestmove e2e4

Post by Ras »

abik wrote:Results running on Android 8.1.
Thanks, that confirms the other results. The problem seems to lie with NDK r16b itself: neither Clang nor GCC emits any compiler warnings, CppCheck and Coverity Scan didn't find anything, and even the GCC sanitiser options under Linux were fine.

The changelog is a bit sparse, but I did find some hints that more of Bionic gets pulled in there, plus the headers, so there might be changes to the libc and its behaviour. Not sure.

I'm building the binary with -fPIE -pie -static in one single Clang call for all .c files, having built the NDK for API 16 (32 bit) and 21 (64 bit) respectively. Is that OK?

What I've already changed is the usage of fgets(), where I assumed that the whole UCI command would be read at once. However, depending on the redirection, a command might only be partly readable from stdin, especially with long move lists. So I'd better assemble it with subsequent reads until the last character read is a newline. I also tried setvbuf, giving stdin an 8k line buffer and stdout a 2k fully buffered one (along with fflush).
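The assemble-until-newline idea can be sketched in plain C as follows. This is only an illustration: the name read_full_line and its return codes are made up here and are not taken from the CT800 source. Note that standard fgets() itself only hands back a string without a trailing newline when the buffer is full or the stream ends, so the loop mainly guards against those two cases.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one complete line from `in` into `buf`
 * (capacity `cap`), calling fgets() again until a newline has arrived.
 * Returns the length without the line ending, -1 on EOF or read error
 * before a newline was seen, or -2 if the line does not fit into `buf`
 * (the remainder of that line is then discarded). */
static long read_full_line(FILE *in, char *buf, size_t cap)
{
    size_t len = 0;
    int c;

    if (cap < 2)
        return -2;
    buf[0] = '\0';
    for (;;) {
        /* Append the next chunk behind what has been collected so far. */
        if (fgets(buf + len, (int)(cap - len), in) == NULL)
            return -1;
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[--len] = '\0';                 /* strip '\n' */
            if (len > 0 && buf[len - 1] == '\r')
                buf[--len] = '\0';             /* strip '\r' of a CR/LF pair */
            return (long)len;
        }
        if (len + 1 >= cap) {
            /* Buffer full without newline: skip the rest of this line. */
            while ((c = getc(in)) != EOF && c != '\n')
                ;
            return -2;
        }
    }
}
```

A caller would typically loop on read_full_line(stdin, line, sizeof line) and treat -1 as the signal to shut the engine down, while -2 can be answered by ignoring the oversized command.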

Even then, r15b and r15c work with Chess for Android, but with Droidfish the r15b build terminates at the end of the book, and the r15c build some time into the game.

I suspect there may be some unexpected gotchas with how fgets behaves on stdin under Android. Maybe it can return NULL just like that without the redirection from the GUI being broken (i.e. no EOF condition)? Or perhaps it would help to wait half a second before using stdin and stdout, so that the GUI's redirection is set up?

Building an actually working binary for x86 Android is even more difficult. I guess I'll just drop that idea.
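Whether a NULL from fgets() really means EOF can be checked with standard C alone, using feof() and ferror(). The sketch below is an assumption about how one might diagnose this, not code from the engine; the enum and the function name are hypothetical. It treats a read interrupted by a signal (errno set to EINTR on POSIX systems) as retryable.

```c
#include <errno.h>
#include <stdio.h>

typedef enum { READ_EOF, READ_RETRY, READ_FAILED } read_status;

/* Hypothetical helper: call directly after fgets() returned NULL on `in`,
 * before any other call that might overwrite errno. */
static read_status classify_null_read(FILE *in)
{
    if (feof(in))
        return READ_EOF;       /* the writing side closed the pipe */
    if (ferror(in) && errno == EINTR) {
        clearerr(in);          /* interrupted by a signal: reading again is fine */
        return READ_RETRY;
    }
    return READ_FAILED;        /* genuine I/O error */
}
```

Logging the result of such a check to a file on the device would show whether the GUI actually closed the pipe or whether the read failed for another reason.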

Partly solved!

Post by Ras »

At least I've found out what the problem was with NDK r16b: it does not like building with both PIE and static linking.

The reason is that under Linux, a dynamically linked position-independent executable actually is a shared library that also happens to be executable. But a shared library that is static doesn't make sense.

Next, the NDK ships the C standard library both as a static version, libc.a, and as a dynamic one, libc.so. The issue is that the static version is more or less unmaintained. It was last updated with API level 21 (which also means that static linkage with API > 21 will probably not work) and before that at API level 9. Now for PIE, the minimum level is 16, and that is quite far from API 9.

For 64 bit, the minimum level is 21, so that matches, but for 32 bit, API 16 makes more sense because that will work with devices running Android 4.1+ but less than 5.0, which are still around 10% of the devices out there.

Actually, getting the static libc.a updated is targeted for one of the next NDK releases, could be r17 or maybe r18. The dynamic version, however, is on the target system anyway.

The solution is to compile with -fPIE and link with -pie, but without -static. The linker flags must include -pie. When using one single compiler call for the whole source, -pie is forwarded to the linker, but -Wl,-pie can also be added just to be sure.

When using the readelf tool (Linux or Cygwin), it is easy to verify:

"readelf -l my_engine" must show the type as DYN (shared object file). Looks confusing, but see above: a position-independent executable is exactly that.

"readelf -d my_engine" then will show two "NEEDED" entries for an engine written in plain C: libdl.so, the dynamic linking library, and libc.so, the C standard library. Both are present on the target system.

(It might also be possible to use API level 9 and then link statically, without PIE of course. But that still runs into the problem of an outdated libc.a, so I'd expect that there are still old bugs inside.)
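The DYN check that readelf performs can also be done programmatically, for example in a build script helper. The following is a minimal sketch that only looks at the ELF identification bytes and the e_type field, which sits at byte offset 16 in both 32 bit and 64 bit ELF headers. The function names are hypothetical; the constants ET_EXEC = 2 and ET_DYN = 3 come from the ELF specification.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { ELF_ET_EXEC = 2, ELF_ET_DYN = 3 };

/* Returns the e_type field of the ELF header in `hdr`,
 * or -1 if the data is too short or not an ELF header. */
static int elf_type(const unsigned char *hdr, size_t len)
{
    if (len < 18 || memcmp(hdr, "\177ELF", 4) != 0)
        return -1;
    /* hdr[5] is EI_DATA: 1 = little endian, 2 = big endian. */
    if (hdr[5] == 1)
        return hdr[16] | (hdr[17] << 8);
    if (hdr[5] == 2)
        return (hdr[16] << 8) | hdr[17];
    return -1;
}

/* Reads the first 18 bytes of `path` and returns its e_type, or -1. */
static int elf_type_of_file(const char *path)
{
    unsigned char hdr[18];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    return elf_type(hdr, n);
}
```

For a PIE build, elf_type_of_file("my_engine") should yield ELF_ET_DYN, while a classic non-PIE executable yields ELF_ET_EXEC. This does not replace "readelf -d", since it says nothing about the NEEDED entries.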

Completely solved.

Post by Ras »

The other issue, Droidfish reporting an engine error, was rooted in a bug that Arena did not trigger. I went through the source code of Droidfish; unlike most GUIs, Droidfish does not use UCI with the "position startpos moves ..." idiom over the whole game, but instead uses "position fen ... moves ..." with the FEN position being the one after the most recent irreversible move.

The IO layer rewrite for my V1.20 introduced a bug where the engine would fail to determine the side to move if no move list was transferred and this side was Black. The result would be either an illegal null move for the wrong side or the refusal to move at all.

Since Arena and Shredder GUI both transfer the full game, my testing did not uncover this bug. Fixing it was easy; the bigger headache was how to still detect an ongoing game and make use of PV hits when there suddenly is no move list.
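To make the failure case concrete, here is a minimal sketch of how the side to move follows from a UCI position command: the starting side comes from "startpos" (White) or from the second FEN field, and every move in the optional move list flips it. This is not the CT800 code; the function and type names are hypothetical, and the sketch assumes single spaces between FEN fields and a command line without trailing newline.

```c
#include <stddef.h>
#include <string.h>

typedef enum { SIDE_INVALID = -1, SIDE_WHITE = 0, SIDE_BLACK = 1 } side_t;

/* Hypothetical helper: derives the side to move from a UCI "position"
 * command. Does not validate the FEN or the moves themselves. */
static side_t side_to_move(const char *cmd)
{
    side_t side;
    const char *p;
    size_t plies = 0;

    if (strncmp(cmd, "position ", 9) != 0)
        return SIDE_INVALID;
    cmd += 9;
    if (strncmp(cmd, "startpos", 8) == 0) {
        side = SIDE_WHITE;
    } else if (strncmp(cmd, "fen ", 4) == 0) {
        /* The field after the board description is "w" or "b". */
        p = strchr(cmd + 4, ' ');
        if (p == NULL)
            return SIDE_INVALID;
        if (p[1] == 'w')
            side = SIDE_WHITE;
        else if (p[1] == 'b')
            side = SIDE_BLACK;
        else
            return SIDE_INVALID;
    } else {
        return SIDE_INVALID;
    }

    /* Count the moves after the optional "moves" keyword. */
    p = strstr(cmd, " moves");
    if (p != NULL) {
        p += 6;
        while (*p != '\0') {
            while (*p == ' ')
                p++;
            if (*p == '\0')
                break;
            plies++;
            while (*p != '\0' && *p != ' ')
                p++;
        }
    }
    if (plies % 2 == 0)
        return side;
    return (side == SIDE_WHITE) ? SIDE_BLACK : SIDE_WHITE;
}
```

The case described above corresponds to a command such as "position fen ... b ..." without any "moves" part: the move list is empty, so the FEN field alone has to decide that Black is to move.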