Need help with compiler and language choices

Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Need help with compiler and language choices

Post by Don »

jwes wrote:
Don wrote:
jshriver wrote:A great and open source assembler for x86 is nasm. I've been using it off/on since the mid 90's and works well in Windows or Linux and supports various object formats including flat binary.

Had originally used it to write my own bootstrap loader for an embedded system I was working on.
I saw your post which disappeared but I agree with you on this. High level concepts sometimes take some effort to wrap your brain around - although they seem easy once you have done that. assembly has only very simple concepts so it's very easy to learn quickly.

I would point out that in ANY language being really proficient takes a lot of effort and I don't think that varies so much regardless of the language. Whether it's C, assembly, or kung-fu you probably won't be a black-belt for quite some time.
Writing good assembler is a lot harder than it used to be (back when instruction time was a meaningful concept). Now you can spend as much time on instruction sequencing and code location as you do on algorithms.
That depends on what you consider "good" assembler. By one definition it is well structured, nicely commented, readable and artfully composed, but not necessarily fast-executing.

By your definition it means writing code that executes really fast. I agree with you that it is harder than it used to be. C is the simplest path to really fast code now, but C is not the best beginner's language.

I personally believe every programmer should start with assembler just so that he knows what is really going on, then he should move up gradually to higher level languages. I'm sure some will feel just the opposite.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Need help with compiler and language choices

Post by michiguel »

Don wrote:
jwes wrote:
Don wrote:
jshriver wrote:A great and open source assembler for x86 is nasm. I've been using it off/on since the mid 90's and works well in Windows or Linux and supports various object formats including flat binary.

Had originally used it to write my own bootstrap loader for an embedded system I was working on.
I saw your post which disappeared but I agree with you on this. High level concepts sometimes take some effort to wrap your brain around - although they seem easy once you have done that. assembly has only very simple concepts so it's very easy to learn quickly.

I would point out that in ANY language being really proficient takes a lot of effort and I don't think that varies so much regardless of the language. Whether it's C, assembly, or kung-fu you probably won't be a black-belt for quite some time.
Writing good assembler is a lot harder than it used to be (back when instruction time was a meaningful concept). Now you can spend as much time on instruction sequencing and code location as you do on algorithms.
That depends on what you consider "good" assembler. By one definition it is well structured, nicely commented, readable and artfully composed, but not necessarily fast-executing.

By your definition it means writing code that executes really fast. I agree with you that it is harder than it used to be. C is the simplest path to really fast code now, but C is not the best beginner's language.

I personally believe every programmer should start with assembler just so that he knows what is really going on, then he should move up gradually to higher level languages. I'm sure some will feel just the opposite.
I think almost anything you pick will be fine as long as you do it right. However, if I have to give a blanket suggestion, I will say that the best is to start directly with C. First, you do not need to know what is going on in assembler to write good code. C is low level enough to figure it out and pick up what is needed from assembler on the go. Second, and most importantly IMHO, you need to read and learn proper coding techniques. There is a vast literature about this and most of the time the examples are in C.

The beginner has to be careful because no matter what language you start with, you risk picking up the vices of that language. I started with Pascal and it took me a while to realize that using break and continue was not a bad thing in C. Having a goto is not necessarily bad either, but I feel like the Clockwork Orange character listening to Beethoven when I see one :-)

BTW, people in the Comp. Sci. dept. strongly suggest starting with Python. I think they have a point. The problem, or the good thing, is... you may never want to use anything but Python anymore.

Miguel
PS: Disclaimer: I have no formal academic training in Comp. Sci., so whoever reads this, take it with a huge grain of salt.
Dann Corbit
Posts: 12791
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Need help with compiler and language choices

Post by Dann Corbit »

Don wrote:
jwes wrote:
Don wrote:
jshriver wrote:A great and open source assembler for x86 is nasm. I've been using it off/on since the mid 90's and works well in Windows or Linux and supports various object formats including flat binary.

Had originally used it to write my own bootstrap loader for an embedded system I was working on.
I saw your post which disappeared but I agree with you on this. High level concepts sometimes take some effort to wrap your brain around - although they seem easy once you have done that. assembly has only very simple concepts so it's very easy to learn quickly.

I would point out that in ANY language being really proficient takes a lot of effort and I don't think that varies so much regardless of the language. Whether it's C, assembly, or kung-fu you probably won't be a black-belt for quite some time.
Writing good assembler is a lot harder than it used to be (back when instruction time was a meaningful concept). Now you can spend as much time on instruction sequencing and code location as you do on algorithms.
That depends on what you consider "good" assembler. By one definition it is well structured, nicely commented, readable and artfully composed, but not necessarily fast-executing.

By your definition it means writing code that executes really fast. I agree with you that it is harder than it used to be. C is the simplest path to really fast code now, but C is not the best beginner's language.

I personally believe every programmer should start with assembler just so that he knows what is really going on, then he should move up gradually to higher level languages. I'm sure some will feel just the opposite.
I really like MIX (Knuth's assembly language).
I know some people despise it for some reason.

I think the nice thing about assembly is that while it is tedious, it is very easy to understand. Even things like indirect addressing become instantly obvious when you are doing it in assembly.
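
For comparison, here is a minimal C sketch of the same idea, with hypothetical function names chosen only for illustration. In assembly each dereference is a visible load; in C the same two steps hide behind the `*` operators.

```c
/* Direct access: p holds the address of the value, so one memory read. */
int load_direct(const int *p)
{
    return *p;
}

/* Indirect access: pp holds the address of a pointer, which in turn
   holds the address of the value. Two memory reads, one per '*'. */
int load_indirect(int **pp)
{
    int *p = *pp;   /* first read: fetch the pointer */
    return *p;      /* second read: fetch the value it points to */
}

/* Store through the same two-step path. */
void store_indirect(int **pp, int value)
{
    **pp = value;
}
```

Calling `load_indirect(&p)` where `p` points at an `int` returns that `int`, and `store_indirect(&p, v)` overwrites it.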

C is close enough to the bare metal to call it a portable assembler.
The big problem with assembly is that an assembler program ages to zero value in less than five years, but if written in C, all you would have to do is recompile to get the modern instruction set.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Need help with compiler and language choices

Post by bob »

Don wrote:
Greg Strong wrote:You are saying that Intel assembly language is easier to learn than C?
Yes, at least it was for me.

It was easier for me because when I learned C I was constantly tripping over the syntax, figuring out the libraries, etc. It's just a LOT to learn. It took a while before it was second nature to me and now I rarely refer to a manual - but that initial learning curve is steep.

But assembly is very regular and hardly has any syntax. You learn a few opcodes, what they do is very precisely defined, and you are writing code very quickly.

However, it could be more difficult to learn than C if you don't already have a good understanding of how data is stored in a computer and how basic operations such as and, or, xor and things like this work.
While I agree with part of the above, it is _clearly_ not easier to write a chess engine in asm than in C. The asm gets _way_ too big. We did the search, movgen, swap, and evaluation of Cray Blitz in asm, and it was 5x-10x more lines of code to deal with. We kept the original Fortran around so that we could first make changes there and test, and when we were happy, spend the days or weeks necessary to merge the changes into the asm, which was _highly_ optimized and very difficult to read after it had gotten "cold".

I've been teaching asm for 40+ years, from the IBM 1620 to IBM/360/370/390 to VAX, to SPARC, to Cray, to x86, to MIPS, and I skipped a few like DEC-10, Honeywell, TI-ASC, CDC, Univac and probably some others. In other words, I've taught a bunch of different ones, and I'd rather teach or learn C any day.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Need help with compiler and language choices

Post by Don »

bob wrote:
Don wrote:
Greg Strong wrote:You are saying that Intel assembly language is easier to learn than C?
Yes, at least it was for me.

It was easier for me because when I learned C I was constantly tripping over the syntax, figuring out the libraries, etc. It's just a LOT to learn. It took a while before it was second nature to me and now I rarely refer to a manual - but that initial learning curve is steep.

But assembly is very regular and hardly has any syntax. You learn a few opcodes, what they do is very precisely defined, and you are writing code very quickly.

However, it could be more difficult to learn than C if you don't already have a good understanding of how data is stored in a computer and how basic operations such as and, or, xor and things like this work.
While I agree with part of the above, it is _clearly_ not easier to write a chess engine in asm than in C.
Yes, I agree. What I said is that for ME, it was easier to LEARN assembler than C. Once you are good in both languages C is by far easier to program in.

Having said all of that, when I decided to write Rexchess in assembler instead of Pascal I was expecting it to be a major ordeal, because all the literature on the subject exaggerated to a ridiculous degree how daunting a task it would be. But that simply was not the case. They were claiming it would take something like 10 or 20x longer to write the same thing, but it was nothing like that for me and I thoroughly enjoyed the process.
The asm gets _way_ too big. We did the search, movgen, swap, and evaluation of Cray Blitz in asm, and it was 5x-10x more lines of code to deal with. We kept the original Fortran around so that we could first make changes there and test, and when we were happy, spend the days or weeks necessary to merge the changes into the asm, which was _highly_ optimized and very difficult to read after it had gotten "cold".

I've been teaching asm for 40+ years, from the IBM 1620 to IBM/360/370/390 to VAX, to SPARC, to Cray, to x86, to MIPS, and I skipped a few like DEC-10, Honeywell, TI-ASC, CDC, Univac and probably some others. In other words, I've taught a bunch of different ones, and I'd rather teach or learn C any day.
My first program on a real computer was on the 1802 ELF kit - it had an LED, a hex keypad, a 2-digit hex display and 256 bytes of memory. I did not have the luxury of an assembler, so I programmed it in machine code through the hex keypad.

I was not there in the days of the coal burning computers and punch cards and vacuum tubes but I would have liked to have seen it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Need help with compiler and language choices

Post by bob »

Don wrote:
bob wrote:
Don wrote:
Greg Strong wrote:You are saying that Intel assembly language is easier to learn than C?
Yes, at least it was for me.

It was easier for me because when I learned C I was constantly tripping over the syntax, figuring out the libraries, etc. It's just a LOT to learn. It took a while before it was second nature to me and now I rarely refer to a manual - but that initial learning curve is steep.

But assembly is very regular and hardly has any syntax. You learn a few opcodes, what they do is very precisely defined, and you are writing code very quickly.

However, it could be more difficult to learn than C if you don't already have a good understanding of how data is stored in a computer and how basic operations such as and, or, xor and things like this work.
While I agree with part of the above, it is _clearly_ not easier to write a chess engine in asm than in C.
Yes, I agree. What I said is that for ME, it was easier to LEARN assembler than C. Once you are good in both languages C is by far easier to program in.

Having said all of that, when I decided to write Rexchess in assembler instead of Pascal I was expecting it to be a major ordeal, because all the literature on the subject exaggerated to a ridiculous degree how daunting a task it would be. But that simply was not the case. They were claiming it would take something like 10 or 20x longer to write the same thing, but it was nothing like that for me and I thoroughly enjoyed the process.
The asm gets _way_ too big. We did the search, movgen, swap, and evaluation of Cray Blitz in asm, and it was 5x-10x more lines of code to deal with. We kept the original Fortran around so that we could first make changes there and test, and when we were happy, spend the days or weeks necessary to merge the changes into the asm, which was _highly_ optimized and very difficult to read after it had gotten "cold".

I've been teaching asm for 40+ years, from the IBM 1620 to IBM/360/370/390 to VAX, to SPARC, to Cray, to x86, to MIPS, and I skipped a few like DEC-10, Honeywell, TI-ASC, CDC, Univac and probably some others. In other words, I've taught a bunch of different ones, and I'd rather teach or learn C any day.
My first program on a real computer was on the 1802 ELF kit - it had an LED, a hex keypad, a 2-digit hex display and 256 bytes of memory. I did not have the luxury of an assembler, so I programmed it in machine code through the hex keypad.

I was not there in the days of the coal burning computers and punch cards and vacuum tubes but I would have liked to have seen it.

Where to start has been a near-religious argument for 40+ years. Dartmouth used to teach assembly language first, then they dropped into Basic (wonder why? :) Look at where it was developed.) And while I don't think asm is easier than a higher-level language, there are certain merits to learning asm first. One is to understand how things are actually done so that you can then code in a higher-level language in such a way that things go faster. For example, who would have thought it was better to do a decreasing loop rather than an increasing one (for (i=0; i<n; i++) as opposed to for (i=n; i>0; i--))? Yet in X86 the loop instruction makes the latter more attractive. Who'd know if they didn't know about that instruction, or the equivalents on other architectures such as BDR (branch and decrement register), et al.?
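
To make the two loop shapes concrete, here is a small C sketch. The function names `sum_up` and `sum_down` are made up for illustration. Both compute the same sum; the second counts from n down to 1 so the loop test is a comparison against zero, which is the shape a decrement-and-branch instruction handles directly. Whether a given compiler actually emits such an instruction, or whether either form is faster on a particular chip, is something to check in the generated code rather than assume.

```c
#include <stddef.h>

/* Sum an array with an index that counts up from 0 to n-1. */
long sum_up(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Same sum with a counter that runs from n down to 1.
   The loop exits when the counter reaches zero. */
long sum_down(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = n; i > 0; i--)
        total += a[i - 1];   /* counter is 1-based, index is 0-based */
    return total;
}
```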

However, X86 (and others) have some quirky syntax compared to more regular architectures like MIPS/SPARC and even the ugly Mix that nobody uses. :) I'm not convinced it is easier to learn x86 first, although it would have some benefits.

When you say 1802 I assume you mean the RCA 1802 "COSMAC" chip? We did some process control applications using those way back. CMOS, very low power usage. And a pretty simple instruction set compared to x86 today, which has a ton of opcodes.

I've not run on a tube-based machine, but I was around when VLSI meant a 7400 series chip that had 16 pins, and maybe 4 AND gates or something similar on a single chip. That was the basis for the IBM/360 stuff. I still have my old electronic chess board I built in 1979 or so using 7400 TTL chips...
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Need help with compiler and language choices

Post by wgarvin »

I think the biggest downside if you decide to write your program in assembler is that you have to worry about a lot of trivial low-level details. You have to decide which registers to use to store which temporary values at which times, which values to spill and restore (and when), which calling conventions to use, etc.

It's possible to make macros to do "boilerplate" versions of e.g. calling sequences, but if you're going to do that, why not just write it in C to begin with? That is much easier, and it lets your compiler worry about all of those boring details on your behalf. Some of them the compiler is pretty good at. Almost no one writes large programs entirely in assembly any more, but a chess engine might be small enough to manage it. But because assembly is so low-level, it is hard to read, hard to maintain, and important algorithmic details can be easily obscured. One carefully written line of C code with a complex expression on it might easily turn into 15 to 20 instructions, each of which will be on its own line, and those 15 to 20 instructions will look very much like the other 10,000+ instructions of the program. Making this maintainable and readable is a big challenge.

I would recommend writing your first engine entirely in C or C++ (but don't use any of the fancier language features of C++... just use basic stuff like classes and inline methods). Only when you have a working engine that you're fairly satisfied with would it be worth trying to rewrite some routines in assembler, or use tricks such as inline assembler for bitscan instructions. Note that many strong engines are written entirely in C or C++, or at most they use a few tiny bits of assembly (like inline assembly for bitscan instructions). Very few engines these days contain entire hand-written routines in assembly.
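
As a rough illustration of the bitscan trick, here is a sketch that uses a compiler builtin instead of hand-written inline assembly. It assumes GCC or Clang for `__builtin_ctzll` and falls back to a plain loop elsewhere. The names `lsb_index` and `pop_lsb`, and the use of a 64-bit word as a board mask, are assumptions for the example, not something taken from a particular engine.

```c
#include <stdint.h>

/* Index (0..63) of the least significant set bit.
   The caller must pass a non-zero value. */
static inline int lsb_index(uint64_t bb)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Compiler builtin; typically compiles to a single bitscan instruction. */
    return __builtin_ctzll(bb);
#else
    /* Portable fallback: shift right until the low bit is set. */
    int i = 0;
    while ((bb & 1u) == 0) {
        bb >>= 1;
        i++;
    }
    return i;
#endif
}

/* Return the index of the lowest set bit and clear it in *bb. */
static inline int pop_lsb(uint64_t *bb)
{
    int i = lsb_index(*bb);
    *bb &= *bb - 1;   /* clears the lowest set bit */
    return i;
}
```

Looping with `while (bb) { int sq = pop_lsb(&bb); ... }` visits every set bit once, and the rest of the engine stays in plain C.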

But if you really want to go that route, then you'd definitely want to use a profiler to make sure that:
(1) the routines you convert to assembly were actually taking up a significant amount of the overall time (if it's less than 2% then you probably shouldn't worry about it), and
(2) your hand-written assembly version of the routine is actually faster -- you would be surprised how easy it is to do worse than the compiler in this respect.. it is usually possible to do better than the compiler, but not without investing a lot of effort!
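
A real profiler is the right tool for point (1). For point (2), a crude harness along these lines can at least confirm that a replacement routine gives the same answers and show roughly how long each version takes. This is only a sketch using the standard `clock()` function; the two bit-counting routines and all names are hypothetical stand-ins for "compiler version" and "hand-tuned version".

```c
#include <time.h>

typedef int (*routine_fn)(unsigned);

/* Reference version: test one bit at a time. */
int popcount_reference(unsigned x)
{
    int n = 0;
    while (x) {
        n += (int)(x & 1u);
        x >>= 1;
    }
    return n;
}

/* Candidate version: clear the lowest set bit on each pass. */
int popcount_candidate(unsigned x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Returns 1 if both versions agree on every input in 0..limit-1. */
int versions_agree(unsigned limit)
{
    for (unsigned i = 0; i < limit; i++)
        if (popcount_reference(i) != popcount_candidate(i))
            return 0;
    return 1;
}

/* CPU seconds spent calling fn on inputs 0..iterations-1. The results are
   written to *sink so the compiler is less likely to discard the calls. */
double time_routine(routine_fn fn, unsigned iterations, volatile unsigned *sink)
{
    unsigned acc = 0;
    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++)
        acc += (unsigned)fn(i);
    *sink = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Check `versions_agree` first, then compare `time_routine` for both functions over a few million iterations and several runs; single short runs are too noisy to trust.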

Also: there are things you can do in C to optimize your program, or to help the compiler get the most out of its own optimizations. You can help it with __restrict, or reading values into locals. You can disable C++ exceptions, and tweak your compiler settings for the fastest generated code. You can avoid constructs that cause extra instructions to be generated (such as converting small types into small ones) and rely on constructions that you know are "free" or "cheap" on x86 platforms (e.g. reading from a byte array into a 16- or 32-bit temporary, or indexing into an array of 2-, 4- or 8-byte values using a Scaled Index byte). If you want to do this kind of optimization, then knowing how x86 assembly works and knowing what kind of code the compiler will generate for various constructions (and inspecting the actual code from time to time, to double-check) will probably come in handy. And if you manage to keep it entirely (or mostly) in C, then one day when you decide to port to a different platform (such as the ARM processor in your cell phone or iPod?) it will be easy, instead of being impossible!
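
Here is a small sketch of two of those hints. It uses the C99 keyword `restrict` (the `__restrict` spelling mentioned above is the compiler-extension form of the same idea), and the struct and function names are invented for the example. How much either hint helps depends on the compiler, so the generated code is still worth inspecting.

```c
#include <stddef.h>

struct scale_cfg {
    int scale;
};

/* Plain version: the compiler must allow for dst overlapping src or
   cfg->scale, so it may reload cfg->scale on every iteration. */
void scale_plain(int *dst, const int *src, size_t n, const struct scale_cfg *cfg)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * cfg->scale;
}

/* Hinted version: restrict promises that dst and src do not overlap,
   and the scale factor is read once into a local before the loop. */
void scale_hinted(int *restrict dst, const int *restrict src, size_t n,
                  const struct scale_cfg *cfg)
{
    const int scale = cfg->scale;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * scale;
}

/* Read a byte into a wider temporary, then use it to index a table of
   4-byte values: both steps are cheap on x86. */
int lookup_widened(const unsigned char *idx, const int *table, size_t i)
{
    unsigned k = idx[i];   /* zero-extended byte load */
    return table[k];       /* scaled-index addressing */
}
```

The hinted version is only valid when the caller really does pass non-overlapping arrays; breaking that promise is undefined behaviour.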

One of my favorite SNES emulators from many years ago was written entirely in x86 assembly. They had a good reason: back then PCs were barely fast enough to emulate the SNES, and this emulator was about twice as fast as its competitors. But since then, machines got a lot faster and the newer chips favored different instructions and have a different internal architecture. Some of the tricks that made this emulator fast on a 90 MHz Pentium are now actually slower than equivalent C code, because of changes to the Intel architecture. CPUs are so much faster now that the authors would really like to rewrite all of that code in C so it is much easier to maintain. They have been working on this rewrite for several years now, but it's a huge job because it consists of hundreds of thousands of lines of assembly. Other new emulators have been written from scratch in C in much less time than the rewrite has already taken.
Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: Need help with compiler and language choices

Post by Aaron Becker »

bob wrote: However, X86 (and others) have some quirky syntax compared to more regular architectures like MIPS/SPARC and even the ugly Mix that nobody uses. :) I'm not convinced it is easier to learn x86 first, although it would have some benefits.
I think it's worth it to learn Mix just to get more out of TAOCP. I really wish there was a MIPS edition, though. Regardless, x86 is about as ugly an instruction set as I've ever seen. If I really had to write an assembly chess engine, I think I'd go for ARM. Anything but x86.
Mincho Georgiev
Posts: 454
Joined: Sat Apr 04, 2009 6:44 pm
Location: Bulgaria

Re: Need help with compiler and language choices

Post by Mincho Georgiev »

Here is what one of my programs, written in assembly (MASM), looks like (with source):
http://download462.mediafire.com/0yxyl2 ... v1.0.2.zip
Assembler can be beautiful for a lot of purposes, but I don't think it is the right choice for writing a strong, complex, bug-free engine.
It is possible of course, but you're going to spend your time in many different directions, like debugging for example, instead of using it for actual code writing and testing.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Need help with compiler and language choices

Post by wgarvin »

Aaron Becker wrote:
bob wrote: However, X86 (and others) have some quirky syntax compared to more regular architectures like MIPS/SPARC and even the ugly Mix that nobody uses. :) I'm not convinced it is easier to learn x86 first, although it would have some benefits.
I think it's worth it to learn Mix just to get more out of TAOCP. I really wish there was a MIPS edition, though. Regardless, x86 is about as ugly an instruction set as I've ever seen. If I really had to write an assembly chess engine, I think I'd go for ARM. Anything but x86.
You can ignore most of the ugly things in x86 if you're writing flat-mode 32-bit or 64-bit code for modern machines. You don't need to care about prefixes, segment registers, string instructions, BCD conversions, etc.

Every instruction you'd see in a RISC instruction set has a more or less direct equivalent in x86; these are also the instructions which are fast on modern processors. Internally, all modern x86 chips are very RISC-like, but due to their CISC heritage, they retain the variable-length instruction encodings, which gives x86 an advantage over RISC instruction sets in code density (at least 25%). The front-end of an x86 chip internally decodes these variable-length instructions into fixed-length RISC instructions, and executes those. If you want to use vector instructions (or do fast single-precision floating point math) then you have to deal with SSE, but it's not really any worse than Altivec. The old x86 floating-point instructions still work, but if you find them weird, you can usually just use SSE instead.

Anyway, if you want to write fast assembly code, it is important to understand the underlying out-of-order execution model that is actually used in x86 chips: read Agner Fog's microarchitecture guide for example. Many high-performance RISC chips have a similar execution model internally, but their front end is much simpler because the instruction set uses a fixed length encoding and fewer special cases.