Sven Schüle wrote:
rvida wrote: I decided to use the sed based solution. It is a wonderful tool,
Hi Richard,
take care of doing both steps (CR/LF->LF conversion + trailing whitespace removal) separately, starting with the CR/LF part. Depending on the platform where you perform these conversions, "sed" as well as "perl" or other tools using regular expressions may or may not recognize a CR/LF character sequence as something that matches a "$" (end of input line) in the given pattern. Therefore a pattern logically resembling "<whitespace><whitespace>*$" may or may not match an input line that ends with <whitespace><CR><LF>. You can expect it to succeed in a typical Windows-like environment where CR/LF is the typical text file line ending, but not in a typical UNIX environment. Furthermore, also combining "<whitespace><whitespace>*<CR><LF>" in one pattern will not always succeed since line endings could be inconsistent within one file.
The idea of $ matching the end of a line is to add portability. It matches the end of line at run time, independent of the OS. So a program that deals with some data before an end of line, (data)$, will always find the data even if you run the script on a different operating system.
The main problem arises when you write a regular expression using $ (end-of-line match) when you really want to match only one specific character (CR or LF). Then the code will probably fail on another OS.
In this case, the problem proposed by Vida, you are looking for a specific combination of spaces, tabs, CR and LF. Here it's OK not to use $.
Your post, which is valid and important for being aware of these details, made me think that the command I posted will not work if the input has mixed data (some lines with CR+LF, others with only LF). This is solved by making the CR match optional. So the version 3 (
Code:
perl -pe 's/\s*\r*\n/\n/' in > out
Sven wrote: Hmmm ... wasn't it an invention from the DOS world to have that CR/LF line ending that created one of the biggest (in)compatibility issues in the whole IT world?
Sven
For a real printer the DOS solution is the more logical one and gives more control, because you can move the carriage back to column 0 and, optionally, feed a line. This allows overwriting by returning the carriage without feeding. That is nonsense on a real printer, because it overprints everything, but it is useful if you write to the screen and don't want it to scroll.
Ignacio
