looking for a tool to convert line endings

rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

looking for a tool to convert line endings

Post by rvida »

Hi,

I am looking for a simple tool that can

1) convert Windows (CR+LF) line endings to Unix (LF)
2) trim trailing whitespace at the end of each line
nepossiver
Posts: 38
Joined: Wed Sep 03, 2008 4:12 am

Re: looking for a tool to convert line endings

Post by nepossiver »

To convert from DOS to Unix and vice versa, there is dos2unix (I think the package also handles classic Mac text files, which, if I am not mistaken, end lines with a bare CR):

http://sourceforge.net/projects/dos2unix/

To trim trailing whitespace you could use a Perl script, which could in fact do the format conversion as well.
IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 10:27 pm

Re: looking for a tool to convert line endings

Post by IGarcia »

rvida wrote:Hi,

I am looking for a simple tool that can

1) convert Windows (CR+LF) line endings to Unix (LF)
2) trim trailing whitespace at the end of each line


The whitespace to trim sits right before the line ending, so by reversing the task order you can do both in a single step. Run:

Code: Select all

perl -pe 's/ *\r\n/\n/' inputfile
It reads the file and writes to stdout, replacing any run of spaces (possibly empty) followed by CR+LF with a single LF.

Hope it helps.
Regards

Ignacio
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: looking for a tool to convert line endings

Post by lucasart »

rvida wrote:Hi,

I am looking for a simple tool that can

1) convert Windows (CR+LF) line endings to Unix (LF)
2) trim trailing whitespace at the end of each line

Regular expressions can do it :D

First remove all trailing spaces and tabs at the end of each line with sed:

Code: Select all

sed 's/[ \t]*$//' file.txt > file_trimmed.txt
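I'm not sure if sed will automatically convert CRLF into LF, but if it doesn't, just strip the CR with another sed first. It has to come first, otherwise the trailing whitespace sits before the CR instead of at the end of the line (GNU sed understands \r and \t):

Code: Select all

sed 's/\r$//' file.txt | sed 's/[ \t]*$//' > file_trimmed.txt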
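
For what it is worth, sed on a Unix system splits its input on LF only, so the CR of a CR+LF pair stays in the line as its last character and has to be removed explicitly. Below is a minimal single-pass sketch, assuming a POSIX shell and a sed that supports POSIX character classes. The function name normalize_eol is made up for illustration, and sed builds on Windows may behave differently.

```shell
# Hypothetical helper: reads text on stdin, writes the cleaned text to stdout.
# [[:space:]] covers space, tab and CR (also form feed and vertical tab), so
# trailing whitespace and the CR of a CR+LF pair are removed in one pass.
# sed prints each line back with a bare LF.
normalize_eol() {
    sed 's/[[:space:]]*$//'
}
```

It would be called as: normalize_eol < file.txt > file_trimmed.txt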
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: looking for a tool to convert line endings

Post by bob »

rvida wrote:Hi,

I am looking for a simple tool that can

1) convert Windows (CR+LF) line endings to Unix (LF)
2) trim trailing whitespace at the end of each line
Linux has always had dos2unix and unix2dos commands. That what you want?
nepossiver
Posts: 38
Joined: Wed Sep 03, 2008 4:12 am

Re: looking for a tool to convert line endings

Post by nepossiver »

I am not fluent in Perl one-liners, but now I am jealous of the previous posts, so here is an (untested) script to convert to any OS-specific format:

Code: Select all

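#!/usr/bin/perl
use strict;
use warnings;

# Usage: script input_file output_file SYSTEM   (SYSTEM: mac, unix or windows)
my ($in, $out, $system) = @ARGV;
die "usage: $0 input_file output_file mac|unix|windows\n" unless defined $system;

# Pick the line ending to write; refuse unknown targets instead of joining all lines.
my $eol = $system =~ /mac/i     ? "\r"
        : $system =~ /unix/i    ? "\n"
        : $system =~ /windows/i ? "\r\n"
        : die "unknown system '$system'\n";

open(my $fh_in,  "<", $in)  or die "could not open $in: $!";
open(my $fh_out, ">", $out) or die "could not open $out: $!";
binmode($fh_in);    # raw bytes, so Perl on Windows does not translate CR+LF itself
binmode($fh_out);

while (my $line = <$fh_in>)
{
	$line =~ s/\s+$//;    # strips trailing whitespace, including CR and LF
	print $fh_out $line, $eol;
}
close($fh_in);
close($fh_out);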
just save this as whatever_you_like.pl, then make it executable:

Code: Select all

 chmod +x whatever_you_like.pl
and call it as:

Code: Select all

 ./whatever_you_like.pl input_file output_file SYSTEM
SYSTEM is one of mac, unix or windows (case-insensitive). Again, this is untested; if it does not work, just get back here and someone will point out any errors.

edit: this assumes you are developing on Unix/Mac. If not, you have to call the script as:

Code: Select all

perl whatever_you_like.pl input_file output_file SYSTEM
IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 10:27 pm

Re: looking for a tool to convert line endings

Post by IGarcia »

bob wrote:Linux has always had dos2unix and unix2dos commands. That what you want?

Those programs are not installed by default, and the other problem is not solved: you still have spaces before the line ending.

@Horacio: Nice coding. :wink:
My command misses tabs. This will do the trick from the command line:

Code: Select all

perl -pe 's/\s*\r\n/\n/' infile > outfile
rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

Re: looking for a tool to convert line endings

Post by rvida »

Thanks for all the answers.

I decided to use the sed-based solution. It is a wonderful tool, although for people coming from the DOS/Windows world the syntax is somewhat obscure...

Btw. after some googling I found a list of very useful sed one-liners:
http://sed.sourceforge.net/sed1line.txt
IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 10:27 pm

Re: looking for a tool to convert line endings

Post by IGarcia »

rvida wrote:Thanks for all the answers.

I decided to use the sed-based solution. It is a wonderful tool, although for people coming from the DOS/Windows world the syntax is somewhat obscure...

Btw. after some googling I found a list of very useful sed one-liners:
http://sed.sourceforge.net/sed1line.txt

Sure, sed is great.

There are small but important differences in how each program interprets regular expressions. Keeping track of those differences in your head is a mess, so it's common to stick to one or a few programs.

Regular expression (regex) syntax is obscure, but there is logic behind it, and crafting a regex can be great fun, like solving a chess problem! :)


Ignacio.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: looking for a tool to convert line endings

Post by Sven »

rvida wrote:I decided to use the sed based solution. It is a wonderful tool,

Hi Richard,

take care to do both steps (CR/LF->LF conversion + trailing whitespace removal) separately, starting with the CR/LF part. Depending on the platform where you perform these conversions, "sed", "perl" or other tools using regular expressions may or may not treat a CR/LF character sequence as something that matches "$" (end of input line) in the given pattern. Therefore a pattern logically resembling "<whitespace><whitespace>*$" may or may not match an input line that ends with <whitespace><CR><LF>. You can expect it to succeed in a typical Windows-like environment, where CR/LF is the usual text file line ending, but not in a typical UNIX environment, where the CR is just another character in front of the LF. Furthermore, combining "<whitespace><whitespace>*<CR><LF>" in one pattern will not always succeed either, since line endings can be inconsistent within one file.
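
The two-step order described above can be sketched as follows. This is only an illustration, assuming a POSIX shell and a UNIX-style sed that splits lines on LF; the names CR and crlf_then_trim are made up for the example. Because each pass works line by line, a file that mixes CR/LF and plain LF endings is handled as well.

```shell
# A literal carriage return, so the sketch does not rely on sed understanding \r.
CR=$(printf '\r')

# Hypothetical helper: reads stdin, writes stdout.
crlf_then_trim() {
    # Pass 1: remove one CR at the end of a line, if present (LF-only lines pass through).
    # Pass 2: remove trailing spaces and tabs, which are now really at the end of the line.
    sed "s/${CR}\$//" | sed 's/[[:blank:]]*$//'
}
```

It would be called as: crlf_then_trim < infile > outfile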
rvida wrote:although for people coming from Dos/Windows world the syntax is somewhat obscure...
Hmmm ... wasn't it an invention from the DOS world to have that CR/LF line ending that created one of the biggest (in)compatibility issues in the whole IT world? :-)

Sven