Ordo question

Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Ordo question

Post by Guenther »

I am currently working with a 6.3 million game database that could replace the grand unified rating list, but because of its size and the limited memory on my machine I am forced to use some workarounds.
(CSVEd, a wonderful tool I have used for a long time, regrettably runs out of memory when trying to sort the entries, and splitting the file into chunks doesn't help much when one wants to normalize the names of entities that are not even known yet.
BTW the import of the full csv only takes a few seconds here!)

Thanks to ordoprep, the file I am currently working with is only ~0.5 GB. But because the names are not normalized, Ordo will not converge on the data yet.

I have now used the -g flag to output the groups file, as mentioned in the manual.
My question is: does it contain practically all the distinct player names/entities?
I will use that file to normalize some names now, which will hopefully result
in more connected groups too.

AFAIK there is no flag to just output all player names?
Of course, if my assumption is right and the groups file contains all names, that is fine for me.

Guenther
Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Ordo question

Post by Guenther »

I noticed that the biggest source of problems is the creative usage of lower/uppercase letters in the naming of the programs.
Would it be possible to have a flag which ignores this and treats them the same?
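
Until such a flag exists, the case-only duplicates can be found outside Ordo. The following is only a minimal sketch in Python: `case_collisions` is a hypothetical helper, not part of Ordo or ordoprep, and the sample names are made up for illustration.

```python
from collections import defaultdict

def case_collisions(names):
    """Group names that differ only in upper/lowercase spelling."""
    groups = defaultdict(set)
    for name in names:
        # casefold() gives a case-insensitive key for comparison
        groups[name.casefold()].add(name)
    # keep only the keys for which more than one spelling was seen
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}

# made-up sample names
names = ["Abrok 50", "ABROK 50", "abrok 50", "Alex 200"]
collisions = case_collisions(names)
for key, variants in collisions.items():
    print(key, "->", variants)
```

Each reported group can then be turned into substitution pairs that map every variant to one chosen spelling.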
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Ordo question

Post by Vinvin »

You can use my tool: http://talkchess.com/forum/viewtopic.ph ... 723#523723

If you want, I have a more recent "listsub.txt" somewhere ...
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Ordo question

Post by Adam Hair »

There is no flag for listing player names. And I do believe that the group flag lists all players when more than one group exists. You could double check this with Norm Pollock's nameList tool.
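
If no tool is at hand, a name list can also be pulled straight from the PGN tags. This is just a rough Python sketch, assuming standard `[White "..."]` and `[Black "..."]` tag lines; `count_player_names` and the sample games are made up for illustration and have nothing to do with Ordo itself.

```python
import re
from collections import Counter

# matches the two player tags of a PGN header, e.g. [White "Abrok 50"]
TAG_RE = re.compile(r'^\[(White|Black) "(.*)"\]\s*$')

def count_player_names(lines):
    """Count how often each name appears in White/Black tags."""
    counts = Counter()
    for line in lines:
        match = TAG_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts

# made-up sample data
sample = '''[Event "test"]
[White "Abrok 50"]
[Black "Alex 200"]
[Result "1-0"]

1. e4 e5 1-0

[Event "test"]
[White "Alex 200"]
[Black "abrok 50"]
[Result "1/2-1/2"]
'''.splitlines()

counts = count_player_names(sample)
for name, n in sorted(counts.items()):
    print(n, name)
```

Because the function only iterates over lines, it can be fed an open file object instead of a list, so a large database never has to fit into memory at once.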

Miguel will soon be on Christmas break from Loyola Chicago. Maybe we can coerce him to work on Ordo :wink:
Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Ordo question

Post by Guenther »

Thanks Vincent, I will look into it.
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Ordo question

Post by Vinvin »

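The central part of the program is:

Code:

repeat
  read;  { presumably fetches the next PGN line into lpgn }
  { only the player name tags are rewritten }
  if (leftstr(lpgn, 8) = '[White "') or (leftstr(lpgn, 8) = '[Black "') then
  begin
    { apply every source/destination pair in turn }
    for loop := 1 to ind do
      lpgn := StringReplace(lpgn, replsour[loop], repldest[loop], [rfReplaceAll]);
    writename;
  end;
...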
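
For readers who don't use Pascal, the same idea can be sketched in Python. This is not the tool's code, only a rendering of the loop above under the assumption that the pairs are applied one after another to name tag lines only; `normalize_tag_line` is a hypothetical name.

```python
def normalize_tag_line(line, pairs):
    """Apply every (old, new) substitution to a White/Black tag line."""
    if line.startswith('[White "') or line.startswith('[Black "'):
        for old, new in pairs:
            # plain substring replacement, like StringReplace with rfReplaceAll
            line = line.replace(old, new)
    return line

# made-up sample data
pairs = [("Abrok 5 0", "Abrok 50"), ("Abrok 500", "Abrok 50")]
lines = ['[White "Abrok 5 0"]',
         '[Black "Abrok 500"]',
         '1. e4 {Abrok 5 0 book} e5']
result = [normalize_tag_line(line, pairs) for line in lines]
print(result)
```

Lines that are not name tags pass through untouched, and the pairs are applied in order, so a later pair sees the output of the earlier ones.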
Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Ordo question

Post by Guenther »

Code:

-------------------------------------------
Abrok 5 0
Abrok 50
-------------------------------------------
Abrok 500
Abrok 50
-------------------------------------------
Shouldn't this example replace both false entries by 'Abrok 50', or did
I misread your post from the original thread?
This was a quick test example for replacement speed.

The result was that the output file was still identical to the input file.
Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Ordo question

Post by Guenther »

OK, I found what I did wrong. I had already deleted the original content of your listsub.txt, and after downloading it again I see that the first line must not be a delimiter line.

Code:

Abrok 5 0
Abrok 50
-------------------------------------------
Abrok 500
Abrok 50
-------------------------------------------
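
To make the layout explicit, here is a small Python sketch of a reader for it. The format is only inferred from the example above (old name, new name, dashed line), and the sketch is not how the actual tool parses the file; `parse_listsub` and `is_delimiter` are hypothetical names. Unlike what I observed, it raises an error on a leading delimiter instead of silently replacing nothing.

```python
def is_delimiter(line):
    """A delimiter is assumed to be a line made only of dashes."""
    stripped = line.strip()
    return len(stripped) >= 3 and set(stripped) == {"-"}

def parse_listsub(text):
    """Return (old, new) pairs from old/new/delimiter blocks."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines and is_delimiter(lines[0]):
        raise ValueError("first line must be a name, not a delimiter")
    pairs, block = [], []
    for ln in lines:
        if is_delimiter(ln):
            if len(block) != 2:
                raise ValueError(f"expected an old/new pair, got {block!r}")
            pairs.append((block[0], block[1]))
            block = []
        else:
            block.append(ln)
    if block:  # a final pair without a closing delimiter
        if len(block) != 2:
            raise ValueError(f"expected an old/new pair, got {block!r}")
        pairs.append((block[0], block[1]))
    return pairs

delim = "-" * 43
text = "\n".join(["Abrok 5 0", "Abrok 50", delim,
                  "Abrok 500", "Abrok 50", delim])
pairs = parse_listsub(text)
print(pairs)
```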
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: Ordo question

Post by Norm Pollock »

With regard to name normalization, I have 3 tools that might help.

"nameChange" takes a text file of old names, each followed by its new name, and outputs a new file with the name changes.
"nameList" can optionally give a list of names in all caps, which can be compared with the default list. If the two lists differ in length, there is a capitalization duplicate.
"nameSimilar" goes through all the names and lists those that look similar, either because they start with the same token or because just their first 3 characters are identical. The user can then pick out duplicate names.

see www below for 40H-pgn if you think any of these might help your project along.

-Norm
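
The similarity rule described for nameSimilar (same first token, or same first 3 characters) could be sketched roughly like this in Python. This is not the code of the 40H-pgn tools, just an illustration: `similar_groups`, `first_token` and `first_three` are hypothetical names, the case-folding is an added assumption, and the sample names are made up.

```python
from collections import defaultdict

def similar_groups(names, key):
    """Group distinct names by key(name); return groups with 2+ members."""
    groups = defaultdict(list)
    for name in sorted(set(names)):
        groups[key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]

def first_token(name):
    """First whitespace-separated token, compared case-insensitively."""
    parts = name.split()
    return parts[0].casefold() if parts else ""

def first_three(name):
    """First 3 characters, compared case-insensitively."""
    return name[:3].casefold()

# made-up sample names
names = ["Alex 20", "Alex 200", "Alexs 1.0", "Abrok 50"]
by_token = similar_groups(names, first_token)
by_prefix = similar_groups(names, first_three)
print(by_token)
print(by_prefix)
```

The first-token rule is the stricter of the two; the 3-character rule casts a wider net and leaves more for the user to sort out by hand.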
Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Ordo question

Post by Guenther »

Thanks Norm, but I am sure Vincent's tool is quite sufficient for the task, and it is faster than I thought at the beginning.

I just created some extra work for myself in my first test (letter A), because I did not realize that plain string comparison could give me wrong entries
when names share the same substring.
Just adding the closing '"' to the player's name string avoids this of course ;-) I should have looked at the code example first.

(Example: Alex 20 | Alex 200 => replaced every string beginning with Alex 20, which wasn't intended, ending up e.g. with Alex 2001, which was Alex 201 before. Just replacing Alex 20" by Alex 200" does it.)
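
The pitfall is easy to reproduce with plain substring replacement. A minimal Python illustration, using the names from my example on made-up tag lines:

```python
tags = ['[White "Alex 20"]', '[Black "Alex 200"]', '[White "Alex 201"]']

# naive pair: every string that contains Alex 20 is hit
naive = [t.replace('Alex 20', 'Alex 200') for t in tags]

# pair with the closing quote: only the exact name ending is hit
anchored = [t.replace('Alex 20"', 'Alex 200"') for t in tags]

print(naive)     # 'Alex 200' became 'Alex 2000', 'Alex 201' became 'Alex 2001'
print(anchored)  # only 'Alex 20' was changed to 'Alex 200'
```

The closing quote only anchors the end of the name. A name that merely ends in Alex 20 would still match, so including the opening quote in the pair as well ("Alex 20" with both quotes) is the safer form.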