I am currently fiddling with a database of 6.3 million games which could replace the grand unified rating list, but because of its size and the limited memory on my machine I am forced to use some workarounds.
(CSVEd, a wonderful tool I have used for a long time, regrettably runs out of memory when trying to sort entries, and splitting into chunks doesn't help much when one wants to normalize names of entities that are not even known yet.
BTW the import of the full csv only needs a few seconds here!)
Thanks to ordoprep the file I am currently working with is only ~0.5 GB. But because the names are not normalized, Ordo will not converge on the data yet.
I have now used the -g flag to output the groups file, as mentioned in the manual.
My question is: does it contain practically all the different player names/entities?
I will use that file to normalize some names now, which will hopefully result
in more connected groups too.
AFAIK there is no flag to just output all player names?
Of course, if my assumption is right and the groups file contains all names, that is fine for me.
Guenther
Ordo question
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Ordo question
I noticed that the biggest source of problems is the creative usage of lower/uppercase letters in the naming of the programs.
Would it be possible to have a flag which ignores this and treats them the same?
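Until such a flag exists, the case variants can be found outside Ordo. The following is only a minimal Python sketch, assuming the player names have already been extracted into a list; the function name `case_variants` and the sample names are hypothetical and not part of Ordo or any tool mentioned in this thread.

```python
from collections import defaultdict


def case_variants(names):
    """Group names that differ only in upper/lowercase spelling."""
    groups = defaultdict(set)
    for name in names:
        # casefold() gives a case-insensitive key for comparison
        groups[name.casefold()].add(name)
    # Keep only the keys that occur with more than one spelling
    return {key: sorted(spellings)
            for key, spellings in groups.items()
            if len(spellings) > 1}


# Hypothetical sample names, for illustration only
names = ["Abrok 50", "ABROK 50", "abrok 50", "Alex 200"]
variants = case_variants(names)
```

Each returned group can then be mapped to one preferred spelling in a replacement list.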
-
- Posts: 5228
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Ordo question
You can use my tool : http://talkchess.com/forum/viewtopic.ph ... 723#523723
If you want, I have a more recent "listsub.txt" somewhere ...
-
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: Ordo question
Guenther wrote:
I am currently fiddling with a database of 6.3 million games which could replace the grand unified rating list, but because of its size and the limited memory on my machine I am forced to use some workarounds.
(CSVEd, a wonderful tool I have used for a long time, regrettably runs out of memory when trying to sort entries, and splitting into chunks doesn't help much when one wants to normalize names of entities that are not even known yet.
BTW the import of the full csv only needs a few seconds here!)
Thanks to ordoprep the file I am currently working with is only ~0.5 GB. But because the names are not normalized, Ordo will not converge on the data yet.
I have now used the -g flag to output the groups file, as mentioned in the manual.
My question is: does it contain practically all the different player names/entities?
I will use that file to normalize some names now, which will hopefully result
in more connected groups too.
AFAIK there is no flag to just output all player names?
Of course, if my assumption is right and the groups file contains all names, that is fine for me.
Guenther
There is no flag for listing player names. And I do believe that the group flag lists all players when more than one group exists. You could double check this with Norm Pollock's nameList tool.
Miguel will soon be on Christmas break from Loyola Chicago. Maybe we can coerce him to work on Ordo.
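As a cross-check for the groups file, the distinct names can also be collected straight from the PGN tags. This is a rough Python sketch, assuming standard `[White "..."]` and `[Black "..."]` tag lines; `player_names` and the sample lines are hypothetical, and this is not how Ordo or nameList work internally.

```python
import re

# Matches a White or Black tag line and captures the player name
TAG_RE = re.compile(r'^\[(?:White|Black) "(.*)"\]\s*$')


def player_names(lines):
    """Return the sorted set of names found in White/Black tag lines."""
    names = set()
    for line in lines:
        match = TAG_RE.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


# Hypothetical sample input, for illustration only
sample = [
    '[Event "test"]',
    '[White "Abrok 50"]',
    '[Black "Alex 200"]',
    '[Result "1-0"]',
    '',
    '1. e4 e5 1-0',
    '[White "Alex 200"]',
    '[Black "Abrok 50"]',
]
found = player_names(sample)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so only the set of names has to fit in memory rather than the whole database.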
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Ordo question
Vinvin wrote:
You can use my tool : http://talkchess.com/forum/viewtopic.ph ... 723#523723
If you want, I have a more recent "listsub.txt" somewhere ...
Thanks Vincent, I will look into it.
-
- Posts: 5228
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Ordo question
Guenther wrote:
Vinvin wrote:
You can use my tool : http://talkchess.com/forum/viewtopic.ph ... 723#523723
If you want, I have a more recent "listsub.txt" somewhere ...
Thanks Vincent, I will look into it.
The central part of the program is :
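Code:
repeat
  read;  { presumably fetches the next PGN line into lpgn }
  { only the player tag lines are touched }
  if ((leftstr(lpgn, 8) = '[White "') or (leftstr(lpgn, 8) = '[Black "')) then
  begin
    { apply every source -> destination pair in turn }
    for loop := 1 to ind do
      lpgn := StringReplace(lpgn, replsour[loop], repldest[loop], [rfReplaceAll]);
    writename;
  end;
...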
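For readers without a Pascal compiler, the same idea can be sketched in Python. This is an illustration of the loop above, not the tool itself; `normalize_line` and the sample pair are hypothetical names chosen for this example.

```python
def normalize_line(line, replacements):
    """Apply all (old, new) pairs, but only to White/Black tag lines."""
    if line.startswith('[White "') or line.startswith('[Black "'):
        for old, new in replacements:
            # str.replace substitutes every occurrence and is case-sensitive
            line = line.replace(old, new)
    return line


# Hypothetical replacement pair, for illustration only
replacements = [("Abrok 5 0", "Abrok 50")]
fixed = normalize_line('[White "Abrok 5 0"]', replacements)
untouched = normalize_line('[Event "Abrok 5 0"]', replacements)
```

Every pair is tried on every player line, so the run time grows with the number of games times the number of pairs.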
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Ordo question
Vinvin wrote:
Guenther wrote:
Vinvin wrote:
You can use my tool : http://talkchess.com/forum/viewtopic.ph ... 723#523723
If you want, I have a more recent "listsub.txt" somewhere ...
Thanks Vincent, I will look into it.
The central part of the program is :
Code:
repeat
  read;
  if ((leftstr(lpgn, 8) = '[White "') or (leftstr(lpgn, 8) = '[Black "')) then
  begin
    for loop := 1 to ind do
      lpgn := StringReplace(lpgn, replsour[loop], repldest[loop], [rfReplaceAll]);
    writename;
  end;
...
Code:
-------------------------------------------
Abrok 5 0
Abrok 50
-------------------------------------------
Abrok 500
Abrok 50
-------------------------------------------
Did I misread your post from the original thread?
This was a quick test example for replacement speed.
The result was that the output file was still identical to the input file.
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Ordo question
Ok, I found what I did wrong. I had already deleted the original content of your listsub.txt, and after downloading it again I see that the first line must not be a delimiter line.
Code:
Abrok 5 0
Abrok 50
-------------------------------------------
Abrok 500
Abrok 50
-------------------------------------------
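The layout above can be read as source/destination pairs separated by lines of dashes. The following Python sketch parses that layout under exactly this assumption, which is inferred from the example and not confirmed by the tool's author; `parse_listsub` is a hypothetical helper. Unlike the tool as described in this post, the sketch simply skips a leading delimiter line.

```python
def parse_listsub(text):
    """Read (source, destination) pairs separated by dash-only lines."""
    pairs, block = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if set(stripped) == {"-"}:
            # A delimiter closes the current block of two lines
            if len(block) == 2:
                pairs.append(tuple(block))
            block = []
        else:
            block.append(line)
    # Accept a final pair that has no trailing delimiter
    if len(block) == 2:
        pairs.append(tuple(block))
    return pairs


text = "Abrok 5 0\nAbrok 50\n-----\nAbrok 500\nAbrok 50\n-----\n"
pairs = parse_listsub(text)
```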
-
- Posts: 1056
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
Re: Ordo question
With regard to name normalization, I have 3 tools that might help.
"nameChange" takes a text file in which each old name is followed by its new name and outputs a new file with the name changes applied.
"nameList" can optionally give a list of names in all caps, which can be compared to the default list. If the two lists differ in the number of names, there is a capitalization duplicate.
"nameSimilar" goes through all the names and lists similar names if they start with the same token, or if just the first 3 characters are identical. The user can then pick out duplicate names.
See www below for 40H-pgn if you think any of these might help your project along.
-Norm
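The two similarity rules described for "nameSimilar" can be approximated as follows. This is a Python sketch of the idea only, not Norm's implementation; the helper names are hypothetical, and folding case before comparing is an added assumption.

```python
from collections import defaultdict


def similar_names(names, key):
    """Group distinct, non-empty names that share the same key."""
    buckets = defaultdict(list)
    for name in sorted(set(names)):
        if name.strip():
            buckets[key(name)].append(name)
    return [group for group in buckets.values() if len(group) > 1]


def first_token(name):
    # First whitespace-separated word, compared case-insensitively
    return name.split()[0].casefold()


def prefix3(name):
    # First three characters, compared case-insensitively
    return name[:3].casefold()


# Hypothetical sample names, for illustration only
names = ["Alex 20", "Alex 200", "Abrok 50", "Crafty 23.4"]
by_token = similar_names(names, first_token)
by_prefix = similar_names(names, prefix3)
```

The output is only a list of candidates; deciding which of them are real duplicates remains a manual step.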
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Ordo question
Norm Pollock wrote:
With regard to name normalization, I have 3 tools that might help.
"nameChange" takes a text file of old name followed by new name and outputs a new file with the name changes.
"nameList" which can optionally give a list of names in all caps which can be compared to a default list. If there is a difference in quantity, then there is a capitalization duplicate.
"nameSimilar" which goes through all the names and lists similar names if they start with the same token, or if just the first 3 characters are identical. User can then pick out duplicate names.
see www below for 40H-pgn if you think any of these might help your project along.
-Norm
Thanks Norm, but I am sure Vincent's tool is quite sufficient for the task, and it is faster than I thought in the beginning.
I just created some extra work for myself in my first test (letter A), because I did not realize that plain string comparison could produce wrong entries
when names share the same substring.
Just adding the closing '"' to the player's name string avoids this, of course ;-) I should have looked at the code example first.
(Example: Alex 20 | Alex 200 replaced every string beginning with Alex 20, which wasn't intended, so e.g. Alex 201 ended up as Alex 2001. Just replacing Alex 20" by Alex 200" does it.)
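The substring problem and the closing-quote fix can be reproduced in a few lines of Python. This is a small sketch using the names from the example above; the variable names are only illustrative.

```python
line = '[White "Alex 201"]'

# Naive replacement also hits the longer name that starts with "Alex 20"
naive = line.replace("Alex 20", "Alex 200")

# Including the closing quote restricts the match to the exact name
anchored = line.replace('Alex 20"', 'Alex 200"')

# The exact name is still replaced as intended
exact = '[White "Alex 20"]'.replace('Alex 20"', 'Alex 200"')
```

Here `naive` becomes `[White "Alex 2001"]`, while `anchored` leaves the line unchanged and `exact` becomes `[White "Alex 200"]`. Adding the opening quote to both strings as well would additionally protect names that merely end in the same text.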