Ordo question

Guenther · Post by **Guenther** » Sun Dec 11, 2016 11:01 am

I am currently fiddling with a 6.3 million game database which could replace the grand unified rating list, but because of its size and the limited memory on my machine I am forced to use some workarounds.
(CSVEd, a wonderful tool I have used for a long time, regrettably runs out of memory when trying to sort entries, and splitting into chunks doesn't help much when one wants to normalize names of entities that are not even known.
BTW the import of the full CSV only takes a few seconds here!)

Thanks to ordoprep the size of the current file I am working with is only ~0.5 GB. But because the names are not normalized, Ordo will not converge on the data yet.

I have now used the -g flag to output the groups file as mentioned in the manual.
My question is: does it contain practically all the different player names/entities?
I will use that file to normalize some names now, which will hopefully result in more connected groups too.

AFAIK there is no flag to just output all player names?
Of course, if my assumption is right and the groups file contains all names, that is fine for me.

Guenther

Guenther · Post by **Guenther** » Sun Dec 11, 2016 12:10 pm

I noticed that the biggest source of problems is the creative usage of lower/uppercase letters in the naming of the programs.
Would it be possible to have a flag which ignores this and treats them the same?

Vinvin · Post by **Vinvin** » Sun Dec 11, 2016 12:46 pm

You can use my tool : http://talkchess.com/forum/viewtopic.ph ... 723#523723

If you want, I have a more recent "listsub.txt" somewhere ...

Adam Hair · Post by **Adam Hair** » Sun Dec 11, 2016 2:16 pm

Guenther wrote:I am currently fiddling with a 6.3 million game database which could replace the grand unified rating list, but because of its size and the limited memory on my machine I am forced to use some workarounds.
(CSVEd, a wonderful tool I have used for a long time, regrettably runs out of memory when trying to sort entries, and splitting into chunks doesn't help much when one wants to normalize names of entities that are not even known.
BTW the import of the full CSV only takes a few seconds here!)

Thanks to ordoprep the size of the current file I am working with is only ~0.5 GB. But because the names are not normalized, Ordo will not converge on the data yet.

I have now used the -g flag to output the groups file as mentioned in the manual.
My question is: does it contain practically all the different player names/entities?
I will use that file to normalize some names now, which will hopefully result in more connected groups too.

AFAIK there is no flag to just output all player names?
Of course, if my assumption is right and the groups file contains all names, that is fine for me.

Guenther

There is no flag for listing player names. And I do believe that the group flag lists all players when more than one group exists. You could double check this with Norm Pollock's nameList tool.

Miguel will soon be on Christmas break from Loyola Chicago. Maybe we can coerce him to work on Ordo.
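Since there is no flag for listing player names, the list can also be pulled straight from the game file. What follows is only a minimal Python sketch, assuming the games are stored as PGN with standard `[White "..."]` and `[Black "..."]` tag lines; `player_names` is a hypothetical helper and is not part of Ordo or ordoprep.

```python
import re
from collections import Counter

# Matches a complete White or Black tag line and captures the name.
TAG = re.compile(r'^\[(?:White|Black) "(.*)"\]\s*$')

def player_names(lines):
    """Count how often each player name appears in White/Black tags."""
    counts = Counter()
    for line in lines:
        match = TAG.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    '[Event "test"]',
    '[White "Abrok 50"]',
    '[Black "Crafty 23.4"]',
    '1. e4 e5 1-0',
    '[White "Abrok 50"]',
]
for name, games in sorted(player_names(sample).items()):
    print(name, games)
```

Passing an open file object as `lines` reads the file one line at a time, so memory use depends on the number of distinct names rather than on the file size.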

Guenther · Post by **Guenther** » Sun Dec 11, 2016 2:46 pm

Vinvin wrote:You can use my tool : http://talkchess.com/forum/viewtopic.ph ... 723#523723

If you want, I have a more recent "listsub.txt" somewhere ...

Thanks Vincent, I will look into it.

Vinvin · Post by **Vinvin** » Sun Dec 11, 2016 3:47 pm

Guenther wrote:
Vinvin wrote:You can use my tool : http://talkchess.com/forum/viewtopic.ph ... 723#523723

If you want, I have a more recent "listsub.txt" somewhere ...
Thanks Vincent, I will look into it.

The central part of the program is :

Code:

repeat
   read;
   if ((leftstr(lpgn , 8) = '[White "') or (leftstr(lpgn , 8) = '[Black "')) then
   begin
      for loop := 1 to ind do
         lpgn := StringReplace(lpgn, replsour[loop], repldest[loop], [rfReplaceAll]);
      writename;
   end;
...

Guenther · Post by **Guenther** » Tue Dec 13, 2016 2:06 pm

Vinvin wrote:
Guenther wrote:
Vinvin wrote:You can use my tool : http://talkchess.com/forum/viewtopic.ph ... 723#523723

If you want, I have a more recent "listsub.txt" somewhere ...
Thanks Vincent, I will look into it.
The central part of the program is :
Code:
repeat
   read;
   if ((leftstr(lpgn , 8) = '[White "') or (leftstr(lpgn , 8) = '[Black "')) then
   begin
      for loop := 1 to ind do
         lpgn := StringReplace(lpgn, replsour[loop], repldest[loop], [rfReplaceAll]);
      writename;
   end;
...

Code:

-------------------------------------------
Abrok 5 0
Abrok 50
-------------------------------------------
Abrok 500
Abrok 50
-------------------------------------------

Shouldn't this example replace both wrong entries with 'Abrok 50', or did I misread your post from the original thread?
This was a quick test example for replacement speed.

The result was that the output file was still identical to the input file.

Guenther · Post by **Guenther** » Tue Dec 13, 2016 2:52 pm

OK, I found what I did wrong. I had already deleted the original content of your listsub.txt, and after downloading it again I see that the first line must not be a delimiter line.

Code:

Abrok 5 0
Abrok 50
-------------------------------------------
Abrok 500
Abrok 50
-------------------------------------------
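To make the layout explicit, here is a minimal Python sketch of how such a replacement list could be read. It is inferred only from the example above, not from the tool's source: each entry is assumed to be the wrong name, then the correct name, then a line of dashes. `parse_listsub` is a hypothetical name. Unlike the behaviour described above, this sketch simply skips a leading delimiter line.

```python
def parse_listsub(text):
    """Return (source, dest) pairs from a listsub-style replacement list.

    Assumed layout: source line, destination line, then a line of dashes.
    """
    pairs = []
    block = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            # Delimiter line: close the current block if it is a full pair.
            if len(block) == 2:
                pairs.append((block[0], block[1]))
            block = []
        elif stripped:
            block.append(line)
    # Accept a final pair that is not followed by a delimiter.
    if len(block) == 2:
        pairs.append((block[0], block[1]))
    return pairs

sample = """Abrok 5 0
Abrok 50
-------------------------------------------
Abrok 500
Abrok 50
-------------------------------------------
"""
print(parse_listsub(sample))
```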

Norm Pollock · Post by **Norm Pollock** » Tue Dec 13, 2016 5:05 pm

With regard to name normalization, I have 3 tools that might help.

"nameChange" takes a text file of old names, each followed by its new name, and outputs a new file with the name changes applied.
"nameList" can optionally give a list of names in all caps, which can be compared to the default list. If the two lists differ in the number of names, there is a capitalization duplicate.
"nameSimilar" goes through all the names and lists similar names if they start with the same token, or if just the first 3 characters are identical. The user can then pick out duplicate names.

See www below for 40H-pgn if you think any of these might help your project along.

-Norm
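The capitalization check described for nameList can be illustrated with a short Python sketch. This is not Norm's implementation, only the same idea under the assumption that the names are already available as a list: fold every name to one case and report the folded names that have more than one spelling. `case_duplicates` is a hypothetical name.

```python
from collections import defaultdict

def case_duplicates(names):
    """Group names that differ only in upper/lower case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Keep only folded names that map to more than one spelling.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}

names = ["Abrok 50", "ABROK 50", "Crafty 23.4", "abrok 50"]
print(case_duplicates(names))
```

Each reported group can then be turned into replacement entries that map every variant to one chosen spelling.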

Guenther · Post by **Guenther** » Tue Dec 13, 2016 6:00 pm

Norm Pollock wrote:With regard to name normalization, I have 3 tools that might help.

"nameChange" takes a text file of old names, each followed by its new name, and outputs a new file with the name changes applied.
"nameList" can optionally give a list of names in all caps, which can be compared to the default list. If the two lists differ in the number of names, there is a capitalization duplicate.
"nameSimilar" goes through all the names and lists similar names if they start with the same token, or if just the first 3 characters are identical. The user can then pick out duplicate names.

See www below for 40H-pgn if you think any of these might help your project along.

-Norm

Thanks Norm, but I am sure Vincent's tool is quite sufficient for the task, and it is faster than I thought in the beginning.

I just introduced some extra work in my first test (letter A), because I did not realize that the plain string comparison could give me wrong entries when one name is a substring of another.
Just adding the closing '"' to the player's name string avoids this of course ;-) I should have looked at the code example first.

(Example: Alex 20 | Alex 200 replaced all strings beginning with Alex 20, which wasn't intended, so that e.g. Alex 201 ended up as Alex 2001. Just replacing Alex 20" by Alex 200" does it.)
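The substring pitfall and the closing-quote fix can be reproduced with a small Python sketch. It only mimics the approach of the code excerpt posted earlier (plain substring replacement on White/Black tag lines) and is not a port of the tool; `replace_names` is a hypothetical name.

```python
def replace_names(tag_line, pairs):
    """Apply (source, dest) replacements to a White or Black tag line."""
    if not tag_line.startswith(('[White "', '[Black "')):
        return tag_line
    for source, dest in pairs:
        tag_line = tag_line.replace(source, dest)
    return tag_line

line = '[White "Alex 201"]'

# Plain substring: "Alex 20" also matches inside "Alex 201".
print(replace_names(line, [("Alex 20", "Alex 200")]))    # [White "Alex 2001"]

# With the closing quote, only the exact name "Alex 20" is replaced.
print(replace_names(line, [('Alex 20"', 'Alex 200"')]))  # [White "Alex 201"]
```

The closing quote anchors only the end of the name. Adding the opening quote to both strings as well would also keep a name such as "Big Alex 20" from being touched.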
