PK wrote: ↑Sun Dec 06, 2020 9:27 am
@Guenther, you have a nice crawler over there! I have a friend who can investigate options for hosting at least the programming part of the forum somewhere, so stay in touch.
Well, after crawling for around nine days I had to abandon this for now. It tied up my (old) computer for too long and the result was somehow not convincing enough. Moreover, after reaching around 85% it needed another day to add 2-3 percent, with the number of links still growing.
(I also noticed that it would not be possible to synchronize with already available links, e.g. in CPW, because of a randomly added 4-digit hex number in the middle of the newly created links.)
Probably I should have found better settings at the beginning.
Maybe I will try it again later, with more cautious settings and less depth.
OTOH it would be much simpler if ChessUSA would create a database backup and make it available somewhere.
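In case it helps with matching saved links against existing ones later: a minimal sketch of stripping such a tag again. The exact shape of the generated links is not given here, so the pattern below (four hex digits directly before a `.html`/`.htm` extension) and the example file names are assumptions, not the tool's documented behaviour.

```python
import re

# ASSUMPTION: the crawler inserts exactly four hex digits right before the
# ".html"/".htm" extension, e.g. "viewtopic3a4f.html". Adjust the pattern
# to whatever the saved archive really contains.
_TAG = re.compile(r"^(?P<stem>.+?)(?P<tag>[0-9a-fA-F]{4})(?P<ext>\.html?)$")

def strip_hex_tag(link: str) -> str:
    """Return the link without the assumed 4-digit hex tag, if one is present."""
    m = _TAG.match(link)
    if m is None:
        return link  # nothing that looks like a tag, leave untouched
    return m.group("stem") + m.group("ext")

if __name__ == "__main__":
    for name in ("forum3/viewtopic3a4f.html", "forum3/index.html"):
        print(name, "->", strip_hex_tag(name))
```

Note that this will also hit file names that just happen to end in four hex-looking characters (something like `facade.html`), so the result would still need a manual check.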
Guenther wrote: ↑Sat Dec 05, 2020 6:27 pm
I am currently in the process of saving the most important parts of the forum with an old tool, which still seems to work like a charm.
(I don't want to put too much stress on the host server so I limited the bandwidth for downloading)
This will take a while and I have to check first, if the saved archives work at all later.
If all works out as desired, I will report back in a few days.
I don't think I will save everything and will filter manually 'not so useful' content, at least from the General forum.
I have no clue how big it will be and hopefully it won't crash my computer overnight.
Oh, I forgot to say I started of course with the Programmers forum.
After this I will try the General forum. The others are not that important.
Edit:
A first check is very promising - everything is working as expected in the saved archives structure!
BTW ofc I also tried the Wayback Machine on this and in fact talkchess was saved hundreds of times in the past, BUT the saved archives
never go deeper than the thread/post titles, which means those are useless.
First guess for the programmers part ~5GB (total links will still increase further)
This will make the General part something like 20-50GB
It reminds me of the old website copier HTTrack ^^ It was not easy to configure the parameters, but as I remember you could exclude links to files larger than x MB.
peter wrote: ↑Sat Dec 05, 2020 7:15 am
But even less would I simply let the data stored here go without even trying to save it in one way or another.
I agree. But Talkchess is no longer a place where you can reliably store and retrieve information.
Email Quentin Turner at YMC&G directly with your problems and concerns (info@chessusa.com). If there is a sufficient volume of requests I would expect there to be some action. I have forwarded all requests sent to me about unblocking, but the last couple of months I have gotten no replies.
Wouldn't it be possible to set up a 'relay server' at a non-blocked IP address, which does nothing other than forward the URL request to TalkChess, fetch the data, and then forward that to the original requester? People could use that as if it was the original TalkChess website. (Although it would be a little slower.) I suppose that for every page it relays it would have to substitute any occurrence of the string 'talkchess.com' to its own domain name, so that people who follow the links would again go through the relay server.
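A rough sketch of what such a relay could look like, using only the Python standard library. The relay's domain name `relay.example.org`, the port, and the use of plain `http` are assumptions made up for illustration; it only handles GET requests, so logging in and posting would not work through it as written.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM_HOST = "talkchess.com"     # the site being relayed
RELAY_HOST = "relay.example.org"    # HYPOTHETICAL domain name of the relay

def rewrite_links(body: bytes, upstream: str = UPSTREAM_HOST,
                  relay: str = RELAY_HOST) -> bytes:
    """Substitute every occurrence of the upstream host with the relay's own."""
    return body.replace(upstream.encode(), relay.encode())

class RelayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward the requested path to the original site (scheme is assumed).
        url = f"http://{UPSTREAM_HOST}{self.path}"
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                body = resp.read()
                ctype = resp.headers.get("Content-Type", "application/octet-stream")
        except OSError:  # covers URLError/HTTPError and timeouts
            self.send_error(502, "Upstream request failed")
            return
        # Only rewrite text pages; images and attachments pass through as-is.
        if ctype.startswith("text/"):
            body = rewrite_links(body)
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8080) -> None:
    """Run the relay until interrupted (port number is an arbitrary choice)."""
    HTTPServer(("", port), RelayHandler).serve_forever()
```

Calling `serve()` on the non-blocked machine would start it. Since the substitution is a plain string replace, a link to `www.talkchess.com` would come out as `www.relay.example.org`, so the relay would have to answer under that name as well.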
The technology is finally there. Amazing what things are possible in 2021.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
I have hardly had access since yesterday: the notorious "Forbidden: You don't have permission to access /forum3/search.php on this server" once again. Since today, no access at all. I need to use a VPN connection.
Is Quentin Turner blocking IPs again?
90% of coding is debugging, the other 10% is writing bugs.