Interesting article in "Nature" on Machine Learning
Posted: Wed Jan 09, 2019 8:27 am
Computer Chess Club
https://talkchess.com/
Gödel only showed that the continuum hypothesis did not contradict the ZFC axioms, i.e. it could not be proved false within standard set theory. It was Paul Cohen that showed that completed the proof of the independence of the continuum hypothesis from ZFC (i.e. that assuming that the continuum hypothesis is false also does not contradict the ZFC axioms).The mathematicians, who were working on a machine-learning problem, show that the question of ‘learnability’ — whether an algorithm can extract a pattern from limited data — is linked to a paradox known as the continuum hypothesis. Gödel showed that the statement cannot be proved either true or false using standard mathematical language.
So this was not a 1940 result by Gödel but a 1960 result by Cohen.A 1940 result by Gödel (which was completed in the 1960s by US mathematician Paul Cohen) showed that the continuum hypothesis cannot be proved either true or false starting from the standard axioms — the statements taken to be true — of the theory of sets, which are commonly taken as the foundation for all of mathematics.
So they seem to be looking into learning things about an infinitely large data set. For all practical purposes (in particular given that practical computers have finite memory) we can limit application of AI to finite data sets.In the latest paper, Yehudayoff and his collaborators define learnability as the ability to make predictions about a large data set by sampling a small number of data points. The link with Cantor’s problem is that there are infinitely many ways of choosing the smaller set, but the size of that infinity is unknown.
There must be a mistake here. A finite subset of the unit interval is finite, so can't be giving much problems.Ben-David and colleagues then prove that the ability to carry out a weak form of monotone compression is related to the size of certain infinite sets. The set that the authors ultimately use in their work is the unit interval, which is the set of real numbers between 0 and 1. Their results imply that the finite subsets of the unit interval have monotone-compression schemes, and therefore are learnable in EMX, if and only if the continuum hypothesis is true, which is known to be unprovable.
Our focus is on a specific learning problem we call ‘estimating the maximum’ (EMX). The EMX problem belongs to both models discussed above. Here is a motivating example. Imagine a website that is being visited by a variety of users. Denote by X the set of all potential visitors to the website. The owner of the website wishes to post ads on it. The posted ads are to be chosen from a given pool of ads. Each ad A in the pool targets a certain population of users FA ⊆ X. For example, if A is a sports ad then FA is the collection of sports fans. The goal is to place an ad whose target population visits the site most frequently. The challenge is that it is not known in advance which visitors are to visit the site.
More formally, we assume access to a training sample of visitors drawn from an (unknown) distribution P. The collection of ads corresponds to the family of sets ℱ = {FA : A is an ad in the pool}. The ad problem above becomes an instance of the following EMX problem:
Given a family ℱ of subsets of some domain X, find a set in ℱ whose measure with respect to an unknown probability distribution P is close to maximal. This should be done based on a finite sample generated i.i.d. from P.
So it is about the "EMX learnability" of the family of all finite subsets of the unit interval.The family of sets ℱ∗ we consider is the family of all finite subsets of the interval [0, 1]. The class of probability distributions 𝒫∗we consider is the class of all distributions over [0, 1] with finite support.
Theorem
The EMX learnability of ℱ∗with respect to 𝒫∗ is independent of the ZFC axioms.