Interesting article in "Nature" on Machine Learning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Interesting article in "Nature" on Machine Learning

Post by Laskos »

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Interesting article in "Nature" on Machine Learning

Post by syzygy »

The article starts with an inaccuracy:
The mathematicians, who were working on a machine-learning problem, show that the question of ‘learnability’ — whether an algorithm can extract a pattern from limited data — is linked to a paradox known as the continuum hypothesis. Gödel showed that the statement cannot be proved either true or false using standard mathematical language.
Gödel only showed that the continuum hypothesis does not contradict the ZFC axioms, i.e. that it cannot be proved false within standard set theory. It was Paul Cohen who completed the proof of the independence of the continuum hypothesis from ZFC (i.e. he showed that assuming the continuum hypothesis to be false also does not contradict the ZFC axioms).
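
For reference, the continuum hypothesis itself can be stated in one line: there is no set S with ℵ0 < |S| < 2^ℵ0, i.e. no infinity strictly between the cardinality of the natural numbers and that of the real numbers.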

Later on the article corrects this somewhat:
A 1940 result by Gödel (which was completed in the 1960s by US mathematician Paul Cohen) showed that the continuum hypothesis cannot be proved either true or false starting from the standard axioms — the statements taken to be true — of the theory of sets, which are commonly taken as the foundation for all of mathematics.
So this was not a 1940 result by Gödel but a 1963 result by Cohen.

It is inconceivable that this new result has any practical impact. For all we know, the physical world does not even need infinite sets to exist.
In the latest paper, Yehudayoff and his collaborators define learnability as the ability to make predictions about a large data set by sampling a small number of data points. The link with Cantor’s problem is that there are infinitely many ways of choosing the smaller set, but the size of that infinity is unknown.
So they seem to be looking into learning things about an infinitely large data set. For all practical purposes (in particular, given that practical computers have finite memory), we can limit the application of AI to finite data sets.

Anyway, I will be the last to say that new results are uninteresting if they have no practical impact :)
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Interesting article in "Nature" on Machine Learning

Post by syzygy »

See also here.
Ben-David and colleagues then prove that the ability to carry out a weak form of monotone compression is related to the size of certain infinite sets. The set that the authors ultimately use in their work is the unit interval, which is the set of real numbers between 0 and 1. Their results imply that the finite subsets of the unit interval have monotone-compression schemes, and therefore are learnable in EMX, if and only if the continuum hypothesis is true, which is known to be unprovable.
There must be a mistake here. A finite subset of the unit interval is finite, so it can't be causing many problems.

Anyway, whatever the exact theorem is that has been proven to be unprovable, this result only holds if one sticks to ZFC.

If one accepts the ZFC axioms and the continuum hypothesis, then apparently these guys have actually proved their theorem.

And if one accepts instead the ZFC axioms and the negation of the continuum hypothesis, then apparently these guys have disproved their theorem.

In the "real world", the continuum hypothesis does not really play a role (nor does the "C" in ZFC, i.e. the axiom of choice).
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Interesting article in "Nature" on Machine Learning

Post by syzygy »

The paper itself is also available: https://www.nature.com/articles/s42256-018-0002-3
Our focus is on a specific learning problem we call ‘estimating the maximum’ (EMX). The EMX problem belongs to both models discussed above. Here is a motivating example. Imagine a website that is being visited by a variety of users. Denote by X the set of all potential visitors to the website. The owner of the website wishes to post ads on it. The posted ads are to be chosen from a given pool of ads. Each ad A in the pool targets a certain population of users F_A ⊆ X. For example, if A is a sports ad then F_A is the collection of sports fans. The goal is to place an ad whose target population visits the site most frequently. The challenge is that it is not known in advance which visitors are to visit the site.

More formally, we assume access to a training sample of visitors drawn from an (unknown) distribution P. The collection of ads corresponds to the family of sets ℱ = {F_A : A is an ad in the pool}. The ad problem above becomes an instance of the following EMX problem:

Given a family ℱ of subsets of some domain X, find a set in ℱ whose measure with respect to an unknown probability distribution P is close to maximal. This should be done based on a finite sample generated i.i.d. from P.
The family of sets ℱ∗ we consider is the family of all finite subsets of the interval [0, 1]. The class of probability distributions 𝒫∗ we consider is the class of all distributions over [0, 1] with finite support.

Theorem
The EMX learnability of ℱ∗ with respect to 𝒫∗ is independent of the ZFC axioms.
So it is about the "EMX learnability" of the family of all finite subsets of the unit interval.
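
To make the EMX setup concrete, here is a minimal Python sketch (my own illustration, not from the paper). Since ℱ∗ contains every finite subset of [0, 1], a naive learner can simply output the set of sampled points, which is itself a member of ℱ∗; its measure under P is then the mass of the observed support. The distribution below is a hypothetical member of 𝒫∗.

import random

def emx_learner(sample):
    # Naive EMX learner for F* = all finite subsets of [0, 1]:
    # the set of distinct sampled points is itself a member of F*,
    # so it is a valid hypothesis.
    return set(sample)

# Hypothetical distribution P with finite support (a member of P*).
support = [0.1, 0.25, 0.5, 0.9]
weights = [0.4, 0.3, 0.2, 0.1]

sample = random.choices(support, weights=weights, k=20)
hypothesis = emx_learner(sample)

# Measure of the hypothesis under P, computable here only because
# we happen to know P; the learner itself never sees it.
measure = sum(w for x, w in zip(support, weights) if x in hypothesis)
print(f"hypothesis has measure {measure:.2f} under P")

The independence result is about whether a learner can guarantee near-maximal measure from a sample whose size is bounded independently of P; the naive learner above offers no such guarantee once P spreads its mass over more points than the sample can cover.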