My time and space experiments – 1

1. Particle-wave behavior: Does there exist a model that incorporates both? Can we use prime numbers for it (a frequency-like scheme, with primes as integer-based minimal values)?

The answer is: quantum field theory.

2. Light sources: two light sources at the ends of a spring emit light in opposite directions. At what speed does the spring extend?

http://physics.stackexchange.com/questions/127248/experiment-with-spring-and-two-light-sources-emitting-light-in-opposite-directio

3. Would full determinism mean that God cannot exist, since there would exist no object able to do whatever it wants?

http://philosophy.stackexchange.com/questions/14722/determinism-vs-the-existence-of-god

4. Does there exist a mathematical formalism (model) in particle physics that assumes the existence of an infinite number of different, yet smallest, particles (building blocks)?

http://physics.stackexchange.com/questions/127283/mathematical-model-that-allows-the-existence-of-an-infinite-number-of-smallest

5. Does the uncertainty principle imply a non-deterministic universe, or just that our observation-based model can be at most non-deterministic? (this question has already been asked; a good link below)

http://physics.stackexchange.com/questions/24068/isnt-the-uncertainty-principle-just-non-fundamental-limitations-in-our-current

For a fixed $n \in \mathbb{N}$, there exists at least one prime among the integers of the form $2^{k}n+2^{k}-1$ for some $k \in \mathbb{N}$.
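The conjecture can be probed empirically. Note that $2^{k}n+2^{k}-1 = 2^{k}(n+1)-1$, so for each fixed $n$ we can simply search for the smallest $k$ that yields a prime. A minimal sketch in Python (the helper names are mine, and a finite search of course proves nothing):

```python
def is_prime(m: int) -> bool:
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def first_prime_k(n: int, k_max: int = 64):
    """Smallest k in 1..k_max such that 2^k * (n + 1) - 1 is prime, or None."""
    for k in range(1, k_max + 1):
        candidate = ((n + 1) << k) - 1  # 2^k * n + 2^k - 1
        if is_prime(candidate):
            return k
    return None

for n in range(1, 11):
    print(n, first_prime_k(n))
```

For example, $n=7$ fails at $k=1$ (since $15 = 3 \cdot 5$) but succeeds at $k=2$ with $31$, which is why the claim must be read as "for some $k$" rather than "for every $k$".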

From mathematical constants, through the amplituhedron, to an elaboration on information structuring

Let's think about what makes a constant beautiful. It is its ability to capture information intrinsic to our number structure. The deeper we go, the better the constants become. One important note: the constants are defined within our number system, and as such they sometimes don't stand a chance of teaching us about anything outside it.

I will mention a couple of interesting ones. I took the pictures from an article on Wikipedia: http://en.wikipedia.org/wiki/Mathematical_constants_and_functions

Euler-Mascheroni constant (I do not like it so much, since I cannot grasp the logic behind introducing the floor):
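For reference, the two standard definitions, including the integral form where the floor the author objects to appears:

```latex
\gamma \;=\; \lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \ln n \right)
       \;=\; \int_{1}^{\infty} \left( \frac{1}{\lfloor x \rfloor} - \frac{1}{x} \right) dx
       \;\approx\; 0.5772\ldots
```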

Golden ratio (beautiful due to the spiral coming out of it, though perhaps over-rated: maybe there are better-looking spirals?):
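The ratio and the spiral it generates, for reference: $\varphi$ is the positive root of $x^2 = x + 1$, and the golden spiral grows by a factor of $\varphi$ every quarter turn.

```latex
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618\ldots, \qquad
\varphi^2 = \varphi + 1, \qquad
r(\theta) = \varphi^{2\theta/\pi}
```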

Dottie number (shown here due to its recursiveness and association with fixed points):

(fixed points: http://en.wikipedia.org/wiki/Fixed_point_(mathematics) )
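The fixed-point view can be made concrete: iterating cos from any real starting point converges to the Dottie number, the unique real solution of $\cos x = x$ (this is the "keep pressing cos on a calculator" experiment). A small sketch, with a function name of my own choosing:

```python
import math

def dottie(iterations: int = 100) -> float:
    """Fixed-point iteration x -> cos(x); converges because |sin(x)| < 1 near the root."""
    x = 1.0
    for _ in range(iterations):
        x = math.cos(x)
    return x

print(dottie())  # ~ 0.739085...
```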

Complex $i$ ($i$ never existed among the ordinary integers, and likewise the distance from $(0,0)$ to these "new" complex numbers does not exist among the integers; I will still spend more time understanding the foundations of these numbers):

Square root of 2 (the concept of irrational numbers is yet another one which, like the concept of complex numbers, goes beyond many earlier ideas)

Polya Random Walk constant (parameter-space exploration exploiting the idea of a random walk: how close it is to so many statistics-based ideas in machine learning; it is worth showing what mathematics such a mechanism could use):
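The constant itself (about 0.3405 in three dimensions) is the probability that a simple random walk on the 3D integer lattice ever returns to its starting point. A Monte Carlo sketch, assuming my own helper names; a walk capped at finitely many steps gives a slight underestimate of the true value:

```python
import random

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def returns_to_origin(max_steps: int, rng: random.Random) -> bool:
    """Run one capped 3D lattice walk; report whether it revisits the origin."""
    x = y = z = 0
    for _ in range(max_steps):
        dx, dy, dz = rng.choice(STEPS)
        x, y, z = x + dx, y + dy, z + dz
        if x == y == z == 0:
            return True
    return False

def estimate_return_probability(trials: int = 5000, max_steps: int = 400,
                                seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = sum(returns_to_origin(max_steps, rng) for _ in range(trials))
    return hits / trials

print(estimate_return_probability())  # roughly 0.33, below Polya's ~0.3405
```

In one and two dimensions the walk is recurrent (return probability 1); the drop below 1 in three dimensions is exactly what the constant measures.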

Liouville number (for its tremendous value in disclosing transcendental numbers; which direction should we go to define what a number really is? see some of my previous posts)

Pi number (for connecting 2d distance with the n-sphere, and with transcendental numbers; I think there is still much to learn about Pi)
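The connection between 2d distance and the n-sphere can be stated precisely: the same $\pi$ that appears in the circle $x^2 + y^2 = r^2$ governs the volume of the ball in every dimension.

```latex
V_n(r) \;=\; \frac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2} + 1\right)}\, r^{n}
\qquad\text{e.g.}\quad V_2(r) = \pi r^2,\quad V_3(r) = \tfrac{4}{3}\pi r^3
```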

Euler’s number (what is the limit of the ratio of $n+1$ to $n$, compounded $n$ times, for big $n$’s? unbelievable that it is $e$)
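That limit, $\left(\frac{n+1}{n}\right)^n = \left(1 + \frac{1}{n}\right)^n \to e$, can be checked numerically in a few lines of plain Python:

```python
import math

# (1 + 1/n)^n approaches e as n grows; the error shrinks roughly like e/(2n).
for n in (10, 1000, 100000):
    print(n, (1 + 1 / n) ** n)
print("e =", math.e)
```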

But the constants themselves are just single shots at a data set. Others try to capture a larger picture by iterative approximation and modeling. Now let us learn to speak the language of distributions. The list is here: http://en.wikipedia.org/wiki/List_of_probability_distributions. I will briefly address only those that are of interest to this post.

Bernoulli distribution (for a potentially infinite sequence of binary random variables, each taking values 0/1)

Binomial distribution (the number of successes in a fixed number of these binary 0/1 experiments)

Beta-binomial distribution (where the success probability itself varies, drawn from a beta distribution)

Hypergeometric distribution (the number of successes in the first m of n 1/0 experiments drawn without replacement, with the total number of successes known)

Poisson distribution (the probability of a given number of events occurring in a fixed time interval)

Or the Zeta distribution - for learning about the world through Zeta glasses.
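These distributions are not independent islands: for example, the Poisson arises as the limit of the binomial when the number of trials grows and the success probability shrinks with their product held fixed. A sketch using only the standard library (helper names are mine):

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n independent 0/1 trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(k events) under a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# n large, p small, lam = n * p = 3 fixed: the two pmfs nearly coincide.
n, p = 1000, 0.003
for k in range(6):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, n * p), 5))
```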

Now, the point is that all these distributions model the world given what we know about numbers: distributions built on combinatorics and on Borel probability measures. Based on our knowledge we model processes, and we build the universe's "building blocks" on top of our models, like the one called the "amplituhedron". It is supposed to be a higher-level abstraction for prior learning in physics.

The same way we model processes, we model more granular structures, such as particles, using notions like the simplex or the polytope. Given the amount of data in the model, some cry out that everything must be non-deterministic. This is not necessarily the case, since one would have to prove the non-existence of any deterministic relation in the data set. Given that we don't even have the entire data set, but only the part observable at our level, some of us should focus on revising the most fragile foundations of the number system.

We model, and will keep modeling iteratively. We will use distributions and n-dimensional notions, but those notions are going to evolve. Larger abstractions are required for the prior, and those will come from unsupervised, free-form learning. To get there we have to investigate what is possible and capture limits and constants like the ones I mentioned at the very beginning.

Quotes that I think of while learning to learn about thinking

- “guiding the training of intermediate levels of representation using unsupervised learning, which can be performed locally at each level”

- “When that prior is not task-specific, it is often one that forces the solution to be very smooth”

- “if a solution to the task is represented with a very large but shallow architecture (with many computational elements), a lot of training examples might be needed to tune each of these elements and capture a highly varying function”

- “when a function can be compactly represented by a deep architecture, it might need a very large architecture to be represented by an insufficiently deep one.”

- “with depth-two logical circuits, most Boolean functions require an exponential (with respect to input size) number of logic gates to be represented”

- “deep architectures can compactly represent highly varying functions which would otherwise require a very large size to be represented with an inappropriate architecture”

- “local estimators are inappropriate to learn highly varying functions, even though they can potentially be represented efficiently with deep architectures”

- “what matters for generalization is not dimensionality, but instead the number of “variations” of the function we wish to obtain after learning”

- “Kernel machines with a local kernel yield generalization by exploiting what could be called the smoothness prior: the assumption that the target function is smooth or can be well approximated with a smooth function”

- “Learning algorithms for deep architectures can be seen as ways to learn a good feature space for kernel machines.”

- “decision trees are also local estimators in the sense of relying on a partition of the input space and using separate parameters for each region”

- “the generalization performance of decision trees degrades when the number of variations in the target function increases.”

- “Although deep supervised neural networks were generally found too difficult to train before the use of unsupervised pre-training, there is one notable exception: convolutional neural networks”

- “An effect of the fan-in would be consistent with the idea that gradients propagated through many paths gradually become too diffuse, i.e., the credit or blame for the output error is distributed too widely and thinly” ?

- “is that the hierarchical local connectivity structure is a very strong prior that is particularly appropriate for vision tasks”?

- “generalizes the mean squared error criterion to the minimization of the negative log-likelihood of the reconstruction”

- “means that the representation is exploiting statistical regularities present in the training set, rather than learning to replicate the identity function “

- “trying to encode the input but also to capture the statistical structure in the input, by approximately maximizing the likelihood of a generative model “

For a fixed $n \in \mathbb{N}$, there exists at least one prime among the integers of the form $2^{k}n+2^{k}-1$ for some $k \in \mathbb{N}$.