AMA: Yoshua Bengio (self.MachineLearning)

Hello Prof. Bengio, what motivates you to stay in academia? How do corporate research labs compare with academic labs in terms of productivity and innovation? Does research flexibility (doing what you want, more or less) play a large role in this decision?

Hi there! I'm an undergrad and your work combined with Hinton's is a huge inspiration to me! A bunch of questions, so feel free to answer all or none!

Hinton semi-recently offered an awesome MOOC on NNs through Coursera. The resources and lectures it provided are what allowed me and many others to build homebrew nets and really get into the field. It would be a great resource if another researcher at the forefront of the field offered their own take. Do you have any plans for something like this?

As a leading professor in the field, how do you personally view the resurgence of interest in modern NN applications? Do you believe it's well-deserved recognition, a case of overhype, some mixture of the two, or something completely different? On a similar note, how do you feel about the portrayal of modern NN research in popular literature?

I'm interested in using unsupervised techniques to learn automated data augmentations/corruptions that increase generalization performance, which I hope is a promising hybrid of supervised and unsupervised learning that differs from traditional pretraining. A lot of advances have been made using "simple" data augmentations/corruptions pioneered in your lab, like Gaussian noise corruption and what we now call input dropout in the context of DAEs. Preliminary results on MNIST look promising (~0.8% error in the permutation-invariant setting), and I can send code if you are interested, though admittedly I'm just an undergrad with no formal research experience. Do you see this as an area with potential, and could you point me to any resources or papers you are aware of? I've had a hard time finding them.
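For readers unfamiliar with the two corruptions named above, here is a minimal sketch, assuming plain Python lists as inputs; the function names are hypothetical and not taken from any of the lab's libraries. Gaussian corruption adds isotropic noise to every input, while masking noise (what the question calls input dropout) zeroes each input independently with probability p. A denoising autoencoder is then trained to reconstruct the clean x from the corrupted version.

```python
import random

def gaussian_corrupt(x, sigma, rng):
    """Additive isotropic Gaussian noise: x_tilde_i = x_i + N(0, sigma^2)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def masking_corrupt(x, p, rng):
    """Masking noise ("input dropout"): each input is set to 0 with probability p."""
    return [0.0 if rng.random() < p else xi for xi in x]

def corrupted_pair(x, rng, kind="mask", level=0.25):
    """Return (corrupted input, clean target), the training pair for a denoising autoencoder."""
    if kind == "gauss":
        x_tilde = gaussian_corrupt(x, level, rng)
    elif kind == "mask":
        x_tilde = masking_corrupt(x, level, rng)
    else:
        raise ValueError(f"unknown corruption kind: {kind!r}")
    return x_tilde, list(x)

if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.2, 0.9, 0.0, 0.5]
    print(corrupted_pair(x, rng, "mask", 0.5))
    print(corrupted_pair(x, rng, "gauss", 0.1))
```

Learning the corruption itself, as the question proposes, would amount to replacing these fixed noise processes with a parameterized one.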

No one has a crystal ball, but what do you see as the most interesting areas of research for continuing to advance your work? The last few years have seen purely supervised techniques make a lot of headway riding off the success of dropout, for instance.

Thank you so much for doing this AMA, it's great to have you here on /r/MachineLearning!

Traditional (deep or non-deep) neural networks seem somewhat limited in that they cannot keep any contextual information: each datapoint/example is viewed in isolation. Recurrent neural networks overcome this, but they seem to be very hard to train and have been tried in a variety of designs with relatively limited success so far.

Do you think RNNs will become more prevalent in the future? For which applications and using what designs?

Thank you very much for taking your time to do this!
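The "contextual information" an RNN keeps is simply its hidden state, which is fed back in at every time step. Below is a minimal sketch of a simple (Elman-style) forward pass, assuming tanh units and plain Python lists; the names are hypothetical. Training it means backpropagating through every application of W_hh, which is where gradients tend to vanish or explode, and that is part of why RNNs are hard to train.

```python
import math

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a simple (Elman) RNN over a sequence of input vectors.

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), with h_0 = 0.
    Returns the list of hidden states h_1..h_T.
    """
    n_h = len(b_h)
    h = [0.0] * n_h  # initial context is empty
    states = []
    for x in xs:
        # The new state mixes the current input with the previous state (the context).
        h = [
            math.tanh(
                sum(W_xh[j][i] * x[i] for i in range(len(x)))
                + sum(W_hh[j][k] * h[k] for k in range(n_h))
                + b_h[j]
            )
            for j in range(n_h)
        ]
        states.append(h)
    return states

if __name__ == "__main__":
    # With recurrent weight 1.0, an input seen at t=1 still affects the state at t=3.
    print(rnn_forward([[1.0], [0.0], [0.0]], [[1.0]], [[1.0]], [0.0]))
```

Setting W_hh to zero recovers a feedforward net applied to each time step in isolation, which is exactly the limitation the question describes.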

Hi Prof. Bengio, I'm an undergrad at McGill University doing research in type theory. Thank you for doing this AMA!

Questions:

My field is extremely concerned with formal proofs. Is there a significant focus on proofs in machine learning too? If not, how do you make sure to maintain scientific rigor?
Is there research being done on using deep learning for program generation? My intuition is that eventually we could use type theory to specify a program and deep learning to "search" for an instantiation of the specification, but I feel like we're quite far from that.
Can you give me examples of exotic data structures used in ML?
How would I get into deep learning starting from zero? I don't know what resources to look at, though if I develop some rudiments I would LOVE to apply for a research position on your team.

Dr. Bengio, in your paper "Big Neural Networks Waste Capacity" you suggest that gradient descent does not work as well with a lot of neurons as it does with fewer. (1) Why do the increased interactions create worse local minima? (2) Do you think Hessian-free methods like those in (Martens 2010) are sufficient to overcome these issues?

Thank You!

Ref: Dauphin, Yann N., and Yoshua Bengio. "Big neural networks waste capacity." arXiv preprint arXiv:1301.3583 (2013).

Martens, James. "Deep learning via Hessian-free optimization." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.

Verification post: https://plus.google.com/103174629363045094445/posts/2fqbkyYULAf

With the recent success of maxout and hinge activations, how relevant is the older work on RBM pretraining using various contrastive divergence tweaks? What do you think is still worth investigating about stochastic models?
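For context on the contrastive divergence mentioned above, this is a sketch of a textbook CD-1 update for a binary RBM on a single example, assuming plain Python lists and hypothetical names. Practical code works on minibatches and adds tweaks such as momentum, weight decay, or persistent chains (PCD). It uses hidden probabilities rather than binary samples in the gradient statistics, which is one common choice.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step for a binary RBM on a single visible vector v0.

    W[i][j] connects visible unit i to hidden unit j; b = visible biases,
    c = hidden biases. Parameters are updated in place; returns the
    squared reconstruction error.
    """
    n_v, n_h = len(b), len(c)
    # Positive phase: hidden probabilities given the data, then a binary sample.
    h0 = [sigmoid(c[j] + sum(v0[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    h0_s = [1.0 if rng.random() < p else 0.0 for p in h0]
    # Negative phase: one Gibbs step down to the visibles and back up.
    v1 = [sigmoid(b[i] + sum(W[i][j] * h0_s[j] for j in range(n_h))) for i in range(n_v)]
    h1 = [sigmoid(c[j] + sum(v1[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    # Approximate log-likelihood gradient: <v h>_data - <v h>_reconstruction.
    for i in range(n_v):
        for j in range(n_h):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
    for i in range(n_v):
        b[i] += lr * (v0[i] - v1[i])
    for j in range(n_h):
        c[j] += lr * (h0[j] - h1[j])
    return sum((v0[i] - v1[i]) ** 2 for i in range(n_v))
```

Stacking RBMs for pretraining means training one with this kind of update, then using its hidden probabilities as the data for the next.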

How biologically plausible is maxout, and should we care?
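A maxout unit outputs the maximum over k affine functions of its input, h(x) = max_j (w_j · x + b_j), as defined in the maxout networks paper (Goodfellow et al., 2013). This minimal sketch (hypothetical names, plain Python lists) also shows that a rectifier is the special case with two pieces, one of which is fixed at zero.

```python
def maxout_unit(x, weights, biases):
    """Maxout unit: the max over k affine pieces, h(x) = max_j (w_j . x + b_j)."""
    if len(weights) != len(biases) or not weights:
        raise ValueError("need one bias per piece and at least one piece")
    return max(
        sum(wi * xi for wi, xi in zip(w, x)) + bj
        for w, bj in zip(weights, biases)
    )

def relu_as_maxout(x, w, b):
    """ReLU(w . x + b) written as a 2-piece maxout: max(w . x + b, 0 . x + 0)."""
    return maxout_unit(x, [w, [0.0] * len(w)], [b, 0.0])

if __name__ == "__main__":
    print(maxout_unit([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    print(relu_as_maxout([3.0], [-1.0], 1.0), relu_as_maxout([3.0], [1.0], 1.0))
```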

Hello Prof. Bengio, thank you for the AMA. What would you recommend to someone without a PhD who wants to get started with deep learning?

Dear Yoshua, thanks for doing this!

You are, to my knowledge, the only ML academic to publicly (and wonderfully!) speculate about the sociocultural perspectives afforded by the vantage of deep representation learning. In your fascinating article "Culture vs Local Minima" you touch on many important things, some of which I'm very curious about:

You describe how individuals learn by being immersed in culture. We both agree that they don't always learn very wholesome things. If you were king of the world, and you could prescribe a set of concepts that should be a part of every childhood learning trajectory, what would those be and to what end?
A corollary of "cultural immersion" is that the specific process of learning is not evident to the learner, the world simply "is" in a particular way. The author David Foster Wallace phrased this phenomenon as akin to fish having to figure out what water is. In your opinion, is this phenomenon an experiential byproduct of the neural architecture, or does it confer some learning benefit?
Why do you think cultural trends become entrenched and cause their learners to fight to stay in (what could be argued to be) local optima, e.g. the conflicts between various religious institutions and Enlightenment philosophy, or patriarchal society vs. the suffragettes? Is this a case of very pernicious parameters, or is there some benefit to the learners in question?
Do you have an opinion on such concepts as mindfulness meditation, and if so, how do you think they relate to the exploration of "idea space"?

Again, thanks a lot for taking the time. In the space of human ideas you are a trailblazer, and we are immensely richer for your presence!

Verification post: https://plus.google.com/112504130537129706790/posts/eqdBAysAyqR

I live in Montreal, working in the technology startup world. Very interested in your work, thank you for doing this AMA Professor Bengio. I worked hard to filter down to one question:

There seems to be a lot of disinterest from machine learning specialists and academics in general towards ML competitions hosted by Kaggle and the like. I recognize the odds of winning are quite low, which makes the return on your time investment poor, but the odds seem even worse for the ML enthusiasts who are flocking to participate. A few hours from an ML domain expert could be really beneficial on the right open datasets. Can you imagine an open, collaborative approach to competitive machine learning where experts and enthusiasts work effectively together?

What's your opinion of Solomonoff Induction and AIXI? I'm just starting to read up on the topic, and I can't quite decide whether it's serious work, or a fringe theory by a small group of people who all cite each other.

We have all been hearing about the performance achievable via deep learning (in academic journals such as the New York Times, no less!). I've also heard that it's difficult for non-experts to get these techniques to work: Ilya Sutskever says that there is a weighty oral tradition about the design and training of deep networks and that the best way to learn how is to work for years with someone who is already an expert (source: http://vimeo.com/77050653).

I studied machine learning but not deep learning. Going back to grad school is not really an option for me. How can I learn how to design, build, and train deep neural networks without access to the oral tradition? Could you write it down for us somewhere?

Hi Prof. Bengio, there has been some work on applying "higher" math (algebraic/tropical geometry, category theory) to deep learning. Notably, John Healy claimed several years ago to have improved a neural net (ART1) using category theory. What's your opinion on this approach? Will it remain only a toy model for the foreseeable future, or is there some promise in it?

Dear Prof. Bengio,

I am about to finish my PhD in computational neuroscience and I am very interested in the "gray area" between neuroscience and machine learning.

What aspects of brain computation do you think are (or will be) most relevant for machine learning?

If you could know the answer to one question about how the brain computes information, what would that be?

Thanks!

When asked about sum product networks, one of the original Google Brain team members told me he's not interested in tractable models.

What's your opinion about sum product networks? They made a big splash at NIPS one year and now they've disappeared.

Why do deep networks actually work better than shallow ones? We know a 1-hidden-layer net is already a universal approximator (for better or worse), yet adding more fully connected layers usually helps performance. Have there been any theoretical or empirical investigations into this? Most papers I read just showed that they WERE better, with very few explanations as to why, and the explanations that did exist were mostly speculation. What is your view on the matter?
What was your most interesting idea that you never managed to publish?
What was funniest/weirdest/strangest paper you ever had to peer-review?
If I read your homepage correctly, you teach your classes in French rather than English. Is this a personal preference or mandated by your University (or by other circumstances)?
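On the depth question above, one standard illustration (a textbook construction, not necessarily the argument Prof. Bengio would make) uses ReLU nets on a scalar input. A layer of two ReLUs computes the tent map t(x) = 2 relu(x) - 4 relu(x - 1/2) on [0, 1]; composing k such layers (2k units in total) yields a sawtooth with 2^k linear pieces. A single hidden layer of m ReLUs on a scalar input is piecewise linear with at most m + 1 pieces, so representing the same function exactly with one hidden layer needs at least 2^k - 1 units. This sketch checks the piece count with exact rational arithmetic; all names are hypothetical.

```python
from fractions import Fraction

def relu(z):
    return z if z > 0 else 0 * z  # 0 * z keeps the result a Fraction

def tent(x):
    """One hidden layer of 2 ReLU units: t(x) = 2*relu(x) - 4*relu(x - 1/2) on [0, 1]."""
    return 2 * relu(x) - 4 * relu(x - Fraction(1, 2))

def deep_sawtooth(x, depth):
    """Compose the tent map `depth` times: a net with `depth` layers of 2 ReLUs each."""
    for _ in range(depth):
        x = tent(x)
    return x

def count_linear_pieces(f, depth):
    """Count the linear pieces of f(., depth) on [0, 1] using exact slopes on a dyadic grid.

    The breakpoints of the depth-k sawtooth lie on multiples of 1/2**k, so a grid
    with step 1/2**(k+1) contains every breakpoint and never straddles one.
    """
    n = 2 ** (depth + 1)
    xs = [Fraction(i, n) for i in range(n + 1)]
    ys = [f(x, depth) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * n for i in range(n)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)

if __name__ == "__main__":
    for k in range(6):
        # deep: 2k ReLUs; shallow exact representation needs >= 2**k - 1 ReLUs
        print(k, 2 * k, count_linear_pieces(deep_sawtooth, k))
```

This is about exact representation with ReLUs; it does not contradict universal approximation, which only says a wide enough single layer exists, not that it is small.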

We have seen deep learning work really well for image/video/sound. Do you foresee it working for text classification as well? Most papers that have tried text/document classification using deep learning have not done better than the conventional SVM/Bayes. What are your thoughts on this?

I see more and more pop media articles extolling deep learning as a panacea that will make AI a reality (Wired is especially guilty of this). Given the AI winters of the 1970s and 1980s that arose from overhyped expectations, what can deep learning and ML researchers and advocates do to prevent this from happening again?

Hi Bengio. I'm a masters candidate in robotics, mostly doing reinforcement learning mushed together with some ML regression methods for the identification of interesting value functions and state space representations.

How is your work life balance? Do you have fun? What sorts of things do you do to unwind?

I'm considering doing a PhD, but I literally feel like just getting a part-time job and doing independent research, because the academic environment can be pretty stifling.

Also, Montreal seems really fun!

J

Dr Bengio,

I'd like to thank you for the amazing research and software (Theano, Pylearn2) that your lab has contributed.

What are your feelings on Hinton and LeCun moving to industry?

What about academia and publishing your research is more valuable than the floating point overflow of money you could make at private companies?

Are you nervous that machine learning will go the way of time-series analysis, where a lot of advanced research takes place behind closed doors because the intellectual property is so valuable?

Given the recent advancements in training discriminative neural networks, what role do you envision generative neural networks playing in the future?

Hi Yoshua, very excited about this AMA, thank you for your time. I have a few questions:
- What are the biggest challenges in ML nowadays?
- What are the most interesting and/or creative ways you have seen people/businesses using ML?
- What does the future of Machine Learning look like?

Last year I did my undergrad thesis on NLP using probabilistic models and neural networks partly inspired by your work. I became interested and at that point I considered doing further work on NLP. Currently I am pursuing an MSc degree taking several related courses.

But after several months, I haven't found NLP to be as motivating as I expected; from my limited point of view, research in this area seems a little stagnant. What challenges do you think are driving, or will drive, this field forward?

Thanks for taking the time to answer some questions here!

While deep nets have helped move the state of the art forward in natural language text understanding, the improvements there haven't really been significant. Where do you think significant progress can come from in that field?

What will be the role of deep neural nets in Artificial General Intelligence (AGI) / Strong AI?

Do you believe AGI can be achieved (solely) by further developing these networks? If so: how? If not: why not, and are they still suitable for part of the problem (e.g. perception)?

Thanks for doing this AMA!

Have you observed practical applications where deep learning succeeds but traditional ML fails? i.e. not simply improving the state of the art on an image benchmark by X%, but a case where an intractable problem is made tractable, solely via deep learning?

arXiv:1305.0445v2 [cs.LG] 7 Jun 2013

Can you describe what you are currently researching, first by bringing us up to speed on the current techniques used and then what you are trying to do to advance that?

Hi Prof. Bengio,

Thank you for doing this AMA. Questions:

How much do you think we can actually accomplish in the big data challenge?
Do you think data alone is sufficient to solve practical problems, as opposed to using some kind of expert knowledge?

I'm currently finishing up my undergrad in philosophy of science and logic, and I am trying to make the switch to computer science for master's work with the intention of pursuing machine learning at the PhD level. Besides filling in the obvious knowledge gaps in mathematics and basic programming skills, what are some of the things a person in my position could do to become a more attractive candidate in your field? Thanks so much for visiting us at r/MachineLearning!

So I've had a desire to get deep into deep learning and general machine learning for a while. I'm currently taking the computational neuroscience course Coursera offers, and I'll follow that up with the ML and NN courses.

Where do you recommend someone go from there? I've not seen much that is at the grad level out there.

Hi! The guys behind the Blue Brain project intend to build a working brain by reverse engineering the human brain. I heard Hinton criticize this approach in a talk, and I got the impression that he believed the kind of work done within ML would be more likely to lead to a general strong AI.

Let's imagine we are some time in the future, and we have created strong artificial intelligence - that passes the Turing test, and generally passes as alive and conscious. If we look at the code for this AI, do you think it would mostly be a result of reverse engineering the human brain, or would it be mostly made of parts that we humans have invented on our own?

Hi Sir, I am a self-learner trying to train a sparse autoencoder with linear/ReLU units. What would be a suitable differentiable sparsity cost? I saw something that uses KL divergence but could not understand it. Is the sparsity-inducing formula some kind of holy grail or secret? Thanks, KK.
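One common differentiable choice, probably the one the question refers to, is the KL penalty from Andrew Ng's sparse autoencoder notes: treat each hidden unit's average activation rho_hat_j over a batch as the mean of a Bernoulli variable and penalize its divergence from a small target rho by summing KL(rho || rho_hat_j) over units. It only makes sense for units whose outputs lie in (0, 1), such as sigmoids; for ReLU units an L1 penalty on the activations is a common alternative (differentiable everywhere except at zero). This is a minimal sketch with hypothetical names; in training, the penalty would be scaled by a weight beta, added to the reconstruction loss, and backpropagated.

```python
import math

def kl_sparsity_penalty(rho, rho_hats):
    """Sum over hidden units j of KL(rho || rho_hat_j) between Bernoulli distributions.

    rho: target mean activation (e.g. 0.05). rho_hats: each unit's mean activation
    over a batch. Requires 0 < rho < 1 and 0 < rho_hat_j < 1 (sigmoid-like units).
    """
    total = 0.0
    for r in rho_hats:
        total += rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
    return total

def kl_sparsity_grad(rho, rho_hats):
    """Derivative w.r.t. each rho_hat_j: -rho / rho_hat_j + (1 - rho) / (1 - rho_hat_j)."""
    return [-rho / r + (1 - rho) / (1 - r) for r in rho_hats]

def l1_sparsity_penalty(activations, lam):
    """L1 alternative for ReLU units: lam * sum |h| (subgradient lam * sign(h))."""
    return lam * sum(abs(h) for h in activations)

if __name__ == "__main__":
    print(kl_sparsity_penalty(0.05, [0.05, 0.2, 0.5]))  # zero only for the first unit
    print(kl_sparsity_grad(0.05, [0.05, 0.2, 0.5]))     # positive grads push averages down
```

The penalty is zero when a unit's average activation equals the target and grows as the average moves away in either direction.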

Hi Prof Bengio,

Is it possible to get into Lisa-Lab without any machine learning/deep learning publications? The university I'm attending does a tiny bit of research in computer vision, bioinformatics, and 1980s-era neural networks, but none of it is as contemporary or as in-depth as the research at Lisa-Lab and the other labs listed on Deeplearning.net.

As EJBorey says, "I've heard that it's difficult for non-experts to get these techniques to work." What is the most promising work being done to automate the configuration of deep learning networks? Thanks!

Are there attempts to apply neural nets to the task of machine translation?

When do you think NN-based approaches will replace statistical methods in commercially deployed MT systems? After all, in speech recognition (all major industry players) and vision (Google, Baidu), NNs are already deployed.

What are your thoughts on Google acquiring all of these different AI-related companies over the last year or so?

This thread has been linked to from elsewhere on reddit.

[/r/compsci] Deep learning pioneer Yoshua Bengio taking questions for his AMA in /r/MachineLearning
[/r/artificial] Deep learning pioneer Yoshua Bengio AMA: Thursday 1-2PM EST in /r/MachineLearning
[/r/Futurology] Deep learning pioneer Yoshua Bengio taking questions for his AMA in /r/MachineLearning

Any advice on hiring your students? What is compelling to the modern machine learning PhD?

Sorry for being so mundane: what as-yet-unexplored fields do you see machine learning being applied to in the future?

Dear Prof. Bengio,

In neuroinformatics, several researchers work in the field of 'reservoir computing' (a random, sparse RNN whose only trained component is a linear readout). Comparing this architecture to 'deep networks', I see a lot of similarities between the two approaches. There seems to be a strong link between learning abstract features in deep architectures and plasticity mechanisms in spiking reservoirs.

I would very much like to hear your opinion on this.
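For readers unfamiliar with reservoir computing, here is a minimal echo-state-style sketch, assuming a scalar input and plain Python lists (all names hypothetical): the sparse recurrent weights are random and never trained, and only the linear readout is fit. Rescaling the recurrent matrix so its maximum absolute row sum is below 1 is a conservative sufficient condition for the state to forget its initial conditions (tanh is 1-Lipschitz, so the update is a contraction); practitioners usually scale by the spectral radius instead. The readout is trained with simple least-mean-squares SGD for brevity, whereas ridge regression is the more usual choice.

```python
import math
import random

def make_reservoir(n, density, scale, rng):
    """Random sparse recurrent weights, fixed after creation (never trained)."""
    W = [[rng.uniform(-1, 1) if rng.random() < density else 0.0 for _ in range(n)]
         for _ in range(n)]
    # Rescale so the max absolute row sum equals `scale` (< 1 gives a contraction).
    norm = max(sum(abs(w) for w in row) for row in W) or 1.0
    return [[scale * w / norm for w in row] for row in W]

def run_reservoir(inputs, W, w_in):
    """Drive the fixed reservoir with a scalar input sequence; return all states."""
    n = len(W)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [math.tanh(w_in[i] * u + sum(W[i][k] * x[k] for k in range(n))) for i in range(n)]
        states.append(x)
    return states

def train_readout(states, targets, lr=0.01, epochs=50):
    """Fit only the linear readout y = w . x + b, by least-mean-squares SGD."""
    n = len(states[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(states, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

if __name__ == "__main__":
    rng = random.Random(0)
    W = make_reservoir(20, 0.2, 0.9, rng)
    w_in = [rng.uniform(-1, 1) for _ in range(20)]
    inputs = [math.sin(0.3 * t) for t in range(100)]
    states = run_reservoir(inputs, W, w_in)
    w, b = train_readout(states[:-1], inputs[1:])  # one-step-ahead prediction
    print(len(w), b)
```

The contrast with a deep network is that nothing inside the recurrent part adapts here, which is what a plasticity mechanism in the reservoir would add.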

Who are some of the people you have a lot of respect for?

What was the last fiction book you read?

Dear Prof. Bengio,

In my experience with different neural network models, it seems that either a good initialization (for example via pretraining or some sort of guided learning), the structure (think of the convolutional net), or standard regularization like the L2 norm is crucial for learning. In my opinion, all of these are special forms of regularization. Therefore, it seems that 'without prior assumptions, there is no learning'. In the era of 'big data' we can slowly decrease the influence of the regularization part, and therefore develop more 'data-driven' approaches.

Nonetheless, some form of regularization is still needed. To me there seems to be a complexity gap between training networks from scratch (keeping the regularization as small as possible) and using regularized networks (structure, L2 norm, pre-training, smart initialization, ...), something like P vs. NP-hard in complexity theory.

Are you aware of any literature that tackles this problem from a formal or experimental perspective?

When you're learning something new, do you spend time trying to figure out how the learning process is happening in your own brain?

Hi Professor Bengio, thanks so much for answering our questions. I was wondering what you thought of stochastic feedforward methods like Tang and Salakhutdinov presented at NIPS last year.

It seems to me like a great way to get some of the benefits of stochastic methods (especially the ability to predict at multiple modes) while retaining the efficiency of feedforward methods that can be trained by backprop. There seem to be some interesting parallels between this approach and the stochastic networks your lab has been working on, and I'd love to hear your thoughts on the comparison.

Thanks again!

Hello Dr. Bengio,

Thank you for your time. There are two questions I would like to ask you, if you don't mind:

How is the atmosphere in your lab?
What do you look for in a graduate student?

If I were summarizing the results from deep models, I'd say that deep models are excelling in problems that humans held the previous state-of-the-art (vision/audio/language).

Do you know of any successes on problems of the opposite nature, i.e., problems where statistical methods are already better than humans? One example I can think of is the Merck Kaggle challenge won by George Dahl, but I'd love to hear of some more.

Hi Professor!

I always find myself resorting to ensembles and random forests in my projects (I think I can just internalize decision trees much better than deep learning). Could you offer the flip side for why I should be excited about neural networks?

(I mostly work with "medium-sized" data, and it usually fits on a single machine.)

Thanks!

What are some things that self-taught machine learning scientists lack that those trained in a formal environment (university or similar) have?
(I'm asking as a member of the first group)

There seems to be a recent trend where a lot of deep learning researchers have moved to industry, ostensibly to gain access to very large data sets. Do you think deep learning research within academia can continue to flourish without such access? Or is the field inevitably moving toward HPC and massive data sets as prerequisites?

Professor Bengio,

What do you think of Ray Kurzweil's PRTM (pattern recognition theory of mind)? Do you think any of its characteristics could be implemented on current deep learning techniques to improve their capabilities?

Thank you.

Hello, professor. I have a question that I always ask experts in their fields: In your field of study, what is the best book/paper you know of? Why? (here "best" can have any meaning, as long as it's specified)

Thanks.

Can you think of any interesting deep learning approaches to NLP other than Richard Socher's recursive neural networks?

Hi professor Yoshua Bengio.

Do you think that machine learning as we understand it today will be the basis of future AI?

Which is a bigger obstacle to making AI stronger, hardware limitations or algorithmic/software problems? What is the biggest obstacle to making AI better in general?

What do you think of Ray Kurzweil's prediction that an AI will pass the Turing test by 2029? He has placed a bet on this prediction.

What suggestions would you give to a young professor building a new research lab on machine learning, neural networks and such? What do you think are the most important aspects of the lab environment and of hardware and software resources? What about international cooperation? Also, how can one be competitive worldwide?

Professor Bengio,

Thank you for taking our questions. How do you respond to this criticism of Deep Learning from Jeff Hawkins:

Hawkins, author of On Intelligence, a 2004 book on how the brain works and how it might provide a guide to building intelligent machines, says deep learning fails to account for the concept of time. Brains process streams of sensory data, he says, and human learning depends on our ability to recall sequences of patterns: when you watch a video of a cat doing something funny, it’s the motion that matters, not a series of still images like those Google used in its experiment. “Google’s attitude is: lots of data makes up for everything,” Hawkins says.

Source: Deep Learning

[META] In the comments at the verification page it looks like Yann LeCun is open to the AMA idea as well! Should we try to request him as well?

Bonjour professeur Bengio! Thank you so much for this AMA! Here are a few questions of mine (not chosen i.i.d.):

Where does deep learning show promise? And in what application would it be an absolutely horrible choice?

Why do stacked RBMs work? Is this something that can be explained in a thoroughly formal manner, or is there still some magic that needs to be unraveled?

What would you say is the relationship between ensemble learning and deeply layered learning?

Can you describe some of the work your lab/grad students is/are doing and why you support it?

What are some of the best things about living in Montreal?

How do you like to approach a research question? What kind of working environment do you prefer?

Can you talk about the connection, if there is one, between big, structured knowledge projects like Google's Knowledge Graph (built largely on the entity graph Freebase) and deep learning?

Is it significant that the data of the knowledge graph has this recursive network structure that looks a lot like the layers of abstraction in a deep learning setup?

This question is regarding deep learning. From what I understand, the success of deep neural networks on a training task relies on choosing the right meta parameters, like network depth, hidden layer sizes, sparsity constraint, etc. And there are papers on searching for these parameters using random search. Perhaps some of this relies on good engineering as well. Is there a resource where one could find "suggested" meta parameters, maybe for specific class of tasks? It would be great to start with these tested parameters, then searching/tweaking for better parameters for a specific task.
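The random search the question mentions (Bergstra and Bengio, "Random Search for Hyper-Parameter Optimization", JMLR 2012) is simple to sketch: sample each hyperparameter independently from a prior range, train and score each configuration, and keep the best. This sketch assumes a hypothetical objective(config) that would return validation error after training, samples scale parameters such as the learning rate log-uniformly (a common choice), and uses hypothetical names and ranges; it is not a list of recommended values.

```python
import math
import random

def sample_config(space, rng):
    """Draw one config; specs are ("log", lo, hi), ("int", lo, hi) or ("choice", options)."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "log":  # log-uniform, for scale parameters like learning rates
            config[name] = math.exp(rng.uniform(math.log(spec[1]), math.log(spec[2])))
        elif kind == "int":
            config[name] = rng.randint(spec[1], spec[2])
        elif kind == "choice":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown spec kind {kind!r} for {name!r}")
    return config

def random_search(objective, space, n_trials, seed=0):
    """Evaluate n_trials independent random configs; return (best_config, best_score, history)."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        config = sample_config(space, rng)
        history.append((config, objective(config)))
    best_config, best_score = min(history, key=lambda item: item[1])
    return best_config, best_score, history

if __name__ == "__main__":
    space = {
        "learning_rate": ("log", 1e-4, 1e-1),
        "n_hidden": ("int", 64, 1024),
        "depth": ("choice", [1, 2, 3]),
    }
    # Stand-in for "train the net and return validation error" (hypothetical).
    toy = lambda cfg: (math.log10(cfg["learning_rate"]) + 2) ** 2 + abs(cfg["depth"] - 2)
    print(random_search(toy, space, n_trials=30)[:2])
```

Because trials are independent, they can run in parallel, and a good region found this way is a natural starting point for the manual tweaking the question describes.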

What is the state of research on dealing with time series data using deep neural nets? Deep RNNs, perhaps?

Is fluency in French a prerequisite to becoming your student? Does it matter at all?

Given three candidates, none of whom have much experience in ML, whom would you rather choose as a potential student (other dimensions being equal):

Someone experienced in applied statistics (say, psychology research, or epidemiology), knows R
Someone who is very good at software development and knows some numpy/scipy, Matlab
Pure math undergrad who has little exposure to either programming or "real world" data

I know I'm a little late to the party, but I was wondering whether you think there is any room for an evolving-topologies algorithm such as NEAT within deep learning. In some ways, techniques like dropout and DropConnect approach an evolving-topology methodology, but overall the idea of an evolving topology is not entirely captured by such techniques.

Thanks for doing this AMA!

Hello Prof. Bengio. I am a student from Denmark.

I am trying to add your Maxout Networks approach to the sparse autoencoder to see the potential benefits. Do you have any preliminary comments?

Could we see more updates on your DL book? Hehe.

Hello Professor Bengio, I tried running your Matlab toolbox for DBNs and the PLearn application at the same time, and I would like to know how to run an equivalent process in both. Some PLearn options are very different from the Matlab schemes, and matching them would be useful for prototyping a faster application.

Thank you

JMM

Hi Prof. Bengio, very happy to see you here.

What is the difference between learning and deep learning? For example, the neural network language model (using RNNs) published by Mikolov is referred to as a deep learning method. Can you explain why, or explain deep learning using another example of a deep learning method?

Hi! No experience with deep learning, here. The introduction says that deep learning advances in machine learning can be used to solve artificial intelligence problems. Does that mean solving the consciousness/self-awareness problem or is it in a narrow sense?

What exactly is deep learning, and how does it differ from conventional ML?

Could you explain the rationale behind sparse and deep learning?

Did we get just 5 responses to ~100 questions?

Do you know all terms mentioned in questions here?