Biological evolution and information acquisition (construction-physics.com)
44 points by chmaynard 8 hours ago | 7 comments



When trying to apply these ideas to other problems, the advantages offered by biological concepts almost always seem to be more than offset by the computational difficulty of implementing them. Gathering 100x more bits per iteration is great, but not if it takes 1000x longer to run each iteration.

Population-based methods tend to perform poorly on practical hardware due to the IO and latency constraints around managing the candidate genomes. The moment you rely on populations you also become dependent on new problems like genetic diversity which is best solved by having... an unlimited population size. Pure mutation might converge more slowly on paper, but you can hill climb a single candidate at a ludicrous rate if it fits entirely within L1/L2. Biology being permitted virtually infinite population sizes with zero performance impact is the only reason this stuff works.

I'll always start with the technique that can do a million iterations per second over the one that does maybe ten thousand. Even if the first is slower in terms of my end goal, it gives me more information to work with experimentally. I have a higher-resolution time domain to play inside of. The dream would be to find something that provides most of the benefits of crossover and larger gene pools without the severe performance hit. The best I could come up with is a delta encoding scheme, which made me realize this is a compression / information-theoretic iron triangle type of thing. There is no way we can recover all that performance.
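To make the single-candidate approach concrete, here is a minimal sketch of a mutation-only hill climber. The fitness function (`one_max`, counting set bits) and all parameter values are assumptions chosen for illustration, not anything from the article. The point is only that the whole working set is one small array mutated in place.

```python
import random

def one_max(bits):
    """Toy fitness (an assumption): number of set bits."""
    return sum(bits)

def hill_climb(n_bits=64, iterations=5000, seed=0):
    """Mutation-only hill climbing on a single candidate.

    Returns (starting fitness, final fitness, final candidate).
    """
    rng = random.Random(seed)
    candidate = [rng.randint(0, 1) for _ in range(n_bits)]
    start = best = one_max(candidate)
    for _ in range(iterations):
        i = rng.randrange(n_bits)
        candidate[i] ^= 1            # mutate in place: flip one bit
        score = one_max(candidate)
        if score >= best:
            best = score             # keep the mutation
        else:
            candidate[i] ^= 1        # revert the mutation
    return start, best, candidate

start, best, candidate = hill_climb()
print(f"fitness went from {start} to {best}")
```

There is no population to store, sort, or shuffle here; a population-based variant would need to keep and compare many such arrays per iteration.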


Sure, if you're working on problems that are trivial to evaluate. If your goal is to find good algorithms or parameters then you can pick a problem that evaluates fast.

But for many interesting problems evaluation will be the bottleneck, and it doesn't matter if the evolution algorithm fits into L2. It is more important to make use of distributed computing, e.g. it shouldn't have to stop all evaluations to make progress, or alternatively it should work well with very large population sizes.
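One way to avoid stopping all evaluations at a generation boundary is a steady-state loop: whenever any evaluation finishes, its result is folded into the population and a new child is submitted immediately. The sketch below is only an illustration under assumptions: `slow_fitness` is a hypothetical stand-in for an expensive evaluation, threads stand in for distributed workers, and the parameter values are arbitrary.

```python
import random
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def slow_fitness(x):
    """Hypothetical expensive evaluation; here a cheap quadratic peaking at x = 3."""
    return -(x - 3.0) ** 2

def async_evolve(workers=4, budget=200, pop_size=8, seed=0):
    """Steady-state evolution: no generation barrier, workers never idle."""
    rng = random.Random(seed)
    population = []   # (fitness, x) pairs, best first
    pending = {}      # future -> candidate being evaluated
    submitted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        def submit(x):
            nonlocal submitted
            pending[pool.submit(slow_fitness, x)] = x
            submitted += 1

        for _ in range(workers):
            submit(rng.uniform(-10, 10))
        while pending:
            # Wake up as soon as ANY evaluation finishes.
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                x = pending.pop(fut)
                population.append((fut.result(), x))
                population.sort(reverse=True)
                del population[pop_size:]          # keep only the best few
                if submitted < budget:
                    parent = rng.choice(population)[1]
                    submit(parent + rng.gauss(0, 0.5))  # mutated child
    return population[0]

best_fitness, best_x = async_evolve()
print(best_fitness, best_x)
```

Because results arrive in whatever order the workers finish, runs are not exactly reproducible even with a fixed seed.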


> many unpromising and time-consuming branches of the search tree are screened off

Imagine if we networked absolutely everything so that no AI ever recomputed the same thing twice and they all shared a literal global cache


If you're interested in this I highly recommend this podcast (Mindscape by Sean Carroll): [0]

"Nothing in biology makes sense except in the light of information."

[0] https://www.youtube.com/watch?v=4PCHelnFKGc


Breaking knowledge down into combinable modules is a great insight; ultimately, I think that’s the idea behind Zettelkasten.

The article has a few problems with regard to the assumptions it makes.

For instance:

> often caused by random mutation, preferentially selecting the most fit organisms to propagate their genes into the future

So, mutations are not entirely random; transposons, for instance, have preferential hotspots, and the same goes for individual nucleotide positions. But by and large there is a lot of randomness, so this is not necessarily the main problem with the statement. However, it is then stated that selection prefers the "most fit organisms" and that these are "propagating their genes into the future". I am quite certain that this assumption comes largely from, e.g., Richard Dawkins babbling about the selfish gene.

The thing is that evolution, as it is, operates primarily on the phenotype. If you have, let's assume, mutations in genes that yield some benefit in certain ecological environments, you may be "more fit" in that environment, assuming you can reproduce (as an organism). The selection step still happens at the phenotype level, though; you could have "negative" genes too that may offset that advantage. So the assumption of "most fit" is in and of itself already a tautology, all the more so because the concept of the gene itself is poorly definable; see books explaining this, such as "The Gene: From Genetics to Postgenomics".

Of course one can say that for some entities, the genome is the selection step, e.g. for entities that may integrate into a given genome and then be propagated as part of that genome. But even then the phenotype is the important part; you need to see the DNA replicated, and if, for instance, the bacterium dies, the DNA is often degraded. So if you were some entity that integrated there (transposons, viruses, etc.) and the DNA is degraded, that is already the game-over step for you. So you absolutely need the phenotype, and this is the real level of selection. All the information prior to that, e.g. the genome sequence (aka many "genes", but again, the term "gene" is not well defined; after all, RNA matters a lot as well, but are all RNA-yielding sequences automatically "genes"?), has to pass that gauntlet.

Or this:

> By leveraging modularity at the genetic level, populations of organisms can increase the rate that useful genetic variants spread through the population, effectively increasing their rate of information acquisition. Sexual reproduction, along with other ways of sharing genetic material like horizontal gene transfer, is essentially a mechanism for doing this.

I understand what he refers to, and while it is not incorrect, you also have constraints such as the fact that meiosis requires double-strand breaks to initiate crossing-over, and the resulting Holliday junctions have to be resolved. That's simply how crossing-over works. Yes, you can say that more variation is the net outcome of this, that is true, but without the double-strand break and that resolution, you cannot do the later step of pulling the chromosomes to opposite poles of the cell. That's a cell biology constraint, not a genetic one at its core. Resolution is also not totally random, as chromosomes have some kind of regular positioning (Rabl orientation); see the hotspots that exist. Biological transposons naturally don't have this constraint anywhere near as much, but even transposons show preferential integration sites, which means the variation is not 100% random per se. On top of that come other constraints, such as the fact that organisms have to be viable, so not all random integration is possible to begin with.


While “most fit” seems tautological it’s also subtly wrong. There’s a great deal of randomness in survival and a great number of genes involved at the same time.

Extremely harmful mutations have massive selective pressure, but things with minimal negative impact can spread as long as they fall below the noise floor. There’s nothing pushing any specific negative mutation but the space of minor negative mutations is so large some of them spreading becomes quite likely.

Thus “sufficiently fit” is a more accurate model. Especially when you consider extinction is a common outcome.
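The "noise floor" point can be illustrated with a toy simulation. The sketch below assumes a haploid Wright-Fisher model with a fixed population size, which is a standard textbook simplification and not something taken from the article; the function name and parameter values are made up for the example. It estimates how often a single new mutant copy ends up taking over the whole population.

```python
import random

def fixation_fraction(pop_size, s, trials=2000, seed=0):
    """Fraction of trials in which one new mutant copy reaches fixation.

    s is the selection coefficient of the mutant relative to the
    wild type: negative means harmful, 0 means neutral.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # start from a single mutant individual
        while 0 < count < pop_size:
            # Chance that a given offspring descends from a mutant parent,
            # with parents weighted by relative fitness.
            weight = count * (1 + s)
            p = weight / (weight + (pop_size - count))
            # Resample the whole next generation (binomial draw).
            count = sum(rng.random() < p for _ in range(pop_size))
        fixed += count == pop_size
    return fixed / trials

mild = fixation_fraction(pop_size=20, s=-0.01)
severe = fixation_fraction(pop_size=20, s=-0.5)
print(f"mildly harmful: {mild:.4f}, severely harmful: {severe:.4f}")
```

In a small population the mildly harmful variant still fixes in a noticeable share of runs, while the severely harmful one essentially never does, which is the "sufficiently fit" picture in miniature.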



