"Evolved" Caches Could Speed the Net 195
SpaceDilbert writes "According to New Scientist, evolutionary algorithms could make many network caches twice as efficient. This article describes a study carried out by a US researcher and two German academics, who "evolved" algorithms to determine what data should be held at a cache and for how long."
Caching (Score:5, Funny)
That's easy, just cache 4 boobs at a time instead of two
Re:Caching (Score:2, Funny)
Algorithms (Score:5, Interesting)
It would be interesting to see exactly which algorithms they are talking about here. I wouldn't be surprised if they drew some ideas from garbage collection algorithms also.
Re:Algorithms (Score:5, Funny)
I suppose you're the clever garbage man from the Dilbert cartoons, because as an engineer I don't really get the connection between garbage collection and caching algorithms. Now that we have covered the first two frames of the cartoon - what's in the third frame, where the garbage man makes me feel ashamed by explaining some complicated concept of engineering?
Re:Algorithms (Score:5, Insightful)
It seems to make sense to me, if only because the two are complementary problems. Caching is figuring out what stuff is valuable so you keep it around. GC is figuring out what stuff is valuable so you can throw away the rest. Kind of like in probability, where it's easier to figure out the likelihood of something not happening and then subtract from 1 to figure out how likely it is to happen.
Re:Algorithms (Score:5, Insightful)
not even close (Score:5, Insightful)
For caching there is no requirement of correctness in what you keep. Performance is improved when things that will be used in the future are kept, but there is no correctness requirement for keeping things around, and indeed best performance often requires discarding things that will be used in the future. In addition, there is no way of determining what will be used in the future without knowledge of the future. The correctness requirements of a cache are related to security (don't cache sensitive information) and staleness (don't cache stuff that is too old), but even those may be relaxed rather than strict requirements.
Fundamentally the two problems are very different, and algorithms that are successful in the two different cases will likely be very different.
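To make the staleness and security constraints above concrete, here is a minimal Python sketch (not anything from the article) of the kind of checks a cache might apply; the header parsing is deliberately naive, and the only field names used are the standard HTTP Cache-Control directives:

    import time

    def is_cacheable(headers):
        # Security-ish constraint: don't store responses the origin marked
        # as private or no-store.
        cc = headers.get("Cache-Control", "").lower()
        return "no-store" not in cc and "private" not in cc

    def is_fresh(stored_at, headers):
        # Staleness constraint: honour an explicit max-age if one was given.
        cc = headers.get("Cache-Control", "").lower()
        for token in (t.strip() for t in cc.split(",")):
            if token.startswith("max-age="):
                try:
                    return time.time() - stored_at < int(token.split("=", 1)[1])
                except ValueError:
                    return False
        return False  # no explicit lifetime: treat the copy as stale

Neither check says anything about performance; as the parent notes, which entries to keep for speed is a separate question that depends on the future request stream.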
Re:not even close (Score:2)
In garbage collection it is not correct to throw out something that will not be used in the future unless it is also unreachable.
It is correct to do so, but in practice, you cannot be certain that the data will not be used, unless it's unreachable.
Re:Algorithms (Score:2)
I thought Garbage Collection was shuffling stuff around with the goal of consolidating all the stuff you don't want into large blocks (collecting the garbage) so the space can be reallocated efficiently.
The smart garbage man is on vacation.
Re:Algorithms (Score:2)
Re:Algorithms (Score:2, Informative)
Re:Algorithms (Score:2)
I'm not sure how relevant that would be, as garbage collection in general is way more radical than what a cache needs -- a cache operates best when full.
:-)
The best algorithm for improving a cache is, of course, to increase the size of the cache.
Regards,
--
*Art
Re:Algorithms (Score:3)
Re:Algorithms (Score:4, Insightful)
By contrast, in caching, 'references' are 'client requests', which are unpredictable and highly variable. Furthermore, on busy networks, there is seldom a moment for you to 'stop the world' and 'evolve' for a while!
In the story, people use network simulators and randomly generated requests. Since network simulators are notoriously simplistic and inaccurate, and caching is heavily influenced by real-life workloads, I'm interested to know whether their algorithm is applicable in reality.
What makes a good cache? (Score:5, Insightful)
Well, I have found one flaw in the methodology:
I would think this would breed out of the caching system any affinity to locality-of-reference. One of the things I did each morning when I was running a cybercafé was "prime" the Squid cache by running a little script that did a wget -p on all of the URLs in the portal page, and a few sites that were not. And it definitely did improve performance for most users.
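For anyone curious, a rough Python sketch of that priming idea follows; the proxy address and the URL list file are made-up placeholders, and unlike wget -p it doesn't pull in page requisites like images and stylesheets:

    import urllib.request

    # Assumed Squid proxy address and URL list file; adjust for your setup.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": "http://localhost:3128"}))

    with open("portal_urls.txt") as f:
        for url in (line.strip() for line in f if line.strip()):
            try:
                opener.open(url, timeout=10).read()  # pull the page through the cache
            except OSError:
                pass  # dead or slow link; nothing to prime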
One of my unrealized dreams would be some sort of speculative-fetch algorithm for Squid that would basically do a breadth-first fetch on a page while the user was busy reading it.
Of course in my not-so-humble opinion, the biggest problem with any caching system is the population of websites that, through either malice or incompetence develop content that is cache-hostile, and call it "experience enhanced".
Re:What makes a good cache? (Score:4, Insightful)
Assuming you can still read the code, you're still going to want to put in common-sense improvements that the GA's didn't discover.
Re:What makes a good cache? (Score:3, Interesting)
So, do you posit that the purpose of the GA is just to give human programmers insight into avenues they might not have otherwise considered?
Re:What makes a good cache? (Score:4, Interesting)
e.g. This page looks a lot like a truly dynamic page, so let's not cache it. Over here, this is dynamic, but only slightly so. Let's cache it and index the most likely path the user will take.
Have CS researchers given up on the neural net approach, or are the nets still far too unstable for real world use?
Re:What makes a good cache? (Score:3, Interesting)
It turns out that if you have more than three layers (in other words, if you have any layers between the input and output), you run into problems in training, where the network doesn't perform as well as you want it to, but further cycles of your training algorithm don't seem to make it any better.
The comp.ai.neural-nets FAQ [fu-berlin.de] was my primary source for that paper. Read that if yo
Re:What makes a good cache? (Score:3, Informative)
Re:What makes a good cache? (Score:2, Informative)
Well, no, it's just that neural nets have to use a carefully calculated and parameterized error-propagating algorithm, as well as an appropriate architecture and size of network, and if you choose the wrong one, you're basically hosed. Genetic algorithms (and genetic programming) have learning algorithms that are much simpler to understand and implement -- even though there's a lot of argumen
Re:What makes a good cache? (Score:2)
The reason that neural nets are not in common usage is because they must be specifically designed for the problem they are going to solve. That design work is difficult & requires expertise. Neural nets are not suited to every type of problem. But no, CS researchers have not given up on neural nets.
Re:What makes a good cache? (Score:3, Insightful)
What you get out of the breeder shouldn't be what you put into production. Assuming you can still read the code, you're still going to want to put in common-sense improvements that the GA's didn't discover.
Could this be the long awaited answer to harmonizing Creation and Evolution :-)
This seems a little...selfish (Score:2, Insightful)
Re:This seems a little...selfish (Score:5, Interesting)
"Priming the cache" and "doing a breadth-first fetch on a page" are both things that create *more* network traffic on the off-chance that it might save some number of microseconds for the user.
More traffic isn't necessarily bad. While what you say is true, you fail to note that user-initiated traffic is done in bursts. Just like your CPU is idle >95% of the time, so is your network connection. So all users benefit, both in real and perceived performance, when there is a steady 100% utilization.
So everyone just grabs what they can get and everyone is worse off.
Again true, but naive. What would be better is if there were a mechanism to prioritize the pre-fetch cache. Every page has one link that is pretty much the most likely next page. Then a secondary link, and so on. Ideally, a site owner should be able to put that priority list somewhere in the page such that a user agent can start getting it after the current page has loaded and is being read. Otherwise, maybe the user agent can favor new content (i.e., compare this load of Slashdot with the one done five minutes ago and grab the links in the diff). That's a far cry from a mad grab, and would probably benefit all parties involved.
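A hand-wavy sketch of that "grab the links in the diff" idea, assuming you keep the previous load's link set around somewhere; the link extraction and error handling are deliberately minimal:

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    import urllib.request

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = set()

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.add(value)

    def prefetch_new_links(url, previous_links):
        page = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        collector = LinkCollector()
        collector.feed(page)
        for link in collector.links - previous_links:  # only what changed
            try:
                urllib.request.urlopen(urljoin(url, link), timeout=10).read()
            except OSError:
                pass
        return collector.links  # becomes previous_links for the next run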
Re:This seems a little...selfish (Score:2)
Re:This seems a little...selfish (Score:3, Interesting)
There's the link tag for html headers, which specifies the next page, previous page, and things like that.
Ah, I knew it was too good an idea to be all my own! For those not aware of how to use the link element for that, I found this page [w3.org] as well as this one [w3.org]. I don't know how widely it is used by site designers, and I can't say I know of any web browsers that use it, but I would say it is definitely something a proxy server should be taking advantage of. Its use is even stated directly in the second
Re:What makes a good cache? (Score:3, Interesting)
Except that wasn't what the poster was getting at - he was noting that the trials provided to the GA were essentially random, and didn't necessarily reflect what the cache is likely to see IRL.
I suspect, however, that this is an error in the NS article, and that the data sets *were* tuned
Ok.. (Score:2)
So the ISPs will have to upgrade their soft/hardware to make this work?
Will that be worth it?
In my opinion the article was a bit light on details.
Re:Ok.. (Score:2)
Sounds like it could save an ISP a ~lot~ of money.
Algorithms (Score:3, Funny)
Re:Algorithms (Score:5, Funny)
That would be the "suicide algorithm". As the server goes up in flames from the Slashdot-effect it brought upon itself, it would become the first cyber recipient of a Darwin Award.
Alan.
Business model (Score:2)
2. Scan new story for links.
3. Cash[sic] those pages.
4. PROFIT!
what about worm traffic? (Score:4, Interesting)
That alone would save me quite a bit of traffic as people on my subnet hit me constantly with their infected machines.
66.41.161.120 hit my machine 57 different times (that isn't individual requests, that's total times).
Re:what about worm traffic? (Score:2)
LRU Rules (Score:5, Informative)
For the uninitiated, elements are added to an LRU cache until it fills up; thereafter, whenever a new element is added, space is made for it by throwing away the least-recently used one. Note, least recently used, not the least recently added, i.e. the oldest, since an element that was cached long ago may be used all the time, and so be well worth its place in the cache. For example, consider the company-logo image that your browser caches when you visit a new site and that is embedded in every page on that site. However old it gets, it's likely to continue to be used while you're on the site. As soon as you move to another site, it gradually shuffles its way down the deck until it falls off the bottom - which is precisely what you want.
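For reference, the whole policy fits in a few lines of Python; this is just the textbook version of the eviction rule described above, not anything from the study:

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()

        def get(self, key):
            if key not in self.entries:
                return None
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]

        def put(self, key, value):
            if key in self.entries:
                self.entries.move_to_end(key)
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # drop the least recently used

The company-logo example works exactly because every hit calls move_to_end(), so the logo never drifts to the cold end of the list while you're still on the site.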
Re:LRU Rules (Score:4, Informative)
Re:LRU Rules (Score:5, Informative)
Thanks for the pointer. Here's a link to some background [ibm.com] on ARC, and a paper [ibm.com] describing the algorithm. Looks like an interesting algorithm.
Re:LRU Rules (Score:3, Interesting)
For small cache sizes with one or a few fairly similar users, this is true. For larger cache sizes with a large number of users, it's almost always better to expire based on both LRU (Least Recently Used) and LFR (Least Frequently Requested). You may want to cache all those sucky Christmas links until next year
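One way to sketch that combined policy is to score each entry on both how long ago it was last requested and how often; the weighting below is entirely made up and would need tuning against a real workload:

    import time

    class HybridCache:
        def __init__(self, capacity, recency_weight=0.5):
            self.capacity = capacity
            self.w = recency_weight
            self.store = {}  # key -> (value, last_access, hit_count)

        def get(self, key):
            if key not in self.store:
                return None
            value, _, hits = self.store[key]
            self.store[key] = (value, time.time(), hits + 1)
            return value

        def put(self, key, value):
            if key not in self.store and len(self.store) >= self.capacity:
                now = time.time()
                # Evict the entry that is both stale and rarely requested.
                def badness(k):
                    _, last, hits = self.store[k]
                    return self.w * (now - last) - (1 - self.w) * hits
                self.store.pop(max(self.store, key=badness))
            self.store[key] = (value, time.time(), 1)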
Not DNS (Score:4, Interesting)
GroupShares Inc. [groupshares.com]
Re:Not DNS (Score:3, Informative)
Re:Not DNS (Score:2)
Re:Not DNS (Score:2)
If your zones are taking too long to propagate, maybe you need to Read The Fine Manual [isc.org] ... especially the parts around Refresh and Time to Live
Great (Score:2, Funny)
bittorrent tie in? (Score:4, Interesting)
just something I'm thinking about today, well, that and the Kerry/Edwards pairing.
Re:bittorrent tie in? (Score:2)
However, in practice, I'm not sure how useful something like this would be. Most websites today have so much dynamic info that it simply would not work. For example,
This would be c
Re:bittorrent tie in? (Score:2)
I'm thinking of almost a distcc way of handling this, where the server can dole out tasks to any/many other servers that it finds that are 'willing' to help out.
Re:bittorrent tie in? (Score:2)
IMO peer-to-peer caching/mirroring would work for big chunks of data where the overhead of finding a cache (guesstimate a few to a few tens of round-trips among peers and the original site) would be insignificant compared to the actual download time. For short pages it would be better just to return the page as opposed to the current list of mirrors.
On the other hand if the page contained (in the HTTP metadat
Next Gen Networking? (Score:5, Interesting)
He [Pablo Funes] suggests networks might in the future be designed to work out who deserves the most help for themselves. "Sophisticated network behaviours might implement rules for reciprocity and trust," he says. "And conversely, for not cooperating with others who try to abuse our resources."
The future of network security? Imagine the next computer virus outbreak: every network in the world could recognize the virus-like activity and allocate it fewer or zero resources, maybe sending a "Virus detected, please run antivirus software or contact your IT Department" notice, while also detecting outside attacks from viruses, automatically flagging them as unsafe, and not giving much (or any) attention to traffic from or to that site.
Re:Next Gen Networking? (Score:2)
Could also work if it communicated "$IP being DDOSed, please trace back and isolate those who are doing this"?
Basically, I think we really should be thinking about how to build an immune system for a network and what minimal behaviour from the nodes would achieve it.
Evolutionary Computing (Score:2)
the details of their method (Score:5, Interesting)
Re:the details of their method (Score:2)
Al Gore is a great contributor (Score:4, Funny)
Once again, Al Gore has his hand in the shaping of the internet.
I'm sure everyone in the Slashdot community will miss him - even if you didn't enjoy his work, there's no denying his contributions to popular culture. Truly an American icon.
A note on hill-climbing (Score:5, Informative)
If the search space is not dominated by local maxima, basic hill climbing (go for the best neighboring value) will work. And it will be fast. If the function is differentiable, it can be orders of magnitude faster than other methods, because you can use a variant of Newton's Method.
If the search space is small, random search (just guessing) will work by exhaustively searching the space. This is obvious, but tends to be ignored in academic papers all too often.
This discussion also applies to neural nets and simulated annealing.
Now this article at least describes a problem for which a GA might actually be useful. Many such articles don't. But they haven't demonstrated that you need a bumpy hill-climbing algorithm.
This is why, despite all the hype, GAs, neural nets, and such aren't used all that much. The search space has to have the right properties. Not too small, not too big, bumpy, but not too bumpy.
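To make the distinction concrete, here are toy versions of the two baselines discussed above; the bit-flip objective is a placeholder, and a real comparison would plug in cache-simulator scores instead:

    import random

    def objective(bits):
        return sum(bits)  # toy fitness: count of 1 bits

    def hill_climb(n_bits, steps=1000):
        current = [random.randint(0, 1) for _ in range(n_bits)]
        for _ in range(steps):
            neighbour = current[:]
            neighbour[random.randrange(n_bits)] ^= 1  # flip one bit
            if objective(neighbour) >= objective(current):
                current = neighbour  # move uphill (or sideways)
        return current

    def random_search(n_bits, steps=1000):
        best = [random.randint(0, 1) for _ in range(n_bits)]
        for _ in range(steps):
            candidate = [random.randint(0, 1) for _ in range(n_bits)]
            if objective(candidate) > objective(best):
                best = candidate
        return best

On a smooth objective the hill climber wins easily, and on a tiny search space random search is fine; a GA only earns its keep somewhere in between.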
Re:A note on hill-climbing (Score:2, Informative)
A large part of the reason GAs aren't catching on more is that there aren't enough really good text books on them. All of the really useful information is still in the form of research papers, these things take time. Also,
Legos and Tron mentioned in his thesis (Score:3, Informative)
There is even a link to an online Tron game [brandeis.edu] where we humans can play versus his evolving algorithms. The win/loss stats [brandeis.edu] for his algorithm are approaching even. Given that humans can also evolve in the Tron game play, I imagine that the algorithm will have a head start over the new influx of Slashdot visitors and start to win more often than not over the next week. I never got to play, though. The SQL db to manage the stats was already down when I tried.
This can't be legal (Score:2, Funny)
What happens if someone caches Metallica, or the new super hot J-Lo movie? Then how is the guy who brings the director a warm towel ever going to make his money?
You guys should really think before you go out and start making technology that blatantly abuses copyrights.
</sarcasm>
A new sorting algorithm also ... (Score:2, Informative)
with Critticall tool. [wikipedia.org]
Original site. [critticall.com]
50% of all bits (Score:3, Funny)
Security (Score:2)
so far so good but.. (Score:3, Insightful)
So what happens if you, for instance, have a security hole in one of your "smart" servers? Or even a breach in the protocol structure (DoS)?
You could get the server to "breed" algorithms that would stop all servers, by either corrupting the data in all servers that are providing the service or just DoS-ing them.
I guess I'm not yet convinced that this solution is of any good for a real-world network whose whole structure is based on insecure protocols.
Maybe after IPv6, IPv9, who knows?
Freenet (Score:3, Interesting)
Doesn't Freenet [freenetproject.org] already do this within its own network?
Not such a great idea (Score:2)
First of all, I would point out that genetic algorithms are most appropriate for pure optimization problems with minimal mandatory constraints. While loading speed is important for internet caching, it must be optimized inside certain constraints. A good cache s
Re:Sigh... (Score:2)
Re:Sigh... (Score:3, Informative)
This was on data where the distance the document had travelled was taken into account. So, given some available
Re:*sigh* (Score:5, Informative)
The evolutionary algorithm will have a range of all possible algorithms that can be developed, so in a sense it is limited to "test various algorithms", though it would be testing all possible algorithms. Similarly, biological evolution is limited to testing various imperfect self-replicators, meaning all possible imperfect self-replicators. It is further constrained by the current state, but then that's the problem with non-biological GAs as well, the King of the Hill effect.
Skips too far forward (Score:3, Informative)
Evolution was a well known phenomenon (e.g. from the geological record) long before the work of Mendel was known. Specifically, the rather well known Darwin had no c
Re:Skips too far forward (Score:2)
"In fact, evolution can be precisely defined as any change in the frequency of alleles within a gene pool from one generation to the next."
- Helena Curtis and N. Sue Barnes, Biology, 5th ed. 1989 Worth Publishers, p.974
Overly simplistic (Score:2)
Yes, you *can* define evolution that way (to be precise, one group of biologists, the population geneticists, do define evolution that way). However, that's an entirely different statement than saying that's the *only* valid definition of evolution, which seems to be what you're implying.
As a molecular evolutionist, I find alleles to be too high level -- I deal with t
Re:Overly simplistic (Score:2)
In the end, though, you can show that the molecular evolutionist and the zoologist's definitions are equivalent to the population geneticist definition, and vice ve
Re:*sigh* (Score:3, Informative)
That is one of the facts that it explains,
Re:*sigh* (Score:2)
My argument is that "micro-evolution" has not been shown to be the mechanism by which "macro-evolution" can occur. The results of environmental adaption and genetic mutations are well known, but no one h
Re:*sigh* (Score:2)
For example, if humans lack the genome for an exoskeleton, and exoskeletons are "nature's selection" for space survivability, then the "micro-evolutionary theory" says that humans who live in space should eventually experience genetic mutations from which exoskeleton code will appear.
"All too often creationists spend their time arguing with a straw-man caricature of evolution."
Well put. [talkorigins.org]
Re:*sigh* (Score:2)
Define micro-evolution and macro-evolution and a set of criteria that a person could use to determine which of the two an adaptive event falls into.
For example, if humans lack the genome for an exoskeleton, and exoskeletons are "nature's selection" for space survivability, then the "micro-evolutionary theory" says that humans who live in space should eventually experience genetic mutations f
Re:*sigh* (Score:3, Interesting)
No, the current theories state that it is an unnoticeable change over a long period of time. On a higher level, evolution simply states that major changes occurred over a long period of time. It also states that those changes occurred in an "organic vacuum". e.g. Lungs developed where no lungs had ever existed before.
Evolution can be understood as
Re:*sigh* (Score:2)
Nothing "evolves" through the process, merely "adapts".
Nope. This is one and the same thing. E.g. there is plenty of molecular evidence that the blood clotting proteins forked off of pancreatic digestive enzymes by means of gene duplication and mutation. In embryonic development, some muscle cells are chemically activated to turn into bone cells. Archaeopteryx [talkorigins.org] is still a tell-tale example of a gradual transition between bipedal dinosaurs and birds.
Similarly, no code existed in nature for "micro-evolut
Re:*sigh* (Score:4, Informative)
Re:*sigh* (Score:3, Informative)
You forgot two. (Score:2)
Natural selection is one of the methods (the other being sexual selection) by which all evolution (macro and micro) proceeds.
Mutation and genetic drift [talkorigins.org]. Also, the term natural selection as Darwin used it included sexual selection (besides ecological selection). The term is used to distinguish these kinds of naturally occurring selection from intentional breeding, which is called artificial selection.
Re:You forgot two. (Score:2)
Re:*sigh* (Score:2)
One of the problems with creationists is that they have a hard time conceiving of the incredibly vast amount of time that has passed which makes macroevolution possible.
I don't know where you
Re:*sigh* (Score:2)
No, they are theorized to be related. Macro level changes require a mechanism more sophisticated than that of "micro-evolution". "Micro-evolution" requires an existing genome from which to pull, and only allows for a small degree of genetic mutation. As I replied to another poster, Sickle-cell anemia is a perfect example of a mutation that "helps" by resisting malaria, but it has the side effect of an early death for th
Re:*sigh* (Score:2)
That is a nonsensical statement. Breeding is fundamental to evolution, whether it be sexual reproduction or asexual splitting. Of course, breeding itself isn't evolution, but it is the process through which evolution occurs.
You seem to assume that the genes are unitary, discrete entities - that evolution occurs when (to use your example) a giraffe suddenly gets the correct genes to develop gills. Genetics is mu
Re:*sigh* (Score:2)
Breeding is fundamental to life. Life is fundamental to evolution. Evolution tries to describe how organic soup could form simple organisms that become progressively more complex.
You seem to assume that the genes are unitary, discrete entities - that evolution occurs when (to use your example) a giraffe suddenly gets the correct genes to develop gills. Genetics is much more complex than this.
Genetics is certa
Re:*sigh* (Score:2, Insightful)
Nature makes no distinction between your so-called macro- and micro-evolution. Rather, there are two separate processes at work that combine to produce evolution.
1. Survival of the fittest (what you call environmental adaptation) starts with a population with a mixture of genes and states that those best able to reproduce in the current environment will probably make up a larger percentage of the next generation.
2. Genetic crossover during
Re:*sigh* (Score:2)
Re:*sigh* (Score:2)
Why? Just because you give an example of where this is true (Sickle-cell anaemia), doesn't mean it is generally true. Why can't a random mutation be beneficial without bad side-effects? A random mutation can surely be harmful without good side-effects. Why not the reverse? It isn't a zero-sum game. Sure, the kind of mutation that is solely beneficial may be exceedingly rare, but that isn't proof that
Re:*sigh* (Score:3, Interesting)
I have made up my mind. The parent of my post made a statement that was factually correct: SCA does not "prove" that positive mutations can't happen. I then go on to show that everything from SCA to genetic manipulation has so far shown us that genetics is a zero sum game. It's perfect
Re:*sigh* (Score:4, Insightful)
Most computer scientists loosely use the term "evolutionary programming" to talk about algos that have inherently unpredictable ("emergent") results rather than modelling actual evolutionary processes observed in nature, although that's also a fair part of it all. (Incidentally, I also believe the "Science" topic assigned to this story is wrong for this reason.)
The meta-algo is evolutionary programming in that the algo finally "developed" by the meta-algo is apparently a result that isn't immediately apparent; indeed, one that is perhaps unpredictable by humans.
Re:*sigh* (Score:2)
That being said, it can be a great tool for auto-adjusting software to its environment. If it works out, it may
Re:*sigh* (Score:3, Insightful)
Rubbish. There's nothing sudden implied.
Re:*sigh* (Score:2)
Re:*sigh* (Score:2)
Re:*sigh* (Score:4, Interesting)
GA systems both propagate "good" information and generate and test new information incrementally across generations. Of course, this is done within whatever limitations you choose to impose on the solution set (i.e. if you're using a fixed N-bit string to represent entities in the solution space, then you'll search across all 2^N possible solutions in the N-bit string space).
Fundamentally, GAs are just a way of searching a solution space for good solutions. They are slow, but can tackle a more diverse set of problems more easily than the traditional backpropagation-based neural network. I would consider neural networks to be more in line with your description of an 'adapting' learning algorithm.
The following analogy can only be taken so far, but you can compare backpropagation-based neural networks to hill-climbing algorithms - you start from one solution and move some limited distance towards what you think is a better solution. For a neural network with N vertices, you can think of neural network learning as hill-climbing in an N-dimensional space over whatever field is used to specify the edge weights.
GAs, on the other hand, do a much more disparate search over any solution space. And it is surprising how well they find near-optimal solutions even for problems where the relation between solutions is complex and non-intuitive (e.g. minimums for seemingly chaotic functions).
-Laxitive
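A bare-bones GA over fixed-length bit strings, just to illustrate the N-bit representation described above; selection, crossover, and mutation are the simplest possible choices, and the fitness function is a stand-in, not the researchers' cache benchmark:

    import random

    def evolve(fitness, n_bits=32, pop_size=50, generations=100, mutation_rate=0.01):
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]  # truncation selection
            children = []
            while len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, n_bits)  # single-point crossover
                child = a[:cut] + b[cut:]
                for i in range(n_bits):            # per-bit mutation
                    if random.random() < mutation_rate:
                        child[i] ^= 1
                children.append(child)
            pop = children
        return max(pop, key=fitness)

    # e.g. evolve(sum) climbs toward the all-ones string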
Re:*sigh* (Score:2)
Re:*sigh* (Score:2)
Consider a GA system that operated on bitstrings that are considered inputs to a universal Turing machine. Then, the information created is an algorithm.
True, when designing a GA system to do that, you might have problems in creating test cases for evaluating solution fitness, but that doesn't detract from the point.
There has been research done on using GAs on lisp-style expressions, using subexpression substitution instead of crossovers, to generate use
Re:*sigh* (Score:2)
As I understand it - from my reading of Climbing Mount Improbable by Dawkins - you've just described evolution. Perhaps you're using a different, far more strict definition of the word?
My statement is that "macro-evolution" is the basic concept of evolution. i.e. A single celled organism
Re:I've a question (Score:2)
Please, people, before hitting submit, think at least a second about what you are saying.
"I have this hammer here. It's really handy for hitting nails."
"Oh cool, I have this piece of two-by-four here. Can I cut it in half with that?"
(If you did have some kind of point, please elaborate.)
not intelligent design (Score:2)
No, if it were truly following the "intelligent design" theory, there wouldn't be any trial and error to begin with. The code would be written once, from the ground up, and