Math

New Algorithm Provides Huge Speedups For Optimization Problems (mit.edu) 129

An anonymous reader writes: MIT graduate students have developed a new "cutting-plane" algorithm, a general-purpose method for solving optimization problems. They've also developed a new way to apply their algorithm to specific problems, yielding orders-of-magnitude efficiency gains. Optimization problems seek the best set of values for a group of disparate parameters. For example, the cost function for designing a new smartphone would reward battery life, speed, and durability while penalizing thickness, cost, and overheating. Finding the optimal arrangement of values is a difficult problem, but the new algorithm shaves a significant number of operations (PDF) off those calculations. Satoru Iwata, professor of mathematical informatics at the University of Tokyo, said, "This is indeed an astonishing paper. For this problem, the running time bounds derived with the aid of discrete geometry and combinatorial techniques are by far better than what I could imagine."
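Not from the article, but as a toy illustration of the kind of cost function the summary describes, here is a minimal Python sketch. The attribute names, weights, and candidate designs are invented for the example; a real design problem would have far more variables and constraints, and the point of the new algorithm is precisely to avoid brute-force enumeration like this.

# Toy weighted cost function for the smartphone example in the summary.
# Everything here (attributes, weights, candidates) is made up for illustration.
def phone_cost(design):
    reward = (3.0 * design["battery_hours"]
              + 2.0 * design["speed_ghz"]
              + 1.5 * design["durability_score"])
    penalty = (4.0 * design["thickness_mm"]
               + 0.01 * design["price_usd"]
               + 2.0 * design["peak_temp_c"])
    return penalty - reward  # lower is better

candidates = [
    {"battery_hours": 18, "speed_ghz": 2.4, "durability_score": 7,
     "thickness_mm": 7.9, "price_usd": 699, "peak_temp_c": 41},
    {"battery_hours": 24, "speed_ghz": 2.0, "durability_score": 8,
     "thickness_mm": 9.1, "price_usd": 649, "peak_temp_c": 38},
]
print(min(candidates, key=phone_cost))  # brute-force pick of the cheaper design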
Comments Filter:
  • by MobileTatsu-NJG ( 946591 ) on Saturday October 24, 2015 @11:18PM (#50795881)

    Satoru Iwata, professor of mathematical informatics at the University of Tokyo, said, "This is indeed an astonishing paper.

    I'm assuming this isn't the same Satoru Iwata of Nintendo fame... who sadly passed away earlier this year. :/

  • by pushing-robot ( 1037830 ) on Saturday October 24, 2015 @11:19PM (#50795883)

    And as it turns out, P=NP. Stay tuned for more exciting developments!

    • If you read the press release from MIT, this discovery was about problems squarely inside P. Namely, they were (for a certain class of problems in P) able to reduce the complexity from N^5 or N^6 down to N^2 or N^3. So there would seem to be no implications for the P=NP problem.

  • Bad news for them (Score:4, Interesting)

    by goombah99 ( 560566 ) on Saturday October 24, 2015 @11:29PM (#50795909)

    I guess they never heard of the "No Free Lunch" theorem for optimization, which, believe it or not, is the name of a theorem proven by David Wolpert that says the following is rigorously true.
    Averaged over all optimization problems, every possible algorithm (that does not repeat a previous move) takes exactly the same amount of time to find the global minimum.
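    For reference, the Wolpert-Macready form of the statement, as I recall it (a sketch of their notation, not a quote): for any two search algorithms a_1 and a_2 and any number m of distinct cost evaluations,

        \sum_{f} P(d^{y}_{m} \mid f, m, a_1) \;=\; \sum_{f} P(d^{y}_{m} \mid f, m, a_2)

    where the sum runs over all objective functions f on a finite search space and d^{y}_{m} is the sequence of cost values observed after m distinct points. Averaged over every possible f, no algorithm outperforms any other.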

    The corollary of this is: if you show your algorithm outperforms other algorithms on a test set, then you have just proven it performs slower on all other problems.

    That is, the better it works on a finite class of problems, the worse it is on most problems.

    It is even true that a hill-climbing algorithm blunders into the minimum in the same average time as a hill-descending algorithm. (Yes, that is true!)

    Look it up. It's the bane of people who don't understand optimization.

    The only things that distinguish one optimization algorithm from another are:
    1) whether your problems are restricted to a subspace with special properties that your algorithm takes advantage of, or
    2) whether your performance metric is something other than the average time it takes to find the global minimum, for example something that limits the worst-case time to get within epsilon of the minimum.

    • Re:Bad news for them (Score:5, Interesting)

      by goombah99 ( 560566 ) on Saturday October 24, 2015 @11:39PM (#50795937)

      To be fair, the actual article doesn't claim to violate the no-free-lunch theorem, because it only applies to "some" problems. It's the headline that violates the no-free-lunch theorem. But one aspect of the no-free-lunch theorem that usually proves insanely diabolical is that it's usually VERY hard to define, in general, which problems an algorithm will do best on. Often you can see a specific set for which it is obvious, but as you complicate things the obviousness goes out the window. For example, if you know your surface is convex and has a very simple shape, like a multidimensional parabola, then it's trivial to hill-descend to the bottom. But once you start putting in multiple minima and arbitrary discontinuities, it becomes hard. Interestingly, in high-dimensional spaces knowing something is smooth isn't very useful, as smoothness only removes a small number of degrees of freedom, leaving behind a smaller but still high-dimensional space.

      Thus, if there is a breakthrough here, it's not the algorithm; it would be the ability to specify what kinds of problems this will work well on!

      • Wow. I've solved optimization problems, both at school and at work, but your comment shows me how much I don't know about the topic. Fascinating.
      • by KGIII ( 973947 )

        Indeed. I've given it a bit of a scan at this point (posted up-thread) and it appears, at first glance, that the best way to describe it (to this crowd) is optimized sub-routines in a program. It looks like optimization done where the additional data is being inserted, calculated, and run. It's sort of like, I guess, object-oriented programming (for lack of a more accurate description). I can see this coming in handy in traffic modeling and will give it a good, more thorough, read later to see what I can

      • The fact that people obviously versed in the art are commenting on Slashdot, rather than excitedly hunching over the algorithm and putting it into whatever they work on, is a telltale sign that it's not as revolutionary as the gushing summary states.

      • This article is the real deal, as they do address which types of optimization problems are covered and give bounds on the speedups over previous techniques. In addition, cutting-plane techniques have proven very successful at solving integer programming problems and have yielded good/optimal solutions to large NP-hard problems such as traveling salesman.

        As for the NFL theorem, it's not very surprising. By looking over all optimization problems you are considering an enormous set of functions. The "average" funct

    • Averaged over all optimization problems [...]

      ...most of which have optimisation functions which are incompressible and hence do not fit on current computers...

      [...] every possible algorithm (that does not repeat a previous move) takes exactly the same amount of time to find the global minimum.

      The NFL theorem relies on all problems being equally likely. In practice, there are many known classes of problem where known algorithms do close-enough-to-optimal on problems which turn up in practice, and also can t

      • The proof of correctness of the Metropolis-Hastings algorithm (to pick one example) relies on the fact that the transition probability for its proposals is symmetric (that is, Q(x|y) = Q(y|x)).

        That's the Metropolis-Hastings algorithm in chapter 1. The one in the rest of the book doesn't assume that. However, it does assume that the probability is non-zero.
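        To make the distinction concrete, here is a minimal Metropolis-Hastings sketch (the names, the toy target, and the proposal are made up for illustration, not taken from any book): the acceptance ratio carries a proposal-density correction Q(x|y)/Q(y|x), and for a symmetric proposal that correction cancels, leaving the plain Metropolis rule.

        import math
        import random

        def metropolis_hastings(log_target, propose, log_q, x0, n_samples):
            """Sample from a density proportional to exp(log_target(x))."""
            x = x0
            chain = []
            for _ in range(n_samples):
                y = propose(x)
                # Hastings ratio: [target(y) * Q(x|y)] / [target(x) * Q(y|x)].
                # With a symmetric proposal, Q(x|y) == Q(y|x) and the correction vanishes.
                log_alpha = (log_target(y) - log_target(x)) + (log_q(x, y) - log_q(y, x))
                if random.random() < math.exp(min(0.0, log_alpha)):
                    x = y
                chain.append(x)
            return chain

        # Example: standard normal target, symmetric Gaussian random-walk proposal.
        SIGMA = 0.5
        chain = metropolis_hastings(
            log_target=lambda x: -0.5 * x * x,
            propose=lambda x: x + random.gauss(0.0, SIGMA),
            log_q=lambda x_to, x_from: -0.5 * ((x_to - x_from) / SIGMA) ** 2,  # symmetric
            x0=0.0,
            n_samples=5000,
        )
        print(sum(chain) / len(chain))  # should be near 0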

        • This "non-repeat" issue is a red herring. The proviso on "not repeat" is not meant to exclude reversable trajectories. It simply means that going back to an earlier state will be a free-pass in terms of counting moves. We won't add that to your count. So this has nothing to do with the applicability of the no-free-lunch theorem

      • by goombah99 ( 560566 ) on Sunday October 25, 2015 @12:19AM (#50796071)

        I totally agree with you. I did indeed leave out the issue of incompressible optimization functions and their infeasibility in memory. More generally, I also believe that the NFL theorem does not imply despair, but rather that the problems that interest humans are a tiny subspace of all possible problems. On this subspace, better average performance is entirely plausible. The rub is that to make that claim you have to be able to specify what the subspace of interest to humans is. I think you might be able to define that in terms of algorithmic complexity, but that alone doesn't tell you whether your algorithm must be better than another one on all problems of a specified maximum complexity. Thus, any time someone says their algorithms outperform without telling us on what subspace they achieve this, there is no proof of superior performance. Since the paper actually does make this claim, it's confusing to sort out how they are limiting the problem. Likely this is my own failure to understand their mathematical definitions.

        To be more complete: the NFL theorem also ignores the computational cost of deciding the next move in the search, so really the NFL is counting the average number of guesses required, not the time.

        Your statement that NFL relies on all problems being equally likely is very close to equivalent to my statement that the subspace of problems of interest to humans is smaller than the space of all problems. So I think we're in agreement.

        The NFL does not imply optimization is hopeless. It just says a claim of improved speed is dubious until you tell us what subspace of problems you improved it on. This is what I stated to begin with: if an algorithm is faster on some problems, then it's slower on most other problems.

    • by Twinbee ( 767046 )
      Well then why have optimization algorithms at all? Why not just use brute force and be done with it?

      As you know, patterns in the real world have structure to them: dips and valleys, characteristic shapes, and repeating fractal elements. I think what you don't appreciate enough is that there are classes of optimization algorithms (e.g. genetic algorithms or neural networks) which have a very GENERIC way of working with MANY kinds of search spaces, some of which one can't even imagine.

      What you say about
      • by goombah99 ( 560566 ) on Sunday October 25, 2015 @12:29AM (#50796091)

        Right. I don't disagree. The subtlety of the NFL theorem is not in its crass conclusion that brute force is just as good as anything else; it's in the realization that algorithms only work well on subspaces, and that the real trick is not in creating the algorithm but in being able to say what subspace it works well on.

        To see that, consider the following thought experiment. Imagine we divide the space of all problems into many subspaces, and on each of these subspaces we have a very fast algorithm. Surely this defeats the NFL theorem? No. The time it takes to decide which subspace solver to use, plus the time it takes to use it, would be the same as brute force.

        Thus, any time one says an algorithm is better, if one can't say when it's better, one hasn't quite solved the issue.

        The escape clause is that it's very likely the class of problems humans care about is a very small subspace of all problems. But defining what that means is hard.

        • by ledow ( 319597 )

          It's naive to believe that such a thing is universal to an entire problem set, and is as solid as colloquially suggested.

          But, more than that, optimisations like this often work by transforming one type of problem into a related, but different, equivalent problem. In doing that you are indeed shifting problems between problem spaces and, thus, between algorithms of different optimisation. Though there is some conversion involved, it is by no means guaranteed that such conversion is just as hard as solving

    • by KGIII ( 973947 )

      Marshal and Hinton did some work on this. It's a good read. I'll find it for you.

      http://arxiv.org/pdf/0907.1597... [arxiv.org]

      An interesting quote in 1.1 would be:

      Subsequently it was shown that the No Free Lunch theorem only holds if the set of objective functions under consideration is closed under permutation (c.u.p.).

      It's worth a read. IIRC, it was cited later by Hinkley and further work has been done since then but, well, I'm kind of lazy at the moment. Either way, it's worth looking at if you're curious and have been out of academia for a while.
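      For anyone skimming: as I understand it, a set F of objective functions on a finite search space X is "closed under permutation" when permuting the inputs of any member keeps you inside the set, i.e.

          f \in F \text{ and } \sigma : X \to X \text{ a bijection} \;\Longrightarrow\; f \circ \sigma \in F.

      Problem classes people actually care about (smooth, convex, low-complexity objectives, and so on) are generally not closed under permutation, which is why that sharpened version of the theorem doesn't doom them.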

      • Thanks. I've read that. I think if you look at some of my other responses above, you will see that I agree that c.u.p. problems are not of interest to humans. The real issue is not that. The issue is defining which algorithms are useful for which kinds of subsets of problems (each of which is non-c.u.p., but in a different way). That is, unless you can say (clearly) what your algorithm is good for, then all you are doing is fracturing the space of all problems into subsets, but you still don't know w

        • by KGIII ( 973947 )

          Oh, I agree entirely. I just thought you might find it interesting if you'd never read it. I was kind of fascinated by their process, but their outcome was as anticipated: they didn't conclude anything "new" or "revolutionary," really. Also, to be fair, I'm a human (sort of) and this is of interest to me, albeit reflectively. My company modeled traffic: vehicular, and then increasingly pedestrian as well. (To give a small hint: I sold in 2007 and, obviously, retired. If you know the market then you can

          • The coast of the United Kingdom is exactly 43 long, but I'm still working out what undiscovered units that is in. :-)

    • Look at the big brain on Brad.

      Anyway, this has nothing to do with no free lunch... read the goddamn article if you want to know what the authors are actually saying. http://arxiv.org/abs/1508.0487... [arxiv.org]

      It's an asymptotic speedup for certain classes of problems.

    • By the title of the paper -- A Faster Cutting Plane Method and its Implications for Combinatorial and Convex Optimization -- I would say that they are not trying to provide a truly general optimization algorithm, but one that is specific to combinatorial and convex optimization. Hence, the NFL theorem is not violated.

      TFA headline gives the impression that they would be talking about a truly general algorithm, but this is actually a manifestation of the "No Competent Editor" theorem.

    • I guess they never heard of the "No Free Lunch" theorem for optimization

      What makes you say that?

      Averaged over all optimization problems every possible algorithm (that does not repeat a previous move) takes exactly the same amount of time to find the global minimum. The corollary of this is: if you show your algorithm outperforms other algorithms on a test set, then you have just proven it performs slower on all other problems.

      The only things that distinguishes one optimization algorithm from another i

    • The title of their paper says it's a convex optimization algorithm.

    • ... 1) if your problems are restricted to a subspace with special properties that your algorithm takes advantage of ....

      Any Engineer knows that -all- problems are of form 1. Solving things for all possible cases is a Mathematician's fantasy! 8-)

  • by Dracos ( 107777 ) on Saturday October 24, 2015 @11:44PM (#50795953)

    Thickness is not an attribute to penalize. Thicker phones are sturdier (less prone to bending, among other things), easier to grip (more surface area on the edges, where you grip it while talking), and feel more substantial. Phone cases offer protection for the device and mitigate all these shortcomings.

    Looking back, Zoolander only got one dimension right when it comes to phone size. The joke now would be a phone that is the size of a sheet of paper (and the 13.9" screen has 6204ppi).

    • My understanding is that there's a certain optimal thickness for a phone and any design element that forces the phone to be thicker makes it less desirable.
    • To a degree. No one wants to be carrying around a brick that they need a fanny pack to hold when it's not in use. Similarly, a credit-card-sized phone would also be less than useful. But if you have a thickness you feel is appropriate, then anything thicker would be undesirable.
  • by Art3x ( 973401 ) on Sunday October 25, 2015 @12:19AM (#50796067)

    I read the summary. I didn't understand it. Then I read the article. I didn't understand it either.

    • by Kwyj1b0 ( 2757125 ) on Sunday October 25, 2015 @01:11AM (#50796187)

      Consider minimizing x^2 when x can take values in -10 to 10 (we know the answer is 0, since we only consider real-valued numbers). If we wanted to solve this problem, there are several approaches; for example: randomly try a lot of different values, set the derivative to zero, or try a cutting-plane algorithm. In general, we might not know the analytical expression for the function we are trying to minimize (or it might be too complex), so we can't really find the derivative efficiently. Derivatives can also be computationally expensive, so let's ignore that approach.

      What we can do is say: let me find a solution for which the function is less than some threshold t, and then keep reducing t until I can't find any more solutions. This is what the article meant by finding a smaller circle inside a larger one (for each value of t, I try to find solutions that are smaller than t).

      What cutting planes do is chop up the original function into pieces. In some pieces, I know the value will be much larger than my threshold (so I don't have to search there); in others it might be smaller, so I focus my search on those pieces to find a value below my threshold (after which I can reduce the threshold and try again). This is what (in a simplistic sense) cutting-plane algorithms do: they chop up my search space.
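      Not the paper's method, but a minimal one-dimensional sketch of the "chop the search space" idea: for a convex function, querying the midpoint of the remaining interval and looking at the gradient sign tells you which half cannot contain the minimizer, so you cut it away. The function and names below are made up for illustration.

      def cutting_plane_1d(grad, lo, hi, tol=1e-8, max_iters=200):
          """Minimize a convex function on [lo, hi] by discarding half the interval per query."""
          for _ in range(max_iters):
              if hi - lo < tol:
                  break
              x = 0.5 * (lo + hi)   # query the center of what's left
              g = grad(x)
              if g > 0:             # function is increasing at x: the minimizer lies to the left
                  hi = x            # cut away [x, hi]
              elif g < 0:           # function is decreasing at x: the minimizer lies to the right
                  lo = x            # cut away [lo, x]
              else:
                  return x          # zero gradient: x is the minimizer
          return 0.5 * (lo + hi)

      # The running example from this comment: minimize x^2 on [-10, 10].
      print(cutting_plane_1d(lambda x: 2 * x, -10.0, 10.0))  # ~0.0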

      How we select the points for chopping is crucial. With bad choices (say, choosing one point after another at random, or trying points very close to each other), I spend a lot of time chopping unnecessarily (or not benefiting much from each chop). We also want to make sure our cuts really do divide the problem into pieces I can skip searching, and those discarded pieces should (ideally) be quite large. Until this work, the time taken to decide where to chop was n^3.373; they brought that down to n^3 (where n is the number of variables the function I am trying to minimize takes as inputs).

      They also said that for special classes of functions, they can really improve the total computation time significantly (from n^6 to n^2).

      I'm glossing over many issues (and am certain I've got some details wrong) to give a taste of the big-picture idea of cutting-plane approaches in general; there have been decades of work on these problems that you can read about (I recommend anything written by Prof. Stephen Boyd as an introduction to some of this research).

      • by Art3x ( 973401 )

        Thank you for trying to explain it to me. I guess I should say I don't understand how the theoretical meets the practical here. Then again, I'm just a web programmer.

        The theoretical. Sure, I read the article about a circle and trying to find the better, smaller circle within it, and I can imagine slicing into the circle to find it. No problem there.

        The practical. Also, I noticed how they started off with real-world problems like designing the thinnest, most durable smartphone with the longest battery life.

        B

  • by Kwyj1b0 ( 2757125 ) on Sunday October 25, 2015 @12:39AM (#50796113)

    Some very interesting results, but the two key contributions are (almost verbatim from the article):
    1) With the best general-purpose cutting-plane method, the time required to select each new point to test was proportional to the number of elements raised to the power 3.373. Sidford, Lee, and Wong get that down to 3.
    2) And in many of those cases (submodular minimization, submodular flow, matroid intersection, and semidefinite programming), they report dramatic improvements in efficiency, from running times that scale with the fifth or sixth power of the number of variables down to the second or third power.

    So this seems to be a better cutting plane approach that improves the cutting process by reducing the time to find the next test point (in general), and for certain structured problems (like SDPs) this approach reduces the computation time significantly.

    This does raise some interesting questions, such as: is there a limit to how fast you can (in a general problem, not using a specific problem's structure) find the next test point? Even if we don't know what algorithm gets us there, it would be useful to have a bound for future research.

    • From my reading of the English sections of their paper (someone had fun with LaTeX equations in that one), I see worst-case running times for problems where it is assumed an Oracle (a subroutine) can provide the answer to a certain question in constant time, or at least a low-complexity amount of time. The running times are given *with respect to an Oracle*, meaning that the running time of the Oracle is not considered in the worst-case running time.

      It is known that for certain specific kinds of problem

  • by Anonymous Coward

    Algorithms can be fast, or they can use a small amount of memory. Doesn't the bucketing used here require additional memory over other methods? In the end, is this just an optimization of speed at the cost of more memory?
    What use are the tables comparing different algorithms when only the computational complexity is given, but not the memory requirements?

  • One of the tricks that people play when they claim to have parallelized something or improved computational complexity is that they don't back up their claims with any real runtimes. They provide a theoretical evaluation, but they haven't actually shown that anything is actually FASTER in a practical way. Maybe it is, maybe it isn't. They haven't made a proper case for it. Maybe I missed it. I'm also not sure they actually coded it up and tried it. As a researcher who puts time and effort into actuall

  • All these algorithms are exponential for optimal solutions. (Well, unless P=NP, but that looks more and more unlikely all the time. And even if it were true, a high-exponent polynomial is not really better in practice anyway.) Hence _nobody_ uses these algorithms for optimal solutions in practice. What is used are approximation algorithms that provide a good solution, and there the quality measure is not "speed" but speed vs. result quality.

    This may be nice theoretical research, but the practical impact is rathe
