Education Programming Science

Ask Slashdot: Best Language To Learn For Scientific Computing? (465 comments)

New submitter longhunt writes "I just started my second year of grad school and I am working on a project that involves a computationally intensive data mining problem. I initially coded all of my routines in VBA because it 'was there'. They work, but run way too slow. I need to port to a faster language. I have acquired an older Xeon-based server and would like to be able to make use of all four CPU cores. I can load it with either Windows (XP) or Linux and am relatively comfortable with both. I did a fair amount of C and Octave programming as an undergrad. I also messed around with Fortran77 and several flavors of BASIC. Unfortunately, I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch. I need a language I can pick up in a few weeks so I can get back to my research. I am not a CS major, so I care more about the answer than the code itself. What language suggestions or tips can you give me?"
  • More details? (Score:4, Informative)

    by schneidafunk ( 795759 ) on Thursday October 17, 2013 @01:35PM (#45154853)

    Depending on your needs, R may be your best bet if it is statistical processing you are interested in.

  • What are you doing? (Score:4, Informative)

    by RichMan ( 8097 ) on Thursday October 17, 2013 @01:36PM (#45154865)

    What do you mean by scientific computing?

    Modelling: Hard-core finite element simulations or the like. Then C or Fortran, and you will be linking with the math libraries.
    Log Processing: For a lot of other work you will be parsing data logs and doing statistics. So Perl or Python, then Octave.
    Data Mining: Python or some other SQL front end.
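    For the log-processing case, a minimal Python sketch of "parse data logs and do statistics" using only the standard library might look like the following. The log format (`sensor=<name> value=<number>`) and the function name `summarize` are hypothetical, chosen for illustration.

```python
import re
import statistics

# Hypothetical log format, assumed for illustration:
#   "<timestamp> sensor=<name> value=<number>"
LINE_RE = re.compile(r"sensor=(?P<sensor>\w+)\s+value=(?P<value>-?\d+(?:\.\d+)?)")


def summarize(lines):
    """Group numeric readings by sensor and return (count, mean, stdev) per sensor."""
    readings = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that don't match the assumed format
        readings.setdefault(match["sensor"], []).append(float(match["value"]))

    summary = {}
    for sensor, values in readings.items():
        # statistics.stdev needs at least two data points; report 0.0 otherwise
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[sensor] = (len(values), statistics.mean(values), spread)
    return summary


if __name__ == "__main__":
    log = [
        "2013-10-17T13:36:00 sensor=a value=1.0",
        "2013-10-17T13:36:01 sensor=a value=3.0",
        "2013-10-17T13:36:02 sensor=b value=2.5",
        "garbage line",
    ]
    print(summarize(log))
```

    Once the summaries exist, they can be handed to Octave or a plotting tool for the next step.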

  • by rla3rd ( 596810 ) on Thursday October 17, 2013 @01:36PM (#45154867)
    Install these two and you'll be good to go:
    http://ipython.org/notebook.html [ipython.org]
    http://pandas.pydata.org/ [pydata.org]
  • Re:Python (Score:5, Informative)

    by Garridan ( 597129 ) on Thursday October 17, 2013 @01:37PM (#45154883)
    I use Sage. When Python isn't fast enough, I can essentially write in C with Cython. It's gloriously easy. Have some trivially parallelizable data mining? Just use the @parallel decorator. Sage comes with a slew of fast mathematical packages, so your toolbox is massive, and you can hook it all into your Cython code with minimal overhead.
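    Sage's @parallel decorator is Sage-specific. As a rough plain-Python analogue (a swapped-in technique, not Sage's API), the standard-library multiprocessing module can spread independent per-record work across the four cores of a machine like the submitter's Xeon. `score` is a hypothetical stand-in for real per-record work.

```python
import multiprocessing


def score(record):
    """Hypothetical stand-in for an expensive, independent per-record computation."""
    return sum(i * i for i in range(record))


def parallel_map(func, items, workers=4):
    """Apply func to every item using a pool of worker processes.

    "fork" is chosen where the platform offers it (e.g. Linux) so the sketch
    also works when pasted into an interactive session; otherwise the
    platform's default start method is used.
    """
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else None)
    with ctx.Pool(processes=workers) as pool:
        # map blocks until all results are back and keeps input order
        return pool.map(func, items)


if __name__ == "__main__":
    print(parallel_map(score, [10, 100, 1000, 10000]))
```

    Worker processes are used rather than threads because CPU-bound pure-Python code does not run in parallel across threads in CPython.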
  • Re:More details? (Score:5, Informative)

    by Bovius ( 1243040 ) on Thursday October 17, 2013 @01:41PM (#45154937)

    Second this. There are numerous languages out there that are tailor-made for specific kinds of problems. You didn't quite share enough to narrow down what kinds of problems you need to solve, but the R project is geared toward number crunching, albeit with a significant bent toward statistics and graphic display.

    http://www.r-project.org/ [r-project.org]

    If that's not pointed in the right direction, some other language might be. Alternatively, there are a lot of libraries out there for the more popular languages that could help with what you're doing. Heck, 12 years ago the Boost libraries for C++ were still in their infancy. It's difficult for me to imagine using that language without them now.

  • 2 paths (Score:4, Informative)

    by johnjaydk ( 584895 ) on Thursday October 17, 2013 @01:41PM (#45154941)

    If you can find anything that resembles a math library with the correct tools, then go with Python. NumPy is everyone's friend here.

    If you have to do the whole thing from scratch, then Fortran is the fastest platform. I can't say I've met anyone who enjoyed Fortran, but it's wicked fast.

  • R-language (Score:5, Informative)

    by biodata ( 1981610 ) on Thursday October 17, 2013 @01:42PM (#45154965)
    Most of the cutting-edge data mining I've seen is done using R (which acts as a scripting wrapper for the C or Fortran code that the fast analysis libraries are written in), or alternatively in Python. Some people swear by MATLAB if they have trained in it (so your Octave experience would come in handy there). Have a look at some discussions at places like kaggle.com to see what the competitive machine learning community uses (if that is what you mean by data mining).
  • Re:Python (Score:5, Informative)

    by rwa2 ( 4391 ) * on Thursday October 17, 2013 @01:47PM (#45155031) Homepage Journal

    Yes, I did my master's thesis using simpy [readthedocs.org] / scipy [scipy.org], integrated with lp_solve for the number crunching, all of which was a breeze to learn and use. It was amazing banging out a new recursive algorithm crawling a new object structure and just having it work the first time, without spending several precious cycles bugfixing syntax errors and chasing down obscure stack overflows.

    I used the psyco JIT compiler (unfortunately 32-bit only) to get ~100x boost in runtime performance (all from a single import statement, woo), which was fast enough for me... these days I think you can get similar boosts from running on PyPy [pypy.org]. Of course, if you're doing more serious number crunching, python makes it easy to rewrite your performance-critical modules in C/C++.

    I also ended up making a LiveCD and/or VM of my thesis, which was a good way of preserving the software environment and dependencies, which could otherwise grow outdated in a few short years.
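    One low-ceremony way to call compiled C from Python is the standard-library ctypes module (Cython and hand-written extension modules are heavier-weight alternatives). The sketch below, assuming a POSIX system, calls cos() from the system C math library as a stand-in for a compiled module of your own; the wrapper name `c_cos` is hypothetical.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("m") locates the C math library;
# if it returns None, CDLL(None) falls back to symbols already loaded into
# the Python process, which on Linux normally include libm.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts arguments correctly:
#   double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the C library's cos() directly from Python."""
    return libm.cos(x)


if __name__ == "__main__":
    print(c_cos(0.0))  # 1.0
```

    Each ctypes call carries some overhead, so the real speedup comes from moving whole loops into C, not from wrapping individual arithmetic operations.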

  • by n1ywb ( 555767 ) on Thursday October 17, 2013 @01:51PM (#45155089) Homepage Journal

    Better yet, Fortran + Python.

    http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html#f2py [scipy.org]

    I used it to wrap some crazy magnetometer processing code written in Fortran into a nice Python program. I ripped out all the I/O from the Fortran code and moved it into the Python layer. It worked great. Fortran is AWESOME at number crunching but SUCKS ASS at I/O, or, well, pretty much anything else, hence Python.
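    The split described here, with I/O in Python and only the number crunching in the compiled layer, can be sketched in plain Python as below. The kernel `smooth` (a simple moving average) and the one-value-per-line file format are hypothetical; the point is that the kernel takes and returns plain sequences, so it could later be replaced by an f2py-wrapped Fortran routine without touching the I/O code.

```python
def smooth(values, window):
    """Pure numeric kernel: simple moving average, no I/O.

    out[k] is the mean of values[k : k + window].
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # slide the window one step: add the new value, drop the oldest
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


def process_file(in_path, out_path, window=5):
    """I/O layer: parse one float per line, call the kernel, write the results."""
    with open(in_path) as f:
        values = [float(line) for line in f if line.strip()]
    with open(out_path, "w") as f:
        for v in smooth(values, window):
            f.write(f"{v:.6f}\n")
```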

  • Python, numpy, Pyvot (Score:5, Informative)

    by shutdown -p now ( 807394 ) on Thursday October 17, 2013 @01:53PM (#45155119) Journal

    Since you mention VBA, I suspect that your data is in Excel spreadsheets? If you want to try to speed this up with minimum effort, then consider using Python with Pyvot [codeplex.com] to access the data, and then numpy [numpy.org]/scipy [scipy.org]/pandas [pydata.org] to do whatever processing you need. This should give you a significant perf boost without the need to significantly re-architect everything or change your workflow much.
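    Pyvot talks to a running Excel instance on Windows. As a swapped-in, platform-neutral alternative (not Pyvot's API), a sheet saved as CSV can be read with Python's standard csv module; the sketch below computes the mean of every numeric column. The function name and the sample columns are hypothetical.

```python
import csv
import io
import statistics


def column_means(csv_file):
    """Read a CSV exported from Excel (header row first) and return the mean
    of each column whose cells all parse as numbers."""
    reader = csv.DictReader(csv_file)
    names = reader.fieldnames or []
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            columns[name].append(row[name])  # None if the row is short

    means = {}
    for name, cells in columns.items():
        try:
            numbers = [float(c) for c in cells]
        except (TypeError, ValueError):
            continue  # non-numeric or incomplete column, e.g. sample labels
        if numbers:
            means[name] = statistics.mean(numbers)
    return means


if __name__ == "__main__":
    # In practice: open("data.csv", newline="") on a sheet saved as CSV.
    sample = io.StringIO("sample,mass,temp\nA,1.5,20\nB,2.5,22\n")
    print(column_means(sample))
```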

    In addition, using Python this way gives you the ability to use IPython [ipython.org] to work with your data in interactive mode - it's kinda like a scientific Python REPL, with graphing etc.

    If you want an IDE that can connect all these together, try Python Tools for Visual Studio [codeplex.com]. This will give you a good general IDE experience (editing with code completion, debugging, profiling etc), and also comes with an integrated IPython console. This way you can write your code in the full-fledged code editor, and then quickly send select pieces of it to the REPL for evaluation, to test it as you write it.

    (Full disclosure: I am a developer on the PTVS team)

  • matlab (Score:3, Informative)

    by smadasam ( 831582 ) on Thursday October 17, 2013 @01:53PM (#45155125) Homepage
    FORTRAN used to be it back in the day, but nowadays MATLAB is what many engineers use for scientific computing. Many of MATLAB's math libraries are very good and don't require you to be a computer scientist to make them run fast. I used to work with scientists in my old lab, porting their MATLAB code to FORTRAN or C so it could run on HPC clusters. Often the MATLAB libraries smoked the BLAS/ATLAS packages you find on Linux/UNIX machines, for instance. The same would hold true for Octave, since it just builds on standard open-source math packages like BLAS.
  • by UnknowingFool ( 672806 ) on Thursday October 17, 2013 @02:03PM (#45155235)
    Well, if your problems require statistical computing, R is the language to use. For general scientific computing, the last I checked Octave was still a valid choice. As for multi-core processing, only a few languages and compilers support APIs like OpenMP: Fortran, C, and C++.
  • I recommend C#.NET (Score:2, Informative)

    by dryriver ( 1010635 ) on Thursday October 17, 2013 @02:12PM (#45155327)
    Not only is C# easy to learn, and easy to both read and write, it also runs at a fairly high speed when compiled. To make use of multiple CPU cores, C# has a neat feature called Parallel.For. If your algorithm scans across a 2D data array using a for loop, Parallel.For will automatically split the loop's index range into chunks and have each chunk processed by a different CPU core, resulting in a much faster overall computation. I develop algorithms in C# and highly recommend it if you want a) a nice, readable syntax and b) fast execution speed. I hope this helps...
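    A rough Python analogue of a Parallel.For loop over the rows of a 2D array, assuming the data fits in a list of lists, is to split the row index range into contiguous chunks and hand one chunk to each worker process. This static split is a simplification (the .NET runtime's own partitioning is more elaborate), and `process_rows` is a hypothetical placeholder for the real per-row work.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def chunk_bounds(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs of near-equal size."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def process_rows(rows):
    """Hypothetical per-row work: here, just the sum of each row."""
    return [sum(row) for row in rows]


def parallel_row_sums(grid, workers=4):
    """Give each worker one contiguous block of rows, then stitch results back in order."""
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else None)
    # drop empty chunks, which occur when there are fewer rows than workers
    chunks = [grid[a:b] for a, b in chunk_bounds(len(grid), workers) if b > a]
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        # Executor.map yields results in the same order as the chunks
        return [value for block in pool.map(process_rows, chunks) for value in block]


if __name__ == "__main__":
    grid = [[i, i + 1, i + 2] for i in range(10)]
    print(parallel_row_sums(grid))
```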
  • Re:FORTRAN (Score:5, Informative)

    by Obfuscant ( 592200 ) on Thursday October 17, 2013 @02:24PM (#45155465)

    Upside: tonnes of great libraries.

    Those great libraries are spread across several different "FORTRAN"s: gfortran, gfortran44, Intel's Fortran, f77, f90, PGI pgf90, etc.

    Gfortran is woooonderful. It allows complete programming idiots to write functional code, since the libraries all do wonderful input error checking. Want to extract a substring from the 1 to -1 character location? gfortran will let you do it. Quite happily. Not a whimper.

    PGI pgf90 will not. PGI writes compilers that are intended to do things fast. Input error checking takes time. If you want the 1 to -1 substring, your program crashes. PGI assumes you know not to do something that stupid, and it forces you to write code that doesn't take shortcuts.

    So, if you get a program from someone else that runs perfectly for them, and you compile it with pgf90 because you want to do serious work in a reasonable amount of time, you may find it crashes for no obvious reason. Then you have to debug seriously stupidly written code, wondering how it could ever have worked correctly, until you find that it really shouldn't have worked at all. They want to extract every character in an input line up to the '=', and they never check whether there was an '=' to begin with. 'index' returns zero, and they happily try to extract from 1 to index-1. Memcpy loves that.
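    The missing-'=' bug has a loose analogue in Python, sketched below with hypothetical function names. Fortran's INDEX returns 0 when the character is absent; Python's str.find returns -1, and slicing with -1 is legal, so the unchecked version silently returns a wrong key instead of failing.

```python
def key_unsafe(line):
    """Mirrors the buggy pattern: take everything before '=' without checking it exists."""
    idx = line.find("=")  # returns -1 when '=' is absent
    return line[:idx]  # with idx == -1 this silently drops the last character


def key_safe(line):
    """Same intent, but the 'not found' case is handled explicitly."""
    idx = line.find("=")
    if idx == -1:
        raise ValueError(f"no '=' in line: {line!r}")
    return line[:idx]


if __name__ == "__main__":
    print(key_unsafe("alpha=1"))  # alpha
    print(key_unsafe("alpha"))  # alph (wrong, but no error)
    print(key_safe("alpha=1"))  # alpha
```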

    The other issue is what is an intrinsic function and what isn't. I've been bitten by THAT one, too.

    And someone I work with was wondering why code that used to run fine after being compiled with a certain compiler was now segfaulting when compiled with the same compiler, on the same data. Switching to the Intel compiler fixed it.

    Sigh. But yes, FORTRAN is a de facto standard language for modeling in the earth sciences, even if nobody can write it properly.

  • PDL (Score:4, Informative)

    by swm ( 171547 ) * <swmcd@world.std.com> on Thursday October 17, 2013 @02:36PM (#45155613) Homepage

    Perl Data Language
    The power of Perl + the speed of C

  • Re:Python (Score:4, Informative)

    by wanax ( 46819 ) on Thursday October 17, 2013 @02:43PM (#45155685)

    Sage is okay for small to mid-size projects, as is R (both benefit from being free). On the whole, though, I'd really recommend Mathematica, which is purpose-built for that type of project, makes it trivial to parallelize code, is a functional language (once you learn it, I doubt you'll want to go back), and scales well up to fairly large data sets (tens of gigabytes).

  • Re:Python (Score:5, Informative)

    by RDW ( 41497 ) on Thursday October 17, 2013 @02:52PM (#45155789)

    I have a friend who works for a company that does gene sequencing and other genetic research and, from what he's told me, the whole industry uses mostly python.

    I think your friend is mistaken. Though it's essential to know a scripting language, most of the computationally expensive stuff in sequence analysis is done with code written in, as you might expect, C, C++, or Java. Perl and Python are used more for glue code, building analysis pipelines, and processing the output of the heavy duty tools for various downstream applications. R is used heavily for statistics, and especially for anything involving microarrays.
