Ask Slashdot: Best Language To Learn For Scientific Computing? 465
New submitter longhunt writes "I just started my second year of grad school and I am working on a project that involves a computationally intensive data mining problem. I initially coded all of my routines in VBA because it 'was there'. They work, but run way too slow. I need to port to a faster language. I have acquired an older Xeon-based server and would like to be able to make use of all four CPU cores. I can load it with either Windows (XP) or Linux and am relatively comfortable with both. I did a fair amount of C and Octave programming as an undergrad. I also messed around with Fortran77 and several flavors of BASIC. Unfortunately, I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch. I need a language I can pick up in a few weeks so I can get back to my research. I am not a CS major, so I care more about the answer than the code itself. What language suggestions or tips can you give me?"
Python (Score:5, Insightful)
I have a friend who works for a company that does gene sequencing and other genetic research and, from what he's told me, the whole industry uses mostly python. You probably don't have the hardware resources that they do, but I'd bet you also don't have data sets that are nearly as large as theirs are.
You might also get better results from something less general purpose like Julia [julialang.org], which is designed for number crunching.
Re:Python (Score:5, Insightful)
the whole industry uses mostly python
This is certainly the way of the future, not just for gene sequencing but many other quantitative sciences, although a complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching. It's best to start with just Python, but eventually some C++ knowledge will be helpful. (Or just plain C, but I can't see any good reason to inflict that on myself or anyone else.)
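The split the parent describes (numpy fast where vectorized, plain Python slow in loops) is easy to see with a toy dot product. This is a hypothetical sketch, not anyone's actual pipeline:

```python
import math
import numpy as np

def dot_loop(a, b):
    # pure-Python loop: interpreter overhead on every element
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

a = np.arange(10_000.0)
b = np.arange(10_000.0)

# the vectorized version is a single call into compiled BLAS code,
# typically orders of magnitude faster than the loop above
fast = float(a @ b)
slow = dot_loop(a, b)
assert math.isclose(slow, fast, rel_tol=1e-9)
```

When an algorithm can't be phrased in terms of whole-array numpy operations like `a @ b`, that's where the C/C++ half of the "complete answer" comes in.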
Re:Python (Score:4, Insightful)
Re:Python (Score:5, Insightful)
Python is VB done right.
Re:Python (Score:5, Funny)
VB is feeding your scrotum to a python.
Re: (Score:3, Insightful)
Re:Python (Score:4, Insightful)
No, it's a simple language that is easy for beginners to learn. But, unlike VB, it is not horribly designed, and is useful even once you grow out of the beginner phase.
Re:Python (Score:5, Interesting)
a complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching.
The problem with using the mix (when you actually write the C++ code yourself) is that debugging it is a major pain in the ass - you either attach two debuggers and simulate stepping across the boundary by manually setting breakpoints, or you give up and resort to printf debugging.
OTOH, if Windows is an option, PTVS is a Python IDE that can debug Python and C++ code side by side [codeplex.com], with cross-boundary stepping etc. It can also do Python/Fortran debugging with a Fortran implementation that integrates into VS (e.g. the Intel one).
(full disclosure: I am a developer on the PTVS team who implemented this particular feature)
Re: (Score:3, Insightful)
The problem with using the mix (when you actually write the C++ code yourself) is that debugging it is a major pain in the ass
Only if you don't use the C/C++ code as an independent module, as it should be. If you *must* debug it in parallel, you're designing it wrong.
Re: (Score:3)
How do you write C++ code for use from Python such that it's not an independent module?
Anyway, regardless of how you architect it, in the end you'll have a Python script feeding data to your C++ code. If something goes wrong, you might want to debug said C++ code specifically as it is called from Python (i.e. with that data). Even if you don't ever have to cross the boundary between languages during debugging, there are still benefits to be had from a debugger with more integrated support - for example, it

Re:Python (Score:5, Insightful)
Compared to C and C++, Fortran is actually more elegant for pure numerical computing.
Unsurprising - that's what Fortran was designed for...!
Re:Python (Score:5, Informative)
Yes, I did my master's thesis using simpy [readthedocs.org] / scipy [scipy.org], integrated with lp_solve for the number crunching, all of which was a breeze to learn and use. It was amazing banging out a new recursive algorithm crawling a new object structure and just having it work the first time, without spending several precious cycles bugfixing syntax errors and chasing down obscure stack overflows.
I used the psyco JIT compiler (unfortunately 32-bit only) to get ~100x boost in runtime performance (all from a single import statement, woo), which was fast enough for me... these days I think you can get similar boosts from running on PyPy [pypy.org]. Of course, if you're doing more serious number crunching, python makes it easy to rewrite your performance-critical modules in C/C++.
I also ended up making a LiveCD and/or VM of my thesis, which was a good way of wrapping up the software environment and dependencies, which could quickly grow outdated in a few short years.
Re: (Score:3)
Yep. High level languages such as Python are great for letting you focus on the domain-specific task you want to accomplish without spending years learning all the little poorly-documented idiosyncrasies of compilers and preprocessors and template languages. Once you're through the prototyping phase and have your interface definitions and unit tests set up, you can then toss things one module at a time over to one of those software weenies to turn into hand-optimized production code. A
Re:Python (Score:4, Interesting)
"This is certainly the way of the future, not just for gene sequencing but many other quantitative sciences, although a complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching."
I mostly agree with your conclusion, but for somewhat different reasons. I don't believe Python is "the wave of the future", but rather I'd recommend it because it has been in use by the scientific community for far longer than other similar languages, like Ruby. Therefore, there will be more pre-built libraries for it that a programmer in the sciences can take advantage of.
I also agree that some C should go along with it, for building those portions of the code that need to be high performance. I would choose C over C++ for performance reasons. If you need OO, that's what Python is for. If you need performance, that's what the C is for. C++ would sacrifice performance for features you already have in Python.
If it were entirely up to me, however -- that is to say, if there weren't so much existing code for the taking out there already -- I'd choose Ruby over Python. But that's just a personal preference.
Re: (Score:3)
Re: (Score:3)
There are a lot of good suggestions in this discussion so far.
I have a few points to add.
1) compiled language vs scripting language
In general, any compiled language is going to run faster than any scripting language. But you will probably spend more time coding and debugging to get your analysis running with a compiled language. It is useful to think about how important performance is to you relative to the value of your own time. Are you going to be doing these data mining runs repeatedly? Is it worth
Re:Python (Score:5, Informative)
Re:Python (Score:4, Informative)
Sage is okay for small-to-midsize projects, as is R (both benefit from being free). On the whole, though, I'd really recommend Mathematica, which is purpose-built for that type of project, makes it trivial to parallelize code, is a functional language (once you learn it, I doubt you'll want to go back) and scales well up to fairly large data sets (10s of gigs).
Java Java! (Score:4, Interesting)
For numeric-intensive work, I can get within 20% of the speed of C++ using the usual techniques -- minimize garbage collection by allocating variables once, use the "server" VM, perform "warmup" iterations in benchmark code to stabilize the JIT. I use the Eclipse IDE, copy and paste numeric results from the Console View into a spreadsheet program, and voila, instant journal article tables.
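The warmup idea isn't Java-specific: in Python, the stdlib's `timeit.repeat` applies the same logic by running the kernel several times and keeping the best measurement, so that one-off startup costs don't pollute the numbers. A sketch, with `work()` as a made-up stand-in for the numeric kernel:

```python
import timeit

def work():
    # hypothetical stand-in for the numeric routine being benchmarked
    return sum(i * i for i in range(10_000))

# repeat() runs the kernel in several independent batches; the minimum
# time is the most stable estimate, since later batches benefit from
# warm caches and (in JIT-based runtimes) compiled code
times = timeit.repeat(work, number=20, repeat=5)
best = min(times)
```

The per-batch `number` and `repeat` counts here are arbitrary; tune them so a batch takes long enough to measure reliably.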
Re:Java Java! (Score:5, Funny)
I tried out those benchmarks myself.
Java:
$ time java nbody 50000000
-0.169075164
-0.169059907
real 0m8.863s
user 0m8.820s
sys 0m0.016s
Not too shabby. But check out the C++ times!
$ time ./nbody.gpp-7.gpp_run
Segmentation fault (core dumped)
real 0m0.097s
user 0m0.000s
sys 0m0.000s
OMG that's a ton faster!
Re:Python (Score:5, Informative)
I have a friend who works for a company that does gene sequencing and other genetic research and, from what he's told me, the whole industry uses mostly python.
I think your friend is mistaken. Though it's essential to know a scripting language, most of the computationally expensive stuff in sequence analysis is done with code written in, as you might expect, C, C++, or Java. Perl and Python are used more for glue code, building analysis pipelines, and processing the output of the heavy duty tools for various downstream applications. R is used heavily for statistics, and especially for anything involving microarrays.
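The "glue code" role described here usually amounts to turning one tool's text output into structures the next stage can consume. A minimal sketch, using a made-up whitespace-delimited output format (the field names are hypothetical):

```python
def parse_hits(output):
    """Parse lines like 'query subject score' (an invented format)
    from a heavy-duty tool's stdout into a list of records."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comment and blank lines
        query, subject, score = line.split()
        records.append({"query": query, "subject": subject,
                        "score": float(score)})
    return records

sample = "# hypothetical aligner output\nq1 s1 98.5\nq1 s2 77.0\n"
hits = parse_hits(sample)
```

In a real pipeline the same handful of lines would sit between a subprocess call to the C/C++/Java tool and whatever downstream statistics you run in R or Python.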
Re:Python (Score:5, Insightful)
Perl is still in wide use.
Do not use Perl for this. I've been using Perl for 15-20 years, and I love it for "scripting", text processing, etc., but using it for scientific computing sounds like an exercise in masochism.
PDL (Score:4, Informative)
Perl Data Language
The power of Perl + the speed of C
Re:PDL (Score:5, Funny)
The power of Perl + the speed of C
Re:Python (Score:5, Funny)
I wrote some Perl that looked like the output of AES once.
Re: (Score:2)
I was using Mathematica in grad school (experimental physics).
That wasn't the right tool for the task. Mathematica is for symbolic math, not number crunching.
Fortran (Score:2, Insightful)
sorry to say, but that is a fact
Re: (Score:3)
It depends on what exactly his computationally intensive part is. It may be something that can be trivially implemented in Python in terms of standard numpy operations, for example, with performance that's "good enough".
Re: (Score:3)
Fortran + Python = F2PY (Score:5, Informative)
Better yet, Fortran + Python.
http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html#f2py [scipy.org]
I used it to wrap some crazy magnetometer processing code written in Fortran into a nice Python program. I ripped out all the I/O from the Fortran code and moved it into the Python layer. It worked great. Fortran is AWESOME at number crunching but SUCKS ASS at I/O and, well, pretty much anything else; hence Python.
Re: (Score:3)
Re: (Score:3)
if you don't care about having your code be maintained or extended by anyone under age 30
1. There are plenty of programmers over age 30.
2. Someone who is 30 today, likely finished his BSc in 2005. Do you think Fortran was much more popular then?
3. People under age 30 learn Fortran if they're involved in HPC. It's still widely used, and has advantages over C/C++ (easy, built-in parallelization, etc.).
don't plan on doing any custom visualization beyond GNUplot
There are lots of other programs you can use besides GNUplot. In serious HPC, graphics are often considered a back end that runs separately from the main program, and sometimes on a different machine.
English (Score:4, Funny)
Obviously.
FORTRAN (Score:3, Insightful)
Seriously consider FORTRAN
Re: (Score:2, Insightful)
Yeah, sure.
So that no one can ever check your models or replicate your results even if you publish code and initial data.
Re: (Score:2)
Re: (Score:2)
Having to read old FORTRAN is not a pleasant experience for someone who figures they can generally read languages without hitting a book of some sort,
Having some experience actually writing Fortran code helps...
Re: (Score:3)
Not really. My first job while still green and fresh out of high school was an internship with Lockheed Martin, working on hundreds of thousands of lines of meteorological software code that was used by NASA and was written in FORTRAN. I went in without ever having seen it before in my life, and was able to pick it up easily enough so that I was productive within a couple of weeks. I recall that having the first few columns of each line reserved for special uses threw me off the first time I saw it, as did
Re:FORTRAN (Score:5, Interesting)
Clearly you are not involved in serious science.
And if you think FORTRAN is some ancient esoteric language, you're ignorant as well. The most recent standard, ISO/IEC 1539-1:2010, informally known as Fortran 2008, was approved in September 2010.
Fortran is, for better or worse, the only major language out there specifically designed for scientific numerical computing. Its array handling is nice, with succinct array operations on both whole arrays and on slices, comparable with Matlab or numpy but super fast. The language is carefully designed to make it very difficult to accidentally write slow code -- pointers are restricted in such a way that it's immediately obvious if there might be aliasing, to take the standard example -- and so the optimizer can go to town on your code. Current incarnations have things like coarray Fortran, and do concurrent and forall built into the language, allowing distributed memory and shared memory parallelism, and vectorization.
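For readers coming from the numpy side, the whole-array and slice operations described above map almost one-to-one. A comparison sketch in numpy (in Fortran these would be written as things like `b = 2*a` and `a(1:5) = 0`):

```python
import numpy as np

a = np.arange(10.0)      # [0, 1, ..., 9]
b = 2 * a                # whole-array operation, no explicit loop
a[0:5] = 0.0             # assignment to a slice
c = a[::2] + b[::2]      # strided slices combine elementwise
```

The Fortran pitch is that the compiler turns this style into tight machine code directly, whereas numpy gets its speed by delegating each operation to a precompiled C/Fortran kernel.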
The downsides of Fortran are mainly the flip side of one of the upsides mentioned; Fortran has a huge long history. Upside: tonnes of great libraries. Downsides: tonnes of historical baggage.
If you have to do a lot of number crunching, Fortran remains one of the top choices, which is why many of the most sophisticated simulation codes run at supercomputing centres around the world are written in it. But of course it would be a terrible, terrible, language to write a web browser in. To each task its tool.
Re:FORTRAN (Score:5, Informative)
Upside: tonnes of great libraries.
Those great libraries are spread across several different "FORTRAN"s. gfortran. gfortran44. Intel's fortran. f77. f90. PGI pgif90. etc. etc etc.
Gfortran is woooonderful. It allows complete programming idiots to write functional code, since the libraries all do wonderful input error checking. Want to extract a substring from the 1 to -1 character location? gfortran will let you do it. Quite happily. Not a whimper.
PGI pgif90 will not. PGI writes compilers that are intended to do things fast. Input error checking takes time. If you want the 1 to -1 substring, your program crashes. PGI assumes you know not to do something that stupid, and it forces you to write code that doesn't take shortcuts.
So, if you get a program from someone else that runs perfectly for them, and you want to use it for serious work and get it done in a reasonable amount of time, so you compile it with pgif90, you may find it crashes for no obvious reason. And then you have to debug seriously stupidly written code, wondering how it could ever have worked correctly, until you find that it really shouldn't have worked at all. They want to extract every character in an input line up to the '=', and they never check whether there was an '=' to start with. 'index' returns zero, and they happily try to extract from 1 to index-1. Memcpy loves that.
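That failure mode (taking a substring up to a delimiter without checking the delimiter exists) isn't Fortran-specific. A hypothetical Python sketch of the same bug and its fix:

```python
def key_before_equals(line):
    """Return everything before the first '=' in line."""
    pos = line.find("=")
    # find() returns -1 when '=' is absent -- the analogue of
    # Fortran's index() returning 0. Without this check, line[:pos]
    # would silently drop the last character instead of failing.
    if pos == -1:
        raise ValueError(f"no '=' in line: {line!r}")
    return line[:pos]

assert key_before_equals("alpha=1") == "alpha"
```

The strict compiler in the story plays the role of the `raise` here: it forces the missing-delimiter case to blow up where it happens instead of corrupting data downstream.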
The other issue is what is an intrinsic function and what isn't. I've been bitten by THAT one, too.
And someone I work with was wondering why code that used to run fine after being compiled with a certain compiler was now segfaulting when compiled with the same compiler, same data. Switching to the Intel compiler fixed it.
Sigh. But yes, FORTRAN is a de-facto standard language for modeling earth sciences, even if nobody can write it properly.
Also, it is fast (Score:3)
In part, this is because Intel has a compiler for it. On commodity hardware (as in desktop, laptop), you will generally get the best performance running an Intel CPU and using an Intel compiler. That means C/C++ or FORTRAN, as they are the only languages for which Intel makes compilers. C++ is easy to see, since so much is written in it, but why would they make a FORTRAN compiler? Because, as you say, serious science research uses it.
When you want fast numerical computation on a desktop, FORTRAN is a good choice.
Re: (Score:2)
Agreed. There are also OpenMP implementations for doing your parallel processing. If you're running on a Xeon processor then I would SERIOUSLY consider Intel's linux fortran compiler as it will provide the best performance by far.
Re: (Score:2)
It's totally possible to use Python and Fortran side by side. Fortran for heavy computational tasks, Python (with numpy) for glue wrapper code that loads the data and massages it into the desired shape before handing it over to that super-fast Fortran routine, and then visualizes the result
Step one: export to a database? (Score:2)
>> I initially coded all of my routines in VBA because it 'was there'.
Are you in Access? Or Excel?
If your routines work but are just slow, I'd first look at moving the data to SQL Server and porting your VBA routines to VB.NET.
If you have more time, you may want to learn what the "Hadoop" world is all about.
Re: (Score:2)
Re: (Score:2)
>> Hadoop isn't extremely useful; for a student who managed to scrounge up a single Xeon machine, it's entirely ill suited
Go back and read the problem again: "would like to be able to make use of all four CPU cores"
Here's a guy seeking parallelization...and may not know that you don't have to throw big (potentially expensive) multicore processors against the problem - he could throw multiple (cheaper?) computers against it.
Comment removed (Score:4, Informative)
Re:More details? (Score:5, Informative)
Second this. There are numerous languages out there that are tailor-made for specific kinds of problems. You didn't quite share enough to narrow down what kinds of problems you need to solve, but the R project is geared toward number crunching, albeit with a significant bent toward statistics and graphic display.
http://www.r-project.org/ [r-project.org]
If that's not pointed in the right direction, some other language might be. Alternatively, there are a lot of libraries out there for the more popular languages that could help with what you're doing. Heck, 12 years ago we didn't even have the Boost libraries for C++. It's difficult for me to imagine using that language without them now.
Re: More details? (Score:3)
R is by far the best solution that I've found for statistical analysis and data mining. It's ugly, inconsistent, quirky and old fashioned but it's absolutely brilliant.
The whole syntax of R is based around processing data sets without ever needing to worry about loops. Read up on data tables - not data frames - in R and you'll learn how to filter data, aggregate it, add columns, perform a regression and beautifully plot the results all in one line of code. The Zoo package will sort out your time series analysis
What are you doing? (Score:4, Informative)
What do you mean by scientific computing?
Modelling: Hard core finite element simulations or the like. Then C or Fortran and you will be linking with the math libraries.
Log Processing: A lot of other stuff where you will be parsing data logs and doing statistics. So Perl or Python, then Octave.
Data Mining: Python or other SQL front end.
Re:What are you doing? (Score:4, Informative)
Re: (Score:3)
Well if your problems require statistical computing, R is the language to use.
A lot of people seem to be pretty happy with Python+pandas lately.
(and the advantage of going the Python way is that it's also a general purpose language that's useful elsewhere)
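The split-apply-combine style that makes R and pandas attractive here is a one-liner in pandas (`df.groupby("group")["value"].mean()`), but it's worth seeing what that one line actually does. A stdlib-only sketch with made-up data:

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 2.0},
]

# split: bucket rows by their group key
buckets = defaultdict(list)
for row in rows:
    buckets[row["group"]].append(row["value"])

# apply + combine: aggregate each bucket into one number
means = {k: mean(v) for k, v in buckets.items()}
# means == {"a": 2.0, "b": 2.0}
```

pandas and R's data.table do exactly this, just vectorized, with far richer aggregations, and without you writing the loop.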
Re: (Score:3)
It sounds like you are saying a more specific version of what I was going to post.
A little research goes a long way, and libraries may be more important than language. I don't care how nice the language is... the fewer underlying mechanisms I need to implement, and the faster I can get into the meat of what I am working on, the better.
If you want to do RSA encryption in your code (for example) your best bet is NOT to pick a language where you can't find an RSA implementation (Applesoft basic? lol not sure wh
IPython Notebook + Python Data Analysis Library (Score:4, Informative)
http://ipython.org/notebook.html [ipython.org]
http://pandas.pydata.org/ [pydata.org]
what the rest of your team uses (Score:5, Insightful)
And if you are not a member of a team then I seriously question the quality of your graduate program.
BAD TIM! BAD! (Score:5, Funny)
What language suggestions or tips can you give me?"
Timothy, shame on you. You should know better than to start a holy war.
Fortran (plus MPI and some CUDA) (Score:2)
Re: (Score:2)
Re: (Score:3)
2 paths (Score:4, Informative)
If you can find anything that resembles a math library with the correct tools then go with Python. Numpy is everyones friend here.
If you have to do the whole thing from scratch then Fortran is the fastest platform. I can't say I've met anyone who enjoyed Fortran, but it's wicked fast.
Re: (Score:2)
If you have to do the whole thing from scratch then Fortran is the fastest platform. I can't say I've met anyone who enjoyed Fortran, but it's wicked fast.
True, but the only place where this *really* matters is programming for repetitive calculations on massively parallel supercomputers. For anything else, there is a tradeoff between program speed and developer speed, and ultimately it's cheaper to buy more computers than hire more programmers.
Python, or ... (Score:2)
First suggestion: Python. Lots of nice stuff for science (NumPy, SciPy), lots of other goodies, easy to learn, many people to ask or places to get help from. Plus you can explore data interactively ("Yes Wednesday, play with your data!").
Beyond that: CERN uses a lot of Java (sorry folks, true), and they have good (and fast) tools. I'm doing a project right now where I am using Jython since it is supported by the main (Java) software I have to use. I like jhepwork/SCaVis quite a bit, if you are into plotting stuff on
R-language (Score:5, Informative)
Re: (Score:3, Insightful)
Go (aka Golang) if you come from a C background (Score:2)
Re: (Score:2)
Could use some vectorizing FP, but yeah, it's not a bad choice, especially if the complexity of mixed environments is undesirable. (Could also use some native port of netlib/GSL as well, though.)
It might also make him a better practical software engineer, which, as I understand, is an area in which many numerics people...experience certain difficulties.
Profile (Score:5, Insightful)
A lot of people will propose a language because it is their favorite. Others because they believe it is very easy to learn. I will give you a third line of thought.
I would not look for a language in this case; I would look for a library, then teach myself whatever language is easiest/quickest to access it. I would try to profile what you are building, figure out where the bottlenecks are likely to be (profiling your existing mockup can help here, but don't trust it entirely) and try to find the best stable, well-designed, high-performance library for that particular type of code.
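Profiling an existing mockup, as suggested above, is only a few stdlib lines in Python. A sketch, where `work()` is a hypothetical stand-in for your real routine:

```python
import cProfile
import io
import pstats

def work():
    # hypothetical stand-in for the routine being profiled
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# report the five entries with the largest cumulative time
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

The point of profiling before porting is that the report often shows 90% of the time in one or two functions: those are the only candidates worth rewriting in a faster language or handing to a library.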
Re: (Score:2)
Speed incarnate (Score:2)
If you're using VBA in Excel, you can speed it up a ton by putting this at the beginning of your function:
Application.Calculation = xlCalculationManual
And restore it with ...Automatic at the end.
Do this at the top level with a wrapper function whose only purpose is to disable and enable that, calling the real function in between.
If you want a real speedup, I am available for part time work in C or C++.
My favorite is CnH2n+1OH (Score:4, Funny)
Fortran 90+ with OpenMP or Python (Score:2)
If you really want to do heavy lifting, you can't beat Fortran. Just stay away from Fortran 77; it's a hot mess. Fortran 90 and later are much easier to use, and they're supported by the main compilers: gfortran and Intel's ifort.
ifort is Intel's Fortran compiler. It's the fastest out there, and it runs on Windows and Linux. Furthermore, you can get it as a free download for some types of academic use. (Search around Intel's website -- it's hard to find.) That said, I usually use gfortran -- which is free and
Why code, when you can use a workflow tool? (Score:2)
Depends (Score:2)
R, MATLAB, SAS, Python, there's a bunch of languages you can use, and a bunch of ways to store the data (RDBMS, NOSQL, Hadoop, etc.). It really comes down to what kind of access to the data you have, how it's presented, what other resources you have available to you, and what you want to do with it.
Depends... On the Data... (Score:2)
Well, it depends. You say "computationally intensive data mining problem" but, what kind of computations (arithmetic, mathematical, text-based, etc.)?
In general, for flat-out speed, toss interpreted languages (Perl, Python, Java, etc.) out the door. You'll want something that compiles to machine code, esp. if you are running on older hardware. Crunching numbers, complex math, matrices? Then Fortran is the beast. If your data is arranged in lists, consider Lisp, then pick something else as it will likely gi
Re: (Score:2)
If your data is arranged in lists, consider Lisp,
Oh please! It's not like Lisp doesn't have any other data structure, is it? You can have your multidimensional numerical arrays in CL quite easily. (I'm saying neither "use CL" nor "don't use CL", merely that your argument is pretty weak. It's easier to learn to work with lists in the language you already know (unless it's COBOL!) than to learn an entirely different one just because of lists.)
Python, numpy, Pyvot (Score:5, Informative)
Since you mention VBA, I suspect that your data is in Excel spreadsheets? If you want to try to speed this up with minimum effort, then consider using Python with Pyvot [codeplex.com] to access the data, and then numpy [numpy.org]/scipy [scipy.org]/pandas [pydata.org] to do whatever processing you need. This should give you a significant perf boost without the need to significantly rearchitect everything or change your workflow much.
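If exporting the spreadsheets to CSV is acceptable, even the stdlib alone gets the data into Python with no extra packages. A minimal sketch (the column names and values are made up):

```python
import csv
import io

# stand-in for open("data.csv") on a file exported from Excel
csv_text = "sample,value\nA,1.5\nB,2.5\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = [{"sample": r["sample"], "value": float(r["value"])}
        for r in reader]
total = sum(r["value"] for r in rows)
```

From there, handing `rows` to numpy or pandas for the actual number crunching is straightforward, and the spreadsheet stays the system of record.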
In addition, using Python this way gives you the ability to use IPython [ipython.org] to work with your data in interactive mode - it's kinda like a scientific Python REPL, with graphing etc.
If you want an IDE that can connect all these together, try Python Tools for Visual Studio [codeplex.com]. This will give you a good general IDE experience (editing with code completion, debugging, profiling etc), and also comes with an integrated IPython console. This way you can write your code in the full-fledged code editor, and then quickly send select pieces of it to the REPL for evaluation, to test it as you write it.
(Full disclosure: I am a developer on the PTVS team)
matlab (Score:3, Informative)
Same language as your peers (Score:2)
Try J.. (Score:2)
...at jsoftware.com .
It's more powerful, concise, and consistent than most languages. However, R and Matlab have larger user communities and this is an important consideration.
There was a note on the J-forum a few months ago from an astronomer who uses J to "...compute photoionization models of planetary nebulae." His code to do this is about 500 lines in about 30 modules and uses some multi-dimensional datasets, including a four-dimensional one of "...2D grids of the collisional cooling by each of 16 ion
C/C++ (Score:3, Interesting)
I'm an MSEE and I've been working in the digital signal processing realm for the 10 years since graduating. I should mention that I haven't done a lot of low-level hardware work: I haven't programmed actual DSP cards or played with CUDA. I have written software that did real-time signal processing just on a CPU. Everyone in my industry at this point uses C or C++. There is some legacy FORTRAN, and I shudder when I have to read it. Some old types swear by it, but it's fallen out of favor mostly just because it's antiquated and most people know C/C++ and libraries are available for it.
For non-real-time prototypes I'd recommend learning python (scipy, numpy, matplotlib). Perhaps octave and/or Matlab would be useful as well.
At some point you have to decide what your strength will be. I love learning about CS and try to improve my coding skills, but it's just not my strength. I'm hired because of my DSP knowledge, and I need to be able to program well enough to translate algorithms to programs. If you really want to squeeze out performance then you'll probably want to learn CUDA, assembly, AVX/SSE, and DSP specific C programming. But I haven't delved to that level because, honestly, we have a somewhat different set of people at the company that are really good in those realms.
Of course, it would be great if I could know everything. But at the moment it's been good enough to know C/C++ for most of our real time signal processing. If something is taking a really long time, we might look at implementing a vectorized version. I would like to learn CUDA for when I get a platform that has GPUs but part of me wonders if it's worth it. The reason C/C++ has been enough so far is that compilers are getting so good that you really have to know what you're doing in assembly to beat them. Casual assembly knowledge probably won't help. I might be wrong, but I envision that being the case in the not too distant future with GPUs and parallel programming.
Quick suggestion... (Score:3)
Do you have access to MATLAB or a similar analysis tool? Many universities have licenses, and overall it seems like it might be a good choice for you. These programs usually have a lot of built-in functionality that will be difficult to reproduce if you are not an experienced scientific programmer.
I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch.
This is probably a bigger problem than choosing which language to use. If you don't know how to program properly and efficiently, it doesn't matter which language you choose. If you go this route I'd suggest taking a course to refresh or upgrade your skills. Since you're familiar with C that might be a good language to focus on in the course. Another factor is if you have to work with any existing libraries it might limit your choices. I program in C, FORTRAN, and VB and find that for computationally intensive programs C is usually the best fit, sometimes FORTRAN, and never VB.
Re: (Score:3)
NO.
No Matlab. Not portable, not open, and it perpetuates a vendor lock-in for quantitative scientists/engineers every bit as bad and destructive as the stranglehold Windows has enjoyed on the desktop for decades.
I think you're over-stating things a touch. Some of the core stuff is closed source but most of the functions are open, meaning that they are readable .m scripts. e.g. if you're worried about how MATLAB implements ANOVA then you read the file and check. You can modify if needed. So MATLAB is open enough in most normal usage scenarios. You're not really locked in given that we have Octave.
Python is more readable, more enjoyable to code, has equivalent IDEs available (Spyder), far more user-friendly features, you can use your code literally anywhere you go without worrying about a Matlab license, and the SciPy Stack has reached functional feature parity with Matlab (and is evolving well beyond in certain areas).
I like Python and I've spent some time learning it recently, and ported some of my MATLAB code. Python is not a panacea, h
VB (Score:2)
Matlab (Score:3)
Matlab will fall, SciPy will rise (Score:3)
If you are working in academia, then you probably have access to Matlab.
On the other hand, you definitely have access to SciPy, given that it's free.
I predict that Python with SciPy/NumPy will completely displace Matlab within a few years.
I say that even though I am working in one industry, digital signal processing, that is really married to Matlab and will be one of the last places to make the switch.
Because Matlab was purpose-built for scripting with matrices, it has some nice syntactic sugar for that.
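To make the syntactic-sugar comparison concrete (my sketch, not the poster's), here is how a few common MATLAB matrix idioms translate into NumPy; the main differences are the `@` operator for matrix products and 0-based indexing:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.eye(2)

C = A @ B          # matrix product, like MATLAB's  A * B
D = A * A          # ELEMENTWISE product, like MATLAB's  A .* A

# MATLAB:  A(1, :)  (1-based)    NumPy:  A[0, :]  (0-based)
first_row = A[0, :]
```

MATLAB's one-character `*` for matrix multiplication is exactly the kind of sugar the poster means; NumPy only gained the comparably terse `@` with Python 3.5.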
Fortran (Score:2)
I worked as a sysadmin for a high energy physics group at the Beckman Center. Day and night, it was Fortran, on big whopping clusters, doing Monte Carlo simulations.
Though it ~was~ many years ago.
Elsewhere, I worked for a company doing datamining on massive datasets, over a terabyte of data back in 2000, per customer, with multiple customers and daily runs on 1-5 gig subsets. We used C + big math/vector/matrix libs for the processing because nothing else could come close, and Perl or Java for the data mana
Matlab (Score:2)
Don't use a programming language. Use a tool like Matlab or Mathematica instead. These tools are well designed for scientific computing and have sufficient scripting built in to support the programming-language-like functionality you're probably looking for.
You won't be able to call yourself a programmer. But you're not a programmer, you're a scientist.
R, Perl, some C (Score:3)
I run lots of statistical analyses. Most of the code is in R with some wrappers in Perl and some specific libraries in C. The R and Perl code is pretty much all my own. The C is almost entirely open source software with very minor changes to specify different libraries (I'm experimenting with some GPU computing code from NVidia). Most of the people who are doing similar things are using Python with R (or more specifically, the people I know who are doing the same thing are using Python/R).
An average run with a given data set takes approximately 20 minutes to complete on an 8-core AMD 8160. About 80% of the run is multi-threaded and all cores are pegged. The last bit is constrained mainly by network and disk speed.
You may consider using something like Java/Hadoop depending on your data and compute requirements. Though my Java code is just a step above the level of a grunting walrus, I've found that the performance is actually not that bad and can be pretty good in some cases.
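Since the original question was about using all four cores of that Xeon, here is a minimal sketch (my own, with a placeholder workload) of how Python's standard-library `multiprocessing` module spreads a CPU-bound job across cores:

```python
from multiprocessing import Pool

def crunch(chunk):
    # Placeholder for a CPU-bound kernel; stands in for one data-mining step.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(n, workers=4):
    # Split the work into one chunk per worker and map them onto a pool.
    step = n // workers
    chunks = [range(i, i + step) for i in range(0, n, step)]
    with Pool(processes=workers) as pool:
        return sum(pool.map(crunch, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(4000))
```

`multiprocessing` sidesteps Python's GIL by using separate processes, which is why it, rather than threads, is the usual route for pegging all cores on numeric work.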
I recommend C#.NET (Score:2, Informative)
old xeon box? linux vs. xp (Score:2)
With Xeons going 64-bit around 2005, it would have to be really old to be 32-bit only.
And even if it were an ancient 32-bit-only Xeon, XP is still going to have issues using more than about 3.5 GB of RAM.
XP's process management seems weak to me compared to the Linux side of things.
I don't have a favorite Linux distribution to recommend; I would ask your professors and fellow researchers if they have a preference (because the
Congratulations! You are a sysadmin! (Score:2)
It sounds like you have control of the whole machine, which makes you the sysadmin. You don't only get to choose the programming language. You have to design a workflow. The programming language will fall out of your plan of attack. You have to do so within the limitations of your advisor's budget, the assistance you can beg, etc. Take comfort in the fact that procedural languages are, deep down, 98% the same with different words for things; it is the libraries that get confusing. And read the li
"Scientific Computing" is over-broad (Score:3)
The problem with this question is that "scientific computing" is an over-broad term. The truth is that certain languages have found specific niches in different aspects of scientific computing. Bioinformatics, for example, tends to involve R, Python, Java, and Perl (the prominence of each depends largely on the application). Big-data analytics typically involves Java or languages built on Java (Scala, Groovy). Real-time data processing is generally done in Matlab. Pharmacokinetics, some physics, and some computational chemistry are often done in FORTRAN. Instrumentation is generally controlled using C, C++, or VB.NET. Visualization is done in R, D3 (JavaScript), or Matlab. Validated clinical biostatistics are all done in SAS (!).
Python is a nice simple to learn start, very powerful, and the NumPy package is important to learn for scientific computing. R is the language of choice for many types of statistical and numerical analysis. Those are a good place to start, if incomplete. From there, I'd look at the specific fields of interest and look at what the common applications and code-base are for those.
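The reason NumPy matters so much for a "VBA was too slow" problem is vectorization: replacing interpreted per-element loops with single array expressions that run in compiled code. A hedged sketch (my example) of the same computation both ways:

```python
import numpy as np

# Pure-Python loop, roughly what a line-by-line VBA port would look like:
def euclidean_loop(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
    return total ** 0.5

# Vectorized NumPy version: one expression, executed in optimized C
def euclidean_numpy(a, b):
    return float(np.sqrt(np.sum((a - b) ** 2)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])
```

Both return the same distance; on arrays of realistic data-mining size, the vectorized form is typically one to two orders of magnitude faster than the explicit loop.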
With regard to the OS, that's pretty easy: Linux (though OS X is a reasonable substitute). Nearly all scientific computing is done in a UNIX-like environment.
VB is too slow for you? C++ then... (Score:3)
I suspect that VB is NOT your problem here. But, if you have a VB program that is too slow, then I'm going to suggest you do the following:
1. Profile your program and see if you can figure out what's taking up all the processing time. It may be possible to change the program you already have slightly and get the performance you need. It would be a shame to go through all the trouble of learning a new language and recoding the whole thing if replacing some portion of your code will fix it. Do you have a geometric solution implemented when a non-geometric solution exists?
2. Consider adding hardware - It's almost ALWAYS cheaper to throw hardware at it than to re-implement something in a language you are learning.
3. Rewrite your program in VB - This time, looking for ways to make it perform faster (you did profile it right? You know what is taking all the time right?) Can you multi-thread it, or adjust your data structures to something more efficient?
4. Throw hardware at it - I cannot stress this enough, it's almost ALWAYS easier to throw hardware at it, unless you really have a problem with geometric increases in required processing and you are just trying to run bigger data sets.
5. If 1-4 don't fix it, then I'm guessing you are in serious trouble. If you really do not have a geometric problem, you *MIGHT* be able to learn C/C++ well enough to get an acceptable result if you re-implement your program. C/C++ will run circles around VB when properly implemented, but it can be a challenge to use C/C++ if your data structures are complex.
6. Throw hardware at it - seriously.
Unless you really just have a poorly written VB program, or you really are doing some geometric algorithm on larger data sets (in which case you are going to be stuck waiting no matter what you do), getting better hardware may be your only viable option. I would NOT recommend trying to pick up some new language over VB just for a performance improvement unless it is simply your only option. If you do decide to switch, use C/C++, but I would consider that a very high-risk approach and the very last resort.
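The "profile it first" advice in step 1 applies whatever language you end up porting to. As an illustration (my sketch, using Python's standard-library profiler rather than anything VB-specific), here is how `cProfile` pinpoints a deliberately slow hot spot:

```python
import cProfile
import io
import pstats

def slow_inner(n):
    # Deliberately quadratic hot spot for the profiler to find
    return sum(i * j for i in range(n) for j in range(n))

def analysis():
    return slow_inner(200)

profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Report the five most expensive calls by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()   # slow_inner dominates the listing
```

Whatever the language, the point stands: measure before rewriting, because the bottleneck is often one small routine.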
C. Obviously. (Score:4, Insightful)
You know C. C is simple, as fast as any alternative, it's straightforward to optimize (aside from pointer abuse), and you always know what the compiler/runtime is doing. And parallel frameworks like pthreads or CUDA are best served via C/C++. Why use anything else?
Another thought: scientific libraries. If you need external services/algorithms, then your chosen language should support the libraries you need. C/C++ is well served by many fast machine-learning libraries such as FANN, LIBSVM, and OpenCV, not to mention CBLAS, LINPACK, etc.
Re: (Score:3)
Re: (Score:2)
You can use Cython for heavy lifting without dropping all the way down to C.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Before you C++ kids want to tell me something, read up on Mr. Kuck and his optimizers. Fortran optimizers did things 20 years ago that C++ optimizers still cannot do.
Such as? (Please understand that I'd opt for Fortran instead of C++ for numerics any day of the week myself. But I think this is mostly a fallacy nowadays - I'm pretty sure the Intel stuff shares a major part between the two compilers.)
Re: (Score:3)
It's mainly the extra constraints that Fortran places on data structures, e.g. the lack of aliasing, that let the optimizer do a better job.