Introduction to HPC

Resources

Beocat

Video Script

Hey folks, today I’d like to talk to you a little bit about supercomputing. I’m Dr. Dan Andresen. I’m the director of our K-State supercomputing center and also a professor in computer science. So if you want to see me, come by my office or set up a Zoom meeting; I’m happy to talk about this sort of thing. So supercomputing is about two things. It’s about size, and it’s about speed. And compared to a supercomputer, your average laptop is the teeny mouse, so it’s about two big things. It’s about size because many problems just can’t fit into a standard laptop or standard desktop. And so there’s this issue of, okay, my problem is too big for that, what do I do? You take it to a supercomputer. Or similarly, if you’ve got a big problem that just takes way too long, or you’ve got a little problem and you’ve got lots and lots of them, then it’s really useful to have something that isn’t likely to go ahead and say, oh, yeah, by the way, I’m going to restart every couple of weeks because Microsoft pushed out a patch, or something like that. And so these are the types of problems that we do on a supercomputer, because with a supercomputer we bring the size and we bring the speed, so that if you’ve got really big problems, they’re actually manageable.

We had one of our users, for instance, who came to me a year or two ago and said, I’ve got a problem that’s taking too long to run. Sure, how long is it taking? Like, I calculate it’s going to take 436 years. Ah, we can handle that. So anyway, we talked about it, set some students on it, ran it on Beocat, ran it in parallel, and we knocked a one week problem down to 36 seconds, which is about an 81 million times speedup. And that’s the sort of thing that a) is a lot of fun, and b) is something that we get to do in supercomputing because we have the tools that can really take this on. So supercomputers are awesome for doing the really big problems. We use them for things like simulations and data mining and visualization. The idea is we take really big amounts of data. An average wheat genome is one and a half terabytes, for instance, a little more than your average PC can handle, especially if you’re doing a de novo genome assembly, where you need all of that in memory. Otherwise, it takes forever. So what we do is we take the big data, and we look at it and we say, hey, how can we analyze this? How can we compute with it and come up with some really life-changing stuff?

So for instance, tornadoes: a fact of life here in Kansas. Years ago, we couldn’t really do things at a high resolution. There was just this general idea that, well, maybe there might be a problem somewhere in the area. And on TV, they’d have this thing: look out, everybody in this massive cone of space. Now it’s more like, okay, if you’re in this area, move. And if you’re in this area, you’re probably going to be relatively safe. Because we’ve got more and more powerful computers, we’ve got better and better models. And because we’ve got the supercomputers that can handle it, we can start saying, hey, look, it’s worth spending $20 million on the machine, because we get to avoid evacuating Houston, which would cost billions, or evacuating most of Florida, which would also cost billions. And so because we have bigger computers, and we have faster computers, we can actually do some really cool stuff.

So we do simulation, because a lot of times it’s either impossible or really hard to do things in real life. You know, it’s like the old joke: define the universe and give two examples. What do I do? I can’t. So you can use simulation to say, okay, let’s simulate what a universe would be like if, you know, various laws were tweaked. Or there are times where it’s just too expensive. We had a professor, for instance, who was doing air purification systems, and every time she built a sample micro-engineered product, it cost about $10,000. Well, what you would do is use a supercomputer to analyze tens of thousands of these possibilities electronically, and then for the ones that looked really, really promising, you’d have a sample made. Saved her millions of bucks. So the idea behind simulation on supercomputers is that we can do things that otherwise we just can’t do, whether it’s for ethical reasons or for economic reasons. Both are very, very valid.

The other thing is supercomputing isn’t just something we do for science. It’s also something that really has a big impact on industry. Airlines, for instance, use supercomputers to figure out how to optimize logistics, saving about 100 million bucks a year. Automotive design, again, saves about a billion dollars per company per year in terms of not having to do all the testing, and being able to pull together CAD/CAM, structural integrity, all this analysis that we don’t have to do physically. It becomes something we can do digitally in a simulation, and it saves a ton of money despite being kind of expensive. I mean, to get a supercomputer, it’s anything from about a million bucks or so up to possibly a hundred million dollars if you’re a government that really needs a big system. Semiconductor industry, same sort of thing, saving about a billion bucks a year. Energy, it’s about not having to build about two new power plants, and those suckers are expensive, you know, 300 million up to about a billion dollars each.

If you go to work, for instance, for Cerner, they have the equivalent of a supercomputer that they use to do virtual drug studies: hey, what is the impact of this drug combined with this drug for male patients in their late 80s, or something like that. They have enough data that they can do this analysis and say, here’s the impact. And it works. And it really has a big impact on their bottom line, and on the fact that they didn’t have to go out and do drug testing on a bunch of 80-plus-year-old men, because, like my dad, he’s not in good shape for a bunch of drug tests, and it might actually hurt him. So it’s really an important thing, both in the real world and in academia and in science.

So why do we bother? Well, the big news is, you can do bigger and more exciting science. And that’s really one of the drivers today for scientific supercomputing: the fact that you can do things at a massive scale, whether it’s simulating the universe, simulating weather and hurricanes, or simulating genomes and looking for things that, oh, you know, might be causing the COVID virus to be so nasty. And so you get very, very useful information out of some very, very big, powerful tools. Similarly, if you’re familiar with HPC, or supercomputing, then you know where normal computing is going.

So 20 years ago, Google was barely a gleam in Sergey’s eye. But on the other hand, for those of us that were in supercomputing at the time, what they were doing was like, oh, yeah, we understand that. And so you can get ahead of the curve and know where normal computing is going to be. And that can be a really strategic advantage if you’re trying to figure out, where am I going to go in my job, and what’s my career going to look like? So in the future, the general idea is, particularly if you start combining things like 5G networking, and the cloud, and software you can license pretty easily, and the Internet of Things, you start saying, my computer is really: well, I need this sensed, I need this computed on, I need this visualized, and I need to plug in this AI algorithm to do an analysis on it for me. And then I want the output displayed in my virtual reality goggles. That’s what it could look like: great, here’s my credit card. Well, they’ll have it all on file already. Here’s my credit card, here’s my budget, and let’s go ahead and rent this sort of thing. So you won’t necessarily say, hey, I need to buy a $5 million computer, although you might, but you might rent one for the time that you need it and then let other people use it when you don’t, just like you do today in the cloud.

So it’s collaborative: you work with other people. It’s dynamic, and it is really, really cool. So that’s kind of what we’re looking at for the future in terms of supercomputing, but it’s all at massive scales. So, a couple of resources. A lot of these slides I got from Henry Newman, Globus, Open Science Grid, and other things that are available. Thanks to the people that let us use your graphics. And I hope this helps. If you have any questions or want to talk about it, as you can tell, I like to talk about it. Get in touch.