The Four V's


Video Script

To really understand how big data can be useful, we can look at four different aspects that are usually referred to as the four V’s of big data. These are volume. So how much data there is a variety, how much or how little variety actually exist, velocity, just how fast data is being produced and or received. And veracity. How accurate is that data, there’s also a fifth fee, typically referred to as value. But we won’t actually include that in this particular lecture. But if you are looking at Big Data information online, you might see that fifth V out there called value. But let’s take a deeper dive into each one of these.

So volume of data, as I mentioned, deals with the scale of information. So how much is there. And so I was mentioned, this, and this graphic here is from a few years ago, but they mentioned that by 2020, there will be 40 zettabytes of data or 43 trillion gigabytes. That’s an increase of over 300 times more from 2005. And also 6 billion people have cell phones, which is an insane amounts of cellular devices. And if you think about how many of those are smartphones, which are generating a significant amount of information, right? It’s just amazing the amount of information that we are generating nowadays, especially with so many internet connected devices, things like smartwatches, smart cars, and even semi smart cars that aren’t truly self driving, but have electronics in them that generate information. And things like businesses, though, have increased the sheer amount of data that they’re dealing with now, as well, because 2030 years ago, there really wasn’t the means to actually buy one generate this kind of level of data, as well as store and store that type of information as well. So now we just basically store everything. And overall storage has become significantly cheaper than it has been in the past. And we can, technology has gotten a lot better. And we can store a lot more information in a much smaller amount of space.

Variety of data is really important as well. So what kind of information are we actually generating? And we already saw what happened in a an internet minute earlier in a previous video. So what does that really equate to as well, things like smartwatches, smart devices, internet connected devices, so smartwatches, smartphones, things like your IoT devices inside of your house, internet connected lights, and social media is a huge one, everyone is using some form of social media, whether it be Facebook, Twitter, Instagram, Snapchat, streaming services, YouTube, Twitter, YouTube, Twitch, Netflix, Hulu, all of those sorts of streaming services. And then we even also have health care. So 2030 years ago, healthcare was still all pure pencil and or pen and paper, right? When you went to the doctor’s office, they came in with your file on a clipboard and filled out all of your information. Now, if you go to the doctor’s office, majority of the time, they’re going to come in with a laptop instead of a clipboard. And so all of your information is now digitize your chart, all of your health information is online, or at least digitize instead of being an on physical paper. That’s 150 exabytes. And that was in 2011. Due to our reliance, of technology, and the fact that pretty much everyone is online almost all the time during the day, we’re generating a sheer amount of information in a variety of different contexts. And so that provides a lot more interesting use cases of the Information Network generating online and are in our daily routine.

The third V of big data that we talked about is velocity. So velocity deals with essentially the speed right of how we’re actually retrieving that information. So, for example, the New York Stock Exchange deals with one terabyte of trade information during each trading session, and 20. By 2016, there was an estimated amount of almost 19 billion network connections. So that’s almost two as two and a half connections per person on earth. And if we stop to think about how many devices that you can currently have are currently own, that are connected to the internet and at any given point in time, the lot, right? When I was a kid, we might have had one computer that was connected intermittently to the Internet through a dial up connection. But now I have a smartphone, a smartwatch, Alexa devices that are in my house that are always connected online, my TV has a wireless internet connection, your gaming devices have an internet connection. So the number of things that are connected anymore, the number of connections that you have to the outside world, have increased drastically, and so has the speed of which you have your speed of your internet connection is before right with a dial up internet connection, there’s only so much information I can be exchanged on the internet every second. But now with high speed internet access, many more people are able to generate and consume significant amount of more information than we ever have been able to before. And that’s not just typical devices that we experience. We even have things like cars, right cars are generating the sheer, more more amount of information that they ever have as well. Even if you don’t own a smart car, most modern cars come equipped with far more sensors that are actually can actually transmit back to even the manufacturer to tell them information about your vehicle. Or even just tell you more about your vehicle than what information that you would have had before. The last fee in the last fee that we’ll be talking about today is veracity, which deals with the uncertainty of information. And this is a really important one because we have a significant amount of information. And a lot of business revolves around that information as well. So recommender systems online on Amazon, Google searches, your digital footprint that you leave online, all of that information is consumed in some way shape, or form, whether or not be from the company that you actually use a service for and generated that information. Or you personally as a user for things like your your health, so smartwatches Fitbit trackers, that sort of thing.

Is that data accurate? Is that data valid? That’s where we’re dealing with the veracity or uncertainty of that information. And so this is a really interesting aspect here, one in three business leaders don’t actually trust the information that they use to make valid decisions. A little over a quarter of people done in the survey, were unsure about how much how accurate they’re getting, it actually was. This cost a significant amount of money, as estimated that poor data cost us about three and a half $3.1 trillion a year. So if you think about the number of businesses decisions that are actually made using this data, or the number of services that you use, that rely on big data techniques, or data that has been generated a lot, right, a lot of business revolves around that. And if that data is incorrect, or invalid, that’s money loss. And it can cause a lot of different controversies. And in some cases, and in terms of things like health care, borrows and various other things that can actually cost lives as well. So the veracity or certainty of data is an extremely important aspect of how we actually work with big data, how we store it, how we retrieve it, and how we actually analyze it.