BIG DATA

Sarthak Agarwal
5 min read · Sep 17, 2020

Big Data has been buzzing around for quite some time, but there are a lot of misconceptions surrounding it. In this post, I will try my best to explain Big Data in the simplest way I can.

What is Big Data?
Big Data refers to a volume of data so huge that it cannot be stored and processed using the traditional computing approach within a given time frame.
But how huge does this data need to be to be termed Big Data?
There is a lot of misconception surrounding what amount of data can be termed Big Data.
Usually, data that is in gigabytes, terabytes, petabytes, exabytes, or anything larger in size is considered Big Data.
This is where the misconception arises.
Even a small amount of data can be referred to as Big Data, depending on the context in which it is being used.
To get more clarity on this, let me make use of a few examples and explain it to you.

For example, if we try to attach a document that is 100 megabytes in size to an email, we would not be able to do so, as the email system would not support an attachment of this size.
Therefore, these 100 megabytes of attachment, with respect to email, can be referred to as Big Data.
Let me take another example and explain it so that you get a better understanding of the term Big Data.
Let's say we have around 10 terabytes of image files upon which certain processing needs to be done.

For instance, let's say we want to resize and enhance these images within a given time frame. If we make use of a desktop computer to perform this task, we would not be able to accomplish it within the given time frame, as the computing resources of a desktop computer would not be sufficient. We would require a powerful server with high-end computing resources to accomplish this task on time.
Therefore, these 10 terabytes of image files can be referred to as Big Data with respect to processing them on a desktop computer.
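To make the example concrete, here is a minimal sketch of the kind of processing involved, written in Python with the Pillow library and the standard multiprocessing module. The directory names and target size are hypothetical, and even a multi-core machine running this would barely scratch a 10-terabyte job, which is exactly why such a workload counts as Big Data for a single desktop.

```python
# A minimal sketch of parallel image resizing with Pillow and multiprocessing.
# The directories and target size are hypothetical; a real 10 TB job would
# need a distributed framework, not a single machine.
from multiprocessing import Pool
from pathlib import Path
from PIL import Image

INPUT_DIR = Path("images/raw")        # hypothetical source directory
OUTPUT_DIR = Path("images/resized")   # hypothetical destination directory
TARGET_SIZE = (1024, 768)             # hypothetical target resolution

def resize_one(path: Path) -> None:
    """Resize a single image and save it to the output directory."""
    with Image.open(path) as img:
        img.thumbnail(TARGET_SIZE)            # resize in place, keeping aspect ratio
        img.save(OUTPUT_DIR / path.name)

if __name__ == "__main__":
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    files = list(INPUT_DIR.glob("*.jpg"))
    with Pool() as pool:                      # one worker process per CPU core
        pool.map(resize_one, files)
```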
I hope by now it's completely clear to you what we mean by Big Data.
How Is Big Data Classified?
Big Data is classified into 3 different categories.
Structured Data
Semi-Structured Data
Unstructured Data

Structured Data refers to data that has a proper structure associated with it. For example, the data present within databases, CSV files, and Excel spreadsheets can be referred to as Structured Data.
Semi-Structured Data refers to data that has some structure, but not a rigid, predefined one. For example, the data present within emails, log files, and Word documents can be referred to as Semi-Structured Data.
Unstructured Data refers to data that has no structure associated with it at all. For example, image files, audio files, and video files can be referred to as Unstructured Data.
This is how Big Data is classified into different categories.
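If it helps to see the three categories side by side, here is a minimal sketch in Python. The file names are hypothetical; each block simply shows how data of that category is typically read.

```python
# A minimal sketch contrasting the three categories; all file names are hypothetical.
import csv
import json

# Structured: rows and columns with a fixed schema (e.g. a CSV export of a database table).
with open("customers.csv", newline="") as f:
    rows = list(csv.DictReader(f))             # each row maps column name -> value

# Semi-structured: there is some structure (keys, tags), but no rigid schema.
with open("events.log") as f:
    events = [json.loads(line) for line in f]  # e.g. one JSON object per log line

# Unstructured: raw bytes with no schema at all (images, audio, video).
with open("photo.jpg", "rb") as f:
    photo_bytes = f.read()                     # meaning must be extracted by further processing
```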
Characteristics of Big Data
Big Data is described by 3 important characteristics.
Volume
Velocity
Variety
Volume refers to the amount of data that is getting generated.
Velocity refers to the speed at which the data is getting generated.
And Variety refers to the different types of data that are getting generated.
These are the 3 important characteristics of Big Data.
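As a rough illustration only, here is a small Python sketch that estimates the three Vs for a hypothetical directory where new data files land. The directory name and the 60-second window are assumptions made for the example.

```python
# A rough sketch of measuring the three Vs for a hypothetical landing directory.
import time
from collections import Counter
from pathlib import Path

DATA_DIR = Path("incoming")                   # hypothetical directory where data arrives

files = [f for f in DATA_DIR.rglob("*") if f.is_file()]
volume_bytes = sum(f.stat().st_size for f in files)        # Volume: total amount of data
window = 60                                                 # look at the last 60 seconds
recent = [f for f in files if time.time() - f.stat().st_mtime < window]
velocity = len(recent) / window                             # Velocity: files arriving per second
variety = Counter(f.suffix for f in files)                  # Variety: mix of file types

print(f"Volume: {volume_bytes / 1e9:.2f} GB, "
      f"Velocity: {velocity:.2f} files/s, Variety: {dict(variety)}")
```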
Traditional Approach of Storing and Processing Big Data
In a traditional approach, the data generated by organizations such as banks, stock markets, or hospitals is given as input to an ETL (Extract, Transform and Load) system.
An ETL system extracts this data, transforms it (that is, converts it into a proper format), and finally loads it into a database.
Once this process is completed, the end users can perform various operations, such as generating reports and performing analytics, by querying this data.
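For a feel of what such a pipeline looks like, here is a minimal ETL sketch in Python that extracts records from a CSV file, transforms them into a proper format, and loads them into a SQLite table. The file name, column names, and table schema are all hypothetical; a production ETL system would be far more involved.

```python
# A minimal ETL sketch: extract from a CSV file, transform the records,
# and load them into a SQLite table. File and column names are hypothetical.
import csv
import sqlite3

def extract(path):
    # Extract: read the raw records from the source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: convert raw strings into a proper format before loading.
    return [(r["account_id"], float(r["amount"]), r["date"]) for r in rows]

def load(records, db_path="warehouse.db"):
    # Load: write the cleaned records into the target database.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS transactions "
                "(account_id TEXT, amount REAL, date TEXT)")
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("bank_transactions.csv")))
```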
But as the data grows, it becomes a challenging task to manage and process it using this traditional approach.
This is one of the reasons the traditional approach is not used for storing and processing Big Data.
Now, let's try to understand some of the major drawbacks associated with using the traditional approach for storing and processing Big Data.
The first drawback is cost: it is an expensive system, and implementing or upgrading it requires a lot of investment, so small and mid-sized companies would not be able to afford it.
The second drawback is scalability: as the data grows, expanding this system becomes a challenging task.
And the last drawback is that it is time-consuming: it takes a lot of time to process the data and extract valuable information from it, as the system is designed and built on legacy computing systems.
I hope this makes it clear why the traditional approach, built on legacy computing systems, is not used to store and process Big Data.
Challenges Associated with Big Data
There are 2 main challenges associated with Big Data.
The first challenge is: how do we store and manage such a huge volume of data efficiently?
And the second challenge is: how do we process and extract valuable information from such a huge volume of data within a given time frame?
These are the 2 main challenges associated with storing and processing Big Data, and they are what led to the creation of the Hadoop framework.
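Hadoop tackles the processing challenge by splitting work into map and reduce steps that run in parallel across a cluster, while HDFS handles the storage. The classic word-count example below is sketched in plain, single-machine Python just to show the shape of those two steps; real Hadoop would run them over data stored in HDFS across many nodes.

```python
# The classic word-count example, sketched in plain Python to show the
# map and reduce steps that Hadoop distributes across a cluster.
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for each word in a line of input.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # Sum the counts for each word.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

if __name__ == "__main__":
    lines = ["Big Data keeps growing", "Hadoop stores and processes Big Data"]
    mapped = [pair for line in lines for pair in map_phase(line)]
    print(reduce_phase(mapped))   # e.g. {'big': 2, 'data': 2, ...}
```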
The Wrap-Up
I have done my best to explain the concepts surrounding Big Data in the simplest way I can.
Big Data and related technologies such as Hadoop and HBase are here to stay as long as the data is there and keeps growing.
Investing time and money in learning Big Data would be the right decision, as it is very promising and offers very bright career prospects.
Hope you found this post to be informative. Please do not hesitate to leave a clap for it (an open secret: you can clap up to 50 times for a post, and the best part is, it wouldn't cost you anything), and feel free to share it across.
