
-What is Big Data?

Big data, as its name suggests, is very big, with data sets typically starting from around 1 TB in size.


The term Big Data is often used to denote a storage system in which different types of data in different formats can be stored for analysis and for driving business decisions.

Big Data is an assortment of data so huge and complex that it becomes very tedious to capture, store, process, retrieve, and analyze with traditional RDBMS databases or traditional data processing techniques.


-The characteristics of big data:

· Velocity – data needs to be analyzed quickly; this denotes the speed at which data arrives and at which changes occur across the diverse data sets.

· Volume – large amounts of data; around 6 million people are using digital media, and it is estimated that about 2.5 quintillion bytes of data are generated every day.

· Variety – data arrives in many different formats, and most of it is unstructured in nature.

· Value – extracting business insights and revenue from the data.


-What Is Hadoop?

Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

Unstructured data, such as log files, Twitter feeds, and media files coming from the internet in general, is becoming more and more relevant to businesses, and data at that scale cannot be stored on traditional single-server systems.

-Hadoop is composed of four core components:

1. Hadoop Common

Also known as Hadoop Core, it is a collection of libraries and utilities that support the other Hadoop modules.

2. Hadoop Distributed File System (HDFS)

Data in Hadoop is broken down into smaller units called blocks, which are distributed across the different nodes of the cluster. This allows scalability during data processing, because operations can run on smaller subsets of the larger data sets. All of this is possible because of HDFS, which provides steady and reliable storage and links the file systems of many local nodes together to form a single file system.
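
To make this concrete, here is a minimal sketch of how an application sees that single file system through the standard Hadoop Java client (org.apache.hadoop.fs.FileSystem). The NameNode address and file path are hypothetical; in a real cluster the address usually comes from the configuration files rather than code.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally loaded from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; HDFS presents one namespace even though the
        // file's blocks are spread across many DataNodes.
        Path file = new Path("/user/demo/hello.txt");

        // Write a small file (large files are split into blocks,
        // 128 MB each by default, and distributed across the cluster).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back through the same single-file-system view.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}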

3. MapReduce

Hadoop achieves parallel processing of large amounts of structured and unstructured data through MapReduce, a programming framework that allows huge scalability across the servers in a Hadoop cluster. MapReduce is therefore considered the heart of Hadoop, and it is simple to understand for those familiar with clustered scale-out data processing solutions.
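
As an illustration of the model, below is a minimal word-count job written against the standard Hadoop MapReduce Java API: the map phase runs in parallel over blocks of input and emits (word, 1) pairs, and the reduce phase sums the counts for each word. The class name and input/output paths are made up for the example.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: called once per input line, in parallel across the cluster.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }

    // Reduce: receives every count emitted for one word and sums them.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));    // example path
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output")); // example path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}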

4. Yet Another Resource Negotiator (YARN)

YARN, a cluster management technology, is one of the key features of second-generation Hadoop and of the next generation of MapReduce.

What makes it special is that it opens up many possibilities, since it allows application frameworks other than MapReduce to run on Hadoop. It also assigns CPU, memory, and storage to the applications that run on a Hadoop cluster.

Part of the core Hadoop project, YARN is the architectural center of Hadoop: it allows multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, to handle data stored in a single platform.
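
As a small, hedged example of that resource assignment in practice, a MapReduce job can ask YARN for specific container resources through standard configuration properties; the memory and CPU values below are arbitrary.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ResourceHints {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Standard MapReduce properties; YARN sizes the job's containers
        // from them when scheduling tasks onto cluster nodes.
        conf.set("mapreduce.map.memory.mb", "2048");    // 2 GB per map task
        conf.set("mapreduce.reduce.memory.mb", "4096"); // 4 GB per reduce task
        conf.set("mapreduce.map.cpu.vcores", "1");      // 1 virtual core per map task

        Job job = Job.getInstance(conf, "resource-aware job");
        // ... mapper, reducer, and path setup as in the word-count example ...
    }
}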


-Here are a few key features of Hadoop:

1. Flexibility in Data Processing

Hadoop can handle structured, unstructured, encoded, and formatted data, which is a huge feature, since unstructured data has been one of the most challenging aspects organizations have to deal with. Furthermore, structured data makes up only about 20% of the data in any organization, but with Hadoop's ability to analyze unstructured data, a large amount of usually ignored data that is significant to the decision-making process can be put to work.

2. Easily Scalable

One of its most valuable features is its ability to expand through the easy addition of new nodes to the system, since it is an open-source platform running on industry-standard hardware. This feature is considered huge because it allows the data volume to grow without forcing changes on any of the existing systems or programs.

3. Fault Tolerance

Hadoop can continue to operate normally if an error occurs or a node fails, because data stored in HDFS doesn't actually get lost: it is automatically saved in two more locations, for three copies in total by default. This decreases the risk of losing data even if one or both of the other copies collapse.
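
For illustration, the replication behind this fault tolerance can be inspected and tuned per file through the same Java client API sketched earlier; the path and the new replication factor here are arbitrary examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/hello.txt"); // example path

        // How many DataNodes hold a copy of each block (3 by default).
        FileStatus status = fs.getFileStatus(file);
        System.out.println("replication factor: " + status.getReplication());

        // Ask HDFS to keep extra copies of a critical file.
        fs.setReplication(file, (short) 5);
    }
}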


4. Quick at Data Processing

Hadoop processes data in a parallel way, making it 10 times faster at processing high-volume data compared with traditional ETL and batch processes.

5. Robust Ecosystem

Hadoop has a robust ecosystem that can serve the various data processing needs of developers and organizations, whether small or big. This comes from the tools and technologies that are continuously added to the ecosystem as the market grows. Some of these projects are Hive, ZooKeeper, HCatalog, MapReduce, and Apache Pig.

6. Cost Effectiveness

Hadoop data management costs one-fifth to one-twentieth as much as other technologies, since it brings parallel computing to commodity servers, which results in a very significant reduction in the cost per terabyte of storage.




-Here are some ways big data already affects your everyday life.

1. Mobile Maps

GPS is available because of big data. Hundreds of reports and other maps have been scanned in and can be used to make GPS devices accurate, and bringing in data from other reports and areas makes them more trustworthy than in the past.


2. Medical Records

Medical records now help hospitals record information faster and more easily: putting the records on computers helps doctors search across all the data while keeping privacy for the patient.

3. Online Shopping

Online shopping may be the easiest way to get anything at any time and to buy whatever a consumer or customer wants. Here, big data is responsible for the recommended items you find on websites, which helps companies target their customer segments and achieve their goals.


-The Future of Big Data

In the next few years, how will your competitors be using big data to reach customers? From what we know now, your competitors will use big data to know exactly how to interact with every customer. And it's happening soon. This doesn't give your business much time to prepare.

One forecast predicts that by 2020, predictive and prescriptive analytics will draw 40% of enterprises' net new BI investments. A separate report projects that the market for prescriptive analytics software will grow to $1.1B in 2019.

These investments in prescriptive and predictive analytics are only part of the story. IDC reports that the big data and business analytics market will grow to $203B by 2020.

Across all industries, enterprises will leverage customer
data to deliver better personalization. And we can anticipate this will drive
remarkable revenue growth.
