2019-12-06 00:00:00

Difference between Structured, Semi-structured and Unstructured data

Big Data includes huge volume, high velocity, and an extensible variety of data. It comes in three types: structured data, semi-structured data, and unstructured data.

  1. Structured data

Structured data is data whose elements are addressable for effective analysis. It has been organised into a formatted repository, typically a database. It covers all data that can be stored in an SQL database, in tables with rows and columns. Such data has relational keys and can easily be mapped into pre-designed fields. Today, it is the kind of data most often processed in development and the simplest way to manage information. Example: relational data.
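As a minimal sketch of what "rows, columns, and relational keys" can look like in practice, the following uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the schema and data below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Linus")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 5.5), (2, 12.0)],
)

# Every value is addressable by table, row, and column,
# so joining the two tables on the key is straightforward.
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id ORDER BY c.id"
).fetchall()
print(totals)
```

Because the fields are fixed in advance, the query can refer to them by name without inspecting the data first.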

  2. Semi-structured data

Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing, it can be stored in a relational database (this can be very hard for some kinds of semi-structured data), but the semi-structured form exists to save space. Example: XML data.
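To illustrate the idea, here is a small sketch using Python's standard xml.etree.ElementTree module. The XML document and its tag names are made up for the example. The tags give the data some organization, but records do not have to share the same fields, which is what makes mapping them to relational rows take extra processing.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: tags give structure,
# but the second record has no <email> element.
doc = """
<people>
  <person id="1"><name>Ada</name><email>ada@example.com</email></person>
  <person id="2"><name>Linus</name></person>
</people>
"""

root = ET.fromstring(doc)

# Flatten each element into a row; a field missing from a record becomes None.
rows = [
    (int(p.get("id")), p.findtext("name"), p.findtext("email"))
    for p in root.findall("person")
]
print(rows)
```

The missing email shows up as None in the second row, a gap that a fixed relational schema would have to allow for explicitly.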

  3. Unstructured data

Unstructured data is data that is not organised in a pre-defined manner or does not have a pre-defined data model, so it is not a good fit for a mainstream relational database. Alternative platforms exist for storing and managing unstructured data. It is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Examples: Word documents, PDFs, text, media logs.
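With no schema to rely on, a simple way to query such data is to search its text. The sketch below assumes a handful of hypothetical in-memory documents and a hypothetical helper named search; it is an illustration of keyword search, not a description of any particular storage platform.

```python
import re

# Hypothetical free-text documents: no schema, no fields, just characters.
documents = {
    "memo.txt": "Meeting moved to Friday. Please send the invoice to Ada.",
    "log.txt": "User uploaded video.mp4; invoice email failed twice.",
    "note.txt": "Remember to water the plants.",
}

def search(term):
    """Return the names of documents whose text contains the term as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(name for name, text in documents.items() if pattern.search(text))

print(search("invoice"))
```

The search can say which documents mention a word, but it cannot answer a field-level question such as "who is the invoice for" without further processing of the text.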

 

| PROPERTIES | STRUCTURED DATA | SEMI-STRUCTURED DATA | UNSTRUCTURED DATA |
|---|---|---|---|
| Technology | Based on relational database tables | Based on XML/RDF | Based on character and binary data |
| Transaction management | Mature transactions and various concurrency techniques | Transactions are adapted from DBMS, not mature | No transaction management and no concurrency |
| Version management | Versioning over tuples, rows, and tables | Versioning over tuples or graphs is possible | Versioned as a whole |
| Flexibility | Schema-dependent and less flexible | More flexible than structured data but less flexible than unstructured data | Very flexible, with no schema |
| Scalability | Scaling the database schema is very difficult | Scaling is simpler than for structured data | Very scalable |
| Robustness | Very robust | New technology, not very widespread | |
| Query performance | Structured queries allow complex joins | Queries over anonymous nodes are possible | Only textual queries are possible |

Source: GeeksforGeeks