
What is Data? Raw Facts, Types and the Data Lifecycle Explained

Understand what data really means, how it differs from information, the types of data, and why every SQL learner must master this foundation.


Every SQL query begins with one core idea: data. Before you create a table or write a SELECT statement, it helps to understand what data actually is, how it differs from information, and why its quality affects every result a query produces. This is one of the most important database basics for beginners, because raw facts stored in databases power everyday services such as Netflix recommendations, Amazon pricing, and Zomato delivery estimates. Once the difference between data and information is clear, every SQL command that follows will make much more sense.

What is Data?

In the context of a database, data is any raw, unprocessed fact recorded without context: a name, a number, a date, or a click. On its own, a single value like the number 88 says nothing. It becomes useful only once it is processed, for example by a SQL query that counts, averages, or groups it into information you can act on. Recognizing the different types of data in a database is the first real step toward writing SQL with confidence.

What You'll Learn

  • Understand what data means and how it differs from information.
  • Identify structured, semi-structured, and unstructured data.
  • Learn the stages of the data lifecycle, from collection to archival.
  • See why data quality directly affects the accuracy of SQL results.

Key Terms to Know

  • Data: A raw, unprocessed fact such as a name, number, date, or timestamp with no context attached.
  • Information: Data that has been processed or organized to answer a specific question.
  • Structured data: Data stored in a fixed format of rows and columns, like a database table.
  • Unstructured data: Data with no fixed format, such as images, videos, or free-text reviews.
  • Data lifecycle: The journey data takes from collection through storage, processing, and eventual archival or deletion.

Data vs Information: What's the Difference?

Data and information are often used interchangeably, but in a database they mean different things. Data is a raw fact with no context attached: the number 88 on its own tells you nothing. Information is what you get once that data is processed to answer a question. If 88 is one of five exam scores (88, 76, 92, 65, and 81) and you calculate the class average, the result, 80.4, is information because it answers a specific question: how did the class perform?

SQL is the tool that performs this transformation. A query such as SELECT AVG(marks) FROM students takes raw stored values and turns them into a meaningful result your application can use.
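To see that transformation end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a MySQL server, so it runs without any setup. The two-column students table is a simplified, hypothetical layout; the marks are the five example scores from this lesson.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL so the example runs anywhere
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_name TEXT, marks INTEGER)")

# Raw data: five individual exam scores, each meaningless on its own
raw_marks = [
    ("Asha Mehta", 88),
    ("Rahul Sharma", 76),
    ("Priya Nair", 92),
    ("Arjun Das", 65),
    ("Sneha Patel", 81),
]
conn.executemany("INSERT INTO students VALUES (?, ?)", raw_marks)

# Information: one number that answers "how did the class perform?"
(class_average,) = conn.execute("SELECT AVG(marks) FROM students").fetchone()
print(class_average)  # 80.4
```

The same SELECT AVG(marks) FROM students statement works unchanged in MySQL; only the connection code would differ.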

Types of Data in a Database: Structured, Semi-Structured, and Unstructured

The type of data determines how it is stored. Structured data follows a fixed schema with clearly defined rows and columns, such as a customers table with columns for name, email, and signup date. Relational databases and SQL are built around exactly this kind of data.
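A rough sketch of the customers table described above shows what a fixed schema buys you: every row has the same named columns, so a query can target any one of them. It again uses sqlite3 as a stand-in for MySQL; the customer_id column, the sample names, and the example.com addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured data: every row must fit this fixed set of named, typed columns
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL,
        signup_date TEXT NOT NULL  -- ISO 8601 string; SQLite has no dedicated date type
    )
""")
conn.executemany(
    "INSERT INTO customers (name, email, signup_date) VALUES (?, ?, ?)",
    [
        ("Asha Mehta", "asha@example.com", "2026-01-10"),
        ("Rahul Sharma", "rahul@example.com", "2026-02-03"),
    ],
)

# PRAGMA table_info lists the schema; each result row is (cid, name, type, notnull, default, pk)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

# Because the schema is fixed, SQL can filter on any column precisely
recent = conn.execute(
    "SELECT name FROM customers WHERE signup_date >= ? ORDER BY name",
    ("2026-02-01",),
).fetchall()

print(column_names)  # ['customer_id', 'name', 'email', 'signup_date']
print(recent)        # [('Rahul Sharma',)]
```

In MySQL you would declare signup_date as DATE rather than TEXT; the idea of one fixed column list for every row is the same.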

Semi-structured data, like a JSON object or an API response, has some organization but does not fit neatly into fixed columns, since different records can carry different fields. Unstructured data has no defined format at all — product photos, recorded calls, and free-text reviews are common examples — and it is usually stored as files outside the database, with only a reference path saved in a SQL table.
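The sketch below contrasts the two cases: a JSON column whose records carry different keys, and a plain text column that only points at an image file stored elsewhere. It uses sqlite3 with SQLite's json_extract function, which plays the same role as MySQL's JSON_EXTRACT; it assumes an SQLite build with JSON support, which is the default from SQLite 3.38 onward. The products table and file paths are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        attributes TEXT,  -- semi-structured: JSON text, keys may differ per record
        photo_path TEXT   -- unstructured data lives outside; only its path is stored
    )
""")
conn.executemany(
    "INSERT INTO products (name, attributes, photo_path) VALUES (?, ?, ?)",
    [
        ("T-shirt", json.dumps({"size": "M", "colour": "blue"}), "images/tshirt.jpg"),
        ("Headphones", json.dumps({"battery_hours": 30}), "images/headphones.jpg"),
    ],
)

# json_extract returns NULL (None in Python) when a record lacks the requested key
rows = conn.execute(
    "SELECT name, json_extract(attributes, '$.size') FROM products ORDER BY product_id"
).fetchall()
print(rows)  # [('T-shirt', 'M'), ('Headphones', None)]
```

The NULL for the headphones row is the practical cost of flexible fields: queries must expect that a key may simply be missing.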

The Data Lifecycle in SQL and Why Data Quality Matters

Data moves through a lifecycle: collection, storage, processing, analysis, and eventually archival or deletion. SQL developers are involved at almost every stage, from designing tables that store data correctly to writing the queries that process and analyze it later.
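One way to picture where SQL fits is to map each lifecycle stage to a statement. This is a simplified sketch, run with sqlite3 as a stand-in for MySQL; the orders and orders_archive tables, the sample amounts, and the cutoff date are hypothetical, and real archival jobs usually run on a schedule rather than inline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Storage: a live table plus an archive table with the same columns (hypothetical names)
schema = "(order_id INTEGER PRIMARY KEY, amount REAL NOT NULL, order_date TEXT NOT NULL)"
conn.execute("CREATE TABLE orders " + schema)
conn.execute("CREATE TABLE orders_archive " + schema)

# Collection: each new order arrives as an INSERT
conn.executemany(
    "INSERT INTO orders (amount, order_date) VALUES (?, ?)",
    [(250.0, "2025-11-20"), (400.0, "2026-01-05"), (150.0, "2026-01-18")],
)
conn.commit()

# Processing and analysis: a SELECT turns stored rows into an answer
(revenue_2026,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE order_date >= '2026-01-01'"
).fetchone()

# Archival: copy old rows into the archive, then delete them from the live table.
# "with conn" commits both statements together, or rolls both back on error.
cutoff = "2026-01-01"
with conn:
    conn.execute("INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?", (cutoff,))
    conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))

live_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
archived_rows = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
print(revenue_2026, live_rows, archived_rows)  # 550.0 2 1
```

Running the copy and the delete in one transaction is the key design choice: if the delete ran without the copy succeeding, the old orders would be lost instead of archived.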

Data quality matters just as much as data type. Duplicate rows in a sales table inflate the result of a simple SUM or COUNT query, and inconsistent formatting, such as the same city entered under two different spellings, splits what should be one group into several. Constraints such as NOT NULL, UNIQUE, and CHECK enforce quality directly inside the database, which is why thinking about quality early saves debugging time later.
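Here is a small sketch of both halves of that point, again using sqlite3 in place of MySQL. The first table has no constraints, so a duplicate slips in and inflates COUNT(*); the second and third tables use UNIQUE and CHECK, so the database rejects the same kinds of bad rows at insert time. All table names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without constraints, a duplicated row is stored silently and inflates the count
conn.execute("CREATE TABLE loose_customers (email TEXT)")
conn.executemany(
    "INSERT INTO loose_customers VALUES (?)",
    [("asha@example.com",), ("rahul@example.com",), ("asha@example.com",)],
)
(naive_count,) = conn.execute("SELECT COUNT(*) FROM loose_customers").fetchone()
(true_count,) = conn.execute("SELECT COUNT(DISTINCT email) FROM loose_customers").fetchone()

# With constraints, the database itself refuses bad rows
conn.execute("CREATE TABLE customers (email TEXT NOT NULL UNIQUE)")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL NOT NULL CHECK (amount >= 0))"
)

attempts = [
    ("INSERT INTO customers (email) VALUES (?)", ("asha@example.com",)),
    ("INSERT INTO customers (email) VALUES (?)", ("asha@example.com",)),  # duplicate
    ("INSERT INTO sales (amount) VALUES (?)", (-499.0,)),                 # negative price
]
rejected = []
for sql, params in attempts:
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError as err:
        rejected.append(str(err))  # exact message wording varies by SQLite version

print(naive_count, true_count, len(rejected))  # 3 2 2
```

COUNT(DISTINCT ...) can paper over duplicates in one report, but only the constraint stops them from reaching every other query.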

Visual Summary

Picture a simple left-to-right pipeline: raw facts are collected and stored as rows in a table, then a SQL query processes that data using functions like SUM, AVG, or COUNT to produce a final result. For example, individual order amounts stored in an orders table become one useful number once a query calculates total monthly revenue.
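The pipeline can be sketched in a few lines. This example uses sqlite3 as a stand-in for MySQL, with hypothetical order amounts; SQLite's strftime formats the date into a year-month key, where MySQL would typically use DATE_FORMAT(order_date, '%Y-%m').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL, order_date TEXT NOT NULL)"
)

# Raw data: one row per order (hypothetical amounts)
conn.executemany(
    "INSERT INTO orders (amount, order_date) VALUES (?, ?)",
    [(320.0, "2026-03-02"), (180.0, "2026-03-14"), (500.0, "2026-03-28"), (275.0, "2026-04-01")],
)

# Information: total revenue per month, one result row per year-month
monthly = conn.execute("""
    SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS revenue
    FROM orders
    GROUP BY month
    ORDER BY month
""").fetchall()
print(monthly)  # [('2026-03', 1000.0), ('2026-04', 275.0)]
```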

Data vs Information at a Glance

| Aspect | Data | Information |
|---|---|---|
| Definition | Raw unprocessed fact | Processed and meaningful result |
| Example | 85, 90, 72 (student marks) | Average score is 82.3 |
| Context | Has no inherent context | Always answers a specific question |
| SQL role | Stored in table columns | Produced by SELECT queries |
| Quality impact | Errors stored silently | Errors appear in reports and decisions |

SQL Example

-- Step 1: Create a students table to store raw data (MySQL syntax)
CREATE TABLE students (
  student_id   INT          PRIMARY KEY AUTO_INCREMENT,
  student_name VARCHAR(100) NOT NULL,
  subject      VARCHAR(80)  NOT NULL,
  -- CHECK is enforced from MySQL 8.0.16; older versions parse it but ignore it
  marks        INT          NOT NULL CHECK (marks BETWEEN 0 AND 100),
  exam_date    DATE         NOT NULL
);

-- Step 2: Insert raw data (individual facts)
INSERT INTO students (student_name, subject, marks, exam_date) VALUES
  ('Asha Mehta',    'Mathematics', 88, '2026-03-15'),
  ('Rahul Sharma',  'Mathematics', 76, '2026-03-15'),
  ('Priya Nair',    'Mathematics', 92, '2026-03-15'),
  ('Arjun Das',     'Mathematics', 65, '2026-03-15'),
  ('Sneha Patel',   'Mathematics', 81, '2026-03-15');

-- Step 3: Transform data into information using SQL
SELECT
  subject,
  COUNT(*)   AS total_students,
  AVG(marks) AS average_marks,
  MAX(marks) AS highest_marks,
  MIN(marks) AS lowest_marks,
  -- Count students at or above an example pass mark of 75
  SUM(CASE WHEN marks >= 75 THEN 1 ELSE 0 END) AS passed_students
FROM students
GROUP BY subject;

The five rows in the students table above are raw data: individual facts about each student's exam result. The SELECT query transforms them into information by calculating the number of students, the average, highest, and lowest marks, and how many students reached the pass mark of 75. The same stored data can answer many different questions depending on how you query it.
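As an illustration of asking a different question of the same rows, this sketch finds the students who scored above the class average using a subquery. It runs in sqlite3 so it can be executed without a MySQL server, and the table is trimmed to the columns the query needs; the same SELECT with a subquery also works in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_name TEXT, subject TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, 'Mathematics', ?)",
    [("Asha Mehta", 88), ("Rahul Sharma", 76), ("Priya Nair", 92),
     ("Arjun Das", 65), ("Sneha Patel", 81)],
)

# A different question on the same rows: who scored above the class average (80.4)?
above_average = conn.execute("""
    SELECT student_name, marks
    FROM students
    WHERE marks > (SELECT AVG(marks) FROM students)
    ORDER BY marks DESC
""").fetchall()
print(above_average)  # [('Priya Nair', 92), ('Asha Mehta', 88), ('Sneha Patel', 81)]
```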

Real-World Examples

  • Netflix stores each play event — user, content, and duration — as raw data, then analyzes it to power its recommendation engine.
  • Amazon turns millions of clicks and purchases into information used for pricing, recommendations, and fraud detection.
  • Zomato logs each order as raw data, then aggregates it into reports on delivery time and partner performance.
  • Banks store every transaction as a single raw record, then aggregate them into monthly statements and real-time fraud alerts.
  • Ride-hailing apps like Uber store each trip's raw GPS and fare data, then turn it into information such as surge pricing and driver ratings.

Best Practices and Pro Tips

  • When designing a new table, decide upfront whether a column holds raw data (like a phone number) or a derived value (like an age) — derived values are usually safer to calculate in a query than to store, since stored ones can go stale.
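  • If you're working with semi-structured data in MySQL 5.7 or later, the native JSON column type lets you keep flexible fields without giving up SQL's querying tools entirely. A JSON column cannot be indexed directly, but you can index a generated column that extracts the value you search on.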
  • Add NOT NULL and CHECK constraints the moment you design a table, not after a bug report — cleaning up bad data that has already accumulated in production is far more painful than preventing it at creation time.
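The first tip, storing the raw fact and deriving the rest at query time, can be sketched like this: keep date_of_birth in the table and calculate age only when a query needs it. The members table and dates are hypothetical, and the example runs in sqlite3; the reference date is passed as a parameter so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Store the raw fact (date of birth), not the derived value (age), which goes stale
conn.execute("CREATE TABLE members (name TEXT NOT NULL, date_of_birth TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("Asha Mehta", "2000-06-01"), ("Rahul Sharma", "1998-01-20")],
)

# Derive age at query time: year difference, minus 1 if this year's birthday is still ahead
AGE_AS_OF = """
    SELECT name,
           (CAST(strftime('%Y', :as_of) AS INTEGER)
            - CAST(strftime('%Y', date_of_birth) AS INTEGER))
           - (strftime('%m-%d', :as_of) < strftime('%m-%d', date_of_birth)) AS age
    FROM members
    ORDER BY name
"""
ages = conn.execute(AGE_AS_OF, {"as_of": "2026-03-15"}).fetchall()
print(ages)  # [('Asha Mehta', 25), ('Rahul Sharma', 28)]
```

In a live query you would pass today's date instead, for example date('now') in SQLite; in MySQL, TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) performs the same whole-year calculation.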

Common Mistakes to Avoid

  • Treating data and information as the same thing: stored values are data, while the results a query derives from them are information.
  • Assuming all data fits neatly into rows and columns, when much of it is semi-structured or unstructured.
  • Skipping constraints like NOT NULL or UNIQUE, which leads to poor data quality and incorrect results.
  • Thinking data only means numbers — text, dates, images, and audio are data too.

Interview Questions

Q1. What is the difference between data and information?

Data is a raw, unprocessed fact with no context, such as a single number. Information is what you get after processing that data to answer a specific question, such as an average or a total.

Q2. What are the three types of data?

Structured data fits into fixed rows and columns, semi-structured data like JSON has partial organization, and unstructured data such as images or videos has no fixed format at all.

Q3. Why does data quality matter in SQL?

Poor data quality, such as duplicate rows or missing values, leads to incorrect query results. Constraints like NOT NULL and UNIQUE help enforce quality at the database level.

Q4. What is the data lifecycle?

It is the journey data takes from collection, through storage and processing, to analysis and eventual archival or deletion.

Practice MCQs

1. A customer's date of birth stored in a table is an example of:

  A. Information
  B. Raw data
  C. A SQL query
  D. A schema

Answer: B. Raw data

Explanation: A single stored value is a raw fact with no applied meaning yet.

2. Which SQL clause is most associated with turning data into summarized information?

  A. WHERE
  B. GROUP BY
  C. ORDER BY
  D. LIMIT

Answer: B. GROUP BY

Explanation: GROUP BY collects rows into groups so that aggregate functions such as COUNT, SUM, and AVG can summarize each group into a single result row.

3. Which of these is unstructured data?

  A. A customers table
  B. A JSON config file
  C. A product review in plain text
  D. A CSV file

Answer: C. A product review in plain text

Explanation: Free text has no fixed schema and cannot be queried like structured columns.

Quick Revision Points

  • Data is raw; information is processed and meaningful.
  • The three data types to remember are structured, semi-structured, and unstructured.
  • SQL and relational databases are built primarily for structured data.
  • The data lifecycle order is: Collection > Storage > Processing > Analysis > Archival or deletion.

Conclusion

  • Every SQL query transforms stored data into meaningful information.
  • Knowing data types helps you design better table schemas.
  • Data quality determines how trustworthy your query results are.

Data is the raw material every database stores, and SQL is the language that turns it into information you can act on. That is why understanding data comes before learning SQL syntax. Structured data fits naturally into tables, while semi-structured and unstructured data need extra handling, and that distinction matters from your very first lesson onward.

Frequently Asked Questions

Data is any raw fact or observation that has been recorded. A student's name is data. A product's price is data. The time you placed a food order is data. On its own, each piece is just a number, a word, or a date. Data becomes truly useful when it is organized in a database and analyzed with SQL to answer questions.

Data is raw and has no immediate meaning. The numbers 88, 76, 92, 65, 81 are raw marks. Information is what emerges after processing. After calculating the average with SQL, you know the class average is 80.4. That average is information because it answers 'how did the class perform?'

SQL is a language for storing, querying, and manipulating data. If you do not understand what data is, why it is collected, and how it is structured, SQL commands become rote memorization instead of a practical tool. Understanding data gives every SQL lesson genuine meaning and context.

Structured data is organized in a fixed schema with named columns and defined data types. An employees table with columns for employee_id, name, department, and salary is structured data. Every row follows the same pattern and SQL can query any column precisely.

Semi-structured data has some organization but is flexible, such as JSON documents where different records may have different fields. MySQL 5.7 and later supports a native JSON data type, allowing semi-structured documents to be stored in a table column and queried with JSON functions like JSON_EXTRACT.

Every user action on Amazon generates data: searches, product views, cart additions, purchases, reviews, returns, and delivery tracking. Millions of users performing these actions at the same time can produce billions of rows per day, stored across operational databases, data lakes, and analytics warehouses.

Poor data quality produces incorrect results. If customer records have duplicate entries, a count of unique customers will be inflated. If order amounts have data entry errors like negative prices, revenue reports will be wrong. Constraints such as NOT NULL, UNIQUE, CHECK, and FOREIGN KEY help prevent quality problems at the database level.

No. Structured data is stored in rows and columns in relational databases. Unstructured data such as images, videos, and audio files is typically stored in file systems or object storage like Amazon S3, with only a file path or URL saved in the database. Semi-structured data can be stored in JSON columns or document databases like MongoDB.

A record is one complete row in a table. In a students table, one record might contain student_id=101, student_name='Asha', subject='Mathematics', and marks=88. The record represents all stored facts about one entity, which in this case is one student's exam result.

SQL developers participate in multiple stages of the data lifecycle. They design schemas for data storage, write INSERT statements for data collection, create SELECT queries for processing and analysis, set up archival procedures using DELETE or archive tables, and build indexes to maintain performance as data volume grows over the lifecycle.