Reading CSV Files with Pandas

Pandas is a Python library for data analysis and manipulation. Reading CSV (comma-separated values) files is among the most common tasks in data analysis, and the read_csv function handles it in a single call.

Reading a CSV File

To read a CSV file into a Pandas DataFrame, use the following code:

import pandas as pd

# Read CSV file
df = pd.read_csv("data.csv")

# Display the first five rows
print(df.head())
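
As a minimal sketch of what to check after loading, the example below reads a small in-memory CSV and inspects its shape and inferred column types. The sample data and its column names are made up for illustration and stand in for "data.csv". Because read_csv accepts file-like objects as well as paths, io.StringIO is used so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for "data.csv".
csv_text = "name,age,city\nAda,36,London\nGrace,45,New York\nLinus,28,Helsinki\n"

# read_csv accepts a path or any file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # column types inferred by pandas
print(df.head(2))  # first two rows only
```

Checking the shape and dtypes right after loading is a quick way to confirm that the header row and column types were interpreted as expected.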

Handling Different Delimiters

If your file uses a delimiter other than a comma, specify it with the sep parameter or its alias, delimiter:

df = pd.read_csv("data.tsv", delimiter="\t")  # Tab-separated file
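
The same idea applies to other separators. The sketch below assumes a semicolon-separated file, which is common in some spreadsheet exports. The sample data is hypothetical and is read from memory rather than from disk.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data.
semicolon_text = "product;price\nwidget;2.50\ngadget;10.00\n"

# sep=";" tells pandas to split fields on semicolons instead of commas.
df = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(df.columns.tolist())
print(df)
```

If the separator is left at its default, a file like this is typically parsed as a single column, which is a useful symptom to look for when a load goes wrong.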

Reading Large CSV Files

For files too large to load into memory at once, pass the chunksize parameter. read_csv then returns an iterator that yields DataFrames of at most chunksize rows each:

chunks = pd.read_csv("large_data.csv", chunksize=1000)
for chunk in chunks:
    print(chunk.head())
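
In practice, chunks are usually aggregated rather than printed. The following sketch sums a column and counts rows across chunks without holding the whole file in memory. The "value" column and the tiny chunk size are assumptions chosen to keep the example small.

```python
import io

import pandas as pd

# Hypothetical data: 10 rows with a single "value" column holding 1..10.
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 11)) + "\n"

total = 0
row_count = 0

# chunksize=4 yields DataFrames of 4, 4, and 2 rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()  # running sum across chunks
    row_count += len(chunk)        # running row count

print(total, row_count)
```

Only one chunk is in memory at a time, so peak memory depends on the chunk size rather than on the size of the file.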

Selecting Specific Columns

To load only certain columns, which reduces memory use and parsing time, use the usecols parameter:

df = pd.read_csv("data.csv", usecols=["column1", "column2"])
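
usecols accepts column positions as well as names. The sketch below selects the same two columns both ways. The column names and data are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data with four columns; only two are needed.
csv_text = "id,name,score,notes\n1,Ada,90,ok\n2,Grace,85,late\n"

# Select columns by name.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["id", "score"])

# Select the same columns by zero-based position.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

print(by_name.columns.tolist())
print(by_name.equals(by_position))
```

Selecting by position can help when a file has no reliable header, while selecting by name is more robust when the column order may change.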

Handling Missing Values

Pandas already treats some markers, such as empty fields and "NA", as missing. To have additional strings read as missing values (NaN), list them in the na_values parameter:

df = pd.read_csv("data.csv", na_values=["N/A", "na", "--"])
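
To see the effect, the sketch below loads the same hypothetical data with and without na_values and counts the missing entries in each case. The column names and placeholder strings are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical data using three different placeholders for missing scores.
csv_text = "name,score\nAda,90\nGrace,--\nLinus,na\nMargaret,N/A\n"

# Without na_values, only markers pandas recognizes by default become NaN.
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values, the listed strings are also converted to NaN.
clean = pd.read_csv(io.StringIO(csv_text), na_values=["N/A", "na", "--"])

print(raw["score"].isna().sum())    # missing count with defaults only
print(clean["score"].isna().sum())  # missing count with extra markers
print(clean["score"].dtype)         # numeric once placeholders are gone
```

Once the placeholders are recognized as missing, the column can be parsed as numeric rather than as text, so calculations such as means work without further cleaning.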

Conclusion

The read_csv function covers the common cases for loading CSV data: small and large files, alternative delimiters, column selection, and custom missing-value markers, each controlled by a single parameter (delimiter, chunksize, usecols, and na_values). Knowing these options removes much of the cleanup that would otherwise follow a load.