Reading CSV Files with Pandas
Pandas is a powerful Python library for data analysis and manipulation. One of the most common tasks in data analysis is reading CSV (Comma-Separated Values) files. Pandas makes this process simple with its read_csv function.
Reading a CSV File
To read a CSV file into a Pandas DataFrame, use the following code:
import pandas as pd
# Read CSV file
df = pd.read_csv("data.csv")
# Display the first five rows
print(df.head())
Handling Different Delimiters
If your file uses a delimiter other than a comma, specify it with the delimiter parameter (an alias for sep):
df = pd.read_csv("data.tsv", delimiter="\t") # Tab-separated file
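The same idea applies to any single-character separator. The sketch below is only an illustration: it uses a small in-memory string in place of a real file, and the column names and values are invented for the example.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data standing in for a file on disk.
raw = io.StringIO("name;score\nAda;91\nLin;78\n")

# sep is the primary parameter name; delimiter is accepted as an alias.
df = pd.read_csv(raw, sep=";")

print(df.columns.tolist())
print(df.shape)
```

Without sep=";", each line would be read as a single column, because no commas appear in the data.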
Reading Large CSV Files
For files too large to load into memory at once, pass the chunksize parameter. read_csv then returns an iterator that yields DataFrames of at most chunksize rows each:
chunks = pd.read_csv("large_data.csv", chunksize=1000)
for chunk in chunks:
    print(chunk.head())
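In practice, chunked reading is usually paired with some running aggregate, so that only one chunk is held in memory at a time. The following is a minimal sketch under assumed data: the "value" column and the ten-row input are made up, and a chunk size of 4 is chosen only to make the chunking visible.

```python
import io

import pandas as pd

# Hypothetical data: ten rows with one numeric column named "value".
raw = io.StringIO("value\n" + "\n".join(str(i) for i in range(1, 11)))

total = 0
row_count = 0

# Each iteration receives a DataFrame with at most 4 rows.
for chunk in pd.read_csv(raw, chunksize=4):
    total += int(chunk["value"].sum())
    row_count += len(chunk)

print(total, row_count)
```

The last chunk is smaller than the others when the row count is not a multiple of chunksize, so code inside the loop should not assume a fixed chunk length.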
Selecting Specific Columns
To load only certain columns, use the usecols parameter:
df = pd.read_csv("data.csv", usecols=["column1", "column2"])
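One detail worth knowing is that usecols selects columns but ignores the order in which they are listed, so the result follows the file's column order. The sketch below illustrates this with invented in-memory data; the column names are assumptions for the example only.

```python
import io

import pandas as pd

# Hypothetical four-column data standing in for a file on disk.
raw = io.StringIO("id,name,city,age\n1,Ada,London,36\n2,Lin,Taipei,29\n")

# Only "name" and "age" are parsed; the listing order is ignored.
df = pd.read_csv(raw, usecols=["age", "name"])
print(df.columns.tolist())

# To get a specific column order, reindex after loading.
reordered = df[["age", "name"]]
print(reordered.columns.tolist())
```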
Handling Missing Values
To treat additional strings as missing values (NaN) while reading a CSV file, use the na_values parameter:
df = pd.read_csv("data.csv", na_values=["N/A", "na", "--"])
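To check that the markers were recognized, count the missing entries after loading. This is a small sketch with invented data, where "--" and "na" play the role of the placeholder strings in the source file.

```python
import io

import pandas as pd

# Hypothetical data in which "--" and "na" mark missing scores.
raw = io.StringIO("name,score\nAda,91\nLin,--\nSam,na\n")

df = pd.read_csv(raw, na_values=["N/A", "na", "--"])

# The placeholders become NaN, so the column is parsed as numeric.
print(df["score"].isna().sum())
print(df["score"].dtype)
```

Without na_values, the score column here would be read as text, because "--" and "na" are not among the strings pandas treats as missing by default.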
Conclusion
Pandas provides a flexible and efficient way to read CSV files, whether they are small or large, use a non-comma delimiter, or contain missing values. The read_csv parameters covered here (delimiter, chunksize, usecols, and na_values) handle most everyday loading tasks.