travisjneuman
diff --git a/projects/modules/07-data-analysis/01-pandas-basics/README.md b/projects/modules/07-data-analysis/01-pandas-basics/README.md
Lines changed: 108 additions & 0 deletions
diff --git a/projects/modules/07-data-analysis/01-pandas-basics/data/students.csv b/projects/modules/07-data-analysis/01-pandas-basics/data/students.csv
Lines changed: 31 additions & 0 deletions
diff --git a/projects/modules/07-data-analysis/01-pandas-basics/notes.md b/projects/modules/07-data-analysis/01-pandas-basics/notes.md
Lines changed: 10 additions & 0 deletions
diff --git a/projects/modules/07-data-analysis/01-pandas-basics/project.py b/projects/modules/07-data-analysis/01-pandas-basics/project.py
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,108 @@
+# Module 07 / Project 01 — Pandas Basics
+
+[README](../../../../README.md) · [Module Index](../README.md)
+
+## Focus
+
+- Loading CSV data with `pd.read_csv()`
+- Exploring a DataFrame: `head()`, `tail()`, `shape`, `dtypes`, `info()`, `describe()`
+- Selecting columns by name
+- Sorting rows with `sort_values()`
+
+## Why this project exists
+
+Before you can analyze data, you need to know how to load it and look at it. This project teaches you how to get a CSV file into a pandas DataFrame and use built-in methods to understand what the data looks like — how many rows, what columns exist, what types the values are, and what the basic statistics tell you. These exploration steps are the first thing every data analyst does with a new data set.
+
+## Run
+
+```bash
+cd projects/modules/07-data-analysis/01-pandas-basics
+python project.py
+```
+
+## Expected output
+
+```text
+=== Loading student data ===
+Loaded 30 rows and 4 columns from data/students.csv
+
+=== First 5 rows (head) ===
+            name  subject  grade  age
+0     Alice Chen     Math     92   17
+1   Bob Martinez  Science     78   16
+2  Carol Johnson  English     85   17
+3      David Kim     Math     67   16
+4      Eva Patel  Science     91   18
+
+=== Shape ===
+Rows: 30, Columns: 4
+
+=== Column types (dtypes) ===
+name       object
+subject    object
+grade       int64
+age         int64
+dtype: object
+
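+=== Info ===
+<class 'pandas.core.frame.DataFrame'>
+RangeIndex: 30 entries, 0 to 29
+Data columns (total 4 columns):
+...
+
+=== Summary statistics (describe) ===
+           grade        age
+count  30.000000  30.000000
+mean   80.266667  16.866667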
+...
+
+=== Selecting just name and grade columns ===
+(first 5 rows)
+            name  grade
+0     Alice Chen     92
+1   Bob Martinez     78
+...
+
+=== Sorted by grade (highest first) ===
+(first 10 rows)
+            name  subject  grade  age
+18    Sam Turner     Math     96   16
+...
+
+Done.
+```
+
+Your numbers come straight from `data/students.csv`. The `...` lines are abbreviated here; your own run prints the complete tables and statistics.
+
+## Alter it
+
+1. Change `head()` to `head(10)` and see what happens. Try `tail(3)`.
+2. Sort by `age` instead of `grade`. What happens when two students have the same age?
+3. Select three columns instead of two. What does `df[["name", "subject", "grade"]]` return?
+4. Try `df["grade"].mean()` and `df["grade"].max()` — what do they return?
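
The following is a minimal sketch of these four experiments. It assumes an in-memory copy of the first six rows of `data/students.csv`. The `SAMPLE_CSV` string and the `io.StringIO` wrapper exist only so the snippet runs on its own; in the project you would use the `df` returned by `load_data()`. For experiment 2, note that with a single sort key the default `kind="quicksort"` does not promise any particular order among tied rows. Adding `grade` as a second key is one way to make that order predictable.

```python
import io

import pandas as pd

# Assumption: the first six rows of data/students.csv, inlined so this
# snippet runs without the data file.
SAMPLE_CSV = """name,subject,grade,age
Alice Chen,Math,92,17
Bob Martinez,Science,78,16
Carol Johnson,English,85,17
David Kim,Math,67,16
Eva Patel,Science,91,18
Frank Lopez,English,73,17
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# 1. head()/tail() accept a row count. Asking for more rows than exist
#    simply returns every row (6 here).
first_ten = df.head(10)
last_three = df.tail(3)

# 2. Sort by age, breaking ties by grade (highest first within each age).
by_age = df.sort_values(["age", "grade"], ascending=[True, False])

# 3. A list of three column names returns a three-column DataFrame.
three_cols = df[["name", "subject", "grade"]]

# 4. Aggregating a single column returns a single number.
mean_grade = df["grade"].mean()
max_grade = df["grade"].max()

print(list(by_age["name"]))
print(f"mean={mean_grade:.2f}, max={max_grade}")  # mean=81.00, max=92
```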
+
+## Break it
+
+1. Change the filename in `read_csv()` to a file that does not exist. What error do you get?
+2. Try selecting a column that does not exist: `df["score"]`. Read the error message.
+3. Remove the `import pandas as pd` line. What happens?
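
To check your answers, this sketch runs each broken call and reports only the exception class. The helper `error_name` and the path `data/no_such_file.csv` are hypothetical. The missing import is simulated by running the call with `exec()` in an empty namespace where `pd` was never defined.

```python
import io

import pandas as pd


def error_name(action):
    """Call action() and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as exc:  # broad on purpose: we only want the class name
        return type(exc).__name__
    return None


df = pd.read_csv(io.StringIO("name,grade\nAlice Chen,92\n"))

# Break 1: a path that does not exist (hypothetical filename).
missing_file = error_name(lambda: pd.read_csv("data/no_such_file.csv"))

# Break 2: a column that is not in the DataFrame.
missing_column = error_name(lambda: df["score"])

# Break 3: simulate a missing import by running the call in an empty
# namespace, where the name pd was never defined.
missing_import = error_name(lambda: exec("pd.read_csv('data/students.csv')", {}))

print(missing_file, missing_column, missing_import)
# FileNotFoundError KeyError NameError
```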
+
+## Fix it
+
+1. Wrap `read_csv()` in a try/except that catches `FileNotFoundError` and prints a friendly message.
+2. Before selecting a column, check if it exists: `if "score" in df.columns`.
+3. Put the import back.
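
The sketch below shows one way to apply the first two fixes. It uses hypothetical helper names (`load_data_safely`, `get_column`) so that it does not clash with `load_data()` in `project.py`. Returning `None` on failure is an assumption; you could also stop the script with a message instead.

```python
import io

import pandas as pd


def load_data_safely(filepath):
    """Load a CSV, or print a friendly message and return None if it is missing."""
    try:
        return pd.read_csv(filepath)
    except FileNotFoundError:
        print(f"Could not find {filepath}. Check the path and the folder you ran the script from.")
        return None


def get_column(df, column):
    """Return df[column] if the column exists, otherwise list the real columns."""
    if column in df.columns:
        return df[column]
    print(f"No column named {column!r}. Available columns: {list(df.columns)}")
    return None


missing = load_data_safely("data/no_such_file.csv")  # prints the message

df = pd.read_csv(io.StringIO("name,grade\nAlice Chen,92\nBob Martinez,78\n"))
score = get_column(df, "score")  # prints the available columns, returns None
grade = get_column(df, "grade")  # a Series holding 92 and 78
```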
+
+## Explain it
+
+1. What is a DataFrame? How is it different from a list of dictionaries?
+2. What does `describe()` tell you that `info()` does not?
+3. Why does `dtypes` show `object` for the name and subject columns instead of `string`?
+4. What is the difference between `df["grade"]` (one column) and `df[["grade"]]` (double brackets)?
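
You can use this short sketch to test your answers to questions 1, 2 and 4. It assumes the first three rows of the CSV typed in by hand as a list of dictionaries (`records` is a hypothetical name). The sketch shows three things. First, `pd.DataFrame()` accepts exactly that structure. Second, single and double brackets return different types. Third, `describe()` summarizes only the numeric columns by default.

```python
import pandas as pd

# Assumption: the first three rows of data/students.csv typed in by hand.
records = [
    {"name": "Alice Chen", "subject": "Math", "grade": 92, "age": 17},
    {"name": "Bob Martinez", "subject": "Science", "grade": 78, "age": 16},
    {"name": "Carol Johnson", "subject": "English", "grade": 85, "age": 17},
]

# Q1: with a list of dicts you loop over rows to reach one column...
loop_mean = sum(row["grade"] for row in records) / len(records)

# ...a DataFrame built from the same records gives you the column directly.
df = pd.DataFrame(records)
frame_mean = df["grade"].mean()

# Q2: describe() summarizes only the numeric columns by default.
numeric_summary = df.describe()

# Q4: single brackets give a Series, double brackets a one-column DataFrame.
one = df["grade"]
many = df[["grade"]]

print(type(one).__name__, one.shape)    # Series (3,)
print(type(many).__name__, many.shape)  # DataFrame (3, 1)
```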
+
+## Mastery check
+
+You can move on when you can:
+
+- Load any CSV file into a DataFrame without looking up the syntax.
+- Use `head()`, `shape`, `dtypes`, `info()`, and `describe()` to explore a new data set.
+- Select one or more columns from a DataFrame.
+- Sort a DataFrame by any column, ascending or descending.
+
+## Next
+
+[Project 02 — Filtering & Grouping](../02-filtering-grouping/)
@@ -0,0 +1,31 @@
+name,subject,grade,age
+Alice Chen,Math,92,17
+Bob Martinez,Science,78,16
+Carol Johnson,English,85,17
+David Kim,Math,67,16
+Eva Patel,Science,91,18
+Frank Lopez,English,73,17
+Grace Okafor,Math,88,16
+Henry Wang,Science,82,17
+Irene Novak,English,95,18
+James Brown,Math,54,16
+Karen Lee,Science,76,17
+Leo Garcia,English,89,16
+Maria Santos,Math,71,18
+Nathan Green,Science,93,17
+Olivia Reed,English,62,16
+Peter Zhao,Math,84,17
+Quinn Adams,Science,79,18
+Rachel Hill,English,90,17
+Sam Turner,Math,96,16
+Tina Wilson,Science,68,17
+Uma Desai,English,81,18
+Victor Cruz,Math,77,16
+Wendy Fox,Science,86,17
+Xavier Bell,English,59,16
+Yara Hussain,Math,94,18
+Zane Porter,Science,72,17
+Alice Turner,Math,83,16
+Ben Okafor,English,91,17
+Clara Reyes,Science,65,18
+Derek Nash,Math,87,16
@@ -0,0 +1,10 @@
+# Notes — Pandas Basics
+
+## What I learned
+
+
+## What confused me
+
+
+## What I want to explore next
+
@@ -0,0 +1,150 @@
+"""
+Project 01 — Pandas Basics
+
+This script loads a CSV file of student grades into a pandas DataFrame
+and explores the data using built-in methods: head(), shape, dtypes,
+info(), describe(), column selection, and sorting.
+
+Data file: data/students.csv (30 rows with name, subject, grade, age)
+"""
+
+# pandas is the core library for data analysis in Python.
+# The convention is to import it as "pd" so you type less.
+# You installed it with: pip install pandas
+import pandas as pd
+
+
+def load_data(filepath):
+    """
+    Load a CSV file into a pandas DataFrame.
+
+    pd.read_csv() reads a comma-separated file and returns a DataFrame —
+    a table-like structure with labeled columns and numbered rows.
+    Think of it as a spreadsheet you can manipulate with code.
+    """
+    df = pd.read_csv(filepath)
+    print(f"Loaded {len(df)} rows and {len(df.columns)} columns from {filepath}")
+    return df
+
+
+def explore_head(df):
+    """
+    Show the first few rows of the DataFrame.
+
+    head() returns the first 5 rows by default. This is the fastest way
+    to see what your data looks like after loading it.
+    """
+    print("\n=== First 5 rows (head) ===")
+    print(df.head())
+
+
+def explore_shape(df):
+    """
+    Show the dimensions of the DataFrame.
+
+    shape is a tuple (rows, columns). It tells you how big your data set
+    is without printing all the data.
+    """
+    rows, cols = df.shape
+    print("\n=== Shape ===")
+    print(f"Rows: {rows}, Columns: {cols}")
+
+
+def explore_dtypes(df):
+    """
+    Show the data type of each column.
+
+    dtypes tells you whether each column holds numbers (int64, float64),
+    text (object), dates, or other types. This matters because you cannot
+    do math on text columns.
+
+    "object" in pandas usually means the column contains strings.
+    """
+    print("\n=== Column types (dtypes) ===")
+    print(df.dtypes)
+
+
+def explore_info(df):
+    """
+    Show a concise summary of the DataFrame.
+
+    info() prints the column names, non-null counts, and data types
+    all in one view. It is especially useful for spotting missing values —
+    if a column has fewer non-null entries than total rows, some values
+    are missing.
+    """
+    print("\n=== Info ===")
+    df.info()
+
+
+def explore_describe(df):
+    """
+    Show summary statistics for numeric columns.
+
+    describe() calculates count, mean, std, min, 25%, 50% (median),
+    75%, and max for every numeric column. This gives you a quick
+    sense of the distribution — are grades clustered around 80?
+    Is the youngest student 14 or 18?
+    """
+    print("\n=== Summary statistics (describe) ===")
+    print(df.describe())
+
+
+def select_columns(df):
+    """
+    Select specific columns from the DataFrame.
+
+    df["column_name"] returns a single column as a Series.
+    df[["col1", "col2"]] returns multiple columns as a new DataFrame.
+    Notice the double brackets — the inner list tells pandas which
+    columns you want.
+    """
+    print("\n=== Selecting just name and grade columns ===")
+    # Double brackets: pass a list of column names to get a DataFrame back.
+    subset = df[["name", "grade"]]
+    print("(first 5 rows)")
+    print(subset.head())
+
+
+def sort_by_grade(df):
+    """
+    Sort the DataFrame by the grade column, highest first.
+
+    sort_values() returns a new DataFrame with rows reordered.
+    ascending=False puts the highest values at the top.
+    The original DataFrame is not changed.
+    """
+    print("\n=== Sorted by grade (highest first) ===")
+    sorted_df = df.sort_values("grade", ascending=False)
+    print("(first 10 rows)")
+    print(sorted_df.head(10))
+
+
+def main():
+    print("=== Loading student data ===")
+
+    # Step 1: Load the CSV into a DataFrame.
+    # The file path is relative to where you run the script from.
+    df = load_data("data/students.csv")
+
+    # Step 2: Explore the data using built-in methods.
+    # These are the first things you should do with any new data set.
+    explore_head(df)
+    explore_shape(df)
+    explore_dtypes(df)
+    explore_info(df)
+    explore_describe(df)
+
+    # Step 3: Select specific columns.
+    select_columns(df)
+
+    # Step 4: Sort the data.
+    sort_by_grade(df)
+
+    print("\nDone.")
+
+
+# This pattern means: only run main() when this file is executed directly.
+# If someone imports this file, main() will NOT run automatically.
+if __name__ == "__main__":
+    main()