This part of Module 1 connects basic Python syntax with the kind of tasks many researchers actually do: finding files, reading data, transforming it, and visualizing results.
The notebook material in module1s5/ starts with a useful question:
From which folder is this script or notebook running?
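That matters because a relative path such as "data/measurements.csv" is resolved against the current working directory, so the same line of code can find a file when run from one folder and fail when run from another.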
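A minimal sketch of that behavior, assuming a throwaway temporary folder and a hypothetical relative path data/measurements.csv (the file does not need to exist for resolve() to compute where it would be):

```python
import os
import tempfile
from pathlib import Path

original_cwd = Path.cwd()

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp).resolve()  # resolve() removes symlinks (e.g. /var -> /private/var on macOS)
    os.chdir(tmp_dir)              # pretend the notebook was started from this folder
    try:
        relative = Path("data") / "measurements.csv"  # hypothetical relative path
        # resolve() turns the relative path into an absolute one, based on Path.cwd()
        resolved_here = relative.resolve()
        print(resolved_here)
    finally:
        os.chdir(original_cwd)     # restore the working directory before the folder is deleted
```

In a .py script, Path(__file__).resolve().parent gives the script's own folder regardless of where it was launched from; notebooks have no __file__, which is one reason checking Path.cwd() first is useful there.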
With pathlib, you can inspect the filesystem clearly:
from pathlib import Path
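# Folder that relative paths are currently resolved against
current_dir = Path.cwd()
print(current_dir)
# List every entry (files and subfolders) directly inside that folder
for path in current_dir.iterdir():
    print(path.name)
When a folder contains many files, you often want only a subset: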
from pathlib import Path
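# "*.csv" matches files directly in this folder; sorted() gives a stable order, which glob() alone does not guarantee
csv_files = sorted(Path.cwd().glob("*.csv"))
print(csv_files)
This is a simple but important step toward automation. Instead of opening files manually one by one, your code can discover them for you.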
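glob("*.csv") only looks at the top level of a folder, while rglob("*.csv") also searches subfolders. The sketch below wraps both in a small helper; the name find_csv_files and the throwaway folder layout are hypothetical, created only so the example runs anywhere:

```python
import tempfile
from pathlib import Path


def find_csv_files(folder: Path, recursive: bool = False) -> list[Path]:
    """Return .csv files in folder, sorted; search subfolders if recursive is True."""
    pattern = "*.csv"
    matches = folder.rglob(pattern) if recursive else folder.glob(pattern)
    # Skip anything that is not a regular file (e.g. a folder named "x.csv")
    return sorted(p for p in matches if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    # Hypothetical files: two CSVs at the top, one text file, one CSV in a subfolder
    for name in ["b.csv", "a.csv", "notes.txt", "sub/c.csv"]:
        (root / name).write_text("x\n1\n", encoding="utf-8")

    top_level = [p.name for p in find_csv_files(root)]
    everything = [p.relative_to(root).as_posix() for p in find_csv_files(root, recursive=True)]
    print(top_level)
    print(everything)
```

Returning Path objects rather than strings keeps the results usable directly with read_text(), pd.read_csv(), and similar calls.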
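Reading a plain text file:
from pathlib import Path
# Read the whole file into one string; stating the encoding avoids platform-dependent defaults
text = Path("notes.txt").read_text(encoding="utf-8")
print(text)
Reading a JSON file:
import json
from pathlib import Path
# json.loads turns JSON text into Python dicts, lists, strings, and numbers
data = json.loads(Path("config.json").read_text(encoding="utf-8"))
print(data["project_name"])
Reading a CSV file into a pandas DataFrame:
import pandas as pd
df = pd.read_csv("measurements.csv")
# Show the first five rows
print(df.head())
NumPy is the standard tool for fast numerical arrays and vectorized calculations: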
import numpy as np
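# 500 evenly spaced values from 0.01 to 5.0, both endpoints included
angles = np.linspace(0.01, 5.0, 500)
# np.sin is applied to every element of the array at once
values = np.sin(angles)
Vectorized code is one of the biggest strengths of Python's scientific stack: np.sin(angles) applies one operation to all 500 values at once, with the underlying loop running in NumPy's compiled code rather than in Python.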
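To make the difference concrete, this sketch computes the same sine values with an explicit Python loop and with one vectorized call, then checks that they agree. The damped signal and the positive-value mask are illustrative extras, not part of the original example:

```python
import math

import numpy as np

angles = np.linspace(0.01, 5.0, 500)

# Loop version: one Python-level math.sin call per element
loop_values = [math.sin(a) for a in angles]

# Vectorized version: one call processes the whole array
vec_values = np.sin(angles)

# Element-wise arithmetic also works on whole arrays (hypothetical damped signal)
damped = np.exp(-angles) * vec_values

# Comparisons produce boolean arrays that can be used as masks or counted
positive_mask = vec_values > 0

print(np.allclose(loop_values, vec_values))
print(vec_values.shape, damped.shape)
print(np.count_nonzero(positive_mask))
```

Both versions give the same numbers; the vectorized one is shorter and, for large arrays, typically much faster.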
Visualization is often the fastest way to understand data:
import matplotlib.pyplot as plt
# angles and values come from the NumPy example above
plt.plot(angles, values)
plt.xlabel("angle")
plt.ylabel("signal")
plt.grid()
plt.show()
The introductory notebook in module1s2/ also uses small plotting examples to show how quickly Python can move from values to visual feedback.
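plt.show() displays a figure inline in a notebook or in a window; a script running without a display usually saves the figure to a file instead. A minimal sketch using matplotlib's object-oriented interface (fig, ax) and the non-interactive "Agg" backend; the temporary output folder and the filename signal.png are illustrative assumptions:

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # draw to files only; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

angles = np.linspace(0.01, 5.0, 500)
values = np.sin(angles)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(angles, values, label="sin(angle)")
ax.set_xlabel("angle")
ax.set_ylabel("signal")
ax.grid(True)
ax.legend()

out_dir = Path(tempfile.mkdtemp())  # hypothetical output folder for this sketch
out_file = out_dir / "signal.png"
fig.savefig(out_file, dpi=150, bbox_inches="tight")
plt.close(fig)  # release memory, important when creating many figures in a loop
print(out_file)
```

Working with fig and ax explicitly makes it clear which figure each call modifies, which helps once a notebook contains several plots.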
Pandas is especially useful when working with tables:
import pandas as pd
df = pd.DataFrame(
{
"sample": ["A", "B", "C"],
"intensity": [10.2, 12.5, 9.8],
}
)
print(df)
# Average of a single column (a pandas Series)
print(df["intensity"].mean())
Typical pandas tasks include:
- filtering rows,
- selecting columns,
- grouping by categories,
- merging data from multiple sources,
- exporting cleaned results.
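The tasks listed above can each be sketched in one or two lines. The measurement table, the temperature column, and the metadata table with a material column below are hypothetical data for illustration:

```python
import pandas as pd

# Hypothetical measurements: two readings per sample at different temperatures
measurements = pd.DataFrame(
    {
        "sample": ["A", "A", "B", "B", "C"],
        "intensity": [10.2, 10.8, 12.5, 12.1, 9.8],
        "temperature": [20, 25, 20, 25, 20],
    }
)
# Hypothetical metadata from a second source, sharing the "sample" column
metadata = pd.DataFrame(
    {"sample": ["A", "B", "C"], "material": ["glass", "glass", "polymer"]}
)

# Filtering rows: keep only measurements taken at 20 degrees
at_20 = measurements[measurements["temperature"] == 20]

# Selecting columns: pass a list of column names
subset = measurements[["sample", "intensity"]]

# Grouping by category: mean intensity per sample
mean_per_sample = measurements.groupby("sample")["intensity"].mean()

# Merging: attach metadata to each measurement via the shared key column
merged = measurements.merge(metadata, on="sample", how="left")
mean_per_material = merged.groupby("material")["intensity"].mean()

# Exporting: to_csv() without a path returns a string; pass a filename to write a file
csv_text = mean_per_material.to_csv()

print(at_20)
print(mean_per_sample)
print(csv_text)
```

how="left" keeps every measurement even if its sample were missing from the metadata; such rows would get NaN in the material column.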
The optional notebook in module1s5/ goes one step further and shows how
loaded data can feed into:
- computed Q vectors,
- mathematical models,
- plots on linear or logarithmic scales,
- parameter fitting with scientific libraries.
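The notebook's actual data and model are not reproduced here. As a hedged sketch of the same chain, this example assumes a scattering-style setup: Q is computed from hypothetical scattering angles 2θ and an assumed wavelength via Q = 4π sin(θ)/λ, synthetic intensities follow a hypothetical power law I(Q) = A·Q^(-n), and scipy.optimize.curve_fit recovers A and n. The fit is done in log space, where the power law becomes a straight line:

```python
import matplotlib

matplotlib.use("Agg")  # file-only backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical instrument setting and measured scattering angles 2*theta
wavelength = 1.54  # angstrom (assumed value)
two_theta_deg = np.linspace(0.1, 5.0, 200)

# Q = 4*pi*sin(theta)/lambda, where theta is half of the scattering angle 2*theta
theta = np.deg2rad(two_theta_deg) / 2
q = 4 * np.pi * np.sin(theta) / wavelength


def power_law(q, amplitude, exponent):
    """Hypothetical model I(Q) = A * Q**(-n)."""
    return amplitude * q ** (-exponent)


# Synthetic "measured" data: true A = 2.0, n = 4.0, plus 2% multiplicative noise
rng = np.random.default_rng(seed=0)
intensity = power_law(q, 2.0, 4.0) * (1 + 0.02 * rng.standard_normal(q.size))


def log_power_law(log_q, log_amplitude, exponent):
    """log I = log A - n * log Q, which is linear in both parameters."""
    return log_amplitude - exponent * log_q


# Fitting in log space keeps the problem well conditioned despite I spanning many decades
popt, pcov = curve_fit(log_power_law, np.log(q), np.log(intensity), p0=(0.0, 3.0))
log_amplitude_fit, exponent_fit = popt
amplitude_fit = np.exp(log_amplitude_fit)
exponent_err = np.sqrt(np.diag(pcov))[1]  # one-sigma uncertainty estimate for n

# On log-log axes a power law appears as a straight line
fig, ax = plt.subplots()
ax.loglog(q, intensity, ".", label="synthetic data")
ax.loglog(q, power_law(q, amplitude_fit, exponent_fit), "-", label="fit")
ax.set_xlabel("Q (1/angstrom)")
ax.set_ylabel("intensity")
ax.legend()
plt.close(fig)

print(f"A = {amplitude_fit:.3f}, n = {exponent_fit:.3f} +/- {exponent_err:.3f}")
```

Fitting the linearized model is a design choice for this sketch; with real data, the choice between fitting in linear or log space also affects how measurement errors are weighted.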
That notebook is a good example of how Module 1 begins with basics but already points toward realistic scientific coding tasks.
A few habits make this kind of work easier to check and repeat:
- Keep raw data unchanged and write processed results separately.
- Use clear filenames and folder names.
- Prefer pathlib over fragile hand-written path strings.
- Check shapes, columns, and units early.
- Plot intermediate results when something looks suspicious.
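Several of these habits can be combined in a few lines. This sketch assumes a hypothetical project layout with separate raw/ and processed/ folders, a hypothetical file measurements.csv, and a unit recorded in the column name (intensity_counts); it checks the columns before processing and writes results to a new file without touching the raw one:

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)  # stand-in for a hypothetical project folder
    raw_dir = project / "raw"
    processed_dir = project / "processed"
    raw_dir.mkdir()
    processed_dir.mkdir()

    # Hypothetical raw data; the unit ("counts") is part of the column name
    raw_file = raw_dir / "measurements.csv"
    raw_file.write_text("sample,intensity_counts\nA,102\nB,125\nC,98\n", encoding="utf-8")
    raw_bytes_before = raw_file.read_bytes()

    df = pd.read_csv(raw_file)

    # Check columns and shape early, before any processing
    expected_columns = {"sample", "intensity_counts"}
    missing = expected_columns - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    print(df.shape)

    # Processing creates a new DataFrame; the raw file is never overwritten
    processed = df.assign(
        intensity_norm=df["intensity_counts"] / df["intensity_counts"].max()
    )
    out_file = processed_dir / "measurements_normalized.csv"
    processed.to_csv(out_file, index=False)

    raw_unchanged = raw_file.read_bytes() == raw_bytes_before
    written_columns = list(pd.read_csv(out_file).columns)
    print(raw_unchanged, written_columns)
```

Failing early with a clear message about missing columns is usually easier to debug than a KeyError several steps later.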
To practice, try the following:
- List all .csv files in a folder.
- Read one file into pandas and print the first five rows.
- Create a NumPy array with evenly spaced values.
- Plot one column against another with labels on both axes.