My exploration of concepts and examples from Python for Data Analysis by Wes McKinney.
This project / repository serves to document my efforts at learning the Python programming language and applying it successfully to solve quantitative finance / data science problems. Through this, I aim to:
- Address the 2-language problem (prototyping in R, then implementing in SAS / Java / C++) by learning a single language that can, to a limited extent, cover both roles.
- Apply learnings from this book to Derivatives Analytics based on Mario Cerrato's book & Yves Hilpisch's book.
- Attain at least interview-level proficiency in Python, expand my horizons and improve my employability at crack quant teams.
This document itself will serve to summarize the various chapters in this book and provide a reference to code I can refer to later.
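Python is useful as a 'glue' language for legacy C / C++ and FORTRAN code, and today it largely addresses the '2-language' problem. Its main weakness is CPU-bound multi-threading: in the standard CPython interpreter, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time (Microsoft R gets around a similar limitation for linear algebra by linking against multi-threaded math libraries).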
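To make the 'glue' idea concrete, here is a minimal sketch that calls the `sqrt` function from the system's C math library through the standard-library `ctypes` module. The helper name `load_c_sqrt` is my own, and the library lookup is an assumption about the platform: if no C math library can be found (as may happen on Windows), the sketch falls back to Python's `math.sqrt`.

```python
import ctypes
import ctypes.util
import math


def load_c_sqrt():
    """Return the C library's sqrt as a Python callable, or None if unavailable.

    Hypothetical helper for illustration; the lookup is platform-dependent.
    """
    lib_path = ctypes.util.find_library("m") or ctypes.util.find_library("c")
    if lib_path is None:
        return None
    try:
        lib = ctypes.CDLL(lib_path)
        c_sqrt = lib.sqrt
    except (OSError, AttributeError):
        return None
    # Declare the C signature: double sqrt(double)
    c_sqrt.argtypes = [ctypes.c_double]
    c_sqrt.restype = ctypes.c_double
    return c_sqrt


c_sqrt = load_c_sqrt()
# Use the compiled C routine when it was found, otherwise the Python fallback.
value = c_sqrt(2.0) if c_sqrt is not None else math.sqrt(2.0)
print(value)
```

In practice, libraries such as NumPy and SciPy do this wrapping for you, so hand-written `ctypes` code is mostly needed for in-house legacy libraries.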
The essential Python libraries are:
- NumPy - efficient data storage and manipulation.
- pandas - blends the high-performance, array-computing ideas of NumPy with the flexible data manipulation capabilities of spreadsheets and relational databases; primary focus of this book.
- matplotlib - for plots and 2D visualizations.
- SciPy - addresses problems specific to scientific computing.
- scikit-learn - machine learning tools.
- Jupyter notebooks - interactive environment for exploratory computing.
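As a first taste of how NumPy and pandas fit together, the sketch below computes log returns from a short price series. The prices and dates are made up purely for illustration, and the column names are my own choice.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices, invented for illustration only.
prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1])

# NumPy: vectorized log returns, with no explicit Python loop.
log_returns = np.diff(np.log(prices))

# pandas: the same data as a labelled table indexed by date.
dates = pd.date_range("2024-01-01", periods=len(prices), freq="D")
frame = pd.DataFrame({"price": prices}, index=dates)

# diff() leaves the first row as NaN, since it has no previous price.
frame["log_return"] = np.log(frame["price"]).diff()

print(frame)
```

The NumPy array is one element shorter than the price series, while the pandas column keeps the original length and marks the missing first return as NaN, which keeps the returns aligned with their dates.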