Curated Resources
My personally vetted collection of data science tools, learning materials, and industry resources. These are the resources I actually use and recommend based on 8+ years of experience bridging academic research and production ML systems.
🛠️ Essential Tools
Programming Languages
- Python - Most popular language for data science
- R - Excellent for statistics and data analysis
- SQL - Essential for data querying
- Scala - For big data processing with Spark
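To illustrate why SQL earns its place on this list, here is a minimal sketch of the kind of aggregate query you end up writing constantly. It uses Python's built-in `sqlite3` module so it runs without a database server. The `sales` table and its rows are hypothetical, and the syntax is SQLite's dialect, which differs in places from PostgreSQL, BigQuery, and others.

```python
import sqlite3

# In-memory SQLite database with a small, hypothetical "sales" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)],
)

# Aggregate query: total and average sales per region, largest total first
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total, AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()

for region, total, average in rows:
    print(f"{region}: total={total:.1f}, average={average:.1f}")

conn.close()
```

The same `GROUP BY` pattern carries over almost unchanged to production warehouses, which is a large part of why SQL skills transfer so well between jobs.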
Data Manipulation & Analysis
- Pandas - Data manipulation in Python
- NumPy - Numerical computing
- dplyr - Data manipulation in R
- Apache Spark - Big data processing
Machine Learning Libraries
- Scikit-learn - General-purpose ML library
- TensorFlow - Deep learning framework
- PyTorch - Deep learning framework, especially popular in research
- XGBoost - Gradient boosting framework
Data Visualization
- Matplotlib - Basic plotting in Python
- Seaborn - Statistical visualization
- Plotly - Interactive visualizations
- Tableau - Business intelligence platform
- Power BI - Microsoft’s BI tool
Development Environment
- Jupyter Notebooks - Interactive development
- Google Colab - Cloud-based notebooks
- VS Code - Versatile code editor
- PyCharm - Python IDE
- RStudio - R development environment
Datasets
Public Dataset Repositories
- Kaggle Datasets - Wide variety of datasets
- UCI ML Repository - Classic ML datasets
- Google Dataset Search - Search engine for datasets
- AWS Open Data - Public datasets on AWS
- Data.gov - US government open data
Domain-Specific Datasets
- Papers With Code - Research datasets
- Financial Data - Stock market data
- Healthcare - Health and medical data
- Climate Data - Environmental datasets
- Social Media APIs - Real-time social data
Sample Datasets for Learning
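- Boston Housing - Regression (removed from scikit-learn in version 1.2 because of an ethically problematic feature; California Housing is a common substitute)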
- Iris Dataset - Classification
- Titanic - Binary classification
- MNIST - Image classification
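Of the learning datasets above, Iris is the quickest to start with because it ships with scikit-learn (`load_iris`), so no download is needed. The sketch below assumes scikit-learn is installed; the choice of logistic regression and 5-fold cross-validation is just an illustrative baseline, not the only sensible setup.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Iris is bundled with scikit-learn: 150 flowers, 4 measurements each, 3 species
X, y = load_iris(return_X_y=True)
print(X.shape)

# Simple baseline: logistic regression scored with 5-fold cross-validation
# (for classifiers, cv=5 uses stratified folds so each fold keeps the class balance)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Cross-validation rather than a single train/test split matters more than usual on a dataset this small, since one unlucky split can swing accuracy noticeably.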
Learning Resources
Books
Beginner
- “Python for Data Analysis” by Wes McKinney
- “Hands-On Machine Learning” by Aurélien Géron
- “Data Science from Scratch” by Joel Grus
Intermediate
- “The Elements of Statistical Learning” by Hastie, Tibshirani & Friedman
- “Pattern Recognition and Machine Learning” by Christopher Bishop
- “Feature Engineering for Machine Learning” by Alice Zheng & Amanda Casari
Advanced
- “Deep Learning” by Ian Goodfellow, Yoshua Bengio & Aaron Courville
- “Bayesian Data Analysis” by Andrew Gelman et al.
- “Causal Inference: The Mixtape” by Scott Cunningham
Online Courses
- Coursera Data Science Specialization
- Fast.ai - Practical deep learning
- edX MIT Introduction to Data Science
- Udacity Data Scientist Nanodegree
YouTube Channels
- StatQuest - Statistics explained simply
- 3Blue1Brown - Math visualization
- Two Minute Papers - Latest AI research
- Sentdex - Python tutorials
Blogs & Publications
- Towards Data Science
- KDnuggets
- Distill - Clear explanations of ML concepts
- Google AI Blog
- OpenAI Blog
🎯 Practice Platforms
Data Science Competitions
- Kaggle - Data science competitions
- DrivenData - Social good competitions
- Analytics Vidhya - Hackathons and challenges
Coding Practice
- LeetCode - Algorithm challenges
- HackerRank - Programming skills
- Codewars - Coding kata
Project-Based Learning
- GitHub - Open source projects
- GitLab - Alternative to GitHub
- Jupyter Notebooks - Browse shared notebooks
🏢 Industry Resources
Job Boards
- Indeed - General job search
- LinkedIn - Professional networking
- Wellfound (formerly AngelList Talent) - Startup positions
- Glassdoor - Company reviews and salaries
Professional Communities
- Reddit r/MachineLearning
- Stack Overflow
- Data Science Central
- Meetup - Local data science groups
Conferences & Events
- KDD - Knowledge Discovery and Data Mining
- ICML - International Conference on Machine Learning
- NeurIPS - Neural Information Processing Systems
- Strata Data Conference - Industry-focused (O'Reilly discontinued its in-person conferences in 2020)
Staying Updated
Newsletters
Podcasts
Research Papers
- arXiv - Preprint repository
- Google Scholar - Academic search
- Papers With Code - Papers with implementations
💡 Tips for Using These Resources
- Start with fundamentals - Don’t jump to advanced topics too quickly
- Practice regularly - Consistent practice is better than intensive bursts
- Build projects - Apply what you learn to real problems
- Join communities - Learn from others and get help when stuck
- Stay current - Follow industry trends and new developments
- Focus on quality - Better to master a few tools well than know many superficially
🎯 My Personal Recommendations
Having worked with these tools in both academic research (University of Melbourne) and high-stakes production environments at a leading Australian sports technology company, here are my top picks:
- For Statistical Computing: R + RStudio for complex statistical analysis, Python for production
- For Production ML: Azure ML + Docker + Kubernetes for scalable systems
- For Research: Jupyter notebooks + Git + scientific Python stack
- For Learning: Start with Python → pandas → scikit-learn → gradually add specialized tools
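To make that learning path concrete, here is a minimal sketch of how the three stages fit together: plain Python data, cleaned with pandas, then modelled with scikit-learn. The records and column names are hypothetical, and the prices were constructed as 5 × size + 10 × rooms, so the fitted coefficients should come out close to 5 and 10. It assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stage 1, plain Python: hypothetical raw records (e.g. parsed from a CSV file)
records = [
    {"size_m2": 50, "rooms": 2, "price": 270},
    {"size_m2": 80, "rooms": 3, "price": 430},
    {"size_m2": 65, "rooms": None, "price": 350},  # incomplete record
    {"size_m2": 120, "rooms": 4, "price": 640},
    {"size_m2": 95, "rooms": 3, "price": 505},
]

# Stage 2, pandas: load into a DataFrame and drop rows with missing values
df = pd.DataFrame(records).dropna()
X = df[["size_m2", "rooms"]]
y = df["price"]

# Stage 3, scikit-learn: fit a linear model on the cleaned features
model = LinearRegression().fit(X, y)
coefficients = {name: round(float(c), 2) for name, c in zip(X.columns, model.coef_)}
print(coefficients)
```

Each stage only adds one new idea, which is the point of the progression: pandas handles the messy data, and scikit-learn can consume the resulting DataFrame directly.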
Questions about any of these resources? Feel free to reach out - I’m happy to share more detailed experiences.