Dr. Data Scientist

Curated Resources

My personally vetted collection of data science tools, learning materials, and industry resources. These are the resources I actually use and recommend based on 8+ years of experience bridging academic research and production ML systems.

🛠️ Essential Tools

Programming Languages

  • Python - Most popular language for data science
  • R - Excellent for statistics and data analysis
  • SQL - Essential for data querying
  • Scala - For big data processing with Spark

Data Manipulation & Analysis

  • Pandas - Data manipulation in Python
  • NumPy - Numerical computing
  • dplyr - Data manipulation in R
  • Apache Spark - Big data processing

Machine Learning Libraries

  • Scikit-learn - General-purpose ML library
  • TensorFlow - Deep learning framework
  • PyTorch - Research-focused deep learning
  • XGBoost - Gradient boosting framework

Data Visualization

  • Matplotlib - Basic plotting in Python
  • Seaborn - Statistical visualization
  • Plotly - Interactive visualizations
  • Tableau - Business intelligence platform
  • Power BI - Microsoft’s BI tool

Development Environment

  • Jupyter Notebooks - Interactive development
  • Google Colab - Cloud-based notebooks
  • VS Code - Versatile code editor
  • PyCharm - Python IDE
  • RStudio - R development environment

📊 Datasets

Public Dataset Repositories

  • Kaggle Datasets - Wide variety of datasets
  • UCI ML Repository - Classic ML datasets
  • Google Dataset Search - Search engine for datasets
  • AWS Open Data - Public datasets on AWS
  • Data.gov - US government open data

Domain-Specific Datasets

  • Papers With Code - Research datasets
  • Financial Data - Stock market data
  • Healthcare - Health and medical data
  • Climate Data - Environmental datasets
  • Social Media APIs - Real-time social data

Sample Datasets for Learning

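  • Boston Housing - Regression (removed from scikit-learn in version 1.2 over ethical concerns; California Housing is a common substitute)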
  • Iris Dataset - Classification
  • Titanic - Binary classification
  • MNIST - Image classification
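
As a rough illustration of how one of these sample datasets is typically used, here is a minimal sketch that trains a classifier on Iris. It assumes scikit-learn is installed; the choice of logistic regression, the 25% test split, and the fixed random seed are illustrative assumptions, not a recommendation from this page.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the bundled Iris data: 150 flowers, 4 numeric features, 3 species.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for evaluation (split size is an assumption).
# stratify=y keeps the three species balanced across both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Logistic regression is just one reasonable baseline for this task.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Swapping in another estimator, such as a decision tree, only changes the `model = ...` line, which makes Iris a convenient dataset for comparing classifiers.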

📚 Learning Resources

Books

Beginner

  • “Python for Data Analysis” by Wes McKinney
  • “Hands-On Machine Learning” by Aurélien Géron
  • “Data Science from Scratch” by Joel Grus

Intermediate

  • “The Elements of Statistical Learning” by Hastie, Tibshirani & Friedman
  • “Pattern Recognition and Machine Learning” by Christopher Bishop
  • “Feature Engineering for Machine Learning” by Alice Zheng & Amanda Casari

Advanced

  • “Deep Learning” by Ian Goodfellow, Yoshua Bengio & Aaron Courville
  • “Bayesian Data Analysis” by Andrew Gelman et al.
  • “Causal Inference: The Mixtape” by Scott Cunningham

Online Courses

  • Coursera Data Science Specialization
  • Fast.ai - Practical deep learning
  • edX MIT Introduction to Data Science
  • Udacity Data Scientist Nanodegree

YouTube Channels

  • StatQuest - Statistics explained simply
  • 3Blue1Brown - Math visualization
  • Two Minute Papers - Latest AI research
  • Sentdex - Python tutorials

Blogs & Publications

  • Towards Data Science
  • KDnuggets
  • Distill - Clear explanations of ML concepts
  • Google AI Blog
  • OpenAI Blog

🎯 Practice Platforms

Data Science Competitions

  • Kaggle - Data science competitions
  • DrivenData - Social good competitions
  • Analytics Vidhya - Hackathons and challenges

Coding Practice

  • LeetCode - Algorithm challenges
  • HackerRank - Programming skills
  • Codewars - Coding kata

Project-Based Learning

  • GitHub - Open source projects
  • GitLab - Alternative to GitHub
  • Jupyter Notebooks - Browse shared notebooks

🏢 Industry Resources

Job Boards

  • Indeed - General job search
  • LinkedIn - Professional networking
  • AngelList - Startup positions
  • Glassdoor - Company reviews and salaries

Professional Communities

  • Reddit r/MachineLearning
  • Stack Overflow
  • Data Science Central
  • Meetup - Local data science groups

Conferences & Events

  • KDD - Knowledge Discovery and Data Mining
  • ICML - International Conference on Machine Learning
  • NeurIPS - Neural Information Processing Systems
  • Strata Data Conference - Industry-focused

📈 Staying Updated

Newsletters

  • Data Science Weekly
  • O’Reilly Data Newsletter
  • The Batch by deeplearning.ai

Podcasts

  • Linear Digressions
  • Data Skeptic
  • Not So Standard Deviations
  • The Data Science Podcast

Research Papers

  • arXiv - Preprint repository
  • Google Scholar - Academic search
  • Papers With Code - Papers with implementations

💡 Tips for Using These Resources

  1. Start with fundamentals - Don’t jump to advanced topics too quickly
  2. Practice regularly - Consistent practice is better than intensive bursts
  3. Build projects - Apply what you learn to real problems
  4. Join communities - Learn from others and get help when stuck
  5. Stay current - Follow industry trends and new developments
  6. Focus on quality - Better to master a few tools well than know many superficially


🎯 My Personal Recommendations

Having worked with these tools in both academic research (University of Melbourne) and high-stakes production environments at a leading Australian sports technology company, here are my top picks:

  • For Statistical Computing: R + RStudio for complex statistical analysis, Python for production
  • For Production ML: Azure ML + Docker + Kubernetes for scalable systems
  • For Research: Jupyter notebooks + Git + scientific Python stack
  • For Learning: Start with Python → pandas → scikit-learn → gradually add specialized tools
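
To make the Python → pandas → scikit-learn path concrete, here is a small sketch of how the two libraries hand off to each other. The column names and values are hypothetical toy data invented for illustration (chosen to be exactly linear), and linear regression is just an assumed first model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data, invented purely for illustration.
# The scores follow exam_score = 50 + 2 * hours_studied exactly.
df = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 5.0],
    "exam_score": [52.0, 54.0, 56.0, 58.0, 60.0],
})

# Step 1 (pandas): inspect the data and pick features and target.
print(df.describe())
features = df[["hours_studied"]]  # 2-D DataFrame, as scikit-learn expects
target = df["exam_score"]

# Step 2 (scikit-learn): fit a model on the prepared columns.
model = LinearRegression().fit(features, target)

# Step 3: predict for a new, unseen value.
new_rows = pd.DataFrame({"hours_studied": [6.0]})
prediction = model.predict(new_rows)[0]
print(f"Predicted score for 6 hours: {prediction:.1f}")
```

The same three steps (prepare with pandas, fit with scikit-learn, predict on new rows) carry over when the toy frame is replaced with a real dataset and the estimator with a more specialized one.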

Questions about any of these resources? Feel free to reach out - I’m happy to share more detailed experiences.

Dr. Data Scientist

AI & Data Science Leader at a leading Australian sports technology company. PhD in Statistics with 8+ years of experience building production ML systems and leading data science teams.

Contact
  • alimahmoodi29@gmail.com
  • Melbourne, Australia
© 2025 Ali Mahmoudi. All rights reserved.