# Ultimate Skill Checklist for Data Analysts
- Programming
- Statistics
- Mathematics
- Machine Learning
- Data Wrangling
- Communication and Data Visualization
- Data Intuition
- Python programming language
- numpy
- pandas
- matplotlib
- scipy
- scikit-learn
- R programming language
- ggplot2
- dplyr
- GGally
- reshape2
- Optional
- ipython
- ipython notebook
- anaconda
- ggplot
- seaborn
- Spreadsheet tools (like Excel)
- Additional Skills
- JavaScript and HTML for D3.js
- D3.js
- AJAX implementation
- jQuery
- C/C++ or Java
- Descriptive and Inferential statistics
- Mean, median, mode
- Data distributions
- Standard normal
- Exponential/Poisson
- Binomial
- Chi-square
- Standard deviation and variance
- Hypothesis testing
- P-values
- Test for significance
- Z-test, t-test, Mann-Whitney U
- Chi-squared and ANOVA testing
- Experimental design
- A/B Testing
- Controlling variables and choosing good control and testing groups
- Sample size and statistical power
- Formulating and testing hypotheses
- Confidence level
- SMART experiments: Specific, Measurable, Actionable, Realistic, Timely
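
As a rough illustration of the hypothesis-testing and A/B-testing items above, a two-proportion z-test can be sketched with the Python standard library alone. The function name `two_proportion_z_test` and the conversion counts are invented for this example; in practice a library such as scipy would usually be used instead.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, written with erf.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Invented counts: control group A versus variant group B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value here would suggest the difference in conversion rates is unlikely under the null hypothesis, but the threshold (the significance level) should be chosen before the experiment runs.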
- Translate numbers and concepts into a mathematical expression: 4 times the square root of one-third of a gallon of water (expressed as g): 4√(g/3)
- Solve for missing values in algebraic equations: 14 = 2x + 29
- How does the 1/2 value change the shape of this graph?
- Linear algebra and Calculus
- Matrix manipulations. Dot product is crucial to understand.
- Eigenvalues and eigenvectors -- Understand the significance of these two concepts
- Multivariable derivatives and integration in Calculus
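
The algebra and dot-product items above can be checked with a few lines of Python. This is only a sketch: the helper names `water_expression`, `solve_linear`, and `dot` are made up for illustration, and the vectors are arbitrary.

```python
import math


def water_expression(g):
    """4 times the square root of one-third of g gallons: 4 * sqrt(g / 3)."""
    return 4 * math.sqrt(g / 3)


def solve_linear(a, b, c):
    """Solve c = a * x + b for x (a must be non-zero)."""
    return (c - b) / a


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


# 14 = 2x + 29  ->  2x = 14 - 29  ->  x = -15 / 2
x = solve_linear(a=2, b=29, c=14)
print(x)

print(water_expression(3))          # one-third of 3 gallons is 1 gallon
print(dot([1, 2, 3], [4, 5, 6]))    # 1*4 + 2*5 + 3*6
```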
- Supervised Learning
- Decision trees
- Naive Bayes classification
- Ordinary Least Squares regression
- Logistic regression
- Neural networks
- Support vector machines
- Ensemble methods
- Unsupervised Learning
- Clustering Algorithms
- Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
- Independent Component Analysis (ICA)
- Reinforcement Learning
- Q-learning
- TD-learning
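
To make one of the supervised-learning items concrete, here is a minimal sketch of Ordinary Least Squares regression with a single predictor, using the closed-form slope and intercept. The function name `ols_fit` and the data points are invented for illustration; scikit-learn offers a full implementation.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = ols_fit(xs, ys)
print(slope, intercept)
```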
- Python
- Learn the Python string library for string manipulation
- Parsing common file formats such as CSV and XML files
- Regular Expressions
- Mathematical transformations
- Convert a skewed, non-normal distribution to an approximately normal one with a log10 transformation
- Database systems (SQL-based and NoSQL-based): databases act as a central hub for storing information
- Relational databases such as PostgreSQL, MySQL, Netezza, Oracle, etc.
- Optional: Hadoop, Spark, MongoDB
- SQL
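
Several of the data-wrangling items above (CSV parsing, string manipulation, regular expressions, and the log10 transformation) can be combined in one short sketch. The sample data, column names, and the `clean_rows` helper are all invented for illustration; real projects would more likely use pandas.

```python
import csv
import io
import math
import re

# Invented sample data with messy names and currency-formatted incomes.
RAW = """name,income
 alice ,"$1,200"
BOB,"$35,000"
carol,"$410,000"
"""


def clean_rows(text):
    """Parse CSV text, tidy the names, and add a log10-transformed income."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # String methods: trim whitespace and normalize capitalization.
        name = row["name"].strip().title()
        # Regular expression: drop everything except digits and the decimal point.
        income = float(re.sub(r"[^\d.]", "", row["income"]))
        rows.append(
            {"name": name, "income": income, "log10_income": math.log10(income)}
        )
    return rows


rows = clean_rows(RAW)
for r in rows:
    print(r["name"], r["income"], round(r["log10_income"], 3))
```

The log10 column compresses incomes spanning several orders of magnitude into a much narrower range, which is why the transformation is often applied to right-skewed data.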
- Understand visual encoding and communicating what you want the audience to take away from your visualizations
- Programming
- matplotlib
- ggplot
- D3.js
- Presenting data and convincing people with your data
- Know the context of the business situation at hand with regard to your data
- Think five steps ahead: predict the questions your audience will ask and where they will challenge your assumptions and conclusions
- Send out pre-reads for your presentations and hold pre-alignment meetings with interested parties before the actual meeting