What are the specific technical skills you need to know for data analytics?

Maggie @DataStoryteller
7 min read · Dec 16, 2023


Photo by Christina @ wocintechchat.com on Unsplash

You’ve heard the advice “learn Excel, SQL, Python, and Tableau if you want a career in analytics.” But what exactly do you need to know how to do with those tools to land a job?

The generic answer is “what it takes to answer questions with data!”

But more specifically, below is a rundown of the actual functions, formulas, commands, packages, and syntax that I have used on the job. I have worked in marketing analytics and product analytics since 2016.

Currently, my job title is Data Scientist, so I’ve done a lot of things that are much more advanced than what is expected in an entry-level Data Analyst role, and I’ll try to call that out below.

Personally, I have used Excel, SQL, Power BI, Python, and a little bit of Tableau and R on the job. I’m going to combine the sections into: Excel, SQL, Data Visualization (Tableau and Power BI), and Python and R.

Excel

I used Excel for years in marketing before I started my first official analytics role, and I continued using it once my title was official. It’s a great place to start because you probably already have access to Excel or Google Sheets and can start playing around with any data. (Don’t have any data? Check out this list of free datasets.)

Excel is great if you’re working with smaller amounts of data (maybe thousands, but not millions, of rows). You can do data cleaning, exploration, and visualizations.

Specifically, you want to learn how to use the following, especially if you’re going after Data Analyst roles.

Formulas and Functions: VLOOKUP, MATCH, IF, IFS, SUMIF, AVERAGEIF, AVERAGE, YEAR, MONTH, and CONCATENATE are the ones I probably used the most.

Pivot tables: Group your data by different categorical columns. Calculate things like sum, average, min, max. Calculate the percent by column or row. Add filters.

Visuals: Create visualizations from your data. Bars and lines are what I use the majority of the time but also scatterplots, and very occasionally pie charts. It’s also important to know how to customize a visual to make it easier to understand — update the chart title, axis titles, change the range, use multiple types of charts (bar and line) together, secondary axis, error bars, data labels, change colors, etc.

Bonus points — Macros and VBA: This is great for automation. I haven’t personally used these because I’ve used other tools (like Power BI).

SQL

You need to know enough SQL to get the right data, whether you’re a Data Analyst or a Data Scientist. You also need to know enough to pass a live coding challenge during a job interview. (Here are some resources to practice coding challenges.)

These are the things I regularly use and that also come up in interviews.

Order of operations. SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY. Obviously, you need to know this to be able to write any query.

WHERE versus HAVING. How do you filter your data? How are these clauses different? Which should you use when? Where do they fall in the order of operations?
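As a quick sketch of the difference, here is a runnable example using Python’s built-in sqlite3 module and a made-up orders table (table and column names are illustrative, not from any real system):

```python
import sqlite3

# Hypothetical orders table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 50), ("alice", 120), ("bob", 30), ("bob", 20), ("cara", 200)],
)

# WHERE filters rows BEFORE grouping; HAVING filters groups AFTER aggregation.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 25          -- drops bob's 20-dollar order first
    GROUP BY customer
    HAVING SUM(amount) > 100   -- then keeps only high-spending customers
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('cara', 200.0), ('alice', 170.0)]
```

Note that bob survives the WHERE step (his 30-dollar order passes) but is dropped by HAVING, because his filtered total is only 30.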

Aggregations and GROUP BY. How do you sum or average your data?

JOIN. How do you join data from different (or the same) tables? What’s the difference between left, right, inner, outer, etc? What’s a UNION?
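A minimal sketch of inner versus left joins, again via sqlite3 with two made-up tables (SQLite only added RIGHT and FULL joins recently, so this sticks to INNER and LEFT):

```python
import sqlite3

# Two hypothetical tables to show join behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE purchases (user_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [(1, "book"), (3, "pen")])

# INNER JOIN keeps only rows that match in both tables.
inner = conn.execute("""
    SELECT u.name, p.item
    FROM users u
    JOIN purchases p ON u.id = p.user_id
""").fetchall()
print(inner)  # [('alice', 'book')]

# LEFT JOIN keeps every row from the left table, with NULL where no match.
left = conn.execute("""
    SELECT u.name, p.item
    FROM users u
    LEFT JOIN purchases p ON u.id = p.user_id
""").fetchall()
print(left)  # [('alice', 'book'), ('bob', None)]
```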

Subqueries. How do they work? When would you use them?

CTEs. How do they work? When would you use them? How would you query them? How do you create multiple CTEs in one query?
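To show both ideas side by side, here is the same question answered first with a subquery and then with two chained CTEs, using a made-up sales table in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 50), ("west", 150)])

# Which regions beat the overall average sale? First with a subquery:
subquery = conn.execute("""
    SELECT region, AVG(amount) AS avg_amount
    FROM sales
    GROUP BY region
    HAVING AVG(amount) > (SELECT AVG(amount) FROM sales)
""").fetchall()

# Then with CTEs. Multiple CTEs chain with commas after a single WITH.
ctes = conn.execute("""
    WITH region_avg AS (
        SELECT region, AVG(amount) AS avg_amount
        FROM sales
        GROUP BY region
    ),
    overall AS (
        SELECT AVG(amount) AS avg_amount FROM sales
    )
    SELECT r.region, r.avg_amount
    FROM region_avg r, overall o
    WHERE r.avg_amount > o.avg_amount
""").fetchall()
print(subquery, ctes)  # both return [('east', 200.0)]
```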

CASE WHEN. How can you use this to create new columns?
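A small sketch of deriving a new column with CASE WHEN, using an invented orders table in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 20), (2, 80), (3, 500)])

# CASE WHEN builds a new column from conditions on existing ones.
rows = conn.execute("""
    SELECT id, amount,
           CASE
               WHEN amount >= 100 THEN 'large'
               WHEN amount >= 50  THEN 'medium'
               ELSE 'small'
           END AS size
    FROM orders
""").fetchall()
print(rows)  # [(1, 20.0, 'small'), (2, 80.0, 'medium'), (3, 500.0, 'large')]
```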

Window functions, such as LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, and rolling sums and averages. How can you compare rows? Order them by different values? Calculate rolling aggregates?
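As a sketch of LAG and a rolling sum, assuming an engine with window-function support (SQLite 3.25+, which ships with recent Python versions) and a made-up daily revenue table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day INTEGER, revenue REAL)")
conn.executemany("INSERT INTO daily VALUES (?, ?)",
                 [(1, 100), (2, 150), (3, 120)])

# LAG pulls the previous row's value; SUM(...) OVER builds a running total.
rows = conn.execute("""
    SELECT day,
           revenue,
           LAG(revenue) OVER (ORDER BY day) AS prev_revenue,
           SUM(revenue) OVER (ORDER BY day
                              ROWS BETWEEN UNBOUNDED PRECEDING
                              AND CURRENT ROW) AS running_total
    FROM daily
""").fetchall()
print(rows)
# [(1, 100.0, None, 100.0), (2, 150.0, 100.0, 250.0), (3, 120.0, 150.0, 370.0)]
```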

DATEPART, MONTH, YEAR, DATEDIFF. How do you work with different date values?

COALESCE. How do you handle NULL values?
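COALESCE simply returns its first non-NULL argument, which makes it handy for fallback values. A minimal sketch with an invented users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, nickname TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Alice", "Ali"), ("Bob", None)])

# Prefer the nickname, fall back to the name, then to a default.
rows = conn.execute("""
    SELECT COALESCE(nickname, name, 'unknown') AS display_name
    FROM users
""").fetchall()
print(rows)  # [('Ali',), ('Bob',)]
```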

Data Visualization (Tableau and Power BI)

Of the tools listed in this post, Tableau and Power BI are where I have the least amount of experience, but I have built a few dashboards over the years.

Some Data Analyst roles (as well as Business Intelligence) are very focused on building dashboards, so being familiar with these tools will open a lot of doors to entry-level roles. Sometimes you will need to combine SQL with Tableau or Power BI.

Connecting to data. How do you get your data into the tool? If it’s a spreadsheet, how do you import it? If it’s an API, how do you find it and connect to it?

Creating a data model. If you are using multiple data sources in your dashboard, how are they related?

Creating visuals. Again, I use mostly bars and lines, but similar to Excel, the same things apply to being able to customize them to make them more understandable.

Adding filters. How can your end users filter the data to get the view they want?

Combining multiple visuals into tabs. What is the right balance without creating something overwhelming? How should the visuals be grouped together?

Python and R

I mostly use Python on the job, but I have occasionally used R, and I used both during my graduate program in Data Science. So far, everything I’ve learned in one tool I could do in the other. Most of my notes below will reference Python packages, but I’ll mention the R package when I remember which one to use.

Most entry-level Data Analyst roles aren’t going to use Python or R, so if that is what you are currently working towards, I wouldn’t spend much time on Python and R until you’ve mastered the tools above.

However, once you have some experience, learning Python and R will open more doors. The advanced topics at the end (statistics and predictive modeling) can prepare you for Advanced Data Analyst or Data Scientist roles — but generally, those aren’t viewed as entry-level.

Anyway, the topics to learn:

Importing data. Both Python and R have a read-CSV function (read_csv in pandas, read.csv in base R), but you can also connect directly to a SQL database, for example with the SQLAlchemy package in Python. (I know it’s possible in R as well.)
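As a sketch, here is pandas reading a tiny inline CSV (a StringIO stands in for a file path; the column names are made up), with the database route shown as a comment rather than a live connection:

```python
import io
import pandas as pd

# A tiny inline CSV stands in for a file on disk; pd.read_csv accepts
# a path or any file-like object.
csv_data = io.StringIO("""date,channel,signups
2023-01-01,email,120
2023-01-02,social,95
""")
df = pd.read_csv(csv_data, parse_dates=["date"])
print(df.shape)  # (2, 3)

# Reading straight from a database works too, e.g. with SQLAlchemy:
#   engine = sqlalchemy.create_engine("postgresql://...")
#   df = pd.read_sql("SELECT * FROM signups", engine)
```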

Data types and structures and how to work with them: Numeric values (integer, float), booleans, null/na/nan, string, list, tuple, dictionary, class, etc.

Dataframes. Pandas in Python and dplyr in R are very popular packages to use for working with a dataframe, including exploring and cleaning your data.
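A minimal sketch of the kind of cleaning pandas handles, using an invented, deliberately messy dataframe:

```python
import pandas as pd

# A hypothetical messy dataframe for illustration.
df = pd.DataFrame({
    "customer": ["alice", "bob", "bob", None],
    "spend": [100.0, None, 40.0, 25.0],
})

df = df.drop_duplicates()            # remove exact duplicate rows
df = df.dropna(subset=["customer"])  # drop rows missing a key field
df["spend"] = df["spend"].fillna(0)  # fill remaining missing values
totals = df.groupby("customer")["spend"].sum()
print(totals.to_dict())  # {'alice': 100.0, 'bob': 40.0}
```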

Date formats. Working with date formats can be tricky, so familiarizing yourself with different packages is important. In Python, you can use datetime, and in R, lubridate is popular.
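A short sketch of common date operations in Python, both with the standard library and with pandas (the example dates are arbitrary):

```python
from datetime import datetime, timedelta
import pandas as pd

# Parse a string into a date, then extract parts and do arithmetic.
d = datetime.strptime("2023-12-16", "%Y-%m-%d")
print(d.year, d.month)          # 2023 12
print(d + timedelta(days=30))   # 2024-01-15 00:00:00

# pandas has its own vectorized date tools.
s = pd.to_datetime(pd.Series(["2023-01-05", "2023-02-10"]))
print(s.dt.month.tolist())      # [1, 2]
print((s.max() - s.min()).days) # 36
```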

Visualization. In Python the common packages are Matplotlib and Seaborn. In R, ggplot2 is very popular. You should be able to create different types of visuals (histograms, bars, lines, scatterplots, and heatmaps are the ones I use most frequently) and also customize them: change titles, move the legend, change colors, layer one on top of the other, add labels, change the ranges, etc.

Statistics. This is where it’s more efficient to work in a tool like Python or R versus the ones above.

Descriptive stats (count, min, max, mean, standard deviation, median, quartiles) take one line of code in both languages: describe in pandas, or summary in base R.
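For example, the pandas one-liner on a made-up column:

```python
import pandas as pd

df = pd.DataFrame({"spend": [10.0, 20.0, 30.0, 40.0]})
stats = df["spend"].describe()  # count, mean, std, min, quartiles, max
print(stats["mean"], stats["50%"])  # 25.0 25.0
```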

You can do some basic calculations with pandas or base R, but to do more advanced math and stats, you’ll need NumPy and SciPy in Python, or the stats package in R (R has many statistical packages; statistics is where R really shines).

You can use these to apply other mathematical and statistical methods like t-tests, p-values, and other tools for experimentation and hypothesis testing, as well as calculations like standard deviation, z-scores, or even rounding your displayed data. NumPy also has some functions that are useful for data cleaning and is great for working with arrays. Learning how to calculate different correlations is also important.
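A sketch of a two-sample t-test on simulated A/B test data (the group names, means, and sample sizes are all invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical A/B test: one metric per user in each of two groups.
rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=2.0, size=200)
variant = rng.normal(loc=11.0, scale=2.0, size=200)

print(np.std(control))                      # standard deviation
print(stats.zscore(control)[:3])            # standardized values
t_stat, p_value = stats.ttest_ind(control, variant)
print(round(p_value, 6))                    # small p -> means likely differ
print(np.corrcoef(control, variant)[0, 1])  # correlation of the two arrays
```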

Prediction and machine learning. scikit-learn is a very popular Python package that covers many of the most popular machine learning models, along with data preparation (scaling your data, splitting into train/test sets) and evaluation (k-fold cross-validation, accuracy metrics, feature importance). Some other popular packages are XGBoost, and Keras and TensorFlow for neural networks.
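As a sketch of that prepare/fit/evaluate flow in scikit-learn, using its built-in iris dataset and a simple logistic regression (the model choice here is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a built-in dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Chain scaling and a classifier so the same prep applies at predict time.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
cv_scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
print(accuracy, cv_scores.mean())
```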

R is great for things like principal component analysis, regression, and time series modeling. There are a lot of R packages that can handle regression and predictive models; caret is another good one to check out.

If you want to move into more of a machine learning role, the above will help you research and build your model, but you’ll also need to learn frameworks for deploying your models. There are lots of different tools you can use.

I’m sure I’m forgetting things, but those are the things off the top of my head that I’ve used the most on the job.

If you’ve worked as a Data Analyst or Data Scientist, what else do you think is necessary to learn?

Want to become a Data Analyst but aren’t sure where to start? Check out my roadmap to prepare for your first role.

Want more career advice? Follow me on TikTok, Instagram, or LinkedIn, and sign up for my free data career newsletter.
