Time Series Analysis That Finally Makes Sense
June 17, 2026

Why I Wrote This
I love time series analysis because it turns raw dates and numbers into living stories.
You can spot trends, smell seasonality, and even guess the future like you have a crystal ball.
In this mega-guide I pack every code cell from my notebook, sprinkle plain-English explanations, and keep the vibe as chill as a coffee chat.
What Is Time Series Analysis
Time series analysis means you study numbers collected at regular time intervals so you can understand the past and predict the next tick.
You focus on order because yesterday influences today and today nudges tomorrow.
That temporal dependency makes the game different from classic spreadsheets filled with unordered rows.
Time Series vs Cross-Sectional
Time series watches one entity through time, while cross-sectional snapshots many entities at one moment.
In cross-sectional data each row is independent, but in time series rows hug each other through autocorrelation.
This hug breaks the assumptions behind many textbook statistics tricks, so you need special tools.
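Here is a tiny sketch of that hug, using a made-up random walk rather than any real dataset. It compares the lag-1 autocorrelation of the ordered series with the same values shuffled; the seed and series length are arbitrary choices of mine.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic random walk standing in for an ordered series (not the Bitcoin data)
walk = pd.Series(rng.normal(size=500).cumsum())

# Same values with the order destroyed, like unordered cross-sectional rows
shuffled = pd.Series(rng.permutation(walk.to_numpy()))

ordered_lag1 = walk.autocorr(lag=1)
shuffled_lag1 = shuffled.autocorr(lag=1)

print(f"lag-1 autocorrelation, ordered:  {ordered_lag1:.3f}")
print(f"lag-1 autocorrelation, shuffled: {shuffled_lag1:.3f}")
```

The ordered walk should sit close to 1 while the shuffled copy hovers near 0, which is the dependence that independent-row statistics quietly ignore.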
Data Types You Meet
A univariate series tracks one variable, say Bitcoin price, over time.
A multivariate series tracks many at once, like price, volume, and tweets per hour.
Knowing the type guides the model choice because extra columns can share signals.
The Core Components
Every series hides a trend, a seasonality, and noise, the party crashers we call residuals.
Trend points to long-term direction, seasonality loops in fixed cycles, and residuals cover everything messy in between.
You can add or multiply them depending on whether seasonal wiggles stay constant or scale with the level.
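To make the add-versus-multiply choice concrete, here is a small sketch on synthetic monthly data. The trend, the seasonal shapes, and the helper `seasonal_swing` are all invented for illustration; they are not part of my notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly index and components, made up for illustration
idx = pd.date_range("2018-01-01", periods=48, freq="MS")
t = np.arange(48)
trend = pd.Series(100 + 5 * t, index=idx)
seasonal_add = 20 * np.sin(2 * np.pi * t / 12)        # fixed-size wiggle
seasonal_mul = 1 + 0.2 * np.sin(2 * np.pi * t / 12)   # proportional wiggle

additive = trend + seasonal_add          # y = trend + seasonal
multiplicative = trend * seasonal_mul    # y = trend * seasonal


def seasonal_swing(series, year):
    """Peak-to-trough size of the detrended series within one calendar year."""
    detrended = (series - trend).loc[year]
    return detrended.max() - detrended.min()


add_first, add_last = seasonal_swing(additive, "2018"), seasonal_swing(additive, "2021")
mul_first, mul_last = seasonal_swing(multiplicative, "2018"), seasonal_swing(multiplicative, "2021")

print(f"additive swing:       {add_first:.1f} -> {add_last:.1f}")
print(f"multiplicative swing: {mul_first:.1f} -> {mul_last:.1f}")
```

The additive swing stays the same from the first year to the last, while the multiplicative swing grows with the level. That is the visual cue I look for when picking a model.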
The Git-Style Workflow
I always follow five baby steps: define goal, visualize, transform, model, and validate.
Skipping any step is like pushing to Git without tests: you will pay later.
Let's walk those steps with live code so you feel the rhythm.
Setup And Imports
I start with mounting Google Drive so Python can see my CSV files.
The next cell swaps to the working directory and loads core libraries.
Watch how the comments set the stage.
# Mount the drive (Google Colab specific)
from google.colab import drive
drive.mount('/content/drive')
# Change directory to the desired folder in Google Drive
%cd /content/drive/MyDrive/Python - Time Series Forecasting/Time Series Analysis/Introduction to Time Series Forecasting
# Import necessary libraries for plots and decomposition
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import month_plot, quarter_plot, plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose
This trio of cells gives me disk access, sets my path, and arms me with Pandas plus Statsmodels for fancy plots.
Load Bitcoin Data
I pull in daily Bitcoin prices so we can play with real numbers that bounce a lot.
# Load the Bitcoin dataset
df = pd.read_csv("bitcoin_price.csv")
df.head()
Date Open High Low Close Adj Close \
0 2014-09-17 465.864014 468.174011 452.421997 457.334015 457.334015
1 2014-09-18 456.859985 456.859985 413.104004 424.440002 424.440002
2 2014-09-19 424.102997 427.834991 384.532013 394.795990 394.795990
3 2014-09-20 394.673004 423.295990 389.882996 408.903992 408.903992
4 2014-09-21 408.084991 412.425995 393.181000 398.821014 398.821014
Volume
0 21056800
1 34483200
2 37919700
3 36863600
4 26580100
Five rows tell me the CSV loaded fine and that early Bitcoin was way cheaper.
Tidy The Index
Plots and resampling act smoother once the Date column becomes an actual datetime index.
# Convert "Date" to datetime so Pandas understands time
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
df.head()
# Set "Date" as index to unlock time-aware slicing
df.set_index("Date", inplace=True)
df.head()
Now any line like df.loc['2021'] fetches the whole year instantly.
# Peek at one exact date for fun
df.loc['2021-11-09']
Open 6.754973e+04
High 6.853034e+04
Low 6.638206e+04
Close 6.697183e+04
Adj Close 6.697183e+04
Volume 4.235799e+10
Name: 2021-11-09 00:00:00, dtype: float64
That day Bitcoin flirted with 68k dollars.
Shortcut Loading
You can also parse dates at read time, which is shorter and cleaner.
# One-liner to load with parsed dates
df1 = pd.read_csv("bitcoin_price.csv", index_col="Date", parse_dates=True)
df1.head()
The result matches the manual steps above and saves two lines of code.
Resample Like A Pro
Sometimes you want weekly or monthly summaries to reduce noise and expose the trend.
# Resample weekly and grab max values
df.resample("W").max()
Open High Low Close ...
2014-09-21 465.864014 468.174011 452.421997 457.334015
...
2023-12-31 43599.847656 43804.781250 42765.769531 43613.140625
Weekly data reads smoother than the daily storm which helps your eyes.
Rolling Averages
A moving average smooths the line even further so outlier spikes stop shouting.
# Calculate 7-day rolling mean on closing price
df["7_day_closing_avg"] = df["Close"].rolling(window=7).mean()
# Plot raw close vs smoothed version for 2023
df[["Close", "7_day_closing_avg"]].loc["2023"].plot()
plt.show()
The blue jagged line is the real price, and the orange line is the chilled version.
Month-End Peak
Finding the month with the top average close is one line away.
# Identify hottest month-end in the entire history
df.resample("ME").mean()["Close"].idxmax()
Timestamp('2021-11-30 00:00:00')
No surprise 2021-11 tops the chart thanks to the bull run.
Quick Data Health Check
Explorers hunt for missing values right away because gaps break many models.
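# Build the 30-day rolling volume so the derived column exists before the check
df["30_day_rolling_vol"] = df["Volume"].rolling(window=30).mean()
# Show any NaNs lurking around
df.isnull().sum()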
Open 0
...
30_day_rolling_vol 29
Missing values only exist in derived columns so the raw data is intact.
Fill And Interpolate
I patch empty spots with backfill and linear interpolation so models stay happy.
# Fix rolling volume by backfilling
df["30_day_rolling_vol"] = df["30_day_rolling_vol"].bfill()
# Interpolate the rolling average linearly; limit_direction="both" also fills the leading NaNs
df["7_day_closing_avg"] = df["7_day_closing_avg"].interpolate(method="linear", limit_direction="both")
Now df.isnull().sum() would print zeros across the board.
Feature Engineering Goodies
I extract useful date parts to capture weekday quirks in a future model.
# Create multiple calendar-based columns
df["year"] = df.index.year
df["month"] = df.index.month
df["day"] = df.index.day
df["day_of_week"] = df.index.dayofweek
df["weekday"] = df.index.day_name()
df["weekday_numeric"] = df.index.weekday
df["is_weekend"] = df["weekday_numeric"].isin([5, 6])
df.head()
Those columns let a machine pick up weekend patterns automatically.
Lag Features
Lagged values put yesterday's price directly inside today's row, which helps algorithms learn momentum.
# Shift close price to create lag_1 and lag_2
df["lag_1"] = df["Close"].shift(1)
df["lag_2"] = df["Close"].shift(2)
df.head()
AR models basically automate this idea, but it's nice to see it explicitly.
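To show the link between a lag column and an AR model, here is a minimal sketch on a simulated AR(1) series. The coefficient 0.7, the seed, and the `frame` DataFrame are assumptions for the demo, and plain least squares stands in for a proper AR fitting routine.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulate a hypothetical AR(1) process: y[t] = 0.7 * y[t-1] + noise
true_phi = 0.7
n = 2000
noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + noise[t]

frame = pd.DataFrame({"y": y})
frame["lag_1"] = frame["y"].shift(1)   # same shift trick as above
frame = frame.dropna()                 # the first row has no lag

# Ordinary least squares of y on lag_1 approximates the AR coefficient
phi_hat, intercept = np.polyfit(frame["lag_1"], frame["y"], deg=1)
print(f"estimated AR(1) coefficient: {phi_hat:.2f}")
```

The estimate should land near the 0.7 used in the simulation, which is all an AR(1) fit is doing under the hood: regressing today on yesterday.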
Seasonality Plots
Monthly and quarterly plots show repeated patterns you might miss in raw charts.
# Month plot on resampled monthly mean
month_plot(df['Close'].resample('ME').mean(), ylabel="Closing")
plt.show()
# Quarter plot on resampled quarterly mean
quarter_plot(df['Close'].resample('QE').mean(), ylabel="Closing")
plt.show()
I see stronger quarters during bull runs and weaker ones in crypto winters.
A Second Dataset: Chocolate Revenue
I introduce a sweet univariate monthly series to showcase other frequencies.
# Load chocolate revenue CSV
df_choco = pd.read_csv("choco_monthly_revenue.csv", index_col="Month with Year", parse_dates=True)
df_choco.head()
revenue
Month with Year
2018-01-01 1458
...
# Visualize seasonal chocolate love
month_plot(df_choco['revenue'], ylabel="revenue")
plt.show()
Valentine's and Christmas spikes jump out clearly.
Decomposition Deep Dive
Statsmodels decomposes a series into trend, seasonal, and residual components which feels like taking apart a LEGO set.
# Decompose Bitcoin with weekly additive model
decomposition = seasonal_decompose(df['Adj Close'], model='additive', period=7)
fig = decomposition.plot()
fig.set_size_inches(18, 10)
plt.show()
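I use an additive model for Bitcoin as a first pass. It assumes the weekly seasonal amplitude stays roughly flat across price levels, which is a strong assumption for a series that has ranged from a few hundred dollars to tens of thousands, so read the seasonal panel with caution.
# Decompose chocolate with monthly multiplicative model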
decomposition = seasonal_decompose(df_choco['revenue'], model='multiplicative', period=12)
fig = decomposition.plot()
fig.set_size_inches(18, 10)
plt.show()
Multiplicative works for chocolate because holiday bumps grow with overall revenue growth.
Autocorrelation Fun
Autocorrelation exposes how much the series copies itself at different lags.
# Plot ACF for Bitcoin up to 100 lags
fig, ax = plt.subplots(figsize=(12, 6))
plot_acf(df['Adj Close'], lags=100, ax=ax)
plt.show()
You can see slow decay which flags non-stationarity and maybe a unit root.
# Quick info on chocolate DataFrame
df_choco.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 60 entries...
# Plot ACF for chocolate revenue
fig, ax = plt.subplots(figsize=(12, 6))
plot_acf(df_choco['revenue'], lags=20, ax=ax)
plt.show()
Chocolate shows a yearly spike at lag 12 which screams seasonality.
PACF To Pick AR Terms
Partial ACF removes indirect effects so spikes there suggest how many AR lags you need.
# PACF for Bitcoin
fig, ax = plt.subplots(figsize=(12, 6))
plot_pacf(df['Adj Close'], lags=100, ax=ax)
plt.show()
The first spike at lag 1 dominates, hinting at an AR(1) structure.
# PACF for chocolate
fig, ax = plt.subplots(figsize=(12, 6))
plot_pacf(df_choco['revenue'], lags=20, ax=ax)
plt.show()
Lag 1 and 12 pop out meaning both short and yearly memory matter.
Why Stationarity Matters
Many algorithms, especially the legendary ARIMA model, assume constant mean and variance over time.
Non-stationary data fools the math and yields misleading confidence intervals.
You often difference or log-transform the series to force stationarity before modeling.
Forecasting With ARIMA Models
An autoregressive integrated moving average stitches AR, differencing, and MA parts to capture memory and noise.
The parameters (p,d,q) decide how many AR lags, how many differences, and how many MA lags to include.
Seasonal ARIMA tacks on extra (P,D,Q,s) for seasonality where s equals the season length like 12 for months.
Common Mistakes I Still See
- People treat time series analysis for forecasting the same as cross-sectional regression and ignore lag order.
- Folks eyeball correlations on non-stationary data which inflates coefficients.
- They forget to hold out the recent slice for validation and end up overfitting history.
Trend Analysis In Plain English
A trend is a slow drift up or down and you spot it with rolling means or decomposition.
You can also run a simple linear regression on time index to quantify slope size.
Removing trend (detrending) centers the series so seasonality stands out.
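A quick sketch of the regression-on-time idea, using a synthetic daily series with a known slope of 0.3 per day. The data, seed, and variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical daily series: straight-line trend plus noise
idx = pd.date_range("2020-01-01", periods=365, freq="D")
t = np.arange(len(idx))
series = pd.Series(50 + 0.3 * t + rng.normal(scale=5, size=len(idx)), index=idx)

# Fit y = intercept + slope * t by least squares
slope, intercept = np.polyfit(t, series.to_numpy(), deg=1)

# Subtract the fitted line to detrend
fitted = pd.Series(intercept + slope * t, index=idx)
detrended = series - fitted

print(f"estimated slope per day: {slope:.3f}")
print(f"mean of detrended series: {detrended.mean():.6f}")
```

The detrended series is centered on zero, so whatever cycles remain are easier to see.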
Seasonality Detection Tricks
I use month_plot, ACF spikes at periodic lags, and dummy variables for months or weekdays.
Spectral analysis with Fourier transforms is another fancy option but overkill for most beginners.
Always plot first because visuals beat stats when checking loops.
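For the curious, the Fourier option can be sketched in a few lines of NumPy. The series below is synthetic with a built-in 12-step cycle; the amplitude, noise level, and seed are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical monthly-style series: 10 years with a 12-step cycle plus noise
n = 120
t = np.arange(n)
series = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=2, size=n)

# Periodogram: squared magnitude of the FFT of the mean-centered series
centered = series - series.mean()
power = np.abs(np.fft.rfft(centered)) ** 2
freqs = np.fft.rfftfreq(n, d=1.0)   # cycles per time step

# Skip the zero-frequency bin, then convert the strongest frequency to a period
peak = np.argmax(power[1:]) + 1
dominant_period = 1 / freqs[peak]

print(f"dominant period: {dominant_period:.1f} steps")
```

On real data the peak is rarely this clean, so I still treat it as a cross-check on the plots rather than a replacement.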
Time Series Analysis vs Cross-Sectional Analysis Recap
Time order, autocorrelation, and non-independence make time series special.
You rarely shuffle rows because that would break sequence magic.
Errors propagate through time so you need walk-forward testing rather than random splits.
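Walk-forward testing can be sketched as an expanding window that always trains on the past and tests on the next block. The helper `walk_forward_splits`, the window sizes, and the naive last-value forecast are illustrative choices of mine, not the only way to do it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

# Synthetic random walk used only to exercise the splitting logic
series = pd.Series(rng.normal(size=200).cumsum())


def walk_forward_splits(n_obs, initial_train, horizon):
    """Yield (train_end, test_end) positions for an expanding training window."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield train_end, train_end + horizon
        train_end += horizon


errors = []
for train_end, test_end in walk_forward_splits(len(series), initial_train=100, horizon=20):
    train = series.iloc[:train_end]           # everything up to the cutoff
    test = series.iloc[train_end:test_end]    # the next unseen block
    forecast = np.repeat(train.iloc[-1], len(test))   # naive last-value forecast
    errors.append(np.mean(np.abs(test.to_numpy() - forecast)))

print(f"folds: {len(errors)}, mean MAE: {np.mean(errors):.2f}")
```

Every test block sits strictly after its training data, so no future information leaks into the fit.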
Correlation Caveats
Pearson correlation can mislead due to common trends.
Spearman reduces sensitivity to outliers but still ignores autocorrelation.
Always difference or detrend before trusting correlation coefficients.
# Pearson correlation on rolling volume vs close
print(df["30_day_rolling_vol"].corr(df["Close"]))
# Confirm with matrix
df[["Close", "30_day_rolling_vol"]].corr()
0.750445379618605
Close 30_day_rolling_vol
Close 1.000000 0.750445
30_day_rolling_vol 0.750445 1.000000
High value partly comes from both series trending up in bull markets.
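To see how a shared trend alone can inflate Pearson correlation, here is a sketch with two independent synthetic series that both drift upward. The drift of 0.5, the seed, and the length are assumptions for the demo.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(21)
n = 1000

# Two hypothetical series that share nothing except an upward drift
a = pd.Series((0.5 + rng.normal(size=n)).cumsum())
b = pd.Series((0.5 + rng.normal(size=n)).cumsum())

# Correlation on raw levels vs correlation on first differences
level_corr = a.corr(b)
diff_corr = a.diff().corr(b.diff())

print(f"correlation of levels:      {level_corr:.3f}")
print(f"correlation of differences: {diff_corr:.3f}")
```

The levels look strongly related even though the step-to-step changes are independent, which is why I difference before reading much into a coefficient.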
Detect Those Crazy Days
Big daily returns often flag news events, hacks, or regulation shocks.
# Percent change and filter over 10% swings
df["daily_returns_100%"] = df['Close'].pct_change() * 100
df[abs(df["daily_returns_100%"]) > 10]
Open High Low Close ...
2015-01-14 178.102997 223.893997 171.509995 178.102997 ...
...
2023-10-23 30140.685547 34370.437500 30097.828125 33086.234375 ...
Those rows become valuable labels for anomaly detection models.
Visualize Volume Vs Price
Plotting different axes helps you compare metrics with very different scales.
# 30-day rolling volume and close price on twin axes
df["30_day_rolling_vol"] = df["Volume"].rolling(window=30).mean()
df[["30_day_rolling_vol"]].plot(legend=True)
ax = df["Close"].plot(secondary_y=True, legend=True)
ax.set_ylabel("Closing Price")
plt.show()
Volume spikes often appear to precede price surges, which may hint at a relationship, though a lead-lag pattern on a chart does not establish causality.
Putting It All Together
You begin with date parsing, explore trends, smooth with rolling averages, decompose to separate the parts, check autocorrelation, ensure stationarity, pick an ARIMA, and finally forecast the future.
Validation happens on unseen future slices, not shuffled samples.
Once happy you deploy the model and monitor because the market never sleeps.
FAQ
Q1: What are the steps involved in time series analysis?
A1: I always clarify the goal, clean and index the data, visualize to spot patterns, test for stationarity, engineer features, choose and fit a model, and finally validate on a hold-out future slice. You iterate through these steps because new data keeps coming.
Q2: How does time series analysis differ from cross-sectional analysis?
A2: Cross-sectional treats observations as independent, but time series has order and autocorrelation, so shuffling rows ruins the signal, and you usually split train-test chronologically rather than randomly.
Q3: What are common techniques used in time series forecasting?
A3: Simple ones include naïve last-value, moving average, exponential smoothing, and ARIMA, while advanced setups cover Prophet, LSTM networks, and hybrid statistical-ML stacks that mix trend decomposition with gradient boosting.
Q4: How can seasonality be detected in time series data?
A4: I use visuals like month_plot or quarter_plot, inspect ACF spikes at known seasonal lags, run STL or classical decomposition, and sometimes fit dummy variables for months to see if coefficients pop.
Q5: What is the role of stationarity in time series analysis?
A5: Stationarity keeps mean and variance constant so model parameters stay reliable, and tests such as the augmented Dickey-Fuller test help you decide if a difference or log transform is needed before forecasting.
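The simple baselines named in A3 fit in a few lines of pandas. This is a sketch on a tiny made-up series; the window of 3 and the smoothing factor alpha=0.5 are arbitrary picks, and `ewm(adjust=False)` is used here as simple exponential smoothing.

```python
import pandas as pd

# Tiny hypothetical series, just large enough to check by hand
y = pd.Series([10, 12, 11, 13, 12, 14], dtype=float)

# Naive forecast: tomorrow equals the last observed value
naive_forecast = y.iloc[-1]

# Moving-average forecast: mean of the last 3 observations
ma_forecast = y.iloc[-3:].mean()

# Simple exponential smoothing: s[t] = alpha * y[t] + (1 - alpha) * s[t-1]
alpha = 0.5
ses_forecast = y.ewm(alpha=alpha, adjust=False).mean().iloc[-1]

print(naive_forecast, ma_forecast, ses_forecast)
```

I like to run these before anything fancy, because a model that cannot beat the naive forecast is not worth deploying.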
One More Thing
If you enjoyed this friendly dive into time series analysis and want more hands-on notebooks, subscribe to my newsletter so you never miss a beat in the data rhythm.