Time Series Analysis That Finally Makes Sense

June 17, 2026

Why I Wrote This

I love time series analysis because it turns raw dates and numbers into living stories.
You can spot trends, smell seasonality, and even guess the future like you have a crystal ball.
In this mega-guide I pack every code cell from my notebook, sprinkle plain-English explanations, and keep the vibe as chill as a coffee chat.


What Is Time Series Analysis

Time series analysis means you study numbers collected at regular time intervals so you can understand the past and predict the next tick.
You focus on order because yesterday influences today and today nudges tomorrow.
That temporal dependency makes the game different from classic spreadsheets filled with unordered rows.
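
To make that dependency concrete, here is a minimal sketch on a synthetic random walk (made-up data, not the Bitcoin file). It compares the lag-1 autocorrelation of the ordered series with the same numbers after shuffling.

```python
import numpy as np
import pandas as pd

# Synthetic random walk: each value is the previous value plus a small shock.
rng = np.random.default_rng(42)
walk = pd.Series(rng.normal(size=500).cumsum())

# Shuffling keeps the same numbers but destroys their order.
shuffled = walk.sample(frac=1, random_state=42).reset_index(drop=True)

ordered_autocorr = walk.autocorr(lag=1)
shuffled_autocorr = shuffled.autocorr(lag=1)

print(f"lag-1 autocorrelation, ordered:  {ordered_autocorr:.3f}")
print(f"lag-1 autocorrelation, shuffled: {shuffled_autocorr:.3f}")
```

The ordered series should score close to 1 and the shuffled one close to 0, which is the whole reason row order matters here.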


Time Series vs Cross-Sectional

Time series watches one entity through time, while cross-sectional snapshots many entities at one moment.
In cross-sectional data each row is independent, but in time series rows hug each other through autocorrelation.
This hug breaks the assumptions behind many textbook statistics tricks, so you need special tools.


Data Types You Meet

A univariate series tracks one variable, say Bitcoin price, over time.
A multivariate series tracks many at once, like price, volume, and tweets per hour.
Knowing the type guides the model choice because extra columns can share signals.


The Core Components

Every series hides a trend, a seasonal pattern, and the noisy party crashers we call residuals.
Trend points to long-term direction, seasonality loops in fixed cycles, and residuals cover everything messy in between.
You can add or multiply them depending on whether seasonal wiggles stay constant or scale with the level.
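
A small sketch of the difference, using an invented trend and a 12-step sine cycle (all numbers are illustrative assumptions). It measures the seasonal swing within each year under both combinations.

```python
import numpy as np

months = np.arange(48)                        # four years of monthly steps
trend = 100 + 5 * months                      # steadily rising level
cycle = np.sin(2 * np.pi * months / 12)       # 12-month seasonal shape

additive = trend + 20 * cycle                 # wiggle size fixed at +/-20
multiplicative = trend * (1 + 0.2 * cycle)    # wiggle size is 20% of the level

def yearly_swing(series):
    """Peak-to-trough range of the seasonal part within each year."""
    seasonal_part = (series - trend).reshape(4, 12)
    return seasonal_part.max(axis=1) - seasonal_part.min(axis=1)

print("additive swing per year:      ", yearly_swing(additive).round(1))
print("multiplicative swing per year:", yearly_swing(multiplicative).round(1))
```

The additive swing stays the same every year, while the multiplicative swing grows as the level climbs.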


The Git-Style Workflow

I always follow five baby steps: define goal, visualize, transform, model, and validate.
Skipping any step is like pushing to Git without tests: you will pay later.
Let's walk those steps with live code so you feel the rhythm.


Setup And Imports

I start with mounting Google Drive so Python can see my CSV files.
The next cell swaps to the working directory and loads core libraries.
Watch how the comments set the stage.

# Mount the drive (Google Colab specific) 🚗
from google.colab import drive
drive.mount('/content/drive')
# Change directory to the desired folder in Google Drive 📂
%cd /content/drive/MyDrive/Python - Time Series Forecasting/Time Series Analysis/Introduction to Time Series Forecasting
# Import necessary libraries for plots and decomposition 📊
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import month_plot, quarter_plot, plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose

This trio of cells gives me disk access, sets my path, and arms me with Pandas plus Statsmodels for fancy plots.


Load Bitcoin Data

I pull in daily Bitcoin prices so we can play with real numbers that bounce a lot.

# Load the Bitcoin dataset 💰
df = pd.read_csv("bitcoin_price.csv")
df.head()
         Date        Open        High         Low       Close   Adj Close  \
0  2014-09-17  465.864014  468.174011  452.421997  457.334015  457.334015   
1  2014-09-18  456.859985  456.859985  413.104004  424.440002  424.440002   
2  2014-09-19  424.102997  427.834991  384.532013  394.795990  394.795990   
3  2014-09-20  394.673004  423.295990  389.882996  408.903992  408.903992   
4  2014-09-21  408.084991  412.425995  393.181000  398.821014  398.821014   

     Volume  
0  21056800  
1  34483200  
2  37919700  
3  36863600  
4  26580100  

Five rows tell me the CSV loaded fine and that early Bitcoin was way cheaper.


Tidy The Index

Plots and resampling act smoother once the Date column becomes an actual datetime index.

# Convert "Date" to datetime so Pandas understands time ⏳
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
df.head()
# Set "Date" as index to unlock time-aware slicing 🔑
df.set_index("Date", inplace=True)
df.head()

Now any line like df.loc['2021'] fetches the whole year instantly.

# Peek at one exact date for fun 🎯
df.loc['2021-11-09']
Open         6.754973e+04
High         6.853034e+04
Low          6.638206e+04
Close        6.697183e+04
Adj Close    6.697183e+04
Volume       4.235799e+10
Name: 2021-11-09 00:00:00, dtype: float64

That day Bitcoin flirted with 68k dollars.
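
Here is a quick sketch of the other slicing shortcuts a datetime index gives you. The frame is a hypothetical stand-in with one row per day, so the row counts are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Bitcoin frame: one row per day for 2020-2021.
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
demo = pd.DataFrame({"Close": np.arange(len(dates), dtype=float)}, index=dates)

whole_year = demo.loc["2021"]                      # every row in 2021
one_month = demo.loc["2021-02"]                    # every row in February 2021
date_range = demo.loc["2021-03-01":"2021-03-07"]   # label slices include both ends

print(len(whole_year), len(one_month), len(date_range))  # → 365 28 7
```

Note that a date-string slice keeps both endpoints, unlike ordinary Python slicing.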


Shortcut Loading

You can parse dates at read time which is shorter and cleaner.

# One-liner to load with parsed dates 🏃‍♂️
df1 = pd.read_csv("bitcoin_price.csv", index_col="Date", parse_dates=True)
df1.head()

The result matches the manual steps above and saves two lines of code.


Resample Like A Pro

Sometimes you want weekly or monthly summaries to reduce noise and expose the trend.

# Resample weekly and grab max values 📅
df.resample("W").max()
                    Open          High           Low         Close  ...
2014-09-21    465.864014    468.174011    452.421997    457.334015
...
2023-12-31  43599.847656  43804.781250  42765.769531  43613.140625

Weekly data reads smoother than the daily storm which helps your eyes.
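
Taking the max of every column is fine for a quick look, but it does not build true weekly bars. As a sketch, assuming the usual OHLCV meaning of the columns, each one can get its own aggregation. The two weeks of data below are invented for the demo.

```python
import numpy as np
import pandas as pd

# Hypothetical two weeks of daily bars, Monday 2024-01-01 to Sunday 2024-01-14.
dates = pd.date_range("2024-01-01", periods=14, freq="D")
close = np.arange(100.0, 114.0)
bars = pd.DataFrame(
    {
        "Open": close - 1,
        "High": close + 2,
        "Low": close - 3,
        "Close": close,
        "Volume": 10,
    },
    index=dates,
)

# Each column gets the aggregation that matches its meaning.
weekly = bars.resample("W").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(weekly)
```

The "W" alias labels each bucket by the Sunday that ends the week, so the demo collapses into exactly two rows.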


Rolling Averages

A moving average smooths the line even further so outlier spikes stop shouting.

# Calculate 7-day rolling mean on closing price 💤
df["7_day_closing_avg"] = df["Close"].rolling(window=7).mean()

# Plot raw close vs smoothed version for 2023 🖼️
df[["Close", "7_day_closing_avg"]].loc["2023"].plot()
plt.show()

The blue jagged line is the real price, and the orange line is the chilled version.
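
A plain rolling mean leaves NaNs until the window fills up. This sketch on a tiny made-up series compares it with two common variations: `min_periods=1` and an exponentially weighted mean. The window of 3 is only there to keep the numbers readable.

```python
import pandas as pd

prices = pd.Series([10.0, 12.0, 11.0, 13.0, 40.0, 12.0, 11.0, 13.0])

strict = prices.rolling(window=3).mean()                   # NaN until 3 values exist
lenient = prices.rolling(window=3, min_periods=1).mean()   # averages whatever is available
ewm = prices.ewm(span=3, adjust=False).mean()              # recent values weigh more

print(pd.DataFrame({"price": prices, "strict": strict, "lenient": lenient, "ewm": ewm}).round(2))
```

Which one fits depends on whether you prefer honest gaps at the start or a full-length column.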


Month-End Peak

Finding the month with the top average close is one line away.

# Identify the month with the highest average close 🔥 (the "ME" alias needs pandas 2.2+)
df["Close"].resample("ME").mean().idxmax()
Timestamp('2021-11-30 00:00:00')

No surprise 2021-11 tops the chart thanks to the bull run.


Quick Data Health Check

Explorers hunt for missing values right away because gaps break many models.

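# Build the 30-day rolling volume here so the check below can see it
df["30_day_rolling_vol"] = df["Volume"].rolling(window=30).mean()
# Show any NaNs lurking around ❓
df.isnull().sum()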
Open                   0
...
30_day_rolling_vol    29

Missing values only exist in derived columns so the raw data is intact.


Fill And Interpolate

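I patch empty spots with backfill and linear interpolation so models stay happy.

# Fix rolling volume by backfilling 🩹
df["30_day_rolling_vol"] = df["30_day_rolling_vol"].bfill()

# Interpolate the rolling average linearly 📈
# limit_direction="both" also fills the leading NaNs, which the default forward-only pass leaves empty
df["7_day_closing_avg"] = df["7_day_closing_avg"].interpolate(method="linear", limit_direction="both")

Now df.isnull().sum() would print zeros for both derived columns.
Keep in mind that backfilling copies later values into earlier rows, so it suits exploration better than features you feed to a forecasting model.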
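
The fill methods behave differently at the edges of a series, which is easy to miss. A minimal sketch on a toy series with gaps at the start, middle, and end:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 3.0, np.nan, 5.0, np.nan])

forward_only = s.interpolate(method="linear")                       # default direction
both_ways = s.interpolate(method="linear", limit_direction="both")  # also fills the start
backfilled = s.bfill()                                              # copies the next known value backwards

print(pd.DataFrame({"raw": s, "forward_only": forward_only,
                    "both_ways": both_ways, "backfilled": backfilled}))
```

Rolling-window columns always start with NaNs, so the default forward-only interpolation leaves those leading gaps untouched.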


Feature Engineering Goodies

I extract useful date parts to capture weekday quirks in a future model.

# Create multiple calendar-based columns 🗓️
df["year"] = df.index.year
df["month"] = df.index.month
df["day"] = df.index.day
df["day_of_week"] = df.index.dayofweek
df["weekday"] = df.index.day_name()
df["weekday_numeric"] = df.index.weekday
df["is_weekend"] = df["weekday_numeric"].isin([5, 6])
df.head()

Those columns let a machine pick up weekend patterns automatically.


Lag Features

Lagged values put yesterday's price directly inside today's row, which helps algorithms learn momentum.

# Shift close price to create lag_1 and lag_2 ⏪
df["lag_1"] = df["Close"].shift(1)
df["lag_2"] = df["Close"].shift(2)
df.head()

AR models basically automate this idea, but it's nice to see it explicitly.
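
To show the link between a lag column and an AR model, here is a sketch that simulates an AR(1) process and recovers its coefficient with a plain least-squares fit on `lag_1`. The coefficient of 0.8 is an arbitrary illustrative choice, and the data is simulated, not Bitcoin.

```python
import numpy as np
import pandas as pd

# Simulate an AR(1) process with a known coefficient of 0.8 (illustrative value).
rng = np.random.default_rng(0)
values = np.zeros(1000)
for t in range(1, len(values)):
    values[t] = 0.8 * values[t - 1] + rng.normal()

frame = pd.DataFrame({"y": values})
frame["lag_1"] = frame["y"].shift(1)   # yesterday's value on today's row
frame = frame.dropna()                 # the first row has no yesterday

# Ordinary least squares of y on lag_1: the slope estimates the AR(1) coefficient.
slope, intercept = np.polyfit(frame["lag_1"], frame["y"], deg=1)
print(f"estimated AR(1) coefficient: {slope:.2f}")
```

The estimate should land near 0.8, which is roughly what a dedicated AR fit would report for the same data.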


Seasonality Plots

Monthly and quarterly plots show repeated patterns you might miss in raw charts.

# Month plot on resampled monthly mean 🌙
month_plot(df['Close'].resample('ME').mean(), ylabel="Closing")
plt.show()
# Quarter plot on resampled quarterly mean
quarter_plot(df['Close'].resample('QE').mean(), ylabel="Closing")
plt.show()

I see stronger quarters during bull runs and weaker ones in crypto winters.


A Second Dataset: Chocolate Revenue

I introduce a sweet univariate monthly series to showcase other frequencies.

# Load chocolate revenue CSV 🍫
df_choco = pd.read_csv("choco_monthly_revenue.csv", index_col="Month with Year", parse_dates=True)
df_choco.head()
                 revenue
Month with Year         
2018-01-01          1458
...
# Visualize seasonal chocolate love ❤️
month_plot(df_choco['revenue'], ylabel="revenue")
plt.show()

Valentine's and Christmas spikes jump out clearly.


Decomposition Deep Dive

Statsmodels decomposes a series into trend, seasonal, and residual components which feels like taking apart a LEGO set.

# Decompose Bitcoin with weekly additive model 🧩
decomposition = seasonal_decompose(df['Adj Close'], model='additive', period=7)
fig = decomposition.plot()
fig.set_size_inches(18, 10)
plt.show()

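I start with additive for Bitcoin as a simple baseline, but the price spans several orders of magnitude and its swings tend to scale with the level, so it is worth comparing against a multiplicative fit or a log-transformed series.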

# Decompose chocolate with monthly multiplicative model 🌟
decomposition = seasonal_decompose(df_choco['revenue'], model='multiplicative', period=12)
fig = decomposition.plot()
fig.set_size_inches(18, 10)
plt.show()

Multiplicative works for chocolate because holiday bumps grow with overall revenue growth.


Autocorrelation Fun

Autocorrelation exposes how much the series copies itself at different lags.

# Plot ACF for Bitcoin up to 100 lags 📎
fig, ax = plt.subplots(figsize=(12, 6))
plot_acf(df['Adj Close'], lags=100, ax=ax)
plt.show()

You can see slow decay which flags non-stationarity and maybe a unit root.

# Quick info on chocolate DataFrame ℹ️
df_choco.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 60 entries...
# Plot ACF for chocolate revenue
fig, ax = plt.subplots(figsize=(12, 6))
plot_acf(df_choco['revenue'], lags=20, ax=ax)
plt.show()

Chocolate shows a yearly spike at lag 12 which screams seasonality.


PACF To Pick AR Terms

Partial ACF removes indirect effects so spikes there suggest how many AR lags you need.

# PACF for Bitcoin
fig, ax = plt.subplots(figsize=(12, 6))
plot_pacf(df['Adj Close'], lags=100, ax=ax)
plt.show()

The first spike at lag 1 dominates, hinting at an AR(1) structure, although on a non-stationary price series a random walk produces the same picture.

# PACF for chocolate 📚
fig, ax = plt.subplots(figsize=(12, 6))
plot_pacf(df_choco['revenue'], lags=20, ax=ax)
plt.show()

Lag 1 and 12 pop out meaning both short and yearly memory matter.


Why Stationarity Matters

Many algorithms, especially the legendary ARIMA model, assume a constant mean, a constant variance, and an autocorrelation that depends only on the lag rather than on the date.
Non-stationary data fools the math and yields misleading confidence intervals.
You often difference the series to remove trend and log-transform it to stabilize variance before modeling.


Forecasting With ARIMA Models

An autoregressive integrated moving average stitches AR, differencing, and MA parts to capture memory and noise.
The parameters (p,d,q) decide how many AR lags, how many differences, and how many MA lags to include.
Seasonal ARIMA tacks on extra (P,D,Q,s) for seasonality where s equals the season length like 12 for months.


Common Mistakes I Still See

  1. People treat time series forecasting like cross-sectional regression and ignore the order of observations.
  2. Folks eyeball correlations on non-stationary data, where shared trends inflate the coefficients.
  3. They forget to hold out the most recent slice for validation and end up overfitting history.

Trend Analysis In Plain English

A trend is a slow drift up or down and you spot it with rolling means or decomposition.
You can also run a simple linear regression on time index to quantify slope size.
Removing trend (detrending) centers the series so seasonality stands out.
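
The regression idea fits in a few lines with NumPy. This sketch uses a made-up series with a known slope of 0.5 per step, fits a straight line on the time index, and subtracts it.

```python
import numpy as np

rng = np.random.default_rng(11)
t = np.arange(200)                                  # time index: 0, 1, 2, ...
series = 50 + 0.5 * t + rng.normal(scale=2.0, size=200)

slope, intercept = np.polyfit(t, series, deg=1)     # least-squares straight line
detrended = series - (intercept + slope * t)        # what is left after removing the line

print(f"estimated slope per step: {slope:.3f}")
print(f"mean of detrended series: {detrended.mean():.3f}")
```

A straight line is the simplest possible trend model. A curved trend would need a higher polynomial degree or a rolling mean instead.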


Seasonality Detection Tricks

I use month_plot, ACF spikes at periodic lags, and dummy variables for months or weekdays.
Spectral analysis with Fourier transforms is another fancy option but overkill for most beginners.
Always plot first because visuals beat stats when checking loops.
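
For the curious, the Fourier route is shorter than it sounds. This sketch finds the dominant period of a synthetic monthly series with NumPy's FFT. The 12-month cycle is baked in on purpose so the answer is known in advance.

```python
import numpy as np

rng = np.random.default_rng(2)
months = np.arange(120)
series = 5 * np.sin(2 * np.pi * months / 12) + rng.normal(scale=1.0, size=120)

centered = series - series.mean()              # remove the zero-frequency component
power = np.abs(np.fft.rfft(centered)) ** 2     # strength of each frequency
freqs = np.fft.rfftfreq(len(centered), d=1.0)  # cycles per month

peak = np.argmax(power[1:]) + 1                # skip index 0 (the mean)
dominant_period = 1 / freqs[peak]
print(f"dominant period: {dominant_period:.1f} months")
```

On real data you would usually detrend first, because a strong trend piles power into the lowest frequencies and can hide the seasonal peak.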


Time Series Analysis vs Cross-Sectional Analysis Recap

Time order, autocorrelation, and non-independence make time series special.
You rarely shuffle rows because that would break sequence magic.
Errors propagate through time so you need walk-forward testing rather than random splits.
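
Walk-forward testing can be sketched as an expanding training window followed by the next unseen block. The helper `walk_forward_splits` is a hypothetical name written for this post, not a library function. scikit-learn's `TimeSeriesSplit` offers a ready-made alternative.

```python
import numpy as np

def walk_forward_splits(n_obs, n_folds, test_size):
    """Yield (train_idx, test_idx) pairs where the test block always follows the training block."""
    first_test_start = n_obs - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_idx = np.arange(0, test_start)                        # everything before the test block
        test_idx = np.arange(test_start, test_start + test_size)    # the next unseen slice
        yield train_idx, test_idx

for train_idx, test_idx in walk_forward_splits(n_obs=20, n_folds=3, test_size=4):
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx[0]}..{test_idx[-1]}")
```

Every fold trains only on the past, so the score reflects what the model would have known at forecast time.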


Correlation Caveats

Pearson correlation can mislead due to common trends.
Spearman reduces sensitivity to outliers but still ignores autocorrelation.
Always difference or detrend before trusting correlation coefficients.

# Pearson correlation on rolling volume vs close 📉
print(df["30_day_rolling_vol"].corr(df["Close"]))

# Confirm with matrix 🗺️
df[["Close", "30_day_rolling_vol"]].corr()
0.750445379618605
                       Close  30_day_rolling_vol
Close               1.000000            0.750445
30_day_rolling_vol  0.750445            1.000000

High value partly comes from both series trending up in bull markets.
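
To see how much a shared trend can inflate the number, this sketch correlates two unrelated synthetic series that both drift upward, first in levels and then after differencing. The slopes and noise levels are arbitrary illustrative values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
t = np.arange(500)
# Two unrelated series that both happen to drift upwards.
a = pd.Series(0.5 * t + rng.normal(scale=5.0, size=500))
b = pd.Series(0.3 * t + rng.normal(scale=5.0, size=500))

level_corr = a.corr(b)                    # dominated by the shared trend
diff_corr = a.diff().corr(b.diff())       # trend removed by differencing

print(f"correlation of levels:      {level_corr:.2f}")
print(f"correlation of differences: {diff_corr:.2f}")
```

The levels look tightly linked while the differences look unrelated, which is the spurious-correlation trap in miniature.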


Detect Those Crazy Days

Big daily returns often flag news events, hacks, or regulation shocks.

# Percent change and filter over 10% swings 📈📉
df["daily_returns_100%"] = df['Close'].pct_change() * 100
df[abs(df["daily_returns_100%"]) > 10]
                    Open          High           Low         Close  ...
2015-01-14    178.102997    223.893997    171.509995    178.102997   ...
...
2023-10-23  30140.685547  34370.437500  30097.828125  33086.234375   ...

Those rows become valuable labels for anomaly detection models.


Visualize Volume Vs Price

Plotting on twin axes helps you compare metrics with very different scales.

# 30-day rolling volume and close price on twin axes 📊
df["30_day_rolling_vol"] = df["Volume"].rolling(window=30).mean()
df[["30_day_rolling_vol"]].plot(legend=True)
ax = df["Close"].plot(secondary_y=True, legend=True)
ax.set_ylabel("Closing Price")
plt.show()

Volume spikes often precede price surges, which hints at a lead-lag relationship, though a chart alone cannot prove causation.


Putting It All Together

You begin with date parsing, explore trends, smooth with rolling averages, decompose to separate the parts, check autocorrelation, ensure stationarity, pick an ARIMA model, and finally forecast the future.
Validation happens on unseen future slices, not shuffled samples.
Once happy you deploy the model and monitor because the market never sleeps.


FAQ

Q1: What are the steps involved in time series analysis?
A1: I always clarify the goal, clean and index the data, visualize to spot patterns, test for stationarity, engineer features, choose and fit a model, and finally validate on a hold-out future slice. You iterate these steps because new data keeps coming.

Q2: How does time series analysis differ from cross-sectional analysis?
A2: Cross-sectional treats observations as independent, but time series has order and autocorrelation, so shuffling rows ruins the signal, and you usually split train-test chronologically rather than randomly.

Q3: What are common techniques used in time series forecasting?
A3: Simple ones include naïve last-value, moving average, exponential smoothing, and ARIMA, while advanced setups cover Prophet, LSTM networks, and hybrid statistical-ML stacks that mix trend decomposition with gradient boosting.

Q4: How can seasonality be detected in time series data?
A4: I use visuals like month_plot or quarter_plot, inspect ACF spikes at known seasonal lags, run STL or classical decomposition, and sometimes fit dummy variables for months to see if coefficients pop.

Q5: What is the role of stationarity in time series analysis?
A5: Stationarity keeps mean and variance constant so model parameters stay reliable, and tests like the augmented Dickey-Fuller help you decide whether differencing or a log transform is needed before forecasting.


One More Thing

If you enjoyed this friendly dive into time series analysis and want more hands-on notebooks, subscribe to my newsletter so you never miss a beat in the data rhythm.
