Bitcoin Price Prediction Using CNN-LSTM Models

The financial landscape has evolved rapidly in recent years, with investors increasingly shifting from traditional savings to dynamic assets such as stocks, bonds, and cryptocurrencies. Among these, Bitcoin stands out for its extreme volatility, nonlinearity, and non-stationarity, characteristics that make price prediction both challenging and valuable. With the advancement of deep learning, models like Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) have emerged as powerful tools for time series forecasting. This article explores how combining these two architectures into a CNN-LSTM hybrid model can improve Bitcoin price prediction accuracy over either model alone.

Understanding LSTM: Capturing Temporal Dependencies

LSTM is a specialized type of Recurrent Neural Network (RNN) designed to overcome the vanishing gradient problem and effectively capture long-term dependencies in sequential data. Unlike standard RNNs, LSTMs use a gating mechanism—comprising the input gate, forget gate, and output gate—to regulate information flow, enabling selective retention or discarding of past data.

How LSTM Works

Forget Gate
Determines which information from the previous cell state should be discarded. It uses a sigmoid function to output values between 0 (completely forget) and 1 (fully retain):
$$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) $$
Input Gate
Decides what new information will be stored in the cell state. It consists of a sigmoid layer (to filter values) and a tanh layer (to create candidate values):
$$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) $$
$$ \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) $$
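Cell State Update
Combines the two previous steps: the forget gate scales the old cell state and the input gate scales the candidate values ($\odot$ denotes element-wise multiplication):
$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t $$
Output Gate
Computes the hidden state, which also serves as the cell's output, from the updated cell state:
$$ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) $$
$$ h_t = o_t \odot \tanh(C_t) $$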

This architecture allows LSTM to maintain relevant historical context—ideal for financial time series where trends and patterns unfold over time.
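
To make the gate equations concrete, here is a minimal pure-Python sketch of a single LSTM time step. It uses the standard cell-state update C_t = f_t * C_{t-1} + i_t * C̃_t (element-wise) to link the input gate to the output gate. The function names, the `params` dictionary layout, and the all-zero demo weights are illustrative assumptions, not part of the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W·v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """Run one LSTM time step. `params` maps "f", "i", "c", "o" to (W, b)."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]      # forget gate
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]      # input gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate values
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]      # output gate
    # Standard cell-state update, applied element-wise.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy example: hidden size 2, input size 1, all weights and biases zero,
# so every gate outputs sigmoid(0) = 0.5 and the candidate is tanh(0) = 0.
hidden, n_in = 2, 1
zero = ([[0.0] * (hidden + n_in) for _ in range(hidden)], [0.0] * hidden)
params = {name: zero for name in "fico"}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights the forget gate halves the previous cell state, which shows how a gate value between 0 and 1 controls how much history is retained.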

Empirical Analysis: Building the LSTM Model

Data Selection and Features

We used daily Bitcoin price data from Nasdaq spanning September 11, 2016, to September 10, 2021—a total of 1,826 trading days. In addition to closing prices, six technical indicators were incorporated as input features:

RSI14: Relative Strength Index over 14 days
DIFF: Difference between short-term and long-term exponential moving averages
DEA: Signal line for DIFF
MACD: Moving Average Convergence Divergence
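Up20: Upper Bollinger Band over a 20-day window (moving average plus a multiple of the rolling standard deviation)
Down20: Lower Bollinger Band over a 20-day window (moving average minus the same multiple)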
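
The indicators listed above can be derived from closing prices alone. The sketch below is one plausible way to compute them. The article does not state its parameters, so the 12/26/9 EMA spans, the two-standard-deviation band width, the use of population standard deviation, and the simple-average form of RSI are all conventional defaults assumed here. The function names are hypothetical.

```python
from statistics import mean, pstdev

def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd_lines(close, fast=12, slow=26, signal=9):
    """Return (DIFF, DEA): fast EMA minus slow EMA, and the EMA of that difference."""
    diff = [a - b for a, b in zip(ema(close, fast), ema(close, slow))]
    dea = ema(diff, signal)
    return diff, dea

def bollinger(close, window=20, k=2.0):
    """Return (upper, lower) bands; entries are None until a full window exists."""
    upper, lower = [], []
    for t in range(len(close)):
        if t + 1 < window:
            upper.append(None)
            lower.append(None)
            continue
        w = close[t + 1 - window : t + 1]
        m, s = mean(w), pstdev(w)
        upper.append(m + k * s)
        lower.append(m - k * s)
    return upper, lower

def rsi(close, period=14):
    """Simple-average RSI over the last `period` daily price changes."""
    out = [None] * len(close)
    for t in range(period, len(close)):
        changes = [close[i] - close[i - 1] for i in range(t - period + 1, t + 1)]
        gain = sum(c for c in changes if c > 0) / period
        loss = sum(-c for c in changes if c < 0) / period
        out[t] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out
```

The MACD histogram is then derived from the gap between DIFF and DEA; scaling conventions for that gap vary between charting platforms, so it is left out of this sketch.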

A sliding window approach with a size of 10 days was applied to structure the time series for prediction.
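
A sliding window turns the daily series into supervised samples. The sketch below assumes one-step-ahead prediction, where each 10-day block of feature rows is paired with the next day's target. The article does not state the forecast horizon, so that pairing and the name `make_windows` are assumptions.

```python
def make_windows(rows, target, window=10):
    """Pair each `window`-day block of feature rows with the following day's target."""
    X, y = [], []
    for start in range(len(rows) - window):
        X.append(rows[start : start + window])  # days start .. start+window-1
        y.append(target[start + window])        # the day right after the block
    return X, y

# Toy data: 15 days, one feature per day.
rows = [[float(day)] for day in range(15)]
target = [float(day) for day in range(15)]
X, y = make_windows(rows, target, window=10)
```

Under this assumption, a series of N days yields N - 10 samples, so the 1,826-day dataset would produce at most 1,816 windows before any rows are lost to indicator warm-up.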

Model Architecture and Evaluation

The LSTM model consisted of:

3 LSTM layers
3 dense layers
5 dropout layers (to prevent overfitting)

Model performance was evaluated using Mean Absolute Percentage Error (MAPE):

$$ \text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100 $$
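
The MAPE formula translates directly into a few lines of Python. This is a minimal sketch with toy numbers, not the study's evaluation code. It raises an error on empty or mismatched inputs, and it assumes no actual value is zero, which holds for Bitcoin prices.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Toy example: both forecasts miss by 10% of the actual value.
score = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by the actual price, MAPE is scale-free, which is convenient for an asset whose price level changed by orders of magnitude over the sample period.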

The initial MAPE was 10.14%, indicating moderate accuracy. Visual analysis revealed noticeable lag in tracking sudden price movements—a known limitation of pure LSTM models in volatile markets.

Introducing CNN: Extracting Spatial and Local Features

While LSTM excels at modeling sequences, CNN is renowned for extracting local patterns through convolutional filters—originally developed for image recognition but now widely used in time series analysis.

CNN Architecture Overview

A typical CNN includes:

Input Layer: Accepts time-series data reshaped into matrix form
Convolutional Layers: Apply filters to detect local patterns (e.g., price spikes)
Pooling Layers: Reduce dimensionality while preserving key features
Flatten & Dense Layers: Convert features into predictions

To enhance performance, this study employed:

Dilated convolutions for broader temporal coverage
Residual connections to mitigate gradient degradation
Bottleneck layers to reduce computational load
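
As a rough illustration of the first two techniques, the sketch below implements a dilated 1D convolution and a residual connection in pure Python. It assumes causal zero-padding (each output sees only current and past values) and a ReLU activation; the article specifies neither, and bottleneck layers are not shown. All names are hypothetical.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated convolution: out[t] = sum_j kernel[j] * x[t - j * dilation]."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:  # positions before the series start count as zero
                acc += w * x[idx]
        out.append(acc)
    return out

def residual_block(x, kernel, dilation=1):
    """Add the block input back to the ReLU-activated convolution output."""
    conv = dilated_conv1d(x, kernel, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]

def receptive_field(kernel_size, dilations):
    """Number of time steps seen by a stack of dilated convolution layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

# With dilation 2 and kernel [1, 1], each output adds x[t] and x[t-2].
y = dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], dilation=2)
span = receptive_field(kernel_size=2, dilations=[1, 2, 4])
```

Three layers with kernel size 2 and dilations 1, 2, and 4 cover 8 time steps, whereas three undilated layers of the same size would cover only 4. This is the sense in which dilation broadens temporal coverage without adding parameters.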

An additional feature—rate of change for each technical indicator—was introduced to enrich input data.
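
The article does not define the rate of change precisely. One common reading is the relative change from the previous day, sketched below under that assumption. Indicators such as DIFF can cross zero, where a relative change is undefined, so this sketch returns None in that case; a plain first difference would be an alternative.

```python
def rate_of_change(series, lag=1):
    """Relative change versus the value `lag` steps earlier; None where undefined."""
    out = [None] * len(series)
    for t in range(lag, len(series)):
        prev = series[t - lag]
        out[t] = None if prev == 0 else (series[t] - prev) / prev
    return out

roc = rate_of_change([100.0, 110.0, 99.0])
```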

CNN Model Performance

The optimized CNN architecture included:

3 convolutional layers
2 pooling layers
3 dense layers
2 dropout layers

Using the same dataset split (80% training, 10% validation, 10% testing), the CNN achieved a MAPE of 9.29%, outperforming the base LSTM model. It showed stronger responsiveness to abrupt changes but exhibited vertical prediction errors due to overfitting on short-term fluctuations.
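
The 80/10/10 split can be sketched as follows. The article does not say whether samples were shuffled; this version assumes a chronological split, a common choice for time series because it keeps future data out of the training set. The function name is hypothetical.

```python
def chronological_split(samples, train=0.8, val=0.1):
    """Split in time order: oldest samples train, newest samples test."""
    n = len(samples)
    i = int(n * train)
    j = int(n * (train + val))
    return samples[:i], samples[i:j], samples[j:]

train_set, val_set, test_set = chronological_split(list(range(100)))
```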

The Hybrid Solution: CNN-LSTM Model

Recognizing the complementary strengths of both models, we developed a CNN-LSTM hybrid that leverages:

CNN’s ability to extract deep spatial-temporal features
LSTM’s proficiency in modeling long-term dependencies

Model Integration Strategy

The hybrid model follows a three-step process:

Train CNN and LSTM separately to extract optimal features.
Assign weights (α for CNN, β for LSTM) based on individual MAPE scores—lower error receives higher weight.
Combine outputs via weighted sum:
$$ y_{\text{hybrid}} = \alpha \cdot y_{\text{CNN}} + \beta \cdot y_{\text{LSTM}} $$
After extensive testing, optimal weights were found to be α = 0.1, β = 0.9.
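
The weighted combination itself is simple to express. The sketch below also shows one way the weights could be searched: a coarse grid over α with β = 1 - α, scored by MAPE on held-out data. The article reports only the final weights, so the constraint that the weights sum to one, the grid search, and the toy numbers are all assumptions for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def blend(y_cnn, y_lstm, alpha):
    """Weighted sum alpha * y_CNN + beta * y_LSTM, assuming beta = 1 - alpha."""
    beta = 1.0 - alpha
    return [alpha * c + beta * l for c, l in zip(y_cnn, y_lstm)]

def best_alpha(actual, y_cnn, y_lstm, steps=10):
    """Pick the grid value of alpha that minimizes the blended MAPE."""
    candidates = [k / steps for k in range(steps + 1)]
    return min(candidates, key=lambda a: mape(actual, blend(y_cnn, y_lstm, a)))

# Toy data: the CNN overshoots by 10%, the LSTM undershoots by 1%.
actual = [100.0, 200.0, 300.0]
y_cnn = [110.0, 220.0, 330.0]
y_lstm = [99.0, 198.0, 297.0]
alpha = best_alpha(actual, y_cnn, y_lstm)
y_hybrid = blend(y_cnn, y_lstm, alpha)
```

In this toy case a small CNN weight offsets the LSTM's slight underestimate, so the blend beats both inputs. Selecting the weights on validation data rather than test data keeps the reported test MAPE an honest estimate.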

White noise test p-values were also used to assess confidence levels in predictions.
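
The article does not name the white noise test it used. The Ljung-Box test is one common choice for checking whether a series, such as a model's residuals, still contains autocorrelation. The sketch below is an illustration under that assumption only. To stay within the standard library it computes the chi-square p-value in closed form, which restricts it to an even number of lags.

```python
import math

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    return num / denom

def ljung_box(x, lags=10):
    """Return (Q, p_value) for the Ljung-Box test; `lags` must be even here."""
    if lags % 2:
        raise ValueError("this sketch supports an even number of lags only")
    n = len(x)
    q = n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, lags + 1))
    half = q / 2.0
    # Chi-square survival function for even degrees of freedom 2m:
    # exp(-q/2) * sum over j < m of (q/2)^j / j!
    p = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(lags // 2))
    return q, p

# A pure trend is far from white noise, so the p-value should be tiny.
q_stat, p_value = ljung_box([float(t) for t in range(50)], lags=10)
```

A small p-value rejects the white-noise hypothesis, meaning the series still has structure a model could exploit. A large p-value on residuals would suggest the model has captured most of the predictable pattern.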

Comparative Results and Performance Evaluation

Model	MAPE (%)
Base LSTM	10.14
Base CNN	9.29
Optimized LSTM	8.20
Optimized CNN	7.09
CNN-LSTM	4.74
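
To put the table in relative terms, the short calculation below expresses the hybrid's MAPE as a percentage reduction against each of the other rows. The figures are copied from the table; the helper name is illustrative.

```python
mape_pct = {
    "Base LSTM": 10.14,
    "Base CNN": 9.29,
    "Optimized LSTM": 8.20,
    "Optimized CNN": 7.09,
    "CNN-LSTM": 4.74,
}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

hybrid = mape_pct["CNN-LSTM"]
reductions = {
    name: relative_reduction(value, hybrid)
    for name, value in mape_pct.items()
    if name != "CNN-LSTM"
}
```

Against the best standalone model (Optimized CNN) the reduction is roughly a third, and against the base LSTM it is a little over half.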

The final CNN-LSTM model achieved a 4.74% MAPE, the lowest of the five configurations compared and well below either standalone model. Graphical comparisons show:

Closer alignment with actual price trends
Reduced lag in responding to volatility
Better handling of local price swings

Residual analysis confirms fewer and smaller errors, although extreme market events still pose challenges due to Bitcoin’s inherent unpredictability.

Frequently Asked Questions (FAQ)

Q: Why combine CNN and LSTM instead of using one model alone?
A: CNN excels at identifying local patterns and spatial hierarchies in data, while LSTM captures long-term temporal dynamics. Together, they provide a more comprehensive understanding of complex time series like Bitcoin prices.

Q: What makes Bitcoin price prediction so difficult?
A: Bitcoin exhibits high volatility, nonlinear behavior, and sensitivity to external factors like regulatory news and macroeconomic shifts. These make it resistant to traditional linear forecasting methods.

Q: Can this model predict sudden market crashes or rallies?
A: While the CNN-LSTM model improves short-term forecasting accuracy, predicting black swan events remains challenging without incorporating real-time sentiment or news data.

Q: Is technical analysis alone sufficient for accurate predictions?
A: Technical indicators provide valuable historical context, but integrating on-chain data, trading volume, and market sentiment can further enhance model robustness.

Q: How often should the model be retrained?
A: Given Bitcoin’s evolving market structure, retraining every 3–6 months with updated data helps the model adapt to new trends and maintain predictive power.

Conclusion

This study indicates that hybrid deep learning models can offer a meaningful edge in cryptocurrency price forecasting. By integrating CNN’s feature extraction capability with LSTM’s sequential modeling strength, the proposed CNN-LSTM architecture achieves a MAPE of 4.74%, outperforming the individual models. The results suggest that ensemble approaches are well suited to the complexities of digital asset markets.

Future work could explore integrating external variables such as social media sentiment, blockchain metrics, or macroeconomic indicators to further refine predictions. As AI continues to evolve, models like CNN-LSTM will play an increasingly vital role in shaping data-driven investment strategies.
