Visualizing Correlation: How to Add Correlation Coefficient to Scatter Plots in Python
Learn how to add a correlation coefficient to scatter plots in Python. This article walks through a matplotlib-based approach, using scipy.stats to compute the coefficient, with step-by-step code examples.
In data analysis and visualization, it is often useful to show the strength of the relationship between two variables. One way to do this is to add a correlation coefficient to a scatter plot. In Python, you can use the matplotlib library to create the scatter plot and add a correlation coefficient label, with the scipy.stats module computing the coefficient itself. In this article, we demonstrate how to do this using code examples.
Code Demonstrations
Importing Libraries
First, let’s import the necessary libraries:
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
The matplotlib library is used for creating the scatter plot and other visualizations, while the scipy.stats module is used to calculate the correlation coefficient.
Creating a Scatter Plot
Next, let’s create a scatter plot using the matplotlib library:
# Create a scatter plot
plt.scatter(x, y)
Here, x and y are the two variables whose relationship you want to visualize, given as equal-length sequences of numbers.
[Figure: the resulting scatter plot]
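The snippet above assumes that x and y already exist. As a minimal, self-contained sketch, the following generates some illustrative data and plots it. The data, the random seed, and the axis labels are assumptions made for this example, not part of the original walkthrough.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data: y depends linearly on x, plus random noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + rng.normal(0, 2.0, size=50)

# Create the figure and draw one point per (x, y) pair.
fig, ax = plt.subplots()
points = ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")

# plt.show()  # uncomment to display the figure in an interactive session
```

Replace the generated arrays with your own data; any two equal-length numeric sequences should work the same way.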
Calculating the Correlation Coefficient
To calculate the correlation coefficient, we can use the pearsonr function from the scipy.stats module:
# Calculate the correlation coefficient
corr_coef = pearsonr(x, y)
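The pearsonr function takes two equal-length arrays as input and returns a result containing the Pearson correlation coefficient and the p-value of the correlation. The result can be indexed or unpacked like a tuple, so corr_coef[0] is the coefficient and corr_coef[1] is the p-value. The coefficient measures linear association and ranges from -1 to 1: values close to 1 indicate a strong positive linear relationship, values close to -1 indicate a strong negative linear relationship, and values near 0 indicate little or no linear relationship. The p-value comes from a test of the null hypothesis that the two variables are uncorrelated.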
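To make the meaning of the coefficient concrete, the sketch below computes Pearson's r directly from its definition (the sum of products of deviations, divided by the square root of the product of the sums of squared deviations) and compares it with the value returned by pearsonr. The data values are made up for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative values only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.4, 3.8, 5.1, 6.3])

# Pearson's r from its definition, using deviations from the mean.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same quantity from SciPy, unpacked into coefficient and p-value.
r_scipy, p_value = pearsonr(x, y)

print(f"manual r = {r_manual:.4f}, scipy r = {r_scipy:.4f}, p = {p_value:.4g}")
```

Unpacking the result into two names, as done here, is an alternative to indexing with [0] and tends to read more clearly when both values are needed.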
Adding the Correlation Coefficient Label
Finally, let’s add the correlation coefficient label to the scatter plot:
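# Add the correlation coefficient label in axes coordinates (0 to 1), rounded to two decimals
plt.text(0.5, 0.95, f"Correlation Coefficient: {corr_coef[0]:.2f}", ha="center", va="top", transform=plt.gca().transAxes)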
Here, we use the text function from the matplotlib.pyplot module to add a text label to the scatter plot. The first two arguments give the label position, and the ha="center" argument centers the label horizontally on that position.
[Figure: the resulting scatter plot with the correlation coefficient label]
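Putting the steps together, here is one possible end-to-end sketch. The helper name scatter_with_corr, the sample data, and the label position are assumptions made for this example. It places the label in axes coordinates through transform=ax.transAxes, so the text stays in the top-left corner of the plot whatever the range of the data.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr


def scatter_with_corr(x, y, ax=None):
    """Draw a scatter plot of x vs. y and annotate it with Pearson's r.

    Hypothetical helper for illustration; not part of matplotlib or SciPy.
    """
    if ax is None:
        _, ax = plt.subplots()
    r, p_value = pearsonr(x, y)
    ax.scatter(x, y)
    # With transform=ax.transAxes, (0.05, 0.95) means 5% from the left edge
    # and 95% from the bottom edge of the axes, independent of the data range.
    label = ax.text(
        0.05, 0.95, f"r = {r:.2f}",
        transform=ax.transAxes, ha="left", va="top",
    )
    return ax, label, r


# Illustrative data only: a noisy increasing relationship.
rng = np.random.default_rng(seed=1)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.5, size=100)

ax, label, r = scatter_with_corr(x, y)
# plt.show()  # uncomment to display the figure in an interactive session
```

Passing an existing ax lets the same helper annotate one panel of a multi-panel figure, which is why the function accepts it as an optional argument.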
Conclusion
In this article, we demonstrated how to add a correlation coefficient label to a scatter plot in Python using the matplotlib library and the scipy.stats module. This can be a useful tool for visualizing the strength of the relationship between two variables and quickly identifying patterns in your data.