Statistics
This section showcases the utility functions found in the datachart.utils.stats module.
Let us start by importing the supporting library:
import random
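Statistics Submodule
The datachart.utils.stats submodule contains functions for calculating statistics. To showcase its use, let us create a list of ten distinct random integers between 1 and 99. Because the values are random, your list and every result below will differ from the output shown here.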
random_values = random.sample(range(1, 100), 10)
random_values
[69, 35, 56, 99, 85, 50, 68, 37, 65, 29]
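If you want a run to be repeatable, you can seed Python's random generator before sampling. The sketch below uses an arbitrary seed (42, an assumption chosen only for illustration). Because of that, the list it produces will not match the one shown above. It only demonstrates that reseeding reproduces the same sample.

```python
import random

# Seeding the generator makes random.sample return the same list on every run.
# The seed value 42 is arbitrary, so the list will not match the one shown above.
random.seed(42)
first = random.sample(range(1, 100), 10)

random.seed(42)
second = random.sample(range(1, 100), 10)

print(first == second)  # True
# random.sample draws without replacement, so the ten values are distinct.
print(len(set(first)))  # 10
```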
Let us now showcase the functions in the stats module.
Count
The count function returns the number of elements in the list.
from datachart.utils.stats import count
count(random_values)
10
Sum
The sum_values function returns the sum of all values in the list.
from datachart.utils.stats import sum_values
sum_values(random_values)
593.0
Mean
The mean function returns the arithmetic mean of the values, that is, their sum divided by their count (593 / 10 = 59.3).
from datachart.utils.stats import mean
mean(random_values)
59.3
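Because the mean is the sum divided by the count, the count, sum and mean results should agree with one another. Here is a minimal plain-Python check that uses only the standard library, not datachart. It assumes the ten values shown above, hard-coded so the example is reproducible.

```python
import statistics

# The sample shown above, hard-coded so the example is reproducible.
values = [69, 35, 56, 99, 85, 50, 68, 37, 65, 29]

n = len(values)             # corresponds to count()
total = float(sum(values))  # corresponds to sum_values(), returned as a float
mean = total / n            # arithmetic mean: sum divided by count

print(n, total, mean)            # 10 593.0 59.3
print(statistics.fmean(values))  # 59.3
```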
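Median
The median function returns the median of the values: the middle value of the sorted list. With an even number of values, it is the average of the two middle values; here (56 + 65) / 2 = 60.5.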
from datachart.utils.stats import median
median(random_values)
60.5
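To compute the median, you sort the values and take the middle element. When the count is even, as it is here, you average the two middle elements instead. A minimal sketch of that rule follows. The helper name median_sketch is hypothetical and is not part of datachart.

```python
def median_sketch(values):
    """Hypothetical helper: median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle element.
        return float(ordered[mid])
    # Even count: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2

values = [69, 35, 56, 99, 85, 50, 68, 37, 65, 29]
print(sorted(values))         # [29, 35, 37, 50, 56, 65, 68, 69, 85, 99]
print(median_sketch(values))  # 60.5
```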
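Standard Deviation
The stdev function returns the standard deviation of the values. The result below is the population standard deviation, which divides the sum of squared deviations by n rather than n - 1.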
from datachart.utils.stats import stdev
stdev(random_values)
21.312203077110542
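Standard deviation has two common definitions. The population form divides the sum of squared deviations by n, and the sample form divides it by n - 1. Comparing both against Python's statistics module shows which one matches the value above. This is a quick check assuming the same ten values. It does not describe how datachart computes the result internally.

```python
import statistics

values = [69, 35, 56, 99, 85, 50, 68, 37, 65, 29]

population = statistics.pstdev(values)  # divides by n
sample = statistics.stdev(values)       # divides by n - 1

print(round(population, 4))  # 21.3122, matching stdev() above
print(round(sample, 4))      # larger, since it divides by n - 1
```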
Variance
The variance function returns the variance of the values: the average squared deviation from the mean. Variance is the square of the standard deviation (21.3122... squared gives 454.21).
from datachart.utils.stats import variance
variance(random_values)
454.21000000000004
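Written out, the population variance is sum((x - mean) ** 2) / n. For the values above, the squared deviations sum to 4542.1, and 4542.1 / 10 = 454.21. The sketch below implements that formula in plain Python. It assumes the divide-by-n convention that matches the output above.

```python
import math

values = [69, 35, 56, 99, 85, 50, 68, 37, 65, 29]

n = len(values)
mean = sum(values) / n

# Population variance: the average squared deviation from the mean.
variance = sum((x - mean) ** 2 for x in values) / n
# The standard deviation is its square root.
std = math.sqrt(variance)

print(round(variance, 2))  # 454.21
print(round(std, 4))       # 21.3122
```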
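Quantile
The quantile function returns the value below which a given percentage of the data falls. The second argument is given as a percentage, so 25 returns the first quartile (Q1) and 75 returns the third quartile (Q3).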
from datachart.utils.stats import quantile
Show the 25th percentile (the first quartile, Q1):
quantile(random_values, 25)
40.25
Show the 75th percentile (the third quartile, Q3):
quantile(random_values, 75)
68.75
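The two outputs above are consistent with linear interpolation between closest ranks, which is also NumPy's default percentile method. The procedure has three steps:

1. Sort the values.
2. Compute the fractional position p / 100 * (n - 1).
3. Interpolate between the two neighbouring elements.

For p = 25 the position is 2.25, which falls between 37 and 50, giving 37 + 0.25 * 13 = 40.25. This rule is inferred from the outputs and is not documented datachart behaviour. The helper quantile_sketch is hypothetical.

```python
import math
import statistics

def quantile_sketch(values, p):
    """Hypothetical helper: p-th percentile (0-100) by linear interpolation."""
    ordered = sorted(values)
    pos = p / 100 * (len(ordered) - 1)  # fractional index into the sorted list
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

values = [69, 35, 56, 99, 85, 50, 68, 37, 65, 29]
print(quantile_sketch(values, 25))  # 40.25
print(quantile_sketch(values, 75))  # 68.75

# Cross-check: the standard library's "inclusive" method uses the same rule.
print(statistics.quantiles(values, n=4, method="inclusive"))  # [40.25, 60.5, 68.75]
```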
Interquartile Range (IQR)
The iqr function returns the interquartile range: the difference between the 75th percentile (Q3) and the 25th percentile (Q1), here 68.75 - 40.25 = 28.5. It describes the spread of the middle 50% of the data and is commonly used to identify outliers.
from datachart.utils.stats import iqr
iqr(random_values)
28.5
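A common way to use the IQR for outlier detection is Tukey's fences. Any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is flagged as an outlier. With Q1 = 40.25, Q3 = 68.75 and IQR = 28.5, the fences are -2.5 and 111.5, so none of the ten values is an outlier. The sketch below applies this rule using the quartiles shown above. The 1.5 multiplier is the conventional choice and is not something datachart is known to use. The helper tukey_fences is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Hypothetical helper: lower and upper outlier fences from the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

values = [69, 35, 56, 99, 85, 50, 68, 37, 65, 29]
q1, q3 = 40.25, 68.75  # the 25th and 75th percentiles shown above

low, high = tukey_fences(q1, q3)
outliers = [x for x in values if x < low or x > high]

print(q3 - q1)    # 28.5
print(low, high)  # -2.5 111.5
print(outliers)   # []
```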
Minimum
The minimum function returns the minimum of the values.
from datachart.utils.stats import minimum
minimum(random_values)
29.0
Maximum
The maximum function returns the maximum of the values.
from datachart.utils.stats import maximum
maximum(random_values)
99.0
Correlation
The correlation function calculates the Pearson correlation coefficient between two lists of values. It measures the strength of the linear relationship between them and ranges from -1 (perfect negative correlation) through 0 (no linear correlation) to 1 (perfect positive correlation).
from datachart.utils.stats import correlation
Create a second list of random values to compare:
random_values_2 = random.sample(range(1, 100), 10)
random_values_2
[39, 32, 93, 19, 35, 77, 72, 69, 34, 55]
correlation(random_values, random_values_2)
-0.440056293156963
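Pearson's r is the covariance of the two lists divided by the product of their standard deviations. The 1/n factors cancel, so r can be computed directly from sums of deviation products. Below is a plain-Python sketch assuming the two sample lists shown above. The helper pearson_sketch is hypothetical and is not datachart's implementation.

```python
import math

def pearson_sketch(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations (n times the covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (n times each population variance).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [69, 35, 56, 99, 85, 50, 68, 37, 65, 29]
ys = [39, 32, 93, 19, 35, 77, 72, 69, 34, 55]

print(round(pearson_sketch(xs, ys), 4))  # -0.4401
print(pearson_sketch(xs, xs))            # 1.0 (a list is perfectly correlated with itself)
```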
Under development
This module is still under development. If you are interested in improving it, please let us know.