Data Visualization Guide for Multi-dimensional Data (2024)

Importing all the Necessary Packages

We start by importing the relevant libraries: NumPy (mathematical calculations), pandas (DataFrame manipulation), matplotlib (the OG Python visualization library), and last but not least, seaborn (built on top of matplotlib, it offers more plot types and a better look and feel).
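A minimal sketch of this import step follows. The aliases np, pd and plt are the usual conventions; sb for seaborn matches the alias this article uses later (sns is also common).

```python
# Numerical calculations (arrays, binning, NaN values)
import numpy as np
# DataFrame manipulation and quick plotting helpers such as pd.plotting.scatter_matrix
import pandas as pd
# Core plotting library
import matplotlib.pyplot as plt
# Statistical plots built on top of matplotlib, aliased as sb to match the rest of the article
import seaborn as sb
```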

To explore higher-dimensional data and the relationships between data attributes, we’ll load the file Diabetes.csv, which comes from Kaggle.

Inference: We will be using two datasets in this article: a dataset of diabetes patients and a dataset of people’s heights and weights. The head() method gives us a first glimpse of the diabetes dataset.


Inference: Dealing with missing or junk values is always a priority step. Here we first replace 0 with NaN, because 0 is used as a placeholder for bad or missing data in several of our feature columns.
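The original code cell is not reproduced here, so the following is only a minimal sketch of that replacement step. It assumes the usual Kaggle diabetes column names and assumes that 0 is a placeholder only in measurement columns such as Glucose, SkinThickness and BMI (Pregnancies and Outcome can legitimately be 0). The small DataFrame is made-up data standing in for Diabetes.csv.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for pd.read_csv("Diabetes.csv"), using a few of its columns
df = pd.DataFrame({
    "Pregnancies":   [6, 1, 0, 8],
    "Glucose":       [148, 85, 0, 183],
    "SkinThickness": [35, 29, 0, 0],
    "BMI":           [33.6, 26.6, 0.0, 23.3],
    "Outcome":       [1, 0, 0, 1],
})

# Assumption: in these columns a 0 means "not measured", not a real reading.
# Pregnancies and Outcome are left untouched because 0 is a valid value there.
zero_means_missing = ["Glucose", "SkinThickness", "BMI"]
df[zero_means_missing] = df[zero_means_missing].replace(0, np.nan)

print(df.isna().sum())  # NaN count per column after the replacement
```

Choosing the columns explicitly, rather than calling df.replace(0, np.nan) on the whole frame, keeps genuine zeros (such as zero pregnancies) intact.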

df.info()

Output:

[Figure 1: output of df.info()]

Viewing the Data

Unsurprisingly, it’s useful to view the data. Straight away, using head(), we can see that this dataset uses 0 to represent a missing value, unless some poor, unfortunate soul really has a skin thickness of 0.
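A quick way to check this for every column at once is to count the exact zeros per column. The sketch below uses a hypothetical helper, zero_counts, on a few made-up rows; columns with many zeros where 0 is physically implausible (skin thickness, insulin) are the likely placeholders.

```python
import pandas as pd

def zero_counts(frame: pd.DataFrame) -> pd.Series:
    """Count exact zeros in each column, sorted from most to fewest."""
    return (frame == 0).sum().sort_values(ascending=False)

# Made-up rows; in the article this would be the raw Diabetes.csv DataFrame
raw = pd.DataFrame({
    "SkinThickness": [35, 0, 0, 23],
    "Insulin":       [0, 0, 0, 94],
    "Age":           [50, 31, 32, 21],
})
print(zero_counts(raw))
```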

If we want to do more than inspect the data, we can use the describe() function we talked about in the previous section.

df.describe()

Output:

[Figure 2: output of df.describe()]

Scatter Matrix

A scatter matrix is one of the best plots for determining the (mostly linear) relationships between multiple variables in a dataset. When two variables form a roughly straight-line pattern, they are highly correlated, either positively or negatively.

pd.plotting.scatter_matrix(df, figsize=(10, 10));

Output:

[Figure 3: scatter matrix of the diabetes dataset]

Inference: This plot alone is quite descriptive, as it shows the relationship between every pair of variables in the dataset. For example, we can see that SkinThickness and BMI share a linear tendency.

Note: Because the column names are long, the axis labels are a bit hard to read. This can be improved, but that is out of the scope of this article.

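# Drop rows with missing values, then color each point by its Outcome label
df2 = df.dropna()
colors = df2["Outcome"].map(lambda x: "#44d9ff" if x else "#f95b4a")  # cyan for Outcome=1, red for 0
pd.plotting.scatter_matrix(df2, figsize=(10, 10), color=colors);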

Output:

[Figure 4: scatter matrix colored by Outcome]

Inference: The scatter matrix gives us histograms of the distributions along the diagonal and a lot of 2D scatter plots off the diagonal. Note that this is a symmetric matrix, so I normally look only at the diagonal and below it. We can see that some variables are widely scattered, while others are correlated (i.e., there is a direction in their scatter). This leads us to another type of plot, the correlation plot.
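Since the upper triangle only mirrors the lower one, it can be hidden. pd.plotting.scatter_matrix returns an n × n NumPy array of matplotlib Axes, so the sketch below simply turns off the panels above the diagonal. The three-column DataFrame is made-up data, not the diabetes dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up data: "b" depends linearly on "a", while "c" is independent noise
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=200),
    "c": rng.normal(size=200),
})

axes = pd.plotting.scatter_matrix(demo, figsize=(6, 6), diagonal="hist")

# The matrix is symmetric, so hide every panel above the diagonal
n = len(demo.columns)
for i in range(n):
    for j in range(i + 1, n):
        axes[i, j].set_visible(False)
```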

Correlation Plots

Before discussing correlation plots in depth, we first need to understand correlation itself. For that, we use the pandas corr() method, which by default returns the matrix of pairwise Pearson correlation coefficients between the columns. In a nutshell, correlation plots quantify which variables or attributes are correlated with each other.
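For reference, Pearson’s r for two columns x and y is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below implements that formula with NumPy in a hypothetical helper, pearson_r, and checks it against pandas on a tiny made-up DataFrame.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y) -> float:
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()  # deviations from the means
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

demo = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})
print(pearson_r(demo["x"], demo["y"]))
print(demo.corr().loc["x", "y"])  # pandas' default method="pearson" agrees
```

The result always lies between −1 (perfect negative linear relationship) and +1 (perfect positive one), which is what the heatmap colors below encode.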

df.corr()

Output:

[Figure 5: output of df.corr()]
sb.set(rc={'figure.figsize': (11, 6)})
sb.heatmap(df.corr());

Output:

[Figure 6: correlation heatmap]

Inference: In a seaborn- or matplotlib-based correlation plot, we can compare higher and lower correlations between the variables using the color palette and its scale. In the above graph, the lighter the color, the higher the correlation coefficient, and vice versa. This plot has some drawbacks, such as not showing the actual values, which we get rid of in the very next graph.

sb.heatmap(df.corr(), annot=True, cmap="viridis", fmt="0.2f");

Output:

[Figure 7: annotated correlation heatmap with the viridis colormap]

Inference: This is also a symmetric matrix, but it immediately lets us point out the most correlated and anti-correlated attributes. Some correlations are just common sense (Pregnancies vs. Age, for example), but others might give us real insight into the data.

Here, we have also used annot=True so that the correlation values are printed in each cell, along with cmap and fmt for formatting.
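To list the strongest relationships directly instead of scanning the heatmap, one option is to read each pair once from the upper triangle of the correlation matrix and sort by absolute value, so strong negative correlations rank alongside strong positive ones. ranked_pairs is a hypothetical helper and the rows are made-up data.

```python
import pandas as pd

def ranked_pairs(frame: pd.DataFrame) -> pd.Series:
    """Pearson r for every distinct column pair, strongest (by absolute value) first."""
    corr = frame.corr()
    cols = list(corr.columns)
    # The matrix is symmetric, so read only the entries above the diagonal
    pairs = pd.Series({
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    })
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

demo = pd.DataFrame({
    "Pregnancies": [0, 1, 3, 5, 8],
    "Age":         [21, 25, 33, 41, 50],
    "Glucose":     [120, 90, 150, 100, 110],
})
print(ranked_pairs(demo))
```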

2D Histograms

2-D histograms are commonly used in image processing. More generally, they are useful for any problem where we need to analyze two variables jointly: the plane is divided into a grid of cells, and each cell’s color shows how many observations fall into it.

For the rest of this section, we’re going to use a different dataset with more data.

Note: 2-D histograms are very useful when you have a lot of data. See the matplotlib hist2d documentation for the full API.
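To see what a 2-D histogram actually computes, the sketch below bins six made-up (height, weight) points with NumPy’s np.histogram2d, the same kind of binning that plt.hist2d performs before coloring the cells.

```python
import numpy as np

# Six made-up (height, weight) observations
height = np.array([150, 155, 160, 165, 170, 175])
weight = np.array([50, 52, 60, 62, 70, 90])

# Two bins per axis; counts[i, j] = number of points in height bin i and weight bin j.
# Bins are half-open [left, right) except the last one, which includes its right edge.
counts, h_edges, w_edges = np.histogram2d(height, weight, bins=2)
print(h_edges)
print(w_edges)
print(counts)
```

Note that the first array axis follows the first argument (height), not the vertical axis of a plot; this matters when the counts are passed to other plotting functions.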

df2 = pd.read_csv("height_weight.csv")
df2.info()
df2.describe()

Output:

[Figure 8: output of df2.info() and df2.describe()]
plt.hist2d(df2["height"], df2["weight"], bins=20, cmap="magma")
plt.xlabel("Height")
plt.ylabel("Weight");

Output:

[Figure 9: 2-D histogram of height vs. weight]

Inference: We have also worked with one-dimensional histograms, but those are for univariate analysis. If we want the joint distribution of more than one feature, we have to shift our focus to 2-D histograms. In the above 2-D histogram, height and weight are plotted against each other using the magma colormap (cmap).

Contour plots

It’s a bit hard to get information from the 2-D histogram, isn’t it? There is too much noise in the image. What if we try a contour diagram instead? For that, we’ll have to bin the data ourselves.

See the matplotlib contour documentation for a deeper dive into this plot and its full API.

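# Bin the data ourselves: hist[i, j] counts people in height bin i and weight bin j
hist, x_edge, y_edge = np.histogram2d(df2["height"], df2["weight"], bins=20)
x_center = 0.5 * (x_edge[1:] + x_edge[:-1])  # 21 edges -> 20 bin centers
y_center = 0.5 * (y_edge[1:] + y_edge[:-1])
# contour() expects rows to follow y and columns to follow x, so transpose the counts
plt.contour(x_center, y_center, hist.T, levels=4)
plt.xlabel("Height")
plt.ylabel("Weight");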

Output:

[Figure 10: contour plot of height vs. weight]

Inference: This contour plot is much easier to read than the complex and noisy 2-D histogram, as it clearly shows the joint distribution of height and weight. There is still room for improvement: with seaborn’s KDE plot, the same contours would be smoothed and even more informative.

Conclusion

From the very beginning of this article, we have focused on visualizing multi-dimensional data, and along the way we covered the important graphs and plots that can derive business insights from numeric data across multiple features at once. Here they are in a nutshell:

  1. First, we were introduced to the scatter matrix, which shows the relationship of every variable with every other one. Then we used a seaborn heatmap of the correlation matrix for a clearer view of multivariate relationships.
  2. Then came 2-D histograms, which allow bivariate analysis: two variables are viewed simultaneously, and we can draw insights from their joint distribution.
  3. Finally, we learned about the contour plot, an improvement over the 2-D histogram that removes much of the noise and is easier to interpret.

Here’s the repo link to this article. I hope you liked my article on the Data visualization guide for multi-dimensional data. If you have any opinions or questions, then comment below.

Connect with me on LinkedIn for further discussion.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


FAQs

How do you visualize multidimensional data?

There are two categories of multidimensional visualizations. The first looks at category proportions or category counts; the second examines the relationships between the variables. Examples of visualizations that show category proportions or counts: pie chart, Wordles (word clouds), bar chart, histogram, rank plot, tree map.

What are the visualization tools for multivariate data?

Multivariate data consists of several variables or attributes, each with its own set of values. Data scientists use a variety of visual tools, such as scatter plots, heatmaps, and parallel coordinate plots, to uncover the patterns hidden in such data.

Which AI visual is used to see data in multiple dimensions?

In Power BI, the decomposition tree visual lets you visualize data across multiple dimensions. It automatically aggregates data and enables drilling down into your dimensions in any order.

How to visualize data with 3 variables?

A level plot colors a grid spanned by two variables by the color of a third variable. The term heat map is also used, in particular with a specific color scheme. But heat map often means a more complex visualization with an image plot at its core. Superimposing contours on a level plot is often helpful.
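A minimal matplotlib sketch of a level plot with superimposed contours: pcolormesh colors the grid spanned by x and y according to z, and contour draws iso-lines of the same z on top. The Gaussian bump is a made-up function chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# A made-up function of two variables evaluated on a regular grid
x = np.linspace(-3, 3, 61)
y = np.linspace(-2, 2, 41)
X, Y = np.meshgrid(x, y)          # both have shape (len(y), len(x))
Z = np.exp(-(X**2 + Y**2) / 2)    # third variable, shown as color

fig, ax = plt.subplots()
mesh = ax.pcolormesh(X, Y, Z, cmap="viridis", shading="auto")  # the level plot
ax.contour(X, Y, Z, levels=5, colors="white")                 # superimposed contours
fig.colorbar(mesh, ax=ax, label="z")
ax.set_xlabel("x")
ax.set_ylabel("y")
```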

How do you visualize a large data set?

Plots. Plots are useful for identifying and comparing the relationships between two or more large datasets. Scatter plots use dots to show the values of two numeric variables, one on each axis. The way the points group together can reveal patterns and trends within a dataset.

How to display multivariate data?

Bivariate data is easy to visualize using 2D scatter plots, bivariate histograms, boxplots, etc. It's also possible to visualize trivariate data with 3D scatter plots, or with 2D scatter plots in which a third variable is encoded with, for example, color.
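A minimal sketch of the last option, a 2D scatter plot with a third variable encoded as color, using matplotlib’s scatter with the c argument. The data are made up, with z defined as x + y so the color gradient is easy to see.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Made-up trivariate data
x = rng.uniform(0, 10, 100)
y = rng.uniform(0, 10, 100)
z = x + y  # the third variable

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=z, cmap="plasma")  # color encodes z
fig.colorbar(points, ax=ax, label="z")
ax.set_xlabel("x")
ax.set_ylabel("y")
```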

What is the easiest data visualization tool to use?

Some of the best data visualization tools include Google Charts, Tableau, Grafana, Chartist, FusionCharts, Datawrapper, Infogram, and ChartBlocks. These tools support a variety of visual styles, are simple and easy to use, and can handle a large volume of data.

Which is the best technique if data has many dimensions?

Parallel coordinates. Parallel coordinates are a common way of visualizing high-dimensional data. Each feature is represented as a vertical axis, and each data point is represented as a line that intersects each axis at the corresponding feature value.
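pandas ships a helper for exactly this plot, pd.plotting.parallel_coordinates, which draws one vertical axis per numeric column and one line per row, colored by a class column. The sketch below uses a few made-up rows.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows: three numeric features plus a class label
samples = pd.DataFrame({
    "feature_a": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "feature_b": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "feature_c": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "label":     ["A", "A", "B", "B", "C", "C"],
})

# One vertical axis per feature; each row becomes a line colored by its label
ax = pd.plotting.parallel_coordinates(samples, class_column="label")
ax.set_ylabel("value")
```

Because all features share one y-axis, columns on very different scales are usually normalized first so that no single feature dominates the plot.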

How to represent 4 dimensional data?

For many kinds of four-dimensional data, you can use color to represent the fourth dimension. This works well if you have a function of three variables. For example, you could represent highway deaths in the United States as a function of longitude, latitude, and whether the location is rural or urban.
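A minimal matplotlib sketch of this idea: three variables become the x, y and z coordinates of a 3D scatter plot, and the fourth is mapped to color. The data are made up; w is simply each point’s distance from the origin.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
# Made-up 4-D data: three coordinates plus a fourth value w
x, y, z = rng.normal(size=(3, 200))
w = np.sqrt(x**2 + y**2 + z**2)  # fourth dimension: distance from the origin

fig = plt.figure()
ax = fig.add_subplot(projection="3d")               # 3-D axes for x, y, z
points = ax.scatter(x, y, z, c=w, cmap="coolwarm")  # color encodes w
fig.colorbar(points, ax=ax, label="w")
```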

What are the techniques to visualize high-dimensional data?

Visualizing high-dimensional data is a crucial skill in data science and analytics. Techniques like PCA, t-SNE, UMAP, parallel coordinates, and heatmaps provide powerful tools to uncover patterns, relationships, and insights in complex datasets.
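Of these, PCA is the simplest to sketch. The hypothetical helper below is a from-scratch PCA using NumPy’s SVD (in practice a library such as scikit-learn is usually used). It projects made-up 10-dimensional data onto its first two principal components, which can then be drawn as an ordinary 2D scatter plot.

```python
import numpy as np

def pca_2d(data: np.ndarray) -> np.ndarray:
    """Project the rows of `data` onto their first two principal components via SVD."""
    centered = data - data.mean(axis=0)  # PCA works on mean-centered data
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(3)
high_dim = rng.normal(size=(100, 10))  # made-up 10-dimensional data
coords = pca_2d(high_dim)              # shape (100, 2), ready for a 2-D scatter plot
```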

How do you display a multidimensional array?

The answer depends on the language. In C++, for example, you declare a multi-dimensional array by giving the element type and the array name, followed by square brackets with the number of elements in the main array, followed by another set of square brackets with the number of elements in each sub-array: string letters[2][4];

How is multidimensional data represented?

A multidimensional data model is organized around a central theme, for example, sales. This theme is represented by a fact table. Facts are numerical measures. The fact table contains the names of the facts, or measures, as well as keys to each of the related dimension tables.

How do you visualize dense data?

A transparency value between 90 percent and 99 percent works best for visualizing most high-density data. You can highlight a specific layer in a map with multiple layers by making it 100 percent opaque (no transparency) and adding transparency to the other layers.

Article information

Author: Prof. Nancy Dach
