February 21, 2021

Plots/Graphs in Data Science/Data Analysis

Which plots/graphs to use in Data Science/Data Analysis?


Line Charts/Plot - Best when a time scale or trend runs along the horizontal axis. Use a line plot to chart a continuous data set; several lines can be drawn to show different categories of data. It can also be used with one categorical variable and one numeric variable.

Bar Charts/Graphs - Suited to one categorical variable and one or more quantitative values. Bar charts are best for visualizing discrete or range values. The X axis shows the categorical variable and the Y axis shows the count or proportion associated with each category.

- Stacked bar chart => used when there is more than one categorical variable, to compare many different items and show the composition of each item being compared.
- Grouped bar chart => for two quantitative variables and one categorical variable, to present different sub-groups within the main categories.
- Column chart => a bar chart laid out vertically.

Dual Axes Charts - Suited to two quantitative variables and one categorical variable, where the ranges of the quantitative variables differ a lot. A dual axis chart plots data against two y-axes and a shared x-axis. The series can be drawn as a bar chart or a scatter plot.

- Thermometer Graph => both quantitative variables are represented as bar charts, with different widths.

Histograms - Used to summarise the quantitative values of a single attribute. Histograms show the distribution of a variable, whereas bar charts are used for comparison, so histograms are a good fit for continuous values. A frequency histogram groups the data into ranges (bins), which are shown on the x-axis, and plots the count in each bin.
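
The binning step behind a frequency histogram can be sketched in plain Python. This is only an illustration: the function name `bin_counts`, the sample ages, and the choice of half-open bins of width 10 are assumptions, not something the article prescribes.

```python
def bin_counts(values, bin_width, start):
    """Count how many values fall into each half-open bin [lo, lo + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)  # which bin the value belongs to
        lo = start + index * bin_width         # lower edge of that bin
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical sample data, for illustration only.
ages = [21, 22, 25, 31, 34, 38, 39, 42, 47, 58]
histogram = bin_counts(ages, bin_width=10, start=20)

# Crude text rendering: one '#' per observation in the bin.
for lo, count in histogram.items():
    print(f"{lo}-{lo + 10}: {'#' * count}")
```

Bins with no observations are simply omitted in this sketch; a plotting library would normally draw them as empty bars.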


Scatter Plots - Suited to plotting two quantitative/numeric variables. Scatter plots are ideal for visualizing the correlation between two numeric columns/dimensions.
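
The correlation that a scatter plot makes visible can also be put into a number. The sketch below computes the Pearson correlation coefficient by hand; the function name `pearson_r` and the hours/scores sample are hypothetical, and Pearson is just one common choice of correlation measure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical sample: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # close to +1: points lie near a rising straight line
```

A value near +1 or -1 corresponds to points that fall close to a straight line on the scatter plot, while a value near 0 corresponds to no linear pattern.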


Bubble Chart - A bubble chart is similar to a scatter plot in that it can show a distribution or relationship. It adds a third variable, which is indicated by the size of the bubble or circle.

Strip Plot - A scatterplot where one variable is categorical. Can be used in conjunction with other plots to show each observation.

(Bee) swarm plot - Categorical scatterplot with non-overlapping points.


Jitter Plot - A type of point plot/scatter plot that adds a small random offset to each point to avoid overplotting, especially for categorical variables.
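
As a rough sketch of the idea, jitter can be produced by adding a small uniform random offset to each point's categorical position. The function name `jitter`, the offset width of 0.2, and the fixed seed are all assumptions made for this example.

```python
import random


def jitter(positions, width=0.2, seed=0):
    """Return positions shifted by a random offset in [-width, width]."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    return [p + rng.uniform(-width, width) for p in positions]


# Hypothetical categorical data: each label maps to an integer axis position.
category_index = {"A": 0, "B": 1}
labels = ["A", "A", "A", "B", "B"]
x = jitter([category_index[label] for label in labels])

for label, xi in zip(labels, x):
    print(label, round(xi, 3))
```

The jittered `x` values would then be used as the horizontal coordinates, with the numeric variable on the vertical axis, so points in the same category no longer sit exactly on top of each other.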

Pie Charts - Suited to summarising categorical variables. A pie chart best summarises the share of different components in an aggregate whole. It is better to keep the number of slices below 8.
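
One way to stay under that limit is to compute each component's percentage share and merge the smallest components into a single slice. The sketch below assumes this merging strategy; the function name `pie_shares`, the "Other" label, and the traffic counts are hypothetical.

```python
def pie_shares(counts, max_slices=7):
    """Percentage share per category, merging the smallest into 'Other'."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= max_slices:
        kept, rest = ranked, []
    else:
        # Reserve one slice for the merged remainder.
        kept, rest = ranked[:max_slices - 1], ranked[max_slices - 1:]
    shares = {name: 100 * value / total for name, value in kept}
    if rest:
        shares["Other"] = 100 * sum(value for _, value in rest) / total
    return shares


# Hypothetical visit counts per traffic source (nine categories).
visits = {"search": 400, "direct": 250, "social": 120, "email": 90,
          "ads": 60, "referral": 40, "partner": 20, "print": 12, "radio": 8}
shares = pie_shares(visits)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The shares always add up to 100%, which is the property that makes a pie chart a fair picture of parts of a whole.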


Tree Map - Suited to one or more categorical variables and two quantitative values. Of the two measures, one controls the size of each rectangle and the other controls its colour, with colour intensity directly proportional to the value of that measure.


Box Plots - Used to study the distributional characteristics of a variable and to show overall patterns. A box plot is suitable for representing quartile, percentile and outlier values, and is extremely useful for understanding the distribution or spread of data.

Outliers are data points with extremely high or low values compared to the rest of the observations. The whiskers denote the lowest and highest values of the variable, typically excluding any points drawn as outliers. Box plots are great when you have a numeric column that you want to compare across different categories, and they are a good way to visualise quartiles. In Tableau, a flattened box plot is shown when the box plot is based on a single mark: box plots are intended to show a distribution of data, which is difficult when the data in the view is aggregated.
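
The numbers a box plot draws can be computed directly. The sketch below assumes the common 1.5 x IQR rule for deciding which points count as outliers, which the article does not specify; the function name `box_plot_summary` and the sample data are also hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers using the 1.5 * IQR convention."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme values that are still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical sample with one unusually high value.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 48]
summary = box_plot_summary(data)
for key, value in summary.items():
    print(key, value)
```

Different tools compute quartiles slightly differently, so the exact box edges may not match a given plotting library even when the outlier rule is the same.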

Violin Plot - A combination of boxplot and kernel density estimation.


Factor Plot - Combines a categorical plot with a grid of facets (seaborn's factorplot, renamed catplot in version 0.9).


Heat Map - A heat map shows the relationship between two items and provides rating information, such as high to low or poor to excellent. The rating information is displayed using varying colors or saturation.


Area Maps - Used to visualise data that has a geographical location.


Calendar Map - 

Arc Plot - 


Matrix Plot -



Circos Plot -


Correlation Matrix - To show correlation between variables. The top-right half shows the correlation coefficients and the bottom-left half shows the scatter plot between each pair of variables.

