Scatterplot



A scatterplot, scatter diagram or scatter graph is a chart that uses Cartesian coordinates to display values for two variables. The data is displayed as a collection of points, each having one coordinate on the horizontal axis and one on the vertical axis.

A scatterplot does not specify dependent or independent variables. Either type of variable can be plotted on either axis. Scatterplots represent the association (not causation) between two variables.

A scatterplot can show various kinds of relationships: positive (rising), negative (falling), or none. If the pattern of dots slopes from lower left to upper right, it suggests a positive correlation between the variables being studied; if it slopes from upper left to lower right, it suggests a negative correlation. A line of best fit can be drawn to study the correlation between the variables, and its equation can be computed by linear regression.
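The slope and intercept of the line of best fit, along with a correlation coefficient that summarizes how closely the dots follow a straight line, can be computed directly from the data. The following is a minimal sketch in plain Python. It assumes ordinary least squares (the most common form of linear regression) and Pearson's correlation coefficient. The function names least_squares_line and pearson_r are illustrative and do not come from any particular library.

```python
from math import sqrt


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centred cross-products and squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 rising line, -1 falling line, 0 no linear trend."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Points lying exactly on y = 2x + 1 (a positive, rising pattern)
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
print(pearson_r(xs, ys))  # 1.0
```

Reversing the y values gives a falling pattern and a correlation of -1.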

One of the most powerful aspects of a scatterplot, however, is its ability to show nonlinear relationships between variables. A simple polynomial relationship between variables can be obvious in a scatterplot even when the linear (Pearson) correlation coefficient between them is zero. Furthermore, if the data are generated by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns. The diagram at right is a good illustration of this phenomenon.
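A small numeric check makes the point about zero correlation concrete. The sketch below assumes Pearson's correlation coefficient as the measure. It uses hypothetical points lying exactly on the parabola y = x², with x symmetric about zero. The correlation between x and y comes out as zero even though y is completely determined by x. Correlating y with x² instead recovers a perfect linear relationship. The helper pearson_r is hypothetical and is repeated here so that the example runs on its own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data on the parabola y = x^2, symmetric about x = 0
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

# The falling left half and rising right half cancel, so there is no linear trend
print(pearson_r(xs, ys))  # 0.0

# Plotting y against x^2 straightens the curve into a line
print(pearson_r([x * x for x in xs], ys))  # 1.0
```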

For example, to display the relationship between lung capacity and how long a person can hold their breath, a researcher would choose a group of people to study, then measure each person's lung capacity (first variable) and how long that person could hold their breath (second variable). The researcher would then plot the data in a scatterplot, assigning "lung capacity" to the horizontal axis and "time holding breath" to the vertical axis. A person with a lung capacity of 400 cc who held their breath for 21.7 seconds would be represented by a single dot at the point (400, 21.7) in Cartesian coordinates. The scatterplot of all the people in the study would give the researcher a visual comparison of the two variables in the data set and help determine what kind of relationship, if any, exists between them.
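The mapping from measurements to dots can be sketched without any plotting library. The function below, text_scatter, is a hypothetical helper and not a standard API. It scales each (lung capacity, breath-holding time) pair onto a character grid, with x increasing to the right and y increasing upward as on Cartesian axes. The (400, 21.7) point comes from the example above. The other pairs are made-up values for illustration only, not measurements.

```python
def text_scatter(points, width=40, height=12, mark="*"):
    """Render (x, y) points on a character grid: x grows rightward, y grows upward."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when every point shares the same x or y value
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate linearly onto the grid's column/row range
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 is printed first, so flip y to put larger values at the top
        grid[height - 1 - row][col] = mark
    return ["".join(r) for r in grid]


# (lung capacity in cc, seconds of breath held); all but the first pair are hypothetical
survey = [(400, 21.7), (550, 30.2), (700, 41.0), (480, 24.5), (620, 33.8)]
for line in text_scatter(survey, width=30, height=8):
    print("|" + line)
print("+" + "-" * 30)
```

In practice a plotting library, such as matplotlib through pyplot.scatter, would draw the chart. The text grid only shows how each pair of values becomes one dot.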

The scatter diagram is one of the seven basic tools of quality control. The others are the histogram, Pareto chart, check sheet, control chart, cause-and-effect diagram and flowchart.