## Abstract

## Keywords


**Algorithm:** Series of steps to be taken to carry out a particular task or calculation.

**Centroid:** The geometric center of a two-dimensional area.

**Centroid-Based Clustering:** A type of algorithm used to conduct cluster analysis, also known as iterative partitioning. Under centroid-based clustering, an iterative process is used after the researcher initially determines the number of clusters to be constructed and defines an initial set of centroids. K-means clustering is one example of centroid-based clustering.

**Cluster Analysis:** Set of algorithms or methods used to group a set of observations or cases into a set of clusters (categories, groups, trees, structures) where the cases in a given cluster are similar to one another and different from cases in other clusters with respect to some meaningful and predetermined set of characteristics or attributes.

**Connectivity-Based Clustering:** A type of algorithm used to conduct cluster analysis. Under connectivity-based clustering, clusters include data points that are all “connected” with one another based on having a sufficiently high degree of similarity with one another. It is also referred to as hierarchical clustering.

**Eigenvalue:** A mathematical term that in the context of principal components analysis (or factor analysis or reduced rank regression) represents the total amount of variance in a set of input variables that is captured by a given principal component. Eigenvalues are standardized so that their average value is 1 and the sum of eigenvalues for a set of principal components is equal to the number of principal components created.

**Eigenvector:** A mathematical term that in the context of principal components analysis (or factor analysis or reduced rank regression) represents a set of “factor loadings” associated with a given principal component (factor). An eigenvector contains a separate factor loading for each input variable, and the factor loading represents the weight or importance of the input variable for that principal component (factor); input variables with the largest factor loadings (in absolute value) influence the principal component most strongly.

**Factor Analysis:** Set of procedures used to summarize the information contained in a group of input variables with a smaller number of variables, referred to as factors. Factor analysis results in the creation of eigenvalues and eigenvectors that can be used to construct the factors. Factor analysis is closely related to principal components analysis.

**Factor Loading:** Weight given to a particular input variable in constructing a principal component or factor. Input variables with positive factor loadings are positively correlated with the principal component; those with negative values are negatively correlated with the principal component. Factor loadings for each of the input variables are contained in the eigenvector of a particular principal component or factor.

**Hierarchical Clustering:** A type of algorithm used to conduct cluster analysis. Under hierarchical clustering, clusters include data points that are all “connected” with one another based on having a sufficiently high degree of similarity with one another. It is also referred to as connectivity-based clustering.

**Iterative Partitioning:** A type of algorithm used to conduct cluster analysis, also known as centroid-based clustering. Under this algorithm, an iterative process is used after the researcher initially determines the number of clusters to be constructed and defines an initial set of centroids. K-means clustering is one example of iterative partitioning.

**k-Means Clustering:** An algorithm used to conduct cluster analysis. Under k-means clustering, the researcher determines the number of clusters to be constructed (where k is equal to this number), and also chooses a starting point for each of the clusters to be created by selecting an initial set of k centroids. The algorithm then proceeds to group observations being analyzed into clusters based on their distance to the nearest centroid.

**Monte Carlo Analysis:** A mathematical technique involving repeatedly conducting a specific analysis based on artificially constructed random samples, allowing researchers to determine the expected result of the analysis through simulation when it is not possible to determine the expected result theoretically.

**Principal Component:** Variable that is produced by principal components analysis and summarizes the information contained in a larger number of input variables. Each principal component produced by principal components analysis is a weighted average of the input variables included in the analysis.

**Principal Components Analysis:** Set of procedures used to summarize the information contained in a group of input variables with a smaller number of variables, referred to as principal components. Principal components analysis results in the creation of eigenvalues and eigenvectors that can be used to construct the principal components. Principal components analysis is closely related to factor analysis.

**Reduced Rank Regression:** Set of procedures also known as maximum redundancy analysis that is used to summarize the information contained in a group of input variables with a smaller number of variables, referred to as factors or principal components. Reduced rank regression is closely related to principal components analysis and factor analysis, except that it derives factors by accounting for as much variation as possible in researcher-determined response variable(s).

**Scree Plot:** Graphical representation of the eigenvalues resulting from principal components (or related) analysis. Each eigenvalue is represented sequentially on the x axis of the plot, with the height of the plot showing its value. A scree plot is used by researchers in a scree test to determine the number of principal components to retain from the analysis.

**Scree Test:** Test used in principal components (and related) analysis to determine the number of principal components to retain from the analysis. In a scree test, a researcher examines a scree plot to identify the point at which there is a large drop-off in eigenvalues (that is, where there is a large drop-off in the height of the plot). Principal components with larger eigenvalues before (to the left of) the drop-off are retained; others are dropped.


## Understanding and Performing the Techniques

### Principal Components Analysis and Factor Analysis

#### Performing Principal Components Analysis

Principal component | Eigenvalue | Percentage of variance explained | Cumulative percentage of variance explained |
---|---|---|---|
1 | 2.0 | 25.0 | 25.0 |
2 | 1.9 | 23.8 | 48.8 |
3 | 1.8 | 22.5 | 71.3 |
4 | 1.1 | 13.8 | 85.0 |
5 | 0.7 | 8.8 | 93.8 |
6 | 0.3 | 3.8 | 97.5 |
7 | 0.1 | 1.3 | 98.8 |
8 | 0.1 | 1.3 | 100.0 |
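The percentages in the table above follow directly from the eigenvalues: each eigenvalue is divided by their sum, which here equals the number of input variables (eight). The sketch below reproduces that arithmetic in plain Python and adds a crude, automated stand-in for the scree test. The helper name `components_before_largest_drop` and its "largest single drop" rule are assumptions made for illustration only; in practice the scree test is a visual judgment made from the plot.

```python
# Eigenvalues copied from the table above (eight input variables).
eigenvalues = [2.0, 1.9, 1.8, 1.1, 0.7, 0.3, 0.1, 0.1]


def variance_explained(values):
    """Return (percent, cumulative percent) of variance for each eigenvalue."""
    total = sum(values)  # equals the number of input variables when standardized
    percent = [100.0 * v / total for v in values]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


def components_before_largest_drop(values):
    """Hypothetical helper: count the components to the left of the largest drop-off."""
    drops = [values[i] - values[i + 1] for i in range(len(values) - 1)]
    return drops.index(max(drops)) + 1


percent, cumulative = variance_explained(eigenvalues)
n_retained = components_before_largest_drop(eigenvalues)

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC {i}: {p:.2f}% of variance, {c:.2f}% cumulative")
print("Components before the largest drop-off:", n_retained)
```

For these eigenvalues the largest drop (from 1.8 to 1.1) follows the third component, so this simple rule would retain three components. The unrounded percentages (for example 23.75) appear in the table rounded to one decimal place.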


Input variables | PC 1 (fruits and vegetables) | PC 2 (meats) | PC 3 (sweets) |
---|---|---|---|
Fruits | 0.537 | 0.015 | −0.322 |
Vegetables | 0.611 | 0.189 | −0.077 |
Dairy, high fat | −0.103 | 0.183 | 0.488 |
Dairy, low fat | 0.401 | −0.129 | −0.007 |
Meats | 0.094 | 0.727 | 0.140 |
Grains, whole | 0.233 | 0.144 | −0.048 |
Grains, processed | −0.175 | 0.443 | 0.259 |
Sweets | 0.131 | 0.182 | 0.649 |
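To make the role of the loadings concrete, the sketch below treats the PC 1 column of the table above as a set of weights and combines them with one person's intakes to produce a single component score. The intake values in `z_intakes` are hypothetical standardized values (z-scores) invented for this example, and the function names are illustrative, not taken from any particular statistical package. It assumes, as the glossary describes, that a principal component is a weighted combination of the input variables.

```python
# Loadings for PC 1 (fruits and vegetables), copied from the table above.
pc1_loadings = {
    "Fruits": 0.537,
    "Vegetables": 0.611,
    "Dairy, high fat": -0.103,
    "Dairy, low fat": 0.401,
    "Meats": 0.094,
    "Grains, whole": 0.233,
    "Grains, processed": -0.175,
    "Sweets": 0.131,
}

# Hypothetical standardized intakes (z-scores) for one person; not from the article.
z_intakes = {
    "Fruits": 1.0,
    "Vegetables": 0.5,
    "Dairy, high fat": -1.0,
    "Dairy, low fat": 0.0,
    "Meats": 0.0,
    "Grains, whole": 0.0,
    "Grains, processed": 1.0,
    "Sweets": 0.0,
}


def component_score(loadings, z_values):
    """Weighted sum of standardized inputs, using the loadings as weights."""
    return sum(loadings[name] * z_values[name] for name in loadings)


def strongest_inputs(loadings, n=2):
    """Input variables with the largest loadings in absolute value."""
    return sorted(loadings, key=lambda name: abs(loadings[name]), reverse=True)[:n]


score = component_score(pc1_loadings, z_intakes)
top_two = strongest_inputs(pc1_loadings)

print(f"PC 1 score: {score:.4f}")
print("Inputs that influence PC 1 most strongly:", top_two)
```

Ranking the loadings by absolute value picks out vegetables and fruits, which matches the label given to PC 1 in the table.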


For example, one might do this by examining the role of dietary patterns in mediating the relationships between various individual characteristics and cardiovascular disease. Finally, the principal components might be used as control variables to adjust for potential confounders in a regression model. In the case of the work done by Nettleton and colleagues, this would mean that a construct like an individual’s dietary patterns could be represented in a regression model by just four covariates rather than the original 47 input variables.

#### Performing Factor Analysis



### Reduced Rank Regression

#### Performing Reduced Rank Regression

Fialkowski M.K., McCrory M.A., Roberts S.M., Tracy J.K., Grattan L.M., Boushey C.J. *Public Health Nutr.* 2012; 15: 1948-1958

Food group | PC 1 (vegetarian and grains) | PC 2 (healthy) | PC 3 (sweet drinks) | % of Variance explained |
---|---|---|---|---|
Fish (other than salmon) | −0.21 | 5.4 | ||
Alcohol | −0.66 | 56.8 | ||
Salmon | −0.28 | 9.8 | ||
Sweetened drinks | 0.30 | −0.41 | 0.23 | 44.0 |
Unsweetened drinks | 0.21 | 7.1 | ||
Butter | −0.21 | 8.5 | ||
Fruit juices | −0.21 | 7.7 | ||
Fruit | 0.35 | 23.5 | ||
Legumes, beans, soybeans | 0.24 | 0.34 | 28.7 | |
Tomato (including juice) | 0.24 | 9.1 | ||
Nuts, seeds, peanut butter | 0.29 | 18.1 | ||
Vegetables | 0.29 | 14.1 | ||
Unsweetened cereals | 0.29 | 15.4 | ||
Refined grains | −0.23 | 9.6 | ||
Pasta | 0.29 | 11.6 | ||
Red meat | −0.38 | −0.23 | 24.0 | |
Processed meats | −0.21 | 6.8 | ||
Eggs | −0.33 | 13.8 | ||
% Variance explained | 52.4 | 24.1 | 5.9 | Sum=82.3 |


### Cluster Analysis


#### Performing Cluster Analysis

- specifying a measure of similarity; that is, how the similarity between different sample members will be measured;
- selecting a cluster analysis algorithm or method to be used;
- determining the number of clusters to be formed; and
- validating the cluster solution.
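The first three steps listed above can be sketched with a small k-means example in plain Python. The sketch assumes Euclidean distance as the similarity measure, k-means as the algorithm, and k = 2 clusters with starting centroids supplied by the researcher, as the glossary describes. The data points and starting centroids are hypothetical values invented for illustration, and the validation step is not covered here.

```python
import math


def euclidean(a, b):
    """Similarity measure: straight-line distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, centroids, max_iter=100):
    """Group points around k centroids, starting from researcher-chosen centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the solution has converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical observations on two variables (for example, two intake measures).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
initial_centroids = [(0.0, 0.0), (10.0, 10.0)]  # k = 2, chosen by the researcher

labels, centroids = k_means(points, initial_centroids)
print("Cluster labels:", labels)
print("Final centroids:", centroids)
```

With these values the first three observations form one cluster and the last three form the other. Different starting centroids can lead to a different solution, which is one reason the validation step matters.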


Calcium intake from different food/beverage sources (mean, mg/day) | Sweet-drink permissive parents | Dedicated-milk providers/drinkers | Water regulars | P value (ANOVA) |
---|---|---|---|---|
**Early adolescents** | | | | |
All food sources | 873 | 1,273 | 1,001 | <0.0001 |
Dairy foods | 515 | 919 | 630 | <0.0001 |
Milk | 351 | 711 | 474 | <0.0001 |
**Parents** | | | | |
All food sources | 744 | 1,055 | 446 | <0.0001 |
Dairy foods | 350 | 663 | 361 | <0.0001 |
Milk | 205 | 452 | 248 | <0.0001 |

## Conclusions

## Acknowledgements

## References

- Publishing nutrition research: A review of multivariate techniques—Part 1. *J Am Diet Assoc.* 2011; 111: 103-110
- Publishing nutrition research: A review of multivariate techniques—Part 2: Analysis of variance. *J Acad Nutr Diet.* 2012; 112: 90-98
- Cluster Analysis. Sage Publications, Thousand Oaks, CA, 1984
- Associations between markers of subclinical atherosclerosis and dietary patterns derived by principal components analysis and reduced rank regression in the Multi-Ethnic Study of Atherosclerosis (MESA). *Am J Clin Nutr.* 2007; 85: 1615-1625
- Comparing 3 dietary pattern methods—cluster analysis, factor analysis, and index analysis—with colorectal cancer risk. *Am J Epidemiol.* 2009; 171: 479-487
- Development and reliability testing for measures of psychosocial constructs associated with adolescent girls' calcium intake. *J Am Diet Assoc.* 2008; 108: 857-861
- The association between food patterns and the metabolic syndrome using principal components analysis: The ATTICA Study. *J Am Diet Assoc.* 2007; 107: 979-987
- Short-term stability of dietary patterns defined a priori or a posterior. *Maturitas.* 2011; 68: 272-278
- Advances in basic behavioral research will make the most important contributions to effective dietary change programs at this time. *J Am Diet Assoc.* 2006; 106: 808-811
- The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. *J Pers Soc Psychol.* 1986; 51: 1173-1182
- A dietary behaviors measure for use with low-income, Spanish-speaking Caribbean Latinos with type 2 diabetes: The Latino Dietary Behaviors Questionnaire. *J Am Diet Assoc.* 2011; 111: 589-599
- The Essentials of Factor Analysis. Continuum International Publishing Group, London and New York, 2006
- Suhr DD. Principal Component Analysis vs Exploratory Factor Analysis. Proceedings of the SAS Users Group International 30 (SUGI 30), Philadelphia, PA, 2005. http://www2.sas.com/proceedings/sugi30/203-30.pdf. Accessed February 25, 2014.
- Application of a new statistical method to derive dietary patterns in nutritional epidemiology. *Am J Epidemiol.* 2004; 159: 935-944
- Dietary patterns are associated with dietary recommendations but have limited relationship to BMI in the Communities Advancing the Studies of Tribal Nations Across the Lifespan (CoASTAL) cohort. *Public Health Nutr.* 2012; 15: 1948-1958
- Associations between food patterns defined by cluster analysis and colorectal cancer incidence in the NIH-AARP diet and health study. *Eur J Clin Nutr.* 2009; 63: 707-717
- Parent calcium-rich-food practices/perceptions are associated with calcium intake among parents and their early adolescent children. *Public Health Nutr.* 2012; 15: 331-340

## Biography

## Article Info

### Publication History

### Footnotes

**STATEMENT OF POTENTIAL CONFLICT OF INTEREST** No potential conflict of interest was reported by the authors.

**FUNDING/SUPPORT** There is no funding to disclose.