Racial and ethnic classification in research and medicine is rooted in a centuries-long practice of categorizing humans into distinct groups to justify colonization, slavery, and genocide. The legacy of these harms to communities lingers today, as the U.S. still uses six distinct categories in racial and ethnic classification for official reporting. Increasingly, people identify with more than one (or none) of these categories, which presents a dilemma for biomedical research and clinical trials that incorrectly assume these categories can be used to approximate genetic populations. Just as the human genome’s complexity necessitates an updated reference, the spectrum of human diversity requires a dynamic and inclusive model to account for social, cultural, geographic, and genomic populations. Categorical and free-text data from different studies can be combined to reveal complexity in human populations that racialized frameworks have erased. For example, geographic origin(s), language(s), and genomic variation can be combined with demographic data to reveal trends in relationships among variables across cohorts. When aggregated with additional data streams, these data will offer more inclusive, respectful, and granular descriptions of study populations than categorical frameworks. This approach has the potential to improve the utility of existing free-text data on study populations that have been traditionally excluded from genomic analyses and clinical risk evaluation. Adoption of an open-ended model for self-identification may also influence the way science and society consider population diversity, enhancing precision, inclusiveness, and respect for individual and community identities.
Learning Objectives:
1. Discuss the conceptual limitations of the term "population" as applied to human genomics.
2. Explain the main difference(s) among "race", "ethnicity", and "ancestry", and how these terms are conflated with human genetic background.
3. Describe the overall approach to sampling and representation in the 1000 Genomes Project.