Data sources and acknowledgments

Comprehensive analysis of influenza and SARS-CoV-2 is made possible thanks to worldwide networks of data collection, public health surveillance and sharing, including the World Health Organization’s Global Influenza Surveillance and Response System (GISRS), the GISAID Initiative, GenBank, WHO Corona Dashboard, and Our World in Data.

Particular thanks to Rebecca Kondor and John Steel (Centers for Disease Control and Prevention), Nicola Lewis and Ruth Harvey (Worldwide Influenza Centre London), Kanta Subbarao and Ian Barr (Victorian Infectious Diseases Reference Laboratory at the Australian Peter Doherty Institute for Infection and Immunity), and Hideki Hasegawa and Shinji Watanabe (Influenza Virus Research Center at the Japan National Institute of Infectious Diseases). The analysis of SARS-CoV-2 includes data from a large number of primary publications.

Computational analysis

Computations are run on high-performance computing infrastructure at the University of Cologne (RRZK), the Icahn School of Medicine at Mount Sinai in New York, and IBM Cloud. Computational methods are described here.

Established by M. Lässig, M. Luksza, M. Meijers, D. Ruchnewitz. Updated every two weeks or when significant new data are received. Design and visualisation by Hector Labs.

Citation: Meijers M, Ruchnewitz D, Eberhardt J, Luksza M, Lässig M. Population immunity predicts evolutionary trajectories of SARS-CoV-2. Cell (2023).

Pathogens: select one of three lineages of seasonal human influenza (A/H3N2, A/H1N1, and B/Victoria) or SARS-CoV-2.

Tree tab: shows an evolutionary tree of the selected pathogen. Up to 1000 leaves are displayed simultaneously, representing a sample of isolates collected after the date specified. The x coordinate of each node is the collection date of the isolate. The influenza tree combines sequence data from the HA and NA genome segments; “R” indicates branches with NA changes by reassortment. Present and past influenza vaccine strains are marked by “V”.

  • Change the entry “Show strains from” to include older strains.
  • Click on a branch of the tree to display more strains from the corresponding subtree.
  • Click on a node to get information on the corresponding isolate.
  • Color leaves by clade, region, or fitness (see Glossary). Clades with positive fitness are predicted to grow in frequency; clades with negative fitness are predicted to decline. Learn more about the basis and limitations of predictions here.
  • Activate "Mutations" to display mutations in specific genome regions (influenza: NA and HA genes; SARS-CoV-2: open reading frames).
  • Locate specific strains on the tree by using the search bar at the bottom of the menu.
  • Change the x and y axes to position nodes by time, nucleotide distance, or amino-acid divergence.
  • Change the layout to switch to a fanned display.
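
The leaf limit and the “Show strains from” date can be pictured with a minimal sketch. It assumes uniform random sampling and treats the date as an inclusive lower bound. The app's actual sampling scheme is not described here, and the names `sample_leaves` and `MAX_LEAVES` are hypothetical.

```python
import random
from datetime import date, timedelta

MAX_LEAVES = 1000  # display limit stated in the Tree tab description


def sample_leaves(isolates, show_from, max_leaves=MAX_LEAVES, seed=0):
    """Return at most max_leaves isolates collected on or after show_from.

    isolates: list of (name, collection_date) tuples.
    Uniform random sampling is an assumption made for illustration only.
    """
    recent = [iso for iso in isolates if iso[1] >= show_from]
    if len(recent) <= max_leaves:
        return recent
    return random.Random(seed).sample(recent, max_leaves)


# Toy data: 3000 made-up isolates spread over 700 collection days.
isolates = [
    ("isolate_%d" % i, date(2022, 1, 1) + timedelta(days=i % 700))
    for i in range(3000)
]
show_from = date(2023, 1, 1)
shown = sample_leaves(isolates, show_from)
```

Moving the date further back enlarges the pool of candidate isolates, but the number of displayed leaves stays capped.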

Frequencies tab: shows frequency trajectories for a selected time interval. The default view gives inclusive frequencies (values for each clade include its subclades).

  • Change to logarithmic view to focus on small frequencies.
  • Click on "Count density" to see the sequence count in the time window used for computation (no shading: > 100, light gray: < 100, dark gray: < 50). Lower counts mean larger statistical uncertainty of the frequencies.
  • Stacked view: shows exclusive frequencies (summing up to one).
  • Bottom panels: number of sequence isolates used in the analysis and number of reported cases as a function of time (per day, smoothed).
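
The count-density shading can be read as a simple threshold rule. The sketch below restates the thresholds from the list above. How a count of exactly 100 is shaded is an assumption, and the binomial standard error is only a rough illustration of why low counts matter, not the uncertainty estimate used by the app. Both function names are hypothetical.

```python
import math


def count_density_shade(n_sequences):
    """Map the sequence count in the computation window to a shading level."""
    if n_sequences < 50:
        return "dark gray"
    if n_sequences < 100:
        return "light gray"
    return "no shading"  # assumption: a count of exactly 100 is unshaded


def frequency_standard_error(frequency, n_sequences):
    """Binomial standard error of a frequency estimated from n_sequences samples."""
    return math.sqrt(frequency * (1.0 - frequency) / n_sequences)


for n in (30, 75, 250):
    # The standard error shrinks as the sequence count grows.
    print(n, count_density_shade(n), round(frequency_standard_error(0.2, n), 3))
```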

Map tab: shows exclusive clade frequencies for regions with significant case counts in a selected time window.

Glossary

Clade: a set of viral strains that descend from a given mutant strain and share a genetic makeup.

Clade nomenclature: the convention used to uniquely identify clades in a given evolutionary tree. Nested characters indicate lineages (e.g., 2a.2a is a subclade of 2a.2); different characters at the same level distinguish disjoint clades (e.g., 2a.2 and 2a.3). Previr uses the conventions adopted by WHO for influenza and by Pangolin for SARS-CoV-2.
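
As an illustration of the nesting rule, the sketch below splits a clade label into alternating number and letter tokens and treats one clade as a subclade of another when the parent's tokens are a proper prefix of the child's. This is a simplified reading of the convention, not the parser used by Previr. It does not resolve Pango aliases, and the function names are hypothetical.

```python
import re


def clade_tokens(name):
    """Split a clade label into tokens, e.g. '2a.2a' -> ['2', 'a', '2', 'a']."""
    return re.findall(r"\d+|[A-Za-z]+", name)


def is_subclade(child, parent):
    """True if parent's tokens are a proper prefix of child's tokens."""
    c, p = clade_tokens(child), clade_tokens(parent)
    return len(c) > len(p) and c[: len(p)] == p


print(is_subclade("2a.2a", "2a.2"))  # nested: 2a.2a descends from 2a.2
print(is_subclade("2a.3", "2a.2"))   # disjoint clades at the same level
```

Comparing tokens rather than raw strings avoids false matches between labels that merely share leading characters, such as a hypothetical 2a.21 and 2a.2.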

Clade frequency: the fraction of globally circulating strains at a given time point that belong to a given clade. Inclusive frequencies count strains in each clade including its nested subclades. Exclusive frequencies count strains in each clade that are not in any named subclade; these frequencies sum up to 1.
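
The difference between the two definitions can be made concrete with a small sketch. It assumes each strain is labeled with its most specific named clade and that the clade hierarchy is given as a parent map. The clade labels reuse the examples above, while the counts and the function name `clade_frequencies` are made up for illustration.

```python
from collections import Counter


def clade_frequencies(strain_clades, parent):
    """Return (inclusive, exclusive) frequency dicts.

    strain_clades: most specific named clade of each strain.
    parent: maps each clade to its parent clade (None for the root).
    """
    n = len(strain_clades)
    exclusive_counts = Counter(strain_clades)
    inclusive_counts = Counter()
    for clade, count in exclusive_counts.items():
        node = clade
        while node is not None:  # credit the clade and all of its ancestors
            inclusive_counts[node] += count
            node = parent.get(node)
    inclusive = {c: k / n for c, k in inclusive_counts.items()}
    exclusive = {c: exclusive_counts[c] / n for c in inclusive_counts}
    return inclusive, exclusive


parent = {"2a": None, "2a.2": "2a", "2a.2a": "2a.2", "2a.3": "2a"}
strains = ["2a"] * 2 + ["2a.2"] * 3 + ["2a.2a"] * 4 + ["2a.3"] * 1
inclusive, exclusive = clade_frequencies(strains, parent)
print(inclusive)
print(exclusive)
```

In this toy sample the exclusive frequencies add up to 1, while the inclusive frequency of 2a.2 also counts the strains in its subclade 2a.2a.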

Fitness: the absolute fitness of a given viral clade at a given time is the growth rate of the clade-specific case numbers (which is related to the effective reproductive number). The relative fitness of a clade is the difference between its growth rate and the growth rate of the total viral population. Positive values of relative fitness signal an expected frequency increase, negative values an expected frequency decrease.
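
These definitions can be illustrated with a minimal numerical sketch. It assumes exponential growth between two time points and estimates growth rates from log ratios of case numbers. This two-point estimator, the function names, and the case numbers are illustrative assumptions and are not the fitness model used by Previr.

```python
import math


def growth_rate(n_start, n_end, dt_days):
    """Exponential growth rate per day between two case counts."""
    return math.log(n_end / n_start) / dt_days


def relative_fitness(clade_cases, total_cases, dt_days):
    """Clade growth rate minus the growth rate of the total population."""
    return growth_rate(*clade_cases, dt_days) - growth_rate(*total_cases, dt_days)


# Made-up numbers: clade cases double in 14 days while total cases stay flat.
clade_cases = (100, 200)
total_cases = (1000, 1000)
f = relative_fitness(clade_cases, total_cases, 14)

# Under exponential growth, the same value is the growth rate of the frequency.
x_start = clade_cases[0] / total_cases[0]
x_end = clade_cases[1] / total_cases[1]
f_from_frequency = math.log(x_end / x_start) / 14
print(f, f_from_frequency)
```

Here the relative fitness is positive and the clade frequency rises from 0.1 to 0.2, in line with the sign rule stated above.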

Fitness model: specifies the input data and the computational method used to compute fitness estimates. The current version of the Previr app focuses on antigenic fitness, which is likely the predominant fitness component for influenza and SARS-CoV-2. By this measure, fitter clades have escaped from human population immunity to a higher degree. Learn more about fitness models here.

HA: hemagglutinin, a surface protein of the influenza virus with an important function in cell entry.

NA: neuraminidase, a surface protein of the influenza virus with an important function in cell exit.

Reassortment: a new combination of viral genome segments produced from two parent strains upon co-infection of the same host. In the influenza trees, significant HA-NA reassortments within the same lineage are marked by “R”.

Vaccines: strains recommended by the World Health Organization as vaccine components. For influenza, these recommendations are updated twice per year; see here.