Vignettes
1. Mapping the neutralization profile of antibodies and sera against HIV envelope
The Bloom lab has developed a pseudotyping-based deep mutational scanning platform that makes it possible to assess the effects of thousands of mutations on properties like antibody neutralization for a diverse array of viral glycoproteins. Radford et al. used this platform to map the neutralization profiles of polyclonal serum samples that are able to neutralize diverse strains of HIV. By characterizing the specificity of these clinically important serum samples against the HIV envelope (Env), they provide a great resource for assessing anti-HIV immune responses and informing prevention strategies.
Structural context is an important component of interpreting this kind of data. By mapping the neutralization profile of various serum samples or antibodies onto a structure of HIV Env, it's possible to visualize a 3D footprint of antibody binding. This type of visualization can help determine whether multiple antibodies or serum samples target the same structural epitopes on HIV Env, and can therefore help identify regions that are important for eliciting a broadly neutralizing immune response. Additionally, Radford et al. were able to deconvolve the contribution of multiple epitopes to neutralization by individual polyclonal sera.
dms-viz is a great tool for analyzing this type of experiment. It lets you explore the totality of your data through summary metrics and detailed plots, while also showing a representation of this data on the structure of HIV Env. The --condition feature of dms-viz makes it straightforward to visualize the contribution of multiple epitopes to neutralization.
To check out the data and code for this study, click here. To see how to prepare this kind of data and explore the results of Radford et al. yourself, check out the tutorial below.
Using dms-viz
You can find the original antibody escape data for this study here. The data on the functional constraints of mutations on HIV Env is here. I've organized this data if you want to follow along here.
configure-dms-viz is designed to prepare a single dataset at a time. For each of the 7 datasets in this study, the values of the command line arguments are described in this datasets.csv file. Here is an example of a single command for the serum sample IDC508:
configure-dms-viz format \
--input tests/HIV-Envelope-BF520-DMS/input/IDC508_avg.csv \
--sitemap tests/HIV-Envelope-BF520-DMS/sitemap/sitemap.csv \
--output tests/HIV-Envelope-BF520-DMS/output/IDC508.json \
--name "IDC508" \
--metric "escape_mean" \
--metric-name "Escape" \
--condition "epitope" \
--condition-name "Epitope" \
--join-data tests/HIV-Envelope-BF520-DMS/join-data/functional_effects.csv \
--structure "6UDJ" \
--included-chains "C F M G J P" \
--excluded-chains "B L R A Q K" \
--tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}" \
--filter-cols "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}" \
--title "IDC508"
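Since each dataset needs its own command, the per-dataset commands could be generated from a table of arguments rather than typed by hand. The following is a minimal sketch, assuming a hypothetical CSV layout with one row per dataset and one column per flag (named after the flag without the leading dashes); the real datasets.csv may be organized differently. Only flags that appear in the command above are used.

```python
import csv
import io
import shlex

# Flags taken from the example command above. The column names are an
# assumption: one CSV column per flag, named without the leading "--".
FLAGS = [
    "input", "sitemap", "output", "name", "metric", "metric-name",
    "condition", "condition-name", "join-data", "structure",
    "included-chains", "excluded-chains", "tooltip-cols", "filter-cols",
    "title",
]


def build_format_command(row):
    """Return the argument list for one `configure-dms-viz format` call."""
    argv = ["configure-dms-viz", "format"]
    for flag in FLAGS:
        value = (row.get(flag) or "").strip()
        if value:  # skip flags that are blank or absent for this dataset
            argv += [f"--{flag}", value]
    return argv


def commands_from_csv(handle):
    """Build one command per row of an open CSV file."""
    return [build_format_command(row) for row in csv.DictReader(handle)]


# Made-up two-column-per-dataset table, for illustration only.
example = io.StringIO(
    "name,input,output,metric\n"
    "IDC508,input/IDC508_avg.csv,output/IDC508.json,escape_mean\n"
)
for argv in commands_from_csv(example):
    print(shlex.join(argv))  # shell-quoted command, ready to inspect
```

Each argument list can be passed to subprocess.run(argv, check=True) to actually run the command; building a list rather than a string avoids shell quoting problems with values such as the dictionary-style --tooltip-cols argument.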
The configure-dms-viz format command above produces an output .json file that can be visualized in dms-viz right away. However, if you want to visualize all 7 experiments together, you can combine them into a single .json file using the configure-dms-viz join command like so:
configure-dms-viz join \
--input tests/HIV-Envelope-BF520-DMS/input/IDC508_avg.json, tests/HIV-Envelope-BF520-DMS/input/IDC513_avg.json, tests/HIV-Envelope-BF520-DMS/input/IDC561_avg.json, ... \
--output tests/HIV-Envelope-BF520-DMS/output/HIV-Envelope-BF520-DMS.json \
--description tests/HIV-Envelope-BF520-DMS/README.md
This results in the .json specification located here. You can visualize this with dms-viz below, or you can click here to see the visualization on a separate page.
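When there are many per-dataset files, the join command shown earlier in this vignette could be assembled programmatically. This is only a sketch: the helper name build_join_command is hypothetical, and it mirrors the comma-plus-space separator written in the example command without confirming how the tool parses that list.

```python
import shlex


def build_join_command(json_paths, output, description=None):
    """Return the argument list for one `configure-dms-viz join` call.

    The --input value is passed as a single argument so the shell cannot
    split it at the spaces. The ", " separator copies the example above;
    whether the tool requires or tolerates the space is an assumption.
    """
    argv = [
        "configure-dms-viz", "join",
        "--input", ", ".join(json_paths),
        "--output", output,
    ]
    if description:
        argv += ["--description", description]
    return argv


# Placeholder file names, for illustration only.
argv = build_join_command(
    ["a.json", "b.json", "c.json"],
    output="combined.json",
    description="README.md",
)
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv, check=True) would execute the command without going through a shell.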
2. Inferring the fitness landscape of the SARS-CoV-2 proteome from phylogenetic data
The scale of genomic sequencing surveillance of SARS-CoV-2 has led to the public availability of millions of SARS-CoV-2 sequences. Bloom and Neher developed an approach that leverages this massive amount of sequencing data to estimate the fitness effects of mutations in all SARS-CoV-2 proteins. Their approach works by computing the count of each mutation expected in the absence of selection and comparing it to the observed count of that mutation along the phylogeny. The result is an estimate of fitness that is very helpful for understanding the evolutionary constraint on the SARS-CoV-2 proteome. This kind of data is particularly useful for assessing the constraint on possible therapeutic targets that are intractable for deep mutational scanning.
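To make the observed-versus-expected comparison concrete, here is a minimal sketch of one common way to turn two counts into a fitness-like score: a log ratio stabilized by a pseudocount. The function name and the pseudocount value of 0.5 are assumptions for illustration; consult Bloom and Neher for the exact estimator they use.

```python
import math


def fitness_estimate(observed, expected, pseudocount=0.5):
    """Log ratio of observed to expected mutation counts.

    The pseudocount keeps the ratio finite when a mutation is never
    observed; its value here is an illustrative assumption.
    """
    return math.log((observed + pseudocount) / (expected + pseudocount))


# Made-up counts: seen as often as expected, never seen, seen twice as often.
for observed, expected in [(20, 20), (0, 20), (40, 20)]:
    score = fitness_estimate(observed, expected)
    print(f"observed={observed} expected={expected} score={score:.2f}")
```

Under this kind of score, values near zero indicate a mutation seen about as often as expected, negative values indicate a mutation seen less often than expected (suggesting it is deleterious), and positive values indicate the opposite.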
Structure-guided design of anti-viral therapeutics is a promising approach to developing effective drugs against SARS-CoV-2. It's a major goal of consortiums like the ASAP Discovery Consortium to incorporate evolutionary constraints into the design of therapeutic ligands. The data from Bloom and Neher can be combined with a structural representation of each viral target to show the propensity of the virus to escape in the binding pockets being targeted by medicinal chemists doing structure-aided design. dms-viz offers a simple way to visually assess this for a wide range of structure-ligand pairs.
To check out the data and code for this study, click here. To see how to prepare this kind of data and explore the results of Bloom and Neher yourself, check out the tutorial below.
Using dms-viz
You can find the original mutation fitness data for this study here. I've organized this data if you want to follow along here.
configure-dms-viz is designed to prepare a single dataset at a time. For each of the 23 SARS-CoV-2 proteins in this study, the values for each of the command line arguments are described in this datasets.csv file. Here is an example of a single command for the Spike protein:
configure-dms-viz format \
--input tests/SARS2-Mutation-Fitness/input/S_fitness.csv \
--sitemap tests/SARS2-Mutation-Fitness/sitemap/S_sitemap.csv \
--output tests/SARS2-Mutation-Fitness/sitemap/S.json \
--name "S" \
--metric "fitness" \
--metric-name "Fitness" \
--structure "6VYB" \
--included-chains "polymer" \
--tooltip-cols "{'expected_count': 'Expected Count'}" \
--filter-cols "{'expected_count': 'Expected Count'}" \
--filter-limits "{'expected_count': [0, 100]}" \
--title "S" \
--alphabet "RKHDEQNSTYWFAILMVGPC*" \
--exclude-amino-acids "*" \
--description "The Spike Glycoprotein. The structure has one RBD in the up position. [Structure: 6VYB]"
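The --tooltip-cols, --filter-cols, and --filter-limits arguments in the Spike command are dictionary-style strings keyed by column name. Before running the command, it can help to check that those keys actually appear in the data. The helper below is a hypothetical sketch, not part of configure-dms-viz, and it assumes the referenced columns are expected in the header of the CSV being checked.

```python
import ast
import csv
import io


def missing_columns(csv_handle, *dict_args):
    """Return keys of the dictionary-style arguments absent from the CSV header."""
    header = set(next(csv.reader(csv_handle)))
    wanted = set()
    for arg in dict_args:
        # literal_eval safely parses strings like "{'col': 'Label'}";
        # iterating over the resulting dict yields its keys.
        wanted |= set(ast.literal_eval(arg))
    return sorted(wanted - header)


# Made-up header for illustration; the real S_fitness.csv columns may differ.
fake_csv = io.StringIO("site,wildtype,mutant,fitness,expected_count\n")
print(missing_columns(
    fake_csv,
    "{'expected_count': 'Expected Count'}",
    "{'expected_count': [0, 100]}",
))  # prints []
```

An empty list means every referenced column was found; any names returned are worth fixing in either the data or the argument before formatting.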
The configure-dms-viz format command above produces an output .json file that can be visualized in dms-viz right away. However, if you want to visualize all 23 experiments together, you can combine them into a single .json file using the configure-dms-viz join command described in the previous example. This results in the .json specification located here. You can visualize this with dms-viz below, or you can click here to see the visualization on a separate page.
3. Exploring the functional constraint and evolutionary potential of the influenza A polymerase PB1 subunit
The influenza RNA-dependent RNA polymerase (RdRp) is a key determinant of zoonosis for novel influenza viruses. However, little is known about the evolutionary potential and effect of mutations on influenza RdRp function. Li et al. set out to change this by measuring the effect of thousands of mutations on the replicative fitness of influenza RdRp, performing deep mutational scanning on the PB1 subunit of the A/WSN/1933(H1N1) strain. Li et al. provide a comprehensive map of PB1 mutation fitness that serves as a helpful resource for those interested in understanding influenza replication.
dms-viz provides a great platform to share deep mutational scanning data as a resource. It offers stable links that contain information about the parameters selected in the visualization, making it possible to highlight and share specific findings. Also, since the influenza RdRp is a heterotrimer of which PB1 is only a single subunit, dms-viz provides a flexible way to represent and highlight specific subunits of the structure.
To see how to prepare this kind of data and explore the results of Li et al. yourself, check out the tutorial below.
Using dms-viz
You can find the original mutation fitness data for this study here. I've organized this data if you want to follow along here. I did a little bit of pre-processing on this data in Python to meet the data requirements:
import pandas as pd

# AA_DICT must be defined beforehand: a dict mapping three-letter amino acid
# codes to one-letter codes
# Import and format the data from the supplement
fitness_df = pd.read_csv("../data/supplemental-data.csv")
# Drop the amplicon column
fitness_df.drop('amplicon', axis=1, inplace=True)
# Replace the three letter code with one letter code
fitness_df['wildtype'] = fitness_df['wildtype'].replace(AA_DICT)
fitness_df['substitution'] = fitness_df['substitution'].replace(AA_DICT)
# Rename the columns
fitness_df.rename(columns={'substitution': 'mutant'}, inplace=True)
# Save the output data
fitness_df.to_csv("../data/fitness.csv", index=False)
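The pre-processing code uses an AA_DICT lookup that is not defined in the snippet. A plausible definition is sketched below, assuming the supplement writes three-letter codes in title case (for example "Ala"); the actual capitalization, and how stop codons are written, should be checked against the data. The to_one_letter helper is hypothetical and only illustrates the lookup.

```python
# Standard three-letter to one-letter amino acid codes.
# Title-case keys are an assumption about how the supplement is formatted.
AA_DICT = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(code):
    """Translate one three-letter code, leaving unknown values unchanged."""
    return AA_DICT.get(code, code)


# "Ter" is a made-up stop-codon label to show that unmapped values pass through.
print([to_one_letter(code) for code in ["Met", "Asp", "Ter"]])
# prints ['M', 'D', 'Ter']
```

Passing a dictionary like this to pandas Series.replace behaves the same way: values that match a key are swapped, and everything else is left as it is, so unmapped entries are easy to spot afterwards.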
The values for each of the command line arguments are described in this datasets.csv file. Here is the resulting command:
configure-dms-viz format \
--input tests/IAV-PB1-DMS/input/pb1_fitness.csv \
--sitemap tests/IAV-PB1-DMS/sitemap/sitemap.csv \
--output tests/IAV-PB1-DMS/sitemap/pb1.json \
--name "IAV PB1" \
--metric "fitness" \
--metric-name "Replicative Fitness" \
--structure "7NHX" \
--included-chains "B" \
--title "IAV PB1 Deep Mutational Scan" \
--description "Deep mutational scan of influenza virus A/WSN/1933(H1N1) PB1 RdRp subunit"
This results in an output .json file that can be visualized in dms-viz right away. You can explore it with dms-viz below, or you can click here to see the visualization on a separate page.