Command Line API
You'll need to use the command line tool configure-dms-viz to prepare your data for dms-viz. Follow the instructions in Getting Started to install configure-dms-viz on your operating system.
Basic Usage
configure-dms-viz is a command-line tool designed to create a .json format specification file for dms-viz. You provide the data that you'd like to visualize along with additional information to customize the analysis. The resulting specification file can be uploaded to dms-viz for interactive visualization of your data. Below is an overview of the process of using configure-dms-viz.
configure-dms-viz has two commands: format and join. To format your data, you execute the configure-dms-viz format command with the required arguments and any optional arguments you need:
configure-dms-viz format \
--name <experiment_name> \
--input <input_csv> \
--metric <metric_column> \
--structure <pdb_structure> \
--output <output_json> \
[optional_arguments]
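When the format step is scripted, the command above can be assembled as an argument list. The following is a minimal sketch, not part of configure-dms-viz itself: the helper build_format_command and all file names and values are hypothetical and only illustrate the five required flags shown above.

```python
import shlex


def build_format_command(name, input_csv, metric, structure, output_json, extra_args=()):
    """Return the argument list for a `configure-dms-viz format` call."""
    command = [
        "configure-dms-viz", "format",
        "--name", name,
        "--input", input_csv,
        "--metric", metric,
        "--structure", structure,
        "--output", output_json,
    ]
    # Optional flags (e.g. --title) are appended unchanged.
    command.extend(extra_args)
    return command


# Hypothetical values, for illustration only.
format_command = build_format_command(
    name="antibody_A",
    input_csv="data/antibody_A.csv",
    metric="escape_mean",
    structure="6xr8",
    output_json="output/antibody_A.json",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(format_command))
```

The list can be passed to subprocess.run(format_command, check=True) once configure-dms-viz is installed. Building a list rather than a single string avoids shell quoting problems with the dictionary-style arguments described later.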
This creates a single dataset that can be loaded into dms-viz. However, in some cases, you might want to visualize multiple datasets simultaneously. To do this, you use the configure-dms-viz join command. The join command takes a list of formatted .json files and combines them into a single .json specification file containing each dataset. Optionally, you can add a markdown description of your joined datasets by specifying the path to a .md file with your desired description:
configure-dms-viz join \
--input <input_jsons> \
--output <output_json> \
--description <markdown_description>
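The join call can be assembled the same way. This is a sketch under assumptions: the helper build_join_command and the file names are hypothetical, and joining the input paths with commas and no spaces is an assumption about how the comma-separated list is parsed.

```python
def build_join_command(input_jsons, output_json, description_md=None):
    """Return the argument list for a `configure-dms-viz join` call."""
    command = [
        "configure-dms-viz", "join",
        # Assumption: paths are separated by commas with no spaces.
        "--input", ",".join(input_jsons),
        "--output", output_json,
    ]
    # The markdown description is optional.
    if description_md is not None:
        command += ["--description", description_md]
    return command


# Hypothetical file names, for illustration only.
join_command = build_join_command(
    ["output/antibody_A.json", "output/antibody_B.json"],
    "output/all_antibodies.json",
    description_md="description.md",
)
print(join_command)
```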
Advanced Usage
The commands above cover the most basic usage of configure-dms-viz; however, it is a flexible formatting tool that provides many options for customizing your analysis. In addition to the description of the command line API below, we'll detail some highlights of the customization available through configure-dms-viz.
Custom Filters
configure-dms-viz
allows you to specify quantitative columns in your input data to use as dynamic filters in dms-viz
. The columns you specify will populate sliders in the sidebar under "Filters
". By dragging the slider, you filter out the mutations or sites in the visualization with values less than the selected value for the column you specify.
To add filters with configure-dms-viz, specify quantitative columns using the --filter-cols flag by providing a dictionary that maps each chosen column to the name that will appear in the visualization (e.g. "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}"). In this example, the columns used as filters are effect and times_seen in the input data, and the names that will label the filters are Functional Effect and Times Seen.
In addition to specifying filters, you can set their default value and limits with the --filter-limits flag by providing a dictionary formatted like so: "{'effect': [min, value, max], 'times_seen': [min, value, max]}". You can also specify only the min and max (i.e. [min, max]), but it's highly recommended that you set a default value for the filter that makes sense for your data.
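The two dictionaries described above can be written as Python dicts and turned into argument strings. This is a minimal sketch, assuming the tool accepts Python-style dictionary literals exactly as in the examples above; the helper filter_arguments and the numeric limits are hypothetical.

```python
def filter_arguments(filter_cols, filter_limits):
    """Return --filter-cols / --filter-limits arguments after basic checks."""
    for column, limits in filter_limits.items():
        if column not in filter_cols:
            raise ValueError(f"limits given for unknown filter column: {column}")
        # Either [min, max] or [min, value, max].
        if len(limits) not in (2, 3):
            raise ValueError(f"{column}: expected 2 or 3 values, got {len(limits)}")
        # The values must be in ascending order: min <= value <= max.
        if list(limits) != sorted(limits):
            raise ValueError(f"{column}: limits must be in ascending order")
    # repr() of a dict of plain strings and numbers yields the single-quoted
    # style used in the examples above.
    return [
        "--filter-cols", repr(filter_cols),
        "--filter-limits", repr(filter_limits),
    ]


filter_cols = {"effect": "Functional Effect", "times_seen": "Times Seen"}
# Hypothetical limits: [min, default value, max] for each filter.
filter_limits = {"effect": [-5, 0, 2], "times_seen": [0, 3, 10]}

filter_args = filter_arguments(filter_cols, filter_limits)
print(filter_args)
```

Each dictionary ends up as a single list element, so it reaches the tool as one argument without any extra shell quoting when the list is passed to subprocess.run.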
Check out vignette #2 in the Vignettes for an example visualization that uses filters.
Custom Tooltips
In a similar process to adding custom filters, configure-dms-viz allows you to specify columns to include as tooltips. Tooltips are shown for each mutation in your dataset and will appear when you hover your mouse over a mutation in the heatmap plot on the left of the visualization.
Use the --tooltip-cols flag to specify columns that should provide information through tooltips by providing a dictionary like so: "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}", where the key is the column's name and the value is the label as it should appear in the tooltip.
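Because the dictionary keys must match column names, it can help to check them against the header of the input file before formatting. This is a sketch only: the helper missing_columns is hypothetical, the column names other than times_seen and effect are invented for the example, and the check covers only the main input file, not any data added with --join-data.

```python
import csv
import io


def missing_columns(csv_file, wanted_columns):
    """Return the wanted columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    return [column for column in wanted_columns if column not in header]


# A tiny in-memory CSV standing in for the real input file (hypothetical columns).
example_csv = io.StringIO(
    "site,wildtype,mutant,escape_mean,times_seen,effect\n"
    "10,A,C,0.5,4,-1.2\n"
)
tooltip_cols = {"times_seen": "# Obsv", "effect": "Func Eff."}

# Iterating over the dict yields its keys, i.e. the column names.
missing = missing_columns(example_csv, tooltip_cols)
print(missing)
```

With a real file, open it with open(path, newline="") and pass the file object in place of the in-memory example.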
configure-dms-viz format
This subcommand formats your data for dms-viz. Below is a description of each argument.
--input
<string>
Path to a .csv file with site- and mutation-level data to visualize with a protein structure. See details here for the required columns and format.
--name
<string>
Name of the experiment/selection for the tool. For example, the antibody name or serum ID. This property is necessary for combining multiple experiments into a single file.
--sitemap
<string>
Path to a .csv file containing a map between reference sites in the experiment and sequential sites. See details here for the required columns and format.
--metric
<string>
Name of the column that contains the value to visualize on the protein structure.
--structure
<string>
Either an RCSB PDB ID, if the structure can be fetched directly from the PDB (e.g. "6xr8"), or a path to a locally downloaded PDB file (e.g. ./pdb/my_custom_structure.pdb).
--output
<string>
Path to save the *.json file containing the data for the visualization tool.
--condition
<string>
If there are multiple measurements for each mutation, the name of the column that contains the condition distinguishing these measurements.
--metric-name
<string>
The name that will show up for your metric in the plot. This lets you customize the names of your columns in your visualization. For example, if your metric column is called escape_mean, you can rename it to Escape for the visualization.
--condition_name
<string>
The name that will show up for your condition column in the title of the plot legend. For example, if your condition column is 'epitope', you might rename it to be capitalized as 'Epitope' in the legend title.
--join-data
<list>
A comma-separated list of .csv files with data to join to the visualization data. This data can then be used in the visualization tooltips or filters. See details here for formatting requirements.
--tooltip-cols
<dict>
A dictionary that establishes the columns that you want to show up in the tooltip in the visualization (e.g. "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}").
--filter-cols
<dict>
A dictionary that establishes the columns that you want to use as filters in the visualization (e.g. "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}").
--filter-limits
<dict>
A dictionary that establishes the range and default value for each filter (e.g. "{'effect': [min, value, max], 'times_seen': [min, value, max]}"). Optionally, you can specify only the min and max (i.e. [min, max]), but if you have a filter, it's highly recommended that you set a default value that's interpretable.
--heatmap-limits
<list>
A list of 1, 2, or 3 values that sets the heatmap's color scale. The list is given as a string with the values separated by commas (e.g. -1, 0, 1). If a single value is provided, it sets the center of the color scale. If two values are provided, they set the minimum and maximum of the scale. If three values are provided, they set the minimum, center, and maximum. Importantly, the values set here only affect the heatmap's color scale, not the protein's color scale; the color scale of the protein will remain symmetric around 0.
--included-chains
<string>
A space-delimited string of the chain names in your PDB structure that correspond to the reference sites in your data (e.g. 'C F M G J P'). This is only necessary if your PDB structure contains chains that lack site- and mutation-level measurements.
--excluded-chains
<string>
A space-delimited string of chain names that should not be shown on the protein structure (e.g. 'B L R').
--alphabet
<string>
A string with no spaces containing all the amino acids in your experiment and their desired order (e.g. "RKHDEQNSTYWFAILMVGPC-*").
--colors
<list>
A comma-separated list (with no spaces) of HEX format colors for representing different conditions, e.g. "#0072B2,#CC79A7,#4C3549,#009E73".
--negative-colors
<list>
A comma-separated list (with no spaces) of HEX format colors for representing the negative end of the scale for different conditions, e.g. "#0072B2,#CC79A7,#4C3549,#009E73". If not provided, the inverse of each color is automatically calculated.
--check-pdb
<bool>
Whether to perform checks on the provided PDB structure, including checking if the 'included chains' are present, what % of data sites are missing, and what % of wildtype residues in the data match at corresponding sites in the structure.
--exclude-amino-acids
<list>
A comma-separated list of amino acids that shouldn't be used to calculate the summary statistics (e.g. "*, -").
--description
<string>
A short description of the dataset that shows up in the tool if the user clicks a button for more information.
--title
<string>
A short title to appear above the plot.
--floor
<bool>
Set the default of whether the data will be floored at 0. Takes a boolean, either True or False.
--summary-stat
<string>
Set the default summary statistic to either min, max, mean, median, or sum.
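The 1-, 2-, and 3-value forms of --heatmap-limits described above can be illustrated with a short sketch. This is not the tool's own parsing code: the function interpret_heatmap_limits and the dictionary it returns are hypothetical and only restate the rule in executable form.

```python
def interpret_heatmap_limits(limits):
    """Map a comma-separated string of 1, 2, or 3 numbers to min/center/max."""
    # float() ignores surrounding whitespace, so "-1, 0, 1" parses cleanly.
    values = [float(value) for value in limits.split(",")]
    if len(values) == 1:
        # A single value sets only the center of the color scale.
        return {"min": None, "center": values[0], "max": None}
    if len(values) == 2:
        # Two values set the range of the scale.
        return {"min": values[0], "center": None, "max": values[1]}
    if len(values) == 3:
        # Three values set the range and the center.
        return {"min": values[0], "center": values[1], "max": values[2]}
    raise ValueError("expected 1, 2, or 3 comma-separated values")


print(interpret_heatmap_limits("-1, 0, 1"))
```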
configure-dms-viz join
This subcommand joins multiple formatted .json datasets into one that you can then visualize with dms-viz. Below is a description of each argument.
WARNING
Make sure that you're joining files with unique values for the dataset name.
--input
<list>
A comma-separated list of paths to the .json visualization files created by configure-dms-viz format, e.g. --input path/to/my/specification_1.json, path/to/my/specification_2.json, path/to/my/specification_3.json
--output
<string>
Path to save the joined *.json file for the visualization tool.
--description
<string>
Path to a markdown file describing your dataset.
JSON Schema
The output of the command line tool is a JSON specification file. The schema of the specification file is detailed below.
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "mut_metric_df": {
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "sitemap": {
      "type": "object"
    },
    "metric_col": {
      "type": "string"
    },
    "condition_col": {
      "type": ["string", "null"]
    },
    "conditions": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "condition_colors": {
      "type": "object"
    },
    "negative_condition_colors": {
      "type": ["object", "null"]
    },
    "alphabet": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "pdb": {
      "type": "string"
    },
    "dataChains": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "excludeChains": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "filter_cols": {
      "type": ["object", "null"]
    },
    "filter_limits": {
      "type": ["object", "null"]
    },
    "heatmap_limits": {
      "type": ["array", "null"],
      "items": {
        "type": "number"
      }
    },
    "tooltip_cols": {
      "type": ["object", "null"]
    },
    "excludedAminoAcids": {
      "type": ["array", "null"],
      "items": {
        "type": "string"
      }
    },
    "description": {
      "type": ["string", "null"]
    },
    "title": {
      "type": ["string", "null"]
    },
    "floor": {
      "type": ["boolean", "null"]
    },
    "summary_stat": {
      "type": ["string", "null"],
      "enum": ["sum", "mean", "median", "max", "min"]
    }
  },
  "required": ["mut_metric_df", "sitemap", "metric_col", "conditions", "condition_colors", "alphabet", "pdb", "dataChains", "excludeChains"]
}
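As a quick sanity check, the "required" list from the schema above can be compared against a dataset object using only the standard library. This is a minimal sketch: the helper missing_required_keys and the example dataset are hypothetical, and the check covers only the presence of keys, not their types. Full validation needs a JSON Schema validator.

```python
import json

# Copied from the "required" list in the schema above.
REQUIRED_KEYS = [
    "mut_metric_df", "sitemap", "metric_col", "conditions", "condition_colors",
    "alphabet", "pdb", "dataChains", "excludeChains",
]


def missing_required_keys(dataset):
    """Return the required keys that are absent from a dataset object."""
    return [key for key in REQUIRED_KEYS if key not in dataset]


# A hypothetical, minimal dataset with placeholder values.
dataset = json.loads("""
{
  "mut_metric_df": [],
  "sitemap": {},
  "metric_col": "escape_mean",
  "conditions": [],
  "condition_colors": {},
  "alphabet": ["A", "C"],
  "pdb": "placeholder",
  "dataChains": ["A"],
  "excludeChains": []
}
""")

print(missing_required_keys(dataset))
```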