Are you smarter than a design algorithm?
In the Programmatic Genomics Laboratory (PGL), we use deep learning models to predict characteristics of DNA sequences, and then use design methods to minimally edit sequences such that the models predict our desired characteristics. Sometimes, the edits chosen by these procedures are surprising.
Try it yourself to build your intuition: click any nucleotide below to cycle through A → C → G → T and watch the "neural network" evaluate your design in real time. Right-click any edited nucleotide to restore it to its original base, or use the "reset" button to undo all edits. Note: for now, there is no neural network, only a PWM scan.
Objective: Maximize the score by editing the sequence to contain GATA1 binding motifs (MA0035.2). The score of each motif is calculated from a position weight matrix (PWM) derived from JASPAR2026, and only matches that score high enough count as hits. Avoid making too many edits, though! Each edit subtracts a point from your score, so you will need to choose carefully which ones to make. Motif hits are underlined, with high information content positions underlined in pink and low information content positions (usually the flanks) underlined in gray.
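For readers who want to see the scoring idea in code, here is a minimal Python sketch of a thresholded PWM scan with a per-edit penalty. It is an illustration under stated assumptions, not the code behind this page: the 4-position count matrix is invented (it is not the MA0035.2 matrix), the hit threshold of 6.0 is arbitrary, only the forward strand is scanned, and summing hit scores before subtracting one point per edit is a guess at how the total is combined. The names `TOY_COUNTS`, `log_odds`, `scan`, `information_content`, and `design_score` are all hypothetical.

```python
import math

BASES = "ACGT"

# Toy count matrix (rows: motif positions, columns: A, C, G, T).
# Invented for illustration; NOT the JASPAR MA0035.2 matrix.
TOY_COUNTS = [
    [1, 1, 97, 1],   # mostly G
    [97, 1, 1, 1],   # mostly A
    [1, 1, 1, 97],   # mostly T
    [97, 1, 1, 1],   # mostly A
]

def log_odds(counts, background=0.25):
    """Convert a count matrix into a log2-odds PWM against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row)
        pwm.append([math.log2((c / total) / background) for c in row])
    return pwm

def information_content(counts):
    """Per-position information content in bits (no small-sample correction)."""
    ic = []
    for row in counts:
        total = sum(row)
        probs = [c / total for c in row]
        ic.append(2.0 + sum(p * math.log2(p) for p in probs if p > 0))
    return ic

def scan(seq, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][BASES.index(seq[start + i])] for i in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits

def design_score(original, edited, pwm, threshold, edit_penalty=1.0):
    """Assumed total: sum of hit scores minus a fixed penalty per edited position."""
    n_edits = sum(a != b for a, b in zip(original, edited))
    hit_total = sum(score for _, score in scan(edited, pwm, threshold))
    return hit_total - edit_penalty * n_edits

if __name__ == "__main__":
    pwm = log_odds(TOY_COUNTS)
    original = "CCGCTACC"
    edited = "CCGATACC"  # one edit: C -> A at index 3 creates GATA
    print(scan(original, pwm, threshold=6.0))
    print(scan(edited, pwm, threshold=6.0))
    print(design_score(original, edited, pwm, threshold=6.0))
```

In this toy setup, a single edit turns a near-miss (GCTA) into a full match, so the gain from the new hit outweighs the one-point penalty. With a real matrix the trade-off is less clear-cut, which is where surprising edits tend to come from.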
What is the highest score you can get with one GATA motif? What is the highest score you can get with one edit? Did you find any unexpected ways to raise the score?