Comparative Genomics Tutorial - WordPress


Bacterial Comparative Genomics Tutorial - Version 2

Version 1 of this tutorial accompanied the paper:
Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data
David J. Edwards, Kathryn E. Holt
BMC Microbial Informatics, […]om/articles/10.1186/2042-5783-3-2)
Last updated May 2017

[…] Visualizing reference-[…]

[…] Illumina HiSeq paired-end reads from E. coli O104:H4 strain TY-[…]://www.ebi.ac.uk/ena/data/view/SRR292770&display=html. Locate the ‘Fastq files (ftp)’ column and right-[…]k as […] see http://en.wikipedia.org/wiki/FASTQ […]eads (named ‘SRR292770_1.fastq.gz’ and ‘SRR292770[…]h a suitable name, e.g. ‘comparison[…]com/en/download/faq/java […]d, open the program to begin. Then:

1. To select the file sequence to check, use 'File […] the TY-2482 reads and select the 'SRR292770[…]ce the analysis.

2. […] be improved by

[…]l package such as the command line tools FASTX-Toolkit (http://hannonlab.cshl.edu/fastx[…]s/index.php?page[…] of quality control, and its pit-[…]t beforehand, use 'File […] Save report […]es/.

Compatibility: Requires a 64-[…]s are 2.4, 2.5, 2.6, 2.7, 3.2, 3.3, 3.4 and 3.5) to be pre-[…]ns to Single-[…]2, 19(5):455-[…]umber of SPAdes references […]ecutable in the command below.

1. […]taining the SRR292770 reads files:
cd comparison_tut
2. […] changing ‘-t 1’ to ‘-[…]
python spades.py -o spades_assembly -1 SRR292770_1.fastq.gz -2 SRR292770_2.fastq.gz --careful -t […]
[…]ng 3 threads; 2¾ hours using the one.

3. […]me it, then also copy the assembly graph:
cp spades_assembly/contigs.fasta SRR292770_unordered.fasta
cp spades_assembly/assembly_graph.fastg .
You can then delete the output folder ‘spades[…]tputs before doing so.

As we have only used the default k-mer values (k-mer […] be changed to examine how the choice of k-[…] projects - […] builds require a 64-[…]le Bandage on a 32-[…]15 31(20):3350-[…]er k-[…]wiki/Effect-of-kmer-size.
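To make the role of k more concrete, here is a minimal Python sketch that counts the distinct k-mers in a toy sequence for a few values of k. It is not part of SPAdes or of the original tutorial: the function name `count_distinct_kmers` and the example sequence are illustrative assumptions, and a real assembler builds its graph from the k-mers of all reads rather than from a single string.

```python
def count_distinct_kmers(sequence, k):
    """Return the number of distinct substrings of length k in sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n contains n - k + 1 overlapping k-mers
    # (none at all when k is longer than the sequence).
    kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return len(kmers)


if __name__ == "__main__":
    toy = "ATGATGATGCCATGA"  # hypothetical sequence containing a short repeat
    for k in (3, 5, 7):
        print(k, count_distinct_kmers(toy, k))
```

Running this on a repeat-containing sequence gives a rough feel for the trade-off: short k-mers occur many times and collapse repeats together, while longer k-mers separate them but are each seen less often.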

[…]nt k-[…] 5(6): […] (NCBI accession NC_011748), a closely-related strain with a complete genome, available for download from NCBI. Go to this l[…]

ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/Escherichia_coli_55989_uid59383/
Download the sequence in FASTA format, NC_011748.fna (right-[…]es using pre-[…]cbi.nlm.nih.gov/Traces/wgs/?val=AFVS01&display=contigs&page[…] then right-[…]rence genome and contigs, we can order the contigs.

1. Launch the Mauve application.
2. From the Tools menu, select ‘Move Contigs’.
3. […]OK’.
4. […]en hit ‘OK’ to dismiss it.
5. […]d Sequence […]n this case ‘NC_011748.fna’.
6. Click the ‘Add Sequence […]s you wish to align, ‘SRR292770[…]

7. […]or Mac OS X; even up to 16 iterations and a bit more time on a 32-[…]e right files for input; they should be FASTA or multi-FASTA sequence files.
8. […]spect the final alignment (and the others) beforehand.
9. […]s the iteration number. Rename ‘SRR292770[…]92770[…]X’ folders.

[…]o use a command-[…] Abacas [version 1.3.1] to the […]t exercise to ‘SRR292770_unordered.fasta’ first):
abacas.1.3.1.pl -r NC_011748.fasta -q SRR292770_unordered.fasta -p nucmer -c -m -b -o […]
[…]lti-[…]ntigs, we provide two GUI-[…] in TY-[…] here https://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AFVS01&display=contigs&page[…]tructions:

1. Launch the Mauve application
2. From the File menu, select ‘Align with progressiveMauve […]’
3. […]ence […]0.fasta’.
4. Click the ‘Add Sequence […]ive assembly, ‘AFVR01.fasta’. If you provide a multi-[…]

5. […]xample we will just add the EAEC genome Ec55989.
6. […]ed ‘[…]utput file (e.g. ‘mauve_output’), and click ‘Save’.
7. Click ‘Align […]hould all be FASTA or multi-[…] Genbank format (to provide an annotation).
8. […]l appear. To simplify the image a little, select View -> Style -> […]is:

Row 1 = O104 ordered contigs.
Row 2 = alternative assembly
Row 3 […] by selecting Tools -> Export -> Export image […]/www.webact.org/) or DoubleACT (http://www.hpa-bioinfotools.org.uk/pise/double_actv2.html), see steps 1-2 below.

1. […]tep 1, 2.3.1).
2. […]tep 2, 2.3.1)
3. View the comparison(s) in ACT.
   a. Launch the ACT application
   b. Select File -> Open
   c. […] will be displayed. Click ‘more files […]equence file.
   d. Click the ‘Choose […]r comparison file. Note that you can load in your multi-[…]

[…]/wikis/How_To_Score_Genome_Assemblies_with[…] contigs file (multi-[…]bly Metrics in the Windows-[…]drag-and-[…]H4 strain 2011C-3493 (NCBI accession NC_018658.1; download NC[…]ive/old_refseq/Bacteria/Escherichia_coli_O104_H4_2011C_3493[…]RR292770 unordered contigs to the new reference, 2011C-3493.
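Simple assembly metrics can also be computed directly from a contigs file. The sketch below is a minimal Python example, not the Mauve Assembly Metrics tool: it takes the lines of a multi-FASTA file and reports the contig count, total length, and N50 (the length L such that contigs of length at least L together cover at least half of the assembly). The function names and the choice of N50 as the example metric are assumptions made for illustration, and the toy records are invented.

```python
def contig_lengths(lines):
    """Return the sequence length of every record in multi-FASTA lines."""
    lengths = []
    current = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical toy records; a real run would pass an open file handle.
    toy = [">contig1", "ACGTACGT", "ACGT", ">contig2", "GGCC"]
    lengths = contig_lengths(toy)
    print("contigs:", len(lengths))
    print("total bp:", sum(lengths))
    print("N50:", n50(lengths))
```

Because `contig_lengths` accepts any iterable of lines, it can be given an open file handle for a contigs file such as `SRR292770_unordered.fasta` to compare assemblies before and after reordering.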

[…]011C-[…] be run as a command-[…] advantage of the command-[…]

[…]/1471-2164-9-75.

Inputs: Ordered contigs file (multi-[…]red contigs of the E. coli O104:H4 strain in multi-[…]t.nmpdr.org/ in a web browser and log in to your account.
2. […]pload New Job’.
3. […]nection.
4. […]he last, with the sub-[…]ields for you, except the last, the strain. Enter ‘TY-[…]my).
5. The next page should have sub-[…] information (Sequencing Method = ‘other’, Coverage = ‘[…]8x’, Number of contigs = “101-[…]his:

[…]epending on the number of jobs in the queue before you.
6. […]Your Jobs’ tab.

7. […]ess bars).
8. […]. The file will be called ‘562.[…]job no.[…].ec-[…] of command-[…] it uses a much-[…]4 strain TY-[…] that follows.

[…]ady used above)
EAEC str. Ec55989 (NC_011748) - download NC[…]rchive/old_refseq/Bacteria/Escherichia_coli_55989_uid59383/
EHEC O157:H7 str. EDL933 (NC_002655) - download NC[…]ve/old_refseq/Bacteria/Escherichia_coli_O157_H7_EDL933[…]

[…] will also use these genomes:
EPEC O26:H11 str. 11368 - NC[…]ve/old_refseq/Bacteria/Escherichia_coli_O26_H11_11368_uid41021/
EPEC O127:H6 str. E2348/69 - NC[…]ve/old_refseq/Bacteria/Escherichia_coli_O127_H6_E2348_69_uid59343/
aEPEC O111:H9 str. E110019 - […]s/wgs/?val[…]nk directly into your browser]
EHEC O157:H7 str. TW14359 - NC[…]ve/old_refseq/Bacteria/Escherichia_coli_O157_H7_TW14359_uid59235/
EHEC O157:H7 str. EC4115 - NC[…]ve/old_refseq/Bacteria/Escherichia_coli_O157_H7_EC4115_uid59091/
EHEC O157:H7 str. Sakai - NC[…]ve/old_refseq/Bacteria/Escherichia_coli_O157_H7_uid57781/
Stx (shiga-[…]e to cut and paste this link directly into your browser]

[…]nt”. PLoS One, 2010. 5(6): […]ctions

1. Launch the Mauve application
2. From the File menu, select ‘Align with progressiveMauve […]’
3. […]ence […]erated by RAST).
4. Click the ‘Add Sequence […] the EHEC O157:H7 strain EDL933 (NC_002655.fna). If you provide a multi-[…] these together before running the alignment.
5. […]xample we will just add the EAEC genome Ec55989.
6. […]ed ‘[…]utput file (e.g. ‘mauve[…]

7. Click ‘Align […]A or multi-[…]

8. […]l appear. To simplify the image a little, select View -> Style -> […]is:
Row 1 = annotated O104 genome.
Row 2 = EHEC genome
Row 3 […]ing by selecting Tools -> Export -> Export image […]
9. Notice the EHEC genome has more ‘[…]
