The goal of our last R exercise is to extend your visualization skills with the basics of working with geographic data and maps, and to combine them with the network (graph) visualization you learned a week ago. Exceptionally, this time we will work in groups, the same ones you formed for the semester project. We will visualize the network of spam addresses as a graph on a geographic substrate, i.e. the background of the graph will be a map of the world. The data will be e-mails in the form in which they travel across the network over the SMTP protocol. Part of the header of such an e-mail may look like this:

*** BEGIN E-MAIL EXAMPLE ***
Received: from 200-102-109-107.cslce201.dial.brasiltelecom.net.br (200-102-109-107.cslce201.dial.brasiltelecom.net.br [200.102.109.107] (may be forged))
    by relay10.cites.uiuc.edu (8.14.2/8.14.2) with ESMTP id n9A10KTV029630
    for ; Fri, 9 Oct 2009 20:01:28 -0500 (CDT)
Message-Id: <200910100101.n9A10KTV029630@relay10.cites.uiuc.edu>
From: "Harris Delamora"
To: m-lexa@illinois.edu
Subject: Natural or silicon?
MIME-Version: 1.0
Content-type: text/html; charset=iso-8859-1
*** END E-MAIL EXAMPLE ***

A large number of e-mails is available in the mailbox of my faculty anti-spam filter, /mnt/lexa/mail/mailbox.spam, but the visualization procedure we build can be applied to any set of e-mails. Preparing the data is somewhat more demanding, so I have done most of that work for you outside R, using sed, grep and Perl scripts. The first step is to extract from the set of e-mails the addresses given in the From: and To: lines.
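The extraction, together with the grouping done by fromto.pl in the pipeline below, can be sketched in base R. This is only an illustration, not the scripts actually used to prepare the data: the function extract_from_to is hypothetical, it keeps only the first address on each From:/To: line (as the sed expression does), drops lines containing VIAGRA, and assumes that a message's From: line comes before its To: lines.

```r
# Hypothetical base-R counterpart of the grep/sed/fromto.pl stage:
# turns raw header lines into strings of the form "from to1 [to2 ...]".
extract_from_to <- function(lines) {
  # keep only From:/To: header lines that contain an address (and no VIAGRA)
  hdr <- grep("^(From|To):", lines, value = TRUE)
  hdr <- hdr[grepl("@", hdr, fixed = TRUE) & !grepl("VIAGRA", hdr, fixed = TRUE)]

  # first token of the form user@host, without quotes or angle brackets
  m <- regexpr("[^ <>\"]+@[^ <>\"]+", hdr)
  found <- m != -1
  addr <- regmatches(hdr, m)             # matched addresses, in order
  field <- sub(":.*$", "", hdr[found])   # "From" or "To" for each address

  records <- list()
  current <- character(0)
  for (i in seq_along(addr)) {
    if (field[i] == "From") {
      # a new message starts; keep the previous one only if it had a recipient
      if (length(current) > 1) records[[length(records) + 1]] <- current
      current <- addr[i]
    } else if (length(current) > 0) {
      current <- c(current, addr[i])     # To: address of the current message
    }
  }
  if (length(current) > 1) records[[length(records) + 1]] <- current
  vapply(records, paste, character(1), collapse = " ")
}

sample_header <- c("From: ljelleen_1994@exclaimtv.com", "To: m-lexa@illinois.edu",
                   "From: llafba_1951@exclaimtv.com", "To: m-lexa@uiuc.edu")
extract_from_to(sample_header)
```

Dropping senders without any recipient plays the same role as the final grep " " in the shell pipeline.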
The extraction was performed by the following pipeline of grep and sed calls:

    cat /mnt/lexa/mail/mailbox.spam | grep "^From\:\|^To\:" | grep "\@" | sed -e 's/\([^ <]*\)\@\([^ >]*\).*/###\1\@\2/' | sed -e 's/\:.*###/\: /' | grep -v VIAGRA | ./fromto.pl | grep " "

The output of the sed stage is the following list (if you want to repeat the rest of the processing yourself, the file temp is provided; under Linux it is best run through cat temp | ./fromto.pl | grep " "):

    From: ljelleen_1994@exclaimtv.com
    To: m-lexa@illinois.edu
    From: llafba_1951@exclaimtv.com
    To: m-lexa@uiuc.edu
    From: disruptingi71@rogersandrogers.com
    To: m-leroy@uiuc.edu
    ...

This list is then processed by the Perl script fromto.pl so that each line contains the From: address first, followed by all the To: addresses; the final grep " " keeps only lines that contain a space, i.e. senders with at least one recipient. The result looks roughly like this:

    ljelleen_1994@exclaimtv.com m-lexa@illinois.edu
    llafba_1951@exclaimtv.com m-lexa@uiuc.edu
    disruptingi71@rogersandrogers.com m-leroy@uiuc.edu

The variants of this script, fromto_ip.pl and fromto_location.pl, return a similar file, but with the IP addresses or the geographic locations of the corresponding computers instead of the e-mail addresses. The lookups are done by the short scripts getip.pl and getlocation.pl (all the scripts are available in IS). The IP version looks like this:

    74.220.207.75 128.174.4.87
    74.220.207.75 128.174.4.87
    206.80.137.76 128.174.4.87

and the location version, where each line gives the latitude and longitude of the sender followed by those of the recipient, like this:

    40.3402 -111.6073 40.1111 -88.2008
    40.3402 -111.6073 40.1111 -88.2008
    40.7052 -79.9196 40.1111 -88.2008

Now we get to the actual work with the data in R. First, let us get acquainted with the maps package:

    library(maps)
    map("world")

Any part of this map can be drawn by region name:

    map("world", "Hungary")

or, slightly more elaborately, several regions at once:

    map("world", c("Hungary", "Poland", "Czechoslovakia", "Germany", "Austria"))

and with fill colors added:

    map("world", c("Hungary", "Poland", "Czechoslovakia", "Germany", "Austria"),
        fill = TRUE, bg = "grey", col = c("pink", "orange", "yellow", "green", "brown"))

The map is drawn in geographic coordinates, i.e. longitude and latitude. From Wikipedia we find that the coordinates of Prague are 50°05' N, 14°25' E; in decimal degrees this is 50.083 and 14.416.
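The conversion from degrees and minutes to decimal degrees is simply degrees + minutes/60 (+ seconds/3600), with southern latitudes and western longitudes taken as negative, as in the location file above (e.g. -88.2008). A small helper can do this; the name dms_to_decimal and its arguments are hypothetical, not part of the maps package.

```r
# Hypothetical helper: degrees, minutes and seconds -> decimal degrees.
# Southern latitudes and western longitudes are returned as negative numbers.
dms_to_decimal <- function(degrees, minutes = 0, seconds = 0, negative = FALSE) {
  value <- degrees + minutes / 60 + seconds / 3600
  value * ifelse(negative, -1, 1)   # works element-wise for vectors too
}

prague_lat <- dms_to_decimal(50, 5)   # 50 deg 05' N
prague_lon <- dms_to_decimal(14, 25)  # 14 deg 25' E
round(c(lat = prague_lat, lon = prague_lon), 3)  # 50.083 and 14.417
```

Rounded, the longitude of Prague is 14.417; the value 14.416 used in the text is the truncated one, and the difference of less than 0.001 degrees is invisible on a world map.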
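Let us now draw Prague into the picture. Note that points() and text() take the x coordinate (longitude) first and the y coordinate (latitude) second:

    points(14.416, 50.083)
    text(14.416, 50.083, labels = "PRAHA", pos = 1)  # pos = 1 puts the label below the point

Your task in this exercise is to visualize the paths of spam from the source to the recipient.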
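One possible starting point for the task is to read the output of fromto_location.pl into R and draw each sender-to-recipient route as a line segment over the world map. The sketch below assumes that each line holds the latitude and longitude of the sender followed by one latitude/longitude pair per recipient, as in the sample above. The function names read_routes, count_routes and plot_routes are hypothetical, and the line width grows with the number of messages on a route. Because map() uses longitude as x and latitude as y, the columns are swapped when plotting.

```r
# Hypothetical helpers; input lines are assumed to look like
# "lat_from lon_from lat_to1 lon_to1 [lat_to2 lon_to2 ...]".
read_routes <- function(lines) {
  rows <- lapply(strsplit(trimws(lines), "[[:space:]]+"), function(f) {
    v <- suppressWarnings(as.numeric(f))
    # skip empty, odd-length or non-numeric lines
    if (length(v) < 4 || length(v) %% 2 != 0 || anyNA(v)) return(NULL)
    to <- matrix(v[-(1:2)], ncol = 2, byrow = TRUE)  # one row per recipient
    data.frame(from_lat = v[1], from_lon = v[2], to_lat = to[, 1], to_lon = to[, 2])
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) {
    return(data.frame(from_lat = numeric(0), from_lon = numeric(0),
                      to_lat = numeric(0), to_lon = numeric(0)))
  }
  do.call(rbind, rows)
}

# Collapse identical routes and count how many messages used each one.
count_routes <- function(routes) {
  routes$n <- rep(1, nrow(routes))
  if (nrow(routes) == 0) return(routes)
  aggregate(n ~ from_lat + from_lon + to_lat + to_lon, data = routes, FUN = sum)
}

# Draw the routes over the world map (requires the maps package).
plot_routes <- function(routes) {
  maps::map("world", col = "grey60")
  # map() uses x = longitude, y = latitude, so longitude goes first here
  segments(routes$from_lon, routes$from_lat, routes$to_lon, routes$to_lat,
           col = "red", lwd = sqrt(routes$n))
  points(routes$from_lon, routes$from_lat, pch = 20, col = "red")   # senders
  points(routes$to_lon, routes$to_lat, pch = 20, col = "blue")      # recipients
}

loc_lines <- c("40.3402 -111.6073 40.1111 -88.2008",
               "40.3402 -111.6073 40.1111 -88.2008",
               "40.7052 -79.9196 40.1111 -88.2008")
routes <- count_routes(read_routes(loc_lines))
print(routes)
if (interactive() && requireNamespace("maps", quietly = TRUE)) plot_routes(routes)
```

For the full data set, the lines can be read with readLines() from wherever you saved the output of fromto_location.pl. Straight segments in longitude/latitude are a simplification, since they are not the shortest paths on the globe, but they are usually enough for a first overview of where the spam comes from.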